Journey to Python Part 2: Input, Output, and Documentation
In the last article in the Journey to Python series, I had gotten as far as doing some basic math and displaying some data on the screen. Over the last few days I’ve been working on two new facets of Python which will hopefully get me up to speed writing programs that are actually useful. This week I’ve played around a bit with reading input from the command line in a Python program, as well as interacting with the operating system through the os module. I’ve also had a chance to look at pydoc to get some documentation, and as per a reader request I’ve spent a bit of time thinking about Python’s type system and how it compares to the stricter type systems in languages such as C++ and Java.
The first thing I would like to mention in today’s article is something that struck me as I was writing a bit of Python code earlier this evening. For all of the (friendly and otherwise) jabs that the Python and Perl folks exchange, the more I work with Python, the more I notice a lot of small things that make it feel a little bit like Perl to me. To some people this might seem like an insult, but the truth is I quite like Perl, and I mean the comparison as a sincere compliment. I imagine that a fair bit of this similarity comes simply from the fact that Python and Perl are both scripting languages, but there might be more to it. I will have to come back to this topic after a few months with Python, when I’ll be better equipped to make an informed comparison.
System Calls and Callbacks
With that aside out of the way, it’s time to move into the meat of today’s article. I’d like to start by posting, of all things, a bit of C code (strictly speaking it needs a C++ compiler, thanks to one default argument) that I’ve kept around to use every now and then when I need to walk through a directory structure.
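#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Print the size and name of a single file or directory. */
void stat_file(const char* pathname)
{
    if(NULL == pathname)
        return;
    struct stat st;
    if(0 != stat(pathname, &st))
    {
        printf("error opening %s: %s\n", pathname, strerror(errno));
        return; /* st is not valid if stat() failed */
    }
    printf("%lld %s\n", (long long)st.st_size, pathname);
}

/* Call callback on dirname, then recurse into each non-hidden entry.
   The default argument means this has to be compiled as C++. */
void callback_traverse(const char* dirname, void (*callback)(const char*), int traversal_type, int return_type, int level=0)
{
    DIR* dir = NULL;
    struct dirent* entry;
    /* room for dirname, a '/', the longest possible entry name, and the '\0' */
    char* path = (char*)malloc(strlen(dirname) + 1 + NAME_MAX + 1);
    if(NULL == path)
        return;
    strcpy(path, dirname);
    strcat(path, "/");
    callback(dirname);
    if(NULL != (dir = opendir(dirname)))
    {
        while( (entry = readdir(dir)) != NULL)
        {
            if( (entry->d_name)[0]=='.')
                continue;
            strcat(path, entry->d_name);
            /* recurse with the full path, not the bare entry name */
            callback_traverse(path, callback, traversal_type, return_type, level+1);
            strcpy(path, dirname);
            strcat(path, "/");
        }
        closedir(dir); /* only close a directory that was actually opened */
    }
    free(path);
    path = NULL;
}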
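And now the equivalent Python code:
from os import stat, walk
from os.path import join

def stat_file(pathname):
    # Print the size in bytes followed by the path, like the C version.
    r = stat(pathname)
    print("%d %s" % (r.st_size, pathname))

def traverse_path(pathname, fun):
    # walk yields (root, dirs, files) for each directory below pathname.
    for root, dirs, files in walk(pathname):
        for name in files:
            # files holds bare names, so join each with root before calling fun
            fun(join(root, name))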
One thing you can see immediately is that the Python code is quite a bit shorter. The C code certainly has more error checking, a small bit of which may be useful in the Python version as well, but by and large the Python version is shorter. That said, I don’t necessarily find that the Python code was easier to write. Certainly part of that is just my unfamiliarity with Python, but the algorithm is basically the same, except that Python implements parts of it under the hood.
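The error check from the C version could be carried over to Python roughly as follows. This is a minimal sketch, assuming Python 3; returning the size (or None on failure) is an addition for illustration and not part of the listing above.

```python
import os

def stat_file(pathname):
    """Print size and path, reporting failures instead of raising."""
    try:
        r = os.stat(pathname)
    except OSError as err:
        # err.strerror carries the same message strerror(errno) gives in C
        print("error opening %s: %s" % (pathname, err.strerror))
        return None
    print("%d %s" % (r.st_size, pathname))
    return r.st_size
```

Catching OSError keeps one unreadable file from aborting a whole directory walk, which is about as much checking as the C version does.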
Easier or not, though, I do have to give credit to Python, since the fact is that I was able to write the application. My programming forte is mucking about at the systems level. I do certainly write application-level things from time to time, but as a systems developer, I think it’s nice that Python at least has access to system calls. That alone is enough to move Python up a notch in my internal ranking of programming languages. Thanks to the ease with which I was able to write the above application, I think I can now say that I would prefer writing Python to writing Java (and the fact is that, while I don’t write a lot of Java these days, I’ve never particularly minded it).
Looking at the above code, the important technical bit is obviously the ability to pass a function as an argument to a function. Python, with its at least half-hearted nod to functional programming, does gain one important win over C and C++ in this area; I don’t think that anyone in their right mind would argue against the fact that function pointer syntax in those languages is a bit much at times.
But passing functions as arguments and working with system calls are things I generally expect of any language, and it all works well enough, without anything spectacular enough to merit droning on about. There was, however, one thing that came up while figuring out how to write this code that I think is deserving of a bit of a rant.
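To make the callback idea a little more concrete, here is a small sketch, assuming Python 3. The collect_sizes helper is hypothetical, invented for illustration; it shows that any callable, including a lambda that closes over a local list, can be handed to the traversal function.

```python
import os

def traverse_path(pathname, fun):
    """Call fun(full_path) for every file below pathname."""
    for root, dirs, files in os.walk(pathname):
        for name in files:
            fun(os.path.join(root, name))

def collect_sizes(pathname):
    """Hypothetical helper: gather (size, path) pairs through a callback."""
    sizes = []
    # The lambda needs no declaration or pointer syntax; it just closes over sizes
    traverse_path(pathname, lambda p: sizes.append((os.stat(p).st_size, p)))
    return sizes
```

Calling collect_sizes(".") would return a list of (size, path) tuples for everything under the current directory.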
Pydoc
I like man pages. Some people use reference books or Google when they are doing development (I use both on occasion), but more than anything else I like man pages for reference on all of the ins and outs of the functions and structures that I’m using. This, again, is probably a symptom of my systems programming roots, but pretty much every library and system call that I use in my day-to-day development has a corresponding, well-written man page describing the return values, the parameters, how exactly the function works, and so on. Using Vim as my text editor, I have only to move the cursor over the function that I want to reference and type ‘K’ to get the man page up on the screen for my perusal.
Python does not use man pages. There is certainly a man page for the python command, and plenty of available documentation, but most of the information is found through Python’s pydoc application. pydoc works pretty much like man, supporting the main features that I use with man (forward and backward scrolling, and searching). Vim is even smart enough to use pydoc instead of man when it sees that I’m working with a Python file. The problem that I’ve run into with pydoc is, to be fair, as much a problem with Vim as with pydoc itself. To look up a given function, you need to type its full dotted ‘path’; so for instance, to look up the ‘walk’ function used above, one needs to run pydoc os.walk. To make matters more confusing, pydoc os.path.walk also turns up a page, but it describes a different, older function that takes a callback rather than yielding (root, dirs, files) tuples, which is why the two pages don’t agree. In both cases, simply moving the cursor over ‘walk’ and typing ‘K’ doesn’t open the right pydoc page, meaning that I’m forced to type !pydoc manually in Vim rather than just using ‘K’.
I understand that this avoids the very real possibility of two packages having functions with the same name that need documenting, but I’d much prefer that pydoc at least try to search for what I’m looking for, and prompt me to choose one if multiple results are found.
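The kind of lookup I have in mind could be sketched in a few lines. This is only an illustration, assuming Python 3 (where os.walk is the only walk left in the standard os modules); find_candidates and first_doc_line are hypothetical names, not part of pydoc, and the list of modules to search is supplied by hand.

```python
import importlib
import inspect

def find_candidates(name, module_names):
    """Return dotted paths for every listed module that defines name."""
    matches = []
    for module_name in module_names:
        module = importlib.import_module(module_name)
        if hasattr(module, name):
            matches.append("%s.%s" % (module_name, name))
    return matches

def first_doc_line(dotted_path):
    """Return the first docstring line for a path such as 'os.walk'."""
    module_name, _, attr = dotted_path.rpartition(".")
    obj = getattr(importlib.import_module(module_name), attr)
    doc = inspect.getdoc(obj) or ""
    return doc.splitlines()[0] if doc else ""

# List every match so the user could be prompted to choose one
for path in find_candidates("walk", ["os", "os.path", "shutil"]):
    print(path, "-", first_doc_line(path))
```

pydoc does ship a keyword search, pydoc -k, but it matches against module synopsis lines rather than function names, so it doesn’t quite cover this case.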
Typing in Python
In the last entry, a reader commented that they would like me to discuss my feelings on the automatic typing that Python uses, compared to typing in other languages. Since I’m still getting started on my journey to learning Python, I’m not qualified to go too deep in this arena yet, but I thought I would at least address the issue at a high level and offer up my thoughts.
The first thing to realize is that Python isn’t, strictly speaking, a weakly typed language. It is dynamically typed, in that a variable name can be bound to a value of any type, but the values themselves are strongly typed, and in that respect Python resembles C++ and Java more than it does, for instance, Perl. Every Python value has an internal type, and you are prohibited from doing things like adding a number to a string, at least without calling some sort of conversion function. Python also lets you convert explicitly by calling the target type, allowing you to say, for instance, foo = int(some_function()).
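A quick way to see this behavior is to try the string-plus-number case directly. The following is a minimal sketch, assuming Python 3; add_number_to_string is a hypothetical name used only for this demonstration.

```python
def add_number_to_string(text, number):
    """Show that mixing str and int fails until one side is converted."""
    try:
        return text + number           # raises TypeError for str + int
    except TypeError:
        return text + str(number)      # an explicit conversion succeeds

print(add_number_to_string("total: ", 3))     # prints "total: 3"
print(type(3).__name__, type("3").__name__)   # prints "int str"
```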
Fundamentally, I think that this is a good compromise, with a few potential potholes. Having an internal type does mean that you will get errors on certain kinds of code that Perl might happily accept, leading to a more subtly broken program. Having an exception raised right away avoids these problems. Unfortunately, Python doesn’t go quite far enough in my opinion. While you can write something like foo = int(3.14) to get 3, you can’t convert between integer and character types the same way: int('a') raises a ValueError rather than giving you a character code, and you have to reach for ord() and chr() instead (a side effect, I believe, of Python having no separate character type, only strings, as well as the way Python handles Unicode).
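For reference, the conversions in question are spelled as ordinary function calls. This is a small sketch, assuming Python 3; char_code_round_trip is a hypothetical helper, while int, ord and chr are built-ins.

```python
def char_code_round_trip(ch):
    """Convert a one-character string to its code point and back."""
    code = ord(ch)          # character -> integer code point
    return code, chr(code)  # integer code point -> character

print(int(3.14))                  # prints 3 (truncates toward zero)
print(char_code_round_trip("A"))  # prints (65, 'A')

try:
    int("A")                      # not a C-style cast: "A" is not a numeric literal
except ValueError as err:
    print("ValueError:", err)
```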
The other problem with implicit rather than explicit typing is that it hides what the programmer is thinking. When you’re looking at a bit of C or C++ code and you see “int some_value = some_function()”, then the next programmer knows that (unless you are a horrible, horrible programmer) odds are you intend that to be a number. In Python, if you type “some_value = some_function()” then you have to either explicitly comment that (and we all know how much developers love to comment code) or rely on the next programmer to go along and trace through some_function to see what it returns, so that they know what type of value “some_value” should be expected to have.
That said, implicit typing isn’t all bad, and I would say that as long as you choose variable names properly (or, shudder, use something like Hungarian notation) you should be good. As an interesting side note, C++0x is adding the ‘auto’ keyword for automatically determining types, meaning that this will soon be one less advantage (or disadvantage, depending on your perspective and circumstances) that Python has over C++.
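One way to record that intent without a separate comment is an optional type annotation. This is a sketch under the assumption of Python 3.6 or later; the body of some_function is a made-up stand-in, and the annotations are documentation only, since the interpreter does not enforce them at run time.

```python
def some_function() -> int:
    """Hypothetical stand-in for the function discussed above."""
    return 42

# The annotation tells the next programmer a number is expected here
some_value: int = some_function()

print(some_function.__annotations__)   # prints {'return': <class 'int'>}
```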
Closing Remarks
Another few days with Python, and I have to say that some of my initial reservations are starting to go away. There are certainly still things that I dislike about the language, but I’m starting to see that there are cases where it could indeed be useful. I don’t anticipate that it will ever become my primary language choice, but I may be getting to the point where I could see myself not minding using it here or there when it would be a good fit. Over the next few days I’ll be looking at OOP in Python. I’ll be updating again soon, so check back regularly, and if there are any topics you’d especially like me to cover, drop me an email or leave a comment and I’ll try to address them in the next post.
Thanks for reading; codeninja out.