I'm implementing a Python wrapper for a C library. The obvious choice is to write some C linked against libpython; the less obvious but simpler one is to use ctypes. ctypes is really simple to use: it lets you declare structures and function types, and it has several classes that represent the simple C types: c_int, c_char, c_char_p, c_void_p, etc.
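For readers who haven't used ctypes, here is a minimal sketch of those building blocks. The Point structure is a hypothetical example, not part of the library being wrapped, and the code is written for Python 3, where C strings come back as bytes rather than str.

```python
from ctypes import Structure, c_char_p, c_int, c_void_p

# Simple C types are wrapped by small ctypes classes; .value reads them back.
n = c_int(42)
s = c_char_p(b"hello")  # a char * pointing at the NUL-terminated bytes b"hello"
p = c_void_p()          # a NULL void *; its .value is None

# A structure is declared by listing its fields and their C types.
class Point(Structure):  # hypothetical struct, for illustration only
    _fields_ = [("x", c_int), ("y", c_int)]

pt = Point(1, 2)
print(n.value, s.value, p.value, pt.x, pt.y)  # 42 b'hello' None 1 2
```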
Now, this library has a function, write(), that handles a buffer. By buffer I mean a fixed-size region of memory holding data, paired with an integer that tells us how much of that space is actually data. So basically its declaration looks like this:
void (*write) (const char *buf, size_t size);
The pointer syntax is there because this is the type of a struct member that has to point to such a function. Looking at that declaration, one would think that, assuming c_size_t is already declared with the correct type, the corresponding ctypes declaration is:
write_t = CFUNCTYPE(None, c_char_p, c_size_t)
This is what the ctypes documentation calls a callback function. The first argument to CFUNCTYPE is the return type: for a C function returning void, ctypes expects None there, while c_void_p would declare a function returning a void *.
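Since write is a struct member, the callback type is used as a field type. Below is a minimal sketch with a hypothetical struct called Ops (the real library's struct and field names aren't given here), using the naive char * declaration from above, which, as explained next, is not quite right for buffers. One detail the ctypes documentation stresses: keep a reference to the callback object for as long as C code might call it.

```python
from ctypes import CFUNCTYPE, Structure, c_char_p, c_size_t

# Naive declaration of void (*write)(const char *buf, size_t size);
# None as the return type means the C function returns void.
write_t = CFUNCTYPE(None, c_char_p, c_size_t)

class Ops(Structure):  # hypothetical struct that holds the callback
    _fields_ = [("write", write_t)]

calls = []

def py_write(buf, size):
    calls.append((buf, size))

# Keep a reference to the ctypes callback object; if it is garbage-collected
# while C still holds the pointer, the C side would call into freed memory.
write_cb = write_t(py_write)
ops = Ops(write=write_cb)

# Simulate the C library invoking the function pointer stored in the struct.
ops.write(b"hello", 5)
print(calls)  # [(b'hello', 5)]
```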
The problem is the c_char_p there. With this class, ctypes assumes the parameter is a string, not a buffer. In C, both strings and buffers are regions of memory. The difference between them is that a string is \x00-terminated, so its length is determined by the first occurrence of \x00 in that memory, while a buffer has to be accompanied by an integer giving its length, as I mentioned before. A \x00 cannot occur inside a string (the trailing one is not always considered part of the string per se), while it can occur any number of times in a buffer. In fact, a buffer can be entirely full of \x00's.
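The difference is easy to see from Python. A minimal sketch: the same 8 bytes of memory read once as a C string (through a c_char_p, which is how ctypes treats that parameter type) and once as a buffer of known size. The data is made up for illustration.

```python
from ctypes import c_char_p, cast, create_string_buffer, sizeof

data = b"ab\x00cd\x00\x00e"  # made-up contents with embedded \x00 bytes
# A buffer of exactly len(data) bytes; with an explicit size equal to
# len(data), create_string_buffer adds no extra terminator.
buf = create_string_buffer(data, len(data))

as_string = cast(buf, c_char_p).value  # C-string view: stops at the first \x00
as_buffer = buf.raw                    # buffer view: all sizeof(buf) bytes

print(as_string)       # b'ab'
print(len(as_buffer))  # 8
```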
So what ctypes does here is convert our buffer into a string. Any occurrence of \x00 in the original data makes c_char_p end the string there and forget about the rest of the data, ignoring the real size of the buffer. Worse, if the original data contains no \x00 at all, ctypes keeps reading past the end of the buffer looking for one, which picks up garbage or, if it runs into memory the process can't access, causes a segmentation fault. This not only corrupts data, but might even crash the app!
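A minimal sketch of the truncation, simulating the C library calling the naive callback with a 7-byte buffer that contains a \x00. It deliberately does not reproduce the segfault case; the data and names are made up for illustration, and Python 3 delivers the converted string as bytes.

```python
from ctypes import CFUNCTYPE, c_char_p, c_size_t

# The naive declaration: char * for the buffer, None (void) as the return type.
naive_write_t = CFUNCTYPE(None, c_char_p, c_size_t)

received = []

def py_write(buf, size):
    # ctypes has already turned buf into bytes by scanning for a \x00,
    # so everything after the first \x00 is gone before this code runs.
    received.append((buf, size))

cb = naive_write_t(py_write)

# Simulate the C library calling back with a 7-byte buffer.
data = b"abc\x00def"
cb(data, len(data))
print(received)  # [(b'abc', 7)]
```

Note that size still says 7, but only 3 bytes survive the conversion.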
The solution, luckily, is simple: treat the buffer as a void * instead of a char *. So the declaration ends up being:
write_t = CFUNCTYPE(None, c_void_p, c_size_t)
Then, in our callback, we can copy that buffer into a str[1] with string_at() and manipulate it as such:
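from ctypes import string_at
def write(buf, size):
    # buf is a plain address here; copy exactly size bytes, \x00's included
    data = string_at(buf, size)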
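Putting the pieces together, here is a self-contained sketch of the void * version, again simulating the C side by calling the callback from Python with made-up data. It is written for Python 3, so string_at() returns bytes; the guard for size == 0 is an assumption of mine for robustness against a NULL buffer, not something the library is known to send.

```python
from ctypes import CFUNCTYPE, c_size_t, c_void_p, create_string_buffer, string_at

# void (*write)(const char *buf, size_t size), with the buffer declared as void *.
write_t = CFUNCTYPE(None, c_void_p, c_size_t)

received = []

def py_write(buf, size):
    # buf arrives as a plain integer address (None for a NULL pointer), so no
    # \x00 scanning happens; string_at copies exactly size bytes.
    data = string_at(buf, size) if size else b""
    received.append(data)

# Keep this object alive for as long as C code may call it.
write_cb = write_t(py_write)

# Simulate the C library calling back with a 7-byte buffer.
payload = b"abc\x00def"
raw = create_string_buffer(payload, len(payload))
write_cb(raw, len(payload))
print(received)  # [b'abc\x00def']
```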
The size argument matters here too: without it, string_at() again treats the memory as a \x00-terminated string rather than a buffer. I think this needs a little improvement. Maybe at the next PyCamp I'll file a bug and develop a patch, either for the code or the documentation; maybe both.
[1] This is Python 2.5; in Python 3, string_at() returns bytes instead.