I'm implementing a wrapper of a C library for Python. The
obvious choice is to write some C linked to libpython,
and the not so obvious but simpler one is to use
ctypes. ctypes is really simple to use:
it has ways to declare structures and function types, with several
classes that represent the simple types: c_int,
c_char, c_char_p, c_void_p,
etc.
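As a rough sketch of what those declarations look like (the Record structure and the add_t function type are made-up names for illustration, not part of the library I'm wrapping, and the byte literals assume Python 3):

```python
from ctypes import CFUNCTYPE, Structure, c_char_p, c_int

# Hypothetical C struct: struct record { int id; const char *name; };
class Record(Structure):
    _fields_ = [("id", c_int), ("name", c_char_p)]

# Hypothetical C function type: int (*add)(int, int);
add_t = CFUNCTYPE(c_int, c_int, c_int)

rec = Record(7, b"example")

# Wrap a Python callable so that C code (or Python) can call it.
add = add_t(lambda a, b: a + b)
total = add(2, 3)
```

The first argument to CFUNCTYPE is the return type and the rest are the parameter types.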
Now, this library has a function, write(), that
handles a buffer. In this case, by buffer I mean a fixed-size
space in memory with data, paired with an integer telling us how
much of the space is really data. So basically its declaration is
like this:
void (*write) (const char *buf, size_t size);
The pointer syntax, (*write), is there because this is the type of a struct member
that has to point to such a function. Looking at that declaration,
one would think that, assuming a c_size_t is already
declared with the correct type, the corresponding declaration in
ctypes is:
write_t = CFUNCTYPE(None, c_char_p, c_size_t)  # None: the C function returns void
This is what the
ctypes documentation calls a callback
function.
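To make the struct member part concrete, here is a minimal sketch, assuming a hypothetical Operations struct with a single write member (the real library's struct is not shown here). It uses None as the return type because the C function returns void, and it is written for Python 3, where c_char_p values are bytes:

```python
from ctypes import CFUNCTYPE, Structure, c_char_p, c_size_t

# Naive declaration, taking the buffer as a C string.
write_t = CFUNCTYPE(None, c_char_p, c_size_t)

# Hypothetical struct: struct operations { void (*write)(const char *, size_t); };
class Operations(Structure):
    _fields_ = [("write", write_t)]

calls = []

def py_write(buf, size):
    # Record what the callback actually received.
    calls.append((buf, size))

# Keep a reference to the callback object for as long as C code may call it.
callback = write_t(py_write)
ops = Operations(callback)

# Calling through the struct member goes through the C function pointer.
ops.write(b"hello", 5)
```

With data that contains no \x00 before its end, this behaves as one would expect.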
The problem arises with the c_char_p there. With
this class, ctypes assumes that the parameter is a
string, and not a buffer. Both strings and buffers in C are fixed-size
spaces in memory. The difference between them is that strings
are \x00-terminated, so their size is determined by the
first occurrence of a \x00 in the memory space, while a
buffer has to be accompanied by an integer, as I mentioned before.
A \x00 cannot occur in a string (the trailing one is
not always considered as part of the string per se), while
it can occur several times in a buffer. In fact, a buffer can be
entirely full of \x00's.
So what ctypes does here is to convert our buffer
into a string. Any occurrence of a \x00 in the original
data will make c_char_p end the string and forget
about the rest of the data, ignoring the real size of the buffer.
Worse, if the original data has no \x00 in it,
ctypes keeps reading past the end of the buffer looking for
one, and might cause a segmentation fault if it reaches memory
outside the process's address space. This not only corrupts data,
but might even crash the app!
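The truncation is easy to reproduce. This is only a sketch, assuming Python 3 (where the string arrives as bytes) and a made-up py_write callback; it calls the callback object directly from Python instead of going through the real library:

```python
from ctypes import CFUNCTYPE, c_char_p, c_size_t

write_t = CFUNCTYPE(None, c_char_p, c_size_t)
received = []

def py_write(buf, size):
    # ctypes has already converted the pointer into a string here.
    received.append((buf, size))

callback = write_t(py_write)

data = b"abc\x00def"
callback(data, len(data))

# The size still says 7, but everything after the first \x00 is gone.
truncated_buf, reported_size = received[0]
```

The callback gets b"abc" together with a size of 7, so the last four bytes are silently lost.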
The solution is simple, luckily enough. You just need to treat
your buffer as a void * instead of a char *.
So the declaration ends up being:
write_t = CFUNCTYPE(None, c_void_p, c_size_t)  # None: the C function returns void
Then, later, in our callback, we can convert that buffer into a
str()[1] to manipulate it as such:
def write(buf, size):
    data = string_at(buf, size)
The size parameter is again important; without it,
string_at() will again think in terms of strings and
not of buffers. I think this has to be improved a little. Maybe
at the next PyCamp I'll file a bug and develop a patch, either
for the code or the documentation; maybe both.
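Putting the c_void_p version together, here is a minimal sketch of the whole round trip. It assumes Python 3, where string_at() returns bytes rather than a str, and it uses create_string_buffer() to stand in for the memory the C library would hand us; py_write and the two lists are names made up for the example:

```python
from ctypes import CFUNCTYPE, c_size_t, c_void_p, create_string_buffer, string_at

write_t = CFUNCTYPE(None, c_void_p, c_size_t)
with_size = []
without_size = []

def py_write(buf, size):
    # buf arrives as a plain integer address, not as a string.
    with_size.append(string_at(buf, size))
    # Without the size, string_at() stops at the first \x00 again.
    without_size.append(string_at(buf))

callback = write_t(py_write)

data = b"abc\x00def"
raw = create_string_buffer(data)  # stand-in for memory owned by the C side
callback(raw, len(data))
```

Here with_size holds the full seven bytes, while without_size only holds b"abc", which is the same truncation c_char_p caused.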
[1] This is Python 2.5