<p>StyXman's glob: <a href="http://grulicueva.homelinux.net/~mdione/glob//posts/ctypes-and-buffers/">ctypes-and-buffers</a>, 2009-03-12</p>
<p>I'm implementing a Python wrapper for a C library. The
obvious choice is to write some C linked to <code>libpython</code>;
the not so obvious but simpler one is to use
<code>ctypes</code>. <code>ctypes</code> is really simple to use:
it has ways to declare structures and function types, with several
classes that represent the simple types: <code>c_int</code>,
<code>c_char</code>, <code>c_char_p</code>, <code>c_void_p</code>,
etc.</p>
<p>Now, this library has a function, <code>write()</code>, that
handles a buffer. By buffer I mean a fixed-size
space in memory holding data, paired with an integer telling us how
much of that space is really data. So basically its declaration
looks like this:</p>
<pre>
<code>void (*write) (const char *buf, size_t size);
</code>
</pre>
<p>The pointer syntax is there because this is the type of a struct
member that has to point to such a function; the function itself
returns <code>void</code>. Looking at that declaration,
one would think that, assuming a <code>c_size_t</code> is already
declared with the correct type, the corresponding declaration in
<code>ctypes</code> is:</p>
<pre>
<code>from ctypes import CFUNCTYPE, c_char_p, c_size_t
write_t = CFUNCTYPE(None, c_char_p, c_size_t)  # None: the C function returns void
</code>
</pre>
<p>This is what <a href=
"http://docs.python.org/library/ctypes.html#callback-functions">the
<code>ctypes</code> documentation</a> calls a <em>callback
function</em>.</p>
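<p>To make the struct-member part concrete, here is a minimal sketch of how such a callback type could be stored in a structure and called through it. The struct name <code>Ops</code> and the Python function <code>py_write</code> are hypothetical (the real library's struct is not shown here), the return type is <code>None</code> because the C function returns <code>void</code>, and the code assumes Python 3, where the data is <code>bytes</code>.</p>

```python
import ctypes

# None mirrors the C "void" return; the arguments follow the C prototype.
write_t = ctypes.CFUNCTYPE(None, ctypes.c_char_p, ctypes.c_size_t)


class Ops(ctypes.Structure):
    """Hypothetical struct with a single function-pointer member."""
    _fields_ = [("write", write_t)]


received = []


def py_write(buf, size):
    # Record what ctypes hands over to the Python side.
    received.append((buf, size))


callback = write_t(py_write)   # keep a reference so it is not garbage collected
ops = Ops(write=callback)
ops.write(b"hello", 5)         # call through the struct member, as C code would
```

<p>With data that contains no <code>\x00</code>, this naive declaration behaves as expected.</p>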
<p>The problem arises with the <code>c_char_p</code> there. With
this class, <code>ctypes</code> assumes that the parameter is a
string, and not a buffer. Both strings and buffers in C are
fixed-size spaces in memory. The difference between them is that strings
are terminated by a <code>\x00</code>, so their size is determined by the
first occurrence of a <code>\x00</code> in the memory space, while a
buffer has to be accompanied by an integer, as I mentioned before.
A <code>\x00</code> cannot occur inside a string (the trailing one is
not always considered part of the string <em>per se</em>), while
it can occur several times in a buffer. In fact, a buffer can be
entirely full of <code>\x00</code>'s.</p>
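<p>The difference is easy to see from Python itself. The following is a small sketch, assuming Python 3 (where the data is <code>bytes</code>); the variable names are only for illustration.</p>

```python
import ctypes

data = b"ab\x00cd"   # five bytes with a NUL in the middle

# String view: the value ends at the first NUL.
as_string = ctypes.c_char_p(data)

# Buffer view: the size is carried explicitly, so every byte counts.
as_buffer = ctypes.create_string_buffer(data, len(data))

string_value = as_string.value   # only the part before the NUL
buffer_value = as_buffer.raw     # all five bytes
```

<p>Here <code>string_value</code> holds only <code>ab</code>, while <code>buffer_value</code> keeps the five original bytes.</p>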
<p>So what <code>ctypes</code> does here is convert our buffer
into a string. Any occurrence of a <code>\x00</code> in the original
data will make <code>c_char_p</code> end the string there and drop
the rest of the data, ignoring the real size of the buffer.
Even worse, if the original data has no <code>\x00</code> in it,
<code>ctypes</code> might cause a segmentation fault while looking for
one beyond the process's memory space. This not only corrupts data,
but might even crash the app!</p>
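<p>The truncation can be reproduced without the C library by calling the callback object from Python. This is only a sketch, assuming Python 3; <code>naive_write_t</code> and <code>py_write</code> are names made up for the example.</p>

```python
import ctypes

# The naive declaration: the buffer is typed as a C string.
naive_write_t = ctypes.CFUNCTYPE(None, ctypes.c_char_p, ctypes.c_size_t)

seen = []


def py_write(buf, size):
    # buf has already been converted to a NUL-terminated string by ctypes.
    seen.append((buf, size))


naive_cb = naive_write_t(py_write)
naive_cb(b"ab\x00cd", 5)   # five bytes go in, with a NUL in the middle
```

<p>The callback is told that <code>size</code> is 5, but it only receives the two bytes before the <code>\x00</code>.</p>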
<p>The solution is simple, luckily enough. You just need to treat
your buffer as a <code>void *</code> instead of a <code>char
*</code>. So the declaration ends up being:</p>
<pre>
<code>from ctypes import CFUNCTYPE, c_void_p, c_size_t
write_t = CFUNCTYPE(None, c_void_p, c_size_t)  # None: the C function returns void
</code>
</pre>
<p>Then later, in our callback, we can convert that buffer into a
<code>str()</code>[1] to manipulate it as such:</p>
<pre>
<code>from ctypes import string_at
def write(buf, size):
    data = string_at(buf, size)  # copy exactly size bytes from the address in buf
</code>
</pre>
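<p>Put together, the whole round trip could look like the sketch below. It assumes Python 3 (so the result is <code>bytes</code>), uses <code>None</code> as the return type since the C function returns <code>void</code>, and simulates the library's buffer with <code>create_string_buffer()</code>; <code>py_write</code> and <code>chunks</code> are illustrative names.</p>

```python
import ctypes

# The buffer is typed as an untyped pointer, so ctypes does not interpret it.
write_t = ctypes.CFUNCTYPE(None, ctypes.c_void_p, ctypes.c_size_t)

chunks = []


def py_write(buf, size):
    # buf arrives as a plain integer address; copy exactly size bytes from it.
    chunks.append(ctypes.string_at(buf, size))


cb = write_t(py_write)

# Stand-in for a buffer owned by the C library.
payload = ctypes.create_string_buffer(b"ab\x00cd")
cb(ctypes.cast(payload, ctypes.c_void_p), 5)
```

<p>This time the embedded <code>\x00</code> survives and all five bytes reach the callback.</p>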
<p>The <code>size</code> parameter matters again: without it,
<code>string_at()</code> falls back to thinking in terms of strings,
not buffers, and stops at the first <code>\x00</code>. I think
<code>ctypes</code> could be improved a little here. Maybe at the
<a href="http://www.python.com.ar/moin/PyCamp/2009">next</a>
<a href="http://grulicueva.homelinux.net/~mdione/glob//archives/2009/03/../../../posts/pycamp/">PyCamp</a> I'll file a bug and
develop a patch, either for the code or the documentation; maybe
both.</p>
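<p>The effect of leaving out the size can be checked directly. Again a sketch assuming Python 3, with a locally created buffer standing in for the library's memory:</p>

```python
import ctypes

payload = ctypes.create_string_buffer(b"ab\x00cd")
address = ctypes.addressof(payload)

without_size = ctypes.string_at(address)     # string semantics: stops at the first NUL
with_size = ctypes.string_at(address, 5)     # buffer semantics: exactly five bytes
```

<p>Only the call with an explicit size returns the complete data.</p>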
<hr />
<p>[1] This is Python 2.5, where <code>str</code> holds raw bytes.</p>
<p><a href="http://grulicueva.homelinux.net/~mdione/glob//archives/2009/03/../../../tags/python/">python</a> <a href="http://grulicueva.homelinux.net/~mdione/glob//archives/2009/03/../../../tags/c/">c</a></p>