archives/2009/05 http://grulicueva.homelinux.net/~mdione/glob//archives/2009/05/ StyXman's glob ctypes-and-buffers http://grulicueva.homelinux.net/~mdione/glob//posts/ctypes-and-buffers/ tags/c tags/python Thu, 12 Mar 2009 15:24:20 +0100 2009-03-12T14:30:10Z <p>I'm implementing a wrapper of a C library for Python. The obvious choice is to write some C linked against <code>libpython</code>; the less obvious but simpler one is to use <code>ctypes</code>. <code>ctypes</code> is really simple to use: it has ways to declare structures and function types, with several classes that represent the simple types: <code>c_int</code>, <code>c_char</code>, <code>c_char_p</code>, <code>c_void_p</code>, etc.</p> <p>Now, this library has a function, <code>write()</code>, that handles a buffer. In this case, by buffer I mean a fixed-size region of memory holding data, paired with an integer telling us how much of that region is really data. So basically its declaration is like this:</p> <pre> <code>void (*write) (const char *buf, size_t size); </code> </pre> <p>It is written as a function pointer because this is the type of a struct member that has to point to such a function. Looking at that declaration, one would think that, assuming a <code>c_size_t</code> is already declared with the correct type, the corresponding declaration in ctypes is:</p> <pre> <code>write_t = CFUNCTYPE(c_void_p, c_char_p, c_size_t) </code> </pre> <p>This is what <a href="http://docs.python.org/library/ctypes.html#callback-functions">the <code>ctypes</code> documentation</a> calls a <em>callback function</em>.</p> <p>The problem arises with the <code>c_char_p</code> there. With this class, <code>ctypes</code> assumes that the parameter is a string, and not a buffer. Both strings and buffers in C are fixed-size regions of memory. 
The difference between them is that strings are terminated by a <code>\x00</code>, so their size is determined by the first occurrence of a <code>\x00</code> in the memory region, while a buffer has to be accompanied by an integer, as I mentioned before. A <code>\x00</code> cannot occur inside a string (the trailing one is not always considered part of the string <em>per se</em>), while it can occur several times in a buffer. In fact, a buffer can be entirely full of <code>\x00</code>'s.</p> <p>So what <code>ctypes</code> does here is convert our buffer into a string. Any occurrence of a <code>\x00</code> in the original data will make <code>c_char_p</code> end the string there and drop the rest of the data, ignoring the real size of the buffer. Even worse, if the original data has no <code>\x00</code> in it, <code>ctypes</code> might cause a segmentation fault while looking for one beyond the process' memory space. This not only corrupts data, but might even crash the app!</p> <p>The solution is simple, luckily enough. You just need to treat your buffer as a <code>void *</code> instead of a <code>char *</code>. So the declaration ends up being:</p> <pre> <code>write_t = CFUNCTYPE(c_void_p, c_void_p, c_size_t) </code> </pre> <p>Then, later, in our callback, we can convert that buffer into a <code>str()</code>[1] to manipulate it as such:</p> <pre><code>from ctypes import string_at

def write(buf, size):
    # copy exactly size bytes starting at the address in buf
    data = string_at(buf, size)
</code> </pre> <p>The <code>size</code> parameter is again important; without it, <code>string_at()</code> will again treat the data as a string and not as a buffer. I think this has to be improved a little. 
Maybe at the <a href="http://www.python.com.ar/moin/PyCamp/2009">next</a> <a href="http://grulicueva.homelinux.net/~mdione/glob//archives/2009/05/../../../posts/pycamp/">PyCamp</a> I'll file a bug and develop a patch, either for the code or the documentation; maybe both.</p> <hr /> <p>[1] This is Python 2.5.</p> <p><a href="http://grulicueva.homelinux.net/~mdione/glob//archives/2009/05/../../../tags/python/">python</a> <a href="http://grulicueva.homelinux.net/~mdione/glob//archives/2009/05/../../../tags/c/">c</a></p>
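<p>To make the difference between the two declarations concrete, here is a minimal sketch; it is not the wrapped library, just an illustration. It assumes Python 3, where <code>string_at()</code> returns <code>bytes</code> rather than the <code>str</code> of Python 2.5. The names <code>payload</code>, <code>raw</code>, <code>received</code>, <code>write_as_string</code> and <code>write_as_buffer</code> are made up for the example, and the callbacks are invoked directly from Python instead of by a C library. The payload deliberately contains a <code>\x00</code>, so the <code>c_char_p</code> variant stops there instead of reading past the end of the buffer.</p>
<pre><code>from ctypes import CFUNCTYPE, c_char, c_char_p, c_size_t, c_void_p, cast, string_at

# Hypothetical payload: six bytes, with a NUL in the middle and one at the end.
payload = b"ab\x00cd\x00"
raw = (c_char * len(payload)).from_buffer_copy(payload)

received = {}

def write_as_string(buf, size):
    # With c_char_p, ctypes hands us bytes already cut at the first NUL.
    received["char_p"] = buf

def write_as_buffer(buf, size):
    # With c_void_p, buf is a plain address; size says how many bytes to copy.
    received["void_p"] = string_at(buf, size)

# In this sketch, None as the result type stands for a C void return.
string_write_t = CFUNCTYPE(None, c_char_p, c_size_t)
buffer_write_t = CFUNCTYPE(None, c_void_p, c_size_t)

string_cb = string_write_t(write_as_string)
buffer_cb = buffer_write_t(write_as_buffer)

# Pass the same memory to both callbacks.
string_cb(cast(raw, c_char_p), len(payload))
buffer_cb(cast(raw, c_void_p), len(payload))

print(received["char_p"])
print(received["void_p"])
</code></pre>
<p>Under those assumptions, the first callback only ever sees <code>b'ab'</code>, while the second one gets all six bytes, embedded <code>\x00</code>'s included.</p>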