Hi, I finally found some time to work on moving the compression and decompression code to the buffer library. It seems like a good idea and an interesting one, but first I ran into a couple of questions I’d like to get your feedback on.
This patch series reorganizes the libdpkg buffer functions to make it a little easier to add new buffer types. I am sending it because it puts the signatures for the functions a buffer type might provide in one place, so even though the series does not do much, it makes API questions a little clearer. 1. Currently, error reporting uses the usual convention of returning -1 and putting a more specific error code in errno. Obviously, this is not a good way to report errors from backends that do not use the same convention. For example, if the error flag is already set on a stream before a call to stream_md5(), then the result would be an error message something like the following: failed to read on buffer copy for <whatever>: Success It would be more friendly for write_stream() to check the error flag itself and report that with a more meaningful error result. Of course, dpkg itself never makes such a mistake AFAICT. A more important problem is that it is hard to fit other backends (e.g., zlib) in to the “set errno, return -1” scheme. I suggest adding a pointer argument to return either: a. a universally valid error code like those used in libcom_err; or b. an error string. In this case, there is an ugly associated question: who is responsible for freeing the error strings? The simplest answer: the caller, except for a special "Out of memory" string (ick). Despite the ugliness, I’m inclined towards b. 2. Incomplete I/O is hard to handle rationally. The current code will busy-loop if a caller provides a non-blocking file descriptor to fd_fd_copy(); it might make sense to error out on EAGAIN instead (though this is also something dpkg will never run into). Complicating this is that Linux 2.2 used to return EAGAIN on 0-length reads from a nonempty pipe [1], but we already work around that in other ways. If read_stream() performs a partial read and then receives an I/O error, then the number of bytes successfully read is thrown away. Do we want the caller to be able to recover? Practically speaking, this is only relevant if read_stream() is something callers should be able to use from a select() loop (i.e., with file handles fdopen()'d from non-blocking file descriptors, which can return EAGAIN). Options: a. busy-loop on EAGAIN (which punishes callers who try this); a. transparently poll() on EAGAIN (bad idea IMHO); b. consider EAGAIN an unrecoverable error; or c. report number of bytes consumed/produced on errors. I’m inclined towards c. Thoughts? Jonathan Nieder (4): libdpkg: Collect MD5-related functions in one place libdpkg: Refactor generic I/O code into backend-specific functions libdpkg: Make buffer_init() and buffer_done() return void libdpkg: Reorganize buffer module around new buffer_type struct lib/dpkg/buffer.c | 274 +++++++++++++++++++++++++++++++++++++++-------------- lib/dpkg/buffer.h | 69 ++++++++----- 2 files changed, 243 insertions(+), 100 deletions(-) [1] <http://bugs.debian.org/81881> -- To UNSUBSCRIBE, email to debian-dpkg-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/20100327064019.ga29...@progeny.tock