mmap class has slow "in" operator
If I do the following:

def mmap_search(f, string):
    fh = file(f)
    mm = mmap.mmap(fh.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
    return mm.find(string)

def mmap_is_in(f, string):
    fh = file(f)
    mm = mmap.mmap(fh.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
    return string in mm

then a sample mmap_search() call on a 50MB file takes 0.18 seconds, but the mmap_is_in() call takes 6.6 seconds. Is the mmap class missing an operator and falling back to a slow default implementation? Presumably I can implement the latter in terms of the former.

Kris
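A minimal sketch of that workaround, routing the membership test through find() instead of the "in" operator (the helper name is made up; it assumes the same mmap flags as above):

import mmap

def mmap_contains(f, string):
    # Equivalent of "string in mm", but via find(), which avoids the
    # slow generic fallback described above.
    fh = file(f)
    mm = mmap.mmap(fh.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
    return mm.find(string) != -1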
UNIX credential passing
I want to make use of UNIX credential passing on a local domain socket to verify the identity of a user connecting to a privileged service. However it looks like the socket module doesn't implement sendmsg/recvmsg wrappers, and I can't find another module that does this either. Is there something I have missed?

Kris
Re: UNIX credential passing
Sebastian 'lunar' Wiesner wrote:

[ Kris Kennaway <[EMAIL PROTECTED]> ]
I want to make use of UNIX credential passing on a local domain socket to verify the identity of a user connecting to a privileged service. However it looks like the socket module doesn't implement sendmsg/recvmsg wrappers, and I can't find another module that does this either. Is there something I have missed?

http://pyside.blogspot.com/2007/07/unix-socket-credentials-with-python.html

Illustrates how to use socket credentials without sendmsg/recvmsg and so without any need for patching.

Thanks to both you and Paul for your suggestions. For the record, the URL above is linux-specific, but it put me on the right track. Here is an equivalent FreeBSD implementation:

import struct

def getpeereid(sock):
    """
    Get peer credentials on a UNIX domain socket.

    Returns a nested tuple: (uid, (gids))
    """
    LOCAL_PEERCRED = 0x001
    NGROUPS = 16

    #struct xucred {
    #    u_int  cr_version;          /* structure layout version */
    #    uid_t  cr_uid;              /* effective user id */
    #    short  cr_ngroups;          /* number of groups */
    #    gid_t  cr_groups[NGROUPS];  /* groups */
    #    void   *_cr_unused1;        /* compatibility with old ucred */
    #};
    xucred_fmt = '2ih16iP'
    res = struct.unpack(xucred_fmt,
                        sock.getsockopt(0, LOCAL_PEERCRED,
                                        struct.calcsize(xucred_fmt)))

    # Check this is the above version of the structure
    if res[0] != 0:
        raise OSError

    return (res[1], res[3:3 + res[2]])

Kris
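A rough usage sketch, not from the original post: a privileged service accepting connections on a UNIX domain socket and checking the peer's uid with the getpeereid() function above. The socket path and the root-only policy are made-up examples.

import os
import socket

def serve(path='/var/run/myservice.sock'):  # example path only
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        os.unlink(path)
    except OSError:
        pass
    srv.bind(path)
    srv.listen(1)
    while True:
        conn, addr = srv.accept()
        uid, gids = getpeereid(conn)
        if uid != 0:  # example policy: only root may connect
            conn.close()
            continue
        # ... service the authenticated client here ...
        conn.close()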
Re: "Faster" I/O in a script
Gary Herron wrote:

[EMAIL PROTECTED] wrote:
On Jun 2, 2:08 am, "kalakouentin" <[EMAIL PROTECTED]> wrote:
Do you know a way to actually load my data in a more "batch-like" way so I will avoid the constant line by line reading?

If your files will fit in memory, you can just do

text = file.readlines()

and Python will read the entire file into a list of strings named 'text,' where each item in the list corresponds to one 'line' of the file.

No that won't help. That has to do *all* the same work (reading blocks and finding line endings) as the iterator PLUS allocate and build a list. Better to just use the iterator.

for line in file:
    ...

Actually this *can* be much slower. Suppose I want to search a file to see if a substring is present.

st = "some substring that is not actually in the file"
f = <50 MB log file>

Method 1:

for i in file(f):
    if st in i:
        break

--> 0.472416 seconds

Method 2: Read whole file:

fh = file(f)
rl = fh.read()
fh.close()

--> 0.098834 seconds
"st in rl" test --> 0.037251 (total: .136 seconds)

Method 3: mmap the file:

mm = mmap.mmap(fh.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)

"st in mm" test --> 3.589938 (<-- see my post the other day)
mm.find(st) --> 0.186895

Summary: If you can afford the memory, it can be more efficient (more than 3 times faster in this example) to read the file into memory and process it at once (if possible). Mmapping the file and processing it at once is roughly as fast (I didn't measure the difference carefully), but has the advantage that if there are parts of the file you do not touch you don't fault them into memory. You could also play more games and mmap chunks at a time to limit the memory use (but you'd have to be careful with mmapping that doesn't match record boundaries).

Kris
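A read()-based sketch of the "chunks at a time" idea mentioned at the end, keeping an overlap of len(st) - 1 bytes so a match spanning a chunk boundary is not missed (the chunk size is arbitrary):

def chunked_search(f, st, chunksize=16 * 1024 * 1024):
    # Return the file offset of the first match, or -1, while never
    # holding more than one chunk (plus a small overlap) in memory.
    fh = file(f)
    overlap = ''
    offset = 0  # file offset of the start of the current chunk
    while True:
        data = fh.read(chunksize)
        if not data:
            fh.close()
            return -1
        buf = overlap + data
        pos = buf.find(st)
        if pos != -1:
            fh.close()
            return offset - len(overlap) + pos
        keep = len(st) - 1
        overlap = buf[-keep:] if keep else ''
        offset += len(data)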
ZFS bindings
Is anyone aware of python bindings for ZFS? I just want to replicate (or at least wrap) the command line functionality for interacting with snapshots etc. Searches have turned up nothing.

Kris
Re: Looking for lots of words in lots of files
Calvin Spealman wrote:

Upload, wait, and google them.

Seriously tho, aside from using a real indexer, I would build a set of the words I'm looking for, and then loop over each file, looping over the words and doing quick checks for containment in the set. If so, add to a dict of file names to list of words found until the list hits 10 length. I don't think that would be a complicated solution and it shouldn't be terrible at performance. If you need to run this more than once, use an indexer. If you only need to use it once, use an indexer, so you learn how for next time.

If you can't use an indexer, and performance matters, evaluate using grep and a shell script. Seriously.

grep is a couple of orders of magnitude faster at pattern matching strings in files (and especially regexps) than python is. Even if you are invoking grep multiple times it is still likely to be faster than a "maximally efficient" single pass over the file in python. This realization was disappointing to me :)

Kris
Bit substring search
I am trying to parse a bit-stream file format (bzip2) that does not have byte-aligned record boundaries, so I need to do efficient matching of bit substrings at arbitrary bit offsets. Is there a package that can do this? This one comes close:

http://ilan.schnell-web.net/prog/bitarray/

but it only supports searching for a single-bit substring.

Kris
Re: Bit substring search
[EMAIL PROTECTED] wrote:

Kris Kennaway:
I am trying to parse a bit-stream file format (bzip2) that does not have byte-aligned record boundaries, so I need to do efficient matching of bit substrings at arbitrary bit offsets. Is there a package that can do this?

You may take a look at Hachoir or some other modules:
http://hachoir.org/wiki/hachoir-core
http://pypi.python.org/pypi/construct/2.00

Thanks. hachoir also comes close, but it also doesn't seem to be able to match substrings at a bit level (e.g. the included bzip2 parser just reads the header and hands the entire file off to libbzip2 to extract data from). construct exports a bit stream but it's again pure python and matching substrings will be slow. It will need C support to do that efficiently.

http://pypi.python.org/pypi/FmtRW/20040603
Etc.
More:
http://pypi.python.org/pypi?%3Aaction=search&term=binary

Unfortunately I didn't find anything else useful here yet :(

Kris
Re: Bit substring search
[EMAIL PROTECTED] wrote:

Kris Kennaway:
Unfortunately I didn't find anything else useful here yet :(

I see, I'm sorry, I have found hachoir quite nice in the past. Maybe there's no really efficient way to do it with Python, but you can create a compiled extension, so you can see if it's fast enough for your purposes. To create such extension you can:
- One thing that requires very little time is to create an extension with ShedSkin, once installed it just needs Python code.
- Cython (ex-Pyrex) too may be okay, but it's a bit trickier on Windows machines.
- Using Pyd to create a D extension for Python is often the faster way I have found to create extensions. I need just few minutes to create them this way. But you need to know a bit of D.
- Then, if you want you can write a C extension, but if you have not done it before you may need some hours to make it work.

Thanks for the pointers, I think a C extension will end up being the way to go, unless someone has beaten me to it and I just haven't found it yet.

Kris
Re: Bit substring search
Scott David Daniels wrote:

Kris Kennaway wrote:
Thanks for the pointers, I think a C extension will end up being the way to go, unless someone has beaten me to it and I just haven't found it yet.

Depending on the pattern length you are targeting, it may be fastest to increase the out-of-loop work. For a 40-bit string, build an 8-target Aho-Corasick machine, and at each match check the endpoints. This will only work well if 40 bits is at the low end of what you are hunting for.

Thanks, I wasn't aware of Aho-Corasick.

Kris
Re: re.search much slower then grep on some regular expressions
Paddy wrote:

On Jul 4, 1:36 pm, Peter Otten <[EMAIL PROTECTED]> wrote:
Henning_Thornblad wrote:
What can be the cause of the large difference between re.search and grep?

grep uses a smarter algorithm ;)

This script takes about 5 min to run on my computer:

#!/usr/bin/env python
import re
row = ""
for a in range(156000):
    row += "a"
print re.search('[^ "=]*/', row)

While doing a simple grep:

grep '[^ "=]*/' input

(input contains 156.000 a in one row) doesn't even take a second. Is this a bug in python?

You could call this a performance bug, but it's not common enough in real code to get the necessary brain cycles from the core developers. So you can either write a patch yourself or use a workaround.

re.search('[^ "=]*/', row) if "/" in row else None

might be good enough.

Peter

It is not a smarter algorithm that is used in grep. Python RE's have more capabilities than grep RE's which need a slower, more complex algorithm. You could argue that if the costly RE features are not used then maybe simpler, faster algorithms should be automatically swapped in but

I can and do :-)

It's a major problem that regular expression matching in python has exponential worst-case complexity when polynomial algorithms (for a subset of regexp expressions, e.g. excluding back-references) are well-known. It rules out using python for entire classes of applications where regexp matching is on the critical path.

Kris
Re: re.search much slower then grep on some regular expressions
samwyse wrote:

On Jul 4, 6:43 am, Henning_Thornblad <[EMAIL PROTECTED]> wrote:
What can be the cause of the large difference between re.search and grep?
While doing a simple grep:

grep '[^ "=]*/' input

(input contains 156.000 a in one row) doesn't even take a second. Is this a bug in python?

You might want to look at Plex.
http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/

"Another advantage of Plex is that it compiles all of the regular expressions into a single DFA. Once that's done, the input can be processed in a time proportional to the number of characters to be scanned, and independent of the number or complexity of the regular expressions. Python's existing regular expression matchers do not have this property."

Very interesting! Thanks very much for the pointer.

Kris
Re: re.search much slower then grep on some regular expressions
samwyse wrote:

On Jul 4, 6:43 am, Henning_Thornblad <[EMAIL PROTECTED]> wrote:
What can be the cause of the large difference between re.search and grep?
While doing a simple grep:

grep '[^ "=]*/' input

(input contains 156.000 a in one row) doesn't even take a second. Is this a bug in python?

You might want to look at Plex.
http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/

"Another advantage of Plex is that it compiles all of the regular expressions into a single DFA. Once that's done, the input can be processed in a time proportional to the number of characters to be scanned, and independent of the number or complexity of the regular expressions. Python's existing regular expression matchers do not have this property."

I haven't tested this, but I think it would do what you want:

from Plex import *

lexicon = Lexicon([
    (Rep(AnyBut(' "=')) + Str('/'), TEXT),
    (AnyBut('\n'), IGNORE),
])
filename = "my_file.txt"
f = open(filename, "r")
scanner = Scanner(lexicon, f, filename)
while 1:
    token = scanner.read()
    print token
    if token[0] is None:
        break

Hmm, unfortunately it's still orders of magnitude slower than grep in my own application that involves matching lots of strings and regexps against large files (I killed it after 400 seconds, compared to 1.5 for grep), and that's leaving aside the much longer compilation time (over a minute). If the matching was fast then I could possibly pickle the lexer though (but it's not).

Kris
Re: re.search much slower then grep on some regular expressions
John Machin wrote:

Hmm, unfortunately it's still orders of magnitude slower than grep in my own application that involves matching lots of strings and regexps against large files (I killed it after 400 seconds, compared to 1.5 for grep), and that's leaving aside the much longer compilation time (over a minute). If the matching was fast then I could possibly pickle the lexer though (but it's not).

Can you give us some examples of the kinds of patterns that you are using in practice and are slow using Python re?

Trivial stuff like:

    (Str('error in pkg_delete'), ('mtree', 'mtree')),
    (Str('filesystem was touched prior to .make install'), ('mtree', 'mtree')),
    (Str('list of extra files and directories'), ('mtree', 'mtree')),
    (Str('list of files present before this port was installed'), ('mtree', 'mtree')),
    (Str('list of filesystem changes from before and after'), ('mtree', 'mtree')),
    (re('Configuration .* not supported'), ('arch', 'arch')),
    (re('(configure: error:|Script.*configure.*failed unexpectedly|script.*failed: here are the contents of)'), ('configure_error', 'configure')),
    ...

There are about 150 of them and I want to find which is the first match in a text file that ranges from a few KB up to 512MB in size.

How large is "large"? What kind of text?

It's compiler/build output.

Instead of grep, you might like to try nrgrep ... google("nrgrep Navarro Raffinot"): PDF paper about it on Citeseer (if it's up), postscript paper and C source findable from Gonzalo Navarro's home page.

Thanks, looks interesting but I don't think it is the best fit here. I would like to avoid spawning hundreds of processes to process each file (since I have tens of thousands of them to process).

Kris
Re: re.search much slower then grep on some regular expressions
Jeroen Ruigrok van der Werven wrote:

-On [20080709 14:08], Kris Kennaway ([EMAIL PROTECTED]) wrote:
It's compiler/build output.

Sounds like the FreeBSD ports build cluster. :)

Yes indeed!

Kris, have you tried a PGO build of Python with your specific usage? I cannot guarantee it will significantly speed things up though.

I am pretty sure the problem is algorithmic, not bad byte code :) If it was a matter of a few % then that is in the scope of compiler tweaks, but we're talking orders of magnitude.

Kris

Also, a while ago I did tests with various GCC compilers and their effect on Python running time as well as Intel's cc. Intel won on (nearly) all accounts, meaning it was faster overall. From the top of my mind: GCC 4.1.x was faster than GCC 4.2.x.
Re: re.search much slower then grep on some regular expressions
samwyse wrote:

On Jul 8, 11:01 am, Kris Kennaway <[EMAIL PROTECTED]> wrote:
samwyse wrote:
You might want to look at Plex.
http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/
"Another advantage of Plex is that it compiles all of the regular expressions into a single DFA. Once that's done, the input can be processed in a time proportional to the number of characters to be scanned, and independent of the number or complexity of the regular expressions. Python's existing regular expression matchers do not have this property."

Hmm, unfortunately it's still orders of magnitude slower than grep in my own application that involves matching lots of strings and regexps against large files (I killed it after 400 seconds, compared to 1.5 for grep), and that's leaving aside the much longer compilation time (over a minute). If the matching was fast then I could possibly pickle the lexer though (but it's not).

That's funny, the compilation is almost instantaneous for me.

My lexicon was quite a bit bigger, containing about 150 strings and regexps.

However, I just tested it against several files, the first containing 4875*'a', the rest each twice the size of the previous. And you're right, for each doubling of the file size, the match takes four times as long, meaning O(n^2). 156000*'a' would probably take 8 hours. Here are my results:

The docs say it is supposed to be linear in the file size ;-) ;-(

Kris
Re: re.search much slower then grep on some regular expressions
John Machin wrote:

Uh-huh ... try this, then:
http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/

You could use this to find the "Str" cases and the prefixes of the "re" cases (which seem to be no more complicated than 'foo.*bar.*zot') and use something slower like Python's re to search the remainder of the line for 'bar.*zot'.

If it was just strings, then sure... with regexps it might be possible to make it work, but it doesn't sound particularly maintainable. I will stick with my shell script until python gets a regexp engine of equivalent performance.

Kris
Re: re.search much slower then grep on some regular expressions
J. Cliff Dyer wrote:

On Wed, 2008-07-09 at 12:29 -0700, samwyse wrote:
On Jul 8, 11:01 am, Kris Kennaway <[EMAIL PROTECTED]> wrote:
samwyse wrote:
You might want to look at Plex.
http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/
"Another advantage of Plex is that it compiles all of the regular expressions into a single DFA. Once that's done, the input can be processed in a time proportional to the number of characters to be scanned, and independent of the number or complexity of the regular expressions. Python's existing regular expression matchers do not have this property."

Hmm, unfortunately it's still orders of magnitude slower than grep in my own application that involves matching lots of strings and regexps against large files (I killed it after 400 seconds, compared to 1.5 for grep), and that's leaving aside the much longer compilation time (over a minute). If the matching was fast then I could possibly pickle the lexer though (but it's not).

That's funny, the compilation is almost instantaneous for me. However, I just tested it against several files, the first containing 4875*'a', the rest each twice the size of the previous. And you're right, for each doubling of the file size, the match takes four times as long, meaning O(n^2). 156000*'a' would probably take 8 hours. Here are my results:

compile_lexicon() took 0.0236021580595 secs
test('file-0.txt') took 24.8322969831 secs
test('file-1.txt') took 99.3956799681 secs
test('file-2.txt') took 398.349623132 secs

Sounds like a good strategy would be to find the smallest chunk of the file that matches can't cross, and iterate your search on units of those chunks. For example, if none of your regexes cross line boundaries, search each line of the file individually. That may help turn around the speed degradation you're seeing.

That's what I'm doing. I've also tried various other things like mmapping the file and searching it at once, etc, but almost all of the time is spent in the regexp engine so optimizing other things only gives marginal improvement.

Kris
Re: multithreading in python ???
Laszlo Nagy wrote:

Abhishek Asthana wrote:
Hi all, I have large set of data computation and I want to break it into small batches and assign it to different threads. I am implementing it in python only. Kindly help what all libraries should I refer to implement the multithreading in python.

You should not do this. Python can handle multiple threads but they always use the same processor. (at least in CPython.) In order to take advantage of multiple processors, use different processes.

Only partly true. Threads executing in the python interpreter are serialized and only run on a single CPU at a time. Depending on what modules you use they may be able to operate independently on multiple CPUs. The term to research is "GIL" (Global Interpreter Lock). There are many webpages discussing it, and the alternative strategies you can use.

Kris
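A minimal sketch of the "use different processes" route, using the multiprocessing module (available as the third-party "processing" package on older Pythons); the batch splitting and the crunch() body are placeholders for the real computation:

from multiprocessing import Pool

def crunch(batch):
    # Placeholder for the real per-batch computation.
    return sum(batch)

if __name__ == '__main__':
    batches = [range(i * 10000, (i + 1) * 10000) for i in range(8)]
    pool = Pool()  # defaults to one worker process per CPU
    results = pool.map(crunch, batches)  # batches run in parallel processes
    pool.close()
    pool.join()
    print results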
Re: pyprocessing/multiprocessing for x64?
Benjamin Kaplan wrote:

The only problem I can see is that 32-bit programs can't access 64-bit dlls, so the OP might have to install the 32-bit version of Python for it to work.

Anyway, all of this is beside the point, because the multiprocessing module works fine on amd64 systems.

Kris
Re: variable expansion with sqlite
marc wyburn wrote:

Hi and thanks, I was hoping to avoid having to weld qmarks together but I guess that's why people use things like SQL alchemy instead. It's a good lesson anyway.

The '?' substitution is there to safely handle untrusted input. You *don't* want to pass arbitrary user data into random parts of an SQL statement (or your database will get 0wned). I think of it as a reminder that when you have to construct your own query template with "... %s ..." % (foo,) to bypass this limitation, you had better be darn sure the parameters you are passing in are safe.

Kris
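A small sketch of the difference using the sqlite3 module (the table and column names are invented for the example). Note that the '?' placeholder only stands in for values; identifiers such as table names still have to be built into the query text, which is where the extra care comes in:

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT, uid INTEGER)')

name = raw_input('name? ')  # untrusted input

# Safe: the driver quotes/escapes the value for us.
for row in conn.execute('SELECT uid FROM users WHERE name = ?', (name,)):
    print row

# Unsafe: never interpolate untrusted input into the SQL text itself.
# conn.execute("SELECT uid FROM users WHERE name = '%s'" % name)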
Constructing MIME message without loading message stream
I would like to MIME encode a message from a large file without first loading the file into memory. Assume the file has been pre-encoded on disk (actually I am using encode_7or8bit, so the encoding should be null). Is there a way to construct the flattened MIME message such that data is streamed from the file as needed instead of being resident in memory? Do I have to subclass the MIMEBase class myself?

Kris
Re: benchmark
Angel Gutierrez wrote:

Steven D'Aprano wrote:
On Thu, 07 Aug 2008 00:44:14 -0700, alex23 wrote:
Steven D'Aprano wrote:
In other words, about 20% of the time he measures is the time taken to print junk to the screen.

Which makes his claim that "all the console outputs have been removed so that the benchmarking activity is not interfered with by the IO overheads" somewhat confusing... he didn't notice the output? Wrote it off as a weird Python side-effect?

Wait... I've just remembered, and a quick test confirms... Python only prints bare objects if you are running in an interactive shell. Otherwise output of bare objects is suppressed unless you explicitly call print. Okay, I guess he is forgiven. False alarm, my bad.

Well.. there must be something because this is what I got in a normal script execution:

[EMAIL PROTECTED] test]$ python iter.py
Time per iteration = 357.467989922 microseconds
[EMAIL PROTECTED] test]$ vim iter.py
[EMAIL PROTECTED] test]$ python iter2.py
Time per iteration = 320.306909084 microseconds
[EMAIL PROTECTED] test]$ vim iter2.py
[EMAIL PROTECTED] test]$ python iter2.py
Time per iteration = 312.917997837 microseconds

What is the standard deviation on those numbers? What is the confidence level that they are distinct? In a thread complaining about poor benchmarking it's disappointing to see crappy test methodology being used to try and demonstrate flaws in the test.

Kris
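For what it's worth, a sketch of how one might get a mean and standard deviation out of repeated runs using the timeit module (the statement being timed is a stand-in for the real benchmark body):

import math
import timeit

t = timeit.Timer('sum(xrange(1000))')  # stand-in for the benchmarked code
per_iter = [r / 1000 for r in t.repeat(repeat=10, number=1000)]  # secs per iteration

mean = sum(per_iter) / len(per_iter)
stddev = math.sqrt(sum((x - mean) ** 2 for x in per_iter) / (len(per_iter) - 1))
print 'mean %.3g us, stddev %.3g us' % (mean * 1e6, stddev * 1e6)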
Re: benchmark
jlist wrote:

I think what makes more sense is to compare the code one most typically writes. In my case, I always use range() and never use psyco. But I guess for most of my work with Python performance hasn't been an issue. I haven't got to write any large systems with Python yet, where performance starts to matter.

Hopefully when you do you will improve your programming practices to not make poor choices - there are few excuses for not using xrange ;)

Kris
Re: Constructing MIME message without loading message stream
Diez B. Roggisch wrote:

Kris Kennaway schrieb:
I would like to MIME encode a message from a large file without first loading the file into memory. Assume the file has been pre-encoded on disk (actually I am using encode_7or8bit, so the encoding should be null). Is there a way to construct the flattened MIME message such that data is streamed from the file as needed instead of being resident in memory? Do I have to subclass the MIMEBase class myself?

I don't know what you are after here - but I *do* know that anything above 10MB or so is most probably not transferable using mail, as MTAs impose limits on message-sizes. Or in other words: usually, whatever you want to encode should fit in memory as the network is limiting you.

MIME encoding is used for other things than emails.

Kris
Re: benchmark
Peter Otten wrote:

[EMAIL PROTECTED] wrote:
On Aug 10, 10:10 pm, Kris Kennaway <[EMAIL PROTECTED]> wrote:
jlist wrote:
I think what makes more sense is to compare the code one most typically writes. In my case, I always use range() and never use psyco. But I guess for most of my work with Python performance hasn't been an issue. I haven't got to write any large systems with Python yet, where performance starts to matter.

Hopefully when you do you will improve your programming practices to not make poor choices - there are few excuses for not using xrange ;)

Kris

And can you shed some light on how that relates with one of the zens of python?

There should be one-- and preferably only one --obvious way to do it.

For the record, the impact of range() versus xrange() is negligible -- on my machine the xrange() variant even runs a tad slower. So it's not clear whether Kris actually knows what he's doing.

You are only thinking in terms of execution speed. Now think about memory use. Using iterators instead of constructing lists is something that needs to permeate your thinking about python or you will forever be writing code that wastes memory, sometimes to a large extent.

Kris
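A trivial illustration of the memory point (the loop bound is arbitrary): both loops compute the same sum, but the first materializes a ten-million-element list before the loop even starts, while the second hands out one value at a time.

total = 0
for i in range(10 ** 7):   # builds the entire list in memory first
    total += i

total = 0
for i in xrange(10 ** 7):  # constant memory: yields one int at a time
    total += i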
Re: SSH utility
James Brady wrote:

Hi all, I'm looking for a python library that lets me execute shell commands on remote machines. I've tried a few SSH utilities so far: paramiko, PySSH and pssh; unfortunately all have been unreliable, and repeated questions on their respective mailing lists haven't been answered... It seems like the sort of commodity task that there should be a pretty robust library for. Are there any suggestions for alternative libraries or approaches?

Personally I just Popen ssh directly. Things like paramiko make me concerned; getting the SSH protocol right is tricky and not something I want to trust to projects that have not had significant experience and auditing.

Kris
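A minimal sketch of the Popen approach (the host and command are placeholders; it assumes non-interactive, e.g. key-based, authentication is already set up):

import subprocess

def ssh(host, command):
    # Run `command` on `host` via the system ssh binary.  BatchMode
    # makes ssh fail instead of prompting for a password.
    p = subprocess.Popen(['ssh', '-o', 'BatchMode=yes', host, command],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    return p.returncode, out, err

rc, out, err = ssh('build1.example.com', 'uname -a')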
Re: benchmark
Peter Otten wrote:

Kris Kennaway wrote:
Peter Otten wrote:
[EMAIL PROTECTED] wrote:
On Aug 10, 10:10 pm, Kris Kennaway <[EMAIL PROTECTED]> wrote:
jlist wrote:
I think what makes more sense is to compare the code one most typically writes. In my case, I always use range() and never use psyco. But I guess for most of my work with Python performance hasn't been an issue. I haven't got to write any large systems with Python yet, where performance starts to matter.

Hopefully when you do you will improve your programming practices to not make poor choices - there are few excuses for not using xrange ;)

Kris

And can you shed some light on how that relates with one of the zens of python?

There should be one-- and preferably only one --obvious way to do it.

For the record, the impact of range() versus xrange() is negligible -- on my machine the xrange() variant even runs a tad slower. So it's not clear whether Kris actually knows what he's doing.

You are only thinking in terms of execution speed.

Yes, because my remark was made in the context of the particular benchmark supposed to be the topic of this thread.

No, you may notice that the above text has moved off onto another discussion.

Kris
Re: In-place memory manager, mmap (was: Fastest way to store ints and floats on disk)
castironpi wrote:

Hi, I've got an "in-place" memory manager that uses a disk-backed memory-mapped buffer. Among its possibilities are: storing variable-length strings and structures for persistence and interprocess communication with mmap. It allocates segments of a generic buffer by length and returns an offset to the reserved block, which can then be used with struct to pack values to store. The data structure is adapted from the GNU PAVL binary tree. Allocated blocks can be cast to ctypes.Structure instances using some monkey patching, which is optional. Want to open-source it. Any interest?

Just do it. That way users can come along later.

Kris
Re: In-place memory manager, mmap
castironpi wrote:

On Aug 24, 9:52 am, Kris Kennaway <[EMAIL PROTECTED]> wrote:
castironpi wrote:
Hi, I've got an "in-place" memory manager that uses a disk-backed memory-mapped buffer. Among its possibilities are: storing variable-length strings and structures for persistence and interprocess communication with mmap. It allocates segments of a generic buffer by length and returns an offset to the reserved block, which can then be used with struct to pack values to store. The data structure is adapted from the GNU PAVL binary tree. Allocated blocks can be cast to ctypes.Structure instances using some monkey patching, which is optional. Want to open-source it. Any interest?

Just do it. That way users can come along later.

Kris

How? My website? Google Code? Too small for source forge, I think.

Any of those 3 would work fine, but the last two are probably better (sourceforge hosts plenty of tiny projects) if you don't want to have to manage your server and related infrastructure yourself.

Kris