How to Buffer Serialized Objects to Disk
Sorry to ask this question. I have searched the list archives and googled, but I don't even know what words to use to find what I am looking for. I am just looking for a little kick in the right direction.

I have a Python-based log analysis program called petit (http://crunchtools.com/petit). I am trying to modify it to manage the main object types to and from disk.

Essentially, I have one object which is a list of a bunch of "Entry" objects. The Entry objects have date, time, etc. fields which I use for analysis techniques. At the very beginning I build up the list of objects, and I would like to start pickling it while building to save memory. I want to be able to process more entries than I have memory for. With a straight list it looks like I could build from xreadlines(), but once you turn it into a more complex object, I don't quite know where to go.

I understand how to pickle the entire data structure, but I need something that will manage the memory/disk allocation. Any thoughts?

Gracias
Scott M
--
http://mail.python.org/mailman/listinfo/python-list
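One way to picture the streaming idea raised above (reading the log line by line rather than holding every Entry in a list) is a generator. The following is only a sketch in Python 3: the `Entry` fields, `parse_line`, `iter_entries`, and the syslog-style line layout are all assumptions made for illustration, not petit's actual code.

```python
from collections import namedtuple

# Hypothetical stand-in for petit's Entry class; the field names are assumptions.
Entry = namedtuple("Entry", "date time message")


def parse_line(line):
    """Split an assumed syslog-style line: 'Jan 12 17:04:01 host sshd: ...'."""
    parts = line.rstrip("\n").split(None, 3)
    if len(parts) < 4:
        return None  # skip lines that do not match the assumed layout
    month, day, time, message = parts
    return Entry(date=month + " " + day, time=time, message=message)


def iter_entries(path):
    """Yield one Entry at a time instead of building a list of all of them."""
    with open(path) as handle:
        for line in handle:  # file objects iterate lazily, line by line
            entry = parse_line(line)
            if entry is not None:
                yield entry
```

An analysis that only needs one pass (counting, for instance) could consume `iter_entries(path)` directly, so that only the current Entry is alive at any moment. Anything that needs to revisit entries would still need a disk-backed store.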
Re: How to Buffer Serialized Objects to Disk
Been digging ever since I posted this. I suspected that the response might be "use a database". I am worried I am trying to reinvent the wheel. The problem is I don't want any dependencies, and I also don't need persistence between program runs. I kind of wanted to keep the use of petit very similar to cat, head, awk, etc. That said, I have realized that if I provide the analysis features as an API, you might very well want persistence between runs.

What about using a list inside a shelve? Just got done messing with this in the Python shell:

    import shelve

    # writeback=True so that in-place appends to d["log"] are written back
    d = shelve.open(filename="/root/test.shelf", protocol=-1, writeback=True)
    d["log"] = []
    d["log"].append("test1")
    d["log"].append("test2")
    d["log"].append("test3")

Then, always interacting with d["log"], for example:

    for i in d["log"]:
        print i

Thoughts? I know this won't manage memory, but it will keep the footprint down, right?

On Wed, Jan 12, 2011 at 5:04 PM, Peter Otten <__pete...@web.de> wrote:
> Scott McCarty wrote:
>
> > [...]
> > I understand how to pickle the entire data structure, but I need something
> > that will manage the memory/disk allocation? Any thoughts?
>
> You can write multiple pickled objects into a single file:
>
> import cPickle as pickle
>
> def dump(filename, items):
>     with open(filename, "wb") as out:
>         dump = pickle.Pickler(out).dump
>         for item in items:
>             dump(item)
>
> def load(filename):
>     with open(filename, "rb") as instream:
>         load = pickle.Unpickler(instream).load
>         while True:
>             try:
>                 item = load()
>             except EOFError:
>                 break
>             yield item
>
> if __name__ == "__main__":
>     filename = "tmp.pickle"
>     from collections import namedtuple
>     T = namedtuple("T", "alpha beta")
>     dump(filename, (T(a, b) for a, b in zip("abc", [1,2,3])))
>     for item in load(filename):
>         print item
>
> To get random access you'd have to maintain a list containing the offsets of
> the entries in the file.
> However, a simple database like SQLite is probably sufficient for the kind
> of entries you have in mind, and it allows operations like aggregation,
> sorting and grouping out of the box.
>
> Peter
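The remark about random access could be sketched roughly as follows. This is a Python 3 illustration (plain `pickle` instead of `cPickle`), and the names `dump_with_index` and `load_at` are made up for the example. As a deliberate difference from the code in the post, each item is written with its own `pickle.dump` call rather than through one shared `Pickler`, on the assumption that every record should be loadable on its own without the earlier ones.

```python
import pickle


def dump_with_index(filename, items):
    """Pickle items back to back and return the byte offset of each one."""
    offsets = []
    with open(filename, "wb") as out:
        for item in items:
            offsets.append(out.tell())  # position where this pickle starts
            # A separate dump per item keeps each record self-contained.
            pickle.dump(item, out, protocol=pickle.HIGHEST_PROTOCOL)
    return offsets


def load_at(filename, offset):
    """Load the single object whose pickle starts at the given offset."""
    with open(filename, "rb") as instream:
        instream.seek(offset)
        return pickle.load(instream)
```

With `offsets = dump_with_index("tmp.pickle", entries)`, the call `load_at("tmp.pickle", offsets[i])` fetches entry `i` without reading the ones before it. The offsets list itself still lives in memory, but it holds one integer per entry rather than the whole object.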
CPython Signal Handler Check for SIGKILL
All,

I just want to understand the C/Python piece better, because I am writing a tutorial on signals and I am using Python to demonstrate. I thought it would be fun to show that SIGKILL is never processed and that Python errors out instead. Something in Python is checking the SIGKILL signal handler, while not checking SIGTERM, and I can't find the Python C code that handles it. When I am writing something like this, I like to point out the C code, not just cite the documentation (I did find this change in behaviour noted in the Python change log).

I have searched everywhere (mostly the code and a little Google) and I cannot understand where the SIGKILL signal gets checked when a handler is set for it. I have scoured Modules/signalmodule.c only to find two instances of the RuntimeError exception, but I cannot understand how Python knows when a handler is set for SIGKILL. I understand that this changed in 2.4 and I am not trying to change it; I just really want to understand where this happens. I used grep to find SIGKILL and SIGTERM to see if I could determine where the critical difference is, but I cannot figure it out.

I have spent about 2 hours searching around and I can't figure it out. I assume it has to rely on some default behaviour in Unix, but I have no idea. I don't see a difference between SIGKILL and SIGTERM in the Python code, but obviously there is some difference. I understand what the difference is in Unix/Linux; I just want to see it in the Python code. Since Python is checking at run time to see what signal handlers are added, I know there must be a difference. I am not asking about the signals, I understand them. I am asking about the registration of the signal handler and how it knows that you are trying to register SIGKILL. You get an error like this:
./signal-catcher.py
Traceback (most recent call last):
  File "./signal-catcher.py", line 22, in <module>
    signal.signal(signal.SIGKILL, signal_handler_kill)
RuntimeError: (22, 'Invalid argument')

And the code is very simple. This attempts to register a handler for SIGKILL, but Python knows and won't let you:

    signal.signal(signal.SIGKILL, signal_handler_kill)

Please, can someone just point me in the right direction?

Thank You
Scott M
Re: CPython Signal Handler Check for SIGKILL
Yes, yes, thank you both. That is exactly what I didn't understand. I knew it was somehow linked to the C library and wasn't exactly being handled or decided at the Python layer; I just didn't understand the C part well enough. I have found the CPython source code that checks. I see what you are saying: it is basically checking for SIG_ERR like the C code and just setting the RuntimeError, which forces an exit if it is not caught, thereby making the Python module respond in a way very similar to the C library.

Here is the CPython code in Modules/signalmodule.c:

    if (PyOS_setsig(sig_num, func) == SIG_ERR) {
        PyErr_SetFromErrno(PyExc_RuntimeError);
        return NULL;
    }

Second, I would like to apologize. This list is amazing, and I made a stupid comment on the core developers mailing list this morning because I didn't understand that this was the right place to post this question.

Thank You
Scott M

On Mon, Jul 19, 2010 at 2:06 PM, Antoine Pitrou wrote:
>
> Hello,
>
> > I am not asking about the signals, I understand them,
> > I am asking about the registration of the SIGNAL handler and how it knows
> > that you are trying to register SIGKILL, you get an error like this.
> >
> > ./signal-catcher.py
> > Traceback (most recent call last):
> >   File "./signal-catcher.py", line 22, in <module>
> >     signal.signal(signal.SIGKILL, signal_handler_kill)
> > RuntimeError: (22, 'Invalid argument')
>
> >>> import errno
> >>> errno.errorcode[22]
> 'EINVAL'
>
> EINVAL is the error returned by the standard POSIX signal() function
> when trying to register a handler for SIGKILL. As the signal() man page
> says:
>
> [...]
> The signals SIGKILL and SIGSTOP cannot be caught or ignored.
> [...]
> ERRORS
>        EINVAL signum is invalid.
>
> So, in short, Python doesn't check SIGKILL by itself. It's just
> forbidden by the underlying C standard library, and Python propagates
> the error as a RuntimeError.
>
> Regards
>
> Antoine.
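The behaviour discussed in this thread can be probed with a small script along these lines. It is a sketch that assumes a POSIX system and Python 3, run from the main thread. On recent Python 3 versions the failure should surface as OSError rather than the RuntimeError shown in the traceback above, so the example catches both. `try_register` is a hypothetical helper name, not part of the signal module.

```python
import errno
import signal


def try_register(signum):
    """Return None if a handler can be installed, else the errno name."""
    def handler(received, frame):
        pass  # placeholder; it is not expected to run in this demonstration

    try:
        previous = signal.signal(signum, handler)
    except (OSError, RuntimeError, ValueError) as exc:
        # Map the numeric errno (22 in the thread) back to its symbolic name.
        return errno.errorcode.get(getattr(exc, "errno", None), repr(exc))
    if previous is not None:
        signal.signal(signum, previous)  # put the earlier handler back
    return None


for name in ("SIGTERM", "SIGKILL", "SIGSTOP"):
    print(name, try_register(getattr(signal, name)))
```

Going by the man page excerpt quoted above, one would expect SIGTERM to register without complaint and SIGKILL and SIGSTOP to be refused with EINVAL. The refusal comes from the operating system call, not from any signal-specific check in Python.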