How to Buffer Serialized Objects to Disk

2011-01-12 Thread Scott McCarty
Sorry to ask this question. I have searched the list archives and googled,
but I don't even know what words to search for, so I am just looking for a
little kick in the right direction.

I have a Python based log analysis program called petit (
http://crunchtools.com/petit). I am trying to modify it to manage the main
object types to and from disk.

Essentially, I have one object which is a list of a bunch of "Entry"
objects. The Entry objects have date, time, etc. fields which I use for
analysis techniques. At the very beginning I build up the list of objects,
and I would like to start pickling it while building to save memory. I want
to be able to process more entries than I have memory for. With a straight
list it looks like I could build from xreadlines(), but once you turn it
into a more complex object, I don't quite know where to go.

I understand how to pickle the entire data structure, but I need something
that will manage the memory/disk allocation. Any thoughts?

Gracias
Scott M
-- 
http://mail.python.org/mailman/listinfo/python-list
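
One way to read the question above is that the list itself may not be needed if the analysis can consume entries one at a time. The following is a minimal sketch of that idea, written for Python 3. The Entry fields, the line format, and the helper names iter_entries and count_by_date are assumptions made for illustration; they are not taken from petit.

```python
import io
from collections import Counter, namedtuple

# Hypothetical stand-in for petit's Entry class; the real fields may differ.
Entry = namedtuple("Entry", "date time message")

def iter_entries(lines):
    """Yield one Entry per log line without building a list."""
    for line in lines:
        parts = line.rstrip("\n").split(" ", 2)
        if len(parts) == 3:  # skip lines that do not match the assumed format
            yield Entry(*parts)

def count_by_date(entries):
    """Consume the stream, keeping only the per-date totals in memory."""
    return Counter(entry.date for entry in entries)

# io.StringIO stands in for an open log file; a file object iterates the same way.
sample = io.StringIO(
    "2011-01-12 17:04:00 sshd started\n"
    "2011-01-12 17:05:10 login failed\n"
    "2011-01-13 08:00:00 cron ran\n"
)
counts = count_by_date(iter_entries(sample))
print(counts)
```

Only analyses that need every entry at once (sorting the whole log, for example) would still require spilling to disk.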


Re: How to Buffer Serialized Objects to Disk

2011-01-12 Thread Scott McCarty
Been digging ever since I posted this. I suspected that the response might
be to use a database. I am worried I am trying to reinvent the wheel. The
problem is I don't want any dependencies and I also don't need persistence
between program runs. I kind of wanted to keep the use of petit very similar
to cat, head, awk, etc. But, that said, I have realized that if I provide
the analysis features as an API, you very well might want persistence
between runs.

What about using an array inside a shelve?

Just got done messing with this in python shell:

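import shelve

# writeback=True makes in-place changes such as append() stick on sync/close
d = shelve.open(filename="/root/test.shelf", protocol=-1, writeback=True)

d["log"] = []  # a list; a tuple has no append()
d["log"].append("test1")
d["log"].append("test2")
d["log"].append("test3")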

Then, always interacting with d["log"], for example:

for i in d["log"]:
    print i

Thoughts?

I know this won't manage memory, but it will keep the footprint down, right?

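A list stored under a single shelf key is pickled and unpickled as a whole, so reading d["log"] pulls every entry into memory. One possible alternative, sketched here for Python 3, is to give each entry its own key so that only one entry is unpickled at a time. The helper names append_entry and iter_log, the "count" key, and the temporary file path are all assumptions made for this illustration.

```python
import os
import shelve
import tempfile

def append_entry(db, entry):
    """Store entry under the next integer index, kept as a string key."""
    index = db.get("count", 0)
    db[str(index)] = entry
    db["count"] = index + 1

def iter_log(db):
    """Yield entries in insertion order, unpickling one at a time."""
    for index in range(db.get("count", 0)):
        yield db[str(index)]

# A throwaway path for the demonstration; any writable location works.
path = os.path.join(tempfile.mkdtemp(), "test.shelf")
db = shelve.open(path, protocol=-1)
for text in ("test1", "test2", "test3"):
    append_entry(db, text)
restored = list(iter_log(db))
db.close()
print(restored)
```

Shelf keys must be strings, which is why the index is converted with str(). The list() call is only there to show the result; a real analysis would loop over iter_log(db) directly.
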
On Wed, Jan 12, 2011 at 5:04 PM, Peter Otten <__pete...@web.de> wrote:

> Scott McCarty wrote:
>
> > Sorry to ask this question. I have searched the list archives and
> > googled, but I don't even know what words to search for, so I am just
> > looking for a little kick in the right direction.
> >
> > I have a Python based log analysis program called petit (
> > http://crunchtools.com/petit). I am trying to modify it to manage the
> > main object types to and from disk.
> >
> > Essentially, I have one object which is a list of a bunch of "Entry"
> > objects. The Entry objects have date, time, etc. fields which I use
> > for analysis techniques. At the very beginning I build up the list of
> > objects, and I would like to start pickling it while building to save
> > memory. I want to be able to process more entries than I have memory
> > for. With a straight list it looks like I could build from
> > xreadlines(), but once you turn it into a more complex object, I don't
> > quite know where to go.
> >
> > I understand how to pickle the entire data structure, but I need
> > something that will manage the memory/disk allocation. Any thoughts?
>
> You can write multiple pickled objects into a single file:
>
> import cPickle as pickle
>
> def dump(filename, items):
>     with open(filename, "wb") as out:
>         dump = pickle.Pickler(out).dump
>         for item in items:
>             dump(item)
>
> def load(filename):
>     with open(filename, "rb") as instream:
>         load = pickle.Unpickler(instream).load
>         while True:
>             try:
>                 item = load()
>             except EOFError:
>                 break
>             yield item
>
> if __name__ == "__main__":
>     filename = "tmp.pickle"
>     from collections import namedtuple
>     T = namedtuple("T", "alpha beta")
>     dump(filename, (T(a, b) for a, b in zip("abc", [1,2,3])))
>     for item in load(filename):
>         print item
>
> To get random access you'd have to maintain a list containing the offsets
> of the entries in the file.
> However, a simple database like SQLite is probably sufficient for the kind
> of entries you have in mind, and it allows operations like aggregation,
> sorting and grouping out of the box.
>
> Peter
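
The offset list Peter mentions could look roughly like the following Python 3 sketch. The function names dump_with_offsets and load_at and the temporary file are assumptions made for illustration, not code from the thread.

```python
import os
import pickle
import tempfile

def dump_with_offsets(filename, items):
    """Pickle items back to back; return the start offset of each one."""
    offsets = []
    with open(filename, "wb") as out:
        for item in items:
            offsets.append(out.tell())
            # A separate pickle.dump() call per item keeps each record
            # self-contained, so it can be loaded without the earlier ones.
            pickle.dump(item, out, protocol=pickle.HIGHEST_PROTOCOL)
    return offsets

def load_at(filename, offset):
    """Seek to a recorded offset and unpickle the single item stored there."""
    with open(filename, "rb") as instream:
        instream.seek(offset)
        return pickle.load(instream)

filename = os.path.join(tempfile.mkdtemp(), "tmp.pickle")
offsets = dump_with_offsets(filename, [("a", 1), ("b", 2), ("c", 3)])
third = load_at(filename, offsets[2])
print(third)
```

The sketch calls pickle.dump() once per item rather than sharing one Pickler, because a shared Pickler can write back-references to earlier objects, and those would not resolve when a read starts in the middle of the file.
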
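
On the dependency concern raised in this thread: the sqlite3 module ships with Python's standard library, so the SQLite suggestion does not require installing anything extra. A minimal Python 3 sketch follows. The table name, the columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# ":memory:" keeps the demonstration self-contained; a file path would
# let SQLite keep the entries on disk instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (date TEXT, time TEXT, message TEXT)")

rows = [
    ("2011-01-12", "17:04:00", "sshd started"),
    ("2011-01-12", "17:05:10", "login failed"),
    ("2011-01-13", "08:00:00", "cron ran"),
]
conn.executemany("INSERT INTO entry VALUES (?, ?, ?)", rows)
conn.commit()

# Grouping and sorting happen inside SQLite, not in a Python list.
per_day = conn.execute(
    "SELECT date, COUNT(*) FROM entry GROUP BY date ORDER BY date"
).fetchall()
conn.close()
print(per_day)
```

executemany() also accepts a generator, so entries can be inserted as they are parsed rather than collected first.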


CPython Signal Handler Check for SIGKILL

2010-07-19 Thread Scott McCarty
All, I just want to understand the C/Python piece better because I am
writing a tutorial on signals and I am using Python to demonstrate. I
thought it would be fun to show that SIGKILL is never processed, but
instead Python errors out. Something in Python is checking the SIGKILL
signal handler, while not checking SIGTERM, and I can't find the Python C
code that handles it. When I am writing something like this, I like to point
out the C code, not just cite the documentation (I did find this change in
behaviour noted in the Python change log).

I have searched everywhere (mostly the code and a little Google) and I
cannot understand where the SIGKILL signal gets checked when a handler is
set for it. I have scoured Modules/signalmodule.c only to find two
instances of the RuntimeError exception, but I cannot understand how Python
knows when a handler is set for SIGKILL. I understand that this changed in
2.4 and I am not trying to change it, I just really want to understand where
this happens. I used grep to find SIGKILL and SIGTERM to see if I could
determine where the critical difference is, but I cannot figure it out.

I have spent about 2 hours searching around and I can't figure it out. I
assume it has to rely on some default behaviour in Unix, but I have no idea.
I don't see a difference between SIGKILL and SIGTERM in the Python code, but
obviously there is some difference. I understand what the difference is in
Unix/Linux, I just want to see it in the Python code. Since Python is
checking at run time to see what signal handlers are added, I know there
must be a difference. I am not asking about the signals, I understand them.
I am asking about the registration of the signal handler and how it knows
that you are trying to register SIGKILL. You get an error like this:

 ./signal-catcher.py
Traceback (most recent call last):
  File "./signal-catcher.py", line 22, in <module>
    signal.signal(signal.SIGKILL, signal_handler_kill)
RuntimeError: (22, 'Invalid argument')

The code is very simple. This attempts to register a handler for SIGKILL,
but Python knows and won't let you:

signal.signal(signal.SIGKILL, signal_handler_kill)

Please, can someone just point me in the right direction?

Thank You
Scott M
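
The failing registration described above can be reproduced and inspected with a few lines. This is a sketch for Python 3 on a POSIX system, where the same failure surfaces as OSError rather than the RuntimeError shown in the Python 2 traceback, so both are caught. The variable name error_code is an assumption made for illustration.

```python
import errno
import signal

def signal_handler_kill(signum, frame):
    """Never runs: SIGKILL cannot be delivered to a handler."""

try:
    signal.signal(signal.SIGKILL, signal_handler_kill)
    error_code = None
except (OSError, RuntimeError) as exc:
    # In both cases the first argument carries the C errno value.
    error_code = exc.args[0]

# Map the number back to its symbolic name, e.g. for the tutorial text.
print(error_code, errno.errorcode.get(error_code))
```

Looking the number up in errno.errorcode shows that the value in the traceback is an errno reported by the operating system, not a code chosen by Python.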


Re: CPython Signal Handler Check for SIGKILL

2010-07-19 Thread Scott McCarty
Yes, yes, thank you both. That is exactly what I didn't understand. I knew
it was somehow linked to the C library and wasn't exactly being handled or
decided at the Python layer, I just didn't understand the C part well
enough. I have found the CPython source code that checks. I see what you are
saying: it is basically checking for SIG_ERR like the C code and just
setting the RuntimeError, which forces an exit if nothing catches it,
thereby making the Python module respond in a way very similar to the C
library.

Here is the CPython code in Modules/signalmodule.c:

if (PyOS_setsig(sig_num, func) == SIG_ERR) {
    PyErr_SetFromErrno(PyExc_RuntimeError);
    return NULL;
}
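
Seen from the Python side, that C check means the interpreter treats every signal number the same way and simply reports whatever the C library says. A small Python 3 sketch for POSIX systems that probes three signals is shown below. The helper name can_install_handler is an assumption made for illustration.

```python
import signal

def can_install_handler(signum):
    """Return True if the C library accepts a handler for signum."""
    def handler(received_signum, frame):
        pass
    try:
        previous = signal.signal(signum, handler)
    except (OSError, RuntimeError):
        # The underlying call failed; Python only passes the error along.
        return False
    if previous is not None:
        signal.signal(signum, previous)  # put the original disposition back
    return True

results = dict(
    (name, can_install_handler(getattr(signal, name)))
    for name in ("SIGTERM", "SIGKILL", "SIGSTOP")
)
print(results)
```

The probe must run in the main thread, because signal.signal() refuses to install handlers from any other thread.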

Second, I would like to apologize, this list is amazing, and I made a stupid
comment on the core developers mailing list this morning because I didn't
understand that this was the right place to post this question.

Thank You
Scott M

On Mon, Jul 19, 2010 at 2:06 PM, Antoine Pitrou wrote:

>
> Hello,
>
> > I am not asking about the signals, I understand them,
> > I am asking about the registration of the SIGNAL handler and how it knows
> > that you are trying to register SIGKILL, you get an error like this.
> >
> >  ./signal-catcher.py
> > Traceback (most recent call last):
> >   File "./signal-catcher.py", line 22, in <module>
> >     signal.signal(signal.SIGKILL, signal_handler_kill)
> > RuntimeError: (22, 'Invalid argument')
>
> >>> import errno
> >>> errno.errorcode[22]
> 'EINVAL'
>
> EINVAL is the error returned by the standard POSIX signal() function
> when trying to register a handler for SIGKILL. As the signal() man page
> says:
>
> [...]
>   The signals SIGKILL and SIGSTOP cannot be caught or ignored.
> [...]
> ERRORS
>   EINVAL signum is invalid.
>
>
> So, in short, Python doesn't check SIGKILL by itself. It's just
> forbidden by the underlying C standard library, and Python propagates
> the error as a RuntimeError.
>
> Regards
>
> Antoine.