really slow gzip decompress, why?

2009-01-26 Thread redbaron
I've got one big (6.9 GB) .gz file with text inside it.
zcat bigfile.gz > /dev/null does the job in 4 minutes 50 seconds.

The Python code has been doing the same job for 25 minutes and still
hasn't finished =( The code is the simplest I could imagine:

import gzip
import sys

def main():
    fh = gzip.open(sys.argv[1])
    all(fh)  # iterate over every line, discarding the values

As far as I understand, most of the time it is executing C code, so
Python's overhead shouldn't be noticeable. Why is it so slow?
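For comparison, a minimal sketch of streaming the same file in large
chunks instead of iterating line by line - a common workaround, since
older gzip.GzipFile implementations have heavy per-line overhead; the
chunk size below is an arbitrary choice, not something from the post:

import gzip
import sys

def drain(path, chunk_size=16 * 1024 * 1024):
    # read the decompressed stream in large fixed-size chunks;
    # 16 MB is an arbitrary choice for illustration
    fh = gzip.open(path, "rb")
    try:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
    finally:
        fh.close()

if __name__ == "__main__":
    drain(sys.argv[1])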
--
http://mail.python.org/mailman/listinfo/python-list


Could you recommend a job scheduling solution?

2009-02-11 Thread redbaron
I've got a single 8-way node dedicated to executing long-running
tasks. To be able to execute multiple tasks on this node it should
spawn each task in a separate process. At the same time it should
accept network connections with new tasks, without blocking the
client, and put them on a job queue.

What is a "task"? Executing an ordinary Python function will be
enough. If the solution contains a client library which allows easy
task submission, that would be great.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Could you recommend a job scheduling solution?

2009-02-11 Thread redbaron
On 11 Feb, 20:26, "bruce"  wrote:
> hi...
>
> not sure exactly what you're looking for, but "condor" has a robust job
> scheduling architecture for dealing with grid/distributed setups over
> multiple systems..
>
> give us more information, and there might be other suggestions!

Condor, Globus or any other grid system looks like serious overkill
to me. I need some sort of job manager which accepts jobs from
clients as pickled Python functions and queues/executes them. The
client side should be able to ask for the status of a submitted job,
get its return value, ask to cancel a job, etc.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Could you recommend a job scheduling solution?

2009-02-12 Thread redbaron
>
> I think parallel python will take care of that for you
> (http://www.parallelpython.com/)

I've found that RPyC (http://rpyc.wikidot.com/) is quite useful for
my task. It allows me to build an RPC service which accepts an
ordinary Python function from the client and returns the result in a
synchronous or asynchronous way. Of course it is able to serve
multiple clients, and its forking server helps me avoid GIL problems
on intense calculations inside Python code.

Some limitations are present, like the fact that you can't send for
execution any code which uses C extensions that are not present on
the RPC server, but I didn't expect that to work at all, so in
general RPyC makes me happy.
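A rough sketch of what such a service might look like; the service
name, port and the idea of dispatching to task functions that already
live on the server side are my own assumptions, not details from the
original setup (shipping arbitrary pickled functions would need extra
plumbing):

import rpyc
from rpyc.utils.server import ForkingServer

def heavy_task(n):
    # stand-in for a long-running, CPU-bound computation on the server
    return sum(i * i for i in xrange(n))

TASKS = {"heavy_task": heavy_task}

class TaskService(rpyc.Service):
    def exposed_run(self, name, *args):
        # each connection is handled in a forked child process,
        # so CPU-bound tasks don't contend for a single GIL
        return TASKS[name](*args)

if __name__ == "__main__":
    ForkingServer(TaskService, port=18861).start()

A client would then call it along these lines:

conn = rpyc.connect("localhost", 18861)
print conn.root.run("heavy_task", 10 ** 6)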
--
http://mail.python.org/mailman/listinfo/python-list


Re: cx_Oracle-5.0 Problem

2009-02-12 Thread redbaron
> ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.6/
> lib/python2.6/site-packages/cx_Oracle.so, 2): Symbol not found:
> ___divdi3

You didn't link cx_Oracle.so against all the libs it uses. Run
"ldd -r cx_Oracle.so" and you'll get an idea of all the missing
symbols. The names of the missing symbols should tell you what else
cx_Oracle.so needs to be linked with.

--
http://mail.python.org/mailman/listinfo/python-list


Re: something wrong with isinstance

2009-02-12 Thread redbaron
Not really sure, but try defining your class as a new-style one,
like:
class GeoMap(object):
   ...

--
http://mail.python.org/mailman/listinfo/python-list


Re: Break large file down into multiple files

2009-02-13 Thread redbaron
> New to python I have a large file that I need to break up into
> multiple smaller files. I need to break the large file into sections
> where there are 65535 lines and then write those sections to separate
> files.

If your lines are variable-length, then have a look at the itertools recipes:

from itertools import izip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

with open("/file", "r") as f:
    for lines in grouper(65535, f, ""):
        # each line already ends with '\n', so join without adding more
        data_to_write = ''.join(lines).rstrip("\n")
        ...

...
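For the original question (one output file per 65535-line section), a
rough sketch building on the grouper() defined above; the output
filename pattern is just an invented example:

import sys

def split_file(path, lines_per_file=65535):
    with open(path, "r") as f:
        for i, lines in enumerate(grouper(lines_per_file, f, "")):
            out_name = "%s.part%04d" % (path, i)  # hypothetical naming scheme
            with open(out_name, "w") as out:
                # real lines keep their own newlines; drop the "" padding
                out.writelines(line for line in lines if line)

if __name__ == "__main__":
    split_file(sys.argv[1])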

--
http://mail.python.org/mailman/listinfo/python-list


setuptools - library dependencies

2008-12-01 Thread redbaron
I'm trying to write a setup.py which compiles a C extension (A). The
problem is that my C extension depends on another C lib (B). I've dug
into the setuptools sources a bit and found the "libraries" option
for setuptools.setup.
Now it compiles the B library at the build_clib stage and the A
extension at the build_ext stage. But it doesn't pack the B library
into the final egg file, only the A one. Even worse, it compiles B as
a static lib but links it to A as a dynamic one, which leads to
undefined symbols in A. How could I either:

1) link A with B statically?
2) put B in the final egg in the same dir as A?
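For reference, a rough sketch of the kind of setup.py being described
(all names and paths are placeholders, and this reproduces the
problematic layout rather than solving it):

from setuptools import setup, Extension

setup(
    name="A",
    version="0.1",
    # B is built by the build_clib command as a helper (static) library
    libraries=[
        ("B", {"sources": ["src/b.c"], "include_dirs": ["include"]}),
    ],
    ext_modules=[
        Extension(
            "A",
            sources=["src/a.c"],
            include_dirs=["include"],
            libraries=["B"],  # the extension is linked against B at build_ext time
        ),
    ],
)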
--
http://mail.python.org/mailman/listinfo/python-list


multiprocessing: queue.get() blocks even if queue.qsize() != 0

2008-10-15 Thread redbaron
I've run into a problem with the queue from multiprocessing. Even
when queue.qsize() != 0, queue.get() still blocks, and
queue.get_nowait() raises the Empty error.

I'm unable to cut my big app down to a small test case, because a
smaller test case similar in design to my real app works. Under what
conditions is this possible?

while qresult.qsize():
    result = qresult.get()  # this code blocks!
    doWithResult(result)
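For what it's worth, multiprocessing.Queue.qsize() is only an
approximation, so it isn't a reliable loop condition; a rough sketch
of the usual sentinel pattern instead (a general recipe, not code
from this thread):

from multiprocessing import Process, Queue

SENTINEL = None

def worker(q):
    for item in [1, 2, 3]:
        q.put(item)
    q.put(SENTINEL)  # tell the consumer nothing more is coming

if __name__ == "__main__":
    qresult = Queue()
    p = Process(target=worker, args=(qresult,))
    p.start()
    while True:
        result = qresult.get()  # blocks only until the worker puts something
        if result is SENTINEL:
            break
        print result            # stand-in for doWithResult(result)
    p.join()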
--
http://mail.python.org/mailman/listinfo/python-list


multiprocessing: Queue.get_nowait() never returns data

2008-10-15 Thread redbaron
I'm stuck with the new multiprocessing module (formerly "processing").
I don't understand why queue.get_nowait() never returns data but
always raises Empty, even when it is guaranteed that the queue is not
empty.

I've created a small test case, here it is: http://pastebin.ca/1227666

Hope someone can explain where I'm wrong. It is written for 2.6 with
the multiprocessing module, but it's trivial to convert it to the
processing module for 2.5: just replace "multiprocessing" with
"processing" and "freeze_support" with "freezeSupport".
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing: Queue.get_nowait() never returns data

2008-10-15 Thread redbaron
My fault: changing "continue" to "break" solves the problem.
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing eats memory

2008-09-26 Thread redbaron
On 26 Sep, 04:20, Istvan Albert <[EMAIL PROTECTED]> wrote:
> On Sep 25, 8:40 am, "Max Ivanov" <[EMAIL PROTECTED]> wrote:
>
> > At any time in the main process there should be no more than two copies
> > of data
> > (one original data and one result).
>
> From the looks of it you are storing a lot of references to various
> copies of your data via the async set.

How could I avoid storing them? I need something to check whether
each one is ready or not and to retrieve the results when ready. I
can't see a way to achieve the same result without storing the set of
asyncs.
--
http://mail.python.org/mailman/listinfo/python-list

Re: multiprocessing eats memory

2008-09-26 Thread redbaron
On 26 Sep, 17:03, MRAB <[EMAIL PROTECTED]> wrote:
> On Sep 26, 9:52 am, redbaron <[EMAIL PROTECTED]> wrote:
>
> > On 26 Sep, 04:20, Istvan Albert <[EMAIL PROTECTED]> wrote:
>
> > > On Sep 25, 8:40 am, "Max Ivanov" <[EMAIL PROTECTED]> wrote:
>
> > > > At any time in the main process there should be no more than two
> > > > copies of data
> > > > (one original data and one result).
>
> > > From the looks of it you are storing a lot of references to various
> > > copies of your data via the async set.
>
> > How could I avoid storing them? I need something to check whether
> > each one is ready or not and to retrieve the results when ready. I
> > can't see a way to achieve the same result without storing the set of asyncs.
>
> You could give each worker process an ID and then have them put the ID
> into a queue to signal to the main process when finished.
And how could I retrieve the result from the worker process without an async?

>
> BTW, your test-case modifies the asyncs set while iterating over it,
> which is a bad idea.
My fault, there was list(asyncs) originally.
--
http://mail.python.org/mailman/listinfo/python-list

Re: multiprocessing eats memory

2008-09-27 Thread redbaron
> When processing data in parallel you will use up as much memory as
> many datasets you are processing at any given time.
Worker processes eat 2-4 times more memory than I pass to them.


> If you need to
> reduce memory use then you need to start fewer processes and use some
> mechanism to distribute the work on them as they become free. (see
> recommendation that uses Queues)
I don't understand how I could use a Queue here. If a worker process
finishes computing, it puts its id into the Queue and in the main
process I retrieve that id, but how do I then retrieve the result
from the worker process?
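A rough sketch (not from the thread) of the Queue-based idea: if the
worker puts the result on the queue together with its id, the main
process never has to hold AsyncResult objects at all.

from multiprocessing import Process, Queue

def worker(worker_id, data, results):
    # compute and ship the finished value back together with the worker's id
    results.put((worker_id, sum(data)))

if __name__ == "__main__":
    results = Queue()
    datasets = {0: range(10), 1: range(100)}  # placeholder data
    procs = [Process(target=worker, args=(wid, d, results))
             for wid, d in datasets.items()]
    for p in procs:
        p.start()
    for _ in procs:
        wid, value = results.get()  # only finished values reach the main process
        print wid, value
    for p in procs:
        p.join()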

--
http://mail.python.org/mailman/listinfo/python-list


Using logging module to log either to screen or a file

2010-12-07 Thread RedBaron
Hi,
I am a beginner to Python and I am writing a program that does a lot
of things. One of the requirements is that the program should
generate a log file. I came across the Python logging module and
found it very useful. But I have a few problems.
Suppose that by giving the option '-v' along with the program the
user can turn off logging to a file and instead display the log on
the screen. Since I am using a config file for logging, how do I
accomplish this?
I tried to define two handlers (file and screen) and added them to my
logger. But that logs data to both the screen and the file. I need to
log to only one. How do I dynamically remove one of the handlers from
the logger based on the user's option? And as a precursor, how do I
reference the handlers defined in the config file in the code?
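A rough sketch of one way to do this without the config file,
assuming '-v' simply means "log to the screen instead of the file";
the logger name and file name below are made up:

import logging
import sys

def setup_logging(verbose):
    logger = logging.getLogger("myapp")              # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    if verbose:
        handler = logging.StreamHandler()            # -v: log to the screen
    else:
        handler = logging.FileHandler("myapp.log")   # default: log to a file
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

log = setup_logging("-v" in sys.argv)
log.info("starting up")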
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Using logging module to log either to screen or a file

2010-12-07 Thread RedBaron
On Dec 7, 7:33 pm, Jean-Michel Pichavant wrote:
> RedBaron wrote:
> > Hi,
> > I am a beginner to Python and I am writing a program that does a lot
> > of things. One of the requirements is that the program should
> > generate a log file. I came across the Python logging module and
> > found it very useful. But I have a few problems.
> > Suppose that by giving the option '-v' along with the program the
> > user can turn off logging to a file and instead display the log on
> > the screen. Since I am using a config file for logging, how do I
> > accomplish this?
> > I tried to define two handlers (file and screen) and added them to my
> > logger. But that logs data to both the screen and the file. I need to
> > log to only one. How do I dynamically remove one of the handlers from
> > the logger based on the user's option? And as a precursor, how do I
> > reference the handlers defined in the config file in the code?
>
> your logger has a public 'handlers' attribute.
>
> consoleHandlers = [h for h in logger.handlers
>                    if h.__class__ is logging.StreamHandler]
> # the list of handlers logging to the console
> # (assuming they are instances of the StreamHandler class)
>
> if consoleHandlers:
>     h1 = consoleHandlers[0]
>     h1.filter = lambda x:True # enable the handler
>     h1.filter = lambda x:False # disable the handler
>
> JM

Thanks JM,
This works like a charm. I had also thought along similar lines but I
was using isinstance(). I have two handlers -
logging.handlers.RotatingFileHandler and StreamHandler. isinstance()
was weird in the sense that no matter which handler I checked for
being a 'StreamHandler', I always got True.
Also, instead of setting the filter to False, I was popping from the
handlers list... Silly me.
Thanks a ton
-- 
http://mail.python.org/mailman/listinfo/python-list


Case Sensitive Section names configparser

2010-12-08 Thread RedBaron
Is there any way by which ConfigParser's get() function can be made
case-insensitive?
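For what it's worth, option names are already case-insensitive by
default (ConfigParser lower-cases them via optionxform), while
section names are not. A rough sketch of a helper that also looks
sections up ignoring case; the file and section names are made up:

import ConfigParser

def get_ci(config, section, option):
    # option names are normalised by optionxform already; only the
    # section lookup needs to ignore case here
    for name in config.sections():
        if name.lower() == section.lower():
            return config.get(name, option)
    raise ConfigParser.NoSectionError(section)

config = ConfigParser.ConfigParser()
config.read("settings.ini")                  # hypothetical config file
print get_ci(config, "General", "timeout")   # works for [GENERAL], [general], ...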
-- 
http://mail.python.org/mailman/listinfo/python-list