weave and 64 bit issues

2013-05-14 Thread Jadhav, Alok
Hi everyone,

 

I am facing a strange problem using weave on a 64-bit machine,
specifically with weave's inline function. It seems to have something to
do with weave's catalog.

 

I found similar (very old) reports of this issue in the past:

 

http://mail.scipy.org/pipermail/scipy-dev/2006-June/005908.html

http://mail.scipy.org/pipermail/scipy-dev/2005-June/003042.html

 

 

I have a simple script that calculates a moving average using weave's
inline function.

 

 

 

File: mvg.py

 

import numpy as np
import scipy.weave as weave
import distutils.sysconfig
import distutils.dir_util
import os

# Raw string: otherwise "\x86" in the path would be read as a hex escape.
distutils.sysconfig._config_vars["LDSHARED"] = r"-LC:\strawberry64\c\x86_64-w64-mingw32\lib"


def ExpMovAvg(data, time, lag):
    if data.size != time.size:
        print "error in EMA, data and time have different size"
        return None
    result = np.repeat(0.0, data.size)
    code = """
    #line 66 "basics.py"
    // The list archive stripped everything between '<' and '>' from the
    // original post; the loop bounds, the alpha expression and the if
    // condition below are a reconstruction that approximately matches
    // the sample output further down.
    result(0) = data(0);
    for (int i = 0; i < data.size() - 1; i++)
    {
        double alpha = exp(-(time(i+1) - time(i)) / (lag - 1.0));
        if (time(i+1) - time(i) > 1)
        {
            alpha = 1.0;   // "alpha=10" in the mangled original
        }
        result(i+1) = (1 - alpha) * data(i) + alpha * result(i);
    }
    """
    # headers=["<math.h>"] is assumed: the bracketed header name was also
    # stripped by the archive, and the code calls exp().
    weave.inline(code, ["data", "time", "lag", "result"],
                 type_converters=weave.converters.blitz,
                 headers=["<math.h>"], compiler="gcc", verbose=2)
    return result

 

 

File: test.py

 

import string
import numpy as np
import mvg

print(mvg.ExpMovAvg(np.array(range(10)),np.array(range(10)),2))

 

 

 

Working output:

 

Y:\STMM\alpha\klse\PROD>c:\python27\python.exe s:\common\tools\python\python-2.7-64bit\test.py
[ 0.  0.  0.63212774  1.49679774  2.44701359  3.42869938
  4.42196209  5.41948363  6.41857187  7.41823646]

 

Now if I keep running the script multiple times, sometimes I see the
correct output, but sometimes I get the error below.

 

Y:\STMM\alpha\klse\PROD>c:\python27\python.exe s:\common\tools\python\python-2.7-64bit\test.py

repairing catalog by removing key
Looking for python27.dll
running build_ext
running build_src
build_src
building extension "sc_44f3fe3c65d5c3feecb45d9269ac207f5" sources
build_src: building npy-pkg config files
Looking for python27.dll
customize Mingw32CCompiler
customize Mingw32CCompiler using build_ext
Looking for python27.dll
customize Mingw32CCompiler
customize Mingw32CCompiler using build_ext
building 'sc_44f3fe3c65d5c3feecb45d9269ac207f5' extension
compiling C++ sources
C compiler: g++ -g -DDEBUG -DMS_WIN64 -O0 -Wall

compile options: '-Ic:\python27\lib\site-packages\scipy\weave -Ic:\python27\lib\site-packages\scipy\weave\scxx -Ic:\python27\lib\site-packages\scipy\weave\blitz -Ic:\python27\lib\site-packages\numpy\core\include -Ic:\python27\include -Ic:\python27\PC -c'
g++ -g -DDEBUG -DMS_WIN64 -O0 -Wall -Ic:\python27\lib\site-packages\scipy\weave -Ic:\python27\lib\site-packages\scipy\weave\scxx -Ic:\python27\lib\site-packages\scipy\weave\blitz -Ic:\python27\lib\site-packages\numpy\core\include -Ic:\python27\include -Ic:\python27\PC -c c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_compiled\sc_44f3fe3c65d5c3feecb45d9269ac207f5.cpp -o c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_intermediate\compiler_2d3e1e2e4de6a91419d2376b162e5342\Release\users\ajadhav2\appdata\local\temp\ajadhav2\python27_compiled\sc_44f3fe3c65d5c3feecb45d9269ac207f5.o
Found executable C:\strawberry\c\bin\g++.exe
g++ -g -DDEBUG -DMS_WIN64 -O0 -Wall -Ic:\python27\lib\site-packages\scipy\weave -Ic:\python27\lib\site-packages\scipy\weave\scxx -Ic:\python27\lib\site-packages\scipy\weave\blitz -Ic:\python27\lib\site-packages\numpy\core\include -Ic:\python27\include -Ic:\python27\PC -c c:\python27\lib\site-packages\scipy\weave\scxx\weave_imp.cpp -o c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_intermediate\compiler_2d3e1e2e4de6a91419d2376b162e5342\Release\python27\lib\site-packages\scipy\weave\scxx\weave_imp.o
g++ -g -shared c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_intermediate\compiler_2d3e1e2e4de6a91419d2376b162e5342\Release\users\ajadhav2\appdata\local\temp\ajadhav2\python27_compiled\sc_44f3fe3c65d5c3feecb45d9269ac207f5.o c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_intermediate\compiler_2d3e1e2e4de6a91419d2376b162e5342\Release\python27\lib\site-packages\scipy\weave\scxx\weave_imp.o -Lc:\python27\libs -Lc:\python27\PCbuild\amd64 -lpython27 -lmsvcr90 -o c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_compiled\sc_44f3fe3c65d5c3feecb45d9269ac207f5.pyd
running scons

Traceback (most recent call last):
  File "s:\common\tools\python\python-2.7-64bit\test.py", line 5, in <module>
    print(mvg.ExpMovAvg(np.array(range(10)),np.array(range(10)),2))
  File "s:\common\tools\python\python-2.7-64bit\mvg.py", line 30, in ExpMovAvg
    weave.inline(code,["data","time","lag","result"],type_converters=weave.converters.blitz,headers=["<math.h>"],compiler="gcc",verbose=2)
  File "c:\python27\lib\site-packages\scipy\weave\inline_tools.py", line 355, in inline
    **kw)
  File "c:\python27\lib\site-packages\scipy\weave\inline_tools.py",

weave in 64 bit strange behavior

2013-05-14 Thread Jadhav, Alok
Hi everyone,

 

I realize my previous post was quite unreadable, thanks to my email
client. I am reposting my question here, with slight enhancements.
Apologies for the inconvenience and for spamming your mailboxes.

 

I am facing a strange problem using weave on a 64-bit machine,
specifically with weave's inline function. It seems to have something to
do with weave's catalog.

 

I found similar (very old) reports of this issue in the past:

 

http://mail.scipy.org/pipermail/scipy-dev/2006-June/005908.html 

http://mail.scipy.org/pipermail/scipy-dev/2005-June/003042.html

 

Common things I have observed:

 

-  A setup that already works in a 32-bit environment does not work the
same way in a 64-bit environment.

-  Weave recompiles inline code that should not need any recompilation.
This behavior is random. Whenever weave recompiles, I see the
notification "repairing catalog by removing key" in the output, which
ends in the error "ImportError: DLL load failed: Invalid access to
memory location". (A possible workaround is sketched after this list.)

-  Sometimes gcc gets into an infinite loop printing the message
"Looking for python27.dll", even though the DLL is on the path. The
process never ends and has to be killed forcefully; the g++ process
remains as a ghost even after the python process is killed.
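
If the catalog is the culprit, one thing worth trying (a sketch, not a
confirmed fix) is pointing weave's compile catalog at a stable, local,
writable directory through the PYTHONCOMPILED environment variable,
which weave consults when it assembles its catalog search path; by
default it falls back to a per-user temp directory like the
c:\users\...\python27_compiled path visible in the logs below. The
directory name here is hypothetical, and it must be set before
scipy.weave is used:

import os

catalog_dir = r'C:\weave_catalog'   # hypothetical stable location
if not os.path.isdir(catalog_dir):
    os.makedirs(catalog_dir)
# weave reads PYTHONCOMPILED when building its catalog search path,
# so set it before importing/using scipy.weave.
os.environ['PYTHONCOMPILED'] = catalog_dir

import scipy.weave as weave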

 

Could someone advise what I am missing here? Is there any specific setup
I need to do? Is there an issue with the Python 2.7 64-bit weave
implementation?

 

Regards,

Alok

 



 

I have a simple script that calculates a moving average using weave's
inline function.

 

Source: mvg.py

 

import numpy as np
import scipy.weave as weave
import distutils.sysconfig
import distutils.dir_util
import os

# Raw string: otherwise "\x86" in the path would be read as a hex escape.
distutils.sysconfig._config_vars["LDSHARED"] = r"-LC:\strawberry\c\x86_64-w64-mingw32\lib"


def ExpMovAvg(data, time, lag):
    if data.size != time.size:
        print "error in EMA, data and time have different size"
        return None
    result = np.repeat(0.0, data.size)
    code = """
    #line 66 "basics.py"
    // The list archive stripped everything between '<' and '>' from the
    // original post; the loop bounds, the alpha expression and the if
    // condition below are a reconstruction that approximately matches
    // the sample output further down.
    result(0) = data(0);
    for (int i = 0; i < data.size() - 1; i++)
    {
        double alpha = exp(-(time(i+1) - time(i)) / (lag - 1.0));
        if (time(i+1) - time(i) > 1)
        {
            alpha = 1.0;   // "alpha=10" in the mangled original
        }
        result(i+1) = (1 - alpha) * data(i) + alpha * result(i);
    }
    """
    # headers=["<math.h>"] is assumed: the bracketed header name was also
    # stripped by the archive, and the code calls exp().
    weave.inline(code, ["data", "time", "lag", "result"],
                 type_converters=weave.converters.blitz,
                 headers=["<math.h>"], compiler="gcc", verbose=2)
    return result
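
As a sanity check, the same recurrence can be computed in pure NumPy; if
the weave result ever disagrees with this after one of the spurious
recompiles, the stale catalog entry is the likely culprit. Note that the
alpha formula below mirrors the reconstruction of the mangled C++ above,
so it is an assumption rather than the poster's original:

import numpy as np

def exp_mov_avg_py(data, time, lag):
    # Pure-Python reference for the weave version above.
    result = np.zeros(data.size)
    result[0] = data[0]
    for i in range(data.size - 1):
        dt = time[i + 1] - time[i]
        alpha = 1.0 if dt > 1 else np.exp(-dt / (lag - 1.0))
        result[i + 1] = (1 - alpha) * data[i] + alpha * result[i]
    return result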

 

Source: test.py

 

import string
import numpy as np
import mvg

print(mvg.ExpMovAvg(np.array(range(10)),np.array(range(10)),2))

 

 

Working output:

 

Y:\STMM\alpha\klse\PROD>c:\python27\python.exe s:\common\tools\python\python-2.7-64bit\test.py

[ 0.  0.  0.63212774  1.49679774  2.44701359  3.42869938
  4.42196209  5.41948363  6.41857187  7.41823646]

 

Now if I keep running the script multiple times, sometimes I see the
correct output, but sometimes I get the error below.

 

Y:\STMM\alpha\klse\PROD>c:\python27\python.exe s:\common\tools\python\python-2.7-64bit\test.py

repairing catalog by removing key
Looking for python27.dll
running build_ext
running build_src
build_src
building extension "sc_44f3fe3c65d5c3feecb45d9269ac207f5" sources
build_src: building npy-pkg config files
Looking for python27.dll
customize Mingw32CCompiler
customize Mingw32CCompiler using build_ext
Looking for python27.dll
customize Mingw32CCompiler
customize Mingw32CCompiler using build_ext
building 'sc_44f3fe3c65d5c3feecb45d9269ac207f5' extension
compiling C++ sources
C compiler: g++ -g -DDEBUG -DMS_WIN64 -O0 -Wall

compile options: '-Ic:\python27\lib\site-packages\scipy\weave -Ic:\python27\lib\site-packages\scipy\weave\scxx -Ic:\python27\lib\site-packages\scipy\weave\blitz -Ic:\python27\lib\site-packages\numpy\core\include -Ic:\python27\include -Ic:\python27\PC -c'
g++ -g -DDEBUG -DMS_WIN64 -O0 -Wall -Ic:\python27\lib\site-packages\scipy\weave -Ic:\python27\lib\site-packages\scipy\weave\scxx -Ic:\python27\lib\site-packages\scipy\weave\blitz -Ic:\python27\lib\site-packages\numpy\core\include -Ic:\python27\include -Ic:\python27\PC -c c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_compiled\sc_44f3fe3c65d5c3feecb45d9269ac207f5.cpp -o c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_intermediate\compiler_2d3e1e2e4de6a91419d2376b162e5342\Release\users\ajadhav2\appdata\local\temp\ajadhav2\python27_compiled\sc_44f3fe3c65d5c3feecb45d9269ac207f5.o

Found executable C:\strawberry\c\bin\g++.exe
g++ -g -DDEBUG -DMS_WIN64 -O0 -Wall -Ic:\python27\lib\site-packages\scipy\weave -Ic:\python27\lib\site-packages\scipy\weave\scxx -Ic:\python27\lib\site-packages\scipy\weave\blitz -Ic:\python27\lib\site-packages\numpy\core\include -Ic:\python27\include -Ic:\python27\PC -c c:\python27\lib\site-packages\scipy\weave\scxx\weave_imp.cpp -o c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_intermediate\compiler_2d3e1e2e4de6a91419d2376b162e5342\Release\python27\lib\site-packages\scipy\weave\scxx\weave_imp.o
g++ -g -shared c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_intermediate\compiler

Python garbage collector/memory manager behaving strangely

2012-09-16 Thread Jadhav, Alok
Hi Everyone,

 

I have a simple program which reads a large file containing a few
million rows, parses each row (`numpy array`), converts it into an array
of doubles (`python array`) and later writes it into an `hdf5 file`. I
repeat this loop for multiple days. After reading each file, I delete
all the objects and call the garbage collector. When I run the program,
the first day is parsed without any error, but on the second day I get a
`MemoryError`. I monitored the memory usage of my program: during the
first day of parsing, memory usage is around **1.5 GB**; when the first
day's parsing is finished, memory usage goes down to **50 MB**. When the
second day starts and I try to read the lines from the file, I get a
`MemoryError`. Following is the output of the program.

 

 

source file extracted at C:\rfadump\au\2012.08.07.txt
parsing started
current time: 2012-09-16 22:40:16.829000
50 lines parsed
100 lines parsed
150 lines parsed
200 lines parsed
250 lines parsed
300 lines parsed
350 lines parsed
400 lines parsed
450 lines parsed
500 lines parsed
parsing done.
end time is 2012-09-16 23:34:19.931000
total time elapsed 0:54:03.102000
repacking file
done
> s:\users\aaj\projects\pythonhf\rfadumptohdf.py(132)generateFiles()
-> while single_date <= self.end_date:
(Pdb) c
*** 2012-08-08 ***
source file extracted at C:\rfadump\au\2012.08.08.txt
cought an exception while generating file for day 2012-08-08.
Traceback (most recent call last):
  File "rfaDumpToHDF.py", line 175, in generateFile
    lines = self.rawfile.read().split('|\n')
MemoryError

 

I am very sure that the Windows task manager shows the memory usage as
**50 MB** for this process. It looks like the garbage collector or
memory manager for Python is not calculating the free memory correctly.
There should be a lot of free memory, but it thinks there is not enough.
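
For concreteness, the per-day loop described above boils down to roughly
this (a sketch; the helper name is a hypothetical stand-in for the real
parsing and hdf5-writing step):

import gc

def parse_and_write(lines):
    # Hypothetical stand-in for the numpy parsing / hdf5 writing step.
    pass

for day in ['2012.08.07', '2012.08.08']:
    rawfile = open(r'C:\rfadump\au\%s.txt' % day)
    lines = rawfile.read().split('|\n')   # one huge string, then split
    parse_and_write(lines)                # peaks around 1.5 GB
    rawfile.close()
    del lines
    gc.collect()                          # usage drops back to ~50 MB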

 

Any idea?

 

Thanks.

 

 

Alok Jadhav

CREDIT SUISSE AG

GAT IT Hong Kong, KVAG 67

International Commerce Centre | Hong Kong | Hong Kong

Phone +852 2101 6274 | Mobile +852 9169 7172

alok.jad...@credit-suisse.com | www.credit-suisse.com
 

 




RE: Python garbage collector/memory manager behaving strangely

2012-09-16 Thread Jadhav, Alok
Thanks Dave for the clean explanation. I clearly understand what is
going on now. I still need some suggestions from you on this.

There are 2 reasons why I was using self.rawfile.read().split('|\n')
instead of self.rawfile.readlines():

- As you have seen, the line separator is not '\n' but '|\n'. Sometimes
the data itself has '\n' characters in the middle of the line, and the
only way to find the true end of the line is that the previous character
should be a bar '|'. I was not able to specify the end of line using the
readlines() function, but I could do it using the split() function. (One
hack would be to readlines and combine them until I find '|\n'. Is there
a cleaner way to do this?)

- Reading the whole file at once and processing it line by line was much
faster. Speed is not a very important issue here, but I think the time
it took to parse the complete file was reduced to one third of the
original time.

Regards,
Alok


-Original Message-
From: Dave Angel [mailto:d...@davea.name] 
Sent: Monday, September 17, 2012 10:13 AM
To: Jadhav, Alok
Cc: python-list@python.org
Subject: Re: Python garbage collector/memory manager behaving strangely

On 09/16/2012 09:07 PM, Jadhav, Alok wrote:
> [snip: the original post, quoted in full above]

Don't blame CPython.  You're trying to do a read() of a large file,
which will result in a single large string.  Then you split it into
lines.  Why not just read it in as lines? Then the large string isn't
necessary.  Take a look at the readlines() function.  Chances are that
even that is unnecessary, but I can't tell without seeing more of the
code.

  lines = self.rawfile.read().split('|\n')

becomes

  lines = self.rawfile.readlines()

When a single large item is being allocated, it's not enough to have
sufficient free space; the space also has to be contiguous.  After a
program runs for a while, its address space naturally gets fragmented
more and more.  It's the nature of the C runtime, and CPython is stuck
with it.



-- 

DaveA
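
Building on Dave's contiguity point, a middle ground between read()-ing
the whole file into one giant string and going record by record is to
stream fixed-size chunks and split on the '|\n' separator as the chunks
arrive, so no single huge allocation is ever needed. A sketch, not
tested against the original data:

def split_records(f, sep='|\n', chunk_size=1 << 20):
    # Stream sep-terminated records while holding at most one chunk
    # plus one partial record in memory at a time.
    pending = ''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        records = pending.split(sep)
        pending = records.pop()   # last piece may be incomplete
        for record in records:
            yield record
    if pending:
        yield pending             # trailing data with no final '|\n'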




RE: Python garbage collector/memory manager behaving strangely

2012-09-16 Thread Jadhav, Alok
I am thinking of calling a new subprocess which will do the
memory-hungry job and then release the memory, as described in the link
below:

http://stackoverflow.com/questions/1316767/how-can-i-explicitly-free-memory-in-python/1316799#1316799
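
A minimal sketch of that idea with the standard multiprocessing module
(the worker body is a hypothetical stand-in for the real per-day job):

from multiprocessing import Process

def generate_file(day):
    # Hypothetical per-day parse/convert/write job. Everything it
    # allocates lives in the child process.
    pass

if __name__ == '__main__':
    for day in ['2012.08.07', '2012.08.08']:
        p = Process(target=generate_file, args=(day,))
        p.start()
        # When the child exits, the OS reclaims its entire heap, so
        # fragmentation cannot accumulate across days.
        p.join()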

Regards,
Alok







RE: Python garbage collector/memory manager behaving strangely

2012-09-17 Thread Jadhav, Alok
Thanks for your valuable inputs. This is very helpful. 


-Original Message-
From: Python-list
[mailto:python-list-bounces+alok.jadhav=credit-suisse@python.org] On
Behalf Of Dave Angel
Sent: Monday, September 17, 2012 6:47 PM
To: alex23
Cc: python-list@python.org
Subject: Re: Python garbage collector/memory manager behaving strangely

On 09/16/2012 11:25 PM, alex23 wrote:
> On Sep 17, 12:32 pm, "Jadhav, Alok"  wrote:
>> - As you have seen, the line separator is not '\n' but '|\n'.
>> Sometimes the data itself has '\n' characters in the middle of the
>> line and the only way to find the true end of the line is that the
>> previous character should be a bar '|'. I was not able to specify
>> the end of line using the readlines() function, but I could do it
>> using the split() function. (One hack would be to readlines and
>> combine them until I find '|\n'. Is there a cleaner way to do this?)
> You can use a generator to take care of your readlines requirements:
>
> def readlines(f):
>     lines = []
>     while "f is not empty":
>         line = f.readline()
>         if not line: break
>         if len(line) > 2 and line[-2:] == '|\n':
>             lines.append(line)
>             yield ''.join(lines)
>             lines = []
>         else:
>             lines.append(line)

There are a few changes I'd make:
I'd change the name to something else, so as not to shadow the built-in,
and to make it clear in the caller's code that it's not the built-in one.
I'd replace that compound if statement with
  if line.endswith('|\n'):
I'd add a comment saying that partial lines at the end of the file are
ignored.
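
Applying those changes to alex23's generator gives roughly the following
(a sketch; read_records is just a name chosen to avoid shadowing the
built-in):

def read_records(f):
    # Accumulate raw lines until one ends with the true record
    # separator '|\n', then emit the joined record.
    # NOTE: a partial record at end of file (no trailing '|\n')
    # is silently ignored.
    lines = []
    for line in f:
        lines.append(line)
        if line.endswith('|\n'):
            yield ''.join(lines)
            lines = []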

>> - Reading whole file at once and processing line by line was much
>> faster. Though speed is not a very important issue here, I think the
>> time it took to parse the complete file was reduced to one third of
>> the original time.

You don't say what it was faster than.  Chances are you went to the
other extreme, of doing a read() of 1 byte at a time; try Alex's
approach of a generator which in turn uses the readline() generator.

> With the readlines generator above, it'll read lines from the file
> until it has a complete "line" by your requirement, at which point
> it'll yield it. If you don't need the entire file in memory for the
> end result, you'll be able to process each "line" one at a time and
> perform whatever you need against it before asking for the next.
>
> with open(u'infile.txt','r') as infile:
>     for line in readlines(infile):
>         ...
>
> Generators are a very efficient way of processing large amounts of
> data. You can chain them together very easily:
>
> real_lines = readlines(infile)
> marker_lines = (l for l in real_lines if l.startswith('#'))
> every_second_marker = (l for i,l in enumerate(marker_lines) if (i+1) % 2 == 0)
> map(some_function, every_second_marker)
>
> The real_lines generator returns your definition of a line. The
> marker_lines generator filters out everything that doesn't start with
> #, while every_second_marker returns only half of those. (Yes, these
> could all be written as a single generator, but this is very useful
> for more complex pipelines.)
>
> The big advantage of this approach is that nothing is read from the
> file into memory until map is called, and given the way they're
> chained together, only one of your lines should be in memory at any
> given time.
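
Put end to end, the chain above can be exercised with the read_records
sketch from earlier in this thread and some made-up sample data:

from StringIO import StringIO  # Python 2, as elsewhere in the thread

infile = StringIO('#one|\ndata|\n#two|\n#three|\n')
real_lines = read_records(infile)
marker_lines = (l for l in real_lines if l.startswith('#'))
every_second_marker = (l for i, l in enumerate(marker_lines) if (i + 1) % 2 == 0)
print list(every_second_marker)   # -> ['#two|\n']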


-- 

DaveA
