RE: Multiline regex

Andreas Tawn Wed, 21 Jul 2010 09:44:25 -0700

>>> I could make it that simple, but that is also incredibly slow and on
>>> a file with several million lines, it takes somewhere in the league of
>>> half an hour to grab all the data. I need this to grab data from
>>> many, many files and return the data quickly.
>>>
>>> Brandon L. Harris
>>>
>> That's surprising.
>>
>> I just made a file with 13 million lines of your data (447Mb) and
>> read it with my code. It took a little over 36 seconds. There must be
>> something different in your set up or the real data you've got.
>>
>> Cheers,
>>
>> Drea
>>
> Could it be that there isn't just that type of data in the file? There
> are many different types; that is just one that I'm trying to grab.
> 
> Brandon L. Harris


I don't see why it would make such a difference.

If your data looks like...

<block header>
\t<attribute>
\t<attribute>
\t<attribute>

Just change this line...

if line.startswith("createNode"):

to...

if not line.startswith("\t"):

and every non-indented line will be treated as the start of a new block,
so it won't care what sort of data the file contains.
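
For illustration, here is a minimal sketch of a block-collecting loop with
that change applied. The earlier code isn't quoted in this message, so the
function name read_blocks and the blank-line handling are assumptions made
for the example; currentBlock is the list mentioned in the p.s. below.

```python
def read_blocks(lines):
    """Yield each block as a list: one header line plus its tab-indented attributes.

    Hypothetical helper for illustration; not the code from earlier in the thread.
    """
    currentBlock = []  # pre-declared so the first header line has a list to test
    for line in lines:
        if not line.strip():
            # Assumption: blank lines carry no data, so skip them.
            continue
        if not line.startswith("\t"):
            # Any non-indented line is a block header, whatever its type.
            if currentBlock:
                yield currentBlock
            currentBlock = []
        currentBlock.append(line.rstrip("\n"))
    if currentBlock:
        # Don't lose the final block at end of input.
        yield currentBlock
```

Because a file object iterates line by line, something like
"for block in read_blocks(open(path)):" would process one block at a time
without holding the whole file in memory (path being whatever file you
are reading).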

Processing that data after you've collected it will still take a while, but
that cost is the same whichever method you use to read it.

Cheers,

Drea

P.S. I just noticed that I hadn't pre-declared the currentBlock list.
-- 
http://mail.python.org/mailman/listinfo/python-list
