File parser

2005-08-29 Thread Angelic Devil

I'm building a file parser, but I have a problem I'm not sure how to
solve.  The files it will parse can be huge (multiple GB).  Each file
has distinct sections that I want to read into separate dictionaries
so I can perform different operations on them.  Each section has
specific begin and end statements, like the following:

KEYWORD
.
.
.
END KEYWORD

The very first thing I do is read the entire file contents into a
string.  I then store the contents in a list, splitting on line endings
as follows:


file_lines = file_contents.split('\n')
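
One caveat with split('\n'): if the file has Windows line endings, each
line keeps a trailing '\r', and a file that ends in a newline yields an
empty last element, so an exact comparison such as line == 'END BAR'
can quietly fail.  str.splitlines() drops both.  A small illustration;
the sample string is made up:

```python
# Hypothetical sample contents with Windows (\r\n) line endings
file_contents = "BAR\r\nbar_1\r\nEND BAR\r\n"

# split('\n') keeps each '\r' and leaves an empty string after the final newline
print(file_contents.split('\n'))  # ['BAR\r', 'bar_1\r', 'END BAR\r', '']

# splitlines() strips all line terminators and adds no trailing empty element
print(file_contents.splitlines())  # ['BAR', 'bar_1', 'END BAR']
```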


Next, I build smaller lists from the different sections using the
begin and end keywords:


begin_index = file_lines.index(begin_keyword)
end_index = file_lines.index(end_keyword)
# the slice end is exclusive, so this stops just before the END line
small_list = file_lines[begin_index + 1:end_index]
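
Note that list.index() compares whole elements for equality, so
file_lines.index('BAR') does not match a line reading 'FOOBAR'; the
substring problem only shows up with tests such as 'BAR' in line.  A
minimal sketch of extracting one section this way.  The helper name
extract_section is hypothetical, and stripping each line first is an
assumption meant to tolerate stray whitespace (it does build a second
list, which costs memory on very large inputs):

```python
def extract_section(lines, keyword):
    """Return the lines strictly between `keyword` and `END keyword`.

    Lines are compared whole (after stripping surrounding whitespace), so
    'BAR' never matches 'FOOBAR'. Raises ValueError if a marker is missing.
    """
    stripped = [line.strip() for line in lines]
    begin_index = stripped.index(keyword)
    # Look for the end marker only after the begin marker
    end_index = stripped.index('END ' + keyword, begin_index + 1)
    return lines[begin_index + 1:end_index]


file_lines = ['FOOBAR', 'foobar_1', 'foobar_2', 'END FOOBAR',
              'BAR', 'bar_1', 'END BAR']
print(extract_section(file_lines, 'BAR'))     # ['bar_1']
print(extract_section(file_lines, 'FOOBAR'))  # ['foobar_1', 'foobar_2']
```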


I then plan to parse each list to build the different dictionaries.
The problem is that one begin statement is a substring of another,
as in the following example:


BAR
END BAR

FOOBAR
END FOOBAR


I can't just look for the line in the list that contains BAR, because
FOOBAR might come first in the list.  My list would then look like this:

[foobar_1, foobar_2, ..., foobar_n, ..., bar_1, bar_2, ..., bar_m]
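
Since the files can run to several GB, one way to avoid both regular
expressions and holding the whole file in memory is to iterate over
the file object line by line and track which section you are in,
comparing each stripped line against the known keywords exactly.  A
minimal sketch, assuming the section keywords are known in advance,
sections do not nest, and each section is collected as a list of lines
(turning those lists into dictionaries is left out).  The function name
read_sections is hypothetical:

```python
import io


def read_sections(stream, keywords):
    """Collect the lines of each KEYWORD ... END KEYWORD block.

    `stream` is any iterable of lines (e.g. an open file) and is read
    lazily, so the whole file is never held in memory at once. Markers
    are compared as whole lines, so 'BAR' never matches 'FOOBAR'.
    """
    sections = {keyword: [] for keyword in keywords}
    current = None  # keyword of the section we are inside, if any
    for raw_line in stream:
        line = raw_line.rstrip('\r\n')
        stripped = line.strip()
        if current is None:
            if stripped in sections:
                current = stripped  # entering a section
        elif stripped == 'END ' + current:
            current = None  # leaving the section
        else:
            sections[current].append(line)
    return sections


sample = io.StringIO("FOOBAR\nfoobar_1\nfoobar_2\nEND FOOBAR\n"
                     "BAR\nbar_1\nbar_2\nEND BAR\n")
result = read_sections(sample, ['BAR', 'FOOBAR'])
print(result['BAR'])     # ['bar_1', 'bar_2']
print(result['FOOBAR'])  # ['foobar_1', 'foobar_2']
```

With a real file you would pass the open file object instead of the
StringIO, e.g. inside "with open(path) as f:".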

I don't really want to use regular expressions, but I don't see a way
to get around this without doing so.  Does anyone have any suggestions
on how to accomplish this? If regexps are the way to go, is there an
efficient way to parse the contents of a potentially large list using
regular expressions?
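
If you do use regular expressions, anchoring the markers to whole lines
with ^ and $ under re.MULTILINE removes the substring ambiguity, and a
backreference can require the END line to name the same keyword.  A
sketch that assumes the contents are already in one string (so it does
not reduce memory use) and that sections do not nest; the pattern and
names are illustrative:

```python
import re

# ^ and $ match at line boundaries under re.MULTILINE; re.DOTALL lets .*?
# span several lines. \1 requires the END line to repeat the same keyword.
SECTION_RE = re.compile(r'^(BAR|FOOBAR)\n(.*?)^END \1$', re.MULTILINE | re.DOTALL)

file_contents = ("FOOBAR\nfoobar_1\nfoobar_2\nEND FOOBAR\n"
                 "BAR\nbar_1\nEND BAR\n")

sections = {}
for match in SECTION_RE.finditer(file_contents):
    keyword, body = match.group(1), match.group(2)
    sections[keyword] = body.splitlines()

print(sections)  # {'FOOBAR': ['foobar_1', 'foobar_2'], 'BAR': ['bar_1']}
```

finditer() yields matches one at a time rather than building a list of
all of them, but the string it scans must already be in memory.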

Any help is appreciated!

Thanks,
Aaron

-- 
"'Tis better to be silent and be thought a fool, than to speak and
 remove all doubt."
-- Abraham Lincoln
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: File parser

2005-08-30 Thread Angelic Devil
"Rune Strand" <[EMAIL PROTECTED]> writes:


Thanks.  This shows definite promise.  I've already tailored it for
what I need, and it appears to be working.


-- 
"Society in every state is a blessing, but Government, even in its best
state, is but a necessary evil; in its worst state, an intolerable one."
-- Thomas Paine


Record separator for readlines()

2005-09-02 Thread Angelic Devil

I know this has been asked before (I already consulted the Google
Groups archive), but I have not seen a definitive answer.  Is there a
way to change the record separator in readlines()?  The documentation
does not mention any way to do this.  I know that way back in 1998 Guido
said he would consider adding it, but apparently that didn't happen.
Is there some way to do this?
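
readlines() itself only splits on line endings, and in Python 3 the
newline argument to open() can only choose among the standard line
terminators, not an arbitrary string.  A common workaround is a small
generator that reads the file in fixed-size chunks and splits on any
separator you like.  A minimal sketch for text-mode files; the function
name read_records and the default chunk size are assumptions, not part
of any standard API:

```python
import io


def read_records(stream, separator, chunk_size=65536):
    """Yield records from `stream`, split on an arbitrary non-empty `separator`.

    Reads `chunk_size` characters at a time, so the whole file is never
    loaded at once. Unlike readlines(), the separator is not kept on each
    record. A final record with no trailing separator is still yielded.
    """
    buffer = ''
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last separator is a complete record;
        # the remainder stays buffered in case a separator spans chunks.
        *records, buffer = buffer.split(separator)
        for record in records:
            yield record
    if buffer:
        yield buffer


sample = io.StringIO("alpha;;beta;;gamma")
print(list(read_records(sample, ';;', chunk_size=4)))  # ['alpha', 'beta', 'gamma']
```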

-- 
"First they ignore you, then they laugh at you, then they fight you,
then you win."
   -- Mohandas Gandhi