Imaginationworks wrote:
Hi,

I am trying to read object information from a text file (approx.
30,000 lines) in the format shown below; each line there corresponds
to a line in the text file.  Currently, the whole file is read into a
list of strings using readlines(), and a for loop then searches for
"= {" and "};" to determine the Object, SubObject, and SubSubObject.
My questions are:

1) Is there an efficient method to search the whole string list and
find the locations of the tokens (such as '= {' or '};')?

Yes. Read the *whole* file into a single string using the file object's read() method, and then search through the string using string methods (for simple things) or re, the regular expression module (for more complex searches). Note: there is a point where a file becomes large enough that reading all of it into memory at once (either as a single string or as a list of strings) is foolish. However, 30,000 lines doesn't push that boundary.

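
As a rough sketch of that approach: the snippet below searches one string for both tokens, first with str.find() and then with re.finditer(). The sample text, the helper name find_all, and the pattern are illustrative assumptions rather than anything taken from your real file; in practice `text` would come from something like open(path).read().

```python
import re

# Hypothetical sample standing in for the string returned by f.read().
text = "Object1 = {\n  SubObject1 = {\n  };\n};\n"


def find_all(text, token):
    """Return every offset at which token occurs in text (hypothetical helper)."""
    positions = []
    start = text.find(token)
    while start != -1:
        positions.append(start)
        # Resume the search just past the match we have just recorded.
        start = text.find(token, start + len(token))
    return positions


opens = find_all(text, "= {")
closes = find_all(text, "};")

# The same search with re, also capturing the name in front of each "= {".
# The pattern assumes names consist of word characters only.
names = [m.group(1) for m in re.finditer(r"(\w+)\s*=\s*\{", text)]
```

Offsets into one string are usually easier to work with than (line, column) pairs from a list of strings, and re.finditer() hands back match objects whose start() gives the same kind of offset.
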
2) Are there any efficient ways to extract the object information
that you can suggest?

Again, the re module has nice ways to find a pattern and pull out pieces of it. Building a good regular expression takes time, experience, and a bit of black magic... To do so for this case, we would need to know more about your format. Also, regular expressions have their limits. For instance, if the sub-objects can nest to any level, then regular expressions alone can't solve the whole problem, and you'll need a more robust parser.
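
If the nesting does turn out to be arbitrary, one possible shape for such a parser is a single pass that keeps a stack of the currently open objects. This is only a minimal sketch under assumptions about the format that the sample does not confirm: every block opens on a line ending in "= {", closes on a line that is exactly "};", and all other non-blank lines are content of the innermost open block. The name parse_blocks and the dict layout are made up for illustration.

```python
def parse_blocks(lines):
    """Build a tree of nested blocks from an iterable of lines.

    Assumed format: "Name = {" opens a block, "};" closes it, and any other
    non-blank line belongs to the innermost open block.
    """
    root = {"name": None, "body": [], "children": []}
    stack = [root]  # stack[-1] is always the innermost open block
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith("= {"):
            node = {"name": line[:-3].strip(), "body": [], "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif line == "};":
            if len(stack) == 1:
                raise ValueError("unmatched '};'")
            stack.pop()
        else:
            stack[-1]["body"].append(line)
    if len(stack) != 1:
        raise ValueError("missing '};' for %r" % stack[-1]["name"])
    return root["children"]


# Small made-up input in the same shape as the sample format.
sample = """Object1 = {
a
SubObject1 = {
b
SubSubObject1 = {
c
};
};
};"""
tree = parse_blocks(sample.splitlines())
```

Because the stack tracks depth, the same loop handles two levels or twenty, which is the part a regular expression on its own cannot do.
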


Thanks,

- Jeremy



===== Structured text file =================
Object1 = {

...

SubObject1 = {
....

SubSubObject1 = {
...
};
};

SubObject2 = {
....

SubSubObject21 = {
...
};
};

SubObjectN = {
....

SubSubObjectN = {
...
};
};
};

--
http://mail.python.org/mailman/listinfo/python-list
