Hello, here is a problem I had; I would like to solve it with Python, even though I still have no clue how to do it.
I have many small "text" files, so to speed up processing I used to copy them into one huge file, adding a kind of XML separator: <file name="..."> [content] </file>. The content is tab-separated data (columns); the data are strings.

Now comes the tricky part for me: I would like to be able to create matching rules, using regular expressions. A rule should match data on one line (the smallest data unit for me) or on a set of lines. For example: "on this line, match the first column against this regexp and match the second column against that one; on the following line, match the third column -> trigger something".

Here is what I tried:
- take all the rules,
- build some kind of analyzer for each rule,
- keep the length L of the longest rule,
- then read the huge file line by line,
- inside a "file", create all the subsets of consecutive lines of length <= L,
- for each analyzer, see if it matches any of the subsets,
- if it does, trigger.

My trouble is the step "for each analyzer, see if it matches any of the subsets": it is really too slow. I have many, many rules, so it is a for loop inside a for loop, and inside each rule there is also a for loop over the subset's lines. I need to speed that up; do you have any ideas?

I am thinking of having only single-line rules and keeping track of whether a rule is an "ending" one (which triggers something) or a "must continue" one, but that is still unclear to me for now. Some sort of dict with regexp keys could also have been a great thing.

Actually, it would also be great if I could use some kind of regexp operator saying "skip the content of 0 to n lines before matching", as if in the example above I had replaced "following line" by "skip at least 2 lines and match the third column on the next line". That would be great, but I still have really no idea how to even think about that.

Great thanks to anybody who can help,
best
--
http://mail.python.org/mailman/listinfo/python-list
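As a starting point, here is a minimal sketch of splitting the concatenated file back into (name, lines) pairs. It assumes each <file name="..."> opening tag and each </file> closing tag sits alone on its own line, which may not match your exact format:

```python
import re

# Assumed format: the <file name="..."> and </file> tags each occupy a
# whole line; everything in between is that file's tab-separated data.
FILE_OPEN = re.compile(r'<file name="(?P<name>[^"]*)">')

def iter_files(lines):
    """Yield (name, list_of_content_lines) for each embedded file."""
    name, buf = None, []
    for line in lines:
        m = FILE_OPEN.match(line)
        if m:
            name, buf = m.group("name"), []
        elif line.strip() == "</file>":
            yield name, buf
            name, buf = None, []
        elif name is not None:
            buf.append(line.rstrip("\n"))

big = [
    '<file name="a.txt">',
    'foo\tbar\tbaz',
    '</file>',
    '<file name="b.txt">',
    '1\t2\t3',
    '</file>',
]
print(list(iter_files(big)))
```

Working on one embedded "file" at a time keeps the matcher's state from leaking across file boundaries.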
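The "only single-line rules, ending vs. must continue" idea can be pushed further: treat each multi-line rule as a sequence of per-line steps and keep a set of partial matches that advance as each line is read, so the file is scanned exactly once and no subsets are built. The "skip 0 to n lines" operator then becomes just another kind of step. This is a sketch under my own assumptions about the rule representation (the step tuples and the trigger callback are my invention, not an established API):

```python
import re

# A rule is a list of steps applied to consecutive lines:
#   ("match", {col_index: regex, ...})  - every listed column must match
#   ("skip", lo, hi)                    - ignore between lo and hi lines
def line_matches(col_regexes, cols):
    return all(i < len(cols) and re.search(rx, cols[i])
               for i, rx in col_regexes.items())

def run_rules(rules, lines, trigger):
    active = []   # partial matches: (rule_index, step_index, lines_skipped)
    for lineno, line in enumerate(lines):
        cols = line.split("\t")
        # every line may also start a fresh attempt at each rule
        work = active + [(r, 0, 0) for r in range(len(rules))]
        active = []
        while work:
            r, s, skipped = work.pop()
            step = rules[r][s]
            if step[0] == "skip":
                _, lo, hi = step
                if skipped < hi:                 # consume this line as skipped
                    active.append((r, s, skipped + 1))
                if skipped >= lo:                # or leave the skip and try to
                    work.append((r, s + 1, 0))   # match this same line
            else:                                # ("match", {...})
                if line_matches(step[1], cols):
                    if s + 1 == len(rules[r]):
                        trigger(r, lineno)       # "ending" step reached
                    else:                        # "must continue" on next line
                        active.append((r, s + 1, 0))

rules = [
    # first column ^foo here, third column baz on the following line
    [("match", {0: r"^foo"}), ("match", {2: r"baz"})],
    # "start", then skip 1 to 3 lines, then "end"
    [("match", {0: r"start"}), ("skip", 1, 3), ("match", {0: r"end"})],
]
lines = ["foo\tx\ty", "a\tb\tbaz", "start", "x", "end"]
hits = []
run_rules(rules, lines, lambda r, n: hits.append((r, n)))
print(hits)
```

The cost per line is proportional to the number of live partial matches plus the number of rules that could start there, instead of rules x subsets x subset-lines.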
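The "dict with regexp keys" wish can be approximated by compiling the first-line patterns of all your rules into one big alternation with named groups: a single scan of the line then tells you which rule could start there, instead of trying every rule separately. The group names r0, r1, ... and the sample patterns are made up for illustration:

```python
import re

# One pattern per rule's first step, merged into a single alternation.
patterns = {"r0": r"^ERROR\t", "r1": r"^WARN\t", "r2": r"^\d{4}-\d\d-\d\d\t"}
combined = re.compile("|".join("(?P<%s>%s)" % (name, pat)
                               for name, pat in patterns.items()))

def first_rule(line):
    """Name of the first alternative that matches, or None."""
    m = combined.search(line)
    return m.lastgroup if m else None

print(first_rule("ERROR\tdisk full"))
print(first_rule("2024-01-01\tok"))
print(first_rule("hello"))
```

One caveat: the alternation reports only the first alternative that matches at the leftmost position, so if several rules can start on the same line you still need to check those few rules individually; the combined regex is only a fast filter.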