Hi Vlasta.

> tags_lookups[tag][item_dict[tag]] = tags_lookups[tag].get(item_dict[tag],
> set()) | set([idx])
>
> I thought, whether I am not overestimating myself with respect to the future
> maintaining of the code ... :-)

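As far as I can tell, that line is building an inverted index: for each tag, a mapping from value to the set of item indices that carry it. A less dense spelling might look like the sketch below. I'm assuming the line runs inside a loop over (idx, item_dict) pairs and that every item_dict maps tag names to values; the sample data is made up.

```python
from collections import defaultdict

# Hypothetical sample records: one dict of tag values per line of text.
items = [
    {"line": 1, "VCC": 1},
    {"line": 2, "VCC": 1},
    {"line": 3, "VCC": 2},
]

# tag -> value -> set of indices of the records carrying that value
tags_lookups = defaultdict(lambda: defaultdict(set))

for idx, item_dict in enumerate(items):
    for tag, value in item_dict.items():
        # Same effect as the .get(..., set()) | set([idx]) one-liner.
        tags_lookups[tag][value].add(idx)
```

The nested defaultdict creates the inner dict and the set on first use, so the .get() with a default is no longer needed.
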
Here's a suggestion for readability and maintainability: make a search() method (possibly on the object storing the text) which takes variable keyword arguments. You'd declare it something like this:

    def search(self, **kwargs): ...

and call it like this:

    text.search(line=123, VCC=1)  # etc.

Probably it would return a list of results (line numbers in the original text?). Then, inside the function, build up the sets etc., and comment the logic liberally so you can understand it later. This approach should be much more readable than the equivalent sqlite + SQL.

> The suggested XML structure is actually almost the one, I use to prepare
> and control the input data before converting it to the one presented in the
> previous mail :-). The main problem is, that I can't seem to make it fully
> valid XML without deforming the structure of the text itself - it can't be
> easily decided, what CUSTOM_TAG should be in some places - due to the
> overlapping etc.

I was originally going to suggest an overlapped tag format, similar to this:

    <b>Bold text<i>Bold and italicised</b>Italicised</i>

But I changed my mind. It's seriously malformed :-)

I think that for now your format, where marker values remain 'active' until changed to another value, is the best way to go. XML's problems with this sort of thing are a known limitation. See this link:

http://en.wikipedia.org/wiki/XML#Disadvantages_of_XML

"Expressing overlapping (non-hierarchical) node relationships requires extra effort"

There is some research into the subject, but no major (i.e. widely used) Python libraries as far as I can tell. Google for "overlapping markup" and "python overlapping markup" for more information. Here's one interesting page I found:

http://www.wilmott.ca/python/xmlparser.html#OVERLAPPED

It discusses the issue in detail, and comes with a Python implementation (I haven't checked it).

David.
-- 
http://mail.python.org/mailman/listinfo/python-list
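
P.S. To make the search() idea a bit more concrete, here's a rough sketch. The TaggedText class, its attribute names and the sample data are all invented for illustration, and I'm assuming each line of your text can be represented as a dict of tag values, which may not match your real structure.

```python
class TaggedText:
    """Hypothetical container; the class and attribute names are invented."""

    def __init__(self, items):
        # items: one dict of tag values per line, e.g. {"line": 1, "VCC": 1}
        self.items = list(items)
        # tag -> value -> set of indices into self.items
        self.lookups = {}
        for idx, item in enumerate(self.items):
            for tag, value in item.items():
                self.lookups.setdefault(tag, {}).setdefault(value, set()).add(idx)

    def search(self, **kwargs):
        """Return sorted indices of the items matching every tag=value given."""
        # Start from all items and narrow down with each criterion.
        result = set(range(len(self.items)))
        for tag, value in kwargs.items():
            # An unknown tag or value matches nothing.
            result &= self.lookups.get(tag, {}).get(value, set())
        return sorted(result)


text = TaggedText([
    {"line": 1, "VCC": 1},
    {"line": 2, "VCC": 1},
    {"line": 3, "VCC": 2},
])
print(text.search(VCC=1))          # [0, 1]
print(text.search(line=2, VCC=1))  # [1]
```

Note that it returns indices into the list rather than your original line numbers; map them back however suits your data.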