Hi all, Firstly, I do apologize about this quite a long post without a concrete programming question - I'd like to ask about the availability of some tools or methods/concepts in python environment I am probably missing, hence I thought, I should describe my task in some detail as well as the ways I've tried sofar.
I am preparing an digital edition of a set of texts (probably with a wxpython gui in the final form, not web-based (yet)); however the most important decision just now seems to be the way, how to store the textual data. I need to keep track of various additional parameters for specific portions of text ( e.g . text source, chapter, verse number, folio of the manuscript, some alternative numberings etc.) First I thought, XML would be suitable, as there are some standards in presenting old text this way; but trying this out I encountered several problems: I didn't find any suitable gui widget capable of a graphical (formated) displaying of XML as well as exporting it e.g. as rtf. but more important, the texts I have cannot be easily treated in a hierarchical manner required for XML, as there are overlappings between the portions of texts beeing described. ( e.g. chapter vs folio boundaries). An option I tried with XML was to split the text into the smallest relatively "unproblematic" units and repeat most of the attributes for each element - but in my opinion this way the usual benefits of XML structures are lost and the redundancy of such a format is hardly acceptable. Moreover the displaying isn't adressed this way I also would like to have a search function for this set of texts, the additional data should be preserved in the results. For now I ended up with a kind of pseudo-markup similar to XML, where the tags aren't structured, they only determine the changing parameters while allowing overlapping (the tag set is more or less hardcoded, the nesting of tags with the same name is not allowed). Currently the implementation is surely far from ideal - based on regexp parsing of the tags, storing of their values along with the text index in a dict. After writing the raw text to the widgets (wx - TextCtrl) the styling can be applied according to the data in the dict, also the additional informations for any given position can be retrieved ( e.g. numbering - currently for displaying - in future the synchronisation of multiple interrelated texts should be possible). Basically this concept works for me somehow, including the text search, which can benefit from the regular expression support directly, but I feel, this is not quite as effective or straightforward as I would like. Especially I would like to have some more general solution rather than an ad hoc format, which isn't very versatile. I was wondering if someone would have any suggestions for dealing with such tasks. Am I maybe mistaken in my assumptions (non suitability of XML ...) or am I missing some tools or techniques usable for this? Any hints are much appreciated, regards, Vlasta
-- http://mail.python.org/mailman/listinfo/python-list