> Thank you very much for your comprehensive answer and detailed
> information, David; I really appreciate it!

You're welcome.

> As for the number of items, there would be approx. 34 000 calls of
> execute() in my present code, in the final version probably more; I
> think executemany is more efficient here, if there aren't any drawbacks.

executemany is probably a good idea here. If memory becomes a problem at
some point (eg: millions of lines) you'll probably want to use an on-disk
database (I suggest postgresql), and read in batches of say 1000, which
you save to the database with a single executemany.
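Something along these lines (just a sketch; the table, the column names
and the parse_rows() generator are only stand-ins for your own schema and
parser):

    import sqlite3
    from itertools import islice

    def parse_rows():
        # stand-in for your real parser; yields (tag, value) tuples
        for i in range(34000):
            yield ("tag%d" % (i % 50), "value%d" % i)

    def batches(iterable, size=1000):
        # yield lists of up to `size` items, so only one batch is
        # ever held in memory at a time
        it = iter(iterable)
        while True:
            batch = list(islice(it, size))
            if not batch:
                return
            yield batch

    conn = sqlite3.connect("tags.db")   # or ":memory:" while testing
    conn.execute("CREATE TABLE IF NOT EXISTS tags (name TEXT, value TEXT)")
    for batch in batches(parse_rows(), 1000):
        conn.executemany("INSERT INTO tags (name, value) VALUES (?, ?)",
                         batch)
    conn.commit()

One executemany per batch keeps the per-call overhead down without ever
loading all the rows at once.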
> I thought a database would perform much better in such cases (and the
> preliminary tests confirm this);

It sounds like your Python code has some serious (algorithm or data
structure) problems. Python code (with appropriate dictionaries, caches,
memoization etc) should always be able to perform better than an
in-memory SQL database (even optimized with indexes, etc). You definitely
need to check that. You don't even create any db indexes (in the code you
gave), but sqlite still performs better than your structs which do use
dicts! Basically what you're saying is that the equivalent (but with SQL
overhead) of a set of lists containing tuples, without any dicts for
lookup, performs better than your original structure.

Unless you actually need a database (too much data for memory, or you
need persistent data, possibly for sharing between apps) I suggest
sticking to Python structures, and optimizing your structures and look-up
algorithms. One reasonable reason for you to use a database would be if
maintaining your dicts (for fast lookup) became too complicated and you
wanted the DBM to keep indexes automatically updated for you.
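To illustrate what I mean by a dict for lookup (a toy example with
made-up record fields, not your actual data):

    from collections import defaultdict

    records = [
        {"tag": "chapter", "pos": 10},
        {"tag": "note",    "pos": 42},
        {"tag": "chapter", "pos": 98},
    ]

    # build the "index" once: tag name -> list of matching records
    by_tag = defaultdict(list)
    for rec in records:
        by_tag[rec["tag"]].append(rec)

    # each lookup is now a single dict access instead of a scan over
    # every record (which is what an unindexed table does anyway)
    print(by_tag["chapter"])

This is essentially what a database index does for you; the cost is that
you have to keep by_tag up to date yourself whenever the records change.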
> however now I see the possible risks too. I'll look into other possible
> approaches. There will clearly be a finite set of the column names,
> hence it is possible to define the db at the beginning and fill it with
> values only after parsing the texts; the problem is, this can't be done
> until the texts are available. I just thought a more generic approach
> would be more easily extensible and also a bit simpler.

If you don't know in advance what the fields are, then how does your
python app work? Are all queries in the user interface (that refer to
this database) arbitrary and initiated by a human? Also, how do you set
up foreign relationships between tables, and indexes (for performance),
if you don't know what fields are going to be present? Maybe this would
be more clear (to me) if you gave a short example of the data, with a
note or two to explain where there are performance problems.

> the parsing of the text to get the tags with their values is done in
> the program itself

What kind of tags? From your description it sounds like you have markers
inside the text (basically bookmarks with arbitrary metadata, hence your
need for dynamic schema), which the user can query, so they can quickly
jump between parts of the text, based on their search results. Is this
about right?

> Just for information, what are valid table and column names in sqlite?
> (Unfortunately, I couldn't find any reference for that (maybe except
> the reserved sqlite_master); as quoted names, everything I tried worked
> fine, also whitespace, various unicode characters etc.; of course, I
> can imagine,

I'm not sure about that. If you want to be safe:

1) Add a prefix to your table and field names.
2) 'Normalise' all table and field names (eg: convert to lower case,
   remove non-alphabetic characters, etc).
3) Make sure that you don't get the same 'normalised' name for 2
   different incoming strings.
4) Be prepared to scrap the schema-generating approach if your app's
   database requirements change (eg: you need to share the db with other
   apps which have their own tables that you don't want to stomp over).

David.
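P.S. A rough sketch of what I mean in (1)-(3); the prefix and the
normalisation rule are only examples, adjust them to your data:

    import re

    PREFIX = "app_"   # example prefix, pick one unlikely to clash
    seen = {}         # normalised name -> original incoming string

    def normalise(incoming):
        # map every non-alphabetic character to "_"; one possible rule
        name = PREFIX + re.sub(r"[^a-z]", "_", incoming.lower())
        if seen.setdefault(name, incoming) != incoming:
            # two different incoming strings collapsed to one identifier
            raise ValueError("name clash: %r vs %r"
                             % (incoming, seen[name]))
        return name

    print(normalise("Chapter Title"))    # -> app_chapter_title
    try:
        normalise("chapter-title")       # same normalised name...
    except ValueError as e:
        print(e)                         # ...so the clash is caught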