agc wrote:
> Hi,
>
> I'm looking for a fast way of accessing some simple (structured) data.
>
> The data is like this:
> Approx 6 - 10 GB simple XML files with the only elements
> I really care about are the <title> and <article> ones.
>
> So what I'm hoping to do is put this data in a format so
> that I can access it as fast as possible for a given request
> (http request, Python web server) that specifies just the title,
> and I return the article content.
>
> Is there some good format that is optimized for search for
> just 1 attribute (title) and then returning the corresponding article?
>
> I've thought about putting this data in a SQLite database because
> from what I know SQLite has very fast reads (no network latency, etc)
> but not as fast writes, which is fine because I probably wont be doing
> much writing (I wont ever care about the speed of any writes).
>
> So is a database the way to go, or is there some other,
> more specialized format that would be better?
>

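For concreteness, here is a minimal sketch of the SQLite idea from the question, using only the standard library. The table name, column names, helper function names and sample rows are all made up for illustration; nothing here comes from the original poster's data. The primary key on title gives an indexed exact lookup. The difflib fallback is just one simple way to suggest near-miss titles, and since it scans every title it would only suit a small title set, not necessarily one of this size.

```python
import difflib
import sqlite3


def build_db(path, articles):
    """Create the lookup table and load (title, article) pairs.

    The schema is hypothetical: one row per article, keyed by title.
    PRIMARY KEY makes SQLite build an index on title automatically.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "title TEXT PRIMARY KEY, article TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO articles (title, article) VALUES (?, ?)",
        articles,
    )
    conn.commit()
    return conn


def get_article(conn, title):
    """Exact, case-sensitive lookup; returns None if the title is absent."""
    row = conn.execute(
        "SELECT article FROM articles WHERE title = ?", (title,)
    ).fetchone()
    return row[0] if row else None


def suggest_titles(conn, title, n=3):
    """Return up to n stored titles that resemble the requested one.

    This reads every title into memory and compares each one, so it is
    an illustration of inexact matching, not a scalable search index.
    """
    titles = [r[0] for r in conn.execute("SELECT title FROM articles")]
    return difflib.get_close_matches(title, titles, n=n, cutoff=0.6)


# Made-up sample rows, in-memory database for demonstration only.
conn = build_db(
    ":memory:",
    [
        ("SQLite", "Text of the SQLite article."),
        ("Python tutorial", "Text of the tutorial article."),
    ],
)

print(get_article(conn, "SQLite"))      # exact title: found
print(get_article(conn, "sqlite"))      # different case: not found
print(suggest_titles(conn, "SQLlite"))  # misspelled: nearest stored titles
```

In a web server, get_article would run once per request, with suggest_titles (or something better suited to large data) tried only when the exact lookup returns nothing.
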
"Database" without any further qualification implies exact matching on the title, which doesn't seem very practical for article titles: a request would have to reproduce the stored title character for character. There is an enormous body of literature on inexact/fuzzy matching, and there are lots of deployed applications. It's not really a Python-related question.

--
http://mail.python.org/mailman/listinfo/python-list