Christoph Haas wrote:
> On Thursday 03 August 2006 17:40, jay graves wrote:
> > How hard would it be to create this nested structure?
>
> Not hard.  Instead of doing "INSERT INTO" I would add values to a
> dictionary or list.  That's even simpler.
>
> > I've found pickling really large data structures doesn't really
> > save a huge amount of time when reloading them from disk, but YMMV
> > and you would have to profile it to know for sure.
>
> Okay, that takes a bit of pickle's magic away. :)
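For what it's worth, the "add values to a dictionary" step can be as
small as the sketch below.  The host/port/action fields are invented
for illustration; substitute whatever your parser actually produces:

    def build_nested(rows):
        # Nest rows by host, then port, instead of doing "INSERT INTO".
        nested = {}
        for row in rows:
            ports = nested.setdefault(row['host'], {})
            ports.setdefault(row['port'], []).append(row)
        return nested

    rows = [
        {'host': 'fw1', 'port': 22, 'action': 'accept'},
        {'host': 'fw1', 'port': 80, 'action': 'accept'},
        {'host': 'fw2', 'port': 22, 'action': 'drop'},
    ]
    nested = build_nested(rows)
    print(nested['fw1'][22])   # the list holding the fw1/port-22 row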
But since it is so easy to create your nested structure, it may be
worth trying.  I've rarely used pickled files, and maybe my specific
data structure caused a lot of churn in the pickle/unpickle code.
Doesn't hurt to try.

You also need to try walking your data structure to see how
easy/efficient it is to get the results you want.  If you have to do a
text search for every node, it might actually be slower.

In the app I described, every time I do a reload (equivalent to your
parse step) I interrogate each row and update multiple dictionaries
with different sets of key tuples, with the dictionary value being the
row itself (just like indexes in a SQL db).  The row is the same
object, so the only extra memory I need is for the key tuples.  It
sure beats iterating over a list with 50K entries top to bottom and
testing for the right condition (there is a rough sketch of this in
the P.S. below), but I don't know your app, so I can't tell if this is
a valid strategy.

> > > So the question is: would you rather force the data into a
> > > relational database and write object-relational wrappers around
> > > it?  Or would you pickle it and load it later and work on the
> > > data?  The latter application is currently a CGI.  I'm open to
> > > whatever. :)
> >
> > Convert your CGI to a persistent python webserver (I use CherryPy
> > but you can pick whatever works for you) and store the nested data
> > structure globally.  Reload/reparse as necessary.  It saves the
> > pickle/unpickle step.
>
> Up to now I have just used CGI.  But that doesn't stop me from
> looking at other web frameworks.  However, the reparsing as
> necessary makes a quick query take 10-30 seconds.  And my users
> usually query the database just once every now and then and expect
> to have little delay.  That time is not very user-friendly.

I'm not sure I made the advantages of a python web/app server clear.
The main point of CherryPy or similar web frameworks is that, since
they serve the HTTP requests themselves (which are admittedly light),
they can keep any data you want persistent, because the process is
always running rather than respawned on each request.  (Caveat: these
are very broad strokes, and there are possible race conditions, but no
worse than your Postgres solution.)

Imagine, if you will:

    import time

    fwdata = {}
    expiretime = 0

    def loaddata():
        global fwdata, expiretime
        temp = {}
        # ... parse the data into the temp dictionary ...
        expiretime = time.time() + 5 * 60   # expire 5 minutes from now
        fwdata = temp        # swap in the new data in a single step

    loaddata()
    while True:
        # handle the HTTP request
        if query:
            if time.time() > expiretime:
                loaddata()
            # query fwdata and build the output HTML

Does the underlying data change every 5 minutes?  If not, you could
even be trickier and provide a 'reload' URL that forces the app to
reload the data.  If you can track when the source data changes (maybe
there are sanctioned interfaces to use when editing the data), just
hit the appropriate reload URL and your app is always up to date
without lots of needless reparsing.  E.g.:

    fwdata = {}

    def loaddata():
        global fwdata
        temp = {}
        # ... parse the data into the temp dictionary ...
        fwdata = temp

    loaddata()
    while True:
        # handle the HTTP request
        if queryrequest:
            pass   # query fwdata and build the output HTML
        elif reloadrequest:
            loaddata()

Hope this helps or clarifies my point.

...
jay graves
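P.S.  In case the multiple-dictionary idea above is hard to picture,
here is a rough sketch.  Again, the host/port/action fields are
invented for illustration; my real app keys on different columns:

    rows = [
        {'host': 'fw1', 'port': 22, 'action': 'accept'},
        {'host': 'fw1', 'port': 80, 'action': 'accept'},
        {'host': 'fw2', 'port': 22, 'action': 'drop'},
    ]   # imagine ~50K of these coming out of the parse step

    # Build several "indexes" over the same row objects, just like
    # indexes in a SQL db.  Each dict maps a key tuple to the row
    # object itself (or a list of them when the key is not unique),
    # so the only extra memory used is for the key tuples.
    by_host_port = {}
    by_action = {}
    for row in rows:
        by_host_port[(row['host'], row['port'])] = row
        by_action.setdefault((row['action'],), []).append(row)

    # Constant-time lookups instead of scanning the whole list:
    print(by_host_port[('fw1', 22)])   # the single matching row
    print(by_action[('accept',)])      # every row with action 'accept'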