I have Section 4.4.1 of SICP rattling around in my head (database queries), and I'm trying to come up with a simple dictionary-based database in Python to represent circuit diagrams. My main confusion isn't one of implementation, but a matter of "big thinking", fundamentally, about the problem. Please don't suggest using a SQL library, as I'm looking to learn to fish, so to speak, and to learn a bit about the biology of fish.
I've subclassed dict to hdict ("hashable dict") and rewritten the __hash__ function so I can include a dict into a set. Thanks to a previous poster here for providing that suggestion. A circuit component looks like this for example: comp1 = hdict(value=1e-6, footprint='0402', vendor='digikey') comp2 = hdict(value=10e3, footprint='0603', vendor='mouser') etc, etc. The database holds the values like this: db = dict() # normal dict db['value'] = ([1e-6, 10e3], [comp1, comp2]) #ordered set for fast lookup/insertion db['footprint'] = {'0402':set[comp1], '0603':comp2} # unordered uses normal dict for fast lookup db['vendor'] = {'digikey':[comp1], 'mouser':[comp2]} So basically the keys are the component parameters, and the values is the list of components with that value. Stuff that is comparable is ordered; stuff that is discrete is not ordered, using either 2-tuples or dicts, respectively. This allows extremely fast lookup of components based on their properties, with O(1) performance for non-ordered stuff, and O(log n) performance for ordered stuff (using bisect methods). The set operations are extremely fast, so I can do AND and OR operations on compound queries of this nature without worry. I need this speed not so much for selecting components when the user types in a query, but for when the mouse is hovering over the schematic and I need things to light up underneath, if the GUI is generating hundreds of mouse messages per second, and for inspector windows to appear quickly when you click, etc. If you have ever used Altium, you can see how effective this is in terms of creating a good interactive user experience. My question is what happens when I choose to search for NOT footprint='0402'. Should this return a blank list? This happens by default., and is in fact true: a blank list satisfies "not anything" actually. Should this return everything that is NOT footprint 0402 (ie returns 0603 components)? This *strongly implies* a pre-selection of *all* objects before performing the negating function, or of looking at the ordering of other queries in the overall search term, and then applying NOT to the results of another query. But I'm suspicious of a brute force preselection of all objects whenever I see a NOT, and anyway it messes up the clean query/combiner method of search I'm doing, and it requires an implied sequence of search, where I'm pretty sure it should not rely on sequencing. Even though this is single threaded, etc., the semantics of the query should not rely on ordering of the search term: footprint='0402' and NOT vendor='mouser' should return the same as NOT vendor='mouser' and footprint='0402'. So this is my philosophical quandary. I'm not sure what the correct thing is. In SICP they are using nondeterministic stuff which I don't quite get, so it's hard to follow. Also they are not using dictionaries and hashes, so I'm not sure if their generate-and-test method would work here anyway. Generate-and-test seems extremely inefficient. Can a wise guru please enlighten me? thanks Michael -- http://mail.python.org/mailman/listinfo/python-list