>>
>> First of all, ask yourself, what is the size of your dataset? Can
>> you fit it all into memory? If so, you have the full power of lisp
>> at your command in dealing with the querying. You will not have to
>> write any macros to do this. You might find the DCM package, in the
>> "contrib" directory, useful, although it does not address querying;
>> it is more of a cache-handling issue. (DCM has only been tested
>> under SBCL, as far as I know.)
>
> In general, I do think that the dataset fits in memory. However, we
> have not fully loaded all of our data into Elephant. Simply looking
> at the MySQL file storage for the database, it occupies 900MB of disk
> space, including indices. However, when we did loads of some of the
> tables into Elephant, the Elephant data files were at least 5 times
> that size. I don't know the reason why. I don't know if that's the
> nature of BDB when it stores "arbitrarily" any type of object in the
> k,v pair. I don't know if we had some circular references when we
> loaded our model and that simply increased the data file by that much
> (although I wouldn't think so, since if that were the case, I would
> expect that it only stored references to objects and did not
> duplicate them). Regardless, our dev server currently has 4GB of RAM.
> I think that once properly loaded, all the data should be able to fit
> in memory.

Were you storing persistent-metaclass objects or simply normal objects?
Normal objects are huge relative to persistent objects, as currently
all the slot names are also serialized (potentially in 32-bit unicode
if on SBCL). I have some improvements planned to reduce the storage and
processing overhead for strings, but it will take some evaluation to
make sure performance doesn't suffer too much. Some of this will make
it into the forthcoming 0.6.1 release, which I'm trying to finish a
bunch of work for before my twin daughters are born.
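To make the distinction concrete, here is a minimal sketch (the class
and slot names are purely illustrative):

  ;; Persistent class: the instance is just a handle into the store;
  ;; slot reads and writes go to the backend, and slot names are not
  ;; re-serialized for every instance.
  (defclass person ()
    ((name :accessor person-name :initarg :name)
     (age  :accessor person-age  :initarg :age))
    (:metaclass elephant:persistent-metaclass))

  ;; An ordinary standard-object, by contrast, is stored by value, e.g.
  ;;   (elephant:add-to-root "someone" (make-instance 'plain-person ...))
  ;; which serializes the whole object, slot names included (and on a
  ;; unicode-enabled SBCL those names are 32-bit strings, as above).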
Also, SBCL strings in the 0.6.0 release are stored in their internal
32-bit form so we can memcpy them back and forth between Elephant and
the internal format. The new serializer will store them in the minimal
format and then convert back to the 32-bit internal form for
unicode-enabled SBCL.

> Now, for the target application, I prefer not to rely on the data
> fitting in memory. The reason is that the nature of the application
> requires the data to be available for several years. This 900MB is
> the result of only 1 year for one company. As we get more companies
> to use the application and keep the data online for several years,
> the assumption of the data fitting into memory will no longer be
> applicable.

See my delayed e-mail on a hybrid query approach...

> I have been looking into the DCM package and I think that it
> certainly looks promising. We haven't used it yet, but certainly hope
> that, sooner rather than later, it will be made part of Elephant
> permanently. I also hope that it's not targeted mainly at the
> in-memory database type of application, but rather serves as an
> efficient caching mechanism for persistent data (regardless of where
> it's being permanently stored).

DCM is, as I understand it, a form of prevalence with explicit
checkpointing, using Elephant as the backend.

> With regards to: "...If so, you have the full power of lisp at your
> command in dealing with the querying...", I agree with you. However,
> what I'm trying to get at is how "easy" it would be to generate these
> types of dynamic queries in a generic way. Of course, we could always
> hard code all the cases for each of our different searchable screens,
> but the thought of that simply just makes me vomit :)

Associate a constraint on the display with the user action for
selections and an ordering function for specific columns. If you want
to sort by multiple columns, then compose your functions:

Selection constraint fn:

  #'(lambda (obj) (= (slot-accessor obj) value))

Sorting fns:

  #'(lambda (obj1 obj2) (string< (name obj1) (name obj2)))
  #'(lambda (obj1 obj2) (< (age obj1) (age obj2)))

Composed sorting function:

  (defun compose-sorts (sort1 sort2)
    (lambda (o1 o2)
      (or (funcall sort1 o1 o2)
          ;; only fall through to sort2 when sort1 considers the two
          ;; objects equal
          (and (not (funcall sort1 o2 o1))
               (funcall sort2 o1 o2)))))

Same thing with constraints:

  (defun compose-constraint (c1 c2)
    (lambda (obj)
      (and (funcall c1 obj)
           (funcall c2 obj))))

Or, to be a little more fancy, take any number of them:

  (defun compose-constraints (&rest clist)
    (lambda (obj)
      (every #'(lambda (c) (funcall c obj)) clist)))

  (defun compose-sorts (&rest slist)
    (lambda (o1 o2)
      ;; compare column by column, falling through to the next sort
      ;; only when the current one considers the two objects equal
      (loop for sort in slist
            when (funcall sort o1 o2) return t
            when (funcall sort o2 o1) return nil)))

This all comes down to:

  (defun select-results (objects)
    (remove-if-not (compose-constraints (make-name-constraint "Bob")
                                        (make-age-constraint 18 120))
                   objects))

  (defun order-results (objects)
    (sort (copy-list objects)
          (compose-sorts #'by-occupation #'by-name #'by-date-of-birth)))

This way you can have a set of widgets with static constraints, and
then you can compose functions over the constraints based on the
dynamic query. There may be more elegant ways to do this, but I think
this is what Robert was driving at.

To do this on Elephant simply requires that you call
(get-instances-by-class 'person) to get all instances of person and
then run the lisp query over them. If you access person a lot, these
persistent references will be cached in memory. Or, with DCM, they're
all in the cache, slot values included - in which case these queries
will run really fast!
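Concretely, it might end up looking something like this (just a sketch;
it assumes the person class is indexed so get-instances-by-class can
enumerate it, and that the make-*-constraint and by-* helpers above are
defined):

  (defun find-people ()
    ;; Pull every person out of Elephant, then filter and order the
    ;; result entirely in Lisp.
    (let* ((people (elephant:get-instances-by-class 'person))
           (matches (remove-if-not
                     (compose-constraints (make-name-constraint "Bob")
                                          (make-age-constraint 18 120))
                     people)))
      ;; remove-if-not returns a fresh list, so a destructive sort is
      ;; safe here.
      (sort matches (compose-sorts #'by-occupation #'by-name))))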
>>
>> Under SBCL, when it comes to sorting you have "sort" and
>> "stable-sort"; I think these are built in. I'll eat a candle if you
>> don't find them to be blazingly fast (although the predicates that
>> you pass them might take some time.)
>
> I thought the answer to my sorting question is exactly addressed by
> your comment. I suppose that once I have the resulting dataset, I
> could run it through "sort". The key would be how to make it
> arbitrarily sortable (in a similar way as the dynamic query).

Sorting should be easy to do in this manner once you've found your
target group of objects and when you've cached all the slot values in
memory, DCM style. My proposal includes some sorting in the retrieval
to avoid accessing and serializing the slot values. Every step of a
cursor over a secondary index deserializes the key, the primary BTree
key and the value (the persistent object record) from that primary
BTree. This does not deserialize the slot values, so it is reasonably
fast.

>>
>> I think really the only good way to answer this question in a deeper
>> way is to provide some example code. I do exactly what you are
>> talking about in my application (http://konsenti.com), although I
>> use DCM, so I ought to be able to produce an example program
>> relatively quickly. You'll have to figure out how to map the GUI
>> into those requests yourself, however.
>>
>
> I believe (and hope) that we should have no problem mapping the GUI
> to the requests. Just out of curiosity (and I don't mean to divert
> from the topic of this thread): if you're using DCM for your konsenti
> (BTW, nice concept) site, how do you protect your in-memory data? Do
> you just write an image to disk every once in a while for backups?
> How resilient is this to hardware failure and you losing data since
> the last image (if that's your approach)?
>

Every time you finish or 'commit' a user transaction (form submit),
just write the object data back to the Elephant persistent store
before acking to the user.

Ian