Hi Smylers,

> > XML::Index::DataGuide
>
> Personally I don't think I would have guessed what "data guide" is and
> understood the purpose of your module just from the name -- but that
> might just be me, and I can't think of a better name.
>
> > XML::Index::CADG (Content Aware DG)

"DataGuide" is the name of an existing indexing technology. It works
roughly like this: you have an index tree over the document structure
(the element paths) and a flat inverted-file index that leads to keyword
occurrences in documents. The problem is that at query time an expensive
join between the two has to be made, and this slows down query
processing roughly linearly with the number of documents in the whole
index (the exact cost also depends on the properties of the query).
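The join problem, and the kind of keyword annotation a content-aware DataGuide adds to the index tree, can be sketched in a few lines of Perl. This is a toy illustration under stated assumptions, not the XML::Index code: the corpus, the node ids, and the `new_node`/`collect`/`query` subroutines are all hypothetical, each occurrence carries a single keyword, and the "annotation" is simply the set of keywords found at or below each tree node.

```perl
use strict;
use warnings;

# Hypothetical toy corpus: [node_id, label_path, keyword].
my @occurrences = (
    ['d1.n1', '/book/title',       'perl'],
    ['d1.n2', '/book/author',      'wall'],
    ['d2.n1', '/book/title',       'xml'],
    ['d2.n2', '/article/title',    'perl'],
    ['d3.n1', '/article/abstract', 'perl'],
);

sub new_node { return { children => {}, extent => {}, keywords => {} } }

# Index tree: one node per distinct label path. "extent" holds the node ids
# reached by that path; "keywords" is the content annotation (every keyword
# occurring at or below the node), used only by the pruning variant.
my $root = new_node();
my %inverted;    # flat inverted file: keyword -> { node_id => 1 }

for my $occ (@occurrences) {
    my ($id, $path, $kw) = @$occ;
    $inverted{$kw}{$id} = 1;
    my $node = $root;
    $node->{keywords}{$kw} = 1;
    for my $step (grep { length } split m{/}, $path) {
        $node = $node->{children}{$step} //= new_node();
        $node->{keywords}{$kw} = 1;
    }
    $node->{extent}{$id} = 1;
}

# Gather the tree nodes below $node. With $prune set, a subtree whose
# annotation lacks $kw is skipped without descending into it.
sub collect {
    my ($node, $kw, $prune, $out) = @_;
    return if $prune && !$node->{keywords}{$kw};
    push @$out, $node;
    collect($_, $kw, $prune, $out) for values %{ $node->{children} };
}

# Find node ids under top-level label $top that contain keyword $kw.
# Structure (tree extents) and content (inverted file) are resolved
# separately and then joined on node id. Returns the sorted hits and
# the number of tree nodes whose extents entered the join.
sub query {
    my ($top, $kw, $prune) = @_;
    my $start = $root->{children}{$top} or return ([], 0);
    my @nodes;
    collect($start, $kw, $prune, \@nodes);
    my $postings = $inverted{$kw} || {};
    my @hits = grep { $postings->{$_} } map { keys %{ $_->{extent} } } @nodes;
    return ([sort @hits], scalar @nodes);
}

for my $prune (0, 1) {
    my ($hits, $joined) = query('article', 'xml', $prune);
    printf "prune=%d hits=[%s] nodes joined=%d\n", $prune, "@$hits", $joined;
}
```

Both runs find no hits, but without pruning all three `article` nodes feed their extents into the join, while with pruning the `article` subtree is discarded from its annotation alone and nothing reaches the join.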
The CADG processes documents in a special way: it builds an "annotated"
index tree that allows all irrelevant paths to be pruned, and thus speeds
up semistructured search by a factor of up to 600, depending, of course,
on the type of query.

> If that is a specific type of data guide then it should be named
> XML::Index::DataGuide::CA (or whatever) to indicate that -- otherwise
> there's nothing linking the "DG" in the second module with the first
> one.
>
> This does mean that very specific modules do end up with rather long
> names, but generally they don't have to be typed very often (the use
> line, plus in the constructor for OO modules), and in the long run a
> meaningful name is worth more than a few keystrokes.

I don't object to long names. However, I think that if we open such a
namespace, it should be named after the technology (CADG), and the
corresponding methods (the constructor, "add", "search", and possibly
some maintenance methods) should be provided below it.

The term "CADG" does not originate with me but with a group of authors
(colleagues) from the Institute of Computational Linguistics and the
Institute of Computing Sciences of Munich University.

Regards,
Markus
