Hi Thomas,

Our primary motivation was performance; a "pythonic" API was secondary. Our needs were simpler than what the whole lucene.facet package covers. On the Lucene side of things, it looks like we have something similar to CategoryPath (statically 2 deep: "/Field/Value") and FacetRequest (searching is only allowed at the root level, optionally restricted to a filtered doc set and to selected fields).

Specifically, we implemented an index/cache of all documents and their terms. As far as I know, SOLR uses caching of the Lucene index to perform faceting.
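
A minimal sketch of that idea in plain Python, not the actual module: it caches each document's terms per field and counts facet values at the root level, optionally over a filtered doc set and a subset of fields. The class name FacetIndex, the facets() method and the sample data are all hypothetical and only illustrate the shape of the approach.

```python
from collections import Counter, defaultdict


class FacetIndex:
    """Hypothetical un-inverted cache: doc id -> field -> terms."""

    def __init__(self, docs):
        # docs maps a doc id to {field name: iterable of terms}.
        # Copy into tuples so the cache is independent of the caller's data.
        self._doc_terms = {
            doc_id: {field: tuple(terms) for field, terms in fields.items()}
            for doc_id, fields in docs.items()
        }

    def facets(self, doc_ids=None, fields=None):
        """Count values per field ("/Field/Value"), optionally restricted
        to a filtered doc set and to selected fields."""
        if doc_ids is None:
            doc_ids = self._doc_terms.keys()
        counts = defaultdict(Counter)
        for doc_id in doc_ids:
            for field, terms in self._doc_terms.get(doc_id, {}).items():
                if fields is not None and field not in fields:
                    continue
                # Count each document at most once per distinct value.
                counts[field].update(set(terms))
        return {field: dict(counter) for field, counter in counts.items()}


# Made-up sample documents, for illustration only.
docs = {
    0: {"brand": ["acme"], "color": ["red", "black"]},
    1: {"brand": ["acme"], "color": ["red"]},
    2: {"brand": ["zeta"], "color": ["blue"]},
}
index = FacetIndex(docs)
all_facets = index.facets()
red_docs_colors = index.facets(doc_ids=[0, 1], fields={"color"})
```

Passing doc_ids corresponds to faceting over a filtered result set; leaving it out counts over every cached document.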

Our implementation is based on http://lucene.apache.org/solr/api/org/apache/solr/request/UnInvertedField.html and the interface in Python is almost identical. You pass our object an IndexReader, and by default all Terms with TermVectors are indexed. You can then selectively retrieve fields. Here's an example of use: http://pastebin.com/Lq3LZKMp.

The whole module is ~2000 lines (Python interface, C++ implementation, comments). In initial tests, the algorithm is about 100 times faster in C++ than when implemented in Python.

On Wed, Apr 18, 2012 at 9:31 AM, Thomas Koch <k...@orbiteam.de> wrote:

> Hi,
> sounds like an interesting project – may I ask what you actually
> implemented and what’s the motivation (e.g. performance?)?
>
> I’ve started to experiment with the Facet support in Lucene (actually in
> PyLucene – ported an example to Python) and found that facetted search
> support in Lucene looks powerful (though API is still said to be
> ‘experimental’ and I can’t say anything about performance yet). I’m
> talking about the org.apache.lucene.facet.* packages – part of the contrib
> part of Lucene and available as JARs that’s accessible in PyLucene as well.
> I’m not that familiar with Solr but AFAIK it’s based on Lucene (Java) and
> should (hopefully) use the same Java code for its facet search support. Of
> course Solr adds some nice configuration support and web GUI to Lucene, but
> the ‘core’ search is built on Lucene (to my knowledge). So did you
> re-implement the Lucene facet search/index code (like
> TaxonomyReader/Writer, FacetRequest stuff etc.) in C++ or what part of
> Solr??
>
> Regarding Facet support in PyLucene I can share the samples I’ve ‘ported’
> to Python so far. There’s still a patch pending for JavaList (required by
> facet features) which I come back to later on this list (still some open
> issues). Hopefully this can be included in the PyLucene 3.6 version …
>
> Regards
> Thomas
> --
> OrbiTeam Software GmbH & Co. KG
> Germany http://www.orbiteam.de
>
>
> From: Caleb Burns [mailto:ca...@ridersdiscount.com]
> Sent: Tuesday, April 17, 2012 21:16
> To: pylucene-dev@lucene.apache.org
> Subject: PyLucene use JCC shared object by default
>
> Hi,
>
> I've finished the process at my organization of re-implementing SOLR's
> faceting algorithm (in C++).
>
> We would like the public at large to have access to the work we've done
> and plan to do. In order for this to be a real possibility the code needs
> to be built against and use the same JVM as the PyLucene installation does.
> The most logical way we feel to have this accomplished is by having
> PyLucene's default installation use JCC as a Shared Object.
>
> We have yet more plans to extend and provide utilities that work with
> PyLucene, but this all hinges on having the shared object. The only
> alternative methodology would require the bundling of our source with the
> PyLucene project itself as a fork.
>
> We are eager to start open sourcing our work, so please let us know what
> would be the best way to integrate our work.
>

--
Caleb Burns
Developer | Riders Discount