Did you look at Solr? It provides faceted search out of the box and is
built on top of Lucene.
Sameer.
On Feb 7, 2009, at 10:57 AM, Raffaella Ventaglio
<r.ventag...@gmail.com> wrote:
Hi,
I am trying to implement a kind of faceted search using Lucene 2.4.0.
I have a list of configuration rules that tell me how to generate these
facets and the corresponding queries (which can range from simple term
queries to complex boolean queries).
When my application starts, it creates the whole set of Facet objects
and initializes them.
For each facet:
- I create the query according to the configured rule;
- I ask the reader for the bitset corresponding to that query and store
it in the Facet object;
- I get the cardinality of the bitset and save it in the Facet object as
its "initial count".
When the user does a search, I have to update the "counts" associated
with each Facet:
- I get the bitset corresponding to the "query + filter" generated by the
user search;
- I get the cardinality of ("search bitset" AND "facet bitset") and save
it as the updated count.
In my first solution, I used only "OpenBitSetDISI" objects, both for the
facet bitsets and for the search bitset, so I could use the
"intersectionCount" method to get the updated counts after a user
search.
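Roughly, the per-search update looked like this (again a simplified
sketch, reusing the Facet holder from the sketch above):

import java.io.IOException;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.util.OpenBitSet;
import org.apache.lucene.util.OpenBitSetDISI;

class FacetCounter {
    // Turns the user's "query + filter" into a bitset over the whole index.
    static OpenBitSetDISI searchBits(Query userQuery, IndexReader reader)
            throws IOException {
        DocIdSet docs = new QueryWrapperFilter(userQuery).getDocIdSet(reader);
        return new OpenBitSetDISI(docs.iterator(), reader.maxDoc());
    }

    // Updates every facet count as |search AND facet| without materializing
    // the intersection: OpenBitSet.intersectionCount works directly on the
    // two backing long[] arrays.
    static void updateCounts(OpenBitSetDISI search, List<Facet> facets) {
        for (Facet f : facets) {
            f.currentCount = OpenBitSet.intersectionCount(search, f.bits);
        }
    }
}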
This works very well and is very fast, but when the number of documents
in the index and the number of facets grow, it consumes too much memory.
So I tried a different solution: when I create the facet bitsets, I apply
the same rule used in ChainedFilter/BooleanFilter to decide whether to
store an OpenBitSet or a SortedVIntList.
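The rule I am referring to is roughly the following (the maxDoc/9 cutoff
here is only illustrative; the contrib filters apply their own
heuristic):

import java.io.IOException;

import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.util.OpenBitSetDISI;
import org.apache.lucene.util.SortedVIntList;

class FacetStorage {
    // Keeps the dense OpenBitSet only when the facet matches many documents;
    // otherwise stores the much smaller SortedVIntList.
    static DocIdSet chooseRepresentation(OpenBitSetDISI bits, int maxDoc)
            throws IOException {
        if (bits.cardinality() < (maxDoc / 9)) {   // illustrative cutoff
            return new SortedVIntList(bits.iterator());
        }
        return bits;
    }
}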
When I have to calculate the updated counts:
- if the facet has an OpenBitSet, I use the "intersectionCount" method
directly;
- if the facet has a SortedVIntList, I first create a new OpenBitSetDISI
using the SortedVIntList.iterator and then use the "intersectionCount"
method.
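The counting step then looks roughly like this; the temporary
OpenBitSetDISI in the sparse branch is exactly the kind of object I
create and throw away on every search:

import java.io.IOException;

import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.util.OpenBitSet;
import org.apache.lucene.util.OpenBitSetDISI;

class SecondSolutionCounter {
    // facetDocs is whatever chooseRepresentation() stored for this facet:
    // either an OpenBitSet or a SortedVIntList.
    static long intersectionCount(OpenBitSetDISI search, DocIdSet facetDocs,
                                  int maxDoc) throws IOException {
        if (facetDocs instanceof OpenBitSet) {
            // Dense facet: count directly, no allocation.
            return OpenBitSet.intersectionCount(search, (OpenBitSet) facetDocs);
        }
        // Sparse facet: expand the SortedVIntList into a temporary bitset
        // first. This is the per-search allocation that ends up in GC.
        OpenBitSetDISI tmp = new OpenBitSetDISI(facetDocs.iterator(), maxDoc);
        return OpenBitSet.intersectionCount(search, tmp);
    }
}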
In this way, I use a smaller amount of memory at initialization time, but
for each user search I create a large number of objects (that I
immediately throw away), and this hurts application performance because a
lot of time is wasted doing GC.
So my question is: is there a better way to accomplish this task?
I think it would be fine if I could calculate the "intersectionCount"
directly on SortedVIntList objects, but I have not found anything like
that in the Lucene 2.4 JavaDoc.
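To show what I mean: a hypothetical helper like the one below (it does
not exist in the Lucene 2.4 API, this is only a sketch of the idea) would
walk the sparse facet list and probe the dense search bitset, without
allocating any temporary bitset:

import java.io.IOException;

import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.OpenBitSet;
import org.apache.lucene.util.SortedVIntList;

class SparseIntersection {
    // Hypothetical helper, not part of Lucene 2.4: counts |search AND facet|
    // by iterating the sparse facet list and testing each doc against the
    // dense search bitset.
    static long count(OpenBitSet search, SortedVIntList facet)
            throws IOException {
        long count = 0;
        DocIdSetIterator it = facet.iterator();
        while (it.next()) {            // Lucene 2.4 iterator contract
            if (search.get(it.doc())) {
                count++;
            }
        }
        return count;
    }
}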
Am I missing something?
As a reference, my index currently contains more than 500,000 documents
and I have to create/manage up to 50,000 facets.
Using the second solution, my facet structures require more or less
120 MB at initialization time (which is good enough), while updating the
counts uses as much as 2 GB of memory (which is very bad).
Thanks in advance,
Raf