Simplifying Lucene 4 storage formats

Vitaly Funstein Tue, 26 Mar 2013 12:01:15 -0700

This is probably a pretty general inquiry, but I'm just exploring this as
an option at the moment.


It seems that Lucene 4 adds some freedom to define how data is actually
written to underlying storage by exposing the codec API. However, I find
the learning curve for understanding which bits to change quite steep: one
really needs to get into the guts of the storage formats and how search
queries actually consume the data stored in them.

Is there some type of tutorial, possibly with code samples, that would
guide me through what needs to be done for specific use cases? Basically,
what I am looking for is the ability to "turn off" certain features of the
engine, creating a "lite" version of Lucene's codec that would cut down
both on the amount of data persisted while indexing and on query
execution time. To be a bit more specific, the queries in my case do not go
beyond the NumericRangeQuery, WildcardQuery and TermQuery types, so things
like similarities, boosts and scoring are not used. So I want to preserve
the existing functionality for those queries while removing support for
features I'm not using (yet).

Thanks.
