This is probably a pretty general inquiry, but I'm just exploring this as an option at the moment.
It seems that Lucene 4 gives more freedom to define how data is actually written to the underlying storage by exposing the codec API. However, I find the learning curve for figuring out which bits to change quite steep: one really needs to get into the guts of the storage formats and understand how search queries actually consume the data in those formats. Is there some kind of tutorial, ideally with code samples, that walks through what needs to be done for specific use cases?

Basically, I'm looking for a way to "turn off" certain features of the engine, creating a "lite" version of Lucene's codec that would cut down both the amount of data persisted at indexing time and the query execution time. To be more specific, my queries go no further than the NumericRangeQuery, WildcardQuery and TermQuery types, so things like similarities, boosts and scoring are not used. Obviously I want to preserve the existing functionality while removing support for the features I'm not using (yet).

Thanks.
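To make the idea concrete, here is a toy, stdlib-only sketch. It is not Lucene's codec API, and every class and method name in it is hypothetical. It rests on the assumption that the three query types I use only need to know *which* documents contain a term (or a value), not how often or where. The sketch keeps a "docs-only" inverted index (term to doc IDs, with no frequencies, positions or norms) and answers exact-term, wildcard and inclusive numeric-range lookups from it. As far as I know, the closest per-field knobs in Lucene 4 itself are `FieldType.setOmitNorms(true)` and `FieldInfo.IndexOptions.DOCS_ONLY`. These are related options, not a custom codec.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical toy index: NOT Lucene's API, just an illustration of a "docs-only" layout.
class LiteIndexSketch {
    // Term dictionary -> set of doc IDs. No freqs, positions or norms, so nothing can be scored.
    private final TreeMap<String, TreeSet<Integer>> postings = new TreeMap<>();
    // Numeric values kept sorted so a range lookup is a subMap scan.
    private final TreeMap<Long, TreeSet<Integer>> numeric = new TreeMap<>();

    void addTerm(int doc, String term) {
        postings.computeIfAbsent(term, t -> new TreeSet<>()).add(doc);
    }

    void addNumber(int doc, long value) {
        numeric.computeIfAbsent(value, v -> new TreeSet<>()).add(doc);
    }

    // Analogue of TermQuery: exact lookup in the term dictionary (returns a copy).
    Set<Integer> term(String t) {
        return new TreeSet<>(postings.getOrDefault(t, new TreeSet<>()));
    }

    // Analogue of WildcardQuery: '*' = any sequence, '?' = one char.
    // Enumerates matching terms and unions their doc IDs.
    Set<Integer> wildcard(String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        Pattern p = Pattern.compile(regex.toString());
        Set<Integer> result = new TreeSet<>();
        for (Map.Entry<String, TreeSet<Integer>> e : postings.entrySet()) {
            if (p.matcher(e.getKey()).matches()) result.addAll(e.getValue());
        }
        return result;
    }

    // Analogue of NumericRangeQuery with both bounds inclusive.
    Set<Integer> range(long min, long max) {
        Set<Integer> result = new TreeSet<>();
        if (min > max) return result; // empty range; subMap would throw here
        for (TreeSet<Integer> docs : numeric.subMap(min, true, max, true).values()) {
            result.addAll(docs);
        }
        return result;
    }

    public static void main(String[] args) {
        LiteIndexSketch idx = new LiteIndexSketch();
        idx.addTerm(0, "lucene"); idx.addTerm(0, "codec"); idx.addNumber(0, 10);
        idx.addTerm(1, "lucene"); idx.addTerm(1, "lite");  idx.addNumber(1, 25);
        idx.addTerm(2, "solr");                            idx.addNumber(2, 40);
        System.out.println(idx.term("lucene"));   // prints [0, 1]
        System.out.println(idx.wildcard("l*"));   // prints [0, 1]
        System.out.println(idx.wildcard("?odec")); // prints [0]
        System.out.println(idx.range(20, 40));    // prints [1, 2]
    }
}
```

The point of the design is that every lookup above is a pure set operation over doc IDs. That is why, under my assumption, dropping the scoring-related data should not change which documents match these three query types.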