Re: disable field length normalization on specific fields?

2016-03-28 Thread Chris Hostetter
Yep, just use a customized similarity that doesn't include a length factor when computing the norm. If you are currently using TFIDFSimilarity (or one of its subclasses) then the computeNorm method delegates to a lengthNorm method, and you can override that to return "1" for fields with a cert
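
As a rough illustration of the idea in this reply, the sketch below models per-field length normalization in plain Java without depending on Lucene. The class name, the field names, and the `NO_LENGTH_NORM_FIELDS` set are hypothetical, and the 1/sqrt(numTerms) formula is assumed here as the classic TF-IDF length factor; in a real index the equivalent logic would live in an overridden lengthNorm method of a similarity subclass.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class LengthNormSketch {
    // Hypothetical field names for which document length should not affect scoring.
    static final Set<String> NO_LENGTH_NORM_FIELDS =
            new HashSet<>(Arrays.asList("title", "tags"));

    // Returns a constant 1 for the selected fields, otherwise the assumed
    // classic length factor 1 / sqrt(number of terms in the field).
    static float lengthNorm(String field, int numTerms) {
        if (NO_LENGTH_NORM_FIELDS.contains(field)) {
            return 1.0f;
        }
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // A 100-term "title" is not penalized; a 100-term "body" is.
        System.out.println("title norm: " + lengthNorm("title", 100));
        System.out.println("body norm:  " + lengthNorm("body", 100));
    }
}
```

Because norms are computed at index time, a change like this would presumably only affect documents indexed after the similarity is swapped in.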

Re: Single string automaton causes NPE on Terms.intersect( CompiledAutomaton, BytesRef term )

2016-03-28 Thread José Tomás Atria
Hi Mike, I'd be happy to, but I have never used JIRA before and I don't entirely understand what you mean by adding a test case as a patch (academic programmer here, we are notoriously ignorant of established development practices :P). thanks! jta On Fri, Mar 25, 2016 at 7:54 PM, Michael McCandl

Re: Compression algorithm for posting lists

2016-03-28 Thread Vishwas Jain
Thanks for the reply and information. I have some doubts regarding the implementation of the lucene54 codec when writing the posting lists using the lucene50 postinglistwriter while going through the code. What exactly does the finish() method in the TermsWriter class of the BlockTreeTerms

Re: Compression algorithm for posting lists

2016-03-28 Thread Greg Bowyer
The posting list is compressed using a specialised technique aimed at pure numbers. Currently the codec uses a variant of Patched Frame of Reference coding to perform this compression. A good survey of such techniques can be found in the good IR books (https://mitpress.mit.edu/books/information-r
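
To make the idea concrete, here is a minimal sketch of plain frame-of-reference style packing in Java: doc IDs are turned into gaps, and each gap in the block is stored with the smallest bit width that fits the largest gap. This is a simplified illustration, not the codec's actual implementation. It omits the "patched" part (storing outliers as exceptions so the rest of the block can use a narrower width), and all class and method names are hypothetical.

```java
import java.util.Arrays;

public class ForSketch {
    // Convert a strictly increasing doc ID list into gaps (first gap = first ID).
    static int[] toDeltas(int[] docIds) {
        int[] deltas = new int[docIds.length];
        int prev = 0;
        for (int i = 0; i < docIds.length; i++) {
            deltas[i] = docIds[i] - prev;
            prev = docIds[i];
        }
        return deltas;
    }

    // Rebuild the doc IDs by summing the gaps.
    static int[] fromDeltas(int[] deltas) {
        int[] docIds = new int[deltas.length];
        int sum = 0;
        for (int i = 0; i < deltas.length; i++) {
            sum += deltas[i];
            docIds[i] = sum;
        }
        return docIds;
    }

    // Smallest bit width (at least 1) that holds every non-negative value.
    static int bitsRequired(int[] values) {
        int or = 0;
        for (int v : values) {
            or |= v;
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    // Pack each value into 'bits' bits, back to back, inside 64-bit words.
    static long[] pack(int[] values, int bits) {
        long[] packed = new long[(values.length * bits + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int offset = (int) (bitPos & 63);
            packed[word] |= ((long) values[i]) << offset;
            if (offset + bits > 64) {
                // The value straddles two words: spill the high bits.
                packed[word + 1] |= ((long) values[i]) >>> (64 - offset);
            }
        }
        return packed;
    }

    // Inverse of pack: read 'count' values of 'bits' bits each.
    static int[] unpack(long[] packed, int count, int bits) {
        int[] values = new int[count];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int offset = (int) (bitPos & 63);
            long v = packed[word] >>> offset;
            if (offset + bits > 64) {
                v |= packed[word + 1] << (64 - offset);
            }
            values[i] = (int) (v & mask);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 8, 20, 21, 150};
        int[] deltas = toDeltas(docIds);
        int bits = bitsRequired(deltas);
        long[] packed = pack(deltas, bits);
        int[] restored = fromDeltas(unpack(packed, docIds.length, bits));
        System.out.println("bits per gap: " + bits);
        System.out.println("round trip ok: " + Arrays.equals(docIds, restored));
    }
}
```

In this toy block a single large gap forces every gap to use the wider bit width, which is the inefficiency that patched variants aim to reduce by handling such outliers separately.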

Compression algorithm for posting lists

2016-03-28 Thread Vishwas Jain
Hello, we are trying to implement better compression techniques in the lucene54 codec of Apache Lucene. Currently there is no such compression for posting lists in the lucene54 codec, but the LZ4 compression technique is used for stored fields. Does anyone know why there is no compression technique