The data is pretty varied. Some documents are very small (on the order of a
few KB), while others can exceed a few MB. The index currently has 20 fields.
Half of the fields use StandardAnalyzer, and the other half use a
WhitespaceTokenizer coupled with a LowerCaseFilter.
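
To make the second analysis chain concrete, here is a rough plain-Java stand-in for what a WhitespaceTokenizer followed by a LowerCaseFilter produces. This is not Lucene code and not what the benchmark actually runs: the class name WhitespaceLowercaseSketch and the sample input are made up for illustration, and the regex-based split only approximates Lucene's whitespace handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: mimics "split on whitespace, then lowercase".
// Punctuation stays attached to tokens, which is the main visible difference
// from StandardAnalyzer's output on the same text.
class WhitespaceLowercaseSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on runs of whitespace (approximation of WhitespaceTokenizer).
        for (String piece : text.split("\\s+")) {
            if (!piece.isEmpty()) {
                // Lowercase each token (approximation of LowerCaseFilter).
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample input is invented for this sketch.
        System.out.println(tokenize("Foo-Bar,  BAZ!"));
    }
}
```

The point is only that these fields see no punctuation stripping or word-boundary logic, so their tokens differ from those of the StandardAnalyzer fields.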
The benchmark reads 1000 docu
I use Lucene's MemoryIndex to run a large number of queries against data in a
streaming system. I'm looking to upgrade from v3.5 to 4.x, but MemoryIndex
appears to be roughly 25% slower in 4.x, based on a benchmark I built using
our internal queries and a sample of 1000 documents. I
hav