The data is quite varied: some documents are very small (on the order of a few KB), while others exceed several MB. The index currently has 20 fields. Half of the fields use StandardAnalyzer, and the other half use a WhitespaceTokenizer combined with a LowerCaseFilter.
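For reference, a minimal sketch of what the whitespace-plus-lowercase analyzer looks like against the Lucene 4.3 API (the class name is mine, and the actual per-field configuration may differ):

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.util.Version;

    // Whitespace tokenization followed by lowercasing, as described above.
    public final class WhitespaceLowerCaseAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_43, reader);
            return new TokenStreamComponents(source, new LowerCaseFilter(Version.LUCENE_43, source));
        }
    }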
The benchmark reads 1,000 documents sampled from the data stream and runs each one through a large set of queries we have defined in the system. We repeat this 100 times to make sure the JVM is warm, then take the average time it takes to run all queries against each document (a rough sketch of the loop is below). The same benchmark is run with v3.5 and v4.3; the newer version of Lucene is consistently around 25% slower using the same queries and sample documents.

One thing I haven't done yet is time each individual query, to see whether specific queries are causing the increase, or specific documents (e.g. whether the larger documents are the ones running slower now).

Thanks,
Chris
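For concreteness, the timed inner loop is roughly the following sketch (the field name, class name, and one-index-per-document structure are my assumptions; the real harness indexes 20 fields):

    import java.util.List;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.index.memory.MemoryIndex;
    import org.apache.lucene.search.Query;

    // Rough sketch of one benchmark pass: index each sampled document into a
    // MemoryIndex and run every defined query against it, timing the query phase.
    public final class MemoryIndexBenchmark {

        // Returns total nanoseconds spent running all queries against all documents.
        static long runPass(List<String> documents, List<Query> queries, Analyzer analyzer) {
            long elapsed = 0;
            for (String doc : documents) {
                MemoryIndex index = new MemoryIndex();
                index.addField("body", doc, analyzer); // hypothetical field; real code adds 20 fields
                long start = System.nanoTime();
                for (Query query : queries) {
                    index.search(query); // returns a score; > 0.0f means the query matched
                }
                elapsed += System.nanoTime() - start;
            }
            return elapsed;
        }
    }

The 100 warm-up iterations described above amount to calling runPass repeatedly and averaging the results.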