> Lucene started out at an avg 3ms but subsequent runs took it down
> dramatically due to OS file caching. The all-in-memory hashset implementation
> clearly did not demonstrate the same speed ups between runs.
I'm not saying the benchmark was wrong, but this is surprising. The default
HashSet implementation is a hash table whose buckets are linked lists, so it
made me wonder how the data was distributed. Even with OS file caching on the
Lucene side, the in-memory data structure shouldn't fall short, at least
intuitively.

> I can make the code available but the data wouldn't be possible.
> The English Wikipedia page titles are probably an equivalent size and shape
> so I could try and package something up around that as a benchmarking tool
> for others to play with.

If you find a spare cycle, that would be great, thanks!

Dawid
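
P.S. The question about how the data is distributed over a bucketed hash table could be checked with something like the rough sketch below. It is not taken from the benchmark under discussion. The class name `BucketSpread`, the sample titles, and the bucket count are all hypothetical, and the `spread` step assumes the hash mixing used by OpenJDK 8 and later (older JDKs mix the bits differently). It also assumes the keys are distinct, as they would be in a set.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: estimates how keys would spread over hash buckets.
final class BucketSpread {

    // Assumption: mirrors the mixing step of java.util.HashMap in OpenJDK 8+.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Counts how many keys fall into each bucket; `buckets` must be a power of two.
    static int[] occupancy(List<String> keys, int buckets) {
        int[] counts = new int[buckets];
        for (String key : keys) {
            counts[spread(key.hashCode()) & (buckets - 1)]++;
        }
        return counts;
    }

    // The fullest bucket approximates the worst-case chain a lookup would walk.
    static int longestChain(int[] counts) {
        return Arrays.stream(counts).max().orElse(0);
    }

    public static void main(String[] args) {
        // Placeholder data; real page titles would be loaded from a file.
        List<String> titles = Arrays.asList("Apache Lucene", "Hash table", "Wikipedia");
        int[] counts = occupancy(titles, 16);
        System.out.println("longest chain: " + longestChain(counts));
    }
}
```

If the longest chain stays small relative to the number of keys, skewed bucket distribution is unlikely to explain the timings, and the cause would have to be looked for elsewhere.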