: For a number of years I've been doing this by creating a RAMDirectory,
: creating a document for one of the sentences, and then doing a search
: using the other sentence and seeing if we get a good match. This has
: worked reasonably well, but since improving the performance of other
: parts of the application, this part has become a performance bottleneck.
: That's not surprising, as I'm creating all these objects just for a
: one-off search, and I have to do this for many sentence pairs.
i'm not an academic, and i don't want to undermine the very goal-specific advice given by other folks in this thread, but i do want to point out that if you are doing *lots* of comparisons like this, then building a RAMDirectory for each and every "known" song title to compare with each and every "new" song title is already a super inefficient use of lucene.

if instead you built and *kept* a lucene index containing all known song titles (one per doc) and then queried it for each "new" song title that came in, you'd probably find yourself with a much more efficient solution w/o needing to spend a lot of time investigating new algorithms.

-Hoss
http://www.lucidworks.com/
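
to make the "build once, query many" shape concrete, here is a rough sketch in plain java. it is a toy in-memory inverted index standing in for a kept lucene index, not the lucene API: the class name TitleIndex, its methods, the jaccard token-overlap score, and the song titles in the usage note are all assumptions made up for illustration. the point is only that the per-title indexing cost is paid once, and each incoming title is then a single lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical toy stand-in for a long-lived Lucene index (NOT the Lucene API).
class TitleIndex {
    private final List<String> titles = new ArrayList<>();
    // token -> ids of the known titles that contain that token
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Lowercase and split on anything that is not a letter or digit.
    private static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index one known title. This runs once per title, not once per comparison.
    void add(String title) {
        int id = titles.size();
        titles.add(title);
        for (String token : tokenize(title)) {
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(id);
        }
    }

    // Return the known title with the highest Jaccard token overlap,
    // or null when no known title shares a token with the query.
    String bestMatch(String query) {
        Set<String> queryTokens = tokenize(query);
        Map<Integer, Integer> shared = new HashMap<>(); // title id -> shared token count
        for (String token : queryTokens) {
            for (int id : postings.getOrDefault(token, Collections.<Integer>emptySet())) {
                shared.merge(id, 1, Integer::sum);
            }
        }
        String best = null;
        double bestScore = 0.0;
        for (Map.Entry<Integer, Integer> e : shared.entrySet()) {
            String candidate = titles.get(e.getKey());
            int union = queryTokens.size() + tokenize(candidate).size() - e.getValue();
            double score = (double) e.getValue() / union;
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }
}
```

usage would look something like: create one TitleIndex, call add("Let It Be") and so on for every known title up front, then call bestMatch("let it be (remastered)") for each new title as it arrives. with real lucene the same shape applies: one index that is built and kept, and one query per incoming title, with lucene's own analysis and scoring in place of the toy overlap score here.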