Levenshtein FSTs?

2016-05-23 Thread Luke Nezda
Hello, all - I'd like to use Lucene's automaton/FST code to achieve fast fuzzy search (OSA edit distance up to 2) for many (10k+) strings (the knowledge base, or kb) in many large strings (docs). The approach I was thinking of: create a Levenshtein FST with all paths associated with the unedited form for each kb
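For reference, "OSA edit distance" (optimal string alignment) is Levenshtein distance extended with transpositions of adjacent characters, under the restriction that no substring is edited more than once. The following minimal plain-Java sketch computes it with the textbook dynamic program. It is not the automaton/FST approach the post asks about, just a brute-force pairwise check that can be useful for verifying what a fuzzy matcher should accept. The class `OsaDistance` and its methods are hypothetical and not part of Lucene.

```java
/**
 * Hypothetical helper (not part of Lucene): optimal string alignment (OSA)
 * distance, i.e. Levenshtein distance plus adjacent transpositions, where
 * no substring is edited more than once.
 */
public final class OsaDistance {

    private OsaDistance() {}

    public static int distance(String a, String b) {
        int n = a.length();
        int m = b.length();
        // d[i][j] = OSA distance between the first i chars of a and the first j chars of b
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete all i chars
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert all j chars
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1,    // deletion
                                 d[i][j - 1] + 1),   // insertion
                        d[i - 1][j - 1] + cost);     // substitution or match
                // Adjacent transposition ("ab" <-> "ba") costs one edit.
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }

    /** True if b is within maxEdits OSA edits of a (the post uses maxEdits = 2). */
    public static boolean withinDistance(String a, String b, int maxEdits) {
        // Cheap length filter: each edit changes the length by at most one.
        if (Math.abs(a.length() - b.length()) > maxEdits) return false;
        return distance(a, b) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));          // 3
        System.out.println(distance("ca", "abc"));                  // 3 under OSA (unrestricted Damerau-Levenshtein gives 2)
        System.out.println(withinDistance("lucene", "lcuene", 2)); // true (one transposition)
    }
}
```

The "ca"/"abc" case shows where OSA differs from unrestricted Damerau-Levenshtein: OSA cannot transpose "ca" to "ac" and then insert "b" between the swapped characters. Running this check against every kb entry and every window of every doc would be far too slow at 10k+ strings, which is why the post looks to Lucene's automaton/FST machinery instead.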

RE: migrating to 6.0 -- how to apply filter to getSpans

2016-05-23 Thread Allison, Timothy B.
Solution (I think): create a weight for the searcher and then call "scorer" from that for each LeafReaderContext: Weight searcherWeight = searcher.createWeight(filter, false); for (LeafReaderContext ctx : searcher.getIndexReader().leaves()) { Scorer leafReaderContextScorer = searche

Re: Lucene indexing throughput (and Mike's lucenebench charts)

2016-05-23 Thread Michael McCandless
I finally dug into this, and it turns out the nightly benchmark I run had bad bottlenecks such that it couldn't feed documents quickly enough to Lucene to take advantage of the concurrent hardware in beast2. I fixed that and just re-ran the nightly run and it shows good gains: https://plus.google.

Migration from 4.7.0 to 6.0.0

2016-05-23 Thread Jean-Claude Dauphin
Hello, I am having some difficulty finding out in which release certain API changes were made, and how to migrate. For example, I was not able to find: 1) when the method FieldType.setIndexed(true) was dropped and how to change my code; 2) the same for the method Query.extractTerms. I would a