Re: matching multi-word terms

2011-03-12 Thread Erick Erickson
This looks like just a phrase query, perhaps with no slop. A term query definitely won't work if you've tokenized the field, because your terms would be "A" and "B", but not "A B". SpanQueries should also work if you want; there's no reason to subclass anything, just use SpanNearQuery... You can
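
To illustrate the point about tokenized fields, here is a minimal sketch in plain Java. It does not use Lucene at all: the class PhraseMatchSketch, its whitespace tokenizer, and its method names are all hypothetical and only model the idea. A single term "B C" never matches because the indexed tokens are "b" and "c" separately, while a zero-slop phrase match requires the query tokens to appear adjacent and in order.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PhraseMatchSketch {

    // Toy tokenizer (an assumption for this sketch): lowercase, split on whitespace.
    // A real analyzer typically does considerably more than this.
    static List<String> tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Models a single-term lookup: the term is compared as-is against the
    // individual tokens, so a term containing a space can never match.
    static boolean matchesExactTerm(String text, String term) {
        return tokenize(text).contains(term.toLowerCase(Locale.ROOT));
    }

    // Models a phrase match with no slop: the query tokens must occur
    // as a contiguous run, in the same order, in the document tokens.
    static boolean matchesPhrase(String text, String phrase) {
        List<String> doc = tokenize(text);
        List<String> query = tokenize(phrase);
        if (query.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(doc, query) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(matchesPhrase("A B C D E", "B C"));    // true
        System.out.println(matchesPhrase("B A C D E", "B C"));    // false
        System.out.println(matchesExactTerm("A B C D E", "B C")); // false
    }
}
```

In the Lucene releases current at the time of this thread, the equivalent would presumably be a PhraseQuery with one Term added per word and the slop left at 0; the sketch above only mirrors that behaviour and is not how Lucene implements it.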

matching multi-word terms

2011-03-12 Thread Michael Wiegand
Hi, I would like to find documents matching multi-word terms. More specifically: my query is something like "B C" and I would like to match contexts such as "A B C D E" but not "B A C D E". There seems to be some contradictory information on the web. Apparently, the statement Term t = new Term("

Re: Which is the +best +fast HTML parser/tokenizer that I can use with Lucene for indexing HTML content today ?

2011-03-12 Thread Trejkaz
On Fri, Mar 11, 2011 at 10:03 PM, shrinath.m wrote: > I am trying to index content within certain HTML tags, how do I index it? > Which is the best parser/tokenizer available to do this? This doesn't really answer the question, but I think it will help... The features you want to look for: 1.