Rob, look at the third hit:
http://www.lucenebook.com/search?query=bi-grams
Otis
- Original Message -
From: Rob Young <[EMAIL PROTECTED]>
> That sounds like just what I'm looking for. Do you know if this is
> covered in Lucene in Action, or where I can find more information about it?
E
using overlapping n-grams. Searching the list archive may
give you some background if Lucene in Action doesn't have enough info on this
topic.
-Original Message-
From: Rob Young [mailto:[EMAIL PROTECTED]]
Sent: Thursday, May 11, 2006 11:39 AM
To: java-user@lucene.apache.org
Subject: Re:
That sounds like just what I'm looking for. Do you know if this is
covered in Lucene in Action, or where I can find more information about it?
Eric Isakson wrote:
You might consider using overlapping bi-gram tokenization with stripped out
whitespace and a PhraseQuery.
So your tokenized conten
D student at University of Trento, ITALY
- Original Message -
From: "Eric Isakson" <[EMAIL PROTECTED]>
To:
Sent: Thursday, May 11, 2006 3:54 PM
Subject: RE: Searching across spaces
You might consider using overlapping bi-gram tokenization with stripped out
whitespace and a PhraseQuery.
So your tokenized content, "spongebob squarepants", would look like:
sp po on ng ge eb bo ob bs sq qu ua ar re ep pa an nt ts
and your tokens for your query, "sponge bob", would look like
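For anyone who wants to try the idea outside Lucene first, here is a minimal sketch of the overlapping bi-gram approach described above. It is plain Python, not Lucene code: `bigrams` and `phrase_match` are hypothetical helper names, the lowercasing is an added assumption, and `phrase_match` only imitates an exact phrase match (no slop) over the token list rather than using a real PhraseQuery.

```python
def bigrams(text):
    """Return overlapping character bi-grams of text with whitespace stripped."""
    # Assumption: lowercase the input so case differences do not matter.
    squashed = "".join(text.split()).lower()
    return [squashed[i:i + 2] for i in range(len(squashed) - 1)]


def phrase_match(content_tokens, query_tokens):
    """True if query_tokens occur as one contiguous run in content_tokens."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(content_tokens[i:i + n] == query_tokens
               for i in range(len(content_tokens) - n + 1))


content = bigrams("spongebob squarepants")
query = bigrams("sponge bob")

print(" ".join(content))  # sp po on ng ge eb bo ob bs sq qu ua ar re ep pa an nt ts
print(" ".join(query))    # sp po on ng ge eb bo ob
print(phrase_match(content, query))  # True
```

Because the whitespace is removed before tokenizing, "sponge bob" and "spongebob" produce the same run of bi-grams, which is why the phrase match succeeds without knowing the variations ahead of time.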
Yes, I looked at the synonym solution from Lucene in Action but, as
you point out, I have to know about it ahead of time. The only
solution I've had so far is to index the term without the spaces as
well and then run two searches, one with spaces and one without. It
would work but it just seems
I suspect you have to do some fancy indexing. That is, index the following
terms: sponge bob square pants spongebob squarepants.
But this requires that you understand all the variations you want to hit on
ahead of time.
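As a rough illustration of that "fancy indexing" idea, here is a sketch in plain Python (not Lucene code). `KNOWN_SPLITS` and `index_terms` are hypothetical names, and the table has to be maintained by hand, which is exactly the drawback mentioned above.

```python
# Hypothetical, hand-maintained table of compound words and their parts.
KNOWN_SPLITS = {
    "spongebob": ["sponge", "bob"],
    "squarepants": ["square", "pants"],
}


def index_terms(text):
    """Return the terms to index: each word plus any known split variants."""
    terms = []
    for word in text.lower().split():
        # Words missing from the table get no extra variants.
        terms.extend(KNOWN_SPLITS.get(word, []))
        terms.append(word)
    return terms


print(index_terms("spongebob squarepants"))
# ['sponge', 'bob', 'spongebob', 'square', 'pants', 'squarepants']
```

Any compound that is not in the table is indexed as-is, so a query for its parts would still miss.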
Or, you could conceivably deal with wildcard queries, but I think this is
th