The BoW (bag-of-words) approach is simple and highly effective IMO. If you want to get a
bit fancy, you could also use a MultiField query in the combined index.
Another brute-force approach would be to hit all 3 indexes and see which
ones come back with the highest score(s).
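To make the brute-force idea concrete, here is a rough sketch in plain Python rather than Lucene itself. Everything in it is an assumption for illustration: the three in-memory lists stand in for the real business, locality and road indexes, the sample names are made up, and the overlap score is a crude stand-in for Lucene's actual scoring. It only shows the shape of "query every index, then rank all hits by score".

```python
from collections import Counter

# Hypothetical toy data standing in for the three real indexes.
INDEXES = {
    "business": ["McDonald's Restaurant", "Kingston Hardware Ltd"],
    "locality": ["Kingston", "Richmond"],
    "road": ["McDonald Street", "Kingston Road"],
}

def tokenize(text):
    """Lowercase, drop apostrophes, split on whitespace."""
    return text.lower().replace("'", "").split()

def score(query, doc):
    """Bag-of-words overlap, normalised by the longer of query and document."""
    q_tokens, d_tokens = tokenize(query), tokenize(doc)
    if not q_tokens or not d_tokens:
        return 0.0
    overlap = sum((Counter(q_tokens) & Counter(d_tokens)).values())
    return overlap / max(len(q_tokens), len(d_tokens))

def best_hits(query):
    """Query every index; return (score, index_name, doc) tuples, best first."""
    hits = []
    for name, docs in INDEXES.items():
        for doc in docs:
            s = score(query, doc)
            if s > 0:
                hits.append((s, name, doc))
    return sorted(hits, reverse=True)
```

Calling `best_hits("kingston road")` ranks the road entry above the locality and the business that merely share the word "Kingston", so the index the top hit comes from doubles as a guess at what the user meant.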
On Mon, Mar 9, 2009 at 8:43 AM, Er
Sure, Lucene is suited. If
The central problem here isn't the search engine, IMO, it's
figuring out what bits of the query are relevant to what
parts of the data. That is, in some random string, what is
the street, business name, address, etc.
Lucene has nothing built in that I know of that'l
Thanks for all the inputs guys.
As Erick said let me elaborate the problem a bit.
We are trying to develop a local search application. The user will be able
to locate businesses, localities and roads. We have data for all the 3 with
us. We do not want to provide separate boxes for the user to ent
Whatever you do will be wrong. What you're saying is
that you have structured data that the user wants to search
in an unstructured way, and you want to try to create a
system that intuits what the user meant. Good luck.
Can you back up a bit and talk about the problem you're
trying to solve? If
You could have a single index file with all the names tagged at indexing
time. For query parsing, you could have a lookup of common ending words
that identify business names (Corp, Inc, LLC, Ltd, etc.) and of common
words (road, avenue, street, lane, etc.) that identify addresses,
and separ
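A minimal sketch of such a lookup, in plain Python and independent of Lucene. The word lists come from the examples in this message; the field names (`business`, `road`, `unknown`) and the function name are hypothetical, and the rule of checking only the last word is an assumption about how the lookup might work.

```python
# Ending words taken from the examples above; extend as needed.
BUSINESS_WORDS = {"corp", "inc", "llc", "ltd"}
ROAD_WORDS = {"road", "avenue", "street", "lane"}

def guess_field(phrase):
    """Guess which field a phrase belongs to from its last word."""
    words = phrase.lower().replace(".", "").replace(",", "").split()
    if not words:
        return "unknown"
    last = words[-1]
    if last in BUSINESS_WORDS:
        return "business"
    if last in ROAD_WORDS:
        return "road"
    # No marker word: the phrase could be a locality or anything else.
    return "unknown"
```

For example, `guess_field("Acme Corp.")` gives `"business"` and `guess_field("Main Street")` gives `"road"`. A bare name like "Kingston" comes back as `"unknown"`, which is where a fallback search over all fields would still be needed.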
Can you not make one index with all three types of name and just
search that? Sounds much easier. You might get a few funnies like
business Kingston on McDonald's street, but they'd be the exception.
--
Ian.
On Fri, Mar 6, 2009 at 6:25 AM, Srinivas Bharghav
wrote:
> I am trying to evaluate as
Hi Srinivas,
Perhaps what you need here is query formation logic that assigns the
right keywords to the right fields. Let me know in case I got it wrong. One
way to do that could be to use index-time boosts for fields and then
running a query (so that a particular field is preferred over the o