Your search service sounds very interesting. I once also thought about a similar application for shopping.
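
For what it's worth, here is a rough sketch of the shingle-matching idea Alex describes in the quoted message below. It is plain Python rather than Lucene, so nothing here is the Lucene API: the category set, the synonym lists, the crude plural-stripping `normalize` function and the squared-length weighting are all assumptions made up for illustration. Word shingles from the query are looked up in a small category index, and longer matching shingles count for more.

```python
from typing import Dict, List, Optional

# Hypothetical category set with hand-picked synonyms (not taken from the thread).
CATEGORIES: Dict[str, List[str]] = {
    "fast food": ["fast food", "burger", "fried chicken"],
    "chinese restaurant": ["chinese restaurant", "rice restaurant", "noodle house"],
    "restaurant": ["restaurant", "diner"],
    "supermarket": ["supermarket", "grocery store"],
}


def normalize(token: str) -> str:
    """Lowercase and strip a trailing plural 's' (a crude stand-in for real stemming)."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s"):
        token = token[:-1]
    return token


def tokenize(text: str) -> List[str]:
    """Split on whitespace and normalize each token."""
    return [normalize(t) for t in text.split()]


def shingles(tokens: List[str], max_size: int = 3) -> List[str]:
    """Return all word n-grams of length 1..max_size."""
    result = []
    for size in range(1, max_size + 1):
        for start in range(len(tokens) - size + 1):
            result.append(" ".join(tokens[start:start + size]))
    return result


def build_index(categories: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every normalized category name or synonym to its category."""
    index = {}
    for category, names in categories.items():
        for name in names:
            index[" ".join(tokenize(name))] = category
    return index


def guess_category(query: str, index: Dict[str, str]) -> Optional[str]:
    """Score categories by matching shingles; return the best one, or None."""
    scores: Dict[str, int] = {}
    for shingle in shingles(tokenize(query)):
        category = index.get(shingle)
        if category is not None:
            # Assumed weighting: a two-word match outweighs two one-word matches.
            weight = len(shingle.split()) ** 2
            scores[category] = scores.get(category, 0) + weight
    if not scores:
        return None
    return max(scores, key=lambda c: scores[c])
```

With this toy index, "nice rice restaurant" only resolves to the Chinese restaurant category because "rice restaurant" was listed as a synonym by hand; without that entry the plain "restaurant" category wins, which is the weakness Fei Liu points out below. A query that shares no words with any category name or synonym returns `None`.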

2009/12/21 fei liu <liufeipeki...@gmail.com>

> That's a good idea. This approach will work.
>
> Suppose someone submits the query "nice rice restaurant". I think he/she
> wants to find a Chinese restaurant, but you will find the category
> "restaurant" retrieved with a higher score than "chinese restaurant".
> In many cases your approach may retrieve no category at all, because the
> user's query doesn't share any words with your category names.
> But you could give it a try. If you get interesting results, let me know.
> Thanks!
>
> That's my opinion.
>
> 2009/12/21 Alex <azli...@gmail.com>
>
> > Hi!
> >
> > Many thanks to both of you for your suggestions and answers!
> >
> > What Weiwei Wang suggests is part of the solution I want to implement.
> > I will definitely use the suggest-as-you-type approach in the query form,
> > as it allows for pre-emptive disambiguation and, I believe, will give
> > very satisfying results.
> >
> > However, search users are wild beasts and I can't count on them to always
> > use the given suggestions. All I can count on is very erratic, sparse and
> > ambiguous queries :) So I need an almost foolproof solution.
> >
> > To answer your question:
> > "BTW, I do not understand why you need to know the category of user input"
> > I am trying to understand the user intent behind the query so that I can
> > filter the results by a given category of locations. If a user queries
> > "Fast Food in Nanjing", I don't want to return all the documents that
> > contain the words "Fast" and "Food" and "Nanjing". I use a custom
> > algorithm to figure out the intended location first. Then, using the
> > Spatial contrib, I filter the results to the area that was identified
> > earlier. Finally, I sort the results by distance from the location
> > point / centroid found earlier.
> >
> > Identifying the category allows me three things:
> >
> > 1) Filter out irrelevant results: I don't want my result set to include a
> > supermarket in "Nanjing" where the "food" is fresh and the service is
> > "fast" just because the query words appear in the description of the
> > location. Since I am using custom, distance-based sorting of the results,
> > I can't afford to have the supermarket be the top result just because it
> > is the closest to the location centroid identified earlier. The user
> > intent was clearly "fast food" and not a supermarket!
> >
> > 2) Understand user intent to provide targeted advertising.
> >
> > 3) Understanding the category of location a user is looking for also
> > lets me calculate the bounding box more accurately, i.e. the maximum
> > distance within which a location must lie to be relevant to the user.
> > A user looking for pizza in New York expects his results to be within a
> > radius of 1 or 2 miles at most. If he is looking for a theme park, he
> > will probably be willing to go further away to find it. So identifying
> > the category of the location the user is looking for lets me calculate
> > the distance radius more accurately.
> >
> >
> > Fei Liu
> >
> > Thanks a lot for the papers you pointed me to. I came across them earlier
> > in my research, and re-reading them gave me new insights. However, I
> > believe that the two-step approach you are recommending is not very
> > viable under heavy load, as it requires two passes over the index.
> > I do believe, however, that identifying the dominant category(ies) of the
> > result set, when no category can be clearly identified from the query
> > alone, can be very valuable if sent back to the user as information and
> > a category link!
> >
> > Now, here is what I think I will do to pre-emptively identify the
> > location category(ies) implied in the query:
> >
> > 1 - Use my own custom category set and index the category names using
> > the synonym analyzer provided with Lucene, along with some sort of
> > normalization such as stemming, maybe using the Snowball analyzer.
> > 2 - Break the query into shingles (word-based n-grams) and analyze each
> > shingle using the same analyzers that were used in (1). Then query
> > Lucene with these analyzed shingles against the category index built
> > earlier.
> >
> > Hopefully the category with the highest Lucene score should be the one
> > intended by the user....
> >
> > Later on, I also intend to use some sort of training-based approach,
> > using search queries that have been tagged with the relevant location
> > categories.
> >
> > What do you guys think?
> >
> > Would this be a viable approach?
> >
> > Thanks for all!
> >
> >
> > Cheers
> >
> > Alex
> >
>
> --
> -------------------------
> Liu Fei(刘飞)
> Institute of Software
> School of Electronics Engineering & Computer Science
> Peking University
> Beijing, 100871, PRC.

--
Weiwei Wang Alex Wang 王巍巍
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang