Hi, I am trying to expand user queries to figure out potential document categories implied in the query. I wanted to know what was the best way to figure out the document category that is the most relevant to the query.
Let me explain further: I have created categories that are applied to documents I want to index. Some example categories are : Hotel Restaurant Fast Food Chinese Restaurant Church Bank Gas station I also am trying to create category aliases such as Chinese food can also be named Chinese restaurant with the same category unique ID. The documents I index have 1 primary category and 1...N secondary categories such as : McDonalds will be categorized under Fast food as a primary category but also under Restaurant as secondary category. The London Pub at the corner of my street will be categorized as Pub as primary category and also as Bar, Food and Beverages, Restaurant, and Fast Food (since then also have takeaway burgers ;). This all gives me a set of categories that are quite clearly identified, as well as a set of category aliases even though I'm aware that I can't figure out all the possible aliases of all my categories. At least I have the most obvious ones. Now with all this, I wanted to know, with the help of Lucene (or any other efficient method), how I could figure out the most relevant category (if any) behind a user query. For example : If my user looks for : "Chang's chinese restaurant" the obvious category should be "Chinese Restaurant" but if my user looks for "chines restauran" (misspelled) the category should also be "Chinese Restaurant" (such as google is capable of doing) OR "chinese bistro" should probably also return me the category "Chinese Restaurant" since bistro is a very similar concept to "Retaurant" ... Once the category is identified I can then query the index for documents that match that category the best. What is the proper way to identify the most relevant category in a user query based on the above ? Should I consider any other better approach ? Any help appreciated. Many thanks Alex.