Thanks! I wound up indexing both versions in the same index, and boosting the words that appeared in the "good word" list!
Thanks again for your advice! Matthew Hall-7 wrote: > > Erm, I likely should have mentioned that this technique requires the use > of a MultiFieldQueryParser. > > Matt > > Matthew Hall wrote: >> If you can build an analyzer that tokenizes the second field so that >> it filters out the words you don't want, you can then take advantage >> of more intelligent queries as well. >> >> >> So for the example that pjaol wrote, the query would become something >> like this: >> >> Query= body:(game OR redskins) keyword:(redskins)^10 >> >> Depending on your corpus this may or may not be possible, the >> determining factor being whether or not the list of words being >> removed from each document to create the second field varies. >> >> (more specifically I mean, in some documents you remove the word game, >> and in others you don't, if this is the case this technique won't work >> for you.) >> >> Matt >> >> >> >> theDude_2 wrote: >>> Ah, Interesting... I didnt think of that! I will try it and report back >>> >>> >>> pjaol wrote: >>> >>>> Why not put the keywords into the same document as another field? and >>>> search >>>> both fields >>>> at once, you can then use lucene syntax to give a boosting to the >>>> keyword >>>> fields. >>>> e.g. >>>> body:A good game last night by the redskins >>>> keyword: redskins >>>> >>>> Query= body:(game OR redskins) keyword:(game OR redskins)^10 >>>> >>>> And adjust the boosting until you're happy. >>>> Check out for querying multiple fields >>>> http://wiki.apache.org/lucene-java/LuceneFAQ#head-300f0756fdaa71f522c96a868351f716573f2d77 >>>> >>>> >>>> >>>> You might even want to consider Solr and it's dismax search component >>>> http://wiki.apache.org/solr/DisMaxRequestHandler >>>> to make it easier >>>> >>>> >>>> >>>> >>>> On Fri, Apr 17, 2009 at 11:19 AM, theDude_2 <aornst...@webmd.net> >>>> wrote: >>>> >>>> >>>>> I appreciate your response, and read the wiki article concerning the >>>>> Federated search >>>>> and >>>>> >>>>> I'm not sure that my project falls into the "Federated Search" >>>>> bucket... >>>>> >>>>> What I've done is created 2 indexes created with the same documents. >>>>> One index, contains the full documents - great for pure relevancy >>>>> search >>>>> The second index: contains all of the same documents, but a small >>>>> subset >>>>> of >>>>> each documents contents - only allowing words to be indexed that we >>>>> deem >>>>> as >>>>> "good words" - >>>>> >>>>> (for example) if this was a football article database >>>>> Index 1: would index 100% of the article about the Redskins and the >>>>> New >>>>> York >>>>> Giants >>>>> Index 2: would index the same article by only the "good words" in the >>>>> document like Redskins, Giants, Quarterback, Linebacker, etc. >>>>> >>>>> What I'm trying to do, if it's even possible! is run the search on >>>>> both >>>>> indexes containing references to the same article, and multiple the >>>>> scores >>>>> together to get a final score that would represent something like a >>>>> "relative AND good word" score.... >>>>> >>>>> Figuring that if a user searches on "Who is the Quarterback for the >>>>> Giants" >>>>> this will get the user an article that is both related to the >>>>> query, and >>>>> deemed "important" to the query... >>>>> >>>>> I will look further into federated search and related items, but I >>>>> think >>>>> that lucene probably wont be able to help me with this, am I right? >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ------------ >>>>> >>>>> pjaol wrote: >>>>> >>>>>> I'd start by doing some research on the question rather than >>>>>> asking for >>>>>> >>>>> a >>>>> >>>>>> solution.. >>>>>> What your asking for can be considered 'Federated Search' >>>>>> http://en.wikipedia.org/wiki/Federated_search >>>>>> >>>>>> And it can be conceived in as many ways as you have document >>>>>> types. Any >>>>>> answer will probably end up >>>>>> customized and weighted by your document silo value, usually >>>>>> companies >>>>>> weight those by business rules >>>>>> rather than head down the path of federated search, as it's just >>>>>> >>>>> quicker >>>>> >>>>>> and >>>>>> cheaper, and you can accomplish more. >>>>>> e.g >>>>>> Medication = score *2 (as higher advertising incentives) >>>>>> Diseases = score >>>>>> Books = score * 0.75 ( thousands of books, which nobody buys etc..) >>>>>> >>>>>> You might also want to try consolidating your data into 1 schema, and >>>>>> consider layering or collapsing results >>>>>> based on type. >>>>>> >>>>>> P >>>>>> >>>>>> On Fri, Apr 17, 2009 at 10:39 AM, theDude_2 <aornst...@webmd.net> >>>>>> >>>>> wrote: >>>>> >>>>>>> (bump) - any thoughts? >>>>>>> ---- >>>>>>> >>>>>>> >>>>>>> >>>>>>> theDude_2 wrote: >>>>>>> >>>>>>>> hi! >>>>>>>> >>>>>>>> I am trying to do something a little unique... >>>>>>>> >>>>>>>> I have a 90k text documents that I am trying to search >>>>>>>> Search A: indexes and searches the documents using regular >>>>>>>> relevancy >>>>>>>> search >>>>>>>> Search B: indexes and searches the documents using a smaller subset >>>>>>>> >>>>> of >>>>> >>>>>>>> "key" words that I have chosen >>>>>>>> >>>>>>>> This gives me 2 seperate scores: Score A, and Score B... >>>>>>>> >>>>>>>> I am trying to show the top 10 results of the scores combined >>>>>>>> so.... >>>>>>>> >>>>>>>> FinalScoretextDoc = (scoreA_of_td1 * 0.5) * (scoreB_of_td1 * 0.5) >>>>>>>> >>>>>>>> While it seems straightforward, I do not want to calculate the >>>>>>>> >>>>> scores >>>>> >>>>>>> of >>>>>>> >>>>>>>> all the documents outside of lucene. How can I integrate this >>>>>>>> >>>>> better >>>>> >>>>>>> into >>>>>>> >>>>>>>> the lucene search engine? Is this possible to do by any simple >>>>>>>> >>>>> means? >>>>> >>>>>>>> Thanks guys + gals! >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> -- >>>>>>> View this message in context: >>>>>>> >>>>>>> >>>>> http://www.nabble.com/A-Challenge%21%3A-Combining-2-searches-into-a-single-resultset--tp23085506p23098961.html >>>>> >>>>> >>>>> >>>>>>> Sent from the Lucene - Java Users mailing list archive at >>>>>>> Nabble.com. >>>>>>> >>>>>>> >>>>>>> --------------------------------------------------------------------- >>>>>>> >>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> -- >>>>> View this message in context: >>>>> http://www.nabble.com/A-Challenge%21%3A-Combining-2-searches-into-a-single-resultset--tp23085506p23099744.html >>>>> >>>>> >>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com. >>>>> >>>>> >>>>> --------------------------------------------------------------------- >>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>>> >>>>> >>>>> >>>> >>> >>> >> >> > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > -- View this message in context: http://www.nabble.com/A-Challenge%21%3A-Combining-2-searches-into-a-single-resultset--tp23085506p23137164.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org