Hi,

Look at the Apache Mahout project (based on Hadoop). It seems you need machine learning algorithms.
Best Regards
Alexander Aristov

On 17 August 2011 20:39, Ian Lea <ian....@gmail.com> wrote:
> Certainly sounds doable in Lucene. Is it basically working apart from
> false positives? Can you give some examples of the false positives?
>
> I'd be tempted to look at span queries, which will let you say that
> "Yesterday I put on my green plaid shirt" is a better match against
> "Green plaid shirt with stripes" than "a plaid shirt that is green"
> would, if that is what you want. See
> http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/ for
> good info on span queries.
>
> As for misspellings, that is a separate issue. Google "lucene
> spellcheck", or look at synonyms if you've got a list of alternatives.
>
> --
> Ian.
>
> On Wed, Aug 17, 2011 at 4:03 AM, Josh Rehman <j...@joshrehman.com> wrote:
> > My organization is looking to solve a difficult problem, and I believe that
> > Lucene is a close fit (although perhaps it is not). However, I'm not sure
> > exactly how to approach this problem.
> >
> > The problem is this: given a small set of fixed noun phrases and a much
> > larger set of human-generated short sentences, determine whether the
> > sentences refer to those noun phrases. For example, perhaps I have these
> > noun phrases:
> >
> > 1. Bright yellow book
> > 2. Large bulbous balloon
> > 3. Green plaid shirt with stripes
> > 4. Dark yellow book
> >
> > And these sentences:
> >
> > 1. Yesterday I put on my green plaid shirt.
> > 2. Next week I'll sell my balloon.
> > 3. Just finished my bright book.
> > 4. Wondering at how lovely my baloon is [Note the misspelling]
> >
> > Given that list of sentences, I will generate (sentence, noun phrase)
> > ordered pairs like this:
> > 1,3
> > 2,2
> > 3,1
> > 4,2
> >
> > Or even an ordered pair of (sentence, [noun phrases]), e.g. 3,[1,4]
> > (because there might be an ambiguous reference to "Book").
> >
> > The "shape" of this problem looks a lot like what Lucene does, but frankly I
> > don't have a lot of experience with textual indexing and search. I've
> > installed Lucene and managed to index and search my data structures;
> > however, with the StandardAnalyzer I'm getting a lot of false positives.
> >
> > Here is the code I have so far (I've elided the parsing code, which is not
> > very interesting):
> > https://gist.github.com/1150723
> >
> > Really appreciate any and all guidance. Thanks.
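For illustration, here is a minimal, dependency-free Java sketch of the matching idea discussed in the thread, applied to Josh's example data. It does not use Lucene's API. A toy tokenizer stands in for a Lucene analyzer, and the score is the number of noun-phrase words found in the sentence in the same order (a longest common subsequence over tokens). That only loosely mirrors the ordering preference Ian describes for span queries and does not model slop. Typo tolerance is a simple edit-distance check: one edit is allowed for words longer than four letters. This is an assumed stand-in for Lucene spellcheck or fuzzy matching, not how those work internally. The class name `PhraseMatcher`, its methods, and the thresholds are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PhraseMatcher {
    // Toy tokenizer (hypothetical stand-in for a Lucene analyzer):
    // lowercase, then split on anything that is not a letter a-z.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Levenshtein edit distance between two words (two-row dynamic programming).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Assumed typo tolerance: one edit for phrase words longer than four letters, exact match otherwise.
    static boolean similar(String sentenceWord, String phraseWord) {
        int allowed = phraseWord.length() > 4 ? 1 : 0;
        return editDistance(sentenceWord, phraseWord) <= allowed;
    }

    // Number of phrase words found in the sentence in the same order
    // (longest common subsequence using the fuzzy comparison above).
    static int inOrderMatches(List<String> phrase, List<String> sentence) {
        int[][] dp = new int[phrase.size() + 1][sentence.size() + 1];
        for (int i = 1; i <= phrase.size(); i++) {
            for (int j = 1; j <= sentence.size(); j++) {
                if (similar(sentence.get(j - 1), phrase.get(i - 1))) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[phrase.size()][sentence.size()];
    }

    // 1-based indices of every phrase with at least one in-order match, highest score first.
    static List<Integer> candidates(List<String> phrases, String sentence) {
        List<String> sentenceTokens = tokenize(sentence);
        List<int[]> scored = new ArrayList<>(); // each entry is {phraseIndex, score}
        for (int p = 0; p < phrases.size(); p++) {
            int score = inOrderMatches(tokenize(phrases.get(p)), sentenceTokens);
            if (score > 0) scored.add(new int[] {p + 1, score});
        }
        // Higher score first; ties keep the lower phrase index first.
        scored.sort((x, y) -> x[1] != y[1] ? Integer.compare(y[1], x[1]) : Integer.compare(x[0], y[0]));
        List<Integer> result = new ArrayList<>();
        for (int[] entry : scored) result.add(entry[0]);
        return result;
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList(
                "Bright yellow book", "Large bulbous balloon",
                "Green plaid shirt with stripes", "Dark yellow book");
        List<String> sentences = Arrays.asList(
                "Yesterday I put on my green plaid shirt.",
                "Next week I'll sell my balloon.",
                "Just finished my bright book.",
                "Wondering at how lovely my baloon is");
        for (int s = 0; s < sentences.size(); s++) {
            // Prints "sentence,[noun phrases]"; the first entry is the best single match.
            System.out.println((s + 1) + "," + candidates(phrases, sentences.get(s)));
        }
    }
}
```

On the example data this prints `1,[3]`, `2,[2]`, `3,[1, 4]` and `4,[2]`. Sentence 3 lists both book phrases because "book" also occurs in "Dark yellow book", but "bright book" scores higher against phrase 1, so taking the first entry of each list reproduces the single (sentence, noun phrase) pairs from the original question. The misspelled "baloon" still reaches phrase 2 through the one-edit tolerance.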