Re: Need Help: Business Scenario to lucene implementation

2011-09-01 Thread Saurabh Gokhale
Hi Grant, Thanks for the reply. I will definitely look into the Solr Deduplication approach. But since I am using pure Lucene and not Solr, I am not sure how feasible it would be to find something equivalent in Lucene or to try duplicating it. But that looks to be the way forward. Also regarding the question a

Re: Need Help: Business Scenario to lucene implementation

2011-09-01 Thread Grant Ingersoll
I'd probably treat this as a deduplication problem and look to use a fuzzy matching approach, such as the TextProfileSignature in Solr/Nutch: http://wiki.apache.org/solr/Deduplication, which I believe is tunable as to its threshold of acceptance. I'd also likely push back on the notion of

Re: Need Help: Business Scenario to lucene implementation

2011-08-31 Thread Saurabh Gokhale
Can someone please help with the logic that can be applied to decide on the closeness requirement given below (like 50% matching)? This matching is pure text matching. Since the current Lucene score does not translate into a percentage of closeness, is there anything else that can give this info
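
To make the "percentage of closeness" idea concrete, here is a minimal sketch of one way such a number could be computed outside of Lucene scoring. It is not TextProfileSignature and not a Lucene API; it uses Jaccard similarity over word shingles, which is a different and simpler technique. The function names (shingles, percent_similar, is_near_duplicate), the shingle size k, and the 50% default threshold are all assumptions made for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text.

    Hypothetical helper: lowercases the text and splits on non-word characters.
    """
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def percent_similar(a, b, k=3):
    """Jaccard similarity of the two shingle sets, expressed as a percentage."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 100.0  # assumption: two empty texts count as identical
    return 100.0 * len(sa & sb) / len(sa | sb)


def is_near_duplicate(new_doc, existing_docs, threshold=50.0, k=3):
    """True if new_doc is more than `threshold` percent similar to any existing doc."""
    return any(percent_similar(new_doc, d, k) > threshold for d in existing_docs)
```

Comparing the new document against every existing document does not scale to a large index, so in practice a sketch like this would probably only be run on a shortlist of candidates, for example the top hits of an ordinary Lucene query built from the new document's text.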

Need Help: Business Scenario to lucene implementation

2011-08-30 Thread Saurabh Gokhale
Hi All, I need your help to understand how I can apply Lucene to the following business scenario. The question is in RED. *Business Scenario:* Analyze newly created document "A" against the existing documents in the system, and if document A matches (is similar to) more than 50% with any of the existing d