Hi Grant,
Thanks for the reply.
I will definitely look into the Solr Deduplication approach. But since I am
using pure Lucene and not Solr, I am not sure how feasible it would be to
find an equivalent in Lucene or to try duplicating it. But that looks to be
the way forward.
Also regarding the question a
I'd probably treat this as a deduplication problem and look to use a fuzzy
matching approach, such as the TextProfileSignature in Solr/Nutch:
http://wiki.apache.org/solr/Deduplication, which I believe is tunable as to
its threshold of acceptance.
I'd also likely give pushback on the notion of
Can someone please help with the logic that can be applied to decide on the
closeness requirement given below (like 50% matching). This matching is a
pure text matching.
Since the current Lucene score does not translate into a percentage of
closeness, is there anything else that can give this info?
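
One possible way to get a number that reads as a "percentage of closeness"
is to compute it outside of Lucene scoring, for example as the Jaccard
overlap between the word shingles (word n-grams) of two texts. The sketch
below is only an illustration of that idea in plain Java, not something
from Lucene or Solr: the class name ShingleSimilarity, the method names, the
shingle size of 2 and the 0.5 threshold are all assumptions made for the
example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: Jaccard similarity over word shingles.
final class ShingleSimilarity {

    // Split text into lowercase word tokens, dropping empty strings.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // Build the set of n-word shingles, e.g. n = 2 gives "the quick", "quick brown", ...
    static Set<String> shingles(String text, int n) {
        List<String> words = tokens(text);
        Set<String> result = new HashSet<>();
        for (int i = 0; i + n <= words.size(); i++) {
            result.add(String.join(" ", words.subList(i, i + n)));
        }
        return result;
    }

    // Jaccard similarity: |A intersect B| / |A union B|, in the range 0.0 to 1.0.
    static double jaccard(String a, String b, int n) {
        Set<String> sa = shingles(a, n);
        Set<String> sb = shingles(b, n);
        Set<String> union = new HashSet<>(sa);
        union.addAll(sb);
        if (union.isEmpty()) {
            return 0.0; // assumption: two texts with no shingles count as not similar
        }
        Set<String> intersection = new HashSet<>(sa);
        intersection.retainAll(sb);
        return (double) intersection.size() / union.size();
    }

    // True when the similarity is strictly greater than the threshold (e.g. 0.5 for 50%).
    static boolean matchesMoreThan(String a, String b, int n, double threshold) {
        return jaccard(a, b, n) > threshold;
    }

    public static void main(String[] args) {
        double sim = jaccard("the quick brown fox", "the quick brown dog", 2);
        System.out.println("similarity = " + sim);
    }
}
```

Comparing a new document against every existing one this way does not
scale well, so in practice a normal Lucene query could be used first to
pick a small set of candidate documents, with the percentage computed only
for those candidates.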
Hi All,
I need your help to understand how I can have Lucene applied to the
following business scenario. Question is in RED
*Business Scenario:*
Analyze newly created document "A" with existing documents in the system and
if document A matches more than (similar to) 50% with any of the existing
d