Hi,

You could look into how plagiarism detectors work.

My approach would be:
1. Hash all the keywords in one file and keep a count for each.
2. For each keyword in the other file, check whether it exists in the hash
table with a nonzero count; if so, decrement its count and increment a
counter that represents the similarity between the two docs.

For a percentage, you can also count the total keywords in the second doc
and compute "found keywords" / "total keywords".
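A minimal Python sketch of the steps above. It assumes a "keyword" is any lowercased run of letters or digits (no stop-word filtering). The names `tokenize` and `keyword_similarity` are hypothetical, and `collections.Counter` plays the role of the hash table:

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a "keyword" is any run of letters/digits, lowercased.
    # A real tool would also drop stop words such as "the" or "and".
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_similarity(doc_a, doc_b):
    # Step 1: hash every keyword of the first document with its count.
    counts = Counter(tokenize(doc_a))
    words_b = tokenize(doc_b)
    found = 0
    # Step 2: for each keyword of the second document, if an occurrence is
    # still available in the table, consume it and count one match.
    for word in words_b:
        if counts[word] > 0:  # Counter returns 0 for missing keys
            counts[word] -= 1
            found += 1
    # Percentage = found keywords / total keywords in the second document.
    if not words_b:
        return 0.0
    return 100.0 * found / len(words_b)


print(keyword_similarity("the cat sat on the mat", "the cat sat"))
```

Note that the score is relative to the second document and ignores word order, so "dog bites man" and "man bites dog" come out as a 100% match.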

On Wed, Jul 6, 2011 at 11:41 AM, Navneet Gupta <[email protected]> wrote:

> See the diff documentation. It's an application of the Longest Common
> Subsequence problem.
> http://en.wikipedia.org/wiki/Diff
>
> On Wed, Jul 6, 2011 at 11:12 AM, priyanshu <[email protected]>
> wrote:
> > What is the most efficient way to compare two text documents? We also
> > need to find the percentage by which they match.
> >
> > Thanks,
> > priyanshu
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> "Algorithm Geeks" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> [email protected].
> > For more options, visit this group at
> http://groups.google.com/group/algogeeks?hl=en.
> >
> >
>
>
>
> --
> Navneet
>
>
>
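For comparison, a minimal sketch of the LCS idea from the quoted reply, applied to whitespace-separated words rather than to lines (which is what diff compares). The score 2 * LCS / (len(a) + len(b)) is an assumption, chosen so identical documents score 100; `lcs_length` and `lcs_similarity` are hypothetical names:

```python
def lcs_length(a, b):
    # Classic dynamic programming for the Longest Common Subsequence,
    # keeping only the previous row so memory is O(len(b)).
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def lcs_similarity(doc_a, doc_b):
    # Assumption: compare word by word, not line by line like diff.
    a, b = doc_a.split(), doc_b.split()
    if not a and not b:
        return 100.0
    # Assumed score: 2 * LCS / total words, so identical docs give 100.
    return 100.0 * 2 * lcs_length(a, b) / (len(a) + len(b))


print(lcs_similarity("a b c d", "a c d e"))
```

Unlike the keyword count, this respects word order, at the cost of O(n*m) time for documents of n and m words.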


-- 
regards,
chinna.
