Advice on comparing two documents. Summary This project is not a search engine but a semantic comparison between two documents. The purpose of this application is to assist users in modifying the text in a document to improve the relevancy rank of the document to another document. For example, the user would want to compare Document A to Document B to identify the text in Document A that has relevancy to Document B. Then, the user would want the ability to identify the text to modify to improve the relevancy rating. Description: Both documents are XML with tags identifying the keywords or blocks of text in the document.
Sample Structure Document A <DocumentName>DocumentA</DocumentName> <Keyword>This is keyword 1</Keyword> <Keyword>Keywords can be any length</Keyword> <Keyword>Some keywords will match Document B</Keyword> <Keyword>Some keywords will not match</Keyword> <Keyword>Keywords can contain text, numbers, and symbols</Keyword> Document B <DocumentName>DocumentB</DocumentName> <Keyword>This is Document B keyword 1</Keyword> <Keyword>Document B serves as the basis or standard for comparing</Keyword> <Keyword>Document A will be modified by the user to match the keywords in Document B</Keyword> <Keyword>Document A and Document B will always be compared to each other</Keyword> <Keyword>This application is to help users add text, numbers and symbols to improve their relevancy ranking</Keyword> We believe we need to use Lucene to do semantic searches to determine relevance. Our preferred output would be to show a user the words from each document with their relevancy. To remove excessive data, the output would show all keywords from Document B, and only those with a relevancy ranking from Document A. Sample Output Document B Document A Relevancy This is Document B keyword 1 This is keyword 1 .25 This is Document B keyword 1 Keywords can be any length .25 This is Document B keyword 1 Some keywords will match Document B .25 This is Document B keyword 1 Some keywords will not match .25 This is Document B keyword 1 Keywords can contain text, numbers, and symbols .25 Document B serves as the basis or standard for comparing Some keywords will match Document B .5 Document A will be modified by the user to match the keywords in Document B This is keyword 1 .1 Document A will be modified by the user to match the keywords in Document B Keywords can be any length .1 Document A will be modified by the user to match the keywords in Document B Some keywords will not match .1 Document A will be modified by the user to match the keywords in Document B Some keywords will match Document B .75 Document A will be modified by the user to match the keywords in Document B Keywords can contain text, numbers, and symbols .1 This application is to help users add text, numbers and symbols to improve their relevancy ranking Keywords can contain text, numbers, and symbols .9 Jon Ciampi Mobile (415) 990-3151