I have a need to implement an custom inverted index in Lucene. I have files like the ones I have attached here. The Files have words and and scores assigned to that word. There will 100's of such files. Each file will have atleast 50000 such name value pairs.
Note: Currently the file only shows 10s of such name value pairs. But My real production data will have 50000 plus name value pairs in file. Currently I index the data using Lucene's Inverted Index. The query that is being execute against the Index has 100 Words. When the query is excuted against the index the result is returned in 100 milli seconds or so. Problem: Once i have the results of the query, I have to go through each file (for ex. attached file one). Then for each word in the user input query, I have to compute the total score. Doing this against 100's of files and 100's of keywords is causing the score computation to be slow i.e. about 3-5seconds. I need help resolving the above problem so that score computation takes less than 200Milli Seconds or so. One Resolution I was thinking is modifying the Lucene Source Code for creating inverted index. In this index we store the score in the index itself. When the results of the query are returned, we will get the scores along with the file names, there by eleminating the need to search the file for the keyword and corresponding score. I need to compute the total of all scores that belong to one single file. I am also open to any other ideas that you may have. Any suggestions regarding this will be very helpful. a.
--------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org