Re: interpreting scores

2009-05-08 Thread Nate
Wow Karl, thank you so much for writing this up! It was a great help! I have the ngram tokenizing working as you described. Searches are very good! In order to verify the hits are of high quality, I use the Smith-Waterman algorithm. Other approximate string comparisons I evaluated didn't work well
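
For readers who have not met the Smith-Waterman algorithm mentioned above, here is a minimal sketch of its local-alignment score in Python. The function name and the scoring values (match=2, mismatch=-1, gap=-1) are assumptions chosen for illustration; the thread does not say which parameters or implementation Nate used.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between strings a and b.

    Scoring values are illustrative assumptions, not taken from the thread.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for ca in a:
        curr = [0]  # first column is always 0
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A cell never drops below 0: that is what makes the alignment local.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(smith_waterman_score("abc", "abc"))  # 6: three matches at 2 points each
print(smith_waterman_score("abc", "xyz"))  # 0: no characters in common
```

A higher score means a longer or cleaner shared region, so a candidate hit could be rejected when its score against the query falls below some chosen fraction of the best possible score.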

Re: interpreting scores

2009-05-08 Thread Karl Wettin
On 8 May 2009 at 13:13, Nate wrote: Is it possible to get a count for how many terms a result matched? Currently I think you can only do that by using Searcher.explain(), but that is not a very nice solution. A better solution is being worked on and might be available in a few months or so.

Re: interpreting scores

2009-05-08 Thread Karl Wettin
Ngrams can be used for lots of stuff. In your case it has nothing to do with spellchecking; it was the "until" vs. "'till" that made me think of them, as they would allow you to get at least partial matching of the text. Also, ngrams give you a bit of phrase functionality. Create the grams

Re: interpreting scores

2009-05-08 Thread Nate
Is it possible to get a count for how many terms a result matched? Googling, it doesn't appear to be done easily. I tried it out by breaking my query into words myself, then doing a search for each one and keeping track of the results and counts. This way I know if 4 out of 5 terms matched a docume
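
The per-word approach described above could look roughly like the following Python sketch. It is not Lucene code: `search_term` is a hypothetical callback standing in for "run one search for this word and return the matching document ids", and the toy index and file names are invented for illustration.

```python
from collections import Counter


def count_matched_terms(query, search_term):
    """Run one search per distinct query word and count hits per document.

    search_term is a hypothetical callback: word -> iterable of document ids.
    Returns (Counter of doc id -> matched word count, number of distinct words).
    """
    counts = Counter()
    terms = set(query.lower().split())
    for term in terms:
        # set() guards against a document being reported twice for one word.
        for doc_id in set(search_term(term)):
            counts[doc_id] += 1
    return counts, len(terms)


# Hypothetical toy index standing in for the real search engine.
docs = {
    "a.mp3": "don't stop till you get enough",
    "b.mp3": "stop in the name of love",
}


def toy_search(term):
    return [doc_id for doc_id, text in docs.items() if term in text.split()]


counts, total = count_matched_terms("stop till you get enough", toy_search)
for doc_id, matched in counts.most_common():
    print(f"{doc_id}: {matched} of {total} terms matched")
```

This costs one search per query word, which is the overhead the question is trying to avoid, but it makes the "4 out of 5 terms" figure explicit.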

Re: interpreting scores

2009-05-07 Thread Nate
Hi Karl, No, sometimes there will not be a matching MP3 for a note file. When this happens, the results I get are very poor. For example, if a song with a common song word like "love" in the name does not have a matching note file, then I get a handful of results that contain the word "love" but a

Re: interpreting scores

2009-05-07 Thread Karl Wettin
Nate, will there always be a corresponding mp3 for any given note sheet? As for analysis, I'd try using ngrams of the complete untokenized file name if I were you. "Michael Jackson Don't Stop 'till You Get Enough" -> "^mic", "mich", "icha", "chae", "hael", "ael ", "el j", "l ja", and so on
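
The gram list above can be reproduced with a short Python sketch. The lowercasing and the "^" start marker follow the example in the message; the function name, the default gram size of 4, and the absence of an end marker are assumptions, since the message is cut off before it says how the string ends.

```python
def char_ngrams(text, n=4, start_marker="^"):
    """Return overlapping character n-grams of the whole, untokenized text.

    The text is lowercased and prefixed with a start marker, as in the
    example above. No end marker is added (an assumption).
    """
    s = start_marker + text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


grams = char_ngrams("Michael Jackson Don't Stop 'till You Get Enough")
print(grams[:8])
# ['^mic', 'mich', 'icha', 'chae', 'hael', 'ael ', 'el j', 'l ja']
```

Because the grams span word boundaries ("ael ", "el j"), two file names that share a run of words also share a run of grams, which is where the partial phrase matching comes from.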

Re: interpreting scores

2009-05-06 Thread Nate
Thanks Anshum. What happens if a search returns only one match, and that match is not very "good"? If scores are only comparable to the scores of other matches in the same search, then the score is effectively meaningless if there is only one match. It seems like a very common need to want to pro

Re: interpreting scores

2009-05-06 Thread Anshum
Hi Nate, The scores are only comparable within the same search and not across different searches, as the scores are affected by the query as well as the docs. About the threshold, I guess you could have a count cutoff to get the 'x' best matches. I say so because I'm not really able to recollect anything which could use
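
A count cutoff of this kind might be sketched as follows in Python. The `(doc_id, score)` pair shape, the function name, and the sample scores are assumptions for illustration; the scores are only ranked against each other within one result list, in line with the point above that they are not comparable across searches.

```python
import heapq


def top_matches(hits, x):
    """Keep the x highest-scoring hits from a single search.

    hits is an iterable of (doc_id, score) pairs (a hypothetical shape).
    """
    return heapq.nlargest(x, hits, key=lambda hit: hit[1])


# Invented sample scores from one search.
hits = [("a.mp3", 0.2), ("b.mp3", 0.9), ("c.mp3", 0.5)]
print(top_matches(hits, 2))
# [('b.mp3', 0.9), ('c.mp3', 0.5)]
```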