Re: product based term combination for BooleanQuery?

2007-07-03 Thread Jason Pump
You're not using any type of phrase search. Try -> ( (title:"John Bush"^4.0) OR (body:"John Bush") ) AND ( (title:John^4.0 body:John) AND (title:Bush^4.0 body:Bush) ) or maybe ( (title:"John Bush"~4^4.0) OR (body:"John Bush"~4) ) AND ( (title:John^4.0 body:John) AND (title:Bush^4.0 body:Bush

Re: Language detection library

2007-05-03 Thread Jason Pump

OT re Emulating Pages Search

2007-04-03 Thread Jason Pump
If the documents have some sort of fixed ranking value (pageweight) and the documents are arranged in the index in that order then at some point you can say there is no reason to look for more matches, e.g. even if the words were next to each other in query order, the document couldn't possibly
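
A minimal sketch of that early-termination idea, under assumptions that are not in the post: the final score is taken to be pageweight plus a text score, and the text score has a known upper bound (the "words next to each other in query order" best case). The names top_k, text_score and max_text_score are hypothetical.

```python
import heapq

def top_k(docs, text_score, k, max_text_score):
    """Scan docs in descending pageweight order and stop early.

    docs: iterable of (doc_id, pageweight), sorted by pageweight descending.
    text_score: function doc_id -> score in [0, max_text_score] (assumed).
    Returns (best (score, doc_id) pairs, number of documents scored).
    """
    heap = []      # min-heap holding the k best (score, doc_id) pairs so far
    scanned = 0
    for doc_id, pageweight in docs:
        # Even a perfect text match could not beat the current k-th best,
        # and every later document has a pageweight at most this large.
        if len(heap) == k and pageweight + max_text_score <= heap[0][0]:
            break
        scanned += 1
        score = pageweight + text_score(doc_id)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), scanned
```

The cutoff is only safe if the bound on the text score really holds; with a different scoring formula the stopping test would have to change accordingly.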

Re: Index a source, but not store it... can it be done?

2007-03-09 Thread Jason Pump
documents places on them and how much effort he thinks that a hacker might be prepared to put into recovering the text. The best you're ever going to do is to protect the index as well as you do the original documents. jch ----

Re: Index a source, but not store it... can it be done?

2007-03-08 Thread Jason Pump
If you store a hash code of the word rather than the actual word you should be able to search for stuff but not be able to actually retrieve it; you can trade precision for "security" based on the number of bits in the hash code ( e.g. 32 or 64 bits). I'd think a 64 bit hash would be a reasonab
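
A rough sketch of the hashed-term idea, not tied to Lucene: each word is replaced by a truncated hash before it goes into a toy inverted index, and queries are hashed the same way. The choice of BLAKE2b and the function names (term_hash, build_index, search) are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

def term_hash(word, bits=64):
    """Hash a lowercased word down to `bits` bits (a multiple of 8, at most 512)."""
    digest = hashlib.blake2b(word.lower().encode("utf-8"),
                             digest_size=bits // 8).digest()
    return int.from_bytes(digest, "big")

def build_index(docs, bits=64):
    """Map hashed terms to the ids of the documents containing them."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[term_hash(word, bits)].add(doc_id)
    return dict(index)

def search(index, word, bits=64):
    """Look a word up by its hash; the index never holds the word itself."""
    return index.get(term_hash(word, bits), set())
```

Fewer bits means more distinct words share a hash, so searches return more false positives; that is the precision trade-off mentioned above.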

Re: Text storing design and performance question

2007-01-11 Thread Jason Pump
uery is "apple banana orange". The word "apple" is near the start of the document, "banana" and "orange" at the end. Wouldn't your optimization stop at the word "apple" and just return this word highlighted? Or do you know of a way to quantify th

Re: Text storing design and performance question

2007-01-10 Thread Jason Pump
Renaud, one optimization you can do on this is to try the first 10kb and see if it finds text worth highlighting; if not, try the next window with a slight overlap (9.9kb - 19.9kb), or just 9.9kb -> end if you're feeling lazy. This assumes that most good matches are at the start of the document, and that th
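
The windowing scheme could be sketched roughly as follows. This is plain substring matching rather than a real highlighter, and the function name and defaults (10,000-character window, 100-character overlap) are assumptions chosen to mirror the 10kb / 9.9kb figures.

```python
def find_highlight_window(text, terms, window=10_000, overlap=100):
    """Return (start, end) of the first window containing any term, else None.

    Windows advance by window - overlap, so with the defaults the second
    window covers 9,900..19,900. The overlap should be at least as long as
    the longest term, or a term straddling a boundary can be missed.
    """
    lowered = [t.lower() for t in terms]
    step = window - overlap
    start = 0
    while start < len(text):
        end = min(start + window, len(text))
        chunk = text[start:end].lower()
        if any(t in chunk for t in lowered):
            return start, end          # hand only this slice to the highlighter
        if end == len(text):
            break
        start += step
    return None
```

As written, the search stops at the first window that contains any one of the terms, so a document with one term early and the rest much later would only have the early window highlighted.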

Re: word frequency list?

2006-08-31 Thread Jason Pump
s are normalized as follows: ALL CAP words are prepended with a_ and Capitalized words are prepended with c_ after downcasing. Digits are all replaced with 0. Cheers, Boris On 8/30/06, Jason Pump <[EMAIL PROTECTED]> wrote: Is there a large list of words and their frequency in the engl
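
One way to read that normalization rule, as a small sketch. How single capital letters and mixed tokens are treated is an assumption here, since the snippet does not say; the function name is hypothetical.

```python
import re

def normalize(token):
    """Downcase a token, mark its original casing, and collapse digits to 0.

    Assumption: a token of two or more characters whose letters are all
    uppercase counts as ALL CAPS; otherwise a leading uppercase letter
    counts as Capitalized (so a lone "A" gets the c_ marker).
    """
    marker = ""
    if len(token) > 1 and token.isupper():
        marker = "a_"
    elif token[:1].isupper():
        marker = "c_"
    return marker + re.sub(r"\d", "0", token.lower())
```

Under these assumptions "USA" becomes "a_usa", "Apple" becomes "c_apple", and "1999" becomes "0000".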

word frequency list?

2006-08-30 Thread Jason Pump
Is there a large list of words and their frequency in the English language? Obviously it would differ by corpus but I would like to see what's already available.

Re: search with RangeFilter.Less

2006-06-28 Thread Jason Pump
It's a string comparison. Making the "5" a "05" would be a simple workaround. Jason Peter W. wrote: Hello, I'm trying to do a numerical search for a property in Lucene using RangeFilter.Less without using both RangeQuery and test cases. Here's the code that I expect would return one hit : (ad
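
To illustrate why the padding helps, here is a small sketch that does not call Lucene at all; it only shows how string ordering differs from numeric ordering. The pad helper and the sample values are made up for the example, and the fixed width of 2 is an assumption.

```python
def pad(value, width=2):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if value < 0:
        raise ValueError("negative values need a different encoding")
    text = str(value)
    if len(text) > width:
        raise ValueError("value does not fit in the chosen width")
    return text.zfill(width)

values = [5, 10, 7, 42]

# Compared as strings, "5" sorts after "10" because '5' > '1'.
unpadded_below_10 = [v for v in values if str(v) < "10"]

# With a fixed width, "05" < "10" holds, matching the numeric order.
padded_below_10 = [v for v in values if pad(v) < pad(10)]
```

The width has to cover the largest value ever indexed, and the same padding has to be applied both when indexing and when building the filter bound.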

Re: Adding stem AND original term

2006-06-28 Thread Jason Pump
I would think what you want to do is index on the stem, and rank on the stem and the original form. After all, if you match exactly, then you'd better match on the stem too. Robert Haycock wrote: Hi, I started using the EnglishStemmer and noticed that only the stem gets added to the index. I woul
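
A toy sketch of "index on the stem, rank on the stem and the original form". The suffix-stripping toy_stem is a hypothetical stand-in, not the EnglishStemmer, and the scoring (one point per stem match plus a bonus per exact-form match) is an assumption for illustration.

```python
from collections import defaultdict

def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few suffixes."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_stem_index(docs):
    """Index by stem, remembering the original forms seen in each document."""
    index = defaultdict(lambda: defaultdict(list))  # stem -> doc_id -> forms
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[toy_stem(word)][doc_id].append(word)
    return index

def search_stems(index, query, exact_bonus=1.0):
    """Match on the stem; rank documents with the exact original form higher."""
    term = query.lower()
    scores = {}
    for doc_id, forms in index.get(toy_stem(term), {}).items():
        scores[doc_id] = len(forms) + exact_bonus * forms.count(term)
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Every exact match is also a stem match, so the exact form only ever adds to the score and never changes which documents are returned.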