You're not using any type of phrase search. Try ->
( (title:"John Bush"^4.0) OR (body:"John Bush") ) AND ( (title:John^4.0
body:John) AND (title:Bush^4.0 body:Bush) )
or maybe
( (title:"John Bush"~4^4.0) OR (body:"John Bush"~4) ) AND (
(title:John^4.0 body:John) AND (title:Bush^4.0 body:Bush) )
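
If you'd rather generate these query strings than type them by hand, something like the following might do. It is only a sketch: the helper names (field_clause, phrase_and_terms_query) are made up for illustration, the field names title and body come from the queries above, and it only builds the text you would hand to the query parser.

```python
def field_clause(field, text, boost=None, slop=None):
    """Return one query-syntax clause, e.g. title:"John Bush"~4^4.0."""
    # Multi-word text is quoted so the parser treats it as a phrase.
    term = f'"{text}"' if " " in text else text
    if slop is not None:
        term += f"~{slop}"      # phrase slop, as in "John Bush"~4
    if boost is not None:
        term += f"^{boost}"     # field boost, as in ^4.0
    return f"{field}:{term}"


def phrase_and_terms_query(phrase, title_boost=4.0, slop=None):
    """Require every word, and additionally require the phrase in title or body."""
    phrase_part = " OR ".join([
        f"({field_clause('title', phrase, title_boost, slop)})",
        f"({field_clause('body', phrase, slop=slop)})",
    ])
    term_parts = [
        f"({field_clause('title', w, title_boost)} {field_clause('body', w)})"
        for w in phrase.split()
    ]
    return f"( {phrase_part} ) AND ( {' AND '.join(term_parts)} )"


print(phrase_and_terms_query("John Bush"))
print(phrase_and_terms_query("John Bush", slop=4))
```

The first call gives the exact-phrase form, the second the sloppy-phrase form with ~4.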
--
Jason Pump
Technical Architect
Healthline
660 Third Street, Ste. 100
San Francisco, CA 94107
direct dial 415.281.3133
cell 510.812.1784
www.healthline.com
09 F9 11 02 9D 74 E3 5B D8 41 5
If the documents have some sort of fixed ranking value (pageweight) and
the documents are arranged in the index in that order then at some point
you can say there is no reason to look for more matches, e.g. even if
the words were next to each other in query order, the document couldn't
possibly
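
A rough sketch of that cut-off idea, under assumptions that are mine and not from the thread: the final score is pageweight times a text score, the text score never exceeds max_text_score (the "words next to each other in query order" best case), and the candidates arrive sorted by descending pageweight. The function name and tuple layout are hypothetical.

```python
import heapq


def top_k_with_early_stop(postings, k, max_text_score=1.0):
    """postings: (doc_id, pageweight, text_score) tuples, highest pageweight first.

    Returns the k best (score, doc_id) pairs and how many documents were scored.
    """
    heap = []       # min-heap holding the k best (score, doc_id) seen so far
    examined = 0
    for doc_id, pageweight, text_score in postings:
        # Best case for this document and, because of the ordering, for all later ones.
        if len(heap) == k and pageweight * max_text_score <= heap[0][0]:
            break   # nothing further down the list can enter the top k
        examined += 1
        score = pageweight * text_score
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), examined


docs = [("a", 10.0, 0.5), ("b", 8.0, 1.0), ("c", 4.0, 1.0), ("d", 3.0, 1.0)]
print(top_k_with_early_stop(docs, k=2))
```

With these four made-up documents the loop stops at "c": even a perfect text match there scores 4.0, which cannot beat the current second place at 5.0.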
documents
places on them and how much effort he thinks that a hacker might be
prepared to put into recovering the text.
The best you're ever going to do is to protect the index as well as
you do the original documents.
jch
----
If you store a hash code of the word rather than the actual word, you
should be able to search for stuff but not be able to actually retrieve
it; you can trade precision for "security" based on the number of bits
in the hash code (e.g. 32 or 64 bits). I'd think a 64-bit hash would be
a reasonab
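
To make the hashing idea concrete, here is a minimal sketch with a plain dict standing in for the index. The choice of SHA-256 truncated to the requested number of bits is my assumption, as are the function names; the point is only that the index keeps hashes, so a query is hashed the same way before lookup.

```python
import hashlib


def term_hash(word, bits=64):
    """Hash a word down to `bits` bits (bits must be between 1 and 64 here)."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)


def build_index(docs, bits=64):
    """Map term hashes to sets of document ids; the words themselves are not stored."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(term_hash(word, bits), set()).add(doc_id)
    return index


def search(index, word, bits=64):
    """Look up a word by hashing it the same way the indexer did."""
    return index.get(term_hash(word, bits), set())


docs = {1: "apple banana", 2: "banana orange"}
index = build_index(docs)
print(search(index, "banana"))
```

Fewer bits means more distinct words share a hash, so searches return more false hits; that is the precision being traded away.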
uery is "apple banana orange". The word "apple" is near the start of
the document, "banana" and "orange" at the end. Wouldn't your optimization
stop at the word "apple" and just return this word highlighted? Or do you
know of a way to quantify th
Renaud, one optimization you can do on this is to try the first 10kb,
see if it finds text worth highlighting, if not, with a slight overlap
try the next 9.9kb - 19.9kb or just 9.9kb -> end if you're feeling lazy.
This assumes that most good matches are at the start of the document,
and that th
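
The windowing described here could look roughly like this. It is a sketch only: a plain substring test stands in for the real highlighter, sizes are counted in characters rather than bytes, and the function name and defaults (10,000 with an overlap of 100, which gives the 9.9kb step) are my own reading of the numbers above.

```python
def find_first_window_with_match(text, terms, window=10_000, overlap=100):
    """Return (start, end) of the first window containing any term, else None."""
    lowered_terms = [t.lower() for t in terms]
    step = window - overlap          # the next window re-reads the last `overlap` characters
    start = 0
    while start < len(text):
        end = min(start + window, len(text))
        chunk = text[start:end].lower()
        if any(t in chunk for t in lowered_terms):
            return start, end        # hand only this slice to the highlighter
        if end == len(text):
            break                    # reached the end without a match
        start += step
    return None


text = "x" * 15_000 + " banana " + "x" * 5_000
print(find_first_window_with_match(text, ["banana"]))
```

The overlap is there so that a match straddling a window boundary is not lost.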
s
are normalized as follows: ALL CAP words are prepended with a_ and
Capitalized words are prepended with c_ after downcasing. Digits are all
replaced with 0.
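
As I read it, that normalization could be sketched as below. The function name is hypothetical, and treating a single capital letter such as "I" as Capitalized rather than ALL CAP is my assumption; the description does not say.

```python
import re


def normalize_token(token):
    """a_ prefix for ALL CAP words, c_ for Capitalized ones, every digit becomes 0."""
    token = re.sub(r"\d", "0", token)     # replace each digit with 0
    lowered = token.lower()
    if len(token) > 1 and token.isupper():
        return "a_" + lowered             # ALL CAP word
    if token[:1].isupper():
        return "c_" + lowered             # Capitalized word
    return lowered


for word in ["NASA", "Lucene", "index", "2006"]:
    print(word, "->", normalize_token(word))
```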
Cheers,
Boris
On 8/30/06, Jason Pump <[EMAIL PROTECTED]> wrote:
Is there a large list of words and their frequency in the English
language? Obviously it would differ by corpus, but I would like to see
what's already available.
It's a string comparison. Making the "5" a "05" would be a simple workaround.
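
A quick illustration of why the padding helps, in plain Python rather than Lucene. The helper name and the width of 2 are just for the example; the width has to cover the largest value you index, and this assumes non-negative integers.

```python
def pad_number(value, width=2):
    """Left-pad a non-negative integer with zeros so string order matches numeric order."""
    return str(value).zfill(width)


values = [5, 10, 7]

# Compared as strings, "10" sorts before "5" because '1' < '5'.
raw = sorted(str(v) for v in values)            # ['10', '5', '7']

# With padding, string order and numeric order agree.
padded = sorted(pad_number(v) for v in values)  # ['05', '07', '10']

print(raw)
print(padded)
```

The same padding has to be applied both when indexing the value and when building the range bounds.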
Jason
Peter W. wrote:
Hello,
I'm trying to do a numerical search for a property in Lucene using
RangeFilter.Less
without using both RangeQuery and test cases.
Here's the code that I expect would return one hit :
(ad
I would think what you want to do is index on the stem, and rank on the
stem and the original form. After all, if you match exactly, then you
had better match on the stem too.
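
Roughly what I mean, as a toy sketch. The suffix stripper below is a made-up stand-in, not the EnglishStemmer, and the scoring weights are arbitrary assumptions: a stem match earns a point and an exact match on the original form earns a bonus on top.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_tokens(text):
    """Keep both forms per token: (stem, original), as two parallel fields would."""
    return [(toy_stem(w), w.lower()) for w in text.split()]


def score(query_word, doc_text, exact_bonus=1.0):
    """Match on the stem; add a bonus when the original form also matches."""
    q_stem, q_exact = toy_stem(query_word), query_word.lower()
    total = 0.0
    for stem, original in index_tokens(doc_text):
        if stem == q_stem:
            total += 1.0
            if original == q_exact:
                total += exact_bonus
    return total


print(score("searching", "searching the index"))
print(score("searching", "searched the index"))
```

Both documents are found through the stem, but the one containing the exact form ranks higher.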
Robert Haycock wrote:
Hi,
I started using the EnglishStemmer and noticed that only the stem gets
added to the index. I woul