Hello all,
I have been trying out the MoreLikeThis and many other similarity types of
queries, but still run into problems with content not being matched up.
Let me give an example, as well as some questions that hopefully someone can
answer to help me refine my work.
Example:
1) Document A m
: "This" has index 1, "is" has index 2, "an" has index 3, "Example" has index 4.
: What I have is the actual "character position" in the original text. "This"
in that case, you'll have to do a while loop over next() calls and check
the startOffset (or endOffset) of each until you find the one you are
I thought that went to the "index" of the token. I may not understand it
completely, but this is how I currently view the TokenStream.
For example if my text was the following:
This is an Example
"This" has index 1, "is" has index 2, "an" has index 3, "Example" has index 4.
What I have is the actual "c
Intentionally copied the subject line of this thread (from last August), and an
email from the thread is attached at the end of this email -
I ran into similar problems in custom sorting (memory leak due to caching) -
the subject has been well discussed in the thread but just want to add a voic
: I never got a response to this and thought maybe I was too wordy.
:
: I'm wondering if there's a way where given a position in the original text
: you can retrieve the token index that is nearest to that position using the
: StandardToken/StandardTokenizer classes?
I may not be understanding the
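The lookup asked about above (character position to nearest token index) can be sketched without Lucene. This is only an illustration of the idea of walking the tokens in order and comparing offsets. The functions `tokenize` and `nearest_token_index` are hypothetical names, the regular-expression tokenizer is an assumption standing in for StandardTokenizer, and the 1-based indexes follow the "This is an Example" numbering used in this thread.

```python
import re


def tokenize(text):
    """Return (index, term, start, end) tuples for each word in text.

    Assumption: a simple \\w+ tokenizer stands in for a real analyzer.
    Indexes are 1-based; end is exclusive, like a character end offset.
    """
    return [
        (i, m.group(), m.start(), m.end())
        for i, m in enumerate(re.finditer(r"\w+", text), start=1)
    ]


def nearest_token_index(text, position):
    """Return the index of the token nearest to a character position.

    Walks the tokens in order. A position inside a token returns that
    token immediately; otherwise the token whose first or last character
    is closest wins (the earlier token wins a tie). Returns None when the
    text contains no tokens.
    """
    best_index, best_distance = None, None
    for index, _term, start, end in tokenize(text):
        if start <= position < end:
            return index
        distance = min(abs(position - start), abs(position - (end - 1)))
        if best_distance is None or distance < best_distance:
            best_index, best_distance = index, distance
    return best_index
```

With the text "This is an Example", a character position inside "Example" maps to token index 4. In a real analyzer chain the offsets would come from each token's start and end offsets rather than from a regular expression.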
: A possible solution is store in a document object two fields: the original
: and the lowercased. I use the last one to make the query, and the other one
: to show the results. It works, but it doesn't smell good!
If your analyzer is what does the lowercasing, then you don't need two
separate fi
Hi Erick, Jiye,
Thanks for your help!
My index is very small (less than 2 MB), so I am not worried about it! I will
index it twice!
Thanks again!
[]s
Eloi
On 7/6/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
I flat guarantee that if you try to search on fields that are indexed
mixed case, y
I flat guarantee that if you try to search on fields that are indexed
mixed case, you'll have no end of grief. Everything from
mis-typed search requests to the same word being cased
differently in different parts of the source to ..
Your idea to index it twice is actually a solution that is
You may store the original text in the doc without indexing it, and index the
lowercase version without storing it. This may save you some space and time.
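The store-one-form, index-the-other idea can be shown with a toy inverted index. This is a minimal sketch and not Lucene code: the class `TinyIndex` and its methods are hypothetical, and whitespace splitting is an assumption standing in for a lowercasing analyzer. Terms are lowercased for matching only, while the original text is kept for display.

```python
from collections import defaultdict


class TinyIndex:
    """Toy index: lowercased terms are searchable, original text is stored."""

    def __init__(self):
        self._postings = defaultdict(set)  # lowercased term -> set of doc ids
        self._stored = {}                  # doc id -> original, unmodified text

    def add(self, doc_id, text):
        # Keep the original casing for display only.
        self._stored[doc_id] = text
        # Index the lowercased terms; the lowercased text itself is not stored.
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return the stored originals of documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return [self._stored[i] for i in sorted(ids)]
```

A query such as "pp-trip negativa" would then match a document added as "PP-Trip SubAlcance Seq Negativa" and return it with its original casing. The query has to be lowercased the same way as the indexed terms, which is why the same analysis is normally applied at index time and at query time.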
Eloi Rocha Neto wrote:
Hi Daniel,
I don't lowercase the field at index time, because I have to show the
results exactly as they were found.
F
Hi Daniel,
I don't lowercase the field at index time, because I have to show the
results exactly as they were found.
For instance:
Some fields indexed:
PP-Trip SubAlcance Seq Negativa
PP-Trip SubAlcance Seq Positiva
PS-Trip SubAlcance Seq Negativa
PS-Trip SubAlc
Warning, I don't know much about JCA Connector.
That said, index modifications aren't visible to a searcher
until the *searcher* is closed and re-opened. Which sounds
suspiciously like what would happen when the thread
terminates.
This may be totally off base, but sounds like a place to look...
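The visibility rule described above can be pictured as a searcher holding a point-in-time snapshot. The sketch below is a toy model, not Lucene or JCA code: `ToyIndex` and `ToySearcher` are hypothetical classes, and copying the document list is an assumption used only to mimic "what the searcher saw when it was opened".

```python
class ToySearcher:
    """Searches a frozen snapshot taken when the searcher was opened."""

    def __init__(self, snapshot):
        self._snapshot = snapshot

    def search(self, word):
        # Only documents present at open time can ever be returned.
        return [doc for doc in self._snapshot if word in doc.split()]


class ToyIndex:
    """Toy index whose searchers never see later modifications."""

    def __init__(self):
        self._docs = []

    def add(self, doc):
        self._docs.append(doc)

    def open_searcher(self):
        # Copy the current state; later add() calls do not affect this copy.
        return ToySearcher(tuple(self._docs))
```

In this model, a searcher opened before a document is added keeps returning the old results, and only a newly opened searcher sees the addition. That matches the behavior described in the reply: a long-lived searcher has to be closed and reopened after the index changes.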
Hello,
I have the following problem:
I have written a Lucene JCA Connector which also takes care of the
index maintenance. From time to time the connector
is called (time-initiated) and verifies whether the index is still in sync
with the filesystem (deleted, added, or updated documents).
If something c
FYI: Solr has a nice Analysis debugging tool that lets you see the
results of running an analysis as it passes through each phase of the
Analyzer. Some enterprising soul might want to make a contribution
along these lines that could be added to the contrib. :-)
Cheers,
Grant
On Jul 3, 2