, one sentence per
row, and they can be searched with MySQL's full-text search feature. With a
database, it is also easy to tell which document a matched sentence
belongs to.
AJ
On 2/6/06, Marc Hadfield <[EMAIL PROTECTED]> wrote:
Hi AJ -
Depending on your need, you could
o to keep in index), expected query
performance, and so on.
---marc hadfield
AJ Chen wrote:
I'd appreciate any advice on whether Lucene is appropriate for indexing and searching
sentences. I have millions of documents broken down into millions of
sentences. Each sentence does not exist as a docum
're
seeing.
Can you post code demonstrating the problem? Ideally in the form of
a simple, self-contained JUnit test?
-Hoss
On Jan 4, 2006, at 9:39 PM, Marc Hadfield wrote:
hello all -
i have a problem with a SpanNearQuery returning incorrect (false
positive) results.
I am creating
I am creating the context of a field using tokens which have position
increment set to either 1 or 0. The position increment is set to 0 for
special tokens, in this case part-of-speech markers.
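
The position-increment scheme described above can be pictured with a small sketch. This is plain Python rather than Lucene code, and the token stream, the "_" prefix on the part-of-speech markers, and the function name are all assumptions made for illustration. It only shows how increments of 1 and 0 translate into absolute positions, counting from 0.

```python
from collections import defaultdict


def assign_positions(tokens):
    """Turn (term, position_increment) pairs into a position -> terms map."""
    positions = defaultdict(list)
    position = -1  # a first token with increment 1 then lands on position 0
    for term, increment in tokens:
        position += increment
        positions[position].append(term)
    return dict(positions)


# Hypothetical stream: each word advances the position by 1, and its
# part-of-speech marker (prefixed with "_" here) is stacked with increment 0.
stream = [
    ("the", 1), ("_DET", 0),
    ("dog", 1), ("_NOUN", 0),
    ("barks", 1), ("_VERB", 0),
]
layout = assign_positions(stream)
```

In this sketch each marker ends up at the same position as the word it annotates, so the markers do not shift the positions of the ordinary words.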
The standard way to do this is to additionally index the reverse of all
strings/tokens, potentially in a different field "reverse:", i.e., index
forward:abcd as well as reverse:dcba. Then in queries of the form
"*cd", reverse the query to "dc*" so that you end up with "reverse:dc*"
in your
tracted out and performance
penalties occur, although I can't say how much of a hit it is.
Best,
Marc Hadfield
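
As a rough illustration of the reversed-token approach described in this message, here is a minimal sketch in plain Python, not Lucene. The sorted list stands in for the term dictionary of a "reverse:" field, and the function names and sample terms are assumptions. The point is only that a suffix pattern such as "*cd" becomes an ordinary prefix lookup once the terms are stored reversed.

```python
from bisect import bisect_left


def build_reverse_index(terms):
    """Store every distinct term reversed, in sorted order."""
    return sorted(term[::-1] for term in set(terms))


def suffix_search(reverse_index, suffix):
    """Answer a "*suffix" query as a prefix scan over the reversed terms."""
    prefix = suffix[::-1]  # "*cd" becomes the prefix "dc"
    start = bisect_left(reverse_index, prefix)
    matches = []
    for reversed_term in reverse_index[start:]:
        if not reversed_term.startswith(prefix):
            break  # sorted order: no later entry can share the prefix
        matches.append(reversed_term[::-1])
    return sorted(matches)


index = build_reverse_index(["abcd", "xycd", "abce", "cd"])
result = suffix_search(index, "cd")
```

The trade-off is the one implied above: the reversed copies take extra index space in exchange for avoiding a scan over every term.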
Beto Siless wrote:
Hi, I have the transaction problem too: I have Documents which are
represented by a Business Object (persisted in a DB with an ORM),
indexed with
hello -
a fuzzy-query related question:
have there been any other implementations of "fuzzy" queries other than
edit-distance? and/or modifications of edit-distance that penalize
common alternate spellings less? - i.e. "couldn't" vs. "couldnt" -- here the
apostrophe would get a smaller penalt
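
One way to picture the modification being asked about is an edit distance in which inserting or deleting certain characters costs less than a full edit. The sketch below is plain Python and is not an existing Lucene feature. The function name, the choice of the apostrophe as the only "cheap" character, and the 0.25 cost are assumptions picked for illustration.

```python
def weighted_edit_distance(a, b, cheap_chars="'", cheap_cost=0.25):
    """Levenshtein distance where inserting or deleting a cheap character
    costs cheap_cost instead of 1.0. Substitutions always cost 1.0."""

    def cost(ch):
        return cheap_cost if ch in cheap_chars else 1.0

    # Row for the empty prefix of a: build b purely from insertions.
    previous = [0.0]
    for ch in b:
        previous.append(previous[-1] + cost(ch))

    for ca in a:
        current = [previous[0] + cost(ca)]  # delete everything seen so far
        for j, cb in enumerate(b, start=1):
            substitute = previous[j - 1] + (0.0 if ca == cb else 1.0)
            delete = previous[j] + cost(ca)
            insert = current[j - 1] + cost(cb)
            current.append(min(substitute, delete, insert))
        previous = current
    return previous[-1]


apostrophe_gap = weighted_edit_distance("couldn't", "couldnt")
letter_gap = weighted_edit_distance("couldnt", "couldn")
```

With these assumed weights, dropping the apostrophe costs 0.25 while dropping an ordinary letter still costs 1.0, so the two spellings of "couldn't" rank as much closer than a genuine one-letter difference.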
thanks again!
Doug Cutting wrote:
Marc Hadfield wrote:
In the SpanNear (or for that matter PhraseQuery), one can set a slop
value where 0 (zero) means one following after the other.
How can one differentiate between Terms at the **same** position vs.
one after the other?
The
)/0 (B)/1 (C)/2
vs
( A B )/0 (C)/1 (D)/2 ...
How can a SpanNear (or anything) query for A,B tell these two cases apart?
---Marc
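
To make the two cases concrete, here is a small plain-Python sketch. It is not Lucene's SpanNearQuery and does not show how Lucene computes slop. It assumes the first case has A, B and C at consecutive positions (the quoted layout is cut off), and the function name is hypothetical. It only shows that the positions themselves separate "same position" from "adjacent".

```python
def position_offsets(tokens, first, second):
    """Return every (position of second - position of first) that occurs."""
    position = -1
    where = {}
    for term, increment in tokens:
        position += increment
        where.setdefault(term, []).append(position)
    return {q - p for p in where.get(first, []) for q in where.get(second, [])}


# Assumed reading of the first case: A, B, C one after the other.
sequential = [("A", 1), ("B", 1), ("C", 1)]
# Second case: A and B share position 0, then C and D follow.
stacked = [("A", 1), ("B", 0), ("C", 1), ("D", 1)]

sequential_gap = position_offsets(sequential, "A", "B")
stacked_gap = position_offsets(stacked, "A", "B")
```

An offset of 0 means the two terms sit at the same position, and an offset of 1 means B directly follows A.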
Doug Cutting wrote:
Marc Hadfield wrote:
I actually mention your option in my email:
In principle I could store the full text in two fields with the
second
ries might work.
Marc
Doug Cutting wrote:
Marc Hadfield wrote:
I actually mention your option in my email:
In principle I could store the full text in two fields with the
second field containing the types without incrementing the token
index. Then, do a SpanQuery for "Johnson"
Doug Cutting wrote:
Why not store them in the same field using positionIncrement=0 for the
types? Then they won't change positions of non-type tokens. You
should distinguish the types syntactically, e.g., prefix them with a
space or other character that does not occur within words. That way
ulting match would have a token position which would refer back to
the matching position in the first field. I don't know if this is a
really good idea.
Any thoughts?
---Marc Hadfield