Re: use Lucene to index sentences

2006-02-06 Thread Marc Hadfield
, one sentence per row, and they can be searched by MySQL's full-text search feature. Using a database, it will also be easy to tell which document the matched sentence belongs to. AJ On 2/6/06, Marc Hadfield <[EMAIL PROTECTED]> wrote: Hi AJ - Depending on your need, you could

Re: use Lucene to index sentences

2006-02-06 Thread Marc Hadfield
o to keep in index), expected query performance, and so on. ---marc hadfield AJ Chen wrote: I'd appreciate any advice on whether Lucene is appropriate for indexing and searching sentences. I have millions of documents broken down into millions of sentences. Each sentence does not exist as a docum

Re: span / position increment issue

2006-01-05 Thread Marc Hadfield
're seeing. Can you post code demonstrating the problem, ideally in the form of a simple, self-contained JUnit test? -Hoss On Jan 4, 2006, at 9:39 PM, Marc Hadfield wrote: hello all - i have a problem with a SpanNearQuery returning incorrect (false positive) results. I am creating

span / position increment issue

2006-01-04 Thread Marc Hadfield
hello all - i have a problem with a SpanNearQuery returning incorrect (false positive) results. I am creating the context of a field using tokens which have position increment set to either 1 or 0. The position increment is set to 0 for special tokens, in this case part-of-speech markers.

Re: Wildcard

2005-12-02 Thread Marc Hadfield
The standard way to do this is to additionally index the reverse of all strings/tokens, potentially in a different field "reverse:", i.e., index forward:abcd as well as reverse:dcba. Then in queries of the form "*cd", reverse the query to "dc*" so that you end up with "reverse:dc*" in your
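
The reversed-token idea in the snippet above can be sketched without Lucene. This is only a plain-Python illustration, not Lucene code: the sorted list stands in for a "reverse:" field, and the names build_reverse_index and suffix_search are hypothetical helpers invented for this sketch.

```python
from bisect import bisect_left


def build_reverse_index(tokens):
    """Return a sorted list of reversed tokens (a stand-in for a 'reverse' field)."""
    return sorted(token[::-1] for token in set(tokens))


def suffix_search(reverse_index, pattern):
    """Answer a leading-wildcard pattern such as '*cd' with a prefix scan."""
    if not pattern.startswith("*") or "*" in pattern[1:]:
        raise ValueError("expected a pattern of the form '*suffix'")
    prefix = pattern[1:][::-1]  # '*cd' becomes the prefix 'dc'
    start = bisect_left(reverse_index, prefix)  # first entry >= prefix
    matches = []
    for reversed_token in reverse_index[start:]:
        if not reversed_token.startswith(prefix):
            break  # sorted order: no later entry can share the prefix
        matches.append(reversed_token[::-1])  # restore the original token
    return sorted(matches)


index = build_reverse_index(["abcd", "xycd", "abce", "cd"])
print(suffix_search(index, "*cd"))
```

Because the reversed tokens are sorted, every token ending in "cd" sits in one contiguous run starting at the prefix "dc", which is why a leading wildcard becomes as cheap as a trailing one.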

Re: Lucene & Transactional semantics

2005-11-17 Thread Marc Hadfield
tracted out and performance penalties occur, although I can't say how much of a hit it is. Best, Marc Hadfield Beto Siless wrote: Hi, I have the transaction problem too: I have Documents which are represented by a Business Object (persisted in a DB with an ORM), indexed with

Re: Funny results with Fuzzy

2005-10-25 Thread Marc Hadfield
hello - a fuzzy query related question: have there been any implementations of "fuzzy" queries other than edit-distance? and/or modifications of edit-distance to penalize common alternate spellings less? - i.e. "couldn't" vs. "couldnt" -- here the apostrophe would get a smaller penalt

Re: query across fields?

2005-10-11 Thread Marc Hadfield
thanks again! Doug Cutting wrote: Marc Hadfield wrote: In the SpanNear (or for that matter PhraseQuery), one can set a slop value where 0 (zero) means one following after the other. How can one differentiate between Terms at the **same** position vs. one after the other? The

Re: query across fields?

2005-10-11 Thread Marc Hadfield
)/0 (B)/1 (C)/2 vs ( A B )/0 (C)/1 (D)/2 ... How can a SpanNear (or anything) query for A,B tell these two cases apart? ---Marc Doug Cutting wrote: Marc Hadfield wrote: I actually mention your option in my email: In principle I could store the full text in two fields with the second

Re: query across fields?

2005-10-10 Thread Marc Hadfield
ries might work. Marc Doug Cutting wrote: Marc Hadfield wrote: I actually mention your option in my email: In principle I could store the full text in two fields with the second field containing the types without incrementing the token index. Then, do a SpanQuery for "Johnson"

Re: query across fields?

2005-10-10 Thread Marc Hadfield
Doug Cutting wrote: Why not store them in the same field using positionIncrement=0 for the types? Then they won't change positions of non-type tokens. You should distinguish the types syntactically, e.g., prefix them with a space or other character that does not occur within words. That way
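
A rough model of the suggestion above, in plain Python rather than Lucene: each token carries a position increment, and a type token with increment 0 lands on the same position as the word before it. The function name assign_positions, the "_" prefix, and the "_PERSON" type token are assumptions made up for this sketch; the thread only says the types should be marked with a character that does not occur within words.

```python
def assign_positions(tokens):
    """Map (text, position_increment) pairs to (text, absolute_position) pairs."""
    position = -1  # so that a first token with increment 1 lands at position 0
    placed = []
    for text, increment in tokens:
        position += increment  # increment 0 keeps the previous position
        placed.append((text, position))
    return placed


# Words advance the position by 1; the hypothetical type token "_PERSON"
# uses increment 0 and therefore shares a position with "Johnson".
with_types = assign_positions([("Johnson", 1), ("_PERSON", 0), ("said", 1)])
words_only = assign_positions([("Johnson", 1), ("said", 1)])

print(with_types)
print(words_only)
```

In this model the word positions are identical with and without the type tokens, which is the property the suggestion relies on.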

query across fields?

2005-10-10 Thread Marc Hadfield
ulting match would have a token position which would refer back to the matching position in the first field. I don't know if this is a really good idea. Any thoughts? ---Marc Hadfield