RE: NOT_ANALYZED field

2009-04-28 Thread CM Wong
Thanks a lot. I am now indexing my id in lowercase and my problem is solved. Regards, CM --- Uwe Schindler wrote: > That is normal. Fields that are not analyzed are indexed as single tokens. > The analyzer does not only tokenize the text, it also transforms it (e.g. to > lower case). If you e

Phrase Highlighting

2009-04-28 Thread Max Lynch
Hi, I am trying to find out exactly when a word I'm looking for in a document is found. I've talked to a few people on IRC and it seems like the best way is to use a highlighter. What I have right now is a system where I put each word the highlighter is called with into a list so I then know whic

RE: kamikaze

2009-04-28 Thread molz
Hi Micheal, Thanks for trying out Kamikaze for starters. So I guess there are a few issues here 1. getDocSetInstance(int min, max, count,DocSetFactory.FOCUS) assumes that count < max. I guess that's an API check we should add anyways to improve usability. That is not to say that it will not work

Re: sub-scores for all clauses in a BooleanQuery

2009-04-28 Thread Chris Hostetter
: I've also tried getting the scores by walking the clauses of the : BooleanQuery, but that doesn't seem to work either, because the : queryNorm is off. For example, here's an original explanation for a : 3-clause query, where one clause doesn't match: a simple solution would be to eliminate the

Re: Proximity and Percentage match search in Lucene

2009-04-28 Thread Chris Hostetter
Radha: replying/reforwarding the same message over and over doesn't tend to be a useful way to encourage additional replies. if you do have something to add to an existing discussion that you've started, you should at least do it as a reply to the original discussion so people have the full

Re: Appropriate analyzer

2009-04-28 Thread Chris Hostetter
: try to use RegexQuery Except that his input string is longer than the terms he wants to match on. It sounds like what you are looking for is essentially a simplified use case of the "longest matching sub-phrase" problem... http://www.nabble.com/Dictionary-lookup-possibilities-to22977277.ht

Re: Getting matched words for PhraseQuery or SpanNearQuery

2009-04-28 Thread Jaco
Hi Mark, Thanks for that - after wading through some source code in the highlighter package and reading more docs I managed to get out the info I needed by getting the start and end token position of each span found and subsequently getting the words back out of the TokenStream that I initially cr
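The approach described above (take the start and end token position of each span, then read the words back out of the token list) can be sketched in plain Java without any Lucene classes. This is only a toy model under stated assumptions: SpanWordsDemo and wordsInSpan are hypothetical names, the end position is assumed to be exclusive, and every token is assumed to advance the position by exactly one (no position gaps or stacked synonyms).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpanWordsDemo {
    // Returns the tokens covered by a span.
    // Assumption: start is inclusive, end is exclusive, and the list index
    // of a token equals its position in the token stream.
    public static List<String> wordsInSpan(List<String> tokens, int start, int end) {
        return new ArrayList<>(tokens.subList(start, end));
    }

    public static void main(String[] args) {
        // Hypothetical token list, as if collected from a token stream beforehand.
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox", "jumps");
        // A span covering positions 1..3 is reported here as start=1, end=4.
        System.out.println(wordsInSpan(tokens, 1, 4));
    }
}
```

If the analyzer in use removes stop words or injects synonyms, positions and list indexes no longer line up, and the position increment of each token would have to be tracked instead.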

Re: Searcher#setSimilarity clarifications

2009-04-28 Thread Doron Cohen
Searcher is quite light. It is the index reader that is heavier. So create a single index reader; for each of the similarities to be used concurrently, create a searcher over that single reader, set its similarity, and so on. Doron On Mon, Apr 27, 2009 at 7:53 PM, Rakesh Sinha wrote: > I am looki

Re: Read past EOF

2009-04-28 Thread Michael McCandless
Ugh, indeed FieldInfos fails to properly read 2.3.x indices if the field name contains non-ascii characters. I'll open an issue, make a test case and work out a fix. Hmm. Thanks for raising this! Mike On Tue, Apr 28, 2009 at 7:53 AM, Mike Streeton wrote: > I have an index that works fine on L

RE: Read past EOF

2009-04-28 Thread Mike Streeton
An update, I have managed to get it to not fail by debugging and changing the value of org.apache.lucene.store.InputIndex.preUTF8Strings = true. The value is always false when it fails. Mike -Original Message- From: Mike Streeton [mailto:mike.stree...@connexica.com] Sent: 28 April 200

RE: NOT_ANALYZED field

2009-04-28 Thread Uwe Schindler
That is normal. Fields that are not analyzed are indexed as single tokens. The analyzer does not only tokenize the text, it also transforms it (e.g. to lower case). If you enter your search using the query parser, the entered search terms are analyzed! And for full text engines, the analyzer for qu
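The mismatch described above can be illustrated with a toy model in plain Java. It does not use the Lucene API: NotAnalyzedDemo and its methods are hypothetical names, a HashMap stands in for the index, and the only assumption about the analyzer is that it lowercases the query text.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class NotAnalyzedDemo {
    // Toy stand-in for an index: exact term -> document number.
    private final Map<String, Integer> terms = new HashMap<>();

    // Models a not-analyzed field: the whole value is kept as one token, unchanged.
    public void addNotAnalyzed(String value, int docId) {
        terms.put(value, docId);
    }

    // Models a query parser whose analyzer lowercases the query text before lookup.
    public Integer searchAnalyzed(String queryText) {
        return terms.get(queryText.toLowerCase(Locale.ROOT));
    }

    // Models a lookup with the exact, untransformed term.
    public Integer searchExact(String term) {
        return terms.get(term);
    }

    public static void main(String[] args) {
        NotAnalyzedDemo index = new NotAnalyzedDemo();
        index.addNotAnalyzed("00023", 1);
        index.addNotAnalyzed("nJK00023", 2);

        System.out.println(index.searchAnalyzed("00023"));    // digits are unaffected by lowercasing
        System.out.println(index.searchAnalyzed("nJK00023")); // query becomes "njk00023", no such term
        System.out.println(index.searchExact("nJK00023"));    // exact term still matches
    }
}
```

In this model, either lowercasing the id at index time or querying with the exact term makes the lookup succeed, which matches the two remedies that come up in this thread.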

Read past EOF

2009-04-28 Thread Mike Streeton
I have an index that works fine on Lucene 2.3.2 but fails to open in 2.4.1, it always fails with a Read past EOF. The index does contain some field names with German umlaut characters in them. Any ideas? Many Thanks Mike CheckIndex v2.3.2 NOTE: testing will be more thorough if you run java with

Re: NOT_ANALYZED field

2009-04-28 Thread Erick Erickson
Well, you haven't shown us your program, so it's hard to tell. But my first uninformed guess would be that the case of your search doesn't exactly match the case you indexed when you add letters to your IDs. We need to see the search code particularly, including the analyzers you use (a snippe

NOT_ANALYZED field

2009-04-28 Thread CM Wong
Hi, In my simple program I have an ID field which is NOT_ANALYZED. I find that if the field contains only numeric characters (e.g. id="00023"), I can successfully search for the doc. (search for "id:00023") But if the field contains non-numeric characters (e.g. id="nJK00023") then the search re

Re: Getting matched words for PhraseQuery or SpanNearQuery

2009-04-28 Thread Mark Miller
The Span Highlighter gets positions by attempting to convert a standard Lucene Query to a SpanQuery approximate, and then calling getSpans on the span query to find start and end positions (getSpans is called against a fast single document MemoryIndex). You might check out WeightedSpanTermExtractor

Re: Why Lucene phrase searching fail?

2009-04-28 Thread Ian Lea
Looks fine to me, but you haven't told us what analyzers you are using, whether you are using omitTf, suggested as a possibility by Koji, or anything else. Answer Koji, read the "Why am I getting no hits / incorrect hits?" section of the FAQ and if still stuck, post here the simplest possible self

Re: ArrayIndexOutOfBoundsException from TermInfosReader.get (2.3.2)

2009-04-28 Thread Michael McCandless
This doesn't ring a bell (ie sounds like something new). It's quite spooky. Any hints on what led to this? It looks like, somehow, enumOffset is that massive negative number (-1030685), in this code from TermInfosReader.java: // optimize sequential access: first try scanning cached enum w/o

Getting matched words for PhraseQuery or SpanNearQuery

2009-04-28 Thread Jaco
Hello, I am pretty new to the Lucene API, and there's something I can't figure out from the docs and from the mailing list archives. I hope somebody can point me in the right direction. Here's my case: for text analysis purposes I am doing PhraseQueries and SpanNearQueries. Using the highlighter