Re: Question on custom scoring

2007-08-16 Thread Chris Hostetter
: document in the scoring formula, and I thought the CustomScoreQuery would be useful, but I am realizing that it may not be easy because the "relevance" score from Lucene has no absolute meaning. The relevance score could be 5 or 500 and there is no way for me to gauge what that number means an

Re: [Fwd: Exception in MultiLevelSkipListReader$SkipBuffer.readByte]

2007-08-16 Thread Michael Busch
Scott Montgomerie wrote: > I just tried it with the latest nightly build, the problem still happens. > > I think it must have to do with a corrupted index somehow. I've also > noticed, as a separate issue, that after this period of time (4-5 days), > certain documents aren't indexed correctly.

Re: Nested concept fields

2007-08-16 Thread Chris Hostetter
: sent:(expired num[1 TO 5] "days ago")
: I don't see how to do this using either Lucene's QueryParser or the QsolParser. Is it possible to do it using the Query API (and the appropriate indexing changes)?
Take a look at Span queries, particularly SpanNearQuery ... that can do pretty

Re: [Fwd: Exception in MultiLevelSkipListReader$SkipBuffer.readByte]

2007-08-16 Thread Scott Montgomerie
I just tried it with the latest nightly build, the problem still happens. I think it must have to do with a corrupted index somehow. I've also noticed, as a separate issue, that after this period of time (4-5 days), certain documents aren't indexed correctly. For example, I will do a query: Qu

Re: getting term offset information for fields with multiple value entries

2007-08-16 Thread Grant Ingersoll
Hi Christian, Is there any way you can post a complete, self-contained example, preferably as a JUnit test? I think it would be useful to know more about how you are indexing (i.e. what Analyzer, etc.) The offsets should be taken from whatever is set on the Token during Analysis. I, too,

Re: Location of SpanRegexQuery

2007-08-16 Thread dontspamterry
And so it is! My bad - guess I should have paid more attention to the README file which clearly explains the contents :P -Terry Erick Erickson wrote: > > It should already be on your disk with the distribution. Try > /contrib/regex. > > Lots of things are rooted in contrib, and I've never ha

Re: Location of SpanRegexQuery

2007-08-16 Thread Erick Erickson
It should already be on your disk with the distribution. Try /contrib/regex. Lots of things are rooted in contrib, and I've never had to find any other jars from the Lucene site, they've all been in contrib Hope this helps Erick On 8/16/07, dontspamterry <[EMAIL PROTECTED]> wrote: > > > Hi,

Location of SpanRegexQuery

2007-08-16 Thread dontspamterry
Hi, While researching support for wildcards in a PhraseQuery, I see various references to SpanRegexQuery which is not part of the 2.2 distribution. I checked the Lucene site to see if it's some add-on jar, but couldn't find anything so I'm wondering where can I obtain the .class/jar file(s) for t

Re: out of order

2007-08-16 Thread Michael McCandless
OK, that's clean (no leftover files). So this cause does not seem to be the same cause as LUCENE-140. Can you capture the exact docs you are adding (all indexed fields) and try to replay them to see if the same exception is reproducible? Have you seen this happen on a different machine? (Just

Re: out of order

2007-08-16 Thread testn
There are two files:
1. segments_2 [-1, -1, -3, 0, 0, 1, 20, 112, 39, 17, -80, 0, 0, 0, 0, 0, 0, 0, 0]
2. segments.gen [-1, -1, -1, -2, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2]
but this one when the index is done properly. hossman wrote: > > : After you close that IndexWriter, can

Re: tell snowballfilter not to stem certain words?

2007-08-16 Thread karl wettin
On 16 Aug 2007, at 20:34, Donna L Gresh wrote: Apologies if this is in the FAQ or elsewhere available but I could not find this. Can I provide a list of words that should *not* be stemmed by the SnowballFilter? If it is a static list, simply add it as an exception in the snowball code and reco

Re: out of order

2007-08-16 Thread Chris Hostetter
: After you close that IndexWriter, can you list the files in your directory (that's a RAMDirectory right?)? Something like this:
The OP said this was a fairly small RAMDirectory index, right? Would it be worthwhile to just write the whole thing to disk and post it online so people could see ev

Re: out of order

2007-08-16 Thread Michael McCandless
Hmmm. It is interesting, because that specific call (using IndexWriter to "create" an index) was one of the causes in LUCENE-140. But I'm pretty sure we fixed that cause. As part of LUCENE-140 we also added further checks to catch re-using of an old .del file at a lower level, and you're not hi

Re: tell snowballfilter not to stem certain words?

2007-08-16 Thread Erick Erickson
Not that I know of. I suspect you'll have to write a filter that returns the stemmed or unstemmed based on membership in your list of words not to stem. Best Erick On 8/16/07, Donna L Gresh <[EMAIL PROTECTED]> wrote: > > Apologies if this is in the FAQ or elsewhere available but I could not > fin
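
The approach suggested above (return the stemmed or unstemmed form depending on membership in a do-not-stem list) can be sketched roughly as follows. This is a minimal illustration in plain Java, not Lucene code: ProtectedWordStemmer and toyStem are hypothetical names, and toyStem is only a stand-in for a real stemmer such as Snowball. In an actual analyzer the same membership check would presumably live in a custom TokenFilter placed where the SnowballFilter sits in the chain.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical helper (not part of Lucene): wraps any stemming function and
// passes words on the do-not-stem list through unchanged.
final class ProtectedWordStemmer {
    private final Set<String> protectedWords;
    private final UnaryOperator<String> stemmer;

    ProtectedWordStemmer(Set<String> protectedWords, UnaryOperator<String> stemmer) {
        this.protectedWords = new HashSet<>(protectedWords);
        this.stemmer = stemmer;
    }

    // Returns the term unchanged if it is protected, otherwise its stem.
    String process(String term) {
        return protectedWords.contains(term) ? term : stemmer.apply(term);
    }

    // Toy stand-in for a real stemmer: strips a trailing "ing" or "s".
    static String toyStem(String term) {
        if (term.endsWith("ing") && term.length() > 5) {
            return term.substring(0, term.length() - 3);
        }
        if (term.endsWith("s") && term.length() > 3) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }
}
```

Keeping the exclusion list outside the stemmer itself means the list can change without touching or recompiling the stemming code.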

tell snowballfilter not to stem certain words?

2007-08-16 Thread Donna L Gresh
Apologies if this is in the FAQ or elsewhere available but I could not find this. Can I provide a list of words that should *not* be stemmed by the SnowballFilter? My analyzer looks like this: analyzer = new StandardAnalyzer(stopwords) { public TokenStream tokenStream(String fieldName, j

Document Similarities lucene(particularly using doc id's)

2007-08-16 Thread Lokeya
Hi All, I have the following set up: a) Indexed set of docs. b) Ran 1st query and got top docs c) Fetched the ids from that and stored in a data structure. d) Ran 2nd query, got top docs, fetched ids and stored in a data structure. Now I have 2 sets of doc ids (set 1) and (set 2). I want

Re: out of order

2007-08-16 Thread testn
Does it help you to find out if I create an empty index before starting the real operation?
IndexWriter writer = new IndexWriter(directory, new SimpleAnalyzer(), true);
writer.close();
/* add new index afterward */
This is to clean up the index since springmodule

Re: out of order

2007-08-16 Thread Michael McCandless
OK. Is it possible to capture this as a small test case? Maybe also call IndexWriter.setInfoStream(System.out) and capture details on what segments are being merged? Can you shed some light on how the application is using Lucene? Are you doing deletes as well as adds? Opening readers against th

Re: query question

2007-08-16 Thread testn
Can you post your code? Make sure that when you use wildcard in your custom query parser, it will generate either WildcardQuery or PrefixQuery correctly. is_maximum wrote: > > Yes karl, when I explore the index by Luke I can see the terms > for example I have a field namely, patientResult, it

Possible to expose similarity as a property in hits collection?

2007-08-16 Thread Michael Barbarelli
Hello all. I am trying to get at the raw difference that Lucene uses -- the result of the fail-fast Levenshtein distance algorithm. I believe that it is calculated in FuzzyTermEnum.java (FuzzyTermEnum.cs). In the application I have built upon Lucene, I would like to expose similarity as the score,

Re: Stemmed terms/common terms

2007-08-16 Thread Alf Eaton
On 16 Aug 2007, at 15:17, Alf Eaton wrote: - Is there a way to get a list of all the terms in the index (or maybe just the top n) ordered by descending frequency of usage? I imagine it's related to docFreq, but can't see how to get a list of terms in all documents. Thanks to http://tinyu
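
For the "top n terms by descending frequency" question, the ordering step can be sketched as below. This is an illustration in plain Java under stated assumptions: TopTerms is a hypothetical helper, and the input map of term to document frequency is assumed to have been collected beforehand (in Lucene 2.x that would typically mean walking IndexReader.terms() and reading docFreq() for each term). It is not the solution referenced by the truncated link above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: orders terms by descending document frequency.
final class TopTerms {
    // docFreqs maps each term to the number of documents containing it.
    static List<String> topN(Map<String, Integer> docFreqs, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(docFreqs.entrySet());
        // Highest frequency first; ties are broken alphabetically so the
        // result is deterministic.
        entries.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

Sorting the full list is fine for a sketch; for a very large term dictionary a bounded priority queue of size n would avoid holding and sorting every term.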

Re: [Fwd: Exception in MultiLevelSkipListReader$SkipBuffer.readByte]

2007-08-16 Thread Yonik Seeley
I wonder if this is related to https://issues.apache.org/jira/browse/LUCENE-951 If it's easy enough for you to reproduce, could you try the trunk version of Lucene and see if it's fixed? -Yonik On 8/16/07, Scott Montgomerie <[EMAIL PROTECTED]> wrote: > I'm getting an ArrayIndexOutOfBoundsExcepti

[Fwd: Exception in MultiLevelSkipListReader$SkipBuffer.readByte]

2007-08-16 Thread Scott Montgomerie
I'm getting an ArrayIndexOutOfBoundsException in MultiLevelSkipListReader$SkipBuffer. This happens sporadically, on a fairly small index (18 MB, about 30,000 documents). The index is subject to a lot of adds and deletes, some of them concurrently. It happens after about 4 days of heavy usage. I was

Re: Question about highlighting returning nothing

2007-08-16 Thread Lukas Vlcek
Hi, What I meant was that highlighter can return either null or empty string. So one should check for the null first and then also for "". At least that is my observation... Lukas On 8/16/07, mark harwood <[EMAIL PROTECTED]> wrote: > > Highlighter deliberately returns null so the calling app can
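
The check described above (test for null first, then for the empty string) might look like the following sketch. It is plain Java with hypothetical names: HighlightFallback and displayText are not part of the Lucene highlighter, and `fragment` simply stands for whatever string the highlighter call returned. Falling back to a truncated copy of the original text is one possible choice, assumed here for illustration.

```java
// Hypothetical helper: picks the text to display when highlighting yields nothing.
final class HighlightFallback {
    // fragment:     the highlighter's result, which may be null or ""
    // originalText: the unhighlighted field text, used as a fallback
    // maxLength:    maximum number of characters of the fallback to show
    static String displayText(String fragment, String originalText, int maxLength) {
        // Null check comes first, then the empty-string check.
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        if (originalText == null) {
            return "";
        }
        return originalText.length() <= maxLength
                ? originalText
                : originalText.substring(0, maxLength);
    }
}
```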

Re: Stemmed terms/common terms

2007-08-16 Thread Alf Eaton
On 16 Aug 2007, at 17:06, Grant Ingersoll wrote: On Aug 16, 2007, at 10:17 AM, Alf Eaton wrote: A couple of questions about term frequencies and stemming: - What's the best way to get the most common unstemmed form of a Porter-stemmed word from the index? For example given the stem 'wal

Re: Stemmed terms/common terms

2007-08-16 Thread Grant Ingersoll
On Aug 16, 2007, at 10:17 AM, Alf Eaton wrote: A couple of questions about term frequencies and stemming: - What's the best way to get the most common unstemmed form of a Porter-stemmed word from the index? For example given the stem 'walk', find that 'walking' is the most common full word

Re: out of order

2007-08-16 Thread testn
Here you go ->
Error during the indexing : docs out of order (0 <= 0 )
org.apache.lucene.index.CorruptIndexException: docs out of order (0 <= 0 )
    at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:368)
    at org.apache.lucene.index.SegmentMerger.mergeTerm

Re: Question about highlighting returning nothing

2007-08-16 Thread mark harwood
Highlighter deliberately returns null so the calling app can tell when the text wasn't successfully highlighted. Situations when this can happen are: 1) The text is out of synch with the index (the scenario you encountered) 2) The choice of analyzer used to tokenize the text differs from that us

Re: Question about highlighting returning nothing

2007-08-16 Thread Lukas Vlcek
Donna, Now I understand what you are saying (seems that I had PBCAK as well ;-) As for your last question: ...under what conditions would the highlighter return nothing? Only if no terms matched? I remember that I found that highlighter can return null or empty string in different situations. I

Stemmed terms/common terms

2007-08-16 Thread Alf Eaton
A couple of questions about term frequencies and stemming: - What's the best way to get the most common unstemmed form of a Porter-stemmed word from the index? For example given the stem 'walk', find that 'walking' is the most common full word in the index. - Is there a way to get a list of

Re: Re: Indexing correctly?

2007-08-16 Thread John Paul Sondag
I've started to redo tests one at a time to see what exactly caused the decreased index time. Using the absolute path instead of the relative path to the data doesn't seem to have made a significant difference, but using StringBuffers (with a default of 25) made a huge change. I still have to

Re: Question about highlighting returning nothing

2007-08-16 Thread Donna L Gresh
Actually I don't think I'm having trouble-- as I mentioned, my text is *not* stored, so to do highlighting I retrieve the text from the database, apply the appropriate analyzer, and do the highlighting. It seems to be working exactly as it should. My problem was that in a few cases, the document h

Re: Can I do boosting based on term postions?

2007-08-16 Thread vini
Hi Shailendra, Could you pls send the same class file to my gmail a/c too ? Regards vini Shailendra Sharma wrote: > > Ah, Good way ! > > On 8/4/07, Paul Elschot <[EMAIL PROTECTED]> wrote: >> >> On Friday 03 August 2007 20:35, Shailendra Sharma wrote: >> > Paul, >> > >> > If I understand Cedri

Re: out of order

2007-08-16 Thread Michael McCandless
Well then that is particularly spooky!! And, hopefully, possible/easy to reproduce. Thanks. Mike "testn" <[EMAIL PROTECTED]> wrote: > > I use RAMDirectory and the error often shows the low number. Last time it > happened with message "7<=7". Next time it happens, I will try to capture > the s

getting term offset information for fields with multiple value entries

2007-08-16 Thread duiduder
Hello, I have an index with an 'actor' field, for each actor there exists a single field value entry, e.g. stored/compressed,indexed,tokenized,termVector,termVectorOffsets,termVectorPosition movie_actors:Mayrata O'Wisiedo (as Mairata O'Wisiedo)

Re: query question

2007-08-16 Thread Mohammad Norouzi
Yes karl, when I explore the index by Luke I can see the terms for example I have a field namely, patientResult, it contains value "Ca. Oxalate:many" and also other values such as "Ca. Oxalate:few" etc. the problems are when I put this query: patientResult:(Ca. Oxalate:few) the result is 84329 Ca.