: document in the scoring formula, and I thought the CustomScoreQuery would be
: useful, but I am realizing that it may not be easy because the "relevance"
: score from Lucene has no absolute meaning. The relevance score could be 5 or
: 500 and there is no way for me to gauge what that number means an
Scott Montgomerie wrote:
> I just tried it with the latest nightly build, the problem still happens.
>
> I think it must have to do with a corrupted index somehow. I've also
> noticed, as a separate issue, that after this period of time (4-5 days),
> certain documents aren't indexed correctly.
: sent:(expired num[1 TO 5] "days ago")
:
: I don't see how to do this using either Lucene's QueryParser or the
: QsolParser. Is it possible to do it using the Query API (and the appropriate
: indexing changes)?
take a look at Span queries, particularly SpanNearQuery ... that can do pretty
I just tried it with the latest nightly build, the problem still happens.
I think it must have to do with a corrupted index somehow. I've also
noticed, as a separate issue, that after this period of time (4-5 days),
certain documents aren't indexed correctly. For example, I will do a query:
Qu
Hi Christian,
Is there any way you can post a complete, self-contained example
preferably as a JUnit test? I think it would be useful to know more
about how you are indexing (i.e. what Analyzer, etc.)
The offsets should be taken from whatever is set on the Token
during Analysis. I, too,
And so it is! My bad - guess I should have paid more attention to the README
file which clearly explains the contents :P
-Terry
Erick Erickson wrote:
>
> It should already be on your disk with the distribution. Try
> /contrib/regex.
>
> Lots of things are rooted in contrib, and I've never ha
It should already be on your disk with the distribution. Try
/contrib/regex.
Lots of things are rooted in contrib, and I've never had to
find any other jars from the Lucene site, they've all
been in contrib
Hope this helps
Erick
On 8/16/07, dontspamterry <[EMAIL PROTECTED]> wrote:
>
>
> Hi,
Hi,
While researching support for wildcards in a PhraseQuery, I see various
references to SpanRegexQuery which is not part of the 2.2 distribution. I
checked the Lucene site to see if it's some add-on jar, but couldn't find
anything so I'm wondering where can I obtain the .class/jar file(s) for t
OK, that's clean (no leftover files). So this cause does not seem to
be the same cause as LUCENE-140.
Can you capture the exact docs you are adding (all indexed fields) and
try to replay them to see if the same exception is reproducible?
Have you seen this happen on a different machine? (Just
There are two files:
1. segments_2 [-1, -1, -3, 0, 0, 1, 20, 112, 39, 17, -80, 0, 0, 0, 0, 0, 0,
0, 0]
2. segments.gen [-1, -1, -1, -2, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0,
0, 2]
but this is when the index is done properly.
hossman wrote:
>
> : After you close that IndexWriter, can
16 aug 2007 kl. 20.34 skrev Donna L Gresh:
Apologies if this is in the FAQ or elsewhere available but I could not
find this.
Can I provide a list of words that should *not* be stemmed by the
SnowballFilter?
If it is a static list, simply add it as an exception in the snowball
code and reco
: After you close that IndexWriter, can you list the files in your
: directory (that's a RAMDirectory right?)? Something like this:
The OP said this was a fairly small RAMDirectory index right? would it be
worthwhile to just write the whole thing to disk and post it online so
people could see ev
Hmmm. It is interesting, because that specific call (using
IndexWriter to "create" an index) was one of the causes in
LUCENE-140. But I'm pretty sure we fixed that cause. As part of
LUCENE-140 we also added further checks to catch re-using of an old
.del file at a lower level, and you're not hi
Not that I know of. I suspect you'll have to write a filter that returns
the stemmed or unstemmed based on membership in your list
of words not to stem.
Best
Erick
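A minimal sketch of the "stem unless the word is on the list" idea, written in plain Java so it runs without Lucene on the classpath. The class name ProtectedStemmer and the toyStem function are hypothetical stand-ins: in a real analyzer the same membership check would sit in a custom TokenFilter placed in front of the Snowball stemming step.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Lucene: wraps any stemming function
// and returns protected words unchanged.
class ProtectedStemmer {
    private final Set<String> doNotStem;
    private final UnaryOperator<String> stemmer;

    ProtectedStemmer(Set<String> doNotStem, UnaryOperator<String> stemmer) {
        this.doNotStem = new HashSet<>(doNotStem);
        this.stemmer = stemmer;
    }

    // Returns the term as-is if it is on the exclusion list, otherwise the stem.
    String process(String term) {
        return doNotStem.contains(term) ? term : stemmer.apply(term);
    }

    // Toy stand-in for a real stemmer: strips a trailing "ing" or "s".
    static String toyStem(String term) {
        if (term.length() > 4 && term.endsWith("ing")) {
            return term.substring(0, term.length() - 3);
        }
        if (term.length() > 2 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }
}
```

The exclusion set is copied in the constructor so later changes to the caller's set do not silently alter analysis between indexing and searching.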
On 8/16/07, Donna L Gresh <[EMAIL PROTECTED]> wrote:
>
> Apologies if this is in the FAQ or elsewhere available but I could not
> fin
Apologies if this is in the FAQ or elsewhere available but I could not
find this.
Can I provide a list of words that should *not* be stemmed by the
SnowballFilter? My analyzer looks like this:
analyzer = new StandardAnalyzer(stopwords) {
public TokenStream tokenStream(String fieldName, j
Hi All,
I have the following set up: a) Indexed set of docs. b) Ran 1st query and
got top docs. c) Fetched the ids from that and stored them in a data structure.
d) Ran 2nd query, got top docs, fetched ids and stored them in a data
structure.
Now I have 2 sets of doc ids (set 1) and (set 2).
I want
Does it help you to find out if I create an empty index before starting the real
operation?
IndexWriter writer = new IndexWriter(directory, new SimpleAnalyzer(), true);
writer.close();
/* add new index afterward */
This is to clean up the index since springmodule
OK. Is it possible to capture this as small test case?
Maybe also call IndexWriter.setInfoStream(System.out) and capture details on
what segments are being merged?
Can you shed some light on how the application is using Lucene? Are you doing
deletes as well as adds? Opening readers against th
Can you post your code? Make sure that when you use a wildcard in your custom
query parser, it generates either a WildcardQuery or a PrefixQuery
correctly.
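As a rough illustration of that check, the sketch below classifies a raw term by its wildcard characters. It is plain Java with hypothetical names (WildcardKind, classify) and it ignores escaping. The assumption is that a term whose only wildcard is a single trailing '*' should become a PrefixQuery, and any other use of '*' or '?' a WildcardQuery.

```java
// Hypothetical helper, not part of Lucene: decides which kind of query
// a custom parser should build for a single raw term.
class WildcardKind {
    enum Kind { TERM, PREFIX, WILDCARD }

    static Kind classify(String term) {
        int firstStar = term.indexOf('*');
        boolean hasQuestion = term.indexOf('?') >= 0;
        if (firstStar < 0 && !hasQuestion) {
            return Kind.TERM; // no wildcard characters at all
        }
        // Exactly one '*', in last position, with at least one literal character before it.
        boolean onlyTrailingStar = !hasQuestion
                && term.length() > 1
                && firstStar == term.length() - 1;
        return onlyTrailingStar ? Kind.PREFIX : Kind.WILDCARD;
    }
}
```

A parser could switch on the returned Kind to pick the query class, after applying the same case folding to the term that the indexing analyzer applied.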
is_maximum wrote:
>
> Yes karl, when I explore the index by Luke I can see the terms
> for example I have a field namely, patientResult, it
Hello all.
I am trying to get at the raw difference that Lucene uses -- the result of
the fail-fast Levenshtein distance algorithm. I believe that it is
calculated in FuzzyTermEnum.java (FuzzyTermEnum.cs).
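For reference, a minimal sketch of a raw Levenshtein distance and one way to turn it into a similarity value. This is a plain, non-fail-fast textbook version, not the code from FuzzyTermEnum. The class name EditSimilarity and the normalisation by the longer string's length are assumptions made for illustration.

```java
// Hypothetical helper, not Lucene code: classic two-row Levenshtein distance.
class EditSimilarity {
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from an empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalisation: 1.0 for identical strings, 0.0 when every position differs.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) distance(a, b) / longest;
    }
}
```

Computing the distance outside the index like this gives the raw edit count directly, at the cost of redoing work the fuzzy enumeration has already done.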
In the application I have built upon Lucene, I would like to expose
similarity as the score,
On 16 Aug 2007, at 15:17, Alf Eaton wrote:
- Is there a way to get a list of all the terms in the index (or
maybe just the top n) ordered by descending frequency of usage? I
imagine it's related to docFreq, but can't see how to get a list of
terms in all documents.
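One possible shape for the "top n terms by frequency" step, sketched in plain Java. It assumes the term and document-frequency pairs have already been collected into a map, for instance by enumerating the index's terms and reading each docFreq. The names TopTerms and topByDocFreq are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: ranks terms by document frequency.
class TopTerms {
    static List<String> topByDocFreq(Map<String, Integer> docFreq, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(docFreq.entrySet());
        // Highest frequency first; ties broken alphabetically for a stable result.
        entries.sort((x, y) -> {
            int byFreq = Integer.compare(y.getValue(), x.getValue());
            return byFreq != 0 ? byFreq : x.getKey().compareTo(y.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```

For a large index, a bounded priority queue of size n would avoid holding and sorting every term; the full sort is used here only to keep the sketch short.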
Thanks to http://tinyu
I wonder if this is related to
https://issues.apache.org/jira/browse/LUCENE-951
If it's easy enough for you to reproduce, could you try the trunk
version of Lucene and see if it's fixed?
-Yonik
On 8/16/07, Scott Montgomerie <[EMAIL PROTECTED]> wrote:
> I'm getting an ArrayIndexOutOfBoundsExcepti
I'm getting an ArrayIndexOutOfBoundsException in
MultiLevelSkipListReader$SkipBuffer. This happens sporadically, on a
fairly small index (18 MB, about 30,000 documents). The index is
subject to a lot of adds and deletes, some of them concurrently. It
happens after about 4 days of heavy usage. I was
Hi,
What I meant was that the highlighter can return either null or an empty string. So
one should check for the null first and then also for "". At least that is
my observation...
Lukas
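A small sketch of that double check, in plain Java. The helper name displayText and the fallback of showing the start of the original text are assumptions. The highlighter call itself is left out, and its result is simply passed in as a String.

```java
// Hypothetical helper: picks what to display when highlighting produced nothing.
class FragmentFallback {
    static String displayText(String highlighted, String original, int maxLen) {
        // Check null first, then the empty string, as both have been observed.
        if (highlighted != null && !highlighted.isEmpty()) {
            return highlighted;
        }
        if (original == null) {
            return "";
        }
        // Fall back to the leading part of the unhighlighted text.
        return original.length() <= maxLen ? original : original.substring(0, maxLen);
    }
}
```

Usage would be along the lines of displayText(fragment, storedText, 100), where fragment is whatever the highlighter returned for the document.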
On 8/16/07, mark harwood <[EMAIL PROTECTED]> wrote:
>
> Highlighter deliberately returns null so the calling app can
On 16 Aug 2007, at 17:06, Grant Ingersoll wrote:
On Aug 16, 2007, at 10:17 AM, Alf Eaton wrote:
A couple of questions about term frequencies and stemming:
- What's the best way to get the most common unstemmed form of a
Porter-stemmed word from the index? For example given the stem
'wal
On Aug 16, 2007, at 10:17 AM, Alf Eaton wrote:
A couple of questions about term frequencies and stemming:
- What's the best way to get the most common unstemmed form of a
Porter-stemmed word from the index? For example given the stem
'walk', find that 'walking' is the most common full word
Here you go
-> Error during the indexing : docs out of order (0 <= 0 )
org.apache.lucene.index.CorruptIndexException: docs out of order (0 <= 0 )
at
org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:368)
at
org.apache.lucene.index.SegmentMerger.mergeTerm
Highlighter deliberately returns null so the calling app can tell when the text
wasn't successfully highlighted.
Situations when this can happen are:
1) The text is out of synch with the index (the scenario you encountered)
2) The choice of analyzer used to tokenize the text differs from that us
Donna,
Now I understand what you are saying (seems that I had PBCAK as well ;-)
As for your last question: ...under what conditions would the highlighter
return nothing? Only if no terms matched?
I remember that I found that highlighter can return null or empty string in
different situations. I
A couple of questions about term frequencies and stemming:
- What's the best way to get the most common unstemmed form of a
Porter-stemmed word from the index? For example given the stem
'walk', find that 'walking' is the most common full word in the index.
- Is there a way to get a list of
I've started to redo tests one at a time to see what exactly caused the
decreased index time. Using the absolute path instead of the relative path
to the data doesn't seem to have made a significant difference, but using
StringBuffers (with a default of 25) made a huge change. I still have
to
Actually I don't think I'm having trouble-- as I mentioned,
my text is *not* stored, so to do highlighting I retrieve the
text from the database, apply the appropriate analyzer,
and do the highlighting. It seems to be working exactly as
it should. My problem was that in a few cases, the document
h
Hi Shailendra,
Could you pls send the same class file to my gmail a/c too ?
Regards
vini
Shailendra Sharma wrote:
>
> Ah, Good way !
>
> On 8/4/07, Paul Elschot <[EMAIL PROTECTED]> wrote:
>>
>> On Friday 03 August 2007 20:35, Shailendra Sharma wrote:
>> > Paul,
>> >
>> > If I understand Cedri
Well then that is particularly spooky!!
And, hopefully, possible/easy to reproduce. Thanks.
Mike
"testn" <[EMAIL PROTECTED]> wrote:
>
> I use RAMDirectory and the error often shows the low number. Last time it
> happened with message "7<=7". Next time it happens, I will try to capture
> the s
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
Hello,
I have an index with an 'actor' field, for each actor there exists a single
field value entry, e.g.
stored/compressed,indexed,tokenized,termVector,termVectorOffsets,termVectorPosition
movie_actors:Mayrata O'Wisiedo (as Mairata O'Wisiedo)
Yes karl, when I explore the index by Luke I can see the terms
for example I have a field namely, patientResult, it contains value "Ca.
Oxalate:many" and also other values such as "Ca. Oxalate:few" etc.
the problems are when I put this query: patientResult:(Ca. Oxalate:few)
the result is
84329 Ca.
36 matches