Hi,
I have tested some stemmer algorithms in my application. However, I think we'd
better write a weaker algorithm. I mean, Porter and some other
algorithms are too strong. Maybe an algorithm that just converts plural nouns
to singular nouns is enough.
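Here is a minimal Java sketch of such a weaker stemmer. It only reduces plurals to singulars and loosely follows the three suffix rules of Harman's "S-stemmer". The class name `SStemmer` and its `stem` method are hypothetical. It handles only English plural endings. Plugging it into a Lucene Analyzer would still need a small custom TokenFilter, which is not shown.

```java
import java.util.Locale;

/**
 * Hypothetical plural-to-singular reducer, loosely modeled on Harman's
 * S-stemmer: only the first rule whose suffix test and exceptions both
 * pass is applied; everything else is returned lowercased but unchanged.
 */
final class SStemmer {

    private SStemmer() {}

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        int n = w.length();
        // Rule 1: "-ies" -> "-y" (ponies -> pony), except "-eies" and "-aies".
        if (n > 3 && w.endsWith("ies") && !w.endsWith("eies") && !w.endsWith("aies")) {
            return w.substring(0, n - 3) + "y";
        }
        // Rule 2: "-es" -> "-e" (horses -> horse), except "-aes", "-ees", "-oes".
        if (n > 2 && w.endsWith("es") && !w.endsWith("aes")
                && !w.endsWith("ees") && !w.endsWith("oes")) {
            return w.substring(0, n - 1);
        }
        // Rule 3: drop a final "-s" (cats -> cat), except "-us" and "-ss".
        if (n > 1 && w.endsWith("s") && !w.endsWith("us") && !w.endsWith("ss")) {
            return w.substring(0, n - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"ponies", "horses", "cats", "glass", "status"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Because it only touches plural endings, this approach is far less aggressive than Porter. It is also crude, though: "boxes" becomes "boxe", for example. What matters most is that the same stemmer runs at both index time and query time.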
On 2/14/06, Yilmazel, Sibel <[EMAIL PROTECTED]> wrote:
I have seen this error in my Simpy logs before, or at least the NPE in
compareTo (I don't recall the rest of the stack).
Have you tried debugging this? I suppose the Term field or value is null
somehow... not sure why.
Otis
P.S.
Deleting files - don't :)
- Original Message
From: Greg
I can't share any experience with K-Stem, but I do remember the K-Stem
people contributing a piece of code that integrated their K-Stem work
with Lucene a few (2?) years ago. Their code had a funky license attached,
so it never made it into Lucene, but it was available for download
Hi Paul,
Yes, that is exactly what I was trying to say in my earlier example of
accessing documents in chronologically sorted order (which might be the same
as index insertion order). Thanks for confirming it.
Otis
- Original Message
From: Paul Elschot
IndexReader.doc(docId
I may be misunderstanding your needs, but isn't this relevance feedback?
Please check Grant Ingersoll's presentation at ApacheCon 2005.
He put out some great demo programs for relevance feedback using Lucene.
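Rocchio's formula is one classic form of relevance feedback. It moves the query toward documents the user marked relevant and away from those marked non-relevant: q' = alpha*q + beta*mean(relevant) - gamma*mean(nonRelevant). The Java sketch below applies it to plain term-weight maps. It illustrates Rocchio in general and is not a reconstruction of the ApacheCon demo programs. The class name and the alpha, beta and gamma values in `main` are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Rocchio-style relevance feedback over sparse term-weight vectors
 * (hypothetical helper). Terms whose final weight is <= 0 are dropped.
 */
final class Rocchio {

    static Map<String, Double> expand(Map<String, Double> query,
                                      List<Map<String, Double>> relevant,
                                      List<Map<String, Double>> nonRelevant,
                                      double alpha, double beta, double gamma) {
        Map<String, Double> out = new HashMap<>();
        // alpha * original query
        query.forEach((t, w) -> out.merge(t, alpha * w, Double::sum));
        // + beta * centroid of relevant docs, - gamma * centroid of non-relevant docs
        addCentroid(out, relevant, beta);
        addCentroid(out, nonRelevant, -gamma);
        out.values().removeIf(w -> w <= 0.0);
        return out;
    }

    /** Adds coeff * (average of docs) into acc; does nothing for an empty list. */
    private static void addCentroid(Map<String, Double> acc,
                                    List<Map<String, Double>> docs, double coeff) {
        if (docs.isEmpty()) return;
        double scale = coeff / docs.size();
        for (Map<String, Double> d : docs) {
            d.forEach((t, w) -> acc.merge(t, scale * w, Double::sum));
        }
    }

    public static void main(String[] args) {
        Map<String, Double> q = Map.of("accident", 1.0);
        List<Map<String, Double>> rel = List.of(
            Map.of("accident", 1.0, "highway", 2.0),
            Map.of("accident", 1.0, "highway", 1.0, "truck", 1.0));
        List<Map<String, Double>> nonRel = List.of(Map.of("insurance", 1.0));
        System.out.println(expand(q, rel, nonRel, 1.0, 0.75, 0.15));
    }
}
```

The highest-weighted new terms (here `highway`) are natural candidates to add to the query, for example as extra optional clauses.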
Thank you,
Koji
> -Original Message-
> From: Chun Wei Ho [mailto:[EMAIL PROTECTED]
> Sen
: That gets things into the 'deleteable' file - but it's never actually
: deleting all of the files from the deleteable file. I'm almost always
: ending up with at least 1 duplicate copy of my index.
I think it only tries to delete the files listed in deletable prior to
trying to delete any other
: I can create a test case; should I include an index
: along with it (it could be rather large)?
The ideal test case creates the index in its constructor or setUp method.
Since the index is going to be totally artificial, the data doesn't
matter, just the term you want to delete on (and they can
: 10 documents ordered by score. But the 2nd document is more frequently
: chosen and clicked by users than the 1st one. Of course, I will record the
: click number. I want the 2nd document to bubble up and become the first one.
: How can I integrate this function into Lucene?
: Any suggestion?
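One possible approach is to keep click counts outside the index. After the search, re-rank the top hits by blending the engine's score with the click count. The blend `score * (1 + weight * ln(1 + clicks))`, the class name and the sample numbers are all hypothetical. The logarithm stops a heavily clicked document from swamping relevance entirely.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical post-search re-ranker that lets click counts lift hits. */
final class ClickRerank {

    /** A hit as returned by the engine: a document id plus its relevance score. */
    static final class Hit {
        final String id;
        final double score;
        Hit(String id, double score) { this.id = id; this.score = score; }
    }

    /** Assumed blend: score * (1 + weight * ln(1 + clicks)). */
    static double blended(Hit h, Map<String, Integer> clicks, double weight) {
        int c = clicks.getOrDefault(h.id, 0);
        return h.score * (1.0 + weight * Math.log1p(c));
    }

    /** Returns a copy of the hits sorted by blended score, highest first. */
    static List<Hit> rerank(List<Hit> hits, Map<String, Integer> clicks, double weight) {
        List<Hit> out = new ArrayList<>(hits);
        out.sort(Comparator.comparingDouble((Hit h) -> blended(h, clicks, weight)).reversed());
        return out;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit("doc1", 0.90), new Hit("doc2", 0.85), new Hit("doc3", 0.40));
        Map<String, Integer> clicks = Map.of("doc1", 3, "doc2", 40);
        for (Hit h : rerank(hits, clicks, 0.1)) {
            System.out.println(h.id);
        }
    }
}
```

With these numbers doc2 moves above doc1, which is the "bubble up" behavior asked about. Setting `weight` to 0 restores the original order. Re-ranking only the top N hits keeps the extra cost small. The trade-off is that a popular document outside that window is never promoted.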
Take a look at the HighFreqTerms sample class in contrib...
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/miscellaneous/src/java/org/apache/lucene/misc/HighFreqTerms.java?rev=376393&view=log
...it doesn't meet your goal, because it returns a list of terms that
appear frequently in
I may be wrong, but isn't this what Carrot2 does?
-Ben
On 2/13/06, Chun Wei Ho <[EMAIL PROTECTED]> wrote:
>
> Thanks. But I am actually looking for approaches/libraries which will
> help me to come up with the suggested "refine searches".
>
> For example I might search for "accident" on the headli
I can create a test case; should I include an index
along with it (it could be rather large)?
I'm running the deletion process again with the latest
nightly build. So far I haven't seen any of the
previous problems, so perhaps there is already a fix
in place.
Thanks!
Greg
--- Daniel Naber <[EM
Paul Smith wrote:
Is 1.9 binary backward compatible (both source code and index format)?
That is the intent. Try a nightly build:
http://cvs.apache.org/dist/lucene/java/nightly/
Doug
-
To unsubscribe, e-mail: [EMAIL PROTEC
No, all CSInputStreams share a single FSInputStream, so the
FSInputStream shouldn't be closed until all of the CSInputStreams
have been closed. This is done by CompoundFileReader.close(). It
sounds like that's what's not getting called. As you update
indexes, how do you close stale
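A toy model may make the ownership rule described above easier to follow. The classes here are hypothetical stand-ins, not Lucene's actual CompoundFileReader, CSInputStream or FSInputStream. Many sub-streams share one base stream. Closing a sub-stream leaves the base open, and only the owning reader's `close()` releases it.

```java
/**
 * Toy model (hypothetical classes, not Lucene's code) of one shared base
 * stream owned by a reader and read through many sub-streams.
 */
final class SharedStreamModel {

    /** Stands in for the single underlying file stream. */
    static final class BaseStream implements AutoCloseable {
        private boolean open = true;
        boolean isOpen() { return open; }
        @Override public void close() { open = false; }
    }

    /** A view onto part of the base stream; it does not own the base. */
    static final class SubStream implements AutoCloseable {
        private final BaseStream base;
        SubStream(BaseStream base) { this.base = base; }
        boolean canRead() { return base.isOpen(); }
        @Override public void close() { /* intentionally leaves the base open */ }
    }

    /** Owns the base stream and hands out sub-streams. */
    static final class CompoundReader implements AutoCloseable {
        final BaseStream base = new BaseStream();
        SubStream openSub() { return new SubStream(base); }
        @Override public void close() { base.close(); } // the only place the base is released
    }

    public static void main(String[] args) {
        CompoundReader reader = new CompoundReader();
        SubStream a = reader.openSub();
        SubStream b = reader.openSub();
        a.close();
        System.out.println("after closing a, b can read: " + b.canRead());
        reader.close();
        System.out.println("after reader.close(), base open: " + reader.base.isOpen());
    }
}
```

The practical consequence is that if the owning reader is never closed, the shared file is never released. A stale reader left behind after an index update is one example.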
On 14/02/2006, at 7:44 AM, Doug Cutting wrote:
Paul Smith wrote:
We're using Lucene 1.4.3, and after hunting around in the source
code just to see what I might be missing, I came across this, and
I'd just like some comments.
Please try using a 1.9 build to see if this is something that'
Sebastian Menge wrote:
Or, to put it more simply, what does a boost of "2" or "10" _mean_ in
contrast to a boost of "0.5" or "0.1" !?
Boosts are simply multiplied into scores. So they only mean something
in the context of the rest of the scoring mechanism.
http://lucene.apache.org/java/docs
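The toy model below makes "multiplied into scores" concrete by keeping only that multiplication. Each clause's raw score is multiplied by its boost, and the products are summed. Real Lucene scoring has further factors (term frequency, idf, length normalization, coordination, query normalization), so the absolute numbers are illustrative only. The class name and the clause scores are made up.

```java
/**
 * Toy model of query-clause boosts: score = sum(boost_i * rawScore_i).
 * Deliberately omits every other factor of real Lucene scoring.
 */
final class BoostDemo {

    static double score(double[] rawClauseScores, double[] boosts) {
        double sum = 0.0;
        for (int i = 0; i < rawClauseScores.length; i++) {
            sum += boosts[i] * rawClauseScores[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] docA = {0.6, 0.0}; // matches only clause 0 (e.g. a title clause)
        double[] docB = {0.0, 0.8}; // matches only clause 1 (e.g. a body clause)
        double[][] boostSets = {{1, 1}, {2, 1}, {10, 5}, {0.5, 1}};
        for (double[] b : boostSets) {
            System.out.printf("boosts %s: A=%.2f B=%.2f%n",
                java.util.Arrays.toString(b), score(docA, b), score(docB, b));
        }
    }
}
```

With boosts {1, 1}, B wins. With {2, 1} or {10, 5}, A wins by the same ratio, because only the boosts relative to each other affect the ordering. With {0.5, 1}, A falls further behind. In this model, a boost of 2 means "count this clause twice as much as a clause with boost 1", and 0.5 means "half as much".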
Paul Smith wrote:
We're using Lucene 1.4.3, and after hunting around in the source code
just to see what I might be missing, I came across this, and I'd just
like some comments.
Please try using a 1.9 build to see if this is something that's perhaps
already been fixed.
CompoundFileReader
On Monday 13 February 2006 19:42, Greg Gershman wrote:
> I'm still wondering if anyone has any thoughts on the
> NullPointerException and/or the delete/optimize
> problems I'm having. They seem to be very real
> issues.
I haven't seen this before (and don't remember anyone on the list
mentioning
Aigner, Thomas wrote:
I believe that the files are actually deleted from Lucene when the
optimize is run.
That gets things into the 'deleteable' file - but it's never actually
deleting all of the files from the deleteable file. I'm almost always
ending up with at least 1 duplicate copy of my
Thanks, that is the way things will be done in the
future.
I'm still wondering if anyone has any thoughts on the
NullPointerException and/or the delete/optimize
problems I'm having. They seem to be very real
issues.
Greg
--- "Michael D. Curtin" <[EMAIL PROTECTED]> wrote:
> Greg Gershman wrote:
Hello all,
We have done some preliminary research on Porter2 and K-stem algorithms
and have some questions.
Porter2 was found to be a 'strong' stemming algorithm, in that it strips
off both inflectional suffixes (-s, -es, -ed) and derivational suffixes
(-able, -aciousness, -ability). K-Stem seemed t
Greg Gershman wrote:
No problem; this is not meant to be a regular
operation, rather it's a (hopefully) one-time thing
till the index can be restructured.
The data is chronological in nature, deleting
everything before a specific point in time. The index
is optimized, so is it possible to remo
I believe that the files are actually deleted from Lucene when the
optimize is run.
-Original Message-
From: Dan Armbrust [mailto:[EMAIL PROTECTED]
Sent: Monday, February 13, 2006 12:27 PM
To: java-user@lucene.apache.org
Subject: When do files in 'deleteable' get deleted?
If I am using l
If I am using Lucene (a daily build from about a month ago) on Windows
and I merge two indexes together, I get a number of .cfs files
listed in my 'deleteable' file, but they never seem to actually be
deleted by Lucene.
When does Lucene try to delete these files? Does it ever work on
No problem; this is not meant to be a regular
operation, rather it's a (hopefully) one-time thing
till the index can be restructured.
The data is chronological in nature, deleting
everything before a specific point in time. The index
is optimized, so is it possible to remove specific
files? I'm
Greg Gershman wrote:
I'm trying to delete a large number of documents
(~15 million) from a large index (30+ million
documents). I've started with an optimized index, and
a list of docIds (our own unique identifier for a
document, not a Lucene doc number) to pass to the
IndexReader.delete(Term
I'm trying to delete a large number of documents
(~15 million) from a large index (30+ million
documents). I've started with an optimized index, and
a list of docIds (our own unique identifier for a
document, not a Lucene doc number) to pass to the
IndexReader.delete(Term t) method. I've had a f
Hi all,
We have a requirement that the click count of a document be a
factor in the score calculation. For example, a search operation returns
10 documents ordered by score, but the 2nd document is more frequently
chosen and clicked by users than the 1st one. Of course, I will record
Thanks. But I am actually looking for approaches/libraries which will
help me to come up with the suggested "refine searches".
For example I might search for "accident" on the headlines at a news
site, which would come back with lots of hits. I am looking for
something that would analyze the headl
>And next time if it is a refined search I will merge current query with
How do you recognize a refined query? And how are the queries refined?
Cheers,
klaus
Hi,
I have implemented this using the Query "mergeBooleanQueries" method. In this
approach I created a POJO class, RefineQuery, which holds one
Query variable that I set whenever I get a search.
The next time, if it is a refined search, I will merge the current query with the
refin
A simple approach is to count the most common words in the result set and
present them in combination with the original query. If you have any metadata,
you could use it to refine the query.
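Here is a minimal Java sketch of that approach. For each word in the titles of the current result set, it counts how many hits contain that word. It skips the query's own words and a few stopwords, and offers the most common remaining words as refinements to combine with the original query. The tokenizer, stopword list, class name and sample titles are assumptions. In a Lucene application the titles could come from a stored title field, if one exists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical "refine search" suggester based on word counts in result titles. */
final class RefineSuggester {

    // A tiny illustrative stopword list; a real one would be much longer.
    private static final Set<String> STOPWORDS =
        Set.of("a", "an", "the", "in", "on", "of", "and", "to", "for");

    static List<String> suggest(String query, List<String> titles, int k) {
        Set<String> queryWords = new HashSet<>(tokenize(query));
        Map<String, Integer> counts = new HashMap<>();
        for (String title : titles) {
            // Count each word at most once per title: the number of hits containing it.
            for (String w : new HashSet<>(tokenize(title))) {
                if (!queryWords.contains(w) && !STOPWORDS.contains(w)) {
                    counts.merge(w, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue()); // most frequent first
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey()); // ties: alphabetical
        });
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            out.add(entries.get(i).getKey());
        }
        return out;
    }

    /** Lowercases and splits on anything that is not a letter. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> titles = List.of(
            "Highway accident closes bridge",
            "Truck accident on highway injures two",
            "Train accident in Spain",
            "Highway accident blamed on fog");
        for (String s : suggest("accident", titles, 3)) {
            System.out.println("accident " + s); // original query combined with each suggestion
        }
    }
}
```

Counting each word once per title shows how many hits a refinement would keep. For a "refine" link, that is more useful than raw term frequency.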
-Original Message-
From: Chun Wei Ho [mailto:[EMAIL PROTECTED]
Sent: Monday, 1
Hi,
I am trying to suggest "refine search" options for my Lucene search. For
example, if a search turned up too many results, it would list a
number of document title subsequences that occurred frequently in the
results of the previous search, as possible candidates for refining
the search.
Does anyone
Andrzej Bialecki wrote:
None of you mentioned yet the aspect that 4k is the memory page size
on IA32 hardware. This in itself would favor any operations using
multiples of this size, and penalize operations using amounts below
this size.
For normal I/O it will rarely make any difference at all
Hi,
None of you mentioned yet the aspect that 4k is the memory page size on
IA32 hardware. This in itself would favor any operations using multiples
of this size, and penalize operations using amounts below this size.
--
Best regards,
Andrzej Bialecki <><
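To illustrate "multiples of this size", here is one way to round a requested buffer size up to a whole number of 4 KiB pages. The 4096-byte page size is hard-coded as an assumption. It is not queried from the OS, and other platforms may differ. As the reply quoted earlier in this thread notes, the effect on normal I/O is likely small.

```java
/** Hypothetical helper: round buffer sizes up to a whole number of pages. */
final class PageAlign {

    static final int PAGE_SIZE = 4096; // assumed IA32 page size in bytes

    static int roundUpToPage(int requested) {
        if (requested <= 0) throw new IllegalArgumentException("size must be positive");
        // Adding PAGE_SIZE - 1 then masking off the low bits rounds up;
        // this only works because PAGE_SIZE is a power of two.
        return (requested + PAGE_SIZE - 1) & -PAGE_SIZE;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1024, 4096, 5000, 16384}) {
            System.out.println(n + " -> " + roundUpToPage(n));
        }
    }
}
```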