Re: Stemmer algorithms

2006-02-13 Thread jason
Hi, I have tested some stemmer algorithms in my application. However, I think we'd better write a weaker algorithm. I mean, Porter and some other algorithms are too strong. Maybe an algorithm that converts plural nouns to singular is enough. On 2/14/06, Yilmazel, Sibel <[EMAIL PROTECTED]> wro
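
To make the "weaker algorithm" idea concrete, here is a minimal sketch of a plural-to-singular stemmer. The class name PluralStemmer and the suffix rules are assumptions made for illustration; they are not taken from Porter, K-Stem, or Lucene, and irregular words (e.g. "series") will be handled badly.

```java
import java.util.Locale;

/** Hypothetical sketch: strips common English plural endings and nothing else. */
final class PluralStemmer {
    private PluralStemmer() {}

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        // Very short words are left alone to avoid mangling "is", "gas", etc.
        if (w.length() <= 3) {
            return w;
        }
        // "ponies" -> "pony"; the length guard keeps "ties" from becoming "ty".
        if (w.length() > 4 && w.endsWith("ies")) {
            return w.substring(0, w.length() - 3) + "y";
        }
        // "classes" -> "class", "boxes" -> "box", "churches" -> "church".
        if (w.endsWith("sses") || w.endsWith("xes")
                || w.endsWith("ches") || w.endsWith("shes")) {
            return w.substring(0, w.length() - 2);
        }
        // Endings that usually are not plurals: "glass", "status", "analysis".
        if (w.endsWith("ss") || w.endsWith("us") || w.endsWith("is")) {
            return w;
        }
        // Plain plural: "cats" -> "cat".
        if (w.endsWith("s")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }
}
```

Unlike a strong stemmer, this leaves derivational and verbal forms such as "running" or "readability" untouched, which is the trade-off the message above argues for.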

Re: Help with mass delete from large index

2006-02-13 Thread Otis Gospodnetic
I have seen this error in my Simpy logs before, at least the NPE in compareTo (I don't recall the rest of the stack). Have you tried debugging this? I suppose the Term field or value is null somehow... not sure why. Otis P.S. Deleting files - don't :) - Original Message From: Greg

Re: Stemmer algorithms

2006-02-13 Thread Otis Gospodnetic
I can't share any experiences with K-Stem, but I can share that I do remember K-stem people contributing a piece of code that integrated their K-Stem work with Lucene a few (2?) years ago. Their code had some funky license attached, so it never made it into Lucene, but it was available for down

Re: Performance and FS block size

2006-02-13 Thread Otis Gospodnetic
Hi Paul, Yes, that is exactly what I was trying to say in my earlier example of accessing documents in a chronologically sorted order (which might be the same as index insert order). Thanks for confirming it. Otis - Original Message From: Paul Elschot IndexReader.doc(docId

RE: Suggesting refine searches with Lucene

2006-02-13 Thread Koji Sekiguchi
I may be misunderstanding your needs, but isn't this relevance feedback? Please check Grant Ingersoll's presentation at ApacheCon 2005. He put out great demo programs for relevance feedback using Lucene. Thank you, Koji > -Original Message- > From: Chun Wei Ho [mailto:[EMAIL PROTECTED] > Sen

Re: When do files in 'deleteable' get deleted?

2006-02-13 Thread Chris Hostetter
: That gets things into the 'deleteable' file - but it's never actually : deleting all of the files from the deleteable file. I'm almost always : ending up with at least 1 duplicate copy of my index. I think it only tries to delete the files listed in deletable prior to trying to delete any other

Re: Help with mass delete from large index

2006-02-13 Thread Chris Hostetter
: I can create a test case; should I include an index : along with it (it could be rather large)? the ideal test case creates the index in its constructor or setUp method. since the index is going to be totally artificial, the data doesn't matter, just the term you want to delete on (and they can

Re: Can the score be changed dynamically?

2006-02-13 Thread Chris Hostetter
: 10 documents ordered by score. But the 2nd document is more frequently : chosen and clicked by users than the 1st one. Of course, I will record the : click number. I want the 2nd document to bubble up and become the first one. : How can I integrate the function into Lucene? : Any suggestion? Ta

Re: Suggesting refine searches with Lucene

2006-02-13 Thread Chris Hostetter
Take a look at the HighFreqTerms sample class in contrib... http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/miscellaneous/src/java/org/apache/lucene/misc/HighFreqTerms.java?rev=376393&view=log ...it doesn't meet your goal, because it returns a list of terms that appear frequently in

Re: Suggesting refine searches with Lucene

2006-02-13 Thread Ben
I may be wrong but isn't this what Carrot2 does? -Ben On 2/13/06, Chun Wei Ho <[EMAIL PROTECTED]> wrote: > > Thanks. But I am actually looking for approaches/libraries which will > help me to come up with the suggested "refine searches". > > For example I might search for "accident" on the headli

Re: Help with mass delete from large index

2006-02-13 Thread Greg Gershman
I can create a test case; should I include an index along with it (it could be rather large)? I'm running the deletion process again with the latest nightly build. So far I haven't seen any of the previous problems, so perhaps there is already a fix in place. Thanks! Greg --- Daniel Naber <[EM

Re: CompoundFileReader question/'leaking' file descriptors ?

2006-02-13 Thread Doug Cutting
Paul Smith wrote: Is 1.9 binary backward compatible (both source code and index format)? That is the intent. Try a nightly build: http://cvs.apache.org/dist/lucene/java/nightly/ Doug - To unsubscribe, e-mail: [EMAIL PROTEC

Re: CompoundFileReader question/'leaking' file descriptors ?

2006-02-13 Thread Paul Smith
No, all CSInputStreams share a single FSInputStream, so the FSInputStream shouldn't be closed until all of the CSInputStreams have been closed. This is done by CompoundFileReader.close(). It sounds like that's what's not getting called. As you update indexes, how do you close stale

Re: CompoundFileReader question/'leaking' file descriptors ?

2006-02-13 Thread Paul Smith
On 14/02/2006, at 7:44 AM, Doug Cutting wrote: Paul Smith wrote: We're using Lucene 1.4.3, and after hunting around in the source code just to see what I might be missing, I came across this, and I'd just like some comments. Please try using a 1.9 build to see if this is something that'

Re: Boosting

2006-02-13 Thread Doug Cutting
Sebastian Menge wrote: Or, to put it more simply, what does a boost of "2" or "10" _mean_ in contrast to a boost of "0.5" or "0.1" !? Boosts are simply multiplied into scores. So they only mean something in the context of the rest of the scoring mechanism. http://lucene.apache.org/java/docs
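
As a rough illustration of "boosts are simply multiplied into scores", the sketch below multiplies a made-up raw score by a boost and re-sorts. BoostDemo, Hit, and all the numbers are hypothetical; Lucene's real scoring formula has more factors, so this only shows why a boost has no meaning on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not Lucene's scoring code. */
final class BoostDemo {
    private BoostDemo() {}

    /** A hit with a name, a raw score, and a boost (all invented values). */
    static final class Hit {
        final String name;
        final double rawScore;
        final double boost;

        Hit(String name, double rawScore, double boost) {
            this.name = name;
            this.rawScore = rawScore;
            this.boost = boost;
        }

        /** The boost is a plain multiplier on whatever the raw score was. */
        double finalScore() {
            return rawScore * boost;
        }
    }

    /** Returns hit names ordered by boosted score, highest first. */
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(b.finalScore(), a.finalScore()));
        List<String> names = new ArrayList<>();
        for (Hit h : sorted) {
            names.add(h.name);
        }
        return names;
    }
}
```

With a raw score of 0.8 and a boost of 0.5, hit "a" ends at 0.4; with a raw score of 0.3 and a boost of 2.0, hit "b" ends at 0.6 and ranks first. The same boosts applied to different raw scores could leave the order unchanged.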

Re: CompoundFileReader question/'leaking' file descriptors ?

2006-02-13 Thread Doug Cutting
Paul Smith wrote: We're using Lucene 1.4.3, and after hunting around in the source code just to see what I might be missing, I came across this, and I'd just like some comments. Please try using a 1.9 build to see if this is something that's perhaps already been fixed. CompoundFileReader

Re: Help with mass delete from large index

2006-02-13 Thread Daniel Naber
On Monday 13 February 2006 19:42, Greg Gershman wrote: > I'm still wondering if anyone has any thoughts on the > NullPointerException and/or the delete/optimize > problems I'm having. They seem to be very real > issues. I haven't seen this before (and don't remember anyone on the list mentioning

Re: When do files in 'deleteable' get deleted?

2006-02-13 Thread Dan Armbrust
Aigner, Thomas wrote: I believe that the files are actually deleted from lucene when the optimize is run. That gets things into the 'deleteable' file - but it's never actually deleting all of the files from the deleteable file. I'm almost always ending up with at least 1 duplicate copy of my

Re: Help with mass delete from large index

2006-02-13 Thread Greg Gershman
Thanks, that is the way things will be done in the future. I'm still wondering if anyone has any thoughts on the NullPointerException and/or the delete/optimize problems I'm having. They seem to be very real issues. Greg --- "Michael D. Curtin" <[EMAIL PROTECTED]> wrote: > Greg Gershman wrote:

Stemmer algorithms

2006-02-13 Thread Yilmazel, Sibel
Hello all, We have done some preliminary research on Porter2 and K-stem algorithms and have some questions. Porter2 was found to be a 'strong' stemming algorithm where it strips off both inflectional suffixes (-s, -es, -ed) and derivational suffixes (-able, -aciousness, -ability). K-Stem seemed t

Re: Help with mass delete from large index

2006-02-13 Thread Michael D. Curtin
Greg Gershman wrote: No problem; this is not meant to be a regular operation, rather it's a (hopefully) one-time thing till the index can be restructured. The data is chronological in nature, deleting everything before a specific point in time. The index is optimized, so is it possible to remo

RE: When do files in 'deleteable' get deleted?

2006-02-13 Thread Aigner, Thomas
I believe that the files are actually deleted from lucene when the optimize is run. -Original Message- From: Dan Armbrust [mailto:[EMAIL PROTECTED] Sent: Monday, February 13, 2006 12:27 PM To: java-user@lucene.apache.org Subject: When do files in 'deleteable' get deleted? If I am using l

When do files in 'deleteable' get deleted?

2006-02-13 Thread Dan Armbrust
If I am using lucene (daily build from ~ a month ago or so) on windows - and when I merge two indexes together, I get a number of .cfs files noted in my 'deleteable' file - but they never seem to actually be deleted by lucene. When does lucene try to delete these files - does it ever work on

Re: Help with mass delete from large index

2006-02-13 Thread Greg Gershman
No problem; this is not meant to be a regular operation, rather it's a (hopefully) one-time thing till the index can be restructured. The data is chronological in nature, deleting everything before a specific point in time. The index is optimized, so is it possible to remove specific files? I'm

Re: Help with mass delete from large index

2006-02-13 Thread Michael D. Curtin
Greg Gershman wrote: I'm trying to delete a large number of documents (~15 million) from a large index (30+ million documents). I've started with an optimized index, and a list of docIds (our own unique identifier for a document, not a Lucene doc number) to pass to the IndexReader.delete(Term

Help with mass delete from large index

2006-02-13 Thread Greg Gershman
I'm trying to delete a large number of documents (~15 million) from a large index (30+ million documents). I've started with an optimized index, and a list of docIds (our own unique identifier for a document, not a Lucene doc number) to pass to the IndexReader.delete(Term t) method. I've had a f

Can the score be changed dynamically?

2006-02-13 Thread Sen Zhou
Hi all, We have a requirement that the click count of a document be a factor in the score's calculation. For example, a search operation returns 10 documents ordered by score. But the 2nd document is more frequently chosen and clicked by users than the 1st one. Of course, I will record
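
One possible way to fold a recorded click count into a score is sketched below. This is an assumption for illustration, not the approach recommended in the thread: the ClickRerank class, the logarithmic damping, and the weight parameter are all hypothetical, and the adjustment is applied to a score after the search rather than inside Lucene.

```java
/** Hypothetical sketch: combines a search score with a recorded click count. */
final class ClickRerank {
    private ClickRerank() {}

    /**
     * Scales the score by (1 + weight * ln(1 + clicks)).
     * The logarithm damps large click counts so that popularity
     * cannot completely swamp textual relevance.
     */
    static double adjustedScore(double score, long clicks, double weight) {
        if (clicks < 0) {
            throw new IllegalArgumentException("clicks must be >= 0");
        }
        return score * (1.0 + weight * Math.log1p((double) clicks));
    }
}
```

With a weight of 0.1, a document scoring 0.8 with 100 clicks is adjusted to roughly 1.17 and overtakes a document scoring 0.9 with no clicks, which stays at 0.9. The weight would need tuning against real click data.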

Re: Suggesting refine searches with Lucene

2006-02-13 Thread Chun Wei Ho
Thanks. But I am actually looking for approaches/libraries which will help me to come up with the suggested "refine searches". For example I might search for "accident" on the headlines at a news site, which would come back with lots of hits. I am looking for something that would analyze the headl

AW: Suggesting refine searches with Lucene

2006-02-13 Thread Klaus
>And next time if it is a refined search I will merge current query with How do you recognize a refined query? And how are the queries refined? Cheers, klaus

RE: Suggesting refine searches with Lucene

2006-02-13 Thread Ravi
Hi, I have implemented this using the query "mergeBooleanQueries" method. In this approach I have created a POJO class, RefineQuery, which contains a variable called Query that I set whenever I get a search. And next time, if it is a refined search, I will merge the current query with the refin

AW: Suggesting refine searches with Lucene

2006-02-13 Thread Klaus
A simple approach is to count the most common words in the result set and present them in combination with the original query. If you have any meta information you could use it to refine the query. -Original Message- From: Chun Wei Ho [mailto:[EMAIL PROTECTED] Sent: Monday, 1
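
The word-counting approach described above could be sketched as follows. RefineSuggester and its parameters are hypothetical names; the sketch works on plain title strings that have already been fetched from the result set, and it tokenizes naively on non-alphanumeric characters rather than using a Lucene analyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: suggests refinement terms from result titles. */
final class RefineSuggester {
    private RefineSuggester() {}

    /**
     * Counts words across the given titles, skipping words in exclude
     * (the original query terms and any stop words), and returns the
     * most frequent words, at most limit of them.
     */
    static List<String> suggest(List<String> titles, Set<String> exclude, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String title : titles) {
            for (String token : title.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (token.isEmpty() || exclude.contains(token)) {
                    continue;
                }
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Most frequent first; ties are broken alphabetically for stable output.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> out = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            out.add(entries.get(i).getKey());
        }
        return out;
    }
}
```

For a search on "accident", each returned word would be appended to the original query to form a suggestion such as "accident highway". On a large result set it would make sense to count only the top few hundred hits.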

Suggesting refine searches with Lucene

2006-02-13 Thread Chun Wei Ho
Hi, I am trying to suggest refine searches for my Lucene search. For example, if a search turned up too many results, it would list a number of document title subsequences that occurred frequently in the results of the previous search, as possible candidates for refining the search. Does anyone

Re: Performance and FS block size

2006-02-13 Thread John Haxby
Andrzej Bialecki wrote: None of you mentioned yet the aspect that 4k is the memory page size on IA32 hardware. This in itself would favor any operations using multiples of this size, and penalize operations using amounts below this size. For normal I/O it will rarely make any difference at al

Re: Performance and FS block size

2006-02-13 Thread Andrzej Bialecki
Hi, None of you mentioned yet the aspect that 4k is the memory page size on IA32 hardware. This in itself would favor any operations using multiples of this size, and penalize operations using amounts below this size. -- Best regards, Andrzej Bialecki