Re: Re: Preserving old releases

2008-12-19 Thread tom
AUTOMATIC REPLY LUX is closed until 5th January 2009

Preserving old releases

2008-12-19 Thread Chris Hostetter
A couple of references to "Lucene 1.2" in the last few months got me thinking and made me realize that 1.4.3 is the oldest release available in the lucene dist archive; older releases might still be in the jakarta dist archive. Sure enough... http://archive.apache.org/dist/jakarta/lucene/source/ http:/

Lucene and JSON

2008-12-19 Thread Thomas J. Buhr
Lucene, Is there JSON support in Lucene? JSON is more fat-free compared to XML and would be preferred. Digester works well for indexing XML but something along the same lines for JSON would be even sweeter. Best, Thom

Re: Stemming behavior

2008-12-19 Thread Grant Ingersoll
This is likely one of the many subtleties of the Porter stemmer. Dr. Porter has chosen a particular way of doing things, but it isn't necessarily right for everyone. You really have to measure the net benefit across all your searches, not specifically just one. If you can't live with thi

Re: lucene suiteable ? 6 mio recods / day 1k

2008-12-19 Thread Aaron Schon
Christian, I do not have an answer for you (I hope some of the gurus on this board can provide an appropriate one). However, I would ask that you share your findings and experience on this list. We are facing a similar situation and would appreciate it if you shared what you learn. Regards AS

Url Analyzer

2008-12-19 Thread Mark Ferguson
Hello, I was wondering if there had been any work done out there on an analyzer for URL strings. I'm looking for something which will match on any of the words in the domain or path of the URL. I am considering using a PatternAnalyzer but I wanted to ask this group to see if this was something whi
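As a rough illustration of the kind of matching asked about above, here is a minimal sketch in plain Java (no Lucene classes) that breaks a URL's host and path into lowercase word tokens. The class name UrlWords and the method tokens are hypothetical, and splitting on every non-alphanumeric character is an assumption about what "words" should mean here; a real analyzer would wrap comparable logic in a Tokenizer.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: splits the host and path of a URL
// into lowercase alphanumeric tokens.
class UrlWords {
    static List<String> tokens(String url) {
        URI uri = URI.create(url);
        // getHost()/getPath() may return null for some inputs; treat that as empty.
        String host = uri.getHost() == null ? "" : uri.getHost();
        String path = uri.getPath() == null ? "" : uri.getPath();

        List<String> out = new ArrayList<>();
        // Assumption: any run of non-alphanumeric characters separates words.
        for (String part : (host + "/" + path).split("[^A-Za-z0-9]+")) {
            if (!part.isEmpty()) {
                out.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [example, com, somedir, some, html]
        System.out.println(tokens("http://example.com/somedir/some.html"));
    }
}
```

The scheme and query string are deliberately ignored in this sketch, since the question only mentions the domain and path.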

Showcase - What I made with Lucene

2008-12-19 Thread Ian Vink
A Freeware, OpenSource Windows PC and Web based application: http://BahaiResearch.com It allows speakers of 14 languages to investigate the religious texts of other religions. The goal is to foster better understanding between peoples of many religions and many languages. A many-to-many relations

Re: Approximate release date for Lucene 2.9

2008-12-19 Thread Kay Kay
Thanks Mike for the links. That certainly helps us plan the dependencies better. Michael McCandless wrote: Well... there are a couple threads on java-dev discussing this "now": http://www.nabble.com/2.9-3.0-plan---Java-1.5-td20972994.html http://www.nabble.com/2.9,-3.0-and-deprecation-

Re: Approximate release date for Lucene 2.9

2008-12-19 Thread Michael McCandless
You're right, there's not much benefit now... there will be more benefit when flexible indexing is available. Though, you could set up an analysis chain where the producer puts something "new" onto each token, and somewhere downstream you pick that up and do something interesting with it

Re: Approximate release date for Lucene 2.9

2008-12-19 Thread Mark Miller
Right, I was debating throwing that in myself - it's great stuff, but I wasn't sure how much of a feature benefit it brought now. My understanding is that its main benefit is along the flexible indexing path and using multiple consumers, e.g. it's more setup for the goodness yet to come. My understa

Re: lucene suiteable ? 6 mio recods / day 1k

2008-12-19 Thread Erick Erickson
Well, I'm reasonably sure you could make this work, although it'll take some effort. The 3,000,000 records/day should be pretty easy. As for parsing the URLs, if none of the supplied tokenizers does exactly what you want, you can always make your own. Or you can pre-process the input if that's easier. e.g

Re: Approximate release date for Lucene 2.9

2008-12-19 Thread Michael McCandless
The new extensible TokenStream API (based on AttributeSource) is also in 2.9. Mike Mark Miller wrote: Well look at the issues and see for yourself :) It's a subjective call I think. Here's my take: There are not going to be too many sweeping changes in the next release. There are tons of

lucene suiteable ? 6 mio recods / day 1k

2008-12-19 Thread Christian Brennsteiner
hi *, i am searching for a fulltext index capable of the following requirements: index every day 3 000 000 new records with a validity of N days (e.g. 90 days expiration) == 34.7 / s one record is e.g. a URL and can be up to 2 k big http://example.com/somedir/some.html lucene should use "/" as
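The throughput figure quoted above can be checked with a few lines of arithmetic. The sketch below is plain Java; the class and method names are hypothetical, and the steady-state count assumes a constant daily volume with records expiring exactly after the retention period (the 90-day example from the message).

```java
import java.util.Locale;

// Hypothetical back-of-the-envelope helper for the numbers in the message.
class IngestRate {
    static final double SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

    // Average records per second for a given daily volume.
    static double perSecond(long recordsPerDay) {
        return recordsPerDay / SECONDS_PER_DAY;
    }

    // Records held in the index at steady state, assuming a constant daily
    // volume and expiry after retentionDays.
    static long liveRecords(long recordsPerDay, int retentionDays) {
        return recordsPerDay * retentionDays;
    }

    public static void main(String[] args) {
        long daily = 3_000_000L;
        System.out.println(String.format(Locale.ROOT,
                "average rate: %.1f records/s", perSecond(daily)));
        System.out.println("live records at 90 days: " + liveRecords(daily, 90));
    }
}
```

Note that this is an average rate; peak load may be considerably higher if the records do not arrive evenly over the day.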

Re: optimize: went from 14488449 to 38449

2008-12-19 Thread Michael McCandless
How did you delete the documents? EG, by docID using IndexReader, by Term or Query using IndexWriter? And when you said your previous index had 14488449 docs, was that numDocs() or maxDoc()? Mike 1world1love wrote: Ganesh - yahoo wrote: Optimize will remove the deletes and rearrange t

Stemming behavior

2008-12-19 Thread Jay Malaluan
Hi, I'm using the SnowballAnalyzer for my stemming processing. search words: love, loved, loveliness, loveless, lovely, and loving On my index I have the word love. The behavior during searching is that it can't correctly stem the two words loveliness, loveless to love. And the odd thing is love

Re: Unique results in BooleanQuery

2008-12-19 Thread Jay Joel Malaluan
Hi Chris, I was just thinking that when the 1st query of q2 is run it will have its result. Then the 2nd query of q2 will run and have its own result, BUT it is now filtered so that no data already returned by the 1st query is returned again. Results of the 1st and 2nd query have been appended. Does the 2nd quer

Re: addIndexesNoOptimize question

2008-12-19 Thread Antony Bowesman
Thanks Mike, I'm still on 2.3.1, so will upgrade soon. Antony Michael McCandless wrote: This was an attempt on addIndexesNoOptimize's part to "respect" the maxMergeDocs (which prevents large segments from being merged) you had set on IndexWriter. However, the check was too pedantic, and was