Re: Controlling index file name

2008-04-03 Thread Anshum
Hi, I guess (but I'm not quite sure) you are looking for a way to incrementally index (and update the existing index); there is a lot of info available on that. What I would suggest is deleting the indexes from the current index using deleteDocuments (http://lucene.apache.org/java/2_3_1/api/o

RE: Lucene 2.3.0 and NFS

2008-04-03 Thread Duan, Nick
Have you looked at Nutch or Hadoop? They are subprojects of Lucene, developed specifically to support large-scale, distributed indexing. Nutch is probably more mature whereas Hadoop supports clustering out of the box... ND -Original Message- From: Rajesh parab [mailto:[EMAIL PROTECTED]

Lucene 2.3.0 and NFS

2008-04-03 Thread Rajesh parab
Hi, We are currently using Lucene 2.0 for full-text searches within our enterprise application, which can be deployed in clustered environment. We generate Lucene index for data stored inside relational database. As Lucene 2.0 did not have solid NFS support and as we wanted Lucene based searches

RE: Adding attribute to index

2008-04-03 Thread Nitasha Walia (niwalia)
Thanks !! -Original Message- From: Donna L Gresh [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 02, 2008 11:52 AM To: java-user@lucene.apache.org Subject: Re: Adding attribute to index This is "fast and loose" code (from my head; check the syntax). I *highly* recommend you get a copy

Re: Error tolerant text search with Lucene?

2008-04-03 Thread Marjan Celikik
Dominique Béjean wrote: http://today.java.net/pub/a/today/2005/08/09/didyoumean.html -Original Message- From: Marjan Celikik [mailto:[EMAIL PROTECTED] Sent: Thursday, 3 April 2008 15:12 To: java-user@lucene.apache.org Subject: Error tolerant text search with Lucene? Hi everyone, I kn

RE: Error tolerant text search with Lucene?

2008-04-03 Thread Dominique Béjean
http://today.java.net/pub/a/today/2005/08/09/didyoumean.html -Original Message- From: Marjan Celikik [mailto:[EMAIL PROTECTED] Sent: Thursday, 3 April 2008 15:12 To: java-user@lucene.apache.org Subject: Error tolerant text search with Lucene? Hi everyone, I know that there are packages

Search emails - parsing mailbox (mbox) files

2008-04-03 Thread Subodh Damle
Is there any reliable implementation for parsing email mailbox files (mbox format), especially large (>50MB) archives? Even after searching the Lucene mailing list archives and googling around, I couldn't find one. I took a look at the Apache James project, which seems to offer some support, but couldn't fin
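
For the mbox question above, a minimal sketch of a streaming splitter may help. It is not taken from Apache James or any other library; the class name `MboxSplitter` and its `split` method are hypothetical. It assumes the plain mbox convention that every message starts with a separator line beginning with "From ", and it does not undo ">From " quoting, so an unquoted body line starting with "From " would be treated as a new message. Only one message is buffered at a time, so archive size should not matter much.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of Lucene or Apache James.
class MboxSplitter {

    /**
     * Reads an mbox stream line by line and hands each raw message
     * (headers + body, without the "From " separator line) to the handler.
     * Returns the number of messages seen.
     */
    static int split(Reader in, Consumer<String> handler) {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder current = null; // null until the first separator line
        int count = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("From ")) {
                    // A separator closes the previous message, if any.
                    if (current != null) {
                        handler.accept(current.toString());
                        count++;
                    }
                    current = new StringBuilder();
                    continue; // the separator itself is not message content
                }
                if (current != null) {
                    current.append(line).append('\n');
                }
                // Lines before the first separator are ignored.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current != null) {
            handler.accept(current.toString());
            count++;
        }
        return count;
    }
}
```

Each message string passed to the handler could then be parsed for headers and turned into one index document.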

Re: Lucene Proximity Searches

2008-04-03 Thread Erick Erickson
Could you explain your use case? Because to say that you want to score documents that don't have all the terms with a *phrase query* is contradictory. The point of a phrase query is exactly that all the terms are there and within some proximity. Best Erick On Thu, Apr 3, 2008 at 12:17 P

Re: payload performance wrt fieldcache

2008-04-03 Thread John Wang
Apparently tp.nextPosition() is needed :( Any ideas? -John On Thu, Apr 3, 2008 at 8:20 AM, John Wang <[EMAIL PROTECTED]> wrote: > I am loading both from disk. > But I found the culprit: > > My code: > > while (tp.next()) > > { > > //assert tp.doc() < maxDoc; > > tp.

Lucene Proximity Searches

2008-04-03 Thread Ana Rábade
Hi! I'm using Lucene Proximity Searches, but I've seen that Lucene only scores documents which contain all the terms in the phrase. I also need to score documents even if they don't contain all those terms. Is it possible with Lucene PhraseQueries or SpanNearQuery? If not, could you tell me a way to

Re: Lucene Proximity Searches

2008-04-03 Thread Ana Rábade
Hi! I'm using Lucene Proximity Searches, but I've seen that Lucene only scores documents which contain all the terms in the phrase. I also need to score documents even if they don't contain all those terms. Is it possible with Lucene PhraseQueries or SpanNearQuery? If not, could you tell me a way to

Re: payload performance wrt fieldcache

2008-04-03 Thread John Wang
I am loading both from disk. But I found the culprit: My code: while (tp.next()) { //assert tp.doc() < maxDoc; tp.nextPosition(); <-- this call is the problem tp.getPayload(payloadBuffer, 0); byter.load(_array, tp.doc(), payloadBuffe

Error tolerant text search with Lucene?

2008-04-03 Thread Marjan Celikik
Hi everyone, I know that there are packages that support the "Did you mean ... ?" search features with lucene which tries to find the most suited correct-word query.. however, so far I haven't encountered the opposite search feature: given a correct query, find all documents which contain misspel
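
To make the question above concrete, here is a small sketch of the matching it describes: given a correctly spelled query word, pick out the indexed terms that lie within a few edits of it. This is plain Levenshtein distance in standalone Java, not Lucene's implementation; the class `EditDistanceMatcher` and the method names are made up for illustration, and scanning every term like this would likely be too slow for a large term dictionary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not a Lucene API.
class EditDistanceMatcher {

    /** Classic Levenshtein distance (insert, delete, substitute), two-row DP. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the terms whose distance to the query is at most maxEdits. */
    static List<String> withinDistance(List<String> terms, String query, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (distance(term, query) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

Note that a swapped pair of letters counts as two edits under this definition, so a threshold of 2 is needed to catch simple transpositions.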

Re: payload performance wrt fieldcache

2008-04-03 Thread Chris Lu
If your index grows larger, the payload method will get slower. That is because payloads are read from the hard disk. FieldCache is in memory, which is much faster. Unless you are going with a solid-state disk, you'd better go with FieldCache for faster search. -- Chris Lu --

Re: Implementing CMS search function using Lucene

2008-04-03 Thread Matthew Hall
You could try something like this, which I use when I put my own documents together: public Document getDocument(){ Document doc = new Document(); doc.add(new Field("db_key", this.getDb_key(), Field.Store.YES, Field.Index.UN_TOKENIZED)); doc.add(new Field("ac

Re: Implementing CMS search function using Lucene

2008-04-03 Thread Илья Казначеев
In a message dated Thursday 03 April 2008 16:24:15, Илья Казначеев wrote: > - Is there a way to set weights for different fields? Let's say, content has a weight of 1, title has a weight of 5 and picture subscribe has a weight of 0.5. If not, can I do that by hand? Already found field.setBoo

Re: payload performance wrt fieldcache

2008-04-03 Thread John Wang
Sorry, gmail was screwy and accidentally sent the msg. Anyway, I have a large index, about 30M docs. I have a date field (by days) and there are about 1000 of them, every doc has a date field filled in. So out of curiosity I index the date field two ways: 1) using "date" as a field, and set the d

payload performance wrt fieldcache

2008-04-03 Thread John Wang
Hi:

Error tolerant text search with Lucene?

2008-04-03 Thread Marjan Celikik
Hi everyone, I know that there are packages that support the "Did you mean ... ?" search features with lucene which tries to find the most suited correct-word query.. however, so far I haven't encountered the opposite search feature: given a correct query, find all documents which contain mis

Re: Controlling index file name

2008-04-03 Thread Bhavin Pandya
I also faced the same problem in the past. But in my case the index size was not the issue, so I maintained two folders, "newindex" and "oldindex", and swapped them at every update. -Bhavin pandya - Original Message - From: "021336" <[EMAIL PROTECTED]> To: Sent: Tuesday, April 01, 2008 9:44 PM Su
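
The message does not show how the two folders were swapped, so the following is only one possible sketch, using three renames with `java.nio.file.Files.move`. The class `IndexFolderSwap` and the temporary ".swap-tmp" suffix are assumptions made for illustration. The sequence is not atomic, and any open searcher would presumably need to be closed before the swap and reopened afterwards.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; the original poster's code is not shown.
class IndexFolderSwap {

    /**
     * Exchanges the contents of two sibling directories by renaming:
     * first -> temporary name, second -> first, temporary -> second.
     * Both directories are assumed to live on the same file system.
     */
    static void swap(Path first, Path second) {
        // Assumed temporary name; must not already exist.
        Path parked = first.resolveSibling(first.getFileName() + ".swap-tmp");
        try {
            Files.move(first, parked);
            Files.move(second, first);
            Files.move(parked, second);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After a fresh build into "newindex", calling `swap` on the two folders would make the new files visible under "oldindex" while keeping the previous index around for the next rebuild.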

Re: PhraseQuery little bug?

2008-04-03 Thread Darren Govoni
One interpretation of the query with ~5 is that your text has 5 words and ~5 would imply a word in any position can match. Could it be this? - Original Message - From: "Ivan Vasilev" <[EMAIL PROTECTED]> To: "LUCENE MAIL LIST" Sent: Thursday, April 03, 2008 6:03 AM Subject: PhraseQuery

Implementing CMS search function using Lucene

2008-04-03 Thread Илья Казначеев
Hello. We're designing a CMS in Java, and I'm trying to implement a site search function using Lucene. The basic concept is this: - The site features numerous objects that we'd like to throw into the index: pages, various text blocks on those pages, descriptions and keyword lists of those pages, sta

Re: PhraseQuery little bug?

2008-04-03 Thread Ivan Vasilev
ds, as well as, in between words – then THE ORDER of the searched words does not matter. Best Regards, Ivan

PhraseQuery little bug?

2008-04-03 Thread Ivan Vasilev
Hi Guys, I ran the following test – I created 2 files: File1.txt with content: “apple 2 3 4 pear” and File2.txt with content: “pear 2 3 4 apple”. I ran the following search tests: 1. Using the Luke Search tab. 1.1. When searching for: content:"pear apple"~3 Then the File1.txt is returned. 1.2.