Re: Query in Lucene 2.3.0

2008-01-31 Thread Michael McCandless
The write.lock has always been to prevent multiple instances of IndexWriter (or, IndexReader doing deletes) from operating on the same index at a time. Many threads sharing a single instance of these classes has always been fine. Mike ajay_garg wrote: @Mike. Thanks for the reply. B

Re: How to rename fields in an index

2008-01-31 Thread jjlarrea
Did anyone ever post a packaged solution for simple field renaming? Since I didn't see one, I offer (link below) a beanshell script 'fieldrename' which uses the Lucene API to run through the segments, gather fieldnames, pass them through a user-supplied regular-expression transformation, and rewr

Reuse single document and fields

2008-01-31 Thread yu
Hi, I am trying to use the latest 2.3 API on Field to improve the indexing performance by reusing Documents and Fields. After reading lucene-java wiki and the java doc on Field, I have a couple of questions about the comment in Field.setValue(), namely, "Note that you should only use this me

Re: Query in Lucene 2.3.0

2008-01-31 Thread ajay_garg
@Mike. Thanks for the reply. But I had thought that write.lock is there to prevent multiple additions/updates/deleteDocuments. Has there been a change recently in this regard ? Thanks Ajay Garg Michael McCandless-2 wrote: > > > That's right. > > Each thread can enter IndexWriter.add/upda

Re: Distributed Lucene Directory

2008-01-31 Thread Mark Miller
Cedric Ho wrote: But managing such a set of indexes is not trivial. Especially when need to add redundancies for reliability and update frequently. Agreed. Apparently the Solr guys are working on this now. Certainly not trivial to do right. You might want to check out that work. I want to

Re: Distributed Lucene Directory

2008-01-31 Thread Cedric Ho
Yes, I am aware of the RemoteSearchable and ParallelSearcher. And I am doing something similar now. i.e. split the index on multiple machines. But managing such a set of indexes is not trivial. Especially when need to add redundancies for reliability and update frequently. I bumped into this a w

RE: Having 2 fields, each using different analyzers?

2008-01-31 Thread Steven A Rowe
Hi Itamar, On 01/31/2008 at 6:28 PM, Itamar Syn-Hershko wrote: > Since Analyzer is set per IndexWriter, which is being added a Document, > which has several fields, I was wondering how would I store 2 different > fields in a Document, each being passed through a different Analyzer? > The idea is t

Using a QueryParser with an untokenized field?

2008-01-31 Thread Eleanor Joslin
In my Lucene index there's a field that contains the local names of XML elements, one name per document. Users can enter arbitrary queries for this field, so I'm using a QueryParser. From reading around it looks as if the field needs to be tokenized, but since the field's content is always a

Having 2 fields, each using different analyzers?

2008-01-31 Thread Itamar Syn-Hershko
Hi all, Since Analyzer is set per IndexWriter, which is being added a Document, which has several fields, I was wondering how would I store 2 different fields in a Document, each being passed through a different Analyzer? The idea is to have 2 fields of the same content, one stemmed and one is not

Re: Word / Phrase match shown in a context

2008-01-31 Thread Mark Miller
You don't necessarily need to store the data in Lucene, but yes it does need to be stored somewhere. Otherwise, where would the context come from? If you are not stripping stopwords or stemming or lowercasing or anything, I suppose you could rebuild it from the index... To keep from having to

Re: appending field to an existing index

2008-01-31 Thread Chris Hostetter
: I have to keep one index though. Is there a way to reproduce an index from : an indexReader? Assuming you have indexes that work in conjunction with each other the way you want when using ParallelReader, you should (in theory) be able to use... ParallelReader r = ...; IndexWriter w = new

Re: Word / Phrase match shown in a context

2008-01-31 Thread DURGA DEEP
I have a follow-up question. It seems that if I want to use highlighting, we should store the content of the entire document that has to be indexed. d.add( new Field( FIELD_NAME, "some text", Field.Store.YES, Field.Index.TOKENIZED) ); Are there better ways of achieving this? Since we have

Different levels of negative boosting

2008-01-31 Thread prabin meitei
Hi, I want to give different levels of negative boost (reduce the score) to documents for different matching queries. How can it be done? Googling, I found this link http://wiki.apache.org/jakarta-lucene/CommunityContributions but it just gives the option of giving single level negative boos

Re: Performance guarantees and index format

2008-01-31 Thread Andrzej Bialecki
Mark Miller wrote: https://issues.apache.org/jira/browse/LUCENE-997 What this issue doesn't discuss is what to do with partial results obtained when a timeout occurred. As the original poster points out, document lists are traversed in the order they were added and not the order of their imp

Re: Performance guarantees and index format

2008-01-31 Thread Mark Miller
https://issues.apache.org/jira/browse/LUCENE-997 - Mark Kyle Maxwell wrote: I'd like to be able to guarantee that a search will finish in (approximately?) N seconds. This seems like a generally applicable goal for the project. It would be nice to not have to worry about malicious or naive u

Performance guarantees and index format

2008-01-31 Thread Kyle Maxwell
I'd like to be able to guarantee that a search will finish in (approximately?) N seconds. This seems like a generally applicable goal for the project. It would be nice to not have to worry about malicious or naive users DOSing a search instance. In some cases, precision can be sacrificed, to see

Re: appending field to an existing index

2008-01-31 Thread John Wang
I was actually thinking of creating a separate index with only the extra field and then modify the other index (change some files etc.) Sounds hacky. Dunno if it's possible. Thanks -john On Jan 31, 2008 9:36 AM, Erick Erickson <[EMAIL PROTECTED]> wrote: > As always, "it depends". You can try to

Re: document boost

2008-01-31 Thread Mike Grafton
So we upgraded to SOLR 1.2, which uses Lucene 2.1 or so, and the problem went away. Thanks for all the help, folks! Mike On 1/30/08, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > Hi Mike, I think this issue probably belongs in the Solr lists since > it looks like you're indexing through it. > I did a

Re: appending field to an existing index

2008-01-31 Thread Erick Erickson
As always, "it depends". You can try to reconstruct the doc from an index, see Luke. But depending upon how you indexed things, it may be more or less lossy. I remember this was discussed recently, you might have some luck if you search the archive. But it may be very, very expensive to reconstruc

Re: document Id question, again

2008-01-31 Thread Michael McCandless
DocIDs change whenever segments that had deletes pending get merged. So if you have no deletions, docIDs won't ever change. Mike Cam Bazz wrote: Hello; If no document is ever deleted nor updated from an index, will the document id change? Under which circumstances will the document ids c
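
The renumbering described in this reply can be pictured with a small, self-contained sketch. It is plain Java and does not use any Lucene API; the class name `DocIdRemapSketch` and the method `remapAfterMerge` are hypothetical names chosen for illustration. The sketch assumes a single segment whose surviving documents are packed densely, in their original order, when the segment is merged.

```java
import java.util.Arrays;

// Hypothetical illustration only: this is not Lucene code.
class DocIdRemapSketch {

    // Returns an array where result[oldId] is the document's ID after the
    // merge, or -1 if that document was marked deleted and dropped.
    static int[] remapAfterMerge(boolean[] deleted) {
        int[] newIds = new int[deleted.length];
        int next = 0; // next free ID in the merged segment
        for (int oldId = 0; oldId < deleted.length; oldId++) {
            newIds[oldId] = deleted[oldId] ? -1 : next++;
        }
        return newIds;
    }

    public static void main(String[] args) {
        // Document 1 has a pending delete, so documents 2 and 3 shift down.
        boolean[] deleted = {false, true, false, false};
        System.out.println(Arrays.toString(remapAfterMerge(deleted)));
        // prints [0, -1, 1, 2]
    }
}
```

With no pending deletes the mapping is the identity, which matches the statement that docIDs never change when nothing is deleted.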

Re: appending field to an existing index

2008-01-31 Thread John Wang
I have to keep one index though. Is there a way to reproduce an index from an indexReader? -John On Jan 31, 2008 1:30 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Just beware, though, that with ParallelReader you must ensure that > the internal docIDs of both indices remain "aligned" ov

RE: Spell checking street names

2008-01-31 Thread Max Metral
Thanks all. I've got it working now using a KeywordAnalyzer. The edit distance metric I'm using is purely "edit" based, i.e. when I input "Bennett", I get "Jennett", "Gannett", "Kenneth" and THEN "Bennet". While I see the logic, it's obviously not the best metric. Is there an appropriate edit di
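
To see why a purely edit-based metric cannot prefer "Bennet" over "Jennett", here is a minimal sketch of the textbook Levenshtein distance in plain Java. It is an assumption that this resembles the metric in use in the message above; the class name `EditDistanceSketch` is hypothetical, and nothing here is taken from the Lucene spell checker.

```java
// Hypothetical illustration only: textbook Levenshtein distance, not Lucene code.
class EditDistanceSketch {

    // Minimum number of single-character insertions, deletions and
    // substitutions needed to turn a into b (two-row dynamic programming).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        for (String candidate : new String[]{"Jennett", "Gannett", "Kenneth", "Bennet"}) {
            System.out.println(candidate + " " + levenshtein("Bennett", candidate));
        }
    }
}
```

Under this metric "Jennett" and "Bennet" are both one edit away from "Bennett", while "Gannett" and "Kenneth" are two edits away, so the distance alone gives no reason to rank a dropped letter ahead of a changed first letter.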

document Id question, again

2008-01-31 Thread Cam Bazz
Hello; If no document is ever deleted or updated in an index, will the document ID change? Under which circumstances will the document IDs change, apart from deletes? Best Regards, -C.B.

Re: contrib/benchmark Quality

2008-01-31 Thread Grant Ingersoll
Yes, I was thinking of smaller collections/query logs. The Million Queries Track is certainly interesting. It would also be good to be able to spit out reports as files. Sigh. Need about 5 more hours in the day and the energy to access them. -Grant On Jan 31, 2008, at 1:02 AM, Doron Coh

Re: Luke for Lucene 2.3?

2008-01-31 Thread Paleo Tek
vivek sar wrote: Hi, ... I also read something about web-based Luke, but can't find it in the contrib in 2.3, is it part of Lucene 2.3? How do I use it? Thanks, -vivek I'm using Julien Nioche's tool, LIMO, which is probably what you mean by a web-based Luke. To

Re: Spell checking street names

2008-01-31 Thread Karl Wettin
On 30 Jan 2008 at 17:34, Max Metral wrote: Part of the reason is if we look at some common mistakes: For Commonwealth: Communwealth Comonwealth Common wealth If they are common mistakes, you can pick them up using reinforcement learning.

Re: Distributed Lucene Directory

2008-01-31 Thread Karl Wettin
On 31 Jan 2008 at 09:42, Cedric Ho wrote: I am wondering if there exists any implementation of org.apache.lucene.store.Directory which can be distributed across multiple machines with comparable performance to a local FSDirectory index, or is such an idea feasible in the first place. By comparable p

Re: Retain the index

2008-01-31 Thread Michael McCandless
Physically delete the file, or use the static IndexReader.unlock method. Mike On Jan 31, 2008, at 4:26 AM, anjana m wrote: How do I remove the locks? On Jan 31, 2008 2:49 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: It looks like you are passing "true" to FSDirectory.getDirectory

Re: appending field to an existing index

2008-01-31 Thread Michael McCandless
Just beware, though, that with ParallelReader you must ensure that the internal docIDs of both indices remain "aligned" over time. If you never do deletions, then that happens for free. If you do deletions, then, you must change IndexWriter to buffer by doc count (same doc count for all wr

Re: Retain the index

2008-01-31 Thread anjana m
How do I remove the locks? On Jan 31, 2008 2:49 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > It looks like you are passing "true" to FSDirectory.getDirectory, > which you shouldn't do. Always pass "false" to that. (Newer > versions of Lucene have deprecated the create flag to FSDir

Re: Retain the index

2008-01-31 Thread anjana m
With false, I get this error. What do you suggest I do next? I am using Lucene 1.4 final. Please help. Exception in thread "main" java.io.IOException: Lock obtain timed out: [EMAIL PROTECTED]:\DOCUME~1\ANJANA\LOCALS~1\Temp\lucene- 26fee40bb91b3504c3589207f2d7efa3-write.lock at o

Re: Query in Lucene 2.3.0

2008-01-31 Thread Michael McCandless
That's right. Each thread can enter IndexWriter.add/update/deleteDocument(s) in parallel. There are some parts inside IndexWriter that are synchronized but they are kept to a minimum to keep good thread concurrency. As you add threads it's best to increase the RAM buffer at the same ti

Re: appending field to an existing index

2008-01-31 Thread Doron Cohen
This may help: http://www.nabble.com/Updating-Lucene-Index-with-Unstored-fields-tt15188818.html#a15188818 Doron On Thu, Jan 31, 2008 at 2:42 AM, John Wang <[EMAIL PROTECTED]> wrote: > Hi all: > >We have a large index and it is difficult to reindex. > >We want to add another field to the

Re: Retain the index

2008-01-31 Thread Michael McCandless
It looks like you are passing "true" to FSDirectory.getDirectory, which you shouldn't do. Always pass "false" to that. (Newer versions of Lucene have deprecated the create flag to FSDirectory, leaving it entirely to IndexWriter). On the lock obtain timed out, probably that's a left over

Re: Spell checking street names

2008-01-31 Thread eks dev
Otis, I think it was proposed to have spell checker that works on multiple tokens / Document: where field to be searched with SpellChecker" looks like "lucene search library" does not get tokenized and then fed to the SpellChecker, rather having this as a "single token" that gets chopped int

Distributed Lucene Directory

2008-01-31 Thread Cedric Ho
Hi all, I am wondering if there exists any implementation of org.apache.lucene.store.Directory which can be distributed across multiple machines with comparable performance to a local FSDirectory index, or is such an idea feasible in the first place. By comparable performance I mean a 100G index di