Atomic updates on Lucene index document?

2005-04-14 Thread Terence Lai
Hi all, As far as I know, there is no Lucene API for updating an index document. What I have to do is delete the existing index document and insert a new one. However, this is going to be 2 separate operations (delete and insert). If the first operation succeeds while the second operatio

RE: Update performance/indexwriter.delete()?

2005-04-14 Thread Chris Hostetter
You mentioned before that you can't "batch" your updates ... I can understand not being able to batch updates by number of updates -- but why can't you batch by time? It may sound bad to only process updates once an hour, or once every half hour, or once every 5 minutes, or even once every 30 sec
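A minimal sketch of the "batch by time" idea, in plain Java with no Lucene dependency. The class name TimedBatcher, the submit method, and the use of strings as stand-ins for updates are all hypothetical, chosen only for illustration; the caller passes the current time in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects updates and releases them as one batch
// once a fixed time interval has elapsed since the last flush.
class TimedBatcher {
    private final long intervalMillis;
    private final List<String> pending = new ArrayList<>();
    private long lastFlushMillis;

    TimedBatcher(long intervalMillis, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.lastFlushMillis = startMillis;
    }

    // Queues one update. Returns the whole pending batch if the interval
    // has elapsed, otherwise an empty list (nothing to apply yet).
    List<String> submit(String update, long nowMillis) {
        pending.add(update);
        if (nowMillis - lastFlushMillis >= intervalMillis) {
            List<String> batch = new ArrayList<>(pending);
            pending.clear();
            lastFlushMillis = nowMillis;
            return batch;
        }
        return new ArrayList<>();
    }
}
```

With a 30-second interval, updates submitted at 1 s and 29.999 s are held back, and the update arriving at 30 s releases all three together. A real application would more likely flush from a timer than on submission; this version only shows the bookkeeping.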

Re: Strange sort error

2005-04-14 Thread Chris Hostetter
: one is sorting on doesn't even have to exist in all the documents. I : think it would be even more confusing for an invalid query suddenly : becoming a valid query in the future just because someone added a doc Or worse, a query that does work today, stops working tomorrow because one doc was r

Zilverline Search Engine version 1.2.0 released

2005-04-14 Thread Zilverline info
All, I've just released Zilverline version 1.2.0. This version is fully web-based: all settings, collections, and preferences can be set via the web interface. You don't need to edit any config files anymore. Also I'm adding Powerpoint and Excel Extractors. The source will be made available as well ve

Re: Strange sort error

2005-04-14 Thread Yonik Seeley
Also, it's more flexible. You can easily implement stricter checking on top of a "lax" model (use a term enumerator to see if the field exists before you call search), but not vice versa. -Yonik On 4/14/05, Yonik Seeley <[EMAIL PROTECTED]> wrote: > Hmmm, that's a great lucene architecture questi

Re: Strange sort error

2005-04-14 Thread Yonik Seeley
Hmmm, that's a great lucene architecture question. Should one be allowed to sort on a field that doesn't exist? One *can* query on fields that don't exist (and that's correct in my view). The thing is, lucene field creation is lazy... just because the field doesn't exist now doesn't mean that it w

Re: Strange sort error

2005-04-14 Thread Daniel Naber
On Thursday 14 April 2005 16:28, Yonik Seeley wrote: > I haven't tried it, but I think the fix should be easy... never throw > that exception. As Lucene does not have the concept of a "warning" I think it should throw exceptions when someone tries to do something that doesn't make sense (even i

Re: Boosting not working?

2005-04-14 Thread Erik Hatcher
On Apr 14, 2005, at 4:32 PM, Martin May wrote: I've got the book (which is great, btw). I used Luke to get explanations of the results, but I don't see any boosts in the explanations. The index-time boosts are folded into the field normalization factor, so you won't see boost by itself. That fie

Re: Boosting not working?

2005-04-14 Thread Martin May
I've got the book (which is great, btw). I used Luke to get explanations of the results, but I don't see any boosts in the explanations. Martin On Thu, 2005-04-14 at 13:24 -0700, Otis Gospodnetic wrote: > I'd look at the output of Explain to see how ranking score is calculated > > Look at this:

Re: Boosting not working?

2005-04-14 Thread Otis Gospodnetic
I'd look at the output of Explain to see how ranking score is calculated Look at this: http://lucenebook.com/search?query=explain (hit #1 is from a free chapter) Otis --- Martin May <[EMAIL PROTECTED]> wrote: > > I have a bunch of documents in my index, some of which have values > for a > certa

Boosting not working?

2005-04-14 Thread Martin May
I have a bunch of documents in my index, some of which have values for a certain field while others don't. I'd like the ones that do have a value to always show up before the ones that don't when sorting by relevance. I tried to accomplish this by checking whether there are values for the field, and

Re: IOException: No such path or directory

2005-04-14 Thread Daniel Naber
On Thursday 14 April 2005 16:44, Luis Medina wrote: > primarily reporting lock issues (except no lock files > were found in the directory). With "that directory", do you mean the index directory? The lock files are not there, but in /tmp (by default). It's only okay to remove the lock file manu

Re: Update performance/indexwriter.delete()?

2005-04-14 Thread Doug Cutting
Roy Klein wrote: I think this is a better way of asking my original questions: "Why was this designed this way?" In order to optimize updates. "Can it be changed to optimize updates?" Updates are fastest when additions and deletions are separately batched. That is the design. Doug -
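A rough illustration of "additions and deletions separately batched", using a plain Java map as a toy stand-in for an index. This is not the Lucene API: the class BatchedUpdates and its apply method are hypothetical names, and the toy only shows the order of operations (every delete first, then every add) instead of interleaving a delete and an add per document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: an "update" is a delete of the old document
// followed by an add of the new one. All deletes run in one phase and
// all adds in a second phase.
class BatchedUpdates {
    // Applies updates (id -> new text) to the toy index and returns a log
    // of the operations in the order they were performed.
    static List<String> apply(Map<String, String> index, Map<String, String> updates) {
        List<String> log = new ArrayList<>();
        // Phase 1: every deletion together.
        for (String id : updates.keySet()) {
            if (index.remove(id) != null) {
                log.add("delete " + id);
            }
        }
        // Phase 2: every addition together.
        for (Map.Entry<String, String> e : updates.entrySet()) {
            index.put(e.getKey(), e.getValue());
            log.add("add " + e.getKey());
        }
        return log;
    }
}
```

In a real index each phase would open its own reader or writer once for the whole batch, which is where the saving would come from; the map above has no such cost and only demonstrates the grouping.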

RE: Update performance/indexwriter.delete()?

2005-04-14 Thread Roy Klein
Hi, I guess I didn't ask my question very well. I do understand that you can only do a delete via a reader based on the current sources, what I don't understand is why the delete function couldn't be incorporated into a writer, so that updates could be all done within the context of a writer? Fo

Re: Reverting QueryParser ?

2005-04-14 Thread Pierrick Brihaye
Hi, Erik Hatcher wrote: No, this hasn't been done except for the basic Query.toString() output which for the most part is parsable again. The question is, what do you do about the analysis process? It's a one-way transformation - and parsing again may not yield the same query. We (the SDX de

Re: Update performance/indexwriter.delete()?

2005-04-14 Thread Doug Cutting
Yonik Seeley wrote: There are times, however, when it would be nice for deletes to be able to be concurrent with adds. It would also be nice if good coffee was free. Q: can docids change after an add() (with merging segments going on behind the scenes) or is optimize() the only call that ends up ch

Re: Reverting QueryParser ?

2005-04-14 Thread Doug Cutting
Paul Libbrecht wrote: I am currently evaluating the need for an elaborate query data-structure (to be exchanged over XML-RPC) as opposed to working with plain strings. I'd opt for both. For example: "java based" -coffee site apache.org d

Re: Update performance/indexwriter.delete()?

2005-04-14 Thread Yonik Seeley
> An IndexReader is required to, given a term, find the document number to > mark deleted. Yeah, most the time it makes sense to do deletions off the IndexReader. There are times, however, when it would be nice for deletes to be able to be concurrent with adds. Q: can docids change after an add(

Re: Reverting QueryParser ?

2005-04-14 Thread Erik Hatcher
On Apr 14, 2005, at 11:32 AM, Paul Libbrecht wrote: Hi, I am currently evaluating the need for an elaborate query data-structure (to be exchanged over XML-RPC) as opposed to working with plain strings. One thing that would heavily vote for strings would be to have query objects returne

Re: Update performance/indexwriter.delete()?

2005-04-14 Thread Doug Cutting
Roy Klein wrote: So one thing I've been wondering: Why do you need to do deletes from an indexreader? Is this not in the FAQ? It should be... IndexWriter can only append documents to an index. An IndexReader is required to, given a term, find the document number to mark deleted. Also, in the cu

Re: Hungarian notation analyzer and phrase queries

2005-04-14 Thread Doug Cutting
Paul Smith wrote: So it sounds like there isn't a perfect solution, but I think the best tradeoff for me is to put them all in the same position unless anyone has more input on the subject? If they're all at the same position you can still use slop to match the phrase. So if 'power', 'query'

Re: getting the number of occurrences within a document

2005-04-14 Thread Andy Roberts
On Thursday 14 Apr 2005 15:15, Pablo Gomes Ludermir wrote: > Hello all, > > I would like to get the following information from the index: > > 1. Given a term, how many times the term occurs in each document. > Something like a triple: > < Term, Doc1, Freq> , , , ... > > Is possible to do that? > >

Reverting QueryParser ?

2005-04-14 Thread Paul Libbrecht
Hi, I am currently evaluating the need for an elaborate query data-structure (to be exchanged over XML-RPC) as opposed to working with plain strings. One thing that would heavily vote for strings would be to have query objects returned by Query-parser reconvertible to a string (and bac

Re: getting the number of occurrences within a document

2005-04-14 Thread Paul Libbrecht
On 14 Apr 05, at 17:15, Pablo Gomes Ludermir wrote: I would like to get the following information from the index: 1. Given a term, how many times the term occurs in each document. Something like a triple: < Term, Doc1, Freq> , , , ... Is possible to do that? Luke did this to my index with good s

RE: getting the number of occurrences within a document

2005-04-14 Thread Pasha Bizhan
Hi, > From: Pablo Gomes Ludermir [mailto:[EMAIL PROTECTED] > I would like to get the following information from the index: > > 1. Given a term, how many times the term occurs in each document. > Something like a triple: > < Term, Doc1, Freq> , , , ... > > Is possible to do that? See IndexRead

Re: Strange sort error

2005-04-14 Thread Yonik Seeley
> if (termEnum==null || term.field() != field) break; // CHANGE > here Errr, that should be term==null of course. > if (term==null || term.field() != field) break; // CHANGE here And it *may* be slightly speedier to check for null just before the do/while loop instead:

getting the number of occurrences within a document

2005-04-14 Thread Pablo Gomes Ludermir
Hello all, I would like to get the following information from the index: 1. Given a term, how many times the term occurs in each document. Something like a triple: < Term, Doc1, Freq> , , , ... Is it possible to do that? Regards, Pablo -- Pablo Gomes Ludermir [EMAIL PROTECTED]

RE: Update performance/indexwriter.delete()?

2005-04-14 Thread Peter Veentjer - Anchor Men
-Original Message- From: Roy Klein [mailto:[EMAIL PROTECTED] Sent: Thursday, 14 April 2005 15:40 To: java-user@lucene.apache.org Subject: Update performance/indexwriter.delete()? >>I've got an application that will be doing >>constant updates to an index. >>I've looked i

IOException: No such path or directory

2005-04-14 Thread Luis Medina
Hi Everyone, The company I work for uses Lucene to search 2 of their sites. Each site's configuration is (almost) a mirror image of the other. The only difference here is the content. We use a servlet to start up a Lucene maintenance utility that keeps the indexes up to date. This servlet is set to

Re: Strange sort error

2005-04-14 Thread Yonik Seeley
I haven't tried it, but I think the fix should be easy... never throw that exception. Either check for null before the loop, or in the loop. Original code for native int sorting: TermEnum termEnum = reader.terms (new Term (field, "")); try { if (termEnum.term() == null)

Update performance/indexwriter.delete()?

2005-04-14 Thread Roy Klein
I've got an application that will be doing constant updates to an index. I've looked into batching those updates, however, based on the way the application works, the updates can't be batched. (Well, I figure with a lot of work, I might be able to batch ~10% of the transactions) Another requiremen

Re: Searching an NTFS File Server

2005-04-14 Thread John Haxby
Maher Martin wrote: * The user's access rights would be read from Active Directory (i.e. Windows group membership, etc.) * On the submission of a query to Lucene - the user / group access rights would be appended as required search criteria and Lucene would filter out all results that the user should

Re: Highlighter for CJK ??

2005-04-14 Thread mark harwood
Hi Eric, I haven't tested it personally, but I have had reports that it works OK with CJKAnalyzer. This was reported after I added support for overlapping tokens in tokenstreams last July. Cheers, Mark --- Eric Chow <[EMAIL PROTECTED]> wrote: > Hello, > > Is there any good Highlighter for Asian

Re: Highlighter for CJK ??

2005-04-14 Thread Che Dong
Here is a demo: http://grassland.cnblog.org/ Che Dong Eric Chow wrote: Hello, Is there any good Highlighter for Asian languages (Chinese, Japanese, Korean)? Eric - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-m

Highlighter for CJK ??

2005-04-14 Thread Eric Chow
Hello, Is there any good Highlighter for Asian languages (Chinese, Japanese, Korean)? Eric