calculating score - implementing your own 'Scorer' - how to?..

2008-11-25 Thread Vlad Olenin
Hi, I'm new to Lucene, so I'm looking for some guidance as to the most efficient / appropriate way to implement the following functionality. * Each Document consists of a number of fields * Each Field value, when indexed, can have a different 'score' value associated with it ** for simplicity, the sco

Re: Which is faster/better

2008-11-25 Thread Antony Bowesman
Michael McCandless wrote: If you have nothing open already, and all you want to do is delete certain documents and make a commit point, then using IndexReader vs IndexWriter should show very little difference in speed. Thanks. This use case can assume there may be nothing open. I prefer Ind

Re: Lucene implementation/performance question

2008-11-25 Thread Greg Shackles
Just wanted to post a little follow-up here now that I've gotten through implementing the system using payloads. Execution times are phenomenal! Things that took over a minute to run in my old system take fractions of a second to run now. I would also like to thank Mark for being very responsive

Re: Which is faster/better

2008-11-25 Thread Grant Ingersoll
On Nov 25, 2008, at 12:59 PM, Khawaja Shams wrote: On Tue, Nov 25, 2008 at 8:42 AM, Grant Ingersoll <[EMAIL PROTECTED]>wrote: On Nov 25, 2008, at 10:46 AM, Michael McCandless wrote: If you already have the docId, would you need to/want to do delete-by-Query or even delete-by-Term? Isn't

Re: Which is faster/better

2008-11-25 Thread Khawaja Shams
On Tue, Nov 25, 2008 at 8:42 AM, Grant Ingersoll <[EMAIL PROTECTED]>wrote: > > On Nov 25, 2008, at 10:46 AM, Michael McCandless wrote: > > If you already have the docId, would you need to/want to do >>> delete-by-Query or even delete-by-Term? Isn't delete-by-id a lot lighter >>> weight since it

Re: Which is faster/better

2008-11-25 Thread Michael McCandless
Grant Ingersoll wrote: On Nov 25, 2008, at 10:46 AM, Michael McCandless wrote: If you already have the docId, would you need to/want to do delete-by-Query or even delete-by-Term? Isn't delete-by-id a lot lighter weight since it only marks the doc as deleted, whereas d-b-Q can poten

Re: Which is faster/better

2008-11-25 Thread Grant Ingersoll
On Nov 25, 2008, at 10:46 AM, Michael McCandless wrote: If you already have the docId, would you need to/want to do delete-by-Query or even delete-by-Term? Isn't delete-by-id a lot lighter weight since it only marks the doc as deleted, whereas d-b-Q can potentially force a flush, etc

Re: Analyzer

2008-11-25 Thread Erick Erickson
H, how would you do this without opening/closing your IndexWriter around different types of documents? And as far as querying is concerned, I doubt the input would be a file, so one of the canned analyzers should do. Although "care should be taken" Best Erick On Tue, Nov 25, 2008 at 10:57 A

Re: Analyzer

2008-11-25 Thread Erick Erickson
I'm assuming that you want a different analyzer to handle extracting the relevant information to put into a "text" field of the Lucene document. I know of no way you can attach different analyzers to a single field. You can certainly attach different analyzers to *different* fields... The first th

Re: Analyzer

2008-11-25 Thread Ian Lea
Yes, you can. But it is generally best to use the same analyzer for indexing and for searching so, assuming that you want searches to find matches in files of whatever type, I'd recommend pre-processing the files to a common text format before indexing and then using the same analyzer for all of t

Re: Which is faster/better

2008-11-25 Thread Michael McCandless
Grant Ingersoll wrote: On Nov 25, 2008, at 7:53 AM, Michael McCandless wrote: As of 2.4, IndexWriter now provides delete-by-Query, which I think ought to meet nearly all of the cases where someone wants to delete-by-docID using IndexReader. Or are there situations out there where delete-by

Analyzer

2008-11-25 Thread Allahbaksh Mohammedali Asadullah
Hi all, I am indexing a set of file types (html, js, jsp, xml, etc.). All the file types have a common field called text. This field contains all the file data. Can I use a different analyzer depending upon the file type? Note: I am indexing all file types with the same indexer. Regards, Allahbaksh Allahb

Re: Which is faster/better

2008-11-25 Thread Grant Ingersoll
On Nov 25, 2008, at 7:53 AM, Michael McCandless wrote: As of 2.4, IndexWriter now provides delete-by-Query, which I think ought to meet nearly all of the cases where someone wants to delete-by-docID using IndexReader. Or are there situations out there where delete-by-docID is still compelling

RE: Indexing accented characters, then searching by any form

2008-11-25 Thread Diego Cassinera
Are you sure you are creating the fields with Field.Index.ANALYZED? -Original message- From: Dora [mailto:[EMAIL PROTECTED] Sent: Tuesday, November 25, 2008 12:22 p.m. To: java-user@lucene.apache.org Subject: Re: Indexing accented characters, then searching by any form Karl

Re: Indexing accented characters, then searching by any form

2008-11-25 Thread Dora
Karl Wettin wrote: > > Try this (dry coded) snippet instead: > > StandardAnalyzer objAnalyzer = new StandardAnalyzer() { >public TokenStream tokenStream(String fieldName, Reader reader) { > return new ISOLatin1AccentFilter(super.tokenStream(fieldName, > reader)); >} > } > I tr
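For readers unfamiliar with what the accent filter in the quoted snippet is meant to achieve, here is a minimal sketch of accent folding written in plain Java, outside Lucene. It uses `java.text.Normalizer` as a stand-in and is not the implementation of `ISOLatin1AccentFilter`; the class name `AccentFolding` and the method `fold` are hypothetical names chosen for illustration. The idea is that the same folding is applied at index time and at query time, so that an accented and an unaccented spelling reduce to the same term.

```java
import java.text.Normalizer;

public class AccentFolding {

    // Hypothetical helper: decompose each accented character into a base
    // letter plus combining marks (NFD), then remove the combining marks.
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "B\u00fasqueda f\u00e1cil" is "Búsqueda fácil" written with escapes.
        System.out.println(fold("B\u00fasqueda f\u00e1cil"));
    }
}
```

Note that this stand-in only strips combining marks. Characters that have no base-letter decomposition (ligatures, for example) pass through unchanged, so its output can differ from what a dedicated Lucene filter produces.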

Re: Scoped Search and Facets generation using Lucene

2008-11-25 Thread Derk Crezee
I remember seeing a paper about indexing xml using Lucene here: https://ssl.bnt.com/idealliance/papers/xmle02/dx_xmle02/html/abstract/03-02-08.html I think it will be applicable to your problem. - Derk On Mon, Nov 17, 2008 at 4:06 PM, Aleksander M. Stensby < [EMAIL PROTECTED]> wrote: > Yes, you

Re: Which is faster/better

2008-11-25 Thread Michael McCandless
If you have nothing open already, and all you want to do is delete certain documents and make a commit point, then using IndexReader vs IndexWriter should show very little difference in speed. But if you have mixed adds/deletes, especially a batch of them where you don't need any commit points u

Re: Filtering with booleanFilter

2008-11-25 Thread Ian Lea
Hi Do you maybe need MUST rather than SHOULD? -- Ian. On Tue, Nov 25, 2008 at 11:41 AM, Albert Juhe <[EMAIL PROTECTED]> wrote: > > Hi, > > I'm trying to use the boolean filter, because after a search I want to show > documents with a determinate code. > > codisFiltre="XX07_04141_00853#XX06_03

Filtering with booleanFilter

2008-11-25 Thread Albert Juhe
Hi, I'm trying to use the boolean filter, because after a search I want to show documents with a specific code. codisFiltre="XX07_04141_00853#XX06_03002_00852#UX06_07019_02994"; String[] codi = codisFiltre.split("#"); booleanFilter = new BooleanFilter(); for (int i = 0; i < codi.length

Re: Marked for deletion

2008-11-25 Thread Erik Hatcher
On Nov 25, 2008, at 5:00 AM, Ganesh wrote: My index application is a separate process and my search application is part of the web UI. When a user performs a delete, I want to mark the document for deletion. I think I have no option other than to update the document, but the index app is a separate proce

Marked for deletion

2008-11-25 Thread Ganesh
Hello all, My index application is a separate process and my search application is part of the web UI. When a user performs a delete, I want to mark the document for deletion. I think I have no option other than to update the document, but the index app is a separate process and it uses an index writer. In orde