Re: Updating Index.

2005-04-07 Thread Morus Walter
pashupathinath writes: >how can i traverse through the values stored in the > index and make sure that the new records are not > duplicated ? once i encounter the duplicate primary > key, i should be able to delete all the various fields > values associated with that primary key. > There's
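
One common way to avoid duplicated records is to delete everything stored under a primary key before adding the new version. The sketch below shows that delete-then-add pattern in plain Java, with an in-memory list standing in for the index. It is not the reply's (truncated) suggestion and it does not use the Lucene API; `UpsertSketch`, `upsert` and the key field name are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an index; each "document" is a map of field -> value.
final class UpsertSketch {
    private final List<Map<String, String>> docs = new ArrayList<>();
    private final String keyField; // name of the field that holds the primary key

    UpsertSketch(String keyField) {
        this.keyField = keyField;
    }

    // Removes every stored document that shares the new document's primary key,
    // then adds the new document. Returns how many old documents were removed.
    int upsert(Map<String, String> doc) {
        String key = doc.get(keyField);
        int before = docs.size();
        docs.removeIf(old -> key != null && key.equals(old.get(keyField)));
        int removed = before - docs.size();
        docs.add(doc);
        return removed;
    }

    int size() {
        return docs.size();
    }
}
```

Calling `upsert` twice with the same key leaves a single document, so the caller never has to scan the stored values for duplicates first.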

Re: Updating Index.

2005-04-07 Thread Paul Elschot
On Friday 08 April 2005 07:42, pashupathinath wrote: > hi, > i've created an index for database records. the > problem is whenever i'm trying to update the database, > i mean adding or deleting records from the database i > want the index to be updated too. >right now, i am adding new documen

Re: list all terms in a field

2005-04-07 Thread Chris Hostetter
: Is there a simple way to list all terms in a field? : The only approach that I see is to use the IndexReader.terms() method : and then iterate over all the results and build my list by manually : filtering. This seems inefficient and there must be a better way that : my newbie eyes don't see.

Re: list all terms in a field

2005-04-07 Thread Chris Lamprecht
Mark, Here's a small piece of code that outputs a list of all terms for a given field, in order of decreasing term frequency: --- Requires Java 1.5 for PriorityQueue, or you can use Doug Lea's version --- String field = "myfield".intern(); // intern required for != below In
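
The ordering step described above (terms of one field, in order of decreasing frequency) can be sketched on its own. This is not the code from the message, which is cut off here. It assumes the term and frequency pairs for the field have already been read out of the index into a map; `TermFrequencySketch` and `byDecreasingFrequency` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class TermFrequencySketch {
    // Returns the terms ordered by decreasing frequency; ties are broken alphabetically.
    static List<String> byDecreasingFrequency(Map<String, Integer> freqByTerm) {
        PriorityQueue<Map.Entry<String, Integer>> queue =
            new PriorityQueue<>((a, b) -> {
                int byFreq = Integer.compare(b.getValue(), a.getValue());
                return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
            });
        queue.addAll(freqByTerm.entrySet());

        // Polling the queue yields entries in comparator order, highest frequency first.
        List<String> ordered = new ArrayList<>();
        while (!queue.isEmpty()) {
            ordered.add(queue.poll().getKey());
        }
        return ordered;
    }
}
```

A priority queue is used because the message mentions one; sorting a list with the same comparator would give the same order.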

Updating Index.

2005-04-07 Thread pashupathinath
hi, i've created an index for database records. the problem is whenever i'm trying to update the database, i mean adding or deleting records from the database i want the index to be updated too. right now, i am adding new documents to the existing index whenever i add new records to the databa

list all terms in a field

2005-04-07 Thread Mark Gunnels
Is there a simple way to list all terms in a field? The only approach that I see is to use the IndexReader.terms() method and then iterate over all the results and build my list by manually filtering. This seems inefficient and there must be a better way that my newbie eyes don't see. ---

O'Reilly on Native XML Databases

2005-04-07 Thread aurora
I was reading an interesting article on O'Reilly about Native XML Databases. http://www.xml.com/pub/a/2005/03/30/native.html My initial reaction is someone is trying to take on relational database again and this time it is a resurrection of hierarchical database. But as I read on, I find th

Re: Sorting date stored in milliseconds time

2005-04-07 Thread Chris Hostetter
: 2) I doubt that ordering on 2 fields like "time" up to sec (or even to min) : and "integer" will be quicker when sorting using just one "long" i wouldn't be so sure until you benchmark it ... The biggest issue is the total number of Terms per field that come into play when you sort ... with m

Re: Search performance under high load

2005-04-07 Thread David Spencer
Yura Smolsky wrote: Hello, mark. mh> 2) My app uses long queries, some of which include mh> very common terms. Using the "MoreLikeThis" query to mh> drop common terms drastically improved performance. If mh> your "killer queries" are long ones you could spot mh> them and service them with a MoreLik

Re[2]: Search performance under high load

2005-04-07 Thread Yura Smolsky
Hello, mark. mh> 2) My app uses long queries, some of which include mh> very common terms. Using the "MoreLikeThis" query to mh> drop common terms drastically improved performance. If mh> your "killer queries" are long ones you could spot mh> them and service them with a MoreLikeThis or simply mh>

RE: HTML pages highlighter

2005-04-07 Thread Yagnesh Shah
Hi! Eric, Yes HighlightIt.java and HighlightTest.java work. I did attach the file. Anyway, here is the source: javax.servlet.* javax.servlet.http.* java.io.StringWriter java.io.StringReader java.i

nested queries

2005-04-07 Thread Romero Mariela
Hi all, We are using Lucene to search business objects with simple queries, but now we need advanced searches. For example, we have a user object which has as indexed fields the id and its function, and an account object which has as indexed fields its id and the id of the owner user.

Scoring refinement question

2005-04-07 Thread quinton olivier
Hi, I don't know if this question has already been asked as I couldn't find any clue on Mail Archive. I would like to know if there is a proper way to refine the scoring of a fuzzy query in this way: taking into account only the best match for a given position, and not to sum scores for all ma

Best practice

2005-04-07 Thread Fred Lamuette
Hello, I'm pretty new to Lucene, and I'd like to have your point of view on the following case. I need to build a webapp that would manage workers' profiles, and each profile can have any documents attached to it (pdf, doc, video, ...). Pretty basic. We need to index searchable documents (lucene) and

Re: Sorting date stored in milliseconds time

2005-04-07 Thread iouli . golovatyi
well, 1) it would be additional logic overhead to generate the unique id and keep it global for all data providers 2) I doubt that ordering on 2 fields like "time" up to sec (or even to min) and "integer" will be quicker when sorting using just one "long"

Re: Escaping special characters

2005-04-07 Thread Andy Roberts
On Thursday 07 Apr 2005 06:38, Chuck Williams wrote: > Mufaddal Khumri writes (4/6/2005 11:21 PM): > >Hi, > > > >Am new to Lucene. I found the following page: > >http://lucene.apache.org/java/docs/queryparsersyntax.html. At the bottom > >of the page there is a section that in order to escape specia
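
The query parser syntax page referenced above says special characters are escaped by putting a backslash in front of them. A minimal sketch of that rule in plain Java follows. `QueryEscapeSketch` is a hypothetical helper, not part of Lucene, and it escapes `&` and `|` individually, which is an assumption that goes slightly further than the two-character operators `&&` and `||` on the page.

```java
final class QueryEscapeSketch {
    // Special characters from the query parser syntax page:
    // + - && || ! ( ) { } [ ] ^ " ~ * ? : \
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    // Returns a copy of raw with a backslash inserted before each special character.
    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

The escaped string is meant to be handed to the query parser in place of the raw user input, so that a term such as `(1+1):2` is searched for literally.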

Re: Search performance under high load

2005-04-07 Thread mark harwood
In addition to the comments already made, I recently found these changes to be useful: 1) Swapping out Sun 1.4.2_05 JVM for BEA's JRockit JVM halved my query times. (In both cases did not tweak any default JVM settings other than -Xmx to ensure adequate memory allocation). 2) My app use

Re: Search performance under high load

2005-04-07 Thread Paul Elschot
Daniel, On Thursday 07 April 2005 00:54, Chris Hostetter wrote: > > : Queries: The query strings are of highly differing complexity, from > : simple x:y to long queries involving conjunctions, disjunctions and > : wildcard queries. > : > : 90% of the queries run brilliantly. Problem is that 10%