RE: problem updating a document: no segments file?

2006-01-29 Thread John Powers
Just for the archives, in case anyone else runs into this: I had my Lucene implementation index to a different directory, allowing the searcher to work over the previous one while the index built the new one. Then, at the end of building the new one, the indexing code would tell the searcher

Re: Term

2006-01-29 Thread Chris Hostetter
: terms (i.e., the words) inverted. I mean, for example, I need the word : "horse" to be stored as "esroh" because in my application I need to find : all the words in the index that end in a specific suffix. : I thought of inverting the files before indexing them, but it would : increase the time co

Re: deleting duplicate documents from my index

2006-01-29 Thread Chris Hostetter
: Hi, I'm trying to delete duplicate documents from my index; the unique : identifier is the document's URL (aka field "url"). : : My initial thought on how to accomplish this is to open the index via a : reader, sort the documents by URL, and then iterate through them : looking for a match w

Possible new use for term dictionary... (maybe)

2006-01-29 Thread Samuel Edge
I would like to use Lucene's term dictionary as a randomly addressable, lexically sorted repository. A mouthful, I know. I want to be able to access terms as if I had loaded all terms using IndexReader.terms(context), iterating all terms and storing their text sequentially, in sorted order, in a

Re: Searching over more than one Fields

2006-01-29 Thread Chris Brown
I'd suggest creating the index a little differently. How about creating each paragraph as a document? Each document could have three fields: filename, paragraph number, and content. With an index like this you'd be able to easily search one field for the content, and the hits could report which paragr
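
The paragraph-per-document layout suggested above could be prepared roughly as follows. This is only a sketch in plain Java with no Lucene classes: `ParagraphRecord` and `ParagraphSplitter` are hypothetical names, and treating a blank line as the paragraph separator is an assumption. Each record carries the three values (filename, paragraph number, content) that would become the fields of one document.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the three per-paragraph values; not a Lucene class. */
final class ParagraphRecord {
    final String filename;
    final int paragraphNumber; // 1-based position of the paragraph within the file
    final String content;

    ParagraphRecord(String filename, int paragraphNumber, String content) {
        this.filename = filename;
        this.paragraphNumber = paragraphNumber;
        this.content = content;
    }
}

/** Hypothetical helper that turns one file's text into per-paragraph records. */
final class ParagraphSplitter {
    static List<ParagraphRecord> split(String filename, String text) {
        List<ParagraphRecord> records = new ArrayList<>();
        int number = 0;
        // Assumption: paragraphs are separated by one or more blank lines.
        for (String block : text.split("\\n\\s*\\n")) {
            String content = block.trim();
            if (!content.isEmpty()) {
                records.add(new ParagraphRecord(filename, ++number, content));
            }
        }
        return records;
    }
}
```

Each record would then be indexed as its own document, so a hit identifies both the file and the paragraph without needing one "px" field per paragraph.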

Re: Throughput doesn't increase when using more concurrent threads

2006-01-29 Thread Daniel Noll
Yonik Seeley wrote: On 1/29/06, Daniel Noll <[EMAIL PROTECTED]> wrote: Peter Keegan wrote: I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on Intel. If you know of any, please let me know. Linux may be an option, too. Is this true about the 64-bit JVM not

Re: Throughput doesn't increase when using more concurrent threads

2006-01-29 Thread Yonik Seeley
On 1/29/06, Daniel Noll <[EMAIL PROTECTED]> wrote: > Peter Keegan wrote: > > I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on > > Intel. If you know of any, please let me know. Linux may be an option, too. > > > Is this true about the 64-bit JVM not working on Intel? Go ba

Re: Throughput doesn't increase when using more concurrent threads

2006-01-29 Thread Daniel Noll
Peter Keegan wrote: I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now getting 250 queries/sec and excellent cpu utilization (equal concurrency on all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware of it. Wow. That's fast. Out of interest, does in

Re: Throughput doesn't increase when using more concurrent threads

2006-01-29 Thread Daniel Noll
Peter Keegan wrote: I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on Intel. If you know of any, please let me know. Linux may be an option, too. Is this true about the 64-bit JVM not working on Intel? I was under the impression that it supported the AMD64 instruction

Term

2006-01-29 Thread Jairo Sánchez
I don't know if anyone could help me with this issue: for the requirements of the application I'm building, I need to store the terms (i.e., the words) inverted. For example, I need the word "horse" to be stored as "esroh" because in my application I need to find all the words in the inde
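
To illustrate why storing "horse" as "esroh" helps with suffix lookups: once terms are reversed, "ends with a suffix" becomes "starts with the reversed suffix", which a sorted term list can answer with a short range scan. The sketch below is plain Java and uses a `TreeSet` as a stand-in for a sorted term dictionary; `ReversedTermIndex` and its methods are hypothetical names and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Plain-Java illustration of reversed-term suffix lookup; no Lucene classes are used. */
final class ReversedTermIndex {
    // Reversed terms in sorted order, standing in for a sorted term dictionary.
    private final TreeSet<String> reversedTerms = new TreeSet<>();

    static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    void add(String term) {
        reversedTerms.add(reverse(term));
    }

    /** Returns the original terms ending with the given suffix, in reversed-term order. */
    List<String> termsEndingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> matches = new ArrayList<>();
        // All reversed terms that start with the prefix form one contiguous
        // block that begins at or after the prefix itself.
        for (String reversed : reversedTerms.tailSet(prefix)) {
            if (!reversed.startsWith(prefix)) {
                break; // past the block of matching terms
            }
            matches.add(reverse(reversed));
        }
        return matches;
    }
}
```

For example, after adding "horse", "morse", "house", and "horses", calling `termsEndingWith("rse")` scans the reversed terms starting at "esr" and yields "horse" and "morse".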

Re: deleting duplicate documents from my index

2006-01-29 Thread Jeff Rodenburg
One way to do this (depending on your system and index size) is to remove and add every url you find. This would ensure that every document in the index is unique. No need to worry about sorting and iteration and doc_ids and the like. It rebuilds your entire index, but if you have a duplication
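
The remove-then-add idea above can be pictured with a small in-memory model. This is a sketch only: `UrlKeyedStore` is a hypothetical class, a map stands in for the index, and the comments mark which step corresponds to deleting by the "url" field and which to adding the document again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** In-memory stand-in for an index keyed by URL; not a Lucene class. */
final class UrlKeyedStore {
    private final Map<String, String> contentByUrl = new LinkedHashMap<>();

    /** Removes any existing entry for the URL, then adds the new one. */
    void replace(String url, String content) {
        contentByUrl.remove(url);       // analogous to deleting every document with this url
        contentByUrl.put(url, content); // analogous to adding the document back
    }

    int size() {
        return contentByUrl.size();
    }

    String get(String url) {
        return contentByUrl.get(url);
    }
}
```

Because every write first removes whatever was stored under the same URL, at most one copy per URL can survive, with no sorting or iteration over document ids.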

Searching over more than one Fields

2006-01-29 Thread Jairo Sánchez Menéndez
Hi everybody, Let me explain my problem: I am indexing ".txt" files and basically I split each file into paragraphs. That is, I create a Document for each file, and within this Document I add one Field named "px" for each paragraph (x) of the file. My question is: after creating the index

Re: grouping results by fields

2006-01-29 Thread Jim Powers
We're doing something very similar. Recently C|Net started using Lucene and there is a blog entry about how they implemented a "category" scheme that basically does what you want. http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-t266441.html#a748420 The

grouping results by fields

2006-01-29 Thread zzzzz shalev
Hey, I have a bit of a complex problem: I need to group results received in a result set. For example: my result set returns 10,000 results; there are about 10 fields in each result document; I need to group the most frequent values appearing in each field. If 1 of m
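
One straightforward reading of the grouping described above is to tally, for each field, how often each value occurs across the result documents. The sketch below assumes the results are already available as simple field-to-value maps; `FieldValueCounter` is a hypothetical name, no Lucene classes are used, and the rest of the original question is cut off, so this may not cover everything it asks for.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Tallies value frequencies per field over a list of result documents. */
final class FieldValueCounter {
    /** Returns field name -> (value -> number of results carrying that value). */
    static Map<String, Map<String, Integer>> count(List<Map<String, String>> results) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Map<String, String> doc : results) {
            for (Map.Entry<String, String> field : doc.entrySet()) {
                counts.computeIfAbsent(field.getKey(), k -> new HashMap<>())
                      .merge(field.getValue(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the most frequent value for a field, or null if the field never occurred. */
    static String mostFrequent(Map<String, Map<String, Integer>> counts, String field) {
        Map<String, Integer> values = counts.get(field);
        if (values == null) {
            return null;
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : values.entrySet()) {
            if (e.getValue() > bestCount) { // ties are resolved arbitrarily
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

With 10,000 results and about 10 fields this is a single pass of roughly 100,000 map updates; the cost of loading the field values for each hit is a separate matter that this sketch does not address.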

Re: Chinese support

2006-01-29 Thread Ray Tsang
Zsolt, It's in the lucene trunk under the contrib/ directory, you can check it out from the repository, take a look at http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/ ray, On 1/29/06, Zsolt <[EMAIL PROTECTED]> wrote: > And where can I find