Just for the archives, in case anyone else runs into this:
I had my Lucene implementation index into a different directory, allowing the
searcher to keep working over the previous index while the indexer built the new one.
Then, at the end of building the new one, the indexing code would tell the
searcher
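The directory-swap idea above can be sketched in plain Java. This is not Lucene code: `IndexSwitch` is a hypothetical class that only tracks which directory is "live". In a real setup the searcher would open a new IndexSearcher over the published directory and close the old one after the switch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder: searchers read the "live" index directory from here,
// while the indexer writes a fresh index into a different directory.
class IndexSwitch {
    private final AtomicReference<Path> live;

    IndexSwitch(Path initial) {
        this.live = new AtomicReference<>(initial);
    }

    // Called by searchers before opening a reader.
    Path current() {
        return live.get();
    }

    // Called by the indexer once the new index is completely built;
    // returns the previous directory so it can be cleaned up later.
    Path publish(Path freshlyBuilt) {
        return live.getAndSet(freshlyBuilt);
    }

    // Alternate between two directories, e.g. index-a and index-b.
    static Path other(Path current, Path a, Path b) {
        return current.equals(a) ? b : a;
    }

    public static void main(String[] args) {
        Path a = Paths.get("index-a");
        Path b = Paths.get("index-b");
        IndexSwitch sw = new IndexSwitch(a);
        Path target = other(sw.current(), a, b); // the new index would be built here
        Path old = sw.publish(target);
        System.out.println("searching " + sw.current() + ", old: " + old);
    }
}
```

Using an atomic swap means a searcher never sees a half-published state: it either gets the old directory or the new one.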
: terms (i.e. the words) inverted. I mean, for example, I need the word
: "horse" to be stored as "esroh", because in my application I need to find
: all the words in the index that end in a specific suffix.
: I thought of inverting the files before indexing, but it would
: increase the time co
: Hi, I'm trying to delete duplicate documents from my index. The unique
: identifier is the document's URL (aka field "url").
:
: My initial thought of how to accomplish this is to open the index via a
: reader, sort the documents by URL, and then iterate through them
: looking for a match w
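The sort-then-scan approach described in the question can be sketched as follows. `DedupByUrl.Doc` is a hypothetical stand-in for a stored document (its document number plus its "url" value); with Lucene, the returned ids would be handed to something like IndexReader's per-document delete, which this sketch does not call.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DedupByUrl {
    // Hypothetical stand-in for a stored document: its internal number and url field.
    static final class Doc {
        final int docId;
        final String url;

        Doc(int docId, String url) {
            this.docId = docId;
            this.url = url;
        }
    }

    // Sort by url (then by id), and scan: every doc whose url equals the previous
    // one is a duplicate. Keeps the lowest-numbered doc for each url and returns
    // the ids of the others, i.e. the ones to delete.
    static List<Integer> duplicateIds(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing((Doc d) -> d.url).thenComparingInt(d -> d.docId));
        List<Integer> toDelete = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i).url.equals(sorted.get(i - 1).url)) {
                toDelete.add(sorted.get(i).docId);
            }
        }
        return toDelete;
    }
}
```

Sorting puts all copies of a URL next to each other, so a single linear pass is enough to find them.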
I would like to use Lucene's term dictionary as a randomly addressable,
lexically sorted repository. A mouthful, I know. I want to be able to
access terms as if I had loaded all terms using IndexReader.terms(context),
iterating all terms and storing their text sequentially, in sorted order, in
a
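One way to picture what is being asked for is an in-memory snapshot of every term, sorted and addressable by position. This sketch uses plain Java collections instead of Lucene's term enumeration. `SortedTermTable` is hypothetical, and Java's natural `String` order is assumed, which may not match the index's own term ordering in every case.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical in-memory snapshot of a term dictionary: all terms, lexically
// sorted, addressable by ordinal, as if the whole term enumeration had been
// iterated once and copied into an array.
class SortedTermTable {
    private final String[] terms;

    SortedTermTable(Collection<String> allTerms) {
        // TreeSet removes duplicates and sorts by String.compareTo.
        this.terms = new TreeSet<>(allTerms).toArray(new String[0]);
    }

    int size() {
        return terms.length;
    }

    // Random access by position.
    String termAt(int ordinal) {
        return terms[ordinal];
    }

    // Position of the term, or -(insertionPoint) - 1 if absent
    // (the same convention as Arrays.binarySearch).
    int ordinalOf(String term) {
        return Arrays.binarySearch(terms, term);
    }
}
```

The cost is memory: every term's text is held at once, which is the trade-off against walking the dictionary on disk.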
I'd suggest creating the index a little differently. How about creating each
paragraph as a document. Each document could have three fields: filename,
paragraph number and content.
With an index like this you'd be able to easily search one field for the
content, the hits could report which paragr
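The one-document-per-paragraph layout suggested above could start with a splitting step like this. `ParagraphDoc` is a hypothetical record holding the three suggested fields (filename, paragraph number, content); it assumes paragraphs are separated by blank lines and numbers them from 1. Each instance would then become one Lucene Document, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record mirroring the three suggested fields:
// filename, paragraph number and content.
class ParagraphDoc {
    final String filename;
    final int paragraph;
    final String content;

    ParagraphDoc(String filename, int paragraph, String content) {
        this.filename = filename;
        this.paragraph = paragraph;
        this.content = content;
    }

    // Assumes paragraphs are separated by one or more blank lines;
    // paragraph numbers start at 1.
    static List<ParagraphDoc> split(String filename, String text) {
        List<ParagraphDoc> docs = new ArrayList<>();
        int n = 0;
        for (String block : text.split("\\r?\\n\\s*\\r?\\n")) {
            String trimmed = block.trim();
            if (!trimmed.isEmpty()) {
                docs.add(new ParagraphDoc(filename, ++n, trimmed));
            }
        }
        return docs;
    }
}
```

Because the paragraph number is its own field, a hit on the content field directly tells you which paragraph of which file matched.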
Yonik Seeley wrote:
> On 1/29/06, Daniel Noll <[EMAIL PROTECTED]> wrote:
> > Peter Keegan wrote:
> > > I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on
> > > Intel. If you know of any, please let me know. Linux may be an option, too.
> > Is this true about the 64-bit JVM not
On 1/29/06, Daniel Noll <[EMAIL PROTECTED]> wrote:
> Peter Keegan wrote:
> > I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on
> > Intel. If you know of any, please let me know. Linux may be an option, too.
> >
> Is this true about the 64-bit JVM not working on Intel?
Go ba
Peter Keegan wrote:
> I tried the AMD64-bit JVM from Sun and with MMapDirectory and I'm now
> getting 250 queries/sec and excellent cpu utilization (equal concurrency on
> all cpus)!! Yonik, thanks for the pointer to the 64-bit jvm. I wasn't aware
> of it.
Wow. That's fast.
Out of interest, does in
Peter Keegan wrote:
> I'd love to try this, but I'm not aware of any 64-bit jvms for Windows on
> Intel. If you know of any, please let me know. Linux may be an option, too.
Is this true about the 64-bit JVM not working on Intel? I was under the
impression that it supported the AMD64 instruction
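To see what a given JVM actually is, you can ask it at runtime. This sketch relies on `sun.arch.data.model`, a property set by Sun/Oracle HotSpot JVMs ("32" or "64"); other vendors may not define it, so the code falls back to "unknown". On Sun's x86-64 builds, `os.arch` typically reports `amd64` whether the CPU is from AMD or Intel.

```java
// Reports the JVM's architecture as seen by the running process.
class JvmBits {
    // "sun.arch.data.model" is HotSpot-specific; fall back to "unknown" if absent.
    static String dataModel() {
        String model = System.getProperty("sun.arch.data.model");
        return model != null ? model : "unknown";
    }

    public static void main(String[] args) {
        System.out.println("os.arch = " + System.getProperty("os.arch"));
        System.out.println("data model = " + dataModel());
    }
}
```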
I don't know if anyone can help me with this issue:
for the requirements of the application I'm building, I need to store the terms
(i.e. the words) inverted. I mean, for example, I need the word "horse" to be
stored as "esroh", because in my application I need to find all the words in the
inde
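The reason reversed terms help: a suffix search on the original words becomes a prefix search on the reversed words, and a sorted term dictionary answers a prefix search with one seek plus a short forward scan. This sketch shows the idea with a `TreeSet` standing in for the term dictionary; `ReversedTermIndex` is hypothetical. In Lucene the reversal would more likely live in a custom TokenFilter (not shown) with a PrefixQuery on the reversed field, rather than in pre-inverted files.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Sketch of the "store terms reversed" trick using a sorted set as the term dictionary.
class ReversedTermIndex {
    private final TreeSet<String> reversed = new TreeSet<>();

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    void addAll(Collection<String> words) {
        for (String w : words) {
            reversed.add(reverse(w));
        }
    }

    // All indexed words ending with the given suffix, in reversed-term order.
    List<String> endingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> out = new ArrayList<>();
        // tailSet seeks to the first reversed term >= prefix;
        // stop as soon as a term no longer starts with the prefix.
        for (String r : reversed.tailSet(prefix, true)) {
            if (!r.startsWith(prefix)) {
                break;
            }
            out.add(reverse(r));
        }
        return out;
    }
}
```

Only the terms inside the matching range are visited, so the cost depends on the number of matches rather than the dictionary size.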
One way to do this (depending on your system and index size) is to remove
and add every url you find. This would ensure that every document in the
index is unique. No need to worry about sorting and iteration and doc_ids
and the like.
It rebuilds your entire index, but if you have a duplication
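The "remove and add every url" approach amounts to keying documents by URL, so adding a document first removes any earlier one with the same URL. This in-memory sketch (hypothetical `UrlKeyedIndex`) only illustrates the invariant. With Lucene you would delete by a Term on the "url" field and then add the new Document; newer Lucene releases also provide IndexWriter.updateDocument(Term, Document), which performs both steps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory stand-in for "delete every doc with this url, then add the new one".
// Keying by url means the collection can never hold two documents for one url.
class UrlKeyedIndex {
    // Hypothetical: the String value stands in for the rest of the document.
    private final Map<String, String> docsByUrl = new LinkedHashMap<>();

    void update(String url, String body) {
        docsByUrl.remove(url);     // like deleting all docs whose "url" term matches
        docsByUrl.put(url, body);  // then adding the fresh copy
    }

    int size() {
        return docsByUrl.size();
    }

    String get(String url) {
        return docsByUrl.get(url);
    }
}
```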
Hi everybody,
Let me explain my problem:
I am indexing ".txt" files, and basically I split each file into
paragraphs. I mean, I create a Document for each file, and within this
Document I add one Field named "px" for each paragraph (x) of the file.
My question is: after creating the index
We're doing something very similar. Recently C|Net started using Lucene and
there is a blog entry about how they implemented a "category" scheme that
basically does what you want.
http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-t266441.html#a748420
The
hey,
I have a bit of a complex problem:
I need to group the results received in a result set.
For example:
my result set returns 10,000 results,
there are about 10 fields in each result document, and
I need to group the most frequent values appearing in each field.
if 1 of m
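A simple way to get the most frequent values per field is to count them across the result set. In this sketch each result is a plain `Map` from field name to value (an assumption; with Lucene of that era the values would come from stored fields, e.g. via Hits.doc(i).get(field), which can be slow for 10,000 hits). `FieldValueCounts` is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Count how often each value occurs in each field across a result set,
// then keep the topN most frequent values per field.
class FieldValueCounts {
    static Map<String, List<String>> topValues(List<Map<String, String>> results, int topN) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Map<String, String> doc : results) {
            for (Map.Entry<String, String> f : doc.entrySet()) {
                counts.computeIfAbsent(f.getKey(), k -> new HashMap<>())
                      .merge(f.getValue(), 1, Integer::sum);
            }
        }
        Map<String, List<String>> top = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> field : counts.entrySet()) {
            List<Map.Entry<String, Integer>> entries = new ArrayList<>(field.getValue().entrySet());
            // Highest count first; ties broken alphabetically so the output is deterministic.
            entries.sort((a, b) -> a.getValue().equals(b.getValue())
                    ? a.getKey().compareTo(b.getKey())
                    : Integer.compare(b.getValue(), a.getValue()));
            List<String> values = new ArrayList<>();
            for (int i = 0; i < Math.min(topN, entries.size()); i++) {
                values.add(entries.get(i).getKey());
            }
            top.put(field.getKey(), values);
        }
        return top;
    }
}
```

Counting is a single pass over the hits; only the per-field sort depends on how many distinct values each field has.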
Zsolt,
It's in the Lucene trunk under the contrib/ directory; you can check
it out from the repository. Take a look at
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/
ray,
On 1/29/06, Zsolt <[EMAIL PROTECTED]> wrote:
> And where can I find