Hi Stefan,
you might want to consider org.apache.lucene.store.FileSwitchDirectory
before going for the symlinks approach.
Sorry, I don't know the effect or which file types to recommend. I would
naively start by putting the smallest files on the SSD and then run tests,
but that's possibly not the best approach:
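For reference, a minimal sketch of wiring up FileSwitchDirectory against the Lucene 3.x API of the time; the paths and the choice of term-related extensions are placeholders to experiment with, not recommendations:

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.FileSwitchDirectory;

public class SwitchDirExample {
    public static void main(String[] args) throws Exception {
        // Files whose extension is in this set go to the primary (SSD)
        // directory; everything else lands on the secondary directory.
        // The extensions chosen here are illustrative only.
        Set<String> ssdExtensions = new HashSet<String>();
        ssdExtensions.add("tis"); // term infos
        ssdExtensions.add("tii"); // term infos index

        Directory ssd  = FSDirectory.open(new File("/ssd/index"));   // placeholder path
        Directory disk = FSDirectory.open(new File("/data/index"));  // placeholder path

        // true = close both wrapped directories when this one is closed
        Directory dir = new FileSwitchDirectory(ssdExtensions, ssd, disk, true);
        // pass `dir` to IndexWriter / IndexSearcher as usual
    }
}
```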
Regarding:
" It is strange that it should take 20 second to gather fields,"
The 20s includes both the search and gathering the fields; it's the total time.
2010/8/27 Karl Wettin :
> My mail client died while sending this mail.. Sorry for any duplicate.
>
> It is strange that it should take 20 second to gather fields, this is th
Hi,
I have a Lucene index of 100 million documents, but the documents are very
small: 5 fields with 1 or 2 terms each. Only 1 field is analyzed; the others
are simply indexed. The index is optimized down to 2 segments and the total
index size is 7GB.
I open a searcher with a termsInfoDiviso
If I index only 7k documents, the timing comparison is:
time1: 7602331019 time2: 4246878035 total1: 10736 total2: 7393
It seems II is faster than RAMDirectory.
My indexed texts are all hotel names (Chinese and English, a little French).
There are about 100k terms. Terms such as "hotel" are very frequent and
ho
Thanks, Lance. After exploring for a while, I used Lucene's ShingleFilter
followed by the SynonymFilter from the Lucene in Action book. Then, using the
type attribute, I removed all the shingles which did not belong to any category.
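A sketch of that kind of post-shingle filtering against the Lucene 3.x TokenFilter API; `CategoryShingleFilter` and the `categories` set are hypothetical names, and the check assumes ShingleFilter's default behavior of tagging generated tokens with type "shingle":

```java
import java.io.IOException;
import java.util.Set;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

/**
 * Drops shingle tokens that do not name a known category.
 * Sketch only: mirrors the approach described above, not a Lucene class.
 */
public final class CategoryShingleFilter extends TokenFilter {
    private final TermAttribute termAtt = addAttribute(TermAttribute.class);
    private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);
    private final Set<String> categories; // hypothetical category lookup

    public CategoryShingleFilter(TokenStream in, Set<String> categories) {
        super(in);
        this.categories = categories;
    }

    @Override
    public boolean incrementToken() throws IOException {
        while (input.incrementToken()) {
            // Keep non-shingle tokens; keep shingles only if they match
            // a category.
            if (!"shingle".equals(typeAtt.type())
                    || categories.contains(termAtt.term())) {
                return true;
            }
        }
        return false;
    }
}
```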
On Wed, Aug 18, 2010 at 10:28 PM, Lance Norskog wrote:
> Yes, you need
Hi everyone,
I'm trying to figure out the effects on search performance of using the
non-CFS format and spreading the various underlying files to different
disks/media types. For example, I'm considering moving a segment's various
.t* term-related files onto a solid-state drive, the .fdx/.fdt
stor
Hi Erick
Thanks for your response. I used the Lucene in Action 1st edition as a
reference for batch indexing. I've just got my copy of the 2nd edition, which
mentions that there is no point in using a RAMDirectory. Not saying I don't
trust you :).
I'll update my code to use the normal fs direc
I'm going to sidestep your question and ask why you're using
a RAMDirectory in the first place. People often think it'll
speed up their indexing because it's in RAM, but the
normal FS-based indexing caches in RAM too, and you
can use various settings governing segments, RAM usage,
etc. to control how
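As an illustration of that advice, a minimal sketch of an FS-based writer with the RAM buffer raised (Lucene 3.x API; the path and buffer size are placeholders):

```java
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class FsIndexing {
    public static void main(String[] args) throws Exception {
        // A plain on-disk directory; placeholder path.
        Directory dir = FSDirectory.open(new File("/path/to/index"));
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);
        // Buffer up to 64 MB of documents in RAM before flushing a new
        // segment; this is the main knob that makes FS-based indexing
        // buffer in memory much like a RAMDirectory would.
        writer.setRAMBufferSizeMB(64.0);
        // ... addDocument() calls ...
        writer.close();
    }
}
```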
Hi
I have a list of batch tasks that need to be executed. Each batch contains
1000 documents. I use a RAMDirectory-based index writer, and after adding
1000 documents to the in-memory index I perform the following:
ramWriter.commit();
indexWriter.addIndexesNoOptimize(ramW
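The pattern above might be completed along these lines (a sketch against the Lucene 3.x API; `analyzer`, `batch`, and `indexWriter` stand in for the poster's own objects):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class BatchMerge {
    // Index one 1000-document batch in RAM, then fold it into the
    // main on-disk index.
    static void indexBatch(Iterable<Document> batch, Analyzer analyzer,
                           IndexWriter indexWriter) throws Exception {
        RAMDirectory ramDir = new RAMDirectory();
        IndexWriter ramWriter = new IndexWriter(ramDir, analyzer,
                IndexWriter.MaxFieldLength.UNLIMITED);
        for (Document doc : batch) {
            ramWriter.addDocument(doc);
        }
        ramWriter.commit();
        ramWriter.close();
        // Merge the in-memory segment(s) into the on-disk index.
        indexWriter.addIndexesNoOptimize(ramDir);
    }
}
```

Note that, as the reply above points out, writing straight to the FS-based writer is usually at least as fast, since it buffers in RAM anyway.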
My mail client died while sending this mail.. Sorry for any duplicate.
It is strange that it should take 20 seconds to gather fields; this is
the only thing that really surprises me. I'd expect it to be instant
compared to RAMDirectory. It is hard to say from the information you
provided. Did
Hello, I am familiar with the SpanQuery construct and have set an upper slop limit.
1. When I get the hit results, is there any way I can access the
actual slop and the span text itself in that particular hit?
2. Also, it is possible to have multiple matches within the same
document. So how do I acc
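For question 1, one approach in the Lucene 3.x API is to walk the low-level Spans enumeration, which also surfaces every match per document (question 2); mapping the positions back to the actual span text would still require term vectors or re-analysis. A sketch, with hypothetical method and variable names:

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.Spans;

public class SpanPositions {
    // Prints every span match, including multiple matches within
    // the same document.
    static void dumpSpans(SpanQuery query, IndexReader reader)
            throws Exception {
        Spans spans = query.getSpans(reader);
        while (spans.next()) {
            // start()/end() are token positions; the matched window is
            // end - start tokens wide, from which the effective slop of
            // this particular match can be derived.
            System.out.println("doc=" + spans.doc()
                    + " start=" + spans.start()
                    + " end=" + spans.end());
        }
    }
}
```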
Why do you care? By that I mean that nothing you've written gives
us any clue whether you need to do anything about making things
faster. "Making things faster" is a laudable goal, but not worth worrying
about until you can confidently state you have performance issues.
And you've provided no deta
I'm curious about what the largest Lucene installations are, in terms of:
- Greatest number of documents (e.g. X billion docs)
- Largest data size (e.g. Y terabytes of indexes)
- Most machines (e.g. Z shards or servers)
Apart from general curiosity, the obvious follow-up question would be what
app
ok, thank you Ivan!!
On Tue, Aug 24, 2010 at 5:13 PM, Ivan Provalov wrote:
> Aida,
>
> Right now it will do two term collocation only.
>
> Ivan
>
>
> --- On Mon, 8/23/10, Aida Hota wrote:
>
> > From: Aida Hota
> > Subject: Re: Calculate Term Co-occurrence Matrix
> > To: java-user@lucene.apache
Hi,
1. Will a search query scan all documents in the
Lucene indexes?
2.
I want search queries to be faster, so I thought that if I
could reduce the number of docs Lucene needs to search, given
some search parameters, it would be a little faster.
Can we make a subset
(subindexe
I have about 70k documents; the total indexed size is about 15MB (the
original text files' size).
dir = new RAMDirectory();
IndexWriter writer = new IndexWriter(dir, ...);
for (loop) {
    writer.addDocument(doc);
}
writer