Ladies and Gentlemen:
Below is an exception and the source code that generates it:
ERROR opening the Index - contact sysadmin!
Error message: no segments* file found in
org.apache.lucene.store.FSDirectory@/home/hdiwan/public_html/Q4D: files:
Stack Trace follows...
org.apache.lucene.index.Segme
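That exception typically means the directory Lucene is pointed at contains no segments_N file at all, either because nothing was ever written there or because the index files were removed (note the empty "files:" list in the message). A minimal guard for that case, sketched against the Lucene 2.3-era API and using the path from the message above:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

public class IndexGuard {
    public static void main(String[] args) throws IOException {
        // Path taken from the error message above.
        FSDirectory dir = FSDirectory.getDirectory("/home/hdiwan/public_html/Q4D");
        if (IndexReader.indexExists(dir)) {
            IndexReader reader = IndexReader.open(dir);
            System.out.println("Index is readable, " + reader.numDocs() + " docs");
            reader.close();
        } else {
            // No segments_* file: the directory was never written to (or the
            // index files were deleted).  Build the index with an IndexWriter
            // before opening a reader/searcher on it.
            System.out.println("No index found in " + dir);
        }
        dir.close();
    }
}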
Well, I really don't have a clue what'll happen with that many
documents. It's more a matter of unique terms from what I
understand.
I'll be *really* curious how it turns out.
Erick
On Thu, Mar 6, 2008 at 6:03 PM, Ray <[EMAIL PROTECTED]> wrote:
>
> Thanks for your answer.
>
> Well I want to search around 6 billion documents.
Thanks for your answer.
Well I want to search around 6 billion documents.
Most of them are very small, but I am confident I will be hitting
that number in the long run.
I am currently running a small random text indexer with 400 docs/second.
It will reach 2 billion in around 45 days.
I really hope yo
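For reference, a rough sanity check on those numbers (assuming the 400 docs/second rate is sustained around the clock): 400 docs/s x 86,400 s/day is about 34.6 million docs/day, so 2 billion documents works out to roughly 2,000,000,000 / 34,560,000 ≈ 58 days of continuous indexing, and 6 billion to about three times that.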
Well, I'm not sure. But any index, even one split among many nodes,
is going to have some interesting performance characteristics if you
have over 2 billion documents, so I'm not sure it matters ...
What problem are you really trying to solve? You'll probably get
more meaningful answers if you
Sridhar,
We have been using approach 2 in our production system with good results. We
have separate processes for indexing and searching. The main issue that came
up was in deleting old indexes (see: http://tinyurl.com/32q8c4). Most of
our production problems occur during indexing, and we are ab
Hi,
I am new to Lucene.
I don't understand how Lucene works in some cases. For example:
If I have an index with the following three entries:
- ATUAÇÃO FALHA DE DISJUNTOR
- RESET DE FALHA DE DISJUNTOR
- FALHA DE COMANDO
When I try to look for something similar to "FALHA DE DI
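Whether a query like that matches usually comes down to the terms the analyzer produced at index time (including how accented characters such as Ç and Ã were handled), since Lucene only matches whole analyzed terms unless you use wildcard/prefix queries. A quick way to check is to run the same analyzer over one of the entries and print the tokens; a minimal sketch, assuming a Lucene 2.3-era API and StandardAnalyzer (the field name "entry" is made up for the example):

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class ShowTokens {
    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new StandardAnalyzer();
        TokenStream stream = analyzer.tokenStream("entry",
                new StringReader("ATUAÇÃO FALHA DE DISJUNTOR"));
        // Print each term exactly as it would be stored in the index.
        Token token;
        while ((token = stream.next()) != null) {
            System.out.println(token.termText());
        }
        stream.close();
    }
}

Searches only match if the query terms, after analysis, line up with the terms printed here.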
Thanks for all replies.
Today when I printed out the query that's generated, it does not have the
extra parens. And query.rewrite(reader).toString() now gives the same
result as query.toString(). All I can figure is I must have changed
something between starting the email and sending it out. The o
> > With a commit after every add: 30 min.
> > With a commit after 100 add: 23 min.
> > Only one commit: 20 min.
>
> All of these times look pretty slow... perhaps lucene is not the
> bottleneck here?
Therefore I wrote:
"(including time to get the document from the archive)"
Not the absolute
On Thu, Mar 6, 2008 at 12:22 PM, <[EMAIL PROTECTED]> wrote:
> > Since Lucene buffers in memory, you will always have the risk of
> > losing recently added documents that haven't been flushed yet.
> > Committing on every document would be too slow to be practical.
>
> Well it is not sooo slow
> Since Lucene buffers in memory, you will always have the risk of
> losing recently added documents that haven't been flushed yet.
> Committing on every document would be too slow to be practical.
Well it is not sooo slow...
I have indexed 10,000 docs, resulting in a 14 MB index. The index has
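A common middle ground between those extremes is to flush/commit every N added documents instead of after every one. A minimal sketch, assuming a Lucene 2.3-era writer where flush() makes the buffered documents durable (later versions call this commit()); the field name and batch size are placeholders:

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class BatchedIndexer {
    // Documents to buffer between flushes: bigger batches index faster,
    // but more recently added documents are at risk if the process dies.
    private static final int BATCH_SIZE = 100;

    public static void addAll(IndexWriter writer, String[] texts) throws IOException {
        int pending = 0;
        for (int i = 0; i < texts.length; i++) {
            Document doc = new Document();
            doc.add(new Field("body", texts[i], Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
            if (++pending >= BATCH_SIZE) {
                writer.flush();   // Lucene 2.3-era API; newer versions use commit()
                pending = 0;
            }
        }
        writer.flush();           // don't forget the final partial batch
    }
}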
Okay, thanks. The first thing was what I expected :-). Well, about my
second issue,
I was totally wrong. Just forget what I said! I had in mind that if I
have several fields with the same name,
these fields are concatenated into one big string.
Now as I read your message I remember that this behavior
Hey Guys,
Just a quick question to confirm an assumption I have.
Is it correct that I can have around 100 indexes, each at its
Integer.MAX_VALUE limit of documents, but can happily
search them all with a MultiSearcher, as long as the combined returned
hits don't add up to Integer.MAX_VALUE themselves?
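For what it's worth, MultiSearcher addresses the sub-indexes through a single int document-id space, which is where the combined Integer.MAX_VALUE ceiling comes from. A minimal sketch of searching several indexes through one MultiSearcher, assuming a Lucene 2.3-era API (the paths, field name and query term are placeholders):

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.TermQuery;

public class SearchAllIndexes {
    public static void main(String[] args) throws IOException {
        // Placeholder paths: one sub-index per directory.
        String[] paths = { "/indexes/part-00", "/indexes/part-01", "/indexes/part-02" };
        Searchable[] searchers = new Searchable[paths.length];
        for (int i = 0; i < paths.length; i++) {
            searchers[i] = new IndexSearcher(paths[i]);
        }
        MultiSearcher multi = new MultiSearcher(searchers);
        Hits hits = multi.search(new TermQuery(new Term("body", "lucene")));
        System.out.println(hits.length() + " hits across all sub-indexes");
        multi.close();   // also closes the wrapped IndexSearchers
    }
}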
On Thu, Mar 6, 2008 at 8:02 AM, Sridhar Raman <[EMAIL PROTECTED]> wrote:
> > This way no reader will ever see the changes until you successfully
> > close the writer. If the machine crashes the index is still in the
> > starting state as of when the writer was first opened.
> Ok, I have a slight doubt in this.
No, as far as I know you can't combine wildcards in phrases. This would
get extraordinarily ugly extraordinarily quickly. The way Lucene handles
wildcards (conceptually) is to expand all the possible terms into a large OR
clause. Say my index contains term1, term2, and term3. The search for term*
r
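One workaround I've seen for the "fixed term followed by a prefix" case is MultiPhraseQuery: expand the wildcard yourself against the term dictionary and hand all the expansions to one position of the phrase. A rough sketch, assuming a Lucene 2.3-era API (the field name and the "term" prefix are just for illustration; a real version would also cap the number of expansions):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.MultiPhraseQuery;

public class PrefixPhrase {
    /** Builds a query roughly equivalent to the phrase "term1 term*". */
    public static MultiPhraseQuery build(IndexReader reader, String field)
            throws IOException {
        MultiPhraseQuery query = new MultiPhraseQuery();
        query.add(new Term(field, "term1"));              // fixed first position

        // Expand the prefix "term" by walking the term dictionary.
        List<Term> expansions = new ArrayList<Term>();
        TermEnum terms = reader.terms(new Term(field, "term"));
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field)
                        || !t.text().startsWith("term")) {
                    break;                                // past the prefix range
                }
                expansions.add(t);
            } while (terms.next());
        } finally {
            terms.close();
        }
        if (!expansions.isEmpty()) {                      // empty => phrase can't match
            query.add(expansions.toArray(new Term[expansions.size()]));
        }
        return query;
    }
}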
On Thu, Mar 6, 2008 at 3:57 AM, Eric Th <[EMAIL PROTECTED]> wrote:
> Hi All,
> Has anyone done a benchmark to verify the index writing efficiency of Lucene?
> When the index size is larger than 10G, will it be much slower than smaller
> ones?
>
> Actually I did some work on this issue,
> a
Sridhar Raman wrote:
This way no reader will ever see the changes until you successfully
close the writer. If the machine crashes the index is still in the
starting state as of when the writer was first opened.
Ok, I have a slight doubt in this. Say I have gone ahead with
Approach 1.
If I ha
> This way no reader will ever see the changes until you successfully
> close the writer. If the machine crashes the index is still in the
> starting state as of when the writer was first opened.
Ok, I have a slight doubt in this. Say I have gone ahead with Approach 1.
If I have opened the writer
Okay, another problem occurred. I have different fields with the same name. I
can't separate them by naming them field1, field2, etc., because while indexing
I don't know how many fields I will need.
For example, a book has several signature numbers; I want to save them all in a field
"signature", and when I search f
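For what it's worth, Lucene is fine with that: you can add the same field name to a Document as many times as you need, and every value is indexed (and stored, if requested). A minimal sketch, assuming a Lucene 2.3-era API and using the "signature" field from your example:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class BookDocument {
    /** Adds one "signature" field per signature number; the count can vary per book. */
    public static Document build(String title, String[] signatures) {
        Document doc = new Document();
        doc.add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
        for (int i = 0; i < signatures.length; i++) {
            // Repeated fields with the same name are all searchable under "signature";
            // Document.getValues("signature") returns them all at retrieval time.
            doc.add(new Field("signature", signatures[i],
                    Field.Store.YES, Field.Index.UN_TOKENIZED));
        }
        return doc;
    }
}

A search on the signature field then matches if any of the values match.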
A simple variant on Approach 1 would be to open your writer with
autoCommit=false.
This way no reader will ever see the changes until you successfully
close the writer. If the machine crashes the index is still in the
starting state as of when the writer was first opened.
Also, re-open
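A minimal sketch of that variant, assuming the Lucene 2.3-era constructor that takes the autoCommit flag (the path and field name are placeholders):

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class SafeUpdate {
    public static void main(String[] args) throws IOException {
        FSDirectory dir = FSDirectory.getDirectory("/path/to/index");
        // autoCommit=false: readers keep seeing the old index state, and a
        // crash before close() leaves that old state intact.
        IndexWriter writer = new IndexWriter(dir, false, new StandardAnalyzer());
        try {
            Document doc = new Document();
            doc.add(new Field("body", "hello world",
                    Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
            // ... more adds/deletes ...
        } finally {
            writer.close();   // the single point where the changes become visible
        }
    }
}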
hey everybody,
I'm wondering if it's possible to combine wildcards and phrase queries.
For example: "term1 term*".
I know that the documentation says "Lucene supports single and multiple
character wildcard searches within single terms (not within phrase queries)"
but maybe someone has had the same
This is my situation. I have an index, which has a lot of search requests
coming into it. I use just a single instance of IndexSearcher to process
these requests. At the same time, this index is also getting updated by an
IndexWriter. And I want these new changes to be reflected _only_ at certa
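One way to get that is to keep the single IndexSearcher behind a small holder: every request uses whatever searcher is current, and you swap in a freshly opened searcher only at the moments you choose. A rough sketch of the idea, assuming a Lucene 2.3-era API (it glosses over waiting for in-flight searches before closing the old searcher, see the comment):

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;

public class SearcherHolder {
    private final String indexPath;
    private volatile IndexSearcher current;

    public SearcherHolder(String indexPath) throws IOException {
        this.indexPath = indexPath;
        this.current = new IndexSearcher(indexPath);
    }

    /** Query threads call this; they see updates only after refresh() has run. */
    public IndexSearcher getSearcher() {
        return current;
    }

    /** Call this only at the points where new writes should become visible. */
    public synchronized void refresh() throws IOException {
        IndexSearcher fresh = new IndexSearcher(indexPath);  // sees the latest commit
        IndexSearcher old = current;
        current = fresh;
        // NOTE: in production, wait for searches still using 'old' to finish first.
        old.close();
    }
}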
2008/3/6, Chris Hostetter <[EMAIL PROTECTED]>:
>
>
> : If I do a query.toString(), both queries give different results, which
>
> : is probably a clue (additional paren's with the BooleanQuery)
> :
> : Query.toString the old way using queryParser:
> : +(id:1^2.0 id:2 ... ) +type:CORE
> :
> : Qu
"To confuse matters more, it is not really a matter of synonyms, as the
orginal term is discarded from the index and there is only one mapped term"
I'm not sure I fully understand this: am I right in thinking that you
will be searching using these controlled volcabulary words, and that the
sea
Hi All,
Has anyone done a benchmark to verify the index writing efficiency of Lucene?
When the index size is larger than 10G, will it be much slower than smaller
ones?
Actually I did some work on this issue,
and I found that if I build small indexes first and then merge them all, the time
taken wil
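A sketch of the "build small indexes, then merge" step, assuming a Lucene 2.3-era API (the paths are placeholders; if your version lacks addIndexesNoOptimize, addIndexes(Directory[]) also works but optimizes first):

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeSmallIndexes {
    public static void main(String[] args) throws IOException {
        // Several small indexes built separately...
        Directory[] parts = {
            FSDirectory.getDirectory("/indexes/part-00"),
            FSDirectory.getDirectory("/indexes/part-01"),
            FSDirectory.getDirectory("/indexes/part-02"),
        };
        // ...merged into one target index.
        Directory target = FSDirectory.getDirectory("/indexes/merged");
        IndexWriter writer = new IndexWriter(target, new StandardAnalyzer(), true);
        try {
            writer.addIndexesNoOptimize(parts);   // merges existing segments in
            writer.optimize();                    // optional: fewer segments for searching
        } finally {
            writer.close();
        }
    }
}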