How to create segments files?

2008-03-06 Thread Hasan Diwan
Ladies and Gentlemen: Below is an exception and the source code that generates it: ERROR opening the Index - contact sysadmin! Error message: no segments* file found in org.apache.lucene.store.FSDirectory@/home/hdiwan/public_html/Q4D: files: Stack Trace follows... org.apache.lucene.index.Segme

Re: MultiSearcher to overcome the Integer.MAX_VALUE limit

2008-03-06 Thread Erick Erickson
Well, I really don't have a clue what'll happen with that many documents. It's more a matter of unique terms from what I understand. I'll be *really* curious how it turns out. Erick On Thu, Mar 6, 2008 at 6:03 PM, Ray <[EMAIL PROTECTED]> wrote: > > Thanks for your answer. > > Well I want to sea

Re: MultiSearcher to overcome the Integer.MAX_VALUE limit

2008-03-06 Thread Ray
Thanks for your answer. Well, I want to search around 6 billion documents. Most of them are very small, but I am confident I will hit that number in the long run. I am currently running a small random text indexer at 400 docs/second. It will reach 2 billion in around 45 days. I really hope yo
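The throughput figure in this message can be sanity-checked with a little arithmetic. The sketch below is only an illustration: it assumes a constant rate of 400 docs/second and, by default, an index that starts empty. Neither assumption is stated in the excerpt, and the helper name `days_to_index` and its `already_indexed` parameter are hypothetical. Under those assumptions the result is longer than the "around 45 days" quoted, so the quoted figure presumably rests on details not visible here (for example, documents already indexed).

```python
SECONDS_PER_DAY = 24 * 60 * 60
INTEGER_MAX_VALUE = 2**31 - 1  # value of Java's Integer.MAX_VALUE


def days_to_index(target_docs, docs_per_second, already_indexed=0):
    """Days needed to reach target_docs at a constant indexing rate.

    already_indexed is a hypothetical starting count (assumed 0 by default).
    """
    remaining = max(target_docs - already_indexed, 0)
    return remaining / (docs_per_second * SECONDS_PER_DAY)


days_to_2_billion = days_to_index(2_000_000_000, 400)
days_to_int_max = days_to_index(INTEGER_MAX_VALUE, 400)

print(f"{days_to_2_billion:.1f} days to reach 2 billion docs")
print(f"{days_to_int_max:.1f} days to reach Integer.MAX_VALUE docs")
```

Passing a non-zero `already_indexed` shows how much of a head start would be needed for the shorter estimate to hold.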

Re: MultiSearcher to overcome the Integer.MAX_VALUE limit

2008-03-06 Thread Erick Erickson
Well, I'm not sure. But any index, even one split amongst many nodes, is going to have some interesting performance characteristics if you have over 2 billion documents. So I'm not sure it matters... What problem are you really trying to solve? You'll probably get more meaningful answers if you

Re: Swapping between indexes

2008-03-06 Thread Peter Keegan
Sridhar, We have been using approach 2 in our production system with good results. We have separate processes for indexing and searching. The main issue that came up was in deleting old indexes (see: *http://tinyurl.com/32q8c4*). Most of our production problems occur during indexing, and we are ab

Help with Fuzzy Queries

2008-03-06 Thread Eloi Rocha Neto
Hi, I am new to Lucene. I don't understand how Lucene works in some cases. For example: If I have an index with the following three entries: - ATUAÇÃO FALHA DE DISJUNTOR - RESET DE FALHA DE DISJUNTOR - FALHA DE COMANDO When I try to look for something similar to "FALHA DE DI

RE: Boolean Query search performance

2008-03-06 Thread Beard, Brian
Thanks for all the replies. Today when I printed out the query that's generated, it does not have the extra parens. And query.rewrite(reader).toString() now gives the same result as query.toString(). All I can figure is I must have changed something between starting the email and sending it out. The o

RE: Swapping between indexes

2008-03-06 Thread spring
> > With a commit after every add: 30 min. > > With a commit after 100 add: 23 min. > > Only one commit: 20 min. > > All of these times look pretty slow... perhaps lucene is not the > bottleneck here? Therefore I wrote: "(including time to get the document from the archive)" Not the absolute

Re: Swapping between indexes

2008-03-06 Thread Yonik Seeley
On Thu, Mar 6, 2008 at 12:22 PM, <[EMAIL PROTECTED]> wrote: > > Since Lucene buffers in memory, you will always have the risk of > > losing recently added documents that haven't been flushed yet. > > Committing on every document would be too slow to be practical. > > Well, it is not so slow

RE: Swapping between indexes

2008-03-06 Thread spring
> Since Lucene buffers in memory, you will always have the risk of > losing recently added documents that haven't been flushed yet. > Committing on every document would be too slow to be practical. Well, it is not so slow... I have indexed 10,000 docs, resulting in a 14 MB index. The index has

Re: combine wildcard and phrase query

2008-03-06 Thread JensBurkhardt
Okay, thanks. The first thing was what I expected :-). Well, about my second issue, I was totally wrong. Just forget what I said! I had in mind that if I have several fields with the same name, these fields are concatenated into one big string. Now, as I read your message, I remember that this behavior

MultiSearcher to overcome the Integer.MAX_VALUE limit

2008-03-06 Thread Ray
Hey Guys, just a quick question to confirm an assumption I have. Is it correct that I can have around 100 Indexes each at its Integer.MAX_VALUE limit of documents, but can happily search them all with a MultiSearcher if all combined returned hits don't add up to the Integer.MAX_VALUE themselves

Re: Swapping between indexes

2008-03-06 Thread Yonik Seeley
On Thu, Mar 6, 2008 at 8:02 AM, Sridhar Raman <[EMAIL PROTECTED]> wrote: > > This way no reader will ever see the changes until you successfully > > close the writer. If the machine crashes the index is still in the > > starting state as of when the writer was first opened. > Ok, I have a sligh

Re: combine wildcard and phrase query

2008-03-06 Thread Erick Erickson
No, as far as I know you can't combine wildcards in phrases. This would get extraordinarily ugly extraordinarily quickly. The way Lucene handles wildcards (conceptually) is to expand all the possible terms into a large OR clause. Say my index contains term1, term2, and term3. The search for term* r
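The conceptual expansion described in this message can be sketched in a few lines. This is an illustration only, not Lucene code: the term set, the field name `body`, and the helper names `expand_wildcard` and `as_or_clause` are all hypothetical, and Python's `fnmatch` stands in for wildcard matching (it also accepts `[...]` character sets, which the excerpt does not mention).

```python
import fnmatch


def expand_wildcard(pattern, index_terms):
    """Return every indexed term matching the wildcard pattern, sorted."""
    return sorted(t for t in index_terms if fnmatch.fnmatchcase(t, pattern))


def as_or_clause(field, terms):
    """Render the expanded terms as one large OR clause over a single field."""
    return "(" + " OR ".join(f"{field}:{t}" for t in terms) + ")"


# Hypothetical index contents, following the term1/term2/term3 example.
index_terms = {"term1", "term2", "term3", "other"}

expanded = expand_wildcard("term*", index_terms)
print(as_or_clause("body", expanded))
```

With a large vocabulary, the number of expanded terms grows with the number of matches. That gives a sense of why expanding a wildcard inside every position of a phrase would become unwieldy.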

Re: What about the index writing efficiency of large index ?

2008-03-06 Thread Yonik Seeley
On Thu, Mar 6, 2008 at 3:57 AM, Eric Th <[EMAIL PROTECTED]> wrote: > Hi All, > Has anyone done a benchmark to verify the index writing efficiency of Lucene? > When the index size is larger than 10G, will it be much slower than smaller > ones? > > Actually I did some work on this issue, > a

Re: Swapping between indexes

2008-03-06 Thread Michael McCandless
Sridhar Raman wrote: This way no reader will ever see the changes until you successfully close the writer. If the machine crashes the index is still in the starting state as of when the writer was first opened. Ok, I have a slight doubt about this. Say I have gone ahead with Approach 1. If I ha

Re: Swapping between indexes

2008-03-06 Thread Sridhar Raman
> This way no reader will ever see the changes until you successfully > close the writer. If the machine crashes the index is still in the > starting state as of when the writer was first opened. Ok, I have a slight doubt about this. Say I have gone ahead with Approach 1. If I have opened the writer

Re: combine wildcard and phrase query

2008-03-06 Thread JensBurkhardt
Okay, another problem occurred. I have different fields with the same name. I can't separate them by naming them field1, field2, etc., because while indexing I don't know how many fields I will need. For example, a book has several signature numbers; I want to save them in a field signature, and when I search f

Re: Swapping between indexes

2008-03-06 Thread Michael McCandless
A simple variant on Approach 1 would be to open your writer with autoCommit=false. This way no reader will ever see the changes until you successfully close the writer. If the machine crashes the index is still in the starting state as of when the writer was first opened. Also, re-open

combine wildcard and phrase query

2008-03-06 Thread JensBurkhardt
hey everybody, I'm wondering if it's possible to combine wildcards and phrase query. For example "term1 term*" I know that the documentation says "Lucene supports single and multiple character wildcard searches within single terms (not within phrase queries)" but maybe someone has had the same

Swapping between indexes

2008-03-06 Thread Sridhar Raman
This is my situation. I have an index, which has a lot of search requests coming into it. I use just a single instance of IndexSearcher to process these requests. At the same time, this index is also getting updated by an IndexWriter. And I want these new changes to be reflected _only_ at certa

Re: Boolean Query search performance

2008-03-06 Thread Eric Th
2008/3/6, Chris Hostetter <[EMAIL PROTECTED]>: > > > : If I do a query.toString(), both queries give different results, which > > : is probably a clue (additional paren's with the BooleanQuery) > : > : Query.toString the old way using queryParser: > : +(id:1^2.0 id:2 ... ) +type:CORE > : > : Qu

Re: storing position - keyword

2008-03-06 Thread John Byrne
"To confuse matters more, it is not really a matter of synonyms, as the original term is discarded from the index and there is only one mapped term" I'm not sure I fully understand this: am I right in thinking that you will be searching using these controlled vocabulary words, and that the sea

What about the index writing efficiency of large index ?

2008-03-06 Thread Eric Th
Hi All, Has anyone done a benchmark to verify the index writing efficiency of Lucene? When the index size is larger than 10G, will it be much slower than smaller ones? Actually I did some work on this issue, and I found that if you build small indexes first and then merge them all, the time taken wil