Thanks a bunch for your very prompt reply. I looked into the
PerFieldAnalyzerWrapper class and I understand how you can add a specific
analyzer for each field. My question is how this links to the query
that's sent to me.
If I'm given a query as follows:
(+tokenized:value1 +tokenized:value2) (+
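A minimal sketch (assuming Lucene 2.4; the "id" field and the KeywordAnalyzer
choice are just examples) of how the same PerFieldAnalyzerWrapper can be handed
to the QueryParser, so each field in an incoming query string like the one above
is analyzed with whatever analyzer was registered for it:

    import org.apache.lucene.analysis.KeywordAnalyzer;
    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    public class PerFieldQueryExample {
        public static void main(String[] args) throws Exception {
            // StandardAnalyzer is the default; "id" gets a KeywordAnalyzer override.
            PerFieldAnalyzerWrapper analyzer =
                new PerFieldAnalyzerWrapper(new StandardAnalyzer());
            analyzer.addAnalyzer("id", new KeywordAnalyzer());

            // Passing the same wrapper to the QueryParser means the terms of the
            // incoming query are analyzed per field, the same way as at index time.
            QueryParser parser = new QueryParser("tokenized", analyzer);
            Query q = parser.parse("(+tokenized:value1 +tokenized:value2)");
            System.out.println(q);
        }
    }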
On Mar 10, 2009, at 7:55 AM, mark harwood wrote:
It does not indefinitely hang,
I guess I just need to be more patient.
Thanks for the GC settings. I don't currently have the luxury of "15
other" processors but this will definitely be of use in other
environments.
It is also, usually
mark harwood wrote:
I think a modelling/sizing spreadsheet would be a useful addition to
our documentation.
IW will simply use up to the RAM you told it to, and then flush.
Add onto that RAM consumed by merging, which in the presence of
deletes is totDocCount * 4.125 bytes, plus numberOfField
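For reference, the knobs being discussed can be set directly on the writer; a
sketch, assuming an already-opened IndexWriter named writer and using the values
quoted elsewhere in this thread:

    writer.setRAMBufferSizeMB(300.0);    // flush buffered documents once ~300 MB is used
    writer.setMergeFactor(20);           // how many segments get merged at once
    writer.setTermIndexInterval(8192);   // fewer term index entries loaded by readers
    writer.setUseCompoundFile(false);    // skip building compound files at merge time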
Hi,
I haven't followed the whole thread, so pardon me if I am off topic.
In terms of OutOfMemoryExceptions, why not attempt to alleviate this in your
code, rather than overly relying on garbage collection. In other words: set big
objects to null when you are finished with them, in particular in
mark harwood wrote:
Could you get a heap dump (eg with YourKit) of what's using up all
the memory when you hit OOM?
On this particular machine I have a JRE, no admin rights and
therefore limited profiling capability :(
That's why I was trying to come up with some formula for estimating
> I get really belligerent when being told to solve problems while wearing a
> ball-and-chain.
I seem to have touched quite a nerve there then, Erick ;)
I appreciate your sympathy.
To be fair I haven't exhausted all possible avenues in changing the environment
but I do remain interested in un
You have my sympathy. Let's see, you're being told "we can't give
you the tools you need to diagnose/fix the problem, but fix it anyway".
Probably with the addendum "And fix it by Friday".
You might want to consider staging a mutiny until "the powers that be"
can give you a solution. Perhaps worki
Thanks Erick for your quick response and useful information. I'll try bumping up
the MaxFieldLength and check the performance. It seems the quickest way
to handle the issue.
Amy
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, March 10, 200
Sure there are other options. You could decide to index in chunks
rather than entire documents. You could decide many things.
None of which we can recommend unless we have a clue what
you're really trying to accomplish or whether you're encountering
a specific problem.
I can say that we've indexe
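One possible reading of "index in chunks", as a rough sketch only (assuming
Lucene 2.4, plain-text input, an arbitrary chunk size, largeFile being a
java.io.File and writer an open IndexWriter): add a large file as several
smaller documents instead of one huge field.

    BufferedReader reader = new BufferedReader(new FileReader(largeFile));
    char[] buf = new char[1 << 20];                 // ~1M characters per chunk
    int read, chunkNo = 0;
    while ((read = reader.read(buf)) != -1) {
        Document doc = new Document();
        doc.add(new Field("path", largeFile.getPath(),
                          Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("chunk", String.valueOf(chunkNo++),
                          Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("contents", new String(buf, 0, read),
                          Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc);
    }
    reader.close();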
>>Could you get a heap dump (eg with YourKit) of what's using up all the memory
>>when you hit OOM?
On this particular machine I have a JRE, no admin rights and therefore limited
profiling capability :(
That's why I was trying to come up with some formula for estimating memory
usage.
>>When y
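In the absence of a profiler, one crude JRE-only way to watch heap usage
between commits (the numbers are approximate, since they depend on when the GC
last ran):

    Runtime rt = Runtime.getRuntime();
    long usedMB = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    System.out.println("approx heap in use: " + usedMB + " MB");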
My issue here is that a large file gets truncated with the default MaxFieldLength
of 10,000 during indexing. The files I index could be 10 MB or larger.
My questions are:
1) If I choose MaxFieldLength UNLIMITED instead of 100,000, what could the
performance be?
2) Any other options?
-Original
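For reference, a sketch of the Lucene 2.4 constructor form being discussed (the
index path is a placeholder):

    IndexWriter writer = new IndexWriter(
        FSDirectory.getDirectory("/path/to/index"),
        new StandardAnalyzer(), true,
        IndexWriter.MaxFieldLength.UNLIMITED);        // no truncation at all
    // or keep a cap but raise it well above the 10,000-term default:
    // new IndexWriter(dir, analyzer, true, new IndexWriter.MaxFieldLength(1000000));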
Amy Zhou wrote:
Hi,
I have a couple of questions about indexing large files. As I understand
it, the default MaxFieldLength is 100,000. In Lucene 2.4, we can set
the MaxFieldLength in the constructor. My questions are:
The default is 10,000.
1) How's the performance if MaxFieldLengt
Hi,
I have a couple of questions about indexing large files. As I understand
it, the default MaxFieldLength is 100,000. In Lucene 2.4, we can set
the MaxFieldLength in the constructor. My questions are:
1) How's the performance if MaxFieldLength is set to UNLIMITED?
2) Any other options f
Mark,
Could you get a heap dump (eg with YourKit) of what's using up all the
memory when you hit OOM?
Also, can you turn on infoStream and post the output leading up to the
OOM?
When you say "write session", are you closing & opening a new
IndexWriter each time? Or, just calling .comm
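A sketch of how those diagnostics can be turned on (the PrintStream target is
just an example):

    writer.setInfoStream(System.out);  // logs flushes, merges and segment sizes leading up to the OOM
    // ... add documents ...
    writer.commit();                   // make changes visible without closing the writer
    // or writer.close() if a new IndexWriter is opened per write session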
>>OK. What do you think about LUCENE-1541, does the more complicated API rectify
>>the space improvement and reduced term number?
I don't see the Trie terms being the main contributor to the term pool. Using
the Luke vocabulary-growth plugin I can see the number of unique terms tailing
off fai
> >>It does not indefinitely hang,
>
> I guess I just need to be more patient.
> Thanks for the GC settings. I don't currently have the luxury of "15
> other" processors but this will definitely be of use in other
> environments.
Even with one processor, a parallel GC is sometimes better. The tra
Erick,
I got your reply, but I asked one more query.
Mike, in one of his replies to the thread "Faceted search using Lucene", gave the
following code review comment:
* You are creating a new Analyzer & QueryParser every time, also
creating unnecessary garbage; instead, they should be created once
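A sketch of what that review comment suggests (the field name "contents" is
only an example; QueryParser itself is not thread-safe, so reuse it per thread
or synchronize around parse):

    // Created once and reused for every query instead of per request.
    private static final Analyzer ANALYZER = new StandardAnalyzer();
    private final QueryParser parser = new QueryParser("contents", ANALYZER);

    public Query toQuery(String userInput) throws ParseException {
        return parser.parse(userInput);
    }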
Yes, I replied 4 days ago, is your SPAM filter interfering?
On Tue, Mar 10, 2009 at 8:35 AM, Ganesh wrote:
> Any reply on this?
>
> - Original Message - From: "Ganesh"
> To:
> Sent: Monday, March 09, 2009 11:28 AM
> Subject: Re: Questions about analyzer
>
>
> Mike, in one of his replies to
Any reply on this?
- Original Message -
From: "Ganesh"
To:
Sent: Monday, March 09, 2009 11:28 AM
Subject: Re: Questions about analyzer
Mike, in one of his replies to the thread "Faceted search using Lucene", gave
the following code review comment:
* You are creating a new Analyzer &
>>It does not indefinitely hang,
I guess I just need to be more patient.
Thanks for the GC settings. I don't currently have the luxury of "15 other"
processors but this will definitely be of use in other environments.
>>How does TrieRange work for you?
I used it back when it was tucked away in Pa
It does not indefinitely hang, I think the problem is that the GC takes up
all processor resources and nothing else runs any more. You should also
enable the parallel GC. We had similar problems on the searching side, when
the webserver suddenly stopped for about 20 minutes (!) and doing nothing
m
Thanks, Ian.
I forgot to mention I tried that setting and it then seemed to hang
indefinitely.
I then switched back to a strategy of trying to minimise memory usage or at
least gain an understanding of how much memory would be required by my
application.
Cheers
Mark
- Original Message
That's not the usual OOM message, is it? java.lang.OutOfMemoryError: GC
overhead limit exceeded.
Looks like you might be able to work round it with -XX:-UseGCOverheadLimit
http://java-monitor.com/forum/archive/index.php/t-54.html
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#
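Putting the flags mentioned in this thread together, one possible launch line
(heap size and jar name are placeholders):

    java -Xmx3g -XX:-UseGCOverheadLimit -XX:+UseParallelGC -jar indexer.jar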
>>But... how come setting IW's RAM buffer doesn't prevent the OOMs?
I've been setting the IndexWriter RAM buffer to 300 meg and giving the JVM
1gig.
Last run I gave the JVM 3 gig, with writer settings of RAM buffer=300 meg,
merge factor=20, term interval=8192, usecompound=false. All fields a