Once you've profiled your app, you should also experiment with different 
garbage collectors.  Considering you're reaching max heap, I assume your tuples 
are probably pretty large.  If that's the case and you're using the CMS garbage 
collector, you're going to blow out your heap regularly.  With large tuples 
and/or memory-intensive computations I've found that the old parallel GC works 
best, because it compacts the old generation every time it collects... CMS 
doesn't, and each sweep it tries to jam more into the heap until it can't any 
longer, and then it blows up.  There is also a great article by Michael Noll 
about Storm's internal message buffers and how to tweak them depending on your 
needs:
http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
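As a sketch only (the heap size and flag choices below are assumptions to tune for your own workload, not a recommendation from this thread), switching the workers to the throughput collector in storm.yaml might look like:

```yaml
# Sketch: enable the parallel (throughput) collector for workers.
# -XX:+UseParallelOldGC adds the compacting parallel old-gen collector
# mentioned above; -Xmx2048m is an assumed heap size, size it from profiling.
worker.childopts: "-Xmx2048m -XX:+UseParallelGC -XX:+UseParallelOldGC -Djava.net.preferIPv4Stack=true"
```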



From: Sa Li [mailto:[email protected]]
Sent: Monday, March 09, 2015 10:15 PM
To: [email protected]
Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded


I have not done that yet; I'm not quite familiar with this, but I will try to do 
that tomorrow, thanks.
On Mar 9, 2015 7:10 PM, "Nathan Leung" <[email protected]> wrote:
Have you profiled you spout / bolt logic as recommended earlier in this thread?

On Mon, Mar 9, 2015 at 9:49 PM, Sa Li <[email protected]> wrote:

You are right, I have already increased the heap in the yaml to 2 GB for each 
worker, but I still have the issue, so I suspect I may be running into some 
other cause, e.g. receive/send buffer sizes? In general, before I see the GC 
overhead error in the Storm UI, I come across other errors in the worker log as 
well, like Netty connection and null pointer errors, as I showed in another 
post.

Thanks
On Mar 9, 2015 5:36 PM, "Nathan Leung" <[email protected]> wrote:
I still think you should try running with a larger heap.  :)  Max spout pending 
determines how many tuples can be pending (tuple tree not fully acked) per 
spout task.  If you have many spout tasks per worker, this can be a large 
amount of memory.  It also depends on how big your tuples are.
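A rough back-of-the-envelope check of what that can add up to (the task count and pending limit below are made-up numbers for illustration; the ~3 KB tuple size is the figure mentioned elsewhere in this thread):

```java
// Worst-case memory held by in-flight tuples:
// spout tasks per worker * max spout pending * tuple size.
public class InFlightEstimate {
    public static void main(String[] args) {
        int spoutTasksPerWorker = 8;     // assumption for illustration
        int maxSpoutPending = 1000;      // assumption for illustration
        long tupleSizeBytes = 3 * 1024;  // ~3 KB per record, per this thread

        long worstCaseBytes =
                (long) spoutTasksPerWorker * maxSpoutPending * tupleSizeBytes;
        System.out.println("Worst-case in-flight memory: "
                + (worstCaseBytes / (1024 * 1024)) + " MB");
    }
}
```

With these assumed numbers that is roughly 23 MB per worker for the raw tuples alone, before any per-tuple processing overhead.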

On Mon, Mar 9, 2015 at 6:14 PM, Sa Li <[email protected]> wrote:
Hi, Nathan

We have played around with max spout pending in dev. If we set it to 10, it is 
OK, but if we set it to more than 50, the GC overhead error starts to appear. 
We are ultimately writing tuples into a PostgreSQL DB, and the highest write 
speed into the DB is around 40K records/minute, which is quite slow; maybe that 
is why tuples accumulate in memory before being dumped into the DB. But I think 
10 is too small. Does that mean only 10 tuples are allowed in flight?

thanks

AL

On Fri, Mar 6, 2015 at 7:39 PM, Nathan Leung <[email protected]> wrote:
I've not modified Netty, so I can't comment on that.  I would set max spout 
pending; try 1000 at first.  This will limit the number of tuples that you can 
have in flight simultaneously and therefore limit the amount of memory used by 
these tuples and their processing.
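In storm.yaml, that suggestion would look something like this (1000 is just the starting point suggested above, not a tuned value):

```yaml
# Starting point only; measure and adjust for your topology.
topology.max.spout.pending: 1000
```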

On Fri, Mar 6, 2015 at 7:03 PM, Sa Li <[email protected]> wrote:
Hi, Nathan

The log size of that Kafka topic is 23515541, and each record is about 3 KB. I 
checked the yaml file; I don't have max spout pending set, so I assume it 
should be the default: topology.max.spout.pending: null

Should I set it to a certain value? Also, I sometimes see 
java.nio.channels.ClosedChannelException: null, or b.s.d.worker [ERROR] Error 
on initialization of server mk-worker. Does this mean I should add
storm.messaging.netty.server_worker_threads: 1
storm.messaging.netty.client_worker_threads: 1
storm.messaging.netty.buffer_size: 5242880 # 5 MB buffer
storm.messaging.netty.max_retries: 30
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100

into the yaml and modify the values?

thanks
On Fri, Mar 6, 2015 at 2:22 PM, Nathan Leung <[email protected]> wrote:

How much data do you have in Kafka? How is your max spout pending set? If you 
have a high max spout pending (or if you emit unanchored tuples), you could be 
using up a lot of memory.
On Mar 6, 2015 5:14 PM, "Sa Li" <[email protected]> wrote:
Hi, Nathan

I have run into a strange issue: when I set spoutConf.forceFromStart=true, it 
quickly hits the GC overhead limit even after I increase the heap size, but if 
I remove this setting it works fine. I was thinking maybe the KafkaSpout 
consumes data much faster than the data can be written into the Postgres DB, so 
data quickly fills the memory and causes the heap to overflow. But I did the 
same test on my DEV cluster and it works fine there, even with 
spoutConf.forceFromStart=true. I checked the Storm configs for DEV and 
production; they are the same.

Any hints?

thanks

AL


On Thu, Mar 5, 2015 at 3:26 PM, Nathan Leung <[email protected]> wrote:
I don't see anything glaring.  I would try increasing the heap size.  It could 
be that you're right at the threshold of what you've allocated and you just 
need more memory.

On Thu, Mar 5, 2015 at 5:41 PM, Sa Li <[email protected]> wrote:
Hi, All,

I've roughly located where the problem comes from. In my run command I specify 
the clientid of TridentKafkaConfig; if I keep the clientid as the one I used 
before, it causes the GC error, otherwise I am completely fine. Here is the 
code:


if (parameters.containsKey("clientid")) {
    logger.info("topic=>" + parameters.get("clientid") + "/" + parameters.get("topic"));
    spoutConf = new TridentKafkaConfig(zk, parameters.get("topic"), parameters.get("clientid"));
}

Any idea about this error?

Thanks

AL

On Thu, Mar 5, 2015 at 12:02 PM, Sa Li <[email protected]> wrote:
Sorry, continuing the last thread:

2015-03-05T11:48:08.418-0800 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Remote address is not 
reachable. We will close this client 
Netty-Client-complicated-laugh/10.100.98.103:6703
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
 ~[storm-core-0.9.3.jar:0.9.3]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
 ~[storm-core-0.9.3.jar:0.9.3]
        at 
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) 
~[storm-core-0.9.3.jar:0.9.3]
        at 
backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94) 
~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) 
~[storm-core-0.9.3.jar:0.9.3]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
Caused by: java.lang.RuntimeException: Remote address is not reachable. We will 
close this client 
Netty-Client-complicated-laugh/10.100.98.103:6703
        at backtype.storm.messaging.netty.Client.connect(Client.java:171) 
~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.messaging.netty.Client.send(Client.java:194) 
~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) 
~[storm-core-0.9.3.jar:0.9.3]
        at 
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330)
 ~[storm-core-0.9.3.jar:0.9.3]
        at 
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328)
 ~[storm-core-0.9.3.jar:0.9.3]
        at 
backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58) 
~[storm-core-0.9.3.jar:0.9.3]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
 ~[storm-core-0.9.3.jar:0.9.3]
        ... 6 common frames omitted
2015-03-05T11:48:08.423-0800 b.s.util [ERROR] Halting process: ("Async loop 
died!")
java.lang.RuntimeException: ("Async loop died!")
        at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) 
[storm-core-0.9.3.jar:0.9.3]
        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
        at 
backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92) 
[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473) 
[storm-core-0.9.3.jar:0.9.3]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
2015-03-05T11:48:08.425-0800 b.s.d.worker [INFO] Shutting down worker 
eventsStreamerv1-48-1425499636 0673ece0-cea2-4185-9b3e-6c49ad585576 6703
2015-03-05T11:48:08.426-0800 b.s.m.n.Client [INFO] Closing Netty Client 
Netty-Client-beloved-judge/10.100.98.104:6703

I suspect this is caused by my EventUpdater, which writes data in batches:


static class EventUpdater implements ReducerAggregator<List<String>> {

    @Override
    public List<String> init() {
        return null;
    }

    @Override
    public List<String> reduce(List<String> curr, TridentTuple tuple) {
        List<String> updated = null;

        if (curr == null) {
            String event = (String) tuple.getValue(1);
            System.out.println("===:" + event + ":");
            updated = Lists.newArrayList(event);
        } else {
            System.out.println("===+" + tuple + ":");
            updated = curr;
        }
        return updated;
    }
}
What do you think?

Thanks

On Thu, Mar 5, 2015 at 11:57 AM, Sa Li <[email protected]> wrote:
Thank you very much for the reply. Here is the error I saw in the production 
server's worker-6703.log:


On Thu, Mar 5, 2015 at 11:31 AM, Nathan Leung <[email protected]> wrote:
Yeah, in this case maybe you can install the JDK / YourKit on the remote 
machines and run the tools over X or something.  I'm assuming this is a 
development cluster (not live / production) and that installing debugging tools 
and running remote UIs etc. is not a problem.  :)

On Thu, Mar 5, 2015 at 1:52 PM, Andrew Xor <[email protected]> wrote:
Nathan, I think that if he wants to profile a bolt that runs in a worker 
residing on a different cluster node than the one the profiling tool runs on, 
he won't be able to attach to the process, since it resides on a different 
physical machine, methinks (well, now that I think about it, it can be done via 
remote debugging, but that's just a pain in the ***).
Regards,
A.

On Thu, Mar 5, 2015 at 8:46 PM, Nathan Leung <[email protected]> wrote:

You don't need to change your code. As Andrew mentioned, you can get a lot of 
mileage by profiling your logic in a standalone program. For jvisualvm, you can 
just run your program (a loop that runs for a long time is best), then attach 
to the running process with jvisualvm.  It's pretty straightforward to use, and 
you can also find good guides with a Google search.
On Mar 5, 2015 1:43 PM, "Andrew Xor" <[email protected]> wrote:
​
Well... detecting memory leaks in Java is a bit tricky, as Java does a lot for 
you. Generally though, as long as you avoid overusing the "new" operator and 
close any resources that you are no longer using, you should be fine... but a 
profiler such as the ones mentioned by Nathan will tell you the whole truth. 
YourKit is awesome and has a free trial; go ahead and test-drive it. I am 
pretty sure you need a working jar (or compilable code that has a main function 
in it) in order to profile it, though profiling your bolts and spouts directly 
is a bit trickier. Hopefully your algorithm (or portions of it) can be put into 
a sample test program that can be executed locally for you to profile.
Hope this helped. Regards,
A.

On Thu, Mar 5, 2015 at 8:33 PM, Sa Li <[email protected]> wrote:

On Thu, Mar 5, 2015 at 10:26 AM, Andrew Xor <[email protected]> wrote:
Unfortunately that is not fixed; it depends on the computations and data 
structures you have. In my case, for example, I use more than 2 GB since I need 
to keep a large matrix in memory. Having said that, in most cases it should be 
relatively easy to estimate how much memory you are going to need and use 
that... or, if that's not possible, you can just increase it and try the "set 
and see" approach. Check for memory leaks as well (unclosed resources and so 
on!).
Regards.

A.

On Thu, Mar 5, 2015 at 8:21 PM, Sa Li <[email protected]> wrote:
Thanks, Nathan. How much should it be in general?

On Thu, Mar 5, 2015 at 10:15 AM, Nathan Leung <[email protected]> wrote:

Your worker is allocated a maximum of 768 MB of heap. It's quite possible that 
this is not enough. Try increasing Xmx in worker.childopts.
On Mar 5, 2015 1:10 PM, "Sa Li" <[email protected]> wrote:
Hi, All

I have been running a Trident topology on a production server; the code is like this:


topology.newStream("spoutInit", kafkaSpout)
        .each(new Fields("str"),
                new JsonObjectParse(),
                new Fields("eventType", "event"))
        .parallelismHint(pHint)
        .groupBy(new Fields("event"))
        .persistentAggregate(PostgresqlState.newFactory(config),
                new Fields("eventType"), new EventUpdater(), new Fields("eventWord"));

Config conf = new Config();
conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);

Basically, it does a simple thing: get data from Kafka, parse it into different 
fields, and write it into the Postgres DB. But in the Storm UI I see this 
error: "java.lang.OutOfMemoryError: GC overhead limit exceeded". It always 
happens on the same worker port of each node, 6703. I understand this occurs 
because, by default, the JVM throws this error if you are spending more than 
98% of the total time in GC and less than 2% of the heap is recovered by each 
collection.

I am not sure what the exact cause of the memory leak is. Is it OK to simply 
increase the heap? Here is my storm.yaml:

supervisor.slots.ports:
     - 6700
     - 6701
     - 6702
     - 6703

nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
worker.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
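Following the suggestion earlier in the thread to increase Xmx, a larger worker heap would look like the line below (2 GB is an assumed value, not a recommendation from this thread; size it from profiling):

```yaml
# Assumed larger heap; profile before settling on a number.
worker.childopts: "-Xmx2048m -Djava.net.preferIPv4Stack=true"
```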



Has anyone had similar issues, and what would be the best way to overcome this?

thanks in advance

AL