Hi,

If you have a chance, it would be great to hear whether the patch attached to this JIRA (https://issues.apache.org/jira/browse/FLUME-1794) fixes the performance problem.

Brock

On Tue, Dec 18, 2012 at 11:25 AM, Brock Noland <br...@cloudera.com> wrote:
> Yeah I think we should do that check in the background and then update
> a flag. This is how hdfs and mapred do it.
>
> On Tue, Dec 18, 2012 at 11:04 AM, Hari Shreedharan
> <hshreedha...@cloudera.com> wrote:
>> Yep. The disk space calls require an NFS call for each write, and that slows
>> things down a lot.
>>
>> --
>> Hari Shreedharan
>>
>> On Tuesday, December 18, 2012 at 8:43 AM, Brock Noland wrote:
>>
>> We'd need those thread dumps to help confirm but I bet that FLUME-1609
>> results in an NFS call on each operation on the channel.
>>
>> If that is true, that would explain why it works well on local disk.
>>
>> Brock
>>
>> On Tue, Dec 18, 2012 at 10:17 AM, Brock Noland <br...@cloudera.com> wrote:
>>
>> Hi,
>>
>> Hmm, yes in general performance is not going to be great over NFS, but
>> there haven't been any FC changes that stick out here.
>>
>> Could you take 10 thread dumps of the agent running the file channel
>> and 10 thread dumps of the agent sending data to the agent with the
>> file channel? (You can address them to me directly since the list
>> won't take attachments.)
>>
>> Are there any patterns, like it works for 40 seconds then times out
>> and then works for 39 seconds, etc?
>>
>> Brock
>>
>> On Tue, Dec 18, 2012 at 10:07 AM, Rakos, Rudolf
>> <rudolf.ra...@morganstanley.com> wrote:
>>
>> Hi,
>>
>> We've run into a strange problem regarding NFS and File Channel performance
>> while evaluating the new version of Flume.
>>
>> We had no issues with the previous version (1.2.0).
>>
>> Our configuration looks like this:
>>
>> · Node1:
>>   (Avro RPC Clients ->) Avro Source and Custom Sources -> File Channel -> Avro
>>   Sink (-> Node 2)
>>
>> · Node2:
>>   (Node1s ->) Avro Source -> File Channel -> Custom Sink
>>
>> Both the checkpoint and the data directories of the File Channels are on NFS
>> shares. We use the same share for checkpoint and data directories, but
>> different shares for each Node. Unfortunately it is not an option for us to
>> use local directories.
>>
>> The events are about 1KB in size, and the batch sizes are the following:
>>
>> · Avro RPC Clients: 1000
>> · Custom Sources: 2000
>> · Avro Sink: 5000
>> · Custom Sink: 10000
>>
>> We are experiencing very slow File Channel performance compared to the
>> previous version, and a high number of timeouts (almost always) in the Avro
>> RPC Clients and the Avro Sink.
>>
>> Something like this:
>>
>> 2012-12-18 15:43:31,828
>> [SinkRunner-PollingRunner-ExceptionCatchingSinkProcessor] WARN
>> org.apache.flume.sink.AvroSink - Failed to send event batch
>> org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: ***,
>> port: *** }: Failed to send batch
>>     at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236)
>>     ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>>     ***
>>     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>     [flume-ng-core-1.3.0.jar:1.3.0]
>>     at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
>> Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient {
>> host: ***, port: *** }: Handshake timed out after 20000ms
>>     at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:280)
>>     ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>>     at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224)
>>     ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>>     ... 5 common frames omitted
>> Caused by: java.util.concurrent.TimeoutException: null
>>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
>>     ~[na:1.6.0_31]
>>     at java.util.concurrent.FutureTask.get(FutureTask.java:91)
>>     ~[na:1.6.0_31]
>>     at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:278)
>>     ~[flume-ng-sdk-1.3.0.jar:1.3.0]
>>     ... 6 common frames omitted
>>
>> (I had to remove some details, sorry for that.)
>>
>> We managed to narrow down the root cause of the issue to the File Channel,
>> because:
>>
>> · Everything works fine if we switch to the Memory Channel or to the
>>   Old File Channel (1.2.0).
>> · Everything works fine if we use local directories.
>>
>> We've tested this on multiple different PCs (both Windows and Linux).
>>
>> I spent the day debugging and profiling, but I could not find anything worth
>> mentioning (nothing with excessive CPU usage, no threads are waiting too
>> much, etc.). The only problem is that File Channel takes and puts take way
>> more time than with the previous version.
>>
>> Could someone please try the File Channel on an NFS share?
>>
>> Does anyone have similar issues?
>>
>> Thank you for your help.
>>
>> Regards,
>> Rudolf
>>
>> Rudolf Rakos
>> Morgan Stanley | ISG Technology
>> Lechner Odon fasor 8 | Floor 06
>> Budapest, 1095
>> Phone: +36 1 881-4011
>> rudolf.ra...@morganstanley.com

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
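
For anyone following along, the idea discussed in the thread (do the disk space check in the background and update a flag, so a put or take never triggers an NFS call for the check itself) could look roughly like the sketch below. This is only an illustration and is not the FLUME-1794 patch: the class name BackgroundDiskSpaceCheck, its methods, and the interval and threshold parameters are all hypothetical. It relies only on java.io.File.getUsableSpace() and a ScheduledExecutorService from the JDK, and uses Java 8 lambdas for brevity.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not Flume code): caches the result of a disk space check. */
final class BackgroundDiskSpaceCheck implements AutoCloseable {
    private final File dir;
    private final long minFreeBytes;
    private final ScheduledExecutorService scheduler;
    // volatile so that writer threads see the latest value without locking
    private volatile boolean enoughSpace;

    BackgroundDiskSpaceCheck(File dir, long minFreeBytes, long intervalMillis) {
        this.dir = dir;
        this.minFreeBytes = minFreeBytes;
        // One synchronous check up front so the flag is meaningful immediately.
        this.enoughSpace = check();
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "disk-space-check");
            t.setDaemon(true); // do not keep the JVM alive for this thread
            return t;
        });
        // The (possibly slow, NFS-backed) call now happens only on this thread.
        scheduler.scheduleWithFixedDelay(
                () -> enoughSpace = check(),
                intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    private boolean check() {
        return dir.getUsableSpace() >= minFreeBytes;
    }

    /** Cheap read of the cached flag; safe to call on every put or take. */
    public boolean hasEnoughSpace() {
        return enoughSpace;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A writer would call hasEnoughSpace() before each operation instead of querying the file system directly. The trade-off is that the flag can be stale by up to one interval, so the threshold should leave some headroom.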