Hi Radim,

Currently it's CPU-intensive for several reasons:
1) It doesn't yet use the native CRC code
2) It makes several unnecessary copies and byte buffer allocations, both in the client and in the DataNode

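To make the second point concrete, here is a rough sketch of the general pattern, not the actual HDFS write-path code. It assumes data is checksummed in fixed-size chunks (512 bytes here, chosen for illustration) and uses the JDK's java.util.zip.CRC32 as a stand-in for whatever checksum implementation the real code uses. The class and method names are hypothetical. The first method copies each chunk into a fresh array before checksumming; the second produces the same values by checksumming in place and reusing one CRC32 object.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.zip.CRC32;

// Illustrative sketch only; this is NOT HDFS code.
// All names here are hypothetical and exist just for this example.
final class ChunkedCrcSketch {

    // Assumed number of data bytes covered by each checksum.
    static final int BYTES_PER_CHECKSUM = 512;

    // Wasteful variant: allocates and fills a new array for every chunk,
    // and creates a new CRC32 object per chunk.
    static long[] checksumsWithCopies(byte[] data, int chunkSize) {
        int chunks = (data.length + chunkSize - 1) / chunkSize;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * chunkSize;
            int len = Math.min(chunkSize, data.length - off);
            byte[] copy = new byte[len];                // per-chunk allocation
            System.arraycopy(data, off, copy, 0, len);  // avoidable copy
            CRC32 crc = new CRC32();
            crc.update(copy, 0, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Leaner variant: checksums each chunk directly in the source array
    // and reuses a single CRC32 instance via reset().
    static long[] checksumsInPlace(byte[] data, int chunkSize) {
        int chunks = (data.length + chunkSize - 1) / chunkSize;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * chunkSize;
            int len = Math.min(chunkSize, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] data = new byte[64 * 1024 + 100];
        new Random(42).nextBytes(data);
        long[] a = checksumsWithCopies(data, BYTES_PER_CHECKSUM);
        long[] b = checksumsInPlace(data, BYTES_PER_CHECKSUM);
        System.out.println(a.length + " chunks, identical checksums: " + Arrays.equals(a, b));
    }
}
```

Both methods return identical checksums; the difference is only in how much garbage and memory traffic they generate per chunk, which is the kind of overhead that adds up on a hot write path.
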
There are open JIRAs for these, and I have a preliminary patch which helped a lot, but it hasn't been a high priority. On most clusters, writing becomes network-bound before it becomes CPU-bound. On the other hand, as 10GbE is becoming fairly common, this will probably be more important soon. I'm hoping to find time to get back to finishing the patches in the next few months.

-Todd

On Sun, Nov 25, 2012 at 1:41 PM, Radim Kolar <h...@filez.com> wrote:
> anybody tried to profile why HDFS write path is so much CPU intensive?
>

--
Todd Lipcon
Software Engineer, Cloudera