Re: murmur3 instead of crc32

2012-11-25 Thread Todd Lipcon
Hi Radim, With SSE4.2 support, the iSCSI CRC32C is the fastest method available. As of HDFS 2, we use that method by default for new files. -Todd On Sun, Nov 25, 2012 at 6:58 PM, Radim Kolar wrote: > its not that big speed difference in this test: > > http://www.strchr.com/hash_functions#res
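
For readers comparing the two checksums named in this message: CRC32 (as in zlib and java.util.zip.CRC32) and CRC32C (the iSCSI/Castagnoli variant that the SSE4.2 crc32 instruction accelerates) differ only in their generator polynomial. The following is a minimal software sketch in Python, not the HDFS code path; the helper names `_make_table` and `crc32c` are illustrative assumptions. It says nothing about speed and only shows that the two algorithms produce different values for the same input.

```python
import zlib

# Reflected form of the Castagnoli polynomial used by CRC32C (iSCSI).
CRC32C_POLY = 0x82F63B78


def _make_table(poly):
    """Build the 256-entry lookup table for a reflected 32-bit CRC."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_table(CRC32C_POLY)


def crc32c(data, crc=0):
    """Table-driven CRC32C over a bytes-like object (hypothetical helper)."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    payload = b"123456789"  # conventional CRC check string
    print(f"CRC32  = {zlib.crc32(payload):#010x}")
    print(f"CRC32C = {crc32c(payload):#010x}")
```

The byte-at-a-time loop is what a hardware CRC32C instruction replaces; a pure software loop like this one is far slower than either a native or an SSE4.2 implementation.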

Re: profiling hdfs write path

2012-11-25 Thread Radim Kolar
Currently it's CPU-intensive for several reasons: 1) It doesn't yet use the native CRC code 2) It makes several unnecessary copies and byte buffer allocations, both in the client and in the DataNode There are open JIRAs for these, and I have a preliminary patch which helped a lot, but it hasn't

Re: murmur3 instead of crc32

2012-11-25 Thread Radim Kolar
It's not that big a speed difference in this test: http://www.strchr.com/hash_functions#results The asm version of CRC32 on i5 is the fastest, but Java 8 switched to murmur3 for hashing strings. I didn't get why they use it instead of java.util.zip.CRC32. The collisions seem to be about the same.

murmur3 instead of crc32

2012-11-25 Thread Radim Kolar
I just tested the C version, and murmur3 32_le is about 4 times faster than CRC32. Yesterday I submitted murmur3 hash support. What does it take to change the checksum method? Does the .metadata contain information about which hash type is used?
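
For reference, the 32-bit MurmurHash3 variant under discussion can be sketched in a few lines. This is a plain Python rendering of the public x86_32 algorithm with little-endian block reads, not the C version benchmarked above and not the submitted patch; the function name `murmur3_32` and its default seed of 0 are assumptions made for illustration.

```python
def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def murmur3_32(data, seed=0):
    """MurmurHash3 x86_32 over bytes, reading 4-byte blocks little-endian."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    n = len(data)
    full = n - (n % 4)

    # Body: mix each complete 4-byte block into the state.
    for i in range(0, full, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF

    # Tail: mix the remaining 1 to 3 bytes, if any.
    tail = data[full:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


if __name__ == "__main__":
    print(f"{murmur3_32(b'', seed=1):#010x}")
```

Unlike a CRC, murmur3 is a general-purpose hash with no guaranteed detection of burst errors of a given length, which matters when it is considered as a storage checksum rather than a hash-table function.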

Re: profiling hdfs write path

2012-11-25 Thread Todd Lipcon
Hi Radim, Currently it's CPU-intensive for several reasons: 1) It doesn't yet use the native CRC code 2) It makes several unnecessary copies and byte buffer allocations, both in the client and in the DataNode There are open JIRAs for these, and I have a preliminary patch which helped a lot, but i

profiling hdfs write path

2012-11-25 Thread Radim Kolar
Has anybody tried to profile why the HDFS write path is so CPU-intensive?

Jenkins build is back to normal : Hadoop-Hdfs-trunk #1237

2012-11-25 Thread Apache Jenkins Server
See

Jenkins build is back to stable : Hadoop-Hdfs-0.23-Build #446

2012-11-25 Thread Apache Jenkins Server
See