[ https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701858#comment-14701858 ]
Ariel Weisberg edited comment on CASSANDRA-8630 at 8/18/15 7:47 PM: -------------------------------------------------------------------- Doing code review now * In RebufferingInputStream, don't throw assertion error since it's not an assert nor an Error to read from a closed stream. JDK classes seem to throw IOException with a message. I say don't throw anything just let it NPE since that is what the other functions do and we are avoiding the extra branch in those. Or... check for "closed" all the time and throw IOException. Maybe Benedict has an opinion on the performance of checking. * RandomAccessReader.reBufferMmap() - how is mmap reading rate limited? * RandomAccessReader.Channel is not a channel. It's more of a wrapper, descriptor, proxy or something. * MemoryInputStream - This should let you read from Memories larger than 2 gigabytes right? Ints.checkedCast in getByteBuffer will throw? * -RateLimiter is not a final class. We could start using a noop rate limiter instead of null.- Constructor is private, maybe a rate limiter with a huge rate? * RandomAccessReader.bytesRemaining() uses Ints.checkedCast, but we expect the file to be bigger than an int so it shouldn't throw. The API allows this and FileInputStream doesn't throw for available. In this case saturated cast is probably the right one. We should do a pass for Ints.checkedCast and make sure throwing is the right behavior instead of writing handling for it. * BufferPool.java has an import change that is extra * I was going to ask for some warnings cleanup, but it's a big patch touching a lot of files that already had warnings, so whatever you want to do. * Thumbs up for logging the random seed in the tests The approach looks good. I'm still reviewing. Working on the tests and coverage now. was (Author: aweisberg): Doing code review now * In RebufferingInputStream, don't throw assertion error since it's not an assert nor an Error to read from a closed stream. JDK classes seem to throw IOException with a message. I say don't throw anything just let it NPE since that is what the other functions do and we are avoiding the extra branch in those. Or... check for "closed" all the time and throw IOException. Maybe Benedict has an opinion on the performance of checking. * RandomAccessReader.reBufferMmap() - how is mmap reading rate limited? * RandomAccessReader.Channel is not a channel. It's more of a wrapper, descriptor, proxy or something. * MemoryInputStream - This should let you read from Memories larger than 2 gigabytes right? Ints.checkedCast in getByteBuffer will throw? * RateLimiter is not a final class. We could start using a noop rate limiter instead of null. * RandomAccessReader.bytesRemaining() uses Ints.checkedCast, but we expect the file to be bigger than an int so it shouldn't throw. The API allows this and FileInputStream doesn't throw for available. In this case saturated cast is probably the right one. We should do a pass for Ints.checkedCast and make sure throwing is the right behavior instead of writing handling for it. * BufferPool.java has an import change that is extra * I was going to ask for some warnings cleanup, but it's a big patch touching a lot of files that already had warnings, so whatever you want to do. * Thumbs up for logging the random seed in the tests The approach looks good. I'm still reviewing. Working on the tests and coverage now. > Faster sequential IO (on compaction, streaming, etc) > ---------------------------------------------------- > > Key: CASSANDRA-8630 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8630 > Project: Cassandra > Issue Type: Improvement > Components: Core, Tools > Reporter: Oleg Anastasyev > Assignee: Stefania > Labels: compaction, performance > Fix For: 3.x > > Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, > flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz, > mmaped_uncomp_hotspot.png > > > When node is doing a lot of sequencial IO (streaming, compacting, etc) a lot > of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int). > This is because default implementations of readShort,readLong, etc as well as > their matching write* are implemented with numerous calls of byte by byte > read and write. > This makes a lot of syscalls as well. > A quick microbench shows than just reimplementation of these methods in > either way gives 8x speed increase. > A patch attached implements RandomAccessReader.read<Type> and > SequencialWriter.write<Type> methods in more efficient way. > I also eliminated some extra byte copies in CompositeType.split and > ColumnNameHelper.maxComponents, which were on my profiler's hotspot method > list during tests. > A stress tests on my laptop show that this patch makes compaction 25-30% > faster on uncompressed sstables and 15% faster for compressed ones. > A deployment to production shows much less CPU load for compaction. > (I attached a cpu load graph from one of our production, orange is niced CPU > load - i.e. compaction; yellow is user - i.e. not compaction related tasks) -- This message was sent by Atlassian JIRA (v6.3.4#6332)