To change the readahead amount, IIRC it's something like blockdev --setra 16 /dev/sda, where the number is the readahead size (in 512-byte sectors, I believe).
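For reference, a minimal sketch of checking and setting readahead on Linux (assuming the data disk is /dev/sda; blockdev reports and sets the value in 512-byte sectors, so 16 is 8KB, and the change needs root and does not persist across reboots):

    # show the current readahead setting, in 512-byte sectors
    blockdev --getra /dev/sda

    # drop readahead to 16 sectors (8KB) for purely random SSD access
    blockdev --setra 16 /dev/sda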
-Jay

On Sun, Jan 25, 2015 at 3:50 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:

> I haven't had a chance to try it yet. Hopefully next week. I'll let you
> know what I find.
>
> On Sun, Jan 25, 2015 at 2:40 PM, Chris Riccomini <criccom...@apache.org>
> wrote:
>
> > Awesome, I'll have a look at this. @Roger, did setting this improve your
> > RocksDB throughput?
> >
> > On Sun, Jan 25, 2015 at 12:53 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> > > I have seen a similar thing from the OS tunable readahead. I think
> > > Linux defaults to reading a full 128K into pagecache with every read.
> > > This is sensible for spinning disks where maybe blowing 500us may mean
> > > you get lucky and save a 10ms seek. But for SSDs, especially a
> > > key-value store doing purely random access, it is a total waste and
> > > huge perf hit.
> > >
> > > -Jay
> > >
> > > On Sun, Jan 25, 2015 at 12:29 PM, Roger Hoover <roger.hoo...@gmail.com>
> > > wrote:
> > >
> > > > FYI, for Linux with SSDs, changing the io scheduler to deadline or
> > > > noop can make a 500x improvement. I haven't tried this myself.
> > > >
> > > > http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/hardware.html#_disks
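A rough sketch of the scheduler change Roger describes above, assuming the SSD is /dev/sda (the set of available schedulers and the right way to persist the change vary by kernel and distro):

    # list the available schedulers; the active one is shown in brackets
    cat /sys/block/sda/queue/scheduler

    # switch to deadline (or noop) at runtime, as root
    echo deadline > /sys/block/sda/queue/scheduler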
> > > > On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini
> > > > <criccom...@linkedin.com.invalid> wrote:
> > > >
> > > > > Hey Roger,
> > > > >
> > > > > We did some benchmarking, and discovered very similar performance
> > > > > to what you've described. We saw ~40k writes/sec and ~20k
> > > > > reads/sec, per-container, on a Virident SSD. This was without any
> > > > > changelog. Are you using a changelog on the store?
> > > > >
> > > > > When we attached a changelog to the store, the writes dropped
> > > > > significantly (~1000 writes/sec). When we hooked up VisualVM, we
> > > > > saw that the container was spending > 99% of its time in
> > > > > KafkaSystemProducer.send().
> > > > >
> > > > > We're currently doing two things:
> > > > >
> > > > > 1. Working with our performance team to understand and tune RocksDB
> > > > > properly.
> > > > > 2. Upgrading the Kafka producer to use the new Java-based API.
> > > > > (SAMZA-227)
> > > > >
> > > > > For (1), it seems like we should be able to get a lot higher
> > > > > throughput from RocksDB. Anecdotally, we've heard that RocksDB
> > > > > requires many threads in order to max out an SSD, and since Samza
> > > > > is single-threaded, we could just be hitting a RocksDB bottleneck.
> > > > > We won't know until we dig into the problem (which we started
> > > > > investigating last week). The current plan is to start by
> > > > > benchmarking RocksDB JNI outside of Samza, and see what we can get.
> > > > > From there, we'll know our "speed of light", and can try to get
> > > > > Samza as close as possible to it. If RocksDB JNI can't be made to
> > > > > go "fast", then we'll have to understand why.
> > > > >
> > > > > (2) should help with the changelog issue. I believe that the
> > > > > slowness with the changelog is caused because the changelog is
> > > > > using a sync producer to send to Kafka, and is blocking when a
> > > > > batch is flushed. In the new API, the concept of a "sync" producer
> > > > > is removed. All writes are handled on an async writer thread
> > > > > (though we can still guarantee writes are safely written before
> > > > > checkpointing, which is what we need).
> > > > >
> > > > > In short, I agree, it seems slow. We see this behavior, too. We're
> > > > > digging into it.
> > > > >
> > > > > Cheers,
> > > > > Chris
> > > > >
> > > > > On 1/17/15 12:58 PM, "Roger Hoover" <roger.hoo...@gmail.com> wrote:
> > > > >
> > > > > > Michael,
> > > > > >
> > > > > > Thanks for the response. I used VisualVM and YourKit and see the
> > > > > > CPU is not being used (0.1%). I took a few thread dumps and see
> > > > > > the main thread blocked on the flush() method inside the KV
> > > > > > store.
> > > > > >
> > > > > > On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose
> > > > > > <elementat...@gmail.com> wrote:
> > > > > >
> > > > > > > Is your process at 100% CPU? I suspect you're spending most of
> > > > > > > your time in JSON deserialization, but profile it and check.
> > > > > > >
> > > > > > > Michael
> > > > > > >
> > > > > > > On Friday, January 16, 2015, Roger Hoover
> > > > > > > <roger.hoo...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi guys,
> > > > > > > >
> > > > > > > > I'm testing a job that needs to load 40M records (6GB in
> > > > > > > > Kafka as JSON) from a bootstrap topic. The topic has 4
> > > > > > > > partitions and I'm running the job using the
> > > > > > > > ProcessJobFactory so all four tasks are in one container.
> > > > > > > >
> > > > > > > > Using RocksDB, it's taking 19 minutes to load all the data,
> > > > > > > > which amounts to 35k records/sec or 5MB/s based on input
> > > > > > > > size. I ran iostat during this time and see the disk write
> > > > > > > > throughput is 14MB/s.
> > > > > > > >
> > > > > > > > I didn't tweak any of the storage settings.
> > > > > > > >
> > > > > > > > A few questions:
> > > > > > > > 1) Does this seem low? I'm running on a Macbook Pro with SSD.
> > > > > > > > 2) Do you have any recommendations for improving the load
> > > > > > > > speed?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Roger
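For anyone who wants to repeat the thread-dump check Roger describes above (main thread blocked in the KV store's flush()), a minimal sketch using the standard JDK tools, assuming the container runs as a plain local JVM (replace 12345 with the pid that jps reports):

    # find the Samza container's JVM pid
    jps -lm

    # capture a thread dump
    jstack 12345 > dump-1.txt

    # see where the main thread is blocked
    grep -A 20 '"main"' dump-1.txt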