At every step I write MyConfig.blobSize bytes, which I configured to range from 100000 to 1000000. This allows me to "simulate" the writing of a 600 MB file, as per the configuration on GitHub (https://github.com/giampaolotrapasso/cassandratest/blob/master/src/main/resources/application.conf).
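For illustration, this is roughly how a real file could be cut into blobSize-byte slices before binding each slice to the prepared statement. The helper below is only a sketch and is not part of the linked repository (the name chunkFile and the use of java.nio are assumptions):

import java.nio.file.{Files, Paths}

// Read the whole file into memory (fine for a POC) and cut it into
// blobSize-byte slices; each slice would become the "data" cell of one chunk.
def chunkFile(path: String, blobSize: Int): Iterator[Array[Byte]] = {
  val bytes = Files.readAllBytes(Paths.get(path))
  bytes.grouped(blobSize) // the last slice may be shorter than blobSize
}

In the POC the slice is instead a fixed random blob of exactly MyConfig.blobSize bytes, so every chunk written really is blobSize bytes.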
Giampaolo

2016-02-08 23:25 GMT+01:00 Jack Krupansky <jack.krupan...@gmail.com>:

> You appear to be writing the entire blob on each chunk rather than the
> slice of the blob.
>
> -- Jack Krupansky
>
> On Mon, Feb 8, 2016 at 1:45 PM, Giampaolo Trapasso <
> giampaolo.trapa...@radicalbit.io> wrote:
>
>> Hi all,
>>
>> I'm trying to put a large binary file (> 500 MB) on a C* cluster as fast
>> as I can, but I get some (many) WriteTimeoutExceptions.
>>
>> I created a small POC that isolates the problem I'm facing. You will find
>> the code here: https://github.com/giampaolotrapasso/cassandratest
>>
>> *Main details about it:*
>>
>> - I write the file in chunks (the *data* field) of <= 1 MB each (1 MB is
>>   the recommended maximum size for a single cell).
>> - Chunks are grouped into buckets. Every bucket is a partition row.
>> - Buckets are grouped by UUID.
>> - Chunk size and bucket size are configurable from the app, so I can try
>>   different configurations and see what happens.
>> - To maximize throughput, I execute asynchronous insertions; however, to
>>   avoid putting too much pressure on the DB, once a threshold of in-flight
>>   writes is reached I wait for at least one insert to finish before adding
>>   another (this part is quite raw in my code, but I think it's not so
>>   important; a rough sketch is included after the code below). This
>>   threshold is also configurable, to test different combinations.
>>
>> This is the table on the DB:
>>
>> CREATE TABLE blobtest.store (
>>     uuid uuid,
>>     bucket bigint,
>>     start bigint,
>>     data blob,
>>     end bigint,
>>     PRIMARY KEY ((uuid, bucket), start)
>> )
>>
>> and this is the main code (Scala, but I hope it is generally readable):
>>
>> val statement = client.session.prepare("INSERT INTO blobTest.store(uuid, bucket, start, end, data) VALUES (?, ?, ?, ?, ?) if not exists;")
>>
>> val blob = new Array[Byte](MyConfig.blobSize)
>> scala.util.Random.nextBytes(blob)
>>
>> write(client,
>>   numberOfRecords = MyConfig.recordNumber,
>>   bucketSize = MyConfig.bucketSize,
>>   maxConcurrentWrites = MyConfig.maxFutures,
>>   blob,
>>   statement)
>>
>> where write is:
>>
>> def write(database: Database, numberOfRecords: Int, bucketSize: Int,
>>           maxConcurrentWrites: Int, blob: Array[Byte],
>>           statement: PreparedStatement): Unit = {
>>
>>   val uuid: UUID = UUID.randomUUID()
>>   var count = 0
>>
>>   // Javish loop
>>   while (count < numberOfRecords) {
>>     val record = Record(
>>       uuid = uuid,
>>       bucket = count / bucketSize,
>>       start = (count % bucketSize) * blob.length,
>>       end = ((count % bucketSize) + 1) * blob.length,
>>       bytes = blob
>>     )
>>     asynchWrite(database, maxConcurrentWrites, statement, record)
>>     count += 1
>>   }
>>
>>   waitDbWrites()
>> }
>>
>> and asynchWrite just binds the record to the statement.
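asynchWrite itself is not shown in the message. One way to implement the "after a threshold, wait for a finished insert before adding another" behaviour described above is a Semaphore that bounds the number of in-flight futures; the sketch below follows that idea and is not the code from the repository (the Record definition, the Semaphore and the callback executor are assumptions):

import java.nio.ByteBuffer
import java.util.UUID
import java.util.concurrent.{Executors, Semaphore}
import com.datastax.driver.core.{PreparedStatement, ResultSetFuture, Session}

case class Record(uuid: UUID, bucket: Long, start: Long, end: Long, bytes: Array[Byte])

// At most maxConcurrentWrites inserts are in flight at any time;
// acquire() blocks the producer loop once the threshold is reached.
val maxConcurrentWrites = 32
val inFlight = new Semaphore(maxConcurrentWrites)
val callbackExecutor = Executors.newSingleThreadExecutor()

def asynchWrite(session: Session, statement: PreparedStatement, record: Record): Unit = {
  val bound = statement.bind(
    record.uuid,
    record.bucket: java.lang.Long,
    record.start: java.lang.Long,
    record.end: java.lang.Long,
    ByteBuffer.wrap(record.bytes))
  inFlight.acquire() // wait for a free slot
  val future: ResultSetFuture = session.executeAsync(bound)
  future.addListener(new Runnable {
    def run(): Unit = inFlight.release() // slot freed when the insert completes
  }, callbackExecutor)
}

With this scheme, waitDbWrites() can simply acquire all maxConcurrentWrites permits to block until every pending insert has completed.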
>> *Problem*
>>
>> The problem is that when I try to increase the chunk size, the number of
>> async inserts, or the size of the bucket (i.e. the number of chunks), the
>> app becomes unstable because the DB starts throwing WriteTimeoutExceptions.
>>
>> I've tested this on CCM (4 nodes) and on an EC2 cluster (5 nodes, 8 GB
>> heap). The problem seems the same in both environments.
>>
>> On my local cluster, I've tried changing the following with respect to
>> the default configuration:
>>
>> concurrent_writes: 128
>> write_request_timeout_in_ms: 200000
>>
>> The other configuration values are here:
>> https://gist.github.com/giampaolotrapasso/ca21a83befd339075e07
>>
>> *Other*
>>
>> The exceptions seem random; sometimes they occur right at the beginning
>> of the write.
>>
>> *Questions:*
>>
>> 1. Is my model wrong? Am I missing some important detail?
>> 2. What is the important information to look at for this kind of problem?
>> 3. Why are the exceptions so random?
>> 4. Is there some other C* parameter I can set to ensure that
>>    WriteTimeoutExceptions do not occur?
>>
>> I hope I provided enough information to get some help.
>>
>> Thank you in advance for any reply.
>>
>> Giampaolo
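P.S. To make the model concrete, this is roughly how one bucket would be read back with the schema above (a sketch only, assuming the same Java driver; reassembling all buckets into the original file is left out):

import java.util.UUID
import com.datastax.driver.core.Session
import scala.collection.JavaConverters._

// Rows of a bucket come back ordered by the clustering key "start",
// so concatenating the data cells restores that bucket's bytes.
def readBucket(session: Session, uuid: UUID, bucket: Long): Array[Byte] = {
  val rows = session.execute(
    "SELECT start, data FROM blobtest.store WHERE uuid = ? AND bucket = ?",
    uuid, bucket: java.lang.Long).all().asScala
  rows.flatMap { row =>
    val buffer = row.getBytes("data") // ByteBuffer holding one chunk
    val chunk = new Array[Byte](buffer.remaining())
    buffer.get(chunk)
    chunk
  }.toArray
}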