You appear to be writing the entire blob for each chunk rather than just that chunk's slice of the blob.
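
Something along these lines would bind only that chunk's slice per row. This is a rough, untested sketch against your schema; writeSlices, fileBytes, chunkSize and bucketSize are placeholder names standing in for your config values, not code from the POC:

import java.nio.ByteBuffer
import java.util.UUID
import com.datastax.driver.core.{PreparedStatement, Session}

// Sketch: split the source bytes and bind only each chunk's slice.
// fileBytes, chunkSize and bucketSize are placeholders for your config.
def writeSlices(session: Session, statement: PreparedStatement,
                fileBytes: Array[Byte], chunkSize: Int, bucketSize: Int): Unit = {
  val uuid = UUID.randomUUID()
  fileBytes.grouped(chunkSize).zipWithIndex.foreach { case (chunk, i) =>
    val bucket = (i / bucketSize).toLong
    val start  = (i % bucketSize).toLong * chunkSize   // same offset scheme as your Record
    val end    = start + chunk.length
    // Only this chunk's bytes go on the wire, not the whole file.
    session.execute(statement.bind(uuid, bucket: java.lang.Long,
      start: java.lang.Long, end: java.lang.Long, ByteBuffer.wrap(chunk)))
  }
}

The synchronous execute() is only there to keep the sketch short; your async, throttled path stays the same, it just needs to receive the slice instead of the full array.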
-- Jack Krupansky

On Mon, Feb 8, 2016 at 1:45 PM, Giampaolo Trapasso <
giampaolo.trapa...@radicalbit.io> wrote:

> Hi to all,
>
> I'm trying to put a large binary file (> 500MB) on a C* cluster as fast as
> I can, but I get some (many) WriteTimeoutExceptions.
>
> I created a small POC that isolates the problem I'm facing. Here you will
> find the code: https://github.com/giampaolotrapasso/cassandratest
>
> *Main details about it:*
>
> - I write the file in chunks (the *data* field) of <= 1MB (1MB is the
>   recommended max size for a single cell).
> - Chunks are grouped into buckets. Every bucket is a partition row.
> - Buckets are grouped by UUID.
> - Chunk size and bucket size are configurable from the app, so I can try
>   different configurations and see what happens.
> - Trying to maximize throughput, I execute async insertions; however, to
>   avoid too much pressure on the db, after a threshold I wait for at least
>   one insert to finish before adding another (this part is quite raw in my
>   code, but I think it's not so important). This parameter is also
>   configurable so I can test different combinations.
>
> This is the table on the db:
>
> CREATE TABLE blobtest.store (
>     uuid uuid,
>     bucket bigint,
>     start bigint,
>     data blob,
>     end bigint,
>     PRIMARY KEY ((uuid, bucket), start)
> )
>
> and this is the main code (Scala, but I hope it is generally readable):
>
> val statement = client.session.prepare(
>   "INSERT INTO blobTest.store(uuid, bucket, start, end, data) VALUES (?, ?, ?, ?, ?) if not exists;")
>
> val blob = new Array[Byte](MyConfig.blobSize)
> scala.util.Random.nextBytes(blob)
>
> write(client,
>   numberOfRecords = MyConfig.recordNumber,
>   bucketSize = MyConfig.bucketSize,
>   maxConcurrentWrites = MyConfig.maxFutures,
>   blob,
>   statement)
>
> where write is
>
> def write(database: Database, numberOfRecords: Int, bucketSize: Int,
>           maxConcurrentWrites: Int, blob: Array[Byte],
>           statement: PreparedStatement): Unit = {
>
>   val uuid: UUID = UUID.randomUUID()
>   var count = 0
>
>   // Java-ish loop
>   while (count < numberOfRecords) {
>     val record = Record(
>       uuid = uuid,
>       bucket = count / bucketSize,
>       start = (count % bucketSize) * blob.length,
>       end = ((count % bucketSize) + 1) * blob.length,
>       bytes = blob
>     )
>     asynchWrite(database, maxConcurrentWrites, statement, record)
>     count += 1
>   }
>
>   waitDbWrites()
> }
>
> and asynchWrite just binds the record to the statement.
>
> *Problem*
>
> The problem is that when I try to increase the chunk size, the number of
> async inserts, or the size of the bucket (i.e. the number of chunks), the
> app becomes unstable because the db starts throwing WriteTimeoutExceptions.
>
> I've tested this on CCM (4 nodes) and on an EC2 cluster (5 nodes, 8GB
> heap). The problem seems the same in both environments.
>
> On my local cluster, I've changed the following settings from their
> defaults:
>
> concurrent_writes: 128
>
> write_request_timeout_in_ms: 200000
>
> The other configuration is here:
> https://gist.github.com/giampaolotrapasso/ca21a83befd339075e07
>
> *Other*
>
> The exceptions seem random; sometimes they occur at the beginning of the
> write.
>
> *Questions:*
>
> 1. Is my model wrong? Am I missing some important detail?
>
> 2. What is the important information to look at for this kind of problem?
>
> 3. Why are the exceptions so random?
>
> 4. Is there some other C* parameter I can set to ensure that
> WriteTimeoutException does not occur?
>
> I hope I provided enough information to get some help.
>
> Thank you in advance for any reply.
>
>
> Giampaolo
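
For the throttling part described above (waiting for a finished insert before issuing another), one common shape is a semaphore bounding the in-flight executeAsync() calls. Again only a sketch, not the POC's code: the limit of 64, the names inFlight and throttledWrite, and the single-thread callback executor are arbitrary choices for illustration.

import java.util.concurrent.{Executors, Semaphore}
import com.datastax.driver.core.{BoundStatement, Session}

// Sketch: cap the number of concurrent async inserts with a semaphore.
// The limit (64) and the callback executor are arbitrary choices.
val inFlight = new Semaphore(64)
val callbackExecutor = Executors.newSingleThreadExecutor()

def throttledWrite(session: Session, bound: BoundStatement): Unit = {
  inFlight.acquire()                        // blocks once 64 writes are pending
  val future = session.executeAsync(bound)  // ResultSetFuture is a ListenableFuture
  future.addListener(new Runnable {
    override def run(): Unit = inFlight.release()  // free a slot when the write completes
  }, callbackExecutor)
}

The acquire() blocks the producer loop once the cap is reached, and every completed write, whether it succeeded or failed, frees a slot again.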