At every step I write MyConfig.blobSize bytes, which I configured to be between 100000 and 1000000. This allows me to "simulate" writing a 600MB file, as per the configuration on GitHub (https://github.com/giampaolotrapasso/cassandratest/blob/master/src/main/resources/application.conf).
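
For reference, each step writes a fixed-size random chunk rather than a slice of a real file (a sketch based on the POC code quoted below, not verbatim):

    val chunk = new Array[Byte](MyConfig.blobSize) // between 100000 and 1000000 bytes
    scala.util.Random.nextBytes(chunk)             // the same random chunk is reused at every step
    // total simulated size = recordNumber * blobSize, e.g. 600 chunks of 1MB ~ 600MB
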
 Giampaolo

2016-02-08 23:25 GMT+01:00 Jack Krupansky <jack.krupan...@gmail.com>:

> You appear to be writing the entire blob on each chunk rather than the
> slice of the blob.
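
Slicing per chunk would look something like this (a sketch with hypothetical names: fileBytes is the whole file, start and end delimit the chunk):

    val slice: Array[Byte] =
      java.util.Arrays.copyOfRange(fileBytes, start, end) // bytes from start (inclusive) to end (exclusive)
    // ...then bind slice instead of the whole array
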
>
> -- Jack Krupansky
>
> On Mon, Feb 8, 2016 at 1:45 PM, Giampaolo Trapasso <
> giampaolo.trapa...@radicalbit.io> wrote:
>
>> Hi to all,
>>
>> I'm trying to put a large binary file (> 500MB) on a C* cluster as fast
>> as I can but I get some (many) WriteTimeoutExceptions.
>>
>> I created a small POC that isolates the problem I'm facing. You will find
>> the code here: https://github.com/giampaolotrapasso/cassandratest
>>
>> *Main details about it:*
>>
>>    - I write the file in chunks (the *data* field) of at most 1MB each
>>    (1MB is the recommended max size for a single cell),
>>
>>
>>    - Chunks are grouped into buckets. Every bucket is a separate partition,
>>    - Buckets are grouped by UUID (one UUID per file).
>>
>>
>>    - Chunk size and bucket size are configurable from the app, so I can
>>    try different configurations and see what happens.
>>
>>
>>    - To maximize throughput, I execute the inserts asynchronously;
>>    however, to avoid putting too much pressure on the db, once a
>>    threshold of in-flight writes is reached, I wait for at least one
>>    insert to finish before adding another (this part is quite raw in my
>>    code, but I think it's not so important). This threshold is also
>>    configurable, to test different combinations; a sketch of the idea
>>    follows below.
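>>
>> A minimal sketch of that throttling, assuming the DataStax Java driver
>> (where executeAsync returns a Guava ListenableFuture); the names are
>> illustrative, not verbatim from the POC:
>>
>>     import java.util.concurrent.Semaphore
>>     import com.datastax.driver.core.{ResultSet, Session, Statement}
>>     import com.google.common.util.concurrent.{FutureCallback, Futures}
>>
>>     val maxConcurrentWrites = 8 // e.g. MyConfig.maxFutures
>>     val inFlight = new Semaphore(maxConcurrentWrites)
>>
>>     def throttledWrite(session: Session, bound: Statement): Unit = {
>>       inFlight.acquire() // blocks while maxConcurrentWrites inserts are pending
>>       Futures.addCallback(session.executeAsync(bound),
>>         new FutureCallback[ResultSet] {
>>           override def onSuccess(rs: ResultSet): Unit = inFlight.release()
>>           override def onFailure(t: Throwable): Unit = inFlight.release()
>>         })
>>     }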
>>
>> This is the table on the db:
>>
>> CREATE TABLE blobtest.store (
>>     uuid uuid,
>>     bucket bigint,
>>     start bigint,
>>     data blob,
>>     end bigint,
>>     PRIMARY KEY ((uuid, bucket), start)
>> )
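>>
>> Since start is the clustering column, the chunks of a bucket come back in
>> ascending start order, so a whole bucket could be read back with a query
>> like this (a sketch, not part of the POC):
>>
>>     SELECT start, end, data FROM blobtest.store
>>     WHERE uuid = ? AND bucket = ?;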
>>
>> and this is the main code (Scala, but I hope it is generally readable):
>>
>>     val statement = client.session.prepare(
>>       "INSERT INTO blobtest.store(uuid, bucket, start, end, data) " +
>>       "VALUES (?, ?, ?, ?, ?) IF NOT EXISTS;")
>>
>>     val blob = new Array[Byte](MyConfig.blobSize)
>>     scala.util.Random.nextBytes(blob)
>>
>>     write(client,
>>       numberOfRecords = MyConfig.recordNumber,
>>       bucketSize = MyConfig.bucketSize,
>>       maxConcurrentWrites = MyConfig.maxFutures,
>>       blob,
>>       statement)
>>
>> where write is
>>
>> def write(database: Database, numberOfRecords: Int, bucketSize: Int,
>>           maxConcurrentWrites: Int, blob: Array[Byte],
>>           statement: PreparedStatement): Unit = {
>>
>>     val uuid: UUID = UUID.randomUUID()
>>     var count = 0
>>
>>     // Javish loop
>>     while (count < numberOfRecords) {
>>       val record = Record(
>>         uuid = uuid,
>>         bucket = count / bucketSize,
>>         start = (count % bucketSize) * blob.length,
>>         end = ((count % bucketSize) + 1) * blob.length,
>>         bytes = blob
>>       )
>>       asynchWrite(database, maxConcurrentWrites, statement, record)
>>       count += 1
>>     }
>>
>>     waitDbWrites()
>>   }
>>
>> and asynchWrite just binds the record to the prepared statement and
>> executes it asynchronously.
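>>
>> A sketch of what that binding might look like (not verbatim from the POC;
>> the field order follows the INSERT above, the driver expects a blob as a
>> java.nio.ByteBuffer, and I assume the Database wrapper exposes session):
>>
>>     import java.nio.ByteBuffer
>>
>>     val bound = statement.bind(
>>       record.uuid,                  // uuid
>>       Long.box(record.bucket),      // bucket
>>       Long.box(record.start),       // start
>>       Long.box(record.end),         // end
>>       ByteBuffer.wrap(record.bytes) // data
>>     )
>>     database.session.executeAsync(bound)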
>>
>> *Problem*
>>
>> The problem is that when I increase the chunk size, the number of asynch
>> inserts, or the size of the bucket (i.e., the number of chunks per
>> bucket), the app becomes unstable because the db starts throwing
>> WriteTimeoutExceptions.
>>
>> I've tested this with CCM (4 nodes) and on an EC2 cluster (5 nodes, 8GB
>> heap). The problem seems the same in both environments.
>>
>> On my local cluster, I've tried changing the following settings from the
>> default configuration:
>>
>> concurrent_writes: 128
>>
>> write_request_timeout_in_ms: 200000
>>
>> The other configuration settings are here:
>> https://gist.github.com/giampaolotrapasso/ca21a83befd339075e07
>>
>> *Other*
>>
>> The exceptions seem random; sometimes they occur right at the beginning
>> of the write.
>>
>> *Questions:*
>>
>> 1. Is my model wrong? Am I missing some important detail?
>>
>> 2. What is the important information to look at for this kind of problem?
>>
>> 3. Why are the exceptions so random?
>>
>> 4. Is there some other C* parameter I can set to ensure that
>> WriteTimeoutException does not occur?
>>
>> I hope I provided enough information to get some help.
>>
>> Thank you in advance for any reply.
>>
>>
>> Giampaolo
>>
