You appear to be writing the entire blob on each chunk rather than the slice
of the blob.
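
Something along these lines would bind only the chunk's slice (a rough,
untested sketch against the write() loop in your POC; fileBytes and chunkSize
are just placeholder names for the full file array and the per-chunk length):

    // slice the source array per chunk instead of binding the same full array
    val chunkStart = (count % bucketSize) * chunkSize
    val chunkEnd = math.min(chunkStart + chunkSize, fileBytes.length)
    val record = Record(
      uuid = uuid,
      bucket = count / bucketSize,
      start = chunkStart,
      end = chunkEnd,
      bytes = java.util.Arrays.copyOfRange(fileBytes, chunkStart, chunkEnd)
    )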

-- Jack Krupansky

On Mon, Feb 8, 2016 at 1:45 PM, Giampaolo Trapasso <
giampaolo.trapa...@radicalbit.io> wrote:

> Hi to all,
>
> I'm trying to put a large binary file (> 500MB) on a C* cluster as fast as
> I can, but I get many WriteTimeoutExceptions.
>
> I created a small POC that isolates the problem I'm facing. Here you will
> find the code: https://github.com/giampaolotrapasso/cassandratest
>
> *Main details about it:*
>
>    - I write the file in chunks (the *data* field), each <= 1MB (1MB is the
>    recommended maximum size for a single cell),
>
>
>    - Chunks are grouped into buckets; every bucket is a single partition,
>    - Buckets are grouped by UUID.
>
>
>    - Chunk size and bucket size are configurable from the app, so I can try
>    different configurations and see what happens.
>
>
>    - To maximize throughput, I execute the insertions asynchronously; however,
>    to avoid putting too much pressure on the db, once a threshold of pending
>    writes is reached I wait for at least one insert to finish before submitting
>    another (this part of my code is quite raw, but I don't think it matters
>    much). This threshold is also configurable, to test different combinations.
>    A simplified sketch of the idea is just below this list.
>
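> Roughly, the back-pressure idea looks like the following simplified sketch
> (this is not the exact code from the repo, just the shape of it; it assumes
> the DataStax Java driver 3.x and Guava on the classpath, and reuses the
> maxConcurrentWrites name from my config):
>
>     import java.util.concurrent.Semaphore
>     import com.datastax.driver.core.{BoundStatement, ResultSet, Session}
>     import com.google.common.util.concurrent.{FutureCallback, Futures, MoreExecutors}
>
>     // simplified sketch of the throttle: cap the number of in-flight writes
>     val inFlight = new Semaphore(maxConcurrentWrites)
>
>     def throttledExecute(session: Session, bound: BoundStatement): Unit = {
>       inFlight.acquire() // block while too many writes are pending
>       Futures.addCallback(session.executeAsync(bound), new FutureCallback[ResultSet] {
>         def onSuccess(rs: ResultSet): Unit = inFlight.release()
>         def onFailure(t: Throwable): Unit = inFlight.release() // logging omitted here
>       }, MoreExecutors.directExecutor())
>     }
>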
> This is the table on db:
>
> CREATE TABLE blobtest.store (
>     uuid uuid,
>     bucket bigint,
>     start bigint,
>     data blob,
>     end bigint,
>     PRIMARY KEY ((uuid, bucket), start)
> )
>
> and this is the main code (Scala, but I hope it is generally readable):
>
>     val statement = client.session.prepare(
>       "INSERT INTO blobTest.store(uuid, bucket, start, end, data) " +
>         "VALUES (?, ?, ?, ?, ?) if not exists;")
>
>     val blob = new Array[Byte](MyConfig.blobSize)
>     scala.util.Random.nextBytes(blob)
>
>     write(client,
>       numberOfRecords = MyConfig.recordNumber,
>       bucketSize = MyConfig.bucketSize,
>       maxConcurrentWrites = MyConfig.maxFutures,
>       blob,
>       statement)
>
> where write is
>
> def write(database: Database, numberOfRecords: Int, bucketSize: Int,
> maxConcurrentWrites: Int,
>             blob: Array[Byte], statement: PreparedStatement): Unit = {
>
>     val uuid: UUID = UUID.randomUUID()
>     var count = 0;
>
>     //Javish loop
>     while (count < numberOfRecords) {
>       val record = Record(
>         uuid = uuid,
>         bucket = count / bucketSize,
>         start = ((count % bucketSize)) * blob.length,
>         end = ((count % bucketSize) + 1) * blob.length,
>         bytes = blob
>       )
>       asynchWrite(database, maxConcurrentWrites, statement, record)
>       count += 1
>     }
>
>     waitDbWrites()
>   }
>
> and asynchWrite just binds the record to the prepared statement and executes
> it asynchronously (with the throttle sketched above).
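>
> The binding part is roughly the following (a simplified sketch, not the exact
> repo code; it assumes Database exposes the driver Session as database.session):
>
>     // illustrative bind: the blob column takes a ByteBuffer, bigint columns take Longs
>     val bound = statement.bind(
>       record.uuid,
>       Long.box(record.bucket),
>       Long.box(record.start),
>       Long.box(record.end),
>       java.nio.ByteBuffer.wrap(record.bytes))
>     database.session.executeAsync(bound)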
>
> *Problem*
>
> The problem is that when I try to increase the chunk size, the number of
> asynchronous inserts, or the size of the bucket (i.e. the number of chunks),
> the app becomes unstable because the db starts throwing WriteTimeoutExceptions.
>
> I've tested this on CCM (4 nodes) and on an EC2 cluster (5 nodes, 8GB
> heap). The problem seems the same in both environments.
>
> On my local cluster, I've tried changing the following settings from the
> default configuration:
>
> concurrent_writes: 128
>
> write_request_timeout_in_ms: 200000
>
> other configurations are here:
> https://gist.github.com/giampaolotrapasso/ca21a83befd339075e07
>
> *Other*
>
> The exceptions seem random; sometimes they occur right at the beginning of
> the write.
>
> *Questions:*
>
> 1. Is my model wrong? Am I missing some important detail?
>
> 2. What is the important information to look at for this kind of problem?
>
> 3. Why are the exceptions so random?
>
> 4. Is there some other C* parameter I can set to ensure that
> WriteTimeoutException does not occur?
>
> I hope I provided enough information to get some help.
>
> Thank you in advance for any reply.
>
>
> Giampaolo
>