Hello,

The main issue that prevented us from writing batches is that there is a server-side limit on how big a batch may be, but there was no way to tell how big the batch you are currently building actually is.
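
For illustration, one workaround is to flush once a statement-count threshold is reached, since the driver exposes the number of statements in a batch but not the serialized size that Cassandra checks against batch_size_fail_threshold_in_kb. This is only a minimal sketch assuming the DataStax Java driver 3.x; the class and threshold names are made up:

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.Session;

    // Hypothetical helper: the statement count is only a proxy for the
    // byte-size limit that the server actually enforces.
    public class BoundedBatchWriter {
        private final Session session;
        private final int maxStatements;
        private BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);

        public BoundedBatchWriter(Session session, int maxStatements) {
            this.session = session;
            this.maxStatements = maxStatements;
        }

        public void add(BoundStatement stmt) {
            batch.add(stmt);
            if (batch.size() >= maxStatements) {
                flush();
            }
        }

        public void flush() {
            if (batch.size() > 0) {
                session.execute(batch);
                batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
            }
        }
    }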

Regarding locality, I'm not sure a partitioner alone solves this issue. While the data and the sink instance may be on the right node, the sink would still have to know which Cassandra instance to write to in order to actually make use of the locality. I've never looked too deeply into data locality,
so I don't know whether/how we would have to change the sink to do that :(

Regards,
Chesnay

On 01.11.2016 20:29, Stephan Ewen wrote:
Hi!

I do not know the details of how Cassandra supports batched writes, but here are some thoughts:

- Grouping writes that go to the same partition together into one batch write request makes sense. If you have some sample code for that, it should not be too hard to integrate into the Flink Cassandra connector (a sketch of the grouping follows after this list).

- If you know the partitioning scheme in Cassandra and you use "DataStream.partitionCustom(partitioner, key)", all write requests from one parallel sink instance should go to the same Cassandra node (or a small number of nodes). Would that help? (See the partitionCustom sketch below.)
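
Roughly, the grouping could look like this (a sketch against the DataStax Java driver 3.x; MyEvent and its fields are placeholders):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class PartitionGroupedWriter {

        // Placeholder record type.
        public static class MyEvent {
            public String partitionKey;
            public String payload;
        }

        // Group statements by partition key so that each batch only
        // touches a single Cassandra partition (the cheap case).
        public static void writeGrouped(Session session, PreparedStatement insert,
                                        List<MyEvent> events) {
            Map<String, BatchStatement> batches = new HashMap<>();
            for (MyEvent e : events) {
                BoundStatement stmt = insert.bind(e.partitionKey, e.payload);
                batches.computeIfAbsent(e.partitionKey,
                        k -> new BatchStatement(BatchStatement.Type.UNLOGGED)).add(stmt);
            }
            for (BatchStatement b : batches.values()) {
                session.execute(b);
            }
        }
    }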
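
And a sketch of the partitionCustom idea, so that all records with the same key go to the same parallel sink instance. The hash here is only a stand-in; matching Cassandra's actual token ownership would require consulting the cluster metadata ("events" is an assumed DataStream of (partitionKey, payload) tuples):

    import org.apache.flink.api.common.functions.Partitioner;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;

    DataStream<Tuple2<String, String>> partitioned = events.partitionCustom(
            new Partitioner<String>() {
                @Override
                public int partition(String key, int numPartitions) {
                    // stand-in for Cassandra's Murmur3 token computation
                    return Math.floorMod(key.hashCode(), numPartitions);
                }
            },
            0); // position of the partition-key field in the tuple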

Greetings,
Stephan




On Fri, Oct 28, 2016 at 8:57 AM, kant kodali <kanth...@gmail.com> wrote:

    The Spark Cassandra connector does it! But I don't think it really
    implements a custom partitioner; I think it just leverages the
    token-aware policy and does batch writes by default within a partition,
    though you can also batch across partitions that share the same replica!
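
    For reference, enabling that routing looks roughly like this with the
    DataStax Java driver 3.x (contact point and keyspace are placeholders);
    the token-aware policy sends each request straight to a replica that
    owns the row, skipping the extra coordinator hop:

        import com.datastax.driver.core.Cluster;
        import com.datastax.driver.core.Session;
        import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
        import com.datastax.driver.core.policies.TokenAwarePolicy;

        // Wrap the default DC-aware policy so requests go directly to a
        // replica owning the row's token.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withLoadBalancingPolicy(
                        new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build()))
                .build();
        Session session = cluster.connect("my_keyspace");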

    On Thu, Oct 27, 2016 at 8:41 AM, Shannon Carey <sca...@expedia.com> wrote:

        It certainly seems possible to write a Partitioner that does
        what you describe. I started implementing one but didn't have
        time to finish it. I think the main difficulty is in properly
        dealing with partition ownership changes in Cassandra… if you
        are maintaining state in Flink and the partitioning changes,
        your job might produce inaccurate output. If, on the other
        hand, you are only using the partitioner just before the
        output, dynamic partitioning changes might be ok.
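
        For what it's worth, the driver does expose current ownership, so a
        partitioner could look up replicas per key; here is a sketch against
        the DataStax Java driver 3.x (a single text partition-key column is
        assumed). It also shows why ownership changes are tricky: this
        mapping can change underneath a running job.

            import java.nio.ByteBuffer;
            import java.nio.charset.StandardCharsets;
            import java.util.Set;

            import com.datastax.driver.core.Cluster;
            import com.datastax.driver.core.Host;

            // Ask the cluster metadata which hosts currently own a given
            // partition key (raw UTF-8 bytes for a single text column).
            public static Set<Host> replicasFor(Cluster cluster, String keyspace, String key) {
                ByteBuffer pk = ByteBuffer.wrap(key.getBytes(StandardCharsets.UTF_8));
                return cluster.getMetadata().getReplicas(keyspace, pk);
            }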


        From: kant kodali <kanth...@gmail.com>
        Date: Thursday, October 27, 2016 at 3:17 AM
        To: <user@flink.apache.org>
        Subject: Can we do batch writes on cassandra using flink while
        leveraging the locality?

        For example, batch writes in Cassandra will put pressure on the
        coordinator, but since the connectors are built by leveraging
        locality, I was wondering if we could do a batch of writes on the
        node where the batch belongs?



