Hello,

The main issue that prevented us from writing batches is that there is a server-side limit on how big a batch may be, but there was no way to tell how big the batch you are currently building actually is.
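
For illustration, one workaround is to flush once a statement-count threshold is reached, since the driver exposes the number of statements in a batch but not the serialized size that Cassandra checks against batch_size_fail_threshold_in_kb. This is only a minimal sketch assuming the DataStax Java driver 3.x; the class and threshold names are made up:

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.Session;

    // Hypothetical helper: the statement count is only a proxy for the
    // byte-size limit that the server actually enforces.
    public class BoundedBatchWriter {
        private final Session session;
        private final int maxStatements;
        private BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);

        public BoundedBatchWriter(Session session, int maxStatements) {
            this.session = session;
            this.maxStatements = maxStatements;
        }

        public void add(BoundStatement stmt) {
            batch.add(stmt);
            if (batch.size() >= maxStatements) {
                flush();
            }
        }

        public void flush() {
            if (batch.size() > 0) {
                session.execute(batch);
                batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
            }
        }
    }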

Regarding locality, I'm not sure a partitioner alone solves this issue. While the data and the sink instance may be on the right node, the sink would still have to know which Cassandra instance to write to in order to actually make use of the locality. I've never looked too deeply into data locality,
so I don't know whether/how we would have to change the sink to do that :(

Regards,
Chesnay

On 01.11.2016 20:29, Stephan Ewen wrote:
Hi!

I do not know the details of how Cassandra supports batched writes, but here are some thoughts:

- Grouping writes that go to the same partition together into one batch write request makes sense. If you have some sample code for that, it should not be too hard to integrate into the Flink Cassandra connector (a sketch of the grouping follows after this list).

- If you know the partitioning scheme in Cassandra and you use "DataStream.partitionCustom(partitioner, key)", all write requests from one parallel sink instance should go to the same Cassandra node (or a small number of nodes). Would that help? (See the partitionCustom sketch below.)
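
Roughly, the grouping could look like this (a sketch against the DataStax Java driver 3.x; MyEvent and its fields are placeholders):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class PartitionGroupedWriter {

        // Placeholder record type.
        public static class MyEvent {
            public String partitionKey;
            public String payload;
        }

        // Group statements by partition key so that each batch only
        // touches a single Cassandra partition (the cheap case).
        public static void writeGrouped(Session session, PreparedStatement insert,
                                        List<MyEvent> events) {
            Map<String, BatchStatement> batches = new HashMap<>();
            for (MyEvent e : events) {
                BoundStatement stmt = insert.bind(e.partitionKey, e.payload);
                batches.computeIfAbsent(e.partitionKey,
                        k -> new BatchStatement(BatchStatement.Type.UNLOGGED)).add(stmt);
            }
            for (BatchStatement b : batches.values()) {
                session.execute(b);
            }
        }
    }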
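
And a sketch of the partitionCustom idea, so that all records with the same key go to the same parallel sink instance. The hash here is only a stand-in; matching Cassandra's actual token ownership would require consulting the cluster metadata ("events" is an assumed DataStream of (partitionKey, payload) tuples):

    import org.apache.flink.api.common.functions.Partitioner;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;

    DataStream<Tuple2<String, String>> partitioned = events.partitionCustom(
            new Partitioner<String>() {
                @Override
                public int partition(String key, int numPartitions) {
                    // stand-in for Cassandra's Murmur3 token computation
                    return Math.floorMod(key.hashCode(), numPartitions);
                }
            },
            0); // position of the partition-key field in the tuple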

Greetings,
Stephan




On Fri, Oct 28, 2016 at 8:57 AM, kant kodali <kanth...@gmail.com> wrote:

    The Spark Cassandra connector does it! But I don't think it really
    implements a custom partitioner; I think it just leverages the
    token-aware policy and does batch writes by default within a partition,
    though you can also batch across partitions that share the same replica!
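
    For reference, enabling that routing looks roughly like this with the
    DataStax Java driver 3.x (contact point and keyspace are placeholders);
    the token-aware policy sends each request straight to a replica that
    owns the row, skipping the extra coordinator hop:

        import com.datastax.driver.core.Cluster;
        import com.datastax.driver.core.Session;
        import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
        import com.datastax.driver.core.policies.TokenAwarePolicy;

        // Wrap the default DC-aware policy so requests go directly to a
        // replica owning the row's token.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withLoadBalancingPolicy(
                        new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build()))
                .build();
        Session session = cluster.connect("my_keyspace");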

    On Thu, Oct 27, 2016 at 8:41 AM, Shannon Carey <sca...@expedia.com> wrote:

        It certainly seems possible to write a Partitioner that does
        what you describe. I started implementing one but didn't have
        time to finish it. I think the main difficulty is in properly
        dealing with partition ownership changes in Cassandra… if you
        are maintaining state in Flink and the partitioning changes,
        your job might produce inaccurate output. If, on the other
        hand, you are only using the partitioner just before the
        output, dynamic partitioning changes might be ok.
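
        For what it's worth, the driver does expose current ownership, so a
        partitioner could look up replicas per key; here is a sketch against
        the DataStax Java driver 3.x (a single text partition-key column is
        assumed). It also shows why ownership changes are tricky: this
        mapping can change underneath a running job.

            import java.nio.ByteBuffer;
            import java.nio.charset.StandardCharsets;
            import java.util.Set;

            import com.datastax.driver.core.Cluster;
            import com.datastax.driver.core.Host;

            // Ask the cluster metadata which hosts currently own a given
            // partition key (raw UTF-8 bytes for a single text column).
            public static Set<Host> replicasFor(Cluster cluster, String keyspace, String key) {
                ByteBuffer pk = ByteBuffer.wrap(key.getBytes(StandardCharsets.UTF_8));
                return cluster.getMetadata().getReplicas(keyspace, pk);
            }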


        From: kant kodali <kanth...@gmail.com>
        Date: Thursday, October 27, 2016 at 3:17 AM
        To: <user@flink.apache.org>
        Subject: Can we do batch writes on cassandra using flink while
        leveraging the locality?

        For example, batch writes in Cassandra will put pressure on the
        coordinator, but since the connectors are built by leveraging
        locality, I was wondering if we could do a batch of writes on the
        node where the batch belongs?



