Hello,
the main issue that prevented us from writing batches is that there is a
server-side limit on how big a batch may be, but there was no way to
tell how big the batch you are currently building actually is.
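To illustrate, here is a minimal sketch (against the DataStax Java driver
3.x) of how a sink could cap the batch it is building. The helper class
and the statement-count cap are made up for illustration, and since the
server-side limit (batch_size_fail_threshold_in_kb) is expressed in
bytes, counting statements is only a rough approximation:

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;

// Hypothetical helper: flushes an UNLOGGED batch once a statement-count
// cap is reached. The actual server-side limit is in bytes, so a count
// cap only approximates the real batch size.
public class CappedBatchWriter {
    private final Session session;
    private final int maxStatements; // assumed cap, would need tuning per schema
    private BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);

    public CappedBatchWriter(Session session, int maxStatements) {
        this.session = session;
        this.maxStatements = maxStatements;
    }

    public void add(Statement stmt) {
        batch.add(stmt);
        if (batch.size() >= maxStatements) {
            flush();
        }
    }

    public void flush() {
        if (batch.size() > 0) {
            session.execute(batch);
            batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
        }
    }
}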
Regarding locality, I'm not sure a partitioner alone solves this
issue. While the data and the sink instance may be on the right node, the
sink would still have to know which Cassandra instance to write to in
order to actually make use of the locality. I've never looked too deeply
into data locality,
so I don't know whether/how we would have to change the sink to do that :(
Regards,
Chesnay
On 01.11.2016 20:29, Stephan Ewen wrote:
Hi!
I do not know the details of how Cassandra supports batched writes,
but here are some thoughts:
- Grouping writes that go to the same partition together into one
batch write request makes sense. If you have some sample code for
that, it should not be too hard to integrate into the Flink Cassandra
connector
- If you know the partitioning scheme in Cassandra and you use
"DataStream.partitionCustom(partitioner, key)", it should be possible
to arrange that all write requests from one parallel sink instance go
to the same Cassandra node (or a small number of nodes); a rough
sketch follows after these bullets. Would that help?
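Here is a hedged sketch of the second bullet, assuming the Flink 1.x
DataStream API. MyEvent and the hash-based partitioner are placeholders;
a real implementation would compute the Cassandra Murmur3 token of the
key and map it, via the cluster's token ring metadata, to the parallel
instance co-located with the owning replica:

import org.apache.flink.api.common.functions.Partitioner;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;

public class CassandraAlignedPartitioning {

    // Hypothetical event type carrying its Cassandra partition key.
    public static class MyEvent {
        public String partitionKey;
    }

    // Placeholder partitioner: a plain hash stands in for the real mapping,
    // which would hash the key with Cassandra's Murmur3 partitioner and
    // route it to the subtask co-located with the owning replica.
    public static class CassandraAlignedPartitioner implements Partitioner<String> {
        @Override
        public int partition(String key, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    public static DataStream<MyEvent> align(DataStream<MyEvent> events) {
        return events.partitionCustom(
                new CassandraAlignedPartitioner(),
                new KeySelector<MyEvent, String>() {
                    @Override
                    public String getKey(MyEvent e) {
                        return e.partitionKey;
                    }
                });
    }
}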
Greetings,
Stephan
On Fri, Oct 28, 2016 at 8:57 AM, kant kodali <kanth...@gmail.com> wrote:
The Spark Cassandra connector does it! But I don't think it really
implements a custom partitioner; I think it just leverages the
token-aware policy and does batch writes by default within a partition,
though you can also batch across partitions that share the same replica!
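For reference, a minimal sketch of enabling token-aware routing with the
DataStax Java driver 3.x (the contact point is a placeholder); note that
this routes individual statements to a replica, it does not by itself
group them into batches:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class TokenAwareExample {
    public static void main(String[] args) {
        // Wrap a child policy in TokenAwarePolicy so each statement is
        // sent directly to a replica owning its partition key, skipping
        // the extra coordinator hop.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withLoadBalancingPolicy(
                        new TokenAwarePolicy(
                                DCAwareRoundRobinPolicy.builder().build()))
                .build();
        cluster.close();
    }
}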
On Thu, Oct 27, 2016 at 8:41 AM, Shannon Carey <sca...@expedia.com> wrote:
It certainly seems possible to write a Partitioner that does
what you describe. I started implementing one but didn't have
time to finish it. I think the main difficulty is in properly
dealing with partition ownership changes in Cassandra… if you
are maintaining state in Flink and the partitioning changes,
your job might produce inaccurate output. If, on the other
hand, you are using the partitioner only immediately before the
output, dynamic partitioning changes might be ok.
From: kant kodali <kanth...@gmail.com>
Date: Thursday, October 27, 2016 at 3:17 AM
To: <user@flink.apache.org>
Subject: Can we do batch writes on cassandra using flink while
leveraging the locality?
For example, batch writes in Cassandra will put
pressure on the coordinator, but since the connectors are built
by leveraging locality, I was wondering if we could do a
batch of writes on the node where the batch belongs?