[ https://issues.apache.org/jira/browse/IGNITE-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070511#comment-15070511 ]
Roman Shtykh commented on IGNITE-2016: -------------------------------------- I think using _IgniteDataStreamer_ is not a very good idea for this integration. I'd like to stay with _IgniteCache.putAll(...)_. And here is the reasoning. To put it simply, Kafka sends data to _IgniteSinkTask_ and records offsets of messages delivered on _flush(...)_. If something goes wrong with processing, it can be restarted from the last recorded offset. So to guarantee the data that is delivered from Kafka is transferred to Ignite and safe, we need to be sure the transfer is complete, and only after that complete _flush(...)_. On the other hand, _IgniteDataStreamer_ has its own buffer and flushes the data whenever it feels like doing so (specified by _autoFlushFrequency(...)_). Of course, we can invoke _get()_ on IgniteFuture's returned by _IgniteDataStreamer.addData()_ and call _IgniteDataStreamer.flush()_ to be sure all data transfer is complete, but it cancels advantages _IgniteDataStreamer_'s buffer. Also, I think it is more reasonable to rely on Kafka Connect's flushing interval instead (by exposing _autoFlushFrequency(...)_ to the user, we'll have two flushing mechanism which will most likely be out of sync), have a buffer which will be flushed with blocking _cache.putAll(...)_ and safely commit offsets to Kafka. This way, if something goes wrong with processing, the data will be redelivered from the last recorded offset -- i.e., we guarantee no data loss in transfer process and provide at-least-once delivery. Supporting exactly-once delivery semantics is probably not needed for cache (as invoking _put()_ with the same entry will give the same result in the cache) unless there is a special request in future. > Update KafkaStreamer to fit new features introduced in Kafka 0.9 > ---------------------------------------------------------------- > > Key: IGNITE-2016 > URL: https://issues.apache.org/jira/browse/IGNITE-2016 > Project: Ignite > Issue Type: New Feature > Components: streaming > Reporter: Roman Shtykh > Assignee: Roman Shtykh > > Particularly, > - new consumer > - Kafka Connect (Copycat) > http://www.confluent.io/blog/apache-kafka-0.9-is-released > This can be a a different integration task or a complete re-write of the > current implementation, considering the fact that Kafka Connect is a new > standard way for "large-scale, real-time data import and export for Kafka." -- This message was sent by Atlassian JIRA (v6.3.4#6332)