[
https://issues.apache.org/jira/browse/IGNITE-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070511#comment-15070511
]
Roman Shtykh commented on IGNITE-2016:
--------------------------------------
I think using _IgniteDataStreamer_ is not a very good idea for this
integration. I'd like to stay with _IgniteCache.putAll(...)_. And here is the
reasoning.
To put it simply, Kafka sends data to _IgniteSinkTask_ and records offsets of
messages delivered on _flush(...)_. If something goes wrong with processing, it
can be restarted from the last recorded offset.
So to guarantee the data that is delivered from Kafka is transferred to Ignite
and safe, we need to be sure the transfer is complete, and only after that
complete _flush(...)_. On the other hand, _IgniteDataStreamer_ has its own
buffer and flushes the data whenever it feels like doing so (specified by
_autoFlushFrequency(...)_). Of course, we can invoke _get()_ on IgniteFuture's
returned by _IgniteDataStreamer.addData()_ and call
_IgniteDataStreamer.flush()_ to be sure all data transfer is complete, but it
cancels advantages _IgniteDataStreamer_'s buffer.
Also, I think it is more reasonable to rely on Kafka Connect's flushing
interval instead (by exposing _autoFlushFrequency(...)_ to the user, we'll have
two flushing mechanism which will most likely be out of sync), have a buffer
which will be flushed with blocking _cache.putAll(...)_ and safely commit
offsets to Kafka.
This way, if something goes wrong with processing, the data will be redelivered
from the last recorded offset -- i.e., we guarantee no data loss in transfer
process and provide at-least-once delivery. Supporting exactly-once delivery
semantics is probably not needed for cache (as invoking _put()_ with the same
entry will give the same result in the cache) unless there is a special request
in future.
> Update KafkaStreamer to fit new features introduced in Kafka 0.9
> ----------------------------------------------------------------
>
> Key: IGNITE-2016
> URL: https://issues.apache.org/jira/browse/IGNITE-2016
> Project: Ignite
> Issue Type: New Feature
> Components: streaming
> Reporter: Roman Shtykh
> Assignee: Roman Shtykh
>
> Particularly,
> - new consumer
> - Kafka Connect (Copycat)
> http://www.confluent.io/blog/apache-kafka-0.9-is-released
> This can be a a different integration task or a complete re-write of the
> current implementation, considering the fact that Kafka Connect is a new
> standard way for "large-scale, real-time data import and export for Kafka."
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)