[ 
https://issues.apache.org/jira/browse/IGNITE-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070511#comment-15070511
 ] 

Roman Shtykh commented on IGNITE-2016:
--------------------------------------

I think using _IgniteDataStreamer_ is not a very good idea for this 
integration. I'd like to stay with _IgniteCache.putAll(...)_. And here is the 
reasoning.

To put it simply, Kafka sends data to _IgniteSinkTask_ and records offsets of 
messages delivered on _flush(...)_. If something goes wrong with processing, it 
can be restarted from the last recorded offset.
So to guarantee the data that is delivered from Kafka is transferred to Ignite 
and safe, we need to be sure the transfer is complete, and only after that 
complete _flush(...)_. On the other hand, _IgniteDataStreamer_ has its own 
buffer and flushes the data whenever it feels like doing so (specified by 
_autoFlushFrequency(...)_). Of course, we can invoke _get()_ on IgniteFuture's 
returned by _IgniteDataStreamer.addData()_ and call 
_IgniteDataStreamer.flush()_ to be sure all data transfer is complete, but it 
cancels advantages _IgniteDataStreamer_'s buffer.

Also, I think it is more reasonable to rely on Kafka Connect's flushing 
interval instead (by exposing _autoFlushFrequency(...)_ to the user, we'll have 
two flushing mechanism which will most likely be out of sync), have a buffer 
which will be flushed with blocking _cache.putAll(...)_ and safely commit 
offsets to Kafka.

This way, if something goes wrong with processing, the data will be redelivered 
from the last recorded offset -- i.e., we guarantee no data loss in transfer 
process and provide at-least-once delivery. Supporting exactly-once delivery 
semantics is probably not needed for cache (as invoking _put()_ with the same 
entry will give the same result in the cache) unless there is a special request 
in future.

> Update KafkaStreamer to fit new features introduced in Kafka 0.9
> ----------------------------------------------------------------
>
>                 Key: IGNITE-2016
>                 URL: https://issues.apache.org/jira/browse/IGNITE-2016
>             Project: Ignite
>          Issue Type: New Feature
>          Components: streaming
>            Reporter: Roman Shtykh
>            Assignee: Roman Shtykh
>
> Particularly,
> - new consumer
> - Kafka Connect (Copycat)
> http://www.confluent.io/blog/apache-kafka-0.9-is-released
> This can be a a different integration task or a complete re-write of the 
> current implementation, considering the fact that Kafka Connect is a new 
> standard way for "large-scale, real-time data import and export for Kafka."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to