[jira] [Commented] (IGNITE-529) Implement IgniteFlumeStreamer to stream data from Apache Flume

Roman Shtykh (JIRA) Tue, 10 Nov 2015 20:25:37 -0800

    [ https://issues.apache.org/jira/browse/IGNITE-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999926#comment-14999926 ]


Roman Shtykh commented on IGNITE-529:
-------------------------------------

Anton,

A> So, it means we absolutely need no FlumeStreamer, because everything will 
start and work on the Flume (Sink) side.
R> Yes, I think so too. There is no reason to have it in this integration. In 
my current implementation it does nothing useful.

1) IgniteSink starts Ignite node from configuration, but uses getCache instead 
of getOrCreateCache.
R> Ok. So the cache is created based on what is in the Ignite configuration 
XML, right?
R> Just out of curiosity, what is the reason for not allowing 
getOrCreateCache()? Is it to keep all configuration in one place (the XML 
file)? With getCache() the user will have to specify the cache name both in 
the Ignite configuration XML and in the sink configuration file.

2) IgniteSink has its own transformer; an interface should be provided.
The transformer implementation should be specified in the configuration.
R> Ok.

3) Sink.process() writes directly to Ignite.cache, using the transformer. Put, 
putAll or DataStreamer can be used to update the cache.
R> Do you mean the transformer is something more than just converting a Flume 
event to the key and value types of the cache? Something that exposes a cache 
instance to the user, so that the user can choose whether to use put() or 
putAll() in his/her implementation?
R> I wouldn't expose the cache to the user. I would just let him/her implement 
a data conversion interface, specified in the configuration (item 2 in your 
proposal), and use putAll(), since it will probably be the most used method: 
batching is good for large data loads (even if creating maps may introduce 
memory overhead).
R> What do you think?
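
To make the proposal in 3) more concrete, here is a minimal sketch of the idea: 
the user supplies only a conversion interface, and the sink writes each batch 
with a single putAll(). It is illustrative only. EventTransformer, 
ColonSeparatedTransformer and BatchWriter are hypothetical names, not part of 
the patch, event bodies are modelled as plain byte arrays, and a java.util.Map 
stands in for the Ignite cache so that the snippet runs without Ignite or Flume 
on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical user-implemented converter from event bodies to cache entries. */
interface EventTransformer<K, V> {
    /** Converts a batch of event bodies into key/value pairs; may return an empty map. */
    Map<K, V> transform(List<byte[]> eventBodies);
}

/** Example transformer: parses "key:value" UTF-8 bodies and skips malformed ones. */
class ColonSeparatedTransformer implements EventTransformer<String, String> {
    @Override
    public Map<String, String> transform(List<byte[]> eventBodies) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (byte[] body : eventBodies) {
            String text = new String(body, StandardCharsets.UTF_8);
            int sep = text.indexOf(':');
            if (sep > 0) {
                entries.put(text.substring(0, sep), text.substring(sep + 1));
            }
        }
        return entries;
    }
}

/** Sink-side writer; the cache itself is never exposed to the transformer. */
class BatchWriter<K, V> {
    private final EventTransformer<K, V> transformer;
    // Assumption: a plain Map stands in for the cache taken from the Ignite node.
    private final Map<K, V> cache;

    BatchWriter(EventTransformer<K, V> transformer, Map<K, V> cache) {
        this.transformer = transformer;
        this.cache = cache;
    }

    /** Transforms one batch and writes it with a single putAll(); returns the entry count. */
    int process(List<byte[]> batch) {
        Map<K, V> entries = transformer.transform(batch);
        if (!entries.isEmpty()) {
            cache.putAll(entries);
        }
        return entries.size();
    }
}
```

With this shape the user decides only how an event maps to a key and a value. 
How the entries reach the cache (put, putAll or a data streamer) stays an 
implementation detail of the sink.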

A> Is it possible to use Flume Instance at tests?
R> What instance do you have in mind? Normally it is sufficient to have a 
channel and a sink, as in my tests. Or you can run it as I described in the 
README (but that is not for tests).

> Implement IgniteFlumeStreamer to stream data from Apache Flume
> --------------------------------------------------------------
>
>                 Key: IGNITE-529
>                 URL: https://issues.apache.org/jira/browse/IGNITE-529
>             Project: Ignite
>          Issue Type: Sub-task
>          Components: streaming
>            Reporter: Dmitriy Setrakyan
>            Assignee: Roman Shtykh
>
> We have {{IgniteDataStreamer}} which is used to load data into Ignite under 
> high load. It was previously named {{IgniteDataLoader}}, see ticket 
> IGNITE-394.
> See [Apache Flume|http://flume.apache.org/] for more information.
> We should create {{IgniteFlumeStreamer}} which will consume messages from 
> Apache Flume and stream them into Ignite caches. 
> More details to follow, but at the least we should be able to:
> * Convert Flume data to Ignite data using an optional pluggable converter.
> * Specify the cache name for the Ignite cache to load data into.
> * Specify other flags available on {{IgniteDataStreamer}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
