Re: Flink application with HBase

Stephan Ewen Wed, 30 Dec 2015 16:30:06 -0800

The OutputFormats (such as the HBaseOutputFormat) come originally from the
DataSet API.

They work with the DataStream API too, but the main difference from the SinkFunction
is that they give you no way to implement custom checkpointing hooks. Since
sinks interact with the outside world (side effects), they are by default
not "exactly once", but only "at least once" in case of failures when you
use checkpointing.
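To make the "at least once" point concrete, here is a toy simulation in plain Java. It has no Flink dependency, and the class and method names are hypothetical. It assumes a sink that writes each record to the external system immediately (the way an OutputFormat does) and a job that, after a failure, rewinds to the last completed checkpoint and replays every record emitted since then.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy simulation (no Flink dependency) of why a sink with side effects is
 * only "at least once": after a failure the job rewinds to the last
 * completed checkpoint and replays the records emitted since then.
 */
public class AtLeastOnceDemo {

    /** Hypothetical external store standing in for an HBase table. */
    static final List<String> externalStore = new ArrayList<>();

    /** A sink that writes immediately, like a plain OutputFormat. */
    static void writeImmediately(String record) {
        externalStore.add(record);
    }

    public static List<String> run() {
        externalStore.clear();
        List<String> input = List.of("a", "b", "c", "d");
        int checkpointedOffset = 0; // input position covered by the last completed checkpoint

        // First attempt: a checkpoint completes after "b", then the job fails after writing "c".
        for (int i = 0; i < input.size(); i++) {
            writeImmediately(input.get(i));
            if (i == 1) {
                checkpointedOffset = i + 1; // checkpoint covers "a" and "b"
            }
            if (i == 2) {
                break; // simulated failure: "c" has already reached the store
            }
        }

        // Recovery: restart from the checkpointed offset and replay the rest.
        for (int i = checkpointedOffset; i < input.size(); i++) {
            writeImmediately(input.get(i));
        }
        return new ArrayList<>(externalStore);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [a, b, c, c, d]
    }
}
```

The record written between the last checkpoint and the failure ("c") reaches the external store twice, which is exactly the "at least once" behaviour.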

If that works for your case, feel free to use the HBaseOutputFormat.

If you plan on adding custom exactly-once sink checkpointing logic (such as
buffering data in the sink and committing only upon successful
checkpoints), I would go for the SinkFunction.
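A minimal sketch of the "buffer in the sink, commit only upon successful checkpoints" idea, again in plain Java with no Flink dependency. The methods `snapshotState(long)` and `notifyCheckpointComplete(long)` here are hypothetical stand-ins for Flink's checkpoint hooks (in current Flink releases the completion callback is `CheckpointListener.notifyCheckpointComplete(long)`; check the API of the version you use), and the `committed` list stands in for the HBase table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch of the "buffer, then commit on successful checkpoint" pattern.
 * No Flink dependency: snapshotState/notifyCheckpointComplete are
 * hypothetical stand-ins for Flink's checkpoint hooks.
 */
public class BufferingSink {

    // Records received since the last snapshot.
    private final List<String> currentBuffer = new ArrayList<>();
    // Sealed batches waiting for their checkpoint to be confirmed, ordered by checkpoint id.
    private final TreeMap<Long, List<String>> pendingByCheckpoint = new TreeMap<>();
    // Stand-in for the external HBase table.
    private final List<String> committed = new ArrayList<>();

    /** Called for every record; nothing is written externally yet. */
    public void invoke(String record) {
        currentBuffer.add(record);
    }

    /** Checkpoint taken: seal the current buffer under this checkpoint id. */
    public void snapshotState(long checkpointId) {
        pendingByCheckpoint.put(checkpointId, new ArrayList<>(currentBuffer));
        currentBuffer.clear();
    }

    /** Checkpoint confirmed: commit every sealed batch up to and including it. */
    public void notifyCheckpointComplete(long checkpointId) {
        Map<Long, List<String>> done = pendingByCheckpoint.headMap(checkpointId, true);
        for (List<String> batch : done.values()) {
            committed.addAll(batch); // in a real sink: one batched write to HBase
        }
        done.clear(); // headMap is a view, so this also removes the batches from pendingByCheckpoint
    }

    public List<String> committed() {
        return committed;
    }

    public static void main(String[] args) {
        BufferingSink sink = new BufferingSink();
        sink.invoke("a");
        sink.invoke("b");
        sink.snapshotState(1);
        sink.invoke("c"); // arrives after checkpoint 1, so it is not part of that batch
        System.out.println(sink.committed()); // prints [] because checkpoint 1 is not confirmed yet
        sink.notifyCheckpointComplete(1);
        System.out.println(sink.committed()); // prints [a, b]
    }
}
```

A real sink would also have to keep the pending batches in its checkpointed state, so they survive a restart, and make the commit idempotent, because a failure can happen after a checkpoint completes but before its batch is written.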

Greetings,
Stephan

On Tue, Dec 22, 2015 at 1:45 PM, Márton Balassi <balassi.mar...@gmail.com>
wrote:

> Hi Thomas,
>
> You can use both of the suggested solutions.
>
> The benefit you might get from HBaseOutputFormat is that it is already
> tested and integrated with Flink, as opposed to having to connect to
> HBase yourself in a general SinkFunction.
>
> Best,
>
> Marton
> On Dec 22, 2015 1:04 PM, "Thomas Lamirault" <thomas.lamira...@ericsson.com>
> wrote:
>
>> Hello everybody,
>>
>> I am using Flink (0.10.1) with a streaming source (Kafka), and I write
>> the results of flatMap/keyBy/timeWindow/reduce to an HBase table.
>> I have tried a class (Sinkclass) which implements
>> SinkFunction<MyObject>, and a class (HBaseOutputFormat) which implements
>> OutputFormat<MyObject>. In your opinion, is it better to use the Sinkclass
>> or the HBaseOutputFormat for better performance and cleaner code? (Or are
>> they equivalent?)
>>
>> Thanks,
>>
>> B.R / Cordialement
>>
>> Thomas Lamirault
>>
>
