If your environment is not kerberized (or if you can afford to restart the
job every seven days, when delegation tokens typically expire), a
checkpoint-enabled Flink job with windowing and a count trigger would be
ideal for your requirement.

Check the APIs on Flink windows.

I had something like this that worked:

// keyBy(0) partitions on the first tuple field; countWindow(2000)
// buffers 2000 events per key before "function" fires on the batch
stream.keyBy(0).countWindow(2000).apply(function)
                .setParallelism(conf.getInt("window.parallelism"));

where stream is a DataStream built with the Kafka connector,
"function" is where you would put the "send it to my microservice" logic,
and keyBy(0) partitions the stream by a key field (the first tuple field).
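
For reference, a more complete sketch of the same pattern might look like
the one below. The topic name, properties, key extraction, and the batch
assembly are placeholders I made up for illustration, not from my actual
job:

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;
import org.apache.flink.util.Collector;

public class CountWindowBatcher {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // checkpoint every 60s so window state survives restarts
        env.enableCheckpointing(60_000);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("zookeeper.connect", "localhost:2181"); // needed by the 0.8 consumer
        props.setProperty("group.id", "batcher");                 // placeholder

        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer08<>("events", new SimpleStringSchema(), props));

        stream
            // pull out a key so related events land in the same window
            .map(new MapFunction<String, Tuple2<String, String>>() {
                @Override
                public Tuple2<String, String> map(String event) {
                    // hypothetical: key = first comma-separated field
                    return new Tuple2<>(event.split(",")[0], event);
                }
            })
            .keyBy(0)            // partition by the key field
            .countWindow(2000)   // fire once 2000 events arrive for a key
            .apply(new WindowFunction<Tuple2<String, String>, String, Tuple, GlobalWindow>() {
                @Override
                public void apply(Tuple key, GlobalWindow window,
                                  Iterable<Tuple2<String, String>> events,
                                  Collector<String> out) {
                    // assemble the 2000 events into one payload; this is
                    // where you would call your microservice instead
                    StringBuilder batch = new StringBuilder();
                    for (Tuple2<String, String> e : events) {
                        batch.append(e.f1).append('\n');
                    }
                    out.collect(batch.toString());
                }
            });

        env.execute("count-window batcher");
    }
}

countWindow is a global window with a count trigger under the hood, and
with checkpointing enabled the buffered window contents are part of the
job state, so a restart does not drop a partially filled batch.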

You can look up the individual methods in the API docs.

Thanks,
Prabhu

On Wed, Aug 3, 2016 at 5:21 AM, Alam, Zeeshan <zeeshan.a...@fmr.com> wrote:

> Hi,
>
>
>
> Flink works very well with Kafka if you wish to stream data. The following
> is how I am streaming data with Kafka and Flink.
>
>
>
> FlinkKafkaConsumer08<Event> kafkaConsumer = new FlinkKafkaConsumer08<>(
>         KAFKA_AVRO_TOPIC, avroSchema, properties);
>
> DataStream<Event> messageStream = env.addSource(kafkaConsumer);
>
>
>
> Is there a way to do a micro batch operation on the data coming from
> Kafka? What I want to do is to *reduce* or *aggregate* the events coming
> from Kafka. For instance, I am getting 40000 events per second from Kafka,
> and what I want is to group 2000 events into one batch and send it to my
> microservice for further processing. Can I use the *Flink DataSet API*
> for this or should I go with Spark or some other framework?
>
>
>
> Thanks & Regards
>
> Zeeshan Alam
>
