If your environment is not Kerberized (or if you can afford to restart the job every 7 days), a checkpoint-enabled Flink job with windowing and a count trigger would be ideal for your requirement.
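Enabling checkpointing is one call on the execution environment. A minimal sketch (the 60-second interval is just an assumption; tune it to your job):

    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
    // take a checkpoint every 60s so the Kafka offsets (and window
    // contents) survive a failure or restart
    env.enableCheckpointing(60000);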
Check the APIs on Flink windows. I had something like this that worked:

    stream.keyBy(0)
          .countWindow(2000)
          .apply(function)
          .setParallelism(conf.getInt("window.parallelism"));

where stream is a DataStream built on the Kafka connector, "function" is where you would put the "send it to my microservice" part, and keyBy(0) does the aggregation based on a key field. You can look up the individual methods in the API docs. A fuller sketch of the whole job follows below the quoted message.

Thanks,
Prabhu

On Wed, Aug 3, 2016 at 5:21 AM, Alam, Zeeshan <zeeshan.a...@fmr.com> wrote:
> Hi,
>
> Flink works very well with Kafka if you wish to stream data. Following is
> how I am streaming data with Kafka and Flink:
>
>     FlinkKafkaConsumer08<Event> kafkaConsumer = new FlinkKafkaConsumer08<>(
>         KAFKA_AVRO_TOPIC, avroSchema, properties);
>     DataStream<Event> messageStream = env.addSource(kafkaConsumer);
>
> Is there a way to do a micro-batch operation on the data coming from
> Kafka? What I want to do is to reduce or aggregate the events coming from
> Kafka. For instance, I am getting 40000 events per second from Kafka and
> what I want is to group 2000 events into one and send it to my
> microservice for further processing. Can I use the Flink DataSet API for
> this, or should I go with Spark or some other framework?
>
> Thanks & Regards
>
> Zeeshan Alam
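Here is a fuller sketch of the whole job, putting the consumer from your mail and the count window together. Treat it as a minimal illustration, not a tested implementation: Event, avroSchema, and KAFKA_AVRO_TOPIC are the ones from your mail, while "someKeyField", the connection properties, and sendToMicroservice are placeholders you would replace with your own key field, cluster addresses, and HTTP/RPC call.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;

    import org.apache.flink.api.java.tuple.Tuple;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
    import org.apache.flink.streaming.util.serialization.DeserializationSchema;
    import org.apache.flink.util.Collector;

    public class KafkaCountWindowJob {

        private static final String KAFKA_AVRO_TOPIC = "your-topic"; // placeholder

        // your Avro DeserializationSchema<Event>, as in the original mail
        private static DeserializationSchema<Event> avroSchema;

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(60000); // so Kafka offsets survive restarts

            Properties properties = new Properties();
            // placeholder addresses -- point these at your cluster
            properties.setProperty("bootstrap.servers", "localhost:9092");
            properties.setProperty("zookeeper.connect", "localhost:2181");
            properties.setProperty("group.id", "flink-event-batcher");

            FlinkKafkaConsumer08<Event> kafkaConsumer = new FlinkKafkaConsumer08<>(
                KAFKA_AVRO_TOPIC, avroSchema, properties);

            DataStream<Event> messageStream = env.addSource(kafkaConsumer);

            messageStream
                // keyBy(0) works for tuple types; for an Avro/POJO Event,
                // key by a named field instead ("someKeyField" is a placeholder)
                .keyBy("someKeyField")
                // fire once 2000 events have arrived for a key
                .countWindow(2000)
                .apply(new WindowFunction<Event, List<Event>, Tuple, GlobalWindow>() {
                    @Override
                    public void apply(Tuple key, GlobalWindow window,
                                      Iterable<Event> events,
                                      Collector<List<Event>> out) {
                        List<Event> batch = new ArrayList<>();
                        for (Event e : events) {
                            batch.add(e);
                        }
                        sendToMicroservice(batch); // your "send it onward" part
                        out.collect(batch);
                    }
                })
                .setParallelism(4); // stand-in for conf.getInt("window.parallelism")

            env.execute("kafka-count-window-batcher");
        }

        // hypothetical helper -- replace with your actual HTTP/RPC client call
        private static void sendToMicroservice(List<Event> batch) {
            // ...
        }
    }

One design note: countWindow(2000) fires per key, so each distinct key accumulates its own batch of 2000. If you want batches of 2000 across the whole stream regardless of key, you can key on a constant (which funnels the windowing through a single task) or do the batching yourself in a flatMap with checkpointed state.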