In SQL, you can use the over window to deduplicate the messages by the id [1], but i'm not sure if there are same semantic operators in DataStream.
[1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#deduplication s_penakalap...@yahoo.com <s_penakalap...@yahoo.com> 于2020年10月28日周三 下午12:34写道: > Hi All, > > Request your inputs please. > > Regards, > Sunitha > > On Tuesday, October 27, 2020, 01:01:41 PM GMT+5:30, > s_penakalap...@yahoo.com <s_penakalap...@yahoo.com> wrote: > > > Hi Team, > > I want to use Flink Datastream for Batch operations which involves huge > data, I did try to calculate count and average on the whole Datastream with > out using window function. > > Approach I tried to calculate count on the datastream: > 1> Read data from table (say past 2 days of data) as Datastream > 2> apply Key operation on the datastream > 3> then use reduce function to find count, sum and average. > > I have written output to file and also inserted into table: sample data > from file is: > > vehicleId=aa, count=1, fuel=10, avgFuel=0.0 > vehicleId=dd, count=1, fuel=7, avgFuel=0.0 > vehicleId=dd, count=2, fuel=22, avgFuel=11.0 > vehicleId=dd, count=3, fuel=42, avgFuel=14.0 > vehicleId=ee, count=1, fuel=0, avgFuel=0.0 > > what I am looking for is , when there are multiple records with same > vehicle Id I see that only the final record is having correct values (like > vehicleId=dd). > Is there any way to get only one final record for each vehicle as shown > below: > vehicleId=aa, count=1, fuel=10, avgFuel=0.0 > vehicleId=dd, count=3, fuel=42, avgFuel=14.0 > vehicleId=ee, count=1, fuel=0, avgFuel=0.0 > > Also I request some help on how to sort whole DataStream based on one > attribute. Say we have x records in one Batch Job I would like to sort and > fetch X-2 position record per vehicle. > > Regards, > Sunitha. > >