Hi Sunitha, you probably need to apply a non-windowed grouping.
datastream .keyBy(Event::getVehicleId) .reduce((first, other) -> first); This example will always throw away the second record. You may want to combine the records though by summing up the fuel. Best, Arvid On Wed, Oct 28, 2020 at 8:47 AM Danny Chan <danny0...@apache.org> wrote: > In SQL, you can use the over window to deduplicate the messages by the id > [1], but i'm not sure if there are same semantic operators in DataStream. > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#deduplication > > s_penakalap...@yahoo.com <s_penakalap...@yahoo.com> 于2020年10月28日周三 > 下午12:34写道: > >> Hi All, >> >> Request your inputs please. >> >> Regards, >> Sunitha >> >> On Tuesday, October 27, 2020, 01:01:41 PM GMT+5:30, >> s_penakalap...@yahoo.com <s_penakalap...@yahoo.com> wrote: >> >> >> Hi Team, >> >> I want to use Flink Datastream for Batch operations which involves huge >> data, I did try to calculate count and average on the whole Datastream with >> out using window function. >> >> Approach I tried to calculate count on the datastream: >> 1> Read data from table (say past 2 days of data) as Datastream >> 2> apply Key operation on the datastream >> 3> then use reduce function to find count, sum and average. >> >> I have written output to file and also inserted into table: sample data >> from file is: >> >> vehicleId=aa, count=1, fuel=10, avgFuel=0.0 >> vehicleId=dd, count=1, fuel=7, avgFuel=0.0 >> vehicleId=dd, count=2, fuel=22, avgFuel=11.0 >> vehicleId=dd, count=3, fuel=42, avgFuel=14.0 >> vehicleId=ee, count=1, fuel=0, avgFuel=0.0 >> >> what I am looking for is , when there are multiple records with same >> vehicle Id I see that only the final record is having correct values (like >> vehicleId=dd). >> Is there any way to get only one final record for each vehicle as shown >> below: >> vehicleId=aa, count=1, fuel=10, avgFuel=0.0 >> vehicleId=dd, count=3, fuel=42, avgFuel=14.0 >> vehicleId=ee, count=1, fuel=0, avgFuel=0.0 >> >> Also I request some help on how to sort whole DataStream based on one >> attribute. Say we have x records in one Batch Job I would like to sort and >> fetch X-2 position record per vehicle. >> >> Regards, >> Sunitha. >> >> -- Arvid Heise | Senior Java Developer <https://www.ververica.com/> Follow us @VervericaData -- Join Flink Forward <https://flink-forward.org/> - The Apache Flink Conference Stream Processing | Event Driven | Real Time -- Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany -- Ververica GmbH Registered at Amtsgericht Charlottenburg: HRB 158244 B Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng