In SQL, you can use the over window to deduplicate the messages by the id
[1], but i'm not sure if there are same semantic operators in DataStream.

[1]
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#deduplication

s_penakalap...@yahoo.com <s_penakalap...@yahoo.com> 于2020年10月28日周三
下午12:34写道:

> Hi All,
>
> Request your inputs please.
>
> Regards,
> Sunitha
>
> On Tuesday, October 27, 2020, 01:01:41 PM GMT+5:30,
> s_penakalap...@yahoo.com <s_penakalap...@yahoo.com> wrote:
>
>
> Hi Team,
>
> I want to use Flink Datastream for Batch operations which involves huge
> data, I did try to calculate count and average on the whole Datastream with
> out using window function.
>
>  Approach I tried to calculate count on the datastream:
> 1> Read data from table (say past 2 days of data) as Datastream
> 2> apply Key operation on the datastream
> 3> then use reduce function to find count, sum and average.
>
> I have written output to file and also inserted into table: sample data
> from file is:
>
> vehicleId=aa, count=1, fuel=10, avgFuel=0.0
> vehicleId=dd, count=1, fuel=7, avgFuel=0.0
> vehicleId=dd, count=2, fuel=22, avgFuel=11.0
> vehicleId=dd, count=3, fuel=42, avgFuel=14.0
> vehicleId=ee, count=1, fuel=0, avgFuel=0.0
>
> what I am looking for is , when there are multiple records with same
> vehicle Id I see that only the final record is having correct values (like 
> vehicleId=dd).
> Is there any way to get only one final record for each vehicle as shown
> below:
> vehicleId=aa, count=1, fuel=10, avgFuel=0.0
> vehicleId=dd, count=3, fuel=42, avgFuel=14.0
> vehicleId=ee, count=1, fuel=0, avgFuel=0.0
>
> Also I request some help on how to sort whole DataStream based on one
> attribute. Say we have x records in one Batch Job I would like to sort and
> fetch X-2 position record per vehicle.
>
> Regards,
> Sunitha.
>
>

Reply via email to