Hi Sunitha,

you probably need to apply a non-windowed grouping.

datastream
   .keyBy(Event::getVehicleId)
   .reduce((first, other) -> first);

This example will always throw away the second record. You may want to
combine the records though by summing up the fuel.

Best,

Arvid

On Wed, Oct 28, 2020 at 8:47 AM Danny Chan <danny0...@apache.org> wrote:

> In SQL, you can use the over window to deduplicate the messages by the id
> [1], but i'm not sure if there are same semantic operators in DataStream.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#deduplication
>
> s_penakalap...@yahoo.com <s_penakalap...@yahoo.com> 于2020年10月28日周三
> 下午12:34写道:
>
>> Hi All,
>>
>> Request your inputs please.
>>
>> Regards,
>> Sunitha
>>
>> On Tuesday, October 27, 2020, 01:01:41 PM GMT+5:30,
>> s_penakalap...@yahoo.com <s_penakalap...@yahoo.com> wrote:
>>
>>
>> Hi Team,
>>
>> I want to use Flink Datastream for Batch operations which involves huge
>> data, I did try to calculate count and average on the whole Datastream with
>> out using window function.
>>
>>  Approach I tried to calculate count on the datastream:
>> 1> Read data from table (say past 2 days of data) as Datastream
>> 2> apply Key operation on the datastream
>> 3> then use reduce function to find count, sum and average.
>>
>> I have written output to file and also inserted into table: sample data
>> from file is:
>>
>> vehicleId=aa, count=1, fuel=10, avgFuel=0.0
>> vehicleId=dd, count=1, fuel=7, avgFuel=0.0
>> vehicleId=dd, count=2, fuel=22, avgFuel=11.0
>> vehicleId=dd, count=3, fuel=42, avgFuel=14.0
>> vehicleId=ee, count=1, fuel=0, avgFuel=0.0
>>
>> what I am looking for is , when there are multiple records with same
>> vehicle Id I see that only the final record is having correct values (like 
>> vehicleId=dd).
>> Is there any way to get only one final record for each vehicle as shown
>> below:
>> vehicleId=aa, count=1, fuel=10, avgFuel=0.0
>> vehicleId=dd, count=3, fuel=42, avgFuel=14.0
>> vehicleId=ee, count=1, fuel=0, avgFuel=0.0
>>
>> Also I request some help on how to sort whole DataStream based on one
>> attribute. Say we have x records in one Batch Job I would like to sort and
>> fetch X-2 position record per vehicle.
>>
>> Regards,
>> Sunitha.
>>
>>

-- 

Arvid Heise | Senior Java Developer

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng

Reply via email to