Hi Team,
I want to use the Flink DataStream API for batch operations that involve huge volumes of data. I tried to calculate the count and average over the whole DataStream without using a window function.
The approach I tried to calculate the count on the DataStream (a rough sketch follows below):
1> Read data from a table (say the past 2 days of data) as a DataStream.
2> Apply a key operation on the DataStream.
3> Use a reduce function to find the count, sum and average.
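For reference, this is roughly what my pipeline looks like. It is a minimal, self-contained sketch only: the real job reads from a table rather than using fromElements, and the VehicleStats POJO and its field names are simplified placeholders.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FuelAverageJob {

    // Simple POJO holding the running aggregate per vehicle (illustrative field names).
    public static class VehicleStats {
        public String vehicleId;
        public long count;
        public double fuel;
        public double avgFuel;

        public VehicleStats() {}

        public VehicleStats(String vehicleId, long count, double fuel, double avgFuel) {
            this.vehicleId = vehicleId;
            this.count = count;
            this.fuel = fuel;
            this.avgFuel = avgFuel;
        }

        @Override
        public String toString() {
            return "vehicleId=" + vehicleId + ", count=" + count
                    + ", fuel=" + fuel + ", avgFuel=" + avgFuel;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // 1> In the real job this DataStream is read from a table (past 2 days of data);
        //    fromElements is only a stand-in so the sketch runs on its own.
        DataStream<VehicleStats> readings = env.fromElements(
                new VehicleStats("aa", 1, 10, 0.0),
                new VehicleStats("dd", 1, 7, 0.0),
                new VehicleStats("dd", 1, 15, 0.0),
                new VehicleStats("dd", 1, 20, 0.0),
                new VehicleStats("ee", 1, 0, 0.0));

        DataStream<VehicleStats> aggregated = readings
                // 2> key the stream by vehicleId
                .keyBy(v -> v.vehicleId)
                // 3> rolling reduce: accumulate count and fuel, recompute the average
                .reduce((a, b) -> {
                    long count = a.count + b.count;
                    double fuel = a.fuel + b.fuel;
                    return new VehicleStats(a.vehicleId, count, fuel, fuel / count);
                });

        aggregated.print();
        env.execute("fuel average per vehicle");
    }
}

With this, every incoming record for a key produces one updated aggregate in the output, which is what the file output below shows.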
I have written the output to a file and also inserted it into a table. Sample data from the file is:
vehicleId=aa, count=1, fuel=10, avgFuel=0.0
vehicleId=dd, count=1, fuel=7, avgFuel=0.0
vehicleId=dd, count=2, fuel=22, avgFuel=11.0
vehicleId=dd, count=3, fuel=42, avgFuel=14.0
vehicleId=ee, count=1, fuel=0, avgFuel=0.0
What I am looking for: when there are multiple records with the same vehicleId, I see that only the final record has the correct values (e.g. vehicleId=dd). Is there any way to get only one final record per vehicle, as shown below?
vehicleId=aa, count=1, fuel=10, avgFuel=0.0
vehicleId=dd, count=3, fuel=42, avgFuel=14.0
vehicleId=ee, count=1, fuel=0, avgFuel=0.0
I would also appreciate some help on how to sort the whole DataStream based on one attribute. Say we have x records in one batch job; I would like to sort them and fetch the record at position x-2 for each vehicle.
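To make this second ask concrete, per vehicle this is the kind of selection I mean. It is plain Java on an in-memory list and purely illustrative; treating x-2 as a 0-based index is just my own reading of the position.

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortSelectIntent {
    public static void main(String[] args) {
        // All records of one vehicle in the batch, reduced here to a single attribute (illustrative values).
        List<Double> fuelReadings = Arrays.asList(7.0, 20.0, 15.0);
        Collections.sort(fuelReadings);          // sort on the chosen attribute
        int x = fuelReadings.size();             // x records for this vehicle
        double wanted = fuelReadings.get(x - 2); // record at position x-2
        System.out.println(wanted);              // prints 15.0, the value just below the maximum
    }
}

I would like to achieve this kind of per-vehicle sorting and selection directly on the DataStream.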
Regards,
Sunitha.