Hi Team,
I want to use the Flink DataStream API for batch operations that involve huge volumes of data. I tried to calculate the count and average over the whole DataStream without using a window function.
The approach I tried to calculate the count on the DataStream (a rough sketch follows below):
1> Read data from a table (say the past 2 days of data) as a DataStream.
2> Apply a key operation on the DataStream.
3> Use a reduce function to find the count, sum and average.
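For reference, this is roughly what my pipeline looks like. It is a minimal, self-contained sketch only: the real job reads from a table rather than using fromElements, and the VehicleStats POJO and its field names are simplified placeholders.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FuelAverageJob {

    // Simple POJO holding the running aggregate per vehicle (illustrative field names).
    public static class VehicleStats {
        public String vehicleId;
        public long count;
        public double fuel;
        public double avgFuel;

        public VehicleStats() {}

        public VehicleStats(String vehicleId, long count, double fuel, double avgFuel) {
            this.vehicleId = vehicleId;
            this.count = count;
            this.fuel = fuel;
            this.avgFuel = avgFuel;
        }

        @Override
        public String toString() {
            return "vehicleId=" + vehicleId + ", count=" + count
                    + ", fuel=" + fuel + ", avgFuel=" + avgFuel;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // 1> In the real job this DataStream is read from a table (past 2 days of data);
        //    fromElements is only a stand-in so the sketch runs on its own.
        DataStream<VehicleStats> readings = env.fromElements(
                new VehicleStats("aa", 1, 10, 0.0),
                new VehicleStats("dd", 1, 7, 0.0),
                new VehicleStats("dd", 1, 15, 0.0),
                new VehicleStats("dd", 1, 20, 0.0),
                new VehicleStats("ee", 1, 0, 0.0));

        DataStream<VehicleStats> aggregated = readings
                // 2> key the stream by vehicleId
                .keyBy(v -> v.vehicleId)
                // 3> rolling reduce: accumulate count and fuel, recompute the average
                .reduce((a, b) -> {
                    long count = a.count + b.count;
                    double fuel = a.fuel + b.fuel;
                    return new VehicleStats(a.vehicleId, count, fuel, fuel / count);
                });

        aggregated.print();
        env.execute("fuel average per vehicle");
    }
}

With this, every incoming record for a key produces one updated aggregate in the output, which is what the file output below shows.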
I have written the output to a file and also inserted it into a table. Sample data from the file is:
vehicleId=aa, count=1, fuel=10, avgFuel=0.0
vehicleId=dd, count=1, fuel=7, avgFuel=0.0
vehicleId=dd, count=2, fuel=22, avgFuel=11.0
vehicleId=dd, count=3, fuel=42, avgFuel=14.0
vehicleId=ee, count=1, fuel=0, avgFuel=0.0
What I am looking for: when there are multiple records with the same vehicleId, I see that only the final record has the correct values (e.g. vehicleId=dd). Is there any way to get only one final record per vehicle, as shown below?
vehicleId=aa, count=1, fuel=10, avgFuel=0.0
vehicleId=dd, count=3, fuel=42, avgFuel=14.0
vehicleId=ee, count=1, fuel=0, avgFuel=0.0
I would also appreciate some help on how to sort the whole DataStream based on one attribute. Say we have x records in one batch job; I would like to sort them and fetch the record at position x-2 for each vehicle.
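To make this second ask concrete, per vehicle this is the kind of selection I mean. It is plain Java on an in-memory list and purely illustrative; treating x-2 as a 0-based index is just my own reading of the position.

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortSelectIntent {
    public static void main(String[] args) {
        // All records of one vehicle in the batch, reduced here to a single attribute (illustrative values).
        List<Double> fuelReadings = Arrays.asList(7.0, 20.0, 15.0);
        Collections.sort(fuelReadings);          // sort on the chosen attribute
        int x = fuelReadings.size();             // x records for this vehicle
        double wanted = fuelReadings.get(x - 2); // record at position x-2
        System.out.println(wanted);              // prints 15.0, the value just below the maximum
    }
}

I would like to achieve this kind of per-vehicle sorting and selection directly on the DataStream.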
Regards,
Sunitha.