I use the RocksDB state backend to record whether an element has appeared
within the last day. Here is the code:

public class DedupFunction extends KeyedProcessFunction<Long, IN, OUT> {

    // Non-null once the current key has been seen; cleared by the state TTL.
    private ValueState<Boolean> isExist;

    public void open(Configuration parameters) throws Exception {
        ValueStateDescriptor<Boolean> desc = new ........
        StateTtlConfig ttlConfig = StateTtlConfig.newBuilder(Time.hours(24)).setUpdateType......
        desc.enableTimeToLive(ttlConfig);
        isExist = getRuntimeContext().getState(desc);
    }

    public void processElement(IN in, .... ) {
        // Emit only the first occurrence of a key; drop repeats until the state expires.
        if (null == isExist.value()) {
            out.collect(in);
            isExist.update(true);
        }
    }
}
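
To make the intended behaviour concrete, here is a minimal plain-Java model of the same logic, with no Flink dependency. It is only a sketch under assumptions: the class name DedupSketch and the method offer are hypothetical, a HashMap stands in for the keyed ValueState, and expiry is counted from the time the key was last emitted (the setUpdateType setting is elided above, so the real TTL behaviour may differ).

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of the dedup semantics. Hypothetical helper, not Flink code. */
class DedupSketch {
    // 24 hours in milliseconds, mirroring Time.hours(24) in the TTL config.
    static final long TTL_MILLIS = 24L * 60 * 60 * 1000;

    // key -> time the key was last emitted (stands in for the keyed ValueState).
    private final Map<Long, Long> emittedAt = new HashMap<>();

    /**
     * Returns true if the element should be emitted, i.e. the key has not
     * been emitted within the last TTL_MILLIS. Assumption: expiry is counted
     * from the last write, and writes happen only when an element is emitted.
     */
    boolean offer(long key, long nowMillis) {
        Long seenAt = emittedAt.get(key);
        if (seenAt != null && nowMillis - seenAt < TTL_MILLIS) {
            return false; // duplicate inside the 24-hour window
        }
        emittedAt.put(key, nowMillis); // first sighting, or the old entry expired
        return true;
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        System.out.println(dedup.offer(42L, 0L));         // first occurrence
        System.out.println(dedup.offer(42L, 1_000L));     // repeat within 24 hours
        System.out.println(dedup.offer(42L, TTL_MILLIS)); // after the entry expired
    }
}
```

Unlike this model, the Flink job keeps one state entry per distinct key in RocksDB, so every element costs at least one state read, and every new key costs a write as well.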

Because the number of distinct keys is very large (about 10 billion per day),
this operator has become a performance bottleneck.
How can I optimize its performance?

Thanks,
Lei
