Re: Large rocksdb state restore/checkpoint duration behavior

Aminouvic Tue, 23 Oct 2018 03:08:49 -0700

Hello, 

Thank you for your answer and apologies for the late response.


For timers, we are using:

state.backend.rocksdb.timer-service.factory: rocksdb

Are we still affected by [1]?

As for interruptibility, we have coalesced our timers, and the application
has become more responsive to stop signals.

Also, after a long investigation, we found that we were misusing
AggregatingState with RocksDB :-/

The pseudocode of our window function looked like this:
    ........
    private AggregatingState<IN, OUT> state;
    .......
    @Override
    public void apply(Tuple4<String, String, String, String> key, TimeWindow window,
                      Iterable<IN> input, Collector<OUT> out) throws Exception {
        // One state.add() per element: every call goes to RocksDB.
        Iterator<IN> it = input.iterator();
        while (it.hasNext()) {
            state.add(it.next());
        }
        out.collect(state.get());
    }
    ........

With this code, state.add calls getInternal/updateInternal on RocksDB once
for every input item, which puts huge pressure on RocksDB.

We changed the code so that the iteration over the items happens inside the
AggregateFunction instead, and state.add is called only once:

    ........
    private AggregatingState<Iterable<IN>, OUT> state;
    .......
    @Override
    public void apply(Tuple4<String, String, String, String> key, TimeWindow window,
                      Iterable<IN> input, Collector<OUT> out) throws Exception {
        // Single state.add() per call: the AggregateFunction iterates over the items.
        state.add(input);
        out.collect(state.get());
    }
    ........

Is this the right way to do it?

Since this modification, the application has been more stable and the
recovery time has dropped to a few minutes.
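
To illustrate why the change helps, here is a minimal, self-contained sketch. It is not Flink code: CountingSumState and StateAccessDemo are hypothetical stand-ins, and the in-memory map only simulates a store in which every read and write of the accumulator is a separate access, as assumed for RocksDB-backed state in the description above. It counts store accesses for per-element adds versus a single batched add.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a store-backed aggregating state (a running sum).
// Every read or write of the accumulator is counted as one store access.
class CountingSumState {
    private static final String KEY = "acc";
    private final Map<String, Long> store = new HashMap<>();
    long reads = 0;
    long writes = 0;

    private long read() {
        reads++;
        return store.getOrDefault(KEY, 0L);
    }

    private void write(long value) {
        writes++;
        store.put(KEY, value);
    }

    // Per-element add: one read and one write for each value.
    void add(long value) {
        write(read() + value);
    }

    // Batched add: read once, fold all values in memory, write once.
    void addAll(Iterable<Long> values) {
        long acc = read();
        for (long v : values) {
            acc += v;
        }
        write(acc);
    }

    // Returns the current aggregate (this also counts as one read).
    long get() {
        return read();
    }
}

class StateAccessDemo {
    // Store accesses when add() is called once per input element.
    static long accessesPerElement(List<Long> input) {
        CountingSumState state = new CountingSumState();
        for (long v : input) {
            state.add(v);
        }
        return state.reads + state.writes;
    }

    // Store accesses when the whole input is added in a single call.
    static long accessesBatched(List<Long> input) {
        CountingSumState state = new CountingSumState();
        state.addAll(input);
        return state.reads + state.writes;
    }

    public static void main(String[] args) {
        List<Long> input = new ArrayList<>();
        for (long i = 1; i <= 1000; i++) {
            input.add(i);
        }
        System.out.println("per-element accesses: " + accessesPerElement(input));
        System.out.println("batched accesses:     " + accessesBatched(input));
    }
}
```

In this sketch the per-element variant touches the store twice per item, while the batched variant touches it twice in total, whatever the window size. The real cost per access in RocksDB also includes serialization, which the sketch does not model.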

Thank you for your help.

Best regards,
Amine



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
