[GitHub] flink issue #3574: [FLINK-5653] Add processing time OVER ROWS BETWEEN x PREC...

fhueske Mon, 27 Mar 2017 01:35:44 -0700

Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/3574
  
    Hi @huawei-flink, thanks for your detailed explanation. 
    
    The benefit of the MapState is that we only need to deserialize all keys, 
not all rows as in the ValueState or ListState case. Identifying the smallest 
key (as needed for OVER ROWS) is therefore basically free. Once the smallest 
key has been found, we only need to deserialize the rows that have to be 
retracted; all other rows are not touched at all. 
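    To make the cost difference concrete, here is a minimal Java sketch. It is a hypothetical simulation, not Flink's `MapState`/`ListState` API: rows are kept as serialized byte arrays (as a serializing backend such as RocksDB would store them), every row deserialization is counted, and keys are plain longs whose (cheap) deserialization is not counted. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical simulation (not Flink's state API) of retracting the oldest
 * row from an OVER ROWS window. Rows are stored serialized, and every row
 * deserialization is counted.
 */
public class RetractionCost {
    static int rowDeserializations = 0;

    // Stand-in row serializer: a "row" is just a String here.
    static byte[] serialize(String row) {
        return row.getBytes(StandardCharsets.UTF_8);
    }

    static String deserialize(byte[] bytes) {
        rowDeserializations++;
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // List-style state: reading the state materializes every row.
    static String retractOldestFromList(List<byte[]> state) {
        List<String> rows = new ArrayList<>();
        for (byte[] b : state) {
            rows.add(deserialize(b));
        }
        state.remove(0);
        return rows.get(0);
    }

    // Map-style state: scan the keys (cheap longs, not counted here) to find
    // the smallest one, then deserialize only the row stored under it.
    static String retractOldestFromMap(Map<Long, byte[]> state) {
        long smallest = Long.MAX_VALUE;
        for (long k : state.keySet()) {
            smallest = Math.min(smallest, k);
        }
        return deserialize(state.remove(smallest));
    }

    // Returns {list-style cost, map-style cost} for a window of n >= 1 rows.
    static int[] costs(int n) {
        List<byte[]> listState = new ArrayList<>();
        Map<Long, byte[]> mapState = new HashMap<>();
        for (long i = 0; i < n; i++) {
            listState.add(serialize("row-" + i));
            mapState.put(i, serialize("row-" + i));
        }
        rowDeserializations = 0;
        retractOldestFromList(listState);
        int listCost = rowDeserializations;
        rowDeserializations = 0;
        retractOldestFromMap(mapState);
        int mapCost = rowDeserializations;
        return new int[] {listCost, mapCost};
    }

    public static void main(String[] args) {
        int[] c = costs(1000);
        System.out.println("list-style: " + c[0] + ", map-style: " + c[1]);
    }
}
```

    With `costs(1000)`, the list-style retraction deserializes 1000 rows while the map-style one deserializes a single row.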
    
    The benchmarks that @rtudoran ran used an in-memory state backend, which 
does not de/serialize data but keeps the state as objects on the heap. I think 
the numbers would be different with the RocksDB state backend, which serializes 
all data (RocksDB is the only state backend recommended for production 
settings). In fact, I read the benchmark results as showing that sorting the 
keys does not have a major impact on performance. Another important aspect of 
the design is that RocksDB iterates over the map keys in order, so even sorting 
(or rather ensuring a sorted order) becomes O(n). 
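    A small Java sketch of why in-order key iteration helps, under two stated assumptions: the store compares keys as unsigned bytes (RocksDB's default bytewise comparator), and long keys are serialized big-endian. A `TreeMap` with `Arrays::compareUnsigned` (Java 9+) stands in for RocksDB here, and all names are hypothetical. Under these assumptions, iterating the store returns non-negative keys such as timestamps in numeric order, so ensuring sorted order is a single linear pass.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/**
 * Sketch, assuming a store ordered by unsigned lexicographic byte comparison
 * and big-endian long keys. This is a stand-in, not RocksDB itself.
 */
public class ByteOrderedKeys {
    // ByteBuffer writes big-endian by default.
    static byte[] encode(long key) {
        return ByteBuffer.allocate(Long.BYTES).putLong(key).array();
    }

    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // Insert the encoded keys into a byte-ordered map, then read them back
    // in the map's iteration order.
    static List<Long> iterateInStoreOrder(long[] keys) {
        Comparator<byte[]> unsignedBytewise = Arrays::compareUnsigned;
        TreeMap<byte[], Boolean> store = new TreeMap<>(unsignedBytewise);
        for (long k : keys) {
            store.put(encode(k), Boolean.TRUE);
        }
        List<Long> out = new ArrayList<>();
        for (byte[] k : store.keySet()) {
            out.add(decode(k));
        }
        return out;
    }

    // One O(n) pass that verifies the order instead of sorting.
    static boolean isSorted(List<Long> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1) > keys.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] timestamps = {1_000L, 7L, 256L, 42L, 65_536L};
        List<Long> scanned = iterateInStoreOrder(timestamps);
        System.out.println(scanned);           // [7, 42, 256, 1000, 65536]
        System.out.println(isSorted(scanned)); // true
        // The sign bit makes negative keys compare as larger bytes.
        System.out.println(iterateInStoreOrder(new long[] {-1L, 1L})); // [1, -1]
    }
}
```

    The last line shows the limit of this assumption: with a plain big-endian encoding, negative keys do not come back in numeric order.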
    
    I do see the benefits of keeping data in order, but de/serialization is one 
of the major costs when processing data on the JVM, and it makes a lot of sense 
to optimize for reduced de/serialization overhead.
