Re: Streaming to Parquet Files in HDFS

2018-09-28 Thread hao gao
Hi Bill, I wrote those two Medium posts you mentioned above, but clearly the techlab one is much better. I would suggest just "close the file when checkpointing", which is the easiest way. If you use BucketingSink, you can modify the code to make it work. Just replace the code from line 691 to 693
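
The "close the file when checkpointing" idea can be sketched outside of Flink as a toy model. This is only an illustration under stated assumptions: the class `CheckpointClosingSink`, its `on_checkpoint` method, and the `part-N` / `.in-progress` file names are hypothetical and are not BucketingSink's actual code or the change referred to above. The point it shows is that every checkpoint finalizes the current file, which matters for formats such as Parquet whose footer is only written on close.

```python
import os
import tempfile


class CheckpointClosingSink:
    """Toy sink (hypothetical, not Flink code): finalizes its file on every checkpoint."""

    def __init__(self, directory):
        self.directory = directory
        self.part = 0            # index of the next part file
        self.handle = None       # open file handle, if any
        self.in_progress = None  # path of the file currently being written

    def write(self, record):
        # Lazily open a new in-progress file for the first record after a checkpoint.
        if self.handle is None:
            name = "part-%d.in-progress" % self.part
            self.in_progress = os.path.join(self.directory, name)
            self.handle = open(self.in_progress, "w")
        self.handle.write(record + "\n")

    def on_checkpoint(self):
        # Close and rename the current file so it is complete when the checkpoint is taken.
        if self.handle is None:
            return None
        self.handle.close()
        final = os.path.join(self.directory, "part-%d" % self.part)
        os.rename(self.in_progress, final)
        self.handle = None
        self.part += 1
        return final


out_dir = tempfile.mkdtemp()
sink = CheckpointClosingSink(out_dir)
sink.write("a")
sink.write("b")
first = sink.on_checkpoint()   # finalizes part-0
sink.write("c")
second = sink.on_checkpoint()  # finalizes part-1
```

The trade-off of this approach is that the checkpoint interval determines file size: frequent checkpoints produce many small files.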

Re: Flink + Marathon (Mesos) Memory Issues

2018-05-04 Thread hao gao
Hi, Since you said BucketingSink, I think it may be related to your bucketer. Let's say you bucket by hour. If, at a given moment, the record timestamps in your stream range from hour 00 to hour 23, then your task needs 24 writers, one dedicated to each bucket. If you have 4 task slots in a ta
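
The writer count described above can be checked with a small simulation. This is a sketch in plain Python, not Flink code: the helper names `hour_bucket` and `open_writers` and the bucket id format are assumptions made for illustration. It maps each record timestamp to an hourly bucket and counts how many distinct buckets, and therefore writers, are live at once.

```python
from datetime import datetime, timezone


def hour_bucket(ts_ms):
    """Map an epoch-millisecond timestamp to an hourly bucket id (UTC)."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d--%H")


def open_writers(timestamps_ms):
    """Return the set of buckets that each need their own open writer."""
    return {hour_bucket(ts) for ts in timestamps_ms}


# One record for every hour of 2018-05-04 (UTC), hours 00 through 23.
base_ms = int(datetime(2018, 5, 4, tzinfo=timezone.utc).timestamp() * 1000)
records = [base_ms + h * 3_600_000 for h in range(24)]

writers = open_writers(records)
print(len(writers))  # 24
```

Each open writer holds its own buffers, so memory use grows with the number of buckets that are active at the same time rather than with the record rate alone.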

Re: Externalized checkpoints and metadata

2018-04-25 Thread hao gao
Hi Juan, We modified the Flink code a little to change the Flink checkpoint structure so we can easily identify which is which. You can read my note or the PR: https://medium.com/hadoop-noob/flink-externalized-checkpoint-eb86e693cfed https://github.com/BranchMetrics/flink/pull/6/files Hope it he