Re: Dynamic partitioning for stream output

2016-07-11 Thread Josh
Hi guys, I've been working on this feature as I needed something similar. Have a look at my issue here: https://issues.apache.org/jira/browse/FLINK-4190 and my changes here: https://github.com/joshfg/flink/tree/flink-4190 . The changes follow Kostas's suggestion in this thread. Thanks, Josh

Re: Dynamic partitioning for stream output

2016-05-26 Thread Aljoscha Krettek
Hi, while I think it would be possible to do it by creating a "meta sink" that contains several RollingSinks, I think the approach of integrating it into the current RollingSink is better. It's mostly a question of style and architectural purity, but also of resource consumption and maintainability.

Re: Dynamic partitioning for stream output

2016-05-25 Thread Kostas Kloudas
Hi Juho, To be more aligned with the semantics in Flink, I would suggest a solution with a single modified RollingSink that caches multiple buckets (from the Bucketer) and flushes (some of) them to disk whenever certain time or space criteria are met. I would say that it is worth modifying the existing RollingSink for this.
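A minimal sketch of that caching idea in plain Java (the class and its members are hypothetical illustrations, not the actual RollingSink internals): each bucket key maps to an in-memory buffer, and a bucket is flushed once it exceeds a size limit or has been open longer than a time limit.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical multi-bucket cache; names are illustrative, not the real RollingSink API.
    public class MultiBucketCache {

        // Per-bucket state: the buffered records plus the time the bucket was opened.
        private static class Bucket {
            final StringBuilder buffer = new StringBuilder();
            final long openedAt = System.currentTimeMillis();
        }

        private final Map<String, Bucket> openBuckets = new HashMap<>();
        private final int maxChars;        // space criterion (characters as a rough size proxy)
        private final long maxOpenMillis;  // time criterion

        public MultiBucketCache(int maxChars, long maxOpenMillis) {
            this.maxChars = maxChars;
            this.maxOpenMillis = maxOpenMillis;
        }

        // Route a record to its bucket (e.g. a topic/date key) and flush if a criterion is met.
        public void write(String bucketKey, String record) {
            Bucket bucket = openBuckets.computeIfAbsent(bucketKey, k -> new Bucket());
            bucket.buffer.append(record).append('\n');
            if (bucket.buffer.length() >= maxChars
                    || System.currentTimeMillis() - bucket.openedAt >= maxOpenMillis) {
                flush(bucketKey, bucket);
            }
        }

        private void flush(String bucketKey, Bucket bucket) {
            // A real sink would roll the part file for this bucket on HDFS here.
            System.out.printf("flushing %d chars for bucket %s%n", bucket.buffer.length(), bucketKey);
            openBuckets.remove(bucketKey);
        }
    }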

Re: Dynamic partitioning for stream output

2016-05-24 Thread Juho Autio
Related issue: https://issues.apache.org/jira/browse/FLINK-2672

Re: Dynamic partitioning for stream output

2016-05-24 Thread Juho Autio
Thanks, indeed the desired behavior is to flush if the bucket size exceeds a limit, but also if the bucket has been open long enough. Contrary to the current RollingSink, we don't want to flush every time the bucket changes, but rather keep multiple buckets "open" as needed. In our case the date to use for bucketing comes from the data itself.
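One wrinkle with the time-based criterion: if it is only checked on writes, a bucket that stops receiving records would never be flushed, so the sink also needs a periodic sweep. A hedged sketch, as an extra method on the hypothetical MultiBucketCache above:

    // Hypothetical periodic sweep: flushes buckets that have been open too long,
    // even if no new records are arriving for them.
    public void sweepExpiredBuckets() {
        long now = System.currentTimeMillis();
        openBuckets.entrySet().removeIf(entry -> {
            if (now - entry.getValue().openedAt >= maxOpenMillis) {
                System.out.printf("time-based flush for bucket %s%n", entry.getKey());
                return true;
            }
            return false;
        });
    }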

Re: Dynamic partitioning for stream output

2016-05-24 Thread Kostas Kloudas
Hi Juho, If I understand correctly, you want a custom RollingSink that caches multiple buckets, one for each topic/date key, and whenever the volume of buffered data exceeds a limit, it flushes them to disk, right? If this is the case, then you are right that this is not currently supported out-of-the-box.
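To make the topic/date keying concrete, a short usage example for the hypothetical MultiBucketCache sketched earlier in this thread (the keys and limits are illustrative only):

    public class MultiBucketCacheDemo {
        public static void main(String[] args) {
            // 1 MB space limit, 10-minute time limit (illustrative values).
            MultiBucketCache cache = new MultiBucketCache(1 << 20, 10 * 60 * 1000L);
            // Each record is routed to a bucket derived from its topic and event date.
            cache.write("clicks/2016-05-24", "{\"user\": 1}");
            cache.write("impressions/2016-05-24", "{\"user\": 2}");
            cache.write("clicks/2016-05-25", "{\"user\": 3}");
            // A scheduled task would additionally call cache.sweepExpiredBuckets().
        }
    }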