Re: FileStreamingSink is using the same counter for different files

2020-01-28 Thread Kostas Kloudas
Hi Pawel, You are correct that the write method invocation is guaranteed to be thread safe for the same sub operator instance. But I am not sure if having a unique counter per subtask across buckets would add much to the user experience of the sink. I think that in both cases, the interpretation o

Re: FileStreamingSink is using the same counter for different files

2020-01-25 Thread Pawel Bartoszek
Hi Kostas, Thanks for confirming that. I started thinking it might be useful or more user friendly to use unique counter across buckets for the same operator subtask? The way I could imagine this working is to pass max counter to the https://github.com/apache/flink/blob/e7e24471240dbaa6b5148d40657

Re: FileStreamingSink is using the same counter for different files

2020-01-24 Thread Kostas Kloudas
Hi Pawel, You are correct that counters are unique within the same bucket but NOT across buckets. Across buckets, you may see the same counter being used. The max counter is used only upon restoring from a failure, resuming from a savepoint or rescaling and this is done to guarantee that n valid d

Re: FileStreamingSink is using the same counter for different files

2020-01-24 Thread Pawel Bartoszek
I have looked into the source code and it looks likes that the same counter counter value being used in two buckets is correct. Each Bucket class https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/filesystem/Bucket.java is pa