[
https://issues.apache.org/jira/browse/FLINK-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643785#comment-15643785
]
ASF GitHub Bot commented on FLINK-5021:
---------------------------------------
GitHub user kl0u opened a pull request:
https://github.com/apache/flink/pull/2763
[FLINK-5021] Makes the ContinuousFileReaderOperator rescalable.
This is the last PR that completes the refactoring of the
`ContinuousFileReaderOperator` so that it can be rescalable. With this, the
reader can restart from a savepoint with a different parallelism without
compromising the provided exactly-once guarantees.
The whole PR contains 3 commits.
The first removes the EOS special split which was used to signal that no
new splits are to be processed. This was useful in the `PROCESS_ONCE` mode. Now
the reader closes by setting a flag and waiting for all the pending splits to
be fully processed.
The second puts an additional check in the
`ContinuousFileMonitoringFunction` that guarantees that in the case of the
`PROCESS_ONCE`, the source will not reprocess the directory after recovering
from a failure.
Finally, the third integrates the new rescalable state abstractions with
the reader so that the reader can restart from a savepoint with different
parallelism and still guarantee exactly-once semantics.
R: @aljoscha
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kl0u/flink repart_fs
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2763.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2763
----
commit 3976c70449f958da24ded2f7e5c3c1a3ef6cccd7
Author: kl0u <[email protected]>
Date: 2016-11-03T09:50:04Z
[FLINK-5021] Removes the special EOS TimestampedFileInputSplit.
Without this special split signaling that no more splits are
to arrive, the ContinuousFileReaderOperator now closes by
setting a flag that marks it as closed and exiting when the
flag is set to true and the pending split queue is empty.
commit 18ebac80f34c8c5db88d8493c21a3ba2e9fa2c6c
Author: kl0u <[email protected]>
Date: 2016-11-03T10:08:45Z
[FLINK-5021] Guarantees PROCESS_ONCE works correctly after recovering.
commit bbabe5e95aa6f115772099af250744a066eccb77
Author: kl0u <[email protected]>
Date: 2016-11-03T10:21:08Z
[FLINK-5021] Makes the ContinuousFileReaderOperator rescalable.
This is the last commit that completes the refactoring of the
ContinuousFileReaderOperator so that it can be rescalable.
With this, the reader can restart from a savepoint with a
different parallelism without compromising the provided
exactly-once guarantees.
----
> Makes the ContinuousFileReaderOperator rescalable.
> --------------------------------------------------
>
> Key: FLINK-5021
> URL: https://issues.apache.org/jira/browse/FLINK-5021
> Project: Flink
> Issue Type: Improvement
> Components: filesystem-connector
> Reporter: Kostas Kloudas
> Assignee: Kostas Kloudas
> Fix For: 1.2.0
>
>
> This targets integrating the ContinuousFileReaderOperator with the new
> rescalable state abstractions, so that the operator can change parallelism
> without jeopardizing the guarantees offered by it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)