GitHub user ChrisChinchilla opened a pull request: https://github.com/apache/flink/pull/5023
[hotfix][docs] Review of concepts docs for grammar, formatting etc Spending some time doing a brief review of a few docs sections, just a start⦠You can merge this pull request into a Git repository by running: $ git pull https://github.com/ChrisChinchilla/flink concepts-review Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5023.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5023 ---- commit 77946f1d8772f2c2c095e93d5a76c8df42fbe790 Author: Chris Ward <chriswhw...@gmail.com> Date: 2017-10-02T13:20:50Z Polishing grammar and tone for consistency and clarity commit e5eafba7053eb41f0bcd4e177ed1fdae7c2c3e66 Author: Chris Ward <chriswhw...@gmail.com> Date: 2017-10-05T18:26:21Z Clarify support of older APIs commit 027914715744e673e41893067beec58b52a177ce Author: Chris Ward <chriswhw...@gmail.com> Date: 2017-10-12T13:44:28Z Begin fixes for voicing, passive voice, weasel words etc⦠commit 05cb3f04d332886aafc2b5675358aed316f76826 Author: Nico Kruber <n...@data-artisans.com> Date: 2017-07-11T09:19:20Z [FLINK-7068][blob] change BlobService sub-classes for permanent and transient BLOBs [FLINK-7068][blob] start introducing a new BLOB storage abstraction This is incomplete and may not compile and/or run tests successfully yet. [FLINK-7068][blob] remove BlobView from TransientBlobCache The transient BLOB cache is not supposed to work with the HA store since it only serves non-HA files. [FLINK-7068][blob] remove unnecessary use of BlobClient [FLINK-7068][blob] implement TransientBlobCache#put methods [FLINK-7068][blob] remove further unnecessary use of BlobClient and adapt to HA get/put methods [FLINK-7068][blob] fix BlobServer#getFileInternal not being guarded by locks [FLINK-7068][blob] add incoming file cleanup at BlobServer in cases of errors [FLINK-7068] fix missing BlobServer#putHA() jobId propagation [FLINK-7068][blob] remove BlobClient use from BlobServer{Get|Put}Test [FLINK-7068][blob] make helper methods work with any BlobService [FLINK-7068][blob] start adding a BlobCacheGetTest [FLINK-7068][blob] verify get contents in separate threads This allows (at a slight chance) that we may see an intermediate file. [FLINK-7068][blob] better locking granularity during file retrieval This allows multiple parallel downloads from the HA store to the BlobServer's local store although only one of these downloaded staging files will actually be used. In practice, this happens only during recovery and not in parallel anyways. [FLINK-7068][blob] share more code among BlobServer and BlobServerConnection This also applies the better locking granularity of the previous commit to BlobServerConnection. [FLINK-7068][blob] properly cleanup temporary staging files in all cases [FLINK-7068][blob] make PermanentBlobCache and TransientBlobCache thread-safe [FLINK-7068][tests] improve various tests [FLINK-7068][blob] change the signature of the delete calls to return success We will not throw exceptions in case of failures anymore and return whether the operation was successful instead. Failure details will still be accessible in the written logs. [FLINK-7068][tests] extend and adapt BlobServerDeleteTest [FLINK-7068][tests] adapt further BlobCache tests [FLINK-7068][tests] adapt BlobClientTest [FLINK-7068][blob] cleanup BlobClient methods BlobClient is not supposed to be used by anyone else than the BlobServer/BlobCache classes. Most accessors were already package-private, now remove the ones that just blow up the code. [FLINK-7068] add a TODO to fix the currently failing tests [FLINK-7068][tests] add a BlobCacheRecoveryTest This currently fails due to TransientBlobCache#put also storing files in HA store which it should not! [FLINK-7068][tests] improve failure message [FLINK-7068][blob] add permanent/transient BLOB modes to BlobClient This allows a better control of which should end up in HA store and which should not. Also, during GET methods, we do not check the HA store unnecessarily. [FLINK-7068][tests] extend the Blob{Server|Cache}GetTest This adds some failing GET operations and verifies that the files are cleaned up accordingly. [FLINK-7068][blob] remove "final" flag from BlobCache class This re-enables mocking in various unit tests. [FLINK-7068][tests] fix test relying on order of folder contents [FLINK-7068][blob] some BlobServer cleanup [FLINK-7068][hotfix] fix checkstyle errors [FLINK-7068][tests] fix tests now requiring a more complete BlobCache mock A suitable BlobCache mock should at least return a mock for a permanent and a transient BLOB store, so mock(BlobCache.class) is not sufficient anymore. [FLINK-7068] final wrap up * remove a left-over TODO * remove useless tests for the concurrency of the GET operations (we cannot test that the file write is guarded by a lock directly - rely on the concurrent checks in the individual threads instead) * fix some log messages [FLINK-7068][blob] remove Thread#start() call from BlobServer constructor This is bad design and limits extensibility, e.g. in tests like the BlobCacheRetriesTest where this caused a race condition with the sub-class. Instead, the user must now call BlobServer#start() explicitely. [FLINK-7068][tests] remove unused imports [FLINK-7068][tests] fix a typo [FLINK-7068][tests] add some tests that verify behaviour with corrupted files Also add corruption checks for HA-store downloads which was not implemented yet. [FLINK-7068][blob] ensure consistency in PermanentBlobCache even in cases of invalid use During cleanup, no write lock was taken but the storage directory of an (unused!) job was deleted. Normally, there should be no process left accessing its data and no new process can jump in since the registration is locked. In case of invalid use cases, i.e. using a job's data outside a register() and release() block, this could lead to strange effects. By guarding the cleanup with the write lock as well, we circumvent that. [FLINK-7068][hotfix] remove an unused import commit c7d4562db8af446e79967215cda597be37261eed Author: Nico Kruber <n...@data-artisans.com> Date: 2017-07-25T11:00:33Z [FLINK-7261][blob] extend BlobStore#get/put with boolean return values This way, using code can distinguish non-HA cases, i.e. VoidBlobStore, from HA cases, i.e. FileSystemBlobStore, in a general way and have better error reporting. commit d7ad3aed56ca72139760f04af541b061b3107cca Author: Nico Kruber <n...@data-artisans.com> Date: 2017-08-07T16:12:28Z [FLINK-7412][network] optimise NettyMessage.TaskEventRequest#readFrom() to read from netty buffers directly This closes #4518. commit f3ef92fca6f6b572a29e5b2b37caa2d669a06f20 Author: Nico Kruber <n...@data-artisans.com> Date: 2017-08-18T10:28:23Z [FLINK-7057][tests][hotfix] fix test instability of JobManagerCleanupITCase#testBlobServerCleanupCancelledJob This test expected two messages to arrice (job cancellation and job state change notification) but did not take different receive orders into account. The fix: - removes state change listening for this test case so that only one message arrives, and - adds message comparison by object, not just class (to improve debugging) commit b2d2d584d9f1849bf81fbe09ee81ca44efacc421 Author: Nico Kruber <n...@data-artisans.com> Date: 2017-08-21T08:36:56Z [FLINK-7483][blob] prevent cleanup of re-registered jobs When a job is registered, it may have been released before and we thus need to reset the cleanup timeout again. commit a1a1cf59c0c677b593934b17c239e6832440c56d Author: Fabian Hueske <fhue...@apache.org> Date: 2017-09-10T22:05:06Z [FLINK-7446] [table] Change DefinedRowtimeAttribute to work on existing field. This closes #4710. commit 404779108319d1b4ead02bb4af1fb979ba1824dd Author: Nico Kruber <n...@data-artisans.com> Date: 2017-09-20T10:05:25Z [FLINK-7068][blob] Introduce permanent and transient BLOB keys [FLINK-7068][blob] address PR review comments, part 1 [FLINK-7068][blob] create a common base class for the BLOB caches [FLINK-7068][blob] update some comments [FLINK-7068][blob] integrate the BLOB type into the BlobKey [FLINK-7068][blob] rename a few methods for better consistency [FLINK-7068][blob] fix Blob*DeleteTest not working as documented in one test [FLINK-7068][blob] add checks for jobId being null in PermanentBlobCache [FLINK-7068][blob] implement get-and-delete logic for transient BLOBs Transient BLOB files are deleted on the BlobServer upon first access from a cache. Therefore, we do not need the DELETE operations anymore, aside from deleting the file from the local cache (for now). [FLINK-7068][blob] address PR comments, part 2 [FLINK-7068][blob] separate permanent and transient BLOB keys * create PermanentBlobKey and TransientBlobKey (inheriting from BlobKey) and forbid using transient BLOBs with permanent caches and vice versa * make BlobKey package-private, similarly for the BlobType which is now reflected by the two BlobKey sub-classes -> this gives a cleaner interface for the user This closes #4358. commit 1546277f7e2351f83efeb6e57c04e5b5c0323c6a Author: Till Rohrmann <trohrm...@apache.org> Date: 2017-09-25T13:29:59Z [FLINK-7668] Add ExecutionGraphCache for ExecutionGraph based REST handlers The ExecutionGraphCache replaces the ExecutionGraphHolder. Unlike the latter, the former does not expect the AccessExecutionGraph to be the true ExecutionGraph. Instead it assumes it to be the ArchivedExecutionGraph. Therefore, it invalidates the cache entries after a given time to live period. This will trigger requesting the AccessExecutionGraph again and, thus, updating the ExecutionGraph information for the ExecutionGraph based REST handlers. In order to avoid memory leaks, the WebRuntimeMonitor starts now a periodic cleanup task which triggers ExecutionGraphCache.cleanup. This methods releases all cache entries which have exceeded their time to live. Currently it is set to 20 * refreshInterval of the web gui. This closes #4728. commit 63f003be24ddb5190a1f2d4b5b710549c285e848 Author: Till Rohrmann <trohrm...@apache.org> Date: 2017-09-26T16:39:15Z [FLINK-7695] [flip6] Add JobConfigHandler for new RestServerEndpoint This closes #4737. commit bd55021374380f811b5f338fa6f30588635f84f4 Author: Tzu-Li (Gordon) Tai <tzuli...@apache.org> Date: 2017-09-27T18:05:21Z [FLINK-7721] [DataStream] Only emit new min watermark iff it was aggregated from watermark-aligned inputs Prior to this commit, In the calculation of the new min watermark in StatusWatermarkValve#findAndOutputNewMinWatermarkAcrossAlignedChannels(), there is no verification that the calculated new min watermark really is aggregated from some aligned channel. In the corner case where all input channels are currently not aligned but actually some are active, we would then incorrectly determine that the final aggregation is Long.MAX_VALUE and emit that. This commit fixes this by only emitting the aggregated watermark iff it was really calculated from some aligned input channel (as well as the already existing constraint that it needs to be larger than the last emitted watermark). This change should also safely cover the case that a Long.MAX_VALUE was genuinely aggregated from the input channels. commit 3b457321f39803b955ffc3d01c9d862ccb7a4769 Author: Tzu-Li (Gordon) Tai <tzuli...@apache.org> Date: 2017-09-28T12:11:22Z [FLINK-7728] [DataStream] Simplify BufferedValveOutputHandler used in StatusWatermarkValveTest The previous implementation was overly complicated. Having separate buffers for the StreamStatus and Watermarks is not required for our tests. Also, that design doesn't allow checking the order StreamStatus / Watermarks are emitted from a single input to the valve. This commit reworks it by buffering both StreamStatus and Watermarks in a shared queue. commit 9538eae0312c74a1e84fa4088fc09154b9290c5e Author: Tzu-Li (Gordon) Tai <tzuli...@apache.org> Date: 2017-09-28T14:49:22Z [FLINK-7728] [DataStream] Flush max watermark across all inputs once all become idle Prior to this commit, once all inputs of the StatusWatermarkValve becomes idle, we only emit the StreamStatus.IDLE marker, and check nothing else. This makes the watermark advancement behaviour inconsistent in the case that all inputs become idle depending on the order that they become idle. This commit fixes this by "flushing" the max watermark across all channels once all inputs become idle. At a high-level, what this means for downstream operators is that all inputs have become idle and will temporariliy cease to advance their watermarks, so they can safely advance their event time to whatever the largest watermark is. commit f14dfd04c748e26e8f0c59146a8dd209871404fc Author: Till Rohrmann <trohrm...@apache.org> Date: 2017-09-28T16:35:50Z [FLINK-7708] [flip6] Add CheckpointConfigHandler for new REST endpoint This commit implements the CheckpointConfigHandler which now returns a CheckpointConfigInfo object if checkpointing is enabled. In case that checkpointing is disabled for a job, it will return a 404 response. This closes #4744. commit 85f5168f58e9a46e1009ce79f4285245199612cb Author: Tzu-Li (Gordon) Tai <tzuli...@apache.org> Date: 2017-09-29T09:16:22Z [FLINK-7728] [DataStream] Make StatusWatermarkValve unit tests more fine-grained Previously, the unit tests in StatusWatermarkValveTest were too cluttered and testing too many behaviours in a single test. This makes it hard to have a good overview of what test cases are covered. This commit is a rework of the previous tests, making them more fine-grained so that the scope of each test is small enough. All previously tested behaviours are still covered. commit 32d90137e3984ce067761e0d18707cc65992131d Author: Till Rohrmann <trohrm...@apache.org> Date: 2017-09-29T13:09:06Z [FLINK-7710] [flip6] Add CheckpointStatisticsHandler for the new REST endpoint This commit also makes the CheckpointStatsHistory object serializable by removing the CheckpointStatsHistoryIterable and replacing it with a static ArrayList. This closes #4750. commit 9f773c3f60e31569023087454a71e660c3d669cd Author: Aljoscha Krettek <aljoscha.kret...@gmail.com> Date: 2017-10-02T10:28:10Z [FLINK-7721] Verify StatusWatermarkValve only emits WM iff it has aligned inputs commit ebc01c7d7e693666bad4a6d65b126f22dc264812 Author: Stephan Ewen <se...@apache.org> Date: 2017-10-02T12:15:14Z [hotfix] [core] Prevent potential null pointer in MemorySize.equals(...) commit 2aa618a2bab4260fc3d46b7a52f68450d0f34626 Author: Stephan Ewen <se...@apache.org> Date: 2017-10-02T12:34:27Z [FLINK-7643] [core] Misc. cleanups in FileSystem - Simplify access to local file system - Use a fair lock for all FileSystem.get() operations - Robust falback to local fs for default scheme (avoids URI parsing error on Windows) - Deprecate 'getDefaultBlockSize()' - Deprecate create(...) with block sizes and replication factor, which is not applicable to many FS commit d186c169591c36a6fca8dbb5d40ff2d5815ed810 Author: Stephan Ewen <se...@apache.org> Date: 2017-10-02T14:25:18Z [FLINK-7643] [core] Rework FileSystem loading to use factories This makes sure that configurations are loaded once and file system instances are properly reused by scheme and authority. This also factors out a lot of the special treatment of Hadoop file systems and simply makes the Hadoop File System factory the default fallback factory. commit 0f3fcb8967a95d31f383f8b4d895869b3d2b8b30 Author: Stephan Ewen <se...@apache.org> Date: 2017-10-02T14:30:07Z [FLINK-7643] [core] Drop eager checks for file system support. Some places validate if the file URIs are resolvable on the client. This leads to problems when file systems are not accessible from the client, when the full libraries for the file systems are not present on the client (for example often the case in cloud setups), or when the configuration on the client is different from the nodes/containers that will execute the application. commit 1479b72ba3f11a985f977ee568d72e25a97ab450 Author: Till Rohrmann <trohrm...@apache.org> Date: 2017-10-02T20:29:12Z [FLINK-7754] [rpc] Complete termination future after actor has been stopped This commit waits not only until the Actor has called postStop but also until the actor has been completely shut down by the ActorSystem before completing the termination future. This closes #4770. commit 8fbade07443dd2ca3d408ed61485d3dcaf86415e Author: zentol <ches...@apache.org> Date: 2017-10-02T20:58:21Z [hotfix] [dispatcher] Remove leftover javadoc from DispatcherGateway commit 56aac28c614462a119edba13461b8de11ea9fbc9 Author: Stephan Ewen <se...@apache.org> Date: 2017-10-02T21:06:07Z [hotfix] Fix testing log level in flink-runtime commit f9b726df9bb86b41d0c903a36ed2e318ae94ce62 Author: Wright, Eron <eron.wri...@emc.com> Date: 2017-10-04T03:36:09Z [FLINK-7752] [flip-6] RedirectHandler should execute on the IO thread This closes #4766. commit 5d631dd79810e571c9baff2f3079084380e196bd Author: Fabian Hueske <fhue...@apache.org> Date: 2017-10-04T11:01:22Z [hotfix] [hbase] Set root log level to OFF for flink-hbase tests. Log level is changed due to a buggy Calcite check that causes a NPE. The check is only performed if log level DEBUG is enabled. This closes #4771 commit 1b17941b31831ecfcad9b1d5bc1505415ac66306 Author: Stephan Ewen <se...@apache.org> Date: 2017-10-05T09:26:13Z [FLINK-7767] [file system sinks] Avoid loading Hadoop conf dynamically at runtime commit c6cdd4172107985a4160869d6af6ddc0ade44fbf Author: Stephan Ewen <se...@apache.org> Date: 2017-10-05T09:27:48Z [FLINK-7766] [file system sink] Drop obsolete reflective hflush calls This was done reflectively before for Hadoop 1 compatibility. Since Hadoop 1 is no longer supported, this is obsolete now. ---- ---