Hi Till, I agree with you about Flink's DC; it is indeed another topic. I just thought we could think it through before refactoring the BLOB service, to make sure the distributed cache is easy to implement on top of the refactored architecture.
I have another question about the BLOB service: could we abstract it behind some high-level interfaces, perhaps just put/get methods? Being easy to extend would be useful in several scenarios. For example, in Yarn mode there are some features that interest us:

1. Yarn can localize files only once per slave machine, so all TMs of the same job can share them. That could save a lot of bandwidth for large-scale jobs or jobs with large BLOBs.

2. We can skip uploading files that are already on DFS. That's a common scenario for the distributed cache.

3. Going further, we don't actually need a BlobServer component in Yarn mode at all; we can rely on DFS to distribute files, since a DFS is always available in a Yarn cluster.

If we do so, the network-based BLOB service can remain the default implementation: it works in any environment and clearly does not depend on Hadoop explicitly, while cluster-specific optimizations can be added without any hacking. These are just rough ideas, but I think well-abstracted interfaces would be very helpful.
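To make the idea concrete, here is a minimal sketch of what such an abstraction could look like. All names (`BlobService`, `InMemoryBlobService`) are hypothetical, not Flink's actual API; the in-memory implementation just stands in for the default network-based one, and a Yarn-specific implementation could instead resolve keys to files on DFS:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical high-level BLOB service abstraction: just put/get,
// so different clusters can plug in different distribution strategies.
interface BlobService {
    // Store a blob under the given key.
    void put(String key, byte[] value) throws IOException;

    // Retrieve the blob previously stored under the key, or null if absent.
    byte[] get(String key) throws IOException;
}

// Stand-in for the default (network/BlobServer-backed) implementation.
// A Yarn-mode implementation could map keys to DFS paths instead and
// skip uploads for files already on DFS.
class InMemoryBlobService implements BlobService {
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();

    @Override
    public void put(String key, byte[] value) {
        store.put(key, value.clone());
    }

    @Override
    public byte[] get(String key) {
        byte[] value = store.get(key);
        return value == null ? null : value.clone();
    }
}
```

Callers (e.g. TMs fetching a job's jars) would only ever see the interface, so switching to a DFS-backed implementation in Yarn mode would not touch any call sites.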