Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6147#discussion_r195066210

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/cache/DistributedCache.java ---
    @@ -40,6 +41,14 @@
     @Public
     public class DistributedCache {
     
    +	/**
    +	 * An entry for a single file or directory that should be cached.
    +	 *
    +	 * <p>Entries have different semantics for local directories depending on where we are in the job-submission process.
    +	 * After registration through the API {@code filePath} denotes the original directory.
    +	 * Before the job is submitted to the cluster directories are zipped, at which point {@code filePath} denotes the path to the local zip.
    +	 * After the upload to the cluster, {@code filePath} denotes the (server-side) copy of the zip.
    +	 */
     	public static class DistributedCacheEntry implements Serializable {
    --- End diff --

    As a neat side effect, we could also refactor how this information is sent to the cluster, namely changing it such that it is no longer serialized into the `Configuration`.
---