[ https://issues.apache.org/jira/browse/FLINK-20505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17244999#comment-17244999 ]
Xintong Song commented on FLINK-20505:
--------------------------------------

Thanks for the replies, [~ZhenqiuHuang] and [~zoucao].

[~ZhenqiuHuang], do you mean there is another `HttpFileSystem` that gives the actual length of files? I checked this with hadoop-common-3.1.0. There, `AbstractHttpFileSystem#getFileStatus` always returns `-1` for the length, and neither `HttpFileSystem` nor `HttpsFileSystem` overrides it, as [~zoucao] also mentioned.

[~zoucao], yes, I opened this ticket based on the problem you reported on the mailing list. I haven't linked the original mailing thread because it is in Chinese.

[~ZhenqiuHuang] & [~zoucao], I think this issue is not necessarily a bug. One could also argue that the provided lib feature currently only supports HDFS paths, which would make supporting HTTP paths an improvement. However, I do believe that not supporting HTTP paths is a limitation of Flink. As far as I can see, neither `Http/HttpsFileSystem` nor Yarn requires non-negative file lengths. (I would need to investigate further to be completely sure, but at least Yarn's `ContainerLocalizer` seems to assume the file length can be negative.) It seems to me that the only reason this feature does not work with HTTP paths is that Flink assumes file lengths are non-negative. I don't think users should be forced to use a specific `HttpFileSystem` implementation, or worse, to provide their own implementation, just because of this. WDYT?

> Yarn provided lib does not work with http paths.
> ------------------------------------------------
>
>                  Key: FLINK-20505
>                  URL: https://issues.apache.org/jira/browse/FLINK-20505
>              Project: Flink
>           Issue Type: Bug
>           Components: Deployment / YARN
>     Affects Versions: 1.12.0, 1.11.2
>             Reporter: Xintong Song
>             Assignee: Xintong Song
>             Priority: Major
>
> If an HTTP path is used for the provided lib, the following exception is
> thrown on the resource manager side:
> {code:java}
> 2020-12-04 17:01:28.955 ERROR org.apache.flink.yarn.YarnResourceManager - Could not start TaskManager in container containerXXXXXX.
> org.apache.flink.util.FlinkException: Error to parse YarnLocalResourceDescriptor from YarnLocalResourceDescriptor{key=XXXXX.jar, path=https://XXXXXXX.jar, size=-1, modificationTime=0, visibility=APPLICATION}
>     at org.apache.flink.yarn.YarnLocalResourceDescriptor.fromString(YarnLocalResourceDescriptor.java:99)
>     at org.apache.flink.yarn.Utils.decodeYarnLocalResourceDescriptorListFromString(Utils.java:721)
>     at org.apache.flink.yarn.Utils.createTaskExecutorContext(Utils.java:626)
>     at org.apache.flink.yarn.YarnResourceManager.getOrCreateContainerLaunchContext(YarnResourceManager.java:746)
>     at org.apache.flink.yarn.YarnResourceManager.createTaskExecutorLaunchContext(YarnResourceManager.java:726)
>     at org.apache.flink.yarn.YarnResourceManager.startTaskExecutorInContainer(YarnResourceManager.java:500)
>     at org.apache.flink.yarn.YarnResourceManager.onContainersOfResourceAllocated(YarnResourceManager.java:455)
>     at org.apache.flink.yarn.YarnResourceManager.lambda$onContainersAllocated$1(YarnResourceManager.java:415)
> {code}
> The problem is that `HttpFileSystem#getFileStatus` returns a file status with
> length `-1`, while `YarnLocalResourceDescriptor` does not accept a
> negative file length.
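The parsing failure in the stack trace above can be illustrated in plain Java. This is a minimal sketch that *assumes* the descriptor string is parsed back with a regular expression whose `size` group only admits digits. The class `DescriptorSketch`, its two patterns, `parseSize`, and the sample paths are all hypothetical and are not Flink's actual `YarnLocalResourceDescriptor` code. The sketch shows that `size=-1`, as produced for an HTTP path, is rejected by a digits-only pattern and accepted once an optional minus sign is allowed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified stand-in for a descriptor that is serialized to a
 * string and parsed back. Field names mirror the log message in the issue;
 * this is NOT Flink's actual implementation.
 */
class DescriptorSketch {

    // Assumed "strict" pattern: the size group only accepts digits,
    // so a string containing "size=-1" can never match.
    static final Pattern STRICT = Pattern.compile(
            "YarnLocalResourceDescriptor\\{key=(\\S+), path=(\\S+), size=(\\d+), "
                    + "modificationTime=(\\d+), visibility=(\\S+)\\}");

    // Hypothetical "lenient" pattern: an optional leading minus sign is
    // allowed, so the -1 length reported for HTTP paths is accepted.
    static final Pattern LENIENT = Pattern.compile(
            "YarnLocalResourceDescriptor\\{key=(\\S+), path=(\\S+), size=(-?\\d+), "
                    + "modificationTime=(\\d+), visibility=(\\S+)\\}");

    /** Returns the parsed size, or throws if the whole string does not match. */
    static long parseSize(Pattern pattern, String s) {
        Matcher m = pattern.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("Error to parse descriptor from " + s);
        }
        return Long.parseLong(m.group(3));
    }

    public static void main(String[] args) {
        // Sample string shaped like the one in the log; the path is made up.
        String s = "YarnLocalResourceDescriptor{key=lib.jar, path=https://example.com/lib.jar, "
                + "size=-1, modificationTime=0, visibility=APPLICATION}";

        // Prints "lenient size = -1".
        System.out.println("lenient size = " + parseSize(LENIENT, s));

        try {
            parseSize(STRICT, s);
            System.out.println("strict: parsed");
        } catch (IllegalArgumentException e) {
            // The digits-only pattern rejects the negative size.
            System.out.println("strict: rejected");
        }
    }
}
```

Relaxing the parser is only one side of the question raised in the comment. Whether a `-1` size is then safe to hand to Yarn still depends on how Yarn treats it, which the comment notes needs further investigation.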