Hi tison,

For #3, if you mean registering a remote HDFS file as a local resource, we should make "-yt/--yarnship" support remote directories. I think it is the right direction.
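To make the idea concrete, here is a minimal sketch of how a client could tell remote ship paths from local ones by looking at the URI scheme. It is only an illustration under assumptions: `split_ship_paths` is a hypothetical helper and not part of Flink, the paths are example values, and the real change would live in the YARN client code rather than in Python.

```python
from urllib.parse import urlparse


# Hypothetical helper, not part of Flink: decides per ship path whether the
# client would have to upload it or could register it in place.
def split_ship_paths(yarnship_value):
    """Split a comma-separated ship list into (local, remote) path lists."""
    local, remote = [], []
    for raw in yarnship_value.split(","):
        path = raw.strip()
        if not path:
            continue  # ignore empty entries such as a trailing comma
        scheme = urlparse(path).scheme
        # No scheme or "file" is treated as a path on the client machine.
        if scheme in ("", "file"):
            local.append(path)
        else:
            remote.append(path)
    return local, remote


if __name__ == "__main__":
    # Example values only; the cluster name and directories are assumptions.
    local, remote = split_ship_paths(
        "/opt/flink/lib, hdfs://hdpdev/flink/release/flink-1.x"
    )
    print("would upload:", local)
    print("would register as-is:", remote)
```

Only the entries in the first list would need uploading; the second list could be registered directly as remote resources.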
For #1, if users could ship remote directories, they could also specify something like "-yt hdfs://hdpdev/flink/release/flink-1.x, hdfs://hdpdev/user/someone/mylib". Do you mean adding an option that controls whether we try to avoid unnecessary uploading? Maybe we could filter by file name and size. I think this is a good suggestion, and we do not need to introduce a new config option "-ypl".

For #2, for flink-dist, #1 could already solve the problem. We do not need to support remote schemes. It would confuse users if we only supported HDFS and not S3, OSS, etc.

Best,
Yang

On Fri, Apr 17, 2020 at 8:05 PM tison <wander4...@gmail.com> wrote:

> Hi Yang,
>
> I agree that these two pieces of work would benefit from a single assignee. My
> concerns are as below.
>
> 1. Both shared libs & remote flink dist/libs are remote ship files. I don't
> think we have to implement multiple code paths/configurations.
> 2. So, for concept clarification, there are
> (1) an option to disable shipping local libs
> (2) flink-dist supporting multiple schemes, at least "hdfs://" as we said
> (3) an option for registering remote ship files with path & visibility. I
> think the new configuration system helps.
>
> The reason we have to handle (2) specially instead of including it in (3)
> is that when shipping flink-dist to the TM container, we specially
> detect flink-dist. Of course we can merge it into the general ship files and
> validate that the ship files finally contain flink-dist, which is an alternative.
>
> The *most important* difference is between (1) and (3): we don't have an
> option for only remote libs. Does this clarification satisfy your proposal?
>
> Best,
> tison.
>
>
> On Fri, Apr 17, 2020 at 7:49 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
>> Hi Yang,
>>
>> from what I understand it sounds reasonable to me. Could you sync with
>> Tison on FLINK-14964 on how to proceed? I'm not super deep into these
>> issues but they seem to be somewhat related and Tison already did some
>> implementation work.
>>
>> I'd say it would be awesome if we could include this kind of improvement in
>> the release.
>>
>> Cheers,
>> Till
>>
>> On Thu, Apr 16, 2020 at 4:43 AM Yang Wang <danrtsey...@gmail.com> wrote:
>>
>>> Hi All, thanks a lot for reviving this discussion.
>>>
>>> I think we could unify FLINK-13938 and FLINK-14964 since they have
>>> a similar purpose: avoiding unnecessary uploading and downloading of
>>> jars in YARN deployment.
>>> The difference is that FLINK-13938 aims to support the Flink system lib
>>> directory only, while FLINK-14964 is trying to support arbitrary
>>> pre-uploaded jars (including user and system jars).
>>>
>>>
>>> So I suggest implementing this feature as follows.
>>> 1. Upload the Flink lib directory or user libs to HDFS, e.g.
>>> "hdfs://hdpdev/flink/release/flink-1.x"
>>> "hdfs://hdpdev/user/someone/mylib"
>>> 2. Use the -ypl argument to specify the shared lib; multiple directories
>>> could be specified
>>> 3. YarnClusterDescriptor will use the pre-uploaded jars to avoid
>>> unnecessary uploading, both for system and user jars
>>> 4. YarnClusterDescriptor needs to set the system jars to public
>>> visibility so that the distributed cache in the YARN nodemanager
>>> could be reused by multiple applications. This is to avoid
>>> unnecessary downloading, especially for the "flink-dist-*.jar". For the
>>> user shared lib, the visibility is still set to "APPLICATION" level.
>>>
>>>
>>> In our past internal use case, the shared lib helped accelerate
>>> submission a lot.
>>> It also helps to reduce the pressure on HDFS when we want to launch many
>>> applications together.
>>>
>>> @tison @Till Rohrmann <trohrm...@apache.org> @Hailu, Andreas
>>> <andreas.ha...@gs.com> If you think the suggestion makes sense, I
>>> will try to find some time to work on this and hope it can catch up
>>> with the release-1.11 cycle.
>>>
>>> Best,
>>> Yang
>>>
>>> On Thu, Apr 16, 2020 at 8:47 AM Hailu, Andreas [Engineering] <andreas.ha...@gs.com> wrote:
>>>
>>>> Okay, I’ll continue to watch the JIRAs. Thanks for the update, Till.
>>>>
>>>> *// *ah
>>>>
>>>> *From:* Till Rohrmann <trohrm...@apache.org>
>>>> *Sent:* Wednesday, April 15, 2020 10:51 AM
>>>> *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>
>>>> *Cc:* Yang Wang <danrtsey...@gmail.com>; tison <wander4...@gmail.com>;
>>>> user@flink.apache.org
>>>> *Subject:* Re: Flink Conf "yarn.flink-dist-jar" Question
>>>>
>>>> Hi Andreas,
>>>>
>>>> it looks as if FLINK-13938 and FLINK-14964 won't make it into the
>>>> 1.10.1 release because the community is about to start the release process.
>>>> Since FLINK-13938 is a new feature it will be shipped with a major release.
>>>> There is still a bit of time until the 1.11 feature freeze and if Yang Wang
>>>> has time to finish this PR, then we could ship it.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Wed, Apr 15, 2020 at 3:23 PM Hailu, Andreas [Engineering] <andreas.ha...@gs.com> wrote:
>>>>
>>>> Yang, Tison,
>>>>
>>>> Do we know when some solution for 13938 and 14964 will arrive? Do you
>>>> think it will be in a 1.10.x version?
>>>>
>>>> *// *ah
>>>>
>>>> *From:* Hailu, Andreas [Engineering]
>>>> *Sent:* Friday, March 20, 2020 9:19 AM
>>>> *To:* 'Yang Wang' <danrtsey...@gmail.com>
>>>> *Cc:* tison <wander4...@gmail.com>; user@flink.apache.org
>>>> *Subject:* RE: Flink Conf "yarn.flink-dist-jar" Question
>>>>
>>>> Hi Yang,
>>>>
>>>> This is good to know. As a stopgap measure until a solution between
>>>> 13938 and 14964 arrives, we can automate the application staging directory
>>>> cleanup from our client should the process fail. It’s not ideal, but will
>>>> at least begin to manage our users’ quota. I’ll continue to watch the two
>>>> tickets. Thank you.
>>>>
>>>> *// *ah
>>>>
>>>> *From:* Yang Wang <danrtsey...@gmail.com>
>>>> *Sent:* Monday, March 16, 2020 9:37 PM
>>>> *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>
>>>> *Cc:* tison <wander4...@gmail.com>; user@flink.apache.org
>>>> *Subject:* Re: Flink Conf "yarn.flink-dist-jar" Question
>>>>
>>>> Hi Hailu,
>>>>
>>>> Sorry for the late response. If the Flink cluster (e.g. the Yarn
>>>> application) is stopped directly by `yarn application -kill`, then the
>>>> staging directory will be left behind, since the jobmanager does not
>>>> have any chance to clean up the staging directory. It may also happen
>>>> when the jobmanager crashes and reaches the attempt limit of Yarn.
>>>>
>>>> For FLINK-13938, yes, it is trying to use the Yarn public cache to
>>>> accelerate the container launch.
>>>>
>>>> Best,
>>>> Yang
>>>>
>>>> On Tue, Mar 10, 2020 at 4:38 AM Hailu, Andreas <andreas.ha...@gs.com> wrote:
>>>>
>>>> Also may I ask what causes these application ID directories to be left
>>>> behind? Is it a job failure, or can they persist even if the application
>>>> succeeds? I’d like to know so that I can implement my own cleanup in the
>>>> interim to prevent exceeding user disk space quotas.
>>>>
>>>> *// *ah
>>>>
>>>> *From:* Hailu, Andreas [Engineering]
>>>> *Sent:* Monday, March 9, 2020 1:20 PM
>>>> *To:* 'Yang Wang' <danrtsey...@gmail.com>
>>>> *Cc:* tison <wander4...@gmail.com>; user@flink.apache.org
>>>> *Subject:* RE: Flink Conf "yarn.flink-dist-jar" Question
>>>>
>>>> Hi Yang,
>>>>
>>>> Yes, a combination of these two would be very helpful for us. We have a
>>>> single shaded binary which we use to run all of the jobs on our YARN
>>>> cluster. If we could designate a single location in HDFS for that as well,
>>>> we could also greatly benefit from FLINK-13938.
>>>>
>>>> It sounds like a general public cache solution is what’s being called
>>>> for?
>>>>
>>>> *// *ah
>>>>
>>>> *From:* Yang Wang <danrtsey...@gmail.com>
>>>> *Sent:* Sunday, March 8, 2020 10:52 PM
>>>> *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>
>>>> *Cc:* tison <wander4...@gmail.com>; user@flink.apache.org
>>>> *Subject:* Re: Flink Conf "yarn.flink-dist-jar" Question
>>>>
>>>> Hi Hailu, tison,
>>>>
>>>> I created a very similar ticket before to accelerate Flink submission
>>>> on Yarn[1]. However, we did not reach a consensus in the PR. Maybe it's
>>>> time to revive the discussion and try to find a common solution for
>>>> both tickets[1][2].
>>>>
>>>> [1]. https://issues.apache.org/jira/browse/FLINK-13938
>>>> [2]. https://issues.apache.org/jira/browse/FLINK-14964
>>>>
>>>> Best,
>>>> Yang
>>>>
>>>> On Sat, Mar 7, 2020 at 11:21 AM Hailu, Andreas <andreas.ha...@gs.com> wrote:
>>>>
>>>> Hi Tison, thanks for the reply. I’ve replied to the ticket. I’ll be
>>>> watching it as well.
>>>>
>>>> *// *ah
>>>>
>>>> *From:* tison <wander4...@gmail.com>
>>>> *Sent:* Friday, March 6, 2020 1:40 PM
>>>> *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>
>>>> *Cc:* user@flink.apache.org
>>>> *Subject:* Re: Flink Conf "yarn.flink-dist-jar" Question
>>>>
>>>> FLINK-13938 seems a bit different from your requirement. The one that
>>>> totally matches is FLINK-14964
>>>> <https://issues.apache.org/jira/browse/FLINK-14964>.
>>>> I'd appreciate it if you could share your opinion on the JIRA ticket.
>>>>
>>>> Best,
>>>> tison.
>>>>
>>>> On Sat, Mar 7, 2020 at 2:35 AM tison <wander4...@gmail.com> wrote:
>>>>
>>>> Yes, your requirement is exactly what the community has taken into
>>>> consideration. We currently have an open JIRA ticket for the specific
>>>> feature[1], and work on loosening the flink-jar scheme constraint to
>>>> support DFS locations should happen.
>>>>
>>>> Best,
>>>> tison.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/FLINK-13938
>>>>
>>>> On Sat, Mar 7, 2020 at 2:03 AM Hailu, Andreas <andreas.ha...@gs.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We noticed that every time an application runs, it uploads the
>>>> flink-dist artifact to the /user/<user>/.flink HDFS directory. This causes
>>>> a user disk space quota issue as we submit thousands of apps to our cluster
>>>> an hour.
>>>> We had a similar problem with our Spark applications, where it
>>>> uploaded the Spark Assembly package for every app. Spark provides an
>>>> argument to point to a location in HDFS for applications to leverage, so
>>>> they don’t need to upload it for every run, and that was our solution
>>>> (see the “spark.yarn.jar” configuration if interested).
>>>>
>>>> Looking at the Resource Orchestration Frameworks page
>>>> <https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#yarn-flink-dist-jar>,
>>>> I see there might be a similar concept through a “yarn.flink-dist-jar”
>>>> configuration option. I wanted to place the flink-dist package we’re using
>>>> in a location in HDFS and configure our jobs to point to it, e.g.
>>>>
>>>> yarn.flink-dist-jar: hdfs:////user/delp/.flink/flink-dist_2.11-1.9.1.jar
>>>>
>>>> Am I correct in that this is what I’m looking for? I gave this a try
>>>> with some jobs today, and based on what I’m seeing in the
>>>> launch_container.sh in our YARN application, it still looks like it’s being
>>>> uploaded:
>>>>
>>>> export _FLINK_JAR_PATH="hdfs://d279536/user/delp/.flink/application_1583031705852_117863/flink-dist_2.11-1.9.1.jar"
>>>>
>>>> How can I confirm? Or is this perhaps not the config I’m looking for?
>>>>
>>>> Best,
>>>> Andreas
>>>>
>>>> ------------------------------
>>>>
>>>> Your Personal Data: We may collect and process information about you
>>>> that may be subject to data protection laws.
>>>> For more information about how
>>>> we use and disclose your personal data, how we protect your information,
>>>> our legal basis to use your information, your rights and who you can
>>>> contact, please refer to: www.gs.com/privacy-notices