Hey Diwakar, the logs you are providing still don't contain the full Flink logs.
You cannot stop Flink on YARN using "yarn app -stop application_1603649952937_0002". To stop Flink on YARN, use "yarn application -kill <appId>".

On Sat, Oct 31, 2020 at 6:26 PM Diwakar Jha <diwakar.n...@gmail.com> wrote:

> Hi,
>
> I wanted to check whether anyone can help me with the logs. I have sent several emails but have not received any response.
>
> I'm running Flink 1.11 on EMR 6.1 and am trying to upgrade from Flink 1.8 to Flink 1.11. I don't see any logs, although I do get this error on stdout:
>
> 18:29:19.834 [flink-akka.actor.default-dispatcher-28] ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler - Failed to transfer file from TaskExecutor container_1604033334508_0001_01_000004.
> java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: The file LOG does not exist on the TaskExecutor.
>
> Thanks!
>
> On Fri, Oct 30, 2020 at 9:04 AM Diwakar Jha <diwakar.n...@gmail.com> wrote:
>
>> Hello,
>>
>> I see that my classpath (below) contains both log4j-1 and log4j-api-2. Is this why I'm not seeing any logs? If so, could someone suggest how to fix it?
>>
>> export CLASSPATH=":lib/flink-csv-1.11.0.jar:lib/flink-json-1.11.0.jar:lib/flink-shaded-zookeeper-3.4.14.jar:lib/flink-table-blink_2.12-1.11.0.jar:lib/flink-table_2.12-1.11.0.jar:lib/log4j-1.2-api-2.12.1.jar:lib/log4j-api-2.12.1.jar:lib/log4j-core-2.12.1.jar:lib/
>>
>> export _FLINK_CLASSPATH=":lib/flink-csv-1.11.0.jar:lib/flink-json-1.11.0.jar:lib/flink-shaded-zookeeper-3.4.14.jar:lib/flink-table-blink_2.12-1.11.0.jar:lib/flink-table_2.12-1.11.0.jar:lib/log4j-1.2-api-2.12.1.jar:lib/log4j-api-2.12.1.jar:lib/log4j-core-2.12.1.jar:lib/log4j-slf4j-impl-2.12.1.jar:flink-dist_2.12-1.11.0.jar:flink-conf.yaml:"
>>
>> Thanks.
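The log4j jars in a classpath like the one above can be told apart by file name. The sketch below works under a few assumptions. It assumes the jars follow standard Maven artifact naming, and it looks only at names, not jar contents. classify_logging_jars and has_conflicting_slf4j_bindings are hypothetical helpers, not part of Flink. The point it illustrates is that log4j-1.2-api-2.12.1.jar is Log4j 2's compatibility bridge for code written against the Log4j 1.x API, not Log4j 1 itself. As a result, every logging jar listed in _FLINK_CLASSPATH is a Log4j 2.12.1 component.

```python
from __future__ import annotations

import re

# Roles are inferred from jar file names only (assumption: standard Maven artifact
# naming). More specific patterns are listed first; the first match wins.
_LOGGING_JAR_PATTERNS = [
    (re.compile(r"^log4j-1\.2-api-2\.[\d.]+\.jar$"), "log4j2 bridge for the log4j 1.x API"),
    (re.compile(r"^log4j-slf4j-impl-2\.[\d.]+\.jar$"), "slf4j binding to log4j2"),
    (re.compile(r"^log4j-(?:api|core)-2\.[\d.]+\.jar$"), "log4j2"),
    (re.compile(r"^slf4j-log4j12-[\d.]+\.jar$"), "slf4j binding to log4j1"),
    (re.compile(r"^log4j-1\.[\d.]+\.jar$"), "log4j1"),
]


def classify_logging_jars(classpath: str) -> dict[str, str]:
    """Map each logging-related jar on a ':'-separated classpath to its role."""
    roles: dict[str, str] = {}
    for entry in classpath.split(":"):
        name = entry.rsplit("/", 1)[-1]  # strip directory prefixes such as "lib/"
        for pattern, role in _LOGGING_JAR_PATTERNS:
            if pattern.match(name):
                roles[name] = role
                break
    return roles


def has_conflicting_slf4j_bindings(roles: dict[str, str]) -> bool:
    """True if SLF4J would see bindings to both Log4j 1 and Log4j 2."""
    found = set(roles.values())
    return {"slf4j binding to log4j1", "slf4j binding to log4j2"} <= found


if __name__ == "__main__":
    flink_classpath = (
        ":lib/flink-csv-1.11.0.jar:lib/log4j-1.2-api-2.12.1.jar:lib/log4j-api-2.12.1.jar"
        ":lib/log4j-core-2.12.1.jar:lib/log4j-slf4j-impl-2.12.1.jar"
        ":flink-dist_2.12-1.11.0.jar:flink-conf.yaml:"
    )
    roles = classify_logging_jars(flink_classpath)
    for jar, role in roles.items():
        print(f"{jar}: {role}")
    print("conflicting SLF4J bindings:", has_conflicting_slf4j_bindings(roles))
```

Applied to the _FLINK_CLASSPATH above, this sketch classifies the four log4j jars as follows: the Log4j 1.x API bridge, the Log4j 2 API, the Log4j 2 core, and the SLF4J-to-Log4j 2 binding. It reports no conflicting SLF4J bindings.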
>>
>> On Thu, Oct 29, 2020 at 6:21 PM Diwakar Jha <diwakar.n...@gmail.com> wrote:
>>
>>> Hello Everyone,
>>>
>>> I'm able to get my Flink UI up and running (the problem was related to the session manager plugin on my local laptop), but I'm not seeing any TaskManager/JobManager logs for my Flink application. I have attached some YARN application logs captured while it was running, but I can't figure out how to stop it and get more logs. Could someone please help me figure this out? I'm running Flink 1.11 on an EMR 6.1 cluster.
>>>
>>> On Tue, Oct 27, 2020 at 1:06 PM Diwakar Jha <diwakar.n...@gmail.com> wrote:
>>>
>>>> Hi Robert,
>>>>
>>>> Could you please correct me? I'm not able to stop the app. Also, I already stopped the Flink job.
>>>>
>>>> sh-4.2$ yarn app -stop application_1603649952937_0002
>>>> 2020-10-27 20:04:25,543 INFO client.RMProxy: Connecting to ResourceManager at ip-10-0-55-50.ec2.internal/10.0.55.50:8032
>>>> 2020-10-27 20:04:25,717 INFO client.AHSProxy: Connecting to Application History server at ip-10-0-55-50.ec2.internal/10.0.55.50:10200
>>>> Exception in thread "main" java.lang.IllegalArgumentException: App admin client class name not specified for type Apache Flink
>>>>     at org.apache.hadoop.yarn.client.api.AppAdminClient.createAppAdminClient(AppAdminClient.java:76)
>>>>     at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:597)
>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>>>>     at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:126)
>>>> sh-4.2$
>>>>
>>>> On Tue, Oct 27, 2020 at 9:34 AM Robert Metzger <rmetz...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Are you intentionally not posting this response to the mailing list?
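The "App admin client class name not specified for type Apache Flink" error comes from "yarn app -stop". As far as I can tell, that command is meant for YARN services rather than arbitrary YARN applications such as a Flink session. The following is a minimal sketch of a stop-then-collect sequence, assuming the yarn CLI is on the PATH. It derives the application ID from a container ID, such as the one in the WebUI error, and then builds the "yarn application -kill" and "yarn logs -applicationId" invocations. The function names are hypothetical. Log aggregation may also need a short while after the kill before the logs are complete.

```python
from __future__ import annotations

import re
import subprocess

# YARN container IDs look like container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>,
# e.g. container_1604033334508_0001_01_000004. The owning application is
# application_<clusterTimestamp>_<appSeq>.
_CONTAINER_ID = re.compile(r"^container_(?:e\d+_)?(\d+)_(\d+)_\d+_\d+$")


def application_id_from_container(container_id: str) -> str:
    """Return the YARN application ID that owns the given container ID."""
    match = _CONTAINER_ID.match(container_id)
    if match is None:
        raise ValueError(f"not a YARN container ID: {container_id!r}")
    cluster_timestamp, app_sequence = match.groups()
    return f"application_{cluster_timestamp}_{app_sequence}"


def kill_command(app_id: str) -> list[str]:
    # Kills the whole YARN application, i.e. the JobManager and all TaskManager containers.
    return ["yarn", "application", "-kill", app_id]


def logs_command(app_id: str) -> list[str]:
    # Fetches aggregated logs; they are only complete once the application has finished.
    return ["yarn", "logs", "-applicationId", app_id]


def stop_and_collect_logs(app_id: str) -> str:
    """Kill the application, then return its aggregated logs (needs the yarn CLI)."""
    subprocess.run(kill_command(app_id), check=True)
    result = subprocess.run(logs_command(app_id), check=True, capture_output=True, text=True)
    return result.stdout
```

For example, application_id_from_container("container_1604033334508_0001_01_000004") returns "application_1604033334508_0001", which is the ID to pass to both commands.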
>>>>>
>>>>> As you can see from the YARN logs, log aggregation only works for finished applications ("End of LogType:prelaunch.out.This log file belongs to a running container (container_1603649952937_0002_01_000002) and so may not be complete.").
>>>>>
>>>>> Please stop the app, then provide the logs.
>>>>>
>>>>> On Tue, Oct 27, 2020 at 5:11 PM Diwakar Jha <diwakar.n...@gmail.com> wrote:
>>>>>
>>>>>> Hi Robert,
>>>>>>
>>>>>> Yes, I'm running Flink on EMR with YARN. Please find attached the output of "yarn logs -applicationId". I also attached the hadoop-yarn-nodemanager logs. I also followed the links below, which describe the same problem:
>>>>>> http://mail-archives.apache.org/mod_mbox/flink-user/202009.mbox/%3CCAGDv3o5WyJTrXs9Pg+Vy-b+LwgEE26iN54iqE0=f5t+m8vw...@mail.gmail.com%3E
>>>>>> https://www.talkend.net/post/75078.html
>>>>>>
>>>>>> Based on these, I changed log4j.properties. Let me know what you think, and whether you need any specific logs. I appreciate your help.
>>>>>>
>>>>>> Best,
>>>>>> Diwakar
>>>>>>
>>>>>> On Tue, Oct 27, 2020 at 12:26 AM Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>
>>>>>>> Hey Diwakar,
>>>>>>>
>>>>>>> How are you deploying Flink on EMR? Are you using YARN? If so, you could also use log aggregation to see all the logs at once, from both the JobManager and the TaskManagers ("yarn logs -applicationId <Application ID>").
>>>>>>>
>>>>>>> Could you post (or upload somewhere) all the logs you have from one run? It is much easier for us to debug something if we have the full logs (for example, the logs show the classpath you are using and how you are deploying Flink).
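Since log4j.properties was edited, one possible cause worth ruling out is a Log4j 1.x-style file. This is an assumption, not a confirmed diagnosis. Flink 1.11 uses Log4j 2 by default. Its default log4j.properties writes the main log to ${sys:log.file}, which, as far as I understand, is the file the WebUI tries to serve. The heuristic check below flags two things: log4j.* keys, and the absence of any appender writing to ${sys:log.file}. The parser is a simplified stand-in for Java .properties parsing and does not handle line continuations or escapes. parse_properties and check_log4j_config are hypothetical helpers.

```python
from __future__ import annotations

import re

# Simplified .properties reader: "key = value" or "key: value", '#'/'!' comments.
# Line continuations and escape sequences are deliberately not handled.
_SEPARATOR = re.compile(r"\s*[=:]\s*")


def parse_properties(text: str) -> dict[str, str]:
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        parts = _SEPARATOR.split(line, maxsplit=1)  # split on the first separator only
        props[parts[0]] = parts[1] if len(parts) == 2 else ""
    return props


def check_log4j_config(text: str) -> list[str]:
    """Heuristic checks for a log4j.properties used with Flink 1.11 (Log4j 2)."""
    props = parse_properties(text)
    problems = []
    if any(key.startswith("log4j.") for key in props):
        problems.append("contains Log4j 1.x keys (log4j.*), which Log4j 2's properties format does not use")
    writes_log_file = any(
        key.endswith(".fileName") and "${sys:log.file}" in value
        for key, value in props.items()
    )
    if not writes_log_file:
        problems.append("no appender writes to ${sys:log.file}")
    return problems
```

An empty result does not prove the configuration is right. It only means neither of these two common mistakes was detected.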
>>>>>>>
>>>>>>> From the information available, my guess is that you have modified your deployment in some way (use of a custom logging version, a custom deployment method, a mix of jars from both Flink 1.8 and 1.11, ...).
>>>>>>>
>>>>>>> Best,
>>>>>>> Robert
>>>>>>>
>>>>>>> On Tue, Oct 27, 2020 at 12:41 AM Diwakar Jha <diwakar.n...@gmail.com> wrote:
>>>>>>>
>>>>>>>> This is what I see in the WebUI:
>>>>>>>>
>>>>>>>> 23:19:24.263 [flink-akka.actor.default-dispatcher-1865] ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler - Failed to transfer file from TaskExecutor container_1603649952937_0002_01_000004.
>>>>>>>> java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: The file LOG does not exist on the TaskExecutor.
>>>>>>>>     at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$25(TaskExecutor.java:1742) ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>>>>>>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_252]
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_252]
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_252]
>>>>>>>>     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_252]
>>>>>>>> Caused by: org.apache.flink.util.FlinkException: The file LOG does not exist on the TaskExecutor.
>>>>>>>>     ... 5 more
>>>>>>>> 23:19:24.275 [flink-akka.actor.default-dispatcher-1865] ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler - Unhandled exception.
>>>>>>>> org.apache.flink.util.FlinkException: The file LOG does not exist on the TaskExecutor.
>>>>>>>>     at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$requestFileUploadByFilePath$25(TaskExecutor.java:1742) ~[flink-dist_2.12-1.11.0.jar:1.11.0]
>>>>>>>>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_252]
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_252]
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_252]
>>>>>>>>     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_252]
>>>>>>>>
>>>>>>>> I'd appreciate any pointers on this.
>>>>>>>>
>>>>>>>> On Mon, Oct 26, 2020 at 10:45 AM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Flink 1.11 uses SLF4J 1.7.15; the easiest way to check the log files is usually via the WebUI.
>>>>>>>>>
>>>>>>>>> On 10/26/2020 5:30 PM, Diwakar Jha wrote:
>>>>>>>>>
>>>>>>>>> I think my problem is with the SLF4J library. I'm using SLF4J 1.7 with Flink 1.11.
>>>>>>>>> If that's correct, I'd appreciate it if someone could point me to the exact SLF4J library I should use with Flink 1.11:
>>>>>>>>>
>>>>>>>>> Flink = 1.11.x;
>>>>>>>>> Slf4j = 1.7;
>>>>>>>>>
>>>>>>>>> On Sun, Oct 25, 2020 at 8:00 PM Diwakar Jha <diwakar.n...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for checking my configuration. Could you also point me to where I can see the log files? To give more detail: I'm trying to access these logs in AWS CloudWatch.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Diwakar
>>>>>>>>>>
>>>>>>>>>> On Sun, Oct 25, 2020 at 2:16 PM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> With Flink 1.11, reporters were refactored into plugins and are now accessible by default (so you no longer have to bother copying jars around).
>>>>>>>>>>>
>>>>>>>>>>> Your configuration appears to be correct, so I suggest taking a look at the log files.
>>>>>>>>>>>
>>>>>>>>>>> On 10/25/2020 9:52 PM, Diwakar Jha wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello Everyone,
>>>>>>>>>>>
>>>>>>>>>>> I'm new to Flink and I'm trying to upgrade from Flink 1.8 to Flink 1.11 on an EMR cluster. After upgrading to Flink 1.11, one of the differences I see is that I don't get any metrics. I found out that Flink 1.11 does not have the jar containing org.apache.flink.metrics.statsd.StatsDReporterFactory in /usr/lib/flink/opt, as Flink 1.8 did. Does anyone have any pointers on where to find the jar containing org.apache.flink.metrics.statsd.StatsDReporterFactory, or on how to use metrics in Flink 1.11?
>>>>>>>>>>>
>>>>>>>>>>> Things I tried:
>>>>>>>>>>>
>>>>>>>>>>> a) The setup below:
>>>>>>>>>>>
>>>>>>>>>>> metrics.reporters: stsd
>>>>>>>>>>> metrics.reporter.stsd.factory.class: org.apache.flink.metrics.statsd.StatsDReporterFactory
>>>>>>>>>>> metrics.reporter.stsd.host: localhost
>>>>>>>>>>> metrics.reporter.stsd.port: 8125
>>>>>>>>>>>
>>>>>>>>>>> b) Downloading the StatsD jar from https://mvnrepository.com/artifact/org.apache.flink/flink-metrics-statsd and putting it inside the plugins/statsd directory.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best,
>>>>>>>>>>> Diwakar Jha.
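To check which reporters a flink-conf.yaml actually defines, the metrics.reporter.<name>.<option> entries can be grouped by reporter name. This sketch assumes flink-conf.yaml is read as flat "key: value" lines, which, to my knowledge, is how Flink 1.x parses it rather than as full YAML. It honors the optional metrics.reporters list, which restricts which named reporters are started. parse_flink_conf, reporters, and missing_factory are hypothetical helpers, not Flink APIs.

```python
from __future__ import annotations


def parse_flink_conf(text: str) -> dict[str, str]:
    """Read flink-conf.yaml as flat 'key: value' lines ('#' starts a comment)."""
    conf: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition(": ")
        if sep:  # lines without ": " are skipped
            conf[key.strip()] = value.strip()
    return conf


def reporters(conf: dict[str, str]) -> dict[str, dict[str, str]]:
    """Group metrics.reporter.<name>.<option> entries by reporter name.

    If metrics.reporters is set, only the reporters it lists are kept.
    """
    prefix = "metrics.reporter."
    grouped: dict[str, dict[str, str]] = {}
    for key, value in conf.items():
        if key.startswith(prefix):
            name, _, option = key[len(prefix):].partition(".")
            grouped.setdefault(name, {})[option] = value
    enabled = conf.get("metrics.reporters")
    if enabled:
        allowed = {name.strip() for name in enabled.split(",")}
        grouped = {name: opts for name, opts in grouped.items() if name in allowed}
    return grouped


def missing_factory(grouped: dict[str, dict[str, str]]) -> list[str]:
    """Reporter names that set neither factory.class nor the older class option."""
    return [name for name, opts in grouped.items()
            if "factory.class" not in opts and "class" not in opts]
```

Applied to the setup in a) above, this sketch yields a single reporter, "stsd", with factory.class, host, and port set, and missing_factory reports nothing.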