Hi Xintong,

Thanks for the reply. I attached the lines around the Application Master start command below:
2020-05-11 21:16:16,635 DEBUG org.apache.hadoop.util.PerformanceAdvisory - Crypto codec org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec is not available.
2020-05-11 21:16:16,635 DEBUG org.apache.hadoop.util.PerformanceAdvisory - Using crypto codec org.apache.hadoop.crypto.JceAesCtrCryptoCodec.
2020-05-11 21:16:16,636 DEBUG org.apache.hadoop.hdfs.DataStreamer - DataStreamer block BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315 sending packet packet seqno: 0 offsetInBlock: 0 lastPacketInBlock: false lastByteOffsetInBlock: 1697
2020-05-11 21:16:16,637 DEBUG org.apache.hadoop.hdfs.DataStreamer - DFSClient seqno: 0 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2020-05-11 21:16:16,637 DEBUG org.apache.hadoop.hdfs.DataStreamer - DataStreamer block BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315 sending packet packet seqno: 1 offsetInBlock: 1697 lastPacketInBlock: true lastByteOffsetInBlock: 1697
2020-05-11 21:16:16,638 DEBUG org.apache.hadoop.hdfs.DataStreamer - DFSClient seqno: 1 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2020-05-11 21:16:16,638 DEBUG org.apache.hadoop.hdfs.DataStreamer - Closing old block BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315
2020-05-11 21:16:16,641 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #70 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete
2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value #70
2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: complete took 2ms
2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #71 org.apache.hadoop.hdfs.protocol.ClientProtocol.setTimes
2020-05-11 21:16:16,645 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value #71
2020-05-11 21:16:16,645 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: setTimes took 2ms
2020-05-11 21:16:16,647 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #72 org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission
2020-05-11 21:16:16,648 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value #72
2020-05-11 21:16:16,648 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: setPermission took 2ms
2020-05-11 21:16:16,654 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor - Application Master start command: $JAVA_HOME/bin/java -Xmx424m "-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly" -Dlog.file="<LOG_DIR>/jobmanager.log" -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint 1> <LOG_DIR>/jobmanager.out 2> <LOG_DIR>/jobmanager.err
2020-05-11 21:16:16,654 DEBUG org.apache.hadoop.ipc.Client - stopping client from cache: org.apache.hadoop.ipc.Client@28194a50
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setApplicationTags.
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setAttemptFailuresValidityInterval.
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setKeepContainersAcrossApplicationAttempts.
2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setNodeLabelExpression.
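One thing I noticed in the start command above: the -XX flags are wrapped in a single pair of double quotes, so they are carried into the container launch command as one quoted token rather than as separate JVM options, which would line up with the "Usage: java [-options] class [args...]" output in the YARN diagnostics further down. If those flags were set via env.java.opts in flink-conf.yaml (my assumption here, since they look like the JITWatch options from the application profiling docs), a minimal sketch of that entry without the surrounding quotes would be:

    env.java.opts: -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly

This is only a guess based on the logged command, not a verified fix.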
Xintong Song <tonysong...@gmail.com> wrote on Mon, May 11, 2020 at 10:11 PM:

> Hi Jacky,
>
> Could you search for "Application Master start command:" in the debug log
> and post the result and a few lines before & after that? This is not
> included in the clip of the attached log file.
>
> Thank you~
>
> Xintong Song
>
>
> On Tue, May 12, 2020 at 5:33 AM Jacky D <jacky.du0...@gmail.com> wrote:
>
>> Hi Robert,
>>
>> Thanks so much for the quick reply. I changed the log level to debug and
>> attached the log file.
>>
>> Thanks
>> Jacky
>>
>> Robert Metzger <rmetz...@apache.org> wrote on Mon, May 11, 2020 at 4:14 PM:
>>
>>> Thanks a lot for posting the full output.
>>>
>>> It seems that Flink is passing an invalid list of arguments to the JVM.
>>> Can you
>>> - set the root log level in conf/log4j-yarn-session.properties to DEBUG
>>> - then launch the YARN session
>>> - share the log file of the YARN session on the mailing list?
>>>
>>> I'm particularly interested in the line printed here, as it shows the
>>> JVM invocation.
>>> https://github.com/apache/flink/blob/release-1.6/flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java#L1630
>>>
>>>
>>> On Mon, May 11, 2020 at 9:56 PM Jacky D <jacky.du0...@gmail.com> wrote:
>>>
>>>> Hi Robert,
>>>>
>>>> Yes, I tried to retrieve more log info from the YARN UI. The full logs
>>>> are shown below. This happens when I try to create a Flink YARN session on
>>>> EMR after setting up the JITWatch configuration.
>>>>
>>>> 2020-05-11 19:06:09,552 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli - Error while running the Flink Yarn session.
>>>> java.lang.reflect.UndeclaredThrowableException
>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1862)
>>>>     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>>>>     at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:813)
>>>> Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
>>>>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:429)
>>>>     at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:610)
>>>>     at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:813)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>>>>     ... 2 more
>>>> Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
>>>> Diagnostics from YARN: Application application_1584459865196_0165
>>>> failed 1 times (global limit =2; local limit is =1) due to AM Container for
>>>> appattempt_1584459865196_0165_000001 exited with exitCode: 1
>>>> Failing this attempt.Diagnostics: Exception from container-launch.
>>>> Container id: container_1584459865196_0165_01_000001
>>>> Exit code: 1
>>>> Exception message: Usage: java [-options] class [args...]
>>>>            (to execute a class)
>>>>    or  java [-options] -jar jarfile [args...]
>>>>            (to execute a jar file)
>>>> where options include:
>>>>     -d32          use a 32-bit data model if available
>>>>     -d64          use a 64-bit data model if available
>>>>     -server       to select the "server" VM
>>>>                   The default VM is server,
>>>>                   because you are running on a server-class machine.
>>>>
>>>>     -cp <class search path of directories and zip/jar files>
>>>>     -classpath <class search path of directories and zip/jar files>
>>>>                   A : separated list of directories, JAR archives,
>>>>                   and ZIP archives to search for class files.
>>>>     -D<name>=<value>
>>>>                   set a system property
>>>>     -verbose:[class|gc|jni]
>>>>                   enable verbose output
>>>>     -version      print product version and exit
>>>>     -version:<value>
>>>>                   Warning: this feature is deprecated and will be removed
>>>>                   in a future release.
>>>>                   require the specified version to run
>>>>     -showversion  print product version and continue
>>>>     -jre-restrict-search | -no-jre-restrict-search
>>>>                   Warning: this feature is deprecated and will be removed
>>>>                   in a future release.
>>>>                   include/exclude user private JREs in the version search
>>>>     -? -help      print this help message
>>>>     -X            print help on non-standard options
>>>>     -ea[:<packagename>...|:<classname>]
>>>>     -enableassertions[:<packagename>...|:<classname>]
>>>>                   enable assertions with specified granularity
>>>>     -da[:<packagename>...|:<classname>]
>>>>     -disableassertions[:<packagename>...|:<classname>]
>>>>                   disable assertions with specified granularity
>>>>     -esa | -enablesystemassertions
>>>>                   enable system assertions
>>>>     -dsa | -disablesystemassertions
>>>>                   disable system assertions
>>>>     -agentlib:<libname>[=<options>]
>>>>                   load native agent library <libname>, e.g. -agentlib:hprof
>>>>                   see also, -agentlib:jdwp=help and -agentlib:hprof=help
>>>>     -agentpath:<pathname>[=<options>]
>>>>                   load native agent library by full pathname
>>>>     -javaagent:<jarpath>[=<options>]
>>>>                   load Java programming language agent, see java.lang.instrument
>>>>     -splash:<imagepath>
>>>>                   show splash screen with specified image
>>>> See http://www.oracle.com/technetwork/java/javase/documentation/index.html
>>>> for more details.
>>>>
>>>> Thanks
>>>> Jacky
>>>>
>>>> Robert Metzger <rmetz...@apache.org> wrote on Mon, May 11, 2020 at 3:42 PM:
>>>>
>>>>> Hey Jacky,
>>>>>
>>>>> The error says "The YARN application unexpectedly switched to state
>>>>> FAILED during deployment.".
>>>>> Have you tried retrieving the YARN application logs?
>>>>> Do the YARN UI / resource manager logs reveal anything on the reason
>>>>> for the deployment to fail?
>>>>>
>>>>> Best,
>>>>> Robert
>>>>>
>>>>>
>>>>> On Mon, May 11, 2020 at 9:34 PM Jacky D <jacky.du0...@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> ---------- Forwarded message ---------
>>>>>> From: Jacky D <jacky.du0...@gmail.com>
>>>>>> Date: Mon, May 11, 2020, 3:12 PM
>>>>>> Subject: Re: Flink Memory analyze on AWS EMR
>>>>>> To: Khachatryan Roman <khachatryan.ro...@gmail.com>
>>>>>>
>>>>>>
>>>>>> Hi Roman,
>>>>>>
>>>>>> Thanks for the quick response. I tried without the LogFile option but it
>>>>>> failed with the same error. I'm currently using Flink 1.6
>>>>>> (https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/application_profiling.html),
>>>>>> so I can only use JITWatch or JMC. I guess those tools are only available
>>>>>> on a standalone cluster, since the document says "Each standalone
>>>>>> JobManager, TaskManager, HistoryServer, and ZooKeeper daemon redirects
>>>>>> stdout and stderr to a file with a .out filename suffix and writes
>>>>>> internal logging to a file with a .log suffix. Java options
>>>>>> configured by the user in env.java.opts"?
>>>>>>
>>>>>> Thanks
>>>>>> Jacky
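One more pointer on getting at the container output on EMR: assuming log aggregation is enabled on the cluster (yarn.log-aggregation-enable), the <LOG_DIR>/jobmanager.out and jobmanager.err files from the failed attempt, and the ${FLINK_LOG_PREFIX}.jit file if that variable actually resolves inside the container, should be retrievable with the standard YARN CLI. For example, for the application id from the diagnostics above:

    yarn logs -applicationId application_1584459865196_0165

This is just the generic YARN command, nothing Flink-specific.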