Is your user-jar packaging and relocating Flink classes? If so, then your job actually operate against the classes provided by the cluster, which, well, just wouldn't work.

On 18/06/2020 09:34, Sourabh Mehta wrote:
Hi ,
application is using 1.10.0 but cluster is setup on 1.9.0.

Yes I do have access. please find below starting logs from cluster


2020-06-17 11:28:18,989 INFO  org.apache.shaded.flink.table.module.ModuleManager  - Got FunctionDefinition equals from module core 2020-06-17 11:28:20,538 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost 2020-06-17 11:28:20,538 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123 2020-06-17 11:28:20,538 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m 2020-06-17 11:28:20,538 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m 2020-06-17 11:28:20,538 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1 2020-06-17 11:28:20,538 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1 2020-06-17 11:28:20,539 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.execution.failover-strategy, region 2020-06-17 11:28:20,539 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, cluster-flink-poc-m 2020-06-17 11:28:20,539 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 12288 2020-06-17 11:28:20,539 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 12288 2020-06-17 11:28:20,540 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4 2020-06-17 11:28:20,540 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 28 2020-06-17 11:28:20,540 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.network.numberOfBuffers, 2048 2020-06-17 11:28:20,540 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /etc/hadoop/conf 2020-06-17 11:28:20,550 INFO  org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils  - The configuration option Key: 'taskmanager.cpu.cores' , default: null (fallback keys: []) required for local execution is not set, setting it to its default value 1.7976931348623157E308 2020-06-17 11:28:20,552 INFO  org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils  - The configuration option Key: 'taskmanager.memory.task.heap.size' , default: null (fallback keys: []) required for local execution is not set, setting it to its default value 9223372036854775807 bytes 2020-06-17 11:28:20,552 INFO  org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils  - The configuration option Key: 'taskmanager.memory.task.off-heap.size' , default: 0 bytes (fallback keys: []) required for local execution is not set, setting it to its default value 9223372036854775807 bytes 2020-06-17 11:28:20,552 INFO  org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils  - The configuration option Key: 'taskmanager.memory.network.min' , default: 64 mb (fallback keys: [{key=taskmanager.network.memory.min, isDeprecated=true}]) required for local execution is not set, setting it to its default value 64 mb 2020-06-17 11:28:20,553 INFO  org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils  - The configuration option Key: 'taskmanager.memory.network.max' , default: 1 gb (fallback keys: [{key=taskmanager.network.memory.max, isDeprecated=true}]) required for local execution is not set, setting it to its default value 64 mb 2020-06-17 11:28:20,553 INFO  org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils  - The configuration option Key: 'taskmanager.memory.managed.size' , default: null (fallback keys: [{key=taskmanager.memory.size, isDeprecated=true}]) required for local execution is not set, setting it to its default value 128 mb 2020-06-17 11:28:20,558 INFO  org.apache.shaded.flink.runtime.minicluster.MiniCluster - Starting Flink Mini Cluster 2020-06-17 11:28:20,561 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost 2020-06-17 11:28:20,561 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123 2020-06-17 11:28:20,561 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m 2020-06-17 11:28:20,561 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m 2020-06-17 11:28:20,561 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1 2020-06-17 11:28:20,561 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1 2020-06-17 11:28:20,561 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.execution.failover-strategy, region 2020-06-17 11:28:20,562 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, cluster-flink-poc-m 2020-06-17 11:28:20,562 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 12288 2020-06-17 11:28:20,562 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 12288 2020-06-17 11:28:20,562 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4 2020-06-17 11:28:20,562 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 28 2020-06-17 11:28:20,563 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.network.numberOfBuffers, 2048 2020-06-17 11:28:20,563 INFO  org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /etc/hadoop/conf 2020-06-17 11:28:20,563 INFO  org.apache.shaded.flink.runtime.minicluster.MiniCluster - Starting Metrics Registry 2020-06-17 11:28:20,610 INFO  org.apache.shaded.flink.runtime.metrics.MetricRegistryImpl  - No metrics reporter configured, no metrics will be exposed/reported. 2020-06-17 11:28:20,610 INFO  org.apache.shaded.flink.runtime.minicluster.MiniCluster - Starting RPC Service(s) 2020-06-17 11:28:20,976 INFO  akka.event.slf4j.Slf4jLogger                              - Slf4jLogger started 2020-06-17 11:28:21,070 INFO  org.apache.shaded.flink.runtime.rpc.akka.AkkaRpcServiceUtils  - Trying to start actor system at :0 2020-06-17 11:28:21,115 INFO  akka.event.slf4j.Slf4jLogger                              - Slf4jLogger started 2020-06-17 11:28:21,131 INFO  akka.remote.Remoting                              - Starting remoting 2020-06-17 11:28:21,279 INFO  akka.remote.Remoting                              - Remoting started; listening on addresses :[akka.tcp://flink-metrics@<<IP:PORT>>] 2020-06-17 11:28:21,283 INFO  org.apache.shaded.flink.runtime.rpc.akka.AkkaRpcServiceUtils  - Actor system started at akka.tcp://flink-metrics@<<IP:PORT>>



Note : I have removed a few IP addresses from the log.

On Thu, Jun 18, 2020 at 12:08 AM Till Rohrmann <trohrm...@apache.org <mailto:trohrm...@apache.org>> wrote:

    Hi Sourabh,

    do you have access to the cluster logs? They could be helpful for
    debugging the problem. Which version of Flink are you using?

    Cheers,
    Till

    On Wed, Jun 17, 2020 at 7:39 PM Sourabh Mehta
    <sourabhmehta2...@gmail.com <mailto:sourabhmehta2...@gmail.com>>
    wrote:

        No, I am not.

        On Wed, 17 Jun 2020 at 10:48 PM, Chesnay Schepler
        <ches...@apache.org <mailto:ches...@apache.org>> wrote:

            Are you by any chance creating a local environment via
            (Stream)ExecutionEnvironment#createLocalEnvironment?

            On 17/06/2020 17:05, Sourabh Mehta wrote:
            Hi Team,

            I'm  exploring flink for one of my use case, I'm facing
            some issues while running a flink job in cluster mode.
            Below are the steps I followed to setup and run job in
            cluster mode :
            1. Setup flink on google cloud dataproc using
            
https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/flink

            2. After setting up the cluster I could see the flink
            session started and could see the UI for the same.

            3 Submitted job from dataproc master node using below command

            sudo HADOOP_CONF_DIR=/etc/hadoop/conf
            /usr/lib/flink/bin/flink run -m yarn-cluster -yid
            application_1592311654771_0001 -class
            com.sm.flink.FlinkDriver
            /usr/lib/flink/lib/flink-1.0.10-sm-SNAPSHOT.jar
            hdfs://cluster-flink-poc-m:8020/user/flink/rocksdb/

            After running the job I see the job started successfully
            but created a mini local cluster and ran in local mode. I
            don't see any jobs submitted to JobManger and I also see
            0 task managers on UI.

            Can someone please help me understand here?, do let me
            know what input is required to investigate the same.





Reply via email to