Thanks Jacob, the job is getting a bit further now, but I am seeing a different issue.
The job fails and never moves into 'running'. The job looks to be launching correctly:

[10.201.11.64] out: 13:48:59.450 [IPC Client (2052489518) connection to porter-samza-1.porter.int/127.0.0.1:8032 from centos] DEBUG org.apache.hadoop.ipc.Client - IPC Client (2052489518) connection to porter-samza-1.porter.int/127.0.0.1:8032 from centos got value #3
[10.201.11.64] out: 13:48:59.451 [main] DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: getApplicationReport took 2ms
[10.201.11.64] out: 13:48:59.452 [main] INFO org.apache.samza.job.JobRunner - job started successfully - Running
[10.201.11.64] out: 13:48:59.452 [main] INFO org.apache.samza.job.JobRunner - exiting

When I dig into the userlogs, the job never moves past the starting container. stderr contains:

[centos@porter-yarn-slave-1 container_1502753192195_0007_02_000001]$ more stderr
/bin/bash: /tmp/hadoop-centos/nm-local-dir/usercache/centos/appcache/application_1502753192195_0007/container_1502753192195_0007_02_000001/__package/bin/run-am.sh: No such file or directory

When I poke at the directory structure, the directory is empty at both appcache/ and filecache/:

[centos@porter-yarn-slave-1 container_1502753192195_0007_02_000001]$ ls /tmp/hadoop-centos/nm-local-dir/usercache/centos/appcache/
[centos@porter-yarn-slave-1 container_1502753192195_0007_02_000001]$

Jeremiah Adams
Software Engineer
www.helixeducation.com
Blog | Twitter | Facebook | LinkedIn

________________________________________
From: Jacob Maes <jacob.m...@gmail.com>
Sent: Monday, August 14, 2017 3:12 PM
To: dev@samza.apache.org
Subject: Re: Issue with TopicExistsException in 0.13.0

Correction, the exception seems to have moved between kafka version 0.10.0.1 and 0.10.1.1.

Here's the patch that changed both the kafka version and the import statement for TopicExistsException:
https://url.serverdata.net/?aZyQRg2CGut2qgyHrdHxA3r2wRZBhFBnHgQFe8bv7-emnODgdhciwPkVKB_BE-ZnZmhwA18Q7rimVruRFx5g0vsvC9cGt2jrAYfAucx0goYepLp8ZyfPAPxCv0Xh9CQVXTrqVMnByrbWTNcczkXashg2zljIWFPYiRKbG_5H2BvM~

So, you'll want to be using kafka 0.10.1.1.

On Mon, Aug 14, 2017 at 2:00 PM, Jacob Maes <jacob.m...@gmail.com> wrote:

> Hey Jeremiah,
>
> It looks like the TopicExistsException should be handled by the system
> admin and not rethrown:
> https://url.serverdata.net/?aZyQRg2CGut2qgyHrdHxA3r2wRZBhFBnHgQFe8bv7-eli7bCaPPi9BUx7SPWnrBZJsWvG7fAvAkJZWsHy8YrwNKbg0eJOFg9N9UDBAB2ODwZOGu2TuRvoZ9NyWbJmDt_g
> b84b20ffd2/samza-kafka/src/main/scala/org/apache/samza/
> system/kafka/KafkaSystemAdmin.scala#L442
>
> I have a theory what's happening here. I think the TopicExistsException
> was moved from the org.apache.kafka.common package in kafka 0.8.2
> https://url.serverdata.net/?aGYQUT2PfoZ_Oed64B3A9noxqDhLnbYFqBHw3jimnO5vi3F8i7RsxdGks87OLmlvVSbRBbvJOT8rWW0hz_3vOmg~~
> common/TopicExistsException.html
>
> to the org.apache.kafka.common.errors package in kafka 0.10
> https://url.serverdata.net/?atT2ehXMhI-BK13fx1xs1ts_Kf81VsaPrd-NHf6sUGn2ecNA4kUI3dYoA0607M-H1sV2xtByyu3eJSKvz3Cecre4DPAttj3Qs9n_BrkW6lDT8Xt-ACWGgEYMDI0JoIyzV
> TopicExistsException.html
>
> And Samza 0.13 expects the latter.
>
> Can you double check that your job is actually using kafka 0.10.1.1,
> perhaps by inspecting the jars?
>
> -Jake
>
> On Mon, Aug 14, 2017 at 11:55 AM, Jeremiah Adams <jad...@helixeducation.com> wrote:
>
>> I am having an issue with topic creation after updating dependencies. I
>> bumped samza dependencies from scala 2.10 v 0.10.1 to scala 2.11 0.13.0
>> and org.apache.kafka dependency from kafka_2.10 0.8.1 to kafka_2.11
>> 0.10.1.1.
>> I am seeing an error that the topic already exists and the job gets stuck
>> in a loop with logs like below. The job will not move into 'accepted' state
>> in yarn and never consumes the topics it should be consuming.
>> The zk, yarn and kafka nodes are newly deployed. I'm at a loss, any ideas?
>>
>>
>> [10.201.9.105] out: 17:18:49.347 [main] DEBUG org.apache.samza.system.kafka.KafkaSystemAdmin - Exception detail:
>> [10.201.9.105] out: kafka.common.TopicExistsException: Topic "__samza_coordinator_inquiry-submission_1" already exists.
>> [10.201.9.105] out: at kafka.admin.AdminUtils$.createOrUpdateTopicPartitionAssignmentPathInZK(AdminUtils.scala:420)
>> [10.201.9.105] out: at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:404)
>> [10.201.9.105] out: at org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$createStream$1.apply(KafkaSystemAdmin.scala:425)
>> [10.201.9.105] out: at org.apache.samza.system.kafka.KafkaSystemAdmin$$anonfun$createStream$1.apply(KafkaSystemAdmin.scala:422)
>> [10.201.9.105] out: at org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:82)
>> [10.201.9.105] out: at org.apache.samza.system.kafka.KafkaSystemAdmin.createStream(KafkaSystemAdmin.scala:421)
>> [10.201.9.105] out: at org.apache.samza.system.kafka.KafkaSystemAdmin.createCoordinatorStream(KafkaSystemAdmin.scala:336)
>> [10.201.9.105] out: at org.apache.samza.job.JobRunner.run(JobRunner.scala:88)
>> [10.201.9.105] out: at org.apache.samza.job.JobRunner$.doOperation(JobRunner.scala:52)
>> [10.201.9.105] out: at org.apache.samza.job.JobRunner$.main(JobRunner.scala:47)
>> [10.201.9.105] out: at org.apache.samza.job.JobRunner.main(JobRunner.scala)
>> [10.201.9.105] out: 17:18:49.347 [main-SendThread(ip-10-201-9-243.us-west-2.compute.internal:2181)] DEBUG org.apache.zookeeper.ClientCnxn - An exception was thrown while closing send thread for session 0x25de16b1f500013 : Unable to read additional data from server sessionid 0x25de16b1f500013, likely server has closed socket
>> [10.201.9.105] out: 17:18:49.349 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down
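
One way to do the jar check Jake suggests is to ask the JVM which jar each TopicExistsException class resolves from. The following is only a sketch under assumptions: the class name KafkaClasspathCheck and the helper locate are made up for illustration and are not part of Samza or Kafka. The two class names it probes come from this thread: kafka.common.TopicExistsException is the one in the stack trace, and org.apache.kafka.common.errors.TopicExistsException is the one Samza 0.13 is said to expect.

```java
import java.security.CodeSource;

// Hypothetical helper, not part of Samza or Kafka.
class KafkaClasspathCheck {
    // Class that appears in the stack trace from the failing job.
    static final String TRACE_CLASS = "kafka.common.TopicExistsException";
    // Class that Samza 0.13 expects, according to the thread.
    static final String EXPECTED_CLASS = "org.apache.kafka.common.errors.TopicExistsException";

    /**
     * Returns the location (jar or directory) a class is loaded from,
     * or null if the class cannot be found on the classpath.
     */
    public static String locate(String className) {
        try {
            // initialize=false: only resolve the class, do not run static initializers.
            Class<?> c = Class.forName(className, false, KafkaClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes have no code source; report that instead of failing.
            return src == null ? "(no code source, likely a JDK class)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {TRACE_CLASS, EXPECTED_CLASS}) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "not found" : where));
        }
    }
}
```

Run it with the same jars the job is packaged with on the classpath (the exact directory depends on how the job package is laid out). The jar file names it prints should show which Kafka version is actually being picked up.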