Hi Yi, I pulled from master and built this morning.
Yes, that’s the output from JobRunner. I also tried setting a job.id to see if this was an issue migrating from an old task checkpoint topic but I got the same result. Would you like me to open a jira ticket?

Thanks,
Rick

> On Nov 12, 2015, at 12:59 PM, Yi Pan <nickpa...@gmail.com> wrote:
>
> Hi, Rick,
>
> Did you get the fix in SAMZA-723 in your test? And could you confirm that
> the errors are from JobRunner log?
>
> -Yi
>
> On Thu, Nov 12, 2015 at 8:48 AM, Rick Mangi <r...@chartbeat.com> wrote:
>
>> Hi,
>>
>> I’m trying to migrate our samza jobs to 0.10.0 snapshot (built against the
>> latest). Everything works fine running locally (although I had to make some
>> changes to the local grid’s kafka since the checkpointing seems to require
>> replication_factor > 1) but when I deploy it against my production yarn
>> cluster I get these errors.
>>
>> [yarnmaster01] out: 2015-11-12 10:40:53 ZkClient [INFO] zookeeper state changed (SyncConnected)
>> [yarnmaster01] out: 2015-11-12 10:40:53 ZkEventThread [INFO] Terminate ZkClient event thread.
>> [yarnmaster01] out: 2015-11-12 10:40:53 ZooKeeper [INFO] Session: 0x250233cdf57f2fa closed
>> [yarnmaster01] out: 2015-11-12 10:40:53 ClientCnxn [INFO] EventThread shut down
>> [yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemAdmin [INFO] Coordinator stream __samza_coordinator_metrics-reporter_1 already exists.
>> [yarnmaster01] out: 2015-11-12 10:40:53 JobRunner [INFO] Storing config in coordinator stream.
>> [yarnmaster01] out: 2015-11-12 10:40:53 CoordinatorStreamSystemProducer [INFO] Starting coordinator stream producer.
>> [yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemProducer [INFO] Creating a new producer for system mykafka.
>> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [INFO] ProducerConfig values:
>> [yarnmaster01] out: value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
>> [yarnmaster01] out: key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
>> [yarnmaster01] out: block.on.buffer.full = true
>> [yarnmaster01] out: retry.backoff.ms = 100
>> [yarnmaster01] out: buffer.memory = 33554432
>> [yarnmaster01] out: batch.size = 16384
>> [yarnmaster01] out: metrics.sample.window.ms = 30000
>> [yarnmaster01] out: metadata.max.age.ms = 300000
>> [yarnmaster01] out: receive.buffer.bytes = 32768
>> [yarnmaster01] out: timeout.ms = 30000
>> [yarnmaster01] out: max.in.flight.requests.per.connection = 1
>> [yarnmaster01] out: bootstrap.servers = [devstream01.chartbeat.net:9092]
>> [yarnmaster01] out: metric.reporters = []
>> [yarnmaster01] out: client.id = samza_producer-metrics_reporter-1-1447342853273-4
>> [yarnmaster01] out: compression.type = none
>> [yarnmaster01] out: retries = 2147483647
>> [yarnmaster01] out: max.request.size = 1048576
>> [yarnmaster01] out: send.buffer.bytes = 131072
>> [yarnmaster01] out: acks = 1
>> [yarnmaster01] out: reconnect.backoff.ms = 10
>> [yarnmaster01] out: linger.ms = 0
>> [yarnmaster01] out: metrics.num.samples = 2
>> [yarnmaster01] out: metadata.fetch.timeout.ms = 60000
>> [yarnmaster01] out:
>> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration batch.num.messages = null was supplied but isn't a known config.
>> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration producer.type = null was supplied but isn't a known config.
>> [yarnmaster01] out: Exception in thread "main" org.apache.samza.SamzaException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
>> [yarnmaster01] out: at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.send(CoordinatorStreamSystemProducer.java:115)
>> [yarnmaster01] out: at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.writeConfig(CoordinatorStreamSystemProducer.java:132)
>> [yarnmaster01] out: at org.apache.samza.job.JobRunner.run(JobRunner.scala:85)
>> [yarnmaster01] out: at org.apache.samza.job.JobRunner$.main(JobRunner.scala:43)
>> [yarnmaster01] out: at org.apache.samza.job.JobRunner.main(JobRunner.scala)
>> [yarnmaster01] out: Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
>> [yarnmaster01] out:
>>
>>
>> Warning: run() received nonzero return code 1 while executing './bin/run-job.sh -config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/conf/metrics_reporter.properties'!
>>
>>
>> This looks similar to https://issues.apache.org/jira/browse/SAMZA-560 but
>> I’m not using a StreamAppender in log4j.
>>
>> Any ideas? My first thought is that I might have to delete the existing
>> checkpoint topics but that would mean we can’t migrate completely until the
>> 0.10.0 release unless we want to run snapshot code in production.
>>
>> Thanks!
>>
>> Rick