Hi, I’m trying to migrate our Samza jobs to the 0.10.0 snapshot (built against the latest). Everything works fine running locally (although I had to make some changes to the local grid’s Kafka, since checkpointing seems to require replication_factor > 1), but when I deploy against my production YARN cluster I get these errors:

[yarnmaster01] out: 2015-11-12 10:40:53 ZkClient [INFO] zookeeper state changed (SyncConnected)
[yarnmaster01] out: 2015-11-12 10:40:53 ZkEventThread [INFO] Terminate ZkClient event thread.
[yarnmaster01] out: 2015-11-12 10:40:53 ZooKeeper [INFO] Session: 0x250233cdf57f2fa closed
[yarnmaster01] out: 2015-11-12 10:40:53 ClientCnxn [INFO] EventThread shut down
[yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemAdmin [INFO] Coordinator stream __samza_coordinator_metrics-reporter_1 already exists.
[yarnmaster01] out: 2015-11-12 10:40:53 JobRunner [INFO] Storing config in coordinator stream.
[yarnmaster01] out: 2015-11-12 10:40:53 CoordinatorStreamSystemProducer [INFO] Starting coordinator stream producer.
[yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemProducer [INFO] Creating a new producer for system mykafka.
[yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [INFO] ProducerConfig values:
[yarnmaster01] out: value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
[yarnmaster01] out: key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
[yarnmaster01] out: block.on.buffer.full = true
[yarnmaster01] out: retry.backoff.ms = 100
[yarnmaster01] out: buffer.memory = 33554432
[yarnmaster01] out: batch.size = 16384
[yarnmaster01] out: metrics.sample.window.ms = 30000
[yarnmaster01] out: metadata.max.age.ms = 300000
[yarnmaster01] out: receive.buffer.bytes = 32768
[yarnmaster01] out: timeout.ms = 30000
[yarnmaster01] out: max.in.flight.requests.per.connection = 1
[yarnmaster01] out: bootstrap.servers = [devstream01.chartbeat.net:9092]
[yarnmaster01] out: metric.reporters = []
[yarnmaster01] out: client.id = samza_producer-metrics_reporter-1-1447342853273-4
[yarnmaster01] out: compression.type = none
[yarnmaster01] out: retries = 2147483647
[yarnmaster01] out: max.request.size = 1048576
[yarnmaster01] out: send.buffer.bytes = 131072
[yarnmaster01] out: acks = 1
[yarnmaster01] out: reconnect.backoff.ms = 10
[yarnmaster01] out: linger.ms = 0
[yarnmaster01] out: metrics.num.samples = 2
[yarnmaster01] out: metadata.fetch.timeout.ms = 60000
[yarnmaster01] out:
[yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration batch.num.messages = null was supplied but isn't a known config.
[yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration producer.type = null was supplied but isn't a known config.
[yarnmaster01] out: Exception in thread "main" org.apache.samza.SamzaException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
[yarnmaster01] out: at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.send(CoordinatorStreamSystemProducer.java:115)
[yarnmaster01] out: at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.writeConfig(CoordinatorStreamSystemProducer.java:132)
[yarnmaster01] out: at org.apache.samza.job.JobRunner.run(JobRunner.scala:85)
[yarnmaster01] out: at org.apache.samza.job.JobRunner$.main(JobRunner.scala:43)
[yarnmaster01] out: at org.apache.samza.job.JobRunner.main(JobRunner.scala)
[yarnmaster01] out: Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
[yarnmaster01] out:
Warning: run() received nonzero return code 1 while executing './bin/run-job.sh -config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/conf/metrics_reporter.properties'!

This looks similar to https://issues.apache.org/jira/browse/SAMZA-560, but I’m not using a StreamAppender in log4j.

Any ideas? My first thought is that I might have to delete the existing checkpoint topics, but that would mean we can’t migrate completely until the 0.10.0 release unless we want to run snapshot code in production.

Thanks!
Rick