Hi,

I’m trying to migrate our Samza jobs to the 0.10.0 snapshot (built against the
latest). Everything works fine running locally (although I had to make some
changes to the local grid’s Kafka, since checkpointing seems to require
replication_factor > 1), but when I deploy against my production YARN cluster
I get these errors:

[yarnmaster01] out: 2015-11-12 10:40:53 ZkClient [INFO] zookeeper state changed (SyncConnected)
[yarnmaster01] out: 2015-11-12 10:40:53 ZkEventThread [INFO] Terminate ZkClient event thread.
[yarnmaster01] out: 2015-11-12 10:40:53 ZooKeeper [INFO] Session: 0x250233cdf57f2fa closed
[yarnmaster01] out: 2015-11-12 10:40:53 ClientCnxn [INFO] EventThread shut down
[yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemAdmin [INFO] Coordinator stream __samza_coordinator_metrics-reporter_1 already exists.
[yarnmaster01] out: 2015-11-12 10:40:53 JobRunner [INFO] Storing config in coordinator stream.
[yarnmaster01] out: 2015-11-12 10:40:53 CoordinatorStreamSystemProducer [INFO] Starting coordinator stream producer.
[yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemProducer [INFO] Creating a new producer for system mykafka.
[yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [INFO] ProducerConfig values:
[yarnmaster01] out:     value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
[yarnmaster01] out:     key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
[yarnmaster01] out:     block.on.buffer.full = true
[yarnmaster01] out:     retry.backoff.ms = 100
[yarnmaster01] out:     buffer.memory = 33554432
[yarnmaster01] out:     batch.size = 16384
[yarnmaster01] out:     metrics.sample.window.ms = 30000
[yarnmaster01] out:     metadata.max.age.ms = 300000
[yarnmaster01] out:     receive.buffer.bytes = 32768
[yarnmaster01] out:     timeout.ms = 30000
[yarnmaster01] out:     max.in.flight.requests.per.connection = 1
[yarnmaster01] out:     bootstrap.servers = [devstream01.chartbeat.net:9092]
[yarnmaster01] out:     metric.reporters = []
[yarnmaster01] out:     client.id = samza_producer-metrics_reporter-1-1447342853273-4
[yarnmaster01] out:     compression.type = none
[yarnmaster01] out:     retries = 2147483647
[yarnmaster01] out:     max.request.size = 1048576
[yarnmaster01] out:     send.buffer.bytes = 131072
[yarnmaster01] out:     acks = 1
[yarnmaster01] out:     reconnect.backoff.ms = 10
[yarnmaster01] out:     linger.ms = 0
[yarnmaster01] out:     metrics.num.samples = 2
[yarnmaster01] out:     metadata.fetch.timeout.ms = 60000
[yarnmaster01] out:
[yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration batch.num.messages = null was supplied but isn't a known config.
[yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration producer.type = null was supplied but isn't a known config.
[yarnmaster01] out: Exception in thread "main" org.apache.samza.SamzaException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
[yarnmaster01] out:     at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.send(CoordinatorStreamSystemProducer.java:115)
[yarnmaster01] out:     at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.writeConfig(CoordinatorStreamSystemProducer.java:132)
[yarnmaster01] out:     at org.apache.samza.job.JobRunner.run(JobRunner.scala:85)
[yarnmaster01] out:     at org.apache.samza.job.JobRunner$.main(JobRunner.scala:43)
[yarnmaster01] out:     at org.apache.samza.job.JobRunner.main(JobRunner.scala)
[yarnmaster01] out: Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
[yarnmaster01] out:


Warning: run() received nonzero return code 1 while executing './bin/run-job.sh -config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/conf/metrics_reporter.properties'!


This looks similar to https://issues.apache.org/jira/browse/SAMZA-560, but I’m
not using a StreamAppender in log4j.

Any ideas? My first thought is that I might have to delete the existing
checkpoint topics, but that would mean we can’t migrate completely until the
0.10.0 release unless we want to run snapshot code in production.
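For what it’s worth, the local workaround mentioned above (checkpointing wanting
replication_factor > 1) can probably also be handled per job instead of in the
broker config. A sketch, assuming the 0.10.0 property names
job.coordinator.replication.factor and task.checkpoint.replication.factor (both
documented with a default of 3); I haven’t confirmed they behave the same in the
snapshot, and this is only meant for a single-broker local grid:

```properties
# Local single-broker grid only (assumption: 0.10.0 property names).
# Replication factor used when Samza creates the coordinator stream topic.
job.coordinator.replication.factor=1
# Replication factor used when the Kafka checkpoint manager creates its topic.
task.checkpoint.replication.factor=1
```

These settings only matter when the topics are created, so they would not change
anything for topics that already exist on the production cluster.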

Thanks!

Rick

