What branch of Camus are you using? We have our own fork in which we updated the Camus dependency from the Avro snapshot of the REST schema repository to the new "official" one you mention at github.com/schema-repo. I was not aware of a branch on the main LinkedIn Camus repo that has this.
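In case it saves you time while I dig up our invocation details: the core of the update in our fork was swapping the dependency in the camus-schema-registry-avro pom (plus adjusting the imports to match). Roughly like this, though I'm writing the coordinates from memory, so verify them and pick the current version from Maven Central:

  <dependency>
    <groupId>org.schemarepo</groupId>
    <artifactId>schema-repo-client</artifactId>
    <version><!-- current schema-repo release --></version>
  </dependency>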
That being said, we are doing essentially this same thing, except that we use a single shaded uber-jar. I believe the Maven project builds this automatically, doesn't it? I'll take a look at the details of how we are invoking this on our site and get back to you.

Cheers,
Thunder

-----Original Message-----
From: max square [max2subscr...@gmail.com]
Received: Wednesday, 04 Mar 2015, 5:38PM
To: users@kafka.apache.org [users@kafka.apache.org]
Subject: Trying to get kafka data to Hadoop

Hi all,

I have browsed through different conversations around Camus, and bring this up as a kind of Kafka question. I know it's not the most orthodox, but if someone has some thoughts I'd appreciate it.

That said, I am trying to set up Camus against a 3-node Kafka 0.8.2.1 cluster, using the project that is rebuilding the Avro schema repo <https://github.com/schema-repo/schema-repo>. All of the Avro schemas for the topics are published correctly.

I am building Camus and starting the job with:

  hadoop jar camus-example-0.1.0-SNAPSHOT-shaded.jar \
    com.linkedin.camus.etl.kafka.CamusJob \
    -libjars $CAMUS_LIBJARS \
    -D mapreduce.job.user.classpath.first=true \
    -P config.properties

where CAMUS_LIBJARS is an environment variable that holds all the libjars that the mvn package command generates. I have also set the following properties to configure the job:

  camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.LatestSchemaKafkaAvroMessageDecoder
  kafka.message.coder.schema.registry.class=com.linkedin.camus.schemaregistry.AvroRestSchemaRegistry
  etl.schema.registry.url=http://10.0.14.25:2876/schema-repo/

When I execute the job I get an exception indicating the AvroRestSchemaRegistry class can't be found (I've double-checked that it's part of the libjars). I wanted to ask if this is the correct way to set up this integration, and if anyone has pointers on why the job is not finding the AvroRestSchemaRegistry class.

Thanks in advance for the help!

Max

The complete stack trace follows:

[CamusJob] - failed to create decoder
com.linkedin.camus.coders.MessageDecoderException: com.linkedin.camus.coders.MessageDecoderException: java.lang.ClassNotFoundException: com.linkedin.camus.schemaregistry.AvroRestSchemaRegistry
        at com.linkedin.camus.etl.kafka.coders.MessageDecoderFactory.createMessageDecoder(MessageDecoderFactory.java:29)
        at com.linkedin.camus.etl.kafka.mapred.EtlInputFormat.createMessageDecoder(EtlInputFormat.java:391)
        at com.linkedin.camus.etl.kafka.mapred.EtlInputFormat.getSplits(EtlInputFormat.java:256)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1107)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1124)
        at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:178)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1023)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:976)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:976)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:582)
        at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:335)
        at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:563)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at com.linkedin.camus.etl.kafka.CamusJob.main(CamusJob.java:518)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.
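P.S. One thing jumps out of your trace: the ClassNotFoundException is thrown from EtlInputFormat.getSplits, which runs in the client JVM during job submission, before anything is shipped to the cluster. If I remember right, Camus loads the registry class via Class.forName(), which uses the classloader that loaded Camus itself rather than the one that -libjars sets up, so jars that only appear in the -libjars list can be invisible at that point. A sketch of what I would try, where the first jar path is a placeholder for wherever your build puts the camus-schema-registry-avro artifact, and HADOOP_CLASSPATH handling can differ across Hadoop versions:

  # Make the schema registry classes visible to the submitting JVM as well,
  # not just to the tasks that -libjars covers.
  export HADOOP_CLASSPATH=/path/to/camus-schema-registry-avro.jar:$HADOOP_CLASSPATH
  hadoop jar camus-example-0.1.0-SNAPSHOT-shaded.jar \
    com.linkedin.camus.etl.kafka.CamusJob -P config.properties

If your shaded uber-jar already bundles the camus-schema-registry-avro classes, you should be able to drop the -libjars list entirely, which is what we do on our side.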