Thunder,

thanks for your reply. The Hadoop job is now correctly configured (the
client was not getting the correct jars); however, I am now getting Avro
formatting exceptions because of the schema format the schema-repo server
uses. I think I will do something similar and create our own branch that
uses the schema repo. Any gotchas you can advise on?

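For context, here is a minimal sketch of the adapter I have in mind: a schema
registry for Camus backed by the schema-repo REST server. The class shape, the
method names, and the "{subject}/latest" endpoint are assumptions from memory,
not verified against either codebase:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Properties;

import org.apache.avro.Schema;

// Hypothetical sketch only: fetch the latest Avro schema for a topic from a
// schema-repo server over HTTP.
public class SchemaRepoRestRegistry {

    private String baseUrl; // e.g. http://10.0.14.25:2876/schema-repo/

    public void init(Properties props) {
        baseUrl = props.getProperty("etl.schema.registry.url");
    }

    public Schema getLatestSchemaByTopic(String topic) throws Exception {
        // Assumed endpoint: GET {baseUrl}{subject}/latest
        URL url = new URL(baseUrl + topic + "/latest");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        StringBuilder body = new StringBuilder();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        } finally {
            in.close();
        }
        // If the server returns "id<TAB>schema", strip the id before parsing.
        String payload = body.toString();
        int tab = payload.indexOf('\t');
        return new Schema.Parser().parse(
                tab >= 0 ? payload.substring(tab + 1) : payload);
    }
}

The real version would implement whatever interface Camus expects from
kafka.message.coder.schema.registry.class, but the HTTP plumbing would look
roughly like this.
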
Thanks!

Max

On Wed, Mar 4, 2015 at 9:24 PM, Thunder Stumpges <tstump...@ntent.com>
wrote:

> What branch of camus are you using? We have our own fork in which we updated
> the camus dependency from the Avro snapshot of the REST Schema Repository
> to the new "official" one you mention at github.com/schema-repo. I was not
> aware of a branch on the main LinkedIn camus repo that has this.
>
> That being said, we are doing essentially this same thing; however, we are
> using a single shaded uber-jar. I believe the maven project builds this
> automatically, doesn't it?
>
> I'll take a look at the details of how we are invoking this on our site
> and get back to you.
>
> Cheers,
> Thunder
>
>
> -----Original Message-----
> From: max square [max2subscr...@gmail.com]
> Received: Wednesday, 04 Mar 2015, 5:38PM
> To: users@kafka.apache.org [users@kafka.apache.org]
> Subject: Trying to get kafka data to Hadoop
>
> Hi all,
>
> I have browsed through different conversations around Camus, and I bring this
> up as somewhat of a Kafka question. I know it's not the most orthodox, but if
> someone has some thoughts I'd appreciate it.
>
> That said, I am trying to set up Camus with a 3-node Kafka 0.8.2.1 cluster,
> using the Avro Schema-Repo project
> <https://github.com/schema-repo/schema-repo>. All of the Avro schemas for
> the topics are published correctly. I am building Camus and using:
>
> hadoop jar camus-example-0.1.0-SNAPSHOT-shaded.jar \
>   com.linkedin.camus.etl.kafka.CamusJob \
>   -libjars $CAMUS_LIBJARS \
>   -D mapreduce.job.user.classpath.first=true \
>   -P config.properties
>
> as the command to start the job, where CAMUS_LIBJARS is an environment
> variable that holds all the libjars that the mvn package command generates.
>
> I have also set the following properties to configure the job:
> camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.LatestSchemaKafkaAvroMessageDecoder
>
> kafka.message.coder.schema.registry.class=com.linkedin.camus.schemaregistry.AvroRestSchemaRegistry
>
> etl.schema.registry.url=http://10.0.14.25:2876/schema-repo/
>
> When I execute the job I get an exception indicating that the
> AvroRestSchemaRegistry class can't be found (I've double-checked that it's
> part of the libjars). I wanted to ask if this is the correct way to set up
> this integration, and if anyone has pointers on why the job is not finding
> the class AvroRestSchemaRegistry.
>
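> My working assumption (not verified against the Camus source) is that the
> decoder factory resolves this class reflectively while the client computes
> the splits, so the jar has to be visible to the client JVM and not only to
> the tasks. A tiny illustration of that lookup:
>
> import java.util.Properties;
>
> public class RegistryLoadCheck {
>     public static void main(String[] args) throws Exception {
>         // Illustration only: resolve the registry class by name, the way a
>         // decoder factory presumably does. This happens at submit time in
>         // the client JVM (see JobClient.writeNewSplits in the trace below),
>         // so the class must be on the client classpath (e.g. via
>         // HADOOP_CLASSPATH); depending on the Hadoop version, -libjars
>         // alone may not make it visible to Class.forName here.
>         Properties props = new Properties();
>         props.setProperty("kafka.message.coder.schema.registry.class",
>                 "com.linkedin.camus.schemaregistry.AvroRestSchemaRegistry");
>         String name =
>                 props.getProperty("kafka.message.coder.schema.registry.class");
>         Class<?> clazz = Class.forName(name); // the CNFE originates here
>         System.out.println("Loaded " + clazz.getName());
>     }
> }
>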
> Thanks in advance for the help!
>
> Max
>
> The complete stack trace follows:
>
> [CamusJob] - failed to create decoder
>
> com.linkedin.camus.coders.MessageDecoderException: com.linkedin.camus.coders.MessageDecoderException: java.lang.ClassNotFoundException: com.linkedin.camus.schemaregistry.AvroRestSchemaRegistry
>         at com.linkedin.camus.etl.kafka.coders.MessageDecoderFactory.createMessageDecoder(MessageDecoderFactory.java:29)
>         at com.linkedin.camus.etl.kafka.mapred.EtlInputFormat.createMessageDecoder(EtlInputFormat.java:391)
>         at com.linkedin.camus.etl.kafka.mapred.EtlInputFormat.getSplits(EtlInputFormat.java:256)
>         at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1107)
>         at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1124)
>         at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:178)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1023)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:976)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:976)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:582)
>         at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:335)
>         at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:563)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>         at com.linkedin.camus.etl.kafka.CamusJob.main(CamusJob.java:518)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.
