You can use this Maven dependency:
<dependency>
  <groupId>com.twitter</groupId>
  <artifactId>chill-avro</artifactId>
  <version>0.4.0</version>
</dependency>
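
In case it helps, here is a minimal sketch of how the chill registrar could be wired into Spark's Kryo serializer. The names `ScalaCollectionsRegistrator` and `KryoSetupSketch` are made up for illustration, and I am assuming that `com.twitter.chill.AllScalaRegistrar` is on your classpath. Treat it as a starting point, not as a tested solution:

```scala
import com.esotericsoftware.kryo.Kryo
import com.twitter.chill.AllScalaRegistrar
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical registrator: delegates to chill's AllScalaRegistrar so that
// Kryo knows about the Scala collection classes chill registers.
class ScalaCollectionsRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    new AllScalaRegistrar().apply(kryo)
  }
}

object KryoSetupSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("avro-kryo-sketch")
      // Switch from the default Java serialization to Kryo.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Tell Spark which registrator class to instantiate on each executor.
      .set("spark.kryo.registrator", classOf[ScalaCollectionsRegistrator].getName)
    val sc = new SparkContext(conf)
    // ... read the Avro files here ...
    sc.stop()
  }
}
```

The master URL is left out on purpose, on the assumption that the job is launched through spark-submit.
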
Simone Franzini, PhD
http://www.linkedin.com/in/simonefranzini
On Tue, Dec 9, 2014 at 9:53 AM, Cristovao Jose Domingues Cordeiro <
[email protected]> wrote:
> Thanks for the reply!
>
> I have in fact tried your code, but I lack the Twitter chill package and
> cannot find it online. So I am now trying this:
> http://spark.apache.org/docs/latest/tuning.html#data-serialization . But
> in case I can't get it to work, could you tell me where to get the Twitter
> package you used?
>
> Thanks
>
> Cumprimentos / Best regards,
> Cristóvão José Domingues Cordeiro
> IT Department - 28/R-018
> CERN
> ------------------------------
> *From:* Simone Franzini [[email protected]]
> *Sent:* 09 December 2014 16:42
> *To:* Cristovao Jose Domingues Cordeiro; user
>
> *Subject:* Re: NullPointerException When Reading Avro Sequence Files
>
> Hi Cristovao,
>
> I have seen a very similar issue that I have posted about in this thread:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-NPE-with-Array-td19797.html
> I think your main issue here is somewhat similar, in that the MapWrapper
> Scala class is not registered. This gets registered by the Twitter
> chill-scala AllScalaRegistrar class that you are currently not using.
>
> As far as I understand, in order to use Avro with Spark, you also have
> to use Kryo. This means you have to use the Spark KryoSerializer. This in
> turn uses Twitter chill. I posted the basic code that I am using here:
>
>
> http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-read-this-avro-file-using-spark-amp-scala-td19400.html#a19491
>
> Maybe there is a simpler solution to your problem but I am not that much
> of an expert yet. I hope this helps.
>
> Simone Franzini, PhD
>
> http://www.linkedin.com/in/simonefranzini
>
> On Tue, Dec 9, 2014 at 8:50 AM, Cristovao Jose Domingues Cordeiro <
> [email protected]> wrote:
>
>> Hi Simone,
>>
>> thanks, but I don't think that's it.
>> I've tried several libraries with the --jars argument. Some do give the
>> error you mentioned. But other times (when I use the right version, I
>> guess) I get the following:
>> 14/12/09 15:45:54 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
>> java.io.NotSerializableException: scala.collection.convert.Wrappers$MapWrapper
>>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
>>     at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
>>
>>
>> Which is odd, since I am reading an Avro file that I wrote with the same
>> piece of code:
>> https://gist.github.com/MLnick/5864741781b9340cb211
>>
>> Cumprimentos / Best regards,
>> Cristóvão José Domingues Cordeiro
>> IT Department - 28/R-018
>> CERN
>> ------------------------------
>> *From:* Simone Franzini [[email protected]]
>> *Sent:* 06 December 2014 15:48
>> *To:* Cristovao Jose Domingues Cordeiro
>> *Subject:* Re: NullPointerException When Reading Avro Sequence Files
>>
>> java.lang.IncompatibleClassChangeError: Found interface
>> org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>>
>> That is a sign that you are mixing up versions of Hadoop
>> (TaskAttemptContext is a class in Hadoop 1 but an interface in Hadoop 2).
>> This is particularly an issue when dealing with Avro. If you are using
>> Hadoop 2, you will need to get the Hadoop 2 version of avro-mapred. In
>> Maven you would do this with the <classifier>hadoop2</classifier> tag.
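>>
>> For example, the dependency might look roughly like this. The version
>> number is only an assumption on my part; use the Avro version that matches
>> your cluster:
>>
>> ```xml
>> <dependency>
>>   <groupId>org.apache.avro</groupId>
>>   <artifactId>avro-mapred</artifactId>
>>   <!-- assumed version, adjust as needed -->
>>   <version>1.7.7</version>
>>   <!-- selects the artifact built against the Hadoop 2 API -->
>>   <classifier>hadoop2</classifier>
>> </dependency>
>> ```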
>>
>> Simone Franzini, PhD
>>
>> http://www.linkedin.com/in/simonefranzini
>>
>> On Fri, Dec 5, 2014 at 3:52 AM, cjdc <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I've tried the above example on Gist, but it doesn't work (at least for
>>> me).
>>> Did anyone get this:
>>> 14/12/05 10:44:40 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
>>> java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>>>     at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
>>>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:115)
>>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:103)
>>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>>>     at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> 14/12/05 10:44:40 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
>>> java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>>>     at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
>>>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:115)
>>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:103)
>>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>>>     at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> 14/12/05 10:44:40 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>
>