Re: serialization issue

2015-08-13 Thread Anish Haldiya
While submitting the job, you can use the --jars, --driver-class-path, etc. options to add the jar. Apart from that, if you are running the job as a standalone application, you can use sc.addJar to add the jar (which will ship it to all the executors). Regards, Anish
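A minimal sketch of the two submit-time options mentioned above. The class name, jar paths, and master are hypothetical placeholders; requires a Spark installation with spark-submit on the PATH.

```shell
# Ship dependency jars to the executors and put them on the driver classpath.
# com.example.MyJob, my-app.jar and the dep*.jar paths are placeholders.
spark-submit \
  --class com.example.MyJob \
  --master yarn \
  --jars /path/to/dep1.jar,/path/to/dep2.jar \
  --driver-class-path /path/to/dep1.jar:/path/to/dep2.jar \
  my-app.jar
```

For a standalone application, the programmatic equivalent is sc.addJar("/path/to/dep1.jar") after creating the SparkContext.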

Re: Reduce number of partitions before saving to file. coalesce or repartition?

2015-08-13 Thread Anish Haldiya
Hi, If you are decreasing the number of partitions in this RDD, consider using coalesce, which can avoid performing a shuffle. However, if you're doing a drastic coalesce, e.g. to numPartitions = 1, this may result in your computation taking place on fewer nodes than you like (e.g. one node in the case of numPartitions = 1).
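A short sketch of the difference, assuming a local Spark runtime on the classpath (the app name and output path are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical local setup for illustration.
val conf = new SparkConf().setAppName("coalesce-example").setMaster("local[4]")
val sc = new SparkContext(conf)

val rdd = sc.parallelize(1 to 1000, numSlices = 8)

// Decreasing partitions: coalesce merges existing partitions and
// can avoid a full shuffle.
val fewer = rdd.coalesce(2)      // 8 -> 2 partitions, no shuffle

// Increasing (or rebalancing) partitions requires a shuffle:
// repartition(n) is equivalent to coalesce(n, shuffle = true).
val more = rdd.repartition(16)   // 8 -> 16 partitions, full shuffle

fewer.saveAsTextFile("output-dir")  // writes one part file per partition
```

Passing shuffle = true to coalesce forces an even redistribution of the data, at the cost of the shuffle it would otherwise avoid.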

Re: java.io.NotSerializableException: org.apache.avro.mapred.AvroKey using spark with avro

2014-12-18 Thread Anish Haldiya
Hi, I had the same problem. One option (starting with Spark 1.2, which is currently in preview) is to use the Avro library for Spark SQL. The other is using Kryo serialization. By default, Spark uses Java serialization; you can specify Kryo serialization while creating the Spark context: val conf = new SparkConf().set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
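A sketch of the Kryo option, assuming a Spark runtime and the Avro library on the classpath (the app name is a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Switch from the default Java serialization to Kryo, which can also
// handle classes (like AvroKey) that are not java.io.Serializable.
val conf = new SparkConf()
  .setAppName("avro-kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Optionally register frequently serialized classes for a more
  // compact serialized form (Spark 1.2+):
  .set("spark.kryo.classesToRegister", "org.apache.avro.mapred.AvroKey")

val sc = new SparkContext(conf)
```

The same two properties can also be set at submit time with --conf spark.serializer=... rather than in code.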