Re: java.io.NotSerializableException: org.apache.avro.mapred.AvroKey using spark with avro

2014-12-18 Thread Anish Haldiya
SparkConf().set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") val sc = new SparkContext(conf) This worked for me. Regards, Anish -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-io-NotSerializableException-or

Re: serialization issue

2015-08-13 Thread Anish Haldiya
While submitting the job, you can use the --jars, --driver-class-path, etc. options to add the jar. Apart from that, if you are running the job as a standalone application, you can use sc.addJar to add the jar (which ships it to all the executors). Regards, Anish
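For illustration (jar paths, class names, and app names below are hypothetical), the two approaches look like this; the spark-submit flags ship the jar at submit time, while sc.addJar ships it from a running application:

    // Submit-time equivalent (shell):
    //   spark-submit --class com.example.MyApp \
    //     --jars /path/to/extra-lib.jar \
    //     --driver-class-path /path/to/extra-lib.jar \
    //     myapp.jar
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("addjar-example"))
    sc.addJar("/path/to/extra-lib.jar")  // shipped to every executor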

Re: Reduce number of partitions before saving to file. coalesce or repartition?

2015-08-13 Thread Anish Haldiya
A drastic coalesce may run your computation on fewer nodes than you like (e.g. one node in the case of numPartitions = 1). To avoid this, you can pass shuffle = true. This adds a shuffle step, but means the current upstream partitions will be executed in parallel (per whatever the current partitioning is). Regards, anish
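A short sketch of the distinction (input and output paths are placeholders): repartition(n) is simply coalesce(n, shuffle = true), so the choice is really about whether the extra shuffle is worth keeping the upstream parallelism.

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("coalesce-example"))
    val data = sc.textFile("hdfs:///input")  // hypothetical path

    // No shuffle: upstream stages may collapse onto a single node.
    data.coalesce(1).saveAsTextFile("hdfs:///out-narrow")

    // shuffle = true: adds a shuffle, but upstream work stays parallel.
    data.coalesce(1, shuffle = true).saveAsTextFile("hdfs:///out-shuffled")

    // repartition(n) is shorthand for coalesce(n, shuffle = true).
    data.repartition(1).saveAsTextFile("hdfs:///out-repartitioned")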

Spark Streaming with Kafka | Check if DStream is Empty | HDFS Write

2014-05-22 Thread Anish Sneh
I want to check whether the DStream is empty and write to HDFS only when it actually contains messages. Please suggest. TIA -- Anish Sneh "Experience is the best teacher." +91-99718-55883 http://in.linkedin.com/in/anishsneh
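One common answer to this question, sketched with the receiver-based KafkaUtils API that was current in 2014 (ZooKeeper address, consumer group, topic, batch interval, and output path are all placeholders): test each micro-batch RDD before writing so that empty batches do not produce empty HDFS files.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("kafka-hdfs-example")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Receiver stream of (key, message) pairs from Kafka.
    val messages = KafkaUtils.createStream(
      ssc, "zk-host:2181", "consumer-group", Map("my-topic" -> 1))

    messages.map(_._2).foreachRDD { rdd =>
      if (rdd.take(1).nonEmpty) {  // skip empty micro-batches
        rdd.saveAsTextFile(s"hdfs:///data/out-${System.currentTimeMillis}")
      }
    }

    ssc.start()
    ssc.awaitTermination()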