Re: Spark streaming RDDs to Parquet records

2014-06-17 Thread contractor
Thanks Krishna. Seems like you have to use Avro and then convert that to Parquet. I was hoping to directly convert RDDs to Parquet files. I’ll look into this some more. Thanks, Mahesh

Re: Spark streaming RDDs to Parquet records

2014-06-19 Thread contractor
mbrust <[hidden email]> wrote: If you convert the data to a SchemaRDD you can save it as Parquet: http://spark.apache.org/docs/latest/sql-programming-guide.html#using-parquet On Tue, Jun 17, 2014 at 11:47 PM, Padmanabhan, Mahesh (contractor) <[hidden email]> wrote: Thanks Krishna. S
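
For reference, a minimal sketch of what the linked guide describes for the Spark 1.0-era API, assuming a hypothetical Record case class and an illustrative output path:

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext

    // Hypothetical record type; field names are illustrative only.
    case class Record(id: Long, value: String)

    val sc = new SparkContext("local[*]", "rdd-to-parquet")
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD   // implicit RDD[case class] -> SchemaRDD

    val rdd = sc.parallelize(Seq(Record(1L, "a"), Record(2L, "b")))
    // The implicit conversion turns the case-class RDD into a SchemaRDD,
    // which can be written directly as Parquet (no Avro step needed).
    rdd.saveAsParquetFile("/tmp/records.parquet")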

Spark 1.0.1 NotSerialized exception (a bit of a head scratcher)

2014-08-07 Thread contractor
Hello all, I am not sure what is going on – I am getting a NotSerializedException and initially I thought it was due to not registering one of my classes with Kryo but that doesn’t seem to be the case. I am essentially eliminating duplicates in a spark streaming application by using a “window”
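
The original code is not shown in full, but a minimal sketch of window-based de-duplication on a DStream might look like the following (the stream source, durations, and names are invented for illustration):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("dedup-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical input stream of text lines.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Keep a sliding window and drop duplicate lines inside each window.
    val deduped = lines
      .window(Seconds(60), Seconds(10))
      .transform(rdd => rdd.distinct())

    deduped.foreachRDD { rdd =>
      // Side effects here run on the executors; anything referenced from
      // this closure must be serializable (the subject of this thread).
      rdd.take(10).foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()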

Re: Spark 1.0.1 NotSerialized exception (a bit of a head scratcher)

2014-08-07 Thread contractor
("duplicateCount", (oc-dc).toString())) OperationalStatProducer.produce(statBody) } catch { case e: Exception => DebugLogger.report(e) } } } }) }) On Thu, Aug 7, 2014 at 9:03 AM, Padmanabhan, Mahesh (contractor)

Re: Spark 1.0.1 NotSerialized exception (a bit of a head scratcher)

2014-08-07 Thread contractor
("duplicateCount", (oc-dc).toString())) OperationalStatProducer.produce(statBody) } catch { case e: Exception => DebugLogger.report(e) } } } }) }) On Thu, Aug 7, 2014 at 9:03 AM, Padmanabhan, Mahesh (cont

Re: Spark 1.0.1 NotSerialized exception (a bit of a head scratcher)

2014-08-07 Thread contractor
ing.Checkpoint@73a68f0) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1180) From: Tathagata Das Date: Thursday, August 7, 2014 at 11:31 AM To: Mahesh Padmanabhan

Re: Spark 1.0.1 NotSerialized exception (a bit of a head scratcher)

2014-08-07 Thread contractor
Thanks TD, Amit. I think I figured out where the problem is through the process of commenting out individual lines of code one at a time :( Can either of you help me find the right solution? I tried creating the SparkContext outside the foreachRDD but that didn’t help. I have an object (let’s

Re: Spark 1.0.1 NotSerialized exception (a bit of a head scratcher)

2014-08-07 Thread contractor
the foreachRDD. See if that helps. TD On Thu, Aug 7, 2014 at 12:47 PM, Padmanabhan, Mahesh (contractor) wrote: Thanks TD, Amit. I think I figured out where the problem is through the process of commenting out individual lines of code on
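
A sketch of the pattern being suggested here, with a hypothetical non-serializable producer standing in for the real object (which is not shown in the thread): create it inside foreachRDD/foreachPartition so it is never serialized as part of the driver-side closure.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Hypothetical non-serializable client standing in for the object
    // discussed in the thread (e.g. a stat producer).
    class StatProducer { def produce(msg: String): Unit = println(msg) }

    val ssc = new StreamingContext(new SparkConf().setAppName("closure-sketch"), Seconds(5))
    val dstream = ssc.socketTextStream("localhost", 9999)

    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Created on the executor, once per partition, so it is never
        // captured in the closure that Spark serializes from the driver.
        val producer = new StatProducer
        records.foreach(line => producer.produce(line))
      }
    }

    ssc.start()
    ssc.awaitTermination()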

Sending multiple DStream outputs

2014-09-18 Thread contractor
Hi all, I am using Spark 1.0 streaming to ingest a high volume stream of data (approx. 1mm lines every few seconds), transform it into two outputs, and send those outputs to two separate Apache Kafka topics. I have two blocks of output code like this: Stream1 = …. Stream2 = … Stream1.foreac
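
A rough sketch of the setup described above, with the Kafka send abstracted behind a hypothetical sendToKafka helper (the real producer code is not shown in the thread):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("two-outputs"), Seconds(5))
    val lines = ssc.socketTextStream("localhost", 9999)   // stand-in for the real source

    // Hypothetical helper wrapping a Kafka producer; the send is only
    // printed here because the actual producer setup is not shown.
    def sendToKafka(topic: String, records: Iterator[String]): Unit =
      records.foreach(r => println(s"$topic -> $r"))

    // Two transformations of the same input, each with its own output action.
    // Each foreachRDD is a separate output operation, so Spark schedules
    // both of them every batch interval.
    val stream1 = lines.map(_.toUpperCase)
    val stream2 = lines.map(_.length.toString)

    stream1.foreachRDD(rdd => rdd.foreachPartition(part => sendToKafka("topicA", part)))
    stream2.foreachRDD(rdd => rdd.foreachPartition(part => sendToKafka("topicB", part)))

    ssc.start()
    ssc.awaitTermination()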

Re: Sending multiple DStream outputs

2014-09-18 Thread contractor
s of the app for >each and pass as arguments to each instance the input source and >output topic? > >On Thu, Sep 18, 2014 at 8:07 AM, Padmanabhan, Mahesh (contractor) > wrote: >> Hi all, >> >> I am using Spark 1.0 streaming to ingest a a high volume stream of data >

Spark Mesos integration bug?

2014-11-26 Thread contractor
Hi, We have been running Spark 1.0.2 with Mesos 0.20.1 in fine grained mode and for the most part it has been working well. We have been using mesos://zk://server1:2181,server2:2181,server3:2181/mesos as the spark master URL and this works great to get the Mesos leader. Unfortunately, this lea
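
For context, a minimal sketch of pointing a driver at a ZooKeeper-managed Mesos master, using the same placeholder hostnames as above:

    import org.apache.spark.{SparkConf, SparkContext}

    // The zk:// form lets the driver discover and follow the current
    // Mesos leader instead of being tied to a single master host.
    val conf = new SparkConf()
      .setAppName("mesos-example")
      .setMaster("mesos://zk://server1:2181,server2:2181,server3:2181/mesos")

    val sc = new SparkContext(conf)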

Custom receiver runtime Kryo exception

2015-01-05 Thread contractor
Hello all, I am using Spark 1.0.2 and I have a custom receiver that works well. I tried adding Kryo serialization to SparkConf: val spark = new SparkConf() ….. .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") and I am getting a strange error that I am not sure how
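
A sketch of how Kryo registration is typically wired up on Spark 1.0.x, with MyEvent standing in for whatever class the custom receiver actually emits (the real class is not shown in the thread, and the registrator class name is unqualified for brevity):

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    // Hypothetical message class emitted by the custom receiver.
    case class MyEvent(id: Long, payload: String)

    // On Spark 1.0.x, classes are registered through a KryoRegistrator.
    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit = {
        kryo.register(classOf[MyEvent])
      }
    }

    val conf = new SparkConf()
      .setAppName("custom-receiver")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "MyRegistrator")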

Re: Kafka Spark Partition Mapping

2015-08-24 Thread Syed, Nehal (Contractor)
Dear Cody, Thanks for your response. I am trying to do decoration, which means that when a message comes from Kafka (partitioned by key) into Spark, I want to add more fields/data to it. How do people normally do this in Spark? If it were you, how would you decorate a message without hitting a database
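
One common pattern (not necessarily what was recommended elsewhere in this thread) is to broadcast a small reference table once and enrich records in mapPartitions, so the database is not queried per message. A sketch with invented message and lookup contents:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("decorate-sketch"))

    // Small reference table, shipped to every executor exactly once.
    val lookup = Map("key1" -> "regionA", "key2" -> "regionB")
    val lookupBc = sc.broadcast(lookup)

    // Stand-in for the (key, payload) records coming off Kafka.
    val messages = sc.parallelize(Seq(("key1", "payload1"), ("key2", "payload2")))

    val decorated = messages.mapPartitions { iter =>
      val table = lookupBc.value            // read once per partition, on the executor
      iter.map { case (key, payload) =>
        (key, payload, table.getOrElse(key, "unknown"))
      }
    }

    decorated.collect().foreach(println)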

Spark consumes more memory

2017-05-11 Thread Anantharaman, Srinatha (Contractor)
Hi, I am reading a Hive ORC table into memory; StorageLevel is set to StorageLevel.MEMORY_AND_DISK_SER. The total size of the Hive table is 5GB. I started the spark-shell as below: spark-shell --master yarn --deploy-mode client --num-executors 8 --driver-memory 5G --executor-memory 7G --executor-cores
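
A sketch of the caching step described above, assuming a Spark 2.x spark-shell where the Hive-enabled session is available as spark; the table name is a placeholder:

    import org.apache.spark.storage.StorageLevel

    val df = spark.table("mydb.my_orc_table")

    // MEMORY_AND_DISK_SER keeps partitions serialized in memory and spills
    // the remainder to disk, trading CPU for a smaller memory footprint.
    df.persist(StorageLevel.MEMORY_AND_DISK_SER)
    df.count()   // materializes the cache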

RE: Spark consumes more memory

2017-05-11 Thread Anantharaman, Srinatha (Contractor)
May 11, 2017 1:34 PM To: Anantharaman, Srinatha (Contractor); user Subject: Re: Spark consumes more memory I would try to track down the "no space left on device" - find out where that originates from, since you should be able to allocate 10 executors with 4 cores and 15GB RAM

Java SPI jar reload in Spark

2017-06-06 Thread Jonnas Li(Contractor)
I have a Spark Streaming application which dynamically calls a jar (Java SPI), and the jar is called in a mapWithState() function; it was working fine for a long time. Recently, I got a requirement to reload the jar during runtime. But when the reloading is completed, the spark
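
A sketch of one way a jar might be re-loaded at runtime through the Java SPI, using a fresh URLClassLoader per reload. Transformer is a hypothetical interface standing in for the real SPI, and whether doing this inside a running mapWithState() function is safe is exactly the open question in this thread:

    import java.net.URLClassLoader
    import java.util.ServiceLoader

    // Hypothetical SPI interface standing in for whatever the real jar provides.
    trait Transformer { def transform(in: String): String }

    // Build a fresh classloader each time so a replaced jar is picked up.
    def loadTransformers(jarPath: String): Seq[Transformer] = {
      val loader = new URLClassLoader(
        Array(new java.io.File(jarPath).toURI.toURL),
        getClass.getClassLoader)
      import scala.collection.JavaConverters._
      ServiceLoader.load(classOf[Transformer], loader).iterator().asScala.toSeq
    }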

Re: Java SPI jar reload in Spark

2017-06-06 Thread Jonnas Li(Contractor)
us from a functional point of view - functionality in the jar changed and all your computation is wrong. On 6. Jun 2017, at 11:35, Jonnas Li(Contractor) wrote: I have a Spark Streaming application which dynamically calls a jar (Java SPI), and the jar is cal

Re: Java SPI jar reload in Spark

2017-06-06 Thread Jonnas Li(Contractor)
n google. https://github.com/spark-jobserver/spark-jobserver/issues/130 Alonso Isidoro Roman about.me/alonso.isidoro.roman 2017-06-06 12:14 GMT+02:00 Jon

Re: Java SPI jar reload in Spark

2017-06-07 Thread Jonnas Li(Contractor)
Alonso Isidoro Roman about.me/alonso.isidoro.roman 2017-06-06 12:14 GMT+02:00 Jonnas Li(Contractor): Thanks for your quick response. These jars are used to defin

streaming and piping to R, sending all data in window to pipe()

2015-07-17 Thread PAULI, KEVIN CHRISTIAN [AG-Contractor/1000]
Spark newbie here, using Spark 1.3.1. I’m consuming a stream and trying to pipe the data from the entire window to R for analysis. The R algorithm needs the entire dataset from the stream (everything in the window) in order to function properly; it can’t be broken up. So I tried doing a coales
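
A sketch of the approach being described, assuming the whole window fits on a single task; the R script path is a placeholder:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("pipe-to-r"), Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.window(Seconds(60), Seconds(60)).foreachRDD { rdd =>
      // coalesce(1) keeps the entire window in one partition, since the R
      // algorithm cannot work on a partial dataset.
      val results = rdd.coalesce(1).pipe("Rscript /path/to/analysis.R")
      results.collect().foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()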