Re: [Spark Streaming][Problem with DataFrame UDFs]

2016-01-21 Thread Jean-Pierre OCALAN
Quick correction to the code snippet I sent in my previous email. The line:

val enrichedDF = inputDF.withColumn("semantic", udf(col("url")))

should be replaced by:

val enrichedDF = inputDF.withColumn("semantic", enrichUDF(col("url")))

On Thu, Jan 21, 2016 at 11:07 AM, Jean-Pierre OCALAN wrote:
> H
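For readers following along, here is a minimal sketch of the corrected pattern: wrap a plain Scala function with functions.udf, then apply the resulting UDF value (not the udf factory itself) to a column. It assumes the Spark 2.x+ SparkSession API rather than the Spark 1.6 SQLContext in use at the time of this thread, and the enrichment logic (the hypothetical semanticOf, which just extracts the URL host) is a placeholder, since the original enrichment code isn't shown here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object EnrichExample {
  // Hypothetical enrichment logic: the thread does not show what the
  // "semantic" value really is, so this simply extracts the URL host.
  def semanticOf(url: String): String =
    Option(url).map(u => new java.net.URI(u).getHost).orNull

  // Wrap the plain Scala function as a column-level UDF.
  val enrichUDF = udf((url: String) => semanticOf(url))

  // Apply the UDF value (enrichUDF), not the `udf` factory, to the column.
  def enrich(inputDF: DataFrame): DataFrame =
    inputDF.withColumn("semantic", enrichUDF(col("url")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("udf-withColumn-sketch")
      .getOrCreate()
    import spark.implicits._

    val inputDF = Seq("https://spark.apache.org/docs", "http://example.com/a").toDF("url")
    enrich(inputDF).show(false)
    spark.stop()
  }
}
```

Passing col("url") to the bare udf factory does not produce a Column expression of the enriched value, which is why the corrected line calls the named UDF instead.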

Re: [Spark Streaming][Problem with DataFrame UDFs]

2016-01-21 Thread Jean-Pierre OCALAN
Hi Cody,

First of all, thanks a lot for your quick reply. I actually removed this post a couple of hours after posting it, because I found that the problem was due to the way I was using DataFrame UDFs. Essentially, I didn't know that UDFs were purely lazy, and in the case of the example below the UDF ge
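The general behaviour behind this can be sketched as follows. This is an illustration of Spark's lazy evaluation, not a reconstruction of the truncated example: withColumn only adds a UDF call to the query plan, the UDF body runs when an action such as collect() executes, and without caching every action recomputes it. The helper udfCalls and its accumulator-based call counter are hypothetical, and the sketch assumes the Spark 2.x+ SparkSession API.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object LazyUdfDemo {
  /** Runs `actions` collect() calls on a UDF-enriched DataFrame and returns
    * (UDF calls counted before any action, UDF calls counted after the actions).
    * Accumulator counts inside transformations are approximate in general
    * (task retries can add extra updates); fine for a local illustration. */
  def udfCalls(spark: SparkSession, cacheIt: Boolean, actions: Int): (Long, Long) = {
    val calls = spark.sparkContext.longAccumulator("udfCalls")
    // A UDF with a side effect so we can observe when its body actually runs.
    val tagged = udf { (id: Long) => calls.add(1L); id * 2 }

    // spark.range (rather than a local Seq) keeps the optimizer from
    // pre-computing the projection on the driver during planning.
    val base = spark.range(0, 3).withColumn("doubled", tagged(col("id")))
    val df = if (cacheIt) base.cache() else base

    val before = calls.sum // withColumn only built a plan; the UDF has not run
    (1 to actions).foreach(_ => df.collect())
    val after = calls.sum
    if (cacheIt) df.unpersist()
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("lazy-udf-sketch").getOrCreate()
    val (b1, a1) = udfCalls(spark, cacheIt = false, actions = 2)
    val (b2, a2) = udfCalls(spark, cacheIt = true, actions = 2)
    println(s"no cache:   before=$b1 after=$a1")
    println(s"with cache: before=$b2 after=$a2")
    spark.stop()
  }
}
```

In a local run, the uncached DataFrame should report more UDF calls after two collect() calls than the cached one, because cache() lets the second action read stored rows instead of re-running the UDF.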

Re: [Spark Streaming][Problem with DataFrame UDFs]

2016-01-21 Thread Cody Koeninger
If you can share an isolated example, I'll take a look. It's not something I've run into before.

On Wed, Jan 20, 2016 at 3:53 PM, jpocalan wrote:
> Hi,
>
> I have an application which creates a Kafka Direct Stream from 1 topic
> having 5 partitions.
> As a result each batch is composed of an RDD havi

Re: spark streaming problem saveAsTextFiles() does not write valid JSON to HDFS

2015-11-19 Thread Andy Davidson
Turns out the data is in Python format. The ETL pipeline was overwriting the original data.

Andy

From: Andrew Davidson
Date: Thursday, November 19, 2015 at 6:58 PM
To: "user @spark"
Subject: spark streaming problem saveAsTextFiles() does not write valid JSON to HDFS

> I am working on a simple POS. I a

Re: SPARK STREAMING PROBLEM

2015-05-28 Thread Sourav Chandra
The problem lies in the way you are doing the processing. You call ssc.start only after g.foreach(x => {println(x); println("")}). That means that up to that point you have only set up the computation steps; Spark has not started any real processing yet. So when you do g.foreach, what it iterat
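The ordering described here can be sketched as follows, assuming Spark Streaming's DStream API, with a queueStream standing in for a real input source so the sketch runs without external services. Transformations and output operations only define the graph; a driver-side buffer filled inside foreachRDD stays empty until ssc.start() begins running batches. The object and method names are hypothetical, and foreachRDD is used as the output operation.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StartStreamingSketch {
  /** Returns (records seen before ssc.start(), records seen after running). */
  def run(): (Int, List[String]) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("start-streaming-sketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // queueStream stands in for a real source (Kafka, files, sockets).
    val queue = mutable.Queue[RDD[String]](ssc.sparkContext.parallelize(Seq("a", "b", "c")))
    val lines = ssc.queueStream(queue)

    // Driver-side buffer; foreachRDD's function runs on the driver once per batch.
    val seen = mutable.ArrayBuffer.empty[String]

    // 1. Transformation: only adds a step to the DStream graph.
    val upper = lines.map(_.toUpperCase)
    // 2. Output operation: registered now, executed per batch after start().
    upper.foreachRDD { rdd =>
      val batch = rdd.collect() // runs a Spark job for this batch
      seen.synchronized { seen ++= batch }
    }

    // Nothing has been processed yet, so the buffer is still empty here.
    val sizeBeforeStart = seen.synchronized(seen.size)

    // 3. Only now does Spark begin generating and processing batches.
    ssc.start()
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)

    (sizeBeforeStart, seen.synchronized(seen.toList))
  }

  def main(args: Array[String]): Unit = {
    val (before, after) = run()
    println(s"before start: $before records, after: $after")
  }
}
```

The 5-second timeout is an arbitrary choice for this local demo; a long-running job would typically call ssc.awaitTermination() instead.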

Re: SPARK STREAMING PROBLEM

2015-05-28 Thread Sourav Chandra
You must start the StreamingContext by calling ssc.start().

On Thu, May 28, 2015 at 6:57 PM, Animesh Baranawal <animeshbarana...@gmail.com> wrote:
> Hi,
>
> I am trying to extract the filenames from which a Dstream is generated by
> parsing the toDebugString method on RDD
> I am implementing the