Re: Spark Streaming Json file groupby function

2015-07-28 Thread Tathagata Das
If you are trying to keep such long-term state, it will be more robust in the long run to use a dedicated data store (Cassandra/HBase/etc.) that is designed for long-term storage. On Tue, Jul 28, 2015 at 4:37 PM, swetha wrote: > Hi TD, > We have a requirement to maintain the user session…
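
A sketch of the pattern being recommended: push per-session state out of the streaming job from foreachRDD, opening one connection per partition rather than per record (SessionStore is a hypothetical client standing in for a Cassandra/HBase DAO):

    // sessionUpdates: DStream[(String, Long)] keyed by session id (assumed)
    sessionUpdates.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        val store = SessionStore.connect()  // hypothetical long-term store client
        partition.foreach { case (sessionId, metric) =>
          store.upsert(sessionId, metric)   // idempotent writes survive batch replays
        }
        store.close()
      }
    }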

Re: Spark Streaming Json file groupby function

2015-07-28 Thread swetha
Hi TD, We have a requirement to maintain the user session state and to maintain/update the metrics at minute, hour, and day granularities for a user session in our Streaming job. Can I keep those granularities in the state and recalculate each time there is a change? How would the performance…
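
For reference, the in-streaming alternative described here would look roughly like updateStateByKey (a minimal sketch; SessionMetrics and the merge logic are assumptions, and the job would also need ssc.checkpoint set):

    // Hypothetical per-session counters at the three granularities
    case class SessionMetrics(minute: Long, hour: Long, day: Long)

    // events: DStream[(String, Long)] keyed by session id (assumed)
    val sessions = events.updateStateByKey[SessionMetrics] {
      (newEvents: Seq[Long], state: Option[SessionMetrics]) =>
        val prev = state.getOrElse(SessionMetrics(0L, 0L, 0L))
        val n = newEvents.sum
        // Recalculate each granularity whenever new events arrive
        Some(SessionMetrics(prev.minute + n, prev.hour + n, prev.day + n))
    }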

Re: Spark Streaming Json file groupby function

2014-07-30 Thread lalit1303
You can try repartition/coalesce to make the final RDD a single partition before saveAsTextFile. This should bring the content of the whole RDD into a single part-file. - Lalit Yadav la...@sigmoidanalytics.com
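
That is, something like the following (a sketch; the output path is illustrative):

    // Collapse to one partition so saveAsTextFile writes a single part-file.
    // coalesce(1) avoids a full shuffle; repartition(1) would shuffle.
    resultRDD.coalesce(1).saveAsTextFile("hdfs:///output/results")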

Re: Spark Streaming Json file groupby function

2014-07-18 Thread srinivas
Hi, I am able to save to a local file the RDDs that are generated by Spark SQL from the data coming in through Spark Streaming. If I set the StreamingContext batch interval to 10 seconds, only the data arriving in that 10-second window is processed by my SQL, and that data is stored in the location I specified, and for the next…

Re: Spark Streaming Json file groupby function

2014-07-17 Thread srinivas
Hi TD, It worked... Thank you so much for all your help. Thanks, -Srinivas.

Re: Spark Streaming Json file groupby function

2014-07-17 Thread Tathagata Das
This is a basic Scala problem: you cannot apply toInt to Any. Try toString.toInt instead. For such Scala issues, I recommend trying things out in the Scala shell. For example, you could have tried this as follows: [tdas @ Xion streaming] scala Welcome to Scala version 2.10.3 (Java HotSpot(TM)…
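
The point, as plain code rather than a shell transcript (a reconstruction, not from the original message):

    val v: Any = "123"
    // v.toInt               // does not compile: toInt is not a member of Any
    val n = v.toString.toInt // works: n == 123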

Re: Spark Streaming Json file groupby function

2014-07-17 Thread srinivas
Hi TD, Thanks for the solutions to my previous post... I am running into another issue. I am getting data from a JSON file, and I am trying to parse it and map it to a record, as given below: val jsonf = lines.map(JSON.parseFull(_)).map(_.get.asInstanceOf[scala.collection.immutable.Map[Any…
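
For reference, a self-contained sketch of that parsing step (the field names are taken from later messages in this thread; scala.util.parsing.json.JSON was the parser bundled with Scala 2.10):

    import scala.util.parsing.json.JSON

    val line = """{"id": 123, "name": "srini", "score": 123, "school": "ab"}"""
    // JSON.parseFull returns Option[Any]; the payload here is a Map[String, Any]
    val record = JSON.parseFull(line).get.asInstanceOf[Map[String, Any]]
    // Numeric JSON values come back as Double, so cast before converting
    val score: Int = record("score").asInstanceOf[Double].toInt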

Re: Spark Streaming Json file groupby function

2014-07-16 Thread Tathagata Das
I think I know what the problem is. Spark Streaming is constantly doing garbage cleanup, throwing away data that it no longer needs based on the operations in the DStream. Here the DStream operations are not aware of the Spark SQL queries that are happening asynchronously to Spark Streaming, so data is being…
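
One way to keep batch data around longer for such out-of-band queries is StreamingContext.remember (a sketch; the five-minute window is an arbitrary choice):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("KafkaWordCount")
    val ssc = new StreamingContext(conf, Seconds(10))
    // Keep each batch's RDDs for five minutes instead of letting Spark
    // Streaming discard them as soon as the DStream operations finish.
    ssc.remember(Minutes(5))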

Re: Spark Streaming Json file groupby function

2014-07-16 Thread Yin Huai
Hi Srinivas, It seems the query you used is val results = sqlContext.sql("select type from table1"). However, table1 does not have a field called type. The schema of table1 is defined by the class definition of your case class Record (i.e. ID, name, score, and school are the fields of your table1). Can you…
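
In other words, the query can only select fields that exist in the case class (a minimal sketch):

    case class Record(ID: Int, name: String, score: Int, school: String)

    // Valid: ID, name, score, and school are the queryable fields
    val results = sqlContext.sql("select name, score from table1")
    // Invalid: "select type from table1" -- Record has no field called type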

Re: Spark Streaming Json file groupby function

2014-07-16 Thread srinivas
Hi TD, I defined the case class outside the main method and was able to compile the code successfully, but I am getting a runtime error when trying to process some JSON from Kafka. Here is the code I ran to compile: import java.util.Properties import kafka.producer._ import org.apache.spark.stre…

Re: Spark Streaming Json file groupby function

2014-07-15 Thread Tathagata Das
Can you try defining the case class outside the main function, in fact outside the object? TD On Tue, Jul 15, 2014 at 8:20 PM, srinivas wrote: > Hi TD, > I uncommented import sqlContext._ and tried to compile the code > import java.util.Properties > import kafka.producer._ > import org.apac…
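
A minimal sketch of the layout being suggested, with the case class at the top level of the file:

    // Top level: outside both main and the enclosing object
    case class Record(ID: Int, name: String, score: Int, school: String)

    object KafkaWordCount {
      def main(args: Array[String]) {
        // ... create the StreamingContext and SQLContext, parse the JSON,
        // map to Record, registerAsTable, run queries ...
      }
    }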

Re: Spark Streaming Json file groupby function

2014-07-15 Thread srinivas
Hi TD, I uncommented import sqlContext._ and tried to compile the code: import java.util.Properties import kafka.producer._ import org.apache.spark.streaming._ import org.apache.spark.streaming.kafka._ import org.apache.spark.streaming.StreamingContext._ import org.apache.spark.SparkConf import scal…

Re: Spark Streaming Json file groupby function

2014-07-15 Thread Tathagata Das
You need to have import sqlContext._, so just uncomment that and it should work. TD On Tue, Jul 15, 2014 at 1:40 PM, srinivas wrote: > I am still getting the error... even if I convert it to Record > object KafkaWordCount { > def main(args: Array[String]) { > if (args.length < 4) { >…
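
For context, import sqlContext._ brings in the implicit conversion that turns an RDD of a case class into a SchemaRDD, which is what registerAsTable is defined on (a sketch against the Spark 1.0-era API used in this thread):

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext._  // brings createSchemaRDD into scope

    case class Record(ID: Int, name: String, score: Int, school: String)
    val records = sc.parallelize(Seq(Record(1, "srini", 123, "ab")))
    records.registerAsTable("table1")  // implicit RDD -> SchemaRDD conversion
    val results = sqlContext.sql("select name, score from table1")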

Re: Spark Streaming Json file groupby function

2014-07-15 Thread srinivas
I am still getting the error... even if I convert it to Record: object KafkaWordCount { def main(args: Array[String]) { if (args.length < 4) { System.err.println("Usage: KafkaWordCount ") System.exit(1) } //StreamingExamples.setStreamingLogLevels() val Array(zkQuorum…

Re: Spark Streaming Json file groupby function

2014-07-15 Thread Tathagata Das
I see you have the code to convert to the Record class but commented it out. That is the right way to go. When you convert it to a 4-tuple with (data("type"), data("name"), data("score"), data("school")), it is of type (Any, Any, Any, Any), because data("xyz") returns Any. And registerAsTable probab…
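
A sketch of that conversion, casting each Any value to the field's concrete type (the field names follow the case class used elsewhere in the thread; JSON numbers arrive as Double, hence the cast before toInt):

    case class Record(ID: Int, name: String, score: Int, school: String)

    // data is the Map[String, Any] produced by JSON.parseFull
    def convertMapToRecord(data: Map[String, Any]): Record =
      Record(
        data("id").asInstanceOf[Double].toInt,
        data("name").toString,
        data("score").asInstanceOf[Double].toInt,
        data("school").toString)

    val records = jsonf.map(convertMapToRecord)  // typed records, not (Any, Any, Any, Any)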

Re: Spark Streaming Json file groupby function

2014-07-14 Thread srinivas
Hi TD, Thanks for your help... I am able to convert the map to records using a case class. I am left with doing some aggregations; I am trying to do some SQL-type operations on my record set. My code looks like: case class Record(ID: Int, name: String, score: Int, school: String) //val records = jsonf.map(m =>…
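
A sketch of the kind of aggregation being asked about, once the records are registered as a table (the grouping column and aggregates are illustrative assumptions):

    // Assuming records holds Record rows and was registered via
    // records.registerAsTable("records")
    val avgBySchool = sqlContext.sql(
      "select school, avg(score), count(1) from records group by school")
    avgBySchool.collect().foreach(println)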

Re: Spark Streaming Json file groupby function

2014-07-14 Thread Tathagata Das
In general it may be a better idea to actually convert the records from hashmaps to a specific data structure, say case class Record(id: Int, name: String, mobile: String, score: Int, test_type: String, ...). Then you should be able to do something like val records = jsonf.map(m => convertMapToR…

Re: Spark Streaming Json file groupby function

2014-07-14 Thread srinivas
Hi, Thanks for your reply... I imported StreamingContext and right now I am getting my DStream as something like: Map(id -> 123, name -> srini, mobile -> 12324214, score -> 123, test_type -> math) Map(id -> 321, name -> vasu, mobile -> 73942090, score -> 324, test_type -> sci) Map(id -> 432, name -…

Re: Spark Streaming Json file groupby function

2014-07-14 Thread Tathagata Das
You have to import StreamingContext._ to enable groupByKey operations on DStreams. After importing that, you can apply groupByKey on any DStream of key-value pairs (e.g. DStream[(String, Int)]). The data in each pair RDD will be grouped by the first element of the tuple as the…
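
A minimal sketch of that pattern (the key and value choices are illustrative):

    import org.apache.spark.streaming.StreamingContext._  // pair-DStream operations

    // jsonf: DStream[Map[String, Any]] from the earlier parsing step (assumed)
    val pairs = jsonf.map(m =>
      (m("test_type").toString, m("score").asInstanceOf[Double].toInt))
    val grouped = pairs.groupByKey()  // values grouped per key within each batch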