If you are trying to keep such long-term state, it will be more robust in
the long run to use a dedicated data store (Cassandra/HBase/etc.) that is
designed for long-term storage.
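For what it's worth, a minimal sketch of that pattern, assuming a hypothetical SessionStore client and a DStream of (sessionId, metrics) pairs (none of these names come from this thread):

import org.apache.spark.streaming.dstream.DStream

// Hypothetical client for the external store (Cassandra/HBase/...); only a stub here.
trait SessionStore {
  def upsert(sessionId: String, metrics: Map[String, Long]): Unit
  def close(): Unit
}
object SessionStore {
  def connect(): SessionStore = ???   // build the real driver/connection here
}

// Write each batch of session updates to the external store instead of
// holding long-term state inside Spark Streaming.
def persistSessions(sessionMetrics: DStream[(String, Map[String, Long])]): Unit =
  sessionMetrics.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      val store = SessionStore.connect()   // one connection per partition, created on the executor
      partition.foreach { case (id, metrics) => store.upsert(id, metrics) }
      store.close()
    }
  }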
On Tue, Jul 28, 2015 at 4:37 PM, swetha wrote:
Hi TD,
We have a requirement to maintain the user session state and to
maintain/update the metrics for minute, day and hour granularities for a
user session in our Streaming job. Can I keep those granularities in the
state and recalculate each time there is a change? How would the performance
You can try repartition/coalesce to make the final RDD into a single
partition before saveAsTextFile.
This should bring the content of the whole RDD into a single part-
-
Lalit Yadav
la...@sigmoidanalytics.com
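A rough sketch of that suggestion (paths and names here are illustrative, not from the thread):

import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

// Collapse each batch RDD to a single partition before saving, so every
// batch writes exactly one part-00000 file under its own directory.
def saveAsSingleFile(results: DStream[String], dir: String): Unit =
  results.foreachRDD { (rdd, time: Time) =>
    rdd.coalesce(1)                                   // or repartition(1)
       .saveAsTextFile(s"$dir/batch-${time.milliseconds}")
  }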
Hi,
I am able to save my RDDs, generated from Spark SQL on top of Spark
Streaming, to a local file. If I set the streaming context to 10 sec, the
data coming in that 10 sec window is only processed by my SQL and stored
in the location I specified, and for the next
Hi TD,
It Worked...Thank you so much for all your help.
Thanks,
-Srinivas.
This is a basic Scala problem. You cannot apply toInt to Any. Try doing
toString.toInt
For such Scala issues, I recommend trying it out in the Scala shell. For
example, you could have tried this out as follows.
[tdas @ Xion streaming] scala
Welcome to Scala version 2.10.3 (Java HotSpot(TM)
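The shell transcript above is cut off; the point it would have shown is roughly this (a sketch, not the original session):

val data: Map[String, Any] = Map("score" -> "95")   // parsed JSON values come back typed as Any

// data("score").toInt                       // does not compile: toInt is not a member of Any
val score = data("score").toString.toInt     // works: convert to String first, then to Int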
Hi TD,
Thanks for the solutions for my previous post... I am running into another
issue. I am getting data from a JSON file and I am trying to parse it and
map it to the record given below
val jsonf = lines.map(JSON.parseFull(_)).map(_.get.asInstanceOf[scala.collection.immutable.Map[Any
I think I know what the problem is. Spark Streaming is constantly doing
garbage cleanup, throwing away data that it does not need based on the
operations in the DStream. Here the DStream operations are not aware of the
Spark SQL queries that are happening asynchronously to Spark Streaming. So
data is being
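The rest of that reply is cut off above. One general way to keep batch RDDs around long enough for queries running asynchronously to the streaming job (not necessarily the exact fix proposed in the missing part) is StreamingContext.remember:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

val ssc = new StreamingContext(new SparkConf().setAppName("StreamingWithAsyncSQL"), Seconds(10))
ssc.remember(Minutes(5))   // retain each batch's RDDs for 5 minutes instead of cleaning them up right after the batch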
Hi Srinivas,
It seems the query you used is val results = sqlContext.sql("select type from
table1"). However, table1 does not have a field called type. The schema of
table1 is defined by the class definition of your case class Record (i.e. ID,
name, score, and school are the fields of your table1). Can yo
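For reference, a small sketch of a query that matches that schema, using the same registerAsTable-style API that appears elsewhere in this thread:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext

case class Record(ID: Int, name: String, score: Int, school: String)

def queryScores(sqlContext: SQLContext, records: RDD[Record]) = {
  import sqlContext._                                 // implicit conversion to SchemaRDD plus registerAsTable
  records.registerAsTable("table1")
  sqlContext.sql("SELECT name, score FROM table1")    // must reference fields of Record; there is no `type` column
}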
Hi TD,
I defined the case class outside the main method and was able to compile
the code successfully. But I am getting a runtime error when trying to process
a JSON file from Kafka. Here is the code I ran:
import java.util.Properties
import kafka.producer._
import org.apache.spark.stre
Can you try defining the case class outside the main function, in fact
outside the object?
TD
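A layout sketch of what that looks like (field names borrowed from the case class used later in this thread):

// The case class sits at the top level of the file, outside both main()
// and the enclosing object, so Spark SQL's reflection can see it cleanly.
case class Record(ID: Int, name: String, score: Int, school: String)

object KafkaWordCount {
  def main(args: Array[String]): Unit = {
    // ... build the StreamingContext, parse the JSON, map to Record, register the table ...
  }
}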
On Tue, Jul 15, 2014 at 8:20 PM, srinivas wrote:
Hi TD,
I uncommented import sqlContext._ and tried to compile the code
import java.util.Properties
import kafka.producer._
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.SparkConf
import scal
You need to have
import sqlContext._
so just uncomment that and it should work.
TD
On Tue, Jul 15, 2014 at 1:40 PM, srinivas wrote:
I am still getting the error... even if I convert it to Record
object KafkaWordCount {
  def main(args: Array[String]) {
    if (args.length < 4) {
      System.err.println("Usage: KafkaWordCount <zkQuorum> <group> <topics> <numThreads>")
      System.exit(1)
    }
    //StreamingExamples.setStreamingLogLevels()
    val Array(zkQuorum
I see you have the code to convert to the Record class but commented it out.
That is the right way to go. When you are converting it to a 4-tuple with
"(data("type"), data("name"), data("score"), data("school"))" ... it's of type
(Any, Any, Any, Any), as data("xyz") returns Any. And registerAsTable
probab
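A sketch of the difference being described, with jsonf assumed to be a DStream[Map[String, Any]] (the exact map type is cut off earlier in the thread):

import org.apache.spark.streaming.dstream.DStream

case class Record(ID: Int, name: String, score: Int, school: String)

def toRecords(jsonf: DStream[Map[String, Any]]): DStream[Record] =
  jsonf.map(data =>
    Record(data("ID").toString.toInt,        // data(...) is Any, hence toString.toInt
           data("name").toString,
           data("score").toString.toInt,
           data("school").toString))

// versus the commented-out tuple version, which has type DStream[(Any, Any, Any, Any)]
// jsonf.map(data => (data("type"), data("name"), data("score"), data("school")))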
Hi TD,
Thanks for your help... I am able to convert the map to records using a case
class. I am left with doing some aggregations. I am trying to do some SQL-type
operations on my record set. My code looks like
case class Record(ID: Int, name: String, score: Int, school: String)
//val records = jsonf.map(m =>
In general it may be a better idea to actually convert the records from
hashmaps to a specific data structure. Say
case class Record(id: Int, name: String, mobile: String, score: Int,
test_type: String ... )
Then you should be able to do something like
val records = jsonf.map(m => convertMapToR
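The last line above is cut off at convertMapToR...; a sketch of what such a helper could look like for the fields in this message (key names are assumptions):

// Illustrative helper: pull typed fields out of the parsed JSON map.
case class Record(id: Int, name: String, mobile: String, score: Int, test_type: String)

def convertMapToRecord(m: Map[String, Any]): Record =
  Record(
    id        = m("id").toString.toInt,
    name      = m("name").toString,
    mobile    = m("mobile").toString,
    score     = m("score").toString.toInt,
    test_type = m("test_type").toString)

// val records = jsonf.map(m => convertMapToRecord(m))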
Hi,
Thanks for your reply... I imported StreamingContext and right now I am
getting my DStream as something like
Map(id -> 123, name -> srini, mobile -> 12324214, score -> 123, test_type -> math)
Map(id -> 321, name -> vasu, mobile -> 73942090, score -> 324, test_type -> sci)
Map(id -> 432, name -
You have to import StreamingContext._ to enable groupByKey operations on
DStreams. After importing that you can apply groupByKey on any DStream
that is a DStream of key-value pairs (e.g. DStream[(String, Int)]). The
data in each pair RDD will be grouped by the first element in the tuple as
the
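A minimal sketch of that, with the pair DStream assumed:

import org.apache.spark.streaming.StreamingContext._   // adds the pair operations (groupByKey, reduceByKey, ...) to DStreams
import org.apache.spark.streaming.dstream.DStream

def groupScores(pairs: DStream[(String, Int)]) =
  pairs.groupByKey()   // groups the values by the first element of each tuple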