Re: combineByKey

2019-04-05 Thread Jason Nerothin
Take a look at this StackOverflow answer on how aggregateByKey works: https://stackoverflow.com/questions/24804619/how-does-spark-aggregate-function-aggregatebykey-work
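For reference, the linked answer boils down to the following per-key semantics. This is a plain-Scala sketch of what aggregateByKey(zero)(seqOp, combOp) computes, with no Spark cluster involved; the function and sample data are my own, not from the thread:

```scala
// Model of aggregateByKey: seqOp folds values into a per-partition
// accumulator starting from zero; combOp merges accumulators across partitions.
def aggregateByKey[K, V, U](partitions: Seq[Seq[(K, V)]], zero: U)(
    seqOp: (U, V) => U,
    combOp: (U, U) => U): Map[K, U] = {
  // Fold within each partition, starting from the zero value...
  val perPartition: Seq[Map[K, U]] = partitions.map { part =>
    part.groupBy(_._1).map { case (k, kvs) =>
      k -> kvs.map(_._2).foldLeft(zero)(seqOp)
    }
  }
  // ...then merge the per-partition accumulators with combOp.
  perPartition.flatten.groupBy(_._1).map { case (k, kus) =>
    k -> kus.map(_._2).reduce(combOp)
  }
}

// Two "partitions" of (key, value) pairs; per-key sums.
val parts = Seq(Seq(("a", 1), ("a", 2)), Seq(("a", 3), ("b", 4)))
val sums = aggregateByKey(parts, 0)(_ + _, _ + _)
println(sums)
```

Unlike combineByKey, the zero value is supplied once up front, which is why the two APIs can behave differently on the same data.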

Re: combineByKey

2019-04-05 Thread Madabhattula Rajesh Kumar
Hi,

Thank you for the details. That was a typo while composing the mail. Below is the actual flow. Any idea why combineByKey is not working, while aggregateByKey is?

// Defining createCombiner, mergeValue and mergeCombiner functions
def createCombiner = (Id: String, value: String) => (
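One common reason combineByKey fails where aggregateByKey succeeds: createCombiner receives only the value, never the key, so a two-argument function like the one above does not fit the expected V => C shape. A plain-Scala model of combineByKey's per-key behavior, sketched under that assumption (the simulation and sample rows are mine):

```scala
// combineByKey's functions operate on values only; the key stays implicit.
// Single-partition simulation: createCombiner seeds the first value for a key,
// mergeValue folds each subsequent value into the existing combiner.
def combineByKey[K, V, C](data: Seq[(K, V)])(
    createCombiner: V => C,
    mergeValue: (C, V) => C): Map[K, C] =
  data.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
    acc.get(k) match {
      case None    => acc + (k -> createCombiner(v))  // first value for key
      case Some(c) => acc + (k -> mergeValue(c, v))   // subsequent values
    }
  }

// Here each value is an (id, value) pair, as in the thread's data,
// and the combiner collects them into a List. Note createCombiner
// takes the pair as ONE argument and actually uses it.
val rows = Seq("k1" -> ("id1", "v1"), "k1" -> ("id2", "v2"), "k2" -> ("id3", "v3"))
val combined = combineByKey(rows)(
  (p: (String, String)) => List(p),
  (c: List[(String, String)], p: (String, String)) => c :+ p)
println(combined)
```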

Re: combineByKey

2019-04-05 Thread Jason Nerothin
I broke some of your code down into the following lines:

import spark.implicits._
val a: RDD[Messages] = sc.parallelize(messages)
val b: Dataset[Messages] = a.toDF.as[Messages]
val c: Dataset[(String, (String, String))] =
  b.map { x => (x.timeStamp + "-" + x.Id, (x.Id, x.value)) }

You
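Spark aside, the keying step in line c is just a map to a composite key. A plain-Scala equivalent, where the Messages case class and sample values are my own stand-ins for the thread's data:

```scala
// Hypothetical stand-in for the thread's Messages type.
case class Messages(timeStamp: Long, Id: String, value: String)

val messages = Seq(Messages(100L, "a", "x"), Messages(200L, "b", "y"))

// Same shape as the Dataset map: a "timestamp-id" composite key,
// with (id, value) as the payload to be combined per key.
val keyed: Seq[(String, (String, String))] =
  messages.map(x => (x.timeStamp + "-" + x.Id, (x.Id, x.value)))

println(keyed)
```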

Re: combineByKey throws ClassCastException

2014-09-16 Thread Tao Xiao
This problem was caused by the fact that I used a packaged jar with a Spark version (0.9.1) different from that of the cluster (0.9.0). When I used the correct package jar (spark-assembly_2.10-0.9.0-cdh5.0.1-hadoop2.3.0-cdh5.0.1.jar) instead, the application ran as expected.

Re: combineByKey throws ClassCastException

2014-09-14 Thread x
How about this.

scala> val rdd2 = rdd.combineByKey(
     |   (v: Int) => v.toLong,
     |   (c: Long, v: Int) => c + v,
     |   (c1: Long, c2: Long) => c1 + c2)
rdd2: org.apache.spark.rdd.RDD[(String, Long)] = MapPartitionsRDD[9] at combineByKey at <console>:14

xj @ Tokyo

On Mon, Sep 15, 2014 at 3:06 PM, Tao
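The same three functions, applied outside Spark with a plain fold, show what this computes per key: widen each Int to Long on first sight, then accumulate. The sample data is mine:

```scala
val data = Seq(("x", 1), ("x", 2), ("y", 3))

// Mirror of the REPL snippet's combiners on a single partition:
// createCombiner = v.toLong, mergeValue = c + v.
val longSums: Map[String, Long] = data.foldLeft(Map.empty[String, Long]) {
  case (acc, (k, v)) =>
    acc.get(k) match {
      case None    => acc + (k -> v.toLong)  // createCombiner
      case Some(c) => acc + (k -> (c + v))   // mergeValue
    }
}
println(longSums)
```

The explicit parameter types matter: combineByKey cannot infer the combiner type C on its own, which is why the lambdas above annotate Int and Long.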

Re: combineByKey at ShuffledDStream.scala

2014-07-23 Thread Bill Jay
The streaming program contains the following main stages:

1. receive data from Kafka
2. preprocessing of the data; these are all map and filter stages
3. group by a field
4. process the groupBy results using map; inside this processing, I use collect and count

Thanks!

Bill

On Tue, Jul 22, 20
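Within a single batch, stages 3 and 4 reduce to a groupBy followed by per-group processing. A plain-Scala sketch of that shape (the record type, field names, and data are mine, not Bill's):

```scala
// Hypothetical record type standing in for the preprocessed Kafka events.
case class Event(user: String, payload: String)

val batch = Seq(Event("u1", "a"), Event("u1", "b"), Event("u2", "c"))

// Stage 3: group by a field. Stage 4: process each group (here, a count).
val perUserCounts: Map[String, Int] =
  batch.groupBy(_.user).map { case (user, events) => user -> events.size }

println(perUserCounts)
```

In Spark Streaming this shuffle is what surfaces as the combineByKey stage at ShuffledDStream.scala in the UI.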

Re: combineByKey at ShuffledDStream.scala

2014-07-22 Thread Tathagata Das
Can you give an idea of the streaming program? What are the rest of the transformations you are doing on the input streams?

On Tue, Jul 22, 2014 at 11:05 AM, Bill Jay wrote:
> Hi all,
>
> I am currently running a Spark Streaming program, which consumes data from
> Kafka and does the group by operation on the

Re: combinebykey throw classcastexception

2014-05-20 Thread xiemeilong
This issue turned out to be caused by a version mismatch between the driver (0.9.1) and the server (0.9.0-cdh5.0.1). Other functions worked fine, but combineByKey did not. Thank you very much for your reply.

Re: combinebykey throw classcastexception

2014-05-20 Thread Sean Owen
You asked off-list, and provided a more detailed example there:

val random = new Random()
val testdata = (1 to 1).map(_ => (random.nextInt(), random.nextInt()))
sc.parallelize(testdata).combineByKey[ArrayBuffer[Int]](
  (instant: Int) => { new ArrayBuffer[Int]() },
  (bucket: ArrayB
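Beyond the version mismatch discussed in this thread, note that the quoted createCombiner ignores its argument and returns an empty buffer, which silently drops the first value seen for each key in a partition. A sketch of the usual fix, modeled in plain Scala with my own sample data:

```scala
import scala.collection.mutable.ArrayBuffer

// createCombiner should seed the buffer with the first value, not discard it.
def createCombiner(v: Int): ArrayBuffer[Int] = ArrayBuffer(v)
def mergeValue(buf: ArrayBuffer[Int], v: Int): ArrayBuffer[Int] = buf += v

// Single-partition simulation of combineByKey.
val pairs = Seq((1, 10), (1, 11), (2, 20))
val buckets = pairs.foldLeft(Map.empty[Int, ArrayBuffer[Int]]) {
  case (acc, (k, v)) =>
    acc.get(k) match {
      case None      => acc + (k -> createCombiner(v))
      case Some(buf) => acc + (k -> mergeValue(buf, v))
    }
}
println(buckets)
```

With the quoted empty-buffer version, key 1 would collect only 11 here; seeding with the first value keeps both 10 and 11.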