RE: how to setup steady state stream partitions

2014-09-10 Thread qihong
Thanks for your response! I found that too, and it does the trick! Here's the refined code:

    val inputDStream = ...
    val keyedDStream = inputDStream.map(...) // use sensorId as key
    val partitionedDStream = keyedDStream.transform(rdd => rdd.partitionBy(new MyPartitioner(...)))
    val stateDStream = part
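For readers following the thread, the pattern qihong lands on can be sketched end-to-end. This is a hedged reconstruction, not the poster's actual code: MyPartitioner, the value types, and updateFn are illustrative. The key idea is to pass the same partitioner to both partitionBy and updateStateByKey, so the state RDDs keep a stable, co-located partitioning from batch to batch:

    import org.apache.spark.Partitioner
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical partitioner: route each sensorId to a fixed partition
    // so a sensor's state always lives on the same partition.
    class MyPartitioner(partitions: Int) extends Partitioner {
      override def numPartitions: Int = partitions
      override def getPartition(key: Any): Int =
        math.abs(key.hashCode % partitions)
    }

    // Illustrative state update: running sum of a sensor's readings.
    val updateFn = (values: Seq[Double], state: Option[Double]) =>
      Some(state.getOrElse(0.0) + values.sum)

    val partitioner = new MyPartitioner(10)
    val keyedDStream: DStream[(String, Double)] = ??? // (sensorId, reading)
    val partitionedDStream =
      keyedDStream.transform(rdd => rdd.partitionBy(partitioner))
    // Reusing the same partitioner here avoids re-shuffling the state.
    val stateDStream =
      partitionedDStream.updateStateByKey[Double](updateFn, partitioner)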

RE: how to setup steady state stream partitions

2014-09-10 Thread Anton Brazhnyk
Just a guess: updateStateByKey has overloaded variants that take a partitioner as a parameter. Can that help?
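The overload Anton points to does exist on pair DStreams (via PairDStreamFunctions): one variant takes the update function together with a Partitioner that controls the partitioning of the state RDDs. A minimal sketch, where the update function and partition count are illustrative:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.streaming.dstream.DStream

    val keyedDStream: DStream[(String, Double)] = ??? // (sensorId, reading)

    // Keep a running count of readings per sensor; the HashPartitioner
    // determines how the state RDD is partitioned.
    val updateFn = (values: Seq[Double], state: Option[Long]) =>
      Some(state.getOrElse(0L) + values.size)

    val stateDStream =
      keyedDStream.updateStateByKey[Long](updateFn, new HashPartitioner(10))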

Re: how to setup steady state stream partitions

2014-09-09 Thread qihong
Thanks for your response. I do have something like:

    val inputDStream = ...
    val keyedDStream = inputDStream.map(...) // use sensorId as key
    val partitionedDStream = keyedDStream.transform(rdd => rdd.partitionBy(new MyPartitioner(...)))
    val stateDStream = partitionedDStream.updateStateByKey[...](ud

Re: how to setup steady state stream partitions

2014-09-09 Thread x
Using your own partitioner didn't work? e.g.

    yourRDD.partitionBy(new HashPartitioner(yourNumber))

xj @ Tokyo

On Wed, Sep 10, 2014 at 12:03 PM, qihong wrote:
> I'm working on a DStream application. The input are sensors' measurements,
> the data format is
>
> There are 10 thousands sensors,
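x's suggestion in isolation, as a short sketch (the pair data and partition count are made up for illustration): partitionBy applies to pair RDDs and returns an RDD whose partitioner field is set, which later shuffle-based operations on the same keys can reuse instead of re-shuffling:

    import org.apache.spark.{HashPartitioner, SparkContext}

    val sc: SparkContext = ??? // assumed to exist in the streaming app

    // A pair RDD keyed by sensorId, as in the original question.
    val pairs = sc.parallelize(Seq(("sensor-1", 1.0), ("sensor-2", 2.0)))

    // Hash-partition the pairs into 4 partitions; the resulting RDD
    // remembers its partitioner.
    val partitioned = pairs.partitionBy(new HashPartitioner(4))
    // partitioned.partitioner is Some(HashPartitioner with 4 partitions)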