Hi, thanks for the help. In my case I want to process 30 records per second using Spark Streaming. The difference between the keys of consecutive records is around 33-34 ms, and my RDD of 30 records already has 4 partitions. Right now my algorithm takes around 400 ms to process one record, so I want to distribute the records evenly so that every executor works on exactly one record and each 1-second batch completes without delay.
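Here is a minimal sketch of what I am thinking of, based on the Stack Overflow approach linked below: give each record in the batch a unique index with zipWithIndex and partition on that index, so each of the 30 records lands in its own partition. This is only a sketch; the names ExactPartitioner and spreadEvenly are illustrative, not from any library:

    import org.apache.spark.Partitioner
    import org.apache.spark.rdd.RDD

    // One partition per key: assumes keys are unique indices 0..parts-1,
    // so each record maps to its own partition.
    class ExactPartitioner(parts: Int) extends Partitioner {
      override def numPartitions: Int = parts
      override def getPartition(key: Any): Int =
        (key.asInstanceOf[Long] % parts).toInt
    }

    // rdd holds (timestamp, jpegBytes) pairs. zipWithIndex assigns each
    // record a unique index 0..29, which becomes the partitioning key.
    def spreadEvenly(rdd: RDD[(Long, Array[Byte])]): RDD[(Long, Array[Byte])] = {
      rdd.zipWithIndex()                              // ((ts, jpeg), idx)
        .map { case (record, idx) => (idx, record) }  // (idx, (ts, jpeg))
        .partitionBy(new ExactPartitioner(30))
        .values
    }

In Spark Streaming this could run once per batch via dstream.transform(spreadEvenly). Note that one record per partition does not by itself guarantee one task per executor; that still depends on scheduling and locality.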
On Tue, Nov 17, 2015 at 7:50 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:

> Think about how you want to distribute your data and how your keys are
> spread currently. Do you want to compute something per day, per week, etc.?
> Based on that, return a partition number. You could use mod 30 or some such
> function to get the partitions.
>
> On Nov 18, 2015 5:17 AM, "prateek arora" <prateek.arora...@gmail.com> wrote:
>
>> Hi
>> I am trying to implement a custom partitioner using this link:
>> http://stackoverflow.com/questions/23127329/how-to-define-custom-partitioner-for-spark-rdds-of-equally-sized-partition-where
>> (in the linked example the key values run from 0 to (noOfElement - 1)),
>> but I am not able to understand how to implement a custom partitioner in
>> my case:
>>
>> My parent RDD has 4 partitions; the RDD key is a timestamp and the value
>> is a JPEG byte array.
>>
>> Regards
>> Prateek
>>
>> On Tue, Nov 17, 2015 at 9:28 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Please take a look at the following for example:
>>>
>>> ./core/src/main/scala/org/apache/spark/api/python/PythonPartitioner.scala
>>> ./core/src/main/scala/org/apache/spark/Partitioner.scala
>>>
>>> Cheers
>>>
>>> On Tue, Nov 17, 2015 at 9:24 AM, prateek arora
>>> <prateek.arora...@gmail.com> wrote:
>>>
>>>> Hi
>>>> Thanks. I am new to Spark development, so could you provide some help
>>>> with writing a custom partitioner to achieve this? If you have a link
>>>> or an example of a custom partitioner, please share it.
>>>>
>>>> On Mon, Nov 16, 2015 at 6:13 PM, Sabarish Sasidharan
>>>> <sabarish.sasidha...@manthan.com> wrote:
>>>>
>>>>> You can write your own custom partitioner to achieve this.
>>>>>
>>>>> Regards
>>>>> Sab
>>>>>
>>>>> On 17-Nov-2015 1:11 am, "prateek arora" <prateek.arora...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I have an RDD with 30 records (key/value pairs) and am running 30
>>>>>> executors. I want to repartition this RDD into 30 partitions so that
>>>>>> every partition gets one record and is assigned to one executor.
>>>>>>
>>>>>> When I use rdd.repartition(30) it repartitions my RDD into 30
>>>>>> partitions, but some partitions get 2 records, some get 1 record, and
>>>>>> some do not get any record.
>>>>>>
>>>>>> Is there any way in Spark to distribute my records evenly across all
>>>>>> partitions?
>>>>>>
>>>>>> Regards
>>>>>> Prateek
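For reference, Sonal's "mod 30" suggestion above could also be sketched directly on the timestamp keys. This is a sketch that assumes millisecond timestamps spaced roughly 33-34 ms apart within each 1-second batch; arrival jitter can still put two records into the same slot, which is why the index-based sketch further up is the safer way to guarantee one record per partition:

    import org.apache.spark.Partitioner

    // Maps a millisecond timestamp to one of `parts` slots inside its
    // 1-second batch: (ts % 1000) ranges over 0..999, and dividing by 34
    // yields roughly 0..29 when records arrive every 33-34 ms.
    class TimestampPartitioner(parts: Int) extends Partitioner {
      override def numPartitions: Int = parts
      override def getPartition(key: Any): Int = {
        val ts = key.asInstanceOf[Long]
        (((ts % 1000) / 34) % parts).toInt
      }
    }

Usage would then be rdd.partitionBy(new TimestampPartitioner(30)).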