Hi, thanks for the help. In my case I want to process 30 records per second using Spark Streaming. The difference between the keys of consecutive records is around 33-34 ms, and my RDD of 30 records already has 4 partitions. Right now my algorithm takes around 400 ms to process one record, so I want to distribute the records evenly so that every executor works on exactly one record and each 1-second batch completes without delay.
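Here is a minimal sketch of what I am thinking of, based on the Stack Overflow approach linked below: give each record in the batch a unique index with zipWithIndex and partition on that index, so each of the 30 records lands in its own partition. This is only a sketch; the names ExactPartitioner and spreadEvenly are illustrative, not from any library:

    import org.apache.spark.Partitioner
    import org.apache.spark.rdd.RDD

    // One partition per key: assumes keys are unique indices 0..parts-1,
    // so each record maps to its own partition.
    class ExactPartitioner(parts: Int) extends Partitioner {
      override def numPartitions: Int = parts
      override def getPartition(key: Any): Int =
        (key.asInstanceOf[Long] % parts).toInt
    }

    // rdd holds (timestamp, jpegBytes) pairs. zipWithIndex assigns each
    // record a unique index 0..29, which becomes the partitioning key.
    def spreadEvenly(rdd: RDD[(Long, Array[Byte])]): RDD[(Long, Array[Byte])] = {
      rdd.zipWithIndex()                              // ((ts, jpeg), idx)
        .map { case (record, idx) => (idx, record) }  // (idx, (ts, jpeg))
        .partitionBy(new ExactPartitioner(30))
        .values
    }

In Spark Streaming this could run once per batch via dstream.transform(spreadEvenly). Note that one record per partition does not by itself guarantee one task per executor; that still depends on scheduling and locality.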
On Tue, Nov 17, 2015 at 7:50 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:

> Think about how you want to distribute your data and how your keys are
> spread currently. Do you want to compute something per day, per week, etc.?
> Based on that, return a partition number. You could use mod 30 or some such
> function to get the partitions.
>
> On Nov 18, 2015 5:17 AM, "prateek arora" <prateek.arora...@gmail.com> wrote:
>
>> Hi
>> I am trying to implement a custom partitioner using this link:
>> http://stackoverflow.com/questions/23127329/how-to-define-custom-partitioner-for-spark-rdds-of-equally-sized-partition-where
>> (in the linked example the key values run from 0 to (noOfElement - 1)),
>> but I am not able to understand how to implement a custom partitioner in
>> my case:
>>
>> My parent RDD has 4 partitions; the RDD key is a timestamp and the value
>> is a JPEG byte array.
>>
>> Regards
>> Prateek
>>
>> On Tue, Nov 17, 2015 at 9:28 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Please take a look at the following for example:
>>>
>>> ./core/src/main/scala/org/apache/spark/api/python/PythonPartitioner.scala
>>> ./core/src/main/scala/org/apache/spark/Partitioner.scala
>>>
>>> Cheers
>>>
>>> On Tue, Nov 17, 2015 at 9:24 AM, prateek arora
>>> <prateek.arora...@gmail.com> wrote:
>>>
>>>> Hi
>>>> Thanks. I am new to Spark development, so could you provide some help
>>>> with writing a custom partitioner to achieve this? If you have a link
>>>> or an example of a custom partitioner, please share it.
>>>>
>>>> On Mon, Nov 16, 2015 at 6:13 PM, Sabarish Sasidharan
>>>> <sabarish.sasidha...@manthan.com> wrote:
>>>>
>>>>> You can write your own custom partitioner to achieve this.
>>>>>
>>>>> Regards
>>>>> Sab
>>>>>
>>>>> On 17-Nov-2015 1:11 am, "prateek arora" <prateek.arora...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I have an RDD with 30 records (key/value pairs) and am running 30
>>>>>> executors. I want to repartition this RDD into 30 partitions so that
>>>>>> every partition gets one record and is assigned to one executor.
>>>>>>
>>>>>> When I use rdd.repartition(30) it repartitions my RDD into 30
>>>>>> partitions, but some partitions get 2 records, some get 1 record, and
>>>>>> some do not get any record.
>>>>>>
>>>>>> Is there any way in Spark to distribute my records evenly across all
>>>>>> partitions?
>>>>>>
>>>>>> Regards
>>>>>> Prateek
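For reference, Sonal's "mod 30" suggestion above could also be sketched directly on the timestamp keys. This is a sketch that assumes millisecond timestamps spaced roughly 33-34 ms apart within each 1-second batch; arrival jitter can still put two records into the same slot, which is why the index-based sketch further up is the safer way to guarantee one record per partition:

    import org.apache.spark.Partitioner

    // Maps a millisecond timestamp to one of `parts` slots inside its
    // 1-second batch: (ts % 1000) ranges over 0..999, and dividing by 34
    // yields roughly 0..29 when records arrive every 33-34 ms.
    class TimestampPartitioner(parts: Int) extends Partitioner {
      override def numPartitions: Int = parts
      override def getPartition(key: Any): Int = {
        val ts = key.asInstanceOf[Long]
        (((ts % 1000) / 34) % parts).toInt
      }
    }

Usage would then be rdd.partitionBy(new TimestampPartitioner(30)).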