Hi Raju,

Have you tried setNumPartitions with a larger number?
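In the RDD-based MLlib API that would look roughly like `new FPGrowth().setMinSupport(...).setNumPartitions(...).run(transactions)`. That said, the min-support threshold usually dominates the runtime far more than partitioning does. Here is a toy, pure-Python sketch of why (this is not Spark code, and `frequent_itemsets` is a hypothetical brute-force stand-in for FPGrowth, which avoids full enumeration): the same fraction you would pass to `setMinSupport` decides how aggressively candidate itemsets are pruned.

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force frequent-itemset counting (toy stand-in for FPGrowth).

    min_support is a fraction of the transaction count, matching the
    semantics of Spark's FPGrowth.setMinSupport.
    """
    n = len(transactions)
    min_count = min_support * n
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    # Keep only itemsets whose absolute count clears the threshold.
    return {itemset: c for itemset, c in counts.items() if c >= min_count}

transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "d"],
]

# A threshold that lets an itemset appear in just one transaction keeps
# nearly everything (this is the expensive regime described in the thread)...
low = frequent_itemsets(transactions, min_support=0.25)

# ...while a higher threshold prunes most candidates up front.
high = frequent_itemsets(transactions, min_support=0.75)

print(len(low), len(high))  # prints: 8 2
```

With 63K transactions, a support fraction near 1/63000 means an itemset only has to occur once to survive, so the search space barely shrinks; raising it trades completeness for speed.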

2017-03-07 0:30 GMT-08:00 Eli Super <eli.su...@gmail.com>:

> Hi
>
> It's a broad area of knowledge; you will need to spend several hours
> reading about it online.
>
> What is your programming language ?
>
> Try searching online for: "machine learning binning %my_programming_language%"
> and
> "machine learning feature engineering %my_programming_language%"
>
> On Tue, Mar 7, 2017 at 3:39 AM, Raju Bairishetti <r...@apache.org> wrote:
>
>> @Eli, thanks for the suggestion. If you do not mind, could you please
>> elaborate on those approaches?
>>
>> On Mon, Mar 6, 2017 at 7:29 PM, Eli Super <eli.su...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> Try implementing binning and/or feature engineering (for example, smart
>>> feature selection)
>>>
>>> Good luck
>>>
>>> On Mon, Mar 6, 2017 at 6:56 AM, Raju Bairishetti <r...@apache.org>
>>> wrote:
>>>
>>>> Hi,
>>>>   I am new to Spark MLlib. I am using the FPGrowth model to find
>>>> related items.
>>>>
>>>> The number of transactions is 63K, and the total number of items across
>>>> all transactions is 200K.
>>>>
>>>> I am running the FPGrowth model to generate frequent itemsets, and it
>>>> is taking a huge amount of time.* I am setting the min-support value
>>>> such that each item appears in at least ~(number of items)/(number of
>>>> transactions) of the transactions.*
>>>>
>>>> It takes a very long time if I allow an item to appear as few as once
>>>> in the database.
>>>>
>>>> If I give a higher value to min-support, the output is much smaller.
>>>>
>>>> Could anyone please guide me how to reduce the execution time for
>>>> generating frequent items?
>>>>
>>>> ------
>>>> Thanks,
>>>> Raju Bairishetti,
>>>> www.lazada.com
>>>>
>>>
>>>
>>
>>
>> --
>>
>> ------
>> Thanks,
>> Raju Bairishetti,
>> www.lazada.com
>>
>
>
