Hi Raju,

Have you tried setNumPartitions with a larger number?
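For example, something like this (a minimal sketch against the RDD-based spark.mllib FPGrowth API; sc is the spark-shell SparkContext, and the input path and parameter values are placeholders, not from your setup):

import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

// One basket per line, items separated by spaces; path is a placeholder.
val transactions: RDD[Array[String]] = sc.textFile("transactions.txt")
  .map(_.trim.split(' '))

val model = new FPGrowth()
  .setMinSupport(0.01)    // fraction of transactions an itemset must appear in
  .setNumPartitions(200)  // spreads the conditional FP-tree work over more tasks
  .run(transactions)

model.freqItemsets.take(10).foreach { fi =>
  println(fi.items.mkString("[", ",", "]") + ": " + fi.freq)
}

More partitions only spread the work, though; a very low minSupport is usually what makes the itemset space explode, so it is worth tuning the two together.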
2017-03-07 0:30 GMT-08:00 Eli Super <eli.su...@gmail.com>:

> Hi
>
> This is a whole area of knowledge in itself; you will need to spend
> several hours reading about it online.
>
> What is your programming language?
>
> Try searching online for "machine learning binning %my_programming_language%"
> and "machine learning feature engineering %my_programming_language%".
>
> On Tue, Mar 7, 2017 at 3:39 AM, Raju Bairishetti <r...@apache.org> wrote:
>
>> @Eli, thanks for the suggestion. If you do not mind, could you please
>> elaborate on those approaches?
>>
>> On Mon, Mar 6, 2017 at 7:29 PM, Eli Super <eli.su...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> Try implementing binning and/or feature engineering (smart feature
>>> selection, for example).
>>>
>>> Good luck
>>>
>>> On Mon, Mar 6, 2017 at 6:56 AM, Raju Bairishetti <r...@apache.org> wrote:
>>>
>>>> Hi,
>>>> I am new to Spark MLlib. I am using the FPGrowth model for finding
>>>> related items.
>>>>
>>>> The number of transactions is 63K, and the total number of items across
>>>> all transactions is 200K.
>>>>
>>>> I am running the FPGrowth model to generate frequent itemsets, and it
>>>> is taking a huge amount of time. I am setting the min-support value
>>>> such that each item appears in at least ~(number of items)/(number of
>>>> transactions).
>>>>
>>>> It takes a very long time if I set it so that an item only has to
>>>> appear once in the database.
>>>>
>>>> If I give a higher value to min-support, then the output is very small.
>>>>
>>>> Could anyone please guide me on how to reduce the execution time for
>>>> generating frequent itemsets?
>>>>
>>>> ------
>>>> Thanks,
>>>> Raju Bairishetti,
>>>> www.lazada.com
>>
>> --
>>
>> ------
>> Thanks,
>> Raju Bairishetti,
>> www.lazada.com
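P.S. Adding to Eli's binning suggestion: in Spark this can be done before the itemset mining, e.g. with the DataFrame-based Bucketizer, so that a high-cardinality continuous field collapses into a handful of discrete items. A minimal sketch; the "price" column, the split points, and rawDf are illustrative, not from this thread:

import org.apache.spark.ml.feature.Bucketizer

// Collapse a continuous column into a few discrete buckets so the
// downstream item vocabulary shrinks. All names and values are examples.
val splits = Array(Double.NegativeInfinity, 0.0, 10.0, 100.0, Double.PositiveInfinity)

val binned = new Bucketizer()
  .setInputCol("price")          // hypothetical input column
  .setOutputCol("priceBucket")
  .setSplits(splits)
  .transform(rawDf)              // rawDf: a DataFrame with a numeric "price" column

Fewer distinct items generally means fewer candidate itemsets for FPGrowth to enumerate at a given min-support.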