MLlib is the old RDD-based API. Since Apache Spark 2.0 the recommendation is to use the Dataset-based APIs for better performance, which is what ML (spark.ml) was introduced for.
ML contains the new API built around Datasets and ML Pipelines. mllib is slowly being deprecated (this has already happened for linear regression), and MLlib has now entered maintenance mode.

Regards,
Vaquar khan

On Sat, Sep 23, 2017 at 4:04 PM, Koert Kuipers <ko...@tresata.com> wrote:

> our main challenge has been the lack of support for missing values
> generally
>
> On Sat, Sep 23, 2017 at 3:41 AM, Irfan Kabli <irfan.kabli...@gmail.com>
> wrote:
>
>> Dear All,
>>
>> We are looking to position MLlib in our organisation for machine learning
>> tasks and are keen to understand if there are any challenges that you might
>> have seen with MLlib in production. We will be going with the pure
>> open-source approach here, rather than using one of the Hadoop
>> distributions out there in the market.
>>
>> Furthermore, with a multi-tenant Hadoop cluster, and data in memory, would
>> Spark support encrypting the data in memory with DataFrames?
>>
>> --
>> Best Regards,
>> Irfan Kabli
>>
>

--
Regards,
Vaquar Khan
+1 -224-436-0783
Greater Chicago