+1 for renaming the jar file.

Sincerely,
DB Tsai
----------------------------------------------------------
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D

On Tue, Apr 5, 2016 at 8:02 PM, Chris Fregly <ch...@fregly.com> wrote:
> perhaps renaming to Spark ML would actually clear up code and documentation
> confusion?
>
> +1 for rename
>
> On Apr 5, 2016, at 7:00 PM, Reynold Xin <r...@databricks.com> wrote:
>
> +1
>
> This is a no brainer IMO.
>
>
> On Tue, Apr 5, 2016 at 7:32 PM, Joseph Bradley <jos...@databricks.com>
> wrote:
>>
>> +1 By the way, the JIRA for tracking (Scala) API parity is:
>> https://issues.apache.org/jira/browse/SPARK-4591
>>
>> On Tue, Apr 5, 2016 at 4:58 PM, Matei Zaharia <matei.zaha...@gmail.com>
>> wrote:
>>>
>>> This sounds good to me as well. The one thing we should pay attention to
>>> is how we update the docs so that people know to start with the spark.ml
>>> classes. Right now the docs list spark.mllib first and also seem more
>>> comprehensive in that area than in spark.ml, so maybe people naturally move
>>> towards that.
>>>
>>> Matei
>>>
>>> On Apr 5, 2016, at 4:44 PM, Xiangrui Meng <m...@databricks.com> wrote:
>>>
>>> Yes, DB (cc'ed) is working on porting the local linear algebra library
>>> over (SPARK-13944). There are also frequent pattern mining algorithms we
>>> need to port over in order to reach feature parity. -Xiangrui
>>>
>>> On Tue, Apr 5, 2016 at 12:08 PM Shivaram Venkataraman
>>> <shiva...@eecs.berkeley.edu> wrote:
>>>>
>>>> Overall this sounds good to me. One question I have is that in
>>>> addition to the ML algorithms we have a number of linear algebra
>>>> (various distributed matrices) and statistical methods in the
>>>> spark.mllib package. Is the plan to port or move these to the spark.ml
>>>> namespace in the 2.x series?
>>>>
>>>> Thanks
>>>> Shivaram
>>>>
>>>> On Tue, Apr 5, 2016 at 11:48 AM, Sean Owen <so...@cloudera.com> wrote:
>>>> > FWIW, all of that sounds like a good plan to me. Developing one API is
>>>> > certainly better than two.
>>>> >
>>>> > On Tue, Apr 5, 2016 at 7:01 PM, Xiangrui Meng <men...@gmail.com>
>>>> > wrote:
>>>> >> Hi all,
>>>> >>
>>>> >> More than a year ago, in Spark 1.2 we introduced the ML pipeline API
>>>> >> built on top of Spark SQL’s DataFrames. Since then the new
>>>> >> DataFrame-based API has been developed under the spark.ml package,
>>>> >> while the old RDD-based API has been developed in parallel under the
>>>> >> spark.mllib package. While it was easier to implement and experiment
>>>> >> with new APIs under a new package, it became harder and harder to
>>>> >> maintain as both packages grew bigger and bigger. And new users are
>>>> >> often confused by having two sets of APIs with overlapped functions.
>>>> >>
>>>> >> We started to recommend the DataFrame-based API over the RDD-based API
>>>> >> in Spark 1.5 for its versatility and flexibility, and we saw the
>>>> >> development and the usage gradually shifting to the DataFrame-based
>>>> >> API. Just counting the lines of Scala code, from 1.5 to the current
>>>> >> master we added ~10000 lines to the DataFrame-based API while ~700 to
>>>> >> the RDD-based API. So, to gather more resources on the development of
>>>> >> the DataFrame-based API and to help users migrate over sooner, I want
>>>> >> to propose switching RDD-based MLlib APIs to maintenance mode in Spark
>>>> >> 2.0. What does it mean exactly?
>>>> >>
>>>> >> * We do not accept new features in the RDD-based spark.mllib package,
>>>> >>   unless they block implementing new features in the DataFrame-based
>>>> >>   spark.ml package.
>>>> >> * We still accept bug fixes in the RDD-based API.
>>>> >> * We will add more features to the DataFrame-based API in the 2.x
>>>> >>   series to reach feature parity with the RDD-based API.
>>>> >> * Once we reach feature parity (possibly in Spark 2.2), we will
>>>> >>   deprecate the RDD-based API.
>>>> >> * We will remove the RDD-based API from the main Spark repo in Spark
>>>> >>   3.0.
>>>> >>
>>>> >> Though the RDD-based API is already in de facto maintenance mode, this
>>>> >> announcement will make it clear and hence important to both MLlib
>>>> >> developers and users. So we’d greatly appreciate your feedback!
>>>> >>
>>>> >> (As a side note, people sometimes use “Spark ML” to refer to the
>>>> >> DataFrame-based API or even the entire MLlib component. This also
>>>> >> causes confusion. To be clear, “Spark ML” is not an official name and
>>>> >> there are no plans to rename MLlib to “Spark ML” at this time.)
>>>> >>
>>>> >> Best,
>>>> >> Xiangrui
>>>> >
>>>> > ---------------------------------------------------------------------
>>>> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> > For additional commands, e-mail: user-h...@spark.apache.org