IMHO, I would go with choice #1.

Cheers
On Wed, Apr 29, 2015 at 10:03 PM, Reynold Xin <r...@databricks.com> wrote:

> We definitely still have the name collision problem in SQL.
>
> On Wed, Apr 29, 2015 at 10:01 PM, Punyashloka Biswal <punya.bis...@gmail.com> wrote:
>
>> Do we still have to keep the names of the functions distinct to avoid
>> collisions in SQL? Or is there a plan to allow "importing" a namespace
>> into SQL somehow?
>>
>> I ask because if we have to keep worrying about name collisions, then I'm
>> not sure what the added complexity of #2 and #3 buys us.
>>
>> Punya
>>
>> On Wed, Apr 29, 2015 at 3:52 PM, Reynold Xin <r...@databricks.com> wrote:
>>
>>> Scaladoc isn't much of a problem because scaladocs are grouped.
>>> Java/Python is the main problem ...
>>>
>>> See
>>> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
>>>
>>> On Wed, Apr 29, 2015 at 3:38 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
>>>
>>>> My feeling is that we should have a handful of namespaces (say 4 or 5).
>>>> It becomes too cumbersome to import / remember more package names, and
>>>> having everything in one package makes it hard to read scaladoc etc.
>>>>
>>>> Thanks
>>>> Shivaram
>>>>
>>>> On Wed, Apr 29, 2015 at 3:30 PM, Reynold Xin <r...@databricks.com> wrote:
>>>>
>>>>> To add a little bit more context, some pros/cons I can think of are:
>>>>>
>>>>> Option 1: Very easy for users to find the functions, since they are all
>>>>> in org.apache.spark.sql.functions. However, there will be quite a large
>>>>> number of them.
>>>>>
>>>>> Option 2: I can't tell why we would want this one over Option 3, since
>>>>> it has all the problems of Option 3 and not as nice a hierarchy.
>>>>>
>>>>> Option 3: The opposite of Option 1. Each "package" or static class has
>>>>> a small number of functions that are relevant to each other, but for
>>>>> some functions it is unclear where they should go (e.g. should "min" go
>>>>> into basic or math?).
>>>>>
>>>>> On Wed, Apr 29, 2015 at 3:21 PM, Reynold Xin <r...@databricks.com> wrote:
>>>>>
>>>>>> Before we make DataFrame non-alpha, it would be great to decide how we
>>>>>> want to namespace all the functions. There are 3 alternatives:
>>>>>>
>>>>>> 1. Put all in org.apache.spark.sql.functions. This is how SQL does it,
>>>>>> since SQL doesn't have namespaces. I estimate eventually we will have
>>>>>> ~200 functions.
>>>>>>
>>>>>> 2. Have explicit namespaces, which is what the master branch currently
>>>>>> looks like:
>>>>>>
>>>>>> - org.apache.spark.sql.functions
>>>>>> - org.apache.spark.sql.mathfunctions
>>>>>> - ...
>>>>>>
>>>>>> 3. Have explicit namespaces, but restructure them slightly so
>>>>>> everything is under functions:
>>>>>>
>>>>>> package object functions {
>>>>>>   // all the old functions here -- but deprecated so we keep source
>>>>>>   // compatibility
>>>>>>   def ...
>>>>>> }
>>>>>>
>>>>>> package org.apache.spark.sql.functions
>>>>>>
>>>>>> object mathFunc {
>>>>>>   ...
>>>>>> }
>>>>>>
>>>>>> object basicFuncs {
>>>>>>   ...
>>>>>> }
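
[Editor's note: for concreteness, here is a minimal Scala sketch of the
Option 3 layout described above. The names mathFuncs, basicFuncs, sqrt, and
lit are illustrative placeholders, not actual Spark source, and in a real
codebase the package object and the grouped objects would live in separate
files.]

    // Sketch only: names are placeholders, not actual Spark source.
    package org.apache.spark.sql

    package object functions {
      // Old flat entry point, deprecated but kept so that existing
      // `import org.apache.spark.sql.functions._` code still compiles.
      @deprecated("use functions.mathFuncs.sqrt instead", "1.4.0")
      def sqrt(col: Column): Column =
        org.apache.spark.sql.functions.mathFuncs.sqrt(col)
    }

    package functions {

      // Small, topically grouped namespaces under the same functions package.
      object mathFuncs {
        def sqrt(col: Column): Column = ??? // implementation elided
      }

      object basicFuncs {
        def lit(literal: Any): Column = ??? // implementation elided
      }
    }

The deprecated forwarders preserve source compatibility, as the original
message notes, while new code can opt into the narrower imports, e.g.
import org.apache.spark.sql.functions.mathFuncs._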