IMHO, I would go with choice #1.

Cheers
On Wed, Apr 29, 2015 at 10:03 PM, Reynold Xin <r...@databricks.com> wrote:

> We definitely still have the name collision problem in SQL.
>
> On Wed, Apr 29, 2015 at 10:01 PM, Punyashloka Biswal <punya.bis...@gmail.com> wrote:
>
>> Do we still have to keep the names of the functions distinct to avoid
>> collisions in SQL? Or is there a plan to allow "importing" a namespace
>> into SQL somehow?
>>
>> I ask because if we have to keep worrying about name collisions, then I'm
>> not sure what the added complexity of #2 and #3 buys us.
>>
>> Punya
>>
>> On Wed, Apr 29, 2015 at 3:52 PM, Reynold Xin <r...@databricks.com> wrote:
>>
>>> Scaladoc isn't much of a problem because scaladocs are grouped.
>>> Java/Python is the main problem ...
>>>
>>> See
>>> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
>>>
>>> On Wed, Apr 29, 2015 at 3:38 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
>>>
>>>> My feeling is that we should have a handful of namespaces (say 4 or 5).
>>>> It becomes too cumbersome to import / remember more package names, and
>>>> having everything in one package makes it hard to read scaladoc etc.
>>>>
>>>> Thanks
>>>> Shivaram
>>>>
>>>> On Wed, Apr 29, 2015 at 3:30 PM, Reynold Xin <r...@databricks.com> wrote:
>>>>
>>>>> To add a little bit more context, some pros/cons I can think of are:
>>>>>
>>>>> Option 1: Very easy for users to find the functions, since they are all
>>>>> in org.apache.spark.sql.functions. However, there will be quite a large
>>>>> number of them.
>>>>>
>>>>> Option 2: I can't tell why we would want this one over Option 3, since
>>>>> it has all the problems of Option 3 and not as nice a hierarchy.
>>>>>
>>>>> Option 3: The opposite of Option 1. Each "package" or static class has
>>>>> a small number of functions that are relevant to each other, but for
>>>>> some functions it is unclear where they should go (e.g. should "min" go
>>>>> into basic or math?).
>>>>>
>>>>> On Wed, Apr 29, 2015 at 3:21 PM, Reynold Xin <r...@databricks.com> wrote:
>>>>>
>>>>>> Before we make DataFrame non-alpha, it would be great to decide how we
>>>>>> want to namespace all the functions. There are 3 alternatives:
>>>>>>
>>>>>> 1. Put all in org.apache.spark.sql.functions. This is how SQL does it,
>>>>>> since SQL doesn't have namespaces. I estimate eventually we will have
>>>>>> ~200 functions.
>>>>>>
>>>>>> 2. Have explicit namespaces, which is what the master branch currently
>>>>>> looks like:
>>>>>>
>>>>>> - org.apache.spark.sql.functions
>>>>>> - org.apache.spark.sql.mathfunctions
>>>>>> - ...
>>>>>>
>>>>>> 3. Have explicit namespaces, but restructure them slightly so
>>>>>> everything is under functions:
>>>>>>
>>>>>> package object functions {
>>>>>>   // all the old functions here -- but deprecated so we keep source
>>>>>>   // compatibility
>>>>>>   def ...
>>>>>> }
>>>>>>
>>>>>> package org.apache.spark.sql.functions
>>>>>>
>>>>>> object mathFunc {
>>>>>>   ...
>>>>>> }
>>>>>>
>>>>>> object basicFuncs {
>>>>>>   ...
>>>>>> }
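
[Editor's note: for concreteness, here is a minimal Scala sketch of the
Option 3 layout described above. The names mathFuncs, basicFuncs, sqrt, and
lit are illustrative placeholders, not actual Spark source, and in a real
codebase the package object and the grouped objects would live in separate
files.]

    // Sketch only: names are placeholders, not actual Spark source.
    package org.apache.spark.sql

    package object functions {
      // Old flat entry point, deprecated but kept so that existing
      // `import org.apache.spark.sql.functions._` code still compiles.
      @deprecated("use functions.mathFuncs.sqrt instead", "1.4.0")
      def sqrt(col: Column): Column =
        org.apache.spark.sql.functions.mathFuncs.sqrt(col)
    }

    package functions {

      // Small, topically grouped namespaces under the same functions package.
      object mathFuncs {
        def sqrt(col: Column): Column = ??? // implementation elided
      }

      object basicFuncs {
        def lit(literal: Any): Column = ??? // implementation elided
      }
    }

The deprecated forwarders preserve source compatibility, as the original
message notes, while new code can opt into the narrower imports, e.g.
import org.apache.spark.sql.functions.mathFuncs._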