Re: [discuss] DataFrame function namespacing

2015-05-04 Thread Reynold Xin
After talking with people on this thread and offline, I've decided to go with option 1, i.e. putting everything in a single "functions" object.

Re: [discuss] DataFrame function namespacing

2015-04-30 Thread Ted Yu
IMHO I would go with choice #1. Cheers

Re: [discuss] DataFrame function namespacing

2015-04-29 Thread Reynold Xin
We definitely still have the name collision problem in SQL.
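The collision problem follows from SQL resolving functions by bare name: however the Scala/Java/Python APIs are grouped, each function still needs a unique name in a single flat SQL namespace. The following is a minimal Python sketch under that assumption. `FunctionRegistry` and the `math`/`string`/`collection` group names are hypothetical illustrations, not Spark's actual registry or package layout.

```python
class FunctionRegistry:
    """Hypothetical flat SQL function registry: lookup is by bare name only."""

    def __init__(self):
        # bare name -> (API group it came from, implementation)
        self._by_name = {}

    def register(self, group, name, fn):
        # The API-side group is recorded but plays no part in SQL lookup,
        # so two groups cannot both expose the same bare name.
        if name in self._by_name:
            existing_group, _ = self._by_name[name]
            raise ValueError(
                f"SQL name collision: {name!r} in {group} clashes with {existing_group}"
            )
        self._by_name[name] = (group, fn)

    def lookup(self, name):
        # What a SQL query such as SELECT upper(x) would resolve against.
        return self._by_name[name][1]


registry = FunctionRegistry()
registry.register("math", "abs", abs)
registry.register("string", "upper", str.upper)
registry.register("string", "length", len)

try:
    # Hypothetical: a second API group wants its own "length" as well.
    registry.register("collection", "length", len)
    collision = None
except ValueError as err:
    collision = str(err)

print(registry.lookup("upper")("spark"))  # SPARK
print(collision)
```

Splitting the language APIs into several objects would still leave a uniqueness check like this on the SQL side, so the grouping by itself does not remove the constraint.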

Re: [discuss] DataFrame function namespacing

2015-04-29 Thread Punyashloka Biswal
Do we still have to keep the names of the functions distinct to avoid collisions in SQL? Or is there a plan to allow "importing" a namespace into SQL somehow? I ask because if we have to keep worrying about name collisions, then I'm not sure what the added complexity of #2 and #3 buys us. Punya

Re: [discuss] DataFrame function namespacing

2015-04-29 Thread Reynold Xin
Scaladoc isn't much of a problem because scaladocs are grouped. Java/Python is the main problem ... See https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$

Re: [discuss] DataFrame function namespacing

2015-04-29 Thread Shivaram Venkataraman
My feeling is that we should have a handful of namespaces (say 4 or 5). It becomes too cumbersome to import / remember more package names, and having everything in one package makes it hard to read the Scaladoc, etc. Thanks, Shivaram

Re: [discuss] DataFrame function namespacing

2015-04-29 Thread Reynold Xin
To add a little bit more context, some pros/cons I can think of are: Option 1: Very easy for users to find the function, since they are all in org.apache.spark.sql.functions. However, there will be quite a large number of them. Option 2: I can't tell why we would want this one over Option 3, sinc

[discuss] DataFrame function namespacing

2015-04-29 Thread Reynold Xin
Before we make DataFrame non-alpha, it would be great to decide how we want to namespace all the functions. There are 3 alternatives: 1. Put all in org.apache.spark.sql.functions. This is how SQL does it, since SQL doesn't have namespaces. I estimate eventually we will have ~ 200 functions. 2. Ha