Re: SchemaRDD select expression

2014-07-31 Thread Buntu Dev
Thanks Michael for confirming!
On Thu, Jul 31, 2014 at 2:43 PM, Michael Armbrust wrote:
> The performance should be the same using the DSL or SQL strings.
>
> On Thu, Jul 31, 2014 at 2:36 PM, Buntu Dev wrote:
>> I was not sure if registerAsTable() and then query against that table
>> have

Re: SchemaRDD select expression

2014-07-31 Thread Michael Armbrust
The performance should be the same using the DSL or SQL strings.
On Thu, Jul 31, 2014 at 2:36 PM, Buntu Dev wrote:
> I was not sure if registerAsTable() and then query against that table have
> additional performance impact and if DSL eliminates that.
>
> On Thu, Jul 31, 2014 at 2:33 PM, Zong

Re: SchemaRDD select expression

2014-07-31 Thread Buntu Dev
I was not sure if registerAsTable() and then query against that table have additional performance impact and if DSL eliminates that.
On Thu, Jul 31, 2014 at 2:33 PM, Zongheng Yang wrote:
> Looking at what this patch [1] has to do to achieve it, I am not sure
> if you can do the same thing in 1.

Re: SchemaRDD select expression

2014-07-31 Thread Zongheng Yang
Looking at what this patch [1] has to do to achieve it, I am not sure if you can do the same thing in 1.0.0 using DSL only. Just curious, why don't you use the hql() / sql() methods and pass a query string in?
[1] https://github.com/apache/spark/pull/1211/files
On Thu, Jul 31, 2014 at 2:20 PM, Bu

Re: SchemaRDD select expression

2014-07-31 Thread Buntu Dev
Thanks Zongheng for the pointer. Is there a way to achieve the same in 1.0.0?
On Thu, Jul 31, 2014 at 1:43 PM, Zongheng Yang wrote:
> countDistinct is recently added and is in 1.0.2. If you are using that
> or the master branch, you could try something like:
>
> r.select('keyword, countDis

Re: SchemaRDD select expression

2014-07-31 Thread Zongheng Yang
countDistinct is recently added and is in 1.0.2. If you are using that or the master branch, you could try something like:
    r.select('keyword, countDistinct('userId)).groupBy('keyword)
On Thu, Jul 31, 2014 at 12:27 PM, buntu wrote:
> I'm looking to write a select statement to get a distinct c
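For readers unsure what the expression in this thread is meant to compute, here is a minimal sketch in plain Scala collections. It does not use the Spark API. It only illustrates the apparent intent of the suggested select: one distinct count of userId per keyword. The field names keyword and userId come from the thread; the object name, the function name, and the sample rows are hypothetical.

```scala
object DistinctCountSketch {
  // Rows are (keyword, userId) pairs. This tuple layout is an assumption
  // made for illustration; it is not the SchemaRDD schema from the thread.
  def distinctUsersPerKeyword(rows: Seq[(String, Int)]): Map[String, Int] =
    rows
      // Group the rows by keyword.
      .groupBy { case (keyword, _) => keyword }
      // For each keyword, count the distinct userIds in its group.
      .map { case (keyword, group) =>
        keyword -> group.map { case (_, userId) => userId }.distinct.size
      }

  def main(args: Array[String]): Unit = {
    // Made-up sample data: user 1 searched "spark" twice.
    val rows = Seq(("spark", 1), ("spark", 2), ("spark", 1), ("sql", 3))
    println(distinctUsersPerKeyword(rows))
  }
}
```

On the sample rows, "spark" maps to 2 and "sql" maps to 1, since the repeated (spark, 1) row is counted once.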