That sounds great! Once it is decided, please let us know, and we can add more features and make it ANSI SQL compliant.
Thank you!

Xiao Li

2015-12-09 11:31 GMT-08:00 Michael Armbrust <mich...@databricks.com>:

> I don't plan to abandon HiveQL compatibility, but I'd like to see us move
> towards something with more SQL compliance (perhaps just newer versions of
> the HiveQL parser). Exactly which parser will do that for us is under
> investigation.
>
> On Wed, Dec 9, 2015 at 11:02 AM, Xiao Li <gatorsm...@gmail.com> wrote:
>
>> Hi, Michael,
>>
>> Does that mean SQLContext will be built on HiveQL in the near future?
>>
>> Thanks,
>>
>> Xiao Li
>>
>> 2015-12-09 10:36 GMT-08:00 Michael Armbrust <mich...@databricks.com>:
>>
>>> I think that it is generally good to have parity when the functionality
>>> is useful. However, in some cases various features are there just to
>>> maintain compatibility with other systems. For example, CACHE TABLE is
>>> eager because Shark's CACHE TABLE was; df.cache() is lazy because Spark's
>>> cache is. Does that mean we need to add some eager caching mechanism to
>>> DataFrames to have parity? Probably not: users can just call .count() if
>>> they want to force materialization.
>>>
>>> Regarding the differences between HiveQL and the SQLParser, I think we
>>> should get rid of the SQL parser. It's kind of a hack that I built just
>>> so that there was some SQL story for people who didn't compile with Hive.
>>> Moving forward, I'd like to see the distinction between HiveContext and
>>> SQLContext removed so that we can standardize on a single parser. For
>>> this reason I'd be opposed to spending a lot of dev/reviewer time on
>>> adding features there.
>>>
>>> On Wed, Dec 9, 2015 at 8:34 AM, Cristian O
>>> <cristian.b.op...@googlemail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I was wondering what the "official" view is on feature parity between
>>>> the SQL and DataFrame APIs. Docs are pretty sparse on the SQL front,
>>>> and it seems that some features are, at various times, only supported
>>>> in one of the Spark SQL dialect, the HiveQL dialect, and the DataFrame
>>>> API. DF.cube(), DISTRIBUTE BY, and CACHE LAZY are some examples.
>>>>
>>>> Is there an explicit goal of having consistent support for all
>>>> features in both the DataFrame API and SQL?
>>>>
>>>> Thanks,
>>>> Cristian
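(A minimal sketch of the caching asymmetry discussed above, for anyone following along. This assumes a Spark 1.x-era SQLContext named `sqlContext` and a registered temp table named `events`, both hypothetical; it is not runnable standalone since it needs a live Spark context.)

```scala
// SQL route: CACHE TABLE is eager (a Shark-compatibility behavior), so the
// table is scanned and materialized as part of executing this statement.
sqlContext.sql("CACHE TABLE events")

// The lazy variant defers materialization until the first query reads it.
sqlContext.sql("CACHE LAZY TABLE events")

// DataFrame route: cache() only marks the plan as cacheable; nothing is
// materialized until an action runs.
val df = sqlContext.table("events").cache()

// Running an action such as count() forces materialization, which is the
// ".count() if they want to force materialization" point made above.
df.count()
```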