Yeah, I would like to address any actual gaps in functionality that are present.
On Wed, Dec 9, 2015 at 4:24 PM, Cristian Opris <cristian.b.op...@gmail.com> wrote:

> The reason I'm asking is because it's important in larger projects to be
> able to stick to a particular programming style. Some people are more
> comfortable with SQL, others might find the DF api more suitable, but it's
> important to have full expressivity in both to make it easier to adopt one
> approach rather than have to mix and match to achieve full functionality.
>
> On 9 December 2015 at 19:41, Xiao Li <gatorsm...@gmail.com> wrote:
>
>> That sounds great! When it is decided, please let us know and we can add
>> more features and make it ANSI SQL compliant.
>>
>> Thank you!
>>
>> Xiao Li
>>
>>
>> 2015-12-09 11:31 GMT-08:00 Michael Armbrust <mich...@databricks.com>:
>>
>>> I don't plan to abandon HiveQL compatibility, but I'd like to see us
>>> move towards something with more SQL compliance (perhaps just newer
>>> versions of the HiveQL parser). Exactly which parser will do that for us
>>> is under investigation.
>>>
>>> On Wed, Dec 9, 2015 at 11:02 AM, Xiao Li <gatorsm...@gmail.com> wrote:
>>>
>>>> Hi, Michael,
>>>>
>>>> Does that mean SqlContext will be built on HiveQL in the near future?
>>>>
>>>> Thanks,
>>>>
>>>> Xiao Li
>>>>
>>>>
>>>> 2015-12-09 10:36 GMT-08:00 Michael Armbrust <mich...@databricks.com>:
>>>>
>>>>> I think that it is generally good to have parity when the
>>>>> functionality is useful. However, in some cases various features are
>>>>> there just to maintain compatibility with other systems. For example,
>>>>> CACHE TABLE is eager because Shark's cache table was. df.cache() is lazy
>>>>> because Spark's cache is. Does that mean that we need to add some eager
>>>>> caching mechanism to dataframes to have parity? Probably not, users can
>>>>> just call .count() if they want to force materialization.
>>>>>
>>>>> Regarding the differences between HiveQL and the SQLParser, I think we
>>>>> should get rid of the SQL parser. It's kind of a hack that I built just
>>>>> so that there was some SQL story for people who didn't compile with
>>>>> Hive. Moving forward, I'd like to see the distinction between the
>>>>> HiveContext and SQLContext removed and we can standardize on a single
>>>>> parser. For this reason I'd be opposed to spending a lot of
>>>>> dev/reviewer time on adding features there.
>>>>>
>>>>> On Wed, Dec 9, 2015 at 8:34 AM, Cristian O <cristian.b.op...@googlemail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was wondering what the "official" view is on feature parity between
>>>>>> SQL and DF apis. Docs are pretty sparse on the SQL front, and it seems
>>>>>> that some features are only supported at various times in only one of
>>>>>> Spark SQL dialect, HiveQL dialect and DF API. DF.cube(), DISTRIBUTE BY,
>>>>>> CACHE LAZY are some examples.
>>>>>>
>>>>>> Is there an explicit goal of having consistent support for all
>>>>>> features in both DF and SQL?
>>>>>>
>>>>>> Thanks,
>>>>>> Cristian
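
For anyone skimming the thread, the eager vs. lazy caching point quoted above can be illustrated with a toy model. This is a minimal sketch in plain Python, not Spark code: `ToyFrame`, its `compute` callback, the `eager` flag and the `evaluations` counter are all hypothetical names made up for illustration. It only mimics the behaviour described in the thread, where a lazy cache() just marks the data for caching and a later action such as count() forces materialization.

```python
class ToyFrame:
    """Toy stand-in for a DataFrame. NOT Spark's API; all names are hypothetical."""

    def __init__(self, compute):
        self._compute = compute        # callable that produces the rows
        self._cache_requested = False  # set by cache(), lazily or eagerly
        self._cached_rows = None       # filled in once the data is materialized
        self.evaluations = 0           # how many times compute() actually ran

    def _rows(self):
        # Serve from the cache if the data has already been materialized.
        if self._cached_rows is not None:
            return self._cached_rows
        self.evaluations += 1
        rows = list(self._compute())
        if self._cache_requested:
            self._cached_rows = rows
        return rows

    def cache(self, eager=False):
        # Lazy by default: only record the request.
        # eager=True models the eager behaviour the thread attributes to CACHE TABLE.
        self._cache_requested = True
        if eager:
            self._rows()
        return self

    def count(self):
        # An action: forces evaluation, and fills the cache if one was requested.
        return len(self._rows())

    @property
    def is_materialized(self):
        return self._cached_rows is not None


lazy_frame = ToyFrame(lambda: range(5)).cache()
print(lazy_frame.is_materialized)   # nothing has been computed yet
lazy_frame.count()                  # the action forces materialization
print(lazy_frame.is_materialized)

eager_frame = ToyFrame(lambda: range(5)).cache(eager=True)
print(eager_frame.is_materialized)  # computed at cache time
```

In this model, calling count() right after a lazy cache() gives the same end state as the eager variant, which is the argument made above for not adding a separate eager caching mechanism to dataframes.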