I would not say that either method is preferred (neither is old or deprecated). One advantage of the second is that you are referencing a column of a specific DataFrame, rather than just providing a string that will be resolved much like an identifier in a SQL query.
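For example, the two syntaxes are interchangeable on a single DataFrame. A minimal sketch against the 1.3 API (assuming a SparkContext named sc, as in the spark-shell):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._  // brings the $"col" syntax into scope

    // A small DataFrame with name and age columns
    val df = sqlContext.createDataFrame(Seq(("alice", 30), ("bob", 19)))
      .toDF("name", "age")

    df.filter(df("age") > 21).show()  // column resolved against df specifically
    df.filter($"age" > 21).show()     // string-based column, resolved during analysis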
Where the df("col") form pays off is joins. Concretely, given:

    df1 = [id: int, name: string, ...]
    df2 = [id: int, zip: int]

I can do something like:

    df1.join(df2, df1("id") === df2("id"))

whereas I would need aliases if I were only using strings:

    df1.as("a").join(df2.as("b"), $"a.id" === $"b.id")

On Wed, May 13, 2015 at 9:55 AM, Diana Carroll <dcarr...@cloudera.com> wrote:

> I'm just getting started with Spark SQL and DataFrames in 1.3.0.
>
> I notice that the Spark API shows a different syntax for referencing
> columns in a dataframe than the Spark SQL Programming Guide.
>
> For instance, the API docs for the select method show this:
>
>     df.select($"colA", $"colB")
>
> Whereas the programming guide shows this:
>
>     df.filter(df("name") > 21).show()
>
> I tested and both the $"column" and df("column") syntaxes work, but I'm
> wondering which is *preferred*. Is one the original and one a new
> feature we should be using?
>
> Thanks,
> Diana
> (Spark Curriculum Developer for Cloudera)