I would not say that either method is preferred (neither is
old/deprecated).  One advantage of the second is that you are referencing
a column of a specific DataFrame, rather than providing a string that is
resolved much like an identifier in a SQL query.

This means that, given:
df1 = [id: int, name: string, ...]
df2 = [id: int, zip: int]

I can do something like:

df1.join(df2, df1("id") === df2("id"))

Whereas I would need aliases if I were only using strings:

df1.as("a").join(df2.as("b"), $"a.id" === $"b.id")

On Wed, May 13, 2015 at 9:55 AM, Diana Carroll <dcarr...@cloudera.com>
wrote:

> I'm just getting started with Spark SQL and DataFrames in 1.3.0.
>
> I notice that the Spark API shows a different syntax for referencing
> columns in a dataframe than the Spark SQL Programming Guide.
>
> For instance, the API docs for the select method show this:
> df.select($"colA", $"colB")
>
>
> Whereas the programming guide shows this:
> df.filter(df("name") > 21).show()
>
> I tested and both the $"column" and df("column") syntaxes work, but I'm
> wondering which is *preferred*.  Is one the original and the other a new
> feature we should be using?
>
> Thanks,
> Diana
> (Spark Curriculum Developer for Cloudera)
>
