Hi,

What about this?

scala> final case class Person(name: String, age: Int)
warning: there was one unchecked warning; re-run with -unchecked for details
defined class Person

scala> val ds = Seq(Person("foo", 42), Person("bar", 24)).toDS
ds: org.apache.spark.sql.Dataset[Person] = [name: string, age: int]

scala> ds.as("a").joinWith(ds.as("b"), $"a.name" === $"b.name").show(false)
+--------+--------+
|_1      |_2      |
+--------+--------+
|[foo,42]|[foo,42]|
|[bar,24]|[bar,24]|
+--------+--------+
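
That also answers the positional question: joinWith returns a
Dataset[(Person, Person)], so afterwards you can refer to the two
sides as _1 and _2, or skip column names entirely and map over the
typed pairs. A rough sketch (untested, assuming spark.implicits._ is
in scope, as it is in spark-shell):

  val joined = ds.as("a").joinWith(ds.as("b"), $"a.name" === $"b.name")

  // refer to the two sides by position, not name
  joined.select($"_1.name", $"_2.age").show(false)

  // or stay fully typed and never touch column names at all
  joined.map { case (a, b) => (a.name, a.age + b.age) }.show(false)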

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Tue, Jun 7, 2016 at 9:30 PM, Koert Kuipers <ko...@tresata.com> wrote:
> For some operators on Dataset, like joinWith, one needs to use an
> expression, which means referring to columns by name.
>
> How can I set the column names for a Dataset before doing a joinWith?
>
> Currently I am aware of:
> df.toDF("k", "v").as[(K, V)]
>
> But that seems inefficient/an anti-pattern? I shouldn't have to go to a
> DataFrame and back just to set the column names?
>
> Or, if this is the only way to set names and column names really shouldn't
> be used in Datasets, can I perhaps refer to the columns by their position?
>
> Thanks, Koert
