Hi,

What about this?
scala> final case class Person(name: String, age: Int)
warning: there was one unchecked warning; re-run with -unchecked for details
defined class Person

scala> val ds = Seq(Person("foo", 42), Person("bar", 24)).toDS
ds: org.apache.spark.sql.Dataset[Person] = [name: string, age: int]

scala> ds.as("a").joinWith(ds.as("b"), $"a.name" === $"b.name").show(false)
+--------+--------+
|_1      |_2      |
+--------+--------+
|[foo,42]|[foo,42]|
|[bar,24]|[bar,24]|
+--------+--------+

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Tue, Jun 7, 2016 at 9:30 PM, Koert Kuipers <ko...@tresata.com> wrote:
> for some operators on Dataset, like joinWith, one needs to use an
> expression, which means referring to columns by name.
>
> how can i set the column names for a Dataset before doing a joinWith?
>
> currently i am aware of:
> df.toDF("k", "v").as[(K, V)]
>
> but that seems inefficient/anti-pattern? i shouldn't have to go to a
> DataFrame and back just to set the column names?
>
> or if this is the only way to set names, and column names really shouldn't
> be used in Datasets, can i perhaps refer to the columns by their position?
>
> thanks, koert
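P.S. As for referring to columns by position: a Dataset of tuples already exposes its columns under the positional names _1, _2, ..., so you can use those in the join condition without round-tripping through toDF. A minimal sketch (the names left/right and the sample rows are made up for illustration; assumes the implicits are in scope as in the session above):

// Hypothetical data; any Dataset[(K, V)] behaves the same way.
val left  = Seq((1, "a"), (2, "b")).toDS   // columns are named _1 and _2
val right = Seq((1, "x"), (3, "y")).toDS

// Alias each side and join on the positional column names,
// exactly like the Person example above.
left.as("l").joinWith(right.as("r"), $"l._1" === $"r._1").show(false)

The result is a Dataset[((Int, String), (Int, String))], so no conversion to a DataFrame is needed just to get at the names.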