Re: Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Alexandr Dzhagriev
Good to know, thanks.

On Mon, Feb 1, 2016 at 6:57 PM, Ted Yu wrote:
> Got around the previous error by adding:
>
> scala> implicit val kryoEncoder = Encoders.kryo[RecordExample]
> kryoEncoder: org.apache.spark.sql.Encoder[RecordExample] = class[value[0]: binary]
>
> On Mon, Feb 1, 2016 at 9:55

Re: Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Ted Yu
Got around the previous error by adding:

scala> implicit val kryoEncoder = Encoders.kryo[RecordExample]
kryoEncoder: org.apache.spark.sql.Encoder[RecordExample] = class[value[0]: binary]

On Mon, Feb 1, 2016 at 9:55 AM, Alexandr Dzhagriev wrote:
> Hi,
>
> That's another thing: that the Record ca
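For readers who want to try the workaround outside spark-shell, here is a minimal sketch, assuming Spark 1.6 on the classpath and a local master. The object name `KryoEncoderSketch` and the app settings are hypothetical and not from the thread. The encoder is passed explicitly to `createDataset` here rather than declared implicit, which is a variation on what the message shows.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Encoder, Encoders, SQLContext}

// Same record type as in the thread.
case class RecordExample(a: Int, b: String)

// Hypothetical driver object, for illustration only.
object KryoEncoderSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("KryoEncoderSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Kryo-based encoder, as in the message above.
    val kryoEncoder: Encoder[RecordExample] = Encoders.kryo[RecordExample]

    // Pass the encoder explicitly instead of relying on implicit resolution.
    val dataset = sqlContext.createDataset(
      Seq(RecordExample(1, "apple"), RecordExample(2, "orange")))(kryoEncoder)

    // The schema should be a single binary column, matching the
    // `class[value[0]: binary]` shown in the REPL output above.
    println(dataset.schema)
    dataset.collect().foreach(println)

    sc.stop()
  }
}
```

One consequence worth keeping in mind: because a Kryo encoder stores each record as one binary value, the fields `a` and `b` are not exposed as columns, so column expressions such as `$"a"` would likely not resolve against a dataset built this way.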

Re: Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Alexandr Dzhagriev
Hi,

That's a separate issue: the Record case class has to be defined outside. I ran it with spark-submit.

Thanks, Alex.

On Mon, Feb 1, 2016 at 6:41 PM, Ted Yu wrote:
> Running your sample in spark-shell built in master branch, I got:
>
> scala> val dataset = sc.parallelize(Seq(RecordExample(1, "apple"
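A minimal sketch of the layout described here, assuming Spark 1.6 and a local master: the case class is declared at the top level of the file, outside the object that holds `main`, so the built-in product encoder can be derived. The object name `TopLevelRecordSketch` and the app settings are hypothetical, and a plain `SQLContext` is used instead of the `HiveContext` from the original code to keep the sketch small.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Declared at the top level, not nested inside an object, class, or method.
case class RecordExample(a: Int, b: String)

// Hypothetical driver object, for illustration only.
object TopLevelRecordSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode; under spark-submit the master would normally
    // come from the command line instead.
    val conf = new SparkConf().setAppName("TopLevelRecordSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // brings toDS() and the product encoder into scope

    val dataset = sc.parallelize(Seq(
      RecordExample(1, "apple"),
      RecordExample(2, "orange"))).toDS()

    dataset.collect().foreach(println)

    sc.stop()
  }
}
```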

Re: Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Ted Yu
Running your sample in spark-shell built in master branch, I got:

scala> val dataset = sc.parallelize(Seq(RecordExample(1, "apple"), RecordExample(2, "orange"))).toDS()
org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class `RecordExample` without access to the scope

Re: Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Alexandr Dzhagriev
Hello again,

I've also tried the following snippet with concat_ws:

val dataset = sc.parallelize(Seq(
  RecordExample(1, "apple"),
  RecordExample(1, "banana"),
  RecordExample(2, "orange"))
).toDS().groupBy($"a").agg(concat_ws(",", $"b").as[String])

dataset.take(10).foreach(println)

which also

Re: Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Alexandr Dzhagriev
Hi Ted,

That doesn't help either, as one method delegates to the other as far as I can see:

def collect_list(columnName: String): Column = collect_list(Column(columnName))

Thanks, Alex

On Mon, Feb 1, 2016 at 5:55 PM, Ted Yu wrote:
> bq. agg(collect_list("b")
>
> Have you tried:
>
> agg(collect

Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Alexandr Dzhagriev
Hello,

I'm trying to run the following example code:

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.functions._

case class RecordExample(a: Int, b: String)

object ArrayExample {
  def main(args: Array[String]) {
    va

Re: Failed to 'collect_set' with dataset in spark 1.6

2016-02-01 Thread Ted Yu
bq. agg(collect_list("b")

Have you tried:

agg(collect_list($"b")

On Mon, Feb 1, 2016 at 8:50 AM, Alexandr Dzhagriev wrote:
> Hello,
>
> I'm trying to run the following example code:
>
> import org.apache.spark.sql.hive.HiveContext
> import org.apache.spark.{SparkContext, SparkConf}
> import or