I had been getting a warning about /tmp/hive not being writable whenever I started spark-shell, but I'd been ignoring it. I set its permissions to 777 and restarted the shell, and after doing that I now get the same result as Ted Yu when running Seq(1,2).toDS.map(t => Test(t)).show.
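For anyone who hits the same thing: as far as I know, /tmp/hive is Hive's default scratch directory, and the startup warning is the clue. A quick way to sanity-check it from the shell itself (just an illustrative sketch, not something from the original logs) is:

import java.io.File

// /tmp/hive must be writable by the user running spark-shell; the startup
// warning appears when it is not.
val scratch = new File("/tmp/hive")
println(s"exists=${scratch.exists}, canWrite=${scratch.canWrite}")

If canWrite comes back false, fix the permissions outside the shell (chmod 777 /tmp/hive in my case) and restart spark-shell before retrying the snippet below.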
On Wed, Jun 1, 2016 at 9:05 AM Tim Gautier <tim.gaut...@gmail.com> wrote:

> I spun up another EC2 cluster today with Spark 1.6.1 and I still get the error.
>
> scala> case class Test(a: Int)
> defined class Test
>
> scala> Seq(1,2).toDS.map(t => Test(t)).show
> 16/06/01 15:04:21 WARN scheduler.TaskSetManager: Lost task 39.0 in stage 0.0 (TID 39, ip-10-2-2-203.us-west-2.compute.internal): java.lang.NoClassDefFoundError: Could not initialize class $line29.$read$
>   at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
>   at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
>
> 16/06/01 15:04:21 INFO scheduler.TaskSetManager: Starting task 39.1 in stage 0.0 (TID 40, ip-10-2-2-111.us-west-2.compute.internal, partition 39,PROCESS_LOCAL, 2386 bytes)
> 16/06/01 15:04:21 WARN scheduler.TaskSetManager: Lost task 19.0 in stage 0.0 (TID 19, ip-10-2-2-203.us-west-2.compute.internal): java.lang.ExceptionInInitializerError
>   at $line29.$read$$iwC.<init>(<console>:7)
>   at $line29.$read.<init>(<console>:24)
>   at $line29.$read$.<init>(<console>:28)
>   at $line29.$read$.<clinit>(<console>)
>   at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
>   at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>   at $line3.$read$$iwC$$iwC.<init>(<console>:15)
>   at $line3.$read$$iwC.<init>(<console>:24)
>   at $line3.$read.<init>(<console>:26)
>   at $line3.$read$.<init>(<console>:30)
>   at $line3.$read$.<clinit>(<console>)
>   ... 18 more
>
> On Tue, May 31, 2016 at 8:48 PM Tim Gautier <tim.gaut...@gmail.com> wrote:
>
>> That's really odd.
>> I copied that code directly out of the shell and it errored out on me,
>> several times. I wonder if something I did previously caused some
>> instability. I'll see if it happens again tomorrow.
>>
>> On Tue, May 31, 2016, 8:37 PM Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Using spark-shell of 1.6.1:
>>>
>>> scala> case class Test(a: Int)
>>> defined class Test
>>>
>>> scala> Seq(1,2).toDS.map(t => Test(t)).show
>>> +---+
>>> |  a|
>>> +---+
>>> |  1|
>>> |  2|
>>> +---+
>>>
>>> FYI
>>>
>>> On Tue, May 31, 2016 at 7:35 PM, Tim Gautier <tim.gaut...@gmail.com> wrote:
>>>
>>>> 1.6.1. The exception is a null pointer exception. I'll paste the whole
>>>> thing after I fire my cluster up again tomorrow.
>>>>
>>>> I take it by the responses that this is supposed to work?
>>>>
>>>> Anyone know when the next version is coming out? I keep running into
>>>> bugs with 1.6.1 that are hindering my progress.
>>>>
>>>> On Tue, May 31, 2016, 8:21 PM Saisai Shao <sai.sai.s...@gmail.com> wrote:
>>>>
>>>>> It works fine in my local test. I'm using the latest master, so maybe
>>>>> this bug is already fixed.
>>>>>
>>>>> On Wed, Jun 1, 2016 at 7:29 AM, Michael Armbrust <mich...@databricks.com> wrote:
>>>>>
>>>>>> Version of Spark? What is the exception?
>>>>>>
>>>>>> On Tue, May 31, 2016 at 4:17 PM, Tim Gautier <tim.gaut...@gmail.com> wrote:
>>>>>>
>>>>>>> How should I go about mapping from, say, a Dataset[(Int,Int)] to a
>>>>>>> Dataset[<case class here>]?
>>>>>>>
>>>>>>> I tried to use a map, but it throws exceptions:
>>>>>>>
>>>>>>> case class Test(a: Int)
>>>>>>> Seq(1,2).toDS.map(t => Test(t)).show
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Tim
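
P.S. For the original question (going from a Dataset[(Int,Int)] to a Dataset of a case class), the obvious pattern is just a plain map over the tuples. A minimal sketch (Pair and the sample values are made up for illustration, and it assumes the sqlContext implicits that spark-shell imports automatically):

case class Pair(a: Int, b: Int)                       // hypothetical target case class
val tuples = Seq((1, 10), (2, 20)).toDS()             // Dataset[(Int, Int)]
val pairs = tuples.map { case (a, b) => Pair(a, b) }  // Dataset[Pair]
pairs.show()                                          // two rows, columns a and b

The case class just needs to be defined before it's used in the map so the shell can find an encoder for it.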