Spark driver main thread hanging after SQL insert

2014-12-31 Thread Alessandro Baretta
Here's what the console shows:
15/01/01 01:12:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 58.0, whose tasks have all completed, from pool
15/01/01 01:12:29 INFO scheduler.DAGScheduler: Stage 58 (runJob at ParquetTableOperations.scala:326) finished in 5493.549 s
15/01/01 01:12:29 INFO sche

Today's Jenkins failures in the Spark Maven builds

2014-12-31 Thread Josh Rosen
If you've been following AMPLab Jenkins today, you'll notice that there's been a huge number of Spark test failures in the maintenance branches and Maven builds. My best guess as to what's causing this is that I pushed a backport to all maintenance branches at a moment where Jenkins was otherwise

Re: Why the major.minor version of the new hive-exec is 51.0?

2014-12-31 Thread Michael Armbrust
This was not intended; can you open a JIRA?

On Tue, Dec 30, 2014 at 8:40 PM, Ted Yu wrote:
> I extracted org/apache/hadoop/hive/common/CompressionUtils.class from the jar and used hexdump to view the class file.
> Bytes 6 and 7 are 00 and 33, respectively.
>
> According to http://en.wikipedia.
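
Ted's hexdump check can be reproduced programmatically. The following is a minimal Scala sketch, assuming the .class file has already been extracted from the jar. In the Java class file format, bytes 0-3 hold the magic number 0xCAFEBABE, bytes 4-5 the minor version, and bytes 6-7 the major version (big-endian), so bytes 00 33 give major version 51, which corresponds to Java 7. The object `ClassVersion` and its methods are hypothetical helpers written for illustration, not part of Hive or Spark.

```scala
import java.io.{DataInputStream, FileInputStream, InputStream}

// Hypothetical helper: reads the version fields from a Java class file header.
object ClassVersion {
  // Class file layout: bytes 0-3 magic 0xCAFEBABE, bytes 4-5 minor_version,
  // bytes 6-7 major_version. DataInputStream reads big-endian, matching the format.
  def majorMinor(in: InputStream): (Int, Int) = {
    val data = new DataInputStream(in)
    val magic = data.readInt()
    require(magic == 0xCAFEBABE, s"not a class file (magic 0x${Integer.toHexString(magic)})")
    val minor = data.readUnsignedShort()
    val major = data.readUnsignedShort()
    (major, minor)
  }

  // Java release that introduced each major version (51 = Java 7, 52 = Java 8).
  def javaRelease(major: Int): String = major match {
    case 45 => "1.1"
    case 46 => "1.2"
    case 47 => "1.3"
    case 48 => "1.4"
    case m if m >= 49 => (m - 44).toString // 49 -> 5, 50 -> 6, 51 -> 7, 52 -> 8, ...
    case _ => "unknown"
  }

  def main(args: Array[String]): Unit = {
    // args(0): path to a .class file already extracted from the jar
    val in = new FileInputStream(args(0))
    try {
      val (major, minor) = majorMinor(in)
      println(s"major=$major minor=$minor (Java ${javaRelease(major)})")
    } finally in.close()
  }
}
```

Run against the extracted CompressionUtils.class, a major version of 51 means the class needs a Java 7 or newer JVM; older JVMs reject it with an UnsupportedClassVersionError.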

Re: Big performance difference between "client" and "cluster" deployment mode; is this expected?

2014-12-31 Thread Sean Owen
-dev, +user A decent guess: does your 'save' function entail collecting data back to the driver, and are you running this from a machine that's not in your Spark cluster? If so, in client mode you're shipping data back to a more distant machine than in cluster mode. That could explain the b

Big performance difference between "client" and "cluster" deployment mode; is this expected?

2014-12-31 Thread Enno Shioji
Hi, I have a very, very simple streaming job. When I deploy it on the exact same cluster, with the exact same parameters, I see a big (40%) performance difference between "client" and "cluster" deployment mode. This seems a bit surprising. Is this expected? The streaming job is: val msgStre