Re: Gradient Descent with large model size

2015-10-19 Thread Mike Hynes
Hi Alexander, Joseph, Evan, I just wanted to weigh in with an empirical result that we've had on a standalone cluster with 16 nodes and 256 cores. Typically we run optimization tasks with 256 partitions for 1 partition per core, and find that performance worsens with more partitions than physical cores
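
A minimal Scala sketch of the partitioning scheme described here (the core count and input path are assumptions, not code from the thread): keep one partition per physical core before handing the data to an iterative optimizer.

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("gd-partitioning"))
    val numCores = 256  // total physical cores in the cluster, per the report above
    val data = sc.textFile("hdfs:///path/to/training/data")  // placeholder path
      .repartition(numCores)  // 1 partition per core; over-partitioning was observed to hurt
      .cache()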

Re: Problem building Spark

2015-10-19 Thread Ted Yu
See this thread: http://search-hadoop.com/m/q3RTtV3VFNdgNri2&subj=Re+Build+spark+1+5+1+branch+fails > On Oct 19, 2015, at 6:59 PM, Annabel Melongo wrote: > > I tried to build Spark according to the build directions and it failed > due to the following error: > > Bui

Re: Problem building Spark

2015-10-19 Thread Tathagata Das
Seems to be a heap space issue for Maven. Have you configured Maven's memory according to the instructions on the web page? export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" On Mon, Oct 19, 2015 at 6:59 PM, Annabel Melongo <melongo_anna...@yahoo.com.invalid> wrote: > I

Problem building Spark

2015-10-19 Thread Annabel Melongo
I tried to build Spark according to the build directions and it failed due to the following error: Building Spark - Spark 1.5.1 Documentation: Building with build/mvn, Building a Runnable Distribution, Setting up Maven’s Memory Usage, Specifying the H

Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread shane knapp
things are green, nice catch on the job config, josh. On Mon, Oct 19, 2015 at 1:57 PM, shane knapp wrote: > ++joshrosen > > some of those 1.4 builds were incorrectly configured and launching on > a reserved executor... josh fixed them and we're looking a lot better > (meaning that we're building

Problem using User Defined Predicate pushdown with core RDD and parquet - UDP class not found

2015-10-19 Thread Vladimir Vladimirov
Hi all, I feel like this question is more Spark dev related than Spark user related. Please correct me if I'm wrong. My project's data flow involves sampling records from data stored as a Parquet dataset. I've checked the DataFrames API and it doesn't support user defined predicates projection push
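
For context, a hedged sketch of what a user-defined predicate pushed through the core RDD + Parquet path can look like against a Spark 1.5-era parquet-mr (the class and column names are hypothetical; the UDP class also has to be on the executors' classpath, which is the usual source of the "class not found" error):

    import org.apache.hadoop.conf.Configuration
    import org.apache.parquet.filter2.predicate.{FilterApi, Statistics, UserDefinedPredicate}
    import org.apache.parquet.hadoop.ParquetInputFormat

    // Hypothetical UDP that keeps roughly 1% of rows based on a long-valued "id" column.
    class SampleIds extends UserDefinedPredicate[java.lang.Long] {
      override def keep(value: java.lang.Long): Boolean = value % 100 == 0
      // Conservative: never drop a whole row group based on column statistics.
      override def canDrop(stats: Statistics[java.lang.Long]): Boolean = false
      override def inverseCanDrop(stats: Statistics[java.lang.Long]): Boolean = false
    }

    val hadoopConf = new Configuration()
    ParquetInputFormat.setFilterPredicate(
      hadoopConf,
      FilterApi.userDefined(FilterApi.longColumn("id"), classOf[SampleIds]))
    // hadoopConf would then be passed to sc.newAPIHadoopFile(...) for the Parquet input.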

Re: Spark SQL: what does an exclamation mark mean in the plan?

2015-10-19 Thread Xiao Li
Hi, Michael, Thank you again! Just found the function that generates the ! mark: /** * A prefix string used when printing the plan. * * We use "!" to indicate an invalid plan, and "'" to indicate an unresolved plan. */ protected def statePrefix = if (missingInput.nonEmpty && childr
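
For readers who want the rest of the truncated snippet, the method (quoted from memory of the 1.5 Catalyst source, so treat it as approximate) is just:

    /**
     * A prefix string used when printing the plan.
     *
     * We use "!" to indicate an invalid plan, and "'" to indicate an unresolved plan.
     */
    protected def statePrefix =
      if (missingInput.nonEmpty && children.nonEmpty) "!" else ""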

Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread shane knapp
++joshrosen some of those 1.4 builds were incorrectly configured and launching on a reserved executor... josh fixed them and we're looking a lot better (meaning that we're building and not failing at launch). shane On Mon, Oct 19, 2015 at 1:49 PM, Patrick Wendell wrote: > I think many of them

Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread Patrick Wendell
I think many of them are coming form the Spark 1.4 builds: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/Spark-1.4-Maven-pre-YARN/3900/console On Mon, Oct 19, 2015 at 1:44 PM, Patrick Wendell wrote: > This is what I'm looking at: > > > https://amplab.cs.berkele

Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread Patrick Wendell
This is what I'm looking at: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/ On Mon, Oct 19, 2015 at 12:58 PM, shane knapp wrote: > all we did was reboot -05 and -03... i'm seeing a bunch of green > builds. could you provide me w/some specific failures so i can

Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread shane knapp
all we did was reboot -05 and -03... i'm seeing a bunch of green builds. could you provide me w/some specific failures so i can look in to them more closely? On Mon, Oct 19, 2015 at 12:27 PM, Patrick Wendell wrote: > Hey Shane, > > It also appears that every Spark build is failing right now. Co

Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread Patrick Wendell
Hey Shane, It also appears that every Spark build is failing right now. Could it be related to your changes? - Patrick On Mon, Oct 19, 2015 at 11:13 AM, shane knapp wrote: > worker 05 is back up now... looks like the machine OOMed and needed > to be kicked. > > On Mon, Oct 19, 2015 at 9:39 AM

Re: Spark SQL: what does an exclamation mark mean in the plan?

2015-10-19 Thread Michael Armbrust
It means that there is an invalid attribute reference (i.e. a #n where the attribute is missing from the child operator). On Sun, Oct 18, 2015 at 11:38 PM, Xiao Li wrote: > Hi, all, > > After turning on the trace, I saw a strange exclamation mark in > the intermediate plans. This happened in cat

Re: BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread shane knapp
worker 05 is back up now... looks like the machine OOMed and needed to be kicked. On Mon, Oct 19, 2015 at 9:39 AM, shane knapp wrote: > i'll have to head down to the colo and see what's up with it... it > seems to be wedged (pings ok, can't ssh in) and i'll update the list > when i figure out w

RE: Gradient Descent with large model size

2015-10-19 Thread Ulanov, Alexander
Evan, Joseph, Thank you for the valuable suggestions. It would be great to improve TreeAggregate (if possible). Making fewer updates would certainly make sense, though that would mean using a batch gradient method such as LBFGS. As of today it seems to be the only viable option in Spark. I will also take a look
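
For readers following along, a minimal self-contained sketch of the treeAggregate pattern under discussion (the data and "gradients" are stand-ins, not the thread's code); a larger depth reduces the amount of partial-aggregate data that reaches the driver, at the cost of extra stages.

    import breeze.linalg.{DenseVector => BDV}
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("tree-aggregate-sketch"))
    val dim = 1000  // stand-in for a much larger model size
    // Pretend each record is already a per-example gradient of dimension `dim`.
    val gradients = sc.parallelize(1 to 100000).map(i => BDV.fill(dim)(i.toDouble))

    val gradSum = gradients.treeAggregate(BDV.zeros[Double](dim))(
      seqOp = (acc, g) => acc += g,  // accumulate within each partition
      combOp = (a, b) => a += b,     // merge partial sums pairwise
      depth = 2                      // deeper trees shrink the final fan-in at the driver
    )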

Re: ShuffledHashJoin Possible Issue

2015-10-19 Thread Davies Liu
Can you reproduce it on master? I can't reproduce it with the following code: >>> t2 = sqlContext.range(50).selectExpr("concat('A', id) as id") >>> t1 = sqlContext.range(10).selectExpr("concat('A', id) as id") >>> t1.join(t2).where(t1.id == t2.id).explain() ShuffledHashJoin [id#21], [id#19], Buil
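
For anyone trying this from the Scala shell instead, a rough equivalent of the PySpark reproduction above (assuming a 1.5-era spark-shell where sqlContext is predefined) would be:

    val t2 = sqlContext.range(50).selectExpr("concat('A', id) as id")
    val t1 = sqlContext.range(10).selectExpr("concat('A', id) as id")
    t1.join(t2).where(t1("id") === t2("id")).explain()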

BUILD SYSTEM: amp-jenkins-worker-05 offline

2015-10-19 Thread shane knapp
i'll have to head down to the colo and see what's up with it... it seems to be wedged (pings ok, can't ssh in) and i'll update the list when i figure out what's wrong. i don't think it caught fire (#toosoon?), because everything else is up and running. :) shane

Building Spark w/ 1.8 and binary incompatibilities

2015-10-19 Thread Iulian Dragoș
Hey all, tl;dr: I built Spark with Java 1.8 even though my JAVA_HOME pointed to 1.7. Then it failed with binary incompatibilities. I couldn’t find any mention of this in the docs, so it might be a known thing, but it’s definitely too easy to do the wrong thing. The problem is that Maven is using
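
A quick, hedged way to confirm which JDK actually compiled the resulting artifacts (the path below is a placeholder): read the class-file major version, where 51 means Java 7 and 52 means Java 8.

    import java.io.DataInputStream
    import java.nio.file.{Files, Paths}

    val in = new DataInputStream(Files.newInputStream(
      Paths.get("core/target/classes/org/apache/spark/SparkContext.class")))  // placeholder path
    in.readInt()                        // magic number 0xCAFEBABE
    in.readUnsignedShort()              // minor version
    val major = in.readUnsignedShort()  // 51 = Java 7, 52 = Java 8
    println(s"class file major version: $major")
    in.close()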

failed mesos task loses executor

2015-10-19 Thread Adrian Bridgett
Just testing spark v1.5.0 (on mesos v0.23) and we saw something unexpected (according to the event timeline) - when a spark task failed (intermittent S3 connection failure), the whole executor was removed and was never recovered so the job proceeded slower than normal. Looking at the code I sa

Re: Unable to run applications on spark in standalone cluster mode

2015-10-19 Thread Jean-Baptiste Onofré
Hi Rohith, Do you have multiple interfaces on the machine hosting the master? If so, can you try forcing it to bind to the public interface using: sbin/start-master.sh --ip xxx.xxx.xxx.xxx Regards JB On 10/19/2015 02:05 PM, Rohith Parameshwara wrote: Hi all, I am doing some experime

Unable to run applications on spark in standalone cluster mode

2015-10-19 Thread Rohith Parameshwara
Hi all, I am doing some experiments with a Spark standalone cluster setup and I am facing the following issue: I have a 4-node cluster setup. As per http://spark.apache.org/docs/latest/spark-standalone.html#starting-a-cluster-manually I tried to start the cluster with the scripts but

RE: ShuffledHashJoin Possible Issue

2015-10-19 Thread gsvic
Hi Hao, Each table is created with the following Python code snippet: data = [{'id': 'A%d'%i, 'value':ceil(random()*10)} for i in range(0,50)] with open('A.json', 'w+') as output: json.dump(data, output) Tables A and B contain 10 and 50 tuples, respectively. In the spark shell I type sq
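
A hedged sketch of reading those generated files back and producing the join plan under discussion (the paths and the exact join condition are assumptions):

    val a = sqlContext.read.json("A.json")
    val b = sqlContext.read.json("B.json")
    a.join(b, a("id") === b("id")).explain()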

Re: Spark driver reducing total executors count even when Dynamic Allocation is disabled.

2015-10-19 Thread Saisai Shao
This is a deliberate kill request from the heartbeat mechanism; it has nothing to do with dynamic allocation. Because you're running in yarn mode, "supportDynamicAllocation" will be true, but actually there's no relation to dynamic allocation. From my understanding "doRequestTotalExecutors" is

Re: Haskell language Spark support

2015-10-19 Thread weymouth
Wojciech, I am a programmer with over 30 years of programming experience, most recently in Java, and with lots of experience in languages like LISP (functional), and R (array/list). I'm currently learning Haskell, and working in an environment where I need to apply Spark to "large data". I'd be v

Spark driver reducing total executors count even when Dynamic Allocation is disabled.

2015-10-19 Thread prakhar jauhari
Hey all, Thanks in advance. I ran into a situation where the Spark driver reduced the total executor count for my job even with dynamic allocation disabled, and caused the job to hang forever. Setup: Spark-1.3.1 on a hadoop-yarn-2.4.0 cluster. All servers in the cluster running Linux version 2.6.32.
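
For concreteness, a minimal sketch of the configuration being described (the app name and executor count are made up, not taken from the thread): dynamic allocation explicitly off, with a fixed number of executors requested from YARN.

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("example-job")                        // hypothetical
      .set("spark.dynamicAllocation.enabled", "false")  // explicitly disabled
      .set("spark.executor.instances", "10")            // fixed executor count (made up)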