Re: Spark Processing Large Data Stuck

2014-06-21 Thread Peng Cheng
The JVM will quit after spending most of its time (about 95%) on GC, but usually you have to wait a long time before that happens, particularly if your job is already at massive scale. Since it is hard to run profiling online, it may be easier to debug if you make a lot of partitions (so you can watc
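Peng's advice about using many partitions can be illustrated without Spark at all. The sketch below is plain Python (the `partition` helper is hypothetical, not a Spark API): more partitions means each task holds a smaller slice of the data at once, which lowers per-task memory pressure and makes a stuck stage easier to localize.

```python
import math

def partition(data, num_partitions):
    """Split data into roughly equal chunks, as Spark does with an RDD."""
    size = math.ceil(len(data) / num_partitions)
    return [data[i:i + size] for i in range(0, len(data), size)]

records = list(range(1_000_000))
coarse = partition(records, 8)
fine = partition(records, 512)

# More partitions -> each task holds far less data in memory at once,
# which eases GC pressure and narrows down where a job gets stuck.
print(max(len(p) for p in coarse))  # 125000
print(max(len(p) for p in fine))    # 1954
```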

Re: Powered by Spark addition

2014-06-21 Thread Sonal Goyal
Thanks a lot Matei. Sent from my iPad > On Jun 22, 2014, at 5:20 AM, Matei Zaharia wrote: > > Alright, added you — sorry for the delay. > > Matei > >> On Jun 12, 2014, at 10:29 PM, Sonal Goyal wrote: >> >> Hi, >> >> Can we get added too? Here are the details: >> >> Name: Nube Technologie

Re: Spark throws NoSuchFieldError when testing on cluster mode

2014-06-21 Thread Peng Cheng
Hi Sean, OK, I'm about 90% sure about the cause of this problem. It is just another classic dependency conflict:

Myproject -> Selenium -> apache.httpcomponents:httpcore 4.3.1 (has ContentType)
Spark -> Spark SQL Hive -> Hive -> Thrift -> apache.httpcomponents:httpcore 4.1.3 (has no ContentType)

Though I

Re: Powered by Spark addition

2014-06-21 Thread Matei Zaharia
Alright, added you — sorry for the delay. Matei On Jun 12, 2014, at 10:29 PM, Sonal Goyal wrote: > Hi, > > Can we get added too? Here are the details: > > Name: Nube Technologies > URL: www.nubetech.co > Description: Nube provides solutions for data curation at scale helping > customer targe

Re: Using Spark

2014-06-21 Thread Matei Zaharia
Alright, added you. On Jun 20, 2014, at 2:52 PM, Ricky Thomas wrote: > Hi, > > Would like to add ourselves to the user list if possible please? > > Company: truedash > url: truedash.io > > Automatic pulling of all your data in to Spark for enterprise visualisation, > predictive analytics an

Re: Spark Processing Large Data Stuck

2014-06-21 Thread yxzhao
Thanks Krishna, I use a small cluster, and each compute node has 16 GB of RAM and eight 2.66 GHz CPU cores. On Sat, Jun 21, 2014 at 3:16 PM, Krishna Sankar [via Apache Spark User List] wrote: > Hi, > > - I have seen similar behavior before. As far as I can tell, the root > cause is the ou

Re: Spark throws NoSuchFieldError when testing on cluster mode

2014-06-21 Thread Peng Cheng
I also found that any buggy application submitted in --deploy-mode = cluster mode will crash the worker (turning its status to 'DEAD'). This shouldn't really happen; otherwise nobody would use this mode. It is still unclear whether all workers will crash or only the one running the driver will (as I only hav

Re: Spark throws NoSuchFieldError when testing on cluster mode

2014-06-21 Thread Peng Cheng
Latest advancement: I found the cause of the NoClassDef exception: I wasn't using spark-submit; instead I tried to run the Spark application directly with SparkConf set in the code (this is handy for local debugging). However, the old problem remains: even my maven-shade plugin doesn't give any warning

Re: Spark Processing Large Data Stuck

2014-06-21 Thread Krishna Sankar
Hi,
- I have seen similar behavior before. As far as I can tell, the root cause is the out of memory error; I verified this by monitoring the memory.
- I had a 30 GB file and was running on a single machine with 16 GB, so I knew it would fail.
- But instead of raising an exce
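Krishna's step of verifying the out-of-memory root cause by monitoring memory can also be done from inside a Python process with the standard-library `resource` module. This is a minimal sketch, not part of the original thread; note that `resource` is Unix-only and `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS.

```python
import resource

def peak_rss():
    """Peak resident set size of this process.
    Units: kilobytes on Linux, bytes on macOS."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
blob = bytearray(50 * 1024 * 1024)  # allocate ~50 MB to grow the footprint
after = peak_rss()

# Peak RSS never decreases; a steadily climbing value while a stage runs
# is a strong hint the job is heading for an out-of-memory condition.
print(after >= before)
```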

Re: Spark throws NoSuchFieldError when testing on cluster mode

2014-06-21 Thread Peng Cheng
Indeed I see a lot of duplicate package warnings in the maven-shade assembly output, so I tried to eliminate them: first I set the scope of the apache-spark dependency to 'provided', as suggested on this page: http://spark.apache.org/docs/latest/submitting-applications.html But the Spark master gav
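For the httpcore clash discussed in this thread, one common complement to 'provided' scope is a shade relocation, which renames the conflicting packages inside the uber jar so the cluster's runtime copy cannot shadow them. A sketch of the pom fragment, with the plugin version and the relocated package prefix chosen purely for illustration:

```xml
<!-- Sketch of a maven-shade-plugin relocation; the version and the
     shaded package name are illustrative, not taken from the thread. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <configuration>
    <relocations>
      <relocation>
        <pattern>org.apache.http</pattern>
        <shadedPattern>myproject.shaded.org.apache.http</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>
```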

Spark Processing Large Data Stuck

2014-06-21 Thread yxzhao
I ran the PageRank example on a large data set, 5 GB in size, using 48 machines. The job got stuck at the time point 14/05/20 21:32:17, as the attached log shows. It stayed stuck there for more than 10 hours, and then I finally killed it. But I did not find any information explaining why it was

Re: Performance problems on SQL JOIN

2014-06-21 Thread Michael Armbrust
It's probably because our LEFT JOIN performance isn't super great ATM, since we'll use a nested loop join. Sorry! We are aware of the problem, and there is a JIRA to let us do this with a HashJoin instead. If you are feeling brave you might try pulling in the related PR. https://issues.apache.org/jira/
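The difference Michael describes can be sketched in plain Python: a nested loop left join compares every pair of rows (O(n*m)), while a hash join builds a lookup table on one side first (O(n+m)). Both produce the same rows; only the cost differs. The (key, value) row format here is hypothetical.

```python
def nested_loop_left_join(left, right):
    """O(len(left) * len(right)): compare every pair of rows."""
    out = []
    for lk, lv in left:
        matched = False
        for rk, rv in right:
            if lk == rk:
                out.append((lk, lv, rv))
                matched = True
        if not matched:
            out.append((lk, lv, None))  # LEFT JOIN keeps unmatched left rows
    return out

def hash_left_join(left, right):
    """O(len(left) + len(right)): build a hash table on the right side."""
    table = {}
    for rk, rv in right:
        table.setdefault(rk, []).append(rv)
    out = []
    for lk, lv in left:
        for rv in table.get(lk, [None]):
            out.append((lk, lv, rv))
    return out

left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, "x"), (3, "y")]
print(nested_loop_left_join(left, right) == hash_left_join(left, right))  # True
```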

Re: Spark throws NoSuchFieldError when testing on cluster mode

2014-06-21 Thread Peng Cheng
Thanks a lot! Let me check my maven shade plugin config and see if there is a fix -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-throws-NoSuchFieldError-when-testing-on-cluster-mode-tp8064p8073.html Sent from the Apache Spark User List mailing list ar

Re: zip in pyspark truncates RDD to number of processors

2014-06-21 Thread Kan Zhang
I couldn't reproduce your issue locally, but I suspect it has something to do with partitioning. zip() works by partition, and it assumes the two RDDs have the same number of partitions and the same number of elements in each partition. By default, map() doesn't preserve partitioning. Try setting pres
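Kan's explanation can be simulated with plain Python lists standing in for partitions. The layouts below are made up, but they show how zipping partition-by-partition silently drops elements when the two sides place different numbers of elements in corresponding partitions, matching the too-small counts reported in this thread.

```python
# Model an RDD as a list of partitions. Spark's zip() pairs partition i
# of one RDD with partition i of the other; within each pair, Python's
# zip() stops at the shorter side, so mismatched layouts lose elements.
a_parts = [[0, 1], [2, 3], [4, 5]]     # 6 elements, 2 per partition
b_parts = [[0, 1, 2], [3, 4, 5], []]   # same 6 elements, skewed layout

def rdd_zip(x_parts, y_parts):
    return [list(zip(xp, yp)) for xp, yp in zip(x_parts, y_parts)]

zipped = rdd_zip(a_parts, b_parts)
count = sum(len(p) for p in zipped)
print(count)  # 4, not 6: mismatched partitioning silently drops pairs
```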

Re: sc.textFile can't recognize '\004'

2014-06-21 Thread anny9699
Thanks a lot Sean! It works for me now~~ -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/sc-textFile-can-t-recognize-004-tp8059p8071.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

zip in pyspark truncates RDD to number of processors

2014-06-21 Thread madeleine
Consider the following simple zip:

n = 6
a = sc.parallelize(range(n))
b = sc.parallelize(range(n)).map(lambda j: j)
c = a.zip(b)
print a.count(), b.count(), c.count()
>> 6 6 4

By varying n, I find that c.count() is always min(n, 4), where 4 happens to be the number of threads on my computer. by

Re: Set the number/memory of workers under mesos

2014-06-21 Thread Mayur Rustagi
You can do that afterwards as well; it changes application-wide settings for subsequent tasks. On 20 Jun 2014 17:05, "Shuo Xiang" wrote: > Hi Mayur, > Are you referring to overriding the default sc in sparkshell? Is there > any way to do that before running the shell? > > > On Fri, Jun 20, 2014 at 1

Re: How to terminate job from the task code?

2014-06-21 Thread Mayur Rustagi
You can terminate a job group from the Spark context; you'll have to send the Spark context across to your task. On 21 Jun 2014 01:09, "Piotr Kołaczkowski" wrote: > If the task detects an unrecoverable error, i.e. an error that we can't > expect to fix by retrying nor by moving the task to another node, how
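Mayur's suggestion (cancel the job group via the SparkContext) is hard to demo outside a cluster, but the underlying pattern, tasks consulting a shared cancellation flag that one failing task sets, can be sketched with the standard library. Everything below is illustrative; in Spark the flag-flipping role is played by sc.setJobGroup/sc.cancelJobGroup on the driver.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

cancelled = threading.Event()  # shared flag standing in for job-group cancel

def task(i):
    if cancelled.is_set():
        return "skipped"      # bail out early once cancellation is requested
    if i == 3:                # pretend task 3 hits an unrecoverable error
        cancelled.set()
        return "failed"
    return "done"

# A single worker keeps execution order deterministic for this sketch.
with ThreadPoolExecutor(max_workers=1) as pool:
    results = list(pool.map(task, range(6)))

print(results)  # ['done', 'done', 'done', 'failed', 'skipped', 'skipped']
```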

Re: Spark throws NoSuchFieldError when testing on cluster mode

2014-06-21 Thread Sean Owen
This inevitably means the run-time classpath includes a different copy of the same library/class as something in your uber jar, and the different version is taking precedence. Here it's Commons HttpComponents. Where exactly it's coming from is specific to your deployment, but that's the issue. On S

Re: How do you run your spark app?

2014-06-21 Thread Gerard Maas
Hi Michael, +1 on the deployment stack. (Almost) the same thing here. One question: are you deploying the JobServer on Mesos? Through Marathon? I've been working on solving some of the port assignment issues on Mesos, but I'm not there yet. Did you guys solve that? -kr, Gerard. On Thu, Jun 19,

Spark throws NoSuchFieldError when testing on cluster mode

2014-06-21 Thread Peng Cheng
I have a Spark application that runs perfectly in local mode with 8 threads, but when deployed on a single-node cluster it gives the following error: ERROR TaskSchedulerImpl: Lost executor 0 on 192.168.42.202: Uncaught exception Spark assembly has been built with Hive, including Datanucleus jars on

Re: Repeated Broadcasts

2014-06-21 Thread Daedalus
Has anyone used this sort of construct? (Read: bump) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Repeated-Broadcasts-tp7977p8063.html Sent from the Apache Spark User List mailing list archive at Nabble.com.