Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-23, Martin Goodson
Have you tried to repartition() your original data to make more partitions before you aggregate? -- Martin Goodson | VP Data Science (0)20 3397 1240 On Mon, Mar 23, 2015 at 4:12 PM, Yiannis Gkoufas wrote: > Hi Yin, > > Yes, I have set spark.executor.memory
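The idea behind the suggestion is that spreading the same rows over more partitions shrinks the amount of data any single task has to hold at once. The plain-Python model below is not Spark code. It deals records out round-robin, which is roughly how a DataFrame repartition(n) without partitioning columns spreads rows after its shuffle, and reports the largest partition for a few partition counts. The record count, the partition counts and the helper names are illustrative assumptions.

```python
# Plain-Python model of round-robin repartitioning (illustrative, not Spark code).

def round_robin_partition(records, num_partitions):
    """Deal records into num_partitions buckets in turn, like dealing cards."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def largest_partition_size(partitions):
    """Size of the biggest bucket, a rough proxy for peak per-task memory."""
    return max(len(p) for p in partitions)


records = list(range(100_000))  # hypothetical dataset size
sizes = {}
for n in (8, 64, 512):
    sizes[n] = largest_partition_size(round_robin_partition(records, n))
    print(f"{n:>4} partitions -> largest holds {sizes[n]} records")
```

In PySpark the corresponding step would be a call such as df.repartition(512) before the groupBy. A real aggregation also shuffles by key, so skewed keys can still produce oversized tasks. The partition count that helps depends on data volume and cluster size.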

Re: Avoid broadcasting huge variables

2014-09-20, Martin Goodson

Re: Reading from HDFS no faster than reading from S3 - how to tell if data locality respected?

2014-08-04, Martin Goodson
disks is not much faster than accessing s3 across the network? On Fri, Aug 1, 2014 at 10:44 AM, Martin Goodson wrote: > Hi all, > I'm consistently finding that reading from HDFS is not appreciably fa

Reading from HDFS no faster than reading from S3 - how to tell if data locality respected?

2014-08-01, Martin Goodson
educe/samples/spark/1.0.0/install-spark-shark.rb and ami-version 3.1.0).

Job using Spark for Machine Learning

2014-07-29, Martin Goodson
billion users per month and are second only to Google in the contextual advertising space (ok - a distant second!). Details here: http://grnh.se/rl8f25

Re: Configuring Spark Memory

2014-07-24, Martin Goodson
Great, thanks for the clarification, Aaron. The offer stands for me to write some documentation and an example that covers this without leaving *any* room for ambiguity. On Thu, Jul 24, 2014 at 6:09 PM, Aaron

Re: Configuring Spark Memory

2014-07-24, Martin Goodson
Thank you, Nishkam. I have read your code. So, for the sake of my understanding: it seems that for each SparkContext there is one executor per node? Can anyone confirm this? On Thu, Jul 24, 2014 at 6:12 AM, Nishkam

Re: Configuring Spark Memory

2014-07-23, Martin Goodson
GB used by Spark.)" Am I reading this incorrectly? Anyway, our configuration is 21 machines (one master and 20 slaves), each with 60 GB. We would like to use 4 cores per machine. This is PySpark, so we want to leave, say, 16 GB on each machine for Python processes. Thanks again for the advice!
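The sizing question in this post can be worked through as plain arithmetic. The sketch below assumes one executor per worker node (the standalone-mode behaviour asked about elsewhere in the thread), 20 workers with 60 GB and 4 cores each, and 16 GB per node kept free for PySpark's Python worker processes, which run outside the executor JVM. The extra 2 GB of headroom for the OS and the worker daemon is a hypothetical figure, not a Spark default, and the helper name is illustrative.

```python
# Rough executor sizing for the cluster described above.
# OS_RESERVE_GB is an assumed headroom figure, not a Spark default.

def executor_memory_gb(machine_gb, python_reserve_gb, os_reserve_gb):
    """Memory left for the single executor JVM on one worker node."""
    remaining = machine_gb - python_reserve_gb - os_reserve_gb
    if remaining <= 0:
        raise ValueError("reserves exceed machine memory")
    return remaining


WORKERS = 20            # 21 machines minus the master
MACHINE_GB = 60
CORES_PER_WORKER = 4
PYTHON_RESERVE_GB = 16  # left for Python worker processes
OS_RESERVE_GB = 2       # assumption: OS and worker-daemon headroom

heap_gb = executor_memory_gb(MACHINE_GB, PYTHON_RESERVE_GB, OS_RESERVE_GB)
total_cores = WORKERS * CORES_PER_WORKER
cluster_heap_gb = WORKERS * heap_gb
# Very rough share of the heap per concurrently running task on a node.
heap_per_core_gb = heap_gb / CORES_PER_WORKER

print(f"spark.executor.memory={heap_gb}g")
print(f"cores in use across the cluster: {total_cores}")
print(f"executor heap across the cluster: {cluster_heap_gb} GB")
print(f"approx. heap per running task: {heap_per_core_gb} GB")
```

Under these assumptions each executor would get about 42 GB (passed as spark.executor.memory). In standalone mode, setting SPARK_WORKER_CORES in spark-env.sh is one way to cap each worker at 4 cores. How much of that heap is usable for caching versus execution then depends on the memory-fraction settings of the Spark version in use.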

Configuring Spark Memory

2014-07-23, Martin Goodson
this and the myriad of other memory settings available (daemon memory, worker memory, etc.). Perhaps a worked example could be added to the docs? I would be happy to provide some text as soon as someone can enlighten me on the technicalities! Thank you

Re: Problem running Spark shell (1.0.0) on EMR

2014-07-22, Martin Goodson
I am also having exactly the same problem when calling it from PySpark. Has anyone managed to get this script to work? On Wed, Jul 16, 2014 at 2:10 PM, Ian Wilkinson wrote: > Hi, > > I'm trying to run the Spa

Re: Spark vs Google cloud dataflow

2014-06-27, Martin Goodson
In my experience, acquiring 20 spot instances accounts for a tiny fraction of the total time it takes to provision a cluster with spark-ec2. This is not (solely) an AWS issue. On Thu, Jun 26, 2014 at 10:14 PM, Nicholas

Fwd: New Spark Meetup Group in London, UK. First meeting 28th May

2014-05-02, Martin Goodson
tion at Sony. Thanks to Skimlinks <http://skimlinks.com/> for the beer and food!

Re: Calling Spark enthusiasts in NYC

2014-03-31, Martin Goodson
How about London? On Mon, Mar 31, 2014 at 6:28 PM, Andy Konwinski wrote: > Hi folks, > > We have seen a lot of community growth outside of the Bay Area and we are > looking to help spur even more!