Have you tried to repartition() your original data to make more partitions
before you aggregate?
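
For illustration, here is a minimal PySpark-style sketch of that suggestion. The helper names, the three-partitions-per-core rule of thumb, and the cluster size in the usage note are assumptions made for this example, not details from the thread. Only `repartition()` and `reduceByKey()` are standard RDD methods.

```python
from operator import add


def target_partitions(num_executors, cores_per_executor, partitions_per_core=3):
    """Rule-of-thumb partition count: a few partitions per available core.

    The default of 3 per core is an assumption, not a value from the thread.
    """
    if min(num_executors, cores_per_executor, partitions_per_core) < 1:
        raise ValueError("all arguments must be positive")
    return num_executors * cores_per_executor * partitions_per_core


def aggregate_after_repartition(pairs_rdd, num_partitions):
    """Spread (key, value) pairs over more partitions, then sum values per key.

    `pairs_rdd` is expected to be a PySpark RDD of (key, value) tuples.
    """
    return pairs_rdd.repartition(num_partitions).reduceByKey(add)
```

With a SparkContext `sc`, usage would look like `aggregate_after_repartition(sc.parallelize(pairs), target_partitions(20, 4))`. Note that `reduceByKey` also accepts a `numPartitions` argument, which may avoid the extra shuffle that `repartition()` incurs.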
--
Martin Goodson | VP Data Science
(0)20 3397 1240
On Mon, Mar 23, 2015 at 4:12 PM, Yiannis Gkoufas wrote:
> Hi Yin,
>
> Yes, I have set spark.executor.memory
disks is not much faster than
accessing s3 across the network?
On Fri, Aug 1, 2014 at 10:44 AM, Martin Goodson wrote:
> Hi all,
> I'm consistently finding that reading from HDFS is not appreciably fa
educe/samples/spark/1.0.0/install-spark-shark.rb
and ami-version 3.1.0).
billion
users per month and are second only to Google in the contextual advertising
space (ok - a distant second!).
Details here:
http://grnh.se/rl8f25
Great - thanks for the clarification Aaron. The offer stands for me to
write some documentation and an example that covers this without leaving
*any* room for ambiguity.
On Thu, Jul 24, 2014 at 6:09 PM, Aaron
Thank you Nishkam,
I have read your code. So, to check my understanding: for each Spark
context, is there one executor per node? Can anyone confirm this?
On Thu, Jul 24, 2014 at 6:12 AM, Nishkam
GB used by Spark.)"
Am I reading this incorrectly?
Anyway, our configuration is 21 machines (one master and 20 slaves), each
with 60 GB. We would like to use 4 cores per machine. This is PySpark, so we
want to leave, say, 16 GB on each machine for Python processes.
Thanks again for the advice!
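
As a rough worked example of the arithmetic behind this setup, the sketch below subtracts the Python reservation from each machine's memory to get a candidate executor size. The function name and the 2 GB operating-system reserve are assumptions for illustration. Whether the 4-cores-per-machine limit is best expressed through `spark.cores.max` or a per-worker setting depends on the deployment, so treat the output as a starting point rather than a recommendation.

```python
def plan_executor_memory(machine_gb, python_reserved_gb, os_reserved_gb=2):
    """Memory (GB) left for the Spark executor JVM on one worker machine.

    os_reserved_gb is a hypothetical allowance for the OS and daemons.
    """
    executor_gb = machine_gb - python_reserved_gb - os_reserved_gb
    if executor_gb <= 0:
        raise ValueError("nothing left for the executor")
    return executor_gb


workers = 20            # 20 slaves (the master is not counted), per the message
cores_per_machine = 4   # per the message
executor_gb = plan_executor_memory(machine_gb=60, python_reserved_gb=16)

settings = {
    "spark.executor.memory": "%dg" % executor_gb,
    # Total cores across the cluster, assuming an even spread over workers.
    "spark.cores.max": str(workers * cores_per_machine),
}
for key in sorted(settings):
    print("%s %s" % (key, settings[key]))
```

Under these assumptions, each executor would get 60 - 16 - 2 = 42 GB, and the application would use 80 cores in total.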
this and the myriad of other
memory settings available (daemon memory, worker memory, etc.). Perhaps a
worked example could be added to the docs? I would be happy to provide some
text as soon as someone can enlighten me on the technicalities!
Thank you
I am also having exactly the same problem when calling from PySpark. Has
anyone managed to get this script to work?
On Wed, Jul 16, 2014 at 2:10 PM, Ian Wilkinson wrote:
> Hi,
>
> I’m trying to run the Spa
My experience is that gaining 20 spot instances accounts for a tiny
fraction of the total time of provisioning a cluster with spark-ec2. This
is not (solely) an AWS issue.
On Thu, Jun 26, 2014 at 10:14 PM, Nicholas
tion at Sony.
Thanks to Skimlinks <http://skimlinks.com/> for the beer and food!
--
Martin Goodson
@martingoodson
How about London?
On Mon, Mar 31, 2014 at 6:28 PM, Andy Konwinski wrote:
> Hi folks,
>
> We have seen a lot of community growth outside of the Bay Area and we are
> looking to help spur even more!
>