I used Spark on EC2 a while ago, but recent revisions seem to have broken
the functionality.
Is anyone actually using Spark on EC2 at the moment?
The bug in question is:
https://issues.apache.org/jira/browse/SPARK-5008
It makes it impossible to use persistent HDFS without a workaround on each
slave.
"Johan Beisser" wrote:
>
>> Yes.
>>
>> We're looking at bootstrapping in EMR...
>> On Sat, May 23, 2015 at 07:21 Joe Wass wrote:
>>
>>> I used Spark on EC2 a while ago
>>>
>>
This may sound like an obvious question, but are you sure that the program
is doing any work when you don't have a saveAsTextFile? If there are
transformations but no actions to actually collect the data, there's no
need for Spark to execute the transformations.
As to the question of 'is this taki
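For illustration, a minimal sketch of that laziness, assuming a hypothetical job
reading from HDFS (the paths and app name are made up):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("lazy-eval-sketch"))

    // A transformation only records lineage; nothing is scheduled here.
    val logLines = sc.textFile("hdfs:///input/logs")   // hypothetical input
    val parsed   = logLines.map(_.split("\t"))

    // Only actions trigger execution.
    println(parsed.count())                            // runs the job
    parsed.saveAsTextFile("hdfs:///output/parsed")     // runs it again

Without the count() or saveAsTextFile(), the map above never executes at all.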
I want to process about 800 GB of data on an Amazon EC2 cluster. So, I need
to store the input in HDFS somehow.
I currently have a cluster of 5 x m3.xlarge, each of which has 80GB disk.
Each HDFS node reports 73 GB, and the total capacity is ~370 GB.
If I want to process 800 GB of data (assuming
nk the
> size of the cluster needed to process the data from the size of the data.
>
> DR
>
>
> On 02/03/2015 11:43 AM, Joe Wass wrote:
>
>> I want to process about 800 GB of data on an Amazon EC2 cluster. So, I
>> need
>> to store the input in HDFS somehow.
>>
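As a back-of-envelope sketch of that sizing (assuming HDFS's default replication
factor of 3 and the ~73 GB of usable space each node reports; shuffle and
scratch space are ignored):

    val inputGb     = 800.0
    val replication = 3
    val perNodeGb   = 73.0

    val rawNeededGb = inputGb * replication                     // 2400 GB
    val nodesNeeded = math.ceil(rawNeededGb / perNodeGb).toInt  // 33 nodes
    println(s"~$nodesNeeded nodes just to hold the replicated input")

Five m3.xlarge nodes (~370 GB raw) are nowhere near that.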
I have about 500 MB of data and I'm trying to process it on a single
`local` instance. I'm getting an Out of Memory exception. Stack trace at
the end.
Spark 1.1.1
My JVM has -Xmx2g
spark.driver.memory = 1000M
spark.executor.memory = 1000M
spark.kryoserializer.buffer.mb = 256
spark.kryoserializer
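For what it's worth, a sketch of how those settings interact in a single `local`
JVM (Spark 1.1.x property names; the app name is made up):

    import org.apache.spark.{SparkConf, SparkContext}

    // In local mode the driver and executor share one JVM, so the heap that
    // JVM was started with (-Xmx2g here) is the real ceiling. Setting
    // spark.executor.memory cannot grow it, and spark.driver.memory set from
    // inside the program is too late to have any effect.
    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("oom-investigation")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryoserializer.buffer.mb", "256")
    val sc = new SparkContext(conf)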
output from HDFS back to S3. Works for us ... YMMV.
> :-)
>
> HTH,
>
> DR
>
>
> On 02/03/2015 12:32 PM, Joe Wass wrote:
>
>> The data is coming from S3 in the first place, and the results will be
>> uploaded back there. But even in the same availability
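In case it is useful, a sketch of the two routes being discussed; the bucket and
paths are made up, and the s3n credentials are assumed to already be in the
Hadoop configuration:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc      = new SparkContext(new SparkConf().setAppName("s3-output-sketch"))
    val results = sc.textFile("hdfs:///output/intermediate")   // stand-in for the real results

    // Route 1: write straight back to S3 from the job.
    results.saveAsTextFile("s3n://my-bucket/output/run-1")

    // Route 2: write to persistent HDFS first, then copy afterwards, e.g.
    //   hadoop distcp hdfs:///output/run-1 s3n://my-bucket/output/run-1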
I'm sitting here looking at my application crunching gigabytes of data on a
cluster and I have no idea if it's an hour away from completion or a
minute. The web UI shows progress through each stage, but not how many
stages remain. How can I work out how many stages my program will take
automatic
on and then from the
>> webui you will be able to see how many operations have happened so far.
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Feb 4, 2015 at 4:33 PM, Joe Wass wrote:
>>
>>> I'm sitting here looking at my application crunching gi
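A rough sketch of one way to watch stage counts, assuming the job's existing
SparkContext (sc) and final RDD (finalRdd); neither is defined here:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

    // Print a running total of finished stages to the driver log.
    sc.addSparkListener(new SparkListener {
      private var completed = 0
      override def onStageCompleted(stage: SparkListenerStageCompleted): Unit = {
        completed += 1
        println(s"$completed stages completed so far (last: ${stage.stageInfo.name})")
      }
    })

    // The lineage dump also gives a feel for how many shuffle boundaries,
    // and therefore stages, the job will involve:
    println(finalRdd.toDebugString)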
I'm running on EC2 and I want to set the directory to use on the slaves
(mounted EBS volumes).
I have set:
spark.local.dir /vol3/my-spark-dir
in
/root/spark/conf/spark-defaults.conf
and replicated to all nodes. I have verified that in the console the value
in the config corresponds. I have
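One thing worth double-checking, as a guess: spark.local.dir can be silently
overridden by SPARK_LOCAL_DIRS. A sketch of the two places involved (the path is
the one above):

    # /root/spark/conf/spark-defaults.conf, replicated to every node
    spark.local.dir    /vol3/my-spark-dir

    # /root/spark/conf/spark-env.sh -- if SPARK_LOCAL_DIRS is set here (the
    # spark-ec2 scripts typically point it at the ephemeral /mnt volumes),
    # it takes precedence over spark.local.dir on the workers:
    export SPARK_LOCAL_DIRS=/vol3/my-spark-dir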
I've updated to Spark 1.2.0, and the EC2 persistent-hdfs behaviour appears
to have changed.
My launch script is
spark-1.2.0-bin-hadoop2.4/ec2/spark-ec2 --instance-type=m3.xlarge -s 5
--ebs-vol-size=1000 launch myproject
When I ssh into master I get:
$ df -h
Filesystem      Size  Us
Looks like this is caused by issue SPARK-5008:
https://issues.apache.org/jira/browse/SPARK-5008
On 13 February 2015 at 19:04, Joe Wass wrote:
> I've updated to Spark 1.2.0, and the EC2 persistent-hdfs behaviour appears
> to have changed.
>
> My launch script is
>
On the advice of some recent discussions on this list, I thought I would
try and consume gz files directly. I'm reading them, doing a preliminary
map, then repartitioning, then doing normal spark things.
As I understand it, gz files aren't splittable into partitions because of the
format, so I though
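A sketch of that flow (the bucket, paths and partition count are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("gz-ingest-sketch"))

    // Each .gz file arrives as one unsplittable partition, so spread the data
    // out early, before the heavier processing.
    val raw         = sc.textFile("s3n://my-bucket/logs/*.gz")
    val preliminary = raw.map(_.trim)      // stand-in for the preliminary map
    val spread      = preliminary.repartition(200)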
big your partitions are. The
> question is, when does this happen? what operation?
>
> On Thu, Feb 19, 2015 at 9:35 AM, Joe Wass wrote:
> > On the advice of some recent discussions on this list, I thought I would
> try
> > and consume gz files directly. I'm reading
> repartitionedData.count()
> ...
>
>
> The point is, you avoid caching any data until you have ensured that the
> partitions are small. You might have big partitions before that in
> rawData, but that is OK.
>
> Imran
>
>
> On Thu, Feb 19, 2015 at 4:43 AM, Joe
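Reading past the truncation, the suggested pattern seems to be roughly the
following; rawData and repartitionedData are the names from the quoted reply,
the path and partition count are made up, and sc is the job's existing
SparkContext:

    // Never cache rawData: each of its partitions is a whole gz file.
    val rawData = sc.textFile("s3n://my-bucket/logs/*.gz")

    // Repartition first, mark the result for caching, then force it with an
    // action, so only the small partitions are ever materialised in the cache.
    val repartitionedData = rawData.repartition(200)
    repartitionedData.cache()
    repartitionedData.count()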
I'm running a cluster of 3 Amazon EC2 machines (small number because it's
expensive when experiments keep crashing after a day!).
Today's crash looks like this (stacktrace at end of message).
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output
location for shuffle 0
On my thr
FDs (resource leak). In this case, Linux
>>> holds all space allocated and does not release it until application
>>> exits (crashes in your case). You check file system and everything is
>>> normal, you have enough space and you have no idea why does application
>
I have a Spark job running on about 300 GB of log files, on Amazon EC2,
with 10 x Large instances (each with 600 GB disk). The job hasn't yet
completed.
So far, 18 stages have completed (2 of which have retries) and 3 stages
have failed. In each failed stage there are ~160 successful tasks, but
"C
I'm running on an AWS cluster of 10 x m1.large (64 bit, 7.5 GiB RAM). FWIW
I'm using the Flambo Clojure wrapper which uses the Java API but I don't
think that should make any difference. I'm running with the following
command:
spark/bin/spark-submit --class mything.core --name "My Thing" --conf
sp
no longer has a permanent generation, so this particular
> type of problem and tuning is not needed. You might consider running
> on Java 8.
>
> On Fri, Jan 9, 2015 at 10:38 AM, Joe Wass wrote:
> > I'm running on an AWS cluster of 10 x m1.large (64 bit, 7.5 GiB RAM).
> FW
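If staying on Java 7, one way to size the permanent generation explicitly is
through the JVM-options settings on spark-submit; a sketch mirroring the command
above (the 256m value and jar name are made up). On Java 8 the permanent
generation no longer exists and -XX:MaxPermSize is simply ignored.

    spark/bin/spark-submit --class mything.core --name "My Thing" \
      --driver-java-options "-XX:MaxPermSize=256m" \
      --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=256m" \
      my-thing.jar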
So I had a Spark job with various failures, and I decided to kill it and
start again. I clicked the 'kill' link in the web console, restarted the
job on the command line and headed back to the web console and refreshed to
see how my job was doing... the URL at the time was:
/stages/stage/kill?id=1