Re: Spark on EMR suddenly stalling

2018-01-02 Thread Gourav Sengupta
Hi Jeroen, if you are using Hive partitions, how many partitions do you have? Also, is there any chance you could post the code? Regards, Gourav Sengupta On Tue, Jan 2, 2018 at 7:50 AM, Jeroen Miller wrote: > Hello Gourav, > > On 30 Dec 2017, at 20:20, Gourav Sengupta > wrote: > > Pl

Re: Spark on EMR suddenly stalling

2018-01-01 Thread Jeroen Miller
Hello Mans, On 1 Jan 2018, at 17:12, M Singh wrote: > I am not sure if I missed it - but can you let us know what is your input source and output sink ? Reading from S3 and writing to S3. However, the never-ending task 0.0 runs in a stage well before anything is written to S3. Regards, J

Re: Spark on EMR suddenly stalling

2018-01-01 Thread Jeroen Miller
Hello Gourav, On 30 Dec 2017, at 20:20, Gourav Sengupta wrote: > Please try to use the SPARK UI from the way that AWS EMR recommends, it should be available from the resource manager. I never ever had any problem working with it. THAT HAS ALWAYS BEEN MY PRIMARY AND SOLE SOURCE OF DEBUGGING.

Re: Spark on EMR suddenly stalling

2018-01-01 Thread M Singh
Hi Jeroen: I am not sure if I missed it, but can you let us know what your input source and output sink are? In some cases, I found that saving to S3 was a problem. In those cases I started saving the output to the EMR cluster's HDFS and later copied it to S3 using s3-dist-cp, which solved our issue. Mans
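Mans's workaround (write to HDFS first, then copy to S3 with s3-dist-cp) can be sketched as below. The HDFS path and bucket name are hypothetical; the Spark job is assumed to have already written its output to HDFS (for example with `df.write.parquet("hdfs:///tmp/job-output")`), and the command is assumed to run on the EMR master node, where `s3-dist-cp` is available.

```python
import shlex
import subprocess


def build_s3_dist_cp_cmd(src, dest):
    """Return an s3-dist-cp argument list copying src (e.g. an HDFS path) to dest (an S3 URI)."""
    if not dest.startswith("s3://"):
        raise ValueError("expected an s3:// destination, got %r" % (dest,))
    return ["s3-dist-cp", "--src=" + src, "--dest=" + dest]


def copy_hdfs_output_to_s3(src, dest, dry_run=True):
    """Print the command (dry run) or execute it; check=True raises on a non-zero exit."""
    cmd = build_s3_dist_cp_cmd(src, dest)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical paths: the Spark job is assumed to have written its output here first.
    copy_hdfs_output_to_s3("hdfs:///tmp/job-output", "s3://example-bucket/job-output")
```

Keeping the dry run as the default makes it easy to inspect the exact command before copying anything.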

Re: Spark on EMR suddenly stalling

2018-01-01 Thread Rohit Karlupia
Here is the checklist I would probably work through: 1. Check GC on the offending executor while the task is running. Maybe you need even more memory. 2. Go back to a previous successful run of the job, check the Spark UI for the offending stage, and check max task time/max input/ma
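Rohit's second step, comparing the offending stage against a previous successful run, comes down to a few numbers read off the Spark UI (for the first step, the Spark UI also reports GC time per task and per executor). A minimal sketch, assuming the per-task durations in seconds have been copied from the stage page; the 5x max/median threshold is an arbitrary illustrative heuristic, not a Spark setting.

```python
from statistics import median


def stage_skew_report(durations_s, baseline_max_s=None, ratio_threshold=5.0):
    """Summarise one stage's task durations and flag a likely straggler.

    ratio_threshold and the baseline comparison are illustrative heuristics only.
    """
    if not durations_s:
        raise ValueError("need at least one task duration")
    mx = max(durations_s)
    med = median(durations_s)
    ratio = mx / med if med > 0 else float("inf")
    report = {"max_s": mx, "median_s": med, "max_over_median": ratio}
    # One task far slower than the typical task points to skew or a stuck task.
    report["skewed"] = ratio >= ratio_threshold
    if baseline_max_s is not None:
        # Compare against the max task time of a previous successful run.
        report["slower_than_baseline"] = mx > baseline_max_s
    return report


if __name__ == "__main__":
    print(stage_skew_report([10, 11, 12, 9, 300], baseline_max_s=40))
```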

Re: Spark on EMR suddenly stalling

2017-12-30 Thread Gourav Sengupta
Hi, Please try to access the Spark UI the way AWS EMR recommends; it should be available from the ResourceManager. I have never had any problem working with it. THAT HAS ALWAYS BEEN MY PRIMARY AND SOLE SOURCE OF DEBUGGING. Sadly, I cannot be of much help unless we go for a screen share se

Re: Spark on EMR suddenly stalling

2017-12-29 Thread Shushant Arora
You may have to recreate your cluster with the configuration below at EMR creation time: "Configurations": [ { "Properties": { "maximizeResourceAllocation": "false" }, "Classification": "spark" } ] On Fri
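The same setting expressed as a Python structure, for example for cluster creation through boto3's `run_job_flow(Configurations=...)`. A sketch only: the boto3 call is shown as a comment and every other cluster parameter is omitted.

```python
import json


def spark_configurations(maximize_resource_allocation=False):
    """EMR "Configurations" list for the spark classification, as quoted in the thread."""
    return [
        {
            "Classification": "spark",
            "Properties": {
                # EMR property values are strings, hence "false" rather than False.
                "maximizeResourceAllocation": str(maximize_resource_allocation).lower(),
            },
        }
    ]


if __name__ == "__main__":
    print(json.dumps(spark_configurations(), indent=2))
    # With boto3, the list would be passed when creating the cluster, e.g.:
    # boto3.client("emr").run_job_flow(..., Configurations=spark_configurations())
```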

Re: Spark on EMR suddenly stalling

2017-12-29 Thread Jeroen Miller
On 28 Dec 2017, at 19:25, Patrick Alwell wrote: > Dynamic allocation is great; but sometimes I’ve found explicitly setting the num executors, cores per executor, and memory per executor to be a better alternative. No difference with spark.dynamicAllocation.enabled set to false. JM
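For reference, turning off dynamic allocation and sizing executors explicitly, as Patrick suggests, comes down to a handful of standard Spark properties. The numbers below are placeholders, not recommendations for this cluster, and the helper names are hypothetical.

```python
def explicit_executor_conf(num_executors, cores, memory):
    """Spark properties for fixed executor sizing with dynamic allocation disabled."""
    return {
        "spark.dynamicAllocation.enabled": "false",
        "spark.executor.instances": str(num_executors),
        "spark.executor.cores": str(cores),
        "spark.executor.memory": memory,  # e.g. "8g"
    }


def to_spark_submit_args(conf):
    """Turn a property dict into spark-submit --conf arguments (sorted for stable output)."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", "%s=%s" % (key, value)]
    return args


if __name__ == "__main__":
    # Placeholder sizing: 10 executors x 4 cores x 8 GB.
    print(" ".join(to_spark_submit_args(explicit_executor_conf(10, 4, "8g"))))
```

With PySpark, the same dict could instead be applied through `SparkSession.builder.config(key, value)` for each entry.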

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Gourav Sengupta
Hi Jeroen, can you then try EMR version 5.10 or 5.11 instead? Can you please try selecting a subnet in a different availability zone? If possible, try increasing the number of task instances and see whether that makes a difference. Also, in case you are using caching, tr

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Jeroen Miller
On 28 Dec 2017, at 19:42, Gourav Sengupta wrote: > In the EMR cluster what are the other applications that you have enabled (like HIVE, FLUME, Livy, etc). Nothing that I can think of, just a Spark step (unless EMR is doing fancy stuff behind my back). > Are you using SPARK Session? Yes. >

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Jeroen Miller
On 28 Dec 2017, at 19:40, Maximiliano Felice wrote: > I experienced a similar issue a few weeks ago. The situation was a result of a mix of speculative execution and OOM issues in the container. Interesting! However, I don't have any OOM exception in the logs. Does that rule out your hypothes

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Gourav Sengupta
Hi Jeroen, Can I get a few pieces of additional information, please? In the EMR cluster, what other applications have you enabled (like Hive, Flume, Livy, etc.)? Are you using SparkSession? If yes, is your application running in cluster mode or client mode? Have you read the EC2 service leve

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Maximiliano Felice
Hi Jeroen, I experienced a similar issue a few weeks ago. The situation was a result of a mix of speculative execution and OOM issues in the container. First of all, when an executor takes too much time in Spark, it is handled by the YARN speculative execution, which will launch a new executor an
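In Spark, speculative execution is governed by the `spark.speculation` properties (`spark.speculation.multiplier`, `spark.speculation.quantile`); it is disabled by default in Apache Spark, and this thread does not establish what EMR sets. The sketch below is a simplified model of the rule: once a quantile of a stage's tasks has finished, a running task becomes a speculation candidate if it has run longer than the multiplier times the median successful duration. The real scheduler has more details (rounding, timing of checks), and the 100 ms floor here is an assumption, so treat this as illustrative only.

```python
from statistics import median

# Setting this explicitly is one way to rule speculation in or out while debugging.
DISABLE_SPECULATION = {"spark.speculation": "false"}


def would_speculate(finished_durations_s, running_elapsed_s, total_tasks,
                    multiplier=1.5, quantile=0.75, min_threshold_s=0.1):
    """Simplified model of whether a running task would be speculated.

    multiplier and quantile mirror Spark's documented defaults; min_threshold_s is assumed.
    """
    # Nothing is speculated until enough of the stage has finished.
    if not finished_durations_s or len(finished_durations_s) < quantile * total_tasks:
        return False
    threshold = max(multiplier * median(finished_durations_s), min_threshold_s)
    return running_elapsed_s > threshold


if __name__ == "__main__":
    # 3 of 4 tasks done in ~10 s each; the last one has been running for 60 s.
    print(would_speculate([10, 10, 10], 60, total_tasks=4))
```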

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Patrick Alwell
Jeroen, Any time there is a shuffle across the network, Spark starts a new stage. It seems like you are having issues either pre- or post-shuffle. Have you looked at a resource monitoring tool like Ganglia to determine whether this is a memory- or thread-related issue? The Spark UI? You are using groupBy
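To make the groupBy concern concrete: a groupBy shuffles all rows with the same key into the same partition, so one very frequent key yields one task that runs far longer than the rest. That is one plausible, unconfirmed explanation for a single never-ending task. The sketch uses CRC32 as a stand-in hash; Spark's partitioner hashes keys differently, but the same-key-same-partition property holds either way.

```python
import zlib
from collections import Counter


def partition_loads(keys, num_partitions):
    """Count records per shuffle partition under (simplified) hash partitioning."""
    loads = Counter()
    for key in keys:
        # CRC32 is a deterministic stand-in for Spark's key hashing.
        loads[zlib.crc32(str(key).encode("utf-8")) % num_partitions] += 1
    return [loads[p] for p in range(num_partitions)]


if __name__ == "__main__":
    # Hypothetical data: one hot key with 1000 rows plus 100 keys with one row each.
    keys = ["hot"] * 1000 + ["k%d" % i for i in range(100)]
    print(partition_loads(keys, 8))
```

The partition holding "hot" ends up with at least 1000 rows while the others share the remaining 100.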

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Jeroen Miller
On 28 Dec 2017, at 17:41, Richard Qiao wrote: > Are you able to specify which path of data filled up? I can narrow it down to a bunch of files but it's not so straightforward. > Any logs not rolled over? I have to manually terminate the cluster but there is nothing more in the driver's log whe

Re: Spark on EMR: out-of-the-box solution for real-time application logs monitoring?

2015-12-11 Thread Roberto Coluccio
Thanks for your advice, Steve. I'm mainly talking about application logs. To be clearer, think for instance about "//hadoop/userlogs/application_blablabla/container_blablabla/stderr_or_stdout", i.e. YARN application container logs, stored (at least for EMR's hadoop 2.4) on DataNodes

Re: Spark on EMR: out-of-the-box solution for real-time application logs monitoring?

2015-12-10 Thread Steve Loughran
> On 10 Dec 2015, at 14:52, Roberto Coluccio wrote: > > Hello, > > I'm investigating on a solution to real-time monitor Spark logs produced by > my EMR cluster in order to collect statistics and trigger alarms. Being on > EMR, I found the CloudWatch Logs + Lambda pretty straightforward and, s
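The CloudWatch Logs + Lambda approach Roberto mentions relies on a subscription filter, which delivers log events to Lambda as base64-encoded, gzip-compressed JSON under `event["awslogs"]["data"]`. A minimal handler sketch follows; matching on the substring "ERROR" and returning a count instead of raising an alarm are hypothetical placeholders.

```python
import base64
import gzip
import json


def decode_cloudwatch_logs_event(event):
    """Decode the payload a CloudWatch Logs subscription filter sends to Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context):
    payload = decode_cloudwatch_logs_event(event)
    # Hypothetical rule: treat any message containing "ERROR" as alarm-worthy.
    errors = [e["message"] for e in payload.get("logEvents", []) if "ERROR" in e["message"]]
    # Hypothetical hook: publish a metric or trigger an alarm here instead.
    return {"logGroup": payload.get("logGroup"), "errorCount": len(errors)}
```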

Re: Spark on EMR with S3 example (Python)

2015-07-15 Thread Sujit Pal
ide the keys? > Thank you, > *From:* Sujit Pal [mailto:sujitatgt...@gmail.com] > *Sent:* Tuesday, July 14, 2015 3:14 PM > *To:* Pagliari, Roberto > *Cc:* user@spark.apache.org > *Subject:* Re: Spark on EMR with S3 example (Python)

Re: Spark on EMR with S3 example (Python)

2015-07-14 Thread Akhil Das
on. Do I still need to provide the keys? > Thank you, > *From:* Sujit Pal [mailto:sujitatgt...@gmail.com] > *Sent:* Tuesday, July 14, 2015 3:14 PM > *To:* Pagliari, Roberto > *Cc:* user@spark.apache.org > *Subject:* Re: Spark on EMR with S3

RE: Spark on EMR with S3 example (Python)

2015-07-14 Thread Pagliari, Roberto
Hi Sujit, I just wanted to access public datasets on Amazon. Do I still need to provide the keys? Thank you, From: Sujit Pal [mailto:sujitatgt...@gmail.com] Sent: Tuesday, July 14, 2015 3:14 PM To: Pagliari, Roberto Cc: user@spark.apache.org Subject: Re: Spark on EMR with S3 example (Python

Re: Spark on EMR with S3 example (Python)

2015-07-14 Thread Sujit Pal
Hi Roberto, I have written PySpark code that reads from private S3 buckets; it should be similar for public S3 buckets as well. You need to set the AWS access and secret keys on the SparkContext, then you can access the S3 folders and files with their s3n:// paths. Something like this: sc = Spa
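Sujit's snippet is truncated above. A sketch of one common way to do this with the legacy s3n connector, assuming the keys are in the standard `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` environment variables: the keys go into the SparkContext's Hadoop configuration via `sc._jsc`, a private PySpark attribute that is widely used for this. This is not necessarily what Sujit wrote, and the bucket path is hypothetical. On newer EMR releases, `s3://` paths go through EMRFS and normally use the cluster's IAM role instead of explicit keys.

```python
import os


def s3n_credentials_from_env():
    """Hadoop s3n credential properties, read from the standard AWS environment variables."""
    return {
        "fs.s3n.awsAccessKeyId": os.environ["AWS_ACCESS_KEY_ID"],
        "fs.s3n.awsSecretAccessKey": os.environ["AWS_SECRET_ACCESS_KEY"],
    }


def apply_hadoop_conf(sc, settings):
    """Set each property on the SparkContext's underlying Hadoop configuration."""
    hadoop_conf = sc._jsc.hadoopConfiguration()
    for key, value in settings.items():
        hadoop_conf.set(key, value)


# Usage (requires PySpark; the path is hypothetical):
# from pyspark import SparkContext
# sc = SparkContext(appName="s3n-read")
# apply_hadoop_conf(sc, s3n_credentials_from_env())
# rdd = sc.textFile("s3n://example-bucket/some/prefix/")
```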

Re: Spark on EMR

2015-06-19 Thread Bozeman, Christopher
; kamatsuoka; user Subject: Re: Spark on EMR Yes, for now it is a wrapper around the old install-spark BA, but that will change soon. The currently supported version in AMI 3.8.0 is 1.3.1, as 1.4.0 was released too late to be included in AMI 3.8.0. Spark 1.4.0 support is coming soon though, of course

Re: Spark on EMR

2015-06-17 Thread Kelly, Jonathan
currently being used under the hood, passing "-v,1.4.0" in the options is not supported. From: Eugen Cepoi Sent: Jun 17, 2015 6:37 AM To: Hideyoshi Maeda Cc: ayan guha;kamatsuoka;user Subject: Re: Spark on EMR It looks like it is a wrapp

Re: Spark on EMR

2015-06-17 Thread Eugen Cepoi
It looks like it is a wrapper around https://github.com/awslabs/emr-bootstrap-actions/tree/master/spark So basically adding an option -v,1.4.0.a should work. https://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-spark-configure.html 2015-06-17 15:32 GMT+02:00 Hideyoshi Maeda : >

Re: Spark on EMR

2015-06-17 Thread Hideyoshi Maeda
Any idea what version of Spark is underneath, i.e. is it 1.4? And is SparkR supported on Amazon EMR? On Wed, Jun 17, 2015 at 12:06 AM, ayan guha wrote: > That's great news. Can I assume spark on EMR supports kinesis to hbase pipeline? > On 17 Jun 2015 05:29, "kamatsuoka" wrote: > >> Spark i

Re: Spark on EMR

2015-06-16 Thread ayan guha
That's great news. Can I assume Spark on EMR supports a Kinesis-to-HBase pipeline? On 17 Jun 2015 05:29, "kamatsuoka" wrote: > Spark is now officially supported on Amazon Elastic MapReduce: > http://aws.amazon.com/elasticmapreduce/details/spark/