Hi Jeroen,
In case you are using HIVE partitions, how many partitions do you have?
Also is there any chance that you might post the code?
Regards,
Gourav Sengupta
On Tue, Jan 2, 2018 at 7:50 AM, Jeroen Miller wrote:
Hello Mans,
On 1 Jan 2018, at 17:12, M Singh wrote:
> I am not sure if I missed it, but can you let us know what your input
> source and output sink are?
Reading from S3 and writing to S3.
However, the never-ending task 0.0 occurs in a stage well before anything is
written to S3.
Regards,
J
Hello Gourav,
On 30 Dec 2017, at 20:20, Gourav Sengupta wrote:
> Please try to use the SPARK UI in the way that AWS EMR recommends; it
> should be available from the resource manager. I never ever had any problem
> working with it. THAT HAS ALWAYS BEEN MY PRIMARY AND SOLE SOURCE OF DEBUGGING.
Hi Jeroen:
I am not sure if I missed it, but can you let us know what your input
source and output sink are?
In some cases, I found that saving directly to S3 was a problem. In those cases
I saved the output to the EMR cluster's HDFS first and later copied it to S3
using s3-dist-cp, which solved our issue.
Mans
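Mans's HDFS-first workaround can be sketched as below. The HDFS path and S3 bucket are hypothetical placeholders; `s3-dist-cp` is the copy tool EMR ships on the cluster, run as a separate step after the Spark job finishes.

```python
# Sketch of the HDFS-first workaround (paths and bucket are hypothetical).
hdfs_out = "hdfs:///tmp/job-output/"    # Spark writes here first
s3_dest = "s3://my-bucket/job-output/"  # final destination on S3

# In the Spark job, write to HDFS instead of S3, e.g.:
#   df.write.parquet(hdfs_out)

# Then copy to S3 as a separate EMR step or from the master node:
s3_dist_cp = ["s3-dist-cp", "--src", hdfs_out, "--dest", s3_dest]
# e.g. subprocess.run(s3_dist_cp, check=True) on the cluster
```

This keeps the many small task-level writes on HDFS and turns the S3 upload into one bulk, retryable copy.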
Here is the list that I will probably try to fill:
1. Check GC on the offending executor when the task is running. Maybe
you need even more memory.
2. Go back to some previous successful run of the job and check the
spark ui for the offending stage and check max task time/max input/ma
Hi,
Please try to use the SPARK UI in the way that AWS EMR recommends; it
should be available from the resource manager. I never ever had any problem
working with it. THAT HAS ALWAYS BEEN MY PRIMARY AND SOLE SOURCE OF
DEBUGGING.
Sadly, I cannot be of much help unless we go for a screen share session.
You may have to recreate your cluster with the configuration below at EMR
creation:
"Configurations": [
{
"Properties": {
"maximizeResourceAllocation": "false"
},
"Classification": "spark"
}
]
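For reference, the same "spark" classification can be supplied when recreating the cluster programmatically. This is a sketch using boto3's EMR client structure; all other cluster details are omitted and assumed.

```python
# The "spark" classification from the message above, in the structure
# boto3's EMR client expects (cluster details omitted / hypothetical).
spark_configurations = [
    {
        "Classification": "spark",
        "Properties": {"maximizeResourceAllocation": "false"},
    }
]
# e.g. boto3.client("emr").run_job_flow(..., Configurations=spark_configurations)
```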
On Fri
On 28 Dec 2017, at 19:25, Patrick Alwell wrote:
> Dynamic allocation is great; but sometimes I’ve found explicitly setting the
> num executors, cores per executor, and memory per executor to be a better
> alternative.
No difference with spark.dynamicAllocation.enabled set to false.
JM
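Patrick's fixed-resource alternative amounts to a handful of explicit settings passed to spark-submit. The property names are Spark's standard ones; the values below are illustrative placeholders, not a sizing recommendation.

```python
# Explicit sizing instead of dynamic allocation (values are illustrative).
explicit_conf = {
    "spark.dynamicAllocation.enabled": "false",
    "spark.executor.instances": "10",  # num executors
    "spark.executor.cores": "4",       # cores per executor
    "spark.executor.memory": "8g",     # memory per executor
}

def submit_flags(conf):
    """Render the settings as spark-submit --conf flags."""
    return " ".join(f"--conf {k}={v}" for k, v in sorted(conf.items()))
```

With a fixed fleet like this, a stage that stalls does so with a known, stable resource layout, which makes the Spark UI numbers easier to reason about.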
--
Hi Jeroen,
Can you try using EMR version 5.10 or 5.11 instead?
Can you please try selecting a subnet which is in a different availability
zone?
If possible, try increasing the number of task instances and see whether it
makes a difference.
Also, in case you are using caching, tr
On 28 Dec 2017, at 19:42, Gourav Sengupta wrote:
> In the EMR cluster, what other applications have you enabled (like HIVE,
> FLUME, Livy, etc.)?
Nothing that I can think of, just a Spark step (unless EMR is doing fancy stuff
behind my back).
> Are you using SPARK Session?
Yes.
>
On 28 Dec 2017, at 19:40, Maximiliano Felice wrote:
> I experienced a similar issue a few weeks ago. The situation was a result of
> a mix of speculative execution and OOM issues in the container.
Interesting! However, I don't have any OOM exceptions in the logs. Does that
rule out your hypothesis?
Hi Jeroen,
Can I get a few pieces of additional information, please?
In the EMR cluster, what other applications have you enabled (like HIVE,
FLUME, Livy, etc.)?
Are you using SPARK Session? If yes, is your application running in cluster
mode or client mode?
Have you read the EC2 service leve
Hi Jeroen,
I experienced a similar issue a few weeks ago. The situation was a result
of a mix of speculative execution and OOM issues in the container.
First of all, when a task takes too long in Spark, speculative execution (if
enabled) kicks in and launches a second attempt of that task on an
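Speculative execution in Spark is governed by a few properties; strictly speaking it re-launches slow task attempts rather than whole executors. A sketch of the relevant knobs follows; the values shown are illustrative, so verify the defaults against your Spark version's documentation.

```python
# Settings that govern speculative task re-launch (values illustrative;
# check your Spark version's docs for the actual defaults).
speculation_conf = {
    "spark.speculation": "true",            # speculation is off by default
    "spark.speculation.multiplier": "1.5",  # how much slower than the median
    "spark.speculation.quantile": "0.75",   # fraction of tasks done first
}
```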
Jeroen,
Anytime there is a shuffle across the network, Spark moves to a new stage. It
seems like you are having issues either pre- or post-shuffle. Have you looked
at a resource-monitoring tool like Ganglia to determine whether this is a
memory or thread related issue? The Spark UI?
You are using groupBy
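The groupBy concern above comes down to how many records cross the network at the shuffle boundary. This is a toy illustration in plain Python (not Spark code) of why reduceByKey-style aggregation shuffles less than groupByKey: values are combined within each partition before any record is sent.

```python
from collections import defaultdict

def shuffled_records_group_by(partitions):
    # groupByKey: every (key, value) record is shuffled as-is.
    return sum(len(p) for p in partitions)

def shuffled_records_reduce_by(partitions):
    # reduceByKey: per-partition combine first, so only one record per
    # distinct key per partition crosses the network.
    total = 0
    for p in partitions:
        combined = defaultdict(int)
        for k, v in p:
            combined[k] += v
        total += len(combined)
    return total

# Two toy partitions of (key, value) pairs:
parts = [[("a", 1), ("a", 2), ("b", 3)], [("a", 4), ("b", 5), ("b", 6)]]
```

Here groupByKey-style shuffling moves 6 records while the combine-first approach moves 4; with skewed keys the gap grows dramatically.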
On 28 Dec 2017, at 17:41, Richard Qiao wrote:
> Are you able to specify which path of data filled up?
I can narrow it down to a bunch of files but it's not so straightforward.
> Any logs not rolled over?
I have to manually terminate the cluster but there is nothing more in the
driver's log whe
Thanks for your advice, Steve.
I'm mainly talking about application logs. To be clear, think for instance of
"//hadoop/userlogs/application_blablabla/container_blablabla/stderr_or_stdout",
i.e. YARN's application container logs, stored (at least for EMR's Hadoop
2.4) on the DataNodes.
> On 10 Dec 2015, at 14:52, Roberto Coluccio wrote:
>
> Hello,
>
> I'm investigating a solution to monitor, in real time, the Spark logs produced
> by my EMR cluster in order to collect statistics and trigger alarms. Being on
> EMR, I found the CloudWatch Logs + Lambda pretty straightforward and, s
Hi Sujit,
I just wanted to access public datasets on Amazon. Do I still need to provide
the keys?
Thank you,
From: Sujit Pal [mailto:sujitatgt...@gmail.com]
Sent: Tuesday, July 14, 2015 3:14 PM
To: Pagliari, Roberto
Cc: user@spark.apache.org
Subject: Re: Spark on EMR with S3 example (Python)
Hi Roberto,
I have written PySpark code that reads from private S3 buckets, it should
be similar for public S3 buckets as well. You need to set the AWS access
and secret keys into the SparkContext, then you can access the S3 folders
and files with their s3n:// paths. Something like this:
sc = Spa
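Sujit's snippet is cut off above. A hedged reconstruction of the idea: put the AWS keys into the Hadoop configuration so `s3n://` paths resolve. The key names are the classic Hadoop s3n property names; whether a public bucket actually needs keys depends on its bucket policy.

```python
# Hedged sketch of the idea behind the truncated snippet (not the
# original code): AWS credentials go into the Hadoop configuration.
def s3n_credentials(access_key, secret_key):
    return {
        "fs.s3n.awsAccessKeyId": access_key,
        "fs.s3n.awsSecretAccessKey": secret_key,
    }

# Applying them (requires a live SparkContext `sc` on the cluster):
#   for k, v in s3n_credentials(ACCESS_KEY, SECRET_KEY).items():
#       sc._jsc.hadoopConfiguration().set(k, v)
#   rdd = sc.textFile("s3n://some-bucket/some/prefix/*")
```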
Cc: ayan guha; kamatsuoka; user
Subject: Re: Spark on EMR
Yes, for now it is a wrapper around the old install-spark BA, but that will
change soon. The currently supported version in AMI 3.8.0 is 1.3.1, as 1.4.0
was released too late to include it in AMI 3.8.0. Spark 1.4.0 support is coming
soon though, of course
currently being
used under the hood, passing "-v,1.4.0" in the options is not supported.
Sent from Nine<http://www.9folders.com/>
From: Eugen Cepoi
Sent: Jun 17, 2015 6:37 AM
To: Hideyoshi Maeda
Cc: ayan guha; kamatsuoka; user
Subject: Re: Spark on EMR
It looks like it is a wrapper around
https://github.com/awslabs/emr-bootstrap-actions/tree/master/spark
So basically adding an option -v,1.4.0.a should work.
https://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-spark-configure.html
2015-06-17 15:32 GMT+02:00 Hideyoshi Maeda :
>
Any idea which version of Spark is underneath?
I.e., is it 1.4? And is SparkR supported on Amazon EMR?
On Wed, Jun 17, 2015 at 12:06 AM, ayan guha wrote:
> That's great news. Can I assume Spark on EMR supports a Kinesis-to-HBase
> pipeline?
> On 17 Jun 2015 05:29, "kamatsuoka" wrote:
>
>> Spark is now officially supported on Amazon Elastic Map Reduce:
That's great news. Can I assume Spark on EMR supports a Kinesis-to-HBase
pipeline?
On 17 Jun 2015 05:29, "kamatsuoka" wrote:
> Spark is now officially supported on Amazon Elastic Map Reduce:
> http://aws.amazon.com/elasticmapreduce/details/spark/
>
>
>