Re: Why do I see five attempts on my Spark application

2017-12-13 Thread sanat kumar Patnaik
It should be in your yarn-site.xml config file; the parameter name is yarn.resourcemanager.am.max-attempts. The directory should be /usr/lib/spark/conf/yarn-conf. Try to find this directory on your gateway node if you are using the Cloudera distribution. On Wed, Dec 13, 2017 at 2:33 PM, Subhash Sriram wrote: ...
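
For reference, a minimal sketch of how that property might appear in yarn-site.xml; the value 2 is only an illustration, and Spark also exposes a per-application override, spark.yarn.maxAppAttempts, which cannot exceed this cluster-wide cap:

    <!-- yarn-site.xml: cluster-wide cap on ApplicationMaster attempts (illustrative value) -->
    <property>
      <name>yarn.resourcemanager.am.max-attempts</name>
      <value>2</value>
    </property>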

Databricks Certification Registration

2017-10-24 Thread sanat kumar Patnaik
Hello All, Can anybody here please provide me a link to register for the Databricks Spark developer certification (US based)? I have been googling but always end up at this page: http://www.oreilly.com/data/sparkcert.html?cmp=ex-data-confreg-lp-na_databricks&__hssc=249029528.5.1508846982378&_

Re: Executors - running out of memory

2017-01-19 Thread sanat kumar Patnaik
Please try playing with spark-defaults.conf for EMR. Dynamic allocation (spark.dynamicAllocation.enabled) is set to true by default on EMR 4.4 and above. What EMR version are you using? http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html#d0e20458 On Thu, Jan 19, 2017 at 5:02 PM, Venkata D wrote: ...
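
As a hedged illustration (the property names are standard Spark settings; the values are placeholders, not EMR recommendations), the relevant spark-defaults.conf entries might look like:

    # spark-defaults.conf -- illustrative values only
    spark.dynamicAllocation.enabled        true
    spark.shuffle.service.enabled          true
    spark.dynamicAllocation.minExecutors   2
    spark.dynamicAllocation.maxExecutors   20
    spark.executor.memory                  4g
    spark.executor.cores                   4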

Re: Efficiently write a Dataframe to Text file(Spark Version 1.6.1)

2016-09-14 Thread sanat kumar Patnaik
Use compression, avoid repartitioning (to avoid network transfer), avoid spilling to disk (provide memory in YARN, etc.), increase network bandwidth ... On 14 Sep 2016, at 14:22, sanat kumar Patnaik wrote: > These are not csv files, utf8 files with a specific delimiter. > I
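
A minimal Scala sketch of the compression suggestion, assuming Spark 1.6, a pipe delimiter, and GzipCodec as one possible codec choice (the path and delimiter are illustrative, not from the thread):

    import org.apache.hadoop.io.compress.GzipCodec

    // df: the existing org.apache.spark.sql.DataFrame produced by the job.
    // Write it as pipe-delimited UTF-8 text with gzip compression, without an
    // explicit repartition, so no extra shuffle is pushed over the network.
    // Note: null columns are rendered as the string "null" by mkString and may
    // need explicit handling for downstream consumers.
    df.rdd
      .map(row => row.mkString("|"))
      .saveAsTextFile("/data/output/delimited", classOf[GzipCodec])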

Efficiently write a Dataframe to Text file(Spark Version 1.6.1)

2016-09-14 Thread sanat kumar Patnaik
Hi All, I am writing a batch application using Spark SQL and DataFrames. This application has a bunch of file joins, and there are intermediate points where I need to drop a file for downstream applications to consume. The problem is that all these downstream applications are still on l
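
For readers of the archive, a hedged sketch of one approach that was common on Spark 1.6.1 for dropping a delimited text file at an intermediate point: the external spark-csv package (com.databricks:spark-csv_2.10). The paths, delimiter, and input format below are assumptions for illustration, not details from the original post:

    // Hypothetical intermediate drop point in the batch job (Spark 1.6.1,
    // spark-csv package on the classpath, sqlContext already created).
    val df = sqlContext.read.parquet("/data/intermediate/joined_step_3")

    // Write a pipe-delimited text file for downstream consumers.
    df.write
      .format("com.databricks.spark.csv")
      .option("delimiter", "|")
      .option("header", "false")
      .save("/data/handoff/step_3_output")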