Re: Standalone cluster node utilization

2016-07-14 Thread Jakub Stransky
…"SELECT statement ... Condition = '$Condition'""".stripMargin) } else { df_init }).repartition(Configuration.appPartitioning); df.persist(). It seems that none of those actually works as expected: I cannot distribute the data across the cluster.
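The truncated snippet above appears to build the DataFrame conditionally from a SQL statement, repartition it, and persist it. A minimal sketch of that pattern, assuming `dfInit`, the condition value, and the partition count come from the application's own configuration; the table name `events` and column `condition_col` are hypothetical, standing in for the elided parts of the query:

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Sketch of the conditional load + repartition + persist pattern from the
// snippet. `events` and `condition_col` are hypothetical names; dfInit and
// numPartitions stand in for df_init and Configuration.appPartitioning.
def loadFiltered(sqlContext: SQLContext, dfInit: DataFrame,
                 condition: String, numPartitions: Int): DataFrame = {
  val df =
    (if (condition.nonEmpty)
       sqlContext.sql(
         s"""SELECT *
            |FROM events
            |WHERE condition_col = '$condition'""".stripMargin)
     else dfInit)
      .repartition(numPartitions) // shuffle rows across the cluster
  df.persist() // cache so downstream stages reuse the shuffled data
  df
}
```

Note that `repartition` does spread rows across executors, but only after the source read has completed; if the initial read produced a single partition, all data is still pulled through one task first.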

Re: Standalone cluster node utilization

2016-07-14 Thread Jakub Stransky
…0.021t 6836 S 676.7 79.4 40:08.61 java. Thanks, Jakub. On 14 July 2016 at 19:22, Jakub Stransky wrote: > Hi Talebzadeh, > > we are using 6 worker machines, all running. > > We are reading the data through sqlContext (data frame) as suggested > in the documentation over th…

Re: Standalone cluster node utilization

2016-07-14 Thread Jakub Stransky
…ordpress.com > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be lia…

Standalone cluster node utilization

2016-07-14 Thread Jakub Stransky
Hello, I have a Spark cluster running in standalone mode, master + 6 executors. My application reads data from a database via DataFrame.read, then filters rows. After that I re-partition the data, and I wonder why on the executors page of the driver UI I see RDD blocks all allocated…
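One common reason all RDD blocks land on a single executor is that a JDBC read without partitioning options produces exactly one partition, so the later repartition only kicks in after the whole table has been pulled through one task. A sketch of a partitioned JDBC read with the Spark 1.6 DataFrame API; the URL, table, and column names are placeholders, not values from the thread:

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Sketch: partitioned JDBC read (Spark 1.6). Without partitionColumn /
// lowerBound / upperBound / numPartitions, the JDBC source reads through a
// single partition. All connection details below are placeholders.
def readPartitioned(sqlContext: SQLContext): DataFrame =
  sqlContext.read.format("jdbc").options(Map(
    "url"             -> "jdbc:postgresql://dbhost:5432/mydb",
    "dbtable"         -> "events",
    "partitionColumn" -> "id",      // must be a numeric column in Spark 1.6
    "lowerBound"      -> "1",
    "upperBound"      -> "1000000",
    "numPartitions"   -> "48"       // e.g. 6 executors x 8 cores
  )).load()
```

With these options Spark issues 48 range-bounded queries in parallel, so the rows are distributed across executors from the start instead of being funneled through one.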

Re: Spark application doesn't scale to worker nodes

2016-07-05 Thread Jakub Stransky
andalone that was set in >> conf/spark-defaults.conf perhaps. >> >> >> Pozdrawiam, >> Jacek Laskowski >> >> https://medium.com/@jaceklaskowski/ >> Mastering Apache Spark http://bit.ly/mastering-apache-spark >> Follow me at https://twitter.co

Re: Spark application doesn't scale to worker nodes

2016-07-05 Thread Jakub Stransky
Pozdrawiam, > Jacek Laskowski > > https://medium.com/@jaceklaskowski/ > Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > On Tue, Jul 5, 2016 at 12:04 PM, Jakub Stransky > wrote: > >> Hel

Standalone mode resource allocation questions

2016-07-05 Thread Jakub Stransky
Hello, I went through the Spark documentation and several posts from Cloudera etc., and as my background is heavily Hadoop/YARN, there is still a little confusion. Could someone more experienced please clarify? What I am trying to achieve: - Running a cluster in standalone mode, version 1.6.1. Qu…
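For reference, standalone mode sizes resources differently from YARN: in Spark 1.6 a standalone application gets at most one executor per worker, and allocation is controlled by a few properties rather than by `--num-executors`. A sketch of the relevant `conf/spark-defaults.conf` entries (the values are illustrative examples, not recommendations):

```
# Heap per executor (one executor per worker per app in standalone mode, Spark 1.6)
spark.executor.memory   8g
# Cores each executor may use
spark.executor.cores    4
# Total cores the application may take across the whole cluster (standalone/Mesos)
spark.cores.max         24
```

On the worker side, `SPARK_WORKER_CORES` and `SPARK_WORKER_MEMORY` in `conf/spark-env.sh` cap what each worker offers in the first place.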

Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Jakub Stransky

Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Jakub Stransky

Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Jakub Stransky

Spark application doesn't scale to worker nodes

2016-07-04 Thread Jakub Stransky
Hello, I have a Spark cluster consisting of 4 nodes in standalone mode, master + 3 worker nodes, with available memory, cpus etc. configured. I have a Spark application which is essentially an MLlib pipeline for training a classifier, in this case RandomForest, but it could be a DecisionTree jus…
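The kind of pipeline described can be sketched with the `spark.ml` API available in Spark 1.6; the column names, feature list, and tree count below are placeholders, since the thread does not show the actual schema:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

// Sketch of an MLlib classification pipeline (spark.ml, Spark 1.6).
// All column names and parameters are hypothetical.
val indexer = new StringIndexer()
  .setInputCol("label_str")             // raw string label
  .setOutputCol("label")                // numeric label for the classifier
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2", "f3")) // feature columns
  .setOutputCol("features")
val rf = new RandomForestClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setNumTrees(100)
val pipeline = new Pipeline().setStages(Array(indexer, assembler, rf))
// val model = pipeline.fit(trainingDF) // trainingDF: the prepared DataFrame
```

Swapping RandomForest for a DecisionTree is just a matter of replacing the last stage with a `DecisionTreeClassifier` configured the same way.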

Write RDD to Elasticsearch

2016-03-24 Thread Jakub Stransky
Hi, I am trying to write a JavaPairRDD into Elasticsearch 1.7 from Spark 1.2.1 using elasticsearch-hadoop 2.0.2. JavaPairRDD output = ... final JobConf jc = new JobConf(output.context().hadoopConfiguration()); jc.set("mapred.output.format.class", "org.elasticsearch.hadoop.mr.EsOutp…
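A sketch of how that mapred-API write typically continues with elasticsearch-hadoop 2.0.x, using `saveAsHadoopDataset`; the host and the `index/type` name are placeholders, as the snippet's real values are truncated:

```scala
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // PairRDDFunctions implicits (Spark 1.2)
import org.apache.spark.rdd.RDD

// Continuation sketch of the snippet (elasticsearch-hadoop 2.0.x, mapred API).
// "eshost" and "myindex/mytype" are placeholders; pair values are typically
// MapWritable instances, or JSON strings when es.input.json=true is set.
def writeToEs(sc: SparkContext, output: RDD[(Object, Object)]): Unit = {
  val jc = new JobConf(sc.hadoopConfiguration)
  jc.set("mapred.output.format.class", "org.elasticsearch.hadoop.mr.EsOutputFormat")
  jc.set("es.nodes", "eshost:9200")        // Elasticsearch node(s)
  jc.set("es.resource", "myindex/mytype")  // target index/type
  jc.setSpeculativeExecution(false)        // avoid duplicate docs on task retries
  output.saveAsHadoopDataset(jc)
}
```

Disabling speculative execution matters here because Elasticsearch writes are not idempotent by default, so a speculatively re-run task can index the same documents twice.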