"""SELECT statement ... Condition = '$Condition'""".stripMargin)
} else {
df_init
}).repartition(Configuration.appPartitioning)
df.persist()
It seems that none of those actually works as expected; I cannot
distribute the data across the cluster.
0.021t 6836 S 676.7 79.4 40:08.61 java
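A quick way to check whether the repartition actually spread the rows (a
sketch, not from the thread, assuming the df built above):

// Count rows per partition; with repartition(n) working as intended the
// counts should be roughly even across n partitions.
val perPartition = df.rdd
  .mapPartitionsWithIndex { (idx, rows) => Iterator((idx, rows.size)) }
  .collect()

perPartition.foreach { case (idx, n) => println(s"partition $idx: $n rows") }
println(s"partitions: ${df.rdd.partitions.length}")

Even counts per partition still say nothing about where the cached blocks
live; the Executors tab of the UI shows that.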
Thanks
Jakub
On 14 July 2016 at 19:22, Jakub Stransky wrote:
> Hi Talebzadeh,
>
> we are using 6 worker machines, all running.
>
> We are reading the data through sqlContext (a DataFrame), as suggested
> in the documentation over th
Hello,
I have a Spark cluster running in standalone mode, master + 6 executors.
My application reads data from a database via DataFrame.read; then
rows are filtered. After that I repartition the data, and I wonder
why on the executors page of the driver UI I see RDD blocks all allocated
>> ...standalone that was set in
>> conf/spark-defaults.conf perhaps.
Hello,
I went through the Spark documentation and several posts from Cloudera etc.,
and as my background is heavily in Hadoop/YARN, there is still a little
confusion. Could someone more experienced please clarify?
What I am trying to achieve:
- Running a cluster in standalone mode, version 1.6.1
Qu
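The questions are cut off above. Not from the thread, but since the usual
confusion for a Hadoop/YARN background is how standalone sizing maps to YARN
knobs: standalone mode has no --num-executors; an application is capped at
spark.cores.max cores in total, and (by default, one executor per worker)
each executor is sized by spark.executor.memory. A sketch with hypothetical
values:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("my-app")                   // hypothetical application name
  .setMaster("spark://master-host:7077")  // hypothetical master URL
  .set("spark.executor.memory", "4g")     // heap per executor
  .set("spark.cores.max", "18")           // total cores the app may take

val sc = new SparkContext(conf)

The same pairs can equally live in conf/spark-defaults.conf, which is the file
mentioned in the reply quoted earlier.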
Hello,
I have a Spark cluster consisting of 4 nodes in standalone mode, master +
3 worker nodes, with available memory and CPUs etc. configured.
I have a Spark application which is essentially an MLlib pipeline for
training a classifier, in this case RandomForest, but it could be a
DecisionTree jus
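The message is cut off here. For context, a minimal sketch of the kind of
spark.ml pipeline described, assuming Spark 1.6 and hypothetical column names:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

// Assemble the raw numeric columns into the single vector column MLlib expects.
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2", "f3"))  // hypothetical feature columns
  .setOutputCol("features")

// Index the (hypothetical) string label column into numeric labels.
val labelIndexer = new StringIndexer()
  .setInputCol("labelRaw")
  .setOutputCol("label")

val rf = new RandomForestClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setNumTrees(50)

val pipeline = new Pipeline().setStages(Array(assembler, labelIndexer, rf))
val model = pipeline.fit(trainingDF)      // trainingDF: assumed training DataFrame

Swapping RandomForestClassifier for DecisionTreeClassifier changes only that
one stage; the rest of the pipeline stays as it is.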
Hi,
I am trying to write a JavaPairRDD into Elasticsearch 1.7 using Spark 1.2.1
and elasticsearch-hadoop 2.0.2:
JavaPairRDD output = ...
final JobConf jc = new JobConf(output.context().hadoopConfiguration());
jc.set("mapred.output.format.class",
       "org.elasticsearch.hadoop.mr.EsOutputFormat");
// The original snippet is cut off above; completing the write would also
// need a target resource (hypothetical index/type name) and the save call:
jc.set("es.resource", "myindex/mytype");
output.saveAsHadoopDataset(jc);