I had that issue too and, from what I gathered, it is an expected
optimization. Try using repartition instead.
On Feb 3, 2021, at 11:55, James Yu wrote:
>Hi Team,
>
>We are running into this poor performance issue and seeking your
>suggestion on how to improve
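If the slowdown came from a coalesce() call, a minimal sketch of the swap suggested above (the partition count is a placeholder, not a recommendation):

// coalesce(n) can be folded upstream as an optimization, shrinking the
// parallelism of the preceding stages; repartition(n) forces a shuffle
// and leaves upstream parallelism intact.
val rebalanced = df.repartition(200)  // instead of df.coalesce(200)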
You can specify the schema programmatically:
https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
On Wed, Oct 11, 2017 at 3:35 PM, sk skk wrote:
> Can we create a DataFrame from a Java pair RDD of String? I don’t have a
> schema as it will be a
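A minimal sketch in Scala of the programmatic-schema approach for a pair RDD of strings (the field names, the variable pairRdd, and a SparkSession named spark are assumptions for illustration):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Turn each (String, String) pair into a Row, then attach a schema built at runtime.
val rowRdd = pairRdd.map { case (k, v) => Row(k, v) }
val schema = StructType(Seq(
  StructField("key", StringType, nullable = true),
  StructField("value", StringType, nullable = true)))
val df = spark.createDataFrame(rowRdd, schema)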
Sounds like such a small job; if you are running it on a cluster, have you
considered simply running it locally (master = local)?
On Wed, Sep 27, 2017 at 7:06 AM, navneet sharma wrote:
> Hi,
>
> I am running a Spark job taking 18s in total, of which 8 seconds are the actual
> processing logic (business logic)
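For example, a sketch of pinning the master in code (the app name is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

// local[*] runs the whole job in-process on all available cores,
// avoiding cluster scheduling and task-dispatch overhead.
val conf = new SparkConf().setAppName("small-job").setMaster("local[*]")
val sc = new SparkContext(conf)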
This works for us (in yarn-site.xml; the spark_shuffle class name was cut off, the standard one is shown):

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
maps and generate your lists
}
On Wed, Dec 23, 2015 at 10:49 AM, Yasemin Kaya wrote:
> How can I use mapPartitions? Could you give me an example?
>
> 2015-12-23 17:26 GMT+02:00 Stéphane Verlet :
>
>> You should be able to do that using mapPartitions
>>
>> On Wed, Dec
You should be able to do that using mapPartitions
On Wed, Dec 23, 2015 at 8:24 AM, Ted Yu wrote:
> bq. {a=1, b=1, c=2, d=2}
>
> Can you elaborate your criteria a bit more? The above seems to be a Set,
> not a Map.
>
> Cheers
>
> On Wed, Dec 23, 2015 at 7:11 AM, Yasemin Kaya wrote:
>
>> Hi,
>>
>
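For reference, a minimal sketch of mapPartitions over data shaped like the example above (merging the per-partition maps on the driver is an assumption about the goal):

val rdd = sc.parallelize(Seq(("a", 1), ("b", 1), ("c", 2), ("d", 2)))
// Build one Map per partition instead of one object per element,
// then merge the per-partition maps.
val perPartition = rdd.mapPartitions(iter => Iterator(iter.toMap))
val merged = perPartition.reduce(_ ++ _)  // Map(a -> 1, b -> 1, c -> 2, d -> 2)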
Killing the app in the Spark UI doesn't kill the process launched via script
>
>
> On Friday, November 20, 2015, Stéphane Verlet
> wrote:
>
>> I solved the first issue by adding a shutdown hook in my code. The
>> shutdown hook gets called when you exit your script (
I solved the first issue by adding a shutdown hook in my code. The shutdown
hook gets called when you exit your script (ctrl-C, kill … but not kill -9).
val shutdownHook = scala.sys.addShutdownHook {
  try {
    sparkContext.stop()
    // Make sure to kill any other threads or thread pools you may be running
  } catch {
    case e: Exception => // already exiting; log and ignore
  }
}
sqlContext.sql(query).map(row => ((row.getString(0), row.getString(1)), row.getInt(2)))
On Wed, Nov 4, 2015 at 1:44 PM, pratik khadloya wrote:
> Hello,
>
> Is it possible to have a pair RDD from the below SQL query.
> The pair being ((item_id, flight_id), metric1)
>
> item_id, flight_id are part of gr
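Spelled out with the query included (the query text and table name are placeholders built from the column names in the question):

// Hypothetical query; the real one groups on item_id and flight_id.
val query = "SELECT item_id, flight_id, metric1 FROM my_table"
val pairRdd = sqlContext.sql(query)
  .map(row => ((row.getString(0), row.getString(1)), row.getInt(2)))
// pairRdd: RDD[((String, String), Int)] keyed by (item_id, flight_id)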
Create a custom key class, implement the equals method, and make sure the
hashCode method is consistent with it.
Use that key to map and join your rows.
On Sat, May 9, 2015 at 4:02 PM, Mathieu D wrote:
> Hi folks,
>
> I need to join RDDs having composite keys like this : (K1, K2 ... Kn).
>
> The joining ru
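In Scala, a case class derives structurally consistent equals and hashCode for you, so a sketch could look like this (the field names and the shape of rdd1/rdd2 are made up):

// The join matches keys via equals/hashCode, which the case class supplies.
case class CompositeKey(k1: String, k2: String)

val left = rdd1.map(r => (CompositeKey(r._1, r._2), r))
val right = rdd2.map(r => (CompositeKey(r._1, r._2), r))
val joined = left.join(right)  // RDD[(CompositeKey, (leftValue, rightValue))]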
From your pseudo-code, it would be sequential and done twice:
1+2+3
then 1+2+4
If you do a .cache() in step 2, then you would have 1+2+3, then 4.
I ran several steps in parallel from the same program, but never using the
same source RDD, so I do not know the limitations there. I simply started
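To illustrate the cache() point above, a sketch (the input path and transformations are placeholders):

val s1 = sc.textFile("input")                  // step 1
val s2 = s1.map(_.toUpperCase).cache()         // step 2, kept in memory
val r3 = s2.filter(_.startsWith("A")).count()  // computes 1+2+3
val r4 = s2.filter(_.endsWith("Z")).count()    // reuses cached 2, runs only 4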
Yes, it is working with this in spark-env.sh:
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_HOME/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_HO
Disclaimer: I am new at Spark.
I did something similar in a prototype which works, but I have not tested it
at scale yet.

val agg = users.mapValues(_ => 1).aggregateByKey(new
CustomAggregation())(CustomAggregation.sequenceOp, CustomAggregation.comboOp)
class CustomAggregation() extends
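The class definition was cut off above; a minimal sketch of what such an aggregation might look like (the count field and both ops are assumptions, not the original code):

// Hypothetical accumulator: counts occurrences per key.
class CustomAggregation(var count: Int = 0) extends Serializable

object CustomAggregation {
  // seqOp: fold one mapped value (here always 1) into the accumulator
  val sequenceOp: (CustomAggregation, Int) => CustomAggregation =
    (acc, v) => { acc.count += v; acc }
  // combOp: merge two partial accumulators from different partitions
  val comboOp: (CustomAggregation, CustomAggregation) => CustomAggregation =
    (a, b) => { a.count += b.count; a }
}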
I first saw this using Spark SQL, but the result is the same with plain
Spark.
14/11/07 19:46:36 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.UnsatisfiedLinkError:
org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
at org.apache.hadoop.util.NativeCodeLoader.buildS