Re: Merge two dataframes

2021-05-19 Thread Mich Talebzadeh
Hi Kushagra, I believe you are referring to this warning below: WARN window.WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation. I don't know an easy way around it. If the operation is only once you may be

Re: Merge two dataframes

2021-05-19 Thread Mich Talebzadeh
That generation of row_number() has to be performed through a window call, and I don't think there is any way around it without orderBy(): df1 = df1.select(F.row_number().over(Window.partitionBy().orderBy(df1['amount_6m'])).alias("row_num"), "amount_6m") The problem is that without partitionBy() cla

Re: Merge two dataframes

2021-05-19 Thread ayan guha
Hi Kushagra I still think this is a bad idea. By definition, data in a dataframe or RDD is unordered; you are imposing an order where there is none, and if it works, it will be by chance. For example, a simple repartition may disrupt the row ordering. It is just too unpredictable. I would suggest y

Spark Executor dies in K8 cluster

2021-05-19 Thread Philipp Kraus
Hello, I have got the following first testing setup: Kubernetes cluster 1.20 (4 nodes, each with 120 GB hard disk, 4 CPUs, 40 GB memory), Spark installed via the Bitnami Helm chart https://artifacthub.io/packages/helm/bitnami/spark (Chart Version 5.4.2 / Spark 3.1.1), using GeoSpark versi
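For reference, a minimal sketch of installing that chart version is below. The release name and namespace are assumptions for illustration, and the repository URL is Bitnami's standard Helm repository rather than something stated in the message.

```shell
# Hypothetical install of the Bitnami Spark chart version named above.
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install spark bitnami/spark --version 5.4.2 \
  --namespace spark --create-namespace
```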

unresolved dependency: graphframes#graphframes;0.8.1-spark2.4-s_2.11: not found

2021-05-19 Thread Wensheng Deng
Hi experts: I tried the example as shown on this page, and it is not working for me: https://spark-packages.org/package/graphframes/graphframes Please advise how to proceed. I also tried to unzip the zip file, ran 'sbt assembly', and got an error of 'sbt-spark-package;0.2.6: not found'. Is there

Re: unresolved dependency: graphframes#graphframes;0.8.1-spark2.4-s_2.11: not found

2021-05-19 Thread Sean Owen
I think it's because the bintray repo has gone away. Did you see the recent email about the new repo for these packages? On Wed, May 19, 2021 at 12:42 PM Wensheng Deng wrote: > Hi experts: > > I tried the example as shown on this page, and it is not working for me: > https://spark-packages.org/p

Re: unresolved dependency: graphframes#graphframes;0.8.1-spark2.4-s_2.11: not found

2021-05-19 Thread Wensheng Deng
Thanks Sean. You are right! Yes, it works after replacing the bintray repo with repos.spark-packages.org. On Wednesday, May 19, 2021, 02:03:14 PM EDT, Sean Owen wrote: I think it's because the bintray repo has gone away. Did you see the recent email about the new repo for these pack
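Sketched as a command line, the fix the poster describes might look like this. The package coordinate follows the subject line of the thread; spark-shell is just one entry point, and pyspark accepts the same flags.

```shell
# Point the dependency resolver at repos.spark-packages.org instead of the
# retired Bintray repository.
spark-shell \
  --packages graphframes:graphframes:0.8.1-spark2.4-s_2.11 \
  --repositories https://repos.spark-packages.org
```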

PySpark Write File Container exited with a non-zero exit code 143

2021-05-19 Thread Clay McDonald
Hello all, I'm hoping someone can give me some direction for troubleshooting this issue. I'm trying to write from Spark on a Hortonworks (Cloudera) HDP cluster. I ssh directly to the first datanode and run PySpark with the following command; however, it is always failing no matter what size I s

Re: PySpark Write File Container exited with a non-zero exit code 143

2021-05-19 Thread Mich Talebzadeh
Hi Clay, Those parameters you are passing are not valid: pyspark --conf queue=default --conf executory-memory=24G Python 3.7.3 (default, Apr 3 2021, 20:42:31) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux Type "help", "copyright", "credits" or "license" for more information. Warning: Ignoring

RE: PySpark Write File Container exited with a non-zero exit code 143

2021-05-19 Thread Clay McDonald
How so? From: Mich Talebzadeh Sent: Wednesday, May 19, 2021 5:45 PM To: Clay McDonald Cc: user@spark.apache.org Subject: Re: PySpark Write File Container exited with a non-zero exit code 143 Hi Clay, Those parameters you are passing are not valid pyspark --conf qu

Re: PySpark Write File Container exited with a non-zero exit code 143

2021-05-19 Thread ayan guha
Hi -- Notice the additional "y" in "executory-memory" (highlighted in red in the original message, as Mich mentioned): pyspark --conf queue=default --conf executory-memory=24G On Thu, May 20, 2021 at 12:02 PM Clay McDonald < stuart.mcdon...@bateswhite.com> wrote: > How so? > > > > *From:* Mich Talebzadeh > *Sent:* Wednesday, May 19, 2021 5:45 PM > *To:*
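For completeness, a corrected invocation as the replies imply might look like one of the lines below. This assumes a YARN deployment; spark.yarn.queue and the --executor-memory flag are the standard spellings of these settings, not text quoted from the thread.

```shell
# "executory-memory" is not a valid setting, and "queue" needs its full
# property name when passed via --conf:
pyspark --conf spark.yarn.queue=default --conf spark.executor.memory=24g

# or, equivalently, using the dedicated spark-submit flags:
pyspark --queue default --executor-memory 24G
```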