Hi -- Notice the additional "y" in red (as Mich mentioned)
pyspark --conf queue=default --conf executory-memory=24G
On Thu, May 20, 2021 at 12:02 PM Clay McDonald <stuart.mcdon...@bateswhite.com> wrote:
> How so?
>
>
>
> *From:* Mich Talebzadeh
> *Sent:* Wednesday, May 19, 2021 5:45 PM
> *To:*
How so?
From: Mich Talebzadeh
Sent: Wednesday, May 19, 2021 5:45 PM
To: Clay McDonald
Cc: user@spark.apache.org
Subject: Re: PySpark Write File Container exited with a non-zero exit code 143
Hi Clay,
Those parameters you are passing are not valid
pyspark --conf queue=default --conf executory-memory=24G
Hi Clay,
Those parameters you are passing are not valid
pyspark --conf queue=default --conf executory-memory=24G
Python 3.7.3 (default, Apr 3 2021, 20:42:31)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Warning: Ignoring
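For reference, pyspark only accepts --conf keys that start with "spark." and warns about and drops anything else, which is why both queue and executory-memory are ignored here. A minimal sketch of the intended launch, assuming YARN (the 24G figure is simply the value from the thread):

pyspark --master yarn \
  --conf spark.yarn.queue=default \
  --conf spark.executor.memory=24g

The shorthand flags --queue default and --executor-memory 24G achieve the same thing.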
Hello all,
I'm hoping someone can give me some direction for troubleshooting this issue.
I'm trying to write from Spark on a Hortonworks (Cloudera) HDP cluster. I ssh
directly to the first datanode and run PySpark with the following command;
however, it is always failing no matter what size I s
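Exit code 143 is 128 + 15, i.e. the container received SIGTERM, which on YARN most often means the node manager killed it, commonly for exceeding its memory allocation. A hypothetical launch that makes the memory-related settings explicit (the sizes are placeholders, not tuned recommendations):

pyspark --master yarn \
  --queue default \
  --executor-memory 8g \
  --conf spark.executor.memoryOverhead=2g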
Thanks Sean. You are right! Yes, it works when replacing the bintray repo with
repos.spark-packages.org.
On Wednesday, May 19, 2021, 02:03:14 PM EDT, Sean Owen wrote:
I think it's because the bintray repo has gone away. Did you see the recent
email about the new repo for these packages?
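Concretely, that means pointing the resolver at the new host when pulling a spark-packages artifact; a sketch with graphframes (the exact version string here is only an example):

pyspark --repositories https://repos.spark-packages.org \
  --packages graphframes:graphframes:0.8.1-spark3.0-s_2.12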
I think it's because the bintray repo has gone away. Did you see the recent
email about the new repo for these packages?
On Wed, May 19, 2021 at 12:42 PM Wensheng Deng wrote:
> Hi experts:
>
> I tried the example as shown on this page, and it is not working for me:
> https://spark-packages.org/package/graphframes/graphframes
Hi experts:
I tried the example as shown on this page, and it is not working for me:
https://spark-packages.org/package/graphframes/graphframes
Please advise how to proceed. I also tried to unzip the zip file, ran 'sbt
assembly', and got an error of 'sbt-spark-package;0.2.6: not found'. Is there
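Once the package resolves (see the --repositories sketch above), a minimal GraphFrames smoke test in the pyspark shell looks roughly like this; spark is the session the shell provides and the tiny graph is made up for illustration:

from graphframes import GraphFrame

v = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
e = spark.createDataFrame([("a", "b", "follows")], ["src", "dst", "relationship"])
g = GraphFrame(v, e)
g.inDegrees.show()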
Hello,
I have the following initial test setup:
Kubernetes cluster 1.20 (4 nodes, each node with 120 GB hard disk, 4 CPUs, 40 GB memory)
Spark installation via the Bitnami Helm chart
https://artifacthub.io/packages/helm/bitnami/spark (Chart Version 5.4.2 / Spark
3.1.1)
using GeoSpark versi
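For context, installing that chart usually comes down to something like the following; the release name my-spark is arbitrary, and the chart version is the one quoted in the message:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-spark bitnami/spark --version 5.4.2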
Hi Kushagra
I still think this is a bad idea. By definition, data in a DataFrame or RDD
is unordered; you are imposing an order where there is none, and if it
works it will be by chance. For example, a simple repartition may disrupt
the row ordering. It is just too unpredictable.
I would suggest y
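If the goal is only to tag rows with an index reflecting whatever order the data currently happens to have, one commonly used alternative (not necessarily what the author goes on to suggest) is RDD.zipWithIndex, which numbers rows partition by partition without shuffling everything onto a single partition. A rough sketch, assuming a DataFrame df1:

from pyspark.sql import Row

# The index is only as meaningful as the incoming order of df1.
indexed = (df1.rdd
              .zipWithIndex()
              .map(lambda pair: Row(row_num=pair[1] + 1, **pair[0].asDict()))
              .toDF())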
That generation of row_number() has to be performed through a window call
and I don't think there is any way around it without orderBy()
df1 = df1.select(F.row_number().over(Window.partitionBy().orderBy(df1['amount_6m'])).alias("row_num"), "amount_6m")
The problem is that without partitionBy() cla
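Spelled out with the imports that snippet assumes (column names follow the thread; a sketch rather than the poster's full code):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

w = Window.partitionBy().orderBy(df1['amount_6m'])
df1 = df1.select(F.row_number().over(w).alias("row_num"), "amount_6m")

Because the window defines no partitioning columns, every row is moved to a single partition, which is exactly what the WindowExec warning quoted in the next message refers to.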
Hi Kushagra,
I believe you are referring to this warning below
WARN window.WindowExec: No Partition Defined for Window operation! Moving
all data to a single partition, this can cause serious performance
degradation.
I don't know an easy way around it. If the operation is only once you may
be