Re: Naming files while saving a Dataframe

2021-08-12 Thread Eric Beabes
This doesn't work as given here (https://stackoverflow.com/questions/36107581/change-output-filename-prefix-for-dataframe-write), but the answer suggests using the FileOutputFormat class. Will try that. Thanks. Regards. On Sun, Jul 18, 2021 at 12:44 AM Jörn Franke wrote: > Spark heavily depends on H
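For context: Spark always emits `part-*` file names regardless of writer options, so a common alternative to the FileOutputFormat route is simply renaming the files after the write completes. A minimal sketch for output written to a local filesystem (function name and prefix are hypothetical; the same idea applies on HDFS/S3 via the Hadoop FileSystem API):

```python
# Post-write rename workaround: Spark names its output files part-*, so
# rename them once df.write has finished. Local-filesystem sketch only.
import glob
import os

def rename_parts(out_dir: str, prefix: str) -> list:
    """Rename part-* files under out_dir to <prefix>-<n>, keeping extensions."""
    renamed = []
    for i, path in enumerate(sorted(glob.glob(os.path.join(out_dir, "part-*")))):
        base = os.path.basename(path)
        # Preserve any extension such as .csv or .snappy.parquet
        ext = base[base.index("."):] if "." in base else ""
        new_path = os.path.join(out_dir, f"{prefix}-{i:05d}{ext}")
        os.rename(path, new_path)
        renamed.append(new_path)
    return renamed
```

Note this leaves marker files such as `_SUCCESS` untouched, and it is not atomic: downstream readers should not scan the directory while the renames run.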

Replacing BroadcastNestedLoopJoin

2021-08-12 Thread Eric Beabes
We’ve two datasets that look like this: Dataset A: App specific data that contains (among other fields): ip_address Dataset B: Location data that contains start_ip_address_int, end_ip_address_int, latitude, longitude We’re (left) joining these two datasets as: A.ip_address >= B.start_ip_address
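For reference, the usual way to avoid a BroadcastNestedLoopJoin on a range predicate like this is to manufacture an equi-join key by bucketing the IP-integer space: each location row is duplicated into every bucket its [start, end] range overlaps (in PySpark, e.g. via `F.explode(F.sequence(...))`), and the join becomes an equality on the bucket id plus a range filter. A plain-Python sketch of the bucketing arithmetic (bucket width and names are illustrative):

```python
# Turn the non-equi range join (A.ip >= B.start AND A.ip <= B.end) into an
# equi-join on a bucket id, so Spark can choose a hash or sort-merge join
# instead of BroadcastNestedLoopJoin. Bucket width is an illustrative choice.
BUCKET = 2 ** 16  # width of one bucket in IP-integer space

def bucket_of(ip_int: int) -> int:
    """Bucket id for a single IP (side A of the join)."""
    return ip_int // BUCKET

def buckets_for_range(start_ip: int, end_ip: int) -> list:
    """All bucket ids a [start, end] range overlaps (side B gets one
    duplicated row per id, e.g. via F.explode(F.sequence(...)))."""
    return list(range(start_ip // BUCKET, end_ip // BUCKET + 1))
```

After joining on the bucket id, re-apply `start_ip <= ip <= end_ip` as a filter, since a bucket can contain range rows that do not actually match the IP.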

Re: K8S submit client vs. cluster

2021-08-12 Thread Mich Talebzadeh
OK, Amazon is not much different from Google Kubernetes Engine (GKE). When you submit a job, you need a powerful compute server to submit it from. It is another host; you cannot submit from the K8s cluster nodes themselves (I am not aware if one can actually do that). Anyway, you submit something lik
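For context, a cluster-mode submission from such an external host to a K8s API server looks roughly like this (API server host, namespace, image, service account and application path are placeholders):

```shell
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:443 \
  --deploy-mode cluster \
  --name my-app \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.container.image=<registry>/spark-py:3.1.2 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.executor.instances=2 \
  local:///opt/spark/work-dir/my_app.py
```

The `local://` scheme tells Spark the application file is already inside the container image rather than on the submitting host.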

RE: K8S submit client vs. cluster

2021-08-12 Thread Bode, Meikel, NMA-CFD
On EKS... From: Mich Talebzadeh Sent: Thursday, 12 August 2021 15:47 To: Bode, Meikel, NMA-CFD Cc: user@spark.apache.org Subject: Re: K8S submit client vs. cluster Ok As I see it with PySpark even if it is submitted as cluster, it will be converted to client mode anyway Are you running t

Re: [EXTERNAL] [Marketing Mail] Reading SPARK 3.1.x generated parquet in SPARK 2.4.x

2021-08-12 Thread Gourav Sengupta
Hi Saurabh, a very big note of thanks from Gourav :) Regards, Gourav Sengupta On Thu, Aug 12, 2021 at 4:16 PM Saurabh Gulati wrote: > We had issues with this migration mainly because of changes in spark date > calendars. See >

Re: [EXTERNAL] [Marketing Mail] Reading SPARK 3.1.x generated parquet in SPARK 2.4.x

2021-08-12 Thread Saurabh Gulati
We had issues with this migration mainly because of changes in spark date calendars. See We got this working by setting the below params: ("spark.sql.legacy.parquet.datetimeReba
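The truncated setting above is left as written; for reference, the calendar-rebase parameters Spark 3.x documents for this Julian/Proleptic-Gregorian change are the following (values: LEGACY, CORRECTED or EXCEPTION; the int96 variants exist from Spark 3.1):

```shell
# LEGACY rebases dates/timestamps on write to the hybrid Julian calendar,
# so Spark 2.4.x readers interpret old-calendar values correctly.
--conf spark.sql.legacy.parquet.datetimeRebaseModeInWrite=LEGACY
--conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=LEGACY
--conf spark.sql.legacy.parquet.int96RebaseModeInWrite=LEGACY
--conf spark.sql.legacy.parquet.int96RebaseModeInRead=LEGACY
```

These only matter for dates/timestamps before the Gregorian cutover (and very old timestamps); modern values are unaffected either way.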

Re: K8S submit client vs. cluster

2021-08-12 Thread Mich Talebzadeh
Ok. As I see it, with PySpark even if it is submitted as cluster mode, it will be converted to client mode anyway. Are you running this on AWS or GCP?

RE: K8S submit client vs. cluster

2021-08-12 Thread Bode, Meikel, NMA-CFD
Hi Mich, All PySpark. Best, Meikel From: Mich Talebzadeh Sent: Thursday, 12 August 2021 13:41 To: Bode, Meikel, NMA-CFD Cc: user@spark.apache.org Subject: Re: K8S submit client vs. cluster Is this Spark or PySpark?

Re: K8S submit client vs. cluster

2021-08-12 Thread Mich Talebzadeh
Is this Spark or PySpark?

K8S submit client vs. cluster

2021-08-12 Thread Bode, Meikel, NMA-CFD
Hi all, If we schedule a Spark job on K8s, how are volume mappings handled? In client mode I would expect that the driver's volumes have to be mapped manually in the pod template. Executor volumes are attached dynamically based on submit parameters. Right...? In cluster mode I would expect that volumes
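For reference, Spark's documented `spark.kubernetes.{driver,executor}.volumes.*` submit parameters attach volumes to the pods Spark itself creates (the executors always, and the driver in cluster mode); a client-mode driver running in your own pod is described by your own pod spec, as the question suggests. A sketch (the volume name `data` and claim name are placeholders):

```shell
# Mount a PersistentVolumeClaim into driver and executors (cluster mode).
# The volume type segment can be hostPath, emptyDir, nfs or
# persistentVolumeClaim; the volume name ("data" here) is arbitrary.
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.path=/data \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.options.claimName=my-pvc \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path=/data \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName=my-pvc
```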