Help requested: Spark security triage and followup

2025-05-09 Thread Apache Security Team
Dear Spark users and developers, As you know, the Apache Software Foundation takes our users' security seriously, and defines sensible release and security processes to make sure potential security issues are dealt with responsibly. These indirectly also protect our committers, shielding individua

Re: Help choose a GraphFrames logo

2025-01-31 Thread Mich Talebzadeh
; *Cc:* Ángel ; Russell Jurney <rjur...@graphlet.ai>; Denny Lee ; user <user@spark.apache.org>; graphfra...@googlegroups.com <graphfra...@googlegroups.com> > *Subject:* Re: Help choose a GraphFrames logo > > I went with this one, it is the most original, although

Re: Help choose a GraphFrames logo

2025-01-18 Thread Matei Zaharia
It looks great to me! > On Jan 17, 2025, at 8:56 PM, Felix Cheung wrote: > > Nice > From: Russell Jurney > Sent: Friday, January 17, 2025 2:46:14 PM > To: Mich Talebzadeh > Cc: Ángel ; Russell Jurney > ; Denny Lee ; user > ; graphfra...@googlegroups.com >

Re: Help choose a GraphFrames logo

2025-01-15 Thread Denny Lee
Thanks Russell, just wanted to give a shout out that this is really cool :) On Wed, Jan 15, 2025 at 1:13 AM Russell Jurney wrote: > GraphFrames needs a logo, so I created a 99designs contest to create one. > There are six finalists. Please vote for the one you like the most :) > > https://99desi

Help choose a GraphFrames logo

2025-01-15 Thread Russell Jurney
GraphFrames needs a logo, so I created a 99designs contest to create one. There are six finalists. Please vote for the one you like the most :) https://99designs.com/contests/poll/c00e5edaf5 Thanks, Russell Jurney | rjur...@graphlet.ai | graphlet.ai | Graphlet AI Blog

Help needed in Spark Job.

2024-10-24 Thread Khushal Manish
Hi Spark Users, I am facing an issue while running a Spark job in AWS Glue. The whole case is on this link <https://stackoverflow.com/questions/79099329/hive-partition-schema-mismatch-there-is-a-mismatch-between-the-table-and-partit> please help me to figure out what is the issue w

Re: Help - Learning/Understanding spark web UI

2024-09-26 Thread Daniel Aronovic
Hey Karthick, The best way to deepen your understanding is by using the Spark Web UI as much as possible while learning the fundamentals of Spark. To help ease the learning curve, I recommend trying an open-source project called *Dataflint*. It adds an extra tab to the Spark Web UI and presents

Re: Help - Learning/Understanding spark web UI

2024-09-26 Thread Ilango
Hi Karthick, I found one of the Spark Summit talks from a few years back on the Spark UI quite useful. Just search on YouTube. Let me check it out and I will share it with you if I find it again. Thanks, Elango On Thu, 26 Sep 2024 at 4:04 PM, Karthick Nk wrote: > Hi All, > I am looking to deepen my und

Help - Learning/Understanding spark web UI

2024-09-26 Thread Karthick Nk
Hi All, I am looking to deepen my understanding of the Spark Web UI. Could anyone recommend some useful materials, online courses, or share how you learned about it? I've already reviewed the official Spark Web UI documentation, but it only covers the basics. Note: I am using Azure Databricks for

Re: Need help understanding tuning docs

2024-08-14 Thread Subhasis Mukherjee
where storage will always have more priority than execution and will never be released to execution. Regards, Subhasis Mukherjee From: Sreyan Chakravarty Sent: Wednesday, August 14, 2024 9:00:45 PM To: user@spark.apache.org Subject: Need help understanding tuning

Need help understanding tuning docs

2024-08-14 Thread Sreyan Chakravarty
https://spark.apache.org/docs/latest/tuning.html#memory-management-overview What is the meaning of: "Execution may evict storage if necessary, but only until total storage memory usage falls under a certain threshold (R). In other words, R describes a subregion within M where cached blocks are ne
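
For readers puzzling over the same passage: M is the unified region sized by spark.memory.fraction and R is the storage portion of it sized by spark.memory.storageFraction. The Scala sketch below only illustrates where those two knobs are set; the values shown are the documented defaults, not advice from this thread.

    import org.apache.spark.sql.SparkSession

    object MemoryRegionsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("memory-regions-sketch")
          // M: fraction of (heap - 300 MB) shared by execution and storage (default 0.6)
          .config("spark.memory.fraction", "0.6")
          // R: fraction of M reserved for storage that execution cannot evict (default 0.5)
          .config("spark.memory.storageFraction", "0.5")
          .getOrCreate()

        // Cached blocks within R are safe from eviction by execution;
        // storage above R can be evicted when execution needs the space.
        spark.stop()
      }
    }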

Re: Help wanted on securing spark with Apache Knox / JWT

2024-07-12 Thread Adam Binford
wrote: > Hello, > I am sending this email to the mailing list, to get your help on a problem > that I can't seem to resolve myself. > > I am trying to secure Spark history ui running with Yarn as master using > Apache Knox. > > From the Knox configuration point of

Help wanted on securing spark with Apache Knox / JWT

2024-07-11 Thread Thomas Mauran
Hello, I am sending this email to the mailing list, to get your help on a problem that I can't seem to resolve myself. I am trying to secure Spark history ui running with Yarn as master using Apache Knox. From the Knox configuration point of view I managed to secure the Spark

Need help to confirm vulnerable issue

2024-07-04 Thread Will.Qin
into our firm, but due to the vulnerable issue, we can't. Could you help us confirm whether this problem will affect the above version of spark docker image and pyspark lib? We need a release note / security bulletin to confirm this. Thank you for your assistance. Regards Wil

Help Needed: Distributed Logging in Spark Application

2024-06-24 Thread Zsuzsanna D
Hi, In my Spark application, I need to log data row by row directly from the executors to avoid overwhelming the driver's memory, which is already going places. I am exploring the possibility of implementing a distributed logging strategy where each executor logs its output directly, rather than

Re: Help in understanding Exchange in Spark UI

2024-06-20 Thread Mich Talebzadeh
OK, I gave an answer in StackOverflow. Happy reading Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime PhD Imperial College London London, United Kingdom

Help in understanding Exchange in Spark UI

2024-06-20 Thread Dhruv Singla
Hey Team, I've posted a question on StackOverflow. The link is - https://stackoverflow.com/questions/78644118/understanding-exchange-in-spark-ui I haven't got any responses yet. If possible could you please look into it? If you need me to write the question in the mailing list, I can do that as we

Help needed optimize spark history server performance

2024-05-03 Thread Vikas Tharyani
rformance of our SHS and prevent these timeouts. Here are some areas we're particularly interested in exploring: - Are there additional configuration options we should consider for handling large event logs? - Could Nginx configuration adjustments help with timeouts? - Are

help needed with SPARK-45598 and SPARK-45769

2023-11-09 Thread Maksym M
Greetings, tl;dr there must have been a regression in spark *connect*'s ability to retrieve data, more details in linked issues https://issues.apache.org/jira/browse/SPARK-45598 https://issues.apache.org/jira/browse/SPARK-45769 we have projects that depend on spark connect 3.5 and we'd apprec

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-04-01 Thread Mich Talebzadeh
Apr 2023 at 13:12, Khalid Mammadov wrote: > Hey AN-TRUONG > > I have got some articles about this subject that should help. > E.g. > https://khalidmammadov.github.io/spark/spark_internals_rdd.html > > Also check other Spark Internals on web. > > Regards > Khalid >

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-04-01 Thread Khalid Mammadov
Hey AN-TRUONG I have got some articles about this subject that should help. E.g. https://khalidmammadov.github.io/spark/spark_internals_rdd.html Also check other Spark Internals on web. Regards Khalid On Fri, 31 Mar 2023, 16:29 AN-TRUONG Tran Phan, wrote: > Thank you for your informat

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread Mich Talebzadeh
Yes, history refers to completed jobs; 4040 is the running jobs. You should have screenshots for executors and stages as well. HTH Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited view my Linkedin profile

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread AN-TRUONG Tran Phan
Thank you for your information. I have tracked the Spark history server on port 18080 and the Spark UI on port 4040. I see the results of these two tools as similar, right? I want to know what each Task ID (Example Task ID 0, 1, 3, 4, 5, ) in the images does; is it possible? https://i.stack.img

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread Mich Talebzadeh
Are you familiar with the Spark GUI, by default on port 4040? Have a look. HTH Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh

Re: Kind help request

2023-03-25 Thread Sean Owen
It is telling you that the UI can't bind to any port. I presume that's because of container restrictions? If you don't want the UI at all, just set spark.ui.enabled to false On Sat, Mar 25, 2023 at 8:28 AM Lorenzo Ferrando < lorenzo.ferra...@edu.unige.it> wrote: > Dear Spark team, > > I am Lorenz
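
A minimal sketch of the workaround Sean describes, assuming you simply do not need the web UI inside the container; the same setting can also be passed to spark-submit as --conf spark.ui.enabled=false.

    import org.apache.spark.sql.SparkSession

    // Disable the web UI so Spark never attempts to bind a UI port
    // inside a restricted container environment.
    val spark = SparkSession.builder()
      .appName("no-ui-sketch")
      .config("spark.ui.enabled", "false")
      .getOrCreate()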

Kind help request

2023-03-25 Thread Lorenzo Ferrando
Dear Spark team, I am Lorenzo from the University of Genoa. I am currently using the nextflow/sarek pipeline (Ubuntu 18.04) to analyse genomic data through a Singularity container. One of the steps of the pipeline uses GATK4 and it uses Spark. However, after some time I get the following error:

Re: Help needed regarding error with 5 node Spark cluster (shuffle error)- Comcast

2023-01-30 Thread Artemis User
sure if this is the intended DL for reaching out for help. Please redirect to the right DL *From: *Jain, Sanchi *Date: *Monday, January 30, 2023 at 10:10 AM *To: *priv...@spark.apache.org *Subject: *Request for access to create a jira account- Comcast Hello there I am a principal engineer at Co

Re: Help needed regarding error with 5 node Spark cluster (shuffle error)- Comcast

2023-01-30 Thread Mich Talebzadeh
e be liable for any monetary damages arising from such loss, damage or destruction. On Mon, 30 Jan 2023 at 15:15, Jain, Sanchi wrote: > I am not sure if this is the intended DL for reaching out for help. Please > redirect to the right DL > > > > *From: *Jain, Sanchi > *Da

Help needed regarding error with 5 node Spark cluster (shuffle error)- Comcast

2023-01-30 Thread Jain, Sanchi
I am not sure if this is the intended DL for reaching out for help. Please redirect to the right DL From: Jain, Sanchi Date: Monday, January 30, 2023 at 10:10 AM To: priv...@spark.apache.org Subject: Request for access to create a jira account- Comcast Hello there I am a principal engineer at

Help with ClassNotFoundException: org.apache.spark.internal.io.cloud.PathOutputCommitProtocol

2022-12-30 Thread Meharji Arumilli
-spark-internal-io-cloud Could you kindly help to solve this. Regards Mehar

Re: Help with Shuffle Read performance

2022-09-30 Thread Igor Calabria
reports values ranging from 25s to several minutes(the task sizes are > really close, they aren't skewed). I've tried increasing > "spark.reducer.maxSizeInFlight" and > "spark.shuffle.io.numConnectionsPerPeer" and it did improve performance by > a little, but n

Re: Help with Shuffle Read performance

2022-09-30 Thread Artemis User
ve tried increasing "spark.reducer.maxSizeInFlight" and "spark.shuffle.io.numConnectionsPerPeer" and it did improve performance by a little, but not enough to saturate the cluster resources. Did I miss some more tuning parameters that could help? One obvious thing would be to

Re: Help with Shuffle Read performance

2022-09-30 Thread Leszek Reimus
>>>>> sized shuffle of almost 4TB. The relevant cluster config is as follows: >>>>> >>>>> - 30 Executors. 16 physical cores, configured with 32 Cores for spark >>>>> - 128 GB RAM >>>>> - shuffle.partitions is 18k which gives me tasks of a

Re: Help with Shuffle Read performance

2022-09-29 Thread Sungwoo Park
ring the map(reading data from s3 and >>>> writing the shuffle data) CPU usage, disk throughput and network usage is >>>> as expected, but during the reduce phase it gets really low. It seems the >>>> main bottleneck is reading shuffle data from oth

Re: Help with Shuffle Read performance

2022-09-29 Thread Gourav Sengupta
>>> >>>> The job runs fine but I'm bothered by how underutilized the cluster >>>> gets during the reduce phase. During the map(reading data from s3 and >>>> writing the shuffle data) CPU usage, disk throughput and network usage is >>>> as expected, but dur

Re: Help with Shuffle Read performance

2022-09-29 Thread Leszek Reimus
ected, but during the reduce phase it gets really low. It seems the main >>> bottleneck is reading shuffle data from other nodes, task statistics >>> reports values ranging from 25s to several minutes(the task sizes are >>> really close, they aren't skewed). I've

Re: Help with Shuffle Read performance

2022-09-29 Thread Gourav Sengupta
is reading shuffle data from other nodes, task statistics >> reports values ranging from 25s to several minutes(the task sizes are >> really close, they aren't skewed). I've tried increasing >> "spark.reducer.maxSizeInFlight" and >> "spark.shu

Re: Help with Shuffle Read performance

2022-09-29 Thread Igor Calabria
task statistics >> reports values ranging from 25s to several minutes(the task sizes are >> really close, they aren't skewed). I've tried increasing >> "spark.reducer.maxSizeInFlight" and >> "spark.shuffle.io.numConnectionsPerPeer" and it did improve performance by >> a li

Re: Help with Shuffle Read performance

2022-09-29 Thread Vladimir Prus
rformance by > a little, but not enough to saturate the cluster resources. > > Did I miss some more tuning parameters that could help? > One obvious thing would be to vertically increase the machines and use > less nodes to minimize traffic, but 30 nodes doesn't seem like much even > considering 30x30 connections. > > Thanks in advance! > > -- Vladimir Prus http://vladimirprus.com

Re: Help with Shuffle Read performance

2022-09-29 Thread Tufan Rakshit
That's Total Nonsense, EMR is total crap, use Kubernetes, I will help you. Can you please provide what's the size of the shuffle file that is getting generated in each task? What's the total number of Partitions that you have? What machines are you using? Are you using an SSD?

Re: Help with Shuffle Read performance

2022-09-29 Thread Gourav Sengupta
zes are > really close, they aren't skewed). I've tried increasing > "spark.reducer.maxSizeInFlight" and > "spark.shuffle.io.numConnectionsPerPeer" and it did improve performance by > a little, but not enough to saturate the cluster resources. > > Did I mi

Help with Shuffle Read performance

2022-09-29 Thread Igor Calabria
nectionsPerPeer" and it did improve performance by a little, but not enough to saturate the cluster resources. Did I miss some more tuning parameters that could help? One obvious thing would be to vertically increase the machines and use less nodes to minimize traffic, but 30 nodes doesn't seem

HELP, Populating an empty pyspark dataframe with auto-generated dates

2022-09-22 Thread Jamie Arodi
I need help populating an empty dataframe in PySpark with auto-generated dates in a column in the format yyyy-mm-dd, from 1900-01-01 to 2030-12-31. Kindly help.

Re: Need help with the configuration for AWS glue jobs

2022-06-23 Thread Sid
23:44 Gourav Sengupta, wrote: > Please use EMR, Glue is not made for heavy processing jobs. > > On Thu, Jun 23, 2022 at 6:36 AM Sid wrote: > >> Hi Team, >> >> Could anyone help me in the below problem: >> >> >> https://stackoverflow.com/questions/7

Re: Need help with the configuration for AWS glue jobs

2022-06-23 Thread Gourav Sengupta
Please use EMR, Glue is not made for heavy processing jobs. On Thu, Jun 23, 2022 at 6:36 AM Sid wrote: > Hi Team, > > Could anyone help me in the below problem: > > > https://stackoverflow.com/questions/72724999/how-to-calculate-number-of-g-1-workers-in-aws-glue-for-p

Need help with the configuration for AWS glue jobs

2022-06-22 Thread Sid
Hi Team, Could anyone help me in the below problem: https://stackoverflow.com/questions/72724999/how-to-calculate-number-of-g-1-workers-in-aws-glue-for-processing-1tb-data Thanks, Sid

Structured streaming help on releasing memory

2022-05-09 Thread Xavi Gervilla
nd mode but the memory consumption is very similar. Is there something wrong with the declaration of the window/watermark? What could be causing the data to keep accumulating even after the 10 minute watermark and after the batch is processed? If there's any additional information you might need or think might be helpful to better understand the problem, I'll be happy to provide it. You all have been able to help in the past, so thank you in advance.

Need help on migrating Spark on Hortonworks to Kubernetes Cluster

2022-05-08 Thread Chetan Khatri
Hi Everyone, I need help with my Airflow DAG, which has a Spark submit, and now I have a Kubernetes cluster instead of the Hortonworks Linux distributed Spark cluster. My existing spark-submit is through BashOperator as below: calculation1 = '/usr/hdp/2.6.5.0-292/spark2/bin/spark-submit -

Unusual bug,please help me,i can do nothing!!!

2022-03-30 Thread spark User
"Failed to initialize Spark session.org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@x.168.137.41:49963". When I try to add "x.168.137.41" in 'etc/hosts' it works fine, then use "ctrl+c" again. The result is that it cannot start normally. Please help me

error bug,please help me!!!

2022-03-20 Thread spark User
"Failed to initialize Spark session.org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@x.168.137.41:49963". When I try to add "x.168.137.41" in 'etc/hosts' it works fine, then use "ctrl+c" again. The result is that it cannot start normally. Please help me

Re: Help With unstructured text file with spark scala

2022-02-25 Thread Danilo Sousa
AL ...| 65751353| Jose Silva| >>> |58693 - NACIONAL ...| 65751388| Joana Silva| >>> |58693 - NACIONAL ...| 65751353| Felipe Silva| >>> |58693 - NACIONAL ...| 65751388| Julia Silva| >>> ++-

Re: Help With unstructured text file with spark scala

2022-02-21 Thread Danilo Sousa
1388| Julia Silva| >> ++---+-+ >> >> >> cat csv_file: >> >> Plano#Código Beneficiário#Nome Beneficiário >> 58693 - NACIONAL R COPART PJCE#065751353#Jose Silva >> 58693 - NACIONAL R COPART PJ

Re: Help With unstructured text file with spark scala

2022-02-13 Thread Rafael Mendes
Jose Silva| >> |58693 - NACIONAL ...| 65751388| Joana Silva| >> |58693 - NACIONAL ...| 65751353| Felipe Silva| >> |58693 - NACIONAL ...| 65751388| Julia Silva| >> ++---+-+

Re: Help With unstructured text file with spark scala

2022-02-09 Thread Bitfox
> Plano#Código Beneficiário#Nome Beneficiário > 58693 - NACIONAL R COPART PJCE#065751353#Jose Silva > 58693 - NACIONAL R COPART PJCE#065751388#Joana Silva > 58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva > > 58693 - NACIONAL R COPART PJCE#065751388#Julia Silva >

Re: Help With unstructured text file with spark scala

2022-02-09 Thread Danilo Sousa
open attachments unless you can confirm the sender and know > the content is safe. > > > >Hi >I have to transform unstructured text to dataframe. >Could anyone please help with Scala code ? > >Dataframe need as: > >operadora filial un

Re: Help With unstructured text file with spark scala

2022-02-09 Thread Danilo Sousa
#065751353#Jose Silva > 58693 - NACIONAL R COPART PJCE#065751388#Joana Silva > 58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva > 58693 - NACIONAL R COPART PJCE#065751388#Julia Silva > > > Regards > > > On Wed, Feb 9, 2022 at 12:50 AM Danilo Sousa <mail

Re: Help With unstructured text file with spark scala

2022-02-08 Thread Bitfox
va 58693 - NACIONAL R COPART PJCE#065751388#Joana Silva 58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva 58693 - NACIONAL R COPART PJCE#065751388#Julia Silva Regards On Wed, Feb 9, 2022 at 12:50 AM Danilo Sousa wrote: > Hi > I have to transform unstructured text to dataframe. >

Re: Help With unstructured text file with spark scala

2022-02-08 Thread Lalwani, Jayesh
, 11:50 AM, "Danilo Sousa" wrote: CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. Hi I have to transform unstructured text to dataframe. Could any

Help With unstructured text file with spark scala

2022-02-08 Thread Danilo Sousa
Hi, I have to transform unstructured text to a dataframe. Could anyone please help with Scala code? The dataframe needs to be: operadora filial unidade contrato empresa plano codigo_beneficiario nome_beneficiario Relação de Beneficiários Ativos e Excluídos Carteira em#27/12/2019##Todos os Beneficiários

Re: help check my simple job

2022-02-06 Thread capitnfrakass
That did resolve my issue. Thanks a lot. frakass On 06/02/2022 17:25, Hannes Bibel wrote: Hi, looks like you're packaging your application for Scala 2.13 (should be specified in your build.sbt) while your Spark installation is built for Scala 2.12. Go to https://spark.apache.org/downloads.htm

Re: help check my simple job

2022-02-06 Thread Hannes Bibel
Hi, looks like you're packaging your application for Scala 2.13 (should be specified in your build.sbt) while your Spark installation is built for Scala 2.12. Go to https://spark.apache.org/downloads.html, select under "Choose a package type" the package type that says "Scala 2.13". With that rel
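
A hypothetical build.sbt sketch of the fix Hannes describes: pin scalaVersion to the Scala line your Spark download was built for. The Spark and Scala patch versions shown are illustrative, not taken from the thread.

    // build.sbt (sketch)
    ThisBuild / scalaVersion := "2.12.15" // match the "Scala 2.12" Spark package

    libraryDependencies ++= Seq(
      // %% appends the Scala binary version, so this resolves spark-sql_2.12;
      // "provided" keeps the cluster's own Spark jars in charge at runtime.
      "org.apache.spark" %% "spark-sql" % "3.2.1" % "provided"
    )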

help check my simple job

2022-02-06 Thread capitnfrakass
Hello I wrote this simple job in scala: $ cat Myjob.scala import org.apache.spark.sql.SparkSession object Myjob { def main(args: Array[String]): Unit = { val sparkSession = SparkSession.builder.appName("Simple Application").getOrCreate() val sparkContext = sparkSession.sparkContext

Re: About some Spark technical help

2021-12-24 Thread sam smith
>> Hi Sam >>>>> >>>>> >>>>> >>>>> Can you tell us more? What is the algorithm? Can you send us the URL >>>>> the publication >>>>> >>>>> >>>>> >>>>> Kind regards

Re: About some Spark technical help

2021-12-24 Thread Andrew Davidson
>>> >>>> Hi Sam >>>> >>>> >>>> >>>> Can you tell us more? What is the algorithm? Can you send us the URL >>>> the publication >>>> >>>> >>>> >>>> Kind regards >>>> >>>

Re: About some Spark technical help

2021-12-24 Thread sam smith
>>> >>> >>> >>> Kind regards >>> >>> >>> >>> Andy >>> >>> >>> >>> *From: *sam smith >>> *Date: *Wednesday, December 22, 2021 at 10:59 AM >>> *To: *"user@spark.apache.org" >

Re: About some Spark technical help

2021-12-24 Thread Gourav Sengupta
>> >> >> Andy >> >> >> *From: *sam smith >> *Date: *Wednesday, December 22, 2021 at 10:59 AM >> *To: *"user@spark.apache.org" >> *Subject: *About some Spark technical help >> >> >> Hello guys, >> >

Re: About some Spark technical help

2021-12-23 Thread sam smith
you send us the URL the > publication > > > > Kind regards > > > > Andy > > > > *From: *sam smith > *Date: *Wednesday, December 22, 2021 at 10:59 AM > *To: *"user@spark.apache.org" > *Subject: *About some Spark technical help > > >

Re: About some Spark technical help

2021-12-23 Thread Andrew Davidson
Hi Sam, Can you tell us more? What is the algorithm? Can you send us the URL of the publication? Kind regards, Andy From: sam smith Date: Wednesday, December 22, 2021 at 10:59 AM To: "user@spark.apache.org" Subject: About some Spark technical help Hello guys, I am replicating

dataset partitioning algorithm implementation help

2021-12-23 Thread sam smith
Hello All, I am replicating a paper's algorithm about a partitioning approach to anonymize datasets with Spark / Java, and want to ask you for some help to review my 150 lines of code. My github repo, attached below, contains both my java class and the related paper: https://githu

About some Spark technical help

2021-12-22 Thread sam smith
Hello guys, I am replicating a paper's algorithm in Spark / Java, and want to ask you guys for some assistance to validate / review about 150 lines of code. My GitHub repo contains both my Java class and the related paper. Any interested reviewer here? Thanks.

Spark usage help

2021-09-01 Thread yinghua...@163.com
Hi: I found that the following methods are used when setting parameters to create a SparkSession to access a Hive table. 1) hive.execution.engine:spark spark = SparkSession.builder() .appName("get data from hive") .config("hive.execution.engine", "spark") .enableHiveSupport() .getOrCreate()

Re: Spark AQE Post-Shuffle partitions coalesce don't work as expected, and even make data skew in some partitions. Need help to debug issue.

2021-07-06 Thread Mich Talebzadeh
email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Sun, 4 Jul 2021 at 14:13, Nick Grigoriev wrote: >

Re: Spark AQE Post-Shuffle partitions coalesce don't work as expected, and even make data skew in some partitions. Need help to debug issue.

2021-07-05 Thread Nick Grigoriev
n Sun, 4 Jul 2021 at 14:13, Nick Grigoriev <mailto:grigo...@gmail.com>> wrote: > I have ask this question on stack overflow, but it look to complex for Q/A > resource. > https://stackoverflow.com/questions/68236323/spark-aqe-post-shuffle-partitions-coalesce-dont-work-as-expected

Re: Spark AQE Post-Shuffle partitions coalesce don't work as expected, and even make data skew in some partitions. Need help to debug issue.

2021-07-04 Thread Mich Talebzadeh
tps://stackoverflow.com/questions/68236323/spark-aqe-post-shuffle-partitions-coalesce-dont-work-as-expected-and-even-make > So I want ask for help here. > > I use global sort on my spark DF, and when I enable AQE and post-shuffle > coalesce, my partitions after sort operatio

Spark AQE Post-Shuffle partitions coalesce don't work as expected, and even make data skew in some partitions. Need help to debug issue.

2021-07-04 Thread Nick Grigoriev
I have asked this question on Stack Overflow, but it looks too complex for a Q/A resource. https://stackoverflow.com/questions/68236323/spark-aqe-post-shuffle-partitions-coalesce-dont-work-as-expected-and-even-make So I want to ask for help here. I use a global sort on my Spark DF, and when I enable AQE and post-shuffle coalesce, my partitions after sort operatio

Need help to create database and integration woth Spark App in local machine

2021-06-12 Thread Himanshu Soni
Hi Team, Could you please help with the below: 1. I want to create a database (Oracle) with some tables on my local machine. 2. Integrate the database tables so I can query them from a Spark app on my local machine. Thanks & Regards- Himanshu Soni Mobile: +91 8411000279

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-15 Thread Mich Talebzadeh
This is an interesting one. I have never tried to add --files ... spark-submit --master yarn --deploy-mode client --files /etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml Rather, under $SPARK_HOME/conf, I create soft links to the needed XML files as belo

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-15 Thread KhajaAsmath Mohammed
Thanks everyone. I was able to resolve this. Here is what I did: just passed the conf file using the --files option. The mistake I made was reading the JSON conf file before creating the Spark session; reading it after creating the Spark session fixed it. Thanks once again for your valuable suggestions Tha
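
A sketch of the pattern described above, assuming the file was shipped with spark-submit --files /appl/common/ftp/conf.json (the path quoted later in the thread): build the session first, then resolve the distributed copy through SparkFiles. The thread itself is PySpark, where pyspark.SparkFiles.get offers the same call; the sketch below is Scala, but the idea is identical.

    import org.apache.spark.SparkFiles
    import org.apache.spark.sql.SparkSession
    import scala.io.Source

    // Create the session first; only then is the --files copy resolvable.
    val spark = SparkSession.builder()
      .appName("read-conf-after-session")
      .getOrCreate()

    // SparkFiles.get returns the local path of the shipped conf.json.
    val confPath = SparkFiles.get("conf.json")
    val confJson = Source.fromFile(confPath).mkString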

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-15 Thread Sean Owen
If code running on the executors needs some local file like a config file, then it does have to be passed this way. That much is normal. On Sat, May 15, 2021 at 1:41 AM Gourav Sengupta wrote: > Hi, > > once again let's start with the requirement. Why are you trying to pass xml > and json files to

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread Gourav Sengupta
at to all executors. >>>>> >>>>> >>>>> On Fri, May 14, 2021 at 5:01 PM Longjiang.Yang < >>>>> longjiang.y...@target.com> wrote: >>>>> >>>>>> Could you check whether this file is accessible in executors? (is it >>>>>> in HDFS or in the client local FS) >>>>>> /appl/common/ftp/conf.json >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> *From: *KhajaAsmath Mohammed >>>>>> *Date: *Friday, May 14, 2021 at 4:50 PM >>>>>> *To: *"user @spark" >>>>>> *Subject: *[EXTERNAL] Urgent Help - Py Spark submit error >>>>>> >>>>>> >>>>>> >>>>>> /appl/common/ftp/conf.json >>>>>> >>>>>

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread Amit Joshi
t;>>> >>>>> Could you check whether this file is accessible in executors? (is it >>>>> in HDFS or in the client local FS) >>>>> /appl/common/ftp/conf.json >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> *From: *KhajaAsmath Mohammed >>>>> *Date: *Friday, May 14, 2021 at 4:50 PM >>>>> *To: *"user @spark" >>>>> *Subject: *[EXTERNAL] Urgent Help - Py Spark submit error >>>>> >>>>> >>>>> >>>>> /appl/common/ftp/conf.json >>>>> >>>>

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread KhajaAsmath Mohammed
>>> >>> On Fri, May 14, 2021 at 5:01 PM Longjiang.Yang <longjiang.y...@target.com> wrote: > >>>> Could you check whether this file is accessible in executors? (is it in >>>> HDFS or in the client local FS) >>>> /appl/

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread KhajaAsmath Mohammed
>> >>> >>> >>> >>> >>> *From: *KhajaAsmath Mohammed >>> *Date: *Friday, May 14, 2021 at 4:50 PM >>> *To: *"user @spark" >>> *Subject: *[EXTERNAL] Urgent Help - Py Spark submit error >>> >>> >>> >>> /appl/common/ftp/conf.json >>> >>

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread KhajaAsmath Mohammed
>> >> >> >> *From: *KhajaAsmath Mohammed >> *Date: *Friday, May 14, 2021 at 4:50 PM >> *To: *"user @spark" >> *Subject: *[EXTERNAL] Urgent Help - Py Spark submit error >> >> >> >> /appl/common/ftp/conf.json >> >

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread KhajaAsmath Mohammed
nt local FS) > /appl/common/ftp/conf.json > > > > > > *From: *KhajaAsmath Mohammed > *Date: *Friday, May 14, 2021 at 4:50 PM > *To: *"user @spark" > *Subject: *[EXTERNAL] Urgent Help - Py Spark submit error > > > > /appl/common/ftp/conf.json >

Urgent Help - Py Spark submit error

2021-05-14 Thread KhajaAsmath Mohammed
Hi, I am having a weird situation where the below command works when the deploy mode is a client and fails if it is a cluster. spark-submit --master yarn --deploy-mode client --files /etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml --driver-memory 70g --n

Re: Installation Error - Please Help!

2021-05-11 Thread Sean Owen
or > "'spark-shell' is not recognized as an internal or external command,operable > program or batch file." > I am sharing the screenshots of my environment variables. Please help me. > I am stuck now. > > I am looking forward to hearing from you > Thanks &

Installation Error - Please Help!

2021-05-11 Thread Talha Javed
va HotSpot(TM) 64-Bit Server VM (build 25.291-b10, mixed mode) WHEN I ENTER THE COMMAND spark-shell in cmd it gives me this error "'spark-shell' is not recognized as an internal or external command, operable program or batch file." I am sharing the screenshots of my environment v

Need help on Calling Pyspark code using Wheel

2020-10-23 Thread Sachit Murarka
Hi Users, I have created a wheel file using Poetry. I tried running the following commands to run a Spark job using the wheel, but it is not working. Can anyone please let me know about the invocation step for the wheel file? spark-submit --py-files /path/to/wheel spark-submit --files /path/to/wheel

Re: Spark : Very simple query failing [Needed help please]

2020-09-26 Thread Gourav Sengupta
Hi How did you set up your environment? And can you print the schema of your table as well? It looks like you are using hive tables? Regards Gourav On Fri, 18 Sep 2020, 14:11 Debabrata Ghosh, wrote: > Hi, > I needed some help from you on the attached Spark problem > ple

Spark : Very simple query failing [Needed help please]

2020-09-18 Thread Debabrata Ghosh
Hi, I needed some help from you on the attached Spark problem please. I am running the following query: >>> df_location = spark.sql("""select dt from ql_raw_zone.ext_ql_location where ( lat between 41.67 and 45.82) and (lon between -86.74 and -82.42 ) and y

Re: help on use case - spark parquet processing

2020-08-13 Thread Amit Sharma
Can you keep an Option field in your case class? Thanks, Amit On Thu, Aug 13, 2020 at 12:47 PM manjay kumar wrote: > Hi, > > I have a use case, > > where I need to merge three data sets and build one wherever data is > available. > > And my dataset is a complex object. > > Customer > - name - st
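
To illustrate the suggestion above, here is a trimmed-down, hypothetical version of the Customer structure from the thread, with the optional pieces modelled as Option so a merged record can carry whichever parts happen to be present; the field names are placeholders.

    // Hypothetical, simplified shape of the thread's Customer object.
    case class Address(name: Option[String])
    case class Account(accountType: Option[String], addresses: List[Address])
    case class Customer(name: String, accounts: List[Account])

    // Merging two partial views keeps whatever data is available.
    val fromSourceA = Customer("Acme", List(Account(Some("savings"), Nil)))
    val fromSourceB = Customer("Acme", Nil)
    val merged = fromSourceA.copy(accounts = fromSourceA.accounts ++ fromSourceB.accounts)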

help on use case - spark parquet processing

2020-08-13 Thread manjay kumar
Hi, I have a use case where I need to merge three data sets and build one wherever data is available. And my dataset is a complex object. Customer - name - string - accounts - List Account - type - String - Addresses - List Address - name - String --- And it goes on. These file ar

Re: Apache Spark- Help with email library

2020-07-27 Thread Suat Toksöz
Why am I not able to send my question to the Spark email list? Thanks On Mon, Jul 27, 2020 at 10:31 AM tianlangstudio wrote: > I use SimpleJavaEmail http://www.simplejavamail.org/#/features to send > email and parse email files. It is awesome and may help you. > > <htt

Re: Apache Spark- Help with email library

2020-07-27 Thread tianlangstudio
I use SimpleJavaEmail (http://www.simplejavamail.org/#/features) to send email and parse email files. It is awesome and may help you. TianlangStudio Some of the biggest lies: I will start tomorrow/Others are better than me/I am not good enough/I don't have time/This is the way

Apache Spark- Help with email library

2020-07-26 Thread sn . noufal
Hi, I am looking to send a dataframe as an email. How do I do that? Do you have any library with a sample? Appreciate your response. Regards, Mohamed - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Need your help!! (URGENT Code works fine when submitted as java main but part of data missing when running as Spark-Submit)

2020-07-23 Thread murat migdisoglu
a potential reason might be that you are getting a classnotfound exception when you run on the cluster (due to a missing jar in your uber jar) and you are possibly silently eating up exceptions in your code. 1- you can check if there are any failed tasks 2- you can check if there are any failed ex

Re: Need your help!! (URGENT Code works fine when submitted as java main but part of data missing when running as Spark-Submit)

2020-07-21 Thread Pasha Finkelshteyn
Hi Rachana, Could you please provide us with more details: minimal repro, Spark version, Java version, Scala version. On 20/07/21 08:27AM, Rachana Srivastava wrote: > I am unable to identify the root cause of why my code is missing data when I > run as spark-submit but the code works fine when I ru

Need your help!! (URGENT Code works fine when submitted as java main but part of data missing when running as Spark-Submit)

2020-07-21 Thread Rachana Srivastava
I am unable to identify the root cause of why my code is missing data when I run as spark-submit, but the code works fine when I run it as a Java main. Any idea
