Re: SPARK ON KUBERNETES IS SLOW

2025-04-11 Thread Prem Sahoo
using kubeflow spark operator, and in the process of doing performance comparison/optimization as well. regds, Karan Alang. On Fri, Apr 11, 2025 at 9:07 AM Prem Sahoo <prem.re...@gmail.com> wrote: Hello Karan, I am using Spark open source in Kubernetes and the Spark MapR bundle in YARN. For launching a job in bo

Re: SPARK ON KUBERNETES IS SLOW

2025-04-11 Thread Prem Sahoo
Spark are you using? How long does it take to launch the job? Wrt Spark Shuffle, what is the approach you are using - storing shuffle data in MinIO or using host path? regds, Karan. On Fri, Apr 11, 2025 at 4:58 AM Prem Sahoo <prem.re...@gmail.com> wrote: Hello Team, I have a peculiar case of
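For context on the host-path option asked about above, keeping shuffle data on node-local disks via a hostPath volume can be sketched roughly as below (an illustrative fragment; the volume name and paths are placeholders, not from the thread):

```shell
# Mount a node-local SSD into each executor pod and point Spark's
# scratch/shuffle directory at it, instead of the default emptyDir.
spark-submit \
  --conf spark.kubernetes.executor.volumes.hostPath.spark-local.options.path=/mnt/disks/ssd1 \
  --conf spark.kubernetes.executor.volumes.hostPath.spark-local.mount.path=/tmp/spark-local \
  --conf spark.kubernetes.executor.volumes.hostPath.spark-local.mount.readOnly=false \
  --conf spark.local.dir=/tmp/spark-local \
  ...
```

Shuffle over a local disk generally behaves much closer to YARN's node-manager local dirs than shuffle spilled toward object storage.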

SPARK ON KUBERNETES IS SLOW

2025-04-11 Thread Prem Sahoo
Hello Team, I have a peculiar case of Spark slowness. I am using MinIO as object storage, from which Spark reads & writes data. With YARN as master, a Spark job takes ~5 mins; the same job with Kubernetes as master takes ~8 mins. I checked the Spark DAG in both a
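For reference, a hypothetical spark-submit for the Kubernetes run with common s3a tuning that often narrows the gap against a YARN deployment (the API server, image name, MinIO endpoint, and jar path are placeholders, not from the thread):

```shell
spark-submit \
  --master k8s://https://k8s-apiserver:6443 \
  --deploy-mode cluster \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=my-registry/spark:3.5.2 \
  --conf spark.hadoop.fs.s3a.endpoint=http://minio:9000 \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.connection.maximum=200 \
  --conf spark.hadoop.fs.s3a.fast.upload=true \
  local:///opt/spark/app/my-job.jar
```

Pod scheduling and image-pull overhead also add fixed startup cost on Kubernetes that YARN does not pay, which matters for short jobs in the ~5-8 minute range.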

performance issue Spark 3.5.2 on Kubernetes

2025-03-26 Thread Prem Sahoo
Hello Team, I was working with Spark 3.2 and Hadoop 2.7.6 and writing to MinIO object storage. It was slower when compared to writing to MapR FS with the above tech stack. Then I moved on to an upgraded version, Spark 3.5.2 and Hadoop 3.4.1, which started writing to MinIO with V2 fileoutputcom

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-25 Thread Prem Sahoo
issue. I will be happy to raise a JIRA. Sent from my iPhone. On Mar 24, 2025, at 4:20 PM, Prem Sahoo wrote: The problem is on the writer's side. It takes longer to write to MinIO with Spark 3.5.2 and Hadoop 3.4.1, so it seems there are some tech changes between Hadoop 2.7.6 and 3.4.1 which made the

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-24 Thread Prem Sahoo
arez.pas...@gmail.com> wrote: @Prem Sahoo, could you test both versions of Spark+Hadoop by replacing your "write to MinIO" statement with write.format("noop")? This would help us determine whether the issue lies on the reader side or the writer side.
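The suggested noop benchmark can be sketched as follows (a sketch assuming an available Spark runtime; the input path and filter are placeholders, not from the thread):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

val df = spark.read.parquet("s3a://bucket/input")   // same reader as the real job
val transformed = df.filter($"amount" > 0)          // same transformations

// The "noop" sink executes the full read + compute but discards the
// output, isolating the reader/compute side from the MinIO writer.
val start = System.nanoTime()
transformed.write.format("noop").mode("overwrite").save()
println(s"read + compute: ${(System.nanoTime() - start) / 1e9} s")
```

If the noop run is equally slow on both Spark+Hadoop stacks, the regression is on the writer/committer side; if it reproduces, it is on the reader side.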

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-22 Thread Prem Sahoo
Álvarez Pascua wrote: Sure. I love performance challenges and mysteries! Please, could you provide an example project or the steps to build one? Thanks. On Sun, 23 Mar 2025, 2:17, Prem Sahoo <prem.re...@gmail.com> wrote: Hello Team, I was working with Spark 3.2 and Hadoop 2.7.6 and writing to

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-22 Thread Prem Sahoo
a and a few sample fake rows should be sufficient. On Sun, 23 Mar 2025 at 3:17, Prem Sahoo (<prem.re...@gmail.com>) wrote: I am providing the schema, and the schema is actually correct, meaning it has all the columns available in the CSV. So we can rule this out as a cause of the slowness. Maybe th

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-22 Thread Prem Sahoo
lly. Given all this, my initial hypothesis is that thousands upon thousands of exceptions are being thrown internally, only to be handled by the Univocity parser, so the user isn't even aware of what's happening. On Sun, 23 Mar 2025 at 2:40, Prem Sahoo (<prem.re...@gmail.com>) wrote:

Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-22 Thread Prem Sahoo
Hello Team, I was working with Spark 3.2 and Hadoop 2.7.6 and writing to MinIO object storage. It was slower when compared to writing to MapR FS with the above tech stack. Then I moved on to an upgraded version, Spark 3.5.2 and Hadoop 3.4.1, which started writing to MinIO with V2 fileoutputcommitte

Re: Spark Reads from MapR and Write to MinIO fails for few batches

2024-08-24 Thread Prem Sahoo
Issue resolved, thanks for your time folks. Sent from my iPhone. On Aug 21, 2024, at 5:38 PM, Prem Sahoo wrote: Hello Team, Could you please check on this request? On Mon, Aug 19, 2024 at 7:00 PM Prem Sahoo <prem.re...@gmail.com> wrote: Hello Spark and User, could you please shed some light?

Re: Spark Reads from MapR and Write to MinIO fails for few batches

2024-08-21 Thread Prem Sahoo
Hello Team, Could you please check on this request? On Mon, Aug 19, 2024 at 7:00 PM Prem Sahoo wrote: Hello Spark and User, could you please shed some light? On Thu, Aug 15, 2024 at 7:15 PM Prem Sahoo wrote: Hello Spark and User, we have a Sp

Re: Spark Reads from MapR and Write to MinIO fails for few batches

2024-08-19 Thread Prem Sahoo
Hello Spark and User, could you please shed some light? On Thu, Aug 15, 2024 at 7:15 PM Prem Sahoo wrote: Hello Spark and User, we have a Spark project with a long-running Spark session where it does the below: 1. We are reading from MapR FS and writing to MapR FS.

Spark Reads from MapR and Write to MinIO fails for few batches

2024-08-15 Thread Prem Sahoo
Hello Spark and User, we have a Spark project with a long-running Spark session that does the below: 1. We are reading from MapR FS and writing to MapR FS. 2. Another parallel job reads from MapR FS and writes to MinIO object storage. We are finding issues for a few batches of Spark jo

Re: BUG :: UI Spark

2024-05-26 Thread Prem Sahoo
Can anyone please assist me? On Fri, May 24, 2024 at 12:29 AM Prem Sahoo wrote: Does anyone have a clue? On Thu, May 23, 2024 at 11:40 AM Prem Sahoo wrote: Hello Team, in the Spark DAG UI, we have the Stages tab. Once you click on each stage you can

Re: BUG :: UI Spark

2024-05-23 Thread Prem Sahoo
Does anyone have a clue? On Thu, May 23, 2024 at 11:40 AM Prem Sahoo wrote: Hello Team, in the Spark DAG UI, we have the Stages tab. Once you click on each stage you can view the tasks. In each task we have a column "ShuffleWrite Size/Records"; that column pr

BUG :: UI Spark

2024-05-23 Thread Prem Sahoo
Hello Team, in the Spark DAG UI, we have the Stages tab. Once you click on each stage you can view the tasks. Each task has a column "ShuffleWrite Size/Records"; that column prints wrong data when it gets the data from cache/persist. It typically will show the wrong record number though the data

Re: EXT: Dual Write to HDFS and MinIO in faster way

2024-05-21 Thread Prem Sahoo
level. Regards, Vibhor. From: Prem Sahoo. Date: Tuesday, 21 May 2024 at 8:16 AM. To: Spark dev list. Subject: EXT: Dual Write to HDFS and MinIO in faster way. Hello Team, I am plan

Dual Write to HDFS and MinIO in faster way

2024-05-20 Thread Prem Sahoo
Hello Team, I am planning to write to two datasources at the same time. Scenario: writing the same dataframe to HDFS and MinIO without re-executing the transformations and without cache(). How can we make it faster? Read the parquet file, do a few transformations, and write to HDFS and MinIO
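One way to satisfy the constraint above (no cache(), no recomputation) is to materialize the result once and then copy the finished files. A sketch, with placeholder paths and transformations that are not from the thread:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// Read and transform once; path and filter are placeholders.
val result = spark.read.parquet("hdfs:///data/input")
  .filter($"status" === "ok")

// First write triggers the DAG exactly once.
result.write.mode("overwrite").parquet("hdfs:///data/output")

// The second write re-reads the already-written files, so the
// transformations above are not executed again.
spark.read.parquet("hdfs:///data/output")
  .write.mode("overwrite").parquet("s3a://bucket/data/output")
```

The trade-off is a second full read of the output; whether that beats recomputation depends on how expensive the transformations are relative to the I/O.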

Re: caching a dataframe in Spark takes lot of time

2024-05-08 Thread Prem Sahoo
one test result is worth one-thousand expert opinions (Wernher von Braun)". On Wed, 8 May 2024 at 13:41, Prem Sahoo wrote: Could anyone help me here? Sent from my iPhone. On May 7, 2024

Re: caching a dataframe in Spark takes lot of time

2024-05-08 Thread Prem Sahoo
Could anyone help me here? Sent from my iPhone. On May 7, 2024, at 4:30 PM, Prem Sahoo wrote: Hello Folks, in Spark I have read a file, done some transformations, and finally written to HDFS. Now I am interested in writing the same dataframe to MapR

caching a dataframe in Spark takes lot of time

2024-05-07 Thread Prem Sahoo
Hello Folks, in Spark I have read a file, done some transformations, and finally written to HDFS. Now I am interested in writing the same dataframe to MapR FS, but for this Spark will execute the full DAG again (recompute all the previous steps: the read + transformations). I don't want this
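The standard way to avoid re-executing the DAG for the second sink is to persist the dataframe before the first write. A sketch (paths and the filter are placeholders, not from the thread); DISK_ONLY is one option when in-memory caching of a large dataframe is itself slow:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

val df = spark.read.parquet("hdfs:///input")
  .filter($"status" === "ok")     // placeholder transformations

// DISK_ONLY sidesteps the memory pressure that can make the default
// MEMORY_AND_DISK caching slow for large dataframes.
df.persist(StorageLevel.DISK_ONLY)

df.write.mode("overwrite").parquet("hdfs:///out")     // computes the DAG once
df.write.mode("overwrite").parquet("maprfs:///out")   // reuses persisted blocks
df.unpersist()
```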

Re: Which version of spark version supports parquet version 2 ?

2024-04-26 Thread Prem Sahoo
Confirmed, closing this. Thanks everyone for the valuable information. Sent from my iPhone. On Apr 25, 2024, at 9:55 AM, Prem Sahoo wrote: Hello Spark, After discussing with the Parquet and PyArrow community, we can use the below config so that Spark can writ

Re: Which version of spark version supports parquet version 2 ?

2024-04-25 Thread Prem Sahoo
Hello Spark, After discussing with the Parquet and PyArrow community, we can use the below config so that Spark can write Parquet V2 files: set hadoopConfiguration.set("parquet.writer.version", "v2") while creating Parquet, and then those are V2 Parquet files. Could you please confirm?
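The config above can be sketched in context as follows (a sketch with placeholder paths; note that replies elsewhere in this thread recommend against producing v2 encodings, since Spark's built-in reader support for them is not guaranteed - verify readability after writing):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()

// Set on the Hadoop configuration before writing; picked up by parquet-mr.
spark.sparkContext.hadoopConfiguration.set("parquet.writer.version", "v2")

spark.read.parquet("s3a://bucket/input")
  .write.mode("overwrite").parquet("s3a://bucket/v2-output")
```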

Re: Which version of spark version supports parquet version 2 ?

2024-04-18 Thread Prem Sahoo
thub.com/apache/parquet-mr?tab=readme-ov-file#java-vector-api-support. You are using Spark 3.2.0; Spark version 3.2.4 was released April 13, 2023 (https://spark.apache.org/releases/spark-release-3-2-4.html). You are using a Spark version that is EOL. On Thu, 18 Apr 2024 at 00:25, Prem Sahoo <prem.re.

Re: Which version of spark version supports parquet version 2 ?

2024-04-17 Thread Prem Sahoo
the Parquet community. On Wed, Apr 17, 2024 at 11:05 AM Prem Sahoo wrote: Hello Community, Could anyone shed more light on this (Spark supporting Parquet V2)? On Tue, Apr 16, 2024 at 3:42 PM Mich Talebzadeh <mich.talebza...@gmail.com>

Re: Which version of spark version supports parquet version 2 ?

2024-04-17 Thread Prem Sahoo

Re: Which version of spark version supports parquet version 2 ?

2024-04-16 Thread Prem Sahoo

Re: Which version of spark version supports parquet version 2 ?

2024-04-16 Thread Prem Sahoo
Hello Community, Could any of you shed some light on the below questions please? Sent from my iPhone. On Apr 15, 2024, at 9:02 PM, Prem Sahoo wrote: Any specific reason Spark does not support, or the community doesn't want to go to, Parquet V2, which is more optimized and whose read and write are much f

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Prem Sahoo
ngs just fine. You just don't need to worry about making Spark produce v2. And you should probably also not produce v2 encodings from other systems. On Mon, Apr 15, 2024 at 4:37 PM Prem Sahoo wrote: oops, but so Spark does not support Parquet V2 atm?, as W

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Prem Sahoo
by the community. I highly recommend not using v2 encodings at this time. Ryan. On Mon, Apr 15, 2024 at 3:05 PM Prem Sahoo wrote: I am using Spark 3.2.0, but my Spark package comes with parquet-mr 1.2.1, which writes in Parquet version 1, not versio

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Prem Sahoo
On Mon, 15 Apr 2024 at 21:33, Prem Sahoo wrote: Thank you so much for the info! But do we have any release notes wher

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Prem Sahoo
On Mon, 15 Apr 2024 at 20:53, Prem Sahoo wrote: Thank you for the information! I can use any version of parquet-mr to produce a parquet file.

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Prem Sahoo

Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Prem Sahoo
Hello Team, May I know how to check which version of Parquet is supported by parquet-mr 1.2.1? Which version of parquet-mr supports Parquet version 2 (V2)? Which version of Spark supports Parquet version 2? May I get the release notes where Parquet versions are mentioned?

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-04 Thread Prem Sahoo
ormation. Hopefully, someone with more knowledge can provide further insight. Best, Jason. On Mon, Mar 4, 2024 at 9:41 AM Prem Sahoo wrote: super :( On Mon, Mar 4, 2024 at 6:19 AM Mich Talebzadeh wrote:

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-04 Thread Prem Sahoo

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-03 Thread Prem Sahoo

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-01 Thread Prem Sahoo

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-01 Thread Prem Sahoo
Jira with the 'Spark' project and three labels: 'Correctness', 'correctness' and 'data-loss'. Dongjoon. On Thu, Feb 29, 2024 at 11:54 Prem Sahoo wrote: Hello Dongjoon, Thanks for emailing me. Could you plea

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread Prem Sahoo
Congratulations 👍 Sent from my iPhone. On Feb 29, 2024, at 4:54 PM, Xinrong Meng wrote: Congratulations! Thanks, Xinrong. On Thu, Feb 29, 2024 at 11:16 AM Dongjoon Hyun wrote: Congratulations! Bests, Dongjoon. On Wed, Feb 28, 2024 at 11:43 AM beliefer wrote: Congra

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-02-29 Thread Prem Sahoo
orrectness issues with Apache Spark 3.5.1. Thanks, Dongjoon. On 2024/02/29 15:04:41 Prem Sahoo wrote: When a Spark job shows FetchFailedException it creates a few duplicate records, and we also see some data missing; please explain why. We have a scenario when

When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-02-29 Thread Prem Sahoo
When a Spark job shows FetchFailedException it creates a few duplicate records, and we also see some data missing; please explain why. We have a scenario where a Spark job complains of FetchFailedException because one of the data nodes got rebooted in the middle of the job running. Due to this we have a few duplicate records and

Re: Spark Union performance issue

2023-02-22 Thread Prem Sahoo
How many columns do all these tables have? Are you sure creating the plan depends on the number of rows? Enrico. On 22.02.23 at 19:08, Prem Sahoo wrote: here is the missing information: 1. Spark 3.2.0. 2. It is Scala based. 3. Size of tab

Re: Spark Union performance issue

2023-02-22 Thread Prem Sahoo

Spark Union performance issue

2023-02-22 Thread Prem Sahoo
Hello Team, We are observing Spark union performance issues when unioning big tables with lots of rows. Do we have any option apart from union?
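One common mitigation for slow many-way unions (suggested here as a sketch, not the thread's resolution; the table names are placeholders): pairwise-reducing many dataframes builds one very deep logical plan that is expensive to analyze, so batching the unions and truncating lineage keeps planning cost bounded.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder.getOrCreate()

// Placeholder list of the tables being unioned.
val parts: Seq[DataFrame] = Seq("t1", "t2", "t3").map(spark.table)

// Union in batches of 10 and cut the lineage after each batch with
// localCheckpoint, so the final plan stays shallow.
val unioned = parts
  .grouped(10)
  .map(_.reduce(_ unionByName _).localCheckpoint())
  .reduce(_ unionByName _)
```

unionByName also avoids silent column misalignment when the tables' column orders differ, which plain union is prone to.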

Executor tab missing information

2023-02-13 Thread Prem Sahoo
Hello All, I am executing Spark jobs, but in the Executors tab I am missing information; I can't see any data/info coming up. Please let me know what I am missing.

Re: [VOTE][SPIP] Lazy Materialization for Parquet Read Performance Improvement

2023-02-13 Thread Prem Sahoo
+1. On Mon, Feb 13, 2023 at 8:13 PM L. C. Hsieh wrote: +1. On Mon, Feb 13, 2023 at 3:49 PM Mich Talebzadeh wrote: +1 for me