+1.
I can contribute to it as well.
On Tue, 19 Mar 2024 at 9:19 AM, Code Tutelage
wrote:
> +1
>
> Thanks for proposing
>
> On Mon, Mar 18, 2024 at 9:25 AM Parsian, Mahmoud
> wrote:
>
>> Good idea. Will be useful.
>>
>> +1
>>
>> From: ashok34...@yahoo.com.INVALID
> ...demonstrates that one that does not conform throws an Exception.
>
> I've had to slightly modify 3 Spark files to add the data contract
> functionality. If you can think of a more elegant solution, I'd be very
> grateful.
>
> Regards,
>
> Phillip
>>>> ...a friend who complained that his company's Zurich office made a breaking
>>>> change and was not even aware that his London-based department existed,
>>>> never mind depended on their data. In large organisations, this is pretty
>>>> common.
>>>
Spark can be used with tools like Great Expectations to implement data
contracts. I am not sure, though, whether Spark alone can enforce data
contracts.
I was reading a blog on data mesh and how to glue it together with data
contracts; that's where I came across this Spark and Great Expectations approach.
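Great Expectations itself is a Python library, so on the JVM side here is only
a hand-rolled sketch of the contract idea in plain Spark (Scala); the expected
schema and the not-null rule are hypothetical stand-ins for a real contract:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types._

object ContractCheck {
  // hypothetical contract: an expected schema plus a not-null rule on "id"
  val expected = StructType(Seq(
    StructField("id", LongType, nullable = false),
    StructField("amount", DoubleType, nullable = true)))

  // returns the input unchanged if it honours the contract, throws otherwise
  def validate(df: DataFrame): DataFrame = {
    require(df.schema.fieldNames.sameElements(expected.fieldNames),
      s"contract violation: schema is ${df.schema.simpleString}")
    val nulls = df.filter(df("id").isNull).count()
    require(nulls == 0, s"contract violation: $nulls null ids")
    df
  }
}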
Please count me in.
Can we have Spark on k8s with the Spark Connect feature covered?
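For reference, a minimal sketch of the client side of Spark Connect (assuming
Spark 3.4+ with a Connect server already exposed as a k8s service; the service
name and port here are hypothetical):

// build.sbt needs the spark-connect-client-jvm artifact rather than full Spark
import org.apache.spark.sql.SparkSession

object ConnectDemo extends App {
  // connect to the Spark Connect endpoint exposed by the k8s service
  val spark = SparkSession.builder()
    .remote("sc://spark-connect.default.svc:15002") // hypothetical service
    .getOrCreate()

  spark.range(10).show()
  spark.stop()
}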
On Wed, 8 Feb 2023 at 10:03, Kirti Ruge wrote:
> Greetings everyone,
> I would love to be part of this session.
> IST
>
>
> On Wed, 8 Feb 2023 at 9:13 AM, Colin Williams <
> colin.williams.seat...@gmail.com> wrote:
oy-v3-api-field-config-core-v3-httpprotocoloptions-idle-timeout
>
>
> On Sat, Sep 3, 2022 at 4:23 AM Deepak Sharma
> wrote:
>
>> Thanks for the reply, Ilan.
>> Can we set this in the Spark conf, or does it need to go into the
>> Istio/Envoy conf?
>>
>> On S
at 12:17 AM Deepak Sharma
> wrote:
>
>> Hi All,
>> In one of our clusters, we enabled Istio where Spark is running in
>> distributed mode.
>> Spark works fine when we run it with Istio in standalone mode.
>> In Spark distributed mode, we are seeing that every hour or so the
Hi All,
In one of our clusters, we enabled Istio where Spark is running in
distributed mode.
Spark works fine when we run it with Istio in standalone mode.
In Spark distributed mode, we are seeing that every hour or so the
workers are getting disassociated from the master, and then the master is not
able to
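In case it helps while the mesh side is investigated: a sketch of the Spark
settings that keep master/executor links alive longer (the values are
illustrative; the Envoy/Istio idleTimeout itself still has to be raised on the
mesh side):

import org.apache.spark.sql.SparkSession

object IstioTimeoutSketch extends App {
  // longer network timeout plus frequent heartbeats keeps idle links warm,
  // so the mesh is less likely to reap them as idle connections
  val spark = SparkSession.builder()
    .appName("istio-timeout-sketch")
    .config("spark.network.timeout", "600s")          // illustrative value
    .config("spark.executor.heartbeatInterval", "30s") // must stay below timeout
    .getOrCreate()
}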
@dev@spark.apache.org @user
I am looking for an example of an observability framework for Apache
Flink pipelines.
This could be message tracing across multiple Flink pipelines, or querying
the past state of a message that was processed by any Flink pipeline.
If anyone has done similar work an
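I have not seen an off-the-shelf framework for this; one building block (a
sketch in Flink's Scala API, with a hypothetical Event envelope and trace IDs
assigned at ingestion) is to carry a traceId through every pipeline and emit it
at each operator, so an external log/trace store can answer "where has message
X been?":

import org.apache.flink.streaming.api.scala._

// hypothetical envelope: every record carries its trace id end to end
case class Event(traceId: String, payload: String)

object TraceSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env
      .fromElements(Event("t-1", "hello"), Event("t-2", "world"))
      .map { e =>
        // stand-in for a real tracing hook (e.g. an OpenTelemetry span)
        println(s"pipeline=ingest op=map traceId=${e.traceId}")
        e
      }
      .print()
    env.execute("trace-propagation-sketch")
  }
}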
Hi
Is there any design pattern for writing to the same HDFS directory from
multiple Spark jobs?
--
Thanks
Deepak
www.bigdatabig.com
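One pattern (a sketch with hypothetical paths; each job must be handed a
unique jobId): give every job its own subdirectory under the shared root, so
no two writers ever touch the same directory, while readers still load the
root as a single dataset:

import org.apache.spark.sql.SparkSession

object SharedDirWrite {
  def write(spark: SparkSession, jobId: String): Unit = {
    import spark.implicits._
    val df = Seq((1, "a"), (2, "b")).toDF("id", "v") // stand-in data

    // each job owns hdfs:///data/events/job_id=<jobId>; no write conflicts
    df.write.mode("overwrite")
      .parquet(s"hdfs:///data/events/job_id=$jobId")
  }
}

// readers see the union of all jobs' output as one dataset:
// spark.read.parquet("hdfs:///data/events")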
Hi All,
I have a custom implementation of K-Means that needs the data to be
grouped by a key in a DataFrame.
Now there is a big data skew for some of the keys, where it exceeds the
BufferHolder limit:
Cannot grow BufferHolder by size 17112 because the size after growing
exceeds size limitation 2147
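That ceiling is the ~2 GB per-buffer limit, so one mitigation (a sketch; the
column names and the two-stage aggregate are illustrative, not your actual
K-Means code) is to salt the key so each hot group splits into smaller
sub-groups, aggregate partially, then merge per real key:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SaltingSketch extends App {
  val spark = SparkSession.builder().appName("salting-sketch").getOrCreate()
  import spark.implicits._

  // hypothetical input: (key, x) pairs with heavy skew on some keys
  val df = Seq(("a", 1.0), ("a", 2.0), ("b", 3.0)).toDF("key", "x")

  // stage 1: split every key into up to 32 salted sub-groups
  val partials = df
    .withColumn("salt", (rand() * 32).cast("int"))
    .groupBy($"key", $"salt")
    .agg(sum($"x").as("sum_x"), count(lit(1)).as("n"))

  // stage 2: merge the small partial aggregates per real key
  val perKey = partials
    .groupBy($"key")
    .agg(sum($"sum_x").as("sum_x"), sum($"n").as("n"))
}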
I am using a Spark Streaming application to read from Kafka.
The value coming in the Kafka message is a path to an HDFS file.
I am using Spark 2.x, spark.read.stream.
What is the best way to read this path in Spark Streaming and then read the
JSON stored at the HDFS path, maybe using spark.read.json, i
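If you can use Structured Streaming on Spark 2.4+, one pattern (a sketch; the
broker, topic, and output path are all hypothetical) is to stream the paths
from Kafka and use foreachBatch to run an ordinary batch spark.read.json over
the paths seen in each micro-batch:

import org.apache.spark.sql.{DataFrame, SparkSession}

object PathStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("path-stream-sketch").getOrCreate()
    import spark.implicits._

    val paths = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // hypothetical broker
      .option("subscribe", "hdfs-paths")                // hypothetical topic
      .load()
      .selectExpr("CAST(value AS STRING) AS path")

    val query = paths.writeStream
      .foreachBatch { (batch: DataFrame, _: Long) =>
        val ps = batch.as[String].collect()
        if (ps.nonEmpty) {
          // plain batch read of the JSON files named in this micro-batch
          spark.read.json(ps: _*)
            .write.mode("append").parquet("hdfs:///out") // hypothetical sink
        }
      }
      .start()

    query.awaitTermination()
  }
}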
Congratulations Holden & Burak
On Wed, Jan 25, 2017 at 8:23 AM, jiangxingbo wrote:
> Congratulations Burak & Holden!
>
> > On Jan 25, 2017, at 2:13 AM, Reynold Xin wrote:
> >
> > Hi all,
> >
> > Burak and Holden have recently been elected as Apache Spark committers.
> >
> > Burak has been very active in a
Hi All
Is there any way to schedule the ever-running Spark job in such a way that it
comes up on its own after cluster maintenance?
--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
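One approach (a sketch, assuming Spark standalone cluster mode; the class and
jar names are hypothetical): submit with --supervise, which asks the master to
restart the driver if it exits abnormally, e.g. when maintenance kills it. On
YARN, spark.yarn.maxAppAttempts plays a similar role.

spark-submit \
  --master spark://master:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.StreamingApp \
  /path/to/app.jar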
that are not very complex.
>
> On Fri, Oct 7, 2016 at 12:20 PM, Deepak Sharma
> wrote:
>
>> Hi
>> I am saving RDD[Example] to HDFS from a Spark program, where Example is
>> a case class.
>> Now when I am trying to read it back, it returns RDD[String] with the
>>
Hi
I am saving RDD[Example] to HDFS from a Spark program, where Example is a
case class.
Now when I am trying to read it back, it returns RDD[String] with the
content as below:
Example(1,name,value)
The workaround can be to write it as a string to HDFS, read it back as a
string, and perform further processing.
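Rather than round-tripping through strings, one option (a sketch, assuming
Spark 2.x and a hypothetical output path) is to go through the Dataset API,
which keeps the schema, so the data comes back typed instead of as
RDD[String]; sc.saveAsObjectFile / sc.objectFile[Example] is an RDD-only
alternative:

import org.apache.spark.sql.SparkSession

case class Example(id: Int, name: String, value: String)

object CaseClassIO {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("case-class-io-sketch").getOrCreate()
    import spark.implicits._

    val ds = Seq(Example(1, "name", "value")).toDS()
    ds.write.mode("overwrite").parquet("hdfs:///tmp/examples") // hypothetical path

    // comes back as a typed Dataset[Example], not RDD[String]
    val back = spark.read.parquet("hdfs:///tmp/examples").as[Example]
    back.show()
  }
}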
Hi
If anyone is using or knows about a GitHub repo that can help me get started
with image and video processing using Spark, please share.
The images/videos will be stored in S3, and I am planning to use S3 with
Spark.
In this case, how will Spark achieve distributed processing?
Any code base or references would be really appreciated.
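One starting point (a sketch; the bucket is hypothetical, and s3a access
assumes the hadoop-aws jars are on the classpath): sc.binaryFiles gives an RDD
of (path, stream) pairs, so each file is decoded on whichever executor runs
the task; that is where the distribution comes from:

import org.apache.spark.sql.SparkSession

object S3ImagesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("s3-images-sketch").getOrCreate()
    val sc = spark.sparkContext

    // RDD[(path, PortableDataStream)]; one task per file
    val images = sc.binaryFiles("s3a://my-bucket/images/*.jpg") // hypothetical bucket

    val sizes = images.mapValues { stream =>
      val bytes = stream.toArray()
      bytes.length // stand-in for a real decode, e.g. javax.imageio.ImageIO
    }
    sizes.take(5).foreach(println)
  }
}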
Hi
I am reading a text file with 16 fields.
All the placeholders for the values of this text file have been defined in,
say, 2 different case classes:
Case1 and Case2.
How do I map the values read from the text file, so that my function in Scala
can return 2 different RDDs, with each RDD typed to its case class?
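One way (a sketch, assuming a comma-delimited file; the field names and the
split of the 16 fields between Case1 and Case2 are hypothetical): parse each
line once, cache the result, and derive one RDD per case class:

import org.apache.spark.SparkContext

// hypothetical placeholders; each carries its share of the 16 fields
case class Case1(f1: String, f2: String)
case class Case2(f9: String, f10: String)

object SplitSketch {
  def split(sc: SparkContext, path: String) = {
    val parsed = sc.textFile(path).map(_.split(",", -1)).cache()
    val rdd1 = parsed.map(a => Case1(a(0), a(1))) // fields for Case1
    val rdd2 = parsed.map(a => Case2(a(8), a(9))) // fields for Case2
    (rdd1, rdd2) // two typed RDDs from a single parse
  }
}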