Hi All,
I am looking to deepen my understanding of the Spark Web UI. Could anyone
recommend some useful materials, online courses, or share how you learned
about it? I've already reviewed the official Spark Web UI documentation,
but it only covers the basics.
Note: I am using Azure Databricks for
Hi All,
I am using Structured Streaming in Databricks with the foreach
functionality to do my transformations and actions; finally, I need to
write the data into a Delta table. My data source is either Event Hub, a
Delta table, or an Azure Cosmos DB change feed.
Whenever there are huge changes in the source (De
Hi Team,
I'm using repartition and sortWithinPartitions to maintain field-based
ordering across partitions, but I'm facing data skewness among the
partitions. I have 96 partitions, and I'm working with 500 distinct keys.
While reviewing the Spark UI, I noticed that a few partitions are
underutilized
Hi All,
I am using PySpark Structured Streaming with Azure Databricks for the
data load process.
In the pipeline I am using a job cluster and running only one pipeline,
but I am getting an OUT OF MEMORY issue when running for a long time.
When I inspect the cluster metrics, I found that
f the solution I have tried is below, but here I am doing explode and
then distinct again. I need to perform the action without these steps,
since they will again impact performance for the huge data.
Thanks,
On Thu, May 16, 2024 at 8:33 AM Karthick Nk wrote:
> Thanks
my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
a" and "b" both exist in the
> array. So Spark is correctly performing the join. It looks like you need to
> find another way to model this data to get what you want to achieve.
>
> Are the values of "a" and "b" related to each other in any way?
>
is worth one-thousand
> expert opinions (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>
>
> On Thu, 2 May 2024 at 21:25, Karthick Nk wrote:
>
>> Hi All,
>>
>> Requirements:
&
Hi All,
Requirements:
I am working on a data flow that will use a view definition (the view
definition is already defined in the schema); multiple tables are used in
the view definition. Here we want to stream the view data into an Elastic
index based on whether any of the tables (used in the view definitio
Hi @all,
I am using a PySpark program to write data into an Elastic index using an
upsert operation (sample code snippet below).
def writeDataToES(final_df):
    write_options = {
        "es.nodes": elastic_host,
        "es.net.ssl": "false",
        "es.nodes.wan.only": "true",
        "es.net.http.auth.user"
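For reference, the usual shape of the rest of this write with the elasticsearch-hadoop connector looks roughly like the following. All values are placeholders; the upsert behaviour comes from es.write.operation together with es.mapping.id (the document-key column, assumed here to be called doc_id):

```python
# Placeholder configuration sketch; this will not reach a cluster
# without the elasticsearch-hadoop package and a real host.
write_options = {
    "es.nodes": "elastic_host",        # placeholder host
    "es.net.ssl": "false",
    "es.nodes.wan.only": "true",
    "es.net.http.auth.user": "user",   # placeholder credentials
    "es.net.http.auth.pass": "pass",
    "es.write.operation": "upsert",    # upsert instead of plain index
    "es.mapping.id": "doc_id",         # assumed document-key column
}

# final_df.write.format("org.elasticsearch.spark.sql") \
#     .options(**write_options).mode("append").save("index_name")
```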
Hi All,
I have two dataframes with the below structure. I have to join these two
dataframes; the scenario is that the join column is a string in one
dataframe and an array of strings in the other df, so we have to inner
join the two dfs and get the data if the string value is present in any of
the array-of-string va
Hi Team,
I am using Structured Streaming in PySpark on Azure Databricks; in that I
am creating a temp_view from a dataframe
(df.createOrReplaceTempView('temp_view')) for performing Spark SQL query
transformations.
I am facing the issue that the temp_view is not found, so as a
workaround I have cr
sible way to perform the required action in an
optimistic way?
Note: Please feel free to ask, if you need further information.
Thanks & regards,
Karthick
On Mon, Oct 2, 2023 at 10:53 PM Karthick Nk wrote:
> Hi community members,
>
> In databricks adls2 delta tables, I need to perform
Hi community members,
In Databricks ADLS2 Delta tables, I need to perform the below operation;
could you help me with your thoughts?
I have Delta tables with one column of data type string, which contains
JSON data as a string. I need to do the following:
1. I have to update one
Hi All,
It would be helpful if anyone could give pointers on the problem described.
Thanks
Karthick.
On Wed, Sep 20, 2023 at 3:03 PM Gowtham S wrote:
> Hi Spark Community,
>
> Thank you for bringing up this issue. We've also encountered the same
> challenge and are actively wor
Thank you for your time and consideration.
Thanks & regards,
Karthick.
tables in a concurrent manner; is this the issue (do we have any
constraint for it)?
For this kind of runtime failure, how can we usually identify the root
cause?
On Thu, May 11, 2023 at 9:37 PM Farhan Misarwala
wrote:
> Hi Karthick,
>
> I think I have seen this before and this
Hi,
I am trying to merge a dataframe with a Delta table in Databricks, but I
am getting an error. I have attached the code snippet and error message
for reference below:
code:
[image: image.png]
error:
[image: image.png]
Thanks
Hi @all,
I am using monotonically_increasing_id() in a PySpark function for
removing one field from a JSON field in one column of a Delta table;
please refer to the code below:
df = spark.sql(f"SELECT * from {database}.{table}")
df1 = spark.read.json(df.rdd.map(lambda x: x.data), multiLine = True)
022 at 10:57 AM Yeachan Park wrote:
> Hi,
>
> There's a config option for this. Try setting this to false in your spark
> conf.
>
> spark.sql.jsonGenerator.ignoreNullFields
>
> On Tuesday, October 4, 2022, Karthick Nk wrote:
>
>> Hi all,
>>
>> I need to
Hi all,
I need to convert a PySpark dataframe into JSON.
While converting, if all row values are null/None for a particular
column, that column gets removed from the data.
Could you suggest a way to handle this? I need to convert the dataframe
into JSON with all columns.
Thanks