Help - Learning/Understanding spark web UI

2024-09-26 Thread Karthick Nk
Hi All, I am looking to deepen my understanding of the Spark Web UI. Could anyone recommend some useful materials or online courses, or share how you learned about it? I've already reviewed the official Spark Web UI documentation, but it only covers the basics. Note: I am using Azure Databricks for

Setting foreach micro-batch processing data count in structured streaming

2024-08-30 Thread Karthick Nk
Hi All, I am using Structured Streaming in Databricks with foreach functionality to do my transformations and actions, and finally I need to write the data into a Delta table. My data source is either Event Hubs, a Delta table, or the Azure Cosmos DB change feed. Whenever there are huge changes in the source (De

OOM issue in Spark Driver

2024-06-07 Thread Karthick Nk
Hi All, I am using PySpark Structured Streaming with Azure Databricks for a data load process. In the pipeline I am using a job cluster and running only one pipeline, but I get an OUT OF MEMORY error after it runs for a long time. When I inspected the cluster metrics, I found that,

Re: pyspark dataframe join with two different data type

2024-05-16 Thread Karthick Nk
f the solution I have tried is below, but here I am doing an explode and then a distinct again. I need to do this without that step, since it will hurt performance again on huge data. Thanks, solutions On Thu, May 16, 2024 at 8:33 AM Karthick Nk wrote: > Thanks

Re: pyspark dataframe join with two different data type

2024-05-15 Thread Karthick Nk
my LinkedIn profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh

Re: pyspark dataframe join with two different data type

2024-05-14 Thread Karthick Nk
a" and "b" both exist in the > array. So Spark is correctly performing the join. It looks like you need to > find another way to model this data to get what you want to achieve. > > Are the values of "a" and "b" related to each other in any way? >

Re: pyspark dataframe join with two different data type

2024-05-10 Thread Karthick Nk
e <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh

Re: ********Spark streaming issue to Elastic data**********

2024-05-05 Thread Karthick Nk
On Thu, 2 May 2024 at 21:25, Karthick Nk wrote: >> Hi All, >> Requirements:

********Spark streaming issue to Elastic data**********

2024-05-02 Thread Karthick Nk
Hi All, Requirements: I am working on a data flow that uses a view definition (already defined in the schema), and multiple tables are used in that view definition. We want to stream the view data into an Elastic index based on whether any of the tables (used in the view definitio

Data ingestion into elastic failing using pyspark

2024-03-11 Thread Karthick Nk
Hi @all, I am using a PySpark program to write data into an Elastic index with an upsert operation (sample code snippet below). def writeDataToES(final_df): write_options = { "es.nodes": elastic_host, "es.net.ssl": "false", "es.nodes.wan.only": "true", "es.net.http.auth.user"

pyspark dataframe join with two different data type

2024-02-29 Thread Karthick Nk
Hi All, I have two DataFrames with the structure below and need to join them. The join column is a string in one DataFrame and an array of strings in the other, so we have to inner join the two DataFrames and get the data if the string value is present in any of the array of string va

Issue in Creating Temp_view in databricks and using spark.sql().

2024-01-30 Thread Karthick Nk
Hi Team, I am using Structured Streaming in PySpark on Azure Databricks, where I create a temp view from a DataFrame (df.createOrReplaceTempView('temp_view')) to perform Spark SQL query transformations. I am facing an issue where temp_view is not found, so as a workaround I have cr

Re: Updating delta file column data

2023-10-08 Thread Karthick Nk
sible way to perform the required action in an optimized way? Note: Please feel free to ask if you need further information. Thanks & regards, Karthick On Mon, Oct 2, 2023 at 10:53 PM Karthick Nk wrote: > Hi community members, > > In databricks adls2 delta tables, I need to perform

Updating delta file column data

2023-10-02 Thread Karthick Nk
Hi community members, In Databricks ADLS Gen2 Delta tables, I need to perform the operation below; could you help me with your thoughts? I have Delta tables with one column of data type string, which contains JSON data stored as a string. I need to do the following: 1. I have to update one

Re: Error while merge in delta table

2023-05-12 Thread Karthick Nk
> could pinpoint the root cause. >> Regards, >> Jacek Laskowski

Error while merge in delta table

2023-05-10 Thread Karthick Nk
Hi, I am trying to merge a DataFrame into a Delta table in Databricks, but I am getting an error. I have attached the code snippet and error message for reference below. code: [image: image.png] error: [image: image.png] Thanks

***pyspark.sql.functions.monotonically_increasing_id()***

2023-04-28 Thread Karthick Nk
Hi @all, I am using the PySpark function monotonically_increasing_id() while removing one field from the JSON in one column of a Delta table; please refer to the code below: df = spark.sql(f"SELECT * from {database}.{table}") df1 = spark.read.json(df.rdd.map(lambda x: x.data), multiLine = Tr

Re: Converting None/Null into json in pyspark

2022-10-04 Thread Karthick Nk
022 at 10:57 AM Yeachan Park wrote: > Hi, > > There's a config option for this. Try setting this to false in your spark > conf. > > spark.sql.jsonGenerator.ignoreNullFields > > On Tuesday, October 4, 2022, Karthick Nk wrote: > >> Hi all, >> >> I need to

Converting None/Null into json in pyspark

2022-10-03 Thread Karthick Nk
Hi all, I need to convert a PySpark DataFrame into JSON. While converting, if all row values are null/None for a particular column, that column is removed from the data. Could you suggest a way to handle this? I need to convert the DataFrame into JSON with all of its columns kept. Thanks