Hi All,
I am looking to deepen my understanding of the Spark Web UI. Could anyone
recommend some useful materials, online courses, or share how you learned
about it? I've already reviewed the official Spark Web UI documentation,
but it only covers the basics.
Note: I am using Azure Databricks for
Hi All,
I am using Structured Streaming in Databricks with the foreach
functionality to do my transformations and actions; finally, I need to
write the data into a Delta table. My data source is either Event Hub, a
Delta table, or an Azure Cosmos DB change feed.
Whenever there are huge changes in the source (De
Hi Team,
I'm using repartition and sortWithinPartitions to maintain field-based
ordering across partitions, but I'm facing data skewness among the
partitions. I have 96 partitions, and I'm working with 500 distinct keys.
While reviewing the Spark UI, I noticed that a few partitions are
underutilized
Hi All,
I am using PySpark Structured Streaming with Azure Databricks for the
data load process.
In the pipeline I am using a job cluster and running only one pipeline,
but I am getting an OUT OF MEMORY issue when running for a long time.
When I inspect the cluster metrics, I found that
f the solution I have tried is below, but here I am doing explode and
then distinct again. I need to perform the action without these steps,
since they will again impact performance for the huge data.
Thanks,
On Thu, May 16, 2024 at 8:33 AM Karthick Nk wrote:
> Thanks
my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
a" and "b" both exist in the
> array. So Spark is correctly performing the join. It looks like you need to
> find another way to model this data to get what you want to achieve.
>
> Are the values of "a" and "b" related to each other in any way?
>
is worth one-thousand
> expert opinions (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>
>
> On Thu, 2 May 2024 at 21:25, Karthick Nk wrote:
>
>> Hi All,
>>
>> Requirements:
&
Hi All,
Requirements:
I am working on a data flow that will use a view definition (the view
definition is already defined in the schema); multiple tables are used in
the view definition. Here we want to stream the view data into an Elastic
index based on whether any of the tables (used in the view definitio
Hi @all,
I am using a PySpark program to write data into an Elastic index using an
upsert operation (sample code snippet below).
def writeDataToES(final_df):
    write_options = {
        "es.nodes": elastic_host,
        "es.net.ssl": "false",
        "es.nodes.wan.only": "true",
        "es.net.http.auth.user"
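For reference, the usual shape of the rest of this write with the elasticsearch-hadoop connector looks roughly like the following. All values are placeholders; the upsert behaviour comes from es.write.operation together with es.mapping.id (the document-key column, assumed here to be called doc_id):

```python
# Placeholder configuration sketch; this will not reach a cluster
# without the elasticsearch-hadoop package and a real host.
write_options = {
    "es.nodes": "elastic_host",        # placeholder host
    "es.net.ssl": "false",
    "es.nodes.wan.only": "true",
    "es.net.http.auth.user": "user",   # placeholder credentials
    "es.net.http.auth.pass": "pass",
    "es.write.operation": "upsert",    # upsert instead of plain index
    "es.mapping.id": "doc_id",         # assumed document-key column
}

# final_df.write.format("org.elasticsearch.spark.sql") \
#     .options(**write_options).mode("append").save("index_name")
```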
Hi All,
I have two dataframes with the below structure. I have to join these two
dataframes; the scenario is that the join column is a string in one
dataframe and an array of strings in the other df, so we have to inner
join the two dfs and get the data if the string value is present in any of
the array-of-string va
Hi Team,
I am using Structured Streaming in PySpark on Azure Databricks; in that I
am creating a temp_view from a dataframe
(df.createOrReplaceTempView('temp_view')) for performing Spark SQL query
transformations.
I am facing the issue that the temp_view is not found, so as a
workaround I have cr
sible way to perform the required action in an
optimistic way?
Note: Please feel free to ask, if you need further information.
Thanks & regards,
Karthick
On Mon, Oct 2, 2023 at 10:53 PM Karthick Nk wrote:
> Hi community members,
>
> In databricks adls2 delta tables, I need to perform
Hi community members,
In Databricks ADLS2 Delta tables, I need to perform the below operation;
could you help me with your thoughts?
I have Delta tables with one column of data type string, which contains
JSON data as a string. I need to do the following:
1. I have to update one
Hi All,
It would be helpful if anyone could give pointers on the problem described.
Thanks
Karthick.
On Wed, Sep 20, 2023 at 3:03 PM Gowtham S wrote:
> Hi Spark Community,
>
> Thank you for bringing up this issue. We've also encountered the same
> challenge and are actively wor
Thank you for your time and consideration.
Thanks & regards,
Karthick.
tables in a concurrent manner; is this the issue (do we have any
constraint for it)?
For this kind of runtime failure, how can we usually identify the root
cause?
On Thu, May 11, 2023 at 9:37 PM Farhan Misarwala
wrote:
> Hi Karthick,
>
> I think I have seen this before and this
Hi,
I am trying to merge a dataframe with a Delta table in Databricks, but I
am getting an error. I have attached the code snippet and error message
for reference below:
code:
[image: image.png]
error:
[image: image.png]
Thanks
Hi @all,
I am using monotonically_increasing_id() in a PySpark function for
removing one field from a JSON field in one column of a Delta table;
please refer to the code below:
df = spark.sql(f"SELECT * from {database}.{table}")
df1 = spark.read.json(df.rdd.map(lambda x: x.data), multiLine = True)
022 at 10:57 AM Yeachan Park wrote:
> Hi,
>
> There's a config option for this. Try setting this to false in your spark
> conf.
>
> spark.sql.jsonGenerator.ignoreNullFields
>
> On Tuesday, October 4, 2022, Karthick Nk wrote:
>
>> Hi all,
>>
>> I need to
Hi all,
I need to convert a PySpark dataframe into JSON.
While converting, if all row values are null/None for a particular
column, that column gets removed from the data.
Could you suggest a way to handle this? I need to convert the dataframe
into JSON with all columns.
Thanks