Re: Reading too many files

2022-10-04 Thread Artemis User
Reads can't be parallelized by default in a Spark job, and doing your own multi-threaded programming inside a Spark program isn't a good idea. Adding fast disk I/O and increasing RAM may speed things up, but won't help with parallelization. You may have to be more creative here. One option would be,

Re: [Spark Core][Release]Can we consider add SPARK-39725 into 3.3.1 or 3.3.2 release?

2022-10-04 Thread Bjørn Jørgensen
I have made a PR for this now. On Tue, Oct 4, 2022 at 19:02 Sean Owen wrote: > I think it's fine to backport that to 3.3.x, regardless of whether it > clearly affects Spark or not. > > On Tue, Oct 4, 2022 at 11:31 AM phoebe chen > wrote: >> Hi: >> (

Re: [Spark Core][Release]Can we consider add SPARK-39725 into 3.3.1 or 3.3.2 release?

2022-10-04 Thread Sean Owen
I think it's fine to backport that to 3.3.x, regardless of whether it clearly affects Spark or not. On Tue, Oct 4, 2022 at 11:31 AM phoebe chen wrote: > Hi: > (Not sure if this mailing group is good to use for such question, but just > try my luck here, thanks) > > SPARK-39725

[Spark Core][Release]Can we consider add SPARK-39725 into 3.3.1 or 3.3.2 release?

2022-10-04 Thread phoebe chen
Hi: (Not sure if this mailing list is a good place for such a question, but just trying my luck here, thanks.) SPARK-39725 has a fix for the security issues CVE-2022-2047 and CVE-2022-2048 (High), which is targeted for the 3.4.0 release, but that will not happen until Feb 2023.

Re: Converting None/Null into json in pyspark

2022-10-04 Thread Yeachan Park
You can try this (replace spark with whatever variable your SparkSession is): spark.conf.set("spark.sql.jsonGenerator.ignoreNullFields", False) On Tue, Oct 4, 2022 at 4:55 PM Karthick Nk wrote: > Thanks > I am using PySpark in Databricks, I have seen through multiple reference > but I couldn't f

Re: Converting None/Null into json in pyspark

2022-10-04 Thread Karthick Nk
Thanks. I am using PySpark in Databricks; I have looked through multiple references but I couldn't find the exact snippet. Could you share a sample snippet showing how to set that property? My step: df = df.selectExpr(f'to_json(struct(*)) as json_data') On Tue, Oct 4, 2022 at 10:57 AM Yeachan