I think signing the artifacts produced from a secure CI sounds like a good
idea. I know we've been asked to reduce our GitHub Actions usage, but perhaps
someone interested could volunteer to set that up.
Hi,
Thanks for the reply.
From my experience, a build on a build server would be much more
predictable and less error-prone than building on someone's laptop, and of
course much faster for producing builds: snapshots, early preview releases,
release candidates, or final releases.
It will
Indeed. We could conceivably build the release in CI/CD, but the final
verification and signing should be done locally to keep the keys safe (there
was some concern about this from earlier release processes).
Thank you so much for the update, Wenchen!
Dongjoon.
On Tue, May 7, 2024 at 10:49 AM Wenchen Fan wrote:
> UPDATE:
>
> Unfortunately, it took me quite some time to set up my laptop and get it
> ready for the release process (Docker Desktop doesn't work anymore, my PGP
> key is lost, etc.). I'll
Hello Folks,
In Spark I have read a file, done some transformations, and am finally
writing the result to HDFS.
Now I am interested in writing the same DataFrame to MapR-FS, but for this
Spark will execute the full DAG again (recomputing all the previous
steps: the read plus all the transformations).
I don't want this recomputation.
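A common way to avoid that recomputation (a minimal sketch; the input path, the transformation, and the output paths below are placeholders, not from the original message) is to persist the DataFrame before the first write, so the second write reuses the cached result:
```
import org.apache.spark.storage.StorageLevel

// Persist the transformed DataFrame so both writes reuse the same
// computed result instead of re-running the whole DAG from the source.
val result = spark.read.parquet("hdfs:///input/path")        // placeholder input
  .withColumn("flag", org.apache.spark.sql.functions.lit(1)) // placeholder transformation
  .persist(StorageLevel.MEMORY_AND_DISK)

result.write.parquet("hdfs:///output/path")   // first sink: HDFS
result.write.parquet("maprfs:///output/path") // second sink: MapR-FS
result.unpersist()                            // release the cache when done
```
The first write materializes and caches the result; the second write then reads from the cache instead of recomputing from the source.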
Hi,
Sorry for the novice question, Wenchen: is the release done manually from
a laptop, rather than using a CI/CD process on a build server?
Thanks,
Nimrod
On Tue, May 7, 2024 at 8:50 PM Wenchen Fan wrote:
> UPDATE:
>
> Unfortunately, it took me quite some time to set up my laptop and get it
> ready
UPDATE:
Unfortunately, it took me quite some time to set up my laptop and get it
ready for the release process (Docker Desktop doesn't work anymore, my PGP
key is lost, etc.). I'll start the RC process tomorrow my time. Thanks for
your patience!
Wenchen
On Fri, May 3, 2024 at 7:47 AM yangjie01 wrote:
Hi Folks,
I wanted to check why Spark doesn't create a staging dir while doing an
insertInto on partitioned tables. I'm running the example code below:
```
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")
val rdd = sc.parallelize(Seq((1, 5, 1), (2, 1, 2), (4, 4, 3)))
val df = spark.createDataFrame(rdd)
```
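For context, a minimal sketch of how such a snippet typically continues; the column names and table name are hypothetical, not from the original message:
```
// Hypothetical continuation: name the columns and insert into an existing
// partitioned Hive table (column and table names are illustrative).
val named = df.toDF("id", "amount", "part")
named.write.insertInto("db.partitioned_table")
```
With hive.exec.dynamic.partition.mode=nonstrict set as above, insertInto performs a dynamic-partition insert, with the partition columns taken from the trailing columns of the DataFrame.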