driver and executors shared same Kubernetes PVC

2023-04-28 Thread second_co...@yahoo.com.INVALID
I was able to share the same PVC with Spark 3.3, but from Spark 3.4 onward I get the error below. I would like all the executors and the driver to mount the same PVC. Is this a bug? I don't want to use SPARK_EXECUTOR_ID or OnDemand, because then each executor would use its own separate PVC.
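
For reference, the setup described above (one existing claim mounted by the driver and by every executor) can be sketched as a set of Spark configuration entries. This is only an illustration: the helper `shared_pvc_conf`, the volume name `shared`, the claim name `my-claim`, and the mount path `/data` are hypothetical, and the sketch does not reproduce or explain the Spark 3.4 error mentioned in the message. It also assumes the claim already exists and that its access mode lets several pods mount it at once.

```python
def shared_pvc_conf(claim_name, mount_path, volume_name="shared"):
    """Build Spark-on-Kubernetes settings that point the driver and the
    executors at the same, pre-existing PersistentVolumeClaim.

    claim_name, mount_path and volume_name are placeholders chosen by the
    caller; nothing here is taken from the original poster's job.
    """
    conf = {}
    for role in ("driver", "executor"):
        prefix = (
            f"spark.kubernetes.{role}.volumes."
            f"persistentVolumeClaim.{volume_name}"
        )
        # A fixed claim name (rather than OnDemand or a name containing
        # SPARK_EXECUTOR_ID) means every pod refers to the same claim.
        conf[f"{prefix}.options.claimName"] = claim_name
        conf[f"{prefix}.mount.path"] = mount_path
        conf[f"{prefix}.mount.readOnly"] = "false"
    return conf


if __name__ == "__main__":
    # Print the entries in the form accepted by spark-submit --conf.
    for key, value in sorted(shared_pvc_conf("my-claim", "/data").items()):
        print(f"--conf {key}={value}")
```

The same key/value pairs could also be passed through `SparkSession.builder.config(key, value)` instead of `spark-submit`.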

Re: ***pyspark.sql.functions.monotonically_increasing_id()***

2023-04-28 Thread Winston Lai
Hi Karthick, A few points that may help you: As stated in the URL you posted, "The function is non-deterministic because its result depends on partition IDs." Hence, the generated ID depends on partition IDs. Based on the code snippet you provided, I didn't see the partition columns you sel
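
To illustrate the dependence on partition IDs, here is a small pure-Python sketch of the layout described in the PySpark documentation, where the partition ID goes in the upper 31 bits and the record number within the partition in the lower 33 bits. The function names `monotonic_id` and `ids_for_partitions` are hypothetical, and the sketch models only the documented layout, not Spark's internals.

```python
def monotonic_id(partition_id, row_in_partition):
    """Model of the documented ID layout: partition ID in the upper bits,
    record number within the partition in the lower 33 bits."""
    return (partition_id << 33) + row_in_partition


def ids_for_partitions(partition_sizes):
    """Return the IDs that rows would receive for the given partition sizes,
    listed partition by partition."""
    return [
        monotonic_id(pid, row)
        for pid, size in enumerate(partition_sizes)
        for row in range(size)
    ]


if __name__ == "__main__":
    # Six rows in two partitions of three rows each.
    print(ids_for_partitions([3, 3]))
    # The same six rows in a single partition get different IDs.
    print(ids_for_partitions([6]))
```

Under this model the same six rows receive different IDs depending on how they are partitioned, which is why two DataFrames built separately cannot be assumed to line up on this column.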

***pyspark.sql.functions.monotonically_increasing_id()***

2023-04-28 Thread Karthick Nk
Hi all, I am using monotonically_increasing_id() in PySpark to remove one field from the JSON stored in one column of a Delta table. Please refer to the code below: df = spark.sql(f"SELECT * from {database}.{table}") df1 = spark.read.json(df.rdd.map(lambda x: x.data), multiLine = Tr