Re: repartition before writing to table with bucketed partitioning

2024-11-30 Thread Soumasish
;"" CREATE TABLE IF NOT EXISTS my_catalog.default.some_table ( date STRING, x INT, y INT ) USING iceberg PARTITIONED BY (date, x, bucket(10, y)) """) // Step 2: Write data to the Iceberg table df.writeTo("my_catalog.default.so

Re: repartition before writing to table with bucketed partitioning

2024-12-01 Thread Soumasish
…we'll use the output as an input. Anyway, glad you've got a solution. Best Regards Soumasish Goswami in: www.linkedin.com/in/soumasish # (415) 530-0405 - On Sun, Dec 1, 2024 at 4:29 AM Henryk Česnolovič <henryk.cesnolo...@gmail.com> wrote:
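For anyone landing here from search, a hedged sketch of the repartition-before-write idea under discussion: repartition on the identity partition columns before writing. Matching the bucket(10, y) transform exactly requires Iceberg's bucket function (exposed through its Spark SQL extensions), or you can set the table property write.distribution-mode=hash and let Iceberg request the right distribution from Spark itself:

    from pyspark.sql import functions as F

    # Repartition on the identity partition columns; the bucket transform
    # on y is left to Iceberg (see write.distribution-mode above).
    (df.repartition(F.col("date"), F.col("x"))
       .writeTo("my_catalog.default.some_table")
       .append())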

Re: [Spark SQL]: Are SQL User-Defined Functions on the Roadmap?

2025-02-05 Thread Soumasish
Here, I created one: https://issues.apache.org/jira/browse/SPARK-51102 Best Regards Soumasish Goswami in: www.linkedin.com/in/soumasish # (415) 530-0405 - On Wed, Feb 5, 2025 at 4:49 PM Frank Bertsch wrote: > Thank you Mich. > > Hi Folks, any lead on this? Just a pointer to a Jira
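For context, the feature being asked about is a UDF defined entirely in SQL, with no Python or Scala body. A sketch of what that looks like (to my knowledge this shipped with Spark 4.0, so treat it as illustrative rather than available on 3.x; the area function is a made-up example):

    # SQL-defined UDF: the body is a plain SQL expression
    spark.sql("""
        CREATE OR REPLACE FUNCTION area(x DOUBLE, y DOUBLE)
        RETURNS DOUBLE
        RETURN x * y
    """)
    spark.sql("SELECT area(3.0, 4.0)").show()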

Re: S3 Metrics when reading/writing using Spark

2024-12-23 Thread Soumasish
Spark leverages the Hadoop S3A connector: https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/connecting.html The specifics of S3 metrics are documented by AWS: https://docs.aws.amazon.com/AmazonS3/latest/userguide/metrics-dimensions.html Hope this helps. Best Regards Soumasish
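A minimal sketch of the S3A path in practice; the bucket name and prefix are placeholders, and the credentials provider shown is just one of the options the linked Hadoop page covers. The S3 request metrics themselves live in CloudWatch and are configured on the AWS side, not in Spark:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("s3a-read")
             # One of several supported providers; see the hadoop-aws docs
             .config("spark.hadoop.fs.s3a.aws.credentials.provider",
                     "com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
             .getOrCreate())

    df = spark.read.parquet("s3a://my-bucket/some/prefix/")  # placeholder bucket/path
    df.show()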

Re: Multiple CVE issues in apache/spark-py:3.4.0 + Pyspark 3.4.0

2025-03-15 Thread Soumasish
Two things come to mind as low-hanging fruit: update to Spark 3.5, which should reduce the CVEs, or alternatively consider Spark Connect, where you can address the client-side vulnerabilities yourself. Best Regards Soumasish Goswami in: www.linkedin.com/in/soumasish # (415) 530-0405 - On
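To make the Spark Connect suggestion concrete: the client is a thin library you can upgrade or patch independently of the server. A minimal client-side sketch (Spark 3.4+; host and port are placeholders):

    from pyspark.sql import SparkSession

    # Connect to a Spark Connect server instead of embedding a full driver
    spark = SparkSession.builder.remote("sc://spark-connect-host:15002").getOrCreate()
    spark.range(5).show()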

Re: Executors not getting released dynamically once task is over

2025-04-04 Thread Soumasish
Soumasish Goswami in: www.linkedin.com/in/soumasish # (415) 530-0405 - On Fri, Apr 4, 2025 at 4:43 AM Shivang Modi wrote: > Hi Team, > > We are using Spark (Java) 3.5.3 and we have a requirement to run a batch of > millions of transactions, i.e. I am running a batch of 1M
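For anyone hitting the same symptom, these are the settings that usually govern whether idle executors are released (values are illustrative). Without an external shuffle service, shuffle tracking must be enabled, and executors still holding shuffle data are only reclaimed after the tracking timeout:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .config("spark.dynamicAllocation.enabled", "true")
             # Required when no external shuffle service is available
             .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
             .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
             .config("spark.dynamicAllocation.minExecutors", "0")
             .getOrCreate())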

Re: [PYSPARK] df.collect throws exception for MapType with ArrayType as key

2025-05-23 Thread Soumasish
…array type is actually usable as a map key. This, to my mind, is a bug, and you should file a bug ticket. Best Regards Soumasish Goswami in: www.linkedin.com/in/soumasish # (415) 530-0405 - On Fri, May 23, 2025 at 5:04 AM Eyck Troschke wrote: > Dear Spark Development Community, > > Accord
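A minimal repro sketch of what the thread describes (untested here): the schema builds fine on the JVM side, but collect() has to turn the map into a Python dict, and Python dicts cannot use lists as keys:

    from pyspark.sql import functions as F

    df = spark.range(1).select(
        F.create_map(F.array(F.lit(1), F.lit(2)), F.lit("v")).alias("m")
    )
    df.printSchema()  # m: map<array<int>,string>
    df.collect()      # reported to raise an exception in PySpark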