Re: Java coding with spark API

2025-04-07 Thread Stephen Coy
Hi Tim, We have a large ETL project comprising about forty individual Apache Spark applications, all built exclusively in Java. They are executed on three different Spark clusters built on AWS EC2 instances. The applications are built in Java 17 for Spark 3.5.x. Cheers, Steve C > On 4 Apr 20

Spark Streaming Dataset with Multiple S3 Sources is too Slow

2025-04-07 Thread Jevon Cowell
I have a Spark streaming dataset that is a union of 12 datasets (one for each of 12 different S3 buckets). On startup, it takes 18-20 mins for the Spark Streaming job to show up on the Spark Streaming UI and an additional 18-20 mins for the job to even start. When looking at the logs I see somethin

Is "SORTED BY (col DESC)" Supported for Bucketed Table?

2025-04-07 Thread Joe Lee
Hi Experts, I have a question regarding the supported syntax for bucketed table sort order. By looking at the documentation for Spark 3.5.3, CREATE DATASOURCE TABLE