jiayuasu opened a new pull request, #2518: URL: https://github.com/apache/sedona/pull/2518
## Did you read the Contributor Guide?

- Yes, I have read the [Contributor Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor Development Guide](https://sedona.apache.org/latest/community/develop/)

## Is this PR related to a ticket?

- Yes, and the PR name follows the format `[GH-2489] my subject`. Closes #2489

## What changes were proposed in this PR?

This pull request updates the project to support Spark 4.0.1 and modernizes the environment by upgrading related dependencies, base images, and build configurations. The changes ensure compatibility with newer versions of Spark, Sedona, GeoTools, and Python libraries, while also simplifying the build process and improving maintainability.

**Environment and Dependency Upgrades**

* Updated the base image in `docker/sedona-docker.dockerfile` to `ubuntu:24.04` and bumped major dependency versions: Spark to 4.0.1, Sedona to 1.8.0, GeoTools-wrapper to 1.8.1-33.1, Spark extension to 2.14.2, Hadoop AWS to 3.4.1, and AWS SDK to 2.38.2.
* Upgraded Python package versions in `docker/requirements.txt`, including `geopandas` to 1.1.1, `numpy` to 1.26.4, `pandas` to 2.3.3, and `shapely` to 2.1.2 for compatibility with new Spark/Sedona versions.

**Build and Workflow Modernization**

* Changed Spark version matrix in `.github/workflows/docker-build.yml` to only test Spark 4.0.1 and updated Java version to 17 for builds.
* Updated Maven build in `docker/build.sh` to use Scala 2.13 and adjusted download/install scripts in `docker/install-sedona.sh` to fetch 2.13 artifacts for Sedona and Spark extension.

**Simplification and Cleanup**

* Removed unused `spark_xml_version` argument and related code from `docker/install-spark.sh` and `docker/sedona-docker.dockerfile`, streamlining Spark installation.

**Configuration Updates**

* Updated Zeppelin interpreter configuration in `docker/zeppelin/conf/interpreter.json` to use new Sedona and GeoTools-wrapper jar paths matching the upgraded versions.

**Package Installation Adjustments**

* Added `--break-system-packages` to `pip3 install` commands throughout the Docker scripts to prevent installation errors with system Python packages.

Let me know if you have questions about any specific upgrade or script change!

## How was this patch tested?

## Did this PR include necessary documentation updates?

- No, this PR does not affect any public API so no need to change the documentation.
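
For reviewers checking the Scala 2.13 artifact change and the new jar paths, here is a minimal sketch of how the upgraded versions map to jar file names. It is not taken from `docker/install-sedona.sh`. The helper names are hypothetical, and the Maven Central base URL, the group IDs (`org.apache.sedona`, `org.datasyslab`), and the `sedona-spark-shaded-<spark major.minor>_<scala>` naming pattern are assumptions based on how Sedona artifacts are usually published.

```python
# Illustrative sketch only: the repository URL, group IDs, and naming pattern
# are assumptions, not the actual contents of docker/install-sedona.sh.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def spark_compat(spark_version: str) -> str:
    """Return the major.minor part of a Spark version, e.g. '4.0.1' -> '4.0'."""
    major, minor, *_ = spark_version.split(".")
    return f"{major}.{minor}"


def sedona_shaded_jar_url(spark_version: str, scala_version: str, sedona_version: str) -> str:
    """Build the assumed URL of the shaded Sedona jar for a Spark/Scala pair."""
    artifact = f"sedona-spark-shaded-{spark_compat(spark_version)}_{scala_version}"
    return (
        f"{MAVEN_CENTRAL}/org/apache/sedona/{artifact}/{sedona_version}/"
        f"{artifact}-{sedona_version}.jar"
    )


def geotools_wrapper_jar_url(geotools_wrapper_version: str) -> str:
    """Build the assumed URL of the GeoTools-wrapper jar (no Scala suffix)."""
    artifact = "geotools-wrap"
    return (
        f"{MAVEN_CENTRAL}/org/datasyslab/{artifact}/{geotools_wrapper_version}/"
        f"{artifact}-{geotools_wrapper_version}.jar"
    )


if __name__ == "__main__":
    # Versions listed in this PR description.
    print(sedona_shaded_jar_url("4.0.1", "2.13", "1.8.0"))
    print(geotools_wrapper_jar_url("1.8.1-33.1"))
```

If the assumed pattern holds, the file names produced here should match the jar paths referenced in `docker/zeppelin/conf/interpreter.json`, which makes for a quick consistency check when versions are bumped again.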
