golfalot commented on code in PR #1673: URL: https://github.com/apache/sedona/pull/1673#discussion_r1827461329
########## docs/setup/azure-synapse-analytics.md: ########## @@ -0,0 +1,207 @@ +This tutorial will guide you through the process of installing Sedona on Azure Synapse Analytics when Data Exfiltration Protection (DEP) is enabled or when you have no internet connection from the Spark pools due to other networking constraints. + +## Strong recommendations +1. Start with a clean Spark pool with no other packages installed to avoid package conflicts. +2. Apache Spark pool -> Apache Spark configuration: Use default configuration + +## Sedona 1.6.1 on Spark 3.4 Python 3.10 + +### Step1: Download packages (9) +Caution: Precise versions are critical, latest is not always greatest here. + +From Maven + +- [sedona-spark-shaded-3.4_2.12-1.6.1.jar](https://mvnrepository.com/artifact/org.apache.sedona/sedona-spark-shaded-3.4_2.12/1.6.1) + +- [geotools-wrapper-1.6.1-28.2.jar](https://mvnrepository.com/artifact/org.datasyslab/geotools-wrapper/1.6.1-28.2) + +From PyPi + +- [rasterio-1.4.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl](https://files.pythonhosted.org/packages/cd/ad/2d3a14e5a97ca827a38d4963b86071267a6cd09d45065cd753d7325699b6/rasterio-1.4.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) + +- [shapely-2.0.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl](https://files.pythonhosted.org/packages/2b/a6/302e0d9c210ccf4d1ffadf7ab941797d3255dcd5f93daa73aaf116a4db39/shapely-2.0.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) + +- [apache_sedona-1.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl](https://files.pythonhosted.org/packages/b6/71/09f7ca2b6697b2699c04d1649bb379182076d263a9849de81295d253220d/apache_sedona-1.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) + +- [click_plugins-1.1.1-py2.py3-none-any.whl](https://files.pythonhosted.org/packages/e9/da/824b92d9942f4e472702488857914bdd50f73021efea15b4cad9aca8ecef/click_plugins-1.1.1-py2.py3-none-any.whl) + +- [cligj-0.7.2-py3-none-any.whl](https://files.pythonhosted.org/packages/73/86/43fa9f15c5b9fb6e82620428827cd3c284aa933431405d1bcf5231ae3d3e/cligj-0.7.2-py3-none-any.whl) + +- [affine-2.4.0-py3-none-any.whl](https://files.pythonhosted.org/packages/0b/f7/85273299ab57117850cc0a936c64151171fac4da49bc6fba0dad984a7c5f/affine-2.4.0-py3-none-any.whl) + +- [numpy-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl](https://files.pythonhosted.org/packages/fb/25/ba023652a39a2c127200e85aed975fc6119b421e2c348e5d0171e2046edb/numpy-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) + + +### Step 2: Upload packages to Synapse Workspace + +https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-workspace-packages + +### Step 3: Add packages to Spark Pool +I used the second method on this page: **If you are updating from the Synapse Studio** + +https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-pool-packages#manage-packages-from-synapse-studio-or-azure-portal + + +### Step 4: Notebook +Start your notebook with: +```python +from sedona.spark import SedonaContext + +config = SedonaContext.builder() \ + .config('spark.jars.packages', + 'org.apache.sedona:sedona-spark-shaded-3.4_2.12-1.6.1,' Review Comment: Hi @jiayuasu. If we do this, we are almost certainly forcing the user into the path detailed for configuring the VM and going to into package diagnosis mode. This is technically onerous and very time consuming, and likely not within the grasp of the average Sedona user. I would like to advocate we focus in this first round of docs in getting people up and running with a guaranteed working environment. Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
