This is an automated email from the ASF dual-hosted git repository.
jiayu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/sedona.git
The following commit(s) were added to refs/heads/master by this push:
new 93ebfe14da [GH-2182] Add blacken-docs with pre-commit to run Black on Python code blocks in documentation files (#2190)
93ebfe14da is described below
commit 93ebfe14da720f042f35d9f8dc662392bd1ccd68
Author: Gautam Kumar <[email protected]>
AuthorDate: Thu Jul 31 23:43:18 2025 +0530
[GH-2182] Add blacken-docs with pre-commit to run Black on Python code blocks in documentation files (#2190)
* Fix code blocks for blacken-docs compliance
* Upgraded Black to the latest version
* Removed extra spaces
* Merged two blocks
* Update docs/setup/glue.md
* Added a newline to differentiate the blocks
---------
Co-authored-by: John Bampton <[email protected]>
---
.pre-commit-config.yaml | 5 ++
docs/api/sql/Visualization-SedonaKepler.md | 6 +-
docs/api/sql/Visualization-SedonaPyDeck.md | 61 ++++++++++---
docs/api/stats/sql.md | 9 +-
docs/setup/databricks.md | 13 +--
docs/setup/fabric.md | 4 +-
docs/setup/glue.md | 4 +-
docs/tutorial/concepts/distance-spark.md | 63 ++++++-------
docs/tutorial/flink/pyflink-sql.md | 11 +--
docs/tutorial/geopandas-shapely.md | 137 ++++++++---------------------
docs/tutorial/rdd.md | 22 +++--
docs/tutorial/sql.md | 23 +++--
12 files changed, 172 insertions(+), 186 deletions(-)
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index d941bace44..676267ef3b 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -325,3 +325,8 @@ repos:
name: run oxipng
description: check PNG files with oxipng
args: ['--fix', '-o', '4', '--strip', 'safe', '--alpha']
+ - repo: https://github.com/adamchainz/blacken-docs
+ rev: 1.19.1
+ hooks:
+ - id: blacken-docs
+ additional_dependencies: [black==25.1.0]
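For context on what the new hook does: blacken-docs extracts fenced `python` blocks from Markdown files and runs Black over them, which is exactly the rewrite visible throughout this diff. A minimal before/after sketch (illustrative only; `add_df` is one of the signatures reformatted below):

```python
from __future__ import annotations  # keeps this signature-only sketch importable

# Before blacken-docs: single quotes, no spaces around "=" in the defaults.
# add_df(map, df: SedonaDataFrame, name: str='unnamed')


# After blacken-docs: Black style, with "..." standing in for the omitted body
# so the signature-only snippet still parses as valid Python.
def add_df(map, df: SedonaDataFrame, name: str = "unnamed"): ...
```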
diff --git a/docs/api/sql/Visualization-SedonaKepler.md b/docs/api/sql/Visualization-SedonaKepler.md
index fa4fb4b302..220109caa3 100644
--- a/docs/api/sql/Visualization-SedonaKepler.md
+++ b/docs/api/sql/Visualization-SedonaKepler.md
@@ -38,7 +38,9 @@ Following are details on all the APIs exposed via SedonaKepler:
SedonaKepler exposes a create_map API with the following signature:
```python
-create_map(df: SedonaDataFrame=None, name: str='unnamed', config: dict=None) -> map
+def create_map(
+ df: SedonaDataFrame = None, name: str = "unnamed", config: dict = None
+) -> map: ...
```
The parameter 'name' is used to associate the passed SedonaDataFrame in the map object and any config applied to the map is linked to this name. It is recommended you pass a unique identifier to the dataframe here.
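For reference, a hedged usage sketch of this API, assuming a running SedonaContext and a Sedona DataFrame `airports_df` with a geometry column (both names are illustrative):

```python
from sedona.spark import SedonaKepler

# "airports" is the identifier any saved map config will refer back to,
# per the note above about passing a unique name.
kepler_map = SedonaKepler.create_map(df=airports_df, name="airports")
```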
@@ -63,7 +65,7 @@ A map config can be passed optionally to apply pre-apply customizations to the m
SedonaKepler exposes an add_df API with the following signature:
```python
-add_df(map, df: SedonaDataFrame, name: str='unnamed')
+def add_df(map, df: SedonaDataFrame, name: str = "unnamed"): ...
```
This API can be used to add a SedonaDataFrame to an already created map object. The map object passed is directly mutated and nothing is returned.
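A hedged follow-on to the sketch above, assuming `kepler_map` was returned by `create_map` and `counties_df` is another Sedona DataFrame (illustrative names):

```python
from sedona.spark import SedonaKepler

# add_df mutates kepler_map in place and returns nothing, as noted above.
SedonaKepler.add_df(kepler_map, counties_df, name="counties")
```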
diff --git a/docs/api/sql/Visualization-SedonaPyDeck.md b/docs/api/sql/Visualization-SedonaPyDeck.md
index 62a4f061a2..fd7eca37da 100644
--- a/docs/api/sql/Visualization-SedonaPyDeck.md
+++ b/docs/api/sql/Visualization-SedonaPyDeck.md
@@ -41,9 +41,17 @@ Following are details on all the APIs exposed via SedonaPyDeck:
### **Geometry Map**
```python
-def create_geometry_map(df, fill_color="[85, 183, 177, 255]", line_color="[85, 183, 177, 255]",
-                        elevation_col=0, initial_view_state=None,
-                        map_style=None, map_provider=None, api_keys=None, stroked=True):
+def create_geometry_map(
+ df,
+ fill_color="[85, 183, 177, 255]",
+ line_color="[85, 183, 177, 255]",
+ elevation_col=0,
+ initial_view_state=None,
+ map_style=None,
+ map_provider=None,
+ api_keys=None,
+ stroked=True,
+): ...
```
The parameter `fill_color` can be given a list of RGB/RGBA values, or a string that contains RGB/RGBA values based on a column, and is used to color polygons or point geometries in the map
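A hedged usage sketch, assuming `buildings_df` is a Sedona DataFrame of polygon geometries (illustrative name); only `df` is required, and the remaining parameters keep the defaults shown above:

```python
from sedona.spark import SedonaPyDeck

# Renders the geometries with the default fill and line colors; a column-based
# string such as "[0, 12, 240, AirportCount * 10]" could color rows individually.
deck_map = SedonaPyDeck.create_geometry_map(df=buildings_df)
```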
@@ -60,8 +68,17 @@ More details on the parameters and their default values can be found on the PyDe
### **Choropleth Map**
```python
-def create_choropleth_map(df, fill_color=None, plot_col=None, initial_view_state=None, map_style=None,
-                          map_provider=None, api_keys=None, elevation_col=0, stroked=True)
+def create_choropleth_map(
+ df,
+ fill_color=None,
+ plot_col=None,
+ initial_view_state=None,
+ map_style=None,
+ map_provider=None,
+ api_keys=None,
+ elevation_col=0,
+ stroked=True,
+): ...
```
The parameter `fill_color` can be given a list of RGB/RGBA values, or a string that contains RGB/RGBA values based on a column.
@@ -71,9 +88,11 @@ The parameter `stroked` determines whether to draw an outline around polygons an
For example, all these are valid values of fill_color:
```python
-fill_color=[255, 12, 250]
-fill_color=[0, 12, 250, 255]
-fill_color='[0, 12, 240, AirportCount * 10]' ## AirportCount is a column in the passed df
+fill_color = [255, 12, 250]
+fill_color = [0, 12, 250, 255]
+fill_color = (
+ "[0, 12, 240, AirportCount * 10]" ## AirportCount is a column in the
passed df
+)
```
Instead of giving a `fill_color` parameter, a 'plot_col' can be passed which specifies the column to decide the choropleth.
@@ -87,8 +106,18 @@ More details on the parameters and their default values can be found on the PyDe
### **Scatterplot**
```python
-def create_scatterplot_map(df, fill_color="[255, 140, 0]", radius_col=1, radius_min_pixels = 1, radius_max_pixels = 10, radius_scale=1, initial_view_state=None,
-                           map_style=None, map_provider=None, api_keys=None)
+def create_scatterplot_map(
+ df,
+ fill_color="[255, 140, 0]",
+ radius_col=1,
+ radius_min_pixels=1,
+ radius_max_pixels=10,
+ radius_scale=1,
+ initial_view_state=None,
+ map_style=None,
+ map_provider=None,
+ api_keys=None,
+): ...
```
The parameter `fill_color` can be given a list of RGB/RGBA values, or a string that contains RGB/RGBA values based on a column.
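A hedged usage sketch, assuming `crimes_df` is a Sedona DataFrame of point geometries (illustrative name):

```python
from sedona.spark import SedonaPyDeck

# Default orange points; radius_col can instead name a numeric column
# to size each point, per the signature above.
deck_map = SedonaPyDeck.create_scatterplot_map(df=crimes_df)
```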
@@ -107,8 +136,16 @@ More details on the parameters and their default values can be found on the PyDe
### **Heatmap**
```python
-def create_heatmap(df, color_range=None, weight=1, aggregation="SUM", initial_view_state=None, map_style=None,
-                   map_provider=None, api_keys=None)
+def create_heatmap(
+ df,
+ color_range=None,
+ weight=1,
+ aggregation="SUM",
+ initial_view_state=None,
+ map_style=None,
+ map_provider=None,
+ api_keys=None,
+): ...
```
The parameter `color_range` can optionally be given a list of RGB values; SedonaPyDeck by default uses `6-class YlOrRd` as the color_range.
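A hedged usage sketch, again assuming an illustrative point DataFrame `crimes_df`; `color_range` and `aggregation` keep the defaults described above:

```python
from sedona.spark import SedonaPyDeck

# Weights every point equally (weight=1) and aggregates with SUM by default.
deck_map = SedonaPyDeck.create_heatmap(df=crimes_df)
```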
diff --git a/docs/api/stats/sql.md b/docs/api/stats/sql.md
index 005e647a9f..197d13b1a6 100644
--- a/docs/api/stats/sql.md
+++ b/docs/api/stats/sql.md
@@ -198,11 +198,7 @@ To use the [Apache Sedona weight functions](#adddistancebandcolumn) you need to
from sedona.spark.stats.autocorrelation.moran import Moran
from sedona.spark.stats.weighting import add_binary_distance_band_column
- result = add_binary_distance_band_column(
- df,
- 1.0,
- saved_attributes=["id", "value"]
- )
+    result = add_binary_distance_band_column(df, 1.0, saved_attributes=["id", "value"])
moran_i_result = Moran.get_global(result)
@@ -241,7 +237,8 @@ The full signatures of the functions
two_tailed: bool = True,
id_column: str = "id",
value_column: str = "value",
- ) -> MoranResult
+ ) -> MoranResult: ...
+
@dataclass
class MoranResult:
diff --git a/docs/setup/databricks.md b/docs/setup/databricks.md
index 6a84cd3a19..8ba2037a76 100644
--- a/docs/setup/databricks.md
+++ b/docs/setup/databricks.md
@@ -142,11 +142,14 @@ You can also use the SQL API as follows:
Here’s how to create a Sedona DataFrame with a geometry column
```python
-df = sedona.createDataFrame([
- ('a', 'POLYGON((1.0 1.0,1.0 3.0,2.0 3.0,2.0 1.0,1.0 1.0))'),
- ('b', 'LINESTRING(4.0 1.0,4.0 2.0,6.0 4.0)'),
- ('c', 'POINT(9.0 2.0)'),
-], ["id", "geometry"])
+df = sedona.createDataFrame(
+ [
+ ("a", "POLYGON((1.0 1.0,1.0 3.0,2.0 3.0,2.0 1.0,1.0 1.0))"),
+ ("b", "LINESTRING(4.0 1.0,4.0 2.0,6.0 4.0)"),
+ ("c", "POINT(9.0 2.0)"),
+ ],
+ ["id", "geometry"],
+)
df = df.withColumn("geometry", expr("ST_GeomFromWKT(geometry)"))
```
diff --git a/docs/setup/fabric.md b/docs/setup/fabric.md
index 43b2baac54..a76ca388c6 100644
--- a/docs/setup/fabric.md
+++ b/docs/setup/fabric.md
@@ -80,7 +80,7 @@ In the notebook page, select the `ApacheSedona` environment you created before.
In the notebook, you can install the jars by running the following code.
Please replace the `jars` with the download links of the 2 jars from the previous step.
-```python
+```text
%%configure -f
{
"jars":
["https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.5.1-28.2/geotools-wrapper-1.5.1-28.2.jar",
"https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.4_2.12/1.5.1/sedona-spark-shaded-3.4_2.12-1.5.1.jar"]
@@ -125,7 +125,7 @@ abfss://9e9d4196-870a-4901-8fa5-e24841492...@onelake.dfs.fabric.microsoft.com/e1
If you use this option, the config files in your notebook should be
-```python
+```text
%%configure -f
{
"conf": {
diff --git a/docs/setup/glue.md b/docs/setup/glue.md
index 1a16f309b0..1937836764 100644
--- a/docs/setup/glue.md
+++ b/docs/setup/glue.md
@@ -50,7 +50,7 @@ package. How you do this varies slightly between the notebook and the script job
Add the following cell magics before starting your sparkContext or glueContext. The first points to the jars,
and the second installs the Sedona Python package directly from pip.
-```python
+```text
# Sedona Config
%extra_jars https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.3_2.12/{{ sedona.current_version }}/sedona-spark-shaded-3.3_2.12-{{ sedona.current_version }}.jar, https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/{{ sedona.current_geotools }}/geotools-wrapper-{{ sedona.current_geotools }}.jar
%additional_python_modules apache-sedona=={{ sedona.current_version }}
@@ -58,7 +58,7 @@ and the second installs the Sedona Python package directly from pip.
If you are using the example notebook from glue, the first cell should now look like this:
-```python
+```text
%idle_timeout 2880
%glue_version 4.0
%worker_type G.1X
diff --git a/docs/tutorial/concepts/distance-spark.md b/docs/tutorial/concepts/distance-spark.md
index 52c52e144c..5bc61df922 100644
--- a/docs/tutorial/concepts/distance-spark.md
+++ b/docs/tutorial/concepts/distance-spark.md
@@ -34,10 +34,13 @@ Suppose you have four points and would like to compute the distance between `poi
Let’s create a DataFrame with these points.
```python
-df = sedona.createDataFrame([
- (Point(2, 3), Point(6, 4)),
- (Point(6, 2), Point(9, 2)),
-], ["start", "end"])
+df = sedona.createDataFrame(
+ [
+ (Point(2, 3), Point(6, 4)),
+ (Point(6, 2), Point(9, 2)),
+ ],
+ ["start", "end"],
+)
```
The `start` and `end` columns both have the `geometry` type.
@@ -45,10 +48,7 @@ The `start` and `end` columns both have the `geometry` type.
Now use the `ST_Distance` function to compute the distance between the points.
```python
-df.withColumn(
- "distance",
- ST_Distance(col("start"), col("end"))
-).show()
+df.withColumn("distance", ST_Distance(col("start"), col("end"))).show()
```
Here are the results:
@@ -85,8 +85,7 @@ Let’s compute the distance between these points now:
```python
df.withColumn(
- "st_distance_sphere",
- ST_DistanceSphere(col("place1"), col("place2"))
+ "st_distance_sphere", ST_DistanceSphere(col("place1"), col("place2"))
).show()
```
@@ -111,8 +110,7 @@ Let’s use the same DataFrame from the previous section, but compute the distan
```python
res = df.withColumn(
- "st_distance_spheroid",
- ST_DistanceSpheroid(col("place1"), col("place2"))
+ "st_distance_spheroid", ST_DistanceSpheroid(col("place1"), col("place2"))
)
res.select("place1_name", "place2_name", "st_distance_spheroid").show()
```
@@ -141,10 +139,7 @@ The distance between two polygons is the minimum Euclidean distance between any
Let’s compute the distance:
```python
-res = df.withColumn(
- "distance",
- ST_Distance(col("geom1"), col("geom2"))
-)
+res = df.withColumn("distance", ST_Distance(col("geom1"), col("geom2")))
```
Now, take a look at the results:
@@ -170,20 +165,19 @@ Let’s create the DataFrame:
```python
empire_state_ground = Point(-73.9857, 40.7484, 0)
empire_state_top = Point(-73.9857, 40.7484, 380)
-df = sedona.createDataFrame([
- (empire_state_ground, empire_state_top),
-], ["point_a", "point_b"])
+df = sedona.createDataFrame(
+ [
+ (empire_state_ground, empire_state_top),
+ ],
+ ["point_a", "point_b"],
+)
```
Now compute the distance and the 3D distance between the points:
```python
-res = df.withColumn(
- "distance",
- ST_Distance(col("point_a"), col("point_b"))
-).withColumn(
- "3d_distance",
- ST_3DDistance(col("point_a"), col("point_b"))
+res = df.withColumn("distance", ST_Distance(col("point_a"),
col("point_b"))).withColumn(
+ "3d_distance", ST_3DDistance(col("point_a"), col("point_b"))
)
```
@@ -211,18 +205,20 @@ Here’s how to create the Sedona DataFrame:
a = LineString([(1, 1), (1, 3), (2, 4)])
b = LineString([(1.1, 1), (1.1, 3), (3, 4)])
c = LineString([(7, 1), (7, 3), (6, 4)])
-df = sedona.createDataFrame([
- (a, "a", b, "b"),
- (a, "a", c, "c"),
-], ["geometry1", "geometry1_id", "geometry2", "geometry2_id"])
+df = sedona.createDataFrame(
+ [
+ (a, "a", b, "b"),
+ (a, "a", c, "c"),
+ ],
+ ["geometry1", "geometry1_id", "geometry2", "geometry2_id"],
+)
```
Compute the Frechet distance:
```python
res = df.withColumn(
- "frechet_distance",
- ST_FrechetDistance(col("geometry1"), col("geometry2"))
+ "frechet_distance", ST_FrechetDistance(col("geometry1"), col("geometry2"))
)
```
@@ -252,10 +248,7 @@ Suppose you have the following geometric objects:
Here’s how to compute the max distance between some of these geometries. Run the computations:
```python
-res = df.withColumn(
- "max_distance",
- ST_MaxDistance(col("geom1"), col("geom2"))
-)
+res = df.withColumn("max_distance", ST_MaxDistance(col("geom1"), col("geom2")))
```
Now view the results:
diff --git a/docs/tutorial/flink/pyflink-sql.md b/docs/tutorial/flink/pyflink-sql.md
index 20a58f59ba..41a8170622 100644
--- a/docs/tutorial/flink/pyflink-sql.md
+++ b/docs/tutorial/flink/pyflink-sql.md
@@ -29,9 +29,7 @@ stream_env = StreamExecutionEnvironment.get_execution_environment()
flink_settings = EnvironmentSettings.in_streaming_mode()
table_env = SedonaContext.create(stream_env, flink_settings)
-table_env.\
- sql_query("SELECT ST_Point(1.0, 2.0)").\
- execute()
+table_env.sql_query("SELECT ST_Point(1.0, 2.0)").execute()
```
PyFlink does not expose the possibility of transforming Scala's own user-defined types (UDT) to Python UDT.
@@ -41,10 +39,7 @@ like `ST_AsText` or `ST_ASBinary` to convert the result to a string or binary.
```python
from shapely.wkb import loads
-table_env.\
- sql_query("SELECT ST_ASBinary(ST_Point(1.0, 2.0))").\
- execute().\
- collect()
+table_env.sql_query("SELECT ST_ASBinary(ST_Point(1.0,
2.0))").execute().collect()
[loads(bytes(el[0])) for el in result]
```
@@ -59,11 +54,13 @@ Similar with User Defined Scalar functions
from pyflink.table.udf import ScalarFunction, udf
from shapely.wkb import loads
+
class Buffer(ScalarFunction):
def eval(self, s):
geom = loads(s)
return geom.buffer(1).wkb
+
table_env.create_temporary_function(
"ST_BufferPython", udf(Buffer(), result_type="Binary")
)
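A hedged usage sketch of the freshly registered function, assuming the same WKB round trip as the query earlier on this page (the call chain and the `ST_AsBinary` spelling are an approximation based on the snippets above, not verbatim from this commit):

```python
from shapely.wkb import loads

# ST_BufferPython consumes WKB and returns WKB, per the Buffer class above,
# so shapely can load the buffered geometry back on the Python side.
rows = (
    table_env.sql_query("SELECT ST_BufferPython(ST_AsBinary(ST_Point(1.0, 2.0)))")
    .execute()
    .collect()
)
buffered = [loads(bytes(row[0])) for row in rows]
```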
diff --git a/docs/tutorial/geopandas-shapely.md b/docs/tutorial/geopandas-shapely.md
index 66c4c068f3..54e9293c12 100644
--- a/docs/tutorial/geopandas-shapely.md
+++ b/docs/tutorial/geopandas-shapely.md
@@ -33,21 +33,16 @@ Sedona Python has implemented serializers and deserializers which allows to conv
Load the data from a shapefile using the geopandas read_file method and create a Spark DataFrame from the GeoDataFrame:
```python
-
import geopandas as gpd
from sedona.spark import *
-config = SedonaContext.builder().\
- getOrCreate()
+config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)
gdf = gpd.read_file("gis_osm_pois_free_1.shp")
-sedona.createDataFrame(
- gdf
-).show()
-
+sedona.createDataFrame(gdf).show()
```
This query will show the following outputs:
@@ -71,7 +66,9 @@ To leverage Arrow optimization and speed up the conversion, you can use the `cre
that takes a SparkSession and GeoDataFrame as parameters and returns a Sedona DataFrame.
```python
-def create_spatial_dataframe(spark: SparkSession, gdf: gpd.GeoDataFrame) -> DataFrame
+def create_spatial_dataframe(
+ spark: SparkSession, gdf: gpd.GeoDataFrame
+) -> DataFrame: ...
```
- spark: SparkSession
@@ -91,26 +88,20 @@ create_spatial_dataframe(spark, gdf)
Reading data with Spark and converting to GeoPandas
```python
-
import geopandas as gpd
from sedona.spark import *
-config = SedonaContext.builder().
- getOrCreate()
+config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)
-counties = sedona.\
- read.\
- option("delimiter", "|").\
- option("header", "true").\
- csv("counties.csv")
+counties = (
+ sedona.read.option("delimiter", "|").option("header",
"true").csv("counties.csv")
+)
counties.createOrReplaceTempView("county")
-counties_geom = sedona.sql(
- "SELECT *, st_geomFromWKT(geom) as geometry from county"
-)
+counties_geom = sedona.sql("SELECT *, st_geomFromWKT(geom) as geometry from
county")
df = counties_geom.toPandas()
gdf = gpd.GeoDataFrame(df, geometry="geometry")
@@ -119,11 +110,10 @@ gdf.plot(
figsize=(10, 8),
column="value",
legend=True,
- cmap='YlOrBr',
- scheme='quantiles',
- edgecolor='lightgray'
+ cmap="YlOrBr",
+ scheme="quantiles",
+ edgecolor="lightgray",
)
-
```
<br>
@@ -141,8 +131,7 @@ significantly faster for large results (requires geopandas >= 1.0).
import geopandas as gpd
from sedona.spark import dataframe_to_arrow
-config = SedonaContext.builder().
- getOrCreate()
+config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)
@@ -173,7 +162,6 @@ To create Spark DataFrame based on mentioned Geometry types, please use <b> Geom
The schema for a target table with an integer id and a geometry type can be defined as follows:
```python
-
from pyspark.sql.types import IntegerType, StructField, StructType
from sedona.spark import *
@@ -181,10 +169,9 @@ from sedona.spark import *
schema = StructType(
[
StructField("id", IntegerType(), False),
- StructField("geom", GeometryType(), False)
+ StructField("geom", GeometryType(), False),
]
)
-
```
Also, a Spark DataFrame with a geometry type can be converted to a list of shapely objects with the <b>collect</b> method.
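A hedged sketch of that conversion, assuming a DataFrame shaped like the examples below, with an integer `id` and a geometry column named `geom`:

```python
# collect() returns Row objects whose geometry fields are already shapely
# geometries, thanks to the serializers mentioned at the top of this page.
shapely_geoms = [row.geom for row in gdf.select("geom").collect()]
```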
@@ -194,20 +181,12 @@ Also, Spark DataFrame with geometry type can be converted to list of shapely obj
```python
from shapely.geometry import Point
-data = [
- [1, Point(21.0, 52.0)],
- [1, Point(23.0, 42.0)],
- [1, Point(26.0, 32.0)]
-]
+data = [[1, Point(21.0, 52.0)], [1, Point(23.0, 42.0)], [1, Point(26.0, 32.0)]]
-gdf = sedona.createDataFrame(
- data,
- schema
-)
+gdf = sedona.createDataFrame(data, schema)
gdf.show()
-
```
```
@@ -233,18 +212,11 @@ root
### MultiPoint example
```python3
-
from shapely.geometry import MultiPoint
-data = [
- [1, MultiPoint([[19.511463, 51.765158], [19.446408, 51.779752]])]
-]
-
-gdf = sedona.createDataFrame(
- data,
- schema
-).show(1, False)
+data = [[1, MultiPoint([[19.511463, 51.765158], [19.446408, 51.779752]])]]
+gdf = sedona.createDataFrame(data, schema).show(1, False)
```
```
@@ -261,22 +233,15 @@ gdf = sedona.createDataFrame(
### LineString example
```python3
-
from shapely.geometry import LineString
line = [(40, 40), (30, 30), (40, 20), (30, 10)]
-data = [
- [1, LineString(line)]
-]
+data = [[1, LineString(line)]]
-gdf = sedona.createDataFrame(
- data,
- schema
-)
+gdf = sedona.createDataFrame(data, schema)
gdf.show(1, False)
-
```
```
@@ -292,23 +257,16 @@ gdf.show(1, False)
### MultiLineString example
```python3
-
from shapely.geometry import MultiLineString
line1 = [(10, 10), (20, 20), (10, 40)]
line2 = [(40, 40), (30, 30), (40, 20), (30, 10)]
-data = [
- [1, MultiLineString([line1, line2])]
-]
+data = [[1, MultiLineString([line1, line2])]]
-gdf = sedona.createDataFrame(
- data,
- schema
-)
+gdf = sedona.createDataFrame(data, schema)
gdf.show(1, False)
-
```
```
@@ -324,30 +282,23 @@ gdf.show(1, False)
### Polygon example
```python3
-
from shapely.geometry import Polygon
polygon = Polygon(
[
- [19.51121, 51.76426],
- [19.51056, 51.76583],
- [19.51216, 51.76599],
- [19.51280, 51.76448],
- [19.51121, 51.76426]
+ [19.51121, 51.76426],
+ [19.51056, 51.76583],
+ [19.51216, 51.76599],
+ [19.51280, 51.76448],
+ [19.51121, 51.76426],
]
)
-data = [
- [1, polygon]
-]
+data = [[1, polygon]]
-gdf = sedona.createDataFrame(
- data,
- schema
-)
+gdf = sedona.createDataFrame(data, schema)
gdf.show(1, False)
-
```
```
@@ -363,7 +314,6 @@ gdf.show(1, False)
### MultiPolygon example
```python3
-
from shapely.geometry import MultiPolygon
exterior_p1 = [(0, 0), (0, 2), (2, 2), (2, 0), (0, 0)]
@@ -371,22 +321,13 @@ interior_p1 = [(1, 1), (1, 1.5), (1.5, 1.5), (1.5, 1), (1, 1)]
exterior_p2 = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
-polygons = [
- Polygon(exterior_p1, [interior_p1]),
- Polygon(exterior_p2)
-]
+polygons = [Polygon(exterior_p1, [interior_p1]), Polygon(exterior_p2)]
-data = [
- [1, MultiPolygon(polygons)]
-]
+data = [[1, MultiPolygon(polygons)]]
-gdf = sedona.createDataFrame(
- data,
- schema
-)
+gdf = sedona.createDataFrame(data, schema)
gdf.show(1, False)
-
```
```
@@ -402,7 +343,6 @@ gdf.show(1, False)
### GeometryCollection example
```python3
-
from shapely.geometry import GeometryCollection, Point, LineString, Polygon
exterior_p1 = [(0, 0), (0, 2), (2, 2), (2, 0), (0, 0)]
@@ -413,17 +353,12 @@ geoms = [
Polygon(exterior_p1, [interior_p1]),
Polygon(exterior_p2),
Point(1, 1),
- LineString([(0, 0), (1, 1), (2, 2)])
+ LineString([(0, 0), (1, 1), (2, 2)]),
]
-data = [
- [1, GeometryCollection(geoms)]
-]
+data = [[1, GeometryCollection(geoms)]]
-gdf = sedona.createDataFrame(
- data,
- schema
-)
+gdf = sedona.createDataFrame(data, schema)
gdf.show(1, False)
```
diff --git a/docs/tutorial/rdd.md b/docs/tutorial/rdd.md
index 23b94d519f..fcca35c272 100644
--- a/docs/tutorial/rdd.md
+++ b/docs/tutorial/rdd.md
@@ -184,9 +184,13 @@ Assume you now have a SpatialRDD (typed or generic). You can use the following c
from sedona.spark import Adapter
range_query_window = Envelope(-90.01, -80.01, 30.01, 40.01)
-    consider_boundary_intersection = False ## Only return geometries fully covered by the window
+    consider_boundary_intersection = (
+        False  ## Only return geometries fully covered by the window
+    )
using_index = False
-    query_result = RangeQueryRaw.SpatialRangeQuery(spatial_rdd, range_query_window, consider_boundary_intersection, using_index)
+ query_result = RangeQueryRaw.SpatialRangeQuery(
+        spatial_rdd, range_query_window, consider_boundary_intersection, using_index
+ )
gdf = StructuredAdapter.toDf(query_result, spark, ["col1", ..., "coln"])
```
@@ -679,22 +683,26 @@ The index should be built on either one of two SpatialRDDs. In general, you shou
from sedona.spark import CircleRDD
from sedona.spark import GridType
from sedona.spark import JoinQueryRaw
- from sedona.spark import StructuredAdapter
+ from sedona.spark import StructuredAdapter
object_rdd.analyze()
-    circle_rdd = CircleRDD(object_rdd, 0.1) ## Create a CircleRDD using the given distance
+    circle_rdd = CircleRDD(object_rdd, 0.1)  ## Create a CircleRDD using the given distance
circle_rdd.analyze()
circle_rdd.spatialPartitioning(GridType.KDBTREE)
spatial_rdd.spatialPartitioning(circle_rdd.getPartitioner())
-    consider_boundary_intersection = False ## Only return geometries fully covered by each query window in queryWindowRDD
+    consider_boundary_intersection = False  ## Only return geometries fully covered by each query window in queryWindowRDD
using_index = False
-    result = JoinQueryRaw.DistanceJoinQueryFlat(spatial_rdd, circle_rdd, using_index, consider_boundary_intersection)
+ result = JoinQueryRaw.DistanceJoinQueryFlat(
+ spatial_rdd, circle_rdd, using_index, consider_boundary_intersection
+ )
-    gdf = StructuredAdapter.toDf(result, ["left_col1", ..., "left_coln"], ["rightcol1", ..., "rightcol2"], spark)
+    gdf = StructuredAdapter.toDf(
+        result, ["left_col1", ..., "left_coln"], ["rightcol1", ..., "rightcol2"], spark
+    )
```
## Write a Distance Join Query
diff --git a/docs/tutorial/sql.md b/docs/tutorial/sql.md
index 73f4181eab..2088f469a5 100644
--- a/docs/tutorial/sql.md
+++ b/docs/tutorial/sql.md
@@ -241,10 +241,16 @@ Set the `multiLine` option to `True` to read multiline GeoJSON files.
=== "Python"
```python
- df = sedona.read.format("geojson").option("multiLine",
"true").load("PATH/TO/MYFILE.json")
- .selectExpr("explode(features) as features") # Explode the envelope to
get one feature per row.
- .select("features.*") # Unpack the features struct.
- .withColumn("prop0",
f.expr("properties['prop0']")).drop("properties").drop("type")
+ df = (
+ sedona.read.format("geojson")
+ .option("multiLine", "true")
+ .load("PATH/TO/MYFILE.json")
+ .selectExpr("explode(features) as features") # Explode the envelope
+ .select("features.*") # Unpack the features struct
+ .withColumn("prop0", f.expr("properties['prop0']"))
+ .drop("properties")
+ .drop("type")
+ )
df.show()
df.printSchema()
@@ -844,7 +850,7 @@ SedonaPyDeck exposes a `create_choropleth_map` API which can be used to visualiz
Example:
```python
-SedonaPyDeck.create_choropleth_map(df=groupedresult, plot_col='AirportCount')
+SedonaPyDeck.create_choropleth_map(df=groupedresult, plot_col="AirportCount")
```
!!!Note
@@ -862,7 +868,7 @@ SedonaPyDeck exposes a create_geometry_map API which can be used to visualize a
Example:
```python
-SedonaPyDeck.create_geometry_map(df_building, elevation_col='height')
+SedonaPyDeck.create_geometry_map(df_building, elevation_col="height")
```
@@ -1214,7 +1220,9 @@ the normal UDFs. It might be even 2x faster than the normal UDFs.
Decorator signature looks as follows:
```python
-def sedona_vectorized_udf(udf_type: SedonaUDFType = SedonaUDFType.SHAPELY_SCALAR, return_type: DataType)
+def sedona_vectorized_udf(
+    udf_type: SedonaUDFType = SedonaUDFType.SHAPELY_SCALAR, return_type: DataType
+): ...
```
where udf_type is the type of the UDF function, currently supported are:
@@ -1232,6 +1240,7 @@ a given geometry.
import shapely.geometry.base as b
from sedona.spark import sedona_vectorized_udf
+
@sedona_vectorized_udf(return_type=GeometryType())
def vectorized_buffer(geom: b.BaseGeometry) -> b.BaseGeometry:
return geom.buffer(0.1)
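A hedged usage sketch applying the decorated UDF, assuming a DataFrame `df` with a geometry column named `geom` (illustrative names) and that the decorated function can be applied like a regular pandas UDF:

```python
from pyspark.sql.functions import col

# Buffers each geometry by 0.1 via the vectorized_buffer UDF defined above.
buffered_df = df.withColumn("buffered_geom", vectorized_buffer(col("geom")))
```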