Jaeggi99 commented on issue #1723:
URL: https://github.com/apache/sedona/issues/1723#issuecomment-2545230039
Hello, I have a similar problem when trying to load "larger" raster files
(65 MB), although my error message is different. I am therefore not opening a
new issue.
**Setup:**
I use Docker Desktop and have pulled the Sedona image with
```
docker pull apache/sedona:latest
```
Then I run the container with:
```
docker run -e DRIVER_MEM=6g -e EXECUTOR_MEM=8g -p 8888:8888 -p 8080:8080 -p 8081:8081 -p 4040:4040 apache/sedona:latest
```
In JupyterLab, I execute the code in the examples notebook
`ApacheSedonaRaster.ipynb` and everything works fine.
When I copy my orthophoto `swissimage-dop10_2021_2637-1223_01_2056.tif`
(download link below) into the examples data folder and try to load and show
it, either in the same examples notebook or in any other notebook, I get the
error message shown further down.
**Notebook Code:**
```
from sedona.spark import *
from IPython.display import display, HTML
```
```
config = (
    SedonaContext.builder()
    .getOrCreate()
)
sedona = SedonaContext.create(config)
```
```
sc = sedona.sparkContext
```
```
swissimage_df = sedona.read.format("binaryFile").load("data/raster/swissimage-dop10_2021_2637-1223_01_2056.tif")
swissimage_df.show(2)
```
**Expected behavior:**
A table showing information about the raster, like this:
```
+--------------------+-------------------+------+--------------------+
| path| modificationTime|length| content|
+--------------------+-------------------+------+--------------------+
|file:/opt/workspa...|2024-12-15 00:21:05|209199|[4D 4D 00 2A 00 0...|
+--------------------+-------------------+------+--------------------+
```
**Actual behaviour:**
Unfortunately, I am also unable to capture more of the error log using
stdout and stderr.
I tried the following code to capture the whole error message:
```
import sys
import traceback

logfile = "cell_output.log"
with open(logfile, "w") as f:
    # Redirect stdout and stderr to the log file
    original_stdout = sys.stdout
    original_stderr = sys.stderr
    sys.stdout = f
    sys.stderr = f
    try:
        print("This will be logged into the file.")
        swissimage_df = sedona.read.format("binaryFile").load("data/raster/swissimage-dop10_2021_2637-1223_01_2056.tif")
        swissimage_df.show(2)
    except Exception:
        # Write the full Python traceback to the redirected stderr
        traceback.print_exc()
    finally:
        # Always restore the original streams
        sys.stdout = original_stdout
        sys.stderr = original_stderr
```
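For reference, the same redirection can be sketched more compactly with the standard library's `contextlib.redirect_stdout` and `contextlib.redirect_stderr`. The helper name `run_and_log` is hypothetical and only illustrates the pattern. As far as I understand, this (like the manual version above) only captures what Python itself writes; anything the JVM prints directly to its own stdout/stderr is not affected by reassigning `sys.stdout` or `sys.stderr`, which may be why no additional log output shows up.

```python
import contextlib
import traceback


def run_and_log(action, logfile="cell_output.log"):
    """Run action() with Python-level stdout/stderr redirected to logfile.

    `run_and_log` is a hypothetical helper, not part of Sedona or PySpark.
    """
    with open(logfile, "w") as f:
        # Both context managers restore the original streams on exit,
        # even if action() raises.
        with contextlib.redirect_stdout(f), contextlib.redirect_stderr(f):
            try:
                action()
            except Exception:
                # print_exc() writes to the current sys.stderr, i.e. the log file.
                traceback.print_exc()
```

In the notebook it could be called as `run_and_log(lambda: swissimage_df.show(2))`.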
Resulting error message in `cell_output.log` from my run:
```
Traceback (most recent call last):
  File "/tmp/ipykernel_45/2611185478.py", line 16, in <module>
    swissimage_df.show(2)
  File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/dataframe.py", line 899, in show
    print(self._jdf.showString(n, 20, vertical))
  File "/usr/local/lib/python3.10/dist-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/usr/local/lib/python3.10/dist-packages/pyspark/errors/exceptions/captured.py", line 169, in deco
    return f(*a, **kw)
  File "/usr/local/lib/python3.10/dist-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o45.showString.
: java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Formatter.parse(Formatter.java:2807)
    at java.base/java.util.Formatter.format(Formatter.java:2763)
    at java.base/java.util.Formatter.format(Formatter.java:2717)
    at java.base/java.lang.String.format(String.java:4150)
    at scala.collection.immutable.StringLike.format(StringLike.scala:354)
    at scala.collection.immutable.StringLike.format$(StringLike.scala:353)
    at scala.collection.immutable.StringOps.format(StringOps.scala:33)
    at org.apache.spark.sql.Dataset.$anonfun$getRows$5(Dataset.scala:293)
    at org.apache.spark.sql.Dataset.$anonfun$getRows$5$adapted(Dataset.scala:293)
    at org.apache.spark.sql.Dataset$$Lambda$3379/0x0000000802066638.apply(Unknown Source)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.TraversableLike$$Lambda$177/0x0000000801203f00.apply(Unknown Source)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofByte.foreach(ArrayOps.scala:210)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.mutable.ArrayOps$ofByte.map(ArrayOps.scala:210)
    at org.apache.spark.sql.Dataset.$anonfun$getRows$4(Dataset.scala:293)
    at org.apache.spark.sql.Dataset$$Lambda$3378/0x0000000802066278.apply(Unknown Source)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.TraversableLike$$Lambda$177/0x0000000801203f00.apply(Unknown Source)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at org.apache.spark.sql.Dataset.$anonfun$getRows$3(Dataset.scala:290)
    at org.apache.spark.sql.Dataset$$Lambda$3377/0x0000000802065eb8.apply(Unknown Source)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.TraversableLike$$Lambda$177/0x0000000801203f00.apply(Unknown Source)
```
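Reading the trace, the heap runs out inside `Dataset.getRows` while `String.format` is mapped over a byte array (`ArrayOps$ofByte.map`). That suggests `show()` is formatting every byte of the 65 MB `content` column as text. A possible way to check this, offered as an untested assumption rather than a confirmed fix, is to display only the non-binary columns. The sketch below relies on `DataFrame.dtypes` and `DataFrame.select`; the helper names `non_binary_columns` and `show_metadata` are hypothetical.

```python
def non_binary_columns(df):
    """Return the names of all columns whose Spark type is not binary."""
    # DataFrame.dtypes yields (column_name, type_string) pairs,
    # e.g. ("content", "binary") for the binaryFile source.
    return [name for name, dtype in df.dtypes if dtype != "binary"]


def show_metadata(df, n=2):
    """Show the first n rows without the binary columns (hypothetical helper)."""
    df.select(*non_binary_columns(df)).show(n)


# Assumed usage in the notebook:
# show_metadata(swissimage_df)
```

If this displays `path`, `modificationTime` and `length` without an error, the file itself is being read and only the rendering of `content` exhausts the heap.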
With the provided Docker image and the orthophoto TIFF file, you should be
able to reproduce the error.
As I use the latest Docker image, I have not listed the versions of the
individual packages. The orthophoto is too large (65 MB) to attach, but it can
be downloaded directly from the site of the Federal Office of Topography of
Switzerland (swisstopo):
https://data.geo.admin.ch/ch.swisstopo.swissimage-dop10/swissimage-dop10_2021_2637-1223/swissimage-dop10_2021_2637-1223_0.1_2056.tif
If there are any further questions, I am happy to answer them.