Re: [I] error on rawDf.show() [sedona]

via GitHub Mon, 16 Dec 2024 02:43:52 -0800


Jaeggi99 commented on issue #1723:
URL: https://github.com/apache/sedona/issues/1723#issuecomment-2545230039


   Hello, I have a similar problem when trying to load "larger" raster files (about 65 MB), although my error message is different. I am therefore not opening a new issue.
   
   **Setup:**
   I use Docker Desktop and pulled the Sedona image with:
   ```
   docker pull apache/sedona:latest
   ```
   
   Then I run the container with:
   ```
   docker run -e DRIVER_MEM=6g -e EXECUTOR_MEM=8g -p 8888:8888 -p 8080:8080 -p 8081:8081 -p 4040:4040 apache/sedona:latest
   ```
   
   In JupyterLab, I execute the code in the examples notebook `ApacheSedonaRaster.ipynb` and everything works fine.
   
   When I copy my orthophoto `swissimage-dop10_2021_2637-1223_01_2056.tif` (download link below) into the examples data folder and try to load and show it, either within the same examples notebook or in any other notebook, I get the error message shown further down.
   
   **Notebook Code:**
   ```
   from sedona.spark import *
   from IPython.display import display, HTML
   ```
   
   ```
   config = (
       SedonaContext.builder()
       .getOrCreate()
   )
   sedona = SedonaContext.create(config)
   ```
   ```
   sc = sedona.sparkContext
   ```
   ```
   swissimage_df = sedona.read.format("binaryFile").load("data/raster/swissimage-dop10_2021_2637-1223_01_2056.tif")
   swissimage_df.show(2)
   ```
   
   **Expected behavior:**
   A table showing information about the raster, like this:
   ```
   +--------------------+-------------------+------+--------------------+
   |                path|   modificationTime|length|             content|
   +--------------------+-------------------+------+--------------------+
   |file:/opt/workspa...|2024-12-15 00:21:05|209199|[4D 4D 00 2A 00 0...|
   +--------------------+-------------------+------+--------------------+
   ```
   
   
   **Actual behaviour:**
   Unfortunately, I was also unable to capture more of the error log via stdout and stderr.
   I tried the following code to capture the whole error message:
   ```
   import sys
   import os
   
   logfile = "cell_output.log"
   with open(logfile, "w") as f:
       # Redirect stdout and stderr
       original_stdout = sys.stdout
       original_stderr = sys.stderr
       sys.stdout = f
       sys.stderr = f
       try:
           print("This will be logged into the file.")
           swissimage_df = sedona.read.format("binaryFile").load("data/raster/swissimage-dop10_2021_2637-1223_01_2056.tif")
           swissimage_df.show(2)
       except Exception as e:
           import traceback
           traceback.print_exc()
       finally:
           sys.stdout = original_stdout
           sys.stderr = original_stderr
   ```
   Resulting error message:
   ```
   Traceback (most recent call last):
     File "/tmp/ipykernel_45/2611185478.py", line 16, in <module>
       swissimage_df.show(2)
     File "/usr/local/lib/python3.10/dist-packages/pyspark/sql/dataframe.py", line 899, in show
       print(self._jdf.showString(n, 20, vertical))
     File "/usr/local/lib/python3.10/dist-packages/py4j/java_gateway.py", line 1322, in __call__
       return_value = get_return_value(
     File "/usr/local/lib/python3.10/dist-packages/pyspark/errors/exceptions/captured.py", line 169, in deco
       return f(*a, **kw)
     File "/usr/local/lib/python3.10/dist-packages/py4j/protocol.py", line 326, in get_return_value
       raise Py4JJavaError(
   py4j.protocol.Py4JJavaError: An error occurred while calling o45.showString.
   : java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Formatter.parse(Formatter.java:2807)
        at java.base/java.util.Formatter.format(Formatter.java:2763)
        at java.base/java.util.Formatter.format(Formatter.java:2717)
        at java.base/java.lang.String.format(String.java:4150)
        at scala.collection.immutable.StringLike.format(StringLike.scala:354)
        at scala.collection.immutable.StringLike.format$(StringLike.scala:353)
        at scala.collection.immutable.StringOps.format(StringOps.scala:33)
        at org.apache.spark.sql.Dataset.$anonfun$getRows$5(Dataset.scala:293)
        at org.apache.spark.sql.Dataset.$anonfun$getRows$5$adapted(Dataset.scala:293)
        at org.apache.spark.sql.Dataset$$Lambda$3379/0x0000000802066638.apply(Unknown Source)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.TraversableLike$$Lambda$177/0x0000000801203f00.apply(Unknown Source)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofByte.foreach(ArrayOps.scala:210)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.mutable.ArrayOps$ofByte.map(ArrayOps.scala:210)
        at org.apache.spark.sql.Dataset.$anonfun$getRows$4(Dataset.scala:293)
        at org.apache.spark.sql.Dataset$$Lambda$3378/0x0000000802066278.apply(Unknown Source)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.TraversableLike$$Lambda$177/0x0000000801203f00.apply(Unknown Source)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at org.apache.spark.sql.Dataset.$anonfun$getRows$3(Dataset.scala:290)
        at org.apache.spark.sql.Dataset$$Lambda$3377/0x0000000802065eb8.apply(Unknown Source)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.TraversableLike$$Lambda$177/0x0000000801203f00.apply(Unknown Source)
   ```
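
The trace suggests that the heap runs out while `show()` is formatting the `content` byte array for display (`ArrayOps$ofByte.map` and `String.format` inside `Dataset.getRows`), rather than while the file is being read. One way to check this, assuming that reading of the trace is right, is to drop the binary column before calling `show()`. The helper name `without_binary_columns` below is hypothetical and is not part of Sedona or PySpark; it relies only on `DataFrame.dtypes` and `DataFrame.drop`.

```python
def without_binary_columns(df):
    """Return df without its binary-typed columns (hypothetical helper).

    Works on any object exposing `dtypes` as (name, type-string) pairs
    and `drop(*cols)`, as a PySpark DataFrame does.
    """
    binary_cols = [name for name, dtype in df.dtypes if dtype == "binary"]
    if not binary_cols:
        # Nothing to drop: hand back the DataFrame unchanged.
        return df
    return df.drop(*binary_cols)


# Assumed usage in the notebook, with swissimage_df loaded as above:
# without_binary_columns(swissimage_df).show(2, truncate=False)
```

If the table with `path`, `modificationTime` and `length` then prints normally, the problem would be limited to rendering the raw bytes, and the raster itself may still be usable in later steps.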
   
   With the provided Docker image and the orthophoto TIFF file, you should be able to reproduce the error.
   
   As I use the latest Docker image, I have not listed the versions of the individual packages. The orthophoto is too large (65 MB) to attach, but it can be downloaded directly from the site of the Federal Office of Topography of Switzerland:
   
   https://data.geo.admin.ch/ch.swisstopo.swissimage-dop10/swissimage-dop10_2021_2637-1223/swissimage-dop10_2021_2637-1223_0.1_2056.tif
   
   If there are any further questions, I am happy to answer.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
