svaddoriya opened a new issue, #6236:
URL: https://github.com/apache/hudi/issues/6236
Hello guys. I am facing an issue querying data in Hudi version 0.10.1
using AWS Glue. It works fine with 100 partitions in Dev, but it runs
into memory issues in PROD with 5000 partitions.
The code for reading:

```python
read_options = {
    'hoodie.datasource.query.type': 'read_optimized'
}

intervaldrive_silver_df = (
    self.spark.read.format("hudi")
    .options(**read_options)
    .load(self.silver_basePath)
)
```
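With 5000 partitions, the driver can exhaust its heap just listing partition directories on S3. A hedged sketch of one thing worth trying: enabling Hudi's internal metadata table for file listing. `hoodie.metadata.enable` is a real Hudi 0.10.x config key, but whether it resolves this particular OOM is an assumption, not a confirmed fix:

```python
# Sketch: ask Hudi to use its internal metadata table for file listing,
# so the driver does not have to list all 5000 partition directories
# on S3 itself. This only helps if the table was written with the
# metadata table enabled (an assumption here).
read_options = {
    "hoodie.datasource.query.type": "read_optimized",
    "hoodie.metadata.enable": "true",
}

# The read itself is unchanged (requires an active SparkSession):
# intervaldrive_silver_df = (
#     spark.read.format("hudi")
#     .options(**read_options)
#     .load(silver_base_path)
# )
```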
**Spark UI is as follows**

**I tested with the following code**

```python
base_path = 's3://datalake/interval/drive/data/'
partition_paths = ['s3://datalake/interval/drive/data/b54e6bef-4301-4106-8b4b-8e56a15c7d72/']

ingress_pkg_arrived = (
    spark.read
    .format('org.apache.hudi')
    .option("basePath", base_path)
    .option("hoodie.datasource.read.paths", ",".join(partition_paths))  # comma-separated list
    .load(partition_paths)
)
```
**But got the following error**

```
An error occurred while calling o144.load. Hoodie table not found in path
Unable to find a hudi table for the user provided paths.
```
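The "Hoodie table not found" error likely comes from pointing `.load()` at a partition directory, which has no `.hoodie` metadata folder at its root. A hedged sketch of two workarounds commonly used with Hudi 0.x; the partition column name in the second variant is a hypothetical placeholder:

```python
# Paths taken from the report above.
base_path = "s3://datalake/interval/drive/data/"
partition = "b54e6bef-4301-4106-8b4b-8e56a15c7d72"

# Variant 1: load a glob pattern rooted under the table base path, so
# Hudi can still resolve the .hoodie metadata directory at the root.
glob_path = base_path + partition + "/*"
# df = spark.read.format("hudi").load(glob_path)

# Variant 2: load the base path and prune with a filter on the partition
# column (the column name "partition_id" is an assumption, not taken
# from the table schema).
# df = (
#     spark.read.format("hudi")
#     .load(base_path)
#     .where("partition_id = 'b54e6bef-4301-4106-8b4b-8e56a15c7d72'")
# )
```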
**Issue is `java.lang.OutOfMemoryError: Java heap space`**

```
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 8"...
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]