svaddoriya opened a new issue, #6236:
URL: https://github.com/apache/hudi/issues/6236
Hello guys. I am facing an issue querying data in Hudi version 0.10.1
using AWS Glue. It works fine with 100 partitions in Dev, but it runs
into memory issues in PROD with 5000 partitions.
The code for reading:

```python
read_options = {
    'hoodie.datasource.query.type': 'read_optimized'
}

intervaldrive_silver_df = (
    self.spark.read.format("hudi")
    .options(**read_options)
    .load(self.silver_basePath)
)
```
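With 5000 partitions, the driver can exhaust its heap just listing partition directories on S3. A hedged sketch of one thing worth trying: enabling Hudi's internal metadata table for file listing. `hoodie.metadata.enable` is a real Hudi 0.10.x config key, but whether it resolves this particular OOM is an assumption, not a confirmed fix:

```python
# Sketch: ask Hudi to use its internal metadata table for file listing,
# so the driver does not have to list all 5000 partition directories
# on S3 itself. This only helps if the table was written with the
# metadata table enabled (an assumption here).
read_options = {
    "hoodie.datasource.query.type": "read_optimized",
    "hoodie.metadata.enable": "true",
}

# The read itself is unchanged (requires an active SparkSession):
# intervaldrive_silver_df = (
#     spark.read.format("hudi")
#     .options(**read_options)
#     .load(silver_base_path)
# )
```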
**Spark UI is as follows**

**I tested with the following code**

```python
base_path = 's3://datalake/interval/drive/data/'
partition_paths = ['s3://datalake/interval/drive/data/b54e6bef-4301-4106-8b4b-8e56a15c7d72/']

ingress_pkg_arrived = (
    spark.read
    .format('org.apache.hudi')
    .option("basePath", base_path)
    .option("hoodie.datasource.read.paths", ",".join(partition_paths))  # comma-separated list
    .load(partition_paths)
)
```
**But got the following error**

```
An error occurred while calling o144.load. Hoodie table not found in path
Unable to find a hudi table for the user provided paths.
```
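The "Hoodie table not found" error likely comes from pointing `.load()` at a partition directory, which has no `.hoodie` metadata folder at its root. A hedged sketch of two workarounds commonly used with Hudi 0.x; the partition column name in the second variant is a hypothetical placeholder:

```python
# Paths taken from the report above.
base_path = "s3://datalake/interval/drive/data/"
partition = "b54e6bef-4301-4106-8b4b-8e56a15c7d72"

# Variant 1: load a glob pattern rooted under the table base path, so
# Hudi can still resolve the .hoodie metadata directory at the root.
glob_path = base_path + partition + "/*"
# df = spark.read.format("hudi").load(glob_path)

# Variant 2: load the base path and prune with a filter on the partition
# column (the column name "partition_id" is an assumption, not taken
# from the table schema).
# df = (
#     spark.read.format("hudi")
#     .load(base_path)
#     .where("partition_id = 'b54e6bef-4301-4106-8b4b-8e56a15c7d72'")
# )
```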
**Issue is `java.lang.OutOfMemoryError: Java heap space`**

```
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 8"...
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]