kartik18 opened a new issue #5211: URL: https://github.com/apache/hudi/issues/5211
**Describe the problem you faced**

I have the following S3 directory structure (the `_$folder_` entries are files, not directories):

```
folder/
├── cluster=abc_$folder_        // a file, not a directory
├── cluster=abc/
│   └── dt=2022-01-01/
│       ├── A1.parquet
│       └── A2.parquet
├── cluster=efg_$folder_        // a file, not a directory
└── cluster=efg/
    └── dt=2022-01-01/
        ├── B1.parquet
        └── B2.parquet
```

I am trying to read only the subfolders that contain parquet files, so I load the data with a glob pattern:

```python
spark.read.format("org.apache.hudi").load("s3://bucket/folder/*[^_$folder_]/dt=2022-01-01/*.parquet")
```

But this fails with:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o127.load.
: java.lang.NullPointerException
	at org.apache.hudi.HoodieSparkUtils$$anonfun$globPath$1$$anonfun$1.apply(HoodieSparkUtils.scala:82)
```

However, if I provide the full path, the data is read successfully:

```python
spark.read.format("org.apache.hudi").load("s3://bucket/folder/cluster=abc/dt=2022-01-01/*.parquet")
```

**To Reproduce**

Steps to reproduce the behavior:

1. Create the folder structure above.
2. Use a glob pattern that selects only the parquet files.
3. Load the data as a DataFrame.

**Expected behavior**

The glob pattern should be applied so that only the subfolders containing parquet files are read, instead of the load failing with a NullPointerException.

**Environment Description**

* Hudi version : 0.10
* Spark version : 2.4
* Hive version :
* Hadoop version :
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : No

**Additional context**

A possible client-side workaround is sketched after the stacktrace below.

**Stacktrace**

```
py4j.protocol.Py4JJavaError: An error occurred while calling o127.load.
: java.lang.NullPointerException
	at org.apache.hudi.HoodieSparkUtils$$anonfun$globPath$1$$anonfun$1.apply(HoodieSparkUtils.scala:82)
```
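One possible workaround until the glob handling is fixed: enumerate the `cluster=*` prefixes client-side, which naturally skips the zero-byte `_$folder_` marker objects, and then load each concrete partition path individually (full paths are confirmed to work above). This is an untested sketch, not the Hudi-recommended approach; it assumes `boto3` is available, and the bucket/prefix names are placeholders taken from the example.

```python
# Untested workaround sketch: list the "cluster=*" prefixes with boto3 and
# load each concrete partition path, since full paths work even though the
# combined glob triggers the NPE. Bucket/prefix names are placeholders.
from functools import reduce

import boto3
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

s3 = boto3.client("s3")
# With Delimiter="/", real "directories" come back under CommonPrefixes,
# while marker objects like "folder/cluster=abc_$folder_" appear under
# Contents, so they are excluded without any explicit filtering.
# (Note: list_objects_v2 returns at most 1000 entries per call; paginate
# if the bucket has more prefixes than that.)
resp = s3.list_objects_v2(Bucket="bucket", Prefix="folder/", Delimiter="/")

partition_paths = [
    "s3://bucket/{}dt=2022-01-01/*.parquet".format(p["Prefix"])
    for p in resp.get("CommonPrefixes", [])
]

# Load each partition individually and union the results.
dfs = [spark.read.format("org.apache.hudi").load(p) for p in partition_paths]
df = reduce(lambda a, b: a.union(b), dfs)
```

Loading per-partition and unioning avoids relying on Hudi's glob resolution entirely, at the cost of one `load` call per partition.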