jmnatzaganian edited a comment on issue #2498:
URL: https://github.com/apache/hudi/issues/2498#issuecomment-974507962


   I'm also seeing the same type of issue on EMR 6.4 after building and 
deploying Hudi 0.9.0. Note that, as mentioned 
[above](https://github.com/apache/hudi/issues/2498#issuecomment-969228521), the 
default binaries (EMR 6.4 with Hudi 0.8.0) work just fine.
   
   It seems likely that something is off with the build, or with how the 
built jars are referenced. I used 
`mvn clean package -DskipTests -Dspark3 -Dscala-2.12 -T 30`.
   
   What's really interesting is that I can create a MoR table without issue, 
but reading it back with `load` is a problem: the DF appears to load, but then 
becomes unusable.
   
   This 
[tip](https://github.com/apache/hudi/issues/2498#issuecomment-942282671) also 
worked for me (i.e. using `spark.sql` and referencing the table from the Glue 
data catalog). Unfortunately, querying the data this way seems to be *much* 
slower (compared to 0.8.0).
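
   As a rough sketch of that workaround: the helper below simply wraps 
`spark.sql` for a table that is already registered in the catalog. The function 
name `read_via_catalog` and the database/table names are hypothetical 
placeholders, not something taken from this thread, and it assumes an existing 
`SparkSession` named `spark`.

```python
def read_via_catalog(spark, database, table):
    """Return a DataFrame for a catalog-registered table via Spark SQL.

    `spark` is an existing SparkSession. `database` and `table` are
    placeholder names (assumptions) for a table registered in the catalog.
    """
    # Backticks quote the identifiers so unusual characters still parse.
    return spark.sql(f"SELECT * FROM `{database}`.`{table}`")
```

   Usage would look something like 
`read_via_catalog(spark, "my_db", "my_hudi_table").show()`, with the names 
swapped for whatever the table is registered as.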
   
   I documented my build and installation process in 
[this](https://apache-hudi.slack.com/archives/C4D716NPQ/p1637354714476100) 
Slack thread.
   
   Edit:
   I tested this with a CoW table and did not hit the issue, i.e. the 
following works just fine. It did, however, take 2.7x longer to do the read 
than it did in 0.8.0.
   ````
   df = spark.read.format("org.apache.hudi").load(path)
   df.show()
   ````
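
   For anyone wanting to reproduce the timing comparison, here is a minimal 
sketch of one way to measure it. The `time_action` helper is hypothetical (not 
part of Hudi or Spark), and this is not necessarily how the 2.7x figure above 
was measured. Since Spark reads are lazy, the timed callable should include an 
action such as `show()` or `count()`.

```python
import time


def time_action(action):
    """Run a zero-argument callable; return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    return result, elapsed
```

   For example, 
`_, secs = time_action(lambda: spark.read.format("org.apache.hudi").load(path).count())` 
could be run once per Hudi version, comparing the two `secs` values.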


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


