alexeykudinkin commented on PR #5470:
URL: https://github.com/apache/hudi/pull/5470#issuecomment-1194362557

   TL;DR: the difference is b/w `Row` and `InternalRow`:
   
    - `df.rdd` invokes the deserializer, which converts Spark's internal binary representation (`UnsafeRow`) into a `Row` holding Java native types (the `Row` also carries the schema)
   
    - `df.queryExecution.toRdd` is an internal API that returns an RDD of `InternalRow`s, avoiding that conversion (that’s the primary reason many of the utilities in `HoodieUnsafeUtils` were introduced: to be able to access private Spark APIs)
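
   A rough sketch of the two access paths, assuming a two-column DataFrame (`id: Int`, `name: String`). The object and method names here are made up for illustration and are not Hudi code:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.catalyst.InternalRow

// Hypothetical helper object, for illustration only
object RowVsInternalRow {

  // Public API: each record is deserialized into an external `Row`
  // that carries the schema, so fields can be looked up by name.
  def namesViaRow(df: DataFrame): Array[String] = {
    val rows: RDD[Row] = df.rdd
    rows.map(_.getAs[String]("name")).collect()
  }

  // Internal API: records stay in Catalyst's internal format.
  // `InternalRow` carries no schema, so fields are read by ordinal with an
  // explicit type; strings are stored as `UTF8String`. The value is extracted
  // inside `map` rather than collecting the rows themselves, since Spark may
  // reuse the underlying row objects.
  def namesViaInternalRow(df: DataFrame): Array[String] = {
    val rows: RDD[InternalRow] = df.queryExecution.toRdd
    rows.map(_.getUTF8String(1).toString).collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, just to make the sketch runnable
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("row-vs-internal-row")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
    println(namesViaRow(df).mkString(","))
    println(namesViaInternalRow(df).mkString(","))
    spark.stop()
  }
}
```

   Both methods return the same values; the second one just skips the `UnsafeRow` to `Row` conversion, at the cost of having to know ordinals and internal types up front.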
   


