skywalker0618 opened a new issue, #18397:
URL: https://github.com/apache/hudi/issues/18397

   ### Task Description
   
   Within Uber, we use Hudi with Parquet, not ORC. However, we found that even 
though the ORC-related functions are never called at runtime, the Hudi streaming 
source on 1.2 still depends on the ORC package, because:
   
   1. The HoodieSplitReaderFunction class implements Serializable and has a 
HoodieWriteConfig member 
([code](https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/function/HoodieSplitReaderFunction.java#L59C17-L59C34)).
   2. This class eagerly creates the writeConfig in its constructor 
([code](https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/function/HoodieSplitReaderFunction.java#L82)).
   3. During Flink job deployment, this class (HoodieSplitReaderFunction) is 
serialized and sent from the JM to the TM, which causes all non-transient members 
(including the writeConfig) to be serialized.
   4. The serializer uses reflection to check whether each serialized class 
defines readObject/writeObject methods.
   5. That reflection must resolve every class appearing in the method 
signatures of HoodieWriteConfig, which include org.apache.orc.CompressionKind.
   
   Hence, even though the job never uses ORC, if the ORC dependency is not on 
the classpath, the job fails with this class-not-found error:
   
   Caused by: java.lang.NoClassDefFoundError: org/apache/orc/CompressionKind
        at java.base/java.lang.Class.getDeclaredMethods0(Native Method)
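   The field-serialization side of the steps above can be sketched as follows. 
This is a minimal, self-contained illustration with hypothetical stand-in classes 
(HeavyConfig and the two function classes are not the actual Hudi types): every 
non-transient field of a Serializable object is written into the payload, while a 
transient field is skipped entirely and its class never reaches the serializer's 
reflection machinery. (The NoClassDefFoundError itself only reproduces with a 
deliberately broken classpath, so it is not shown here.)

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class EagerFieldDemo {

    // Stand-in for HoodieWriteConfig: heavyweight state dragged along on serialization.
    static class HeavyConfig implements Serializable {
        byte[] state = new byte[4096];
    }

    // Stand-in for HoodieSplitReaderFunction with the eager, non-transient member.
    static class EagerFunction implements Serializable {
        HeavyConfig writeConfig = new HeavyConfig(); // serialized with the function
    }

    // The same function shape, but the config is transient and therefore skipped.
    static class TransientFunction implements Serializable {
        transient HeavyConfig writeConfig = new HeavyConfig(); // not serialized
    }

    static int serializedSize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        int eager = serializedSize(new EagerFunction());
        int lazy = serializedSize(new TransientFunction());
        // The eager payload carries the 4 KB config; the transient one does not.
        System.out.println("eager=" + eager + " bytes, transient=" + lazy + " bytes");
    }
}
```

   Because the transient field is never serialized, ObjectStreamClass never 
reflects over HeavyConfig's declared methods, which is exactly the step that 
triggers the class loading failure in the stack trace above.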
   
   Proposed solution:
   1. Change the HoodieWriteConfig member to be transient: "private transient 
HoodieWriteConfig writeConfig;"
   2. Remove the eager construction from the constructor of 
HoodieSplitReaderFunction.
   3. Add lazy initialization of the writeConfig like this: 
   private HoodieWriteConfig getOrCreateWriteConfig() {
       if (writeConfig == null) {
           writeConfig = FlinkWriteClients.getHoodieClientConfig(configuration);
       }
       return writeConfig;
   }
   4. This mirrors how the class already handles hadoopConfig 
([code](https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/function/HoodieSplitReaderFunction.java#L143)).
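   The full transient-plus-lazy-getter pattern proposed above can be sketched 
end to end with a serialization round trip. The classes below (WriteConfig, 
SplitReaderFunction, and the "path=..." configuration string) are hypothetical 
stand-ins, not the actual Hudi types: the point is that the transient field 
arrives null on the "TM" side and is rebuilt from the lightweight serialized 
configuration on first use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class LazyConfigDemo {

    // Stand-in for HoodieWriteConfig. Note it does not even need to be
    // Serializable once the field holding it is transient.
    static class WriteConfig {
        final String source;
        WriteConfig(String source) { this.source = source; }
    }

    static class SplitReaderFunction implements Serializable {
        private final String configuration;        // lightweight state, serialized
        private transient WriteConfig writeConfig; // skipped; rebuilt on the TM

        SplitReaderFunction(String configuration) {
            this.configuration = configuration;    // no eager config construction
        }

        // Lazy initialization, mirroring the getOrCreateWriteConfig proposal.
        WriteConfig getOrCreateWriteConfig() {
            if (writeConfig == null) {
                writeConfig = new WriteConfig(configuration);
            }
            return writeConfig;
        }
    }

    // Serialize and deserialize, simulating the JM-to-TM shipping of the function.
    static SplitReaderFunction roundTrip(SplitReaderFunction fn) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(fn);
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return (SplitReaderFunction) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SplitReaderFunction onTaskManager = roundTrip(new SplitReaderFunction("path=/tmp/hudi"));
        // The transient field arrives null and is recreated on first use.
        System.out.println(onTaskManager.getOrCreateWriteConfig().source);
    }
}
```

   Since WriteConfig is never serialized, none of its (or, in the real case, 
HoodieWriteConfig's) signature dependencies need to be on the classpath at 
deployment time.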
   
   
   
   
   ### Task Type
   
   Code improvement/refactoring
   
   ### Related Issues
   
   **Parent feature issue:** (if applicable)
   **Related issues:**
   NOTE: Use `Relationships` button to add parent/blocking issues after issue 
is created.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
