[ 
https://issues.apache.org/jira/browse/CAMEL-23092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18069457#comment-18069457
 ] 

Guillaume Nodet commented on CAMEL-23092:
-----------------------------------------

h3. Investigation Summary

_Claude Code on behalf of Guillaume Nodet_

The issue references [Hardwood|https://github.com/hardwood-hq/hardwood], a new 
lightweight Parquet parser by Gunnar Morling 
({{dev.hardwood:hardwood-core:1.0.0.Alpha1}} on Maven Central).

h4. Why Hardwood is interesting

* *Zero Hadoop dependency* — the only library that is truly Hadoop-free at the 
classpath level
* Multi-threaded decoding, column projection, predicate push-down
* Would solve the JDK 25 {{getSubject}} incompatibility (CAMEL-22934) since it 
doesn't use Hadoop at all
* Has both a native RowReader API and an Avro compatibility module 
({{hardwood-avro}})

h4. Blockers for adoption now

* *JDK 21+ required* — Camel currently targets JDK 17 ({{jdk.version=17}} in 
root pom.xml)
* *Read-only* — Hardwood does not support writing Parquet files yet (on the 
roadmap)
* *Alpha quality* — 1.0.0.Alpha1, API may still change

h4. Alternatives considered

||Library||JDK||Read/Write||Truly Hadoop-free?||Production-ready||
|[Carpet|https://github.com/jerolba/parquet-carpet] 
{{com.jerolba:carpet-record:0.6.1}}|17+|Both|No (exclusions)|Yes|
|[parquet-floor|https://github.com/strategicblue/parquet-floor] 
{{blue.strategic.parquet:parquet-floor:1.64}}|11+|Both|No (exclusions)|Yes|
|[Hardwood|https://github.com/hardwood-hq/hardwood] 
{{dev.hardwood:hardwood-core:1.0.0.Alpha1}}|21+|Read only|*Yes*|No (Alpha)|
|parquet-java (official)|8+|Both|No|Yes|

Carpet and parquet-floor still depend on {{parquet-hadoop}} transitively (with 
exclusions), so they likely still hit the same JDK 25 {{getSubject}} issue. 
Neither is a true improvement over the current {{camel-parquet-avro}} 
implementation.
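
To check whether a given runtime is affected, a small probe can help. This is a minimal sketch under one assumption: that the {{getSubject}} incompatibility refers to {{javax.security.auth.Subject.getSubject(AccessControlContext)}} throwing {{UnsupportedOperationException}} on recent JDKs (CAMEL-22934 has the authoritative details). The class and method names below are hypothetical and are not part of Camel, Hadoop, or Hardwood.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import javax.security.auth.Subject;

// Hypothetical helper, for illustration only.
class GetSubjectProbe {

    /**
     * Returns true if the legacy Subject.getSubject(AccessControlContext)
     * call still completes on this JVM, false if it is unsupported or gone.
     * Reflection is used so the probe compiles on any JDK version.
     */
    public static boolean legacyGetSubjectWorks() {
        try {
            Class<?> contextType = Class.forName("java.security.AccessControlContext");
            Class<?> controller = Class.forName("java.security.AccessController");
            // Equivalent to AccessController.getContext()
            Object context = controller.getMethod("getContext").invoke(null);
            Method getSubject = Subject.class.getMethod("getSubject", contextType);
            getSubject.invoke(null, context);
            return true;
        } catch (InvocationTargetException e) {
            if (e.getCause() instanceof UnsupportedOperationException) {
                return false; // the JDK no longer supports this call
            }
            throw new IllegalStateException(e.getCause());
        } catch (ReflectiveOperationException e) {
            return false; // the method or its parameter type has been removed
        }
    }

    public static void main(String[] args) {
        System.out.println("JDK feature version: " + Runtime.version().feature());
        System.out.println("Legacy getSubject works: " + legacyGetSubjectWorks());
    }
}
```

If the probe reports {{false}}, any library on the classpath that still calls the legacy method would be expected to fail in the same way, regardless of which wrapper pulled it in.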

h4. Recommendation

Wait until:
* Camel moves its baseline to JDK 21+, AND
* Hardwood reaches a stable release (ideally with write support)

Then create a new {{camel-parquet}} data format module using Hardwood, 
alongside the existing {{camel-parquet-avro}} module. The new module would 
provide a lightweight alternative without any Hadoop/Avro dependency.

> camel-parquet - Use lightweight parser library
> ----------------------------------------------
>
>                 Key: CAMEL-23092
>                 URL: https://issues.apache.org/jira/browse/CAMEL-23092
>             Project: Camel
>          Issue Type: New Feature
>            Reporter: Claus Ibsen
>            Assignee: Guillaume Nodet
>            Priority: Major
>             Fix For: 4.x
>
>
> [https://www.morling.dev/blog/hardwood-new-parser-for-apache-parquet/]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
