[
https://issues.apache.org/jira/browse/CAMEL-23092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18069457#comment-18069457
]
Guillaume Nodet commented on CAMEL-23092:
-----------------------------------------
h3. Investigation Summary
_Claude Code on behalf of Guillaume Nodet_
The issue references [Hardwood|https://github.com/hardwood-hq/hardwood], a new
lightweight Parquet parser by Gunnar Morling
({{dev.hardwood:hardwood-core:1.0.0.Alpha1}} on Maven Central).
h4. Why Hardwood is interesting
* *Zero Hadoop dependency* — the only one of the libraries compared below that is
truly Hadoop-free at the classpath level
* Multi-threaded decoding, column projection, predicate push-down
* Would solve the JDK 25 {{getSubject}} incompatibility (CAMEL-22934) since it
doesn't use Hadoop at all
* Has both a native RowReader API and an Avro compatibility module
({{hardwood-avro}})
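To make "predicate push-down" concrete, here is a generic sketch of the idea as Parquet readers typically apply it. The file footer stores per-column min/max statistics for each row group, so a reader can skip entire row groups whose value range cannot match the filter, without decoding them. This is *not* Hardwood's API. {{PushDownSketch}}, {{RowGroupStats}} and {{rowGroupsToRead}} are hypothetical names, and the model is simplified to a single {{long}} column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of Parquet-style row-group pruning.
// Not Hardwood's API: all names here are invented for illustration.
public class PushDownSketch {

    // Min/max statistics of one long column for one row group, as kept in a Parquet footer.
    record RowGroupStats(int index, long min, long max) {}

    // Returns the indexes of row groups that MAY contain values in [lo, hi].
    // All other row groups can be skipped without reading or decoding their pages.
    static List<Integer> rowGroupsToRead(List<RowGroupStats> stats, long lo, long hi) {
        List<Integer> selected = new ArrayList<>();
        for (RowGroupStats s : stats) {
            // The ranges [min, max] and [lo, hi] overlap, so the group cannot be ruled out.
            if (s.max() >= lo && s.min() <= hi) {
                selected.add(s.index());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<RowGroupStats> stats = List.of(
                new RowGroupStats(0, 0, 99),
                new RowGroupStats(1, 100, 199),
                new RowGroupStats(2, 200, 299));
        // Filter 150 <= value <= 220: row group 0 is pruned, groups 1 and 2 are read.
        System.out.println(rowGroupsToRead(stats, 150, 220));
    }
}
```

Running {{main}} prints {{[1, 2]}}. Statistics only prove that a row group *cannot* match, so the surviving groups still need row-level filtering after decoding.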
h4. Blockers for adoption now
* *JDK 21+ required* — Camel currently targets JDK 17 ({{jdk.version=17}} in the
root {{pom.xml}})
* *Read-only* — Hardwood cannot write Parquet files yet (write support is on the
roadmap)
* *Alpha quality* — 1.0.0.Alpha1, API may still change
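The JDK 21 requirement is a hard blocker rather than a preference. Assuming Hardwood's jars are compiled for JDK 21 (class-file major version 65), a JDK 17 runtime (which accepts up to major version 61) refuses to load them with {{UnsupportedClassVersionError}}. The sketch below reads the class-file header to report which JDK a class needs. {{ClassFileJdkCheck}} and {{requiredJdk}} are hypothetical helper names; the header layout (u4 magic {{0xCAFEBABE}}, u2 minor, u2 major) follows the JVM specification.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Sketch: find the minimum JDK feature release needed to load a compiled class.
// Assumption: a library that "requires JDK 21+" ships class files with major version 65.
public class ClassFileJdkCheck {

    // Takes at least the first 8 bytes of a class file:
    // u4 magic, u2 minor_version, u2 major_version (big-endian).
    static int requiredJdk(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("Header too short");
        }
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default, like the class-file format
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file header");
        }
        buf.getShort();                                   // minor_version, ignored
        int major = Short.toUnsignedInt(buf.getShort());  // 61 = JDK 17, 65 = JDK 21
        return major - 44;                                // mapping holds for JDK 5 (major 49) and later
    }

    public static void main(String[] args) throws IOException {
        int runtime = Runtime.version().feature();
        // Demo on this class itself; for a dependency, read an entry via java.util.jar.JarFile.
        try (InputStream in = ClassFileJdkCheck.class.getResourceAsStream("ClassFileJdkCheck.class")) {
            if (in == null) {
                System.out.println("Class file not found on the classpath");
                return;
            }
            int needed = requiredJdk(in.readNBytes(8));
            String verdict = runtime >= needed ? "loads fine" : "fails with UnsupportedClassVersionError";
            System.out.println("Needs JDK " + needed + ", running JDK " + runtime + ": " + verdict);
        }
    }
}
```

Pointing the same check at a class inside {{hardwood-core}} would show whether a JDK 17 build of Camel could even load it, independent of any API-level concerns.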
h4. Alternatives considered
||Library||JDK||Read/Write||Truly Hadoop-free?||Production-ready?||
|[Carpet|https://github.com/jerolba/parquet-carpet] {{com.jerolba:carpet-record:0.6.1}}|17+|Both|No (exclusions)|Yes|
|[parquet-floor|https://github.com/strategicblue/parquet-floor] {{blue.strategic.parquet:parquet-floor:1.64}}|11+|Both|No (exclusions)|Yes|
|[Hardwood|https://github.com/hardwood-hq/hardwood] {{dev.hardwood:hardwood-core:1.0.0.Alpha1}}|21+|Read only|*Yes*|No (Alpha)|
|parquet-java (official)|8+|Both|No|Yes|
Carpet and parquet-floor still depend on {{parquet-hadoop}} transitively (the
Hadoop dependencies are only trimmed with exclusions), so they likely still hit
the same JDK 25 {{getSubject}} issue. Neither is a real improvement over the
current {{camel-parquet-avro}} implementation.
h4. Recommendation
Wait until:
* Camel raises its baseline to JDK 21+, AND
* Hardwood reaches a stable release (ideally with write support)
Then create a new {{camel-parquet}} data format module using Hardwood,
alongside the existing {{camel-parquet-avro}} module. The new module would
provide a lightweight alternative without any Hadoop/Avro dependency.
> camel-parquet - Use lightweight parser library
> -----------------------------------------------
>
> Key: CAMEL-23092
> URL: https://issues.apache.org/jira/browse/CAMEL-23092
> Project: Camel
> Issue Type: New Feature
> Reporter: Claus Ibsen
> Assignee: Guillaume Nodet
> Priority: Major
> Fix For: 4.x
>
>
> [https://www.morling.dev/blog/hardwood-new-parser-for-apache-parquet/]
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)