rluvaton commented on issue #1028:
URL:
https://github.com/apache/datafusion-comet/issues/1028#issuecomment-2912704996
Looking at the discussion of Iceberg and Delta Lake support, it seems like
there should be a different solution than having the extension authors
(Iceberg/Delta Lake) implement a Comet support trait.
Why not create Maven packages that each have a transitive dependency on
Iceberg, Delta Lake, or another popular third-party jar, and that implement
Comet support by checking whether a conversion is supported and converting to
protobuf?
If we had some kind of internal package that just exposes basic hooks, like
`isSupported` and `convert`, to be used by the Scala serde file
(`QueryPlanSerde`), we could implement the following layout:
```
./
  iceberg-support/
    pom.xml  - has a transitive dependency on the Iceberg Maven package
    ...      - code that marks the reader as supported and adds the
               conversion code for QueryPlanSerde
  delta-lake-support/
    pom.xml  - has a transitive dependency on the Delta Lake Maven package
    ...      - code that marks the reader as supported and adds the
               conversion code for QueryPlanSerde
  avro-support/
    pom.xml  - has a transitive dependency on the Spark Avro Maven package
    ...      - code that marks the Avro file format as supported and adds
               the conversion code for QueryPlanSerde
```
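A minimal sketch of the internal API, assuming hypothetical names throughout: `CometScanSupport` for the trait each `*-support` module would implement, `ScanInfo` as a stand-in for the Spark scan node that QueryPlanSerde inspects, and `NativeScan` as a stand-in for the generated protobuf message. None of these exist in Comet today; the point is only that the serde side asks each module `isSupported` and, on the first match, calls `convert`.

```scala
// Hypothetical stand-in for the Spark scan node QueryPlanSerde inspects.
final case class ScanInfo(formatClassName: String, columns: Seq[String])

// Hypothetical stand-in for the protobuf message sent to the native side.
final case class NativeScan(format: String, columns: Seq[String])

// Hypothetical trait exposed by the internal package; each *-support module implements it.
trait CometScanSupport {
  /** True when this module recognizes the scan and can run it natively. */
  def isSupported(scan: ScanInfo): Boolean

  /** Converts a supported scan; only called after isSupported returned true. */
  def convert(scan: ScanInfo): NativeScan
}

// What QueryPlanSerde could consult: the first module that accepts the scan converts it.
final class SupportRegistry(modules: Seq[CometScanSupport]) {
  def tryConvert(scan: ScanInfo): Option[NativeScan] =
    modules.find(_.isSupported(scan)).map(_.convert(scan))
}

// Example module, as the hypothetical avro-support package might provide it.
object AvroSupport extends CometScanSupport {
  // Spark's Avro file format class, which lives in the separate spark-avro module.
  private val AvroFormat = "org.apache.spark.sql.avro.AvroFileFormat"

  def isSupported(scan: ScanInfo): Boolean = scan.formatClassName == AvroFormat
  def convert(scan: ScanInfo): NativeScan = NativeScan("avro", scan.columns)
}
```

In this sketch, `tryConvert` returning `None` is where Comet would fall back to the regular Spark scan, so a module that is absent or declines a scan costs nothing.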
# Examples
## Delta Lake
If we want to replace the Delta Lake Java reader with the delta-rs reader
and avoid serializing to Java:
Delta Lake has an implementation in Rust, but in order to match the Delta Lake
file format and all of its config so that we can replace it with a native scan,
we need the Delta Lake jar. To avoid Comet depending on Delta Lake (or on
Iceberg), we can create our own delta-lake-support package with a transitive
dependency on the Delta Lake Spark extension. This package would add the
support trait to the Delta Lake class, check whether the current schema is
supported, and handle the conversion to protobuf.
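The check for whether the current schema is supported could amount to walking the schema and collecting the columns the native reader cannot handle. This sketch uses a hypothetical, simplified type model (`ColType`) in place of Spark's `DataType`, and the set of supported leaf types is an illustrative assumption, not delta-rs's real coverage.

```scala
// Hypothetical, simplified stand-in for Spark's DataType hierarchy.
sealed trait ColType
case object IntCol extends ColType
case object StringCol extends ColType
case object TimestampNtzCol extends ColType
final case class ArrayCol(element: ColType) extends ColType
final case class StructCol(fields: Seq[(String, ColType)]) extends ColType

object DeltaSchemaCheck {
  // Illustrative assumption: the leaf types the native Delta reader can handle.
  private val supportedLeaves: Set[ColType] = Set(IntCol, StringCol)

  /** Dotted paths of columns the native reader cannot handle; empty means supported. */
  def unsupportedColumns(schema: StructCol): Seq[String] = {
    def walk(path: String, t: ColType): Seq[String] = t match {
      case StructCol(fields) => fields.flatMap { case (name, ft) => walk(s"$path.$name", ft) }
      case ArrayCol(element) => walk(s"$path[]", element)
      case leaf              => if (supportedLeaves.contains(leaf)) Nil else Seq(path)
    }
    schema.fields.flatMap { case (name, t) => walk(name, t) }
  }

  def isSupported(schema: StructCol): Boolean = unsupportedColumns(schema).isEmpty
}
```

Returning the offending column paths rather than a bare boolean makes it easy to log why a Delta scan would stay on the JVM path.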
## Avro Reader
I want to read Avro files. Unlike Parquet, the Avro package no longer comes
built in with Spark, which means we cannot match on `AvroFileFormat`
here:
https://github.com/apache/datafusion-comet/blob/6663245d73c79b4547605e1f68840bc0c2a4d22d/spark/src/main/scala/org/apache/spark/sql/comet/CometScanExec.scala#L527-L529
So if we had, as with Delta Lake, a separate package with a transitive
dependency on the Avro package, we could match on `AvroFileFormat` there and add
support for reading Avro.
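The Comet core jar would still need to pick up the avro-support module without a compile-time dependency on it. One option, sketched here with a hypothetical `ScanSupport` trait and a hypothetical module class name, is to load the module reflectively and treat any loading failure as "not supported".

```scala
// Hypothetical trait implemented by optional modules such as avro-support.
trait ScanSupport {
  def formatName: String
}

object OptionalModules {
  // Hypothetical fully qualified name of the avro-support entry point.
  val AvroSupportClass = "org.apache.comet.avro.AvroSupport"

  /** Instantiates a support module by class name; None if it is absent or unusable. */
  def load(className: String): Option[ScanSupport] =
    try {
      Class.forName(className).getDeclaredConstructor().newInstance() match {
        case s: ScanSupport => Some(s)
        case _              => None // class exists but is not a support module
      }
    } catch {
      // ReflectiveOperationException covers a missing jar (ClassNotFoundException)
      // and a missing or inaccessible no-arg constructor. LinkageError covers the
      // module jar being present without its transitive dependency (e.g. spark-avro);
      // scala.util.Try would not catch it because it is not NonFatal.
      case _: ReflectiveOperationException | _: LinkageError => None
    }
}
```

`java.util.ServiceLoader` is the more conventional discovery mechanism once there are several modules; plain `Class.forName` just keeps the sketch short.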