[https://issues.apache.org/jira/browse/SPARK-55444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
David Milicevic updated SPARK-55444:
------------------------------------
Description:
*Summary:*
Add storage format support to the framework
*Description:*
Extend the framework to cover storage format integration points (Parquet, ORC,
Avro, CSV, JSON, XML, columnar caching).
*What this includes:*
* New interface(s) for storage format operations (schema conversion,
read/write support)
* Integration in ~24 files (Scala + Java) across Parquet, ORC, Avro, CSV,
JSON, XML, and columnar caching including vectorized Java files
({{OffHeapColumnVector}}, {{OnHeapColumnVector}},
{{ParquetVectorUpdaterFactory}}, {{VectorizedColumnReader}})
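For illustration only, the "new interface(s) for storage format operations" could take roughly the shape below. Each logical type gets one object that bundles schema conversion with read and write hooks, and format code (Parquet, ORC, CSV, etc.) looks that object up in a registry instead of matching on every type. This is a minimal Java sketch under assumptions: {{StorageFormat}}, {{TypeStorageOps}}, {{MicrosTimeOps}}, and {{StorageOpsRegistry}} are hypothetical names, not Spark APIs, and values are simplified to {{String}} payloads. The real interfaces are defined in the design doc.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: StorageFormat, TypeStorageOps, MicrosTimeOps and
// StorageOpsRegistry are illustrative names, not existing Spark APIs.
enum StorageFormat { PARQUET, ORC, AVRO, CSV, JSON, XML }

// One object per logical type bundles that type's storage-format hooks.
interface TypeStorageOps<T> {
  // Schema conversion: the physical representation used in the given format.
  String physicalType(StorageFormat format);

  // Write support: encode a value (simplified here to a String payload).
  String write(T value);

  // Read support: decode a payload produced by write().
  T read(String payload);
}

// Example type: a time-of-day value stored as microseconds in a long.
final class MicrosTimeOps implements TypeStorageOps<Long> {
  @Override
  public String physicalType(StorageFormat format) {
    switch (format) {
      case CSV:
      case JSON:
      case XML:
        return "string"; // text formats carry the value as text
      default:
        return "int64";  // binary formats carry it as a 64-bit integer (illustrative label)
    }
  }

  @Override
  public String write(Long micros) {
    return Long.toString(micros);
  }

  @Override
  public Long read(String payload) {
    return Long.parseLong(payload);
  }
}

// Format code asks the registry instead of pattern-matching on every type.
final class StorageOpsRegistry {
  private static final Map<String, TypeStorageOps<?>> OPS = new HashMap<>();

  static void register(String typeName, TypeStorageOps<?> ops) {
    OPS.put(typeName, ops);
  }

  static Optional<TypeStorageOps<?>> lookup(String typeName) {
    return Optional.ofNullable(OPS.get(typeName));
  }
}
```

Registering {{new MicrosTimeOps()}} under a name such as {{"time_micros"}} (also hypothetical) lets a writer call {{physicalType(StorageFormat.PARQUET)}} to get {{"int64"}} and round-trip values through {{write}} and {{read}}. Keeping the per-type logic in one object means adding a type touches one class rather than the ~24 format files.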
*Design doc:*
Linked in the parent work item.
was:
*Summary:*
Add storage format support to the framework
*Description:*
Extend the framework to cover storage format integration points (Parquet, ORC,
Avro, CSV, JSON, XML, columnar caching).
*What this includes:*
* New interface(s) for storage format operations (schema conversion,
read/write support)
* Integration in ~15+ files across Parquet ({{ParquetSchemaConverter}},
{{ParquetRowConverter}}, {{ParquetWriteSupport}}), ORC
({{OrcSerializer}}, {{OrcDeserializer}}, {{OrcUtils}}), Avro
({{AvroSerializer}}, {{AvroDeserializer}}, {{SchemaConverters}}),
CSV ({{UnivocityParser}}, {{UnivocityGenerator}}), JSON
({{JacksonParser}}, {{JacksonGenerator}}), XML ({{StaxXmlParser}},
{{StaxXmlGenerator}}), and columnar caching ({{ColumnBuilder}},
{{ColumnAccessor}}, {{ColumnType}}, {{GenerateColumnAccessor}})
*Design doc:*
Linked in the parent work item.
> Types Framework - Phase 3 - Storage Formats
> -------------------------------------------
>
> Key: SPARK-55444
> URL: https://issues.apache.org/jira/browse/SPARK-55444
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: David Milicevic
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)