[jira] [Updated] (SPARK-55444) Types Framework - Phase 3 - Storage Formats

David Milicevic (Jira) Fri, 27 Feb 2026 02:50:08 -0800


     [ 
https://issues.apache.org/jira/browse/SPARK-55444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


David Milicevic updated SPARK-55444:
------------------------------------
    Description: 
*Summary:*

Add storage format support to the framework

*Description:*

Extend the framework to cover storage format integration points (Parquet, ORC, 
Avro, CSV, JSON, XML, columnar caching).

*What this includes:*
 * New interface(s) for storage format operations (schema conversion, 
read/write support)
 * Integration in ~24 files (Scala + Java) across Parquet, ORC, Avro, CSV, 
JSON, XML, and columnar caching including vectorized Java files 
({{OffHeapColumnVector}}, {{OnHeapColumnVector}}, 
{{ParquetVectorUpdaterFactory}}, {{VectorizedColumnReader}})

*Design doc:*

Linked in the parent work item.

  was:
*Summary:*

Add storage format support to the framework

*Description:*

Extend the framework to cover storage format integration points (Parquet, ORC, 
Avro, CSV, JSON, XML, columnar caching).

*What this includes:*
 * New interface(s) for storage format operations (schema conversion, 
read/write support)
 * Integration in ~15+ files across Parquet ({{ParquetSchemaConverter}}, 
{{ParquetRowConverter}}, {{ParquetWriteSupport}}), ORC 
({{OrcSerializer}}, {{OrcDeserializer}}, {{OrcUtils}}), Avro 
({{AvroSerializer}}, {{AvroDeserializer}}, {{SchemaConverters}}), 
CSV ({{UnivocityParser}}, {{UnivocityGenerator}}), JSON 
({{JacksonParser}}, {{JacksonGenerator}}), XML ({{StaxXmlParser}}, 
{{StaxXmlGenerator}}), and columnar caching ({{ColumnBuilder}}, 
{{ColumnAccessor}}, {{ColumnType}}, {{GenerateColumnAccessor}})

*Design doc:*

Linked in the parent work item.


> Types Framework - Phase 3 - Storage Formats
> -------------------------------------------
>
>                 Key: SPARK-55444
>                 URL: https://issues.apache.org/jira/browse/SPARK-55444
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: David Milicevic
>            Priority: Major
>
> *Summary:*
> Add storage format support to the framework
> *Description:*
> Extend the framework to cover storage format integration points (Parquet, 
> ORC, Avro, CSV, JSON, XML, columnar caching).
> *What this includes:*
>  * New interface(s) for storage format operations (schema conversion, 
> read/write support)
>  * Integration in ~24 files (Scala + Java) across Parquet, ORC, Avro, CSV, 
> JSON, XML, and columnar caching including vectorized Java files 
> ({{OffHeapColumnVector}}, {{OnHeapColumnVector}}, 
> {{ParquetVectorUpdaterFactory}}, {{VectorizedColumnReader}})
> *Design doc:*
> Linked in the parent work item.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

