[PR] [SPARK-51443] Fix singleVariantColumn in DSv2 and readStream. [spark]

via GitHub Sat, 08 Mar 2025 11:38:33 -0800


chenhao-db opened a new pull request, #50217:
URL: https://github.com/apache/spark/pull/50217


   ### What changes were proposed in this pull request?
   
   The current JSON `singleVariantColumn` mode doesn't work in DSv2 and 
`spark.readStream`. This PR fixes the two cases:
   - DSv1 calls `JsonFileFormat.inferSchema`, which calls 
`JsonDataSource.inferSchema`; DSv2 calls `JsonDataSource.inferSchema` directly. 
The `singleVariantColumn` code was previously in `JsonFileFormat.inferSchema`, and 
is now moved into `JsonDataSource.inferSchema`, so that both cases are covered.
   - `spark.readStream` requires a user-specified schema. 
`singleVariantColumn` plays the same role as a user-specified schema, but the 
check would fail before this change.
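
The two read paths above can be illustrated from the user's side with a minimal PySpark sketch. It is not code from this PR: the helper name `read_json_as_variant` and its `column` and `streaming` parameters are hypothetical, and `spark` is assumed to be an existing `SparkSession` on a Spark build that supports the variant type. Only `format`, `option`, `load`, `spark.read`, and `spark.readStream` are real API calls.

```python
def read_json_as_variant(spark, path, column="v", streaming=False):
    """Load JSON files so that each record lands in a single variant column.

    Hypothetical helper for illustration only; `spark` is an existing
    SparkSession, `column` is the name given to the variant column.
    """
    # spark.read returns a batch DataFrameReader,
    # spark.readStream returns a DataStreamReader.
    reader = spark.readStream if streaming else spark.read
    # No .schema(...) call here: the option is meant to stand in for a
    # user-specified schema, which is the case this PR fixes for readStream.
    return (
        reader.format("json")
        .option("singleVariantColumn", column)
        .load(path)
    )
```

For example, `read_json_as_variant(spark, "/tmp/events", streaming=True)` would exercise the `spark.readStream` case. Whether the DSv1 or DSv2 reader is used depends on session configuration (to my understanding, `spark.sql.sources.useV1SourceList`), not on this helper.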
   
   It also includes a small refactor that moves the option name definition 
`singleVariantColumn` from `JSONOptions` to `DataSourceOptions`. It will be a 
common option name shared by multiple data sources (e.g., CSV) when we add the 
implementation in the future.
   
   ### Why are the changes needed?
   
   It is a bug fix: without it, `singleVariantColumn` cannot be used with DSv2 
or `spark.readStream`, which limits the usability of variant.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test. A test previously in `VariantSuite` is moved to `JsonSuite`, so 
that we can test the read behavior in both `JsonV1Suite` and `JsonV2Suite`. The 
test is also extended to include `spark.readStream`.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

