chenhao-db opened a new pull request, #50217: URL: https://github.com/apache/spark/pull/50217
### What changes were proposed in this pull request?

The current JSON `singleVariantColumn` mode doesn't work in DSv2 and `spark.readStream`. This PR fixes the two cases:

- DSv1 calls `JsonFileFormat.inferSchema`, which calls `JsonDataSource.inferSchema`; DSv2 calls `JsonDataSource.inferSchema`. The previous `singleVariantColumn` code was in `JsonFileFormat.inferSchema`, and is now moved into `JsonDataSource.inferSchema`, so that both cases are covered.
- `spark.readStream` requires a user-specified schema. `singleVariantColumn` plays the same role as a user-specified schema, but the check would fail when only `singleVariantColumn` was set.

It also includes a small refactor that moves the option name definition `singleVariantColumn` from `JSONOptions` to `DataSourceOptions`. It will be a common option name shared by multiple data sources (e.g., CSV) when we add the implementation in the future.

### Why are the changes needed?

It is a bug fix that improves the usability of variant.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test. A test previously in `VariantSuite` is moved to `JsonSuite`, so that we can test the read behavior in both `JsonV1Suite` and `JsonV2Suite`. The test is also extended to include `spark.readStream`.

### Was this patch authored or co-authored using generative AI tooling?

No.
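
For illustration only, the two read paths described above might be exercised roughly as follows. This is a sketch, not the test added in this PR: the object name, the temporary input file, the column name `var`, and the query name `svc_sketch` are all hypothetical, and it assumes a Spark build with variant support that includes this fix. It also assumes that an empty `spark.sql.sources.useV1SourceList` sends the batch JSON read through DSv2.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object SingleVariantColumnSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("single-variant-column-sketch")
      // Assumption: an empty V1 source list makes the batch JSON read use DSv2.
      .config("spark.sql.sources.useV1SourceList", "")
      .getOrCreate()

    // Hypothetical input: a temporary directory with one small JSON-lines file.
    val dir = Files.createTempDirectory("svc-sketch")
    Files.write(
      dir.resolve("part-0.json"),
      "{\"a\": 1}\n{\"b\": [2, 3]}\n".getBytes("UTF-8"))

    // Batch read: each JSON record is loaded into a single variant column "var".
    val batch = spark.read
      .format("json")
      .option("singleVariantColumn", "var")
      .load(dir.toString)
    batch.printSchema()

    // Streaming read without a user-specified schema: the option is expected
    // to stand in for the schema, as described in the second bullet above.
    val stream = spark.readStream
      .format("json")
      .option("singleVariantColumn", "var")
      .load(dir.toString)

    // Write to the in-memory sink so the result can be inspected as a table.
    val query = stream.writeStream
      .format("memory")
      .queryName("svc_sketch")
      .start()
    query.processAllAvailable()
    spark.table("svc_sketch").show(truncate = false)

    query.stop()
    spark.stop()
  }
}
```

Removing the `useV1SourceList` setting should send the same batch read through the DSv1 path, which is the other case this PR is meant to keep working.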