szehon-ho commented on code in PR #50571: URL: https://github.com/apache/spark/pull/50571#discussion_r2040394255
########## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala: ########## @@ -257,12 +257,14 @@ class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] QualifiedTableName(table.identifier.catalog.get, table.database, table.identifier.table) val catalog = sparkSession.sessionState.catalog val dsOptions = DataSourceUtils.generateDatasourceOptions(extraOptions, table) + val readDataSourceCacheIgnoreOptions: Boolean = + SQLConf.get.getConf(SQLConf.READ_DATA_SOURCE_CACHE_IGNORE_OPTIONS) catalog.getCachedTable(qualifiedTableName) match { case null => val dataSource = DataSource( sparkSession, - // In older version(prior to 2.1) of Spark, the table schema can be empty and should be + // In older version(prior to 2.1) of Spark,the table schema can be empty and should be Review Comment: The space can be put back between the words? To me, its harder to read without it? ########## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ########## @@ -5260,6 +5260,15 @@ object SQLConf { .booleanConf .createWithDefault(false) + val READ_DATA_SOURCE_CACHE_IGNORE_OPTIONS = + buildConf("spark.sql.legacy.readDataSourceCacheIgnoreOptions") + .internal() + .doc("When set to true, reading from file source table caches the first query plan and " + + "ignores subsequent changes in query options, e.g. `delimiters` option for csv tables.") Review Comment: optional: not sure if the example adds much value. We ignore any subsequent change to query option right? I guess example can be more useful where we document the options itself. ########## sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala: ########## @@ -1405,6 +1405,31 @@ abstract class DDLSuite extends QueryTest with DDLSuiteBase { } } + test("SPARK-51747: When legacy conf enabled data source read ignores option change") { Review Comment: to me , it would be clearer to combine the test: ``` Seq("true", "false").foreach { ignoreCache => withSQLConf(SQLConf.READ_DATA_SOURCE_CACHE_IGNORE_OPTIONS.key -> ignoreCache) { if (ignoreCache) checkAnswer(...) else checkAnswer(...) } ``` Then the test can just show the two versions of the output. I see its in many other suite and think its a nice way to reduce the code and side-by-side see the config behavior change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org