xushiyan commented on code in PR #5943:
URL: https://github.com/apache/hudi/pull/5943#discussion_r930070043
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java:
##########
@@ -130,6 +130,16 @@ public class HoodieStorageConfig extends HoodieConfig {
.defaultValue("TIMESTAMP_MICROS")
.withDocumentation("Sets spark.sql.parquet.outputTimestampType. Parquet timestamp type to use when Spark writes data to Parquet files.");
+  // SPARK-38094 Spark 3.3 checks if this field is enabled. Hudi has to provide this or there would be NPE thrown
+  // Would ONLY be effective with Spark 3.3+
+  // default value is true which is in accordance with Spark 3.3
+  public static final ConfigProperty<String> PARQUET_FIELD_ID_WRITE_ENABLED = ConfigProperty
+      .key("hoodie.parquet.fieldId.write.enabled")
Review Comment:
```suggestion
.key("hoodie.parquet.field_id.write.enabled")
```
##########
pom.xml:
##########
@@ -1307,6 +1308,7 @@
<version>${maven-surefire-plugin.version}</version>
<configuration combine.self="append">
<skip>${skipUTs}</skip>
+ <trimStackTrace>false</trimStackTrace>
Review Comment:
   This is also set in:
       <pluginManagement>
         <plugins>
           <plugin>
   Is it not effective there?
##########
hudi-examples/hudi-examples-spark/pom.xml:
##########
@@ -190,6 +190,12 @@
<artifactId>spark-sql_${scala.binary.version}</artifactId>
</dependency>
+ <!-- Hadoop -->
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-auth</artifactId>
+ </dependency>
+
Review Comment:
Good find. So can we now re-enable the Spark 3.2 quickstart test in the GitHub Actions workflow? Check out bot.yml.
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/bootstrap/HoodieSparkBootstrapSchemaProvider.java:
##########
@@ -71,11 +72,20 @@ protected Schema getBootstrapSourceSchema(HoodieEngineContext context, List<Pair
   }
   private static Schema getBootstrapSourceSchemaParquet(HoodieWriteConfig writeConfig, HoodieEngineContext context, Path filePath) {
-    MessageType parquetSchema = new ParquetUtils().readSchema(context.getHadoopConf().get(), filePath);
+    Configuration hadoopConf = context.getHadoopConf().get();
+    MessageType parquetSchema = new ParquetUtils().readSchema(hadoopConf, filePath);
+
+ hadoopConf.set(
+ SQLConf.PARQUET_BINARY_AS_STRING().key(),
+ SQLConf.PARQUET_BINARY_AS_STRING().defaultValueString());
+ hadoopConf.set(
+ SQLConf.PARQUET_INT96_AS_TIMESTAMP().key(),
+ SQLConf.PARQUET_INT96_AS_TIMESTAMP().defaultValueString());
+ hadoopConf.set(
+ SQLConf.CASE_SENSITIVE().key(),
+ SQLConf.CASE_SENSITIVE().defaultValueString());
Review Comment:
Don't you want to set those only when they're not already set?
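Hadoop's `Configuration` exposes `setIfUnset(String, String)` for exactly this. A minimal sketch of the set-if-absent pattern — using a plain `Map` as a stand-in for the Hadoop `Configuration` so it runs standalone; the class name and keys here are just illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class SetIfUnsetSketch {

    // Stand-in for Hadoop's Configuration.setIfUnset(key, value):
    // apply the default only when the user has not already set the key.
    static void setIfUnset(Map<String, String> conf, String key, String value) {
        conf.putIfAbsent(key, value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Simulate a user-supplied override present before defaults are applied.
        conf.put("spark.sql.caseSensitive", "true");

        // The user's value is preserved; missing keys fall back to the default.
        setIfUnset(conf, "spark.sql.caseSensitive", "false");
        setIfUnset(conf, "spark.sql.parquet.binaryAsString", "false");

        System.out.println(conf.get("spark.sql.caseSensitive"));          // true
        System.out.println(conf.get("spark.sql.parquet.binaryAsString")); // false
    }
}
```

In the PR itself this would presumably amount to replacing each `hadoopConf.set(...)` with `hadoopConf.setIfUnset(SQLConf.CASE_SENSITIVE().key(), ...)` and so on, so explicit user settings are not clobbered.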
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]