nsivabalan commented on a change in pull request #5170:
URL: https://github.com/apache/hudi/pull/5170#discussion_r837861114
##########
File path:
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsHoodieIncrSource.java
##########
@@ -174,7 +180,19 @@ public S3EventsHoodieIncrSource(
}
Option<Dataset<Row>> dataset = Option.empty();
if (!cloudFiles.isEmpty()) {
- dataset = Option.of(sparkSession.read().format(fileFormat).load(cloudFiles.toArray(new String[0])));
+ DataFrameReader dataFrameReader = sparkSession.read().format(fileFormat);
+ if (!StringUtils.isNullOrEmpty(props.getString(HoodieIncrSource.Config.SPARK_DATASOURCE_OPTIONS, null))) {
Review comment:
Can we move this into a private method to keep this block cleaner and leaner?
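A minimal sketch of what such an extraction might look like, not the PR's actual code. To stay runnable without Spark or Jackson on the classpath, it makes several assumptions: `FakeReader` is a hypothetical stand-in for Spark's `DataFrameReader`, `parseOptions` and `applyOptions` are illustrative names, and the regex-based parsing only handles a flat JSON object with unescaped string values (real code would presumably use a proper JSON library).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReaderOptionsSketch {

  // Hypothetical stand-in for Spark's DataFrameReader so the sketch runs without Spark.
  static final class FakeReader {
    final Map<String, String> options = new LinkedHashMap<>();

    FakeReader option(String key, String value) {
      options.put(key, value);
      return this;
    }
  }

  // Matches "key": "value" pairs. Assumption: flat JSON object, string values, no escapes.
  private static final Pattern PAIR =
      Pattern.compile("\"([^\"\\\\]*)\"\\s*:\\s*\"([^\"\\\\]*)\"");

  // Turns the JSON options string into a key/value map; null or empty input yields no options.
  static Map<String, String> parseOptions(String json) {
    Map<String, String> parsed = new LinkedHashMap<>();
    if (json == null || json.isEmpty()) {
      return parsed;
    }
    Matcher matcher = PAIR.matcher(json);
    while (matcher.find()) {
      parsed.put(matcher.group(1), matcher.group(2));
    }
    return parsed;
  }

  // The extracted helper: the caller no longer needs an inline null/empty check.
  static FakeReader applyOptions(FakeReader reader, String json) {
    parseOptions(json).forEach(reader::option);
    return reader;
  }

  public static void main(String[] args) {
    FakeReader reader =
        applyOptions(new FakeReader(), "{\"header\": \"true\", \"sep\": \";\"}");
    System.out.println(reader.options);
  }
}
```

With a helper of this shape, the call site in the source would shrink to building the reader, passing it through the helper, and calling `load(...)`.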
##########
File path:
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java
##########
@@ -88,6 +88,13 @@
* {@value #SOURCE_FILE_FORMAT} is passed to the reader while loading dataset. Default value is parquet.
*/
static final String SOURCE_FILE_FORMAT = "hoodie.deltastreamer.source.hoodieincr.file.format";
+ /**
+ *{@value #SPARK_DATASOURCE_OPTIONS} is json string, passed to the reader while loading dataset.
Review comment:
This may not be applicable to HoodieIncrSource; it is specific to the S3
incr source. So, can we move this config to the other class?