xushiyan commented on a change in pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#discussion_r717267707
##########
File path:
hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/UtilitiesTestBase.java
##########
@@ -314,6 +320,28 @@ public static void saveParquetToDFS(List<GenericRecord>
records, Path targetFile
}
}
+ public static void saveORCToDFS(List<GenericRecord> records, Path
targetFile) throws IOException {
+ TypeDescription orcSchema =
AvroOrcUtils.createOrcSchema(HoodieTestDataGenerator.AVRO_SCHEMA);
Review comment:
ditto
##########
File path:
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamerBase.java
##########
@@ -247,4 +254,27 @@ protected static void prepareParquetDFSFiles(int
numRecords, String baseParquetP
dataGenerator.generateInserts("000", numRecords)), new Path(path));
}
}
+
+ protected static void prepareORCDFSFiles(int numRecords) throws IOException {
+ prepareORCDFSFiles(numRecords, ORC_SOURCE_ROOT);
+ }
+
+ protected static void prepareORCDFSFiles(int numRecords, String baseORCPath)
throws IOException {
+ prepareORCDFSFiles(numRecords, baseORCPath, FIRST_ORC_FILE_NAME, false,
null, null);
+ }
+
+ protected static void prepareORCDFSFiles(int numRecords, String baseORCPath,
String fileName, boolean useCustomSchema,
+ String schemaStr, Schema
schema) throws IOException {
+ String path = baseORCPath + "/" + fileName;
+ HoodieTestDataGenerator dataGenerator = new HoodieTestDataGenerator();
+ if (useCustomSchema) {
+ Helpers.saveORCToDFS(Helpers.toGenericRecords(
+ dataGenerator.generateInsertsAsPerSchema("000", numRecords,
schemaStr),
+ schema), new Path(path),
AvroOrcUtils.createOrcSchema(HoodieTestDataGenerator.AVRO_TRIP_SCHEMA));
Review comment:
better if add a `HoodieTestDataGenerator.ORC_TRIP_SCHEMA` in the class
and convert this inside?
##########
File path:
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##########
@@ -1622,6 +1651,16 @@ public void
testParquetDFSSourceWithSchemaFilesAndTransformer() throws Exception
testParquetDFSSource(true,
Collections.singletonList(TripsWithDistanceTransformer.class.getName()));
}
+ @Test
+ public void testORCDFSSourceWithoutSchemaProviderAndNoTransformer() throws
Exception {
+ testORCDFSSource(false, null);
+ }
+
+ @Test
+ public void testORCDFSSourceWithSchemaFilesAndTransformer() throws Exception
{
+ testORCDFSSource(true,
Collections.singletonList(TripsWithDistanceTransformer.class.getName()));
+ }
Review comment:
can we use `@ParameterizedTest` here? with `@MethodSource` returning
`Stream<Arguments>` to make it cleaner
##########
File path:
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##########
@@ -1936,4 +1972,12 @@ public Schema getTargetSchema() {
}
}
+ private static Stream<Arguments> testArguments() {
Review comment:
`testArguments` is not specific enough. use the same method name then
`@MethodSource` does not need to have its argument. it'll look for the method
with the same name. so this argument provider method will only serve its own
test case method
```suggestion
private static Stream<Arguments> testORCDFSSource() {
```
##########
File path:
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##########
@@ -1622,6 +1652,12 @@ public void
testParquetDFSSourceWithSchemaFilesAndTransformer() throws Exception
testParquetDFSSource(true,
Collections.singletonList(TripsWithDistanceTransformer.class.getName()));
}
+ @ParameterizedTest
+ @MethodSource("testArguments")
Review comment:
these 2 should be annotated on `testORCDFSSource()` method itself then
this method not needed
##########
File path:
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java
##########
@@ -129,10 +129,12 @@
public static final Schema AVRO_SCHEMA = new
Schema.Parser().parse(TRIP_EXAMPLE_SCHEMA);
+ public static final Schema ORC_SCHEMA = new
Schema.Parser().parse(TRIP_EXAMPLE_SCHEMA);
public static final Schema AVRO_SCHEMA_WITH_METADATA_FIELDS =
HoodieAvroUtils.addMetadataFields(AVRO_SCHEMA);
public static final Schema AVRO_SHORT_TRIP_SCHEMA = new
Schema.Parser().parse(SHORT_TRIP_SCHEMA);
public static final Schema AVRO_TRIP_SCHEMA = new
Schema.Parser().parse(TRIP_SCHEMA);
+ public static final Schema ORC_TRIP_SCHEMA = new
Schema.Parser().parse(TRIP_SCHEMA);
Review comment:
actually what i meant is to do `AvroOrcUtils.createOrcSchema()` here for
this constant so callers can use it directly
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]