yihua commented on a change in pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#discussion_r811449006



##########
File path: 
hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestCleansCommand.java
##########
@@ -83,7 +83,7 @@ public void init() throws Exception {
     String fileId1 = UUID.randomUUID().toString();
     String fileId2 = UUID.randomUUID().toString();
     FileSystem fs = FSUtils.getFs(basePath(), hadoopConf());
-    HoodieTestDataGenerator.writePartitionMetadata(fs, 
HoodieTestDataGenerator.DEFAULT_PARTITION_PATHS, tablePath);
+    HoodieTestDataGenerator.writePartitionMetadataDeprecated(fs, 
HoodieTestDataGenerator.DEFAULT_PARTITION_PATHS, tablePath);

Review comment:
       Are these going to be cleaned up in a separate PR?

##########
File path: 
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java
##########
@@ -865,4 +883,31 @@ public int getNumExistingKeys(String schemaStr) {
   public void close() {
     existingKeysBySchema.clear();
   }
+
+  private static long genRandomTimeMillis(Random r) {
+    // NOTE: {@code Date} expects a number representing a number of millis 
elapsed since epoch
+    //       (01/01/1970 00:00:00) until approx year 8099
+    //       This modulo represents approx 6000 years in millis, which should 
be sufficient to
+    //       cover all possible use-cases
+    return Math.abs(r.nextLong()) % 189216000000000L;
+  }
+
+  private static UUID genPseudoRandomUUID(Random r) {
+    byte[] bytes = new byte[16];
+    r.nextBytes(bytes);
+
+    bytes[6] &= 0x0f;
+    bytes[6] |= 0x40;
+    bytes[8] &= 0x3f;
+    bytes[8] |= 0x80;

Review comment:
       what's the reason for having this logic?  trying to bound the value?

##########
File path: 
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java
##########
@@ -269,25 +300,25 @@ public static GenericRecord generateGenericRecord(String 
rowKey, String partitio
     rec.put("partition_path", partitionPath);
     rec.put("rider", riderName);
     rec.put("driver", driverName);
-    rec.put("begin_lat", RAND.nextDouble());
-    rec.put("begin_lon", RAND.nextDouble());
-    rec.put("end_lat", RAND.nextDouble());
-    rec.put("end_lon", RAND.nextDouble());
+    rec.put("begin_lat", r.nextDouble());
+    rec.put("begin_lon", r.nextDouble());
+    rec.put("end_lat", r.nextDouble());
+    rec.put("end_lon", r.nextDouble());
     if (isFlattened) {
-      rec.put("fare", RAND.nextDouble() * 100);
+      rec.put("fare", r.nextDouble() * 100);
       rec.put("currency", "USD");
     } else {
-      rec.put("distance_in_meters", RAND.nextInt());
-      rec.put("seconds_since_epoch", RAND.nextLong());
-      rec.put("weight", RAND.nextFloat());
+      rec.put("distance_in_meters", r.nextInt());
+      rec.put("seconds_since_epoch", r.nextLong());
+      rec.put("weight", r.nextFloat());
       byte[] bytes = "Canada".getBytes();
       rec.put("nation", ByteBuffer.wrap(bytes));
-      long currentTimeMillis = System.currentTimeMillis();
-      Date date = new Date(currentTimeMillis);
+      long randomMillis = genRandomTimeMillis(r);

Review comment:
       One nit for thought.  Previously, each time generateGenericRecord() is 
called, the timestamp monotonically increases.  If some tests depend on this 
assumption, the new time millis random gen should also honor that in a way, 
e.g., start with a random millis, and increment it by another positive random 
millis for each subsequent call, to guarantee the order. 

##########
File path: 
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java
##########
@@ -842,7 +860,7 @@ public boolean deleteExistingKeyIfPresent(HoodieKey key) {
     List<GenericRecord> list = new ArrayList<>();
     IntStream.range(0, numRecords).forEach(i -> {
       list.add(generateGenericRecord(UUID.randomUUID().toString(), "0", 
UUID.randomUUID().toString(), UUID.randomUUID()

Review comment:
       same here regarding `UUID.randomUUID()` if full reproducibility is the 
goal.

##########
File path: 
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java
##########
@@ -140,35 +142,64 @@
   public static final TypeDescription ORC_TRIP_SCHEMA = 
AvroOrcUtils.createOrcSchema(new Schema.Parser().parse(TRIP_SCHEMA));
   public static final Schema FLATTENED_AVRO_SCHEMA = new 
Schema.Parser().parse(TRIP_FLATTENED_SCHEMA);
 
-  private static final Random RAND = new Random(46474747);

Review comment:
       Looks like this always generates the same seq of "random" numbers before?

##########
File path: 
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java
##########
@@ -489,8 +507,8 @@ public static void createSavepointFile(String basePath, 
String instantTime, Conf
 
   public Stream<HoodieRecord> generateInsertsStream(String commitTime, Integer 
n, boolean isFlattened, String schemaStr, boolean containsAllPartitions) {
     return generateInsertsStream(commitTime, n, isFlattened, schemaStr, 
containsAllPartitions,
-        () -> partitionPaths[RAND.nextInt(partitionPaths.length)],
-        () -> UUID.randomUUID().toString());
+        () -> partitionPaths[r.nextInt(partitionPaths.length)],
+        () -> genPseudoRandomUUID(r).toString());

Review comment:
       Should this be fixed as well:
   ```
   public List<HoodieRecord> generateInsertsForPartition(String instantTime, 
Integer n, String partition) {
       return generateInsertsStream(instantTime,  n, false, 
TRIP_EXAMPLE_SCHEMA, false, () -> partition, () -> 
UUID.randomUUID().toString()).collect(Collectors.toList());
     }
   ```

##########
File path: 
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java
##########
@@ -140,35 +142,64 @@
   public static final TypeDescription ORC_TRIP_SCHEMA = 
AvroOrcUtils.createOrcSchema(new Schema.Parser().parse(TRIP_SCHEMA));
   public static final Schema FLATTENED_AVRO_SCHEMA = new 
Schema.Parser().parse(TRIP_FLATTENED_SCHEMA);
 
-  private static final Random RAND = new Random(46474747);
+  private final Random r;

Review comment:
       nit: rename to `rand`?

##########
File path: 
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java
##########
@@ -552,7 +570,7 @@ private void incrementNumExistingKeysBySchema(String 
schemaStr) {
     List<HoodieRecord> inserts = new ArrayList<>();
     int currSize = getNumExistingKeys(TRIP_EXAMPLE_SCHEMA);
     for (int i = 0; i < limit; i++) {
-      String partitionPath = 
partitionPaths[RAND.nextInt(partitionPaths.length)];
+      String partitionPath = partitionPaths[r.nextInt(partitionPaths.length)];
       HoodieKey key = new HoodieKey(UUID.randomUUID().toString(), 
partitionPath);

Review comment:
       Same here for `UUID.randomUUID()`.

##########
File path: 
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java
##########
@@ -140,35 +142,64 @@
   public static final TypeDescription ORC_TRIP_SCHEMA = 
AvroOrcUtils.createOrcSchema(new Schema.Parser().parse(TRIP_SCHEMA));
   public static final Schema FLATTENED_AVRO_SCHEMA = new 
Schema.Parser().parse(TRIP_FLATTENED_SCHEMA);
 
-  private static final Random RAND = new Random(46474747);
+  private final Random r;
 
   //Maintains all the existing keys schema wise
   private final Map<String, Map<Integer, KeyPartition>> existingKeysBySchema;
   private final String[] partitionPaths;
   //maintains the count of existing keys schema wise
   private Map<String, Integer> numKeysBySchema;
 
+  public HoodieTestDataGenerator(long seed) {
+    this(seed, DEFAULT_PARTITION_PATHS, new HashMap<>());
+  }
+
+  public HoodieTestDataGenerator(long seed, String[] partitionPaths, 
Map<Integer, KeyPartition> keyPartitionMap) {
+    this.r = new Random(seed);

Review comment:
       Should the seed be printed in logs here, so that if CI runs fail, the 
seed can be used for reproducing locally?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to