Prashant Wason created HUDI-667:
-----------------------------------
Summary: HoodieTestDataGenerator does not delete keys correctly
Key: HUDI-667
URL: https://issues.apache.org/jira/browse/HUDI-667
Project: Apache Hudi (incubating)
Issue Type: Bug
Reporter: Prashant Wason
HoodieTestDataGenerator is used to generate sample data for unit tests. It
can generate HoodieRecords for insert/update/delete. It maintains the
record keys in a HashMap:
private final Map<Integer, KeyPartition> existingKeys;
There are two issues in the implementation:
# Delete from existingKeys uses the KeyPartition (the map value) rather than the Integer key
# Inserting records after deletes is not correctly handled
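The first issue can be illustrated with a minimal sketch. It assumes the delete is a Map.remove call that receives the KeyPartition; the report does not show the actual call. The KeyPartition class and its fields below are hypothetical stand-ins, not the generator's real class. Because Map.remove takes an Object, passing the value compiles, but no Integer key ever equals a KeyPartition, so nothing is removed.

```java
import java.util.HashMap;
import java.util.Map;

class RemoveByValueSketch {
    // Hypothetical stand-in for the generator's KeyPartition class.
    static final class KeyPartition {
        final String key;
        final String partitionPath;

        KeyPartition(String key, String partitionPath) {
            this.key = key;
            this.partitionPath = partitionPath;
        }
    }

    // Builds the three-entry map and passes the VALUE to remove().
    // The call compiles but matches no Integer key, so it is a no-op.
    static int sizeAfterRemoveByValue() {
        Map<Integer, KeyPartition> existingKeys = new HashMap<>();
        KeyPartition kp2 = new KeyPartition("key2", "partition2");
        existingKeys.put(0, new KeyPartition("key1", "partition1"));
        existingKeys.put(1, kp2);
        existingKeys.put(2, new KeyPartition("key3", "partition3"));
        existingKeys.remove(kp2);
        return existingKeys.size();
    }

    // Same map, but remove() receives the Integer key of the entry.
    static int sizeAfterRemoveByIndex() {
        Map<Integer, KeyPartition> existingKeys = new HashMap<>();
        existingKeys.put(0, new KeyPartition("key1", "partition1"));
        existingKeys.put(1, new KeyPartition("key2", "partition2"));
        existingKeys.put(2, new KeyPartition("key3", "partition3"));
        existingKeys.remove(1);
        return existingKeys.size();
    }

    public static void main(String[] args) {
        System.out.println("remove by value -> size " + sizeAfterRemoveByValue());
        System.out.println("remove by index -> size " + sizeAfterRemoveByIndex());
    }
}
```

Under these assumptions, the deleted key stays in existingKeys and can be picked again for a later update or delete.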
The implementation uses the Integer key so that values can be looked up
randomly. Assume three values were inserted, then the HashMap will hold:
0 -> KeyPartition1
1 -> KeyPartition2
2 -> KeyPartition3
Now if we delete KeyPartition2 (generate a random record for deletion), the
HashMap will be:
0 -> KeyPartition1
2 -> KeyPartition3
Now if we issue an insertBatch(), the insert is
existingKeys.put(existingKeys.size(), KeyPartition4). Since size() is now 2,
this overwrites the KeyPartition3 stored under key 2 rather than adding a new
entry to the map.
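The overwrite can be reproduced with a short sketch. Strings stand in for KeyPartition values, and the deleteAt helper is a hypothetical illustration of one possible fix (moving the last entry into the freed slot so that indices stay dense), not necessarily the change adopted in Hudi.

```java
import java.util.HashMap;
import java.util.Map;

class InsertAfterDeleteSketch {
    // Reproduces the scenario: the map has a gap at index 1, then a new
    // key is inserted at index size().
    static Map<Integer, String> buggyInsert() {
        Map<Integer, String> existingKeys = new HashMap<>();
        existingKeys.put(0, "KeyPartition1");
        existingKeys.put(2, "KeyPartition3");
        // size() == 2, so this replaces KeyPartition3 instead of adding an entry.
        existingKeys.put(existingKeys.size(), "KeyPartition4");
        return existingKeys;
    }

    // Hypothetical fix: delete by moving the last entry into the freed
    // slot. Assumes the indices are dense (0..size-1) before the call.
    static void deleteAt(Map<Integer, String> existingKeys, int index) {
        int last = existingKeys.size() - 1;
        String lastValue = existingKeys.remove(last);
        if (index != last) {
            existingKeys.put(index, lastValue);
        }
    }

    // With dense indices, put(size(), ...) always targets an unused slot.
    static Map<Integer, String> fixedInsert() {
        Map<Integer, String> existingKeys = new HashMap<>();
        existingKeys.put(0, "KeyPartition1");
        existingKeys.put(1, "KeyPartition2");
        existingKeys.put(2, "KeyPartition3");
        deleteAt(existingKeys, 1);
        existingKeys.put(existingKeys.size(), "KeyPartition4");
        return existingKeys;
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + buggyInsert());
        System.out.println("fixed: " + fixedInsert());
    }
}
```

Keeping the indices dense also preserves the random lookup by Integer key, since every value in 0..size-1 remains a valid key.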