This is an automated email from the ASF dual-hosted git repository.
vhs pushed a change to branch variant-intro-shredded-avro-support
in repository https://gitbox.apache.org/repos/asf/hudi.git
omit 46779f9daff6 Remove looksLikeVariantStruct:
omit af00daf8dcca feat(schema): Add read + write support for shredded for AVRO
omit d1e8c0d46287 feat(schema): Config path implemented for spark record type
omit d3bb0d3750f1 feat(schema): Add support to write shredded variants
omit cf1d9a1e223b Support reading and writing of Variant Types - Add adapter pattern for Spark3 and 4 - Cleanup invariant issue in SparkSqlWriter - Add cross engine test - Add backward compatibility test for Spark3.x - Add cross engine read for Flink - Make VariantLogicalType compare against singleton
add b7f72d33b560 feat: support predicate push down in Hudi flink source v2 (#18212)
add f7b805961e05 feat(flink): Off-heap lookup join cache backed by RocksDB (#18231)
add fbeb93d353e4 fix: Remove trailing colon from incomplete error message in HoodieTableMetadataUtil (#18233)
add 8c5f83237c11 fix: Fix typos across codebase (#18232)
add f5d08ec1bbae fix: Fix SHOW PARTITIONS commands functionality for slash-separated date partitioning (#18195)
add 18ee6cd66b4f fix: Fix string handling on bloom index (#18240)
add 894b817f79af chore(ci): cleanup for print statements, showing tables/schemas (#17771)
add fd0656c11304 fix: Use correct lastCompletedTransactionMetadata while acquiring lock for clustering (#18198)
add 6156b74414d5 feat(spark): Add HoodieSparkSqlUtils APIs for tooling (#18202)
add df5d7c8fd262 feat(spark-datasource): support spark.hoodie.* read config overrides (#18205)
add 7f4dfdab3222 test: Add Scala test for record index rebootstrap on non-Hoodie partitions (#18208)
add 5fb1b34a6c38 fix: Fail metadata bootstrap early in presence of 0 byte file (#18209)
add fb7b1a5d9111 feat(metadata-table): Add count validation for record index bootstrap (#18029)
add 967408cc9b76 refactor: move source assign package under split (#18253)
add 181af01a1a25 perf: Adding support for LatestBaseFilesPathFilter to Spark File Index (#18136)
add 59a5d889bbe9 fix: add all fields in HoodieSourceSplitSerializer (#18243)
add 363f41acbbda fix: [HUDI-CLUSTERING] Optimize binary copy performance with lazy loading, bulk reads, and double buffering (#18241)
add d4ff54b571c7 fix(flink): Use timestamp based partitioning in AutoRowDataKeyGen (#18090)
add e1ae9c6ac2fe feat(flink): collect event time in HoodieRowDataCreateHandle for min/max event time metrics (#18250)
add 73e710de1332 feat(table-services): Emit archival metrics for monitoring and debugging (#18133)
add 7e9abc71ed03 feat(table-services): Add config to filter partitions during full clean (#17550)
add e063493337e5 feat(metrics): emit metric for rollback failures (#18148)
add 65c1b1217ece feat: Notebooks to support multiple hudi versions (#18255)
add 69e24ea4690e perf: eliminate unnecessary timeline loading for Flink append only write path (#18264)
add 1203b215b03c feat: Use PartitionValueExtractor interface in Spark reader path (#17850)
add 1b2cee800994 feat(vector): add VECTOR type to HoodieSchema (#18146)
add a31f15d22a21 fix: infer record merge mode for pre-v9 tables in generateRequiredSchema (#18106)
add 43d8ed83f361 test(common): report jvm memory stats for unit tests (#18207)
add abd8c22d7a80 fix(table-services): When applying rollback metadata to metadata table (v6) do not rollback a metadata table deltacommit if it has been already rolled back by post-commit rollback (#18160)
add 365398affac8 refactor: Hudi Flink source v2 with better context management (#18269)
add 3139a1935ee6 feat(table-services): Allow users to not parallelize each partition with engine context during clustering planning (#18191)
add 3dcf4c65b261 feat(client): Add pre-write validator framework (#18239)
add 31b8706dbb2d feat(vector): Add further research for supporting VECTOR type to RFC-99 (#18184)
add 9ba276029311 feat(table-services): Support clustering file groups with earlier instants times first (#18174)
add 4499b0b2bd43 feat(spark): ZooKeeper node should hold spark app id (for helping debug when lock is held for long time) (#18123)
add 6e0d786b52b8 fix(flink): Don't perform table service during mdt initialization if streaming write is enabled (#18283)
add bf4425b9f3d9 fix: Remove noisy logging when table partition is empty (#18290)
add 729b30c128d8 fix: Improve config docs of enabling column stats in metadata table (#18289)
add 2c1cb392df14 feat(vector): add converters from spark to hoodieSchema for vectors (#18190)
add 8296df0de3a3 fix(flink): enable integration test for Hudi Flink Source V2 (#18287)
add 22aa1fad6a0b fix: Databricks Spark 3.4 Runtime compatibility for reading Hudi tables (#18292)
add 93b8e9fc9804 feat(flink): Add Kafka offset tracking to Flink Hudi commits (#18127)
add d13310c6c119 perf(table-services): Incremental clean planning (for COW) should ignore partitions from instants with only new file groups (#18016)
add b5daa30aed8c feat(flink): Add helper functions to parse Kafka offset differences b… (#18125)
add f867059b3c0a fix(spark): SparkSQL write queries should correctly infer HUDI configs from spark.hoodie.* configs in spark conf (#18297)
add cc3a5293bf5c fix(table-services): When single clustering group config is disabled, clustering should not create clustering groups with same number of input/output files (#18172)
add abb5fd2ad65d feat: add support for touch partitions in HiveSyncTool (#18064)
add da244e12fc07 feat(flink): Support create table DDL without primary key (#18086)
add b01ae22236d6 fix: sort partitions after filtering for clustering planning (#18092)
add b31d5f7a4409 refactor: rewrite executors tests to avoid code duplication (#18005)
add b77c7e5eb4d9 fix(common): Handle zero byte properties file and ensure atomic writes during modification (#18058)
add ddfcc92b131d [HUDI-7503] Compaction execution should fail if another active writer is already executing the same plan (#18012)
add 39cb726bebe9 feat(common): Add Policy for cleanup/rollback before each write (#18197)
add e6723a8b2af5 fix(metadata): Allow metadata table bootstrap when pending commits are being rolled back (#18033)
add 39f1f395b172 fix(common): Filter stray files when loading partitions in AbstractTableFileSystemView (#18047)
add f64c93ee899c fix(clustering): When inferring wether an instant is clustering, do not fail if replacecommit was rolled back already (by a concurrent writer) (#18288)
add 74649c83045d docs: RFC-102 - Spark Vector Search in Apache Hudi (#14218)
add f763da2bc197 feat(conflict-resolution): Allow PreferWriterConflictResolutionStrategy to abort clustering if there is an ongoing write that is in requested state. (#18280)
add a16d43171da9 feat(hudi-sync): Publish HUDI version to Hive metastore (#18307)
add a17955528595 chore(ci): Add test jobs and Codecov integration in GitHub Actions (#18225)
add b7b0b83e0ebf chore(ci): Simplify test combinations on Spark in Github actions (#18336)
add 14a549f45c2c chore(ci): Add codecov coverage from tests running on Spark 4.0 (#18335)
add 3a1ea4bf8602 feat(metasync): Support HMS 4.x in JDBC sync mode via automatic Thrift fallback (#18227)
add bbda2428bfd1 feat(flink): Support write buffer based on flink managed memory (#18319)
add cad08b1f2cfd feat(lance): Support bloom filter in Lance writer and reader (#18304)
add 19c4cc9c3166 fix: Use explicit Throwable type in AvroConversionUtils catch clause (#18342)
add 4e21ff1a3d1a docs: Update the build instructions by mentioning profiles in README (#18310)
add b0e40f62e8b3 feat(utilities): add DELETE operation support for HudiStreamer (#18088)
add b634262f060a feat(metadata-table): add config to disable automatic deletion of MDT partitions (#18181)
add 26b324f267e5 fix(concurrency): detect rollback conflicts with ongoing commit operations (#18089)
add 967e456cc456 feat(common): add core pre-commit validation framework - Phase 1 (#18068)
add 3aef2cacdb1c fix: Fix flaky test TestProtoConversionUtil#allFieldsSet_wellKnownTypesAndTimestampsAsRecords (#18352)
add f74bf3a3e040 fix(flink): enable batch read it for flink source v2 (#18325)
add 941ae6200078 fix: modify the incorrect Hive configuration in hoodie hive catalog (#18365)
add 81a8c26ee739 feat: support read commits limit in Hudi Flink Source V2 (#18369)
add 331b018d0cfc feat(hive-sync): add Spark-catalog based metastore client implementation to avoid Hive-on-Spark classloader issues (#18203)
add 817b3ad7de92 fix(common): fix typos commited -> committed, commiting -> committing (#18363)
add b60855defe0c feat: support read splits limit in Hudi Flink Source V2 (#18370)
add c2b401ed70ff feat(flink): Support bootstrap from RLI to local RocksDB for flink bucket assigner (#18254)
add 41337396a9d6 perf: Skip unnecessary clean planning for MOR metadata table file-version cleaning (#17943)
add 2f073643dfe7 feat: add graceful handling for post-commit failures with metrics (#18196)
add 9859f9aa29df feat(flink): Support more efficient customized serializer for HoodieRecordGlobalLocation (#18326)
add 69fa35b1015f feat(metadata): Defer RLI initialization for fresh tables to optimize file group allocation (#18353)
add 56bc28398a47 feat(flink): add pre-commit validation framework for Flink - Phase 2 (#18362)
add f15e1d060f96 feat: add Flink source reader function for cdc splits (#18361)
add 3fc1deb68b0f feat(vector): Support writing VECTOR to parquet and avro formats using Spark (#18328)
add bb5abb6b0483 fix: Optimizing internal schema lookup in TableSchemaResolver (#18387)
add 1eb97b31826e [HUDI-7030] Commit-based Clustering Plan Strategy (#18251)
add 02e5efb41c7b fix: Fixed the issue of incorrect opName values in Flink bulk insert writing (#18313)
add d241b0901b27 fix(flink): Improve splits distribution strategy for mor table w/ bucket index (#18103)
add e930b834e95f feat: Add Unshredded Variant read & write support (#17833)
add e4bc9851bf58 Explicitly state the spark stage name (#18416)
add 54276a957b82 refactor: modularize long test methods in TestHoodieClientOnCopyOnWriteStorage (#18377)
add 78109aa88b4a feat(schema): Add support to write shredded variants
add b702b7d75d83 feat(schema): Config path implemented for spark record type
new afc6b4b5b06e feat(schema): Add read + write support for shredded for AVRO
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (46779f9daff6)
            \
             N -- N -- N   refs/heads/variant-intro-shredded-avro-support (afc6b4b5b06e)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

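As a rough illustration of the labels described above, the following Python sketch classifies revisions on a toy commit graph. The graph, the commit labels (A, B, O1.., N1..), and the function names are hypothetical and are not taken from git or from the notification tooling. The sketch only assumes, as the text above states, that "omit" versus "discard" and "add" versus "new" depend on whether some other reference still reaches a revision.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy history: each commit maps to its list of parents.
# B is the common base, O1..O3 sit on the old branch tip, N1..N3 on the new one.
PARENTS: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "O1": ["B"], "O2": ["O1"], "O3": ["O2"],
    "N1": ["B"], "N2": ["N1"], "N3": ["N2"],
}


def ancestors(tip: str) -> Set[str]:
    """Return the tip plus every commit reachable through parent links."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def classify_force_push(old_tip: str, new_tip: str,
                        other_tips: Iterable[str]) -> Dict[str, Set[str]]:
    """Label revisions after a branch moves from old_tip to new_tip.

    other_tips are the tips of all other references in the repository
    (a simplification of how a real server decides what is still reachable).
    """
    old_reach = ancestors(old_tip)
    new_reach = ancestors(new_tip)
    other_reach: Set[str] = set()
    for tip in other_tips:
        other_reach |= ancestors(tip)

    removed = old_reach - new_reach   # the O revisions in the diagram
    added = new_reach - old_reach     # the N revisions in the diagram
    return {
        "omit": removed & other_reach,     # still referenced elsewhere
        "discard": removed - other_reach,  # no longer referenced anywhere
        "add": added & other_reach,        # already in the repository
        "new": added - other_reach,        # first seen with this push
    }


if __name__ == "__main__":
    # Another reference keeps O3 alive, and N2 already exists on another ref.
    labels = classify_force_push("O3", "N3", other_tips=["O3", "N2"])
    for name in ("omit", "discard", "add", "new"):
        print(name, sorted(labels[name]))
```

With these inputs the O revisions come out as "omit" because another reference still points at them, which mirrors the situation in this email, where the old revisions are listed as "omit" rather than "discard".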
Summary of changes:
.github/workflows/bot.yml | 589 ++++++-----
README.md | 9 +-
azure-pipelines-20230430.yml | 2 +-
.../hudi/cli/commands/TestLockAuditingCommand.java | 15 +-
.../testutils/HoodieCLIIntegrationTestBase.java | 1 -
.../testutils/HoodieCLIIntegrationTestHarness.java | 6 -
.../org/apache/hudi/client/BaseHoodieClient.java | 4 +
.../hudi/client/BaseHoodieTableServiceClient.java | 129 ++-
.../apache/hudi/client/BaseHoodieWriteClient.java | 107 +-
.../client/timeline/HoodieTimelineArchiver.java | 10 +
.../timeline/versioning/v1/TimelineArchiverV1.java | 26 +-
.../timeline/versioning/v2/TimelineArchiverV2.java | 27 +-
.../client/transaction/ConcurrentOperation.java | 40 +-
.../transaction/ConflictResolutionStrategy.java | 13 +
.../PreferWriterConflictResolutionStrategy.java | 129 ++-
.../SchemaConflictResolutionStrategy.java | 2 +-
...urrentFileWritesConflictResolutionStrategy.java | 40 +-
.../SimpleSchemaConflictResolutionStrategy.java | 4 +-
.../lock/BaseZookeeperBasedLockProvider.java | 4 +-
.../transaction/lock/HoodieInterProcessMutex.java | 65 ++
.../hudi/client/transaction/lock/LockManager.java | 5 +-
.../apache/hudi/client/utils/ArchivalMetrics.java | 75 ++
.../hudi/client/utils/PreWriteValidatorUtils.java | 137 +++
.../apache/hudi/client/utils/TransactionUtils.java | 2 +-
.../hudi/client/validator/PreWriteValidator.java | 75 ++
.../client/validator/StreamingOffsetValidator.java | 213 ++++
.../org/apache/hudi/config/HoodieCleanConfig.java | 38 +
.../apache/hudi/config/HoodieClusteringConfig.java | 46 +-
.../config/HoodiePreCommitValidatorConfig.java | 34 +
.../hudi/config/HoodiePreWriteValidatorConfig.java | 83 ++
.../org/apache/hudi/config/HoodieWriteConfig.java | 83 ++
.../java/org/apache/hudi/keygen/KeyGenUtils.java | 6 +-
.../metadata/HoodieBackedTableMetadataWriter.java | 104 +-
...ieBackedTableMetadataWriterTableVersionSix.java | 38 +-
.../hudi/metadata/HoodieMetadataWriteUtils.java | 8 +-
.../org/apache/hudi/metrics/HoodieMetrics.java | 85 ++
.../java/org/apache/hudi/table/HoodieTable.java | 10 +-
.../hudi/table/action/clean/CleanPlanner.java | 72 +-
.../cluster/ClusteringFileSliceComparator.java | 69 ++
.../cluster/ClusteringFileSliceSortByField.java | 20 +-
.../CommitBasedClusteringPlanStrategy.java | 341 +++++++
.../PartitionAwareClusteringPlanStrategy.java | 53 +-
.../TestConflictResolutionStrategyUtil.java | 19 +
...TestPreferWriterConflictResolutionStrategy.java | 415 ++++++++
.../lock/TestHoodieInterProcessMutex.java | 91 ++
.../client/utils/TestDeletePartitionUtils.java | 2 -
.../client/utils/TestPreWriteValidatorUtils.java | 434 ++++++++
.../validator/TestStreamingOffsetValidator.java | 552 ++++++++++
.../config/TestHoodiePreWriteValidatorConfig.java | 202 ++++
.../apache/hudi/config/TestHoodieWriteConfig.java | 2 +-
.../org/apache/hudi/io/TestHoodieCreateHandle.java | 6 +-
...ieBackedTableMetadataWriterTableVersionSix.java | 260 +++++
.../org/apache/hudi/metrics/TestHoodieMetrics.java | 71 ++
.../org/apache/hudi/table/TestHoodieTable.java | 52 +
.../TestCommitBasedClusteringPlanStrategy.java | 670 ++++++++++++
.../TestPartitionAwareClusteringPlanStrategy.java | 43 +
.../hudi/client/HoodieFlinkTableServiceClient.java | 62 +-
.../apache/hudi/client/HoodieFlinkWriteClient.java | 30 +-
.../io/storage/row/HoodieRowDataCreateHandle.java | 61 +-
.../org/apache/hudi/table/HoodieFlinkTable.java | 2 +-
.../commit/FlinkAutoCommitActionExecutor.java | 2 +-
.../apache/hudi/util/HoodieSchemaConverter.java | 7 +-
.../client/TestHoodieFlinkTableServiceClient.java | 117 +++
.../storage/row/TestHoodieRowDataCreateHandle.java | 439 ++++++++
.../testutils/HoodieFlinkClientTestHarness.java | 1 -
.../hudi/util/TestHoodieSchemaConverter.java | 40 +
.../apache/hudi/client/HoodieJavaWriteClient.java | 14 +-
.../run/strategy/JavaExecutionStrategy.java | 2 +-
.../hudi/client/TestJavaHoodieBackedMetadata.java | 56 +
.../apache/hudi/client/SparkRDDWriteClient.java | 18 +-
.../MultipleSparkJobExecutionStrategy.java | 2 +-
...SparkJobConsistentHashingExecutionStrategy.java | 2 +-
.../client/common/HoodieSparkEngineContext.java | 5 +
.../io/storage/HoodieSparkFileWriterFactory.java | 4 +-
.../hudi/io/storage/HoodieSparkLanceReader.java | 68 +-
.../hudi/io/storage/HoodieSparkLanceWriter.java | 31 +-
.../hudi/io/storage/HoodieSparkParquetReader.java | 26 +-
.../hudi/io/storage/VectorConversionUtils.java | 238 +++++
.../row/HoodieBloomFilterRowWriteSupport.java | 58 ++
.../storage/row/HoodieRowParquetWriteSupport.java | 221 ++--
.../org/apache/hudi/table/HoodieSparkTable.java | 25 +-
.../action/commit/SparkAutoCommitExecutor.java | 2 +-
.../org/apache/hudi/AvroConversionUtils.scala | 2 +-
.../apache/hudi/HoodieSchemaConversionUtils.scala | 1 +
.../scala/org/apache/hudi/HoodieSparkUtils.scala | 53 +-
.../SparkFileFormatInternalRowReaderContext.scala | 89 +-
.../apache/spark/sql/HoodieInternalRowUtils.scala | 24 -
.../sql/avro/HoodieSparkSchemaConverters.scala | 76 +-
.../datasources/SparkSchemaTransformUtils.scala | 61 +-
.../parquet/HoodieParquetFileFormatHelper.scala | 36 +-
.../parquet/HoodieParquetReadSupport.scala | 48 +-
.../org/apache/spark/sql/hudi/SparkAdapter.scala | 28 +-
.../callback/TestHoodieClientInitCallback.java | 1 +
.../hudi/client/TestSparkRDDWriteClient.java | 131 +++
.../TestSparkSizeBasedClusteringPlanStrategy.java | 229 ++++-
.../utils/TestSparkPreWriteValidatorUtils.java | 194 ++++
.../hudi/execution/BaseExecutorTestHarness.java | 324 ++++++
.../TestBoundedInMemoryExecutorInSpark.java | 136 +--
.../hudi/execution/TestBoundedInMemoryQueue.java | 2 +-
.../execution/TestDisruptorExecutionInSpark.java | 137 +--
.../hudi/execution/TestSimpleExecutionInSpark.java | 196 +---
.../apache/hudi/io/TestHoodieTimelineArchiver.java | 141 +++
.../java/org/apache/hudi/table/TestCleaner.java | 369 +++++++
.../TestSparkClusteringPlanPartitionFilter.java | 18 +-
.../table/functional/TestCleanPlanExecutor.java | 82 ++
.../TestSparkSchemaTransformUtils.scala | 66 +-
.../parquet/TestHoodieParquetReadSupport.scala | 34 +
.../org/apache/hudi/BaseHoodieTableFileIndex.java | 108 +-
.../client/validator/BasePreCommitValidator.java | 80 ++
.../hudi/client/validator/ValidationContext.java | 183 ++++
.../hudi/common/config/HoodieMetadataConfig.java | 60 +-
.../hudi/common/config/LockConfiguration.java | 2 +
.../hudi/common/data/HoodieListPairData.java | 8 +
.../apache/hudi/common/data/HoodiePairData.java | 11 +
.../hudi/common/engine/HoodieEngineContext.java | 8 +
.../java/org/apache/hudi/common/fs/FSUtils.java | 10 +-
.../hudi/common/model/HoodieCommitMetadata.java | 15 +
.../common/model/HoodiePreWriteCleanerPolicy.java | 74 ++
.../apache/hudi/common/schema/HoodieSchema.java | 624 +++++++++++-
.../HoodieSchemaComparatorForSchemaEvolution.java | 10 +
.../schema/HoodieSchemaCompatibilityChecker.java | 22 +
.../hudi/common/schema/HoodieSchemaType.java | 5 +
.../hudi/common/table/HoodieTableConfig.java | 29 +-
.../hudi/common/table/HoodieTableMetaClient.java | 12 +
.../hudi/common/table/TableSchemaResolver.java | 9 +-
.../apache/hudi/common/table/log/InstantRange.java | 86 +-
.../table/read/FileGroupReaderSchemaHandler.java | 28 +-
.../table/view/HoodieTableFileSystemView.java | 1 -
.../hudi/common/table/view/NoOpTableMetadata.java | 6 +
.../apache/hudi/common/util/CheckpointUtils.java | 327 ++++++
.../apache/hudi/common/util/ClusteringUtils.java | 29 +-
.../hudi/common/util/HoodieTableConfigUtils.java | 40 +-
.../hudi/common/util/collection/RocksDBDAO.java | 28 +-
.../exception/HoodieWriteConflictException.java | 31 +
.../java/org/apache/hudi/internal/schema/Type.java | 7 +-
.../org/apache/hudi/internal/schema/Types.java | 70 ++
.../schema/convert/InternalSchemaConverter.java | 64 +-
.../apache/hudi/metadata/BaseTableMetadata.java | 4 +-
.../metadata/FileSystemBackedTableMetadata.java | 26 +-
.../hudi/metadata/HoodieBackedTableMetadata.java | 67 ++
.../hudi/metadata/HoodieMetadataPayload.java | 10 +-
.../apache/hudi/metadata/HoodieTableMetadata.java | 23 +-
.../hudi/metadata/HoodieTableMetadataUtil.java | 15 +-
.../hudi/metadata/MetadataPartitionType.java | 4 +-
.../sync/common/model/PartitionValueExtractor.java | 0
.../hudi/util}/LazyConcatenatingIterator.java | 4 +-
.../apache/hudi/TestReportJvmConfiguration.java | 67 ++
.../common/data/TestHoodieListDataPairData.java | 18 +
.../common/model/TestHoodieCommitMetadata.java | 104 ++
.../hudi/common/schema/TestHoodieSchema.java | 950 ++++++++++++++++-
...stHoodieSchemaComparatorForSchemaEvolution.java | 18 +
.../schema/TestHoodieSchemaCompatibility.java | 136 +++
.../hudi/common/schema/TestHoodieSchemaType.java | 96 ++
.../read/TestFileGroupReaderSchemaHandler.java | 78 +-
.../hudi/common/util/TestCheckpointUtils.java | 245 +++++
.../TestClosableSortedDedupingIterator.java | 16 +-
.../common/util/collection/TestRocksDBDAO.java | 39 +
.../convert/TestInternalSchemaConverter.java | 155 +++
.../TestHoodieBackedTableMetadataDataCleanup.java | 2 +-
.../hudi/metadata/TestHoodieTableMetadataUtil.java | 9 +
.../hudi/util}/TestLazyConcatenatingIterator.java | 3 +-
hudi-flink-datasource/hudi-flink/pom.xml | 11 +
.../apache/hudi/configuration/FlinkOptions.java | 144 +++
.../apache/hudi/configuration/OptionsResolver.java | 31 +
.../apache/hudi/sink/FlinkCheckpointClient.java | 323 ++++++
.../org/apache/hudi/sink/StreamWriteFunction.java | 15 +-
.../hudi/sink/StreamWriteOperatorCoordinator.java | 20 +
.../AppendWriteFunctionWithBIMBufferSort.java | 17 +-
...AppendWriteFunctionWithDisruptorBufferSort.java | 8 +-
.../sink/bootstrap/AbstractBootstrapOperator.java | 83 ++
.../hudi/sink/bootstrap/BootstrapOperator.java | 50 +-
.../hudi/sink/bootstrap/RLIBootstrapOperator.java | 232 +++++
.../apache/hudi/sink/buffer/BufferMemoryType.java | 42 +-
.../hudi/sink/buffer/MemorySegmentPoolFactory.java | 65 +-
.../apache/hudi/sink/bulk/AutoRowDataKeyGen.java | 22 +-
.../hudi/sink/bulk/BulkInsertWriteFunction.java | 8 +
.../sink/common/AbstractStreamWriteFunction.java | 6 +
.../hudi/sink/common/AbstractWriteFunction.java | 12 +
.../hudi/sink/common/AbstractWriteOperator.java | 18 +
.../org/apache/hudi/sink/event/Correspondent.java | 38 +
.../hudi/sink/muttley/AthenaIngestionGateway.java | 346 +++++++
.../hudi/sink/muttley/FlinkHudiMuttleyClient.java | 246 +++++
.../muttley/FlinkHudiMuttleyClientException.java | 19 +-
.../sink/muttley/FlinkHudiMuttleyException.java | 38 +-
.../muttley/FlinkHudiMuttleyServerException.java | 19 +-
.../sink/partitioner/BucketAssignFunction.java | 1 +
.../partitioner/index/IndexBackendFactory.java | 30 +-
.../sink/partitioner/index/IndexWriteFunction.java | 11 +-
.../index/RecordGlobalLocationSerializer.java | 116 +++
...eIndexBackend.java => RocksDBIndexBackend.java} | 31 +-
.../org/apache/hudi/sink/utils/CommitGuard.java | 25 +
.../org/apache/hudi/sink/utils/EventBuffers.java | 22 +-
.../java/org/apache/hudi/sink/utils/Pipelines.java | 56 +-
.../sink/validator/FlinkKafkaOffsetValidator.java | 58 ++
.../sink/validator/FlinkValidationContext.java | 117 +++
.../hudi/sink/validator/FlinkValidatorUtils.java | 150 +++
.../java/org/apache/hudi/source/HoodieSource.java | 6 +-
.../apache/hudi/source/IncrementalInputSplits.java | 9 +-
.../enumerator/AbstractHoodieSplitEnumerator.java | 3 +-
.../HoodieContinuousSplitEnumerator.java | 11 +-
.../apache/hudi/source/reader/BatchRecords.java | 27 +-
.../source/reader/HoodieSourceSplitReader.java | 41 +-
.../function/HoodieCdcSplitReaderFunction.java | 1065 ++++++++++++++++++++
.../reader/function/HoodieSplitReaderFunction.java | 83 +-
.../StreamReadBucketIndexPartitioner.java | 27 +-
.../selector/StreamReadBucketIndexKeySelector.java | 7 +-
.../source/split/DefaultHoodieSplitDiscover.java | 31 +-
.../source/split/DefaultHoodieSplitProvider.java | 2 +-
.../split/HoodieCdcSourceSplit.java} | 45 +-
.../source/split/HoodieContinuousSplitBatch.java | 63 +-
.../hudi/source/split/HoodieSourceSplit.java | 12 +-
.../source/split/HoodieSourceSplitSerializer.java | 134 ++-
.../assign/DefaultHoodieSplitAssigner.java | 2 +-
.../{ => split}/assign/HoodieSplitAssigner.java | 2 +-
.../{ => split}/assign/HoodieSplitAssigners.java | 2 +-
.../assign/HoodieSplitBucketAssigner.java | 2 +-
.../assign/HoodieSplitNumberAssigner.java | 2 +-
.../org/apache/hudi/table/HoodieTableFactory.java | 9 +
.../org/apache/hudi/table/HoodieTableSource.java | 45 +-
.../hudi/table/catalog/HoodieHiveCatalog.java | 13 +-
.../org/apache/hudi/table/format/FormatUtils.java | 1 +
.../hudi/table/format/cdc/CdcInputFormat.java | 9 +-
.../hudi/table/format/cdc/CdcInputSplit.java | 3 +-
.../table/format/mor/MergeOnReadInputFormat.java | 2 +-
.../table/format/mor/MergeOnReadInputSplit.java | 6 +-
.../table/format/mor/MergeOnReadTableState.java | 6 +-
.../HeapLookupCache.java} | 40 +-
.../hudi/table/lookup/HoodieLookupFunction.java | 56 +-
.../org/apache/hudi/table/lookup/LookupCache.java | 60 ++
.../hudi/table/lookup/RocksDBLookupCache.java | 181 ++++
.../java/org/apache/hudi/util/FileIndexReader.java | 9 +-
.../org/apache/hudi/util/FlinkWriteClients.java | 4 +
.../java/org/apache/hudi/util/HoodiePipeline.java | 20 +-
.../apache/hudi/util/KafkaOffsetParseUtils.java | 201 ++++
.../java/org/apache/hudi/util/StreamerUtil.java | 277 +++++
.../apache/hudi/sink/ITTestDataStreamWrite.java | 31 +-
.../hudi/sink/TestFlinkCheckpointClient.java | 451 +++++++++
.../hudi/sink/TestFlinkCheckpointClientMock.java | 318 ++++++
.../sink/TestStreamWriteOperatorCoordinator.java | 465 +++++++++
.../sink/buffer/TestMemorySegmentPoolFactory.java | 108 ++
.../apache/hudi/sink/bulk/TestRowDataKeyGens.java | 20 +
.../index/TestRecordGlobalLocationSerializer.java | 191 ++++
.../partitioner/index/TestRocksDBIndexBackend.java | 54 +
.../utils/BucketStreamWriteFunctionWrapper.java | 4 +-
.../sink/utils/StreamWriteFunctionWrapper.java | 5 +-
.../apache/hudi/sink/utils/TestCommitGuard.java | 78 ++
.../apache/hudi/sink/utils/TestEventBuffers.java | 121 +++
.../validator/TestFlinkKafkaCheckpointParsing.java | 171 ++++
.../validator/TestFlinkKafkaOffsetValidator.java | 343 +++++++
.../sink/validator/TestFlinkValidationContext.java | 200 ++++
.../sink/validator/TestFlinkValidatorUtils.java | 290 ++++++
.../org/apache/hudi/source/TestHoodieSource.java | 14 +-
.../apache/hudi/source/TestStreamReadOperator.java | 2 +-
.../TestHoodieContinuousSplitEnumerator.java | 202 +++-
.../TestHoodieEnumeratorStateSerializer.java | 5 +-
.../TestHoodieStaticSplitEnumerator.java | 5 +-
.../hudi/source/reader/TestBatchRecords.java | 31 -
.../source/reader/TestHoodieRecordEmitter.java | 3 +-
.../source/reader/TestHoodieSourceSplitReader.java | 65 +-
.../function/TestHoodieCdcSplitReaderFunction.java | 196 ++++
.../function/TestHoodieSplitReaderFunction.java | 226 ++++-
.../split/TestDefaultHoodieSplitDiscover.java | 77 +-
.../split/TestDefaultHoodieSplitProvider.java | 21 +-
.../source/split/TestHoodieCdcSourceSplit.java | 183 ++++
.../split/TestHoodieContinuousSplitBatch.java | 368 +++++++
.../hudi/source/split/TestHoodieSourceSplit.java | 182 +++-
.../split/TestHoodieSourceSplitComparator.java | 3 +-
.../split/TestHoodieSourceSplitSerializer.java | 903 ++++++++++++++++-
.../assign/TestDefaultHoodieSplitAssigner.java | 5 +-
.../assign/TestHoodieSplitAssigners.java | 2 +-
.../assign/TestHoodieSplitBucketAssigner.java | 5 +-
.../assign/TestHoodieSplitNumberAssigner.java | 5 +-
.../apache/hudi/table/ITTestHoodieDataSource.java | 231 ++++-
.../ITTestVariantCrossEngineCompatibility.java | 10 +-
.../hudi/util/TestKafkaOffsetParseUtils.java | 273 +++++
.../apache/hudi/utils/TestFlinkWriteClients.java | 17 +
.../main/java/org/apache/hudi/adapter/Utils.java | 25 +
.../main/java/org/apache/hudi/adapter/Utils.java | 25 +
.../main/java/org/apache/hudi/adapter/Utils.java | 26 +
.../main/java/org/apache/hudi/adapter/Utils.java | 26 +
.../main/java/org/apache/hudi/adapter/Utils.java | 26 +
.../main/java/org/apache/hudi/adapter/Utils.java | 26 +
.../apache/hudi/avro/HoodieAvroWriteSupport.java | 51 +-
.../hudi/io/lance/HoodieBaseLanceWriter.java | 20 +-
.../parquet/io/ByteArraySeekableInputStream.java | 125 +++
.../parquet/io/HoodieParquetBinaryCopyBase.java | 388 +++----
.../parquet/io/HoodieParquetFileBinaryCopier.java | 316 +++++-
.../avro/AvroSchemaConverterWithTimestampNTZ.java | 12 +
...TestHoodieAvroWriteSupportVariantShredding.java | 508 ----------
.../index/TestBaseHoodieTableFileIndex.java | 2 +-
.../hudi/common/table/TestHoodieTableConfig.java | 4 +-
.../hudi/common/table/TestTableSchemaResolver.java | 125 +++
.../hudi/common/util/TestClusteringUtils.java | 143 +++
.../TestHoodieNativeAvroHFileReaderCaching.java | 39 +-
...oodieAvroFileWriterFactoryVariantShredding.java | 275 -----
.../TestFileSystemBackedTableMetadata.java | 48 +
.../io/TestByteArraySeekableInputStream.java | 195 ++++
...HoodieParquetBinaryCopyBaseSchemaEvolution.java | 132 ++-
.../io/TestHoodieParquetFileBinaryCopier.java | 101 +-
.../TestHoodieParquetFileBinaryCopierPrefetch.java | 142 +++
.../io/TestOutputStreamBackedOutputFile.java | 83 ++
.../parquet/avro/TestAvroSchemaConverter.java | 23 +-
.../hudi/hadoop/HiveHoodieTableFileIndex.java | 1 +
.../hadoop/HoodieLatestBaseFilesPathFilter.java | 29 +-
.../hudi/hadoop/HoodieROTablePathFilter.java | 15 +-
.../hudi/hadoop/TestHoodieROTablePathFilter.java | 16 +-
.../hudi/hadoop/utils/TestHiveAvroSerializer.java | 1 -
hudi-notebooks/Dockerfile.spark | 49 +-
hudi-notebooks/build.sh | 22 +-
hudi-notebooks/conf/spark/spark-defaults.conf | 5 -
hudi-notebooks/docker-compose.yml | 4 +-
hudi-notebooks/notebooks/01-crud-operations.ipynb | 2 +-
hudi-notebooks/notebooks/02-query-types.ipynb | 2 +-
.../notebooks/03-scd-type2_and_type4.ipynb | 2 +-
hudi-notebooks/notebooks/04-schema-evolution.ipynb | 2 +-
.../notebooks/05-mastering-sql-procedures.ipynb | 2 +-
.../notebooks/06_hudi_trino_example.ipynb | 325 ++++++
.../notebooks/07_hudi_presto_example.ipynb | 325 ++++++
hudi-notebooks/notebooks/utils.py | 191 ++--
hudi-notebooks/requirements.txt | 6 +
hudi-notebooks/run_spark_hudi.sh | 2 +
.../scala/org/apache/hudi/DataSourceOptions.scala | 52 +-
.../org/apache/hudi/DatabricksRuntimeHelper.scala | 75 ++
.../main/scala/org/apache/hudi/DefaultSource.scala | 15 +-
.../scala/org/apache/hudi/HoodieBaseRelation.scala | 10 +-
.../scala/org/apache/hudi/HoodieFileIndex.scala | 24 +
.../hudi/HoodieHadoopFsRelationFactory.scala | 3 +-
.../org/apache/hudi/HoodieSparkSqlWriter.scala | 8 +-
.../scala/org/apache/hudi/HoodieWriterUtils.scala | 10 +
.../apache/hudi/SparkHoodieTableFileIndex.scala | 31 +-
.../sql/catalyst/catalog/HoodieCatalogTable.scala | 22 +-
.../HoodieFileGroupReaderBasedFileFormat.scala | 134 ++-
.../sql/hive/SparkCatalogMetaStoreClient.scala | 380 +++++++
.../spark/sql/hudi/HoodieSqlCommonUtils.scala | 4 +
.../spark/sql/hudi/ProvidesHoodieConfig.scala | 34 +-
.../hudi/command/CreateHoodieTableCommand.scala | 1 +
.../command/ShowHoodieTablePartitionsCommand.scala | 29 +-
.../org/apache/hudi/TestDataSourceOptions.scala | 54 +-
.../java/org/apache/hudi/HoodieSparkSQLUtils.java | 118 +++
.../apache/hudi/TestDecimalTypeDataWorkflow.scala | 6 +-
.../hudi/client/TestHoodieClientMultiWriter.java | 381 +++++++
...DataValidationCheckForLogCompactionActions.java | 16 -
.../TestHoodieClientOnCopyOnWriteStorage.java | 538 +++++-----
.../TestRemoteFileSystemViewWithMetadataTable.java | 1 -
.../hudi/functional/TestHoodieBackedMetadata.java | 279 +++++
...SparkBinaryCopyClusteringAndValidationMeta.java | 2 +-
.../io/storage/TestHoodieSparkLanceReader.java | 36 +-
.../io/storage/TestHoodieSparkLanceWriter.java | 22 +-
.../TestCopyOnWriteRollbackActionExecutor.java | 3 +-
...dieSparkMergeOnReadTableInsertUpdateDelete.java | 2 +
.../TestIncrementalQueryWithArchivedInstants.scala | 2 +-
.../hudi/TestAvroSchemaResolutionSupport.scala | 204 ++--
.../hudi/TestHoodieSchemaConversionUtils.scala | 284 ++++++
.../org/apache/hudi/TestHoodieSparkSqlWriter.scala | 6 +-
.../org/apache/hudi/TestInsertDedupPolicy.scala | 2 -
.../functional/PartitionStatsIndexTestBase.scala | 1 -
.../TestAutoGenerationOfRecordKeys.scala | 8 -
.../hudi/functional/TestBasicSchemaEvolution.scala | 8 -
.../apache/hudi/functional/TestCOWDataSource.scala | 5 +-
.../apache/hudi/functional/TestMORDataSource.scala | 12 +-
.../functional/TestPartialUpdateAvroPayload.scala | 8 -
.../hudi/functional/TestRecordLevelIndex.scala | 201 +++-
.../TestSparkDataSourceDAGExecution.scala | 9 -
.../hudi/functional/TestVectorDataSource.scala | 1000 ++++++++++++++++++
.../functional/cdc/TestCDCDataFrameSuite.scala | 6 +-
.../functional/cdc/TestCDCStreamingSuite.scala | 4 +-
.../hudi/utils/TestHoodieSparkSQLUtils.scala | 109 ++
.../TestBaseSpark3AdapterVariantMethods.scala | 77 ++
.../TestBaseSpark4AdapterVariantMethods.scala | 262 +++++
.../org/apache/spark/sql/avro/TestAvroSerDe.scala | 136 ++-
.../spark/sql/avro/TestSchemaConverters.scala | 72 +-
.../sql/hive/TestSparkCatalogMetaStoreClient.scala | 246 +++++
.../sql/hudi/common/MockSlashKeyGenerator.scala | 134 +++
.../common/MockSlashPartitionValueExtractor.scala | 40 +
.../common/TestCustomParitionValueExtractor.scala | 386 +++++++
.../sql/hudi/common/TestROPathFilterOnRead.scala | 352 +++++++
.../apache/spark/sql/hudi/common/TestSqlConf.scala | 35 +-
.../spark/sql/hudi/ddl/TestShowPartitions.scala | 47 +
.../apache/spark/sql/hudi/ddl/TestSpark3DDL.scala | 13 +-
.../spark/sql/hudi/ddl/TestSparkCatalogSync.scala | 162 +++
.../sql/hudi/dml/schema/TestVariantDataType.scala | 318 +-----
.../sql/hudi/feature/TestCDCForSparkSQL.scala | 28 +-
.../spark/sql/adapter/BaseSpark3Adapter.scala | 10 +-
.../apache/spark/sql/avro/AvroDeserializer.scala | 35 +
.../org/apache/spark/sql/avro/AvroSerializer.scala | 38 +
.../apache/spark/sql/avro/AvroDeserializer.scala | 35 +
.../org/apache/spark/sql/avro/AvroSerializer.scala | 38 +
.../HoodieSpark34PartitionedFileUtils.scala | 3 +-
.../apache/spark/sql/avro/AvroDeserializer.scala | 35 +
.../org/apache/spark/sql/avro/AvroSerializer.scala | 38 +
.../spark/sql/adapter/BaseSpark4Adapter.scala | 45 +-
.../TestSpark4VariantShreddingProvider.java | 279 -----
.../apache/spark/sql/avro/AvroDeserializer.scala | 65 +-
.../org/apache/spark/sql/avro/AvroSerializer.scala | 40 +-
.../TestHoodieRowParquetWriteSupportVariant.java | 444 ++++++++
.../java/org/apache/hudi/hive/HiveSyncConfig.java | 4 +
.../org/apache/hudi/hive/HiveSyncConfigHolder.java | 6 +
.../java/org/apache/hudi/hive/HiveSyncTool.java | 11 +-
.../org/apache/hudi/hive/HoodieHiveSyncClient.java | 321 +++++-
.../java/org/apache/hudi/hive/ddl/DDLExecutor.java | 10 +
.../org/apache/hudi/hive/ddl/HMSDDLExecutor.java | 46 +-
.../hudi/hive/ddl/JDBCBasedMetadataOperator.java | 300 ++++++
.../org/apache/hudi/hive/ddl/JDBCExecutor.java | 9 +
.../hudi/hive/ddl/QueryBasedDDLExecutor.java | 64 +-
.../org/apache/hudi/hive/TestHiveSyncTool.java | 95 +-
.../hive/ddl/TestJDBCBasedMetadataOperator.java | 190 ++++
.../hudi/sync/common/HoodieMetaSyncOperations.java | 19 +
.../apache/hudi/sync/common/HoodieSyncClient.java | 4 +
.../apache/hudi/sync/common/HoodieSyncConfig.java | 45 +-
.../hudi/sync/common/model/PartitionEvent.java | 6 +-
.../hudi/sync/common/TestHoodieSyncConfig.java | 16 +
.../resources/log4j2-surefire-quiet.properties | 2 +-
.../src/main/resources/log4j2-surefire.properties | 2 +-
.../sources/helpers/ProtoConversionUtil.java | 2 +-
.../utilities/sources/helpers/QueryRunner.java | 2 +-
.../utilities/streamer/BaseErrorTableWriter.java | 6 +-
.../apache/hudi/utilities/streamer/StreamSync.java | 12 +-
.../deltastreamer/TestHoodieDeltaStreamer.java | 18 +-
...TestHoodieDeltaStreamerSchemaEvolutionBase.java | 6 +-
.../utilities/sources/TestJsonKafkaSource.java | 6 +-
.../sources/helpers/TestProtoConversionUtil.java | 2 +-
rfc/rfc-102/cat_emebdding.png | Bin 0 -> 6075633 bytes
rfc/rfc-102/comparison_embedding.png | Bin 0 -> 6920384 bytes
rfc/rfc-102/embedding_table.png | Bin 0 -> 6452822 bytes
rfc/rfc-102/rfc-102.md | 227 +++++
rfc/rfc-99/appendix.md | 246 +++++
rfc/rfc-99/rfc-99.md | 30 +-
style/checkstyle-suppressions.xml | 1 +
428 files changed, 33163 insertions(+), 4561 deletions(-)
create mode 100644
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/HoodieInterProcessMutex.java
create mode 100644
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/ArchivalMetrics.java
create mode 100644
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/PreWriteValidatorUtils.java
create mode 100644
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/validator/PreWriteValidator.java
create mode 100644
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/validator/StreamingOffsetValidator.java
create mode 100644
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodiePreWriteValidatorConfig.java
create mode 100644
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/ClusteringFileSliceComparator.java
copy hudi-common/src/main/java/org/apache/hudi/common/bloom/BloomFilterTypeCode.java => hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/ClusteringFileSliceSortByField.java (56%)
create mode 100644
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/CommitBasedClusteringPlanStrategy.java
create mode 100644
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/lock/TestHoodieInterProcessMutex.java
create mode 100644
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/utils/TestPreWriteValidatorUtils.java
create mode 100644
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/validator/TestStreamingOffsetValidator.java
create mode 100644
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/config/TestHoodiePreWriteValidatorConfig.java
create mode 100644
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/metadata/TestHoodieBackedTableMetadataWriterTableVersionSix.java
create mode 100644
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/cluster/strategy/TestCommitBasedClusteringPlanStrategy.java
create mode 100644
hudi-client/hudi-flink-client/src/test/java/org/apache/hudi/client/TestHoodieFlinkTableServiceClient.java
create mode 100644
hudi-client/hudi-flink-client/src/test/java/org/apache/hudi/io/storage/row/TestHoodieRowDataCreateHandle.java
create mode 100644
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/VectorConversionUtils.java
create mode 100644
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/row/HoodieBloomFilterRowWriteSupport.java
create mode 100644
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/utils/TestSparkPreWriteValidatorUtils.java
create mode 100644
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/execution/BaseExecutorTestHarness.java
create mode 100644
hudi-common/src/main/java/org/apache/hudi/client/validator/BasePreCommitValidator.java
create mode 100644
hudi-common/src/main/java/org/apache/hudi/client/validator/ValidationContext.java
create mode 100644
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePreWriteCleanerPolicy.java
create mode 100644
hudi-common/src/main/java/org/apache/hudi/common/util/CheckpointUtils.java
rename {hudi-sync/hudi-sync-common => hudi-common}/src/main/java/org/apache/hudi/sync/common/model/PartitionValueExtractor.java (100%)
rename {hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils => hudi-common/src/main/java/org/apache/hudi/util}/LazyConcatenatingIterator.java (96%)
create mode 100644
hudi-common/src/test/java/org/apache/hudi/TestReportJvmConfiguration.java
create mode 100644
hudi-common/src/test/java/org/apache/hudi/common/util/TestCheckpointUtils.java
rename {hudi-client/hudi-client-common/src/test/java/org/apache/hudi/utils => hudi-common/src/test/java/org/apache/hudi/util}/TestLazyConcatenatingIterator.java (97%)
create mode 100644
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/FlinkCheckpointClient.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/AbstractBootstrapOperator.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/RLIBootstrapOperator.java
copy hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/writer/DeltaInputWriter.java => hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/buffer/BufferMemoryType.java (54%)
create mode 100644
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/muttley/AthenaIngestionGateway.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/muttley/FlinkHudiMuttleyClient.java
copy hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/bootstrap/translator/IdentityBootstrapPartitionPathTranslator.java => hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/muttley/FlinkHudiMuttleyClientException.java (65%)
copy hudi-io/src/main/java/org/apache/hudi/exception/HoodieException.java => hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/muttley/FlinkHudiMuttleyException.java (57%)
copy hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/bootstrap/translator/IdentityBootstrapPartitionPathTranslator.java => hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/muttley/FlinkHudiMuttleyServerException.java (65%)
create mode 100644
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/RecordGlobalLocationSerializer.java
copy hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/{FlinkStateIndexBackend.java => RocksDBIndexBackend.java} (50%)
create mode 100644
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/validator/FlinkKafkaOffsetValidator.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/validator/FlinkValidationContext.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/validator/FlinkValidatorUtils.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/function/HoodieCdcSplitReaderFunction.java
copy hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/{table/format/cdc/CdcInputSplit.java => source/split/HoodieCdcSourceSplit.java} (51%)
rename hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/{ => split}/assign/DefaultHoodieSplitAssigner.java (97%)
rename hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/{ => split}/assign/HoodieSplitAssigner.java (96%)
rename hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/{ => split}/assign/HoodieSplitAssigners.java (96%)
rename hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/{ => split}/assign/HoodieSplitBucketAssigner.java (97%)
rename hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/{ => split}/assign/HoodieSplitNumberAssigner.java (97%)
copy hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/{format/SchemaEvolvedRecordIterator.java => lookup/HeapLookupCache.java} (54%)
create mode 100644
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/lookup/LookupCache.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/lookup/RocksDBLookupCache.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/KafkaOffsetParseUtils.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestFlinkCheckpointClient.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestFlinkCheckpointClientMock.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/buffer/TestMemorySegmentPoolFactory.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/partitioner/index/TestRecordGlobalLocationSerializer.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/partitioner/index/TestRocksDBIndexBackend.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/TestCommitGuard.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/TestEventBuffers.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/validator/TestFlinkKafkaCheckpointParsing.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/validator/TestFlinkKafkaOffsetValidator.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/validator/TestFlinkValidationContext.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/validator/TestFlinkValidatorUtils.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/reader/function/TestHoodieCdcSplitReaderFunction.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/split/TestHoodieCdcSourceSplit.java
create mode 100644
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/split/TestHoodieContinuousSplitBatch.java
rename hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/{ => split}/assign/TestDefaultHoodieSplitAssigner.java (99%)
rename hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/{ => split}/assign/TestHoodieSplitAssigners.java (99%)
rename hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/{ => split}/assign/TestHoodieSplitBucketAssigner.java (99%)
rename hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/source/{ => split}/assign/TestHoodieSplitNumberAssigner.java (98%)
create mode 100644
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/util/TestKafkaOffsetParseUtils.java
create mode 100644
hudi-hadoop-common/src/main/java/org/apache/hudi/parquet/io/ByteArraySeekableInputStream.java
delete mode 100644
hudi-hadoop-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroWriteSupportVariantShredding.java
delete mode 100644
hudi-hadoop-common/src/test/java/org/apache/hudi/io/storage/hadoop/TestHoodieAvroFileWriterFactoryVariantShredding.java
create mode 100644
hudi-hadoop-common/src/test/java/org/apache/hudi/parquet/io/TestByteArraySeekableInputStream.java
create mode 100644
hudi-hadoop-common/src/test/java/org/apache/hudi/parquet/io/TestHoodieParquetFileBinaryCopierPrefetch.java
create mode 100644
hudi-hadoop-common/src/test/java/org/apache/hudi/parquet/io/TestOutputStreamBackedOutputFile.java
copy hudi-common/src/main/java/org/apache/hudi/common/table/timeline/versioning/v2/TimelinePathProviderV2.java => hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieLatestBaseFilesPathFilter.java (52%)
create mode 100644 hudi-notebooks/notebooks/06_hudi_trino_example.ipynb
create mode 100644 hudi-notebooks/notebooks/07_hudi_presto_example.ipynb
create mode 100644 hudi-notebooks/requirements.txt
create mode 100644
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DatabricksRuntimeHelper.scala
create mode 100644
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hive/SparkCatalogMetaStoreClient.scala
create mode 100644
hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/HoodieSparkSQLUtils.java
create mode 100644
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestVectorDataSource.scala
create mode 100644
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/utils/TestHoodieSparkSQLUtils.scala
create mode 100644
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/adapter/TestBaseSpark3AdapterVariantMethods.scala
create mode 100644
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/adapter/TestBaseSpark4AdapterVariantMethods.scala
create mode 100644
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hive/TestSparkCatalogMetaStoreClient.scala
create mode 100644
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/MockSlashKeyGenerator.scala
create mode 100644
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/MockSlashPartitionValueExtractor.scala
create mode 100644
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/TestCustomParitionValueExtractor.scala
create mode 100644
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/common/TestROPathFilterOnRead.scala
create mode 100644
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/ddl/TestSparkCatalogSync.scala
delete mode 100644
hudi-spark-datasource/hudi-spark4-common/src/test/java/org/apache/hudi/variant/TestSpark4VariantShreddingProvider.java
create mode 100644
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/JDBCBasedMetadataOperator.java
create mode 100644
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/ddl/TestJDBCBasedMetadataOperator.java
create mode 100644 rfc/rfc-102/cat_emebdding.png
create mode 100644 rfc/rfc-102/comparison_embedding.png
create mode 100644 rfc/rfc-102/embedding_table.png
create mode 100644 rfc/rfc-102/rfc-102.md
create mode 100644 rfc/rfc-99/appendix.md