LadyForest opened a new pull request #18394:
URL: https://github.com/apache/flink/pull/18394


   ## What is the purpose of the change
   
   This PR implements `ALTER TABLE table_identifier [PARTITION 
partition_spec] COMPACT`, which invokes a batch job to perform file 
compaction for the file store. More details can be found in 
[FLIP-188](https://cwiki.apache.org/confluence/display/FLINK/FLIP-188%3A+Introduce+Built-in+Dynamic+Table+Storage).
   
   
   ## Brief changelog
   
   * Add interface method `ManagedTableFactory#onCompact`, which returns a 
`Map<String, String>` of metadata for file entries to be compacted, as dynamic 
options.
   
   * Change `AlterTableCompactOperation` to extend `CatalogQueryOperation`, 
which carries the aforementioned dynamic options. During operation 
conversion, `AlterTableCompactOperation` becomes the child of a 
`CatalogModifyOperation`, so that the modify operation can go through 
`PlannerBase#translateToRel`.
   
   * Add `FlinkRelBuilder#compactScan` to translate 
`AlterTableCompactOperation` to a rel node. The dynamic options are designed 
to be translated as hints, which requires `FlinkHintStrategies` to be added to 
the `HintStrategyTable` during the optimization phase.
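
   To illustrate the idea of returning file metadata as a `Map<String, String>` 
of dynamic options, here is a minimal, self-contained sketch. It is not Flink's 
API: the class name, the `onCompact(Path)` signature, and the comma-separated 
encoding of the entries are assumptions made for illustration. The two option 
keys are the ones this PR's test factory is described as injecting.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for a managed table factory's compaction hook. */
final class CompactOptionsSketch {

    // Option keys mentioned in the PR description for the test factory.
    static final String FILE_BASE_PATH = "compact.file-base-path";
    static final String FILE_ENTRIES = "compact.file-entries";

    /**
     * Collects all regular files under basePath and returns them as string options.
     * Assumption: entries are encoded as sorted, comma-separated relative paths.
     */
    static Map<String, String> onCompact(Path basePath) throws IOException {
        List<String> entries;
        try (Stream<Path> files = Files.walk(basePath)) {
            entries = files.filter(Files::isRegularFile)
                    .map(p -> basePath.relativize(p).toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
        Map<String, String> options = new LinkedHashMap<>();
        options.put(FILE_BASE_PATH, basePath.toString());
        options.put(FILE_ENTRIES, String.join(",", entries));
        return options;
    }
}
```

   Keeping everything as plain strings is what lets such metadata travel as 
dynamic options and later be attached to the plan as hints.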
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
     - 
`SqlToOperationConverterTest#testAlterTableCompactOnManagedNonPartitionedTable` 
and 
`SqlToOperationConverterTest#testAlterTableCompactOnManagedPartitionedTable` 
verify that the SQL clause is converted to the desired operation.
     - `TestManagedTableFactory#onCompactTable` serves as a test 
implementation that injects the `compact.file-base-path` and 
`compact.file-entries` options into the current managed table.
     - `CompactManagedTableTest` verifies the plan.
     - `CompactManagedTableITCase` uses the `datagen` and `filesystem` 
connectors to prepare some local files to compact. The compaction strategy 
rolls all files under each partition into a newly created file named with the 
pattern `compact-${uuid}-file-0`. The test covers non-partitioned, 
single-partitioned, and multi-partitioned tables, with and without a 
partition spec, and checks that two successive compactions are idempotent.
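
   The rolling strategy and the idempotence check described for the IT case 
can be sketched with plain `java.nio`, outside of Flink. This is only an 
illustration under stated assumptions: the class and method names are 
hypothetical, and skipping a partition that already holds a single compacted 
file is one possible way to make a second run a no-op, not necessarily what 
the test does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of rolling all files of one partition into a single file. */
final class RollingCompactionSketch {

    /** Returns the compacted file, or empty if the partition directory has no files. */
    static Optional<Path> compact(Path partitionDir) throws IOException {
        List<Path> inputs;
        try (Stream<Path> files = Files.list(partitionDir)) {
            inputs = files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        if (inputs.isEmpty()) {
            return Optional.empty();
        }
        // Assumption: a lone compacted file means there is nothing left to roll up.
        if (inputs.size() == 1
                && inputs.get(0).getFileName().toString().startsWith("compact-")) {
            return Optional.of(inputs.get(0));
        }
        Path target = partitionDir.resolve("compact-" + UUID.randomUUID() + "-file-0");
        // Append every input, in sorted path order, to the new target file.
        for (Path input : inputs) {
            Files.write(target, Files.readAllBytes(input),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        // Remove the inputs only after all of them have been copied.
        for (Path input : inputs) {
            Files.delete(input);
        }
        return Optional.of(target);
    }
}
```

   Calling `compact` twice on the same directory returns the same file the 
second time and leaves its content unchanged.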
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (**yes** / no): add 
test dependencies
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (**yes** / no)
     - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   



