ji chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/22914 )
Change subject: IMPALA-14081: Support create/drop paimon table for impala ...................................................................... Patch Set 13: (41 comments) http://gerrit.cloudera.org:8080/#/c/22914/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/22914/8//COMMIT_MSG@9 PS8, Line 9: > nit: we limit commit text body to 72 chars. Done http://gerrit.cloudera.org:8080/#/c/22914/8//COMMIT_MSG@37 PS8, Line 37: ON 'hdfs > Is it required to have the EXTERNAL keyword here? Can't Impala create a bra Done http://gerrit.cloudera.org:8080/#/c/22914/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/22914/11//COMMIT_MSG@12 PS11, Line 12: a type > It'd be good to mention all the datatypes supported. Done http://gerrit.cloudera.org:8080/#/c/22914/11//COMMIT_MSG@46 PS11, Line 46: > Would also recommend writing a TODO section for all things that are WIP or Done http://gerrit.cloudera.org:8080/#/c/22914/6/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/22914/6/bin/impala-config.sh@289 PS6, Line 289: export IMPALA_RELOAD4j_VERSION=1.2 > How often we need to change version? I see current released version is actu the version can be changed.will use the latest release in the next revision. http://gerrit.cloudera.org:8080/#/c/22914/8/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/22914/8/bin/impala-config.sh@1244 PS8, Line 1244: echo "IMPALA_ICEBERG_VERSION = $IMPALA_ICEBERG_VERSION > nit: This line could come right after IMPALA_ICEBERG_VERSION, where we list Done http://gerrit.cloudera.org:8080/#/c/22914/8/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/22914/8/common/thrift/CatalogObjects.thrift@708 PS8, Line 708: > nit: missing space Done http://gerrit.cloudera.org:8080/#/c/22914/8/common/thrift/CatalogObjects.thrift@775 PS8, Line 775: // Set if this is a system table > nit: Missing empty line before the comment Done http://gerrit.cloudera.org:8080/#/c/22914/8/docs/topics/impala_paimon.xml File docs/topics/impala_paimon.xml: http://gerrit.cloudera.org:8080/#/c/22914/8/docs/topics/impala_paimon.xml@40 PS8, Line 40: Impala now adds experimental support for Apache Paim > nit: Impala now adds experimental support for Apache Paimon? Done http://gerrit.cloudera.org:8080/#/c/22914/8/docs/topics/impala_paimon.xml@77 PS8, Line 77: <li> : Scalable metadata: supports storing Petabyte large-scale datasets and storing a large : number of partitions. : </li> : <li> : Supports ACID Transactions & Time Travel & Schema Evolution. : </li> : </ul> > Not sure we want to mention these while Impala doesn't support these operat will remove this item http://gerrit.cloudera.org:8080/#/c/22914/11/docs/topics/impala_paimon.xml File docs/topics/impala_paimon.xml: http://gerrit.cloudera.org:8080/#/c/22914/11/docs/topics/impala_paimon.xml@237 PS11, Line 237: 1.1 > nit: 1.1? Done http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/cup/sql-parser.cup@2302 PS8, Line 2302: | KW_STORED KW_BY storage_engine_val:storage_engine > For Iceberg we also allow STORED BY ICEBERG, I think we should also allow i Done http://gerrit.cloudera.org:8080/#/c/22914/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/22914/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@20 PS6, Line 20: import java.util.ArrayDeque; : import java.util.ArrayList; : import java.util.Arrays; : import java.util.Collections; : import java.util.Comparator; : import java.util.Deque; : import java.util.HashMap; : import java.util.HashSet; : import java.util.IdentityHashMap; : import java.util.Iterator; : import java.util.LinkedHashMap; > Please minimize import reordering and keep the old grouping for now. Thanks, will keep the original ordering http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java File fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java: http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java@83 PS11, Line 83: to file-based t > It is quite an obsolete statement. I think "SHOW FILES is applicable only t Done http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java@91 PS11, Line 91: if (!showFileStmtSupport.supportPartitionFilter()) { > Can we say "SHOW FILES with partition filter is not applicable to table typ Done http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/paimon/PaimonAnalyzer.java: http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonAnalyzer.java@257 PS11, Line 257: !colSets.contains(col) > 'colSets' contains lower cased column names, but PaimonUtil.extractColumnNa Done http://gerrit.cloudera.org:8080/#/c/22914/9/fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java File fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java: http://gerrit.cloudera.org:8080/#/c/22914/9/fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java@83 PS9, Line 83: "org.apache.paimon.hive.PaimonSerDe"); > nit: Add comment like others. will fix http://gerrit.cloudera.org:8080/#/c/22914/9/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/22914/9/fe/src/main/java/org/apache/impala/catalog/Table.java@20 PS9, Line 20: import java.util.ArrayList; : import java.util.Collections; : import java.util.HashMap; : import java.util.List; > For existing java files, please maintain the old import order and keep code thanks, will fix http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java File fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java: http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@55 PS11, Line 55: > nit: too much indent Done http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java File fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java: http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java@48 PS11, Line 48: > Is jobConf needed? yes, it is needed to cache the hadoop conf,and used in related ddl operations. http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/ImpalaTypeUtils.java File fe/src/main/java/org/apache/impala/catalog/paimon/ImpalaTypeUtils.java: http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/ImpalaTypeUtils.java@156 PS11, Line 156: public Type visit(TimestampType timestampType) { : return Type.TIMESTAMP; : } : > Can you add a comment or explain why these and the following > functions are commented? will remote the comments, these types are not supported. http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/ImpalaTypeUtils.java@258 PS11, Line 258: se INT: > If this is a TODO, please add it to the TODO list. these types are not supported, will use default: http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonCatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonCatalogOpExecutor.java@122 PS11, Line 122: on tab > Do you mean 'e'? Also can you be a bit more descriptive with these exceptio Done http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonCatalogOpExecutor.java@177 PS11, Line 177: n("Error while in > Please be more descriptive. Done http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java: http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@180 PS8, Line 180: } : numClusteringCols_ = impalaPartitionedFields.size(); : // sync table properties from underlying paimon table : final Map<String, String> paimonProps = : Maps.newHashMap(getPaimonApiTable().options()); : for (String key : PAIMON_EXCLUDED_PROPERTIES) { paimonProps.remove(key); } : > Will the Paimon scanners support these virtual columns? yes, additional supported virtual columns will also be added. http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java: http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@95 PS11, Line 95: ByteBuffer.wrap( > Here we could wrap 'e' into a TableLoadingException. Done http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@108 PS11, Line 108: Verify the table metadata. > Would you be adding it on the same patch or in a separate patch? Would reco Done http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@212 PS11, Line 212: * Schema and partitioning schemes are loaded directly from Paimon. for column stats, : * will try to loaded from HMS f > No need for this deepCopy() as we don't check changes later. Done http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@226 PS11, Line 226: // Copy the table to check later if anything has changed > Maybe add a marker when schema sync to HMS completes as well. Done http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java: http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@174 PS8, Line 174: > nit: Missing empty line before comment. There are several other instances o Done http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@381 PS8, Line 381: Present()) { > You probably want to invoke setMaxColLen() here. Done http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@396 PS8, Line 396: Present()) { > Same as L381. Done http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java: http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@522 PS11, Line 522: try { > Can you add a comment about these numbers? Done http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@567 PS11, Line 567: }); > Maybe change it to: throw new DatabaseNotFoundException("Failed to get snap Done http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4407 PS11, Line 4407: if (underlyingCatalog != TPaimonCatalog.HADOOP_CATALOG && > Maybe add a meaningful exception if catalog type is unsupported. Done http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4430 PS11, Line 4430: rce of > Please add a comment about this. Done http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4454 PS11, Line 4454: } > So PaimonCatalogOpExecutor.createTable() at L4400 did not create the table No, it only create table in file system, not in hive metastore. http://gerrit.cloudera.org:8080/#/c/22914/6/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/22914/6/testdata/datasets/functional/functional_schema_template.sql@4800 PS6, Line 4800: hadoop fs -put -f ${IM > Is the current dataset (testdata/data/paimon_test/paimon_catalog/warehouse/ Yes, the data size can be lower. will fix. http://gerrit.cloudera.org:8080/#/c/22914/9/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/22914/9/testdata/datasets/functional/functional_schema_template.sql@4798 PS9, Line 4798: ---- DEPENDENT_LOAD : `hadoop fs -mkdir -p /test-warehouse/paimon_test/paimon_catalog/warehouse/functional.db && \ : hadoop fs -put -f ${IMPALA_HOME}/testdata/data/paimon_test/paimon_catalog/warehouse/functional.db/paimon_partitioned /test-warehouse/paimon_test/paimon_catalog/warehouse/functional.db > It looks like Hive can write a Paimon table using a connector I see, thanks, will use DEPENDENT_LOAD_HIVE in the subsequent patch submission for query support. http://gerrit.cloudera.org:8080/#/c/22914/8/testdata/workloads/functional-query/queries/QueryTest/show-create-table-paimon.test File testdata/workloads/functional-query/queries/QueryTest/show-create-table-paimon.test: http://gerrit.cloudera.org:8080/#/c/22914/8/testdata/workloads/functional-query/queries/QueryTest/show-create-table-paimon.test@77 PS8, Line 77: PRIMARY KEY (user_id) > Is it possible to define the PRIMARY KEY like this? If so, we should add a sure, we can support this, will add test case. http://gerrit.cloudera.org:8080/#/c/22914/11/tests/custom_cluster/test_paimon.py File tests/custom_cluster/test_paimon.py: http://gerrit.cloudera.org:8080/#/c/22914/11/tests/custom_cluster/test_paimon.py@25 PS11, Line 25: > I don't think we need a custom cluster for these tests, i.e. it could inher Done -- To view, visit http://gerrit.cloudera.org:8080/22914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef Gerrit-Change-Number: 22914 Gerrit-PatchSet: 13 Gerrit-Owner: ji chen <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Pranav Lodha <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: ji chen <[email protected]> Gerrit-Comment-Date: Sat, 23 Aug 2025 00:46:08 +0000 Gerrit-HasComments: Yes
