ji chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22914 )

Change subject: IMPALA-14081: Support create/drop paimon table for impala
......................................................................


Patch Set 13:

(41 comments)

http://gerrit.cloudera.org:8080/#/c/22914/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22914/8//COMMIT_MSG@9
PS8, Line 9:
> nit: we limit commit text body to 72 chars.
Done


http://gerrit.cloudera.org:8080/#/c/22914/8//COMMIT_MSG@37
PS8, Line 37: ON 'hdfs
> Is it required to have the EXTERNAL keyword here? Can't Impala create a bra
Done


http://gerrit.cloudera.org:8080/#/c/22914/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22914/11//COMMIT_MSG@12
PS11, Line 12: a type
> It'd be good to mention all the datatypes supported.
Done


http://gerrit.cloudera.org:8080/#/c/22914/11//COMMIT_MSG@46
PS11, Line 46:
> Would also recommend writing a TODO section for all things that are WIP or
Done


http://gerrit.cloudera.org:8080/#/c/22914/6/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/22914/6/bin/impala-config.sh@289
PS6, Line 289: export IMPALA_RELOAD4j_VERSION=1.2
> How often we need to change version? I see current released version is actu
the version can be changed.will use the latest release in the next revision.


http://gerrit.cloudera.org:8080/#/c/22914/8/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/22914/8/bin/impala-config.sh@1244
PS8, Line 1244: echo "IMPALA_ICEBERG_VERSION  = $IMPALA_ICEBERG_VERSION
> nit: This line could come right after IMPALA_ICEBERG_VERSION, where we list
Done


http://gerrit.cloudera.org:8080/#/c/22914/8/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/22914/8/common/thrift/CatalogObjects.thrift@708
PS8, Line 708:
> nit: missing space
Done


http://gerrit.cloudera.org:8080/#/c/22914/8/common/thrift/CatalogObjects.thrift@775
PS8, Line 775:   // Set if this is a system table
> nit: Missing empty line before the comment
Done


http://gerrit.cloudera.org:8080/#/c/22914/8/docs/topics/impala_paimon.xml
File docs/topics/impala_paimon.xml:

http://gerrit.cloudera.org:8080/#/c/22914/8/docs/topics/impala_paimon.xml@40
PS8, Line 40: Impala now adds experimental support for Apache Paim
> nit: Impala now adds experimental support for Apache Paimon?
Done


http://gerrit.cloudera.org:8080/#/c/22914/8/docs/topics/impala_paimon.xml@77
PS8, Line 77:           <li>
            :             Scalable metadata: supports storing Petabyte 
large-scale datasets and storing a large
            :             number of partitions.
            :           </li>
            :           <li>
            :             Supports ACID Transactions &amp; Time Travel &amp; 
Schema Evolution.
            :           </li>
            :         </ul>
> Not sure we want to mention these while Impala doesn't support these operat
will remove this item


http://gerrit.cloudera.org:8080/#/c/22914/11/docs/topics/impala_paimon.xml
File docs/topics/impala_paimon.xml:

http://gerrit.cloudera.org:8080/#/c/22914/11/docs/topics/impala_paimon.xml@237
PS11, Line 237: 1.1
> nit: 1.1?
Done


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/cup/sql-parser.cup@2302
PS8, Line 2302:   | KW_STORED KW_BY storage_engine_val:storage_engine
> For Iceberg we also allow STORED BY ICEBERG, I think we should also allow i
Done


http://gerrit.cloudera.org:8080/#/c/22914/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/22914/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@20
PS6, Line 20: import java.util.ArrayDeque;
            : import java.util.ArrayList;
            : import java.util.Arrays;
            : import java.util.Collections;
            : import java.util.Comparator;
            : import java.util.Deque;
            : import java.util.HashMap;
            : import java.util.HashSet;
            : import java.util.IdentityHashMap;
            : import java.util.Iterator;
            : import java.util.LinkedHashMap;
> Please minimize import reordering and keep the old grouping for now.
Thanks, will keep the original ordering


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
File fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java:

http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java@83
PS11, Line 83: to file-based t
> It is quite an obsolete statement. I think "SHOW FILES is applicable only t
Done


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java@91
PS11, Line 91: if (!showFileStmtSupport.supportPartitionFilter()) {
> Can we say "SHOW FILES with partition filter is not applicable to table typ
Done


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/paimon/PaimonAnalyzer.java:

http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonAnalyzer.java@257
PS11, Line 257: !colSets.contains(col)
> 'colSets' contains lower cased column names, but PaimonUtil.extractColumnNa
Done


http://gerrit.cloudera.org:8080/#/c/22914/9/fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java
File fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java:

http://gerrit.cloudera.org:8080/#/c/22914/9/fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java@83
PS9, Line 83: "org.apache.paimon.hive.PaimonSerDe");
> nit: Add comment like others.
will fix


http://gerrit.cloudera.org:8080/#/c/22914/9/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/22914/9/fe/src/main/java/org/apache/impala/catalog/Table.java@20
PS9, Line 20: import java.util.ArrayList;
            : import java.util.Collections;
            : import java.util.HashMap;
            : import java.util.List;
> For existing java files, please maintain the old import order and keep code
thanks, will fix


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
File fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java:

http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java@55
PS11, Line 55:
> nit: too much indent
Done


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java
File fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java:

http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java@48
PS11, Line 48:
> Is jobConf needed?
yes, it is needed to cache the hadoop conf,and used in related ddl operations.


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/ImpalaTypeUtils.java
File fe/src/main/java/org/apache/impala/catalog/paimon/ImpalaTypeUtils.java:

http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/ImpalaTypeUtils.java@156
PS11, Line 156: public Type visit(TimestampType timestampType) {
              :       return Type.TIMESTAMP;
              :     }
              :
> Can you add a comment or explain why these and the following
 > functions are commented?

will remote the comments, these types are not supported.


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/ImpalaTypeUtils.java@258
PS11, Line 258: se INT:
> If this is a TODO, please add it to the TODO list.

these types are not supported, will use default:


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonCatalogOpExecutor.java
File 
fe/src/main/java/org/apache/impala/catalog/paimon/PaimonCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonCatalogOpExecutor.java@122
PS11, Line 122: on tab
> Do you mean 'e'? Also can you be a bit more descriptive with these exceptio
Done


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonCatalogOpExecutor.java@177
PS11, Line 177: n("Error while in
> Please be more descriptive.
Done


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java:

http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@180
PS8, Line 180:       }
             :       numClusteringCols_ = impalaPartitionedFields.size();
             :       // sync table properties from underlying paimon table
             :       final Map<String, String> paimonProps =
             :           Maps.newHashMap(getPaimonApiTable().options());
             :       for (String key : PAIMON_EXCLUDED_PROPERTIES) { 
paimonProps.remove(key); }
             :
> Will the Paimon scanners support these virtual columns?
yes, additional supported virtual columns will also be added.


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java:

http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@95
PS11, Line 95: ByteBuffer.wrap(
> Here we could wrap 'e' into a TableLoadingException.
Done


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@108
PS11, Line 108:  Verify the table metadata.
> Would you be adding it on the same patch or in a separate patch? Would reco
Done


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@212
PS11, Line 212:    * Schema and partitioning schemes are loaded directly from 
Paimon. for column stats,
              :    * will try to loaded from HMS f
> No need for this deepCopy() as we don't check changes later.
Done


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@226
PS11, Line 226:       // Copy the table to check later if anything has changed
> Maybe add a marker when schema sync to HMS completes as well.
Done


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java:

http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@174
PS8, Line 174:
> nit: Missing empty line before comment. There are several other instances o
Done


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@381
PS8, Line 381: Present()) {
> You probably want to invoke setMaxColLen() here.
Done


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@396
PS8, Line 396: Present()) {
> Same as L381.
Done


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java:

http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@522
PS11, Line 522:     try {
> Can you add a comment about these numbers?
Done


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@567
PS11, Line 567:       });
> Maybe change it to: throw new DatabaseNotFoundException("Failed to get snap
Done


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4407
PS11, Line 4407:             if (underlyingCatalog != 
TPaimonCatalog.HADOOP_CATALOG &&
> Maybe add a meaningful exception if catalog type is unsupported.
Done


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4430
PS11, Line 4430: rce of
> Please add a comment about this.
Done


http://gerrit.cloudera.org:8080/#/c/22914/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4454
PS11, Line 4454:     }
> So PaimonCatalogOpExecutor.createTable() at L4400 did not create the table
No, it only create table in file system, not in hive metastore.


http://gerrit.cloudera.org:8080/#/c/22914/6/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/22914/6/testdata/datasets/functional/functional_schema_template.sql@4800
PS6, Line 4800: hadoop fs -put -f ${IM
> Is the current dataset (testdata/data/paimon_test/paimon_catalog/warehouse/
Yes, the data size can be lower. will fix.


http://gerrit.cloudera.org:8080/#/c/22914/9/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/22914/9/testdata/datasets/functional/functional_schema_template.sql@4798
PS9, Line 4798: ---- DEPENDENT_LOAD
              : `hadoop fs -mkdir -p 
/test-warehouse/paimon_test/paimon_catalog/warehouse/functional.db && \
              : hadoop fs -put -f 
${IMPALA_HOME}/testdata/data/paimon_test/paimon_catalog/warehouse/functional.db/paimon_partitioned
 /test-warehouse/paimon_test/paimon_catalog/warehouse/functional.db
> It looks like Hive can write a Paimon table using a connector
I see, thanks, will use DEPENDENT_LOAD_HIVE in the subsequent patch submission 
for query support.


http://gerrit.cloudera.org:8080/#/c/22914/8/testdata/workloads/functional-query/queries/QueryTest/show-create-table-paimon.test
File 
testdata/workloads/functional-query/queries/QueryTest/show-create-table-paimon.test:

http://gerrit.cloudera.org:8080/#/c/22914/8/testdata/workloads/functional-query/queries/QueryTest/show-create-table-paimon.test@77
PS8, Line 77: PRIMARY KEY (user_id)
> Is it possible to define the PRIMARY KEY like this? If so, we should add a
sure, we can support this, will add test case.


http://gerrit.cloudera.org:8080/#/c/22914/11/tests/custom_cluster/test_paimon.py
File tests/custom_cluster/test_paimon.py:

http://gerrit.cloudera.org:8080/#/c/22914/11/tests/custom_cluster/test_paimon.py@25
PS11, Line 25:
> I don't think we need a custom cluster for these tests, i.e. it could inher
Done



--
To view, visit http://gerrit.cloudera.org:8080/22914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef
Gerrit-Change-Number: 22914
Gerrit-PatchSet: 13
Gerrit-Owner: ji chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Pranav Lodha <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: ji chen <[email protected]>
Gerrit-Comment-Date: Sat, 23 Aug 2025 00:46:08 +0000
Gerrit-HasComments: Yes

Reply via email to