[ 
https://issues.apache.org/jira/browse/IMPALA-8937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012734#comment-18012734
 ] 

ASF subversion and git services commented on IMPALA-8937:
---------------------------------------------------------

Commit 1cead451147fe4afd0e4c2c3a5d6e78da84c2025 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1cead4511 ]

IMPALA-13947: Test local catalog mode by default

Local catalog mode has been the default and has worked well in
downstream Impala for over 5 years. This patch turns on local catalog
mode by default (--catalog_topic_mode=minimal and
--use_local_catalog=true) as the preferred mode going forward.
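
As a rough illustration of the two flags named above, the sketch below
builds per-daemon startup arguments for either catalog mode. It is not
taken from the patch. The helper name catalog_mode_args is hypothetical,
the mapping of --use_local_catalog to impalad and --catalog_topic_mode to
catalogd is an assumption, and so are the legacy values
(--use_local_catalog=false, --catalog_topic_mode=full), which the commit
message does not state.

```python
# Hypothetical helper, not part of the Impala test framework.
# Flag names and the local-mode values come from the commit message;
# the per-daemon split and the legacy values are assumptions.
LOCAL_CATALOG_FLAGS = {
    "impalad": ["--use_local_catalog=true"],
    "catalogd": ["--catalog_topic_mode=minimal"],
}

LEGACY_CATALOG_FLAGS = {
    "impalad": ["--use_local_catalog=false"],   # assumed legacy value
    "catalogd": ["--catalog_topic_mode=full"],  # assumed legacy value
}


def catalog_mode_args(local_catalog=True):
    """Return a dict mapping daemon name to its startup-flag string."""
    flags = LOCAL_CATALOG_FLAGS if local_catalog else LEGACY_CATALOG_FLAGS
    return {daemon: " ".join(args) for daemon, args in flags.items()}


if __name__ == "__main__":
    for mode in (True, False):
        print("local" if mode else "legacy", catalog_mode_args(mode))
```

A test that must stay on legacy catalog mode could pass the second set of
strings to whatever mechanism it uses to start the cluster.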

Implemented LocalCatalog.setIsReady() to facilitate using local catalog
mode for FE tests. Some FE tests fail due to behavior differences in
local catalog mode, such as IMPALA-7539. This is probably OK since
Impala now largely hands over FileSystem permission checks to Apache
Ranger.

The following custom cluster tests are pinned to evaluate under legacy
catalog mode because their behavior changed in local catalog mode:

TestCalcitePlanner.test_calcite_frontend
TestCoordinators.test_executor_only_lib_cache
TestMetadataReplicas
TestTupleCacheCluster
TestWorkloadManagementSQLDetailsCalcite.test_tpcds_8_decimal

In TestHBaseHmsColumnOrder.test_hbase_hms_column_order, set the
--use_hms_column_order_for_hbase_tables=true flag for both impalad and
catalogd to get a consistent column order in either local or legacy
catalog mode.

Changed TestCatalogRpcErrors.test_register_subscriber_rpc_error
assertions to be finer grained by matching individual query IDs.

Moved most of the test methods from TestRangerLegacyCatalog to
TestRangerLocalCatalog, except for some that do need to run in legacy
catalog mode. Also renamed TestRangerLocalCatalog to
TestRangerDefaultCatalog. The table ownership issue in local catalog
mode remains unresolved (see IMPALA-8937).

Testing:
Pass exhaustive tests.

Change-Id: Ie303e294972d12b98f8354bf6bbc6d0cb920060f
Reviewed-on: http://gerrit.cloudera.org:8080/23080
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Fine grained table metadata loading on Catalog server
> -----------------------------------------------------
>
>                 Key: IMPALA-8937
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8937
>             Project: IMPALA
>          Issue Type: Epic
>          Components: Catalog, Frontend
>    Affects Versions: Impala 2.12.0, Impala 3.3.0
>            Reporter: Bharath Vissapragada
>            Priority: Major
>              Labels: 2023Q1
>
> *Background*:
> Currently the table _on the Catalog server_ is either in a loaded or unloaded 
> state (IncompleteTable). When Catalog server starts for the first time, we 
> first fetch a list of table names for each database and every table in this 
> list starts as an unloaded table. The table lists are propagated to the 
> coordinators so that they know whether a table with a given name exists or 
> not and they can start analyzing the queries. No metadata is loaded in the 
> incomplete tables (such as schema, ownership, or comments).
> The table metadata is loaded lazily (and the table moves into a loaded state) 
> when it is referenced in any query. When a load request comes in, all the 
> table metadata is loaded including file block information. 
> *Problem:* 
> Coordinators need some additional information when analyzing unloaded tables. 
> For example: IMPALA-8228. The ownership information is a part of the HMS 
> table schema which is not loaded until the table is marked fully loaded. 
> While this is not a problem for regular queries (like select * from <tbl>), 
> it is an issue with queries like "show tables" which do not trigger a table 
> load. In this particular case, due to the lack of ownership information, the 
> output of the table listing could be different depending on whether the table 
> is loaded. Another example is IMPALA-8606 where the GET_TABLES request does 
> not return the table comments because they are not available for unloaded 
> tables.
> *Ask:*
> We need to consider finer grained loading on the Catalog server in general. 
> Instead of having a binary state (loaded vs unloaded), the table could be in 
> a partially loaded state. We could also start with aggressively fetching 
> certain pieces of information that we think could aid with analysis and 
> lazily load the remaining pieces of metadata. Finer grained loading also 
> integrates well with the LocalCatalog implementation on the coordinators 
> where the entire table need not be loaded on the Catalog server to serve 
> partial meta information (e.g., show partitions <large-table>).
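
The partially loaded state described in the Ask can be pictured with a
small sketch. This is only an illustration under stated assumptions, not
Impala's catalog code: the names Loaded and TableEntry, the set of
metadata pieces, and the loader callables are all hypothetical. Each
piece of metadata is tracked with its own bit, cheap pieces can be
fetched eagerly, and expensive ones (such as file metadata) are fetched
on first use.

```python
from dataclasses import dataclass, field
from enum import Flag, auto
from typing import Callable, Dict, Iterable


class Loaded(Flag):
    """Hypothetical per-piece load state replacing a loaded/unloaded bit."""
    NONE = 0
    SCHEMA = auto()
    OWNER = auto()
    COMMENT = auto()
    FILE_METADATA = auto()


@dataclass
class TableEntry:
    """Hypothetical table entry that loads metadata piece by piece."""
    name: str
    loaders: Dict[Loaded, Callable[[], object]]
    state: Loaded = Loaded.NONE
    parts: Dict[Loaded, object] = field(default_factory=dict)

    def get(self, piece: Loaded) -> object:
        # Load the requested piece only if its bit is not set yet.
        if not (self.state & piece):
            self.parts[piece] = self.loaders[piece]()
            self.state |= piece
        return self.parts[piece]

    def eager_load(self, pieces: Iterable[Loaded]) -> None:
        # Fetch the pieces that analysis is expected to need up front.
        for piece in pieces:
            self.get(piece)


if __name__ == "__main__":
    table = TableEntry(
        name="db.tbl",
        loaders={
            Loaded.OWNER: lambda: "alice",
            Loaded.COMMENT: lambda: "demo table",
            Loaded.FILE_METADATA: lambda: ["file-1", "file-2"],
        },
    )
    table.eager_load([Loaded.OWNER, Loaded.COMMENT])
    # Owner and comment are available without loading file metadata.
    print(table.get(Loaded.OWNER), bool(table.state & Loaded.FILE_METADATA))
```

With a layout like this, a listing request such as "show tables" could
read ownership or comments from a table whose file metadata has never
been loaded.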



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

