[ 
https://issues.apache.org/jira/browse/IMPALA-9135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018012#comment-18018012
 ] 

ASF subversion and git services commented on IMPALA-9135:
---------------------------------------------------------

Commit 6f3deabb9d0c0ca98956316bbcc31e14a3363804 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6f3deabb9 ]

IMPALA-14330: set a valid createEventId in global INVALIDATE METADATA

In global INVALIDATE METADATA (catalog reset), catalogd creates
IncompleteTable for all the known table names. However, the
createEventId is left uninitialized and remains -1, so tables could be
dropped unintentionally by stale DropTable or AlterTableRename events.

Ideally when catalogd creates an IncompleteTable during reset(), it
should fetch the latest event on that table and use its event id as the
createEventId. However, fetching such event ids for all tables is
impractical to finish in a reasonable time. It also adds a significant
load on HMS.

As a compromise, this patch uses the current event id when the reset()
operation starts, and sets it to all IncompleteTable objects created in
this reset operation. This is enough to handle self CreateTable /
DropTable / AlterTableRename events since such self-events generated
before that id will be skipped. Such self-events generated after that id
are triggered by concurrent DDLs which will wait until the corresponding
table list is updated in reset(). The DDL will also update createEventId
to skip stale DropTable / AlterTableRename events.
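
The effect of createEventId on event processing can be illustrated with
a toy model. This is not Impala code: Table, should_apply_drop and the
event ids are hypothetical, and the sketch assumes only that a drop-type
event is skipped unless its id is newer than the table's createEventId.

```python
from dataclasses import dataclass

UNINITIALIZED_EVENT_ID = -1  # value left on IncompleteTable before this patch


@dataclass
class Table:
    """Hypothetical stand-in for a placeholder table in the catalog cache."""
    name: str
    create_event_id: int = UNINITIALIZED_EVENT_ID


def should_apply_drop(table: Table, drop_event_id: int) -> bool:
    """Assumed rule: a drop event applies only if it is newer than createEventId."""
    return drop_event_id > table.create_event_id


def reset(table_names, current_event_id):
    """Model of reset(): every placeholder gets the event id seen when reset started."""
    return {name: Table(name, current_event_id) for name in table_names}


# A stale DROP_TABLE event (id 90) arrives after a reset that started at id 100.
before_patch = Table("db.t")                # createEventId stays -1
after_patch = reset(["db.t"], 100)["db.t"]  # createEventId is 100

print(should_apply_drop(before_patch, 90))  # stale event would drop the table
print(should_apply_drop(after_patch, 90))   # stale event is skipped
```

In this model an event with id 101, generated after the reset started,
would still be applied to the table.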

Concurrent CreateTable DDLs could set a stale createEventId if their HMS
operations finish before reset() and their catalog operations finish
after reset() creates the table. To address this, we add a check in
setCreateEventId() to skip stale event ids.
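
A minimal sketch of such a check, assuming "stale" means "older than the
id already stored". PlaceholderTable and set_create_event_id are
hypothetical names for illustration, not the actual Java implementation.

```python
class PlaceholderTable:
    """Hypothetical model of a table that tracks its createEventId."""

    def __init__(self, name, create_event_id=-1):
        self.name = name
        self.create_event_id = create_event_id

    def set_create_event_id(self, event_id):
        """Keep the newest id; return False when a stale id is ignored."""
        if event_id < self.create_event_id:
            return False  # an older id must not overwrite a newer one
        self.create_event_id = event_id
        return True


# The HMS part of a CREATE TABLE produced event id 95, then reset() started
# at event id 100 and created the placeholder before the DDL's catalog part ran.
table = PlaceholderTable("db.t", create_event_id=100)
accepted = table.set_create_event_id(95)  # late update from the DDL
```

Here the late update is rejected and the table keeps the id set by
reset(), so events older than 100 are still treated as stale.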

The current event id of reset() is also used in DeleteEventLog to track
tables removed by this operation.

Refactored IncompleteTable.createUninitializedTable() to force passing a
createEventId as a parameter.

To ease debugging, this patch adds logs when a table is added or removed
during HMS event processing, when the catalog version of a table
changes, and when processing of a rename event starts.

This patch also refactors CatalogOpExecutor.alterTableOrViewRename() by
extracting some code into methods. It also identifies and fixes a race:
DeleteEventLog should be updated before renameTable() updates the
catalog cache, so the removed old table won't be added back by
concurrent processing of a stale CREATE_TABLE event.
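
The ordering can be sketched with a single-threaded toy model. ToyDeleteLog,
rename_table and process_create_event are hypothetical names, and the
check in process_create_event is an assumption about how the events
processor consults the log. The point is that, whenever the old name is
missing from the cache, the log already records its removal.

```python
class ToyDeleteLog:
    """Hypothetical log of removed tables, keyed by name, storing an event id."""

    def __init__(self):
        self._removed = {}

    def record_removal(self, name, event_id):
        self._removed[name] = event_id

    def removed_after(self, name, event_id):
        removed_at = self._removed.get(name)
        return removed_at is not None and removed_at > event_id


def rename_table(cache, delete_log, old_name, new_name, event_id):
    # Log the removal first, so no observer sees the old name gone from
    # the cache without a matching entry in the delete log.
    delete_log.record_removal(old_name, event_id)
    cache.discard(old_name)
    cache.add(new_name)


def process_create_event(cache, delete_log, name, event_id):
    """Skip a CREATE_TABLE event if the table was removed by a later operation."""
    if delete_log.removed_after(name, event_id):
        return False
    cache.add(name)
    return True


cache = {"db.old"}
log = ToyDeleteLog()
rename_table(cache, log, "db.old", "db.new", event_id=120)
# A stale CREATE_TABLE event (id 110) for the old name is processed afterwards.
added_back = process_create_event(cache, log, "db.old", event_id=110)
```

With the opposite order, a concurrent events-processing thread could run
between the cache update and the log update, find no log entry, and add
the old table back.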

_run_ddls_with_invalidation in test_concurrent_ddls.py could still fail
with a timeout when running with sync_ddl=true. The reason is that when
the DDL hits IMPALA-9135 and hangs, it needs catalogd to send new catalog
updates in order to reach the max waiting attempts (see
waitForSyncDdlVersion()). However, if all other concurrent threads have
already finished, there won't be any new catalog updates, so the DDL
waits forever and the test eventually times out. To work around this,
this patch adds another concurrent thread that keeps creating new tables
until the test finishes.
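
The workaround follows a common background-worker pattern. The sketch
below is not the code in test_concurrent_ddls.py: keep_creating_tables,
run_with_keepalive, the table name prefix and the create_table callable
(a stand-in for whatever issues the CREATE TABLE statement) are all
hypothetical.

```python
import threading


def keep_creating_tables(create_table, stop_event, interval_s=0.01):
    """Create a new table on every iteration until stop_event is set."""
    count = 0
    while not stop_event.is_set():
        create_table(f"keepalive_tbl_{count}")
        count += 1
        stop_event.wait(interval_s)  # sleep, but wake up early on stop
    return count


def run_with_keepalive(workload, create_table):
    """Run workload() while a background thread keeps generating catalog updates."""
    stop_event = threading.Event()
    worker = threading.Thread(
        target=keep_creating_tables, args=(create_table, stop_event), daemon=True)
    worker.start()
    try:
        return workload()
    finally:
        # Always stop the worker, even if the workload raises.
        stop_event.set()
        worker.join()
```

Because the worker only stops after the workload returns, a DDL that is
stuck waiting keeps seeing new catalog updates and can exhaust its
attempts instead of hanging.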

Tests:
 - Ran the following tests in test_concurrent_ddls.py 10 rounds. Each
   round takes 11 mins.
   - test_ddls_with_invalidate_metadata
   - test_ddls_with_invalidate_metadata_sync_ddl
   - test_mixed_catalog_ddls_with_invalidate_metadata
   - test_mixed_catalog_ddls_with_invalidate_metadata_sync_ddl
   - test_local_catalog_ddls_with_invalidate_metadata
   - test_local_catalog_ddls_with_invalidate_metadata_sync_ddl
   - test_local_catalog_ddls_with_invalidate_metadata_unlock_gap

Change-Id: I6506821dedf7701cdfa58d14cae5760ee178c4ec
Reviewed-on: http://gerrit.cloudera.org:8080/23346
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> DDLs with sync_ddl may fail with concurrent INVALIDATE METADATA
> ---------------------------------------------------------------
>
>                 Key: IMPALA-9135
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9135
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Priority: Major
>              Labels: concurrency
>
> This can be revealed by tests/custom_cluster/test_concurrent_ddls.py added in 
> [https://gerrit.cloudera.org/c/14307]
> If running with INVALIDATE METADATA concurrently, the DDLs may run out of 
> attempts in CatalogServiceCatalog.waitForSyncDdlVersion() to wait for the 
> target update being sent, no matter how much we increase maxNumAttempts.
> The error logs:
> {code:java}
> E1107 17:34:25.092439  7353 CatalogServiceCatalog.java:2626] Couldn't 
> retrieve the covering topic version for catalog objects. Updated objects: 
> [TABLE:test_ddls_with_invalidate_metadata_sync_ddl_f41e97e6.test_9_part 
> version: 349], deleted objects: []
> I1107 17:34:25.093451  7353 jni-util.cc:288] 
> org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog 
> topic version for the SYNC_DDL operation after 5 attempts.The operation has 
> been successfully executed but its effects may have not been broadcast to all 
> the coordinators.
>         at 
> org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:2630)
>         at 
> org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:414)
>         at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:167)
> I1107 17:34:25.142006  6389 catalog-server.cc:337] A catalog update with 2 
> entries is assembled. Catalog version: 356 Last sent catalog version: 355
> I1107 17:34:25.142168  6381 catalog-server.cc:641] Collected update: 
> 1:TABLE:test_ddls_with_invalidate_metadata_sync_ddl_f41e97e6.test_15_part, 
> version=357, original size=101, compressed size=98
> I1107 17:34:25.142215  6381 catalog-server.cc:641] Collected update: 
> 1:CATALOG_SERVICE_ID, version=357, original size=49, compressed size=52
> I1107 17:34:25.142287  7356 CatalogServiceCatalog.java:2642] Operation using 
> SYNC_DDL is waiting for catalog topic version: 357. Time to identify topic 
> version (msec): 19
> I1107 17:34:25.192239  6389 catalog-server.cc:337] A catalog update with 2 
> entries is assembled. Catalog version: 357 Last sent catalog version: 356
> I1107 17:34:25.192428  6381 catalog-server.cc:641] Collected update: 
> 1:TABLE:test_ddls_with_invalidate_metadata_sync_ddl_f41e97e6.test_16_part, 
> version=358, original size=101, compressed size=98
> I1107 17:34:25.192462  6381 catalog-server.cc:641] Collected update: 
> 1:TABLE:test_ddls_with_invalidate_metadata_sync_ddl_f41e97e6.test_11_part, 
> version=359, original size=101, compressed size=98
> I1107 17:34:25.192484  6381 catalog-server.cc:641] Collected update: 
> 1:TABLE:test_ddls_with_invalidate_metadata_sync_ddl_f41e97e6.test_12_part, 
> version=360, original size=101, compressed size=98
> I1107 17:34:25.192535  6381 catalog-server.cc:641] Collected update: 
> 1:CATALOG_SERVICE_ID, version=360, original size=49, compressed size=52
> I1107 17:34:25.192613  7355 CatalogServiceCatalog.java:2642] Operation using 
> SYNC_DDL is waiting for catalog topic version: 360. Time to identify topic 
> version (msec): 13
> I1107 17:34:25.192695  7351 CatalogServiceCatalog.java:2642] Operation using 
> SYNC_DDL is waiting for catalog topic version: 360. Time to identify topic 
> version (msec): 45
> I1107 17:34:25.192734  7350 CatalogServiceCatalog.java:2642] Operation using 
> SYNC_DDL is waiting for catalog topic version: 360. Time to identify topic 
> version (msec): 29
> I1107 17:34:25.222911  7353 status.cc:126] CatalogException: Couldn't 
> retrieve the catalog topic version for the SYNC_DDL operation after 5 
> attempts.The operation has been successfully executed but its effects may 
> have not been broadcast to all the coordinators.
>     @          0x1c5ae50  impala::Status::Status()
>     @          0x24f7ad2  impala::JniUtil::GetJniExceptionMsg()
>     @          0x1c41987  impala::JniCall::Call<>()
>     @          0x1c3fec9  impala::JniUtil::CallJniMethod<>()
>     @          0x1c3e0e6  impala::Catalog::ExecDdl()
>     @          0x1c1ed17  CatalogServiceThriftIf::ExecDdl()
>     @          0x1cb3047  impala::CatalogServiceProcessor::process_ExecDdl()
>     @          0x1cb2d95  impala::CatalogServiceProcessor::dispatchCall()
>     @          0x1c08d65  apache::thrift::TDispatchProcessor::process()
>     @          0x20e8a0d  
> apache::thrift::server::TAcceptQueueServer::Task::run()
>     @          0x20de040  impala::ThriftThread::RunRunnable()
>     @          0x20df766  boost::_mfi::mf2<>::operator()()
>     @          0x20df5fc  boost::_bi::list3<>::operator()<>()
>     @          0x20df348  boost::_bi::bind_t<>::operator()()
>     @          0x20df25b  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
>     @          0x1ffb6e9  boost::function0<>::operator()()
>     @          0x2573dea  impala::Thread::SuperviseThread()
>     @          0x257c16e  boost::_bi::list5<>::operator()<>()
>     @          0x257c092  boost::_bi::bind_t<>::operator()()
>     @          0x257c055  boost::detail::thread_data<>::run()
>     @          0x3d61599  thread_proxy
>     @     0x7f1ce6ca46b9  start_thread
>     @     0x7f1ce343f41c  clone
> E1107 17:34:25.222932  7353 catalog-server.cc:112] CatalogException: Couldn't 
> retrieve the catalog topic version for the SYNC_DDL operation after 5 
> attempts.The operation has been successfully executed but its effects may 
> have not been broadcast to all the coordinators.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
