[ https://issues.apache.org/jira/browse/IMPALA-12187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18011979#comment-18011979 ]
ASF subversion and git services commented on IMPALA-12187:
----------------------------------------------------------
Commit 447c016ae18bd89902ff8ac2cd3a5298360c0d50 in impala's branch
refs/heads/master from Sai Hemanth Gantasala
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=447c016ae ]
IMPALA-12187: Fix flaky test_event_based_replication()
TestEventProcessing.test_event_based_replication becomes flaky when
replication lags for a database that has too many events to replicate.
Case III in the test is flaky because the event processor has to process
so many ALTER_PARTITIONS events that the valid writeId list can be
inaccurate while replication is still incomplete. A 20-second timeout is
therefore introduced in case III after replication, so that the event
processor processes events only after replication has fully completed.
Testing:
- Looped the test 100 times to confirm it is no longer flaky
Change-Id: I89fcd951f6a65ab7fe97c4f23554d93d9ba12f4e
Reviewed-on: http://gerrit.cloudera.org:8080/22131
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Riza Suminto <[email protected]>
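
The ordering described in the commit message above can be sketched roughly as
follows. This is a minimal illustration, not the code from the commit:
run_case_three, replicate, sync_events and count_rows are hypothetical names,
and the only detail taken from the message is the 20-second pause between
replication and event processing.

```python
import time


def run_case_three(replicate, sync_events, count_rows,
                   settle_secs=20, sleep=time.sleep):
    """Replicate, pause, then sync events before counting rows.

    Every callable here is a hypothetical stand-in supplied by the caller;
    none of them is a real Impala test helper.
    """
    replicate()          # start replication of the source database
    sleep(settle_secs)   # assumed pause so replication can finish first
    sync_events()        # only now wait for the event processor to catch up
    return count_rows()  # row count observed on the target table
```

Passing sleep as a parameter keeps the sketch testable without actually
waiting 20 seconds.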
> TestEventProcessing.test_event_based_replication flaky for truncate table
> -------------------------------------------------------------------------
>
> Key: IMPALA-12187
> URL: https://issues.apache.org/jira/browse/IMPALA-12187
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 4.3.0
> Reporter: Joe McDonnell
> Assignee: Sai Hemanth Gantasala
> Priority: Critical
> Labels: broken-build, flaky
>
> A couple of Jenkins jobs have failed in
> TestEventProcessing.test_event_based_replication(), where the test
> expects the truncated table to have zero rows but the table instead has
> 100 rows:
> {noformat}
> metadata/test_event_processing.py:180: in test_event_based_replication
> self.__run_event_based_replication_tests()
> metadata/test_event_processing.py:329: in __run_event_based_replication_tests
> assert rows_in_part_tbl_target == 0
> E assert 100 == 0{noformat}
> More logs:
> {noformat}
> truncate table repl_source_tsmyd.part_tbl;
> -- 2023-06-02 06:44:19,049 INFO MainThread: Started query
> 50469ac62856f797:53e74fb400000000
> -- 2023-06-02 06:44:41,638 INFO MainThread: Waiting until events
> processor syncs to event id:32187
> -- 2023-06-02 06:44:42,596 DEBUG MainThread: Metric last-synced-event-id
> has reached the desired value: 32187
> -- 2023-06-02 06:44:42,632 DEBUG MainThread: Found 3 impalad/1
> statestored/1 catalogd process(es)
> -- 2023-06-02 06:44:42,648 INFO MainThread: Getting metric:
> catalog.curr-version from hostname:25000
> -- 2023-06-02 06:44:42,651 INFO MainThread: Sleeping 1s before next retry.
> -- 2023-06-02 06:44:43,653 INFO MainThread: Getting metric:
> catalog.curr-version from hostname:25000
> -- 2023-06-02 06:44:43,669 INFO MainThread: Sleeping 1s before next retry.
> -- 2023-06-02 06:44:44,670 INFO MainThread: Getting metric:
> catalog.curr-version from hostname:25000
> -- 2023-06-02 06:44:44,674 INFO MainThread: Sleeping 1s before next retry.
> -- 2023-06-02 06:44:45,676 INFO MainThread: Getting metric:
> catalog.curr-version from hostname:25000
> -- 2023-06-02 06:44:45,679 INFO MainThread: Sleeping 1s before next retry.
> -- 2023-06-02 06:44:46,680 INFO MainThread: Getting metric:
> catalog.curr-version from hostname:25000
> -- 2023-06-02 06:44:46,683 INFO MainThread: Sleeping 1s before next retry.
> -- 2023-06-02 06:44:47,685 INFO MainThread: Getting metric:
> catalog.curr-version from hostname:25000
> -- 2023-06-02 06:44:47,688 INFO MainThread: Metric 'catalog.curr-version'
> has reached desired value: 9771
> -- 2023-06-02 06:44:47,688 INFO MainThread: Getting metric:
> catalog.curr-version from hostname:25001
> -- 2023-06-02 06:44:47,691 INFO MainThread: Metric 'catalog.curr-version'
> has reached desired value: 9771
> -- 2023-06-02 06:44:47,691 INFO MainThread: Getting metric:
> catalog.curr-version from hostname:25002
> -- 2023-06-02 06:44:47,694 INFO MainThread: Metric 'catalog.curr-version'
> has reached desired value: 9771
> -- executing against localhost:21000
> select count(*) from repl_target_hhkuw.unpart_tbl;
> -- 2023-06-02 06:44:47,697 INFO MainThread: Started query
> 6c40644e00cdf143:3be5e75a00000000
> -- executing against localhost:21000
> select count(*) from repl_target_hhkuw.part_tbl;{noformat}
> This was seen in a debug core job and a debug erasure coding job, and only
> for the partitioned table, not the unpartitioned table.
> This symptom does not seem to match the existing flakiness in
> TestEventProcessing.test_event_based_replication().
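
The retry pattern visible in the log above (read a metric, sleep 1s, try
again until the desired value is reached) can be sketched as below. This is
an illustrative helper under stated assumptions, not the test framework's
actual implementation: wait_for_metric and get_metric are hypothetical names,
and treating the metric as non-decreasing (hence the >= comparison) is an
assumption.

```python
import time


def wait_for_metric(get_metric, desired, timeout_secs=30, interval_secs=1,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_metric() until it reaches `desired` or the timeout expires.

    get_metric is a hypothetical zero-argument callable returning a number.
    """
    deadline = clock() + timeout_secs
    while True:
        value = get_metric()
        if value >= desired:       # assumption: the metric never decreases
            return value
        if clock() >= deadline:
            raise TimeoutError(
                "metric stayed at %r, wanted %r" % (value, desired))
        sleep(interval_secs)       # mirrors "Sleeping 1s before next retry"
```

A loop like this only shows that the metric reached the requested value. It
says nothing about whether more events are still to be generated, which is
consistent with the row count being wrong even though the sync reported
success.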