[ https://issues.apache.org/jira/browse/KUDU-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722725#comment-17722725 ]
ASF subversion and git services commented on KUDU-3452:
-------------------------------------------------------

Commit a5648f39b407c62b70ca0ce6fefd0fdab9533daf in kudu's branch refs/heads/master from xinghuayu007
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=a5648f39b ]

KUDU-3452 Allow creating tablets under replicated tables

Currently, creating a three-replica table fails when fewer than 3 tablet servers are healthy. The system catalog retries continuously and keeps failing until the unavailable tablet servers become healthy again. An under-replicated table is still available for reading and writing. Placing just a majority of the replicas for each tablet on healthy tablet servers is therefore enough to make a newly created table ready to use.

This patch adds a new flag, --allow_creating_under_replicated_tables, to support this feature. Otherwise, the original logic is kept the same. When this flag is set to true, a tablet can be created with just a majority of its replicas placed on healthy tablet servers. Even if the new tablet is created under-replicated, it is still available for read and write operations.

Change-Id: I742ba1ff770f5c8b1be5800334c29bec96e195c6
Reviewed-on: http://gerrit.cloudera.org:8080/19571
Tested-by: Kudu Jenkins
Reviewed-by: Yifan Zhang <chinazhangyi...@163.com>
Reviewed-by: Alexey Serbin <ale...@apache.org>
Reviewed-by: Yuqi Du <shenxingwuy...@gmail.com>
Reviewed-by: Yingchun Lai <laiyingc...@apache.org>


> Support creating three-replicas table or partition when only 2 tservers healthy
> -------------------------------------------------------------------------------
>
>                 Key: KUDU-3452
>                 URL: https://issues.apache.org/jira/browse/KUDU-3452
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Xixu Wang
>            Priority: Major
>
> h1. Background
> In my case, a new Kudu table (called history_data_table) is created every day to store history data, and a new partition is added to another table (called business_data_table) to store the current day's data. All of these tables and partitions require 3 replicas. This business logic is implemented by some Python scripts. My Kudu cluster contains 3 masters and 3 tservers. The flag --catalog_manager_check_ts_count_for_create_table is set to false.
>
> Sometimes one tserver becomes unavailable. The table creation task then retries continuously and always fails until the tserver becomes healthy again. See the error:
> {color:#ff8b00}E0222 11:10:32.767140 3321 catalog_manager.cc:672] Error processing pending assignments: Invalid argument: error selecting replicas for tablet 41dffa9783f14f36a5b6c35e89075c1a, state:0: Not enough tablet servers are online for table 'test_table'. Need at least 3 replicas, but only 2 tablet servers are available{color}
> Because there are not enough tablet servers for all the replicas, the tablet is never created and never reaches the running state. Therefore, reading from or writing to this tablet fails, even though 2 tservers are available to host 2 replicas.
>
> An already created tablet can still serve requests even if one of its 3 replicas becomes unavailable. Why can't a three-replica table be created when only 2 tservers are healthy?
>
> Besides, a valid table creation task can be blocked by other, invalid tasks. In the above example, a table creation task with RF=1 still does not succeed even though more than one tablet server is alive. This is because the background task manager aborts the whole processing round as soon as it finds a failed tablet creation task, and then starts a new round that tries to execute all tasks again.
>
> h1. Design
> A new flag, --support_create_tablet_without_enough_healthy_tservers, is added. The original logic stays the same.
> When this flag is set to true, a three-replica tablet can be created successfully in a state where it is missing one replica. This tablet can be read from and written to normally.
>
> There are 4 things to do:
> # A tool to cancel a table creation task.
> # A tool to show the running table creation tasks.
> # A method to create a table without enough healthy tservers.
> # Make valid table creation tasks unaffected by other, invalid tasks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
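For illustration only, the placement rule from the commit message (a majority of replicas on healthy tablet servers is enough) and the per-task isolation requested in the last design item can be modeled in a few lines of Python. This is a hypothetical sketch, not Kudu's catalog manager code (which is C++). `PendingTablet`, `required_replicas`, `select_replicas`, and `process_pending_assignments` are invented names. Replicas are assigned to servers in sorted order rather than by Kudu's real placement policy, and the `allow_under_replicated` parameter merely stands in for `--allow_creating_under_replicated_tables`.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class PendingTablet:
    """Hypothetical stand-in for a tablet waiting for replica assignment."""
    tablet_id: str
    num_replicas: int  # replication factor requested for the table


def required_replicas(num_replicas: int, allow_under_replicated: bool) -> int:
    """Minimum number of healthy tablet servers needed to create the tablet."""
    if allow_under_replicated:
        # A Raft majority is enough for the tablet to accept reads and writes.
        return num_replicas // 2 + 1
    # Original behavior: every replica must be placed up front.
    return num_replicas


def select_replicas(tablet: PendingTablet, healthy_tservers: Set[str],
                    allow_under_replicated: bool) -> List[str]:
    need = required_replicas(tablet.num_replicas, allow_under_replicated)
    if len(healthy_tservers) < need:
        raise ValueError(
            f"not enough tablet servers for tablet {tablet.tablet_id}: "
            f"need at least {need}, but only {len(healthy_tservers)} available")
    # Simplification: take servers in sorted order, at most one replica each.
    return sorted(healthy_tservers)[:tablet.num_replicas]


def process_pending_assignments(
        tablets: List[PendingTablet], healthy_tservers: Set[str],
        allow_under_replicated: bool
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Place each tablet independently, so one failure does not block the rest."""
    placed: Dict[str, List[str]] = {}
    failed: Dict[str, str] = {}
    for tablet in tablets:
        try:
            placed[tablet.tablet_id] = select_replicas(
                tablet, healthy_tservers, allow_under_replicated)
        except ValueError as err:
            failed[tablet.tablet_id] = str(err)  # left for a later round
    return placed, failed


if __name__ == "__main__":
    healthy = {"ts1", "ts2"}  # ts3 is down, as in the reported incident
    pending = [PendingTablet("rf3-tablet", 3), PendingTablet("rf1-tablet", 1)]
    for allow in (False, True):
        placed, failed = process_pending_assignments(pending, healthy, allow)
        print(f"allow_under_replicated={allow}: "
              f"placed={placed} failed={sorted(failed)}")
```

With the flag off, the RF=3 tablet fails but the RF=1 tablet is still placed on `ts1`, because each tablet is handled in its own `try` block instead of aborting the whole round. With the flag on, the RF=3 tablet is also created, with replicas on `ts1` and `ts2` only (2 = 3 // 2 + 1).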