[ 
https://issues.apache.org/jira/browse/KUDU-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729367#comment-17729367
 ] 

ASF subversion and git services commented on KUDU-3452:
-------------------------------------------------------

Commit 2a29d299cc1e74dc298e985c7855d6b8cc575c99 in kudu's branch 
refs/heads/master from xinghuayu007
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=2a29d299c ]

KUDU-3452 A tool to report on table creation progress

If there aren't enough healthy tablet servers to create the
required number of replicas, the catalog manager keeps retrying to
create the corresponding tablet replicas for a long time.

Hence there is a need to report on the table creation process,
so users could be aware of its current status. If necessary,
they could make an informed decision to cancel the process by
dropping the table or the partition being created if it's stuck.

This patch adds a new command: 'kudu table list_in_flight'
to show tables in the process of being created. For each in-flight
table, it reports on its id, name, state, and the number of tablets
being created.

Below is an example of the new tool's output:
NoEnoughTServersTable id:3c56ed6c159f4b35885912c34259de46
num_tablets:1 num_tablets_in_flight:1 state:RUNNING

AddNewPartitionTable id:6c2209cc62af4398a60ebdc8f26050f3
num_tablets:2 num_tablets_in_flight:1 state:ALTERING

Change-Id: I348b69f48e6ce36ed869097f9f798c5946136de5
Reviewed-on: http://gerrit.cloudera.org:8080/19584
Tested-by: Alexey Serbin <ale...@apache.org>
Reviewed-by: Alexey Serbin <ale...@apache.org>
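
For scripts that need to act on this report, a minimal parsing sketch follows. It assumes the output has exactly the fields shown in the example above (name, id, num_tablets, num_tablets_in_flight, state), that table names contain no whitespace, and that the text of the report has already been captured as a string; how the command is invoked is not covered here. The names parse_in_flight and SAMPLE are hypothetical and not part of Kudu.

```python
import re

# One record per in-flight table. Whitespace between fields may be a space
# or a newline, so the pattern works whether or not a record is wrapped.
# Assumption: field names and order match the example in the commit message.
_RECORD_RE = re.compile(
    r"(?P<name>\S+)\s+id:(?P<id>\S+)\s+"
    r"num_tablets:(?P<num_tablets>\d+)\s+"
    r"num_tablets_in_flight:(?P<num_tablets_in_flight>\d+)\s+"
    r"state:(?P<state>\S+)"
)


def parse_in_flight(text):
    """Return a list of dicts, one per in-flight table found in text."""
    tables = []
    for match in _RECORD_RE.finditer(text):
        record = match.groupdict()
        # Convert the two counters from strings to integers.
        record["num_tablets"] = int(record["num_tablets"])
        record["num_tablets_in_flight"] = int(record["num_tablets_in_flight"])
        tables.append(record)
    return tables


# The example output quoted in the commit message.
SAMPLE = """\
NoEnoughTServersTable id:3c56ed6c159f4b35885912c34259de46
num_tablets:1 num_tablets_in_flight:1 state:RUNNING

AddNewPartitionTable id:6c2209cc62af4398a60ebdc8f26050f3
num_tablets:2 num_tablets_in_flight:1 state:ALTERING
"""

if __name__ == "__main__":
    for table in parse_in_flight(SAMPLE):
        print(table["name"], table["state"], table["num_tablets_in_flight"])
```

A caller could, for instance, alert on any table whose num_tablets_in_flight stays above zero across several polls, and then decide whether to drop the table or partition.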


> Support creating three-replicas table or partition when only 2 tservers 
> healthy
> -------------------------------------------------------------------------------
>
>                 Key: KUDU-3452
>                 URL: https://issues.apache.org/jira/browse/KUDU-3452
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Xixu Wang
>            Priority: Major
>
> h1. Background
> In my case, a new Kudu table (history_data_table) is created every day to 
> store history data, and a new partition is added to another table 
> (business_data_table) to store that day's data. These tables and 
> partitions all require 3 replicas. This business logic is implemented by 
> Python scripts. My Kudu cluster contains 3 masters and 3 tservers. The flag 
> --catalog_manager_check_ts_count_for_create_table is set to false.
> Sometimes one tserver becomes unavailable. The table creation task then 
> retries continuously and keeps failing until the tserver becomes healthy 
> again. See the error:
> {color:#ff8b00}E0222 11:10:32.767140 3321 catalog_manager.cc:672] Error 
> processing pending assignments: Invalid argument: error selecting replicas 
> for tablet 41dffa9783f14f36a5b6c35e89075c1a, state:0: Not enough tablet 
> servers are online for table 'test_table'. Need at least 3 replicas, but only 
> 2 tablet servers are available{color}
> {color:#172b4d}As there are not enough replicas, the tablet will never be 
> created. The state of this tablet is not running. Therefore, reads from and 
> writes to this tablet fail even though 2 tservers are available to host 2 
> replicas.{color}
>  
> An already created tablet can still serve requests even if one of its 3 
> replicas becomes unavailable. Why can't a three-replica table be created 
> when only 2 tservers are healthy?
>  
> Besides, a valid table creation task is affected by other invalid tasks. 
> In the example above, a table creation task with RF=1 still does not succeed 
> even if more than one tablet server is alive, because the background task 
> manager aborts the whole process when it finds a failed tablet creation 
> task and starts a new process that tries to execute all tasks again.
>  
>  
> h1. Design
> A new flag, --support_create_tablet_without_enough_healthy_tservers, is 
> added. The original logic stays the same. When this flag is set to true, a 
> three-replica tablet can be created successfully, with its status showing 
> that one replica is missing. This tablet can be read and written normally.
>  
> There are 4 things to do:
>  # A tool to cancel the table creation task.
>  # A tool to show the running table creation tasks.
>  # A method to create a table without enough healthy tservers.
>  # Make a valid table creation task not affected by other invalid tasks.
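
The last two items of the list above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Kudu's catalog manager code: CreateTask, select_replicas, process_all_or_nothing, process_independently, and the allow_under_replicated parameter are hypothetical names, and requiring a majority of replicas before allowing an under-replicated tablet is an assumption of the sketch rather than something stated in the issue.

```python
from dataclasses import dataclass


@dataclass
class CreateTask:
    table: str
    replication_factor: int


def select_replicas(task, live_tservers, allow_under_replicated=False):
    """Pick tservers for a task, or raise ValueError if too few are alive."""
    if len(live_tservers) >= task.replication_factor:
        return list(live_tservers[:task.replication_factor])
    # Assumption of this sketch: an under-replicated tablet is only allowed
    # when a majority of its replicas can still be placed.
    majority = task.replication_factor // 2 + 1
    if allow_under_replicated and len(live_tservers) >= majority:
        return list(live_tservers)
    raise ValueError(
        f"table {task.table!r} needs {task.replication_factor} replicas, "
        f"but only {len(live_tservers)} tablet servers are available"
    )


def process_all_or_nothing(tasks, live_tservers):
    """One failing task aborts the whole pass, so no task completes."""
    placements = {}
    for task in tasks:
        placements[task.table] = select_replicas(task, live_tservers)
    return placements


def process_independently(tasks, live_tservers, allow_under_replicated=False):
    """Each task succeeds or fails on its own."""
    placements, failed = {}, {}
    for task in tasks:
        try:
            placements[task.table] = select_replicas(
                task, live_tservers, allow_under_replicated
            )
        except ValueError as err:
            failed[task.table] = str(err)
    return placements, failed


if __name__ == "__main__":
    tasks = [CreateTask("rf3_table", 3), CreateTask("rf1_table", 1)]
    live = ["ts1", "ts2"]  # the third tserver is down
    print(process_independently(tasks, live))
    print(process_independently(tasks, live, allow_under_replicated=True))
```

With two live tservers, the all-or-nothing pass raises on the RF=3 task and the RF=1 task never completes, which mirrors the behavior described in the Background section. The independent pass places the RF=1 task regardless, and with the hypothetical flag enabled it also places the RF=3 task on the two available servers.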



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
