[jira] [Commented] (KUDU-3452) Support creating three-replicas table or partition when only 2 tservers healthy

Alexey Serbin (Jira) Wed, 29 Mar 2023 12:35:57 -0700


    [ https://issues.apache.org/jira/browse/KUDU-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706581#comment-17706581 ]


Alexey Serbin commented on KUDU-3452:
-------------------------------------

Thank you for the quick response, [~wangxixu]!

{quote}
But before bringing the Kudu cluster back to high availability (at least 3 
tablet servers), there may be a period of time during which the table creation 
service is unavailable. For an online service based on Kudu, that is intolerable.
{quote}

So, is having 4 nodes in your cluster not an option in your use case?  With 4 
nodes, Kudu can automatically re-replicate tablets with RF=3 elsewhere when 
one node becomes unavailable, so after a short period of data re-replication 
the cluster can tolerate one more node failure while keeping the data 
available for reading and writing.
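
To illustrate the arithmetic, here is a rough sketch in Python.  It is a 
simplified model, not Kudu code: it assumes a tablet starts with RF healthy 
replicas on distinct nodes, that the failing node always hosts one of them, 
and that a lost replica is re-created only if some live node doesn't already 
host a replica of that tablet.  The function names are made up for the 
example.

```python
def majority(rf: int) -> int:
    """Smallest number of replicas that forms a Raft majority."""
    return rf // 2 + 1


def survives_second_failure(num_tservers: int, rf: int = 3) -> bool:
    """Return True if a tablet stays available after two sequential node
    failures, with time to re-replicate between them (simplified model)."""
    replicas = rf
    live_nodes = num_tservers

    for _ in range(2):
        # Worst case: the node that goes down hosts one of the replicas.
        live_nodes -= 1
        replicas -= 1
        if replicas < majority(rf):
            return False
        # Re-replicate only if a live node without a replica is available.
        if live_nodes > replicas:
            replicas += 1
    return True
```

Under these assumptions, survives_second_failure(3) is False and 
survives_second_failure(4) is True: with only 3 nodes there is nowhere to 
place the replacement replica after the first failure.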

I guess there are some hardware/resource constraints that drive the decision to 
have just 3 tablet servers in your use case, and that makes sense, of course.  
BTW, does it make sense to allow creating a table even with 0 tablet servers 
available in a cluster?  Or does the use case assume that the client should be 
able to write into the newly created table right after it gets an OK response 
from the CreateTable API method?

{quote}
In my opinion, creating a table with RF=3 when only 2 of 3 tablet servers are 
alive may be high risk, but the risk is the same as for an already created 
table with RF=3, and it is better than not being able to create tables at all. 
We should keep the Kudu cluster providing service as long as possible, even if 
some nodes fail before we have had enough time to repair the problem. Reading 
and writing are available when only 2 of 3 tablet servers are alive, so table 
creation could be available in the same way.
{quote}

The probability of two nodes going down during a time interval is much lower 
than the probability of just one node going down in the same time interval, so 
the level of risk is different.  Perhaps by "the risk is the same" you meant 
the case where one tablet replica becomes unavailable after a table has been 
created in a cluster with just 3 nodes.
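
As a rough illustration of the difference, here is a small Python sketch.  It 
assumes nodes fail independently, each with the same probability p during the 
interval.  Both the independence and the value p = 0.01 are assumptions made 
only for this example, not measurements from any cluster.

```python
from math import comb


def prob_at_least_k_down(n: int, k: int, p: float) -> float:
    """Probability that at least k of n nodes fail during the interval,
    assuming independent failures with per-node probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 0.01  # hypothetical per-node failure probability for the interval

# A table whose 3 replicas are all healthy loses its majority only if
# at least 2 of the 3 nodes fail.
healthy_table_risk = prob_at_least_k_down(3, 2, p)

# A table created with only 2 of 3 replicas loses its majority as soon as
# either of the 2 remaining nodes fails.
under_replicated_risk = prob_at_least_k_down(2, 1, p)
```

With these made-up numbers the second probability is larger than the first by 
well over an order of magnitude, which is the sense in which the level of risk 
differs.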

I agree that from a logical perspective it makes sense to allow creating a 
table when there are just enough tablet servers to host a majority of the 
replicas, so writing and reading data would be possible.  Refusing to create a 
table when there aren't enough tablet servers around to host all the required 
replicas is just a policy, of course.  It guarantees that, at least at the 
table's creation time, Kudu can provide the availability of the data according 
to the table's replication factor, and that the cluster isn't misconfigured.  
Basically, it's a restriction that forces people to have at least M tablet 
servers in their cluster if they want to have tables with RF=M.  Otherwise, it 
could give a false impression that the data is available up to the Raft 
guarantees of the table's replication factor, when in fact that could never be 
achieved in a misconfigured Kudu cluster.  I'd say that's just a policy to 
avoid unexpected data loss.

I agree it makes sense to ease that restriction, assuming that cluster 
operators know what they are doing.  But let's please at least check for the 
total number of tablet servers registered with the catalog manager (i.e. Kudu 
master).  Will it work in your case?
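
To make the suggestion concrete, here is a sketch of the check in Python.  
This is not the catalog manager's actual code: the ClusterState type, the 
can_create_table function and the allow_under_replicated switch are all 
hypothetical names used only to illustrate the policy.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ClusterState:
    registered_tservers: int  # all tablet servers known to the master
    live_tservers: int        # those currently considered available


def can_create_table(state: ClusterState, rf: int,
                     allow_under_replicated: bool) -> Tuple[bool, str]:
    """Decide whether a table with replication factor rf may be created."""
    # Always require enough registered tablet servers, so a cluster that
    # could never host rf replicas is still rejected as misconfigured.
    if state.registered_tservers < rf:
        return False, "not enough tablet servers registered for this RF"
    # Default policy: every replica can be placed right away.
    if state.live_tservers >= rf:
        return True, "all replicas can be placed"
    # Relaxed policy: a Raft majority of the replicas is enough to start.
    if allow_under_replicated and state.live_tservers >= rf // 2 + 1:
        return True, "only a majority of replicas can be placed"
    return False, "not enough live tablet servers"
```

For example, with 3 registered and 2 live tablet servers, an RF=3 table would 
be allowed only when the relaxed policy is enabled, while a cluster with just 
2 registered tablet servers would be rejected either way.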

Thanks a lot!

> Support creating three-replicas table or partition when only 2 tservers 
> healthy
> -------------------------------------------------------------------------------
>
>                 Key: KUDU-3452
>                 URL: https://issues.apache.org/jira/browse/KUDU-3452
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Xixu Wang
>            Priority: Major
>
> h1. Background
> In my case, every day a new Kudu table (called history_data_table) is 
> created to store history data, and a new partition is added to another table 
> (called business_data_table) to be ready to store that day's data. These 
> tables and partitions all require 3 replicas. This business logic is 
> implemented by some Python scripts. My Kudu cluster contains 3 masters and 3 
> tservers. The flag --catalog_manager_check_ts_count_for_create_table is set 
> to false.
> Sometimes one tserver becomes unavailable. The table creation task then 
> retries continuously and always fails until the tserver becomes healthy 
> again. See the error:
> {color:#ff8b00}E0222 11:10:32.767140 3321 catalog_manager.cc:672] Error 
> processing pending assignments: Invalid argument: error selecting replicas 
> for tablet 41dffa9783f14f36a5b6c35e89075c1a, state:0: Not enough tablet 
> servers are online for table 'test_table'. Need at least 3 replicas, but only 
> 2 tablet servers are available{color}
> {color:#172b4d}As there are not enough replicas, the tablet will never be 
> created, and its state is not running. Therefore, reads and writes to this 
> tablet will fail even though 2 tservers are available to host 2 
> replicas.{color}
>  
> An already created tablet can still serve requests even if one of its 3 
> replicas becomes unavailable. Why can a three-replica table not be created 
> when only 2 tservers are healthy?
>  
> h1. Design
> A new flag, --support_create_tablet_without_enough_healthy_tservers, is 
> added. The original logic stays the same. When this flag is set to true, a 
> three-replica tablet can be created successfully, with its status showing 
> one replica missing. This tablet can be read and written normally.
>  
> There are 3 things that need to be done:
>  # A tool to cancel the table creating task.
>  # A tool to show the running table creating task.
>  # A method to create a table without enough healthy tservers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
