[ https://issues.apache.org/jira/browse/KUDU-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706581#comment-17706581 ]
Alexey Serbin commented on KUDU-3452:
-------------------------------------

Thank you for the quick response, [~wangxixu]!

{quote}
But before the Kudu cluster is brought back to high availability (at least 3 tablet servers), there may be a period of time during which table creation is unavailable. For an online service based on Kudu, that is intolerable.
{quote}

So, having 4 nodes in your cluster doesn't seem like a good idea in your use case? With 4 nodes, Kudu can automatically re-replicate tablets with RF=3 elsewhere if just one node becomes unavailable, so the cluster can tolerate one more node failure after a short period of data re-replication, keeping the data available for reading and writing. I guess there are some hardware/resource constraints that drive the decision to have just 3 tablet servers in your use case, and that makes sense, of course.

BTW, does it make sense to allow creating a table even with 0 tablet servers available in a cluster? Or does the use case assume that the client should be able to write into the newly created table right after it gets an OK response from the CreateTable API method?

{quote}
In my opinion, creating a table with RF=3 when only 2 of 3 tablet servers are alive may be high risk, but the risk is the same as for an already created table with RF=3, and it is better than not being able to create tables. We should keep the Kudu cluster providing service for as long as possible, even if some nodes fail before we have had time to repair the problem. Reading and writing are available when only 2 of 3 tablet servers are alive; table creation could be available in the same way.
{quote}

The probability of two nodes going down during a time interval is much less than the probability of just one node going down in the same time interval, so the level of risk is different.
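To put rough numbers on that difference, here is a minimal sketch. It assumes that each tablet server fails independently with the same probability p during the interval; both the independence assumption and the value of p are purely illustrative, not measurements from any real cluster. It compares a healthy RF=3 tablet, which loses its Raft majority only if at least 2 of its 3 replicas fail, with a tablet created on just 2 live servers, which loses its majority if either one fails.

```python
from math import comb


def prob_at_least_k_down(n: int, k: int, p: float) -> float:
    """Probability that at least k of n nodes fail in the interval.

    Assumes independent failures with the same per-node probability p
    (a simplifying assumption, not a property of real clusters).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 0.01  # hypothetical per-node failure probability for the interval

# Healthy RF=3 tablet: unavailable once 2 or more of its 3 replicas are down.
healthy_rf3 = prob_at_least_k_down(3, 2, p)

# RF=3 tablet created with only 2 live replicas: unavailable once 1 of them is down.
created_degraded = prob_at_least_k_down(2, 1, p)

print(f"healthy RF=3 tablet loses majority:  {healthy_rf3:.6f}")
print(f"degraded-at-creation loses majority: {created_degraded:.6f}")
```

Under these assumptions the first probability scales roughly as 3p² and the second as 2p, so the gap widens as p gets smaller.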
Perhaps, by "the risk is the same" you meant the case when one tablet replica becomes unavailable after a table has been created in a cluster with just 3 nodes.

I agree that from a logical perspective it makes sense to allow creating a table when there are just enough tablet servers to create a majority of replicas, so that writing and reading data would be possible. Not allowing a table to be created when there aren't enough tablet servers to host all the required replicas is just a policy, of course. It guarantees that, at least at the table's creation time, Kudu can provide data availability according to the table's replication factor, and that the cluster isn't misconfigured. Basically, it's a restriction that forces people to have at least M tablet servers in their cluster if they want to have tables with RF=M. Otherwise, it could give the false impression that the data is available up to the Raft guarantees of the table's replication factor, when in fact that could never be achieved in a misconfigured Kudu cluster. I'd say that's just a policy to avoid unexpected data loss.

I agree it makes sense to ease that restriction, assuming that cluster operators know what they are doing. But let's please at least check the total number of tablet servers registered with the catalog manager (i.e. the Kudu master). Will it work in your case?

Thanks a lot!

> Support creating three-replicas table or partition when only 2 tservers healthy
> -------------------------------------------------------------------------------
>
> Key: KUDU-3452
> URL: https://issues.apache.org/jira/browse/KUDU-3452
> Project: Kudu
> Issue Type: Improvement
> Reporter: Xixu Wang
> Priority: Major
>
> h1. Background
> In my case, every day a new Kudu table (called history_data_table) is created to store history data, and a new partition is added to another table (called business_data_table) to be ready to store today's data. These tables and partitions all require 3 replicas.
> This business logic is implemented by some Python scripts. My Kudu cluster contains 3 masters and 3 tservers. The flag --catalog_manager_check_ts_count_for_create_table is set to false.
> Sometimes one tserver becomes unavailable. The table creation task then retries continuously and always fails until the tserver becomes healthy again. See the error:
> {color:#ff8b00}E0222 11:10:32.767140 3321 catalog_manager.cc:672] Error processing pending assignments: Invalid argument: error selecting replicas for tablet 41dffa9783f14f36a5b6c35e89075c1a, state:0: Not enough tablet servers are online for table 'test_table'. Need at least 3 replicas, but only 2 tablet servers are available{color}
> {color:#172b4d}As there are not enough replicas, the tablet will never be created. The state of this tablet is not running. Therefore, reading from or writing to this tablet will fail even though there are 2 tservers that could host 2 replicas.{color}
>
> An already created tablet can still serve requests even if one of its 3 replicas becomes unavailable. Why can't a three-replica table be created when only 2 tservers are healthy?
>
> h1. Design
> A new flag, --support_create_tablet_without_enough_healthy_tservers, is added. The original logic stays the same. When this flag is set to true, a three-replica tablet can be created successfully, and its status reports one missing replica. This tablet can be read and written normally.
>
> There are 3 things that need to be done:
> # A tool to cancel the table creating task.
> # A tool to show the running table creating task.
> # A method to create a table without enough healthy tservers

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
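The two placement rules discussed in this thread can be sketched as follows. This is an illustrative Python model, not Kudu's catalog manager code: the function and parameter names are hypothetical, and the relaxed rule (at least RF tablet servers registered with the master, plus a Raft majority of them alive) is only one reading of the proposal and the review comment.

```python
def raft_majority(rf: int) -> int:
    """Smallest number of replicas that forms a Raft majority for factor rf."""
    return rf // 2 + 1


def can_create_table(
    rf: int,
    registered_tservers: int,
    live_tservers: int,
    allow_under_replicated: bool = False,  # hypothetical switch modelling the proposed flag
) -> bool:
    """Decide whether a table with replication factor rf may be created.

    Strict rule: every replica needs its own live tablet server.
    Relaxed rule (assumption): the cluster must be sized for rf overall,
    and enough servers must be alive to form a majority.
    """
    if not allow_under_replicated:
        return live_tservers >= rf
    return registered_tservers >= rf and live_tservers >= raft_majority(rf)


# The scenario from the issue: 3 registered tservers, 2 alive, RF=3.
print(can_create_table(3, 3, 2))                               # strict rule
print(can_create_table(3, 3, 2, allow_under_replicated=True))  # relaxed rule
```

Checking the registered count in the relaxed rule keeps the guard against clusters that could never host all RF replicas, while still allowing creation during a temporary outage.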