[ https://issues.apache.org/jira/browse/KUDU-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dengke updated KUDU-3413:
-------------------------
    Attachment: data_and_metadata.png

> Kudu multi-tenancy
> ------------------
>
>                 Key: KUDU-3413
>                 URL: https://issues.apache.org/jira/browse/KUDU-3413
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: dengke
>            Assignee: dengke
>            Priority: Major
>         Attachments: data_and_metadata.png, kudu table topology.png
>
>
> h1. 1. Definition
> Tenant: A cluster user can be called a tenant. Tenants may be divided by
> project or by actual application. Each tenant is equivalent to a resource
> pool, and all users under a tenant share all resources of that pool.
> Multiple tenants share the resources of one cluster.
> User: A consumer of cluster resources.
> Multi-tenancy: Access is controlled at the database level so that tenants
> cannot access each other's data, and each tenant's resources are private and
> independent (Note: Kudu has no concept of a database; here it can be
> understood simply as a group of tables).
> h1. 2. Current situation
> The latest version of Kudu implements 'data at rest encryption'. This mainly
> covers cluster-level authentication and encryption, plus storage encryption
> at the level of a single server. It meets the needs of basic encryption
> scenarios, but it still falls short of the tenant-level encryption we are
> pursuing.
> h1. 3. Outline design
> In general, tenant-level encryption differs from cluster-level encryption in
> the following ways:
> * Tenant-level encryption requires data storage isolation, which means data
> belonging to different tenants needs to be separated (a new namespace layer
> may be added to the storage topology, so that data of the same tenant is
> stored under the same namespace path, with minimal mutual impact);
> * The generation and use of tenants' keys. In a multi-tenant scenario, we
> need to replace the cluster key with the tenant key.
> h1. 4. Design
> h2. 4.1 Namespace
> In the storage industry, a namespace is mainly used to maintain file
> attributes, the directory tree structure, and other file system metadata,
> and it is compatible with POSIX directory trees and file operations. It is a
> core concept in file storage.
> Taking the widely used HDFS as an example, its namespace achieves resource
> isolation mainly as follows: "the disk allows logical partitioning, the
> partitions are attached to different directories, and finally the
> permissions of the directory owners are modified".
> In Kudu, the current storage topology is relatively mature, and a client's
> read/write requests must be processed by a tserver before the corresponding
> data can be obtained. The requests never manipulate raw data directly; that
> is, the client is entirely unaware of how data is distributed in the storage
> engine, which provides a natural degree of data isolation. However, the data
> in the storage engine is intermingled, and in some extreme cases tenants
> could still affect one another. The ideal solution would be to completely
> separate the read/write, compaction, and other processing paths of different
> tenants, but that requires many changes and may destabilize the system.
> Instead, we can achieve physical isolation of data with minimal changes by
> separating data per tenant.
>
> First, we need to analyze the current storage topology: a table in Kudu is
> divided into multiple tablets (partitions). Each tablet includes metadata
> and several RowSets. The RowSets consist of one 'MemRowSet' (the data in
> memory) and multiple 'DiskRowSets' (the data on disk). A 'DiskRowSet'
> contains a 'BloomFile', an 'Ad_hoc Index', 'BaseData', a 'DeltaMem', and
> several 'RedoFiles' and 'UndoFiles' (generally, there is only one
> 'UndoFile'). For more specific distribution information, please refer to the
> following figure.
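The hierarchy described in the preceding paragraph can be sketched as plain data classes. This is only an illustration of the containment relationships named in the description, not Kudu's actual implementation: every class name, field name, and the `count_disk_files` helper are hypothetical, and treating 'DeltaMem' as an in-memory structure rather than a file is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DiskRowSet:
    """On-disk portion of a tablet; string fields stand in for file names."""
    bloom_file: str
    ad_hoc_index: str
    base_data: str
    # Assumption: DeltaMem holds recent changes in memory, so it is not a file.
    delta_mem: List[dict] = field(default_factory=list)
    redo_files: List[str] = field(default_factory=list)
    undo_files: List[str] = field(default_factory=list)  # generally just one


@dataclass
class Tablet:
    """One partition of a table: metadata, one MemRowSet, many DiskRowSets."""
    metadata: Dict[str, str]
    mem_row_set: List[dict] = field(default_factory=list)
    disk_row_sets: List[DiskRowSet] = field(default_factory=list)


@dataclass
class Table:
    name: str
    tablets: List[Tablet] = field(default_factory=list)


def count_disk_files(table: Table) -> int:
    """Count the files reachable from a table in this simplified model."""
    total = 0
    for tablet in table.tablets:
        for drs in tablet.disk_row_sets:
            # BloomFile + Ad_hoc Index + BaseData, plus the delta files.
            total += 3 + len(drs.redo_files) + len(drs.undo_files)
    return total
```

A per-tenant namespace, as proposed in section 3, would add one more level above `Table` in this hierarchy; that level is left out here because the description does not yet specify its layout.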
> !kudu table topology.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)