[ https://issues.apache.org/jira/browse/KUDU-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dengke updated KUDU-3413:
-------------------------
    Attachment: zonekey_update.png

> Kudu multi-tenancy
> ------------------
>
>                 Key: KUDU-3413
>                 URL: https://issues.apache.org/jira/browse/KUDU-3413
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: dengke
>            Assignee: dengke
>            Priority: Major
>         Attachments: data_and_metadata.png, kudu table topology.png, 
> metadata_record.png, new_fs_manager.png, tablet_rowsets.png, 
> zonekey_update.png
>
>
> h1. 1. Definition
> * Tenant: A cluster user can be called a tenant. Tenants may be divided by 
> project or by actual application. Each tenant is equivalent to a resource 
> pool, all users under a tenant share all the resources of that pool, and 
> multiple tenants share the resources of one cluster.
> * User: A consumer of cluster resources.
> * Multi-tenancy: Tenants are isolated at the database level so that they 
> cannot access each other's data, and each tenant's resources are private and 
> independent. (Note: Kudu does not have the concept of a database; here it can 
> simply be understood as a group of tables.)
> h1. 2. Current situation
>         The latest version of Kudu implements 'data at rest encryption': 
> mainly cluster-level authentication and encryption, plus data storage 
> encryption at the level of a single server. This can meet the needs of basic 
> encryption scenarios, but there is still a gap between it and the 
> tenant-level encryption we are pursuing.
> h1. 3. Outline design
>         In general, tenant-level encryption differs from cluster-level 
> encryption in the following ways:
> * Tenant-level encryption requires data storage isolation, which means data 
> between tenants needs to be separated (a new namespace layer may be added to 
> the storage topology, so that data of the same tenant is stored under the 
> same namespace path, minimizing mutual impact);
> * The generation and use of tenants' keys: in a multi-tenant scenario, we 
> need to replace the cluster key with per-tenant keys.
> h1. 4. Design
> h2. 4.1 Namespace
>         In the storage industry, a namespace is mainly used to maintain a 
> file system's metadata, such as file attributes and the directory tree 
> structure, and is compatible with POSIX directory-tree and file operations. 
> It is a core concept in file storage.
>         Taking the common HDFS as an example, its namespace-based resource 
> isolation is mainly achieved by logically partitioning the disk, attaching 
> the partition files to different directories, and finally modifying the 
> directory owners' permissions.
>         In the Kudu system, the current storage topology is relatively 
> mature: a Kudu client's read/write requests must be processed by a tserver 
> before the corresponding data can be obtained. A request never manipulates 
> raw data directly; that is, the client does not perceive the data 
> distribution inside the storage engine at all, so there is a natural degree 
> of data isolation. However, the data inside the storage engine is 
> intertwined, and in some extreme cases interference is still possible. The 
> best solution would be to completely separate the read/write, compaction and 
> other processing paths of different tenants, but that requires a lot of 
> changes and may destabilize the system. Instead, we can make minimal 
> per-tenant changes to achieve physical isolation of the data.
>     First, we need to analyze the current storage topology: a table in Kudu 
> is divided into multiple tablet partitions. Each tablet includes metadata 
> and several RowSets. A RowSet contains a 'MemRowSet' (the data in memory) 
> and multiple 'DiskRowSets' (the data on disk). A 'DiskRowSet' contains a 
> 'BloomFile', an 'Ad-hoc Index', 'BaseData', a 'DeltaMem', and several 
> 'RedoFiles' and 'UndoFiles' (generally there is only one 'UndoFile'). For 
> more specific distribution information, please refer to the following figure.
>  !kudu table topology.png! 
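> The topology above can be sketched as a set of nested data structures. This 
> is a minimal illustration only; the struct and field names are hypothetical 
> and do not match Kudu's actual classes.

```cpp
#include <string>
#include <vector>

// Illustrative sketch of the storage topology described above
// (names are hypothetical, not Kudu's real classes).
struct DiskRowSet {
  // Each of these parts is ultimately stored as its own CFile on disk.
  std::string bloom_file;
  std::string ad_hoc_index;
  std::string base_data;
  std::string delta_mem;
  std::vector<std::string> redo_files;
  std::vector<std::string> undo_files;  // generally only one
};

struct RowSet {
  std::string mem_rowset;                // data in memory
  std::vector<DiskRowSet> disk_rowsets;  // data on disk
};

struct Tablet {
  std::string metadata;         // tablet meta information
  std::vector<RowSet> rowsets;
};

struct Table {
  std::vector<Tablet> tablets;  // a table is split into multiple tablets
};
```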
>         The simplest way to achieve physical isolation is to set different 
> storage paths for the data of different tenants. Currently, we only need to 
> consider the physical isolation of 'DiskRowSet'.
>         Kudu writes to disk through containers. Each container can write a 
> large continuous disk space used for writing the data of one CFile (the 
> actual storage form of a 'DiskRowSet'). When a CFile has been written, the 
> container is returned to the 'BlockManager' and can then be used to write 
> the next CFile. When no container is available in the BlockManager, a new 
> container is created for the new CFile. Each container consists of a 
> *.metadata file and a *.data file. Each DiskRowSet has several blocks, and 
> all the blocks belonging to one DiskRowSet may be distributed across 
> multiple containers; conversely, a container may contain data from multiple 
> DiskRowSets.
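> The container reuse policy described above can be sketched as follows. This 
> is an illustrative model under simplifying assumptions, not Kudu's actual 
> block manager code; 'Container', 'GetAvailableContainer' and 
> 'ReturnContainer' are hypothetical names.

```cpp
#include <deque>
#include <memory>
#include <utility>

// Hypothetical sketch of the container reuse policy: an idle container is
// handed out to write one CFile, returned afterwards, and a new container
// is created only when no idle one is available.
struct Container {
  int id;
  int cfiles_written = 0;  // many CFiles accumulate into one *.data file
};

class BlockManager {
 public:
  // Get an idle container, creating one if the pool is empty.
  std::unique_ptr<Container> GetAvailableContainer() {
    if (idle_.empty()) {
      return std::unique_ptr<Container>(new Container{next_id_++});
    }
    std::unique_ptr<Container> c = std::move(idle_.front());
    idle_.pop_front();
    return c;
  }

  // Return the container once its CFile has been fully written.
  void ReturnContainer(std::unique_ptr<Container> c) {
    idle_.push_back(std::move(c));
  }

  int containers_created() const { return next_id_; }

 private:
  std::deque<std::unique_ptr<Container>> idle_;
  int next_id_ = 0;
};
```

> Writing CFiles one after another reuses a single container, while holding a 
> container open forces a second one to be created, which is why blocks of one 
> DiskRowSet can end up spread over multiple containers.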
>         It can be simply understood that one DiskRowSet corresponds to one 
> CFile file (in the single-column case; if the table has multiple columns, it 
> corresponds to multiple CFile files). The difference is that the DiskRowSet 
> is our logical organization, while the CFile is our physical storage. For 
> the six parts of a DiskRowSet (BloomFile, BaseData, UndoFile, RedoFile, 
> DeltaMem and AdhocIndex, as shown in the figure above), neither does one 
> CFile correspond to a whole DiskRowSet, nor does one CFile contain all six 
> parts: the six parts are spread over multiple CFiles, each part being a 
> separate CFile. As shown in the figure below, in an actual production 
> environment we can only find *.data and *.metadata files, and no CFile file 
> exists on disk.
>  !data_and_metadata.png! 
>         This is because a large number of CFiles are merged and written into 
> a *.data file by the container; a *.data file is actually a collection of 
> CFiles. The CFile corresponding to each part of a DiskRowSet, and the 
> mapping between them, are recorded in tablet-meta/<tablet_id>; the mapping 
> relationships are saved separately per tablet_id.
>     In the current storage topology, the *.metadata file holds the metadata 
> of the blocks (the final representation of a CFile in the fs layer) at the 
> lowest fs level. It is not in the same dimension as the concepts above, such 
> as CFile and BlockManager; instead, it records the relevant information of 
> each block. The figure below shows one record from a *.metadata file.
>  !metadata_record.png! 
>         According to the above description, we can draw the approximate 
> correspondence shown in the figure below:
>  !tablet_rowsets.png! 
>         From the above logic, we know that the *.data files are the actual 
> storage location of tenant data; to achieve data isolation, the *.data 
> files themselves must be isolated. To reach this goal, we can create a 
> different BlockManager for each tenant, each maintaining its own *.data 
> files.
>         In the default scenario (no tenant name is specified), the data goes 
> through a default block_manager. If multi-tenant encryption is enabled, 
> fs_manager creates a new tenant_block_manager for each tenant name, and the 
> data of that tenant is stored in the tenant_block_manager corresponding to 
> its name, achieving physical isolation of the data. The modified schematic 
> diagram is as follows:
>  !new_fs_manager.png! 
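> The proposed fs_manager change can be sketched as a tenant-name-to-block- 
> manager map. This is a minimal illustration under assumed names 
> ('TenantBlockManager', 'GetBlockManager', the '/data' path layout), not 
> Kudu's actual FsManager API.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical sketch: data written without a tenant name goes to the
// default block manager, while each named tenant lazily gets its own
// block manager (and hence its own *.data file paths).
class TenantBlockManager {
 public:
  explicit TenantBlockManager(std::string root) : root_(std::move(root)) {}
  const std::string& root() const { return root_; }
 private:
  std::string root_;  // per-tenant data directory
};

class FsManager {
 public:
  explicit FsManager(const std::string& base_dir)
      : base_dir_(base_dir),
        default_bm_(new TenantBlockManager(base_dir + "/data")) {}

  // Resolve the block manager for a tenant, creating it on first use.
  TenantBlockManager* GetBlockManager(const std::string& tenant) {
    if (tenant.empty()) return default_bm_.get();  // default scenario
    std::unique_ptr<TenantBlockManager>& bm = tenant_bms_[tenant];
    if (!bm) {
      bm.reset(new TenantBlockManager(base_dir_ + "/" + tenant + "/data"));
    }
    return bm.get();
  }

 private:
  std::string base_dir_;
  std::unique_ptr<TenantBlockManager> default_bm_;
  std::map<std::string, std::unique_ptr<TenantBlockManager>> tenant_bms_;
};
```

> Because each tenant resolves to its own manager and directory, two tenants 
> never share a *.data file, which is exactly the physical isolation described 
> above.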



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
