[
https://issues.apache.org/jira/browse/IGNITE-16102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aleksandr Polovtcev reassigned IGNITE-16102:
--------------------------------------------
Assignee: Aleksandr Polovtcev
> Store all RocksDB partitions in a single column family.
> -------------------------------------------------------
>
> Key: IGNITE-16102
> URL: https://issues.apache.org/jira/browse/IGNITE-16102
> Project: Ignite
> Issue Type: Improvement
> Affects Versions: 3.0.0-alpha3
> Reporter: Ivan Bessonov
> Assignee: Aleksandr Polovtcev
> Priority: Major
> Labels: iep-74, ignite-3
>
> The current storage implementation puts each partition in its own column
> family. This effectively means that every partition lives in its own database,
> sharing only the WAL and some in-memory resources. Given that each column
> family has multiple files for LSM trees, the number of open file descriptors
> is bigger than it needs to be.
> Now, the idea is to have a single column family for all partitions within a
> table. We should also consider the possibility of storing several tables in
> the same RocksDB instance, for similar reasons. You can think of it as an
> analogue of cache groups in Ignite 2.x.
> There's also an "optimization" to be implemented that is missing from the
> code: using key hashes as prefixes.
> h3. What should be implemented:
> First of all, the code will be heavily refactored. This will lead to
> simplifications in many places.
> Otherwise, I see the following list of goals to achieve:
> * the current implementation allows deriving the list of partitions from the
> list of column families. This will no longer be possible, so I suggest storing
> this list explicitly in the "meta" CF, in any format that is convenient during
> the implementation
> * there should be a compact "tableId" representation. IgniteUUID or even
> UUID is too large, I think, but either might work as a basis. This problem
> should be discussed
> * the binary representation of keys should now include the following information:
> ** tableId - fixed-length set of bytes to be used as a prefix
> ** partitionId - 2 bytes that will follow the tableId. This layout will
> allow making range queries for specific partitions of specific tables
> ** key hash - 4 bytes. This one is required to reduce comparison time for
> keys. Generally speaking, it's safe to assume that different keys will mostly
> have different hashes, meaning that comparing hashes will usually be enough
> to establish that two keys are not equal
> ** actual key payload goes after all these prefixes
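
The key layout listed above can be sketched roughly as follows. This is only an
illustration, not code from the Ignite repository: the class and method names
are hypothetical, the 4-byte tableId width is an assumption (the issue leaves
the compact tableId format open), and Arrays.hashCode stands in for whatever
hash function is eventually chosen. Big-endian encoding is used so that a
bytewise key comparison orders keys by tableId first and partitionId second,
provided ids are non-negative.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical sketch of the proposed key layout: tableId | partitionId | keyHash | payload. */
final class KeyLayout {
    /** Assumed tableId width; the issue does not fix this value. */
    static final int TABLE_ID_BYTES = Integer.BYTES;

    /** Length of the tableId + partitionId prefix shared by all keys of one partition. */
    static final int PREFIX_BYTES = TABLE_ID_BYTES + Short.BYTES;

    /** Encodes a full key. ByteBuffer is big-endian by default. */
    static byte[] encodeKey(int tableId, int partitionId, byte[] payload) {
        return ByteBuffer.allocate(PREFIX_BYTES + Integer.BYTES + payload.length)
                .putInt(tableId)                  // fixed-length table prefix
                .putShort((short) partitionId)    // 2 bytes, values 0..65535
                .putInt(Arrays.hashCode(payload)) // 4-byte hash (placeholder hash function)
                .put(payload)                     // actual key bytes
                .array();
    }

    /** Inclusive lower bound for a scan over one partition of one table. */
    static byte[] partitionPrefix(int tableId, int partitionId) {
        return ByteBuffer.allocate(PREFIX_BYTES)
                .putInt(tableId)
                .putShort((short) partitionId)
                .array();
    }

    /**
     * Exclusive upper bound for the same scan: the prefix incremented by one,
     * treated as an unsigned big-endian number. Returns null when the prefix
     * consists only of 0xFF bytes, in which case the scan has no upper bound.
     */
    static byte[] partitionUpperBound(int tableId, int partitionId) {
        byte[] bound = partitionPrefix(tableId, partitionId);

        for (int i = bound.length - 1; i >= 0; i--) {
            if (++bound[i] != 0) {
                return bound; // no carry out of this byte
            }
        }

        return null;
    }
}
```

With such a layout, every key of partition 2 in table 1 falls into the range
[partitionPrefix(1, 2), partitionUpperBound(1, 2)), which is what makes range
queries over a single partition possible inside a shared column family.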
--
This message was sent by Atlassian Jira
(v8.20.1#820001)