Buffering on write up to at most one page seems fine? Once you are past a single page there's not much to be gained, so it's fine to write the row index either to the end of the partition or to a separate file; but especially for small partitions there's likely significant value in prepending it?
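A rough sketch of the buffering idea above, in Python for brevity. This is an illustration under stated assumptions, not Cassandra or DSE code: `PAGE_SIZE`, `Sink`, `encode_index` and `write_partition` are hypothetical names, the index encoding is made up, and the inline/separate flag kept next to the partition offset is just one possible way to record which path was taken.

```python
import struct
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)


@dataclass
class Sink:
    """In-memory stand-ins for the data file, a separate row index file,
    and the partition index (all hypothetical)."""
    data: bytearray = field(default_factory=bytearray)
    row_index: bytearray = field(default_factory=bytearray)
    partition_index: dict = field(default_factory=dict)  # key -> (offset, inline?)


def encode_index(entries):
    """Made-up encoding: entry count, then (key length, key, row offset) per row."""
    out = bytearray(struct.pack(">I", len(entries)))
    for clustering, offset in entries:
        out += struct.pack(">H", len(clustering)) + clustering
        out += struct.pack(">I", offset)
    return bytes(out)


def write_partition(sink, key, rows, page_size=PAGE_SIZE):
    """Write one partition; rows is an iterable of (clustering, payload) bytes.

    Rows are held back only while the partition still fits in one page.
    Returns True if the row index was prepended inline."""
    start = len(sink.data)
    buffered, buffered_size = [], 0
    entries, rel_offset = [], 0
    overflowed = False
    for clustering, payload in rows:
        entries.append((clustering, rel_offset))
        rel_offset += len(payload)
        if overflowed:
            sink.data += payload          # past one page: stream straight through
            continue
        buffered.append(payload)
        buffered_size += len(payload)
        if buffered_size > page_size:     # give up on prepending, flush the buffer
            for chunk in buffered:
                sink.data += chunk
            buffered.clear()
            overflowed = True
    encoded = encode_index(entries)
    if overflowed:
        sink.row_index += encoded         # large partition: separate index
    else:
        sink.data += encoded              # small partition: index goes in front
        for chunk in buffered:
            sink.data += chunk
    sink.partition_index[key] = (start, not overflowed)
    return not overflowed
```

The point of the sketch is that the write path never holds more than roughly one page in memory: a partition that outgrows the buffer falls back to streaming and a separate index, while a small one gets its index in front of the rows. For simplicity the page check counts only row payload bytes, not the size of the index itself.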
It might be preferable to retain the separate index for those that overflow this buffer, and simply encode in the partition index whether the row index is inline or in the separate file.

> On 21 Nov 2022, at 13:29, Branimir Lambov <blam...@apache.org> wrote:
>
> There is no intention to introduce any new versions of the format
> specifically for DSE. If there are any further changes to the format, they
> will be OSS-first. In other words this support only extends to preexisting
> versions of the format.
>
> Inline row index in the data file is not something we have implemented, and
> it's currently not in any plans. I personally am not sure how it can be done
> to provide a benefit: if we place it at the end of a partition, it does not
> help much compared to a separate file; if we place it in front, we have to
> buffer the partition content, which will affect write performance. In either
> case it may be harder to cache. Do you have something different in mind?
>
> Regards,
> Branimir
>
>> On Mon, Nov 21, 2022 at 3:01 PM Benedict <bened...@apache.org> wrote:
>> Personally very pleased to see this proposal, and I’m not opposed to easing
>> your migration by maintaining some light support for internal file versions
>> - though would prefer the support have some version limit where it can be
>> excised (maybe for one minor version bump?)
>>
>> One implementation question: are there any plans to support inline row index
>> in the big sstable format files? Is this something DSE supports, and on the
>> roadmap just not for initial work, or currently not envisioned?
>>
>> I would anticipate significant advantage to this for many workloads, and no
>> downside (except for streaming - which could be resolved fairly easily by
>> skipping over these sections when streaming to an old node, but since we
>> don’t generally stream between versions I don’t see any major issue anyway).
>>
>>> On 21 Nov 2022, at 12:43, Branimir Lambov <blam...@apache.org> wrote:
>>>
>>> Hi everyone,
>>>
>>> We would like to put CEP-25 for discussion.
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-25%3A+Trie-indexed+SSTable+format
>>>
>>> The proposal describes DSE's Big Trie-indexed SSTable format, which
>>> replaces the primary index with on-disk tries to improve lookup performance
>>> and index size, better handle wide partitions, and remove the need to
>>> manage key caching and index summaries.
>>>
>>> We would like to discuss this proposal with you.
>>>
>>> One of the questions that we want to ask is whether anyone objects to
>>> maintaining full compatibility with existing files created by DataStax
>>> Enterprise.
>>>
>>> Regards,
>>> Branimir