Thanks everybody for the feedback. With the protocol flag mechanism and the documentation. I agree that the users should not be surprised by the schema change. Nevertheless, there are still several issues remaining and I am under the impression that our test coverage is not at the level it should be for that feature.
By consequence, I would like to mark the feature as experimental until the following points are solved: # Have a similar set of unit tests to the one we have for COMPACT storage for after DROP COMPACT and another set where we mix writes performed before and after DROP COMPACT. # Solve the issues linked to the lack of primary key liveness (CASSANDRA-16675) # Have a way to prevent users to scew up their tables by using ALTER DROP statements # Find some solution for CASSANDRA-1606 My plan is to create an umbrella ticket to have DROP COMPACT STORAGE out of experimental as soon as possible and allow users to be able to use it without taking any risk. Le lun. 7 juin 2021 à 20:36, Jeremiah D Jordan <jeremiah.jor...@gmail.com> a écrit : > We had many discussions around this back when this was added. There is a > transition ability in place. Users can set a native protocol flag to have > the server return results as if DROP COMPACT STORAGE was already run. In > this way you can update your applications to support the new way results > are returned before the change has been made server side. In the face of > multiple applications you can update them one at a time, switch each over > with the protocol flag. Once all of your applications are updated and > running with the protocol flag set, so they now deal with how data is > returned when DROP COMPACT STORAGE has been run, you can then finally run > DROP COMPACT STORAGE on the server itself to update the schema. > > This is not the case where someone needs to run DROP COMPACT STORAGE and > then deal with the fallout. > > -Jeremiah > > > On Jun 7, 2021, at 3:06 AM, Oleksandr Petrov <oleksandr.pet...@gmail.com> > wrote: > > > > Thank you for bringing this subject up. > > > >> not ready for production use unless users fully understand what they are > > doing. > > > > Thing is, there's no easy way around dropping compact storage. At the > > moment of writing of 10857, we have collectively decided that we'll > > document that the new columns are going to be shown, and have added a > > client protocol option that would hide/show columns depending on the mode > > we're running it in for anyone who upgrades. This makes it harder to make > > a transition for anyone who controls only the server side, since you have > > to account for how clients would behave whenever they see a new column. > We > > did try to patch around the shown columns, but because of ColumnFilter > this > > also turned out to be non-trivial, or at least not worth the effort for > the > > moment. > > > > One of the things mentioned in this list (primary key liveness) is also > > existing as a difference between UPDATE and INSERT, but I'm not sure if > > it's properly documented. Similar to some other nuances, such as nulls in > > clustering keys on partitions that only have a static row. We did > recently > > discuss some of these not-commonly-known cases with Benjamin and some > other > > folks. So it might be worth documenting those, too. > > > > Problem with compact storage is that very few people want to touch it, > and > > it requires a non-trivial amount of "institutional" knowledge and > > remembering things about Thrift. I think it's OK to mark the feature as > > experimental, but remembering how we haven't made significant > improvements > > to things we have previously marked as experimental, this one may not > > materialise into something final, too. > > > > What would a complete, non-foot-gun solution for dropping compact storage > > entail? If we're talking about avoiding showing columns to users, there > are > > ways to achieve this without rewriting sstables, for example, by > > introducing "hidden" columns in table metadata. However, if we want to > > preserve deletion semantics, I'm not sure if we're doing it right at all: > > we'll just trade one source of difference for partition liveness for > insert > > queries for the other, so I'd say that, by executing ALTER TABLE > statement, > > you're accepting that after it propagates, there will be at least some > > difference in behaviour and semantics. We did discuss this in C-16069, > and > > my thesis back then was that replacing special-casing for compact tables > > with special casing for tables that "used to be compact" isn't bringing > us > > closer to the final solution. > > > > To summarise, I don't mind if we mark this feature experimental, but if > we > > want to ever make it complete, we have to discuss what we do with each of > > the special cases. And it may very well be that we just need to add > > explicit hidden columns to metadata, and allow nulls for clusterings, > maybe > > several more small changes. Unless we define what it would take to get > this > > feature out of experimental state, and actually make an effort to resolve > > these issues, I'd just put a huge warning and call it a power-user > feature. > > > > > > On Fri, Jun 4, 2021 at 5:01 PM Joshua McKenzie <jmcken...@apache.org> > wrote: > > > >>> > >>> not ready for production use unless users fully understand what they > are > >>> doing. > >> > >> This statement stood out to me - in my opinion we should think carefully > >> about the surface area of the user interfaces on new features before we > add > >> more cognitive burden to our users. We already have plenty of > "foot-guns" > >> in the project and should only add more if absolutely necessary. > >> > >> Further, marking this as experimental would be another feature we've > >> released and then retroactively marked as experimental; that's a habit > we > >> should not get into. > >> > >> On balance, my .02 is the benefits to our end users and operators of > >> getting 4.0 to GA outweigh the costs of flagging this as experimental > now > >> so I'm a +1 to the flagging idea, but I think there's some valuable > lessons > >> for us to learn in retrospect from not just this feature but others > like it > >> in the past. > >> > >> Curious to hear Alex' thoughts about this situation in particular as > author > >> of C-10857. I recall that being a pretty painful slog so apologies in > >> advance for picking at this scab. :) > >> > >> > >> > >> On Fri, Jun 4, 2021 at 9:44 AM Brandon Williams <dri...@gmail.com> > wrote: > >> > >>> +1 > >>> > >>> On Fri, Jun 4, 2021, 3:53 AM Benjamin Lerer <ble...@apache.org> wrote: > >>> > >>>> Hi everybody, > >>>> > >>>> There are a significant amount of issues with DROP COMPACT STORAGE > that > >>> can > >>>> be pretty surprising for users. > >>>> To name a few: > >>>> * Some hidden columns will show up changing the resultset returned for > >>>> wildcard queries > >>>> * As COMPACT tables did not have primary key liveness there empty rows > >>>> inserted AFTER the ALTER will be returned whereas the one inserted > >> before > >>>> the ALTER will not. > >>>> * Also due to the lack of primary key liveness the amount of SSTables > >>> being > >>>> read will increase resulting in slower queries > >>>> * After DROP COMPACT it becomes possible to ALTER the table in a way > >> that > >>>> makes all the row disappears > >>>> * There is a loss of functionality around null clustering when > dropping > >>>> compact storage (CASSANDRA-16069) > >>>> > >>>> In my opinion DROP COMPACT STORAGE is not ready for production use > >> unless > >>>> users fully understand what they are doing. > >>>> By consequence, I am wondering if we should not mark it as > experimental > >>> as > >>>> we did for the Materialized Views (CASSANDRA-13959). > >>>> > >>>> What is your opinion? > >>>> > >>> > >> > > > > > > -- > > alex p > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > >