I usually do shut down clients before rolling back, precisely because of possible inconsistencies. I only noticed this by accident: I had rolled back the group to a previous state while the VMs were shut off, then turned them on to continue working. Then, to respond on this thread, I tested the rollback again and was surprised that it worked, after noticing that I hadn't shut them down this time. But yeah, knowing that this is how group snapshots are handled, I'll definitely have to make sure the machines are off. Btw, some kind of safety switch would make sense from my point of view. What if I have shut off only part of the group? Those VMs would be consistent after the rollback while the rest would have issues.
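Until something like that exists in rbd itself, a small client-side wrapper could act as a crude safety switch. It refuses the group rollback while any member image still has a watcher. This is only a rough sketch with made-up function names, not an rbd feature. It assumes that "rbd group image ls" prints one image spec per line and that "rbd status" prints one "watcher=..." line per watcher, as in the output quoted below. As Ilya points out, a watch alone doesn't prove that a client is actually using the image, so treat this as a heuristic, not a guarantee:

```shell
#!/bin/sh
# Hypothetical pre-rollback "safety switch" (not part of rbd).
# Assumptions: "rbd group image ls" prints one image spec per line and
# "rbd status" prints a "watcher=..." line for every watcher.
# A watcher is only a hint that a client is attached, not proof of I/O.
# Optional cephx user: set RBD_ID=openstack to add "--id openstack".

# Print every member image of a group that currently has at least one watcher.
group_members_with_watchers() {
    group_spec=$1                       # e.g. images/test-servers
    rbd ${RBD_ID:+--id "$RBD_ID"} group image ls "$group_spec" |
    while read -r image; do
        [ -n "$image" ] || continue
        if rbd ${RBD_ID:+--id "$RBD_ID"} status "$image" | grep -q 'watcher='; then
            printf '%s\n' "$image"
        fi
    done
}

# Roll the group back only if no member image is watched; otherwise fail.
safe_group_snap_rollback() {
    group_snap_spec=$1                  # e.g. images/test-servers@snap1
    busy=$(group_members_with_watchers "${group_snap_spec%@*}")
    if [ -n "$busy" ]; then
        printf 'refusing rollback, these images still have watchers:\n%s\n' "$busy" >&2
        return 1
    fi
    rbd ${RBD_ID:+--id "$RBD_ID"} group snap rollback "$group_snap_spec"
}
```

After sourcing the functions, set RBD_ID=openstack and then run "safe_group_snap_rollback images/test-servers@snap1". The check and the rollback are not atomic, so a client could still attach in between.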

Quoting Ilya Dryomov <[email protected]>:

On Wed, Jun 3, 2026 at 12:42 PM Eugen Block via dev <[email protected]> wrote:

Now that's unexpected. I thought I had misremembered my previous
actions and didn't mention it yet, but now I created 3 new VMs, created
a new group and took a group snapshot, all good. But a group snap
rollback is executed even though the rbd images have watchers:

rbd --id openstack group snap create images/test-servers@snap1

rbd --id openstack group image ls images/test-servers
images/3df08789-2be9-4e99-9746-9d2edc8c612a_disk
images/7a5d19eb-1034-489f-885a-0074fef59e89_disk
images/f79e323f-87a1-4cb7-ad9c-1108ce73efe3_disk


rbd status images/3df08789-2be9-4e99-9746-9d2edc8c612a_disk
Watchers:
         watcher=X.X.X.18:0/4236769191 client.<client> cookie=<cookie>

rbd --id openstack group snap rollback images/test-servers@snap1
Rolling back to group snapshot: 100% complete...done.

This shouldn't be possible; I would expect a message like this:

Rolling back to snapshot: 0% complete...failed.
rbd: rollback failed: (30) Read-only file system

Is this a known bug?

No, this behavior is expected.  The reason why the "rbd snap rollback"
command can, in some cases, deny the operation with EROFS when the image
is mapped, while the "rbd group snap rollback" command doesn't, lies in
implementation details (specifically the interaction with exclusive
locks on member images).

When it comes to the rollback operation, the mere presence of a watch isn't
really an indicator of anything.  That said, I'd recommend shutting down
clients before issuing any rollbacks in both standalone image and group
scenarios.  That way all caches would get invalidated and there is no
chance of confusing someone or something with now-stale data.

Thanks,

                Ilya


Quoting Eugen Block <[email protected]>:

> But one more question on this: why is it allowed to remove an image
> from the group if there are existing snapshots? Shouldn't this be
> prevented to keep the group consistent?
>
> And just for my understanding: how are the group snapshots
> technically created? Is that one snapshot for all images or is it an
> individual snapshot per image?
>
> Quoting Eugen Block <[email protected]>:
>
>> I understand, you're right about the group consistency of course. I
>> just thought that if you can remove an image from the group, its
>> snapshot(s) would also be removed from the group's list of snapshots.
>> My scenario is: initially I thought it would make sense to have those
>> servers in a group, because if I wanted to roll back, it would make
>> sense to do it for all of them. But then I thought about it a bit more
>> and decided that one of the images doesn't actually belong in that
>> group. Re-adding it will cause more problems in case of a rollback...
>> I need to think about this...
>>
>> Thanks a lot for taking the time, I really appreciate it!
>>
>> Quoting Ilya Dryomov <[email protected]>:
>>
>>> On Wed, Jun 3, 2026 at 10:27 AM Eugen Block <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> that is correct, log_to_stderr is false in our cluster. And with
>>>> --log-to-stderr true the result is as you expected:
>>>>
>>>> rbd: rollback group to snapshot failed: 2026-06-03T08:06:52.930+0000
>>>> 7f53896ae0c0 -1 librbd::api::Group: snap_rollback: group snapshot
>>>> membership does not match group membership
>>>>
>>>> But what's the conclusion here? So it's not allowed to roll back if
>>>> the memberships don't match. How would I correct the membership?
>>>
>>> Re-add the image back to the group if the image is still around.
>>>
>>>> Because I
>>>> wouldn't want to delete all snapshots from before I removed one image
>>>> from the group. Is there any workaround?
>>>
>>> I'm not sure I see what needs to be worked around here.  The group is
>>> supposed to be a logical collection of images where some level of
>>> consistency between images is required, not a random "bag".  This
>>> suggests that while images can come and go (i.e. be added and removed
>>> from the group), the group can't always be meaningfully rolled back.
>>> For example, if a group snapshot captured images A, B and C but image
>>> C had since been removed from the group and potentially reformatted,
>>> repurposed for something else or removed altogether, the group's state
>>> exactly as of that snapshot just can't be restored.
>>>
>>> Thanks,
>>>
>>>               Ilya
>>>
>>>>
>>>> Thanks,
>>>> Eugen
>>>>
>>>> Quoting Ilya Dryomov <[email protected]>:
>>>>
>>>>> On Fri, May 29, 2026 at 6:41 PM Eugen Block <[email protected]> wrote:
>>>>>>
>>>>>> The commands were:
>>>>>>
>>>>>> controller02:~# rbd --id user group create images/test-servers
>>>>>>
>>>>>> controller02:~# for i in 0f69278e-00c2-46b0-b6e7-0b06e9c8b6fd \
>>>>>> 72f5816c-c1db-44de-b0a2-19d661faa963 \
>>>>>> 47d6144e-0d5a-4dc7-82dd-5be3edf9f6cc; do rbd --id user group image add \
>>>>>> images/test-servers images/${i}_disk; done
>>>>>>
>>>>>> controller02:~# rbd --id user group image ls images/test-servers
>>>>>> images/0f69278e-00c2-46b0-b6e7-0b06e9c8b6fd_disk
>>>>>> images/47d6144e-0d5a-4dc7-82dd-5be3edf9f6cc_disk
>>>>>> images/72f5816c-c1db-44de-b0a2-19d661faa963_disk
>>>>>>
>>>>>> controller02:~# rbd --id user group snap create images/test-servers@snap1
>>>>>>
>>>>>> controller02:~# rbd --id user group snap ls images/test-servers
>>>>>> NAME   STATUS
>>>>>> snap1      ok
>>>>>>
>>>>>>
>>>>>> # rollback works for all images
>>>>>> controller02:~# rbd --id user group snap rollback images/test-servers@snap1
>>>>>> Rolling back to group snapshot: 100% complete...done.
>>>>>>
>>>>>> # removing one image from the group
>>>>>> controller02:~# rbd --id user group image rm images/test-servers images/0f69278e-00c2-46b0-b6e7-0b06e9c8b6fd_disk
>>>>>>
>>>>>> # rollback fails
>>>>>> controller02:~# rbd --id user group snap rollback images/test-servers@snap1
>>>>>> Rolling back to group snapshot: 0% complete...failed.
>>>>>> rbd: rollback group to snapshot failed: (22) Invalid argument
>>>>>>
>>>>>> I'll add the debug output later; I need to sanitize it first. But I
>>>>>> don't see anything obvious in there.
>>>>>
>>>>> Hi Eugen,
>>>>>
>>>>> Based on the above, it's https://tracker.ceph.com/issues/66300 and is
>>>>> therefore the intended behavior.  The only fly in the ointment is that
>>>>> you aren't seeing the associated "group snapshot membership does not
>>>>> match group membership" error message.
>>>>>
>>>>> You not seeing it is consistent with the attached debug output where
>>>>> only very early messenger traffic is present and nothing beyond that.
>>>>> It suggests some non-conventional settings in the cluster-wide config
>>>>> such as log_to_stderr being set to false or similar.
>>>>>
>>>>> Can you try appending --log-to-stderr true to "rbd group snap rollback"
>>>>> command?
>>>>>
>>>>> Thanks,
>>>>>
>>>>>                Ilya
>>>>>
>>>>>>
>>>>>> Quoting Ilya Dryomov <[email protected]>:
>>>>>>
>>>>>>> On Fri, May 29, 2026 at 4:05 PM Eugen Block <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> thanks for your quick reply. No, I didn't see any output other
>>>>>>>> than the one I shared (invalid argument). I could increase the debug
>>>>>>>> log level if necessary.
>>>>>>>
>>>>>>> That error message should have been displayed no matter the log level,
>>>>>>> so something other than https://tracker.ceph.com/issues/66300 might be
>>>>>>> involved.
>>>>>>>
>>>>>>> What exactly do you mean by "I removed an image from the group
>>>>>>> snapshot"?  Which commands were run there and in what order?
>>>>>>>
>>>>>>>> One more detail: I also tried the rollback directly within the
>>>>>>>> cephadm shell (so version 19.2.3), with the same result:
>>>>>>>>
>>>>>>>> ceph03:~ # cephadm shell
>>>>>>>> ...
>>>>>>>> [ceph: root@ceph03 /]# rbd group snap rollback images/test-servers@20260430_start
>>>>>>>> Rolling back to group snapshot: 0% complete...failed.
>>>>>>>> rbd: rollback group to snapshot failed: (22) Invalid argument
>>>>>>>>
>>>>>>>> [ceph: root@ceph03 /]# ceph -v
>>>>>>>> ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62)
>>>>>>>> squid (stable)
>>>>>>>
>>>>>>> Can you try appending --debug-ms 1 --debug-rbd 20 to the command
>>>>>>> (let's stick to this cephadm shell) and attach the output?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>                 Ilya
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Eugen
>>>>>>>>
>>>>>>>> Quoting Ilya Dryomov <[email protected]>:
>>>>>>>>
>>>>>>>> > On Fri, May 29, 2026 at 2:33 PM Eugen Block via ceph-users
>>>>>>>> > <[email protected]> wrote:
>>>>>>>> >>
>>>>>>>> >> Hi,
>>>>>>>> >>
>>>>>>>> >> I wanted to roll back a group snapshot on Ubuntu 24.04 (rbd client
>>>>>>>> >> version 19.2.1); the Ceph cluster version is 19.2.3. The client fails
>>>>>>>> >> with "invalid argument":
>>>>>>>> >>
>>>>>>>> >> controller02:~# rbd --id <user> group snap rollback --pool images --group test-servers --snap 20260430_start
>>>>>>>> >> Rolling back to group snapshot: 0% complete...failed.
>>>>>>>> >> rbd: rollback group to snapshot failed: (22) Invalid argument
>>>>>>>> >>
>>>>>>>> >> controller02:~# ceph -v
>>>>>>>> >> ceph version 19.2.1 (9efac4a81335940925dd17dbf407bfd6d3860d28)
>>>>>>>> >> squid (stable)
>>>>>>>> >>
>>>>>>>> >> But running the same command (just as admin, not as <user>) on a Ceph
>>>>>>>> >> node works:
>>>>>>>> >>
>>>>>>>> >> ceph03:~ # rbd group snap rollback --pool images --group test-servers --snap 20260430_start
>>>>>>>> >> Rolling back to group snapshot: 100% complete...done.
>>>>>>>> >>
>>>>>>>> >> ceph03:~ # ceph -v
>>>>>>>> >> ceph version 16.2.13-66-g54799ee0666
>>>>>>>> >> (54799ee06669271880ee5fc715f99202002aa371) pacific (stable)
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> What seems to be the issue here is that I removed an image from the
>>>>>>>> >> group snapshot. I wonder if it could be this bug [0] which is supposed
>>>>>>>> >> to be fixed in 19.2.0 according to the "Released In" field of the
>>>>>>>> >> Squid backport tracker [1].
>>>>>>>> >>
>>>>>>>> >> This seems a little inconsistent to me, could someone please clarify?
>>>>>>>> >
>>>>>>>> > Hi Eugen,
>>>>>>>> >
>>>>>>>> > Did you see the "group snapshot membership does not match group membership"
>>>>>>>> > error message when the rollback command failed with the 19.2.1 client?
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> >
>>>>>>>> >                 Ilya
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>




_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
