Was a volume with existing data converted to a sharded volume?
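If it helps to rule that out, a quick way to confirm the sharding state from the CLI (volume name and brick path taken from the info you posted below; adjust if different):

    gluster volume get adminvm features.shard
    gluster volume get adminvm features.shard-block-size

    # on a brick node: shards for existing files live under .shard
    # on the brick, named <gfid-of-base-file>.<shard-index>
    ls /data/brick_adminvm/.shard | head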
On Wed, Jan 27, 2021 at 5:06 AM Erik Jacobson <[email protected]> wrote:

> Shortly after the sharded volume is made, there are some fuse mount
> messages. I'm not 100% sure if this was just before or during the
> big qemu-img command to make the 5T image:
> (qemu-img create -f raw -o preallocation=falloc /adminvm/images/adminvm.img 5T)
>
> (from /var/log/glusterfs/adminvm.log)
> [2021-01-26 19:18:21.287697] I [fuse-bridge.c:5166:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.31
> [2021-01-26 19:18:21.287719] I [fuse-bridge.c:5777:fuse_graph_sync] 0-fuse: switched to graph 0
> [2021-01-26 19:18:23.945566] W [MSGID: 114031] [client-rpc-fops_v2.c:2633:client4_0_lookup_cbk] 0-adminvm-client-2: remote operation failed. Path: /.shard/0cb55720-2288-46c2-bd7e-5d9bd23b40bd.7 (00000000-0000-0000-0000-000000000000) [No data available]
> [2021-01-26 19:18:54.089721] W [MSGID: 114031] [client-rpc-fops_v2.c:2633:client4_0_lookup_cbk] 0-adminvm-client-0: remote operation failed. Path: /.shard/0cb55720-2288-46c2-bd7e-5d9bd23b40bd.85 (00000000-0000-0000-0000-000000000000) [No data available]
> [2021-01-26 19:18:54.089784] W [MSGID: 114031] [client-rpc-fops_v2.c:2633:client4_0_lookup_cbk] 0-adminvm-client-1: remote operation failed. Path: /.shard/0cb55720-2288-46c2-bd7e-5d9bd23b40bd.85 (00000000-0000-0000-0000-000000000000) [No data available]
> [2021-01-26 19:18:55.048613] W [MSGID: 114031] [client-rpc-fops_v2.c:2633:client4_0_lookup_cbk] 0-adminvm-client-1: remote operation failed. Path: /.shard/0cb55720-2288-46c2-bd7e-5d9bd23b40bd.88 (00000000-0000-0000-0000-000000000000) [No data available]
> [2021-01-26 19:18:55.355131] W [MSGID: 114031] [client-rpc-fops_v2.c:2633:client4_0_lookup_cbk] 0-adminvm-client-0: remote operation failed. Path: /.shard/0cb55720-2288-46c2-bd7e-5d9bd23b40bd.89 (00000000-0000-0000-0000-000000000000) [No data available]
> [2021-01-26 19:18:55.981094] W [MSGID: 114031] [client-rpc-fops_v2.c:2633:client4_0_lookup_cbk] 0-adminvm-client-0: remote operation failed. Path: /.shard/0cb55720-2288-46c2-bd7e-5d9bd23b40bd.91 (00000000-0000-0000-0000-000000000000) [No data available]
> ......
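For scale: with the default 64M shard block size, a 5T image is 5 TiB / 64 MiB = 81,920 shards, and as far as I can tell a lookup on a shard that has not been created yet fails with exactly that kind of "No data available" warning, so the messages during the falloc may just be lookups racing shard creation. A rough way to watch the shards appear from a brick node (gfid taken from the log lines above) is:

    # count shard files for the image's gfid as falloc populates them
    ls /data/brick_adminvm/.shard | grep -c '^0cb55720-2288-46c2-bd7e-5d9bd23b40bd'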
> Towards the end of the qemu-img create command (or just after; it's hard to tell), these messages showed up in the adminvm.log. I just supplied the first few; there were many:
>
> [2021-01-26 19:28:40.652898] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 0-inode: be318638-e8a0-4c6d-977d-7a937aa84806/48bb5288-e27e-46c9-9f7c-944a804df361.1: dentry not found in 48bb5288-e27e-46c9-9f7c-944a804df361
> [2021-01-26 19:28:40.652975] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 0-inode: be318638-e8a0-4c6d-977d-7a937aa84806/931508ed-9368-4982-a53e-7187a9f0c1f9.3: dentry not found in 931508ed-9368-4982-a53e-7187a9f0c1f9
> [2021-01-26 19:28:40.653047] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 0-inode: be318638-e8a0-4c6d-977d-7a937aa84806/e808ecab-2e70-4ef3-954e-ce1b78ed8b52.4: dentry not found in e808ecab-2e70-4ef3-954e-ce1b78ed8b52
> [2021-01-26 19:28:40.653102] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 0-inode: be318638-e8a0-4c6d-977d-7a937aa84806/2c62c383-d869-4655-9c03-f08a86a874ba.6: dentry not found in 2c62c383-d869-4655-9c03-f08a86a874ba
> [2021-01-26 19:28:40.653169] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 0-inode: be318638-e8a0-4c6d-977d-7a937aa84806/556ffbc9-bcbe-445a-93f5-13784c5a6df1.2: dentry not found in 556ffbc9-bcbe-445a-93f5-13784c5a6df1
> [2021-01-26 19:28:40.653218] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 0-inode: be318638-e8a0-4c6d-977d-7a937aa84806/5d414e7c-335d-40da-bb96-6c427181338b.5: dentry not found in 5d414e7c-335d-40da-bb96-6c427181338b
> [2021-01-26 19:28:40.653314] W [MSGID: 101159] [inode.c:1212:__inode_unlink] 0-inode: be318638-e8a0-4c6d-977d-7a937aa84806/43364dc9-2d8e-4fca-89d2-e11dee6fcfd4.8: dentry not found in 43364dc9-2d8e-4fca-89d2-e11dee6fcfd4
> .....
>
> So now I installed Linux into a VM using the above as the VM image. There were no additional fuse messages while the admin VM was being installed with our installer (via qemu on the same physical node where the above messages appeared, and the same node where I ran qemu-img create).
>
> Rebooted the virtual machine and it booted fine. No new messages in the fuse log. So now it's officially booted. This was 'reboot', so qemu didn't restart.
>
> Halted the VM with 'halt', then in virt-manager did a forced shut down.
>
> Started the VM from scratch.
>
> Still no new messages, and it booted fine.
>
> Powered off a physical node and brought it back: still fine.
> Reset all physical nodes and brought them back: still fine.
>
> I am unable to trigger this problem. However, once it starts to go bad, it stays bad, and stays bad across all the physical nodes. The kpartx trick (mount root from within the image, then umount it) is only a temporary fix that doesn't persist beyond one boot once we're in the bad state.
>
> So something gets into a bad state and stays that way, but we don't know how to cause it to happen at will. I will continue to try to reproduce this, as it's causing some huge problems in the field.
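A thought for the next time a node is in the bad state: compare what the fuse mount reports for the image with what qemu perceives. Both of these are read-only checks (paths taken from your mails):

    stat /adminvm/images/adminvm.img            # size per the fuse mount
    qemu-img info /adminvm/images/adminvm.img   # virtual size per qemu

If stat shows ~5T but qemu-img shows something like 64M, that would point at the size being mis-reported through the mount rather than at anything wrong inside the image itself.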
> On Tue, Jan 26, 2021 at 07:40:19AM -0600, Erik Jacobson wrote:
> > Thank you so much for responding! More below.
> >
> > > Anything in the logs of the fuse mount? Can you stat the file from the mount?
> > > Also, the report of the image being only 64M makes me think about sharding, as the default shard size is 64M.
> > > Do you have any clues on when this issue started to happen? Was there any operation done to the Gluster cluster?
> >
> > - I had just created the gluster volumes within an hour of the problem, to test the very problem I reported. So it was a "fresh start".
> >
> > - It booted one or two times, then stopped booting. Once it couldn't boot, all 3 nodes were the same in that grub2 couldn't boot in the VM image.
> >
> > As for the fuse log, I did see a couple of these before it happened the first time. I'm not sure if it's a clue or not.
> >
> > [2021-01-25 22:48:19.310467] I [fuse-bridge.c:5777:fuse_graph_sync] 0-fuse: switched to graph 0
> > [2021-01-25 22:50:09.693958] E [fuse-bridge.c:227:check_and_dump_fuse_W] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17a)[0x7f914e346faa] (--> /usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x874a)[0x7f914a3d374a] (--> /usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x91cb)[0x7f914a3d41cb] (--> /lib64/libpthread.so.0(+0x84f9)[0x7f914cf184f9] (--> /lib64/libc.so.6(clone+0x3f)[0x7f914c76afbf] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
> > [2021-01-25 22:50:09.694462] E [fuse-bridge.c:227:check_and_dump_fuse_W] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17a)[0x7f914e346faa] (--> /usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x874a)[0x7f914a3d374a] (--> /usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x91cb)[0x7f914a3d41cb] (--> /lib64/libpthread.so.0(+0x84f9)[0x7f914cf184f9] (--> /lib64/libc.so.6(clone+0x3f)[0x7f914c76afbf] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
> >
> > I have reserved the test system again. My plans today are:
> >
> > - Start over with the gluster volume on the machine with sles15sp2 updates.
> >
> > - Learn if there are modifications to the image (besides mounting/umounting filesystems within the image, using kpartx to map them) that force it to work. What if I add/remove a byte from the end of the image file, for example?
> >
> > - Revert the setup to sles15sp2 with no updates (re-making the gluster volume in the process). My theory is the updates are not making a difference and it's just random chance.
> >
> > - The 64MB shard size made me think too!!
> >
> > - If the team feels it is worth it, I could try a newer gluster. We're using the versions we've validated at scale when we have large clusters in the factory, but if the team thinks I should try something else, I'm happy to re-build it!!! We are at 7.2 plus the afr-event-gen-changes patch.
> >
> > I will keep a better eye on the fuse log to tie an error to the problem starting.
> >
> > THANKS AGAIN for responding, and let me know if you have any more clues!
> >
> > Erik
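On the 64M suspicion: with sharding, the base file on the brick holds only the first block of data, and the file's logical size is tracked in xattrs that the shard translator maintains. Next time the VM won't boot, it may be worth dumping those from a brick (path assembled from your volume info; run as root; I believe the logical size is packed at the start of the file-size value):

    # trusted.glusterfs.shard.block-size and trusted.glusterfs.shard.file-size
    # are dumped hex-encoded
    getfattr -d -m . -e hex /data/brick_adminvm/images/adminvm.img

If the file-size xattr does not decode to ~5T while the problem is happening, that would be a strong clue that the shard layer is where the size is getting lost.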
> > > On Tue, Jan 26, 2021 at 2:40 AM Erik Jacobson <[email protected]> wrote:
> > >
> > > Hello all. Thanks again for gluster. We're having a strange problem getting virtual machines started that are hosted on a gluster volume.
> > >
> > > One of the ways we use gluster now is to make an HA-ish cluster head node. A virtual machine runs in the shared storage and is backed by 3 physical servers that contribute to the gluster storage share.
> > >
> > > We're using sharding in this volume. The VM image file is around 5T, and we use qemu-img with falloc to get all the blocks allocated in advance.
> > >
> > > We are not using gfapi, largely because it would mean we have to build our own libvirt and qemu, and we'd prefer not to do that. So we're using a glusterfs fuse mount to host the image. The virtual machine is using virtio disks, but we had similar trouble using scsi emulation.
> > >
> > > The issue: all seems well at first. The VM head node installs, boots, etc.
> > >
> > > However, at some point, it stops being able to boot! grub2 acts like it cannot find /boot. At the grub2 prompt, it can see the partitions, but it reports no filesystem found where there are indeed filesystems.
> > >
> > > If we switch qemu to use "direct kernel load" (bypassing grub2), this often works around the problem, but in one case Linux gave us a clue: it reported /dev/vda as only being 64 megabytes, which would explain a lot. This means the virtual machine's Linux thought the disk supplied by the disk image was tiny! 64M instead of 5T.
> > >
> > > We are using sles15sp2, and we hit the problem more often with updates applied than without. I'm in the process of trying to isolate whether there is a sles15sp2 update causing this, or whether we're within "random chance".
> > >
> > > On one of the physical nodes in the failure mode, if I use 'kpartx' to create the partitions from the image file, then mount the giant root filesystem (i.e., mount /dev/mapper/loop0p31 /mnt) and then umount /mnt, that physical node starts the VM fine: grub2 loads and the virtual machine is fully happy! Until I try to shut it down and start it up again, at which point it sticks at grub2 again! What about mounting the image file makes it so qemu sees the whole disk?
> > >
> > > The problem doesn't always happen, but once it starts, the same VM image has trouble starting on any of the 3 physical nodes sharing the storage. Yet using the trick to force-mount the root within the image with kpartx, the machine can come up. My only guess is this changes the file just a tiny bit somewhere in the middle of the image.
> > >
> > > Once the problem starts, it keeps happening, except for temporarily working when I do the loop mount trick on the physical admin node.
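For reference, the workaround sequence described above, as I understand it (the partition number will differ per image layout):

    kpartx -av /adminvm/images/adminvm.img   # map the image's partitions to /dev/mapper/loop0pN
    mount /dev/mapper/loop0p31 /mnt          # the big root filesystem
    umount /mnt
    kpartx -dv /adminvm/images/adminvm.img   # remove the mappings again

That a plain read/mount cycle through a loop device "fixes" the image for one boot does suggest stale cached size/attribute state rather than damaged data.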
> > > Here is some info about what I have in place:
> > >
> > > nano-1:/adminvm/images # gluster volume info
> > >
> > > Volume Name: adminvm
> > > Type: Replicate
> > > Volume ID: 67de902c-8c00-4dc9-8b69-60b93b5f6104
> > > Status: Started
> > > Snapshot Count: 0
> > > Number of Bricks: 1 x 3 = 3
> > > Transport-type: tcp
> > > Bricks:
> > > Brick1: 172.23.255.151:/data/brick_adminvm
> > > Brick2: 172.23.255.152:/data/brick_adminvm
> > > Brick3: 172.23.255.153:/data/brick_adminvm
> > > Options Reconfigured:
> > > performance.client-io-threads: on
> > > nfs.disable: on
> > > storage.fips-mode-rchecksum: on
> > > transport.address-family: inet
> > > performance.quick-read: off
> > > performance.read-ahead: off
> > > performance.io-cache: off
> > > performance.low-prio-threads: 32
> > > network.remote-dio: enable
> > > cluster.eager-lock: enable
> > > cluster.quorum-type: auto
> > > cluster.server-quorum-type: server
> > > cluster.data-self-heal-algorithm: full
> > > cluster.locking-scheme: granular
> > > cluster.shd-max-threads: 8
> > > cluster.shd-wait-qlength: 10000
> > > features.shard: on
> > > user.cifs: off
> > > cluster.choose-local: off
> > > client.event-threads: 4
> > > server.event-threads: 4
> > > cluster.granular-entry-heal: enable
> > > storage.owner-uid: 439
> > > storage.owner-gid: 443
> > >
> > > libglusterfs0-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
> > > glusterfs-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
> > > python3-gluster-7.2-4723.1520.210122T1700.a.sles15sp2hpe.noarch
> > >
> > > nano-1:/adminvm/images # uname -a
> > > Linux nano-1 5.3.18-24.46-default #1 SMP Tue Jan 5 16:11:50 UTC 2021 (4ff469b) x86_64 x86_64 x86_64 GNU/Linux
> > > nano-1:/adminvm/images # rpm -qa | grep qemu-4
> > > qemu-4.2.0-9.4.x86_64
> > >
> > > Would love any advice!!!!
> > >
> > > Erik
> > >
> > > --
> > > Respectfully
> > > Mahdi
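One more data point worth grabbing when it next goes bad, since this is a replica 3 volume: whether any of the image's shards have pending heals.

    gluster volume heal adminvm info
    gluster volume heal adminvm info summary

If shards of the image show up there while grub2 is failing, that would tie the symptom to replication state, and could perhaps also explain why reading the image through the kpartx trick temporarily helps.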
--
https://kadalu.io
Container Storage made easy!

________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk

Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
