The question: is this something I need to investigate further, or am I being paranoid? It seems bad to me.
============================================================
I have a fairly new cluster built using ceph-deploy 1.5.34-0, ceph 10.2.2-0, and CentOS 7.2.1511. I recently noticed alarming dmesg log entries on every one of my OSD nodes, for each OSD on each node, on some kind of periodic basis:

attempt to access beyond end of device
sda1: rw=0, want=11721043088, limit=11721043087

For instance, one node had entries at these times:

Sep 27 05:40:34
Sep 27 07:10:32
Sep 27 08:10:30
Sep 27 09:40:28
Sep 27 12:40:24
Sep 27 15:40:19

In every case, the "want" is 1 sector greater than the "limit"... My first thought was "could this be an off-by-one bug somewhere in Ceph?" But after thinking about the way this stuff works and looking at the data below, that seems unlikely.

Digging around, I found and followed this Red Hat article:
https://access.redhat.com/solutions/21135

----------------------------------------------
Error Message Device Size:
11721043087 * 512 = 6001174060544

Current Device Size:
cat /proc/partitions | grep sda1
   8        1 5860521543 sda1
5860521543 * 1024 = 6001174060032

Filesystem Size:
sudo xfs_info /dev/sda1 | grep data | grep blocks
data     =                       bsize=4096   blocks=1465130385, imaxpct=5
1465130385 * 4096 = 6001174056960
----------------------------------------------

(EMDS != CDS) == true
Red Hat says device naming may have changed. All but 2 disks in the node are identical. Those 2 disks are md raided and are not exhibiting the issue, so I don't think this is the issue.

(FSS > CDS) == false
My filesystem is not larger than the device size or the error message device size.

Thanks,
Brady
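P.S. In case anyone wants to run the same comparison, this is roughly the command sequence behind the numbers above; it is only a sketch, and /dev/sda1 is just the device from my nodes, so substitute your own OSD data partition:

----------------------------------------------
# Pull the kernel warning; the "limit" value is a count of 512-byte sectors
dmesg | grep -A1 'attempt to access beyond end of device'

# Current Device Size: /proc/partitions reports 1024-byte blocks;
# sysfs reports the raw 512-byte sector count for the partition
cat /proc/partitions | grep sda1
cat /sys/block/sda/sda1/size

# Filesystem Size: XFS reports 4096-byte blocks
sudo xfs_info /dev/sda1 | grep data | grep blocks

# Convert everything to bytes for an apples-to-apples comparison
echo $((11721043087 * 512))    # Error Message Device Size = 6001174060544
echo $((5860521543 * 1024))    # Current Device Size       = 6001174060032
echo $((1465130385 * 4096))    # Filesystem Size           = 6001174056960
----------------------------------------------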