Re: [DRBD-user] sync doesnt work
Hi Felix,

Yes, none of the volumes synced, even though /etc/init.d/drbd status reported

  Primary/Secondary UpToDate/UpToDate C

I have just finished manually copying all devices onto xen002.

About the "meta-disk /dev/vgmain/lv_drbd-meta[0];" that you said you are not familiar with, have a look here:

http://www.drbd.org/users-guide-8.3/ch-internals.html#s-meta-data-size

My total disk size is 2 TB, which with the DRBD formula gives:

  Mmb < (Cmb/32768) + 1
  Mmb < (1024*1024*2/32768) + 1
  Mmb < 65

and here:

http://www.drbd.org/users-guide/re-drbdconf.html#idp10760864

  meta-disk internal, meta-disk device, meta-disk device [index]

  When an index is specified, each index number refers to a fixed slot
  of meta-data of 128 MB, which allows a maximum data size of 4 GB. This
  way, multiple DRBD devices can share the same meta-data device. For
  example, if /dev/sde6[0] and /dev/sde6[1] are used, /dev/sde6 must be
  at least 256 MB big. Because of the hard size limit, use of meta-disk
  indexes is discouraged.

My meta LV is 10 GB (lvcreate -L 10GB -n lv_drbd-meta vgmain), which I think is big enough for a maximum disk usage of 2 TB.

Thanks,
Walter

From: Walter Robert Ditzler [mailto:ditwal...@gmail.com]
Sent: Wednesday, 27 June 2012 10:31
To: drbd-user@lists.linbit.com
Subject: sync doesnt work
Importance: High

Hi all,

I have a problem syncing 2 hosts. In fact they do not sync at all, even though the status says that everything is fine!

For maintenance reasons I had to move my Xen guests from host xen001 to host xen002. After stopping xen001 and starting xen002 I realized that the replicated disk was 2 months old. After stopping DRBD and running

  dd bs=4M if=/dev/vgmain/lv_server01 | ssh -p root@10.255.255.2 'dd bs=4M of=/dev/vgmain/lv_server01'

I had the latest copy on xen002 again :-(

Any clue about that?

Thanks a lot,
Walter

(ping works between hosts)
***
root@srv-ldeb-xen001:~# ping 10.255.255.2
PING 10.255.255.2 (10.255.255.2) 56(84) bytes of data.
64 bytes from 10.255.255.2: icmp_req=1 ttl=64 time=0.166 ms

root@srv-ldeb-xen002:~# ping 10.255.255.1
PING 10.255.255.1 (10.255.255.1) 56(84) bytes of data.
64 bytes from 10.255.255.1: icmp_req=1 ttl=64 time=0.169 ms
***

(drbd status)
***
root@srv-ldeb-xen001:~# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.11 (api:88/proto:86-96)
srcversion: 0D2B62DEDB020A425130935
m:res         cs         ro                 ds                 p  mounted  fstype
0:server01    Connected  Primary/Secondary  UpToDate/UpToDate  C
1:server02    Connected  Primary/Secondary  UpToDate/UpToDate  C
2:server03    Connected  Primary/Secondary  UpToDate/UpToDate  C
3:server04    Connected  Primary/Secondary  UpToDate/UpToDate  C
4:server05_1  Connected  Primary/Secondary  UpToDate/UpToDate  C
5:server05_2  Connected  Primary/Secondary  UpToDate/UpToDate  C
6:server06    Connected  Primary/Secondary  UpToDate/UpToDate  C
root@srv-ldeb-xen001:~#
***

(lvm and drbd install script)
***
lvcreate -L 10GB -n lv_drbd-meta vgmain
lvcreate -L 100GB -n lv_server01 vgmain
lvcreate -L 50GB -n lv_server02 vgmain
lvcreate -L 100GB -n lv_server03 vgmain
lvcreate -L 100GB -n lv_server04 vgmain
lvcreate -L 50GB -n lv_server05_1 vgmain
lvcreate -L 1.15TB -n lv_server05_2 vgmain
lvcreate -L 50GB -n lv_server06 vgmain

drbdadm -f create-md server01
drbdadm -f create-md server02
drbdadm -f create-md server03
drbdadm -f create-md server04
drbdadm -f create-md server05_1
drbdadm -f create-md server05_2
drbdadm -f create-md server06

/etc/init.d/drbd start

drbdadm up server01
drbdadm up server02
drbdadm up server03
drbdadm up server04
drbdadm up server05_1
drbdadm up server06

(only on xen001 host)
drbdsetup /dev/drbd0 primary -o
drbdsetup /dev/drbd1 primary -o
drbdsetup /dev/drbd2 primary -o
drbdsetup /dev/drbd3 primary -o
drbdsetup /dev/drbd4 primary -o
drbdsetup /dev/drbd5 primary -o
drbdsetup /dev/drbd6 primary -o
***

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
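The sizing rule Walter quotes from the DRBD 8.3 guide is easy to evaluate by hand, but a small script makes the numbers explicit. The following is only a sketch and not part of the original thread: the function names are made up for illustration, sizes are in MB as in the quoted formula, and the count of seven slots assumes one indexed slot per resource shown in the status output.

```python
import math


def external_meta_mb(data_mb: float) -> int:
    """Upper bound on external meta-data size in MB.

    Uses the estimate quoted in the message: Mmb < Cmb / 32768 + 1,
    rounded up to a whole MB.
    """
    return math.ceil(data_mb / 32768 + 1)


def indexed_meta_device_mb(slots: int, slot_mb: int = 128) -> int:
    """Minimum size of a shared meta-data device using fixed 128 MB slots."""
    return slots * slot_mb


# 2 TB of replicated data, expressed in MB.
print(external_meta_mb(2 * 1024 * 1024))  # 65

# Assumption: seven resources, one indexed slot each.
print(indexed_meta_device_mb(7))  # 896
```

Under these assumptions, a 10 GB meta-data LV is far larger than either figure, so the LV size alone is unlikely to explain the missing replication.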
Re: [DRBD-user] sync doesnt work
Hi,

On 06/28/2012 01:48 PM, Walter Robert Ditzler wrote:
> yes all volumes didn't sync even when the /etc/init.d/drbd status said
> - Primary/Secondary UpToDate/UpToDate C
> I just finished to manually copy all devices onto the xen002.

yes, but are they live replicating now that you have completed this task?

You can check by snapshotting the backing device on the secondary, if you can survive the performance hit for a few minutes. Just mount the snapshot and examine the data.

> When an /index/ is specified, each index number refers to a fixed slot of
> meta-data of 128 MB, which allows a maximum data size of 4 GB.

Interesting, I didn't know that. Is that a typo in the documentation? Because 128MB of metadata for 4GB of data cannot be right. That should probably be 4TB there.

If the 4G limit *was* correct, it would explain some things, seeing as your volumes are each way above 4GB, but again - that doesn't make a lick of sense.

Cheers,
Felix
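Felix's doubt about the "4 GB" figure can be sanity-checked against the sizing formula from the DRBD guide, Mmb < Cmb / 32768 + 1. The sketch below is an illustrative estimate only, not a statement from the documentation: it reads the 1/32768 ratio as one byte of meta-data per 32768 bytes of data and ignores the small fixed overhead behind the "+ 1" term. The function name is hypothetical.

```python
# Ratio taken from the formula Mmb < Cmb / 32768 + 1.
DATA_BYTES_PER_META_BYTE = 32768


def max_data_mb(slot_mb: int) -> int:
    """Approximate data size in MB that a meta-data slot can describe.

    Ignores the fixed overhead, so the real limit is slightly lower.
    """
    return slot_mb * DATA_BYTES_PER_META_BYTE


covered_mb = max_data_mb(128)
print(covered_mb / (1024 * 1024))  # 4.0, i.e. about 4 TB rather than 4 GB
```

By this estimate a 128 MB slot covers on the order of 4 TB, which supports reading the "4 GB" in the man page as a typo.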
Re: [DRBD-user] sync doesnt work
Felix,

For the manual replication of my devices I stopped DRBD, of course. My old host, xen001, is going to be formatted and reinstalled. Does that mean I should rather set up "internal" for the meta-data?

Anyway, the strange thing is that the data hasn't synced at all. After copying from xen001 onto xen002 with dd over ssh, I finally have the previous state of my domUs.

I am only a bit worried about what will happen when I bring up my new xen001 again. What did I do wrong in the config files, or better, what possibilities do I have to track or test the syncing of DRBD devices?

Thanks a lot,
Walter

-----Original Message-----
From: Felix Frank [mailto:f...@mpexnet.de]
Sent: Thursday, 28 June 2012 13:59
To: Walter Robert Ditzler
Cc: drbd-user
Subject: Re: sync doesnt work

[...]
Re: [DRBD-user] sync doesnt work
On Thu, Jun 28, 2012 at 01:58:57PM +0200, Felix Frank wrote:
> Hi,
>
> On 06/28/2012 01:48 PM, Walter Robert Ditzler wrote:
> > yes all volumes didn't sync even when the /etc/init.d/drbd status said
> > - Primary/Secondary UpToDate/UpToDate C
> > I just finished to manually copy all devices onto the xen002.
>
> yes, but are they live replicating now that you have completed this task?
>
> You can check by snapshotting the backing device on the secondary, if
> you can survive the performance hit for a few minutes. Just mount the
> snapshot and examine the data.
>
> > When an /index/ is specified, each index number refers to a fixed slot of
> > meta-data of 128 MB, which allows a maximum data size of 4 GB.

4 TiB minus a few sectors, actually.

The only explanation would be that you had been "Diskless" on one of the systems for an extended period of time, or that you had been disconnected for whatever reason, or something fiddled with DRBD meta data.

Or that you are bypassing DRBD.

I've seen this several times: people configuring their VMs to run on the LVs, then telling DRBD to replicate these LVs.

  [VM]           [DRBD]--- replicates nothing to --- [DRBD peer]
   |               | sits on
   `-- writes to [LV]

Because no-one is writing to DRBD, DRBD cannot replicate anything.
So don't do that.

DRBD logs and complete configuration (including the VM configuration) may help to understand what was going on in your setup.

Lars
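The "bypassing DRBD" case Lars describes comes down to which block device the guest's disk entry names. As a rough illustration, the sketch below classifies Xen disk specification strings. Everything in it is an assumption for illustration: Walter's VM configuration is not shown in the thread, the example strings are hypothetical, and the function name is made up.

```python
def goes_through_drbd(disk_spec: str) -> bool:
    """Return True if a Xen disk spec points at DRBD rather than the raw LV.

    A spec looks like '<type>:<target>,<guest device>,<mode>'. Both a
    /dev/drbd* path and a 'drbd:<resource>' target count as going
    through DRBD.
    """
    target = disk_spec.split(",")[0]
    kind, _, path = target.partition(":")
    return kind == "drbd" or path.startswith("/dev/drbd")


# Hypothetical examples, not taken from the thread:
print(goes_through_drbd("phy:/dev/vgmain/lv_server01,xvda,w"))  # False
print(goes_through_drbd("phy:/dev/drbd0,xvda,w"))               # True
print(goes_through_drbd("drbd:server01,xvda,w"))                # True
```

A guest configured like the first example writes straight to the LV underneath DRBD, which is the situation shown in the diagram above.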
Re: [DRBD-user] sync doesnt work
On 06/28/2012 02:24 PM, Walter Robert Ditzler wrote:
> My old
> host, xen001 is going to be formated and reinstalled. does that mean i
> should rather setup "internal" for the meta-data?

I never tried external, but it really must work one way or the other. As far as I know, there can be performance benefits with external meta-data, but not if your meta-data disk is in the same LVM VG.

You may want to forgo the officially discouraged meta-data indices, though. After all, with LVM there is nothing stopping you from creating a meta-data disk per volume.

> i am only a bit afraid when i bring up again my new xen001, what will be in
> the future, what did i wrong in the config files or better, what
> possibilities do i have to track or test the sync's of drbd devices?

Right, this should really not happen under any circumstances. As described earlier, you can do simple tests by mounting snapshots of your filesystem on the secondary, or so I believe. Careful though, this *will* affect write performance on your primary.

Apart from that, I'm not really sure what happened here. It reminds me of a case a while back when someone had a diskless secondary for a long while, and then ended up with horribly outdated filesystems, of course.

Sorry for being of little help,
Felix
Re: [DRBD-user] sync doesnt work
Lars,

when I run:

root@srv-ldeb-xen001:~# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.11 (api:88/proto:86-96)
srcversion: 0D2B62DEDB020A425130935
m:res         cs         ro                 ds                 p  mounted  fstype
0:server01    Connected  "                                     C
1:server02    Connected  Primary/Secondary  UpToDate/UpToDate  C
2:server03    Connected  Primary/Secondary  UpToDate/UpToDate  C
3:server04    Connected  Primary/Secondary  UpToDate/UpToDate  C
4:server05_1  Connected  Primary/Secondary  UpToDate/UpToDate  C
5:server05_2  Connected  Primary/Secondary  UpToDate/UpToDate  C
6:server06    Connected  Primary/Secondary  UpToDate/UpToDate  C
root@srv-ldeb-xen001:~#
***

Can this mean that, even though I got "Primary/Secondary UpToDate/UpToDate C", one host (in my case xen002) was in a "Diskless" state?

I check those servers once or twice a week and have never seen anything other than "Primary/Secondary UpToDate/UpToDate C".

By the way: on my 2 TB disk, which is better, internal or external meta-data?

And in my config I didn't replicate the meta-data with DRBD; each host had its own meta-data on an LVM volume.

Thanks a lot,
Walter

-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Lars Ellenberg
Sent: Thursday, 28 June 2012 14:35
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] sync doesnt work

[...]
Re: [DRBD-user] sync doesnt work
On Thu, Jun 28, 2012 at 03:04:30PM +0200, Walter Robert Ditzler wrote:
> lars,
>
> when i do a:
>
> root@srv-ldeb-xen001:~# /etc/init.d/drbd status
> drbd driver loaded OK; device status:
> version: 8.3.11 (api:88/proto:86-96)
> srcversion: 0D2B62DEDB020A425130935
> m:res         cs         ro                 ds                 p  mounted  fstype
> 0:server01    Connected  "                                     C
> 1:server02    Connected  Primary/Secondary  UpToDate/UpToDate  C
> 2:server03    Connected  Primary/Secondary  UpToDate/UpToDate  C
> 3:server04    Connected  Primary/Secondary  UpToDate/UpToDate  C
> 4:server05_1  Connected  Primary/Secondary  UpToDate/UpToDate  C
> 5:server05_2  Connected  Primary/Secondary  UpToDate/UpToDate  C
> 6:server06    Connected  Primary/Secondary  UpToDate/UpToDate  C
> root@srv-ldeb-xen001:~#
> ***
>
> can this mean, even when i got "Primary/Secondary UpToDate/UpToDate C"
> that on one host, in my case xen002, i had a "Diskless" state?

It does not tell *anything* about what may or may not have occurred earlier. For that you need logs.

> i check those servers weekly once or twice, never ever had diffrent than
> "Primary/Secondary UpToDate/UpToDate C".

Then, maybe, you are in fact bypassing DRBD?
Do you have any counter increase in /proc/drbd at all?
"dw:", "ns:" and so on?

> by the way: on my 2TB disk what is better, internal meta-data or external?

"yes".

> and in my config i didnt drbd the meta data, each host had its own meta data
> onto a lvm.
>
> thanks a lot,
>
> walter

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
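Lars's counter check can be done by eye, but comparing two samples of /proc/drbd taken some time apart makes it harder to misread. The following is a minimal sketch under stated assumptions: it expects the text of a single device stanza, the sample strings are made up for illustration rather than taken from Walter's hosts, and the function names are hypothetical. In DRBD 8.3, "ns" and "nr" count data sent to and received from the peer, while "dw" and "dr" count local disk writes and reads.

```python
import re

# Matches the four traffic counters in a /proc/drbd device stanza.
COUNTER_RE = re.compile(r"\b(ns|nr|dw|dr):(\d+)")


def parse_counters(stanza: str) -> dict:
    """Extract the ns/nr/dw/dr counters from one device stanza."""
    return {name: int(value) for name, value in COUNTER_RE.findall(stanza)}


def counters_increased(before: str, after: str, names=("ns", "dw")) -> bool:
    """True if any of the named counters grew between two samples."""
    old, new = parse_counters(before), parse_counters(after)
    return any(new[name] > old[name] for name in names)


# Illustrative samples only:
idle = "ns:0 nr:0 dw:0 dr:1024 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0"
busy = "ns:4096 nr:0 dw:4096 dr:1024 al:3 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0"

print(counters_increased(idle, idle))  # False
print(counters_increased(idle, busy))  # True
```

If "ns" and "dw" stay flat on the primary while the guests are clearly writing data, the writes are probably not going through the DRBD device.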
Re: [DRBD-user] blocking I/O with drbd
Hi,

> Once the host is live again, i will report if that did the trick :-)

As promised, here comes the follow-up.

Unfortunately, 8.3.12 does not do the trick. The described behaviour, with the load rising after using dd, is still present.

But I had the chance to test the I/O performance while the whole environment (9 servers with the DRBD device mounted via NFS) was under very little use. Running the dd's at 5 am showed almost no problems with I/O performance. I was even able to write 400 MB to the DRBD device without any io-wait problems.

Running the same dd at 9 am made the load go up to around 15.

Since the behaviour is the same with 8.3.8-1 and 8.3.12, I conclude that this is most likely not a DRBD bug. Having no problems under low usage but problems under heavier usage shows that the underlying I/O subsystem is not able to handle the number of I/O requests generated by the whole environment.

I'm not sure where to go from here. If we find a solution, I'll let you know... :-)

- volker
Re: [DRBD-user] blocking I/O with drbd
On Thu, Jan 05, 2012 at 12:06:14PM +0100, Volker wrote:
> Hi,
>
> > Once the host is live again, i will report if that did the trick :-)
>
> As promised, here comes the follow-up.
>
> Unfortunately 8.3.12 does not do the trick. The described behaviour with
> the load rising after using dd is still present.
>
> But i had the chance to test the I/O-Performance while the whole
> environment (9 Servers having the drbd-device mounted via nfs) was under
> very little use. Doing the dd's at 5am in the morning showed almost no
> problems with I/O-Performance. I was even able to write 400MB to the
> drbd-device without any problems regarding io-wait.
>
> Doing the same dd at 9am made the load go up to around 15.
>
> I can conclude, that since the behaviour is the same wth 8.3.8-1 and
> 8.3.12, this is most likely not a drbd-bug. Having no problems under low
> usage in contrast to having problems under heavier usage shows, that the
> problem is the underlying I/O-Subsystem not being able handle the amount
> of I/O-Requests generated by the whole environment.
>
> Im not sure where to go from here. If we find a solution, i'll let you
> know... :-)

On the server, use io-scheduler: deadline
you may need to increase the number of nfsd threads.

There are a few other sysfs and sysctl knobs to tune, both server and client side, to help even out write bursts and reduce latency.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
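Before changing the I/O scheduler as Lars suggests, it helps to see which one is active. On Linux the file /sys/block/<device>/queue/scheduler lists the available schedulers, with the active one in square brackets. The sketch below only parses such a line; the sample strings are illustrative, the function names are made up, and which device backs the DRBD volume on Volker's server is not stated in the thread.

```python
import re


def active_scheduler(sysfs_line: str) -> str:
    """Return the scheduler shown in brackets in a queue/scheduler line."""
    match = re.search(r"\[([^\]]+)\]", sysfs_line)
    if match is None:
        raise ValueError("no active scheduler marked in: %r" % sysfs_line)
    return match.group(1)


def needs_change(sysfs_line: str, wanted: str = "deadline") -> bool:
    """True if the active scheduler differs from the wanted one."""
    return active_scheduler(sysfs_line) != wanted


# Illustrative contents of a queue/scheduler file:
print(active_scheduler("noop anticipatory deadline [cfq]"))  # cfq
print(needs_change("noop anticipatory deadline [cfq]"))      # True
print(needs_change("noop anticipatory [deadline] cfq"))      # False
```

Switching is usually done by writing the scheduler name to the same sysfs file as root. That step is left out here because it changes a live system.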
[DRBD-user] A problem about oracle on drbd
Hi,

It may not be appropriate to post this problem here, but I'm just looking for some clues.

In our testing environment, we have 2 boxes attached to different storage arrays (IBM DS 3000 series). We use LVM to manage the LUNs, build DRBD devices (protocol A) on top of the logical volumes, and create an Oracle instance on the primary.

  Primary:   DS 3000 -> LVM -> DRBD (protocol A) -> Oracle
  Secondary: a different DS 3000 -> LVM -> DRBD (protocol A)

In one of our test cases, while an application is writing to Oracle on the primary node, we reboot that node and try to recover the Oracle database on the (previous) secondary node. However, Oracle was unable to start. It complains:

  ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [162], [678757], [683523], [], [], [], [], [], [], []

Has anyone encountered this problem?

Regards!
Re: [DRBD-user] A problem about oracle on drbd
Some more information: Oracle on the (previous) primary is able to start after recovery.

On Fri, Jun 29, 2012 at 11:16 AM, Lyre <417...@gmail.com> wrote:
> [...]