>>> I'm using DNS zone transfers in and out of OpenDNSSEC with OpenDNSSEC >>> version 1.4.6. It looks like one of the zones have become wedged, and >>> OpenDNSSEC refuses to transfer a new copy, despite a new SOA being >>> announced via DNS notify. ods-signerd logs: >>> >>> <timestamp+host> ods-signerd: [query] ignore notify from a.b.c.d: zone >>> xxx.yyy.no transfer in progress >> >> This may be a bit misleading log message: The query code checks >> whether there is already a notify acquired. If so, there is a check to >> see if the incoming notify has a serial newer than OpenDNSSEC knows >> of. If not, it will log this message. >> >> "Look, I got a notify already and need to transfer anyway" was perhaps >> a better log message. Or perhaps "updated notify serial to >> <new_serial>". > > Hmm... That doesn't match with the observed behaviour. What I saw > was that I did an update of the zone on the hidden master, but the new > zone with the updated SOA version number (for the SOA versioning > regime between the hidden master and OpenDNSSEC) was not being > transferred to the OpenDNSSEC host. This state persisted for at least > a day, until the user who requested the additions complained that they > were still not visible in the public DNS, and an investigation > confirmed this -- the distribution master which is at the exit portion > of OpenDNSSEC didn't have the newly added records. > > Hm, I may have read the code in query_process_notify() wrong, and my > initial explanation of the bug was possibly wrong. But at least I'm > pretty certain of my observed behaviour: changes from the hidden > master did not make it through OpenDNSSEC and out to the publication > master, and stopping OpenDNSSEC, removing the xfrd.state file and > restarting OpenDNSSEC fixed the logjam.
OpenDNSSEC has now decided that the zone has "expired". The Serial line from the .xfrd-state file looks like this: ;;Serial: xfr 2014091709 1414052224 notify 2014102302 1414059442 disk 2014091709 1414052224 Decoded: serial_xfr = 2014091709 serial_xfr_acquired = Thu Oct 23 10:17:04 CEST 2014 serial_notify = 2014102302 serial_notify_acquired = Thu Oct 23 12:17:22 CEST 2014 serial_disk = 2014091709 serial_disk_acquired = Thu Oct 23 10:17:04 CEST 2014 So the notify serial# received is much newer than the one on disk or from the last zone transfer. However, it looks like serial_notify_acquired is not updated; in my log I have a log message saying: Oct 28 11:39:49 ods-host ods-signerd: [query] ignore notify from x.x.x.x: zone x.y.z.in-addr.arpa transfer in progress Further, it seems that the signer is making repeated attempts at doing IXFR zone transfer of the zone. OpenDNSSEC is logging Oct 28 20:07:37 ods-host ods-enforcerd: Zone x.y.z.in-addr.arpa found. Oct 28 20:07:37 ods-host ods-enforcerd: Policy for x.y.z.in-addr.arpa set to de fault. Oct 28 20:07:37 ods-host ods-enforcerd: Config will be output to /var/opendnssec/si gnconf/x.y.z.in-addr.arpa.xml. Oct 28 20:07:37 ods-host ods-enforcerd: No change to: /var/opendnssec/signconf/46.3 9.128.in-addr.arpa.xml Oct 28 20:13:01 ods-host ods-signerd: [STATS] x.y.z.in-addr.arpa 2014102801 RR[ count=0 time=0(sec)] NSEC3[count=0 time=0(sec)] RRSIG[new=2 reused=251 time=0(se c) avg=0(sig/sec)] TOTAL[time=0(sec)] Oct 28 20:58:10 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request ud p/ixfr=3180924024 to a.b.c.d Oct 28 20:58:10 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa received t oo short udp reply from a.b.c.d, retry tcp Oct 28 20:58:10 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request tc p/ixfr=3180924024 to a.b.c.d Oct 28 20:58:40 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request ud p/ixfr=3180924024 to a.b.c.d Oct 28 20:58:40 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa received t oo short udp reply from a.b.c.d, retry tcp Oct 28 20:58:40 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request tc p/ixfr=3180924024 to a.b.c.d Oct 28 20:59:10 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request ud p/ixfr=3180924024 to a.b.c.d Oct 28 20:59:10 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa received t oo short udp reply from a.b.c.d, retry tcp Oct 28 20:59:10 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request tc p/ixfr=3180924024 to a.b.c.d However, I do *not* see a log message fitting with this line in the code for this particular zone: ods_log_info("[%s] zone %s transfer done [notify acquired %u, serial on " "disk %u, notify serial %u]", xfrd_str, zone->name, xfrd->serial_notify_acquired, xfrd->serial_disk, xfrd->serial_notify); On the BIND side (which is the hidden master), I see these lines corresponding to this: Oct 28 20:58:10 hidden-master named[4795]: client a.b.e.f#61709/key tsig-key-name (x.y.z.in-addr.arpa): transfer of 'x.y.z.in-addr.arpa/IN': IXFR ended Oct 28 20:58:40 hidden-master named[4795]: client a.b.e.f#61707/key tsig-key-name (x.y.z.in-addr.arpa): transfer of 'x.y.z.in-addr.arpa/IN': IXFR ended Oct 28 20:59:10 hidden-master named[4795]: client a.b.e.f#61706/key tsig-key-name (x.y.z.in-addr.arpa): transfer of 'x.y.z.in-addr.arpa/IN': IXFR ended So... Can it be that there's an interoperability problem between OpenDNSSEC and BIND when it comes to doing IXFR? When zone transfers work, I see Oct 28 11:39:50 ods-host ods-signerd: [xfrd] zone ok.x.y.in-addr.arpa request udp/ixfr=2014102701 to a.b.c.d Oct 28 11:39:50 ods-host ods-signerd: [xfrd] zone ok.x.y.in-addr.arpa received too short udp reply from a.b.c.d, retry tcp Oct 28 11:39:50 ods-host ods-signerd: [xfrd] zone ok.x.y.in-addr.arpa request tcp/ixfr=2014102701 to a.b.c.d Oct 28 11:39:50 ods-host ods-signerd: [xfrd] zone ok.x.y.in-addr.arpa transfer done [notify acquired 1414492790, serial on disk 2014102804, notify serial 2014102804] Hmm.... Upping the logging level for the signer should reveal some more details. OK, here we go: Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa timeout Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa make request [udp round 0 master a.b.c.d:0] Oct 30 13:53:08 ods-host ods-signerd: [domain] tsig sign query with key: tsig-key-name. Oct 30 13:53:08 ods-host ods-signerd: [domain] tsig sign query with algorithm: hmac-sha256. Oct 30 13:53:08 ods-host ods-signerd: [xfrd] tsig append rr to request id=53126 Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request udp/ixfr=3180924024 to a.b.c.d Oct 30 13:53:08 ods-host ods-signerd: [xfrd] send 215 bytes over udp to a.b.c.d Oct 30 13:53:08 ods-host ods-signerd: [xfrhandler] netio dispatch Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event udp read Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sets timer timeout now Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa read data from udp Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa received too short udp reply from a.b.c.d, retry tcp Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa xfr packet parsed (res 3) Oct 30 13:53:08 ods-host ods-signerd: [xfrd] truncation from a.b.c.d Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa open tcp connection to a.b.c.d Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request tcp/ixfr=3180924024 to a.b.c.d Oct 30 13:53:08 ods-host ods-signerd: [domain] tsig sign query with key: tsig-key-name. Oct 30 13:53:08 ods-host ods-signerd: [domain] tsig sign query with algorithm: hmac-sha256. Oct 30 13:53:08 ods-host ods-signerd: [xfrd] tsig append rr to request id=40799 Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sending tcp query id=40799 Oct 30 13:53:08 ods-host ods-signerd: [xfrhandler] netio dispatch Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp write Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa done writing, get ready for reading Oct 30 13:53:08 ods-host ods-signerd: [xfrhandler] netio dispatch Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp read Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa xfr packet parsed (res 1) Oct 30 13:53:08 ods-host ods-signerd: [file] open file file=x.y.z.in-addr.arpa.xfrd mode=appending Oct 30 13:53:08 ods-host ods-signerd: [file] openfile x.y.z.in-addr.arpa.xfrd count 1 Oct 30 13:53:08 ods-host ods-signerd: [xfrhandler] netio dispatch Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp read Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sets timer timeout now Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa release tcp connection to a.b.c.d Oct 30 13:53:38 ods-host ods-signerd: [xfrhandler] netio dispatch Oct 30 13:53:38 ods-host ods-signerd: [netio] dispatch timeout event without checking for other events Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa timeout Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa make request [udp round 1 master a.b.c.d:0] Oct 30 13:53:38 ods-host ods-signerd: [domain] tsig sign query with key: tsig-key-name. Oct 30 13:53:38 ods-host ods-signerd: [domain] tsig sign query with algorithm: hmac-sha256. Oct 30 13:53:38 ods-host ods-signerd: [xfrd] tsig append rr to request id=28997 Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request udp/ixfr=3180924024 to a.b.c.d Oct 30 13:53:38 ods-host ods-signerd: [xfrd] send 215 bytes over udp to a.b.c.d Oct 30 13:53:38 ods-host ods-signerd: [xfrhandler] netio dispatch Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event udp read Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sets timer timeout now Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa read data from udp Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa received too short udp reply from a.b.c.d, retry tcp Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa xfr packet parsed (res 3) Oct 30 13:53:38 ods-host ods-signerd: [xfrd] truncation from a.b.c.d Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa open tcp connection to a.b.c.d Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request tcp/ixfr=3180924024 to a.b.c.d Oct 30 13:53:38 ods-host ods-signerd: [domain] tsig sign query with key: tsig-key-name. Oct 30 13:53:38 ods-host ods-signerd: [domain] tsig sign query with algorithm: hmac-sha256. Oct 30 13:53:38 ods-host ods-signerd: [xfrd] tsig append rr to request id=50256 Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sending tcp query id=50256 Oct 30 13:53:38 ods-host ods-signerd: [xfrhandler] netio dispatch Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp write Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa done writing, get ready for reading Oct 30 13:53:38 ods-host ods-signerd: [xfrhandler] netio dispatch Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp read Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa xfr packet parsed (res 1) Oct 30 13:53:38 ods-host ods-signerd: [file] open file file=x.y.z.in-addr.arpa.xfrd mode=appending Oct 30 13:53:38 ods-host ods-signerd: [file] openfile x.y.z.in-addr.arpa.xfrd count 1 Oct 30 13:53:38 ods-host ods-signerd: [xfrhandler] netio dispatch Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp read Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sets timer timeout now Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa release tcp connection to a.b.c.d Oct 30 13:54:08 ods-host ods-signerd: [xfrhandler] netio dispatch Oct 30 13:54:08 ods-host ods-signerd: [netio] dispatch timeout event without checking for other events Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa timeout Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa make request [udp round 2 master a.b.c.d:0] Oct 30 13:54:08 ods-host ods-signerd: [domain] tsig sign query with key: tsig-key-name. Oct 30 13:54:08 ods-host ods-signerd: [domain] tsig sign query with algorithm: hmac-sha256. Oct 30 13:54:08 ods-host ods-signerd: [xfrd] tsig append rr to request id=40835 Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request udp/ixfr=3180924024 to a.b.c.d Oct 30 13:54:08 ods-host ods-signerd: [xfrd] send 215 bytes over udp to a.b.c.d Oct 30 13:54:08 ods-host ods-signerd: [xfrhandler] netio dispatch Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event udp read Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sets timer timeout now Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa read data from udp Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa received too short udp reply from a.b.c.d, retry tcp Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa xfr packet parsed (res 3) Oct 30 13:54:08 ods-host ods-signerd: [xfrd] truncation from a.b.c.d Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa open tcp connection to a.b.c.d Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request tcp/ixfr=3180924024 to a.b.c.d Oct 30 13:54:08 ods-host ods-signerd: [domain] tsig sign query with key: tsig-key-name. Oct 30 13:54:08 ods-host ods-signerd: [domain] tsig sign query with algorithm: hmac-sha256. Oct 30 13:54:08 ods-host ods-signerd: [xfrd] tsig append rr to request id=51054 Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sending tcp query id=51054 Oct 30 13:54:08 ods-host ods-signerd: [xfrhandler] netio dispatch Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp write Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa done writing, get ready for reading Oct 30 13:54:08 ods-host ods-signerd: [xfrhandler] netio dispatch Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp read Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa xfr packet parsed (res 1) Oct 30 13:54:08 ods-host ods-signerd: [file] open file file=x.y.z.in-addr.arpa.xfrd mode=appending Oct 30 13:54:08 ods-host ods-signerd: [file] openfile x.y.z.in-addr.arpa.xfrd count 1 Oct 30 13:54:08 ods-host ods-signerd: [xfrhandler] netio dispatch Oct 30 13:54:39 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp read Oct 30 13:54:39 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sets timer timeout now Oct 30 13:54:39 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa release tcp connection to a.b.c.d Oct 30 13:54:39 ods-host ods-signerd: [xfrhandler] netio dispatch Oct 30 13:54:39 ods-host ods-signerd: [netio] dispatch timeout event without checking for other events Oct 30 13:54:39 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa timeout Oct 30 13:54:39 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sets timer timeout retry 3600 Oct 30 13:54:39 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa make request wait retry Oct 30 13:54:39 ods-host ods-signerd: [xfrhandler] netio dispatch If I manually try to do the IXFR with dig from the OpenDNSSEC host, it gives me the entire zone, and ends with SOA and TSIG, and the dig output ends with % dig @a.b.c.d -y hmac-sha256:<keyname>:<keymaterial> x.y.z.in-addr.arpa. ixfr=2014091709 ... ;; Query time: 5 msec ;; SERVER: a.b.c.d#53(a.b.c.d) ;; WHEN: Thu Oct 30 14:08:13 2014 ;; XFR size: 130 records (messages 1, bytes 3762) % So ... my current best bet is that the problem is a lack of interoperation with BIND when it in essence sends the whole zone in response to an IXFR query. Hmm, looking at tcpdump -v output of a failed transfer attempt by OpenDNSSEC and a successful one with dig reveals some differences; I'll send you the details separately, but the start looks ... "mis-encoded": ods-host.uninett.no.56056 > hidden-master.uninett.no.domain: Flags [P.], cksum 0x9bcc (correct), seq 3:218, ack 1, win 4197, options [nop,nop,TS val 1 ecr 2], length 2150 [b2&3=0x1] [1a] [0q] [1n] [564au] ^Bbb^Caaa^Gin-addr^Darpa^@^@M... vs. ods-host.uninett.no.55751 > hidden-master.uninett.no.domain: Flags [P.], cksum 0x858c (correct), seq 1:171, ack 1, win 4197, options [nop,nop,TS val 1 ecr 1], length 17062286 [1n] [2au] IXFR? cc.bb.aaa.in-addr.arpa. ns: cc.bb.aaa.in-addr.arpa. SOA . . 2014091709 0 0 0 0 ar: . .... and a failed OpenDNSSEC transaction transpires like this: 14:50:45.574240 IP hidden-master.uninett.no.domain > ods-host.uninett.no.56056: Flags [P.], seq 1:186, ack 218, win 4197, options [nop,nop,TS val 2 ecr 1], length 18550176*- 1/0/1 SOA (183) 14:50:45.773002 IP ods-host.uninett.no.56056 > hidden-master.uninett.no.domain: Flags [.], ack 186, win 4197, options [nop,nop,TS val 2 ecr 2], length 0 14:51:15.648025 IP hidden-master.uninett.no.domain > ods-host.uninett.no.56056: Flags [F.], seq 186, ack 218, win 4197, options [nop,nop,TS val 62 ecr 2], length 0 i.e. the BIND instance closes the tcp session after 30 seconds of inactivity. So... Failure to do correct IXFR over TCP, perhaps? Regards, - HÃ¥vard _______________________________________________ Opendnssec-user mailing list Opendnssec-user@lists.opendnssec.org https://lists.opendnssec.org/mailman/listinfo/opendnssec-user