Closing the loop on this: a bit over a week in, the boxes on 2.0.7 consistently reproduce the issue (`configure` has to be issued twice before the export policy changes are actually picked up), whereas the test boxes on 2.0.10 are still consistently behaving properly (a single `configure` updates the exports).
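
For anyone who wants to spot this state on their own routers, the check used further down the thread (comparing the ad hoc `show route export <protocol>` view against `show route export table <protocol>.<channel>`) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `birdc` is assumed to be on the PATH with access to the control socket, `export table on` is assumed to be set on the channel, and route lines are assumed to start with the prefix in the first column, as in the output quoted below.

```python
import ipaddress
import subprocess


def parse_prefixes(birdc_output: str) -> set[str]:
    """Collect the prefixes that start a route line in `show route` output."""
    prefixes = set()
    for line in birdc_output.splitlines():
        # Skip blank lines and indented continuation lines ("via ... on ...").
        if not line or line[0].isspace():
            continue
        token = line.split()[0]
        if "/" not in token:
            continue  # banner, "Table ...:" header, prompt, etc.
        try:
            network = ipaddress.ip_network(token, strict=False)
        except ValueError:
            continue  # first column is not a prefix
        prefixes.add(str(network))
    return prefixes


def stale_exports(view_output: str, table_output: str) -> set[str]:
    """Prefixes present in the export table but absent from the ad hoc view."""
    return parse_prefixes(table_output) - parse_prefixes(view_output)


def birdc(*args: str) -> str:
    """Run one birdc command and return its stdout (assumes birdc is on PATH)."""
    result = subprocess.run(
        ["birdc", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def check_protocol(protocol: str, channel: str = "ipv4") -> set[str]:
    """Hypothetical helper: compare both views for one protocol and channel."""
    view = birdc("show", "route", "export", protocol)
    table = birdc("show", "route", "export", "table", f"{protocol}.{channel}")
    return stale_exports(view, table)
```

A non-empty result from `check_protocol("gw_085ea85_euwest2")` would match the symptom in this thread (filters say "export nothing", export table still holds a route). It is only a hint, not proof of the bug.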
On Wed, Mar 8, 2023 at 2:20 PM Hugo Slabbert <hugo.slabb...@menlosecurity.com> wrote:
> *nods*
>
> We generally run distro packages unless we have specific requirements for
> newer features or fixes etc.
>
> Current experiments show that the test routers on 2.0.7 started to
> reproduce this issue after about 3 days since BIRD's process start, whereas
> the test boxes we have on 2.0.10 have not yet reproduced the issue after
> about 3 days and 5 days, respectively.
>
> It's always tough to prove a negative, but we'll keep it running through
> the weekend to validate and look at bumping to 2.0.10 across the fleet if
> we're still clear on the 2.0.10 boxes at that point.
>
> On Wed, Mar 8, 2023 at 2:17 PM Ross Tajvar <r...@tajvar.io> wrote:
>
>> By the way - compiling the most recent version of bird is very easy. So
>> even though there's not a package for 2.0.12 for bullseye, I recommend just
>> compiling with the same options as the bird package and running that.
>>
>> On Fri, Mar 3, 2023, 2:07 PM Hugo Slabbert via Bird-users <bird-users@network.cz> wrote:
>>
>>> Ah; thanks. Okay, I was misreading that as just referring to regular
>>> table filtering, not in conjunction with import/export. I had looked at
>>> `show symbols table` and not seen any indication of it, but missed that these
>>> are present in the `show route export table <p.c>` format regardless.
>>>
>>> Thanks. That confirms that we do in fact see a difference there between
>>> the export table and the ad hoc route export view when this occurs, after a
>>> single call to `birdc configure` (scrubbed slightly here):
>>>
>>> ```
>>> bird> show route export gw_085ea85_euwest2
>>> bird>
>>> bird> show route export table gw_085ea85_euwest2.ipv4
>>> Table export:
>>> 57.140.1.0/24 unicast [<name of static source> 16:41:41.058] * (100)
>>>         via <next hop> on bond0 onlink
>>> ```
>>>
>>> We'll keep an eye here and validate if we do see this returning on
>>> 2.0.10 as well, or if 2.0.10 remains clear.
>>>
>>> On Fri, Mar 3, 2023 at 10:18 AM Alexander Zubkov <gr...@qrator.net> wrote:
>>>
>>>> Hi,
>>>>
>>>> It is documented in recent versions and on the bird's site too. Pay
>>>> attention to this:
>>>>
>>>> [(import|export) table p.c]
>>>>
>>>> On Fri, Mar 3, 2023, 18:32 Hugo Slabbert via Bird-users <bird-users@network.cz> wrote:
>>>>
>>>>> Right, so,
>>>>>
>>>>> I've gone ahead and enabled export tables on the channels for the
>>>>> relevant peers, per Alexander's suggestion for possibly getting additional
>>>>> visibility. I don't seem to spot any different views for route status,
>>>>> though. I don't see any particular docs on how to *view* export
>>>>> tables; does enabling export tables make a different view available to
>>>>> look at the export table contents specifically? Or does it just shift the
>>>>> behaviour so `show route export <protocol>` displays the export table
>>>>> contents rather than a point-in-time evaluation of export filters for the
>>>>> specified neighbor?
>>>>>
>>>>> Snippet showing export tables config enabled for the peers:
>>>>>
>>>>> ```
>>>>> template bgp GATEWAY_v6 {
>>>>>   hold time 6;
>>>>>   startup hold time 20;
>>>>>   connect delay time 3;
>>>>>   connect retry time 6;
>>>>>   error wait time 3, 12;
>>>>>   med metric;
>>>>>   allow local as 1;
>>>>>
>>>>>   local fdff::4:2 as MENLO_ASN;
>>>>>
>>>>>   ipv6 {
>>>>>     export table on;
>>>>>     import filter GATEWAY_IMPORT_v6;
>>>>>     export filter GATEWAY_EXPORT_v6;
>>>>>   };
>>>>>   ipv4 {
>>>>>     export table on;
>>>>>     extended next hop on;
>>>>>     add paths rx;
>>>>>     import filter GATEWAY_IMPORT_v4;
>>>>>     export filter GATEWAY_EXPORT_v4;
>>>>>   };
>>>>> }
>>>>> # ...
>>>>> protocol bgp gw_085ea85_euwest2 from GATEWAY_v6 {
>>>>>   neighbor fdff::8005:f4d1 as 65000;
>>>>> }
>>>>> ```
>>>>>
>>>>> I don't see any different behaviour on the affected hosts, though.
>>>>> E.g. a host that just had `configure` called once after setting the
>>>>> draining flag is showing these symptoms, showing nothing for
>>>>> `show route export <protocol>`:
>>>>>
>>>>> ```
>>>>> bird> show route export gw_085ea85_euwest2
>>>>> bird>
>>>>> ```
>>>>>
>>>>> ...but still showing exports under the protocol details:
>>>>>
>>>>> ```
>>>>> bird> show protocols all gw_085ea85_euwest2
>>>>> Name               Proto  Table  State  Since                Info
>>>>> gw_085ea85_euwest2 BGP    ---    up     2023-03-03 16:33:43  Established
>>>>>   BGP state:          Established
>>>>> # ...
>>>>>   Channel ipv6
>>>>>     State:          UP
>>>>>     Table:          master6
>>>>>     Preference:     100
>>>>>     Input filter:   GATEWAY_IMPORT_v6
>>>>>     Output filter:  GATEWAY_EXPORT_v6
>>>>>     Routes:         2 imported, 2 exported, 1 preferred
>>>>>     Route change stats:  received  rejected  filtered  ignored  accepted
>>>>>       Import updates:           3         0         1        0         2
>>>>>       Import withdraws:         0         0       ---        0         0
>>>>>       Export updates:         109         5        96      ---         8
>>>>>       Export withdraws:         2       ---       ---      ---         2
>>>>>     BGP Next hop:   fdff::4:2
>>>>>   Channel ipv4
>>>>>     State:          UP
>>>>>     Table:          master4
>>>>>     Preference:     100
>>>>>     Input filter:   GATEWAY_IMPORT_v4
>>>>>     Output filter:  GATEWAY_EXPORT_v4
>>>>>     Routes:         12 imported, 1 exported, 0 preferred
>>>>>     Route change stats:  received  rejected  filtered  ignored  accepted
>>>>>       Import updates:          12         0         0        0        12
>>>>>       Import withdraws:         0         0       ---        0         0
>>>>>       Export updates:          39         4        31      ---         4
>>>>>       Export withdraws:         0       ---       ---      ---         1
>>>>>     BGP Next hop:   fdff::4:2
>>>>> ```
>>>>>
>>>>> Note this is still on 2.0.7. We've bumped some hosts to 2.0.10, but
>>>>> as indicated in the previous message, just a simple restart clears this
>>>>> issue from occurring. We've enabled the export table config on both a
>>>>> 2.0.7 and a 2.0.10 host, to be able to possibly spot if this reoccurs on
>>>>> the 2.0.10 host as well after a period. An example host on 2.0.7 showing
>>>>> this behaviour has been up for ~2 weeks. The box upgraded to 2.0.10 has
>>>>> had BIRD running for just ~16 hours at this point and is not yet showing
>>>>> any issues.
>>>>>
>>>>> On Thu, Mar 2, 2023 at 4:07 PM Hugo Slabbert <hugo.slabb...@menlosecurity.com> wrote:
>>>>>
>>>>>> A slight update on this:
>>>>>>
>>>>>> 3f477ccb does appear to be in 2.0.7, which we're running, so if that's
>>>>>> the issue that may not be the problem.
>>>>>>
>>>>>> This looked to be successful initially when upgrading to 2.0.10. But,
>>>>>> I then checked a box that was still running 2.0.7 and where we could
>>>>>> repro it. I simply restarted bird there, and then could no longer repro it.
>>>>>>
>>>>>> So, just restarting bird at 2.0.7 was sufficient to clear the
>>>>>> problem, at least temporarily, and the bump to 2.0.10 then wasn't a clear
>>>>>> test, given that's obviously a fresh instance of bird running.
>>>>>>
>>>>>> We'll try to validate if the problem eventually returns on the 2.0.7
>>>>>> box(es) after a restart, and if it does *not* return on the 2.0.10
>>>>>> instance, but we don't have a clear timeline at the moment on this if
>>>>>> it's something that pops up "in a while" of bird running.
>>>>>>
>>>>>> On Thu, Mar 2, 2023 at 2:54 PM Hugo Slabbert <hugo.slabb...@menlosecurity.com> wrote:
>>>>>>
>>>>>>> Was this perhaps 3f477ccb?
>>>>>>>
>>>>>>>> Filters: Function body comparison result now used.
>>>>>>>>
>>>>>>>> Function bodies were compared in post-parse time, yet the result
>>>>>>>> was not used and the functions were incorrectly considered the same
>>>>>>>> as before.
>>>>>>>>
>>>>>>>> Now the result is used to reload affected protocols.
>>>>>>>
>>>>>>> On Thu, Mar 2, 2023 at 2:51 PM Hugo Slabbert <hugo.slabb...@menlosecurity.com> wrote:
>>>>>>>
>>>>>>>> ah, right, apologies.
>>>>>>>>
>>>>>>>> bird 2.0.7-4.1 on Debian 11.6, kernel 5.10.136-1
>>>>>>>>
>>>>>>>> Looks like 2.0.7 was released Oct 16 2019 (https://bird.network.cz/?download),
>>>>>>>> so a fair chance we might be hitting this? It looks like something from
>>>>>>>> 2.0.10 is available from the bullseye backports, with the most recent
>>>>>>>> being 2.0.12 in bookworm or sid. I'll look at pulling one of those in to
>>>>>>>> validate.
>>>>>>>>
>>>>>>>> ...where changes in functions sometimes got ignored.
>>>>>>>>
>>>>>>>>
>>>>>>>> This might be reaching, but would that explain the difference
>>>>>>>> between what's shown in route export status output versus what's
>>>>>>>> actually being exported?
>>>>>>>>
>>>>>>>> On Thu, Mar 2, 2023 at 2:39 PM Maria Matejka via Bird-users <bird-users@network.cz> wrote:
>>>>>>>>
>>>>>>>>> Hello!
>>>>>>>>>
>>>>>>>>> > We've tried adding a sleep between when the include snippet that changes
>>>>>>>>> > the DRAIN_NODE value is written and when we hit `birdc configure`, but
>>>>>>>>> > that doesn't appear to make any difference. If we execute `birdc
>>>>>>>>> > configure` *twice*, though, everything's fine: The actual exports are
>>>>>>>>> > stopped. That's true without any sleep or break between running
>>>>>>>>> > configure as well; literally just `birdc configure` back to back in the
>>>>>>>>> > script that manages this.
>>>>>>>>> >
>>>>>>>>> > We do not see any indication of issues in the `birdc configure` runs or
>>>>>>>>> > in BIRD's logs.
>>>>>>>>>
>>>>>>>>> You are not disclosing the version of BIRD you are using. I vaguely
>>>>>>>>> remember that we fixed this kind of bug several years ago where changes
>>>>>>>>> in functions sometimes got ignored.
>>>>>>>>>
>>>>>>>>> Thus if you are not using a recent BIRD version, you are probably
>>>>>>>>> hitting that old bug.
>>>>>>>>>
>>>>>>>>> Maria