Hi, Just.
On 2016-09-06 22:50, Justin Cattle wrote:
I found some time to build a package of the latest 1.6.0 release with a patch created from a diff of origin/krt-export-filtr-fix against v1.6.0-34-g768d013 [ seems to be the top three commits ].
Yes, the top three commits, exactly!
I hope that's valid. The patch applied without issues, and I wrapped it into a Debian patch.
I've installed it on a few hosts, and I'll report back tomorrow if I get a chance.
Great!
Thanks again for the speedy code :)
Here's my Debian package patch for reference:
cat bird-1.6.0/debian/patches/001-krt-export-filtr-fix.patch
filter/tree: prefer xmalloc/xfree to malloc/free
rt-table: fix kernel protocol export filter memory bug
Index: bird-1.6.0/filter/tree.c
===================================================================
--- bird-1.6.0.orig/filter/tree.c 2013-11-23 12:29:53.000000000 +0000
+++ bird-1.6.0/filter/tree.c 2016-09-06 21:30:15.435090279 +0100
@@ -82,7 +82,7 @@
if (len <= 1024)
buf = alloca(len * sizeof(struct f_tree *));
else
- buf = malloc(len * sizeof(struct f_tree *));
+ buf = xmalloc(len * sizeof(struct f_tree *));
/* Convert a degenerated tree into an sorted array */
i = 0;
@@ -94,7 +94,7 @@
root = build_tree_rec(buf, 0, len);
if (len > 1024)
- free(buf);
+ xfree(buf);
return root;
}
Index: bird-1.6.0/nest/rt-table.c
===================================================================
--- bird-1.6.0.orig/nest/rt-table.c 2016-04-29 10:13:23.000000000 +0100
+++ bird-1.6.0/nest/rt-table.c 2016-09-06 21:30:15.435090279 +0100
@@ -60,6 +60,21 @@
static inline void rt_schedule_prune(rtable *tab);
+static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */
+
+static inline void
+rte_update_lock(void)
+{
+ rte_update_nest_cnt++;
+}
+
+static inline void
+rte_update_unlock(void)
+{
+ if (!--rte_update_nest_cnt)
+ lp_flush(rte_update_pool);
+}
+
static inline struct ea_list *
make_tmp_attrs(struct rte *rt, struct linpool *pool)
{
@@ -609,10 +624,18 @@
if (!rte_is_valid(best0))
return NULL;
+ /* This non-static function could be called from outside rt-table.c file and
+ * we need to ensure that a temporary allocated linpool memory @rte_update_pool
+ * will be freed */
+ rte_update_lock();
+
best = export_filter(ah, best0, rt_free, tmpa, silent);
if (!best || !rte_is_reachable(best))
+ {
+ rte_update_unlock();
return best;
+ }
for (rt0 = best0->next; rt0; rt0 = rt0->next)
{
@@ -646,6 +669,8 @@
if (best != best0)
*rt_free = best;
+ rte_update_unlock();
+
return best;
}
@@ -1097,21 +1122,6 @@
rte_free_quick(old);
}
-static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */
-
-static inline void
-rte_update_lock(void)
-{
- rte_update_nest_cnt++;
-}
-
-static inline void
-rte_update_unlock(void)
-{
- if (!--rte_update_nest_cnt)
- lp_flush(rte_update_pool);
-}
-
static inline void
rte_hide_dummy_routes(net *net, rte **dummy)
{
Looks fine :)
Cheers,
Just
On 6 September 2016 at 18:03, Justin Cattle <j...@ocado.com> wrote:
Hi Pavel,
Thanks for the quick response! I will try that as soon as I can, hopefully in the next couple of days.
I'll report back as soon as I know.
Cheers,
Just
On 6 September 2016 at 16:46, Pavel Tvrdík <pavel.tvr...@nic.cz> wrote:
Hi Justin,
On 2016-09-05 16:21, Justin Cattle wrote:
Hi,
A colleague of mine reported a memory usage issue with the BIRD daemon last year, which resulted in a request for a core dump, but we never followed it up.
I'd like to re-open this discussion and see if anything can be done to fix it.
I'll provide some information about our production environment, where the problem is most obvious, but any further details and diagnostics will have to come from our lab environment.
Please note that in production we mostly run 1.5, while in the lab we are on 1.6; however, we see the same symptoms in both environments and on both versions.
The symptoms are twofold but potentially related: the BIRD daemon itself reports higher memory usage than expected for the number of routes, and the daemon process actually uses massively more memory than BIRD reports.
When the process is started, we see "normal" memory usage, which then seems to grow indefinitely in distinct steps separated by periods of a few hours. In production, this consumes most of the 32G of memory until the kernel OOM killer intervenes.
Production:
BIRD 1.5.0 ready.
bird> show memory
BIRD memory usage
Routing tables: 1405 MB
Route attributes: 84 kB
ROA tables: 192 B
Protocols: 45 kB
Total: 1405 MB
bird> show route count
2273 of 2273 routes for 1142 networks
# ps u -p 3441
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
bird 3441 0.1 55.4 18275124 18241540 ? Ssl Aug10 73:39 /usr/sbin/bird -f -u bird -g bird
...so that's ~1.4G reported by BIRD and ~18G actually consumed by the process.
Lab:
BIRD 1.6.0 ready.
bird> show mem
BIRD memory usage
Routing tables: 693 MB
Route attributes: 28 kB
ROA tables: 192 B
Protocols: 41 kB
Total: 693 MB
bird> show route count
175 of 175 routes for 91 networks
# ps u -p 29085
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
bird 29085 0.0 14.9 4994852 4915032 ? Ssl Aug05 19:41 /usr/sbin/bird -f -u bird -g bird
Thanks for this report. I was able to reproduce this weird behavior too. Configuring the kernel protocol with an export filter triggers a memory leak bug. I have prepared commits fixing it in the branch `krt-export-filtr-fix'
https://gitlab.labs.nic.cz/labs/bird/commits/krt-export-filtr-fix [1]
Can you please download it and confirm that the bug is fixed?
Best,
Pavel
...so that's ~0.7G reported by BIRD and ~5G actually consumed by the process.
I've also attached the BIRD config from the lab.
Any help is much appreciated!
Thanks.
Cheers,
Just
Notice: This email is confidential and may contain copyright material of members of the Ocado Group. Opinions and views expressed in this message may not necessarily reflect the opinions and views of the members of the Ocado Group.
If you are not the intended recipient, please notify us immediately and delete all copies of this message. Please note that it is your responsibility to scan this message for viruses.
Fetch and Sizzle are trading names of Speciality Stores Limited and Fabled is a trading name of Marie Claire Beauty Limited, both members of the Ocado Group.
References to the “Ocado Group” are to Ocado Group plc (registered in England and Wales with number 7098618) and its subsidiary undertakings (as that expression is defined in the Companies Act 2006) from time to time. The registered office of Ocado Group plc is Titan Court, 3 Bishops Square, Hatfield Business Park, Hatfield, Herts. AL10 9NE.
Links:
------
[1] https://gitlab.labs.nic.cz/labs/bird/commits/krt-export-filtr-fix