On 11/04/2016 02:43 PM, Alexander Duyck wrote:
On Fri, Nov 4, 2016 at 7:45 AM, Jason Baron <jba...@akamai.com> wrote:
From: Jason Baron <jba...@akamai.com>

When read() is called on /proc/net/route with a requested size of at least
m->size minus one entry size (128 bytes), the resulting output has missing
and/or duplicate entries. Since m->size is typically PAGE_SIZE, for a
PAGE_SIZE of 4,096 this means that reads requesting more than 3,968 bytes
will see bogus output.

For example:

for i in {100..200}; do
        ip route add 192.168.1.$i dev eth0
done
dd if=/proc/net/route of=/tmp/good bs=1024
dd if=/proc/net/route of=/tmp/bad bs=4096

# diff -q /tmp/good /tmp/bad
Files /tmp/good and /tmp/bad differ

I think this has gone unnoticed, since the output of 'netstat -r' and
'route' is generated by reading in 1,024 byte increments and thus not
corrupted. Further, the number of entries in the route table needs to be
sufficiently large in order to trigger the problematic case.
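
For completeness, here is a rough user-space equivalent of the dd/diff
reproducer above (illustrative only, not part of the patch; the 1,024 and
4,096 buffer sizes are the same as in the example):

/* Illustrative reproducer: read /proc/net/route twice, once in 1,024-byte
 * chunks and once in 4,096-byte chunks, and check that the bytes match.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read the whole file using read() calls of 'bufsz' bytes each. */
static size_t slurp(const char *path, size_t bufsz, char *out, size_t outcap)
{
	size_t total = 0;
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		perror(path);
		exit(1);
	}
	while (total + bufsz <= outcap) {
		n = read(fd, out + total, bufsz);
		if (n <= 0)
			break;
		total += (size_t)n;
	}
	close(fd);
	return total;
}

int main(void)
{
	static char a[1 << 20], b[1 << 20];
	size_t la = slurp("/proc/net/route", 1024, a, sizeof(a));
	size_t lb = slurp("/proc/net/route", 4096, b, sizeof(b));

	if (la != lb || memcmp(a, b, la) != 0)
		printf("outputs differ: %zu vs %zu bytes\n", la, lb);
	else
		printf("outputs match (%zu bytes)\n", la);
	return 0;
}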

The issue arises because fib_route_get_idx() does not properly handle
the case where pos equals iter->pos. This case only arises when we have
a large read buffer size because we end up re-requesting the last entry
that overflowed m->buf. In the case of a smaller read buffer size,
we don't exceed the size of m->buf, and thus fib_route_get_idx() is called
with pos greater than iter->pos.

Fix by properly handling the iter->pos == pos case.

Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in /proc/net/route")
Cc: Andy Whitcroft <a...@canonical.com>
Cc: Alexander Duyck <alexander.h.du...@intel.com>
Signed-off-by: Jason Baron <jba...@akamai.com>
---
 net/ipv4/fib_trie.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 31cef3602585..1017533fc75c 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -2411,12 +2411,17 @@ static struct key_vector *fib_route_get_idx(struct fib_route_iter *iter,
                                            loff_t pos)
 {
        struct key_vector *l, **tp = &iter->tnode;
+       loff_t saved_pos = 0;
        t_key key;

        /* use cache location of next-to-find key */
        if (iter->pos > 0 && pos >= iter->pos) {
                pos -= iter->pos;
                key = iter->key;
+               if (pos == 0) {
+                       saved_pos = iter->pos;
+                       key--;
+               }
        } else {
                iter->pos = 0;
                key = 0;
@@ -2436,10 +2441,13 @@ static struct key_vector *fib_route_get_idx(struct fib_route_iter *iter,
                        break;
        }

-       if (l)
+       if (l) {
                iter->key = key;        /* remember it */
-       else
+               if (saved_pos)
+                       iter->pos = saved_pos;
+       } else {
                iter->pos = 0;          /* forget it */
+       }

        return l;
 }

This doesn't seem correct to me.  I will have to look through this.
My understanding is that the value of iter->pos is supposed to be the
next position for us to grab, not the last one that was retrieved.  If
we are trying to re-request the last value then we should be falling
back into the else case for this since pos should be one less than
iter->pos.  The problem is the table could change out from under us
which is one of the reasons why we don't want to try and rewind the
key like you are doing here.

- Alex


Hi Alex,

In this case, seq_read() has called m->op->next(), which sets iter->pos
equal to pos and iter->key to key + 1. However, when we then go to output
the item associated with key, it overflows m->buf and so is never actually
emitted. Thus, we end up in a state where iter->pos equals pos and
iter->key equals key + 1, but we have not displayed the item at position
'key' (the bug is that we then miss the item at key).
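
Roughly, the sequence of events for one overflow (hypothetical positions,
P for the seq position and K for the trie key of the entry that did not
fit) is:

1. While filling m->buf, fib_route_next() advances to the entry at
   position P (trie key K), setting iter->pos = P and iter->key = K + 1.
2. The show() output for that entry overflows m->buf, so it is dropped and
   the read returns only the entries before it; the next read asks for
   position P again.
3. That read calls fib_route_get_idx(iter, P). Since pos >= iter->pos and
   pos - iter->pos == 0, the cached path starts leaf_walk_rcu() from
   iter->key == K + 1.
4. The walk returns the entry after K, so the entry at K is never
   displayed.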

The change I proposed was simply to restart the search from 'key' in this
case. If that item has disappeared, we will output the next one, or if it's
been replaced, we will display its replacement. I think that is ok?

The bug could also be fixed by changing:

if (iter->pos > 0 && pos >= iter->pos) {

to say:

if (iter->pos > 0 && pos > iter->pos) {

But that restarts the search from the beginning on every overflow, which
could mean once per buffer (typically a page) of output, and that seems
suboptimal to me. Likewise, if we make pos 1 less than iter->pos, that also
restarts the search. The idea with this patch is to avoid redoing the
entire search on each overflow.
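
Back-of-envelope (my numbers, not from the patch): assuming 128-byte
entries and a 4,096-byte buffer, i.e. about 32 entries per read, a table of
R routes takes roughly R/32 reads. If every read restarts from key 0, the
k-th read walks about 32*k leaves, so dumping the whole table costs on the
order of sum(32*k) ~= R^2/64 leaf visits, versus roughly R visits when the
cached key is kept valid.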

Thanks,

-Jason
