On 02.10.2019 20:15, Flavio Leitner wrote:
On Wed, 2 Oct 2019 17:50:41 +0000
Shahaf Shuler <shah...@mellanox.com> wrote:
Wednesday, October 2, 2019 3:59 PM, Flavio Leitner:
Subject: Re: [dpdk-dev] [PATCH] vhost: add support to large linear mbufs
Hi Shahaf,
Thanks for looking into this, see my inline comments.
On Wed, 2 Oct 2019 09:00:11 +0000
Shahaf Shuler <shah...@mellanox.com> wrote:
Wednesday, October 2, 2019 11:05 AM, David Marchand:
Subject: Re: [dpdk-dev] [PATCH] vhost: add support to large linear mbufs
Hello Shahaf,
On Wed, Oct 2, 2019 at 6:46 AM Shahaf Shuler
<shah...@mellanox.com> wrote:
[...]
I am missing some piece here.
Which pool would the PMD take those external buffers from?
The mbuf is always taken from the single mempool associated with the
rxq. The buffer for the mbuf may be allocated (in case the virtio
payload is bigger than the current mbuf size) from DPDK hugepages or
any other system memory and attached to the mbuf.
You can see an example implementation of it in the mlx5 PMD (check out
the rte_pktmbuf_attach_extbuf call).
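
For reference, a minimal sketch of what attaching an on-demand allocated
buffer could look like, assuming the memory comes from rte_malloc. The
helper names and error handling here are made up for illustration; this
is not the mlx5 code:

#include <errno.h>
#include <rte_common.h>
#include <rte_malloc.h>
#include <rte_mbuf.h>

/* Called once the last mbuf referencing the external buffer is freed. */
static void
ext_buf_free_cb(void *addr, void *opaque __rte_unused)
{
	rte_free(addr);
}

/*
 * Hypothetical helper: allocate 'buf_len' bytes on demand and attach them
 * to 'pkt' as an external buffer instead of chaining mbufs.
 */
static int
attach_large_extbuf(struct rte_mbuf *pkt, uint16_t buf_len)
{
	struct rte_mbuf_ext_shared_info *shinfo;
	void *buf;

	buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
	if (buf == NULL)
		return -ENOMEM;

	/* Reserves the shared info at the tail of the buffer and shrinks
	 * buf_len accordingly. */
	shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
						    ext_buf_free_cb, buf);
	if (shinfo == NULL) {
		rte_free(buf);
		return -ENOSPC;
	}

	rte_pktmbuf_attach_extbuf(pkt, buf, rte_malloc_virt2iova(buf),
				  buf_len, shinfo);
	rte_pktmbuf_reset_headroom(pkt);
	return 0;
}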
Thanks, I wasn't aware of external buffers.
I see that attaching external buffers of the correct size would be
more efficient in terms of saving memory and avoiding sparse
allocations.
However, we still need to be prepared for the worst-case scenario
(all packets being 64K), so that doesn't help with the total memory
required.
I am not sure why.
The allocation can be done on demand, that is, only when you encounter
a large buffer.
Having the buffers allocated in advance only saves the cost of the
rte_*malloc call. However, on such big buffers, and even more so with
device offloads like TSO, I am not sure that cost is an issue.
Now I see what you're saying. I was thinking we had to reserve the
memory beforehand, like a mempool does, and then get the buffers as
needed.
OK, I can give it a try with rte_*malloc and see how it goes.
This way we could actually have a nice API. For example, by
introducing some new flag RTE_VHOST_USER_NO_CHAINED_MBUFS (there
might be a better name) which could be passed to driver_register().
On receive, depending on this flag, the function would either create
chained mbufs or allocate a new contiguous memory chunk and attach it
as an external buffer whenever the data cannot be stored in a single
mbuf from the registered memory pool.
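
From the application side that could look roughly like the sketch
below. RTE_VHOST_USER_NO_CHAINED_MBUFS is just the name floated here
and does not exist in DPDK, so the define and its bit value are
placeholders:

#include <rte_vhost.h>

/* Placeholder for the flag proposed in this thread; not a real DPDK
 * define. The bit value is arbitrary and only for illustration. */
#define RTE_VHOST_USER_NO_CHAINED_MBUFS (1ULL << 8)

static int
register_vhost_port(const char *sock_path)
{
	/* Ask the vhost library to attach external buffers instead of
	 * chaining mbufs for packets that don't fit a single mbuf. */
	uint64_t flags = RTE_VHOST_USER_CLIENT |
			 RTE_VHOST_USER_NO_CHAINED_MBUFS;

	return rte_vhost_driver_register(sock_path, flags);
}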
Supporting external memory in mbufs will require some additional
work on the OVS side (e.g. better handling of ol_flags), but
we'll have to do that anyway for the upgrade to DPDK 19.11.
Best regards, Ilya Maximets.
The current patch pushes the decision to the application, which
knows the workload better. If more memory is available, it can
optionally use large buffers; otherwise it simply doesn't pass that
option. It can even decide whether to share the same 64K mempool
between multiple vhost ports or use one mempool per port.
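
Just to make the trade-off concrete, the "large buffers" option
essentially means a pool sized for the worst case, something along
these lines (name, mbuf count and cache size are made up; the data
room is capped at UINT16_MAX because that is the mbuf buf_len limit):

#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/*
 * Illustrative only: every mbuf in this pool reserves ~64K of data
 * room, whether or not the received packet actually needs it.
 */
static struct rte_mempool *
create_large_mbuf_pool(const char *name, unsigned int n_mbufs, int socket_id)
{
	/* buf_len is a uint16_t, so ~64K minus headroom is the practical max. */
	const uint16_t data_room = UINT16_MAX;

	return rte_pktmbuf_pool_create(name, n_mbufs, 256 /* cache */,
				       0 /* priv size */, data_room,
				       socket_id);
}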
Perhaps I missed something, but managing memory with a mempool still
requires us to have 64K buffers regardless of whether the data
consumes less space. Otherwise the application or the PMD has to
manage the memory itself.
If we let the PMD manage the memory, what happens if a port/queue
is closed while one or more buffers are still in use (being
switched)? I don't see how to solve this cleanly.
Closing the device should return EBUSY until all buffers are freed.
What is the use case for closing a port while packets are still
pending on another port of the switch? And why can't we wait for them
to complete transmission?
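
In other words, something like the sketch below: keep a count of
external buffers still referenced by in-flight mbufs and refuse to
tear the port down until it drops to zero. The counter and function
names are invented for illustration, not part of the patch:

#include <errno.h>
#include <stdatomic.h>
#include <rte_common.h>
#include <rte_malloc.h>

/* Hypothetical per-port count of external buffers still attached to
 * in-flight mbufs somewhere in the switch. */
static atomic_long inflight_ext_bufs;

/* Free callback used with rte_pktmbuf_attach_extbuf(); runs when the
 * last mbuf referencing the buffer is freed. */
static void
ext_buf_free_cb(void *addr, void *opaque __rte_unused)
{
	rte_free(addr);
	atomic_fetch_sub(&inflight_ext_bufs, 1);
}

/* Sketch of the close path discussed here: report EBUSY while any
 * external buffer handed out by this port is still in use, so the
 * caller can retry later. */
static int
vhost_port_close(void)
{
	if (atomic_load(&inflight_ext_bufs) != 0)
		return -EBUSY;

	/* ... actual device teardown ... */
	return 0;
}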
The vswitch gets the request from outside and the assumption is that
the command will succeed. AFAIK, there is no retry mechanism.
Thanks Shahaf!
fbl