On Tue, May 16, 2023 at 10:53 AM Eelco Chaudron <echau...@redhat.com> wrote:
> On 10 May 2023, at 13:44, David Marchand wrote:

[snip]

> >> @@ -846,6 +848,14 @@ vhost_user_socket_mem_free(struct vhost_user_socket 
> >> *vsocket)
> >>                 vsocket->path = NULL;
> >>         }
> >>
> >> +       if (vsocket && vsocket->alloc_notify_ops) {
> >> +#pragma GCC diagnostic push
> >> +#pragma GCC diagnostic ignored "-Wcast-qual"
> >> +               free((struct rte_vhost_device_ops *)vsocket->notify_ops);
> >> +#pragma GCC diagnostic pop
> >> +               vsocket->notify_ops = NULL;
> >> +       }
> >
> > Rather than select the behavior based on a boolean (and here force the
> > compiler to close its eyes), I would instead add a non const pointer
> > to ops (let's say alloc_notify_ops) in vhost_user_socket.
> > The code can then unconditionnally call free(vsocket->alloc_notify_ops);
>
> Good idea, I will make the change in v3.

Feel free to use a better name for this field :-).

>
> >> +
> >>         if (vsocket) {
> >>                 free(vsocket);
> >>                 vsocket = NULL;

[snip]

> >> +       /*
> >> +        * Although the ops structure is a const structure, we do need to
> >> +        * override the guest_notify operation. This is because with the
> >> +        * previous APIs it was "reserved" and if any garbage value was 
> >> passed,
> >> +        * it could crash the application.
> >> +        */
> >> +       if (ops && !ops->guest_notify) {
> >
> > Hum, as described in the comment above, I don't think we should look
> > at ops->guest_notify value at all.
> > Checking ops != NULL should be enough.
>
> Not sure I get you here. If the guest_notify passed by the user is NULL, it 
> means the previously ‘reserved[1]’ field is NULL, so we do not need to use a 
> new structure.
>
> I guess your comment would be true if we would introduce a new field in the 
> data structure, not replacing a reserved one.

Hum, I don't understand my comment either o_O'.
Too many days off... or maybe my evil twin took over the keyboard.


>
> >> +               struct rte_vhost_device_ops *new_ops;
> >> +
> >> +               new_ops = malloc(sizeof(*new_ops));
> >
> > Strictly speaking, we lose the numa affinity of "ops" by calling malloc.
> > I am unclear of the impact though.
>
> Don’t think there is a portable API that we can use to determine the NUMA for 
> the ops memory and then allocate this on the same numa?
>
> Any thoughts or ideas on how to solve this? I hope most people will memset() 
> the ops structure and the reserved[1] part is zero, but it might be a problem 
> in the future if more extensions get added.

Determinining current numa is doable, via 'ops'
get_mempolicy(MPOL_F_NODE | MPOL_F_ADDR), like what is done for vq in
numa_realloc().
The problem is how to allocate on this numa with the libc allocator
for which I have no idea...
We could go with the dpdk allocator (again, like numa_realloc()).


In practice, the passed ops will be probably from a const variable in
the program .data section (for which I think fields are set to 0
unless explicitly initialised), or a memset() will be called for a
dynamic allocation from good citizens.
So we can probably live with the current proposal.
Plus, this is only for one release, since in 23.11 with the ABI bump,
we will drop this compat code.

Maxime, Chenbo, what do you think?


[snip]

> >
> > But putting indentation aside, is this change equivalent?
> > -               if ((vhost_need_event(vhost_used_event(vq), new, old) &&
> > -                                       (vq->callfd >= 0)) ||
> > -                               unlikely(!signalled_used_valid)) {
> > +               if ((vhost_need_event(vhost_used_event(vq), new, old) ||
> > +                               unlikely(!signalled_used_valid)) &&
> > +                               vq->callfd >= 0) {
>
> They are not equal, but in the past eventfd_write() should also not have been 
> called with callfd < 0, guess this was an existing bug ;)

I think this should be a separate fix.

>
> >> +                       vhost_vring_inject_irq(dev, vq);


-- 
David Marchand

Reply via email to