Hi Olivier, Apology for a delayed response. > -----Original Message----- > From: Olivier Matz [mailto:olivier.m...@6wind.com] > Sent: Tuesday, November 22, 2016 2:55 PM > To: Hemant Agrawal <hemant.agra...@nxp.com> > Cc: dev@dpdk.org; jerin.ja...@caviumnetworks.com; david.h...@intel.com > Subject: Re: [PATCH v3 2/2] mempool: pktmbuf pool default fallback for > mempool ops error > > Hi Hemant, > > Back on this topic, please see some comments below. > > On 11/07/2016 01:30 PM, Hemant Agrawal wrote: > > Hi Olivier, > > > >> -----Original Message----- > >> From: Olivier Matz [mailto:olivier.m...@6wind.com] > >> Sent: Friday, October 14, 2016 5:41 PM > >>> On 9/22/2016 6:42 PM, Hemant Agrawal wrote: > >>>> Hi Olivier > >>>> > >>>> On 9/19/2016 7:27 PM, Olivier Matz wrote: > >>>>> Hi Hemant, > >>>>> > >>>>> On 09/16/2016 06:46 PM, Hemant Agrawal wrote: > >>>>>> In the rte_pktmbuf_pool_create, if the default external mempool > >>>>>> is not available, the implementation can default to "ring_mp_mc", > >>>>>> which is an software implementation. > >>>>>> > >>>>>> Signed-off-by: Hemant Agrawal <hemant.agra...@nxp.com> > >>>>>> --- > >>>>>> Changes in V3: > >>>>>> * adding warning message to say that falling back to default sw > >>>>>> pool > >>>>>> --- > >>>>>> lib/librte_mbuf/rte_mbuf.c | 8 ++++++++ > >>>>>> 1 file changed, 8 insertions(+) > >>>>>> > >>>>>> diff --git a/lib/librte_mbuf/rte_mbuf.c > >>>>>> b/lib/librte_mbuf/rte_mbuf.c index 4846b89..8ab0eb1 100644 > >>>>>> --- a/lib/librte_mbuf/rte_mbuf.c > >>>>>> +++ b/lib/librte_mbuf/rte_mbuf.c > >>>>>> @@ -176,6 +176,14 @@ rte_pktmbuf_pool_create(const char *name, > >>>>>> unsigned n, > >>>>>> > >>>>>> rte_errno = rte_mempool_set_ops_byname(mp, > >>>>>> RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL); > >>>>>> + > >>>>>> + /* on error, try falling back to the software based default > >>>>>> pool */ > >>>>>> + if (rte_errno == -EOPNOTSUPP) { > >>>>>> + RTE_LOG(WARNING, MBUF, "Default HW Mempool not > supported. " > >>>>>> + "falling back to sw mempool \"ring_mp_mc\""); > >>>>>> + rte_errno = rte_mempool_set_ops_byname(mp, "ring_mp_mc", > >>>>>> NULL); > >>>>>> + } > >>>>>> + > >>>>>> if (rte_errno != 0) { > >>>>>> RTE_LOG(ERR, MBUF, "error setting mempool handler\n"); > >>>>>> return NULL; > >>>>>> > >>>>> > >>>>> Without adding a new method ".supported()", the first call to > >>>>> rte_mempool_populate() could return the same error ENOTSUP. In > >>>>> this case, it is still possible to fallback. > >>>>> > >>>> It will be bit late. > >>>> > >>>> On failure, than we have to set the default ops and do a goto > >>>> before rte_pktmbuf_pool_init(mp, &mbp_priv); > >> > >> I still think we can do the job without adding the .supported() method. > >> The following code is just an (untested) example: > >> > >> struct rte_mempool * > >> rte_pktmbuf_pool_create(const char *name, unsigned n, > >> unsigned cache_size, uint16_t priv_size, uint16_t data_room_size, > >> int socket_id) > >> { > >> struct rte_mempool *mp; > >> struct rte_pktmbuf_pool_private mbp_priv; > >> unsigned elt_size; > >> int ret; > >> const char *ops[] = { > >> RTE_MBUF_DEFAULT_MEMPOOL_OPS, "ring_mp_mc", NULL, > >> }; > >> const char **op; > >> > >> if (RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) != priv_size) { > >> RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n", > >> priv_size); > >> rte_errno = EINVAL; > >> return NULL; > >> } > >> elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size + > >> (unsigned)data_room_size; > >> mbp_priv.mbuf_data_room_size = data_room_size; > >> mbp_priv.mbuf_priv_size = priv_size; > >> > >> for (op = &ops[0]; *op != NULL; op++) { > >> mp = rte_mempool_create_empty(name, n, elt_size, cache_size, > >> sizeof(struct rte_pktmbuf_pool_private), socket_id, 0); > >> if (mp == NULL) > >> return NULL; > >> > >> ret = rte_mempool_set_ops_byname(mp, *op, NULL); > >> if (ret != 0) { > >> RTE_LOG(ERR, MBUF, "error setting mempool handler\n"); > >> rte_mempool_free(mp); > >> if (ret == -ENOTSUP) > >> continue; > >> rte_errno = -ret; > >> return NULL; > >> } > >> rte_pktmbuf_pool_init(mp, &mbp_priv); > >> > >> ret = rte_mempool_populate_default(mp); > >> if (ret < 0) { > >> rte_mempool_free(mp); > >> if (ret == -ENOTSUP) > >> continue; > >> rte_errno = -ret; > >> return NULL; > >> } > >> } > >> > >> rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL); > >> > >> return mp; > >> } > >> > >> > > [Hemant] This look fine to me. Please submit a patch for the same. > > > >>>>> I've just submitted an RFC, which I think is quite linked: > >>>>> http://dpdk.org/ml/archives/dev/2016-September/046974.html > >>>>> Assuming a new parameter "mempool_ops" is added to > >>>>> rte_pktmbuf_pool_create(), would it make sense to fallback to > >>>>> "ring_mp_mc"? What about just returning ENOTSUP? The application > >>>>> could do the job and decide which sw fallback to use. > >>>> > >>>> We ran into this issue when trying to run the standard DPDK > >>>> examples > >>>> (l3fwd) in VM. Do you think, is it practical to add fallback > >>>> handling in each of the DPDK examples? > >> > >> OK. What is still unclear for me, is how the software is aware of the > >> different hardware-assisted handlers. Moreover, we could imagine more > >> software handlers, which could be used depending on the use case. > >> > >> I think this choice has to be made by the user or the application: > >> > >> - the application may want to use a specific (sw or hw) handler: in > >> this case, it want to be notified if it fails, instead of having > >> a quiet fallback to ring_mp_mc > >> - if several handlers are available, the application may want to > >> try them in a specific order > >> - maybe some handlers will have some limitations with some > >> configurations or driver? The application could decide to use > >> a different handler according the configuration. > >> > >> So that's why I think this is an application decision. > >> > > [Hemant] We should simplify it: if the application has supplied the > > handler, it > is application's responsibility to take care of failure. Only if the > application want > to use the default handler, the implementation can fallback. The fallback > handler can again be configurable. > > Honestly, I'm not very convinced that having a quiet fallback is the proper > solution: the application won't be notified that the fallback occured. > > I understand your patch solves your use-case, but my fear is that we just > integrate this minimal patch and never do the harder work of solving all the > use- > cases. I think we need to give the tools to the applications to control what > occurs because we also need to solve these issues: > - you have a hardware handler, but you want to use the software > handler > - you have several software handlers, and you (as an administrator) or > the application knows which one is the best to use in your case. > > An alternative (which I don't like either) to solve your issue without > modifying > the mbuf code is to have your own handler fallbacking to ring_mp_mc. > [Hemant] are you suggesting that we also configure a fallback handler in addition to default?
I agree that we need to provide full flexibility to the applications, who are willing to implement the fallback mechanism. > > >>>> Typically when someone is writing a application on host, he need > >>>> not worry non-availability of the hw offloaded mempool. He may also > >>>> want to run the same binary in virtual machine. In VM, it is not > >>>> guaranteed that hw offloaded mempools will be available. > >> > >> Running the same binary is of course a need. But if your VM does not > >> provide the same virtualized hardware than the host, I think the > >> command line option makes sense. > >> > >> AFAIU, on the host, you can use a hw mempool handler or a sw one, and > >> on the guest, only a sw one. So you need an option to select the > >> behavior you want on the host, without recompiling. > >> > >> I understand that modifying all the applications is not a good option > >> either. I'm thinking a a EAL parameter that would allow to configure > >> a library (mbuf in this case). Something like kernel boot options. > >> Example: testpmd -l 2,4 --opt mbuf.handler="ring_mp_mc" -- [app args] > >> > >> I don't know if this is feasible, this has to be discussed first, but > >> what do you think on the principle? The value could also be an > >> ordered list if we want a fallback, and this option could be > >> overriden by the application before creating the mbuf pool. There was > >> some discussions about a kind of dpdk global configuration some time ago. > >> > > [Hemant] I agree that command line option provide a better control in this > case. > > On the flipside, We need to be careful that we do not end up having too many > command line options. > > Yes, we should avoid having too many command line arguments. > I think we should go in the direction of having a sort of key/value database > in > EAL. It could be set by: > - a generic command line argument > - a config file (maybe later) > - the application through an API > > Then, it would be used by: > - dpdk libraries > - the application (maybe) > > This has been discussed some times, the latest was probably: > http://dpdk.org/ml/archives/dev/2016-June/040079.html > > I think it would be a good tool for dpdk configurability, and it would help to > remove some compile-time options and maybe some eal options. > Unfortunately, I don't have the time to work on this at the moment. > > Regards, > Olivier