Hi Hemant, Back on this topic, please see some comments below.
On 11/07/2016 01:30 PM, Hemant Agrawal wrote: > Hi Olivier, > >> -----Original Message----- >> From: Olivier Matz [mailto:olivier.matz at 6wind.com] >> Sent: Friday, October 14, 2016 5:41 PM >>> On 9/22/2016 6:42 PM, Hemant Agrawal wrote: >>>> Hi Olivier >>>> >>>> On 9/19/2016 7:27 PM, Olivier Matz wrote: >>>>> Hi Hemant, >>>>> >>>>> On 09/16/2016 06:46 PM, Hemant Agrawal wrote: >>>>>> In the rte_pktmbuf_pool_create, if the default external mempool is >>>>>> not available, the implementation can default to "ring_mp_mc", >>>>>> which is an software implementation. >>>>>> >>>>>> Signed-off-by: Hemant Agrawal <hemant.agrawal at nxp.com> >>>>>> --- >>>>>> Changes in V3: >>>>>> * adding warning message to say that falling back to default sw >>>>>> pool >>>>>> --- >>>>>> lib/librte_mbuf/rte_mbuf.c | 8 ++++++++ >>>>>> 1 file changed, 8 insertions(+) >>>>>> >>>>>> diff --git a/lib/librte_mbuf/rte_mbuf.c >>>>>> b/lib/librte_mbuf/rte_mbuf.c index 4846b89..8ab0eb1 100644 >>>>>> --- a/lib/librte_mbuf/rte_mbuf.c >>>>>> +++ b/lib/librte_mbuf/rte_mbuf.c >>>>>> @@ -176,6 +176,14 @@ rte_pktmbuf_pool_create(const char *name, >>>>>> unsigned n, >>>>>> >>>>>> rte_errno = rte_mempool_set_ops_byname(mp, >>>>>> RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL); >>>>>> + >>>>>> + /* on error, try falling back to the software based default >>>>>> pool */ >>>>>> + if (rte_errno == -EOPNOTSUPP) { >>>>>> + RTE_LOG(WARNING, MBUF, "Default HW Mempool not supported. " >>>>>> + "falling back to sw mempool \"ring_mp_mc\""); >>>>>> + rte_errno = rte_mempool_set_ops_byname(mp, "ring_mp_mc", >>>>>> NULL); >>>>>> + } >>>>>> + >>>>>> if (rte_errno != 0) { >>>>>> RTE_LOG(ERR, MBUF, "error setting mempool handler\n"); >>>>>> return NULL; >>>>>> >>>>> >>>>> Without adding a new method ".supported()", the first call to >>>>> rte_mempool_populate() could return the same error ENOTSUP. In this >>>>> case, it is still possible to fallback. >>>>> >>>> It will be bit late. >>>> >>>> On failure, than we have to set the default ops and do a goto before >>>> rte_pktmbuf_pool_init(mp, &mbp_priv); >> >> I still think we can do the job without adding the .supported() method. >> The following code is just an (untested) example: >> >> struct rte_mempool * >> rte_pktmbuf_pool_create(const char *name, unsigned n, >> unsigned cache_size, uint16_t priv_size, uint16_t data_room_size, >> int socket_id) >> { >> struct rte_mempool *mp; >> struct rte_pktmbuf_pool_private mbp_priv; >> unsigned elt_size; >> int ret; >> const char *ops[] = { >> RTE_MBUF_DEFAULT_MEMPOOL_OPS, "ring_mp_mc", NULL, >> }; >> const char **op; >> >> if (RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) != priv_size) { >> RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n", >> priv_size); >> rte_errno = EINVAL; >> return NULL; >> } >> elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size + >> (unsigned)data_room_size; >> mbp_priv.mbuf_data_room_size = data_room_size; >> mbp_priv.mbuf_priv_size = priv_size; >> >> for (op = &ops[0]; *op != NULL; op++) { >> mp = rte_mempool_create_empty(name, n, elt_size, cache_size, >> sizeof(struct rte_pktmbuf_pool_private), socket_id, 0); >> if (mp == NULL) >> return NULL; >> >> ret = rte_mempool_set_ops_byname(mp, *op, NULL); >> if (ret != 0) { >> RTE_LOG(ERR, MBUF, "error setting mempool handler\n"); >> rte_mempool_free(mp); >> if (ret == -ENOTSUP) >> continue; >> rte_errno = -ret; >> return NULL; >> } >> rte_pktmbuf_pool_init(mp, &mbp_priv); >> >> ret = rte_mempool_populate_default(mp); >> if (ret < 0) { >> rte_mempool_free(mp); >> if (ret == -ENOTSUP) >> continue; >> rte_errno = -ret; >> return NULL; >> } >> } >> >> rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL); >> >> return mp; >> } >> >> > [Hemant] This look fine to me. Please submit a patch for the same. > >>>>> I've just submitted an RFC, which I think is quite linked: >>>>> http://dpdk.org/ml/archives/dev/2016-September/046974.html >>>>> Assuming a new parameter "mempool_ops" is added to >>>>> rte_pktmbuf_pool_create(), would it make sense to fallback to >>>>> "ring_mp_mc"? What about just returning ENOTSUP? The application >>>>> could do the job and decide which sw fallback to use. >>>> >>>> We ran into this issue when trying to run the standard DPDK examples >>>> (l3fwd) in VM. Do you think, is it practical to add fallback handling >>>> in each of the DPDK examples? >> >> OK. What is still unclear for me, is how the software is aware of the >> different >> hardware-assisted handlers. Moreover, we could imagine more software >> handlers, which could be used depending on the use case. >> >> I think this choice has to be made by the user or the application: >> >> - the application may want to use a specific (sw or hw) handler: in >> this case, it want to be notified if it fails, instead of having >> a quiet fallback to ring_mp_mc >> - if several handlers are available, the application may want to >> try them in a specific order >> - maybe some handlers will have some limitations with some >> configurations or driver? The application could decide to use >> a different handler according the configuration. >> >> So that's why I think this is an application decision. >> > [Hemant] We should simplify it: if the application has supplied the > handler, it is application's responsibility to take care of failure. Only if > the application want to use the default handler, the implementation can > fallback. The fallback handler can again be configurable. Honestly, I'm not very convinced that having a quiet fallback is the proper solution: the application won't be notified that the fallback occured. I understand your patch solves your use-case, but my fear is that we just integrate this minimal patch and never do the harder work of solving all the use-cases. I think we need to give the tools to the applications to control what occurs because we also need to solve these issues: - you have a hardware handler, but you want to use the software handler - you have several software handlers, and you (as an administrator) or the application knows which one is the best to use in your case. An alternative (which I don't like either) to solve your issue without modifying the mbuf code is to have your own handler fallbacking to ring_mp_mc. >>>> Typically when someone is writing a application on host, he need not >>>> worry non-availability of the hw offloaded mempool. He may also want >>>> to run the same binary in virtual machine. In VM, it is not >>>> guaranteed that hw offloaded mempools will be available. >> >> Running the same binary is of course a need. But if your VM does not provide >> the same virtualized hardware than the host, I think the command line option >> makes sense. >> >> AFAIU, on the host, you can use a hw mempool handler or a sw one, and on the >> guest, only a sw one. So you need an option to select the behavior you want >> on >> the host, without recompiling. >> >> I understand that modifying all the applications is not a good option >> either. I'm >> thinking a a EAL parameter that would allow to configure a library (mbuf in >> this >> case). Something like kernel boot options. >> Example: testpmd -l 2,4 --opt mbuf.handler="ring_mp_mc" -- [app args] >> >> I don't know if this is feasible, this has to be discussed first, but what >> do you >> think on the principle? The value could also be an ordered list if we want a >> fallback, and this option could be overriden by the application before >> creating >> the mbuf pool. There was some discussions about a kind of dpdk global >> configuration some time ago. >> > [Hemant] I agree that command line option provide a better control in this > case. > On the flipside, We need to be careful that we do not end up having too many > command line options. Yes, we should avoid having too many command line arguments. I think we should go in the direction of having a sort of key/value database in EAL. It could be set by: - a generic command line argument - a config file (maybe later) - the application through an API Then, it would be used by: - dpdk libraries - the application (maybe) This has been discussed some times, the latest was probably: http://dpdk.org/ml/archives/dev/2016-June/040079.html I think it would be a good tool for dpdk configurability, and it would help to remove some compile-time options and maybe some eal options. Unfortunately, I don't have the time to work on this at the moment. Regards, Olivier