On Fri, 21 Feb 2020 09:48:58 -0500 Aaron Conole <acon...@redhat.com> wrote:
> David Marchand <david.march...@redhat.com> writes:
>
> > On Fri, Feb 21, 2020 at 9:19 AM Song, Keesang <keesang.s...@amd.com> wrote:
> >
> >>
> >> [AMD Official Use Only - Internal Distribution Only]
> >
> > Please, get this header removed.
> > This is a public mailing list.
> >
> >
> >> Thanks Thomas for bringing this up.
> >> I consider this is not a new feature, but rather a fix to address
> >> the issue with statically assigned maximum lcore limit on
> >> high-density CPU platform such as AMD Epyc.
> >> As I see a lot of DPDK adopters are still using LTS 18.11 & 19.11,
> >> and they have 1~2 yrs of lifetime left, we like to backport this to
> >> LTS 18.11 & 19.11 at least.
> >
> > It is not a fix.
> >
> > The use of static arrays is a design choice that goes back to the
> > early days in dpdk.
> > The addition of --lcores came in after this, but it was introduced for
> > a different use case than placing lcores on any physical core.
> > And there was no claim that a core > RTE_MAX_LCORE would be usable.
> >
> >
> > When testing on a new hardware, it is normal to observe some limitations.
> > Running DPDK on those platforms should be possible: "should be"
> > because I do not have access to this hardware and saw neither tests
> > reports nor performance numbers.
> > Before this patch, the limitation is that on Epyc, cores >
> > RTE_MAX_LCORE are not usable.
> >
> >
> > Now, this change is quite constrained.
> > If we backport it, I don't expect issues in the main dpdk components
> > (based on code review and ovs tests with a RTE_MAX_LCORE set to 16 on
> > a 24 cores system).
> > There might be issues in some examples or not widely used library
> > which uses a physical core id instead of a lcore id.
> >
> >
> > This is the same recurring question "do we allow new features in a
> > stable branch?".
>
> Usually, the answer is 'no'.  But we do allow some "new" things to be
> backported (pci ids, etc) that might be required to enable older
> functionality.  Additionally, I'm sure if some feature were required to
> mitigate a CVE, we'd rather favor backporting it.
>
> I guess we could pose a litmus test:
>
> 1. Is the problem this feature solves so widespread that it needs to
>    be addressed ASAP?
> 2. Is there a known workaround to the problem this is solving?
> 3. How intrusive is the feature?
> 4. Is it shown to be stable in the mainline (number of fixes, testing,
>    etc)?
> 5. Is it constrained enough that we know we can support it with even
>    higher priority than other things?
>
> Probably other questions that will need to be asked.
>
> And even in that list of question, I'm not sure I'd be able to advocate
> backporting this in the upstream branches - it hasn't had much testing.
> It's unstable.  It's "difficult" to use.  It is not widespread that
> people have so many cores.  The workaround is much simpler than
> supporting this (recompile).
>
> >
> > --
> > David Marchand
>

RTE_MAX_LCORE is exposed in the API/ABI to applications. Many applications
use it to size internal data structures. Having rte_lcore_id() potentially
return a larger value would cause out-of-bounds accesses (and crashes) in
those applications.
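
To illustrate the concern, here is a minimal sketch. It is not taken from
DPDK or from any real application: APP_MAX_LCORE is a hypothetical stand-in
for the RTE_MAX_LCORE value an application was compiled against, and
count_packet() is a hypothetical helper that takes the lcore id as a plain
argument instead of calling rte_lcore_id().

```c
#include <stdint.h>

/* Hypothetical stand-in for the RTE_MAX_LCORE value the application
 * was built against. */
#define APP_MAX_LCORE 128u

/* Per-lcore counters sized at compile time, as many applications do. */
static uint64_t rx_packets[APP_MAX_LCORE];

/*
 * Hypothetical helper: count one packet for the given lcore id.
 * Returns 0 on success, -1 if the id does not fit the static table.
 * Without the range check, an id >= APP_MAX_LCORE would write past
 * the end of rx_packets[].
 */
static int count_packet(unsigned int lcore_id)
{
    if (lcore_id >= APP_MAX_LCORE)
        return -1;
    rx_packets[lcore_id]++;
    return 0;
}
```

Existing applications typically index such arrays with no range check at
all, on the assumption that the lcore id is always below the compile-time
limit, which is why a larger returned value would be a problem for them.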