On 04/02/2020 10:24, Thomas Monjalon wrote:
> RED FLAG
>
> I don't see a lot of reactions, so I summarize the issue.
> We need action TODAY!
>
> API makes think that rte_cryptodev_info_get() cannot return
> a value >= 3 (RTE_CRYPTO_AEAD_LIST_END in 19.11).
> Current 20.02 returns 3 (RTE_CRYPTO_AEAD_CHACHA20_POLY1305).
> The ABI compatibility contract is broken currently.
>
> There are 3 possible outcomes:
>
> a) Change the API comments and backport to 19.11.1
> The details are discussed between Ferruh and me.
> Either put responsibility on API user (with explicit comment),
> or expose ABI extension allowance with a new API max value.
> In both cases, this is breaking the implicit contract of 19.11.0.
> This option can be chosen only if release and ABI maintainers
> vote for it.
>
> b) Revert Chacha-Poly from 20.02-rc2.
>
> c) Add versioned function rte_cryptodev_info_get_v1911()
> which calls rte_cryptodev_info_get() and filters out
> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
> So Chacha-Poly capability would be seen and usable only
> if compiling with DPDK 20.02.
>
Maybe a separate version of rte_cryptodev_get_aead_algo_enum() is also
needed to handle the chacha string differently.

> I hope it is clear what are the actions for everybody:
> - ABI and release maintainers must say yes or no to the proposal (a)

My 2c for a) is No.

> - In the meantime, crypto team must send a patch for the proposal (c)
> - If (a) and (c) are not possible at the end of today, I will take (b)
>
> Note: do not say it is too short for (c), as it was possible to work
> on such solution since the issue was exposed on last Wednesday.
>

Could it be reverted today if necessary and re-added later in the
release cycle? It seems like something modular that should not
invalidate earlier testing.

>
> 03/02/2020 22:07, Thomas Monjalon:
>> 03/02/2020 19:55, Ray Kinsella:
>>> On 03/02/2020 17:34, Thomas Monjalon wrote:
>>>> 03/02/2020 18:09, Thomas Monjalon:
>>>>> 03/02/2020 10:30, Ferruh Yigit:
>>>>>> On 2/2/2020 2:41 PM, Ananyev, Konstantin wrote:
>>>>>>> 02/02/2020 14:05, Thomas Monjalon:
>>>>>>>> 31/01/2020 15:16, Trahe, Fiona:
>>>>>>>>> On 1/30/2020 8:18 PM, Thomas Monjalon wrote:
>>>>>>>>>> If library give higher value than expected by the application,
>>>>>>>>>> if the application uses this value as array index,
>>>>>>>>>> there can be an access out of bounds.
>>>>>>>>>
>>>>>>>>> [Fiona] All asymmetric APIs are experimental so above shouldn't
>>>>>>>>> be a problem.
>>>>>>>>> But for the same issue with sym crypto below, I believe Ferruh's
>>>>>>>>> explanation makes sense and I don't see how there can be an API
>>>>>>>>> breakage.
>>>>>>>>> So if an application hasn't compiled against the new lib it will
>>>>>>>>> be still using the old value which will be within bounds.
>>>>>>>>> If it's picking up the higher new value from the lib it must
>>>>>>>>> have been compiled against the lib so shouldn't have problems.
>>>>>>>>
>>>>>>>> You say there is no ABI issue because the application will be
>>>>>>>> re-compiled for the updated library.
>>>>>>>> Indeed, compilation fixes compatibility issues.
>>>>>>>> But this is not relevant for ABI compatibility.
>>>>>>>> ABI compatibility means we can upgrade the library without recompiling
>>>>>>>> the application and it must work.
>>>>>>>> You think it is a false positive because you assume the application
>>>>>>>> "picks" the new value. I think you miss the case where the new value
>>>>>>>> is returned by a function in the upgraded library.
>>>>>>>>
>>>>>>>>> There are also no structs on the API which contain arrays using this
>>>>>>>>> for sizing, so I don't see an opportunity for an appl to have a
>>>>>>>>> mismatch in memory addresses.
>>>>>>>>
>>>>>>>> Let me demonstrate where the API may "use" the new value
>>>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 and how it impacts the application.
>>>>>>>>
>>>>>>>> Once upon a time a DPDK application counting the number of devices
>>>>>>>> supporting each AEAD algo (in order to find the best supported algo).
>>>>>>>> It is done in an array indexed by algo id:
>>>>>>>>     int aead_dev_count[RTE_CRYPTO_AEAD_LIST_END];
>>>>>>>> The application is compiled with DPDK 19.11,
>>>>>>>> where RTE_CRYPTO_AEAD_LIST_END = 3.
>>>>>>>> So the size of the application array aead_dev_count is 3.
>>>>>>>> This binary is run with DPDK 20.02,
>>>>>>>> where RTE_CRYPTO_AEAD_CHACHA20_POLY1305 = 3.
>>>>>>>> When calling rte_cryptodev_info_get() on a device QAT_GEN3,
>>>>>>>> rte_cryptodev_info.capabilities.sym.aead.algo is set to
>>>>>>>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 (= 3).
>>>>>>>> The application uses this value:
>>>>>>>>     ++ aead_dev_count[info.capabilities.sym.aead.algo];
>>>>>>>> The application is crashing because of out of bound access.
>>>>>>>
>>>>>>> I'd say this is an example of bad written app.
>>>>>>> It probably should check that returned by library value doesn't
>>>>>>> exceed its internal array size.
>>>>>>
>>>>>> +1
>>>>>>
>>>>>> Application should ignore values >= MAX.
>>>>> Of course, blaming the API user is a lot easier than looking at the API.
>>>>> Here the API has RTE_CRYPTO_AEAD_LIST_END which can be understood
>>>>> as the max value for the application.
>>>>> Value ranges are part of the ABI compatibility contract.
>>>>> It seems you expect the application developer to be aware that
>>>>> DPDK could return a higher value, so the application should
>>>>> check every enum values after calling an API. CRAZY.
>>>>>
>>>>> When we decide to announce an ABI compatibility and do some marketing,
>>>>> everyone is OK. But when we need to really make our ABI compatible,
>>>>> I see little or no effort. DISAPPOINTING.
>>>>>
>>>>>> Do you suggest we don't extend any enum or define between ABI
>>>>>> breakage releases
>>>>>> to be sure bad written applications not affected?
>>>>>
>>>>> I suggest we must consider not breaking any assumption made on the API.
>>>>> Here we are breaking the enum range because nothing mentions _LIST_END
>>>>> is not really the absolute end of the enum.
>>>>> The solution is to make the change below in 20.02 + backport in 19.11.1:
>>>>
>>>> Thinking twice, merging such change before 20.11 is breaking the
>>>> ABI assumption based on the API 19.11.0.
>>>> I ask the release maintainers (Luca, Kevin, David and me) and
>>>> the ABI maintainers (Neil and Ray) to vote for a or b solution:
>>>> a) add comment and LIST_MAX as below in 20.02 + 19.11.1
>>>
>>> That would still be an ABI breakage though right.
>>>
>>>> b) wait 20.11 and revert Chacha-Poly from 20.02
>>>
>>> Thanks for analysis above Fiona, Ferruh and all.
>>>
>>> That is a nasty one alright - there is no "good" answer here.
>>> I agree with Ferruh's sentiments overall, we should rethink this API
>>> for 20.11.
>>> Could do without an enumeration?
>>>
>>> There a c) though right.
>>> We could work around the issue by api versioning rte_cryptodev_info_get()
>>> and friends.
>>> So they only support/acknowledge the existence of Chacha-Poly for
>>> applications build against > 20.02.
>>
>> I agree there is a c) as I proposed in another email:
>> http://mails.dpdk.org/archives/dev/2020-February/156919.html
>> "
>> In this case, the proper solution is to implement
>> rte_cryptodev_info_get_v1911() so it filters out
>> RTE_CRYPTO_AEAD_CHACHA20_POLY1305 capability.
>> With this solution, an application compiled with DPDK 19.11 will keep
>> seeing the same range as before, while a 20.02 application could
>> see and use ChachaPoly.
>> "
>>
>>> It would be painful I know.
>>
>> Not so painful in my opinion.
>> Just need to call rte_cryptodev_info_get() from
>> rte_cryptodev_info_get_v1911() and filter the value
>> in the 19.11 range: [0..AES_GCM].
>>
>>> It would also mean that Chacha-Poly would only be available to
>>> those building against >= 20.02.
>>
>> Yes exactly.
>>
>> The addition of comments and LIST_MAX like below are still valid
>> to avoid versioning after 20.11.
>>
>>>>> - _LIST_END
>>>>> + _LIST_END, /* an ABI-compatible version may increase this value */
>>>>> + _LIST_MAX = _LIST_END + 42 /* room for ABI-compatible additions */
>>>>> };
>>>>>
>>>>> Then *_LIST_END values could be ignored by libabigail with such a change.
>>
>> In order to avoid ABI check complaining, the best is to completely
>> remove LIST_END in DPDK 20.11.
>>
>>
>>>>> If such a patch is not done by tomorrow, I will have to revert
>>>>> Chacha-Poly commits before 20.02-rc2, because
>>>>>
>>>>> 1/ LIST_END, without any comment, means "size of range"
>>>>> 2/ we do not blame users for undocumented ABI changes
>>>>> 3/ we respect the ABI compatibility contract
>
>
>
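For reference, the two mitigations discussed in the thread can be sketched
in plain C: the application-side bounds check, and the library-side filter
of proposal (c). This is only an illustration under stated assumptions, not
DPDK code. The enum values mirror the numbers quoted above, but struct
aead_cap, info_get_2002(), info_get_v1911() and count_algos() are
hypothetical stand-ins, and the real capability layout and symbol
versioning mechanics are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins mirroring the values quoted in the thread;
 * these are NOT the real DPDK definitions. */
enum {
    AEAD_AES_CCM = 1,
    AEAD_AES_GCM = 2,
    AEAD_LIST_END_1911 = 3,     /* array size seen by an app built on 19.11 */
    AEAD_CHACHA20_POLY1305 = 3  /* value added in 20.02, same number */
};

/* Hypothetical, heavily simplified capability record. */
struct aead_cap {
    int algo;
};

/* Stand-in for the 20.02 query: reports all three algorithms. */
size_t info_get_2002(struct aead_cap *out, size_t max)
{
    static const int algos[] = {
        AEAD_AES_CCM, AEAD_AES_GCM, AEAD_CHACHA20_POLY1305
    };
    size_t n = 0;

    assert(out != NULL);
    for (size_t i = 0; i < sizeof(algos) / sizeof(algos[0]) && n < max; i++)
        out[n++].algo = algos[i];
    return n;
}

/* Proposal (c): call the new query, then keep only entries in the
 * 19.11 range [0..AES_GCM], compacting the array in place. */
size_t info_get_v1911(struct aead_cap *out, size_t max)
{
    size_t n = info_get_2002(out, max);
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
        if (out[i].algo >= 0 && out[i].algo <= AEAD_AES_GCM)
            out[kept++] = out[i];
    return kept;
}

/* Application side: count algorithms in an array sized at build time,
 * skipping any value that does not fit. Returns the number skipped. */
size_t count_algos(const struct aead_cap *caps, size_t n,
                   int *counts, size_t counts_len)
{
    size_t skipped = 0;

    assert(caps != NULL && counts != NULL);
    for (size_t i = 0; i < n; i++) {
        if (caps[i].algo < 0 || (size_t)caps[i].algo >= counts_len)
            skipped++;              /* unknown to this build: ignore */
        else
            counts[caps[i].algo]++;
    }
    return skipped;
}
```

With a counts array of AEAD_LIST_END_1911 entries, count_algos() skips the
Chacha-Poly entry returned by info_get_2002() instead of writing past the
array, while info_get_v1911() never reports that entry in the first place.
The first approach puts the burden on every application; the second keeps
the 19.11 value range intact for binaries that were not rebuilt.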