On 3/10/2016 3:33 AM, Ananyev, Konstantin wrote:
>
>>>>>>>>>> On 3/8/2016 4:54 PM, Panu Matilainen wrote:
>>>>>>>>>>> On 03/04/2016 12:05 PM, Jianfeng Tan wrote:
>>>>>>>>>>>> This patch adds an option, --avail-cores, to use the lcores that are
>>>>>>>>>>>> available, by calling pthread_getaffinity_np() to narrow down the
>>>>>>>>>>>> detected cores before parsing the coremask (-c), corelist (-l), and
>>>>>>>>>>>> coremap (--lcores).
>>>>>>>>>>>>
>>>>>>>>>>>> Test example:
>>>>>>>>>>>> $ taskset 0xc0000 ./examples/helloworld/build/helloworld \
>>>>>>>>>>>>       --avail-cores -m 1024
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>
>>>>>>>>>>>> Acked-by: Neil Horman <nhorman at tuxdriver.com>
>>>>>>>>>>> Hmm, to me this sounds like something that should be done always, so
>>>>>>>>>>> there's no need for an option. Or, if there's a chance it might do the
>>>>>>>>>>> wrong thing in some rare circumstance, then perhaps there should be a
>>>>>>>>>>> disabler option instead?
>>>>>>>>>> Thanks for the comments.
>>>>>>>>>>
>>>>>>>>>> Yes, there's a use case that we cannot handle.
>>>>>>>>>>
>>>>>>>>>> If we make it the default, DPDK applications may fail to start when the
>>>>>>>>>> user specifies a core in isolcpus and the parent process (say bash) has
>>>>>>>>>> a cpuset affinity that excludes isolcpus. Originally, DPDK applications
>>>>>>>>>> just blindly do pthread_setaffinity_np(), and it always succeeds because
>>>>>>>>>> they always have root privilege to change any cpu affinity.
>>>>>>>>>>
>>>>>>>>>> Now, if we do the checking in rte_eal_cpu_init(), those lcores will be
>>>>>>>>>> flagged as undetected (in my older implementation), which leads to
>>>>>>>>>> failure. To make it work, we would always have to add "taskset [mask]"
>>>>>>>>>> (or use other ways) before DPDK application command lines.
>>>>>>>>>>
>>>>>>>>>> What do you think?
>>>>>>>>> I still think it sounds like something that should be done by default
>>>>>>>>> and maybe be overridable with some flag, rather than the other way
>>>>>>>>> around. Another alternative might be detecting the cores always, but if
>>>>>>>>> running as root, override but with a warning.
>>>>>>>> For your second solution: only root can set affinity to isolcpus?
>>>>>>>> Your first solution seems like a promising way to me.
>>>>>>>>
>>>>>>>>> But I don't know, just wondering. To look at it from another angle: why
>>>>>>>>> would somebody use this new --avail-cores option, and in what
>>>>>>>>> situation, if things "just work" otherwise anyway?
>>>>>>>> For DPDK applications, the most common way to initialize DPDK is like
>>>>>>>> this: "$dpdk-app [options for DPDK] -- [options for app]", so users need
>>>>>>>> to specify which cores to run on and how many hugepages are used. Suppose
>>>>>>>> we need this dpdk-app to run in a container: users already give that
>>>>>>>> information when they build up the cgroup for it to run inside, so this
>>>>>>>> option (this patch) is to make DPDK smart enough to discover how much
>>>>>>>> resource it is allowed to use. Does that make sense?
>>>>>>> But then, all we need might be just a script that would extract this
>>>>>>> information from the system and form a proper cmdline parameter for DPDK?
>>>>>> Yes, a script will work. Or the application could construct (argc, argv)
>>>>>> to call rte_eal_init(). But as Neil Horman once suggested, a simple
>>>>>> pthread_getaffinity_np() gets all of this done. So is it worth a patch
>>>>>> here?
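As background for the suggestion above, the whole mechanism boils down to one
call: query the affinity mask the process inherited (from taskset, a
container's cpuset, etc.) and treat everything outside it as unavailable. A
minimal standalone sketch of that idea, using only glibc/pthreads and not
taken from the patch itself:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    /*
     * Illustrative only, not the DPDK patch: ask the kernel which CPUs this
     * thread is actually allowed to run on, so cores excluded by taskset or
     * a cgroup cpuset can be filtered out before any coremask parsing.
     */
    int main(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        /* Affinity inherited from the parent (bash, container runtime, ...). */
        if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0) {
            perror("pthread_getaffinity_np");
            return 1;
        }

        for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
            if (CPU_ISSET(cpu, &set))
                printf("cpu %d available\n", cpu);
        }
        return 0;
    }

Run under "taskset 0xf", it reports only cpus 0-3; the patch applies the same
information before the -c/-l/--lcores parsing.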
>>>>> Don't know...
>>>>> Personally I would prefer not to put extra logic inside EAL.
>>>>> For me there are too many different options already.
>>>> Then how about making it the default in rte_eal_cpu_init()? It is already
>>>> known that this will bring trouble to users of isolcpus: they would need
>>>> to add "taskset [mask]" before starting a DPDK app.
>>> As I said - provide a script?
>> Yes. But what I want to say is that such a script is hard to get right if
>> there are different kinds of limitations. (That would rarely happen,
>> though :-) )
> My thought was to keep the DPDK code untouched - i.e. let it still blindly
> call pthread_setaffinity_np() based on the input parameters, and in
> addition provide a script for those who want to run in '--avail-cores'
> mode. The script could do 'taskset -p $$' and then either form a -c
> parameter list for the app, or check the existing -c/-l/--lcores parameters
> and complain if a disallowed pcpu is detected.
> But ok, it might be easier and more convenient to have this logic inside
> EAL than in a separate script.
>
>>> The same might go for the amount of hugepage memory available to the user?
>> Ditto. Limitations like the hugetlbfs quota, cgroup hugetlb, some used by
>> the app itself (more like an artificial argument) ...
>>>>> From the other side, looking at the patch itself:
>>>>> You are updating lcore_count and lcore_config[] based on physical cpu
>>>>> availability, but these days it is not always a one-to-one mapping
>>>>> between an EAL lcore and a physical cpu.
>>>>> Shouldn't that be taken into account?
>>>> I have not seen that problem so far, because this work is done before
>>>> parsing the coremask (-c), corelist (-l), and coremap (--lcores). If a
>>>> core is disabled here, it is as if it were not detected in
>>>> rte_eal_cpu_init(). Or could you please give more hints?
>>> I didn't try the changes, so probably I am missing something.
>>> Let's say the user is allowed to use only cpus 0-3.
>>> If he would type:
>>> --avail-cores --lcores='(1-7)@2',
>>> then only lcores 1-3 would be started.
>>> Again, if the user would specify '2@(1-7)', it would also go undetected
>>> that cpus 4-7 are not available to the user.
>>> Is that so?
>> After reading the code:
>> For the case --lcores='(1-7)@2', lcores 1-7 would be started and bound to
>> pcore 2.
>> For the case --lcores='2@(1-7)', this will fail with "core 4 unavailable".
>>
>> That is because:
>> a. A 1:1 mapping is built up, and an lcore is flagged as detected if its
>> pcore is found in sysfs. (ROLE_RTE, cpuset set, detected is true)
>> b. At the beginning of eal_parse_lcores(), the lcore config is reset.
>> (ROLE_OFF, cpuset is empty, detected is still true)
>> c. The pcore cpuset is then checked by convert_to_cpuset() using the
>> previous "detected" value.
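To make step (c) above concrete: conceptually, every physical cpu named in a
requested cpuset must have been detected as available, otherwise parsing
fails. A self-contained sketch of that logic follows; the names pcpu_detected
and check_cpuset_available are purely illustrative, and DPDK's real
lcore_config[] and convert_to_cpuset() differ:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_PCPU 128 /* illustrative limit, not RTE_MAX_LCORE */

    /* Hypothetical stand-in for the per-cpu "detected" state kept in
     * lcore_config[]; DPDK's real data structures differ. */
    static bool pcpu_detected[MAX_PCPU];

    /* Conceptual version of the check in step (c): every physical cpu named
     * in a requested cpuset must have been detected as available. */
    static int check_cpuset_available(cpu_set_t *req)
    {
        for (int cpu = 0; cpu < MAX_PCPU; cpu++) {
            if (CPU_ISSET(cpu, req) && !pcpu_detected[cpu]) {
                fprintf(stderr, "core %d unavailable\n", cpu);
                return -1;
            }
        }
        return 0;
    }

    int main(void)
    {
        cpu_set_t req;

        /* Pretend only cpus 0-3 were detected (i.e. taskset 0xf). */
        for (int cpu = 0; cpu < 4; cpu++)
            pcpu_detected[cpu] = true;

        /* Request pcpus 1-7, as the '(1-7)' part of --lcores='2@(1-7)' does. */
        CPU_ZERO(&req);
        for (int cpu = 1; cpu <= 7; cpu++)
            CPU_SET(cpu, &req);

        return check_cpuset_available(&req) == 0 ? 0 : 1;
    }

With cpus 0-3 available and pcpus 1-7 requested, this prints "core 4
unavailable", matching the failure described above.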
> Ok, my bad then - I misunderstood the code. Thanks for the explanation.
> So, if I get it right now: first, inside
> lib/librte_eal/common/eal_common_lcore.c, both lcore_count and
> lcore_config[] relate to the pcpus. Then later, in
> lib/librte_eal/common/eal_common_options.c, they are overwritten with the
> lcore information - except lcore_config[].detected, which seems to be kept
> intact. Is that correct?

Yes, exactly. And I really appreciate you raising this question for discussion.
>
>> I have tested it with the patch. The results align with the above analysis.
>> For the case --lcores='(1-7)@2':
>> sudo taskset 0xf ./examples/helloworld/build/helloworld --avail-cores --lcores='(1-7)@2'
>> ...
>> hello from core 2
>> hello from core 3
>> hello from core 4
>> hello from core 5
>> hello from core 6
>> hello from core 7
>> hello from core 1
>>
>> For the case --lcores='2@(1-7)':
>> sudo taskset 0xf ./examples/helloworld/build/helloworld --avail-cores --lcores='2@(1-7)'
>> ...
>> EAL: core 4 unavailable
>> EAL: invalid parameter for --lcores
>> ...
>>
>> One thing may be worth mentioning: should "detected" be maintained in
>> struct lcore_config? Maybe we need to maintain a separate data structure
>> for pcores?
> Yes, it might be good to split the pcpu and lcore information somehow,
> as it is a bit confusing right now.
> But I suppose that is a subject for another patch/discussion.

Yes, just another topic.

Thanks,
Jianfeng

> Konstantin
>
>
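On the idea of splitting the pcpu and lcore information: one possible shape,
purely hypothetical and not matching DPDK's actual definitions, would be to
keep physical-cpu detection/availability separate from per-lcore state:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdbool.h>

    /* Hypothetical sketch of splitting the two kinds of state instead of
     * overloading lcore_config[].detected; field names are illustrative and
     * do not match DPDK's actual definitions. */
    struct pcpu_info {
        bool detected;       /* present in sysfs */
        bool available;      /* allowed by the inherited affinity mask */
        unsigned socket_id;  /* NUMA node this physical cpu belongs to */
    };

    struct lcore_info {
        unsigned role;       /* e.g. RTE vs OFF, set while parsing -c/-l/--lcores */
        cpu_set_t cpuset;    /* physical cpus this lcore is allowed to run on */
    };

With such a split, the coremask/corelist/coremap parsing would consult the
pcpu table for availability and only fill in the lcore table, which is
roughly the separation discussed above as a follow-up topic.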