On Wed, May 21, 2025 at 10:28 AM Likhitha Korrapati <likhi...@linux.ibm.com> wrote:
>
> Hi Ian,
>
> On 5/21/25 21:15, Ian Rogers wrote:
> > On Wed, May 21, 2025 at 6:03 AM Likhitha Korrapati
> > <likhi...@linux.ibm.com> wrote:
> >>
> >> Hi Arnaldo,
> >>
> >> On 5/14/25 02:43, Arnaldo Carvalho de Melo wrote:
> >>> On Fri, May 02, 2025 at 01:14:32PM +0530, Mukesh Kumar Chaurasiya wrote:
> >>>> On Fri, Apr 25, 2025 at 02:46:43PM -0300, Arnaldo Carvalho de Melo wrote:
> >>>>> Maybe that max() call in perf_cpu_map__intersect() somehow makes the
> >>>>> compiler happy.
> >>>
> >>>>> And in perf_cpu_map__alloc() all calls seems to validate it.
> >>>
> >>>>> Like:
> >>>
> >>>>> +++ b/tools/lib/perf/cpumap.c
> >>>>> @@ -411,7 +411,7 @@ int perf_cpu_map__merge(struct perf_cpu_map **orig,
> >>>>> struct perf_cpu_map *other)
> >>>>>         }
> >>>>>
> >>>>>         tmp_len = __perf_cpu_map__nr(*orig) +
> >>>>> __perf_cpu_map__nr(other);
> >>>>> -       tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
> >>>>> +       tmp_cpus = calloc(tmp_len, sizeof(struct perf_cpu));
> >>>>>         if (!tmp_cpus)
> >>>>>                 return -ENOMEM;
> >>>
> >>>>> ⬢ [acme@toolbx perf-tools-next]$
> >>>
> >>>>> And better, do the max size that the compiler is trying to help us
> >>>>> catch?
> >>>
> >>>> Isn't it better to use perf_cpu_map__nr. That should fix this problem.
> >>>
> >>> Maybe, have you tried it?
> >>
> >> I have tried this method and it works.
> >>
> >> --- a/tools/lib/perf/cpumap.c
> >> +++ b/tools/lib/perf/cpumap.c
> >> @@ -410,7 +410,7 @@ int perf_cpu_map__merge(struct perf_cpu_map **orig,
> >> struct perf_cpu_map *other)
> >>                 return 0;
> >>         }
> >>
> >> -       tmp_len = max(__perf_cpu_map__nr(*orig),
> >> __perf_cpu_map__nr(other));
> >> +       tmp_len = perf_cpu_map__nr(*orig) + perf_cpu_map__nr(other);
> >>         tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
> >>         if (!tmp_cpus)
> >>                 return -ENOMEM;
> >>
> >> I will send a V2 with this change if this looks good.
> >
> > How is this different from the existing code:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/cpumap.c?h=perf-tools-next#n423
> > ```
> >         tmp_len = __perf_cpu_map__nr(*orig) + __perf_cpu_map__nr(other);
> >         tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
> >         if (!tmp_cpus)
> >                 return -ENOMEM;
> > ```
> >
> > Thanks,
> > Ian
>
> I gave the wrong diff. Here is the corrected diff.
>
> --- a/tools/lib/perf/cpumap.c
> +++ b/tools/lib/perf/cpumap.c
> @@ -410,7 +410,7 @@ int perf_cpu_map__merge(struct perf_cpu_map **orig,
> struct perf_cpu_map *other)
>                 return 0;
>         }
>
> -       tmp_len = __perf_cpu_map__nr(*orig) + __perf_cpu_map__nr(other);
> +       tmp_len = perf_cpu_map__nr(*orig) + perf_cpu_map__nr(other);
>         tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
>         if (!tmp_cpus)
>                 return -ENOMEM;
>
>
> I am using perf_cpu_map__nr instead of __perf_cpu_map__nr.

Ok, why is that a fix? The function declarations are nearly identical
and perf_cpu_map__nr is implemented in terms of __perf_cpu_map__nr:
```
static int __perf_cpu_map__nr(const struct perf_cpu_map *cpus)
{
        return RC_CHK_ACCESS(cpus)->nr;
}

int perf_cpu_map__nr(const struct perf_cpu_map *cpus)
{
        return cpus ? __perf_cpu_map__nr(cpus) : 1;
}
```
My guess is that being static allows all of the code to be analyzed in
the compilation unit, which produces the warning/error; your change is
just defeating the analysis. The analysis could easily kick in again
with Link Time Optimization. I'd prefer making these `__nr` functions
return `unsigned int` or `size_t` over changes like this.

Thanks,
Ian

> Thanks,
> Likhitha.
>
> >
> >> Thanks
> >> Likhitha.
> >>
> >>>
> >>>> One question I have, in perf_cpu_map__nr, the function is returning
> >>>> 1 in case *cpus is NULL. Is it ok to do that? wouldn't it cause problems?
> >>>
> >>> Indeed this better be documented, as by just looking at:
> >>>
> >>> int perf_cpu_map__nr(const struct perf_cpu_map *cpus)
> >>> {
> >>>         return cpus ? __perf_cpu_map__nr(cpus) : 1;
> >>> }
> >>>
> >>> It really doesn't make much sense to say that a NULL map has one entry.
> >>>
> >>> But the next functions are:
> >>>
> >>> bool perf_cpu_map__has_any_cpu_or_is_empty(const struct perf_cpu_map *map)
> >>> {
> >>>         return map ? __perf_cpu_map__cpu(map, 0).cpu == -1 : true;
> >>> }
> >>>
> >>> bool perf_cpu_map__is_any_cpu_or_is_empty(const struct perf_cpu_map *map)
> >>> {
> >>>         if (!map)
> >>>                 return true;
> >>>
> >>>         return __perf_cpu_map__nr(map) == 1 && __perf_cpu_map__cpu(map,
> >>> 0).cpu == -1;
> >>> }
> >>>
> >>> bool perf_cpu_map__is_empty(const struct perf_cpu_map *map)
> >>> {
> >>>         return map == NULL;
> >>> }
> >>>
> >>> So it seems that a NULL cpu map means "any/all CPU" and a map with just
> >>> one entry would have as its content "-1" that would mean "any/all CPU".
> >>>
> >>> Ian did work on trying to simplify/clarify this, so maybe he can chime
> >>> in :-)
> >>>
> >>> - Arnaldo
> >>
> >
>
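
For illustration, here is a rough sketch of the "return `size_t` from the
`__nr` helpers" idea raised in the reply above. It is not the libperf code:
`struct cpu`, `struct cpu_map`, `cpu_map_nr()` and `alloc_merge_buf()` are
hypothetical stand-ins, and it assumes the diagnostic comes from a signed
count that the compiler thinks could be negative before it is converted to
an allocation size. With an unsigned count and an explicit overflow check,
the size passed to the allocator can no longer be derived from a negative
`int`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the libperf types; not the real definitions. */
struct cpu {
	int cpu;
};

struct cpu_map {
	size_t nr;		/* number of entries, cannot be negative */
	struct cpu map[];	/* flexible array of CPU ids */
};

/* Count helper returning size_t instead of int. */
static size_t cpu_map_nr(const struct cpu_map *cpus)
{
	return cpus->nr;
}

/*
 * Allocate scratch space large enough to hold the entries of both maps.
 * Returns 0 and fills *out and *out_len on success, -ENOMEM if the sum
 * would wrap or the allocation fails.
 */
static int alloc_merge_buf(const struct cpu_map *a, const struct cpu_map *b,
			   struct cpu **out, size_t *out_len)
{
	size_t na = cpu_map_nr(a), nb = cpu_map_nr(b);
	size_t len;

	/* Reject sums that would wrap around size_t. */
	if (na > SIZE_MAX - nb)
		return -ENOMEM;
	len = na + nb;

	/* Nothing to allocate; avoid implementation-defined calloc(0, ...). */
	if (len == 0) {
		*out = NULL;
		*out_len = 0;
		return 0;
	}

	/* calloc() is expected to fail rather than wrap on len * size. */
	*out = calloc(len, sizeof(struct cpu));
	if (!*out)
		return -ENOMEM;

	*out_len = len;
	return 0;
}
```

Whether the real helpers should return `unsigned int` or `size_t`, and
whether `calloc()` is preferable to `malloc()` here, is a design choice for
the actual patch; the sketch only shows that the signed-to-unsigned
conversion disappears once the count type is unsigned.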