Thanks for reviewing this patch Ira,
Ira Weiny <ira.we...@intel.com> writes: > On Thu, Sep 10, 2020 at 02:52:12PM +0530, Vaibhav Jain wrote: >> A warning is reported by the kernel in case perf_stats_show() returns >> an error code. The warning is of the form below: >> >> papr_scm ibm,persistent-memory:ibm,pmemory@44100001: >> Failed to query performance stats, Err:-10 >> dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count >> fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count >> >> On investigation it looks like that the compiler is silently truncating the >> return value of drc_pmem_query_stats() from 'long' to 'int', since the >> variable used to store the return code 'rc' is an 'int'. This >> truncated value is then returned back as a 'ssize_t' back from >> perf_stats_show() to 'dev_attr_show()' which thinks of it as a large >> unsigned number and triggers this warning.. >> >> To fix this we update the type of variable 'rc' from 'int' to >> 'ssize_t' that prevents the compiler from truncating the return value >> of drc_pmem_query_stats() and returning correct signed value back from >> perf_stats_show(). >> >> Fixes: 2d02bf835e573 ('powerpc/papr_scm: Fetch nvdimm performance >> stats from PHYP') >> Signed-off-by: Vaibhav Jain <vaib...@linux.ibm.com> >> --- >> arch/powerpc/platforms/pseries/papr_scm.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c >> b/arch/powerpc/platforms/pseries/papr_scm.c >> index a88a707a608aa..9f00b61676ab9 100644 >> --- a/arch/powerpc/platforms/pseries/papr_scm.c >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c >> @@ -785,7 +785,8 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor >> *nd_desc, >> static ssize_t perf_stats_show(struct device *dev, >> struct device_attribute *attr, char *buf) >> { >> - int index, rc; >> + int index; >> + ssize_t rc; > > I'm not sure this is really fixing everything here. The issue is with the statement in perf_stats_show(): 'return rc ? rc : seq_buf_used(&s);' The function seq_buf_used() returns an 'unsigned int' and with 'rc' typed as 'int', forces a promotion of the expression to 'unsigned int' which causes a loss of signedness of 'rc' and compiler silently assigns this unsigned value to the function return typed as 'signed long'. Making 'rc', a 'signed long' forces a promotion of the expresion to 'signed long' which preserves the signedness of 'rc' and will also be compatible with the function return type. > > drc_pmem_query_stats() can return negative errno's. Why are those not checked > somewhere in perf_stats_show()? > For the specific invocation 'drc_pmem_query_stats(p, stats, 0)' we only expect return value 'rc <=0' with '0' indicating a successful fetch of nvdimm performance stats from hypervisor. Hence there are no explicit checks for negative error codes in the functions as all return values !=0 indicate an error. > It seems like all this fix is handling is a > 0 return value: 'ret[0]' from > line 289 in papr_scm.c... Or something? No, in case the arg 'num_stats' is '0' and 'buff_stats != NULL' the variable 'size' is assigned a non-zero value hence that specific branch you mentioned is never taken. Instead in case of success drc_pmem_query_stats() return '0' and in case of an error a negative error code is returned. > > Worse yet drc_pmem_query_stats() is returning ssize_t which is a signed value. > Therefore, it should not be returning -errno. I'm surprised the static > checkers did not catch that. Didnt quite get the assertion here. The function is marked to return 'ssize_t' because we can return both +ve for success and -ve values to indicate errors. > > I believe I caught similar errors with a patch series before which did not pay > attention to variable types. > > Please audit this code for these types of errors and ensure you are really > doing the correct thing when using the sysfs interface. I'm pretty sure bad > things will eventually happen (if they are not already) if you return some > really big number to the sysfs core from *_show(). I think this problem is different compared to what you had previously pointed to. The values returned from drc_pmem_query_stats() can be stored in an 'int' variable too, however it was the silent promotion of a signed type to unsigned type was what caused this specific issue. -- Cheers ~ Vaibhav