from:"ira . weiny"

Re: [PATCH] infiniband-diags/saquery.c: switchinfo support added

2013-02-24 Thread Ira Weiny

On Sun, 24 Feb 2013 10:22:47 +0200
"Husam Kahalah"  wrote:

> Thank you for your time, I have the following notes:
> I plan to add support for more records other than switchInfoRecord like 
> smInfo, multipath, nodeInfo

That would be great!

> I plan also to add component filter to those records attributes in the same 
> way that pathInfo and MCR were implemented, this is the reason why I started 
> adding attributes to query_params structure 
> I'am not talking about RID(record identifier) attributes only, but also 
> about other attributes that may differ between records.
> If it is available to add to query_params structure in that way let me know 
> , otherwise I will add just RID attributes in the suggested way.
> 

I am not against adding fields to query_params.  However, we have a standard 
way to specify the RID components like LID which should be followed.

If you need new components to be specified for new attributes it is fine to add 
them to query_params.  That said, any components which already have options 
should be overloaded; for example MultiPathRecord.SL should use the --sl option.

Does this make sense?

Thanks,
Ira

> 
> -----Original Message-
> From: Ira Weiny 
> To: Ira Weiny 
> Cc: "Husam Kahalah" , linux-r...@vger.kernel.org, 
> linux-kernel@vger.kernel.org
> Date: Thu, 21 Feb 2013 10:33:44 -0800
> Subject: Re: [PATCH] infiniband-diags/saquery.c: switchinfo support added
> 
> On Thu, 21 Feb 2013 10:09:00 -0800
> Ira Weiny  wrote:
> 
> [snip]
> 
> > > + ib_net16_t swir_lid;
> >
> > We don't need this field or lid option.  Other records simply have an 
> optional parameter to specify the lid.
> >
> > The usage should be:
> >
> > SwitchRecord [lid]
> >
> 
> Sorry I meant "SwitchInfoRecord" or SWIR.
> 
> Ira
> 
> [snip]
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] infiniband-diags/saquery.c: switchinfo support added

2013-02-28 Thread Ira Weiny

Your mailer line wrapped the patch.  Could you resend.

Also, could you add this to doc/rst/saquery.8.in.rst

Thanks,
Ira

On Thu, 28 Feb 2013 08:11:38 +0200
"Husam Kahalah"  wrote:

> From d61f2580b829fa2fc56aea15bd98ef3a607d9aba Mon Sep 17 00:00:00 2001
> 
> From: Husam kahalah 
> Date: Mon, 25 Feb 2013 10:53:25 +0200
> Subject: [PATCH]  infiniband-diags/saquery.c: switchinfo support added
> 
> ---
>  src/saquery.c |   55 
> +++
>  1 files changed, 55 insertions(+), 0 deletions(-)
> 
> diff --git a/src/saquery.c b/src/saquery.c
> index d31d77d..6390bcd 100644
> --- a/src/saquery.c
> +++ b/src/saquery.c
> @@ -481,6 +481,41 @@ static void dump_service_record(void *data)
>  cl_ntoh64(p_sr->service_data64[1]));
>  }
> 
> +static void dump_switch_info_record(void *data)
> +{
> +  ib_switch_info_record_t *p_sir = data;
> +
> +  printf("SwitchInfoRecord dump:\n"
> +"\t\tRID\n"
> +"\t\tlid.%u\n"
> +"\t\tSwitchInfo dump:\n"
> +"\t\tlin_cap.0x%X\n"
> +"\t\trand_cap0x%X\n"
> +"\t\tmcast_cap...0x%X\n"
> +"\t\tlin_top.0x%X\n"
> +"\t\tdef_port%u\n"
> +"\t\tdef_mcast_pri_port..%u\n"
> +"\t\tdef_mcast_not_port..%u\n"
> +"\t\tlife_state..%u\n"
> +"\t\tlids_per_port...0x%X\n"
> +"\t\tenforce_cap.0x%X\n"
> +"\t\tflags...%u\n"
> +"\t\tmcast_top...0x%X\n",
> +cl_ntoh16(p_sir->lid),
> +cl_ntoh16(p_sir->switch_info.lin_cap),
> +cl_ntoh16(p_sir->switch_info.rand_cap),
> +cl_ntoh16(p_sir->switch_info.mcast_cap),
> +cl_ntoh16(p_sir->switch_info.lin_top),
> +p_sir->switch_info.def_port,
> +p_sir->switch_info.def_mcast_pri_port,
> +p_sir->switch_info.def_mcast_not_port,
> +p_sir->switch_info.life_state,
> +cl_ntoh16(p_sir->switch_info.lids_per_port),
> +cl_ntoh16(p_sir->switch_info.enforce_cap),
> +p_sir->switch_info.flags,
> +cl_ntoh16(p_sir->switch_info.mcast_top));
> +}
> +
>  static void dump_inform_info_record(void *data)
>  {
>   char gid_str[INET6_ADDRSTRLEN];
> @@ -1150,6 +1185,24 @@ static int query_service_records(const struct 
> query_cmd *q, struct sa_handle * h
>dump_service_record);
>  }
> 
> +static int query_switchinfo_records(const struct query_cmd *q,
> + struct sa_handle * h, struct query_params *p,
> + int argc, char *argv[])
> +{
> +  ib_switch_info_record_t swir;
> + ib_net64_t comp_mask = 0;
> + int lid = 0;
> +
> + if (argc > 0)
> +  parse_lid_and_ports(h, argv[0], &lid, NULL, NULL);
> +
> +  memset(&swir, 0, sizeof(swir));
> +  CHECK_AND_SET_VAL(lid, 16, 0, swir.lid, SWIR, LID);
> +
> + return get_and_dump_any_records(h, IB_SA_ATTR_SWITCHINFORECORD, 0, 
> comp_mask,
> + &swir, sizeof(swir), dump_switch_info_record);
> +}
> +
>  static int query_inform_info_records(const struct query_cmd *q,
>   struct sa_handle * h, struct query_params *p,
>   int argc, char *argv[])
> @@ -1342,6 +1395,8 @@ static const struct query_cmd query_cmds[] = {
>"[[mlid]/[position]/[block]]", query_mft_records},
>{"GUIDInfoRecord", "GIR", IB_SA_ATTR_GUIDINFORECORD,
>"[[lid]/[block]]", query_guidinfo_records},
> +  {"SwitchInfoRecord", "SWIR", IB_SA_ATTR_SWITCHINFORECORD,
> +  "[lid]", query_switchinfo_records},
>{0}
>  };
> 
> -- 
> 1.7.1


-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] infiniband-diags/saquery.c: switchinfo support added

2013-02-21 Thread Ira Weiny

tic int process_opt(void *context, int ch, char 
> *optarg)
>   case 'X':
>p->proxy_join = strtoul(optarg, NULL, 0);
>break;
> + case 22 :
> +  p->swir_lid = (ib_net16_t) strtoul(optarg, NULL, 0);
> + break;

As above no need for the additional option.

>default:
>   return -1;
>   }
> @@ -1584,6 +1641,7 @@ int main(int argc, char **argv)
>{"x", 'x', 0, NULL, "get LinkRecord info"},
>{"c", 'c', 0, NULL, "get the SA's class port info"},
>{"S", 'S', 0, NULL, "get ServiceRecord info"},
> +  {"W", 'W', 0, NULL, "get SwitchInfoRecord"},
>{"I", 'I', 0, NULL, "get InformInfoRecord (subscription) info"},
>{"list", 'D', 0, NULL, "the node desc of the CA's"},
>{"src-to-dst", 1, 1, "", "get a PathRecord for"
> @@ -1633,6 +1691,9 @@ int main(int argc, char **argv)
>{"scope", 21, 1, NULL, "Scope (MCMemberRecord)"},
>{"join_state", 'J', 1, NULL, "Join state (MCMemberRecord)"},
>{"proxy_join", 'X', 1, NULL, "Proxy join (MCMemberRecord)"},
> +  {"swir_lid", 22, 1, NULL,
> +  "switchInfo_lid (SwitchInfoRecord)"
> + " LID of switch port 0"},

As above no need for the additional option.

Thanks,
Ira

>{0}
>};
>  
> -- 
> 1.7.1
>  


-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] infiniband-diags/saquery.c: switchinfo support added

2013-02-21 Thread Ira Weiny

On Thu, 21 Feb 2013 10:09:00 -0800
Ira Weiny  wrote:

[snip]

> > + ib_net16_t swir_lid;
> 
> We don't need this field or lid option.  Other records simply have an 
> optional parameter to specify the lid.
> 
> The usage should be:
> 
> SwitchRecord [lid]
> 

Sorry I meant "SwitchInfoRecord" or SWIR.

Ira

[snip]
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3] ACPI: NFIT: Use cleanup.h helpers instead of devm_*()

2023-12-21 Thread Ira Weiny

Rafael J. Wysocki wrote:
> On Wed, Oct 18, 2023 at 6:28 AM Dan Williams  wrote:
> >
> > Michal Wilczynski wrote:
> > > The new cleanup.h facilities that arrived in v6.5-rc1 can replace the
> > > the usage of devm semantics in acpi_nfit_init_interleave_set(). That
> > > routine appears to only be using devm to avoid goto statements. The
> > > new __free() annotation at variable declaration time can achieve the same
> > > effect more efficiently.
> > >
> > > There is no end user visible side effects of this patch, I was
> > > motivated to send this cleanup to practice using the new helpers.
> > >
> > > Suggested-by: Dave Jiang 
> > > Suggested-by: Andy Shevchenko 
> > > Reviewed-by: Dave Jiang 
> > > Reviewed-by: Andy Shevchenko 
> > > Signed-off-by: Michal Wilczynski 
> > > ---
> > >
> > > Dan, would you like me to give you credit for the changelog changes
> > > with Co-developed-by tag ?
> >
> > Nope, this looks good to me, thanks for fixing it up.
> >
> > Reviewed-by: Dan Williams 
> 
> Are you going to apply it too, or should I take it?

I'm prepping the nvdimm tree for 6.8.  I will take it.

Ira

Re: [PATCH 1/3] nvdimm/btt: fix btt_blk_cleanup() kernel-doc

2023-12-21 Thread Ira Weiny

Randy Dunlap wrote:
> Correct the function parameters to prevent kernel-doc warnings:
> 
> btt.c:1567: warning: Function parameter or member 'nd_region' not described 
> in 'btt_init'
> btt.c:1567: warning: Excess function parameter 'maxlane' description in 
> 'btt_init'
> 
> Signed-off-by: Randy Dunlap 
> Cc: Vishal Verma 
> Cc: Dan Williams 
> Cc: Dave Jiang 
> Cc: Ira Weiny 

Reviewed-by: Ira Weiny 

> Cc: nvd...@lists.linux.dev
> ---
>  drivers/nvdimm/btt.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff -- a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -1550,7 +1550,7 @@ static void btt_blk_cleanup(struct btt *
>   * @rawsize: raw size in bytes of the backing device
>   * @lbasize: lba size of the backing device
>   * @uuid:A uuid for the backing device - this is stored on media
> - * @maxlane: maximum number of parallel requests the device can handle
> + * @nd_region:   &struct nd_region for the REGION device
>   *
>   * Initialize a Block Translation Table on a backing device to provide
>   * single sector power fail atomicity.

Re: [PATCH 2/3] nvdimm/dimm_devs: fix kernel-doc for function params

2023-12-21 Thread Ira Weiny

Randy Dunlap wrote:
> Adjust kernel-doc notation to prevent warnings when using -Wall.
> 
> dimm_devs.c:59: warning: Function parameter or member 'ndd' not described in 
> 'nvdimm_init_nsarea'
> dimm_devs.c:59: warning: Excess function parameter 'nvdimm' description in 
> 'nvdimm_init_nsarea'
> dimm_devs.c:59: warning: No description found for return value of 
> 'nvdimm_init_nsarea'
> dimm_devs.c:728: warning: No description found for return value of 
> 'nd_pmem_max_contiguous_dpa'
> dimm_devs.c:773: warning: No description found for return value of 
> 'nd_pmem_available_dpa'
> dimm_devs.c:844: warning: Function parameter or member 'ndd' not described in 
> 'nvdimm_allocated_dpa'
> dimm_devs.c:844: warning: Excess function parameter 'nvdimm' description in 
> 'nvdimm_allocated_dpa'
> dimm_devs.c:844: warning: No description found for return value of 
> 'nvdimm_allocated_dpa'
> 
> Signed-off-by: Randy Dunlap 
> Cc: Dan Williams 
> Cc: Vishal Verma 
> Cc: Dave Jiang 
> Cc: Ira Weiny 
> Cc: nvd...@lists.linux.dev
> ---
>  drivers/nvdimm/dimm_devs.c |   14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff -- a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
> --- a/drivers/nvdimm/dimm_devs.c
> +++ b/drivers/nvdimm/dimm_devs.c
> @@ -53,7 +53,10 @@ static int validate_dimm(struct nvdimm_d
>  
>  /**
>   * nvdimm_init_nsarea - determine the geometry of a dimm's namespace area
> - * @nvdimm: dimm to initialize
> + * @ndd: dimm to initialize
> + *
> + * Returns: %0 if the area is already valid, -errno on error,

This is good...

> ...otherwise an
> + * ND_CMD_* status code.

I don't see where any of the ->ndctl() calls return an ND_CMD_* status
code.  What am I missing?

The rest looks good,
Ira

>   */
>  int nvdimm_init_nsarea(struct nvdimm_drvdata *ndd)
>  {
> @@ -722,6 +725,9 @@ static unsigned long dpa_align(struct nd
>   *  contiguous unallocated dpa range.
>   * @nd_region: constrain available space check to this reference region
>   * @nd_mapping: container of dpa-resource-root + labels
> + *
> + * Returns: %0 if there is an alignment error, otherwise the max
> + *   unallocated dpa range
>   */
>  resource_size_t nd_pmem_max_contiguous_dpa(struct nd_region *nd_region,
>  struct nd_mapping *nd_mapping)
> @@ -767,6 +773,8 @@ resource_size_t nd_pmem_max_contiguous_d
>   *
>   * Validate that a PMEM label, if present, aligns with the start of an
>   * interleave set.
> + *
> + * Returns: %0 if there is an alignment error, otherwise the unallocated dpa
>   */
>  resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
> struct nd_mapping *nd_mapping)
> @@ -836,8 +844,10 @@ struct resource *nvdimm_allocate_dpa(str
>  
>  /**
>   * nvdimm_allocated_dpa - sum up the dpa currently allocated to this label_id
> - * @nvdimm: container of dpa-resource-root + labels
> + * @ndd: container of dpa-resource-root + labels
>   * @label_id: dpa resource name of the form pmem-
> + *
> + * Returns: sum of the dpa allocated to the label_id
>   */
>  resource_size_t nvdimm_allocated_dpa(struct nvdimm_drvdata *ndd,
>   struct nd_label_id *label_id)

Re: [PATCH 3/3] nvdimm/namespace: fix kernel-doc for function params

2023-12-21 Thread Ira Weiny

Randy Dunlap wrote:

[snip]

> @@ -1656,8 +1664,10 @@ static int select_pmem_id(struct nd_regi
>  /**
>   * create_namespace_pmem - validate interleave set labelling, retrieve label0
>   * @nd_region: region with mappings to validate
> - * @nspm: target namespace to create
> + * @nd_mapping: container of dpa-resource-root + labels
>   * @nd_label: target pmem namespace label to evaluate
> + *
> + * Returns: the created &struct device on success or -errno on error

NIT: should this be ERR_PTR(-errno) on error?

Generally good to me though.

Reviewed-by: Ira Weiny 

>   */
>  static struct device *create_namespace_pmem(struct nd_region *nd_region,
>   struct nd_mapping *nd_mapping,

Re: [PATCH 2/3] nvdimm/dimm_devs: fix kernel-doc for function params

2023-12-21 Thread Ira Weiny

Ira Weiny wrote:
> Randy Dunlap wrote:

[snip]

> > diff -- a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
> > --- a/drivers/nvdimm/dimm_devs.c
> > +++ b/drivers/nvdimm/dimm_devs.c
> > @@ -53,7 +53,10 @@ static int validate_dimm(struct nvdimm_d
> >  
> >  /**
> >   * nvdimm_init_nsarea - determine the geometry of a dimm's namespace area
> > - * @nvdimm: dimm to initialize
> > + * @ndd: dimm to initialize
> > + *
> > + * Returns: %0 if the area is already valid, -errno on error,
> 
> This is good...
> 
> > ...otherwise an
> > + * ND_CMD_* status code.
> 
> I don't see where any of the ->ndctl() calls return an ND_CMD_* status
> code.  What am I missing?

If you agree that this should be dropped I can clean it up as I'm trying
to prep the nvdimm tree now and was hoping to take this series.

Ira

> 
> The rest looks good,
> Ira

Re: [PATCH 3/3] nvdimm/namespace: fix kernel-doc for function params

2023-12-21 Thread Ira Weiny

Randy Dunlap wrote:
> 
> 
> On 12/21/23 14:32, Ira Weiny wrote:
> > Randy Dunlap wrote:
> > 
> > [snip]
> > 
> >> @@ -1656,8 +1664,10 @@ static int select_pmem_id(struct nd_regi
> >>  /**
> >>   * create_namespace_pmem - validate interleave set labelling, retrieve 
> >> label0
> >>   * @nd_region: region with mappings to validate
> >> - * @nspm: target namespace to create
> >> + * @nd_mapping: container of dpa-resource-root + labels
> >>   * @nd_label: target pmem namespace label to evaluate
> >> + *
> >> + * Returns: the created &struct device on success or -errno on error
> > 
> > NIT: should this be ERR_PTR(-errno) on error?
> 
> Oh, for sure. Thanks for catching that.

I'll clean this up as well if you are good with me changing patch 2/3 as
well.

Ira

[GIT PULL] NVDIMM/NFIT changes for 6.8

2024-01-10 Thread Ira Weiny

Hi Linus, please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git 
tags/libnvdimm-for-6.8

... to get updates to the nvdimm tree.  They are a mix of bug fixes and updates
to interfaces used by nvdimm.

Updates to interfaces include:
Use the new scope based management
Remove deprecated ida interfaces
Update to sysfs_emit()

Fixup kdoc comments

They have all been in -next more than 6 days with no reported issues.

---

The following changes since commit 610a9b8f49fbcf1100716370d3b5f6f884a2835a:

  Linux 6.7-rc8 (2023-12-31 12:51:25 -0800)

are available in the Git repository at:

  g...@gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm.git 
tags/libnvdimm-for-6.8

for you to fetch changes up to a085a5eb6594a3ebe5c275e9c2c2d341f686c23c:

  acpi/nfit: Use sysfs_emit() for all attributes (2024-01-03 12:21:37 -0800)


libnvdimm updates for v6.8

- updates to deprecated and changed interfaces
- use new cleanup.h features
- use new ida interface
- kdoc fixes


Christophe JAILLET (1):
  nvdimm: Remove usage of the deprecated ida_simple_xx() API

Dan Williams (1):
  acpi/nfit: Use sysfs_emit() for all attributes

Dinghao Liu (1):
  nvdimm-btt: simplify code with the scope based resource management

Michal Wilczynski (1):
  ACPI: NFIT: Use cleanup.h helpers instead of devm_*()

Randy Dunlap (3):
  nvdimm/btt: fix btt_blk_cleanup() kernel-doc
  nvdimm/dimm_devs: fix kernel-doc for function params
  nvdimm/namespace: fix kernel-doc for function params

 drivers/acpi/nfit/core.c| 65 +++--
 drivers/nvdimm/btt.c| 15 --
 drivers/nvdimm/btt_devs.c   |  6 ++--
 drivers/nvdimm/bus.c|  4 +--
 drivers/nvdimm/dax_devs.c   |  4 +--
 drivers/nvdimm/dimm_devs.c  | 17 ---
 drivers/nvdimm/namespace_devs.c | 19 
 drivers/nvdimm/pfn_devs.c   |  4 +--
 8 files changed, 71 insertions(+), 63 deletions(-)

Re: [PATCH v2] acpi/ghes: Prevent sleeping with spinlock held

2024-02-14 Thread Ira Weiny

Jonathan Cameron wrote:
> On Tue, 06 Feb 2024 14:15:32 -0800
> Ira Weiny  wrote:
> 
> > Smatch caught that cxl_cper_post_event() is called with a spinlock held
> > or preemption disabled.[1]  The callback takes the device lock to
> > perform address translation and therefore might sleep.  The record data
> > is released back to BIOS in ghes_clear_estatus() which requires it to be
> > copied for use in the workqueue.
> > 
> > Copy the record to a lockless list and schedule a work item to process
> > the record outside of atomic context.
> > 
> > [1] 
> > https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/
> > 
> > Reported-by: Dan Carpenter 
> > Signed-off-by: Ira Weiny 
> 
> +CC tracing folk for the splat below (the second one as first one is fixed!)
> 
> Lock debugging is slow (on an emulated machine) :(
> Testing with my gitlab.com/jic23/qemu cxl-2024-02-05-draft branch
> which is only one I've put out with the FW first injection patches so far
> 
> For reference without this patch I got a nice spat identifying the original 
> bug.
> So far so good.
> 
> With this patch (and tp_printk in command line and trace points enabled)
> [  193.048229] {1}[Hardware Error]: Hardware error from APEI Generic Hardware 
> Error Source: 1
> [  193.049636] {1}[Hardware Error]: event severity: recoverable
> [  193.050472] {1}[Hardware Error]:  Error 0, type: recoverable
> [  193.051337] {1}[Hardware Error]:   section type: unknown, 
> fbcd0a77-c260-417f-85a9-088b1621eba6
> [  193.052270] {1}[Hardware Error]:   section length: 0x90
> [  193.053402] {1}[Hardware Error]:   : 0090 0007  
> 0d938086  
> [  193.055036] {1}[Hardware Error]:   0010: 000e  0005 
>   
> [  193.058592] {1}[Hardware Error]:   0020: 0180  1626fa24 
> 17b3b158  $.&.X...
> [  193.062289] {1}[Hardware Error]:   0030:    
>   
> [  193.065959] {1}[Hardware Error]:   0040: 07d0  0fc00307 
> 05210300  ..!.
> [  193.069782] {1}[Hardware Error]:   0050: 7269 6d207361 00326d65 
>   ..iras mem2.
> [  193.072760] {1}[Hardware Error]:   0060:    
>   
> [  193.074062] {1}[Hardware Error]:   0070:    
>   
> [  193.075346] {1}[Hardware Error]:   0080:    
>   
> 
> I rebased after this so now we get the smaller print - but that's not really 
> relevant here.
> 
> [  193.086589] cxl_general_media: memdev=mem1 host=:0e:00.0 serial=5 
> log=Warning : time=1707903675590441508 
> uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=128 flags='0x1' handle=0 
> related_handle=0 maint_op_class=0 : dpa=7c0 dpa_flags='0x10' 
> descriptor='UNCORRECTABLE_EVENT|THRESHOLD_EVENT|POISON_LIST_OVERFLOW' 
> type='0x3' transaction_type='0xc0' channel=3 rank=33 device=5 comp_id=69 72 
> 61 73 20 6d 65 6d 32 00 00 00 00 00 00 00 
> validity_flags='CHANNEL|RANK|DEVICE|COMPONENT'
> [  193.087181]
>   
>   
>   
> [  193.087361] =
> [  193.087399] [ BUG: Invalid wait context ]
> [  193.087504] 6.8.0-rc3 #1200 Not tainted
> [  193.087660] -
> [  193.087858] kworker/3:0/31 is trying to lock:
> [  193.087966] c0ce1898 (&port_lock_key){-.-.}-{3:3}, at: 
> pl011_console_write+0x148/0x1c8
> [  193.089754] other info that might help us debug this:
> [  193.089820] context-{5:5}[  193.089900] 8 locks held by kworker/3:0/31:
> [  193.089990]  #0: c0018738 ((wq_completion)events){+.+.}-{0:0}, at: 
> process_one_work+0x154/0x500
> [  193.090439]  #1: 800083793de0 (cxl_cper_work){+.+.}-{0:0}, at: 
> process_one_work+0x154/0x500
> [  193.090718]  #2: 800082883310 (cxl_cper_rw_sem){}-{4:4}, at: 
> cxl_cper_work_fn+0x2c/0xb0
> [  193.091019]  #3: c0a7b1a8 (&dev->mutex){}-{4:4}, at: 
> pci_dev_lock+0x28/0x48
> [  193.091431]  #4: 800082738f18 (tracepoint_iter_lock){}-{2:2}, at: 
> trace_event_buffer_commit+0xd8/0x2c8
> [  193.091772]  #5: 8000826b3ce8 (console_lock){+.+.}-{0:0}, at: 
> vprintk_emit+0x124/0x398
> [  193.092031]  #6: 8000826b3d40 (console_srcu){}-{0:0}, at:

Re: [PATCH v2] acpi/ghes: Prevent sleeping with spinlock held

2024-02-14 Thread Ira Weiny

Ira Weiny wrote:
> Jonathan Cameron wrote:
> > On Tue, 06 Feb 2024 14:15:32 -0800
> > Ira Weiny  wrote:
> > 
> >

[snip]

> > 
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 9ff8a439d674..7ee45f22f56f 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -2957,7 +2957,7 @@ static void output_printk(struct trace_event_buffer 
> > *fbuffer)
> > iter->ent = fbuffer->entry;
> > event_call->event.funcs->trace(iter, 0, event);
> > trace_seq_putc(&iter->seq, 0);
> > -   printk("%s", iter->seq.buffer);
> > +   printk_deferred("%s", iter->seq.buffer);
> > 
> > raw_spin_unlock_irqrestore(&tracepoint_iter_lock, flags);
> >  }
> > 
> > My assumption is similar views will apply here to Peter Zijlstra's comment 
> > on
> > not using printk_deferred.
> > 
> > https://lore.kernel.org/all/20231010141244.gm...@noisy.programming.kicks-ass.net/
> > 
> > Note I also tried the hacks Peter links to from there. They trip issues in 
> > the initial
> > CPER print - so I assume not appropriate here.
> > 
> > So I'm thinking this is a won't fix - wait for the printk rework to land and
> > assume this will be resolved as well?
> > 
> 
> Or could we avoid the situation entirely by using a static call?
> 
> The work queue still needs to be created because of the atomicness of
> ghes_do_proc() but it avoids cxl_cper_rw_sem.
> 
> I think the hardest thing may be writing the commit message to explain all
> this.  :-(
> 

Never mind, I already went down that path.  I was surprised I did not
mention it in this commit message.  I did in V1.  :-(

"A static call was considered but ARM does not select HAVE_STATIC_CALL
and in that case setting the function pointer uses a RW semaphore."
-- 
https://lore.kernel.org/all/20240202-cxl-cper-smatch-v1-1-7a4103c7f...@intel.com/

Should have carried that through.

So should we revert ...

Fixes: 671a794c33c6 ("acpi/ghes: Process CXL Component Events") 

... and wait for the printk fix or just move forward with this patch?

Sorry for the noise,
Ira

Re: [PATCH v2] acpi/ghes: Prevent sleeping with spinlock held

2024-02-14 Thread Ira Weiny

Jonathan Cameron wrote:
> On Wed, 14 Feb 2024 10:23:10 -0500
> Steven Rostedt  wrote:
> 
> > On Wed, 14 Feb 2024 12:11:53 +
> > Jonathan Cameron  wrote:
> > 
> > > So I'm thinking this is a won't fix - wait for the printk rework to land 
> > > and
> > > assume this will be resolved as well?  
> > 
> > That pretty much sums up what I was about to say ;-)
> > 
> > tp_printk is more of a hack and not to be used sparingly. With the right
> > trace events it can hang the machine.
> > 
> > So, you can use your internal patch locally, but I would recommend waiting
> > for the new printk changes to land. I'm really hoping that will be soon!
> > 
> > -- Steve
> 
> Thanks Steve,
> 
> Ira's fix is needed for other valid locking reasons - this was 'just another'
> lock debugging report that came up whilst testing it.
> 
> For this patch (not a potential additional one that we aren't going to do ;)
> 
> Tested-by: Jonathan Cameron 
> Reviewed-by: Jonathan Cameron 

Thanks Jonathan!  I really appreciate the testing,
Ira

Re: [PATCH v2] acpi/ghes: Prevent sleeping with spinlock held

2024-02-14 Thread Ira Weiny

Jonathan Cameron wrote:
> On Wed, 14 Feb 2024 10:23:10 -0500
> Steven Rostedt  wrote:
> 
> > On Wed, 14 Feb 2024 12:11:53 +
> > Jonathan Cameron  wrote:
> > 
> > > So I'm thinking this is a won't fix - wait for the printk rework to land 
> > > and
> > > assume this will be resolved as well?  
> > 
> > That pretty much sums up what I was about to say ;-)
> > 
> > tp_printk is more of a hack and not to be used sparingly. With the right
> > trace events it can hang the machine.
> > 
> > So, you can use your internal patch locally, but I would recommend waiting
> > for the new printk changes to land.

Steven, Do you think that will land in 6.9?

> >
> > I'm really hoping that will be soon!
> > 
> > -- Steve
> 
> Thanks Steve,
> 
> Ira's fix is needed for other valid locking reasons - this was 'just another'
> lock debugging report that came up whilst testing it.
> 
> For this patch (not a potential additional one that we aren't going to do ;)
> 
> Tested-by: Jonathan Cameron 
> Reviewed-by: Jonathan Cameron 

Jonathan,

Again thanks for the testing!  However, Dan and I just discussed this and
he has an uneasy feeling about going forward with this for 6.8 final.

If we revert the following patch I can squash this fix and wait for the
tp_printk() fix to land in 6.9 and resubmit.

Dan here is the patch which backs out the actual bug:

Fixes: 671a794c33c6 ("acpi/ghes: Process CXL Component Events") 

Thanks everyone,
Ira

Re: [PATCH v2] acpi/ghes: Prevent sleeping with spinlock held

2024-02-14 Thread Ira Weiny

Ira Weiny wrote:
> Jonathan Cameron wrote:
> > On Wed, 14 Feb 2024 10:23:10 -0500
> > Steven Rostedt  wrote:
> > 
> > > On Wed, 14 Feb 2024 12:11:53 +
> > > Jonathan Cameron  wrote:
> > > 
> > > > So I'm thinking this is a won't fix - wait for the printk rework to 
> > > > land and
> > > > assume this will be resolved as well?  
> > > 
> > > That pretty much sums up what I was about to say ;-)
> > > 
> > > tp_printk is more of a hack and not to be used sparingly. With the right
> > > trace events it can hang the machine.
> > > 
> > > So, you can use your internal patch locally, but I would recommend waiting
> > > for the new printk changes to land.
> 
> Steven, Do you think that will land in 6.9?
> 
> > >
> > > I'm really hoping that will be soon!
> > > 
> > > -- Steve
> > 
> > Thanks Steve,
> > 
> > Ira's fix is needed for other valid locking reasons - this was 'just 
> > another'
> > lock debugging report that came up whilst testing it.
> > 
> > For this patch (not a potential additional one that we aren't going to do ;)
> > 
> > Tested-by: Jonathan Cameron 
> > Reviewed-by: Jonathan Cameron 
> 
> Jonathan,
> 
> Again thanks for the testing!  However, Dan and I just discussed this and
> he has an uneasy feeling about going forward with this for 6.8 final.
> 
> If we revert the following patch I can squash this fix and wait for the
> tp_printk() fix to land in 6.9 and resubmit.
> 
> Dan here is the patch which backs out the actual bug:
> 
>   Fixes: 671a794c33c6 ("acpi/ghes: Process CXL Component Events") 

Unfortunately this is not the only patch.

We need to revert this too:

Fixes: dc97f6344f20 ("cxl/pci: Register for and process CPER events") 

And then revert ...
Fixes: 671a794c33c6 ("acpi/ghes: Process CXL Component Events") 

... but there is a conflict.

Dan, below is the correct revert patch.  Let me know if you need more.

Ira

commit 807fbe9cac9b190dab83e3ff377a30d18859c8ab
Author: Ira Weiny 
Date:   Wed Feb 14 15:25:24 2024 -0800

Revert "acpi/ghes: Process CXL Component Events"

This reverts commit 671a794c33c6e048ca5cedd5ad6af44d52d5d7e5.

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fe825a432c5b..ab2a82cb1b0b 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -26,7 +26,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -674,52 +673,6 @@ static void ghes_defer_non_standard_event(struct 
acpi_hest_generic_data *gdata,
schedule_work(&entry->work);
 }
 
-/*
- * Only a single callback can be registered for CXL CPER events.
- */
-static DECLARE_RWSEM(cxl_cper_rw_sem);
-static cxl_cper_callback cper_callback;
-
-static void cxl_cper_post_event(enum cxl_event_type event_type,
-   struct cxl_cper_event_rec *rec)
-{
-   if (rec->hdr.length <= sizeof(rec->hdr) ||
-   rec->hdr.length > sizeof(*rec)) {
-   pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
-  rec->hdr.length);
-   return;
-   }
-
-   if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
-   pr_err(FW_WARN "CXL CPER invalid event\n");
-   return;
-   }
-
-   guard(rwsem_read)(&cxl_cper_rw_sem);
-   if (cper_callback)
-   cper_callback(event_type, rec);
-}
-
-int cxl_cper_register_callback(cxl_cper_callback callback)
-{
-   guard(rwsem_write)(&cxl_cper_rw_sem);
-   if (cper_callback)
-   return -EINVAL;
-   cper_callback = callback;
-   return 0;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_register_callback, CXL);
-
-int cxl_cper_unregister_callback(cxl_cper_callback callback)
-{
-   guard(rwsem_write)(&cxl_cper_rw_sem);
-   if (callback != cper_callback)
-   return -EINVAL;
-   cper_callback = NULL;
-   return 0;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_callback, CXL);
-
 static bool ghes_do_proc(struct ghes *ghes,
 const struct acpi_hest_generic_status *estatus)
 {
@@ -754,22 +707,6 @@ static bool ghes_do_proc(struct ghes *ghes,
}
else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
queued = ghes_handle_arm_hw_error(gdata, sev, sync);
-   } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
-   struct cxl_cper_event_rec *rec =
-   acpi_hest_get_payload(gdata);
-
-   cxl_cper_post_event(CXL_CPER_EVENT_GEN_

Re: [PATCH v2] acpi/ghes: Prevent sleeping with spinlock held

2024-02-15 Thread Ira Weiny

Jonathan Cameron wrote:
> On Wed, 14 Feb 2024 17:33:18 -0500
> Steven Rostedt  wrote:
> 
> > On Wed, 14 Feb 2024 14:19:19 -0800
> > Ira Weiny  wrote:
> > 
> > > > > Jonathan Cameron  wrote:
> > > > > 
> > > > > > So I'm thinking this is a won't fix - wait for the printk rework to 
> > > > > > land and
> > > > > > assume this will be resolved as well?  
> > > > > 
> > > > > That pretty much sums up what I was about to say ;-)
> > > > > 
> > > > > tp_printk is more of a hack and not to be used sparingly. With the 
> > > > > right
> > > > > trace events it can hang the machine.
> > > > > 
> > > > > So, you can use your internal patch locally, but I would recommend 
> > > > > waiting
> > > > > for the new printk changes to land.
> > > 
> > > Steven, Do you think that will land in 6.9?
> > >   
> > > > >
> > > > > I'm really hoping that will be soon!
> > > > >   
> > 
> > I may be like Jon Corbet predicting RT will land in mainline if I do.
> > 
> > -- Steve
> > 
> 
> 
> Agreed. Don't wait on printk fixes landing. (Well unless you are sure
> it's the year of the Linux desktop.) Reverting is fine for 6.8
> if you and Dan feel it's unwise to take this forwards (all the distros
> will backport it anyway and 6.8 isn't an LTS so no great rush)
> so fine to just queue it up again for 6.9 with this fix.
> 
> As Steve said, tp_printk is a hack (a very useful one) and
> hopefully no one runs it in production.

OMG...  I did not realize what tp_printk() was exactly.  I should have
looked closer.

Do we have evidence of its use in production?

I would love to not have to revert/respin,
Ira

[GIT PULL] NVDIMM and DAX for 6.10

2024-05-15 Thread Ira Weiny

Hi Linus, Please pull from 

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git 
tags/libnvdimm-for-6.10

... to get code updates for the nvdimm tree.  These have been in
linux-next for a couple of weeks.  The changes include removing duplicate
code and updating the nvdimm tree to the current kernel interfaces such as
using const for struct device_type and changing the platform remove
callback signature.

Thank you,
Ira Weiny

---

The following changes since commit ed30a4a51bb196781c8058073ea720133a65596f:

  Linux 6.9-rc5 (2024-04-21 12:35:54 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git 
tags/libnvdimm-for-6.10

for you to fetch changes up to 41147b006be2174b825a54b0620ecf4cc7ec5c84:

  dax: remove redundant assignment to variable rc (2024-04-25 14:11:11 -0700)


Updates for nvdimm for 6.10

Code cleanups, remove duplicate code, and updates for current
interfaces.


Christoph Hellwig (2):
  nvdimm: remove nd_integrity_init
  nvdimm/btt: always set max_integrity_segments

Colin Ian King (1):
  dax: remove redundant assignment to variable rc

Ricardo B. Marliere (1):
  dax: constify the struct device_type usage

Shivaprasad G Bhat (1):
  powerpc/papr_scm: Move duplicate definitions to common header files

Uwe Kleine-König (1):
  ndtest: Convert to platform remove callback returning void

 MAINTAINERS|  2 +
 arch/powerpc/platforms/pseries/papr_scm.c  | 43 +--
 drivers/dax/bus.c  |  3 +-
 drivers/nvdimm/btt.c   | 12 --
 drivers/nvdimm/core.c  | 30 -
 drivers/nvdimm/nd.h|  1 -
 include/linux/papr_scm.h   | 49 ++
 .../uapi/asm => include/uapi/linux}/papr_pdsm.h|  0
 tools/testing/nvdimm/test/ndtest.c |  7 ++--
 tools/testing/nvdimm/test/ndtest.h | 31 --
 10 files changed, 66 insertions(+), 112 deletions(-)
 create mode 100644 include/linux/papr_scm.h
 rename {arch/powerpc/include/uapi/asm => include/uapi/linux}/papr_pdsm.h (100%)

Re: [PATCH] testing: nvdimm: make struct class structures constant

2023-10-11 Thread Ira Weiny

Greg Kroah-Hartman wrote:
> Now that the driver core allows for struct class to be in read-only
> memory, we should make all 'class' structures declared at build time
> placing them into read-only memory, instead of having to be dynamically
> allocated at runtime.
> 
> Cc: Dan Williams 
> Cc: Vishal Verma 
> Cc: Dave Jiang 
> Cc: Ira Weiny 

Tested-by: Ira Weiny 
Reviewed-by: Ira Weiny 

> Signed-off-by: Greg Kroah-Hartman 
> ---
>  tools/testing/nvdimm/test/ndtest.c | 17 +
>  tools/testing/nvdimm/test/nfit.c   | 14 +++---
>  2 files changed, 16 insertions(+), 15 deletions(-)

[GIT PULL] NVDIMM for 6.7

2023-11-01 Thread Ira Weiny

Hi Linus, please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git 
tags/libnvdimm-for-6.7

... to get updates to the nvdimm tree.  They are a mix of bug fixes and updates
to interfaces used by nvdimm.

Bug fixes include:
Fix a sleep during spinlock in PREEMPT_RT kernels when getting a BTT
lane
Check for kstrdup memory failure in region probe
Use the correct variables in the nvdimm_badblocks_populate() kdoc block

Updates to interfaces include:
Use devm_kstrdup to manage memory better
Allow class structures to be declared at build time
Start using __counted_by to help with bounds checking
Remove use of the deprecated strncpy

They have all been in -next for more than a week with no reported issues.

---

The following changes since commit 6465e260f48790807eef06b583b38ca9789b6072:

  Linux 6.6-rc3 (2023-09-24 14:31:13 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git 
tags/libnvdimm-for-6.7

for you to fetch changes up to 9ea459e477dc09370cdd8ee13b61aefe8cd1f20a:

  libnvdimm: remove kernel-doc warnings: (2023-10-18 09:48:05 -0700)


libnvdimm updates for v6.7

- updates to deprecated and changed interfaces
- bug/kdoc fixes


Chen Ni (1):
  libnvdimm/of_pmem: Use devm_kstrdup instead of kstrdup and check its 
return value

Greg Kroah-Hartman (1):
  testing: nvdimm: make struct class structures constant

Justin Stitt (1):
  dax: refactor deprecated strncpy

Kees Cook (1):
  libnvdimm: Annotate struct nd_region with __counted_by

Tomas Glozar (1):
  nd_btt: Make BTT lanes preemptible

Zhu Wang (1):
  libnvdimm: remove kernel-doc warnings:

 drivers/dax/bus.c  |  2 +-
 drivers/nvdimm/badrange.c  |  4 ++--
 drivers/nvdimm/nd.h|  2 +-
 drivers/nvdimm/of_pmem.c   |  8 +++-
 drivers/nvdimm/region_devs.c   | 10 +-
 tools/testing/nvdimm/test/ndtest.c | 17 +
 tools/testing/nvdimm/test/nfit.c   | 14 +++---
 7 files changed, 32 insertions(+), 25 deletions(-)

Re: [PATCH] nvdimm-btt: fix a potential memleak in btt_freelist_init

2023-12-07 Thread Ira Weiny

Dave Jiang wrote:
> 

[snip]

First off thanks for the patch.  This code seems to have a few things to
clean up.

> 
> On 12/6/23 20:43, Dinghao Liu wrote:
> > When an error happens in btt_freelist_init(), its caller
> > discover_arenas() will directly free arena, which makes
> > arena->freelist allocated in btt_freelist_init() a leaked
> > memory. Fix this by freeing arena->freelist in all error
> > handling paths of btt_freelist_init().
> > 
> > Fixes: 5212e11fde4d ("nd_btt: atomic sector updates")
> > Signed-off-by: Dinghao Liu 
> 
> How about use the new scope based resource management and we can avoid the 
> goto mess altogether?
> https://lwn.net/Articles/934679/
> 

The freelist is returned as part of arena.  I've not traced both paths of
btt_freelist_init() completely but devm_kcalloc() looks like a better
solution here because this memory needs to live past the function scope.

That said, this patch does not completely fix freelist from leaking in the
following error path.

discover_arenas()
btt_freelist_init() -> ok (memory allocated)
btt_rtt_init() -> fail
goto out;
(leak because arena is not yet on btt->arena_list)
OR
btt_maplocks_init() -> fail
goto out;
(leak because arena is not yet on btt->arena_list)

This error could be fixed by adding to arena_list earlier but devm_*()
also takes care of this without having to worry about that logic.

On normal operation all of this memory can be free'ed with the
corresponding devm_kfree() and/or devm_add_action_*() calls if arenas come
and go.  I'm not sure off the top of my head.

In addition, looking at this code.  discover_arenas() could make use of
the scoped based management for struct btt_sb *super!

Dinghao would you be willing to submit a series of 2 or 3 patches to fix
the above issues?

Thanks!
Ira

Re: [PATCH] nvdimm-btt: fix a potential memleak in btt_freelist_init

2023-12-08 Thread Ira Weiny

dinghao.liu@ wrote:
> > Dave Jiang wrote:

[snip]

> > That said, this patch does not completely fix freelist from leaking in the
> > following error path.
> > 
> > discover_arenas()
> > btt_freelist_init() -> ok (memory allocated)
> > btt_rtt_init() -> fail
> > goto out;
> > (leak because arena is not yet on btt->arena_list)
> > OR
> > btt_maplocks_init() -> fail
> > goto out;
> > (leak because arena is not yet on btt->arena_list)
> > 
> 
> Thanks for pointing out this issue! I rechecked discover_arenas() and found
> that btt_rtt_init() may also trigger a memleak for the same reason as
> btt_freelist_init(). Also, I checked another call trace:
> 
> btt_init() -> btt_meta_init() -> btt_maplocks_init()
> 
> I think there is a memleak if btt_maplocks_init() succeeds but an error
> happens in btt_init() after btt_meta_init() (e.g., when btt_blk_init()
> returns an error). Therefore, we may need to fix three functions.

Yea I think we need to trace this code better.  This is why devm_ is nice for
memory allocated for the life of the device.

> 
> > This error could be fixed by adding to arena_list earlier but devm_*()
> > also takes care of this without having to worry about that logic.
> > 
> > On normal operation all of this memory can be free'ed with the
> > corresponding devm_kfree() and/or devm_add_action_*() calls if arenas come
> > and go.  I'm not sure off the top of my head.
> > 
> > In addition, looking at this code.  discover_arenas() could make use of
> > the scoped based management for struct btt_sb *super!
> > 
> > Dinghao would you be willing to submit a series of 2 or 3 patches to fix
> > the above issues?
> > 
> 
> Sure. Currently I plan to send 2 patches as follows:
> 1. Using devm_kcalloc() to replace kcalloc() in btt_freelist_init(), 
>btt_rtt_init(), and btt_maplocks_init(), and removing the corresponding
>kfree in free_arenas(). I checked some uses of devm_kcalloc() and it
>seems that we need not to call devm_kfree(). The memory is automatically
>freed on driver detach, right?

On device put yes.  So if these allocations are scoped to the life of the
device there would be no reason to call devm_kfree() on them at all.  I was not
sure if they got reallocated at some point or not.

> 2. Using the scoped based management for struct btt_sb *super (not a bug,
>but it could improve the code).

Good!

> 
> I'm not quite sure whether my understanding or bug fixing plan is correct.
> If there are any issues, please correct me, thanks!

The above sounds right.
Ira

Re: [PATCH v4 2/3] dax/bus: Introduce guard(device) for device_{lock,unlock} flows

2023-12-12 Thread Ira Weiny

Vishal Verma wrote:
> Introduce a guard(device) macro to lock a 'struct device', and unlock it
> automatically when going out of scope using Scope Based Resource
> Management semantics. A lot of the sysfs attribute writes in
> drivers/dax/bus.c benefit from a cleanup using these, so change these
> where applicable.
> 
> Cc: Joao Martins 
> Suggested-by: Dan Williams 
> Signed-off-by: Vishal Verma 
>

Reviewed-by: Ira Weiny

Re: [PATCH] driver core: Add a guard() definition for the device_lock()

2023-12-13 Thread Ira Weiny

Dan Williams wrote:
> At present there are ~200 usages of device_lock() in the kernel. Some of
> those usages lead to "goto unlock;" patterns which have proven to be
> error prone. Define a "device" guard() definition to allow for those to
> be cleaned up and prevent new ones from appearing.
> 
> Link: 
> http://lore.kernel.org/r/657897453dda8_269bd29...@dwillia2-mobl3.amr.corp.intel.com.notmuch
> Link: 
> http://lore.kernel.org/r/6577b0c2a02df_a04c529...@dwillia2-xfh.jf.intel.com.notmuch
> Cc: Vishal Verma 
> Cc: Ira Weiny 

Reviewed-by: Ira Weiny 

> Cc: Peter Zijlstra 
> Cc: Greg Kroah-Hartman 
> Cc: Andrew Morton 
> Signed-off-by: Dan Williams 
> ---
> Hi Greg,
> 
> I wonder if you might include this change in v6.7-rc to ease some patch
> sets alternately going through my tree and Andrew's tree. Those
> discussions are linked above. Alternately I can can just take it through
> my tree with your ack and the other use case can circle back to it in
> the v6.9 cycle.
> 
> I considered also defining a __free() helper similar to __free(mutex),
> but I think "__free(device)" would be a surprising name for something
> that drops a lock. Also, I like the syntax of guard(device) over
> something like guard(device_lock) since a 'struct device *' is the
> argument, not a lock type, but I'm open to your or Peter's thoughts on
> the naming.
> 
>  include/linux/device.h |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/linux/device.h b/include/linux/device.h
> index d7a72a8749ea..6c83294395ac 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -1007,6 +1007,8 @@ static inline void device_unlock(struct device *dev)
>   mutex_unlock(&dev->mutex);
>  }
>  
> +DEFINE_GUARD(device, struct device *, device_lock(_T), device_unlock(_T))
> +
>  static inline void device_lock_assert(struct device *dev)
>  {
>   lockdep_assert_held(&dev->mutex);
>

Re: [PATCH] [v2] nvdimm-btt: fix several memleaks

2023-12-14 Thread Ira Weiny

Dinghao Liu wrote:
> Resources allocated by kcalloc() in btt_freelist_init(),
> btt_rtt_init(), and btt_maplocks_init() are not correctly
> released in their callers when an error happens. For
> example, when an error happens in btt_freelist_init(), its
> caller discover_arenas() will directly free arena, which makes
> arena->freelist a leaked memory. Fix these memleaks by using
> devm_kcalloc() to make the memory auto-freed on driver detach.

Thanks for this work!

> 
> Fixes: 5212e11fde4d ("nd_btt: atomic sector updates")
> Signed-off-by: Dinghao Liu 
> ---
> 
> Changelog:
> 
> v2: -Use devm_kcalloc() to fix the memleaks.
> -Fix the potential leaked memory in btt_rtt_init()
>  and btt_maplocks_init().
> ---
>  drivers/nvdimm/btt.c | 35 ---
>  1 file changed, 16 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index d5593b0dc700..c55231f42617 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -531,13 +531,13 @@ static int arena_clear_freelist_error(struct arena_info 
> *arena, u32 lane)
>   return ret;
>  }
>  
> -static int btt_freelist_init(struct arena_info *arena)
> +static int btt_freelist_init(struct device *dev, struct arena_info *arena)

Both struct arena_info and struct btt contain references to struct nd_btt
which is the device you are passing down this call stack.

Just use the device in the arena/btt rather than passing a device
structure.  That makes the code easier to read and ensures that the device
associated with this arena or btt is used.

>  {
>   int new, ret;
>   struct log_entry log_new;
>   u32 i, map_entry, log_oldmap, log_newmap;
>  
> - arena->freelist = kcalloc(arena->nfree, sizeof(struct free_entry),
> + arena->freelist = devm_kcalloc(dev, arena->nfree, sizeof(struct 
> free_entry),

... devm_kcalloc(&arena->nd_btt.dev, ...)

>   GFP_KERNEL);
>   if (!arena->freelist)
>   return -ENOMEM;
> @@ -718,20 +718,20 @@ static int log_set_indices(struct arena_info *arena)
>   return 0;
>  }
>  
> -static int btt_rtt_init(struct arena_info *arena)
> +static int btt_rtt_init(struct device *dev, struct arena_info *arena)
>  {
> - arena->rtt = kcalloc(arena->nfree, sizeof(u32), GFP_KERNEL);
> + arena->rtt = devm_kcalloc(dev, arena->nfree, sizeof(u32), GFP_KERNEL);
>   if (arena->rtt == NULL)
>   return -ENOMEM;
>  
>   return 0;
>  }
>  
> -static int btt_maplocks_init(struct arena_info *arena)
> +static int btt_maplocks_init(struct device *dev, struct arena_info *arena)
>  {
>   u32 i;
>  
> - arena->map_locks = kcalloc(arena->nfree, sizeof(struct aligned_lock),
> + arena->map_locks = devm_kcalloc(dev, arena->nfree, sizeof(struct 
> aligned_lock),
>   GFP_KERNEL);
>   if (!arena->map_locks)
>   return -ENOMEM;
> @@ -805,9 +805,6 @@ static void free_arenas(struct btt *btt)
>  
>   list_for_each_entry_safe(arena, next, &btt->arena_list, list) {
>   list_del(&arena->list);
> - kfree(arena->rtt);
> - kfree(arena->map_locks);
> - kfree(arena->freelist);

This does not quite work.

free_arenas() is used in the error paths of create_arenas() and
discover_arenas().  In those cases devm_kfree() is probably a better way
to clean up this.

However...

>   debugfs_remove_recursive(arena->debugfs_dir);
>   kfree(arena);

Why can't arena be allocated with devm_*?

We need to change this up a bit more to handle the error path vs regular
device shutdown free (automatic devm frees).

Ira

Re: [PATCH] [v2] nvdimm-btt: fix several memleaks

2023-12-15 Thread Ira Weiny

Ira Weiny wrote:
> Dinghao Liu wrote:

[snip]

> >  
> > -static int btt_maplocks_init(struct arena_info *arena)
> > +static int btt_maplocks_init(struct device *dev, struct arena_info *arena)
> >  {
> > u32 i;
> >  
> > -   arena->map_locks = kcalloc(arena->nfree, sizeof(struct aligned_lock),
> > +   arena->map_locks = devm_kcalloc(dev, arena->nfree, sizeof(struct 
> > aligned_lock),
> > GFP_KERNEL);
> > if (!arena->map_locks)
> > return -ENOMEM;
> > @@ -805,9 +805,6 @@ static void free_arenas(struct btt *btt)
> >  
> > list_for_each_entry_safe(arena, next, &btt->arena_list, list) {
> > list_del(&arena->list);
> > -   kfree(arena->rtt);
> > -   kfree(arena->map_locks);
> > -   kfree(arena->freelist);
> 
> This does not quite work.
> 
> free_arenas() is used in the error paths of create_arenas() and
> discover_arenas().  In those cases devm_kfree() is probably a better way
> to clean up this.
> 
> However...
> 
> > debugfs_remove_recursive(arena->debugfs_dir);
> > kfree(arena);
> 
> Why can't arena be allocated with devm_*?
> 
> We need to change this up a bit more to handle the error path vs regular
> device shutdown free (automatic devm frees).

We might want to look at using no_free_ptr() in this code.  See this
patch[1] for an example of how to inhibit the cleanup and pass the pointer
on when the function succeeds.

[1] 
https://lore.kernel.org/all/170261791914.1714654.6447680285357545638.st...@dwillia2-xfh.jf.intel.com/

Ira

Re: [PATCH] [v2] nvdimm-btt: fix several memleaks

2023-12-19 Thread Ira Weiny

dinghao.liu@ wrote:
> > Ira Weiny wrote:
> > > Dinghao Liu wrote:
> > 
> > [snip]
> > 
> > -static int btt_freelist_init(struct arena_info *arena)
> > +static int btt_freelist_init(struct device *dev, struct arena_info *arena)
> > 
> > Both struct arena_info and struct btt contain references to struct nd_btt
> > which is the device you are passing down this call stack.
> > 
> > Just use the device in the arena/btt rather than passing a device
> > structure.  That makes the code easier to read and ensures that the device
> > associated with this arena or btt is used.
> 
> Thanks for this suggestion! I will fix this in the v3 patch.
> 
> > [snip]
> > > >  
> > > > -static int btt_maplocks_init(struct arena_info *arena)
> > > > +static int btt_maplocks_init(struct device *dev, struct arena_info 
> > > > *arena)
> > > >  {
> > > > u32 i;
> > > >  
> > > > -   arena->map_locks = kcalloc(arena->nfree, sizeof(struct 
> > > > aligned_lock),
> > > > +   arena->map_locks = devm_kcalloc(dev, arena->nfree, 
> > > > sizeof(struct aligned_lock),
> > > > GFP_KERNEL);
> > > > if (!arena->map_locks)
> > > > return -ENOMEM;
> > > > @@ -805,9 +805,6 @@ static void free_arenas(struct btt *btt)
> > > >  
> > > > list_for_each_entry_safe(arena, next, &btt->arena_list, list) {
> > > > list_del(&arena->list);
> > > > -   kfree(arena->rtt);
> > > > -   kfree(arena->map_locks);
> > > > -   kfree(arena->freelist);
> > > 
> > > This does not quite work.
> > > 
> > > free_arenas() is used in the error paths of create_arenas() and
> > > discover_arenas().  In those cases devm_kfree() is probably a better way
> > > to clean up this.
> 
> Here I'm a little confused about when devm_kfree() should be used.

Over all it should be used whenever memory is allocated for the lifetime
of the device.

> Code in btt_init() implies that resources allocated by devm_* could be
> auto freed in both error and success paths of probe/attach (e.g., btt 
> allocated by devm_kzalloc is never freed by devm_kfree).
> Using devm_kfree() in free_arenas() is ok for me, but I want to know
> whether not using devm_kfree() constitutes a bug.

Unfortunately I'm not familiar enough with this code to know for sure.

After my quick checks before I thought it was.  But each time I look at it
I get confused.  This is why I was thinking maybe not using devm_*() and
using no_free_ptr() may be a better solution to make sure things don't
leak without changing the success path (which is likely working fine
because no bugs have been found.)

> 
> > > 
> > > However...
> > > 
> > > > debugfs_remove_recursive(arena->debugfs_dir);
> > > > kfree(arena);
> > > 
> > > Why can't arena be allocated with devm_*?
> > > 
> > > We need to change this up a bit more to handle the error path vs regular
> > > device shutdown free (automatic devm frees).
> > 
> 
> At first, I think the use of arena is correct. Therefore, allocating arena
> with devm_* should be a code style optimization. However, I rechecked
> discover_arenas and found arena might also be leaked (e.g., if
> alloc_arena() fails in the second loop, the previously allocated
> resources in the loop is leaked). The correct code could be found in
> create_arenas(), where free_arenas is called on failure of
> alloc_arena().
 
Yea I've not reached that level of detail in my analysis.

>
> To fix this issue, I think it's better to change the 'goto out_super'
> tag to 'goto out'. We could also use devm_* for arena to simplify the
> error path in discover_arenas(). 

I think it is your call at this point as I don't have time to dig in more
than I have.  Sorry.

> 
> > We might want to look at using no_free_ptr() in this code.  See this
> > patch[1] for an example of how to inhibit the cleanup and pass the
> > pointer on when the function succeeds.
> > 
> > [1]
> > https://lore.kernel.org/all/170261791914.1714654.6447680285357545638.st...@dwillia2-xfh.jf.intel.com/
> > 
> > Ira
> 
> Thanks for this example. But it seems that no_free_ptr() is used to
> handle the scope based resource management. Changes in this patch does
> not introduce this feature. Do I understand this correctly?

You do understand but I was thinking that perhaps using no_free_ptr()
rather than devm_*() might be an easier way to fix this bug without trying
to decode the lifetime of everything.

Ira

> 
> Regards,
> Dinghao

Re: [PATCH] nvdimm: Fix devs leaks in scan_labels()

2024-06-06 Thread Ira Weiny

Li Zhijian wrote:
> Don't allocate devs again when it's valid pointer which has pionted to
> the memory allocated above with size (count + 2 * sizeof(dev)).
> 
> A kmemleak reports:
> unreferenced object 0x88800dda1980 (size 16):
>   comm "kworker/u10:5", pid 69, jiffies 4294671781
>   hex dump (first 16 bytes):
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>   backtrace (crc 0):
> [] __kmalloc+0x32c/0x470
> [<9ed43c83>] nd_region_register_namespaces+0x6fb/0x1120 
> [libnvdimm]
> [<0e07a65c>] nd_region_probe+0xfe/0x210 [libnvdimm]
> [<7b79ce5f>] nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm]
> [] really_probe+0xc6/0x390
> [<129e2a69>] __driver_probe_device+0x78/0x150
> [<2dfed28b>] driver_probe_device+0x1e/0x90
> [] __device_attach_driver+0x85/0x110
> [<32dca295>] bus_for_each_drv+0x85/0xe0
> [<391c5a7d>] __device_attach+0xbe/0x1e0
> [<26dabec0>] bus_probe_device+0x94/0xb0
> [] device_add+0x656/0x870
> [<3d69bfaa>] nd_async_device_register+0xe/0x50 [libnvdimm]
> [<3f4c52a4>] async_run_entry_fn+0x2e/0x110
> [] process_one_work+0x1ee/0x600
> [<6d90d5a9>] worker_thread+0x183/0x350
> 
> Fixes: 1b40e09a1232 ("libnvdimm: blk labels and namespace instantiation")
> Signed-off-by: Li Zhijian 
> ---
>  drivers/nvdimm/namespace_devs.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> index d6d558f94d6b..56b016dbe307 100644
> --- a/drivers/nvdimm/namespace_devs.c
> +++ b/drivers/nvdimm/namespace_devs.c
> @@ -1994,7 +1994,9 @@ static struct device **scan_labels(struct nd_region 
> *nd_region)
>   /* Publish a zero-sized namespace for userspace to configure. */
>   nd_mapping_free_labels(nd_mapping);
>  
> - devs = kcalloc(2, sizeof(dev), GFP_KERNEL);
> + /* devs probably has been allocated */

I don't think this is where the bug is.  The loop above is processing the
known labels and should exit with a count > 0 if devs is not NULL.

>From what I can tell create_namespace_pmem() must be returning EAGAIN
which leaves devs allocated but fails to increment count.  Thus there are
no valid labels but devs was not free'ed.

Can you trace the error you are seeing a bit more to see if this is the
case?  

Thanks,
Ira

> + if (!devs)
> + devs = kcalloc(2, sizeof(dev), GFP_KERNEL);
>   if (!devs)
>   goto err;
>  
> -- 
> 2.29.2
>

Re: [PATCH RESEND] nvdimm: add missing MODULE_DESCRIPTION() macros

2024-06-10 Thread Ira Weiny

Jeff Johnson wrote:
> Fix the 'make W=1' warnings:
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/libnvdimm.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_pmem.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_btt.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_e820.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/of_pmem.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_virtio.o

Just to double check.  This is a resend of this patch?

https://lore.kernel.org/all/20240526-md-drivers-nvdimm-v1-1-172e682e7...@quicinc.com/

Dave Jiang, I'm picking up all these for the nvdimm tree and I think there
were a couple I was not CC'ed on.  I'll coordinate with you because I'm
still seeing a couple of these warnings on other modules in the test
build.

Also I want to double check all the descriptions before I send for 6.11.

Jeff is it ok if I alter the text?  I know you mentioned to Jonathan you
really just wanted to see the errors go away.

Ira

> 
> Signed-off-by: Jeff Johnson 
> ---
>  drivers/nvdimm/btt.c   | 1 +
>  drivers/nvdimm/core.c  | 1 +
>  drivers/nvdimm/e820.c  | 1 +
>  drivers/nvdimm/nd_virtio.c | 1 +
>  drivers/nvdimm/of_pmem.c   | 1 +
>  drivers/nvdimm/pmem.c  | 1 +
>  6 files changed, 6 insertions(+)
> 
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index 1e5aedaf8c7b..a47acc5d05df 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -1721,6 +1721,7 @@ static void __exit nd_btt_exit(void)
>  
>  MODULE_ALIAS_ND_DEVICE(ND_DEVICE_BTT);
>  MODULE_AUTHOR("Vishal Verma ");
> +MODULE_DESCRIPTION("NVDIMM Block Translation Table");
>  MODULE_LICENSE("GPL v2");
>  module_init(nd_btt_init);
>  module_exit(nd_btt_exit);
> diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
> index 2023a661bbb0..f4b6fb4b9828 100644
> --- a/drivers/nvdimm/core.c
> +++ b/drivers/nvdimm/core.c
> @@ -540,6 +540,7 @@ static __exit void libnvdimm_exit(void)
>   nvdimm_devs_exit();
>  }
>  
> +MODULE_DESCRIPTION("NVDIMM (Non-Volatile Memory Device) core module");
>  MODULE_LICENSE("GPL v2");
>  MODULE_AUTHOR("Intel Corporation");
>  subsys_initcall(libnvdimm_init);
> diff --git a/drivers/nvdimm/e820.c b/drivers/nvdimm/e820.c
> index 4cd18be9d0e9..008b9aae74ff 100644
> --- a/drivers/nvdimm/e820.c
> +++ b/drivers/nvdimm/e820.c
> @@ -69,5 +69,6 @@ static struct platform_driver e820_pmem_driver = {
>  module_platform_driver(e820_pmem_driver);
>  
>  MODULE_ALIAS("platform:e820_pmem*");
> +MODULE_DESCRIPTION("NVDIMM support for e820 type-12 memory");
>  MODULE_LICENSE("GPL v2");
>  MODULE_AUTHOR("Intel Corporation");
> diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> index 1f8c667c6f1e..35c8fbbba10e 100644
> --- a/drivers/nvdimm/nd_virtio.c
> +++ b/drivers/nvdimm/nd_virtio.c
> @@ -123,4 +123,5 @@ int async_pmem_flush(struct nd_region *nd_region, struct 
> bio *bio)
>   return 0;
>  };
>  EXPORT_SYMBOL_GPL(async_pmem_flush);
> +MODULE_DESCRIPTION("Virtio Persistent Memory Driver");
>  MODULE_LICENSE("GPL");
> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> index d3fca0ab6290..5134a8d08bf9 100644
> --- a/drivers/nvdimm/of_pmem.c
> +++ b/drivers/nvdimm/of_pmem.c
> @@ -111,5 +111,6 @@ static struct platform_driver of_pmem_region_driver = {
>  
>  module_platform_driver(of_pmem_region_driver);
>  MODULE_DEVICE_TABLE(of, of_pmem_region_match);
> +MODULE_DESCRIPTION("NVDIMM Device Tree support");
>  MODULE_LICENSE("GPL");
>  MODULE_AUTHOR("IBM Corporation");
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 598fe2e89bda..57cb30f8a3b8 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -768,4 +768,5 @@ static struct nd_device_driver nd_pmem_driver = {
>  module_nd_driver(nd_pmem_driver);
>  
>  MODULE_AUTHOR("Ross Zwisler ");
> +MODULE_DESCRIPTION("NVDIMM Persistent Memory Driver");
>  MODULE_LICENSE("GPL v2");
> 
> ---
> base-commit: 416ff45264d50a983c3c0b99f0da6ee59f9acd68
> change-id: 20240526-md-drivers-nvdimm-121215a4b93f
> -- 
> Jeff Johnson 
>

[PATCH] testing: nvdimm: Add MODULE_DESCRIPTION() macros

2024-06-11 Thread Ira Weiny

When building with W=1 the following errors are seen:

WARNING: modpost: missing MODULE_DESCRIPTION() in 
tools/testing/nvdimm/test/nfit_test.o
WARNING: modpost: missing MODULE_DESCRIPTION() in 
tools/testing/nvdimm/test/ndtest.o

Add the required MODULE_DESCRIPTION() to the test platform device
drivers.

Suggested-by: Jeff Johnson 
Signed-off-by: Ira Weiny 
---
Jeff I'm not seeing a patch to cover these cases for the missing module
descriptions you have been sending out.  If you have an outstanding
patch I missed could you point me to it?  Otherwise I believe this
cleans up the nvdimm tree.
---
 tools/testing/nvdimm/test/ndtest.c | 1 +
 tools/testing/nvdimm/test/nfit.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tools/testing/nvdimm/test/ndtest.c 
b/tools/testing/nvdimm/test/ndtest.c
index b438f3d053ee..892e990c034a 100644
--- a/tools/testing/nvdimm/test/ndtest.c
+++ b/tools/testing/nvdimm/test/ndtest.c
@@ -987,5 +987,6 @@ static __exit void ndtest_exit(void)
 
 module_init(ndtest_init);
 module_exit(ndtest_exit);
+MODULE_DESCRIPTION("Test non-NFIT devices");
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("IBM Corporation");
diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index a61df347a33d..cfd4378e2129 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -3382,5 +3382,6 @@ static __exit void nfit_test_exit(void)
 
 module_init(nfit_test_init);
 module_exit(nfit_test_exit);
+MODULE_DESCRIPTION("Test ACPI NFIT devices");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Intel Corporation");

---
base-commit: 2df0193e62cf887f373995fb8a91068562784adc
change-id: 20240611-nvdimm-test-mod-warn-8cf773360b37

Best regards,
-- 
Ira Weiny

Re: [PATCH] nvdimm: make nd_class constant

2024-06-12 Thread Ira Weiny

Greg Kroah-Hartman wrote:
> On Mon, Jun 10, 2024 at 10:44:42AM -0700, Dan Williams wrote:
> > Greg Kroah-Hartman wrote:
> > > Now that the driver core allows for struct class to be in read-only
> > > memory, we should make all 'class' structures declared at build time
> > > placing them into read-only memory, instead of having to be dynamically
> > > allocated at runtime.
> > 
> > Change looks good to me,
> > 
> > Reviewed-by: Dan Williams 
> > 
> > ...changelog grammar tripped me up though, how about:
> > 
> > "Now that the driver core allows for struct class to be in read-only
> > memory, it is possible to make all 'class' structures be declared at
> > build time. Move the class to a 'static const' declaration and register
> > it rather than dynamically create it."
> 
> That works too, want me to resubmit with this, or can I update it when I
> commit it to my tree?

In that case.

Reviewed-by: Ira Weiny 

> 
> thanks,
> 
> greg "the changelog is the hardest part" k-h

[GIT PULL] NVDIMM and DAX for 6.11

2024-07-19 Thread Ira Weiny

Hi Linus, please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git 
tags/libnvdimm-for-6.11

... to get the 6.11 updates for the libnvdimm tree.  The bulk of the changes
are to clean up W=1 warnings with one small update with the use of sizeof().

These have been linux-next without issue since 6/18.

Thank you,
Ira

---

The following changes since commit c3f38fa61af77b49866b006939479069cd451173:

  Linux 6.10-rc2 (2024-06-02 15:44:56 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git 
tags/libnvdimm-for-6.11

for you to fetch changes up to b0d478e34dbfccb7ce430e20cbe77d4d10593fa3:

  testing: nvdimm: Add MODULE_DESCRIPTION() macros (2024-06-17 18:43:08 -0500)


6.11 updates for libnvdimm

One small cleanup to use sizeof(*pointer)

A series of patches to add MODULE_DESCRIPTIONS() to eliminate make W=1
warnings.


Erick Archer (1):
  nvdimm/btt: use sizeof(*pointer) instead of sizeof(type)

Ira Weiny (1):
  testing: nvdimm: Add MODULE_DESCRIPTION() macros

Jeff Johnson (4):
  ACPI: NFIT: add missing MODULE_DESCRIPTION() macro
  nvdimm: add missing MODULE_DESCRIPTION() macros
  dax: add missing MODULE_DESCRIPTION() macros
  testing: nvdimm: iomap: add MODULE_DESCRIPTION()

 drivers/acpi/nfit/core.c   | 1 +
 drivers/dax/cxl.c  | 1 +
 drivers/dax/device.c   | 1 +
 drivers/dax/hmem/hmem.c| 1 +
 drivers/dax/kmem.c | 1 +
 drivers/dax/pmem.c | 1 +
 drivers/dax/super.c| 1 +
 drivers/nvdimm/btt.c   | 5 +++--
 drivers/nvdimm/core.c  | 1 +
 drivers/nvdimm/e820.c  | 1 +
 drivers/nvdimm/nd_virtio.c | 1 +
 drivers/nvdimm/of_pmem.c   | 1 +
 drivers/nvdimm/pmem.c  | 1 +
 tools/testing/nvdimm/test/iomap.c  | 1 +
 tools/testing/nvdimm/test/ndtest.c | 1 +
 tools/testing/nvdimm/test/nfit.c   | 1 +
 16 files changed, 18 insertions(+), 2 deletions(-)

Re: [PATCH v2 1/2] nvdimm: Fix devs leaks in scan_labels()

2024-07-26 Thread Ira Weiny

Li Zhijian wrote:
> The leakage would happend when create_namespace_pmem() meets an invalid
> label which gets failure in validating isetcookie.
> 
> Try to resuse the devs that may have already been allocated with size
> (2 * sizeof(dev)) previously.
> 
> A kmemleak reports:
> unreferenced object 0x88800dda1980 (size 16):
>   comm "kworker/u10:5", pid 69, jiffies 4294671781
>   hex dump (first 16 bytes):
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>   backtrace (crc 0):
> [<c5dea560>] __kmalloc+0x32c/0x470
> [<9ed43c83>] nd_region_register_namespaces+0x6fb/0x1120 
> [libnvdimm]
> [<0e07a65c>] nd_region_probe+0xfe/0x210 [libnvdimm]
> [<7b79ce5f>] nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm]
> [<a5f3da2e>] really_probe+0xc6/0x390
> [<129e2a69>] __driver_probe_device+0x78/0x150
> [<2dfed28b>] driver_probe_device+0x1e/0x90
> [<e7048de2>] __device_attach_driver+0x85/0x110
> [<32dca295>] bus_for_each_drv+0x85/0xe0
> [<391c5a7d>] __device_attach+0xbe/0x1e0
> [<26dabec0>] bus_probe_device+0x94/0xb0
> [<c590d936>] device_add+0x656/0x870
> [<3d69bfaa>] nd_async_device_register+0xe/0x50 [libnvdimm]
> [<3f4c52a4>] async_run_entry_fn+0x2e/0x110
> [<e201f4b0>] process_one_work+0x1ee/0x600
> [<6d90d5a9>] worker_thread+0x183/0x350
> 
> Cc: Dave Jiang 
> Cc: Ira Weiny 
> Fixes: 1b40e09a1232 ("libnvdimm: blk labels and namespace instantiation")

What is the bigger effect the user will see?

Does this cause a long term user effect?  For example, if a users system
has a bad label I think this is going to be a pretty minor memory leak
which just hangs around until reboot, correct?

> Signed-off-by: Li Zhijian 
> ---
> 
> Cc: Ira Weiny 
> > From what I can tell create_namespace_pmem() must be returning EAGAIN
> > which leaves devs allocated but fails to increment count.  Thus there are
> > no valid labels but devs was not free'ed.
> 
> > Can you trace the error you are seeing a bit more to see if this is the
> > case?
>   Hi Ira, Sorry for the late reply. I have reproduced it these days.
>   Yeah, the LSA is containing a label in which the isetcookie is invalid.

NP, sometimes it takes a while to really debug something.

> 
> V2:
>   update description and comment
> ---
>  drivers/nvdimm/namespace_devs.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> index d6d558f94d6b..28c9afc01dca 100644
> --- a/drivers/nvdimm/namespace_devs.c
> +++ b/drivers/nvdimm/namespace_devs.c
> @@ -1994,7 +1994,13 @@ static struct device **scan_labels(struct nd_region 
> *nd_region)
>   /* Publish a zero-sized namespace for userspace to configure. */
>   nd_mapping_free_labels(nd_mapping);
>  
> - devs = kcalloc(2, sizeof(dev), GFP_KERNEL);
> + /*
> +  * Try to use the devs that may have already been allocated
> +  * above first. This would happend when create_namespace_pmem()
> +  * meets an invalid label.
> +  */
> + if (!devs)
> + devs = kcalloc(2, sizeof(dev), GFP_KERNEL);

I'm still tempted to try and fix the count but I think this will work.
Let me know about the severity of the issue.

Ira

>   if (!devs)
>   goto err;
>  
> -- 
> 2.29.2
>

[GIT PULL] DAX for 6.11

2024-08-16 Thread Ira Weiny

Hi Linux, please pull from 

  https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/ 
tags/libnvdimm-fixes-6.11-rc4

To get a fix for filesystem DAX.

It has been in -next since August 12th without any reported issues.

Thanks,
Ira Weiny

---

The following changes since commit afdab700f65e14070d8ab92175544b1c62b8bf03:

  Merge tag 'bitmap-6.11-rc' of https://github.com/norov/linux (2024-08-09 
11:18:09 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/ 
tags/libnvdimm-fixes-6.11-rc4

for you to fetch changes up to d5240fa65db071909e9d1d5adcc5fd1abc8e96fe:

  nvdimm/pmem: Set dax flag for all 'PFN_MAP' cases (2024-08-09 14:29:58 -0500)


libnvdimm fixes for v6.11-rc4

Commit f467fee48da4 ("block: move the dax flag to queue_limits") broke
the DAX tests by skipping over the legacy pmem mapping pages case.

- Set the DAX flag in this case as well.


Zhihao Cheng (1):
  nvdimm/pmem: Set dax flag for all 'PFN_MAP' cases

 drivers/nvdimm/pmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Re: [GIT PULL] DAX for 6.11

2024-08-16 Thread Ira Weiny

Ira Weiny wrote:
> Hi Linux, please pull from 
 ^
 Linus.

Apologies,
Ira

> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/ 
> tags/libnvdimm-fixes-6.11-rc4
> 
> To get a fix for filesystem DAX.
> 
> It has been in -next since August 12th without any reported issues.
> 
> Thanks,
> Ira Weiny
> 
> ---
> 
> The following changes since commit afdab700f65e14070d8ab92175544b1c62b8bf03:
> 
>   Merge tag 'bitmap-6.11-rc' of https://github.com/norov/linux (2024-08-09 
> 11:18:09 -0700)
> 
> are available in the Git repository at:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/ 
> tags/libnvdimm-fixes-6.11-rc4
> 
> for you to fetch changes up to d5240fa65db071909e9d1d5adcc5fd1abc8e96fe:
> 
>   nvdimm/pmem: Set dax flag for all 'PFN_MAP' cases (2024-08-09 14:29:58 
> -0500)
> 
> 
> libnvdimm fixes for v6.11-rc4
> 
> Commit f467fee48da4 ("block: move the dax flag to queue_limits") broke
> the DAX tests by skipping over the legacy pmem mapping pages case.
> 
>   - Set the DAX flag in this case as well.
> 
> 
> Zhihao Cheng (1):
>   nvdimm/pmem: Set dax flag for all 'PFN_MAP' cases
> 
>  drivers/nvdimm/pmem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
>

Re: [PATCH] virtio_pmem: Check device status before requesting flush

2024-08-19 Thread Ira Weiny

Philip Chen wrote:
> If a pmem device is in a bad status, the driver side could wait for
> host ack forever in virtio_pmem_flush(), causing the system to hang.

I assume this was supposed to be v2 and you resent this as a proper v2
with a change list from v1?

Ira

> 
> Signed-off-by: Philip Chen 
> ---
>  drivers/nvdimm/nd_virtio.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> index 35c8fbbba10e..3b4d07aa8447 100644
> --- a/drivers/nvdimm/nd_virtio.c
> +++ b/drivers/nvdimm/nd_virtio.c
> @@ -44,6 +44,15 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
>   unsigned long flags;
>   int err, err1;
>  
> + /*
> +  * Don't bother to send the request to the device if the device is not
> +  * acticated.
> +  */
> + if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) {
> + dev_info(&vdev->dev, "virtio pmem device needs a reset\n");
> + return -EIO;
> + }
> +
>   might_sleep();
>   req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
>   if (!req_data)
> -- 
> 2.46.0.76.ge559c4bf1a-goog
>

Re: [PATCH] virtio_pmem: Check device status before requesting flush

2024-08-20 Thread Ira Weiny

Philip Chen wrote:
> On Mon, Aug 19, 2024 at 2:56 PM Ira Weiny  wrote:
> >
> > Philip Chen wrote:
> > > If a pmem device is in a bad status, the driver side could wait for
> > > host ack forever in virtio_pmem_flush(), causing the system to hang.
> >
> > I assume this was supposed to be v2 and you resent this as a proper v2
> > with a change list from v1?
> Ah...yes, I'll fix it and re-send it as a v2 patch.

Wait didn't you already do that?  Wasn't this v2?

https://lore.kernel.org/all/20240815010337.2334245-1-philipc...@chromium.org/

Ira

Re: [PATCH v3 1/2] nvdimm: Fix devs leaks in scan_labels()

2024-08-21 Thread Ira Weiny

Li Zhijian wrote:
> scan_labels() leaks memory when label scanning fails and it falls back
> to just creating a default "seed" namespace for userspace to configure.
> Root can force the kernel to leak memory.
> 
> Allocate the minimum resources unconditionally and release them when
> unneeded to avoid the memory leak.
> 
> A kmemleak reports:
> unreferenced object 0x88800dda1980 (size 16):
>   comm "kworker/u10:5", pid 69, jiffies 4294671781
>   hex dump (first 16 bytes):
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>   backtrace (crc 0):
> [<c5dea560>] __kmalloc+0x32c/0x470
> [<9ed43c83>] nd_region_register_namespaces+0x6fb/0x1120 
> [libnvdimm]
> [<0e07a65c>] nd_region_probe+0xfe/0x210 [libnvdimm]
> [<7b79ce5f>] nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm]
> [<a5f3da2e>] really_probe+0xc6/0x390
> [<129e2a69>] __driver_probe_device+0x78/0x150
> [<2dfed28b>] driver_probe_device+0x1e/0x90
> [<e7048de2>] __device_attach_driver+0x85/0x110
> [<32dca295>] bus_for_each_drv+0x85/0xe0
> [<391c5a7d>] __device_attach+0xbe/0x1e0
> [<26dabec0>] bus_probe_device+0x94/0xb0
> [<c590d936>] device_add+0x656/0x870
> [<3d69bfaa>] nd_async_device_register+0xe/0x50 [libnvdimm]
> [<00003f4c52a4>] async_run_entry_fn+0x2e/0x110
> [<e201f4b0>] process_one_work+0x1ee/0x600
> [<6d90d5a9>] worker_thread+0x183/0x350
> 
> Cc: Dave Jiang 
> Cc: Ira Weiny 
> Fixes: 1b40e09a1232 ("libnvdimm: blk labels and namespace instantiation")
> Suggested-by: Dan Williams 
> Signed-off-by: Li Zhijian 
> Reviewed-by: Dan Williams 

Reviewed-by: Ira Weiny 


> ---
> V3:
>   update commit log and allocate the minimum(2 *dev) unconditionally. # Dan
> 
> V2:
>   update description and comment
> Signed-off-by: Li Zhijian 
> ---
>  drivers/nvdimm/namespace_devs.c | 34 -
>  1 file changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> index d6d558f94d6b..35d9f3cc2efa 100644
> --- a/drivers/nvdimm/namespace_devs.c
> +++ b/drivers/nvdimm/namespace_devs.c
> @@ -1937,12 +1937,16 @@ static int cmp_dpa(const void *a, const void *b)
>  static struct device **scan_labels(struct nd_region *nd_region)
>  {
>   int i, count = 0;
> - struct device *dev, **devs = NULL;
> + struct device *dev, **devs;
>   struct nd_label_ent *label_ent, *e;
>   struct nd_mapping *nd_mapping = &nd_region->mapping[0];
>   struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
>   resource_size_t map_end = nd_mapping->start + nd_mapping->size - 1;
>  
> + devs = kcalloc(2, sizeof(dev), GFP_KERNEL);
> + if (!devs)
> + return NULL;
> +
>   /* "safe" because create_namespace_pmem() might list_move() label_ent */
>   list_for_each_entry_safe(label_ent, e, &nd_mapping->labels, list) {
>   struct nd_namespace_label *nd_label = label_ent->label;
> @@ -1961,12 +1965,14 @@ static struct device **scan_labels(struct nd_region 
> *nd_region)
>   goto err;
>   if (i < count)
>   continue;
> - __devs = kcalloc(count + 2, sizeof(dev), GFP_KERNEL);
> - if (!__devs)
> - goto err;
> - memcpy(__devs, devs, sizeof(dev) * count);
> - kfree(devs);
> - devs = __devs;
> + if (count) {
> + __devs = kcalloc(count + 2, sizeof(dev), GFP_KERNEL);
> + if (!__devs)
> + goto err;
> + memcpy(__devs, devs, sizeof(dev) * count);
> + kfree(devs);
> + devs = __devs;
> + }
>  
>   dev = create_namespace_pmem(nd_region, nd_mapping, nd_label);
>   if (IS_ERR(dev)) {
> @@ -1993,11 +1999,6 @@ static struct device **scan_labels(struct nd_region 
> *nd_region)
>  
>   /* Publish a zero-sized namespace for userspace to configure. */
>   nd_mapping_free_labels(nd_mapping);
> -
> - devs = kcalloc(2, sizeof(dev), GFP_KERNEL);
> - if (!devs)
> - goto err;
> -
>   nspm = kzalloc(sizeof(*nspm), GFP_KERNEL);
>   if (!nspm)
>   goto err;
> @@ -2036,11 +2037,10 @@ static struct device **scan_labels(struct nd_region 
> *nd_region)
>   return devs;
>  
>   err:
> - if (devs) {
> - for (i = 0; devs[i]; i++)
> - namespace_pmem_release(devs[i]);
> - kfree(devs);
> - }
> + for (i = 0; devs[i]; i++)
> + namespace_pmem_release(devs[i]);
> + kfree(devs);
> +
>   return NULL;
>  }
>  
> -- 
> 2.29.2
>

Re: [PATCH v3 2/2] nvdimm: Remove dead code for ENODEV checking in scan_labels()

2024-08-21 Thread Ira Weiny

Li Zhijian wrote:
> The only way create_namespace_pmem() returns an ENODEV code is if
> select_pmem_id(nd_region, &uuid) returns ENODEV when its 2nd parameter
> is a null pointer. However, this is impossible because &uuid is always
> valid.
> 
> Furthermore, create_namespace_pmem() is the only user of
> select_pmem_id(), it's safe to remove the 'return -ENODEV' branch.
> 
> Signed-off-by: Li Zhijian 

Reviewed-by: Ira Weiny 

> ---
> V2:
>   new patch.
>   It's found when I'm Reviewing/tracing the return values of 
> create_namespace_pmem()
> ---
>  drivers/nvdimm/namespace_devs.c | 9 -
>  1 file changed, 9 deletions(-)
> 
> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> index 35d9f3cc2efa..55cfbf1e0a95 100644
> --- a/drivers/nvdimm/namespace_devs.c
> +++ b/drivers/nvdimm/namespace_devs.c
> @@ -1612,9 +1612,6 @@ static int select_pmem_id(struct nd_region *nd_region, 
> const uuid_t *pmem_id)
>  {
>   int i;
>  
> - if (!pmem_id)
> - return -ENODEV;
> -
>   for (i = 0; i < nd_region->ndr_mappings; i++) {
>   struct nd_mapping *nd_mapping = &nd_region->mapping[i];
>   struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
> @@ -1790,9 +1787,6 @@ static struct device *create_namespace_pmem(struct 
> nd_region *nd_region,
>   case -EINVAL:
>   dev_dbg(&nd_region->dev, "invalid label(s)\n");
>   break;
> - case -ENODEV:
> - dev_dbg(&nd_region->dev, "label not found\n");
> - break;
>   default:
>   dev_dbg(&nd_region->dev, "unexpected err: %d\n", rc);
>   break;
> @@ -1980,9 +1974,6 @@ static struct device **scan_labels(struct nd_region 
> *nd_region)
>   case -EAGAIN:
>   /* skip invalid labels */
>   continue;
> - case -ENODEV:
> - /* fallthrough to seed creation */
> - break;
>   default:
>   goto err;
>   }
> -- 
> 2.29.2
>

Re: [PATCH v2] virtio_pmem: Check device status before requesting flush

2024-08-21 Thread Ira Weiny

Philip Chen wrote:
> Hi,
> 
> On Tue, Aug 20, 2024 at 1:01 PM Dave Jiang  wrote:
> >
> >
> >
> > On 8/20/24 10:22 AM, Philip Chen wrote:
> > > If a pmem device is in a bad status, the driver side could wait for
> > > host ack forever in virtio_pmem_flush(), causing the system to hang.
> > >
> > > So add a status check in the beginning of virtio_pmem_flush() to return
> > > early if the device is not activated.
> > >
> > > Signed-off-by: Philip Chen 
> > > ---
> > >
> > > v2:
> > > - Remove change id from the patch description
> > > - Add more details to the patch description
> > >
> > >  drivers/nvdimm/nd_virtio.c | 9 +
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > > index 35c8fbbba10e..97addba06539 100644
> > > --- a/drivers/nvdimm/nd_virtio.c
> > > +++ b/drivers/nvdimm/nd_virtio.c
> > > @@ -44,6 +44,15 @@ static int virtio_pmem_flush(struct nd_region 
> > > *nd_region)
> > >   unsigned long flags;
> > >   int err, err1;
> > >
> > > + /*
> > > +  * Don't bother to submit the request to the device if the device is
> > > +  * not acticated.
> >
> > s/acticated/activated/
> 
> Thanks for the review.
> I'll fix this typo in v3.
> 
> In addition to this typo, does anyone have any other concerns?

I'm not super familiar with the virtio-pmem workings and the needs reset
flag is barely used.

Did you actually experience this hang?  How was this found?  What is the
user visible issue and how critical is it?

Thanks,
Ira

> 
> >
> > > +  */
> > > + if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) {
> > > + dev_info(&vdev->dev, "virtio pmem device needs a reset\n");
> > > + return -EIO;
> > > + }
> > > +
> > >   might_sleep();
> > >   req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
> > >   if (!req_data)

Re: [PATCH] nvdimm: Use of_property_present() and of_property_read_bool()

2024-09-04 Thread Ira Weiny

Rob Herring wrote:
> On Wed, Jul 31, 2024 at 2:14 PM Rob Herring (Arm)  wrote:
> >
> > Use of_property_present() and of_property_read_bool() to test
> > property presence and read boolean properties rather than
> > of_(find|get)_property(). This is part of a larger effort to remove
> > callers of of_find_property() and similar functions.
> > of_(find|get)_property() leak the DT struct property and data pointers
> > which is a problem for dynamically allocated nodes which may be freed.
> >
> > Signed-off-by: Rob Herring (Arm) 
> > ---
> >  drivers/nvdimm/of_pmem.c | 2 +-
> >  drivers/nvmem/layouts.c  | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> Ping

It is soaking for 6.12.

https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/log/?h=libnvdimm-for-next

Thanks,
Ira

> 
> > diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> > index 403384f25ce3..b4a1cf70e8b7 100644
> > --- a/drivers/nvdimm/of_pmem.c
> > +++ b/drivers/nvdimm/of_pmem.c
> > @@ -47,7 +47,7 @@ static int of_pmem_region_probe(struct platform_device 
> > *pdev)
> > }
> > platform_set_drvdata(pdev, priv);
> >
> > -   is_volatile = !!of_find_property(np, "volatile", NULL);
> > +   is_volatile = of_property_read_bool(np, "volatile");
> > dev_dbg(&pdev->dev, "Registering %s regions from %pOF\n",
> > is_volatile ? "volatile" : "non-volatile",  np);
> >
> > diff --git a/drivers/nvmem/layouts.c b/drivers/nvmem/layouts.c
> > index 77a4119efea8..65d39e19f6ec 100644
> > --- a/drivers/nvmem/layouts.c
> > +++ b/drivers/nvmem/layouts.c
> > @@ -123,7 +123,7 @@ static int nvmem_layout_bus_populate(struct 
> > nvmem_device *nvmem,
> > int ret;
> >
> > /* Make sure it has a compatible property */
> > -   if (!of_get_property(layout_dn, "compatible", NULL)) {
> > +   if (!of_property_present(layout_dn, "compatible")) {
> > pr_debug("%s() - skipping %pOF, no compatible prop\n",
> >  __func__, layout_dn);
> > return 0;
> > --
> > 2.43.0
> >

Re: [PATCH] virtio_pmem: Add freeze/restore callbacks

2024-09-04 Thread Ira Weiny

Philip Chen wrote:
> Hi maintainers,
> 
> Can anyone let me know if this patch makes sense?
> Any comment/feedback is appreciated.
> Thanks in advance!

I'm not an expert on virtio but the code looks ok on the surface.  I've
discussed this with Dan a bit and virtio-pmem is not heavily tested.

Based on our discussion [1] I wonder if there is a way we can recreate this
with QEMU to incorporate into our testing?

Ira

[1] 
https://lore.kernel.org/lkml/ca+cxxhnb2i5o7_biofklth5zwp5t62d6f6c39vnut53cuku...@mail.gmail.com/

> 
> On Wed, Aug 14, 2024 at 5:46 PM Philip Chen  wrote:
> >
> > Add basic freeze/restore PM callbacks to support hibernation (S4):
> > - On freeze, delete vq and quiesce the device to prepare for
> >   snapshotting.
> > - On restore, re-init vq and mark DRIVER_OK.
> >
> > Signed-off-by: Philip Chen 
> > ---
> >  drivers/nvdimm/virtio_pmem.c | 24 
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > index c9b97aeabf85..2396d19ce549 100644
> > --- a/drivers/nvdimm/virtio_pmem.c
> > +++ b/drivers/nvdimm/virtio_pmem.c
> > @@ -143,6 +143,28 @@ static void virtio_pmem_remove(struct virtio_device 
> > *vdev)
> > virtio_reset_device(vdev);
> >  }
> >
> > +static int virtio_pmem_freeze(struct virtio_device *vdev)
> > +{
> > +   vdev->config->del_vqs(vdev);
> > +   virtio_reset_device(vdev);
> > +
> > +   return 0;
> > +}
> > +
> > +static int virtio_pmem_restore(struct virtio_device *vdev)
> > +{
> > +   int ret;
> > +
> > +   ret = init_vq(vdev->priv);
> > +   if (ret) {
> > +   dev_err(&vdev->dev, "failed to initialize virtio pmem's 
> > vq\n");
> > +   return ret;
> > +   }
> > +   virtio_device_ready(vdev);
> > +
> > +   return 0;
> > +}
> > +
> >  static unsigned int features[] = {
> > VIRTIO_PMEM_F_SHMEM_REGION,
> >  };
> > @@ -155,6 +177,8 @@ static struct virtio_driver virtio_pmem_driver = {
> > .validate   = virtio_pmem_validate,
> > .probe  = virtio_pmem_probe,
> > .remove = virtio_pmem_remove,
> > +   .freeze = virtio_pmem_freeze,
> > +   .restore= virtio_pmem_restore,
> >  };
> >
> >  module_virtio_driver(virtio_pmem_driver);
> > --
> > 2.46.0.76.ge559c4bf1a-goog
> >

Re: [PATCH] nvdimm/pmem: Fix an error handling path in 'pmem_attach_disk()'

2021-11-07 Thread Ira Weiny

On Sat, Nov 06, 2021 at 06:27:11PM +0100, Christophe JAILLET wrote:
> If 'devm_init_badblocks()' fails, a previous 'blk_alloc_disk()' call must
> be undone.

I think this is a problem...

> 
> Signed-off-by: Christophe JAILLET 
> ---
> This patch is speculative. Several fixes on error handling paths have been
> done recently, but this one has been left as-is. There was maybe a good
> reason that I have missed for that. So review with care!
> 
> I've not been able to identify a Fixes tag that please me :(
> ---
>  drivers/nvdimm/pmem.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index fe7ece1534e1..c37a1e6750b3 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -490,8 +490,9 @@ static int pmem_attach_disk(struct device *dev,
>   nvdimm_namespace_disk_name(ndns, disk->disk_name);
>   set_capacity(disk, (pmem->size - pmem->pfn_pad - pmem->data_offset)
>   / 512);
> - if (devm_init_badblocks(dev, &pmem->bb))
> - return -ENOMEM;
> + rc = devm_init_badblocks(dev, &pmem->bb);
> + if (rc)
> + goto out;

But I don't see this 'out' label in the function currently?  Was that part of
your patch missing?

Ira

>   nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_range);
>   disk->bb = &pmem->bb;
>  
> -- 
> 2.30.2
>

Re: [PATCH] nvdimm/pmem: Fix an error handling path in 'pmem_attach_disk()'

2021-11-08 Thread Ira Weiny

On Sun, Nov 07, 2021 at 06:20:14PM +0100, Christophe JAILLET wrote:
> Le 07/11/2021 à 18:11, Ira Weiny a écrit :
> > On Sat, Nov 06, 2021 at 06:27:11PM +0100, Christophe JAILLET wrote:
> > > If 'devm_init_badblocks()' fails, a previous 'blk_alloc_disk()' call must
> > > be undone.
> > 
> > I think this is a problem...
> > 
> > > 
> > > Signed-off-by: Christophe JAILLET 
> > > ---
> > > This patch is speculative. Several fixes on error handling paths have been
> > > done recently, but this one has been left as-is. There was maybe a good
> > > reason that I have missed for that. So review with care!
> > > 
> > > I've not been able to identify a Fixes tag that please me :(
> > > ---
> > >   drivers/nvdimm/pmem.c | 5 +++--
> > >   1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> > > index fe7ece1534e1..c37a1e6750b3 100644
> > > --- a/drivers/nvdimm/pmem.c
> > > +++ b/drivers/nvdimm/pmem.c
> > > @@ -490,8 +490,9 @@ static int pmem_attach_disk(struct device *dev,
> > >   nvdimm_namespace_disk_name(ndns, disk->disk_name);
> > >   set_capacity(disk, (pmem->size - pmem->pfn_pad - 
> > > pmem->data_offset)
> > >   / 512);
> > > - if (devm_init_badblocks(dev, &pmem->bb))
> > > - return -ENOMEM;
> > > + rc = devm_init_badblocks(dev, &pmem->bb);
> > > + if (rc)
> > > + goto out;
> > 
> > But I don't see this 'out' label in the function currently?  Was that part 
> > of
> > your patch missing?
> 
> Hi,
> the patch is based on the latest linux-next.
> See [1]. The 'out' label exists there and is already used.

Ah I see.  Sorry.

> 
> In fact, I run an own-made coccinelle script which tries to spot mix-up
> between return and goto.
> In this case, we have a 'return -ENOMEM' after a 'goto out' which looks
> spurious.
>

Yes agreed.  I was looking at Linus' kernel not linux-next sorry.

Sorry for the noise,
Ira

> Hence, my patch.
> 
> [1]:https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/nvdimm/pmem.c#n512
> 
> CJ
> 
> > 
> > Ira
> > 
> > >   nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_range);
> > >   disk->bb = &pmem->bb;
> > > -- 
> > > 2.30.2
> > > 
> > 
>

Re: [PATCH V7 08/18] x86/entry: Preserve PKRS MSR across exceptions

2021-11-12 Thread Ira Weiny

On Tue, Aug 03, 2021 at 09:32:21PM -0700, 'Ira Weiny' wrote:
> From: Ira Weiny 
> 
> The PKRS MSR is not managed by XSAVE.  It is preserved through a context
> switch but this support leaves exception handling code open to memory
> accesses during exceptions.
> 
> 2 possible places for preserving this state were considered,
> irqentry_state_t or pt_regs.[1]  pt_regs was much more complicated and
> was potentially fraught with unintended consequences.[2]  However, Andy
> came up with a way to hide additional values on the stack which could be
> accessed as "extended_pt_regs".[3]

Andy,

I'm preparing to send V8 of this PKS work.  But I have not seen any feed back
since I originally implemented this in V4[1].

Does this meets your expectations?  Are there any issues you can see with this
code?

I would appreciate your feedback.

Thanks,
Ira

[1] https://lore.kernel.org/lkml/20210322053020.2287058-9-ira.we...@intel.com/

> This method allows for; any place
> which has struct pt_regs can get access to the extra information; no
> extra information is added to irq_state; and pt_regs is left intact for
> compatibility with outside tools like BPF.
> 
> To simplify, the assembly code only adds space on the stack.  The
> setting or use of any needed values are left to the C code.  While some
> entry points may not use this space it is still added where ever pt_regs
> is passed to the C code for consistency.
> 
> Each nested exception gets another copy of this extended space allowing
> for any number of levels of exception handling.
> 
> In the assembly, a macro is defined to allow a central place to add
> space for other uses should the need arise.
> 
> Finally export pkrs_{save|restore}_irq to the common code to allow
> it to preserve the current task's PKRS in the new extended pt_regs if
> enabled.
> 
> Peter, Thomas, Andy, Dave, and Dan all suggested parts of the patch or
> aided in the development of the patch..
> 
> [1] 
> https://lore.kernel.org/lkml/calcetrve1i5jdyzd_bcctxqjn+ze3t38efpgjxn1f577m36...@mail.gmail.com/
> [2] https://lore.kernel.org/lkml/874kpxx4jf@nanos.tec.linutronix.de/#t
> [3] 
> https://lore.kernel.org/lkml/CALCETrUHwZPic89oExMMe-WyDY8-O3W68NcZvse3=PGW+iW5=w...@mail.gmail.com/
> 
> Cc: Dave Hansen 
> Cc: Dan Williams 
> Suggested-by: Dave Hansen 
> Suggested-by: Dan Williams 
> Suggested-by: Peter Zijlstra 
> Suggested-by: Thomas Gleixner 
> Suggested-by: Andy Lutomirski 
> Signed-off-by: Ira Weiny 
> 
> ---
> Changes for V7:
>   Rebased to 5.14 entry code
>   declare write_pkrs() in pks.h
>   s/INIT_PKRS_VALUE/pkrs_init_value
>   Remove unnecessary INIT_PKRS_VALUE def
>   s/pkrs_save_set_irq/pkrs_save_irq/
>   The inital value for exceptions is best managed
>   completely within the pkey code.
> ---
>  arch/x86/entry/calling.h   | 26 +
>  arch/x86/entry/common.c| 54 ++
>  arch/x86/entry/entry_64.S  | 22 ++-
>  arch/x86/entry/entry_64_compat.S   |  6 +--
>  arch/x86/include/asm/pks.h | 18 +
>  arch/x86/include/asm/processor-flags.h |  2 +
>  arch/x86/kernel/head_64.S  |  7 ++--
>  arch/x86/mm/fault.c|  3 ++
>  include/linux/pkeys.h  | 11 +-
>  kernel/entry/common.c  | 14 ++-
>  10 files changed, 143 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index a4c061fb7c6e..a2f94677c3fd 100644
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -63,6 +63,32 @@ For 32-bit we have the following conventions - kernel is 
> built with
>   * for assembly code:
>   */
>  
> +/*
> + * __call_ext_ptregs - Helper macro to call into C with extended pt_regs
> + * @cfunc:   C function to be called
> + *
> + * This will ensure that extended_ptregs is added and removed as needed 
> during
> + * a call into C code.
> + */
> +.macro __call_ext_ptregs cfunc annotate_retpoline_safe:req
> +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> + /* add space for extended_pt_regs */
> + subq$EXTENDED_PT_REGS_SIZE, %rsp
> +#endif
> + .if \annotate_retpoline_safe == 1
> + ANNOTATE_RETPOLINE_SAFE
> + .endif
> + call\cfunc
> +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> + /* remove space for extended_pt_regs */
> + addq$EXTENDED_PT_REGS_SIZE, %rsp
> +#endif
> +.endm
> +
> +.macro call_ext_ptregs cfunc
> + __call_ext_ptregs \cfunc, annotate_retpoline_safe=0
> +.endm
> +
>  .macro PUSH_REGS rdx=%rdx rax=%rax save_ret=0

Re: [PATCH V7 08/18] x86/entry: Preserve PKRS MSR across exceptions

2021-12-06 Thread Ira Weiny

Thomas,

Thanks for the review.  Sorry for being so late to respond I was sick all last
week and so it took me longer to figure out some of this stuff.

On Thu, Nov 25, 2021 at 03:12:47PM +0100, Thomas Gleixner wrote:
> Ira,
> 
> On Tue, Aug 03 2021 at 21:32, ira weiny wrote:
> > +/*
> > + * __call_ext_ptregs - Helper macro to call into C with extended pt_regs
> > + * @cfunc: C function to be called
> > + *
> > + * This will ensure that extended_ptregs is added and removed as needed 
> > during
> > + * a call into C code.
> > + */
> > +.macro __call_ext_ptregs cfunc annotate_retpoline_safe:req
> > +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> > +   /* add space for extended_pt_regs */
> > +   subq$EXTENDED_PT_REGS_SIZE, %rsp
> > +#endif
> > +   .if \annotate_retpoline_safe == 1
> > +   ANNOTATE_RETPOLINE_SAFE
> > +   .endif
> 
> This annotation is new and nowhere mentioned why it is part of this
> patch.

I don't understand.  ANNOTATE_RETPOLINE_SAFE has been around since:

9e0e3c5130e9 x86/speculation, objtool: Annotate indirect calls/jumps for objtool

> 
> Can you please do _ONE_ functional change per patch and not a
> unreviewable pile of changes in one go? Same applies for the ASM and the
> C code changes. The ASM change has to go first and then the C code can
> build upon it.

I'm sorry for having the ASM and C code together but this all seemed like 1
change to me.

I can split it if you prefer.  How about a patch with just the x86 extended
pt_regs stuff but that would leave a zero size for the extended stuff?  Then
followed by the pks bits?

> 
> > +   call\cfunc
> > +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> > +   /* remove space for extended_pt_regs */
> > +   addq$EXTENDED_PT_REGS_SIZE, %rsp
> > +#endif
> 
> I really have to ask the question whether this #ifdeffery has any value
> at all. 8 bytes extra stack usage is not going to be the end of the
> world and distro kernels will enable that config anyway.

My goal with this has always been 0 overhead if turned off.  So this seemed
like a logical addition.  Furthermore, ARCH_ENABLE_SUPERVISOR_PKEYS is
predicated on ARCH_HAS_SUPERVISOR_PKEYS which is only available with x86_64.
This removes the space for x86 when not needed.

All the config stuff was introduced in patch 04/18.[0]

> 
> If we really want to save the space then certainly not by sprinkling
> CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS all over the place and hiding the
> extra sized ptregs in the pkrs header.
> 
> You are changing generic architecture code so you better think about
> making such a change generic and extensible.

I agree.  And I tried to do so.  The generic entry code is modified only by the
addition of pkrs_[save|restore]_irq().  These are only defined if the arch
defines ARCH_HAS_SUPERVISOR_PKEYS and furthermore, if something triggers
enabling ARCH_ENABLE_SUPERVISOR_PKEYS.

ARCH_HAS_SUPERVISOR_PKEYS is restricted to x86_64 at the moment.  All other
arch's including x86 should not see any changes in the generic code.

I thought we had agreed that it was ok for me to restrict the addition of the
extended pt_regs to what was required for PKS when these changes were
discussed.  Because at the time I was concerned about my lack of knowledge of
all the other architectures.[1]

>
> Can folks please start
> thinking beyond the brim of their teacup and not pretend that the
> feature they are working on is the unicorn which requires unique magic
> brandnamed after the unicorn of the day.
> 
> If the next feature comes around which needs to save something in that
> extended area then we are going to change the world again, right?

I'm not sure what you mean by 'change the world'.  I would anticipate the entry
code to be modified with something similar to pks_[save|restore]_irq() and let
the arch deal with the specifics.

Also in [1] I thought Peter and Andy agreed that placing additional generic
state in the extended pt_regs was not needed and does not buy us anything.  I
specifically asked if that was something we wanted to do in.[2]

> Certainly not.
> 
> This wants to go into asm/ptrace.h:
> 
> struct pt_regs_aux {
>   u32 pkrs;
> };
> 
> struct pt_regs_extended {
>   struct pt_regs_aux  aux;
> struct pt_regsregs __attribute__((aligned(8)));
> };

Ok the aligned attribute does what I was doing much more gracefully.  This is a
good idea yes, thank you.

> 
> and then have in asm-offset:
> 
>DEFINE(PT_REGS_AUX_SIZE, sizeof(struct pt_regs_extended) - sizeof(struct 
> pt_regs));
> 
> which does the right thing whatever the size of pt_regs_aux is. So for
> the above it will have:
> 
>  #define PT_REGS_

Re: [PATCH V7 08/18] x86/entry: Preserve PKRS MSR across exceptions

2021-12-06 Thread Ira Weiny

On Mon, Dec 06, 2021 at 05:54:23PM -0800, 'Ira Weiny' wrote:

[snip]

> > 
> > Though, if you look at the xen_pv_evtchn_do_upcall() part where you
> > added this extra invocation you might figure out that adding
> > pkrs_restore_irq() to irqentry_exit_cond_resched() and explicitely to
> > the 'else' path in irqentry_exit() makes it magically consistent for
> > both use cases.
> > 
> 
> Thank you, yes good catch.  However, I think I need at least 1 more call in 
> the
> !regs_irqs_disabled() && state.exit_rcu case right?
> 
> 11:29:48 > git di
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 717091910ebc..667676ebc129 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -363,8 +363,6 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct 
> pt_regs *regs)
>  
> inhcall = get_and_clear_inhcall();
> if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) {
> -   /* Normally called by irqentry_exit, restore pkrs here */
> -   pkrs_restore_irq(regs);
> irqentry_exit_cond_resched();
> instrumentation_end();
> restore_inhcall(inhcall);
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index 1c0a70a17e93..60ae3d4c9cc0 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -385,6 +385,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs 
> *regs)
>  
>  void irqentry_exit_cond_resched(void)

Opps...  Of course regs will need to be passed in here now...

Ira

>  {
> +   pkrs_restore_irq(regs);
> if (!preempt_count()) {
> /* Sanity check RCU and thread stack */
> rcu_irq_exit_check_preempt();
> @@ -408,8 +409,6 @@ noinstr void irqentry_exit(struct pt_regs *regs, 
> irqentry_state_t state)
> return;
> }
>  
> -   pkrs_restore_irq(regs);
> -
> if (!regs_irqs_disabled(regs)) {
> /*
>  * If RCU was not watching on entry this needs to be done
> @@ -421,6 +420,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, 
> irqentry_state_t state)
> /* Tell the tracer that IRET will enable interrupts */
> trace_hardirqs_on_prepare();
> lockdep_hardirqs_on_prepare(CALLER_ADDR0);
> +   pkrs_restore_irq(regs);
> instrumentation_end();
> rcu_irq_exit();
> lockdep_hardirqs_on(CALLER_ADDR0);
> @@ -439,6 +439,10 @@ noinstr void irqentry_exit(struct pt_regs *regs, 
> irqentry_state_t state)
> trace_hardirqs_on();
> instrumentation_end();
> } else {
> +   instrumentation_begin();
> +   pkrs_restore_irq(regs);
> +   instrumentation_end();
> +
> /*
>  * IRQ flags state is correct already. Just tell RCU if it
>  * was not watching on entry.
> @@ -470,8 +474,8 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct 
> pt_regs *regs)
>  
>  void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t 
> irq_state)
>  {
> -   pkrs_restore_irq(regs);
> instrumentation_begin();
> +   pkrs_restore_irq(regs);
> ftrace_nmi_exit();
> if (irq_state.lockdep) {
> trace_hardirqs_on_prepare();
> 
> 
> Thank you again for the review,
> Ira
> 
> 
> [0] https://lore.kernel.org/lkml/20210804043231.2655537-5-ira.we...@intel.com/
> [1] 
> https://lore.kernel.org/lkml/20201217131924.gw3...@hirez.programming.kicks-ass.net/
> [2] 
> https://lore.kernel.org/lkml/20201216013202.gy1563...@iweiny-desk2.sc.intel.com/
> [3] https://lore.kernel.org/lkml/87y2hwqwng@nanos.tec.linutronix.de/
> 
> > Thanks,
> > 
> > tglx

Re: [PATCH V7 03/18] x86/pks: Add additional PKEY helper macros

2021-12-07 Thread Ira Weiny

On Thu, Nov 25, 2021 at 03:25:09PM +0100, Thomas Gleixner wrote:
> On Tue, Aug 03 2021 at 21:32, ira weiny wrote:
> > @@ -200,16 +200,14 @@ __setup("init_pkru=", setup_init_pkru);
> >   */
> >  u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)
> >  {
> > -   int pkey_shift = pkey * PKR_BITS_PER_PKEY;
> > -
> > /*  Mask out old bit values */
> > -   pk_reg &= ~(((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift);
> > +   pk_reg &= ~PKR_PKEY_MASK(pkey);
> >  
> > /*  Or in new values */
> > if (flags & PKEY_DISABLE_ACCESS)
> > -   pk_reg |= PKR_AD_BIT << pkey_shift;
> > +   pk_reg |= PKR_AD_KEY(pkey);
> > if (flags & PKEY_DISABLE_WRITE)
> > -   pk_reg |= PKR_WD_BIT << pkey_shift;
> > +   pk_reg |= PKR_WD_KEY(pkey);
> 
> I'm not seeing how this is improving that code. Quite the contrary.

Fair enough.  Even more so when using the code you suggested for 
pkey_update_pkval().

In that case it boils down to:

diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index eb6d6b872652..b7127329d115 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -198,7 +198,7 @@ __setup("init_pkru=", setup_init_pkru);
  */
 u32 pkey_update_pkval(u32 pkval, int pkey, u32 accessbits)
 {
-int shift = pkey * PKR_BITS_PER_PKEY;
+int shift = PKR_PKEY_SHIFT(pkey);

 if (WARN_ON_ONCE(accessbits & ~PKEY_ACCESS_MASK))
 accessbits &= PKEY_ACCESS_MASK;

Better?

As to the reason of why to put this patch after the other one.  Why would I
improve the old pre-refactoring code only to throw it away when moving it to
pkey_update_pkval()?  This reasoning is even stronger when pkey_update_pkval()
is implemented.

I agree with Dan regarding the macros though.  I think they make it easier to
see what is going on without dealing with masks and shifts directly.  But I can
remove this patch if you feel that strongly about it.

Ira

> 
> Thanks,
> 
> tglx

Re: [PATCH -next] nvdimm/btt: Fix btt_init() kernel-doc comment

2021-12-30 Thread Ira Weiny

On Thu, Dec 30, 2021 at 05:25:20PM +0800, Yang Li wrote:
> Add the description of @nd_region and remove @maxlane in
> btt_init() kernel-doc comment to remove warnings found
> by running scripts/kernel-doc, which is caused by using 'make W=1'.
> drivers/nvdimm/btt.c:1584: warning: Function parameter or member
> 'nd_region' not described in 'btt_init'
> drivers/nvdimm/btt.c:1584: warning: Excess function parameter 'maxlane'
> description in 'btt_init'
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Yang Li 

Reviewed-by: Ira Weiny 

> ---
>  drivers/nvdimm/btt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index da3f007a1211..293b8c107817 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -1567,7 +1567,7 @@ static void btt_blk_cleanup(struct btt *btt)
>   * @rawsize: raw size in bytes of the backing device
>   * @lbasize: lba size of the backing device
>   * @uuid:A uuid for the backing device - this is stored on media
> - * @maxlane: maximum number of parallel requests the device can handle
> + * @nd_region:  region id and number of lanes possible
>   *
>   * Initialize a Block Translation Table on a backing device to provide
>   * single sector power fail atomicity.
> -- 
> 2.20.1.7.g153144c
>

Re: [PATCH] dax: make sure inodes are flushed before destroy cache

2022-02-14 Thread Ira Weiny

On Fri, Feb 11, 2022 at 11:11:11PM -0800, Tong Zhang wrote:
> A bug can be triggered by following command
> 
> $ modprobe nd_pmem && modprobe -r nd_pmem
> 
> [   10.060014] BUG dax_cache (Not tainted): Objects remaining in dax_cache on 
> __kmem_cache_shutdown()
> [   10.060938] Slab 0x85b729ac objects=9 used=1 fp=0x4f5ae469 
> flags=0x2010200(slab|head|node)
> [   10.062433] Call Trace:
> [   10.062673]  dump_stack_lvl+0x34/0x44
> [   10.062865]  slab_err+0x90/0xd0
> [   10.063619]  __kmem_cache_shutdown+0x13b/0x2f0
> [   10.063848]  kmem_cache_destroy+0x4a/0x110
> [   10.064058]  __x64_sys_delete_module+0x265/0x300
> 
> This is caused by dax_fs_exit() not flushing inodes before destroy cache.
> To fix this issue, call rcu_barrier() before destroy cache.

I don't doubt that this fixes the bug.  However, I can't help but think this is
hiding a bug, or perhaps a missing step, in the kmem_cache layer?  As far as I
can see dax does not call call_rcu() and only uses srcu not rcu?  I was tempted
to suggest srcu_barrier() but dax does not call call_srcu() either.

So I'm not clear about what is really going on and why this fixes it.  I know
that dax is not using srcu is a standard way so perhaps this helps in a way I
don't quite grok?  If so perhaps a comment here would be in order?

Ira

> 
> Signed-off-by: Tong Zhang 
> ---
>  drivers/dax/super.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index e3029389d809..6bd565fe2e63 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -476,6 +476,7 @@ static int dax_fs_init(void)
>  static void dax_fs_exit(void)
>  {
>   kern_unmount(dax_mnt);
> + rcu_barrier();
>   kmem_cache_destroy(dax_cache);
>  }
>  
> -- 
> 2.25.1
> 
>

Re: [PATCH] dax: make sure inodes are flushed before destroy cache

2022-02-14 Thread Ira Weiny

On Mon, Feb 14, 2022 at 12:09:54PM -0800, Dan Williams wrote:
> On Mon, Feb 14, 2022 at 9:59 AM Ira Weiny  wrote:
> >
> > On Fri, Feb 11, 2022 at 11:11:11PM -0800, Tong Zhang wrote:
> > > A bug can be triggered by following command
> > >
> > > $ modprobe nd_pmem && modprobe -r nd_pmem
> > >
> > > [   10.060014] BUG dax_cache (Not tainted): Objects remaining in 
> > > dax_cache on __kmem_cache_shutdown()
> > > [   10.060938] Slab 0x85b729ac objects=9 used=1 
> > > fp=0x4f5ae469 flags=0x2010200(slab|head|node)
> > > [   10.062433] Call Trace:
> > > [   10.062673]  dump_stack_lvl+0x34/0x44
> > > [   10.062865]  slab_err+0x90/0xd0
> > > [   10.063619]  __kmem_cache_shutdown+0x13b/0x2f0
> > > [   10.063848]  kmem_cache_destroy+0x4a/0x110
> > > [   10.064058]  __x64_sys_delete_module+0x265/0x300
> > >
> > > This is caused by dax_fs_exit() not flushing inodes before destroy cache.
> > > To fix this issue, call rcu_barrier() before destroy cache.
> >
> > I don't doubt that this fixes the bug.  However, I can't help but think 
> > this is
> > hiding a bug, or perhaps a missing step, in the kmem_cache layer?  As far 
> > as I
> > can see dax does not call call_rcu() and only uses srcu not rcu?  I was 
> > tempted
> > to suggest srcu_barrier() but dax does not call call_srcu() either.
> 
> This rcu_barrier() is associated with the call_rcu() in destroy_inode().

Ok yea.

> 
> While kern_unmount() does a full sycnrhonize_rcu() after clearing
> ->mnt_ns. Any pending destroy_inode() callbacks need to be flushed
> before the kmem_cache is destroyed.
> 
> > So I'm not clear about what is really going on and why this fixes it.  I 
> > know
> > that dax is not using srcu is a standard way so perhaps this helps in a way 
> > I
> > don't quite grok?  If so perhaps a comment here would be in order?
> 
> Looks like a common pattern I missed that all filesystem exit paths implement.

I think a comment would be in order, especially since since it looks like every
other FS has one:

fs/ext4/super.c:

...
/*
 * Make sure all delayed rcu free inodes are flushed before we
 * destroy cache.
 */
rcu_barrier();  
...

Anyway ok.

Reviewed-by: Ira Weiny 

Thanks for looking Dan,
Ira

[PATCH] fs/dax: Fix run_dax() missing prototype

2022-03-04 Thread ira . weiny

From: Ira Weiny 

The function run_dax() was missing a prototype when compiling with
warnings.

Add bus.h to fix this.

Signed-off-by: Ira Weiny 
---
 drivers/dax/super.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e3029389d809..5c003cc73d04 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include "dax-private.h"
+#include "bus.h"
 
 /**
  * struct dax_device - anchor object for dax services
-- 
2.35.1

[PATCH] fs/dax: Fix missing kdoc for dax_device

2022-03-04 Thread ira . weiny

From: Ira Weiny 

struct dax_device has a member named ops which was undocumented.

Add the kdoc.

Signed-off-by: Ira Weiny 
---
 drivers/dax/super.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 5c003cc73d04..2fd3a01ba34b 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -22,6 +22,7 @@
  * @cdev: optional character interface for "device dax"
  * @private: dax driver private data
  * @flags: state and boolean properties
+ * @ops: operations for this device
  */
 struct dax_device {
struct inode inode;
-- 
2.35.1

Re: [PATCH] fs/dax: Fix run_dax() missing prototype

2022-03-10 Thread Ira Weiny

On Wed, Mar 09, 2022 at 09:08:36PM -0800, Dan Williams wrote:
> On Fri, Mar 4, 2022 at 12:38 PM  wrote:
> >
> > From: Ira Weiny 
> >
> > The function run_dax() was missing a prototype when compiling with
> > warnings.
> >
> > Add bus.h to fix this.
> >
> 
> Always include the warning and the compiler in the changelog.

Sorry.

> I
> suspect you hit this with LLVM and not gcc?

No this was with gcc.

gcc -Wp,-MMD,drivers/dax/.super.o.d -nostdinc -I./arch/x86/include
...
  -D__KBUILD_MODNAME=kmod_dax -c -o drivers/dax/super.o drivers/dax/super.c  ;
  ./tools/objtool/objtool orc generate   --no-fp   --retpoline  --uaccess 
drivers/dax/super.o
drivers/dax/super.c:276:6: warning: no previous prototype for ‘run_dax’ 
[-Wmissing-prototypes]
276 | void run_dax(struct dax_device *dax_dev)
  |  ^~~

> 
> super.c has no business including bus.h. If the bots are tripping over
> this a better fix is to move it into dax-private.h.

It was not a bot just me using W=1.

I can ignore it or move the prototype to dax-private.h.

Ira

Re: [PATCH v3 1/8] cxl: Replace lockdep_mutex with local lock classes

2022-04-22 Thread Ira Weiny

On Thu, Apr 21, 2022 at 08:33:13AM -0700, Dan Williams wrote:
> In response to an attempt to expand dev->lockdep_mutex for device_lock()
> validation [1], Peter points out [2] that the lockdep API already has
> the ability to assign a dedicated lock class per subsystem device-type.
> 
> Use lockdep_set_class() to override the default device_lock()
> '__lockdep_no_validate__' class for each CXL subsystem device-type. This
> enables lockdep to detect deadlocks and recursive locking within the
> device-driver core and the subsystem. The
> lockdep_set_class_and_subclass() API is used for port objects that
> recursively lock the 'cxl_port_key' class by hierarchical topology
> depth.
> 
> Link: 
> https://lore.kernel.org/r/164982968798.684294.15817853329823976469.st...@dwillia2-desk3.amr.corp.intel.com
>  [1]
> Link: 
> https://lore.kernel.org/r/ylf0dewci8myl...@hirez.programming.kicks-ass.net [2]
> Suggested-by: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Will Deacon 
> Cc: Waiman Long 
> Cc: Boqun Feng 
> Cc: Alison Schofield 
> Cc: Vishal Verma 
> Cc: Ira Weiny 
> Cc: Ben Widawsky 
> Cc: Jonathan Cameron 
> Signed-off-by: Dan Williams 

Reviewed-by: Ira Weiny 

> ---
>  drivers/cxl/core/memdev.c |3 +++
>  drivers/cxl/core/pmem.c   |6 ++
>  drivers/cxl/core/port.c   |   13 +
>  3 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 1f76b28f9826..f7cdcd33504a 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -228,6 +228,8 @@ static void detach_memdev(struct work_struct *work)
>   put_device(&cxlmd->dev);
>  }
>  
> +static struct lock_class_key cxl_memdev_key;
> +
>  static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
>  const struct file_operations *fops)
>  {
> @@ -247,6 +249,7 @@ static struct cxl_memdev *cxl_memdev_alloc(struct 
> cxl_dev_state *cxlds,
>  
>   dev = &cxlmd->dev;
>   device_initialize(dev);
> + lockdep_set_class(&dev->mutex, &cxl_memdev_key);
>   dev->parent = cxlds->dev;
>   dev->bus = &cxl_bus_type;
>   dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
> diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c
> index 8de240c4d96b..e825e261278d 100644
> --- a/drivers/cxl/core/pmem.c
> +++ b/drivers/cxl/core/pmem.c
> @@ -80,6 +80,8 @@ struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct 
> cxl_nvdimm *cxl_nvd)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_find_nvdimm_bridge, CXL);
>  
> +static struct lock_class_key cxl_nvdimm_bridge_key;
> +
>  static struct cxl_nvdimm_bridge *cxl_nvdimm_bridge_alloc(struct cxl_port 
> *port)
>  {
>   struct cxl_nvdimm_bridge *cxl_nvb;
> @@ -99,6 +101,7 @@ static struct cxl_nvdimm_bridge 
> *cxl_nvdimm_bridge_alloc(struct cxl_port *port)
>   cxl_nvb->port = port;
>   cxl_nvb->state = CXL_NVB_NEW;
>   device_initialize(dev);
> + lockdep_set_class(&dev->mutex, &cxl_nvdimm_bridge_key);
>   device_set_pm_not_required(dev);
>   dev->parent = &port->dev;
>   dev->bus = &cxl_bus_type;
> @@ -214,6 +217,8 @@ struct cxl_nvdimm *to_cxl_nvdimm(struct device *dev)
>  }
>  EXPORT_SYMBOL_NS_GPL(to_cxl_nvdimm, CXL);
>  
> +static struct lock_class_key cxl_nvdimm_key;
> +
>  static struct cxl_nvdimm *cxl_nvdimm_alloc(struct cxl_memdev *cxlmd)
>  {
>   struct cxl_nvdimm *cxl_nvd;
> @@ -226,6 +231,7 @@ static struct cxl_nvdimm *cxl_nvdimm_alloc(struct 
> cxl_memdev *cxlmd)
>   dev = &cxl_nvd->dev;
>   cxl_nvd->cxlmd = cxlmd;
>   device_initialize(dev);
> + lockdep_set_class(&dev->mutex, &cxl_nvdimm_key);
>   device_set_pm_not_required(dev);
>   dev->parent = &cxlmd->dev;
>   dev->bus = &cxl_bus_type;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 2ab1ba4499b3..750aac95ed5f 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -391,6 +391,8 @@ static int devm_cxl_link_uport(struct device *host, 
> struct cxl_port *port)
>   return devm_add_action_or_reset(host, cxl_unlink_uport, port);
>  }
>  
> +static struct lock_class_key cxl_port_key;
> +
>  static struct cxl_port *cxl_port_alloc(struct device *uport,
>  resource_size_t component_reg_phys,
>  struct cxl_port *parent_port)
> @@ -415,9 +417,10 @@ static struct cxl_port *cxl_port_alloc(struct device 
> *uport,
>* description.
>*/
>

Re: [PATCH v3 2/8] cxl/acpi: Add root device lockdep validation

2022-04-22 Thread Ira Weiny

On Thu, Apr 21, 2022 at 08:33:18AM -0700, Dan Williams wrote:
> The CXL "root" device, ACPI0017, is an attach point for coordinating
> platform level CXL resources and is the parent device for a CXL port
> topology tree. As such it has distinct locking rules relative to other
> CXL subsystem objects, but because it is an ACPI device the lock class
> is established well before it is given to the cxl_acpi driver.
 
This final sentence gave me pause because it implied that the device lock class
was set to something other than no validate.  But I don't see that anywhere in
the acpi code.  So given that it looks to me like ACPI is just using the
default no validate class...

Reviewed-by: Ira Weiny 

> However, the lockdep API does support changing the lock class "live" for
> situations like this. Add a device_lock_set_class() helper that a driver
> can use in ->probe() to set a custom lock class, and
> device_lock_reset_class() to return to the default "no validate" class
> before the custom lock class key goes out of scope after ->remove().
> 
> Note the helpers are all macros to support dead code elimination in the
> CONFIG_PROVE_LOCKING=n case.
> 
> Suggested-by: Peter Zijlstra 
> Cc: Greg Kroah-Hartman 
> Cc: "Rafael J. Wysocki" 
> Cc: Ingo Molnar 
> Cc: Will Deacon 
> Cc: Waiman Long 
> Cc: Boqun Feng 
> Cc: Alison Schofield 
> Cc: Vishal Verma 
> Cc: Ira Weiny 
> Cc: Ben Widawsky 
> Cc: Jonathan Cameron 
> Signed-off-by: Dan Williams 
> ---
>  drivers/cxl/acpi.c |   15 +++
>  include/linux/device.h |   25 +
>  2 files changed, 40 insertions(+)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index d15a6aec0331..e19cea27387e 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -275,6 +275,15 @@ static int add_root_nvdimm_bridge(struct device *match, 
> void *data)
>   return 1;
>  }
>  
> +static struct lock_class_key cxl_root_key;
> +
> +static void cxl_acpi_lock_reset_class(void *_dev)
> +{
> + struct device *dev = _dev;
> +
> + device_lock_reset_class(dev);
> +}
> +
>  static int cxl_acpi_probe(struct platform_device *pdev)
>  {
>   int rc;
> @@ -283,6 +292,12 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>   struct acpi_device *adev = ACPI_COMPANION(host);
>   struct cxl_cfmws_context ctx;
>  
> + device_lock_set_class(&pdev->dev, &cxl_root_key);
> + rc = devm_add_action_or_reset(&pdev->dev, cxl_acpi_lock_reset_class,
> +   &pdev->dev);
> + if (rc)
> + return rc;
> +
>   root_port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL);
>   if (IS_ERR(root_port))
>   return PTR_ERR(root_port);
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 93459724dcde..82c9d307e7bd 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -850,6 +850,31 @@ static inline bool device_supports_offline(struct device 
> *dev)
>   return dev->bus && dev->bus->offline && dev->bus->online;
>  }
>  
> +#define __device_lock_set_class(dev, name, key) \
> + lock_set_class(&(dev)->mutex.dep_map, name, key, 0, _THIS_IP_)
> +
> +/**
> + * device_lock_set_class - Specify a temporary lock class while a device
> + *  is attached to a driver
> + * @dev: device to modify
> + * @key: lock class key data
> + *
> + * This must be called with the device_lock() already held, for example
> + * from driver ->probe().
> + */
> +#define device_lock_set_class(dev, key)  \
> + __device_lock_set_class(dev, #key, key)
> +
> +/**
> + * device_lock_reset_class - Return a device to the default lockdep 
> novalidate state
> + * @dev: device to modify
> + *
> + * This must be called with the device_lock() already held, for example
> + * from driver ->remove().
> + */
> +#define device_lock_reset_class(dev) \
> + device_lock_set_class(dev, &__lockdep_no_validate__)
> +
>  void lock_device_hotplug(void);
>  void unlock_device_hotplug(void);
>  int lock_device_hotplug_sysfs(void);
>

Re: [PATCH v3 3/8] cxl: Drop cxl_device_lock()

2022-04-22 Thread Ira Weiny

On Thu, Apr 21, 2022 at 08:33:23AM -0700, Dan Williams wrote:
> Now that all CXL subsystem locking is validated with custom lock
> classes, there is no need for the custom usage of the lockdep_mutex.
> 
> Cc: Alison Schofield 
> Cc: Vishal Verma 
> Cc: Ira Weiny 
> Cc: Ben Widawsky 
> Cc: Jonathan Cameron 
> Signed-off-by: Dan Williams 

Reviewed-by: Ira Weiny 

> ---
>  drivers/cxl/core/pmem.c |4 +-
>  drivers/cxl/core/port.c |   55 ++---
>  drivers/cxl/cxl.h   |   78 
> ---
>  drivers/cxl/mem.c   |4 +-
>  drivers/cxl/pmem.c  |   12 ---
>  lib/Kconfig.debug   |6 
>  6 files changed, 33 insertions(+), 126 deletions(-)
> 
> diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c
> index e825e261278d..bec7cfb54ebf 100644
> --- a/drivers/cxl/core/pmem.c
> +++ b/drivers/cxl/core/pmem.c
> @@ -124,10 +124,10 @@ static void unregister_nvb(void *_cxl_nvb)
>* work to flush. Once the state has been changed to 'dead' then no new
>* work can be queued by user-triggered bind.
>*/
> - cxl_device_lock(&cxl_nvb->dev);
> + device_lock(&cxl_nvb->dev);
>   flush = cxl_nvb->state != CXL_NVB_NEW;
>   cxl_nvb->state = CXL_NVB_DEAD;
> - cxl_device_unlock(&cxl_nvb->dev);
> + device_unlock(&cxl_nvb->dev);
>  
>   /*
>* Even though the device core will trigger device_release_driver()
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 750aac95ed5f..ea60abda6500 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -312,10 +312,10 @@ static void cxl_port_release(struct device *dev)
>   struct cxl_port *port = to_cxl_port(dev);
>   struct cxl_ep *ep, *_e;
>  
> - cxl_device_lock(dev);
> + device_lock(dev);
>   list_for_each_entry_safe(ep, _e, &port->endpoints, list)
>   cxl_ep_release(ep);
> - cxl_device_unlock(dev);
> + device_unlock(dev);
>   ida_free(&cxl_port_ida, port->id);
>   kfree(port);
>  }
> @@ -556,7 +556,7 @@ static int match_root_child(struct device *dev, const 
> void *match)
>   return 0;
>  
>   port = to_cxl_port(dev);
> - cxl_device_lock(dev);
> + device_lock(dev);
>   list_for_each_entry(dport, &port->dports, list) {
>   iter = match;
>   while (iter) {
> @@ -566,7 +566,7 @@ static int match_root_child(struct device *dev, const 
> void *match)
>   }
>   }
>  out:
> - cxl_device_unlock(dev);
> + device_unlock(dev);
>  
>   return !!iter;
>  }
> @@ -625,13 +625,13 @@ static int add_dport(struct cxl_port *port, struct 
> cxl_dport *new)
>  static void cond_cxl_root_lock(struct cxl_port *port)
>  {
>   if (is_cxl_root(port))
> - cxl_device_lock(&port->dev);
> + device_lock(&port->dev);
>  }
>  
>  static void cond_cxl_root_unlock(struct cxl_port *port)
>  {
>   if (is_cxl_root(port))
> - cxl_device_unlock(&port->dev);
> + device_unlock(&port->dev);
>  }
>  
>  static void cxl_dport_remove(void *data)
> @@ -738,15 +738,15 @@ static int add_ep(struct cxl_port *port, struct cxl_ep 
> *new)
>  {
>   struct cxl_ep *dup;
>  
> - cxl_device_lock(&port->dev);
> + device_lock(&port->dev);
>   if (port->dead) {
> - cxl_device_unlock(&port->dev);
> + device_unlock(&port->dev);
>   return -ENXIO;
>   }
>   dup = find_ep(port, new->ep);
>   if (!dup)
>   list_add_tail(&new->list, &port->endpoints);
> - cxl_device_unlock(&port->dev);
> + device_unlock(&port->dev);
>  
>   return dup ? -EEXIST : 0;
>  }
> @@ -856,12 +856,12 @@ static void delete_endpoint(void *data)
>   goto out;
>   parent = &parent_port->dev;
>  
> - cxl_device_lock(parent);
> + device_lock(parent);
>   if (parent->driver && endpoint->uport) {
>   devm_release_action(parent, cxl_unlink_uport, endpoint);
>   devm_release_action(parent, unregister_port, endpoint);
>   }
> - cxl_device_unlock(parent);
> + device_unlock(parent);
>   put_device(parent);
>  out:
>   put_device(&endpoint->dev);
> @@ -922,7 +922,7 @@ static void cxl_detach_ep(void *data)
>   }
>  
>   parent_port = to_cxl_port(port->dev.parent);

Re: [PATCH v3 4/8] nvdimm: Replace lockdep_mutex with local lock classes

2022-04-22 Thread Ira Weiny

On Thu, Apr 21, 2022 at 08:33:29AM -0700, Dan Williams wrote:
> In response to an attempt to expand dev->lockdep_mutex for device_lock()
> validation [1], Peter points out [2] that the lockdep API already has
> the ability to assign a dedicated lock class per subsystem device-type.
> 
> Use lockdep_set_class() to override the default device_lock()
> '__lockdep_no_validate__' class for each NVDIMM subsystem device-type. This
> enables lockdep to detect deadlocks and recursive locking within the
> device-driver core and the subsystem.
> 
> Link: 
> https://lore.kernel.org/r/164982968798.684294.15817853329823976469.st...@dwillia2-desk3.amr.corp.intel.com
>  [1]
> Link: 
> https://lore.kernel.org/r/ylf0dewci8myl...@hirez.programming.kicks-ass.net [2]
> Suggested-by: Peter Zijlstra 
> Cc: Vishal Verma 
> Cc: Dave Jiang 
> Cc: Ira Weiny 
> Signed-off-by: Dan Williams 

Reviewed-by: Ira Weiny 

> ---
>  drivers/nvdimm/btt_devs.c   |7 +--
>  drivers/nvdimm/bus.c|   14 +++---
>  drivers/nvdimm/dax_devs.c   |4 ++--
>  drivers/nvdimm/dimm_devs.c  |4 
>  drivers/nvdimm/namespace_devs.c |   10 +-
>  drivers/nvdimm/nd-core.h|2 +-
>  drivers/nvdimm/pfn_devs.c   |7 +--
>  drivers/nvdimm/region_devs.c|4 
>  8 files changed, 37 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c
> index e5a58520d398..120821796668 100644
> --- a/drivers/nvdimm/btt_devs.c
> +++ b/drivers/nvdimm/btt_devs.c
> @@ -178,6 +178,8 @@ bool is_nd_btt(struct device *dev)
>  }
>  EXPORT_SYMBOL(is_nd_btt);
>  
> +static struct lock_class_key nvdimm_btt_key;
> +
>  static struct device *__nd_btt_create(struct nd_region *nd_region,
> unsigned long lbasize, uuid_t *uuid,
> struct nd_namespace_common *ndns)
> @@ -205,6 +207,7 @@ static struct device *__nd_btt_create(struct nd_region 
> *nd_region,
>   dev->parent = &nd_region->dev;
>   dev->type = &nd_btt_device_type;
>   device_initialize(&nd_btt->dev);
> + lockdep_set_class(&nd_btt->dev.mutex, &nvdimm_btt_key);
>   if (ndns && !__nd_attach_ndns(&nd_btt->dev, ndns, &nd_btt->ndns)) {
>   dev_dbg(&ndns->dev, "failed, already claimed by %s\n",
>   dev_name(ndns->claim));
> @@ -225,7 +228,7 @@ struct device *nd_btt_create(struct nd_region *nd_region)
>  {
>   struct device *dev = __nd_btt_create(nd_region, 0, NULL, NULL);
>  
> - __nd_device_register(dev);
> + nd_device_register(dev);
>   return dev;
>  }
>  
> @@ -324,7 +327,7 @@ static int __nd_btt_probe(struct nd_btt *nd_btt,
>   if (!nd_btt->uuid)
>   return -ENOMEM;
>  
> - __nd_device_register(&nd_btt->dev);
> + nd_device_register(&nd_btt->dev);
>  
>   return 0;
>  }
> diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> index 7b0d1443217a..85ffa04e93c2 100644
> --- a/drivers/nvdimm/bus.c
> +++ b/drivers/nvdimm/bus.c
> @@ -334,6 +334,8 @@ struct nvdimm_bus *nvdimm_to_bus(struct nvdimm *nvdimm)
>  }
>  EXPORT_SYMBOL_GPL(nvdimm_to_bus);
>  
> +static struct lock_class_key nvdimm_bus_key;
> +
>  struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
>   struct nvdimm_bus_descriptor *nd_desc)
>  {
> @@ -360,6 +362,7 @@ struct nvdimm_bus *nvdimm_bus_register(struct device 
> *parent,
>   nvdimm_bus->dev.bus = &nvdimm_bus_type;
>   nvdimm_bus->dev.of_node = nd_desc->of_node;
>   device_initialize(&nvdimm_bus->dev);
> + lockdep_set_class(&nvdimm_bus->dev.mutex, &nvdimm_bus_key);
>   device_set_pm_not_required(&nvdimm_bus->dev);
>   rc = dev_set_name(&nvdimm_bus->dev, "ndbus%d", nvdimm_bus->id);
>   if (rc)
> @@ -511,7 +514,7 @@ static void nd_async_device_unregister(void *d, 
> async_cookie_t cookie)
>   put_device(dev);
>  }
>  
> -void __nd_device_register(struct device *dev)
> +void nd_device_register(struct device *dev)
>  {
>   if (!dev)
>   return;
> @@ -537,12 +540,6 @@ void __nd_device_register(struct device *dev)
>   async_schedule_dev_domain(nd_async_device_register, dev,
> &nd_async_domain);
>  }
> -
> -void nd_device_register(struct device *dev)
> -{
> - device_initialize(dev);
> - __nd_device_register(dev);
> -}
>  EXPORT_SYMBOL(nd_device_register);
>  
>  void nd_device_unregister(struct device *dev, enum nd_as

Re: [PATCH v3 5/8] ACPI: NFIT: Drop nfit_device_lock()

2022-04-22 Thread Ira Weiny

On Thu, Apr 21, 2022 at 08:33:34AM -0700, Dan Williams wrote:
> The nfit_device_lock() helper was added to provide lockdep coverage for
> the NFIT driver's usage of device_lock() on the nvdimm_bus object. Now
> that nvdimm_bus objects have their own lock class this wrapper can be
> dropped.
> 
> Cc: Vishal Verma 
> Cc: Dave Jiang 
> Cc: Ira Weiny 
> Signed-off-by: Dan Williams 

Reviewed-by: Ira Weiny 

> ---
>  drivers/acpi/nfit/core.c |   30 +++---
>  drivers/acpi/nfit/nfit.h |   24 
>  2 files changed, 15 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index fe61f617a943..ae5f4acf2675 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -1230,7 +1230,7 @@ static ssize_t hw_error_scrub_store(struct device *dev,
>   if (rc)
>   return rc;
>  
> - nfit_device_lock(dev);
> + device_lock(dev);
>   nd_desc = dev_get_drvdata(dev);
>   if (nd_desc) {
>   struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
> @@ -1247,7 +1247,7 @@ static ssize_t hw_error_scrub_store(struct device *dev,
>   break;
>   }
>   }
> - nfit_device_unlock(dev);
> + device_unlock(dev);
>   if (rc)
>   return rc;
>   return size;
> @@ -1267,10 +1267,10 @@ static ssize_t scrub_show(struct device *dev,
>   ssize_t rc = -ENXIO;
>   bool busy;
>  
> - nfit_device_lock(dev);
> + device_lock(dev);
>   nd_desc = dev_get_drvdata(dev);
>   if (!nd_desc) {
> - nfit_device_unlock(dev);
> + device_unlock(dev);
>   return rc;
>   }
>   acpi_desc = to_acpi_desc(nd_desc);
> @@ -1287,7 +1287,7 @@ static ssize_t scrub_show(struct device *dev,
>   }
>  
>   mutex_unlock(&acpi_desc->init_mutex);
> - nfit_device_unlock(dev);
> + device_unlock(dev);
>   return rc;
>  }
>  
> @@ -1304,14 +1304,14 @@ static ssize_t scrub_store(struct device *dev,
>   if (val != 1)
>   return -EINVAL;
>  
> - nfit_device_lock(dev);
> + device_lock(dev);
>   nd_desc = dev_get_drvdata(dev);
>   if (nd_desc) {
>   struct acpi_nfit_desc *acpi_desc = to_acpi_desc(nd_desc);
>  
>   rc = acpi_nfit_ars_rescan(acpi_desc, ARS_REQ_LONG);
>   }
> - nfit_device_unlock(dev);
> + device_unlock(dev);
>   if (rc)
>   return rc;
>   return size;
> @@ -1697,9 +1697,9 @@ static void acpi_nvdimm_notify(acpi_handle handle, u32 
> event, void *data)
>   struct acpi_device *adev = data;
>   struct device *dev = &adev->dev;
>  
> - nfit_device_lock(dev->parent);
> + device_lock(dev->parent);
>   __acpi_nvdimm_notify(dev, event);
> - nfit_device_unlock(dev->parent);
> + device_unlock(dev->parent);
>  }
>  
>  static bool acpi_nvdimm_has_method(struct acpi_device *adev, char *method)
> @@ -3152,8 +3152,8 @@ static int acpi_nfit_flush_probe(struct 
> nvdimm_bus_descriptor *nd_desc)
>   struct device *dev = acpi_desc->dev;
>  
>   /* Bounce the device lock to flush acpi_nfit_add / acpi_nfit_notify */
> - nfit_device_lock(dev);
> - nfit_device_unlock(dev);
> + device_lock(dev);
> + device_unlock(dev);
>  
>   /* Bounce the init_mutex to complete initial registration */
>   mutex_lock(&acpi_desc->init_mutex);
> @@ -3305,8 +3305,8 @@ void acpi_nfit_shutdown(void *data)
>* acpi_nfit_ars_rescan() submissions have had a chance to
>* either submit or see ->cancel set.
>*/
> - nfit_device_lock(bus_dev);
> - nfit_device_unlock(bus_dev);
> + device_lock(bus_dev);
> + device_unlock(bus_dev);
>  
>   flush_workqueue(nfit_wq);
>  }
> @@ -3449,9 +3449,9 @@ EXPORT_SYMBOL_GPL(__acpi_nfit_notify);
>  
>  static void acpi_nfit_notify(struct acpi_device *adev, u32 event)
>  {
> - nfit_device_lock(&adev->dev);
> + device_lock(&adev->dev);
>   __acpi_nfit_notify(&adev->dev, adev->handle, event);
> - nfit_device_unlock(&adev->dev);
> + device_unlock(&adev->dev);
>  }
>  
>  static const struct acpi_device_id acpi_nfit_ids[] = {
> diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h
> index 50882bdbeb96..6023ad61831a 100644
> --- a/drivers/acpi/nfit/nfit.h
> +++ b/drivers/acpi/nfit/nfit.h
> @@ -337,30 +337,6 @@ static inline struct acpi_nfit_desc *to_acpi_desc(
>   return container_of(nd_desc, struct acpi_nfi

Re: [PATCH v3 6/8] nvdimm: Drop nd_device_lock()

2022-04-22 Thread Ira Weiny

On Thu, Apr 21, 2022 at 08:33:39AM -0700, Dan Williams wrote:
> Now that all NVDIMM subsystem locking is validated with custom lock
> classes, there is no need for the custom usage of the lockdep_mutex.
> 
> Cc: Vishal Verma 
> Cc: Dave Jiang 
> Cc: Ira Weiny 
> Signed-off-by: Dan Williams 

Reviewed-by: Ira Weiny 

> ---
>  drivers/nvdimm/btt_devs.c   |   16 +
>  drivers/nvdimm/bus.c|   24 +-
>  drivers/nvdimm/core.c   |   10 +++---
>  drivers/nvdimm/dimm_devs.c  |8 ++---
>  drivers/nvdimm/namespace_devs.c |   36 +++--
>  drivers/nvdimm/nd-core.h|   66 
> ---
>  drivers/nvdimm/pfn_devs.c   |   24 +++---
>  drivers/nvdimm/pmem.c   |2 +
>  drivers/nvdimm/region.c |2 +
>  drivers/nvdimm/region_devs.c|   16 +
>  lib/Kconfig.debug   |   17 --
>  11 files changed, 66 insertions(+), 155 deletions(-)
> 
> diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c
> index 120821796668..fabbb31f2c35 100644
> --- a/drivers/nvdimm/btt_devs.c
> +++ b/drivers/nvdimm/btt_devs.c
> @@ -50,14 +50,14 @@ static ssize_t sector_size_store(struct device *dev,
>   struct nd_btt *nd_btt = to_nd_btt(dev);
>   ssize_t rc;
>  
> - nd_device_lock(dev);
> + device_lock(dev);
>   nvdimm_bus_lock(dev);
>   rc = nd_size_select_store(dev, buf, &nd_btt->lbasize,
>   btt_lbasize_supported);
>   dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf,
>   buf[len - 1] == '\n' ? "" : "\n");
>   nvdimm_bus_unlock(dev);
> - nd_device_unlock(dev);
> + device_unlock(dev);
>  
>   return rc ? rc : len;
>  }
> @@ -79,11 +79,11 @@ static ssize_t uuid_store(struct device *dev,
>   struct nd_btt *nd_btt = to_nd_btt(dev);
>   ssize_t rc;
>  
> - nd_device_lock(dev);
> + device_lock(dev);
>   rc = nd_uuid_store(dev, &nd_btt->uuid, buf, len);
>   dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf,
>   buf[len - 1] == '\n' ? "" : "\n");
> - nd_device_unlock(dev);
> + device_unlock(dev);
>  
>   return rc ? rc : len;
>  }
> @@ -108,13 +108,13 @@ static ssize_t namespace_store(struct device *dev,
>   struct nd_btt *nd_btt = to_nd_btt(dev);
>   ssize_t rc;
>  
> - nd_device_lock(dev);
> + device_lock(dev);
>   nvdimm_bus_lock(dev);
>   rc = nd_namespace_store(dev, &nd_btt->ndns, buf, len);
>   dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf,
>   buf[len - 1] == '\n' ? "" : "\n");
>   nvdimm_bus_unlock(dev);
> - nd_device_unlock(dev);
> + device_unlock(dev);
>  
>   return rc;
>  }
> @@ -126,14 +126,14 @@ static ssize_t size_show(struct device *dev,
>   struct nd_btt *nd_btt = to_nd_btt(dev);
>   ssize_t rc;
>  
> - nd_device_lock(dev);
> + device_lock(dev);
>   if (dev->driver)
>   rc = sprintf(buf, "%llu\n", nd_btt->size);
>   else {
>   /* no size to convey if the btt instance is disabled */
>   rc = -ENXIO;
>   }
> - nd_device_unlock(dev);
> + device_unlock(dev);
>  
>   return rc;
>  }
> diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> index 85ffa04e93c2..a4fc17db707c 100644
> --- a/drivers/nvdimm/bus.c
> +++ b/drivers/nvdimm/bus.c
> @@ -88,10 +88,7 @@ static int nvdimm_bus_probe(struct device *dev)
>   dev->driver->name, dev_name(dev));
>  
>   nvdimm_bus_probe_start(nvdimm_bus);
> - debug_nvdimm_lock(dev);
>   rc = nd_drv->probe(dev);
> - debug_nvdimm_unlock(dev);
> -
>   if ((rc == 0 || rc == -EOPNOTSUPP) &&
>   dev->parent && is_nd_region(dev->parent))
>   nd_region_advance_seeds(to_nd_region(dev->parent), dev);
> @@ -111,11 +108,8 @@ static void nvdimm_bus_remove(struct device *dev)
>   struct module *provider = to_bus_provider(dev);
>   struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
>  
> - if (nd_drv->remove) {
> - debug_nvdimm_lock(dev);
> + if (nd_drv->remove)
>   nd_drv->remove(dev);
> - debug_nvdimm_unlock(dev);
> - }
>  
>   dev_dbg(&nvdimm_bus->dev, "%s.remove(%s)\n", dev->driver->name,
>   dev_name(dev));
> @@ -139,7 +133,7 @@ static void nvdimm_bus_shut

Re: [PATCH v3 7/8] device-core: Kill the lockdep_mutex

2022-04-22 Thread Ira Weiny

On Thu, Apr 21, 2022 at 08:33:45AM -0700, Dan Williams wrote:
> Per Peter [1], the lockdep API has native support for all the use cases
> lockdep_mutex was attempting to enable. Now that all lockdep_mutex users
> have been converted to those APIs, drop this lock.
> 
> Link: 
> https://lore.kernel.org/r/ylf0dewci8myl...@hirez.programming.kicks-ass.net [1]
> Cc: Greg Kroah-Hartman 
> Cc: "Rafael J. Wysocki" 
> Suggested-by: Peter Zijlstra 
> Signed-off-by: Dan Williams 

Reviewed-by: Ira Weiny 

> ---
>  drivers/base/core.c|3 ---
>  include/linux/device.h |5 -
>  2 files changed, 8 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 3d6430eb0c6a..2eede2ec3d64 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -2864,9 +2864,6 @@ void device_initialize(struct device *dev)
>   kobject_init(&dev->kobj, &device_ktype);
>   INIT_LIST_HEAD(&dev->dma_pools);
>   mutex_init(&dev->mutex);
> -#ifdef CONFIG_PROVE_LOCKING
> - mutex_init(&dev->lockdep_mutex);
> -#endif
>   lockdep_set_novalidate_class(&dev->mutex);
>   spin_lock_init(&dev->devres_lock);
>   INIT_LIST_HEAD(&dev->devres_head);
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 82c9d307e7bd..c00ab223da50 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -400,8 +400,6 @@ struct dev_msi_info {
>   *   This identifies the device type and carries type-specific
>   *   information.
>   * @mutex:   Mutex to synchronize calls to its driver.
> - * @lockdep_mutex: An optional debug lock that a subsystem can use as a
> - *   peer lock to gain localized lockdep coverage of the device_lock.
>   * @bus: Type of bus device is on.
>   * @driver:  Which driver has allocated this
>   * @platform_data: Platform data specific to the device.
> @@ -499,9 +497,6 @@ struct device {
>  core doesn't touch it */
>   void*driver_data;   /* Driver data, set and get with
>  dev_set_drvdata/dev_get_drvdata */
> -#ifdef CONFIG_PROVE_LOCKING
> - struct mutexlockdep_mutex;
> -#endif
>   struct mutexmutex;  /* mutex to synchronize calls to
>* its driver.
>*/
> 
>

Re: [PATCH v3 8/8] nvdimm: Fix firmware activation deadlock scenarios

2022-04-22 Thread Ira Weiny

On Thu, Apr 21, 2022 at 08:33:51AM -0700, Dan Williams wrote:
> Lockdep reports the following deadlock scenarios for CXL root device
> power-management, device_prepare(), operations, and device_shutdown()
> operations for 'nd_region' devices:
> 
> ---
>  Chain exists of:
>&nvdimm_region_key --> &nvdimm_bus->reconfig_mutex --> 
> system_transition_mutex
> 
>   Possible unsafe locking scenario:
> 
> CPU0CPU1
> 
>lock(system_transition_mutex);
> lock(&nvdimm_bus->reconfig_mutex);
> lock(system_transition_mutex);
>lock(&nvdimm_region_key);
> 
> --
> 
>  Chain exists of:
>&cxl_nvdimm_bridge_key --> acpi_scan_lock --> &cxl_root_key
> 
>   Possible unsafe locking scenario:
> 
> CPU0CPU1
> 
>lock(&cxl_root_key);
> lock(acpi_scan_lock);
> lock(&cxl_root_key);
>lock(&cxl_nvdimm_bridge_key);
> 
> ---
> 
> These stem from holding nvdimm_bus_lock() over hibernate_quiet_exec()
> which walks the entire system device topology taking device_lock() along
> the way. The nvdimm_bus_lock() is protecting against unregistration,
> multiple simultaneous ops callers, and preventing activate_show() from
> racing activate_store(). For the first 2, the lock is redundant.
> Unregistration already flushes all ops users, and sysfs already prevents
> multiple threads to be active in an ops handler at the same time. For
> the last userspace should already be waiting for its last
> activate_store() to complete, and does not need activate_show() to flush
> the write side, so this lock usage can be deleted in these attributes.
>

I'm sorry if this is obvious but why can't the locking be removed from
capability_show() and nvdimm_bus_firmware_visible() as well?

Effectively it sounds like we don't care if the cap read is racing any state
change?  And we know the device can't go away while sysfs is calling those
functions.

Ira

> 
> Fixes: 48001ea50d17 ("PM, libnvdimm: Add runtime firmware activation support")
> Signed-off-by: Dan Williams 
> ---
>  drivers/nvdimm/core.c |4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
> index 144926b7451c..7c7f4a43fd4f 100644
> --- a/drivers/nvdimm/core.c
> +++ b/drivers/nvdimm/core.c
> @@ -395,10 +395,8 @@ static ssize_t activate_show(struct device *dev,
>   if (!nd_desc->fw_ops)
>   return -EOPNOTSUPP;
>  
> - nvdimm_bus_lock(dev);
>   cap = nd_desc->fw_ops->capability(nd_desc);
>   state = nd_desc->fw_ops->activate_state(nd_desc);
> - nvdimm_bus_unlock(dev);
>  
>   if (cap < NVDIMM_FWA_CAP_QUIESCE)
>   return -EOPNOTSUPP;
> @@ -443,7 +441,6 @@ static ssize_t activate_store(struct device *dev,
>   else
>   return -EINVAL;
>  
> - nvdimm_bus_lock(dev);
>   state = nd_desc->fw_ops->activate_state(nd_desc);
>  
>   switch (state) {
> @@ -461,7 +458,6 @@ static ssize_t activate_store(struct device *dev,
>   default:
>   rc = -ENXIO;
>   }
> - nvdimm_bus_unlock(dev);
>  
>   if (rc == 0)
>   rc = len;
> 
>

Re: [PATCH] tools/testing/nvdimm: remove unneeded flush_workqueue

2022-04-25 Thread Ira Weiny

On Sun, Apr 24, 2022 at 06:26:55AM +, cgel@gmail.com wrote:
> From: ran jianping 
> 
> All work currently pending will be done first by calling destroy_workqueue,
> so there is no need to flush it explicitly.
> 
> Reported-by: Zeal Robot 
> Signed-off-by: ran jianping 

Reviewed-by: Ira Weiny 

> ---
>  tools/testing/nvdimm/test/nfit.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/tools/testing/nvdimm/test/nfit.c 
> b/tools/testing/nvdimm/test/nfit.c
> index 1da76ccde448..e7e1a640e482 100644
> --- a/tools/testing/nvdimm/test/nfit.c
> +++ b/tools/testing/nvdimm/test/nfit.c
> @@ -3375,7 +3375,6 @@ static __exit void nfit_test_exit(void)
>  {
>   int i;
>  
> - flush_workqueue(nfit_wq);
>   destroy_workqueue(nfit_wq);
>   for (i = 0; i < NUM_NFITS; i++)
>   platform_device_unregister(&instances[i]->pdev);
> -- 
> 2.25.1
>

Re: [PATCH] fs: Replace kmap{,_atomic}() with kmap_local_page()

2022-07-08 Thread Ira Weiny

On Thu, Jun 30, 2022 at 06:35:27PM +0200, Fabio M. De Francesco wrote:
> The use of kmap() and kmap_atomic() are being deprecated in favor of
> kmap_local_page().
> 
> With kmap_local_page(), the mappings are per thread, CPU local and not
> globally visible. Furthermore, the mappings can be acquired from any
> context (including interrupts).
> 
> Therefore, use kmap_local_page() in exec.c because these mappings are per
> thread, CPU local, and not globally visible.
> 
> Tested with xfstests on a QEMU + KVM 32-bits VM booting a kernel with
> HIGHMEM64GB enabled.
> 
> Suggested-by: Ira Weiny 

This looks good but there is a kmap_atomic() in this file which I _think_ can
be converted as well.  But that is good as a separate patch.

Reviewed-by: Ira Weiny 

> Signed-off-by: Fabio M. De Francesco 
> ---
>  fs/exec.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 0989fb8472a1..4a2129c0d422 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -583,11 +583,11 @@ static int copy_strings(int argc, struct user_arg_ptr 
> argv,
>  
>   if (kmapped_page) {
>   flush_dcache_page(kmapped_page);
> - kunmap(kmapped_page);
> + kunmap_local(kaddr);
>   put_arg_page(kmapped_page);
>   }
>   kmapped_page = page;
> - kaddr = kmap(kmapped_page);
> + kaddr = kmap_local_page(kmapped_page);
>   kpos = pos & PAGE_MASK;
>   flush_arg_page(bprm, kpos, kmapped_page);
>   }
> @@ -601,7 +601,7 @@ static int copy_strings(int argc, struct user_arg_ptr 
> argv,
>  out:
>   if (kmapped_page) {
>   flush_dcache_page(kmapped_page);
> - kunmap(kmapped_page);
> + kunmap_local(kaddr);
>   put_arg_page(kmapped_page);
>   }
>   return ret;
> @@ -883,11 +883,11 @@ int transfer_args_to_stack(struct linux_binprm *bprm,
>  
>   for (index = MAX_ARG_PAGES - 1; index >= stop; index--) {
>   unsigned int offset = index == stop ? bprm->p & ~PAGE_MASK : 0;
> - char *src = kmap(bprm->page[index]) + offset;
> + char *src = kmap_local_page(bprm->page[index]) + offset;
>   sp -= PAGE_SIZE - offset;
>   if (copy_to_user((void *) sp, src, PAGE_SIZE - offset) != 0)
>   ret = -EFAULT;
> - kunmap(bprm->page[index]);
> + kunmap_local(src);
>   if (ret)
>   goto out;
>   }
> @@ -1680,13 +1680,13 @@ int remove_arg_zero(struct linux_binprm *bprm)
>   ret = -EFAULT;
>   goto out;
>   }
> - kaddr = kmap_atomic(page);
> + kaddr = kmap_local_page(page);
>  
>   for (; offset < PAGE_SIZE && kaddr[offset];
>   offset++, bprm->p++)
>   ;
>  
> - kunmap_atomic(kaddr);
> + kunmap_local(kaddr);
>   put_arg_page(page);
>   } while (offset == PAGE_SIZE);
>  
> -- 
> 2.36.1
>

Re: [PATCH] fs: Call kmap_local_page() in copy_string_kernel()

2022-07-21 Thread Ira Weiny

On Sun, Jul 10, 2022 at 12:01:36PM +0200, Fabio M. De Francesco wrote:
> The use of kmap_atomic() is being deprecated in favor of kmap_local_page().
> 
> With kmap_local_page(), the mappings are per thread, CPU local, not
> globally visible and can take page faults. Furthermore, the mappings can be
> acquired from any context (including interrupts).
> 
> Therefore, use kmap_local_page() in copy_string_kernel() instead of
> kmap_atomic().
> 
> Tested with xfstests on a QEMU + KVM 32-bits VM booting a kernel with
> HIGHMEM64GB enabled.
> 
> Suggested-by: Ira Weiny 
> Signed-off-by: Fabio M. De Francesco 
> ---
> 
> I sent a first patch to fs/exec.c for converting kmap() and kmap_atomic()
> to kmap_local_page():
> https://lore.kernel.org/lkml/20220630163527.9776-1-fmdefrance...@gmail.com/
> 
> Some days ago, Ira Weiny, while he was reviewing that patch, made me notice
> that I had overlooked a second kmap_atomic() in the same file (thanks):
> https://lore.kernel.org/lkml/YsiQptk19txHrG4c@iweiny-desk3/
> 
> I've been asked to send this as an additional change. This is why there will
> not be any second version of that previous patch.
> 
>  fs/exec.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 4a2129c0d422..5fa652ca5823 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -639,11 +639,11 @@ int copy_string_kernel(const char *arg, struct 
> linux_binprm *bprm)
>   page = get_arg_page(bprm, pos, 1);
>   if (!page)
>   return -E2BIG;
> - kaddr = kmap_atomic(page);
> + kaddr = kmap_local_page(page);
>   flush_arg_page(bprm, pos & PAGE_MASK, page);

I really question why we can't use memcpy_to_page() here and move the
flush_arg_page() prior to the mapping?

flush_arg_page() only calls flush_cache_page() which does not need the
mapping to work correctly AFAICT.

Ira

>   memcpy(kaddr + offset_in_page(pos), arg, bytes_to_copy);
>   flush_dcache_page(page);
> - kunmap_atomic(kaddr);
> + kunmap_local(kaddr);
>   put_arg_page(page);
>   }
>  
> -- 
> 2.36.1
>

Re: [PATCH] dax: Check dev_set_name() return value

2022-08-05 Thread Ira Weiny

On Fri, Aug 05, 2022 at 01:33:19AM -0400, Bo Liu wrote:
> It's possible that dev_set_name() returns -ENOMEM, catch and handle this.

Did this cause a bug or some other problem when the name was not set?

I think it is an ok change but without digging into the code I'm not clear why
you did this.

Ira

> 
> Signed-off-by: Bo Liu 
> ---
>  drivers/dax/bus.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 1dad813ee4a6..36cf245ee467 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -765,7 +765,12 @@ static int devm_register_dax_mapping(struct dev_dax 
> *dev_dax, int range_id)
>   device_initialize(dev);
>   dev->parent = &dev_dax->dev;
>   dev->type = &dax_mapping_type;
> - dev_set_name(dev, "mapping%d", mapping->id);
> + rc = dev_set_name(dev, "mapping%d", mapping->id);
> + if (rc) {
> + kfree(mapping);
> + return rc;
> + }
> +
>   rc = device_add(dev);
>   if (rc) {
>   put_device(dev);
> @@ -1334,7 +1339,9 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data 
> *data)
>   dev_dax->region = dax_region;
>   dev = &dev_dax->dev;
>   device_initialize(dev);
> - dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id);
> + rc = dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id);
> + if (rc)
> + goto err_range;
>  
>   rc = alloc_dev_dax_range(dev_dax, dax_region->res.start, data->size);
>   if (rc)
> -- 
> 2.27.0
> 
>

Re: [PATCH v2] arch/cacheflush: Introduce flush_all_caches()

2022-08-19 Thread Ira Weiny

On Fri, Aug 19, 2022 at 10:10:24AM -0700, Davidlohr Bueso wrote:
> With CXL security features, global CPU cache flushing nvdimm requirements
> are no longer specific to that subsystem, even beyond the scope of
> security_ops. CXL will need such semantics for features not necessarily
> limited to persistent memory.
> 
> The functionality this is enabling is to be able to instantaneously
> secure erase potentially terabytes of memory at once and the kernel
> needs to be sure that none of the data from before the secure is still
> present in the cache. It is also used when unlocking a memory device
> where speculative reads and firmware accesses could have cached poison
> from before the device was unlocked.
> 
> This capability is typically only used once per-boot (for unlock), or
> once per bare metal provisioning event (secure erase), like when handing
> off the system to another tenant or decomissioning a device. That small
> scope plus the fact that none of this is available to a VM limits the
> potential damage.
> 
> While the scope of this is for physical address space, add a new
> flush_all_caches() in cacheflush headers such that each architecture
> can define it, when capable. For x86 just use the wbinvd hammer and
> prevent any other arch from being capable.
> 
> Signed-off-by: Davidlohr Bueso 
> ---
> 
> Changes from v1 
> (https://lore.kernel.org/all/20220815160706.tqd42dv24tgb7x7y@offworld/):
> - Added comments and improved changelog to reflect this is
>   routine should be avoided and not considered a general API (Peter, Dan).
> 
>  arch/x86/include/asm/cacheflush.h |  4 +++
>  drivers/acpi/nfit/intel.c | 41 ++-
>  include/asm-generic/cacheflush.h  | 31 +++
>  3 files changed, 53 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cacheflush.h 
> b/arch/x86/include/asm/cacheflush.h
> index b192d917a6d0..ac4d4fd4e508 100644
> --- a/arch/x86/include/asm/cacheflush.h
> +++ b/arch/x86/include/asm/cacheflush.h
> @@ -10,4 +10,8 @@
>  
>  void clflush_cache_range(void *addr, unsigned int size);
>  
> +/* see comments in the stub version */
> +#define flush_all_caches() \
> + do { wbinvd_on_all_cpus(); } while(0)
> +
>  #endif /* _ASM_X86_CACHEFLUSH_H */
> diff --git a/drivers/acpi/nfit/intel.c b/drivers/acpi/nfit/intel.c
> index 8dd792a55730..f2f6c31e6ab7 100644
> --- a/drivers/acpi/nfit/intel.c
> +++ b/drivers/acpi/nfit/intel.c
> @@ -4,6 +4,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "intel.h"
>  #include "nfit.h"
>  
> @@ -190,8 +191,6 @@ static int intel_security_change_key(struct nvdimm 
> *nvdimm,
>   }
>  }
>  
> -static void nvdimm_invalidate_cache(void);
> -
>  static int __maybe_unused intel_security_unlock(struct nvdimm *nvdimm,
>   const struct nvdimm_key_data *key_data)
>  {
> @@ -210,6 +209,9 @@ static int __maybe_unused intel_security_unlock(struct 
> nvdimm *nvdimm,
>   };
>   int rc;
>  
> + if (!flush_all_caches_capable())
> + return -EINVAL;
> +
>   if (!test_bit(NVDIMM_INTEL_UNLOCK_UNIT, &nfit_mem->dsm_mask))
>   return -ENOTTY;
>  
> @@ -228,7 +230,7 @@ static int __maybe_unused intel_security_unlock(struct 
> nvdimm *nvdimm,
>   }
>  
>   /* DIMM unlocked, invalidate all CPU caches before we read it */
> - nvdimm_invalidate_cache();
> + flush_all_caches();
>  
>   return 0;
>  }
> @@ -294,11 +296,14 @@ static int __maybe_unused intel_security_erase(struct 
> nvdimm *nvdimm,
>   },
>   };
>  
> + if (!flush_all_caches_capable())
> + return -EINVAL;
> +
>   if (!test_bit(cmd, &nfit_mem->dsm_mask))
>   return -ENOTTY;
>  
>   /* flush all cache before we erase DIMM */
> - nvdimm_invalidate_cache();
> + flush_all_caches();
>   memcpy(nd_cmd.cmd.passphrase, key->data,
>   sizeof(nd_cmd.cmd.passphrase));
>   rc = nvdimm_ctl(nvdimm, ND_CMD_CALL, &nd_cmd, sizeof(nd_cmd), NULL);
> @@ -318,7 +323,7 @@ static int __maybe_unused intel_security_erase(struct 
> nvdimm *nvdimm,
>   }
>  
>   /* DIMM erased, invalidate all CPU caches before we read it */
> - nvdimm_invalidate_cache();
> + flush_all_caches();
>   return 0;
>  }
>  
> @@ -338,6 +343,9 @@ static int __maybe_unused 
> intel_security_query_overwrite(struct nvdimm *nvdimm)
>   },
>   };
>  
> + if (!flush_all_caches_capable())
> + return -EINVAL;
> +
>   if (!test_bit(NVDIMM_INTEL_QUERY_OVERWRITE, &nfit_mem->dsm_mask))
>   return -ENOTTY;
>  
> @@ -355,7 +363,7 @@ static int __maybe_unused 
> intel_security_query_overwrite(struct nvdimm *nvdimm)
>   }
>  
>   /* flush all cache before we make the nvdimms available */
> - nvdimm_invalidate_cache();
> + flush_all_caches();
>   return 0;
>  }
>  
> @@ -377,11 +385,14 @@ static int __maybe_unused 
> intel_security_overwrite(struct nvdimm *nv

Re: [PATCH] dax: Remove usage of the deprecated ida_simple_xxx API

2022-09-26 Thread Ira Weiny

On Sun, Sep 25, 2022 at 09:26:35PM -0400, Bo Liu wrote:
> Use ida_alloc_xxx()/ida_free() instead of
> ida_simple_get()/ida_simple_remove().
> The latter is deprecated and more verbose.
> 
> Signed-off-by: Bo Liu 

Reviewed-by: Ira Weiny 

> ---
>  drivers/dax/super.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 9b5e2a5eb0ae..da4438f3188c 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -363,7 +363,7 @@ static void dax_free_inode(struct inode *inode)
>  {
>   struct dax_device *dax_dev = to_dax_dev(inode);
>   if (inode->i_rdev)
> - ida_simple_remove(&dax_minor_ida, iminor(inode));
> + ida_free(&dax_minor_ida, iminor(inode));
>   kmem_cache_free(dax_cache, dax_dev);
>  }
>  
> @@ -445,7 +445,7 @@ struct dax_device *alloc_dax(void *private, const struct 
> dax_operations *ops)
>   if (WARN_ON_ONCE(ops && !ops->zero_page_range))
>   return ERR_PTR(-EINVAL);
>  
> - minor = ida_simple_get(&dax_minor_ida, 0, MINORMASK+1, GFP_KERNEL);
> + minor = ida_alloc_max(&dax_minor_ida, MINORMASK, GFP_KERNEL);
>   if (minor < 0)
>   return ERR_PTR(-ENOMEM);
>  
> @@ -459,7 +459,7 @@ struct dax_device *alloc_dax(void *private, const struct 
> dax_operations *ops)
>   return dax_dev;
>  
>   err_dev:
> - ida_simple_remove(&dax_minor_ida, minor);
> + ida_free(&dax_minor_ida, minor);
>   return ERR_PTR(-ENOMEM);
>  }
>  EXPORT_SYMBOL_GPL(alloc_dax);
> -- 
> 2.27.0
> 
>

Re: [PATCH] dax/hmem: Fix refcount leak in dax_hmem_probe()

2022-12-03 Thread Ira Weiny

On Sat, Dec 03, 2022 at 09:58:58AM +, Yongqiang Liu wrote:
> We should always call dax_region_put() whenever devm_create_dev_dax()
> succeed or fail to avoid refcount leak of dax_region. Move the return
> value check after dax_region_put().

I think dax_region_put is called from dax_region_unregister() automatically on
tear down.

> 
> Fixes: c01044cc8191 ("ACPI: HMAT: refactor hmat_register_target_device to 
> hmem_register_device")

I'm also not sure how this patch is related to this fix.

Ira

> Signed-off-by: Yongqiang Liu 
> ---
>  drivers/dax/hmem/hmem.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> index 1bf040dbc834..09f5cd7b6c8e 100644
> --- a/drivers/dax/hmem/hmem.c
> +++ b/drivers/dax/hmem/hmem.c
> @@ -36,12 +36,11 @@ static int dax_hmem_probe(struct platform_device *pdev)
>   .size = region_idle ? 0 : resource_size(res),
>   };
>   dev_dax = devm_create_dev_dax(&data);
> - if (IS_ERR(dev_dax))
> - return PTR_ERR(dev_dax);
>  
>   /* child dev_dax instances now own the lifetime of the dax_region */
>   dax_region_put(dax_region);
> - return 0;
> +
> + return IS_ERR(dev_dax) ? PTR_ERR(dev_dax) : 0;
>  }
>  
>  static int dax_hmem_remove(struct platform_device *pdev)
> -- 
> 2.25.1
> 
>

Re: [PATCH -next] libnvdimm: Fix some kernel-doc comments

2023-02-23 Thread Ira Weiny

Yang Li wrote:
> Make the description of @nvdimm to @ndd in function
> nvdimm_init_nsarea() and nvdimm_allocated_dpa () to silence the warnings:
> drivers/nvdimm/dimm_devs.c:59: warning: Function parameter or member 'ndd' 
> not described in 'nvdimm_init_nsarea'
> drivers/nvdimm/dimm_devs.c:59: warning: Excess function parameter 'nvdimm' 
> description in 'nvdimm_init_nsarea'
> drivers/nvdimm/dimm_devs.c:844: warning: Function parameter or member 'ndd' 
> not described in 'nvdimm_allocated_dpa'
> drivers/nvdimm/dimm_devs.c:844: warning: Excess function parameter 'nvdimm' 
> description in 'nvdimm_allocated_dpa
> 
> Reported-by: Abaci Robot 
 Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4118

Reviewed-by: Ira Weiny 

> Signed-off-by: Yang Li 
> ---
>  drivers/nvdimm/dimm_devs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
> index 957f7c3d17ba..fc152e6016ca 100644
> --- a/drivers/nvdimm/dimm_devs.c
> +++ b/drivers/nvdimm/dimm_devs.c
> @@ -53,7 +53,7 @@ static int validate_dimm(struct nvdimm_drvdata *ndd)
>  
>  /**
>   * nvdimm_init_nsarea - determine the geometry of a dimm's namespace area
> - * @nvdimm: dimm to initialize
> + * @ndd: dimm to initialize
>   */
>  int nvdimm_init_nsarea(struct nvdimm_drvdata *ndd)
>  {
> @@ -836,7 +836,7 @@ struct resource *nvdimm_allocate_dpa(struct 
> nvdimm_drvdata *ndd,
>  
>  /**
>   * nvdimm_allocated_dpa - sum up the dpa currently allocated to this label_id
> - * @nvdimm: container of dpa-resource-root + labels
> + * @ndd: container of dpa-resource-root + labels
>   * @label_id: dpa resource name of the form pmem-
>   */
>  resource_size_t nvdimm_allocated_dpa(struct nvdimm_drvdata *ndd,
> -- 
> 2.20.1.7.g153144c
>

Re: [PATCH] dax/hmem: Fix refcount leak in dax_hmem_probe()

2023-06-02 Thread Ira Weiny

Paul Cassella wrote:
> On Sat, 3 Dec 2022, Ira Weiny wrote:
> > On Sat, Dec 03, 2022 at 09:58:58AM +, Yongqiang Liu wrote:
> 
> > > We should always call dax_region_put() whenever devm_create_dev_dax()
> > > succeed or fail to avoid refcount leak of dax_region. Move the return
> > > value check after dax_region_put().
> 
> > I think dax_region_put is called from dax_region_unregister() automatically 
> > on
> > tear down.
> 
> Hi Ira,
> 
> Note the reference dax_region_unregister() will be putting is the one 
> devm_create_dev_dax() takes by kref_get(&dax_region->kref).   I think 
> dax_hmem_probe() needs to put its reference in the error case, as in the 
> successful case.

Looking at this again I'm inclined to agree that something is wrong.  But
I'm not sure this patch fixes it. anything.

When you say:

> ...   I think 
> dax_hmem_probe() needs to put its reference in the error case, as in the 
> successful case.

... which kref_get() is dax_hmem_probe() letting go?

Here is what I see with the current code.

dax_hmem_probe()
alloc_dax_region()
kref_get(&dax_region->kref)
devm_add_action_or_reset(... dax_region_unregister, ...)
... kref_put() later...

devm_create_dev_dax()
...may return error...
kref_get()
[dev_dax_release() set to call kref_put() later]
...may return error...

if not error
dax_region_put() => kref_put()

I think this is an extra unneeded put???

Dan I see this pattern repeated in cxl and pmem.  Is the intent to remove
the need for dax_region_unregister() to be called when the platform device
unwinds because the platform device is not intended to own the dax_region
after success?  If so it seems like the device managed call should be
removed too.  Not just calling dax_region_put()?  :-/

But wouldn't that cause an issue with the sysfs entries created?

> 
> Consider, devm_create_dev_dax() has error paths that return without 
> involving dax_region_unregister(), prior to kref_get() and device_add().  
> dax_hmem_probe() is clearly responsible for freeing the region in those 
> cases.

No the devm_add_action_or_reset(... dax_region_unregister, ...) will cause
dax_region_unregister() to release the reference when the platform device
unwinds.

devm_create_dev_dax() configures a reference release through the
dev_dax->type release.  So when the dev_dax device is put the dax_region
will be released through dev_dax_release()->dax_region_put().

> 
> 
> dax_hmem_probe() drops its own reference in the successful case because 
> (per the comment) "child dev_dax instances now own the lifetime of the 
> dax_region".  That ownership is the reference that the error-case 
> dax_region_unregister() is dropping.

No dax_region_unregister() is not just an error case flow.

>
> dax_hmem_probe()'s initial reference 
> also needs to be dropped in the error case, as in the successful case.
> 

I don't follow this.  Doesn't this now result in an invalid reference
release in dax_region_unregister() when the platform device is unwound?
Furthermore, that reference is required I think for the sysfs entries.

> 
> > > Fixes: c01044cc8191 ("ACPI: HMAT: refactor hmat_register_target_device to 
> > > hmem_register_device")
> > 
> > I'm also not sure how this patch is related to this fix.
> > 
> > Ira
> > 
> > > Signed-off-by: Yongqiang Liu 
> > > ---
> > >  drivers/dax/hmem/hmem.c | 5 ++---
> > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> > > index 1bf040dbc834..09f5cd7b6c8e 100644
> > > --- a/drivers/dax/hmem/hmem.c
> > > +++ b/drivers/dax/hmem/hmem.c
> > > @@ -36,12 +36,11 @@ static int dax_hmem_probe(struct platform_device 
> > > *pdev)
> > >   .size = region_idle ? 0 : resource_size(res),
> > >   };
> > >   dev_dax = devm_create_dev_dax(&data);
> > > - if (IS_ERR(dev_dax))
> > > - return PTR_ERR(dev_dax);
> > >  
> > >   /* child dev_dax instances now own the lifetime of the dax_region */
> 
> This comment should probably be updated now.  :)
> 

I think removed...

Dan what do you think of this patch?  Am I seriously off the rails here?
I worry about this code being around for so long.  But since it is in tear
down perhaps it is just a race which has never been lost.

Ira

 8< -

>From f63969c761b04fb5e646e7ba7df77a48bc26ba1c Mon Sep 17 00:00:00 2001
From: Ira Weiny 
Date: Fri, 2 Jun

Re: [PATCH] dax/hmem: Fix refcount leak in dax_hmem_probe()

2023-06-02 Thread Ira Weiny

Paul Cassella wrote:
> On Fri, 2 Jun 2023, Ira Weiny wrote:
> > Paul Cassella wrote:
> > > On Sat, 3 Dec 2022, Ira Weiny wrote:
> > > > On Sat, Dec 03, 2022 at 09:58:58AM +, Yongqiang Liu wrote:
> 
> > > > > We should always call dax_region_put() whenever devm_create_dev_dax()
> > > > > succeed or fail to avoid refcount leak of dax_region. Move the return
> > > > > value check after dax_region_put().
> 
> > > > I think dax_region_put is called from dax_region_unregister() 
> > > > automatically on
> > > > tear down.
> 
> > > Note the reference dax_region_unregister() will be putting is the one 
> > > devm_create_dev_dax() takes by kref_get(&dax_region->kref).   I think 
> > > dax_hmem_probe() needs to put its reference in the error case, as in the 
> > > successful case.
> 
> > Looking at this again I'm inclined to agree that something is wrong.  But
> > I'm not sure this patch fixes it. anything.
> 
> > When you say:
> > 
> > > ...   I think 
> > > dax_hmem_probe() needs to put its reference in the error case, as in the 
> > > successful case.
> > 
> > ... which kref_get() is dax_hmem_probe() letting go?
> 
> Isn't it letting go of the initial kref_init() reference from 
> alloc_dax_region()?

Oh wow!  I did not realize that about kref_init()...  :-(

> 
> Sorry, I had gone through the code a little carelessly yesterday.  Now I 
> think that kref_init() reference is the one that dax_hmem_probe() is 
> dropping in the success case, and which still needs to be dropped in the 
> error case.

yea ok...

> 
> (If so, I think the alloc_dax_region() path that returns NULL on 
> devm_add_action_or_reset() failure, releasing the kref_get reference, will 
> leak the kref_init reference and the region.)

I see now.  The ref is taken prior to the add action or reset to ensure
the dax_region does not go away should the platform device go away...

However, in all the call paths currently the device passed to
alloc_dax_region() can't go away prior to the dev_dax taking a reference.
So I don't think this extra reference is required.  :-/

As you say the ref counting right now is not correct on error flows.  But
I'm torn on the correct solution.

I think a small series broken out so it can be backported if needed would
be best.

Ira

[PATCH RFC 0/4] dax: Clean up dax_region references

2023-06-02 Thread Ira Weiny

In[*] Yongqiang Liu presented a fix to the reference counting associated
with the dax_region in hmem.  At the time it was thought the patch was
unnecessary because dax_region_unregister() call would properly handle
reference counting.

Upon closer inspection Paul noted that this was not the case.  In fact
Yongqiang's patch was correct but there were other issues as well.

This series includes Yongqiang's patch and breaks up additional fixes
which can be backported if necessary followed by a final patch which
simplifies the reference counting.

[*] 
https://lore.kernel.org/all/20221203095858.612027-1-liuyongqian...@huawei.com/

Signed-off-by: Ira Weiny 
---
Ira Weiny (3):
  dax/bus: Fix leaked reference in alloc_dax_region()
  dax/cxl: Fix refcount leak in cxl_dax_region_probe()
  dax/bus: Remove unnecessary reference in alloc_dax_region()

Yongqiang Liu (1):
  dax/hmem: Fix refcount leak in dax_hmem_probe()

 drivers/dax/bus.c   | 6 +-
 drivers/dax/cxl.c   | 7 +--
 drivers/dax/hmem/hmem.c | 7 +--
 drivers/dax/pmem.c  | 8 +---
 4 files changed, 8 insertions(+), 20 deletions(-)
---
base-commit: 921bdc72a0d68977092d6a64855a1b8967acc1d9
change-id: 20230602-dax-region-put-72c289137cb1

Best regards,
-- 
Ira Weiny

[PATCH RFC 2/4] dax/hmem: Fix refcount leak in dax_hmem_probe()

2023-06-02 Thread ira . weiny

From: Yongqiang Liu 

We should always call dax_region_put() whenever devm_create_dev_dax()
succeed or fail to avoid refcount leak of dax_region. Move the return
value check after dax_region_put().

Cc: nvd...@lists.linux.dev
Fixes: c01044cc8191 ("ACPI: HMAT: refactor hmat_register_target_device to 
hmem_register_device")
Reviewed-by: Ira Weiny 
Signed-off-by: Yongqiang Liu 
---
 drivers/dax/hmem/hmem.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index e5fe8b39fb94..b4831a3d3934 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -39,12 +39,10 @@ static int dax_hmem_probe(struct platform_device *pdev)
.size = region_idle ? 0 : range_len(&mri->range),
};
dev_dax = devm_create_dev_dax(&data);
-   if (IS_ERR(dev_dax))
-   return PTR_ERR(dev_dax);
 
/* child dev_dax instances now own the lifetime of the dax_region */
dax_region_put(dax_region);
-   return 0;
+   return IS_ERR(dev_dax) ? PTR_ERR(dev_dax) : 0;
 }
 
 static struct platform_driver dax_hmem_driver = {

-- 
2.40.0

[PATCH RFC 1/4] dax/bus: Fix leaked reference in alloc_dax_region()

2023-06-02 Thread Ira Weiny

kref_init() initializes the ref count to 1.  An extra kref is taken on
the dax_region to be used by the caller.  If devm_add_action_or_reset()
fails this extra reference is leaked.

Drop the extra reference on error.

Fixes: d7fe1a67f658 ("dax: add region 'id', 'size', and 'align' attributes")
Cc: Dan Williams 
Signed-off-by: Ira Weiny 
---
 drivers/dax/bus.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 227800053309..899e29d107b4 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -583,6 +583,7 @@ static void dax_region_unregister(void *region)
dax_region_put(dax_region);
 }
 
+/* The dax_region reference returned should be dropped with dax_region_put() */
 struct dax_region *alloc_dax_region(struct device *parent, int region_id,
struct range *range, int target_node, unsigned int align,
unsigned long flags)
@@ -625,9 +626,13 @@ struct dax_region *alloc_dax_region(struct device *parent, 
int region_id,
return NULL;
}
 
+   /* Hold a reference to return to the caller */
kref_get(&dax_region->kref);
-   if (devm_add_action_or_reset(parent, dax_region_unregister, dax_region))
+   if (devm_add_action_or_reset(parent, dax_region_unregister,
+dax_region)) {
+   kref_put(&dax_region->kref, dax_region_free);
return NULL;
+   }
return dax_region;
 }
 EXPORT_SYMBOL_GPL(alloc_dax_region);

-- 
2.40.0

[PATCH RFC 3/4] dax/cxl: Fix refcount leak in cxl_dax_region_probe()

2023-06-02 Thread Ira Weiny

alloc_dax_region() returns a reference protected dax_region.  Regardless
of the success of the devm_create_dev_dax() the reference returned from
alloc_dax_region() needs to be released.

Drop the dax_region reference regardless of the success of dev_dax
creation.  Clean up comments.

Fixes: 09d09e04d2fc ("cxl/dax: Create dax devices for CXL RAM regions")
Cc: Dan Williams 
Cc: linux-...@vger.kernel.org
Signed-off-by: Ira Weiny 

---
This work was inspired by Yongqiang Liu here:

https://lore.kernel.org/all/20221203095858.612027-1-liuyongqian...@huawei.com/
---
 drivers/dax/cxl.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index ccdf8de85bd5..bbfe71cf4325 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -29,12 +29,11 @@ static int cxl_dax_region_probe(struct device *dev)
.size = range_len(&cxlr_dax->hpa_range),
};
dev_dax = devm_create_dev_dax(&data);
-   if (IS_ERR(dev_dax))
-   return PTR_ERR(dev_dax);
 
/* child dev_dax instances now own the lifetime of the dax_region */
dax_region_put(dax_region);
-   return 0;
+
+   return IS_ERR(dev_dax) ? PTR_ERR(dev_dax) : 0;
 }
 
 static struct cxl_driver cxl_dax_region_driver = {

-- 
2.40.0

[PATCH RFC 4/4] dax/bus: Remove unnecessary reference in alloc_dax_region()

2023-06-02 Thread Ira Weiny

All the callers to alloc_dax_region() maintain the device associated
with the dax_region until the dax_region is referenced by the dax_dev
they are creating.

Remove the extra kref that alloc_dax_region() takes.  Add a comment to
clarify the reference counting should additional callers be grown later.

Cc: Yongqiang Liu 
Cc: Dan Williams 
Cc: Paul Cassella 
Signed-off-by: Ira Weiny 
---
 drivers/dax/bus.c   | 13 ++---
 drivers/dax/cxl.c   |  4 
 drivers/dax/hmem/hmem.c |  3 ---
 drivers/dax/pmem.c  |  8 +---
 4 files changed, 7 insertions(+), 21 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 899e29d107b4..ed34d8aa6b26 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -583,7 +583,11 @@ static void dax_region_unregister(void *region)
dax_region_put(dax_region);
 }
 
-/* The dax_region reference returned should be dropped with dax_region_put() */
+/*
+ * Caller is responsible to ensure the parent device stays live while the
+ * returned dax_region is in use.  Or as is typically the case, a separate
+ * reference should be taken.
+ */
 struct dax_region *alloc_dax_region(struct device *parent, int region_id,
struct range *range, int target_node, unsigned int align,
unsigned long flags)
@@ -626,13 +630,8 @@ struct dax_region *alloc_dax_region(struct device *parent, 
int region_id,
return NULL;
}
 
-   /* Hold a reference to return to the caller */
-   kref_get(&dax_region->kref);
-   if (devm_add_action_or_reset(parent, dax_region_unregister,
-dax_region)) {
-   kref_put(&dax_region->kref, dax_region_free);
+   if (devm_add_action_or_reset(parent, dax_region_unregister, dax_region))
return NULL;
-   }
return dax_region;
 }
 EXPORT_SYMBOL_GPL(alloc_dax_region);
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index bbfe71cf4325..5ad600ee68b3 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -29,10 +29,6 @@ static int cxl_dax_region_probe(struct device *dev)
.size = range_len(&cxlr_dax->hpa_range),
};
dev_dax = devm_create_dev_dax(&data);
-
-   /* child dev_dax instances now own the lifetime of the dax_region */
-   dax_region_put(dax_region);
-
return IS_ERR(dev_dax) ? PTR_ERR(dev_dax) : 0;
 }
 
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index b4831a3d3934..46e1b343f26e 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -39,9 +39,6 @@ static int dax_hmem_probe(struct platform_device *pdev)
.size = region_idle ? 0 : range_len(&mri->range),
};
dev_dax = devm_create_dev_dax(&data);
-
-   /* child dev_dax instances now own the lifetime of the dax_region */
-   dax_region_put(dax_region);
return IS_ERR(dev_dax) ? PTR_ERR(dev_dax) : 0;
 }
 
diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index f050ea78bb83..a4f016d7f4f5 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -13,7 +13,6 @@ static struct dev_dax *__dax_pmem_probe(struct device *dev)
int rc, id, region_id;
resource_size_t offset;
struct nd_pfn_sb *pfn_sb;
-   struct dev_dax *dev_dax;
struct dev_dax_data data;
struct nd_namespace_io *nsio;
struct dax_region *dax_region;
@@ -65,12 +64,7 @@ static struct dev_dax *__dax_pmem_probe(struct device *dev)
.pgmap = &pgmap,
.size = range_len(&range),
};
-   dev_dax = devm_create_dev_dax(&data);
-
-   /* child dev_dax instances now own the lifetime of the dax_region */
-   dax_region_put(dax_region);
-
-   return dev_dax;
+   return devm_create_dev_dax(&data);
 }
 
 static int dax_pmem_probe(struct device *dev)

-- 
2.40.0

Re: [PATCH] nvdimm: make nd_class variable static

2023-06-16 Thread Ira Weiny

Ben Dooks wrote:
> The nd_class is not used outside of drivers/nvdimm/bus.c and thus sparse
> is generating the following warning. Remove this by making it static:
> 
> drivers/nvdimm/bus.c:28:14: warning: symbol 'nd_class' was not declared. 
> Should it be static?
 
Reviewed-by: Ira Weiny 

> Signed-off-by: Ben Dooks 
> ---
>  drivers/nvdimm/bus.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> index 954dbc105fc8..5852fe290523 100644
> --- a/drivers/nvdimm/bus.c
> +++ b/drivers/nvdimm/bus.c
> @@ -25,7 +25,7 @@
>  
>  int nvdimm_major;
>  static int nvdimm_bus_major;
> -struct class *nd_class;
> +static struct class *nd_class;
>  static DEFINE_IDA(nd_ida);
>  
>  static int to_nd_device_type(const struct device *dev)
> -- 
> 2.39.2
>

Re: [PATCH] dax: include bus.h for definition of run_dax()

2023-06-16 Thread Ira Weiny

Ben Dooks wrote:
> The run_dax() prototype is defined in "bus.h" but drivers/dax/super.c
> does not include this. Include bus.h to silece the following sparse
> warning:
> 
> drivers/dax/super.c:337:6: warning: symbol 'run_dax' was not declared. Should 
> it be static?

A different version of this fix has already been queued up.

https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/commit/?h=libnvdimm-fixes&id=2d5153526f929838b0912ded26862840f72745f4

Thank you anyway,
Ira

Re: [PATCH] libnvdimm/of_pmem: Add check and kfree for kstrdup

2023-06-20 Thread Ira Weiny

Jiasheng Jiang wrote:
> Add check for the return value of kstrdup() and return the error
> if it fails in order to avoid NULL pointer dereference.
> Moreover, use kfree() in the later error handling in order to avoid
> memory leak.
> 
> Fixes: 49bddc73d15c ("libnvdimm/of_pmem: Provide a unique name for bus 
> provider")
> Signed-off-by: Jiasheng Jiang 
> ---
>  drivers/nvdimm/of_pmem.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> index 10dbdcdfb9ce..fe6edb7e6631 100644
> --- a/drivers/nvdimm/of_pmem.c
> +++ b/drivers/nvdimm/of_pmem.c
> @@ -31,11 +31,17 @@ static int of_pmem_region_probe(struct platform_device 
> *pdev)
>   return -ENOMEM;
>  
>   priv->bus_desc.provider_name = kstrdup(pdev->name, GFP_KERNEL);
> + if (!priv->bus_desc.provider_name) {
> + kfree(priv);
> + return -ENOMEM;
> + }
> +
>   priv->bus_desc.module = THIS_MODULE;
>   priv->bus_desc.of_node = np;
>  
>   priv->bus = bus = nvdimm_bus_register(&pdev->dev, &priv->bus_desc);
>   if (!bus) {
> + kfree(priv->bus_desc.provider_name);

Nice catch!

However, this free needs to happen in of_pmem_region_remove() as well.

Ira

>   kfree(priv);
>   return -ENODEV;
>   }
> -- 
> 2.25.1
>

Re: [PATCH] libnvdimm/of_pmem: Add check and kfree for kstrdup

2023-06-20 Thread Ira Weiny

Ira Weiny wrote:
> Jiasheng Jiang wrote:
> > Add check for the return value of kstrdup() and return the error
> > if it fails in order to avoid NULL pointer dereference.
> > Moreover, use kfree() in the later error handling in order to avoid
> > memory leak.
> > 
> > Fixes: 49bddc73d15c ("libnvdimm/of_pmem: Provide a unique name for bus 
> > provider")
> > Signed-off-by: Jiasheng Jiang 
> > ---
> >  drivers/nvdimm/of_pmem.c | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> > index 10dbdcdfb9ce..fe6edb7e6631 100644
> > --- a/drivers/nvdimm/of_pmem.c
> > +++ b/drivers/nvdimm/of_pmem.c
> > @@ -31,11 +31,17 @@ static int of_pmem_region_probe(struct platform_device 
> > *pdev)
> > return -ENOMEM;
> >  
> > priv->bus_desc.provider_name = kstrdup(pdev->name, GFP_KERNEL);
> > +   if (!priv->bus_desc.provider_name) {
> > +   kfree(priv);
> > +   return -ENOMEM;
> > +   }
> > +
> > priv->bus_desc.module = THIS_MODULE;
> > priv->bus_desc.of_node = np;
> >  
> > priv->bus = bus = nvdimm_bus_register(&pdev->dev, &priv->bus_desc);
> > if (!bus) {
> > +   kfree(priv->bus_desc.provider_name);
> 
> Nice catch!
> 
> However, this free needs to happen in of_pmem_region_remove() as well.

Looks like the mail from my phone had html in it.  Sorry for that.

This would be better with devm_kstrdup() and then we don't have to worry
about the kfree at all.

Ira

Re: [PATCH] libnvdimm/of_pmem: Replace kstrdup with devm_kstrdup and add check

2023-06-23 Thread Ira Weiny

Jiasheng Jiang wrote:
> Replace kstrdup() with devm_kstrdup() to avoid memory leak and
> add check for the return value of the devm_kstrdup() to avoid
> NULL pointer dereference
> 
> Fixes: 49bddc73d15c ("libnvdimm/of_pmem: Provide a unique name for bus 
> provider")
> Signed-off-by: Jiasheng Jiang 

V2?  references to the first fix and review?

> ---
>  drivers/nvdimm/of_pmem.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> index 10dbdcdfb9ce..5106dfe0147b 100644
> --- a/drivers/nvdimm/of_pmem.c
> +++ b/drivers/nvdimm/of_pmem.c
> @@ -30,7 +30,12 @@ static int of_pmem_region_probe(struct platform_device 
> *pdev)
>   if (!priv)
>   return -ENOMEM;
>  
> - priv->bus_desc.provider_name = kstrdup(pdev->name, GFP_KERNEL);
> + priv->bus_desc.provider_name = devm_kstrdup(pdev->name, GFP_KERNEL);

Again, thanks for finding and trying to fix this but you did not even compile
test this.  :-(

/** 
 * devm_kstrdup - Allocate resource managed space and
 *copy an existing string into that.
 * @dev: Device to allocate memory for
 * @s: the string to duplicate
 * @gfp: the GFP mask used in the devm_kmalloc() call when
 *   allocating memory
 * RETURNS:
 * Pointer to allocated string on success, NULL on failure.
 */ 
char *devm_kstrdup(struct device *dev, const char *s, gfp_t gfp)
   ^^
   ??

Ira

> + if (!priv->bus_desc.provider_name) {
> + kfree(priv);
> + return -ENOMEM;
> + }
> +
>   priv->bus_desc.module = THIS_MODULE;
>   priv->bus_desc.of_node = np;
>  
> -- 
> 2.25.1
>

Re: [PATCH] libnvdimm/of_pmem: Add check and kfree for kstrdup

2023-06-23 Thread Ira Weiny

Jiasheng Jiang wrote:
> On Wed, Jun 21, 2023 at 00:04:36 +0800, Ira Weiny wrote:
> > Ira Weiny wrote:
> >> Jiasheng Jiang wrote:

[snip]

> >> 
> >> Nice catch!
> >> 
> >> However, this free needs to happen in of_pmem_region_remove() as well.
> > 
> > Looks like the mail from my phone had html in it.  Sorry for that.
> > 
> > This would be better with devm_kstrdup() and then we don't have to worry
> > about the kfree at all.
> 
> Looks good.
> I have submitted a new patch "libnvdimm/of_pmem: Replace kstrdup with 
> devm_kstrdup and add check".
> Since the titie has been modified, I did not submitted a v2.

Ah ok...  But looks like we will need a v3.  See the other email.

Thanks again for trying to fix this,
Ira

Re: [PATCH v3] libnvdimm/of_pmem: Replace kstrdup with devm_kstrdup and add check

2023-07-10 Thread Ira Weiny

Jiasheng Jiang wrote:
> Replace kstrdup() with devm_kstrdup() to avoid memory leak and
> add check for the return value of the devm_kstrdup() to avoid
> NULL pointer dereference
> 
> Fixes: 49bddc73d15c ("libnvdimm/of_pmem: Provide a unique name for bus 
> provider")
> Signed-off-by: Jiasheng Jiang 

LTGM
Reviewed-by: Ira Weiny 

> ---
> Changelog:
> 
> v2 -> v3:
> 
> 1. Correct the usage of devm_kstrdup().
> 
> v1 -> v2:
> 
> 1. Replace kstrdup() with devm_kstrdup().
> ---
>  drivers/nvdimm/of_pmem.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> index 10dbdcdfb9ce..1d2b1ab5b737 100644
> --- a/drivers/nvdimm/of_pmem.c
> +++ b/drivers/nvdimm/of_pmem.c
> @@ -30,7 +30,12 @@ static int of_pmem_region_probe(struct platform_device 
> *pdev)
>   if (!priv)
>   return -ENOMEM;
>  
> - priv->bus_desc.provider_name = kstrdup(pdev->name, GFP_KERNEL);
> + priv->bus_desc.provider_name = devm_kstrdup(&pdev->dev, pdev->name, 
> GFP_KERNEL);
> + if (!priv->bus_desc.provider_name) {
> + kfree(priv);
> + return -ENOMEM;
> + }
> +
>   priv->bus_desc.module = THIS_MODULE;
>   priv->bus_desc.of_node = np;
>  
> -- 
> 2.25.1
>

Re: [PATCH v2] nvdimm: of_pmem: Check return value and add kfree for kstrdup

2023-08-29 Thread Ira Weiny

Chen Ni wrote:
> Check the return value of kstrdup() and add kfree() for kstrdup() to 
> avoid memory leak.
> 
> Fixes: 49bddc73d15c ("libnvdimm/of_pmem: Provide a unique name for bus 
> provider")
> Signed-off-by: Chen Ni 
> ---
> Changelog:
> 
> v1 -> v2:
> 
> 1.Add a fixes tag.
> 2.Update commit message.
> ---
>  drivers/nvdimm/of_pmem.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> index 10dbdcdfb9ce..fe6edb7e6631 100644
> --- a/drivers/nvdimm/of_pmem.c
> +++ b/drivers/nvdimm/of_pmem.c
> @@ -31,11 +31,17 @@ static int of_pmem_region_probe(struct platform_device 
> *pdev)
>   return -ENOMEM;
>  
>   priv->bus_desc.provider_name = kstrdup(pdev->name, GFP_KERNEL);

Could this be done with a devm_kstrdup()?

> + if (!priv->bus_desc.provider_name) {
> + kfree(priv);
> + return -ENOMEM;
> + }
> +
>   priv->bus_desc.module = THIS_MODULE;
>   priv->bus_desc.of_node = np;
>  
>   priv->bus = bus = nvdimm_bus_register(&pdev->dev, &priv->bus_desc);
>   if (!bus) {
> + kfree(priv->bus_desc.provider_name);

If not using devm_kstrdup() don't you need a kfree in
of_pmem_region_remove?

Ira

>   kfree(priv);
>   return -ENODEV;
>   }
> -- 
> 2.25.1
>

Re: [PATCH v3] libnvdimm/of_pmem: Use devm_kstrdup instead of kstrdup and check its return value

2023-09-14 Thread Ira Weiny

Dave Jiang wrote:
> 
> 
> On 9/14/23 00:03, Chen Ni wrote:

[snip]

> > diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> > index 1b9f5b8a6167..5765674b36f2 100644
> > --- a/drivers/nvdimm/of_pmem.c
> > +++ b/drivers/nvdimm/of_pmem.c
> > @@ -30,7 +30,13 @@ static int of_pmem_region_probe(struct platform_device 
> > *pdev)
> > if (!priv)
> > return -ENOMEM;
> >  
> > -   priv->bus_desc.provider_name = kstrdup(pdev->name, GFP_KERNEL);
> > +   priv->bus_desc.provider_name = devm_kstrdup(&pdev->dev, pdev->name,
> > +   GFP_KERNEL);
> > +   if (!priv->bus_desc.provider_name) {
> > +   kfree(priv);
> 
> I wonder if priv should be allocated with devm_kzalloc() instead to reduce 
> the resource management burden. 

I think it could be but this is the driver and I wonder if leaving the
allocation around until the platform device goes away was undesirable for
some reason?

Ira

Re: [PATCH] nd_btt: Make BTT lanes preemptible

2023-09-14 Thread Ira Weiny

Tomáš Glozar wrote:
> From: Tomas Glozar 
> 
> nd_region_acquire_lane uses get_cpu, which disables preemption. This is
> an issue on PREEMPT_RT kernels, since btt_write_pg and also
> nd_region_acquire_lane itself take a spin lock, resulting in BUG:
> sleeping function called from invalid context.

Is the bug in 1 of 2 places?

1) When btt_write_pg()->lock_map() (when the number of lanes is < number
   of cpus) and the lane is acquired is called?

*or*

2) When nd_region_acquire_lane() internally trys to take it's lock?

A copy/paste of the BUG observed would have been more clear I think.

Regardless I *think* this is ok but I'm worried I don't fully understand
what the problem is.

Ira

> 
> Fix the issue by replacing get_cpu with smp_process_id and
> migrate_disable when needed. This makes BTT operations preemptible, thus
> permitting the use of spin_lock.
> 
> Fixes: 5212e11fde4d ("nd_btt: atomic sector updates")
> Signed-off-by: Tomas Glozar 
> ---
>  drivers/nvdimm/region_devs.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index 0a81f87f6f6c..e2f1fb99707f 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -939,7 +939,8 @@ unsigned int nd_region_acquire_lane(struct nd_region 
> *nd_region)
>  {
>   unsigned int cpu, lane;
>  
> - cpu = get_cpu();
> + migrate_disable();
> + cpu = smp_processor_id();
>   if (nd_region->num_lanes < nr_cpu_ids) {
>   struct nd_percpu_lane *ndl_lock, *ndl_count;
>  
> @@ -958,16 +959,15 @@ EXPORT_SYMBOL(nd_region_acquire_lane);
>  void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane)
>  {
>   if (nd_region->num_lanes < nr_cpu_ids) {
> - unsigned int cpu = get_cpu();
> + unsigned int cpu = smp_processor_id();
>   struct nd_percpu_lane *ndl_lock, *ndl_count;
>  
>   ndl_count = per_cpu_ptr(nd_region->lane, cpu);
>   ndl_lock = per_cpu_ptr(nd_region->lane, lane);
>   if (--ndl_count->count == 0)
>   spin_unlock(&ndl_lock->lock);
> - put_cpu();
>   }
> - put_cpu();
> + migrate_enable();
>  }
>  EXPORT_SYMBOL(nd_region_release_lane);
>  
> -- 
> 2.39.3
>

Re: [PATCH v2 1/2] ACPI: NFIT: Fix incorrect calculation of idt size

2023-09-15 Thread Ira Weiny

Yu Liao wrote:
> acpi_nfit_interleave's field 'line_offset' is switched to flexible array [1],
> but sizeof_idt() still calculates the size in the form of 1-element array.
> 
> Therefore, fix incorrect calculation in sizeof_idt().
> 
> [1] https://lore.kernel.org/lkml/2652195.BddDVKsqQX@kreacher/
> 
> Fixes: 2a5ab99847bd ("ACPICA: struct acpi_nfit_interleave: Replace 1-element 
> array with flexible array")
> Cc: sta...@vger.kernel.org # v6.4+
> Signed-off-by: Yu Liao 
> Reviewed-by: Dave Jiang 

Reviewed-by: Ira Weiny 

> ---
> v1 -> v2: add Dave's review tag and cc nvd...@lists.linux.dev
> ---
>  drivers/acpi/nfit/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index 07204d482968..305f590c54a8 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -855,7 +855,7 @@ static size_t sizeof_idt(struct acpi_nfit_interleave *idt)
>  {
>   if (idt->header.length < sizeof(*idt))
>   return 0;
> - return sizeof(*idt) + sizeof(u32) * (idt->line_count - 1);
> + return sizeof(*idt) + sizeof(u32) * idt->line_count;
>  }
>  
>  static bool add_idt(struct acpi_nfit_desc *acpi_desc,
> -- 
> 2.25.1
>

Re: [PATCH v2 2/2] ACPI: NFIT: use struct_size() helper

2023-09-15 Thread Ira Weiny

Yu Liao wrote:
> Make use of the struct_size() helper instead of an open-coded version,
> in order to avoid any potential type mistakes or integer overflows that,
> in the worst scenario, could lead to heap overflows.
> 
> Signed-off-by: Yu Liao 
> Reviewed-by: Dave Jiang 

Reviewed-by: Ira Weiny 

> ---
>  drivers/acpi/nfit/core.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index 305f590c54a8..2f7217600307 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -712,8 +712,7 @@ static bool add_spa(struct acpi_nfit_desc *acpi_desc,
>   }
>   }
>  
> - nfit_spa = devm_kzalloc(dev, sizeof(*nfit_spa) + sizeof_spa(spa),
> - GFP_KERNEL);
> + nfit_spa = devm_kzalloc(dev, struct_size(nfit_spa, spa, 1), GFP_KERNEL);
>   if (!nfit_spa)
>   return false;
>   INIT_LIST_HEAD(&nfit_spa->list);
> @@ -741,7 +740,7 @@ static bool add_memdev(struct acpi_nfit_desc *acpi_desc,
>   return true;
>   }
>  
> - nfit_memdev = devm_kzalloc(dev, sizeof(*nfit_memdev) + sizeof(*memdev),
> + nfit_memdev = devm_kzalloc(dev, struct_size(nfit_memdev, memdev, 1),
>   GFP_KERNEL);
>   if (!nfit_memdev)
>   return false;
> @@ -812,8 +811,7 @@ static bool add_dcr(struct acpi_nfit_desc *acpi_desc,
>   return true;
>   }
>  
> - nfit_dcr = devm_kzalloc(dev, sizeof(*nfit_dcr) + sizeof(*dcr),
> - GFP_KERNEL);
> + nfit_dcr = devm_kzalloc(dev, struct_size(nfit_dcr, dcr, 1), GFP_KERNEL);
>   if (!nfit_dcr)
>   return false;
>   INIT_LIST_HEAD(&nfit_dcr->list);
> @@ -855,7 +853,7 @@ static size_t sizeof_idt(struct acpi_nfit_interleave *idt)
>  {
>   if (idt->header.length < sizeof(*idt))
>   return 0;
> - return sizeof(*idt) + sizeof(u32) * idt->line_count;
> + return struct_size(idt, line_offset, idt->line_count);
>  }
>  
>  static bool add_idt(struct acpi_nfit_desc *acpi_desc,
> -- 
> 2.25.1
>

Re: [PATCH v3] libnvdimm/of_pmem: Use devm_kstrdup instead of kstrdup and check its return value

2023-09-15 Thread Ira Weiny

Chen Ni wrote:
> Use devm_kstrdup() instead of kstrdup() and check its return value to
> avoid memory leak.
> 
> Fixes: 49bddc73d15c ("libnvdimm/of_pmem: Provide a unique name for bus 
> provider")
> Signed-off-by: Chen Ni 

Reviewed-by: Ira Weiny 

> ---
> Changelog:
> 
> v2 -> v3:
> 
> 1. Use devm_kstrdup() instead of kstrdup()
> 
> v1 -> v2:
> 
> 1. Add a fixes tag.
> 2. Update commit message.
> ---
>  drivers/nvdimm/of_pmem.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> index 1b9f5b8a6167..5765674b36f2 100644
> --- a/drivers/nvdimm/of_pmem.c
> +++ b/drivers/nvdimm/of_pmem.c
> @@ -30,7 +30,13 @@ static int of_pmem_region_probe(struct platform_device 
> *pdev)
>   if (!priv)
>   return -ENOMEM;
>  
> - priv->bus_desc.provider_name = kstrdup(pdev->name, GFP_KERNEL);
> + priv->bus_desc.provider_name = devm_kstrdup(&pdev->dev, pdev->name,
> + GFP_KERNEL);
> + if (!priv->bus_desc.provider_name) {
> + kfree(priv);
> + return -ENOMEM;
> + }
> +
>   priv->bus_desc.module = THIS_MODULE;
>   priv->bus_desc.of_node = np;
>  
> -- 
> 2.25.1
>

Re: [PATCH] nd_btt: Make BTT lanes preemptible

2023-09-18 Thread Ira Weiny

Tomas Glozar wrote:
> čt 14. 9. 2023 v 22:18 odesílatel Ira Weiny  napsal:
> > Is the bug in 1 of 2 places?
> >
> > 1) When btt_write_pg()->lock_map() (when the number of lanes is < number
> >of cpus) and the lane is acquired is called?
> >
> > *or*
> >
> > 2) When nd_region_acquire_lane() internally trys to take it's lock?
> >
> > A copy/paste of the BUG observed would have been more clear I think.
> >
> 
> The BUG was observed on btt_write_pg()->lock_map(), but I assume the
> BUG will also happen on the lock in nd_region_acquire_lane, since that
> is also a spin lock, i.e. a sleeping lock on RT.
> 
> BUG observed in dmesg when running ndctl tests on RT kernel without the patch:

Thanks for clarifying.  Could you respin the patch with the text below?
That would have saved me a lot of time digging to see what the code path
was.

...
BUG: sleeping function called from invalid context at 
kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 4903, name: 
libndctl
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
1 lock held by libndctl/4903:
 #0: 8c184a270060 (&arena->map_locks[i].lock){+.+.}-{2:2}, at: 
btt_write_pg+0x2d7/0x500 [nd_btt]
Preemption disabled at:
[] nd_region_acquire_lane+0x15/0x90 [libnvdimm]
Call Trace:

 dump_stack_lvl+0x8e/0xb0
 __might_resched+0x19b/0x250
 rt_spin_lock+0x4c/0x100
 ? btt_write_pg+0x2d7/0x500 [nd_btt]
 btt_write_pg+0x2d7/0x500 [nd_btt]
 ? local_clock_noinstr+0x9/0xc0
 btt_submit_bio+0x16d/0x270 [nd_btt]
 __submit_bio+0x48/0x80
 __submit_bio_noacct+0x7e/0x1e0
 submit_bio_wait+0x58/0xb0
 __blkdev_direct_IO_simple+0x107/0x240
 ? inode_set_ctime_current+0x51/0x110
 ? __pfx_submit_bio_wait_endio+0x10/0x10
 blkdev_write_iter+0x1d8/0x290
 vfs_write+0x237/0x330
...

With a respin including this trace:

Reviewed-by: Ira Weiny

Re: [PATCH v2] nd_btt: Make BTT lanes preemptible

2023-09-25 Thread Ira Weiny

Tomas Glozar wrote:
> nd_region_acquire_lane uses get_cpu, which disables preemption. This is
> an issue on PREEMPT_RT kernels, since btt_write_pg and also
> nd_region_acquire_lane itself take a spin lock, resulting in BUG:
> sleeping function called from invalid context.
> 
> Fix the issue by replacing get_cpu with smp_process_id and
> migrate_disable when needed. This makes BTT operations preemptible, thus
> permitting the use of spin_lock.
> 
> BUG example occurring when running ndctl tests on PREEMPT_RT kernel:
> 
> BUG: sleeping function called from invalid context at
> kernel/locking/spinlock_rt.c:48
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 4903, name:
> libndctl
> preempt_count: 1, expected: 0
> RCU nest depth: 0, expected: 0
> Preemption disabled at:
> [] nd_region_acquire_lane+0x15/0x90 [libnvdimm]
> Call Trace:
>  
>  dump_stack_lvl+0x8e/0xb0
>  __might_resched+0x19b/0x250
>  rt_spin_lock+0x4c/0x100
>  ? btt_write_pg+0x2d7/0x500 [nd_btt]
>  btt_write_pg+0x2d7/0x500 [nd_btt]
>  ? local_clock_noinstr+0x9/0xc0
>  btt_submit_bio+0x16d/0x270 [nd_btt]
>  __submit_bio+0x48/0x80
>  __submit_bio_noacct+0x7e/0x1e0
>  submit_bio_wait+0x58/0xb0
>  __blkdev_direct_IO_simple+0x107/0x240
>  ? inode_set_ctime_current+0x51/0x110
>  ? __pfx_submit_bio_wait_endio+0x10/0x10
>  blkdev_write_iter+0x1d8/0x290
>  vfs_write+0x237/0x330
>  ...
>  
> 
> Fixes: 5212e11fde4d ("nd_btt: atomic sector updates")
> Signed-off-by: Tomas Glozar 

Thanks for the clarification.

Reviewed-by: Ira Weiny

[PATCH 1/4] fs/btrfs: Convert kmap to kmap_local_page() using coccinelle

2021-02-16 Thread ira . weiny

From: Ira Weiny 

Use a simple coccinelle script to help convert the most common
kmap()/kunmap() patterns to kmap_local_page()/kunmap_local().

Note that some kmaps which were caught by this script needed to be
handled by hand because of the strict unmapping order of kunmap_local()
so they are not included in this patch.  But this script got us started.

The development of this patch was aided by the follow script:

// 
// SPDX-License-Identifier: GPL-2.0-only
// Find kmap and replace with kmap_local_page then mark kunmap
//
// Confidence: Low
// Copyright: (C) 2021 Intel Corporation
// URL: http://coccinelle.lip6.fr/

@ catch_all @
expression e, e2;
@@

(
-kmap(e)
+kmap_local_page(e)
)
...
(
-kunmap(...)
+kunmap_local()
)

// 

Signed-off-by: Ira Weiny 
---
 fs/btrfs/compression.c | 4 ++--
 fs/btrfs/inode.c   | 4 ++--
 fs/btrfs/lzo.c | 9 -
 fs/btrfs/raid56.c  | 4 ++--
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index a219dcdb749e..21833a2fe8c6 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1578,7 +1578,7 @@ static void heuristic_collect_sample(struct inode *inode, 
u64 start, u64 end,
curr_sample_pos = 0;
while (index < index_end) {
page = find_get_page(inode->i_mapping, index);
-   in_data = kmap(page);
+   in_data = kmap_local_page(page);
/* Handle case where the start is not aligned to PAGE_SIZE */
i = start % PAGE_SIZE;
while (i < PAGE_SIZE - SAMPLING_READ_SIZE) {
@@ -1591,7 +1591,7 @@ static void heuristic_collect_sample(struct inode *inode, 
u64 start, u64 end,
start += SAMPLING_INTERVAL;
curr_sample_pos += SAMPLING_READ_SIZE;
}
-   kunmap(page);
+   kunmap_local(in_data);
put_page(page);
 
index++;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 641f21a11722..66c6f1185de2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6845,7 +6845,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode 
*inode,
if (ret)
goto out;
} else {
-   map = kmap(page);
+   map = kmap_local_page(page);
read_extent_buffer(leaf, map + pg_offset, ptr,
   copy_size);
if (pg_offset + copy_size < PAGE_SIZE) {
@@ -6853,7 +6853,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode 
*inode,
   PAGE_SIZE - pg_offset -
   copy_size);
}
-   kunmap(page);
+   kunmap_local(map);
}
flush_dcache_page(page);
}
diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
index 9084a950dc09..cd042c7567a4 100644
--- a/fs/btrfs/lzo.c
+++ b/fs/btrfs/lzo.c
@@ -118,7 +118,7 @@ int lzo_compress_pages(struct list_head *ws, struct 
address_space *mapping,
struct workspace *workspace = list_entry(ws, struct workspace, list);
int ret = 0;
char *data_in;
-   char *cpage_out;
+   char *cpage_out, *sizes_ptr;
int nr_pages = 0;
struct page *in_page = NULL;
struct page *out_page = NULL;
@@ -258,10 +258,9 @@ int lzo_compress_pages(struct list_head *ws, struct 
address_space *mapping,
}
 
/* store the size of all chunks of compressed data */
-   cpage_out = kmap(pages[0]);
-   write_compress_length(cpage_out, tot_out);
-
-   kunmap(pages[0]);
+   sizes_ptr = kmap_local_page(pages[0]);
+   write_compress_length(sizes_ptr, tot_out);
+   kunmap_local(sizes_ptr);
 
ret = 0;
*total_out = tot_out;
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 7c3f6dc918c1..9759fb31b73e 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2391,13 +2391,13 @@ static noinline void finish_parity_scrub(struct 
btrfs_raid_bio *rbio,
 
/* Check scrubbing parity and repair it */
p = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
-   parity = kmap(p);
+   parity = kmap_local_page(p);
if (memcmp(parity, pointers[rbio->scrubp], PAGE_SIZE))
copy_page(parity, pointers[rbio->scrubp]);
else
/* Parity is right, needn't writeback */
bitmap_clear(rbio->dbitmap, pagenr, 1);
-   kunmap(p);
+   kunmap_local(parity);
 
for (stripe = 0; stripe < nr_data; stripe++)
kunmap(page_in_rbio(rbio,

[PATCH 2/4] fs/btrfs: Convert raid5/6 kmaps to kmap_local_page()

2021-02-16 Thread ira . weiny

From: Ira Weiny 

These kmaps are thread local and don't need to be atomic.  So they can use
the more efficient kmap_local_page().  However, the mapping of pages in
the stripes and the additional parity and qstripe pages are a bit
trickier because the unmapping must occur in the opposite order from the
mapping.  Furthermore, the pointer array in __raid_recover_end_io() may
get reordered.

Convert these calls to kmap_local_page() taking care to reverse the
unmappings of any page arrays as well as being careful with the mappings
of any special pages such as the parity and qstripe pages.

Signed-off-by: Ira Weiny 

---
This patch depends on the fix to raid5/6 kmapping sent previously

https://lore.kernel.org/lkml/20210205163943.gd5...@iweiny-desk2.sc.intel.com/#t
---
 fs/btrfs/raid56.c | 57 +++
 1 file changed, 28 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 9759fb31b73e..04abae305582 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1233,13 +1233,13 @@ static noinline void finish_rmw(struct btrfs_raid_bio 
*rbio)
/* first collect one page from each data stripe */
for (stripe = 0; stripe < nr_data; stripe++) {
p = page_in_rbio(rbio, stripe, pagenr, 0);
-   pointers[stripe] = kmap(p);
+   pointers[stripe] = kmap_local_page(p);
}
 
/* then add the parity stripe */
p = rbio_pstripe_page(rbio, pagenr);
SetPageUptodate(p);
-   pointers[stripe++] = kmap(p);
+   pointers[stripe++] = kmap_local_page(p);
 
if (has_qstripe) {
 
@@ -1249,7 +1249,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio 
*rbio)
 */
p = rbio_qstripe_page(rbio, pagenr);
SetPageUptodate(p);
-   pointers[stripe++] = kmap(p);
+   pointers[stripe++] = kmap_local_page(p);
 
raid6_call.gen_syndrome(rbio->real_stripes, PAGE_SIZE,
pointers);
@@ -1258,10 +1258,8 @@ static noinline void finish_rmw(struct btrfs_raid_bio 
*rbio)
copy_page(pointers[nr_data], pointers[0]);
run_xor(pointers + 1, nr_data - 1, PAGE_SIZE);
}
-
-
-   for (stripe = 0; stripe < rbio->real_stripes; stripe++)
-   kunmap(page_in_rbio(rbio, stripe, pagenr, 0));
+   for (stripe = stripe - 1; stripe >= 0; stripe--)
+   kunmap_local(pointers[stripe]);
}
 
/*
@@ -1780,6 +1778,7 @@ static void __raid_recover_end_io(struct btrfs_raid_bio 
*rbio)
 {
int pagenr, stripe;
void **pointers;
+   void **unmap_array;
int faila = -1, failb = -1;
struct page *page;
blk_status_t err;
@@ -1791,6 +1790,12 @@ static void __raid_recover_end_io(struct btrfs_raid_bio 
*rbio)
goto cleanup_io;
}
 
+   unmap_array = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
+   if (!unmap_array) {
+   err = BLK_STS_RESOURCE;
+   goto cleanup_pointers;
+   }
+
faila = rbio->faila;
failb = rbio->failb;
 
@@ -1814,6 +1819,9 @@ static void __raid_recover_end_io(struct btrfs_raid_bio 
*rbio)
 
/* setup our array of pointers with pages
 * from each stripe
+*
+* NOTE Store a duplicate array of pointers to preserve the
+* pointer order.
 */
for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
/*
@@ -1827,7 +1835,8 @@ static void __raid_recover_end_io(struct btrfs_raid_bio 
*rbio)
} else {
page = rbio_stripe_page(rbio, stripe, pagenr);
}
-   pointers[stripe] = kmap(page);
+   pointers[stripe] = kmap_local_page(page);
+   unmap_array[stripe] = pointers[stripe];
}
 
/* all raid6 handling here */
@@ -1920,24 +1929,14 @@ static void __raid_recover_end_io(struct btrfs_raid_bio 
*rbio)
}
}
}
-   for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
-   /*
-* if we're rebuilding a read, we have to use
-* pages from the bio list
-*/
-   if ((rbio->operation == BTRFS_RBIO_READ_REBUILD ||
-rbio->operation == BTRFS_RBIO_REBUILD_MISSING) &&
-   (stripe == faila || stripe == failb)

[PATCH 0/4] btrfs: Convert more kmaps to kmap_local_page()

2021-02-16 Thread ira . weiny

From: Ira Weiny 

I am submitting these for 5.13.

Further work to remove more kmap() calls in favor of the kmap_local_page() this
series converts those calls which required more than a common pattern which
were covered in my previous series[1].  This is the second of what I hope to be
3 series to fully convert btrfs.  However, the 3rd series is going to be an RFC
because I need to have more eyes on it before I'm sure about what to do.  For
now this series should be good to go for 5.13.

Also this series converts the kmaps in the raid5/6 code which required a fix to
the kmap'ings which was submitted in [2].

Thanks,
Ira

[1] https://lore.kernel.org/lkml/20210210062221.3023586-1-ira.we...@intel.com/
[2] 
https://lore.kernel.org/lkml/20210205163943.gd5...@iweiny-desk2.sc.intel.com/


Ira Weiny (4):
  fs/btrfs: Convert kmap to kmap_local_page() using coccinelle
  fs/btrfs: Convert raid5/6 kmaps to kmap_local_page()
  fs/btrfs: Use kmap_local_page() in __btrfsic_submit_bio()
  fs/btrfs: Convert block context kmap's to kmap_local_page()

 fs/btrfs/check-integrity.c | 12 
 fs/btrfs/compression.c |  4 +--
 fs/btrfs/inode.c   |  4 +--
 fs/btrfs/lzo.c |  9 +++---
 fs/btrfs/raid56.c  | 61 +++---
 5 files changed, 44 insertions(+), 46 deletions(-)

-- 
2.28.0.rc0.12.gb6a658bd00c9

[PATCH 4/4] fs/btrfs: Convert block context kmap's to kmap_local_page()

2021-02-16 Thread ira . weiny

From: Ira Weiny 

btrfsic_read_block() (which calls kmap()) and
btrfsic_release_block_ctx() (which calls kunmap()) are always called
within a single thread of execution.

Therefore the mappings created within these calls can be a thread local
mapping.

Convert the kmap() of bloc_ctx->pagev to kmap_local_page().  Luckily the
unmap loops backwards through the array pointer so no adjustment needs
to be made to the unmapping order.

Signed-off-by: Ira Weiny 
---
 fs/btrfs/check-integrity.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 726eb894be8b..1ff9e19508a7 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1558,7 +1558,7 @@ static void btrfsic_release_block_ctx(struct 
btrfsic_block_data_ctx *block_ctx)
while (num_pages > 0) {
num_pages--;
if (block_ctx->datav[num_pages]) {
-   kunmap(block_ctx->pagev[num_pages]);
+   kunmap_local(block_ctx->datav[num_pages]);
block_ctx->datav[num_pages] = NULL;
}
if (block_ctx->pagev[num_pages]) {
@@ -1637,7 +1637,7 @@ static int btrfsic_read_block(struct btrfsic_state *state,
i = j;
}
for (i = 0; i < num_pages; i++)
-   block_ctx->datav[i] = kmap(block_ctx->pagev[i]);
+   block_ctx->datav[i] = kmap_local_page(block_ctx->pagev[i]);
 
return block_ctx->len;
 }
-- 
2.28.0.rc0.12.gb6a658bd00c9

[PATCH 3/4] fs/btrfs: Use kmap_local_page() in __btrfsic_submit_bio()

2021-02-16 Thread ira . weiny

From: Ira Weiny 

Again there is an array of pointers which must be unmapped in the correct
order.

Convert the kmap()'s to kmap_local_page() and adjust the unmapping
to work backwards through the unmapping loop.

Signed-off-by: Ira Weiny 
---
 fs/btrfs/check-integrity.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 6ff44e53814c..726eb894be8b 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2677,7 +2677,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
dev_state = btrfsic_dev_state_lookup(bio_dev(bio) + bio->bi_partno);
if (NULL != dev_state &&
(bio_op(bio) == REQ_OP_WRITE) && bio_has_data(bio)) {
-   unsigned int i = 0;
+   int i = 0;
u64 dev_bytenr;
u64 cur_bytenr;
struct bio_vec bvec;
@@ -2702,7 +2702,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
 
bio_for_each_segment(bvec, bio, iter) {
BUG_ON(bvec.bv_len != PAGE_SIZE);
-   mapped_datav[i] = kmap(bvec.bv_page);
+   mapped_datav[i] = kmap_local_page(bvec.bv_page);
i++;
 
if (dev_state->state->print_mask &
@@ -2715,8 +2715,8 @@ static void __btrfsic_submit_bio(struct bio *bio)
  mapped_datav, segs,
  bio, &bio_is_patched,
  bio->bi_opf);
-   bio_for_each_segment(bvec, bio, iter)
-   kunmap(bvec.bv_page);
+   for (--i; i >= 0; i--)
+   kunmap_local(mapped_datav[i]);
kfree(mapped_datav);
} else if (NULL != dev_state && (bio->bi_opf & REQ_PREFLUSH)) {
if (dev_state->state->print_mask &
-- 
2.28.0.rc0.12.gb6a658bd00c9

Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

2019-03-02 Thread Ira Weiny

FWIW I don't have ODP hardware either.  So I can't test this either.

On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> 1. Bug fix: the error handling release pages starting
> at the first page that experienced an error.
> 
> 2. Refinement: release_pages() is better than put_page()
> in a loop.
> 
> 3. Dead code removal: the check for (user_virt & ~page_mask)
> is checking for a condition that can never happen,
> because earlier:
> 
> user_virt = user_virt & page_mask;
> 
> ...so, remove that entire phrase.
> 
> 4. Minor: As long as I'm here, shorten up a couple of long lines
> in the same function, without harming the ability to
> grep for the printed error message.
> 
> Cc: Ira Weiny 
> Cc: Jason Gunthorpe 
> Cc: Andrew Morton 
> Cc: Doug Ledford 
> Cc: linux-r...@vger.kernel.org
> Cc: linux...@kvack.org
> Signed-off-by: John Hubbard 
> ---
> 
> v2: Fixes a kbuild test robot reported build failure, by directly
> including pagemap.h
> 
>  drivers/infiniband/core/umem_odp.c | 25 ++---
>  1 file changed, 10 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem_odp.c 
> b/drivers/infiniband/core/umem_odp.c
> index acb882f279cb..83872c1f3f2c 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -40,6 +40,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -648,25 +649,17 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp 
> *umem_odp, u64 user_virt,
>  
>   if (npages < 0) {
>   if (npages != -EAGAIN)
> - pr_warn("fail to get %zu user pages with error 
> %d\n", gup_num_pages, npages);
> + pr_warn("fail to get %zu user pages with error 
> %d\n",
> + gup_num_pages, npages);
>   else
> - pr_debug("fail to get %zu user pages with error 
> %d\n", gup_num_pages, npages);
> + pr_debug("fail to get %zu user pages with error 
> %d\n",
> +  gup_num_pages, npages);
>   break;
>   }
>  
>   bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
>   mutex_lock(&umem_odp->umem_mutex);
>   for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
> - if (user_virt & ~page_mask) {
> - p += PAGE_SIZE;
> - if (page_to_phys(local_page_list[j]) != p) {
> - ret = -EFAULT;
> - break;
> - }
> - put_page(local_page_list[j]);
> - continue;
> - }
> -

I think this is trying to account for compound pages. (ie page_mask could
represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
But putting the page in that case seems to be the wrong thing to do?

Yes this was added by Artemy[1] now cc'ed.

>   ret = ib_umem_odp_map_dma_single_page(
>   umem_odp, k, local_page_list[j],
>   access_mask, current_seq);
> @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp 
> *umem_odp, u64 user_virt,
>   mutex_unlock(&umem_odp->umem_mutex);
>  
>   if (ret < 0) {
> - /* Release left over pages when handling errors. */
> - for (++j; j < npages; ++j)
> - put_page(local_page_list[j]);
> + /*
> +  * Release pages, starting at the the first page
> +  * that experienced an error.
> +  */
> + release_pages(&local_page_list[j], npages - j);

My concern here is that release_pages handle compound pages, perhaps
differently from the above code so calling it may may not work?  But I've not
really spent a lot of time on it...

:-/

Ira

[1]

commit 403cd12e2cf759ead5cbdcb62bf9872b9618d400
Author: Artemy Kovalyov 
Date:   Wed Apr 5 09:23:55 2017 +0300

IB/umem: Add contiguous ODP support

Currenlty ODP supports only regular MMU pages.
Add ODP support for regions consisting of physically contiguous chunks
of arbitrary order (huge pages for instance) to improve performance.

Signed-off-by: Artemy Kovalyov 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Doug Ledford

Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-12 Thread Ira Weiny

On Tue, Mar 12, 2019 at 05:23:21AM +, Christopher Lameter wrote:
> On Mon, 11 Mar 2019, Dave Chinner wrote:
> 
> > > Direct IO on a mmapped file backed page doesnt make any sense.
> >
> > People have used it for many, many years as zero-copy data movement
> > pattern. i.e. mmap the destination file, use direct IO to DMA direct
> > into the destination file page cache pages, fdatasync() to force
> > writeback of the destination file.
> 
> Well we could make that more safe through a special API that designates a
> range of pages in a file in the same way as for RDMA. This is inherently
> not reliable as we found out.

I'm not following.  What API was not reliable?  In[2] we had ideas on such an
API but AFAIK these have not been tried.

>From what I have seen the above is racy and is prone to the issues John has
seen.  The difference is that Direct IO has a smaller window than RDMA.  (Or at
least I thought we already established that?)

"And also remember that while RDMA might be the case at least some
people care about here it really isn't different from any of the other
gup + I/O cases, including doing direct I/O to a mmap area.  The only
difference in the various cases is how long the area should be pinned
down..."

-- Christoph Hellwig : https://lkml.org/lkml/2018/10/1/591

> 
> > Now we have copy_file_range() to optimise this sort of data
> > movement, the need for games with mmap+direct IO largely goes away.
> > However, we still can't just remove that functionality as it will
> > break lots of random userspace stuff...
> 
> It is already broken and unreliable. Are there really "lots" of these
> things around? Can we test this by adding a warning in the kernel and see
> where it actually crops up?

IMHO I don't think that the copy_file_range() is going to carry us through the
next wave of user performance requirements.  RDMA, while the first, is not the
only technology which is looking to have direct access to files.  XDP is
another.[1]

Ira

[1] https://www.kernel.org/doc/html/v4.19-rc1/networking/af_xdp.html
[2] 
https://lore.kernel.org/lkml/20190205175059.gb21...@iweiny-desk2.sc.intel.com/

Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-12 Thread Ira Weiny

On Wed, Mar 13, 2019 at 09:11:13AM +1100, Dave Chinner wrote:
> On Tue, Mar 12, 2019 at 03:39:33AM -0700, Ira Weiny wrote:
> > IMHO I don't think that the copy_file_range() is going to carry us through 
> > the
> > next wave of user performance requirements.  RDMA, while the first, is not 
> > the
> > only technology which is looking to have direct access to files.  XDP is
> > another.[1]
> 
> Sure, all I doing here was demonstrating that people have been
> trying to get local direct access to file mappings to DMA directly
> into them for a long time. Direct Io games like these are now
> largely unnecessary because we now have much better APIs to do
> zero-copy data transfer between files (which can do hardware offload
> if it is available!).
> 
> It's the long term pins that RDMA does that are the problem here.
> I'm asssuming that for XDP, you're talking about userspace zero copy
> from files to the network hardware and vice versa? transmit is
> simple (read-only mapping), but receive probably requires bpf
> programs to ensure that data (minus headers) in the incoming packet
> stream is correctly placed into the UMEM region?

Yes, exactly.

> 
> XDP receive seems pretty much like the same problem as RDMA writes
> into the file. i.e.  the incoming write DMAs are going to have to
> trigger page faults if the UMEM is a long term pin so the filesystem
> behaves correctly with this remote data placement.  I'd suggest that
> RDMA, XDP and anything other hardware that is going to pin
> file-backed mappings for the long term need to use the same "inform
> the fs of a write operation into it's mapping" mechanisms...

Yes agreed.  I have a hack patch I'm testing right now which allows the user to
take a LAYOUT lease from user space and GUP triggers on that, either allowing
or rejecting the pin based on the lease.  I think this is the first step of
what Jan suggested.[1]  There is a lot more detail to work out with what
happens if that lease needs to be broken.

> 
> And if we start talking about wanting to do peer-to-peer DMA from
> network/GPU device to storage device without going through a
> file-backed CPU mapping, we still need to have the filesystem
> involved to translate file offsets to storage locations the
> filesystem has allocated for the data and to lock them down for as
> long as the peer-to-peer DMA offload is in place.  In effect, this
> is the same problem as RDMA+FS-DAXs - the filesystem owns the file
> offset to storage location mapping and manages storage access
> arbitration, not the mm/vma mapping presented to userspace

I've only daydreamed about Peer-to-peer transfers.  But yes I think this is the
direction we need to go.  But The details of doing a

GPU -> RDMA -> {network } -> RDMA -> FS DAX

And back again... without CPU/OS involvement are only a twinkle in my eye...
If that.

Ira

[1] https://lore.kernel.org/lkml/20190212160707.ga19...@quack2.suse.cz/

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1014 matches

Mail list logo