On 09/17/2014 12:15 PM, Nathan Fontenot wrote: > On 09/17/2014 02:07 AM, Michael Ellerman wrote: >> >> On Mon, 2014-09-15 at 15:31 -0500, Nathan Fontenot wrote: >>> For pseries system the kernel will be notified of hotplug requests in >>> the form of rtas hotplug events. >> >> Can you flesh that design out a bit for me, I don't entirely get how it's >> going >> to work. >> >> The kernel gets the rtas hotplug events (in rtasd.c) and spits them out to >> userspace, which then writes them back in ? >> >>> This patch creates a common routine that can handle these requests in both >>> the PowerVM anbd PowerKVM environments, handle_dlpar_errorlog(). This also >> ^ >>> creates the initial memory hotplug request handling stub. >>> >>> For PowerVM this patch also creates a new /proc file that the drmgr >>> command will use to write rtas hotplug events to. >> >> Why is this different between phyp and KVM? >> >>> For future PowerKVM handling the rtas check-exception code can pass >>> any rtas hotplug events received to handle_dlpar_errorlog(). >> >> Internally to the kernel you mean? >> > > Perhaps a better explanation of how things work today and where I see > them going is needed. I was trying to avoid a long explanation and I > don't think my shortened explanation worked. I'll include this in v2 > of the patchset too. > > The current hotplug (or dlpar) of devices (the process is generally the > same for memory, cpu, and pci) on PowerVM systems is initiated > from the HMC, which communicates the request to the partitions through > the RSCT framework. The RSCT framework then invokes the drmgr command. > The drmgr command performs the hotplug operation by doing some pieces, > such as most of the rtas calls and device tree parsing, in userspace > and make requests to the kernel to online/offline the device, update the > device tree and add/remove the device. > > For PowerKVM the approach is to follow what is currently being done for > pci hotplug. A hotplug request is initiated from the host. QEMU then > sends an EPOW interrupt to the guest which causes the guest to make the > rtas,check-exception call. In QEMU, the rtas,check-exception call > returns a rtas hotplug event to the guest. I was using this same framework > to also enable memory (and next cpu) hotplug. > > You are correct that the current pci hotplug path for PowerKVM involves > the kernel receiving the rtas event, passing it to rtas_errd in userspace, > and having rtas_errd invoke drmgr. The drmgr command then handles the request > as described above for PowerVM systems. > > There is no need for this circuitous route, we should just handle the entire > hotplug of devices in the kernel. What I am hoping to do is to enable this > by moving the code to handle hotplug from drmgr into the kernel and > provide a single path for handling hotplug for PowerVM and PowerKVM. To > make this work for PowerKVM we will update the kernel rtas code to > recognize rtas hotplug events returned from rtas,check-exception calls > and call handle_dlpar_errorlog(). The hotplug rtas event is never sent out > to userspace.
Wouldn't we still want the event surfaced to userspace so that it can at least be logged? -Tyrel > > For PowerVM systems, I created the /proc/powerpc/dlpar file that a rtas > hotplug event can be written to and passed to handle_dlpar_errorlog(). > There is no chance of updating how we receive hotplug requests on PowerVM > systems. > > Hopefully that explains the design better. > >>> diff --git a/arch/powerpc/platforms/pseries/dlpar.c >>> b/arch/powerpc/platforms/pseries/dlpar.c >>> index a2450b8..574ec73 100644 >>> --- a/arch/powerpc/platforms/pseries/dlpar.c >>> +++ b/arch/powerpc/platforms/pseries/dlpar.c >>> @@ -16,7 +16,9 @@ >>> #include <linux/cpu.h> >>> #include <linux/slab.h> >>> #include <linux/of.h> >>> +#include <linux/proc_fs.h> >>> #include "offline_states.h" >>> +#include "pseries.h" >>> >>> #include <asm/prom.h> >>> #include <asm/machdep.h> >>> @@ -530,13 +532,72 @@ static ssize_t dlpar_cpu_release(const char *buf, >>> size_t count) >>> return count; >>> } >>> >>> +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */ >> >> That is really confusing, but I think it's just a diff artifact? > > Yes, diff artifact. > >> >>> +static int handle_dlpar_errorlog(struct rtas_error_log *error_log) >>> +{ >>> + struct pseries_errorlog *pseries_log; >>> + struct pseries_hp_errorlog *hp_elog; >>> + int rc = -EINVAL; >>> + >>> + pseries_log = get_pseries_errorlog(error_log, >>> + PSERIES_ELOG_SECT_ID_HOTPLUG); >>> + if (!pseries_log) >>> + return rc; >>> + >>> + hp_elog = (struct pseries_hp_errorlog *)pseries_log->data; >>> + if (!hp_elog) >>> + return rc; >> >> I don't see how that can happen? >> >> struct pseries_errorlog { >> __be16 id; /* 0x00 2-byte ASCII section ID */ >> __be16 length; /* 0x02 Section length in bytes */ >> uint8_t version; /* 0x04 Section version */ >> uint8_t subtype; /* 0x05 Section subtype */ >> __be16 creator_component; /* 0x06 Creator component ID */ >> uint8_t data[]; /* 0x08 Start of section data */ >> }; >> >> Should you be checking for length == 0 instead ? >> > > You are correct. > >> Also I think the code will probably end up cleaner if you do the endian >> conversions immediately when you read the hp_elog, rather than passing it >> around in BE and having to remember to convert at all the usages. >> > > Agreed. I'll clean that up. > >>> + switch (hp_elog->resource) { >>> + case PSERIES_HP_ELOG_RESOURCE_MEM: >>> + rc = dlpar_memory(hp_elog); >>> + break; >> >> Please add: >> >> default: >> pr_warn_ratelimited("Unknown resource ..") >> >> Or something. > > can do. > >> >> >>> + } >>> + >>> + return rc; >>> +} >>> + >>> +static ssize_t dlpar_write(struct file *file, const char __user *buf, >>> + size_t count, loff_t *offset) >>> +{ >>> + char *event_buf; >>> + int rc; >>> + >>> + event_buf = kmalloc(count + 1, GFP_KERNEL); >> >> Why + 1 ? It's not null-terminated AFAICS. > > I think that's just habit. You're correct, it's not needed. > >> >>> + if (!event_buf) >>> + return -ENOMEM; >>> + >>> + rc = copy_from_user(event_buf, buf, count); >>> + if (rc) { >>> + kfree(event_buf); >>> + return rc; >>> + } >>> + >>> + rc = handle_dlpar_errorlog((struct rtas_error_log *)event_buf); >> >> If you start with a struct rtas_error_log * you shouldn't need any casts. > > good point. > >> >>> + kfree(event_buf); >>> + return rc ? rc : count; >>> +} >>> + >>> +static const struct file_operations dlpar_fops = { >>> + .write = dlpar_write, >>> + .llseek = noop_llseek, >>> +}; >>> + >>> static int __init pseries_dlpar_init(void) >>> { >>> + struct proc_dir_entry *proc_ent; >>> + >>> + proc_ent = proc_create("powerpc/dlpar", S_IWUSR, NULL, &dlpar_fops); >>> + if (proc_ent) >>> + proc_set_size(proc_ent, 0); >> >> else >> error message at least please >> >> Why are we putting it in /proc, can't it go in /sys/kernel like the mobility >> stuff? > > I have no personal preference, I'll move it to /sys/kernel and update the > creation handling. > >> >>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c >>> b/arch/powerpc/platforms/pseries/hotplug-memory.c >>> index 24abc5c..0e60e15 100644 >>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c >>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c >>> @@ -20,6 +22,9 @@ >>> #include <asm/machdep.h> >>> #include <asm/prom.h> >>> #include <asm/sparsemem.h> >>> +#include <asm/rtas.h> >>> + >>> +DEFINE_MUTEX(dlpar_mem_mutex); >> >> static ? > > yes, that should have been static. > >> >>> diff --git a/arch/powerpc/platforms/pseries/pseries.h >>> b/arch/powerpc/platforms/pseries/pseries.h >>> index b94516b..28bd994 100644 >>> --- a/arch/powerpc/platforms/pseries/pseries.h >>> +++ b/arch/powerpc/platforms/pseries/pseries.h >>> @@ -62,6 +63,15 @@ extern int dlpar_detach_node(struct device_node *); >>> extern int dlpar_acquire_drc(u32); >>> extern int dlpar_release_drc(u32); >>> >>> +#ifdef CONFIG_MEMORY_HOTPLUG >>> +extern int dlpar_memory(struct pseries_hp_errorlog *); >>> +#else >>> +static inline int dlpar_memory(struct pseries_hp_errorlog *hp_elog) >>> +{ >>> + return -ENOTSUPP; >> >> EOPNOTSUPP is a bit more standard. > > ok. > > Thanks for all the feedback. > > -Nathan > > _______________________________________________ > Linuxppc-dev mailing list > Linuxppc-dev@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/linuxppc-dev > _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev