Quoting Bharata B Rao (2014-11-25 22:54:12) > On Wed, Oct 1, 2014 at 8:00 PM, Alexander Graf <ag...@suse.de> wrote: > > > > > > On 01.10.14 00:08, Michael Roth wrote: > >> Quoting Alexander Graf (2014-08-26 06:36:57) > >>> On 19.08.14 02:21, Michael Roth wrote: > >>>> From: Mike Day <ncm...@ncultra.org> > >>>> > >>>> Signed-off-by: Mike Day <ncm...@ncultra.org> > >>>> Signed-off-by: Michael Roth <mdr...@linux.vnet.ibm.com> > >>>> --- > >>>> hw/ppc/spapr_pci.c | 119 > >>>> +++++++++++++++++++++++++++++++++++++++++++++++++ > >>>> include/hw/ppc/spapr.h | 3 ++ > >>>> 2 files changed, 122 insertions(+) > >>>> > >>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > >>>> index 924d488..23a3477 100644 > >>>> --- a/hw/ppc/spapr_pci.c > >>>> +++ b/hw/ppc/spapr_pci.c > >>>> @@ -36,6 +36,16 @@ > >>>> > >>>> #include "hw/pci/pci_bus.h" > >>>> > >>>> +/* #define DEBUG_SPAPR */ > >>>> + > >>>> +#ifdef DEBUG_SPAPR > >>>> +#define DPRINTF(fmt, ...) \ > >>>> + do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0) > >>>> +#else > >>>> +#define DPRINTF(fmt, ...) \ > >>>> + do { } while (0) > >>>> +#endif > >>>> + > >>>> /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */ > >>>> #define RTAS_QUERY_FN 0 > >>>> #define RTAS_CHANGE_FN 1 > >>>> @@ -47,6 +57,31 @@ > >>>> #define RTAS_TYPE_MSI 1 > >>>> #define RTAS_TYPE_MSIX 2 > >>>> > >>>> +/* For set-indicator RTAS interface */ > >>>> +#define INDICATOR_ISOLATION_MASK 0x0001 /* 9001 one bit */ > >>>> +#define INDICATOR_GLOBAL_INTERRUPT_MASK 0x0002 /* 9005 one bit */ > >>>> +#define INDICATOR_ERROR_LOG_MASK 0x0004 /* 9006 one bit */ > >>>> +#define INDICATOR_IDENTIFY_MASK 0x0008 /* 9007 one bit */ > >>>> +#define INDICATOR_RESET_MASK 0x0010 /* 9009 one bit */ > >>>> +#define INDICATOR_DR_MASK 0x00e0 /* 9002 three bits > >>>> */ > >>>> +#define INDICATOR_ALLOCATION_MASK 0x0300 /* 9003 two bits */ > >>>> +#define INDICATOR_EPOW_MASK 0x1c00 /* 9 three bits */ > >>>> + > >>>> +#define INDICATOR_ISOLATION_SHIFT 0x00 /* bit 0 */ > >>>> +#define INDICATOR_GLOBAL_INTERRUPT_SHIFT 0x01 /* bit 1 */ > >>>> +#define INDICATOR_ERROR_LOG_SHIFT 0x02 /* bit 2 */ > >>>> +#define INDICATOR_IDENTIFY_SHIFT 0x03 /* bit 3 */ > >>>> +#define INDICATOR_RESET_SHIFT 0x04 /* bit 4 */ > >>>> +#define INDICATOR_DR_SHIFT 0x05 /* bits 5-7 */ > >>>> +#define INDICATOR_ALLOCATION_SHIFT 0x08 /* bits 8-9 */ > >>>> +#define INDICATOR_EPOW_SHIFT 0x0a /* bits 10-12 */ > >>>> + > >>>> +#define DECODE_DRC_STATE(state, m, s) \ > >>>> + ((((uint32_t)(state) & (uint32_t)(m))) >> (s)) > >>>> + > >>>> +#define ENCODE_DRC_STATE(val, m, s) \ > >>>> + (((uint32_t)(val) << (s)) & (uint32_t)(m)) > >>>> + > >>>> static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid) > >>>> { > >>>> sPAPRPHBState *sphb; > >>>> @@ -402,6 +437,80 @@ static void > >>>> rtas_ibm_query_interrupt_source_number(PowerPCCPU *cpu, > >>>> rtas_st(rets, 2, 1);/* 0 == level; 1 == edge */ > >>>> } > >>>> > >>>> +static void rtas_set_indicator(PowerPCCPU *cpu, sPAPREnvironment *spapr, > >>>> + uint32_t token, uint32_t nargs, > >>>> + target_ulong args, uint32_t nret, > >>>> + target_ulong rets) > >>>> +{ > >>>> + uint32_t indicator = rtas_ld(args, 0); > >>>> + uint32_t drc_index = rtas_ld(args, 1); > >>>> + uint32_t indicator_state = rtas_ld(args, 2); > >>>> + uint32_t encoded = 0, shift = 0, mask = 0; > >>>> + uint32_t *pind; > >>>> + sPAPRDrcEntry *drc_entry = NULL; > >>> > >>> This rtas call does not have any idea what a PHB is. Why does it live in > >>> spapr_pci.c? > >> > >> spapr_rtas.c does seem like a better home > >> > >>> > >>>> + > >>>> + if (drc_index == 0) { /* platform indicator */ > >>>> + pind = &spapr->state; > >>>> + } else { > >>>> + drc_entry = spapr_find_drc_entry(drc_index); > >>>> + if (!drc_entry) { > >>>> + DPRINTF("rtas_set_indicator: unable to find drc_entry for > >>>> %x", > >>>> + drc_index); > >>>> + rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR); > >>>> + return; > >>>> + } > >>>> + pind = &drc_entry->state; > >>>> + } > >>>> + > >>>> + switch (indicator) { > >>>> + case 9: /* EPOW */ > >>>> + shift = INDICATOR_EPOW_SHIFT; > >>>> + mask = INDICATOR_EPOW_MASK; > >>>> + break; > >>>> + case 9001: /* Isolation state */ > >>>> + /* encode the new value into the correct bit field */ > >>>> + shift = INDICATOR_ISOLATION_SHIFT; > >>>> + mask = INDICATOR_ISOLATION_MASK; > >>>> + break; > >>>> + case 9002: /* DR */ > >>>> + shift = INDICATOR_DR_SHIFT; > >>>> + mask = INDICATOR_DR_MASK; > >>>> + break; > >>>> + case 9003: /* Allocation State */ > >>>> + shift = INDICATOR_ALLOCATION_SHIFT; > >>>> + mask = INDICATOR_ALLOCATION_MASK; > >>>> + break; > >>>> + case 9005: /* global interrupt */ > >>>> + shift = INDICATOR_GLOBAL_INTERRUPT_SHIFT; > >>>> + mask = INDICATOR_GLOBAL_INTERRUPT_MASK; > >>>> + break; > >>>> + case 9006: /* error log */ > >>>> + shift = INDICATOR_ERROR_LOG_SHIFT; > >>>> + mask = INDICATOR_ERROR_LOG_MASK; > >>>> + break; > >>>> + case 9007: /* identify */ > >>>> + shift = INDICATOR_IDENTIFY_SHIFT; > >>>> + mask = INDICATOR_IDENTIFY_MASK; > >>>> + break; > >>>> + case 9009: /* reset */ > >>>> + shift = INDICATOR_RESET_SHIFT; > >>>> + mask = INDICATOR_RESET_MASK; > >>>> + break; > >>>> + default: > >>>> + DPRINTF("rtas_set_indicator: indicator not implemented: %d", > >>>> + indicator); > >>>> + rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR); > >>>> + return; > >>>> + } > >>>> + > >>>> + encoded = ENCODE_DRC_STATE(indicator_state, mask, shift); > >>>> + /* clear the current indicator value */ > >>>> + *pind &= ~mask; > >>>> + /* set the new value */ > >>>> + *pind |= encoded; > >>>> + rtas_st(rets, 0, RTAS_OUT_SUCCESS); > >>>> +} > >>>> + > >>>> static int pci_spapr_swizzle(int slot, int pin) > >>>> { > >>>> return (slot + pin) % PCI_NUM_PINS; > >>>> @@ -624,6 +733,14 @@ static void spapr_phb_realize(DeviceState *dev, > >>>> Error **errp) > >>>> sphb->lsi_table[i].irq = irq; > >>>> } > >>>> > >>>> + /* make sure the platform EPOW sensor is initialized - the > >>>> + * guest will probe it when there is a hotplug event. > >>>> + */ > >>>> + spapr->state &= ~(uint32_t)INDICATOR_EPOW_MASK; > >>>> + spapr->state |= ENCODE_DRC_STATE(0, > >>>> + INDICATOR_EPOW_MASK, > >>>> + INDICATOR_EPOW_SHIFT); > >>>> + > >>>> if (!info->finish_realize) { > >>>> error_setg(errp, "finish_realize not defined"); > >>>> return; > >>>> @@ -1056,6 +1173,8 @@ void spapr_pci_rtas_init(void) > >>>> spapr_rtas_register(RTAS_IBM_CHANGE_MSI, "ibm,change-msi", > >>>> rtas_ibm_change_msi); > >>>> } > >>>> + spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator", > >>>> + rtas_set_indicator); > >>>> } > >>>> > >>>> static void spapr_pci_register_types(void) > >>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > >>>> index 0ac1a19..fac85f8 100644 > >>>> --- a/include/hw/ppc/spapr.h > >>>> +++ b/include/hw/ppc/spapr.h > >>>> @@ -72,6 +72,9 @@ typedef struct sPAPREnvironment { > >>>> > >>>> /* state for Dynamic Reconfiguration Connectors */ > >>>> sPAPRDrcEntry drc_table[SPAPR_DRC_TABLE_SIZE]; > >>>> + > >>>> + /* Platform state - sensors and indicators */ > >>>> + uint32_t state; > >>> > >>> Do you think it'd be possible to create a special DRC device that > >>> contains all of its tables and global state and also exposes sensors and > >>> indicators? That device could then get linked via qom links to the PHBs > >>> for their slots. > >> > >> Sorry for the delay, I've been going back through the code with this > >> suggestion in mind and there does seem to be a lot of state that > >> can be nicely encapsulated by modeling the DR Connectors as a QOM > >> "device" (though I haven't gone as far as to make them actual > >> DeviceState's since it's more of a firmware abstraction than real > >> hardware) > >> > >> I'm not sure what the best way to plumb things together is, as a first > >> run, since each DRC must have a index drc_index as per spec, I've moved > >> put them under /machine/DRConnector as a flat list, where top-level > >> PHB/CPU/MEMORY DRCs would be allocated statically during sPAPR machine > >> init (since the corresponding DRC indexes/types/etc are hard-coded into > >> the top-level of the boot-time DT anyway, though I guess we could also > >> allocate these on the fly...seems messier though than just plugging new > >> resources into existing DRCs) > >> > >> PHB's in turn will associate themselves with a DRC via an attach/detach > >> method as part of realize (and in the future, hotplug hooks, though > >> that's not part of the series). The PHBs in turn will create a DRC for each > >> hotpluggable PCI slot. > >> > >> Creation is via: > >> > >> sPAPRDRConnector *spapr_dr_connector_new(sPAPRDRConnectorType type, > >> uint32_t id); > >> > >> where the code computes the drc index based on <type> (one of phb, cpu, > >> pci, > >> memory, etc) and <id>, and sticks them under > >> /machine/dr-Connector/<drc_index> > >> > >> Any pci/phb/cpu hotplug hooks can then fetch the DRC via type/id, > >> and hotplug/unplug via attach()/detach() methods. attach() adds > >> the attached/hotplugged DeviceState as a link property of the > >> DRC object, and sets the initial sensor state. > >> > >> rtas calls can fetch DRCs via drc_index, and set/get sensor state > >> via DRC sensor get/set methods. > >> > >> Hotplug event delivery still lives outside of DRC implementation for now. I > >> thought of moving them into DRC, but decisions like whether we should > >> emit events during coldplug/initial boot seemed to require pushing > >> a lot of general machine state into DRCs and making the encapsulation > >> seem superficial. > >> > >> Things end up looking like this (2xxxxxxx are PHBs, 4xxxxxxx are PCI > >> slots): > >> > >> mdroth@loki:~/w/qom/machine/dr-connector$ ls > >> 20000000 40000018 40000038 40000058 40000078 40000098 400000b8 > >> 400000d8 400000f8 > >> 40000000 40000020 40000040 40000060 40000080 400000a0 400000c0 > >> 400000e0 type > >> 40000008 40000028 40000048 40000068 40000088 400000a8 400000c8 > >> 400000e8 > >> 40000010 40000030 40000050 40000070 40000090 400000b0 400000d0 > >> 400000f0 > >> mdroth@loki:~/w/qom/machine/dr-connector$ cd 40000000/ > >> mdroth@loki:~/w/qom/machine/dr-connector/40000000$ ls -l > >> total 0 > >> -rw-r--r-- 1 mdroth mdroth 4096 Dec 31 1969 allocation-state > >> lrwxr-xr-x 2 mdroth mdroth 4096 Dec 31 1969 device -> > >> ../../../machine/peripheral/hp0 > >> -rw-r--r-- 1 mdroth mdroth 4096 Dec 31 1969 drc-index > >> -rw-r--r-- 1 mdroth mdroth 4096 Dec 31 1969 entity-sense > >> -rw-r--r-- 1 mdroth mdroth 4096 Dec 31 1969 indicator-state > >> -rw-r--r-- 1 mdroth mdroth 4096 Dec 31 1969 isolation-state > >> -rw-r--r-- 1 mdroth mdroth 4096 Dec 31 1969 type > >> mdroth@loki:~/w/qom/machine/dr-connector/40000000$ cat allocation-state > >> 1 > >> mdroth@loki:~/w/qom/machine/dr-connector/40000000$ cat indicator-state > >> 1 > >> mdroth@loki:~/w/qom/machine/dr-connector/40000000$ cat > >> ../../../machine/peripheral/hp0/type > >> virtio-net-pci > >> mdroth@loki:~/w/qom/machine/dr-connector/40000000$ > >> > >> Hopefully this is sort of the approach you were thinking of? > > > > This look quite neat so far, looking forward to the patches :). > > Michael, > > Do you have this code/patches anywhere that I could use ? I have got > the initial working versions of both CPU and memory hotplug now for > sPAPR guests based on top of your old PCI hotplug patchset and it > would be good to rebase them on top of your DR connector device work.
Hi Bharata, Here's the latest branch: https://github.com/mdroth/qemu/commits/spapr-pci-hotplug-ppc-next-cleanup4.2 The sPAPRDREntry stuff is now modeled by the sPAPRDRConnector QOM object in hw/ppc/spapr_drc.c, which manages the device's life-cycle based on rtas-set-sensor-state calls from the guest. As part of qemu-side hotplug/unplug you use the attach/detach methods of the DRC to associate DT bits and callbacks for things like device cleanup or rtas calls to fetch a DT node from the device associated with a particular DRC. I still need to fix endian issues, and am realizing the dr connectors and DT bits for PHBs are not actually a prereq for PCI hotplug, so I may be pulling that out to a separate series specific to enabling PHB hotplug (namely for VFIO hotplug). I realize your CPU/MEM sort of depend on the top-level PHB device tree code so I'm not sure how best to deal with that. Worse case we'd roll the initial code into your series and base a follow-up series on that of that instead. Let me know if you have any questions. > > Regards, > Bharata.