man page for s390_runtime_instr syscall
Hi Michael, I've written a man page for the s390_runtime_instr syscall which was merged with 3.7 (e4b8b3f). Now the question is if you would like to include it in the man-pages although it is completely s390 specific and wont be available on any other arch? Or should it go into a different package? thanks, Jan -- --- /dev/null 2012-12-04 10:52:46.657720288 +0100 +++ s390_runtime_instr.22012-10-09 13:55:39.0 +0200 @@ -0,0 +1,73 @@ +.\" Copyright IBM Corp. 2012 +.\" Author: Jan Glauber +.\" +.TH S390_RUNTIME_INSTR 2 2012-10-09 "Linux Programmer's Manual" +.SH NAME +s390_runtime_instr \- enable/disable s390 CPU runtime instrumentation +.SH SYNOPSIS +.nf +.B #include + +.BI "int s390_runtime_instr(int " command ", int " signum "); +.fi + +.SH DESCRIPTION +The +.BR s390_runtime_instr () +system call starts or stops CPU runtime instrumentation for the current thread. + +The +.IR command +argument controls whether runtime instumentation is started +( 1 ) or stopped ( 2 ) for the current thread. + +The +.IR signum +argument specifies the number of a real-time signal. The +real-time signal is sent to the thread if the runtime instrumentation +buffer is full or if the runtime-instrumentation-halted interrupt +occured. + +.SH RETURN VALUE +On success +.BR s390_runtime_instr () +returns 0 and enables the thread for +runtime instrumentation by assigning the thread a default runtime +instrumentation control block. The caller can then read and modify the +control block and start the runtime instrumentation. On error, -1 is +returned and +.IR errno +is set to one of the error codes listed below. + +.SH ERRORS +.TP +.B EOPNOTSUPP +The runtime instrumentation facility is not available. +.TP +.B EINVAL +The value specified in +.IR command +is not a valid command or the value specified in +.IR signum +is not a real-time signal number. +.TP +.B ENOMEM +Allocating memory for the runtime instrumentation control block +failed. + +.SH VERSIONS +This system call is available since Linux 3.7. + +.SH CONFORMING TO +This system call +is only available on the s390 architecture. The runtime instrumentation facility is available +beginning with System z EC12. + +.SH NOTES +Glibc does not provide a wrapper for this system call, use +.BR syscall (2) +to call it. + +.SH SEE ALSO +.BR signal (7), +.BR syscall (2) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH 03/10] s390/bitops: find leftmost bit instruction support
The flogr instruction scans a bitmap starting from the leftmost bit. Implement support for these bitops. This could be useful to scan bitmaps like an interrupt vector set by the hardware starting at the leftmost bit. Signed-off-by: Jan Glauber --- arch/s390/include/asm/bitops.h | 81 ++ 1 file changed, 81 insertions(+) diff --git a/arch/s390/include/asm/bitops.h b/arch/s390/include/asm/bitops.h index 6f57389..1542293 100644 --- a/arch/s390/include/asm/bitops.h +++ b/arch/s390/include/asm/bitops.h @@ -640,6 +640,87 @@ static inline unsigned long find_first_bit(const unsigned long * addr, } #define find_first_bit find_first_bit +/* + * Big endian variant whichs starts bit counting from left using + * the flogr (find leftmost one) instruction. + */ +static inline unsigned long __flo_word(unsigned long nr, unsigned long val) +{ + register unsigned long bit asm("2") = val; + register unsigned long out asm("3"); + + asm volatile ( + " .insn rre,0xb983,%[bit],%[bit]\n" + : [bit] "+d" (bit), [out] "=d" (out) : : "cc"); + return nr + bit; +} + +/* + * 64 bit special left bitops format: + * order in memory: + *00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f + *10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f + *20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f + *30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f + * after that follows the next long with bit numbers + *40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f + *50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f + *60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f + *70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f + * The reason for this bit ordering is the fact that + * the hardware sets bits in a bitmap starting at bit 0 + * and we don't want to scan the bitmap from the 'wrong + * end'. + */ +static inline unsigned long find_first_bit_left(const unsigned long *addr, + unsigned long size) +{ + unsigned long bytes, bits; + + if (!size) + return 0; + bytes = __ffs_word_loop(addr, size); + bits = __flo_word(bytes * 8, __load_ulong_be(addr, bytes)); + return (bits < size) ? bits : size; +} + +static inline int find_next_bit_left(const unsigned long *addr, +unsigned long size, +unsigned long offset) +{ + const unsigned long *p; + unsigned long bit, set; + + if (offset >= size) + return size; + bit = offset & (__BITOPS_WORDSIZE - 1); + offset -= bit; + size -= offset; + p = addr + offset / __BITOPS_WORDSIZE; + if (bit) { + set = __flo_word(0, *p & (~0UL << bit)); + if (set >= size) + return size + offset; + if (set < __BITOPS_WORDSIZE) + return set + offset; + offset += __BITOPS_WORDSIZE; + size -= __BITOPS_WORDSIZE; + p++; + } + return offset + find_first_bit_left(p, size); +} + +#define for_each_set_bit_left(bit, addr, size) \ + for ((bit) = find_first_bit_left((addr), (size)); \ +(bit) < (size);\ +(bit) = find_next_bit_left((addr), (size), (bit) + 1)) + +/* same as for_each_set_bit() but use bit as value to start with */ +#define for_each_set_bit_left_cont(bit, addr, size)\ + for ((bit) = find_next_bit_left((addr), (size), (bit)); \ +(bit) < (size);\ +(bit) = find_next_bit_left((addr), (size), (bit) + 1)) + /** * find_next_zero_bit - find the first zero bit in a memory region * @addr: The address to base the search on -- 1.7.12.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH 10/10] vga: compile fix, disable vga for s390
Signed-off-by: Jan Glauber --- drivers/gpu/vga/Kconfig | 2 +- include/video/vga.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/vga/Kconfig b/drivers/gpu/vga/Kconfig index f348388..29437ea 100644 --- a/drivers/gpu/vga/Kconfig +++ b/drivers/gpu/vga/Kconfig @@ -1,7 +1,7 @@ config VGA_ARB bool "VGA Arbitration" if EXPERT default y - depends on PCI + depends on (PCI && !S390) help Some "legacy" VGA devices implemented on PCI typically have the same hard-decoded addresses as they did on ISA. When multiple PCI devices diff --git a/include/video/vga.h b/include/video/vga.h index cac567f..7978d31 100644 --- a/include/video/vga.h +++ b/include/video/vga.h @@ -19,7 +19,9 @@ #include #include +#ifndef __s390x__ #include +#endif #include -- 1.7.12.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH 09/10] s390/pci: add PCI Kconfig options
Signed-off-by: Jan Glauber --- arch/s390/Kconfig | 56 +-- 1 file changed, 50 insertions(+), 6 deletions(-) diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 5dba755..e3dd4aec 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -34,12 +34,6 @@ config GENERIC_BUG config GENERIC_BUG_RELATIVE_POINTERS def_bool y -config NO_IOMEM - def_bool y - -config NO_DMA - def_bool y - config ARCH_DMA_ADDR_T_64BIT def_bool 64BIT @@ -58,6 +52,12 @@ config KEXEC config AUDIT_ARCH def_bool y +config NO_IOPORT + def_bool y + +config PCI_QUIRKS + def_bool n + config S390 def_bool y select USE_GENERIC_SMP_HELPERS if SMP @@ -423,6 +423,50 @@ config QDIO If unsure, say Y. +menuconfig PCI + bool "PCI support" + default y + depends on 64BIT + select ARCH_SUPPORTS_MSI + select PCI_MSI + help + Enable PCI support. + +if PCI + +config PCI_NR_FUNCTIONS + int "Maximum number of PCI functions (1-4096)" + range 1 4096 + default "64" + help + This allows you to specify the maximum number of PCI functions which + this kernel will support. + +source "drivers/pci/Kconfig" +source "drivers/pci/pcie/Kconfig" +source "drivers/pci/hotplug/Kconfig" + +endif # PCI + +config PCI_DOMAINS + def_bool PCI + +config HAS_IOMEM + def_bool PCI + +config IOMMU_HELPER + def_bool PCI + +config HAS_DMA + def_bool PCI + select HAVE_DMA_API_DEBUG + +config NEED_SG_DMA_LENGTH + def_bool PCI + +config HAVE_DMA_ATTRS + def_bool PCI + config CHSC_SCH def_tristate m prompt "Support for CHSC subchannels" -- 1.7.12.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH 02/10] s390/pci: CLP interface
CLP instructions are used to query the firmware about detected PCI functions, the attributes of those functions and to enable or disable a PCI function. The CLP interface is the equivalent to a PCI bus scan. Signed-off-by: Jan Glauber --- arch/s390/include/asm/clp.h | 28 arch/s390/include/asm/pci.h | 7 + arch/s390/include/asm/pci_clp.h | 182 +++ arch/s390/pci/Makefile | 2 +- arch/s390/pci/pci.c | 28 arch/s390/pci/pci_clp.c | 317 6 files changed, 563 insertions(+), 1 deletion(-) create mode 100644 arch/s390/include/asm/clp.h create mode 100644 arch/s390/include/asm/pci_clp.h create mode 100644 arch/s390/pci/pci_clp.c diff --git a/arch/s390/include/asm/clp.h b/arch/s390/include/asm/clp.h new file mode 100644 index 000..6c3aecc --- /dev/null +++ b/arch/s390/include/asm/clp.h @@ -0,0 +1,28 @@ +#ifndef _ASM_S390_CLP_H +#define _ASM_S390_CLP_H + +/* CLP common request & response block size */ +#define CLP_BLK_SIZE (PAGE_SIZE * 2) + +struct clp_req_hdr { + u16 len; + u16 cmd; +} __packed; + +struct clp_rsp_hdr { + u16 len; + u16 rsp; +} __packed; + +/* CLP Response Codes */ +#define CLP_RC_OK 0x0010 /* Command request successfully */ +#define CLP_RC_CMD 0x0020 /* Command code not recognized */ +#define CLP_RC_PERM0x0030 /* Command not authorized */ +#define CLP_RC_FMT 0x0040 /* Invalid command request format */ +#define CLP_RC_LEN 0x0050 /* Invalid command request length */ +#define CLP_RC_8K 0x0060 /* Command requires 8K LPCB */ +#define CLP_RC_RESNOT0 0x0070 /* Reserved field not zero */ +#define CLP_RC_NODATA 0x0080 /* No data available */ +#define CLP_RC_FC_UNKNOWN 0x0100 /* Function code not recognized */ + +#endif diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index 5e286ce..079b7e2 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -8,6 +8,7 @@ #include #include +#include #define PCIBIOS_MIN_IO 0x1000 #define PCIBIOS_MIN_MEM0x1000 @@ -76,6 +77,12 @@ void zpci_stop_device(struct zpci_dev *); void zpci_free_device(struct zpci_dev *); int zpci_scan_device(struct zpci_dev *); +/* CLP */ +int clp_find_pci_devices(void); +int clp_add_pci_device(u32, u32, int); +int clp_enable_fh(struct zpci_dev *, u8); +int clp_disable_fh(struct zpci_dev *); + /* Helpers */ struct zpci_dev *get_zdev(struct pci_dev *); struct zpci_dev *get_zdev_by_fid(u32); diff --git a/arch/s390/include/asm/pci_clp.h b/arch/s390/include/asm/pci_clp.h new file mode 100644 index 000..e97163e --- /dev/null +++ b/arch/s390/include/asm/pci_clp.h @@ -0,0 +1,182 @@ +#ifndef _ASM_S390_PCI_CLP_H +#define _ASM_S390_PCI_CLP_H + +#include + +/* + * Call Logical Processor - Command Codes + */ +#define CLP_LIST_PCI 0x0002 +#define CLP_QUERY_PCI_FN 0x0003 +#define CLP_QUERY_PCI_FNGRP0x0004 +#define CLP_SET_PCI_FN 0x0005 + +/* PCI function handle list entry */ +struct clp_fh_list_entry { + u16 device_id; + u16 vendor_id; + u32 config_state : 1; + u32 : 31; + u32 fid;/* PCI function id */ + u32 fh; /* PCI function handle */ +} __packed; + +#define CLP_RC_SETPCIFN_FH 0x0101 /* Invalid PCI fn handle */ +#define CLP_RC_SETPCIFN_FHOP 0x0102 /* Fn handle not valid for op */ +#define CLP_RC_SETPCIFN_DMAAS 0x0103 /* Invalid DMA addr space */ +#define CLP_RC_SETPCIFN_RES0x0104 /* Insufficient resources */ +#define CLP_RC_SETPCIFN_ALRDY 0x0105 /* Fn already in requested state */ +#define CLP_RC_SETPCIFN_ERR0x0106 /* Fn in permanent error state */ +#define CLP_RC_SETPCIFN_RECPND 0x0107 /* Error recovery pending */ +#define CLP_RC_SETPCIFN_BUSY 0x0108 /* Fn busy */ +#define CLP_RC_LISTPCI_BADRT 0x010a /* Resume token not recognized */ +#define CLP_RC_QUERYPCIFG_PFGID0x010b /* Unrecognized PFGID */ + +/* request or response block header length */ +#define LIST_PCI_HDR_LEN 32 + +/* Number of function handles fitting in response block */ +#define CLP_FH_LIST_NR_ENTRIES \ + ((CLP_BLK_SIZE - 2 * LIST_PCI_HDR_LEN) \ + / sizeof(struct clp_fh_list_entry)) + +#define CLP_SET_ENABLE_PCI_FN 0 /* Yes, 0 enables it */ +#define CLP_SET_DISABLE_PCI_FN 1 /* Yes, 1 disables it */ + +#define CLP_UTIL_STR_LEN 64 + +/* List PCI functions request */ +struct clp_req_list_pci { + struct clp_req_hdr hdr; + u32 fmt : 4; /* cmd request block format */ + u32 : 28; + u64 reserved1; + u64 resume_token; + u64 reserved2; +} __packed; + +/* List PCI funct
[RFC PATCH 08/10] s390/pci: s390 specific PCI sysfs attributes
Add some s390 specific sysfs attributes to the PCI device directory. The following attributes are introduced: - function_id (PCI function ID) - function_handle (PCI function handle) - pchid (PCI channel ID) - pfgid (PCI function group ID aka PCI root complex) Signed-off-by: Jan Glauber --- arch/s390/include/asm/pci.h | 4 +++ arch/s390/pci/Makefile | 2 +- arch/s390/pci/pci.c | 6 arch/s390/pci/pci_sysfs.c | 86 + 4 files changed, 97 insertions(+), 1 deletion(-) create mode 100644 arch/s390/pci/pci_sysfs.c diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index 739d9f1..39057e4 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -141,6 +141,10 @@ struct zpci_dev *get_zdev(struct pci_dev *); struct zpci_dev *get_zdev_by_fid(u32); bool zpci_fid_present(u32); +/* sysfs */ +int zpci_sysfs_add_device(struct device *); +void zpci_sysfs_remove_device(struct device *); + /* DMA */ int zpci_dma_init(void); void zpci_dma_exit(void); diff --git a/arch/s390/pci/Makefile b/arch/s390/pci/Makefile index 7e36f42..ab0827b 100644 --- a/arch/s390/pci/Makefile +++ b/arch/s390/pci/Makefile @@ -3,4 +3,4 @@ # obj-$(CONFIG_PCI) += pci.o pci_dma.o pci_clp.o pci_msi.o \ - pci_event.o + pci_sysfs.o pci_event.o diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index c523594..0723b10 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -628,6 +628,7 @@ static void zpci_remove_device(struct pci_dev *pdev) dev_info(&pdev->dev, "Removing device %u\n", zdev->domain); zdev->state = ZPCI_FN_STATE_CONFIGURED; zpci_dma_exit_device(zdev); + zpci_sysfs_remove_device(&pdev->dev); zpci_unmap_resources(pdev); list_del(&zdev->entry); /* can be called from init */ zdev->pdev = NULL; @@ -676,6 +677,11 @@ void pcibios_disable_device(struct pci_dev *pdev) pdev->sysdata = NULL; } +int pcibios_add_platform_entries(struct pci_dev *pdev) +{ + return zpci_sysfs_add_device(&pdev->dev); +} + int zpci_request_irq(unsigned int irq, irq_handler_t handler, void *data) { int msi_nr = irq_to_msi_nr(irq); diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c new file mode 100644 index 000..a42cce6 --- /dev/null +++ b/arch/s390/pci/pci_sysfs.c @@ -0,0 +1,86 @@ +/* + * Copyright IBM Corp. 2012 + * + * Author(s): + * Jan Glauber + */ + +#define COMPONENT "zPCI" +#define pr_fmt(fmt) COMPONENT ": " fmt + +#include +#include +#include + +static ssize_t show_fid(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct zpci_dev *zdev = get_zdev(container_of(dev, struct pci_dev, dev)); + + sprintf(buf, "0x%08x\n", zdev->fid); + return strlen(buf); +} +static DEVICE_ATTR(function_id, S_IRUGO, show_fid, NULL); + +static ssize_t show_fh(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct zpci_dev *zdev = get_zdev(container_of(dev, struct pci_dev, dev)); + + sprintf(buf, "0x%08x\n", zdev->fh); + return strlen(buf); +} +static DEVICE_ATTR(function_handle, S_IRUGO, show_fh, NULL); + +static ssize_t show_pchid(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct zpci_dev *zdev = get_zdev(container_of(dev, struct pci_dev, dev)); + + sprintf(buf, "0x%04x\n", zdev->pchid); + return strlen(buf); +} +static DEVICE_ATTR(pchid, S_IRUGO, show_pchid, NULL); + +static ssize_t show_pfgid(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct zpci_dev *zdev = get_zdev(container_of(dev, struct pci_dev, dev)); + + sprintf(buf, "0x%02x\n", zdev->pfgid); + return strlen(buf); +} +static DEVICE_ATTR(pfgid, S_IRUGO, show_pfgid, NULL); + +static struct device_attribute *zpci_dev_attrs[] = { + &dev_attr_function_id, + &dev_attr_function_handle, + &dev_attr_pchid, + &dev_attr_pfgid, + NULL, +}; + +int zpci_sysfs_add_device(struct device *dev) +{ + int i, rc = 0; + + for (i = 0; zpci_dev_attrs[i]; i++) { + rc = device_create_file(dev, zpci_dev_attrs[i]); + if (rc) + goto error; + } + return 0; + +error: + while (--i >= 0) + device_remove_file(dev, zpci_dev_attrs[i]); + return rc; +} + +void zpci_sysfs_remove_device(struct device *dev) +{ + int i; + + for (i = 0; zpci_dev_attrs[i]; i++) + device_remove_file(dev, zpci_dev_attrs[i]); +} -- 1.7.12.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH 07/10] s390/pci: PCI hotplug support via SCLP
Add SCLP PCI configure/deconfigure and implement a PCI hotplug controller (s390_pci_hpc). The hotplug controller creates a slot for every PCI function in stand-by or configured state. The PCI functions are named after the PCI function ID (fid). By writing to the power attribute in /sys/bus/pci/slots//power the PCI function is moved to stand-by or configured state. If moved to the configured state the device is automatically scanned by the s390 PCI layer. Signed-off-by: Jan Glauber --- arch/s390/include/asm/pci.h| 11 ++ arch/s390/include/asm/sclp.h | 2 + arch/s390/pci/pci.c| 9 ++ drivers/pci/hotplug/Kconfig| 11 ++ drivers/pci/hotplug/Makefile | 1 + drivers/pci/hotplug/s390_pci_hpc.c | 252 + drivers/s390/char/sclp.h | 3 +- drivers/s390/char/sclp_cmd.c | 64 +- 8 files changed, 351 insertions(+), 2 deletions(-) create mode 100644 drivers/pci/hotplug/s390_pci_hpc.c diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index 3dc2cb6..739d9f1 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -95,6 +95,11 @@ struct zpci_dev { enum pci_bus_speed max_bus_speed; }; +struct pci_hp_callback_ops { + int (*create_slot) (struct zpci_dev *zdev); + void (*remove_slot) (struct zpci_dev *zdev); +}; + static inline bool zdev_enabled(struct zpci_dev *zdev) { return (zdev->fh & (1UL << 31)) ? true : false; @@ -140,4 +145,10 @@ bool zpci_fid_present(u32); int zpci_dma_init(void); void zpci_dma_exit(void); +/* Hotplug */ +extern struct mutex zpci_list_lock; +extern struct list_head zpci_list; +extern struct pci_hp_callback_ops hotplug_ops; +extern unsigned int pci_probe; + #endif diff --git a/arch/s390/include/asm/sclp.h b/arch/s390/include/asm/sclp.h index e62a555..8337886 100644 --- a/arch/s390/include/asm/sclp.h +++ b/arch/s390/include/asm/sclp.h @@ -55,5 +55,7 @@ int sclp_chp_read_info(struct sclp_chp_info *info); void sclp_get_ipl_info(struct sclp_ipl_info *info); bool sclp_has_linemode(void); bool sclp_has_vt220(void); +int sclp_pci_configure(u32 fid); +int sclp_pci_deconfigure(u32 fid); #endif /* _ASM_S390_SCLP_H */ diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index 5a2ef9e..c523594 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -47,7 +47,12 @@ /* list of all detected zpci devices */ LIST_HEAD(zpci_list); +EXPORT_SYMBOL_GPL(zpci_list); DEFINE_MUTEX(zpci_list_lock); +EXPORT_SYMBOL_GPL(zpci_list_lock); + +struct pci_hp_callback_ops hotplug_ops; +EXPORT_SYMBOL_GPL(hotplug_ops); static DECLARE_BITMAP(zpci_domain, ZPCI_NR_DEVICES); static DEFINE_SPINLOCK(zpci_domain_lock); @@ -935,6 +940,8 @@ int zpci_create_device(struct zpci_dev *zdev) mutex_lock(&zpci_list_lock); list_add_tail(&zdev->entry, &zpci_list); + if (hotplug_ops.create_slot) + hotplug_ops.create_slot(zdev); mutex_unlock(&zpci_list_lock); if (zdev->state == ZPCI_FN_STATE_STANDBY) @@ -948,6 +955,8 @@ int zpci_create_device(struct zpci_dev *zdev) out_start: mutex_lock(&zpci_list_lock); list_del(&zdev->entry); + if (hotplug_ops.remove_slot) + hotplug_ops.remove_slot(zdev); mutex_unlock(&zpci_list_lock); out_bus: zpci_free_domain(zdev); diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig index b0e46de..0594c4a 100644 --- a/drivers/pci/hotplug/Kconfig +++ b/drivers/pci/hotplug/Kconfig @@ -151,4 +151,15 @@ config HOTPLUG_PCI_SGI When in doubt, say N. +config HOTPLUG_PCI_S390 + tristate "System z PCI Hotplug Support" + depends on S390 && 64BIT + help + Say Y here if you want to use the System z PCI Hotplug + driver for PCI devices. Without this driver it is not + possible to access stand-by PCI functions nor to deconfigure + PCI functions. + + When in doubt, say Y. + endif # HOTPLUG_PCI diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile index c459cd4..47ec8c8 100644 --- a/drivers/pci/hotplug/Makefile +++ b/drivers/pci/hotplug/Makefile @@ -18,6 +18,7 @@ obj-$(CONFIG_HOTPLUG_PCI_RPA) += rpaphp.o obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)+= rpadlpar_io.o obj-$(CONFIG_HOTPLUG_PCI_SGI) += sgi_hotplug.o obj-$(CONFIG_HOTPLUG_PCI_ACPI) += acpiphp.o +obj-$(CONFIG_HOTPLUG_PCI_S390) += s390_pci_hpc.o # acpiphp_ibm extends acpiphp, so should be linked afterwards. diff --git a/drivers/pci/hotplug/s390_pci_hpc.c b/drivers/pci/hotplug/s390_pci_hpc.c new file mode 100644 index 000..d4de895 --- /dev/null +++ b/drivers/pci/hotplug/s390_pci_hpc.c @@ -0,0 +1,252 @@ +/* + * PCI Hot Plug Controller Driver for System z + * + * Copyright 2012 IBM Corp. + * + * Author(s): + * Jan Glauber + */ + +#define COMP
[RFC PATCH 06/10] s390/pci: CHSC PCI support for error and availability events
Add CHSC store-event-information support for PCI (notfication type 2) and report error and availability events to the PCI architecture layer. Signed-off-by: Jan Glauber --- arch/s390/include/asm/pci.h | 4 ++ arch/s390/pci/Makefile | 3 +- arch/s390/pci/pci_event.c | 93 ++ drivers/s390/cio/chsc.c | 156 +++- 4 files changed, 211 insertions(+), 45 deletions(-) create mode 100644 arch/s390/pci/pci_event.c diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index ca48caf..3dc2cb6 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -127,6 +127,10 @@ void zpci_teardown_msi_irq(struct zpci_dev *, struct msi_desc *); int zpci_msihash_init(void); void zpci_msihash_exit(void); +/* Error handling and recovery */ +void zpci_event_error(void *); +void zpci_event_availability(void *); + /* Helpers */ struct zpci_dev *get_zdev(struct pci_dev *); struct zpci_dev *get_zdev_by_fid(u32); diff --git a/arch/s390/pci/Makefile b/arch/s390/pci/Makefile index 4590596..7e36f42 100644 --- a/arch/s390/pci/Makefile +++ b/arch/s390/pci/Makefile @@ -2,4 +2,5 @@ # Makefile for the s390 PCI subsystem. # -obj-$(CONFIG_PCI) += pci.o pci_dma.o pci_clp.o pci_msi.o +obj-$(CONFIG_PCI) += pci.o pci_dma.o pci_clp.o pci_msi.o \ + pci_event.o diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c new file mode 100644 index 000..dbed8cd --- /dev/null +++ b/arch/s390/pci/pci_event.c @@ -0,0 +1,93 @@ +/* + * Copyright IBM Corp. 2012 + * + * Author(s): + *Jan Glauber + */ + +#define COMPONENT "zPCI" +#define pr_fmt(fmt) COMPONENT ": " fmt + +#include +#include + +/* Content Code Description for PCI Function Error */ +struct zpci_ccdf_err { + u32 reserved1; + u32 fh; /* function handle */ + u32 fid;/* function id */ + u32 ett : 4; /* expected table type */ + u32 mvn : 12; /* MSI vector number */ + u32 dmaas : 8; /* DMA address space */ + u32 : 6; + u32 q : 1; /* event qualifier */ + u32 rw : 1; /* read/write */ + u64 faddr; /* failing address */ + u32 reserved3; + u16 reserved4; + u16 pec;/* PCI event code */ +} __packed; + +/* Content Code Description for PCI Function Availability */ +struct zpci_ccdf_avail { + u32 reserved1; + u32 fh; /* function handle */ + u32 fid;/* function id */ + u32 reserved2; + u32 reserved3; + u32 reserved4; + u32 reserved5; + u16 reserved6; + u16 pec;/* PCI event code */ +} __packed; + +static void zpci_event_log_err(struct zpci_ccdf_err *ccdf) +{ + struct zpci_dev *zdev = get_zdev_by_fid(ccdf->fid); + + dev_err(&zdev->pdev->dev, "event code: 0x%x\n", ccdf->pec); +} + +static void zpci_event_log_avail(struct zpci_ccdf_avail *ccdf) +{ + struct zpci_dev *zdev = get_zdev_by_fid(ccdf->fid); + + pr_err("%s%s: availability event: fh: 0x%x fid: 0x%x event code: 0x%x reason:", + (zdev) ? dev_driver_string(&zdev->pdev->dev) : "?", + (zdev) ? dev_name(&zdev->pdev->dev) : "?", + ccdf->fh, ccdf->fid, ccdf->pec); + print_hex_dump(KERN_CONT, "ccdf", DUMP_PREFIX_OFFSET, + 16, 1, ccdf, sizeof(*ccdf), false); + + switch (ccdf->pec) { + case 0x0301: + zpci_enable_device(zdev); + break; + case 0x0302: + clp_add_pci_device(ccdf->fid, ccdf->fh, 0); + break; + case 0x0306: + clp_find_pci_devices(); + break; + default: + break; + } +} + +void zpci_event_error(void *data) +{ + struct zpci_ccdf_err *ccdf = data; + struct zpci_dev *zdev; + + zpci_event_log_err(ccdf); + zdev = get_zdev_by_fid(ccdf->fid); + if (!zdev) { + pr_err("Error event for unknown fid: %x", ccdf->fid); + return; + } +} + +void zpci_event_availability(void *data) +{ + zpci_event_log_avail(data); +} diff --git a/drivers/s390/cio/chsc.c b/drivers/s390/cio/chsc.c index 4d51a7c..68e80e2 100644 --- a/drivers/s390/cio/chsc.c +++ b/drivers/s390/cio/chsc.c @@ -1,7 +1,7 @@ /* * S/390 common I/O routines -- channel subsystem call * - *Copyright IBM Corp. 1999, 2010 + *Copyright IBM Corp. 1999,2012 *Author(s): Ingo Adlung (adl...@de.ibm.com) * Cornelia Huck (cornelia.h...@de.ibm.com) * Arnd Bergmann (ar.
[RFC PATCH 04/10] s390/pci: PCI adapter interrupts for MSI/MSI-X
Support PCI adapter interrupts using the Single-IRQ-mode. Single-IRQ-mode disables an adapter IRQ automatically after delivering it until the SIC instruction enables it again. This is used to reduce the number of IRQs for streaming workloads. Up to 64 MSI handlers can be registered per PCI function. A hash table is used to map interrupt numbers to MSI descriptors. The interrupt vector is scanned using the flogr instruction. Only MSI/MSI-X interrupts are supported, no legacy INTs. Signed-off-by: Jan Glauber --- arch/s390/include/asm/hw_irq.h | 22 ++ arch/s390/include/asm/irq.h| 12 ++ arch/s390/include/asm/isc.h| 1 + arch/s390/include/asm/pci.h| 27 +++ arch/s390/kernel/irq.c | 2 + arch/s390/pci/Makefile | 2 +- arch/s390/pci/pci.c| 464 - arch/s390/pci/pci_clp.c| 3 + arch/s390/pci/pci_msi.c| 141 + drivers/pci/msi.c | 10 +- include/linux/irq.h| 10 +- 11 files changed, 685 insertions(+), 9 deletions(-) create mode 100644 arch/s390/include/asm/hw_irq.h create mode 100644 arch/s390/pci/pci_msi.c diff --git a/arch/s390/include/asm/hw_irq.h b/arch/s390/include/asm/hw_irq.h new file mode 100644 index 000..7e3d258 --- /dev/null +++ b/arch/s390/include/asm/hw_irq.h @@ -0,0 +1,22 @@ +#ifndef _HW_IRQ_H +#define _HW_IRQ_H + +#include +#include + +static inline struct msi_desc *irq_get_msi_desc(unsigned int irq) +{ + return __irq_get_msi_desc(irq); +} + +/* Must be called with msi map lock held */ +static inline int irq_set_msi_desc(unsigned int irq, struct msi_desc *msi) +{ + if (!msi) + return -EINVAL; + + msi->irq = irq; + return 0; +} + +#endif diff --git a/arch/s390/include/asm/irq.h b/arch/s390/include/asm/irq.h index 6703dd9..e6972f8 100644 --- a/arch/s390/include/asm/irq.h +++ b/arch/s390/include/asm/irq.h @@ -33,6 +33,8 @@ enum interruption_class { IOINT_APB, IOINT_ADM, IOINT_CSC, + IOINT_PCI, + IOINT_MSI, NMI_NMI, NR_IRQS, }; @@ -51,4 +53,14 @@ void service_subclass_irq_unregister(void); void measurement_alert_subclass_register(void); void measurement_alert_subclass_unregister(void); +#ifdef CONFIG_LOCKDEP +# define disable_irq_nosync_lockdep(irq) disable_irq_nosync(irq) +# define disable_irq_nosync_lockdep_irqsave(irq, flags) \ + disable_irq_nosync(irq) +# define disable_irq_lockdep(irq) disable_irq(irq) +# define enable_irq_lockdep(irq) enable_irq(irq) +# define enable_irq_lockdep_irqrestore(irq, flags) \ + enable_irq(irq) +#endif + #endif /* _ASM_IRQ_H */ diff --git a/arch/s390/include/asm/isc.h b/arch/s390/include/asm/isc.h index 5ae6064..68d7d68 100644 --- a/arch/s390/include/asm/isc.h +++ b/arch/s390/include/asm/isc.h @@ -18,6 +18,7 @@ #define CHSC_SCH_ISC 7 /* CHSC subchannels */ /* Adapter interrupts. */ #define QDIO_AIRQ_ISC IO_SCH_ISC /* I/O subchannel in qdio mode */ +#define PCI_ISC 2 /* PCI I/O subchannels */ #define AP_ISC 6 /* adjunct processor (crypto) devices */ /* Functions for registration of I/O interruption subclasses */ diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index 079b7e2..a9aadeb 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -20,6 +20,10 @@ void pci_iounmap(struct pci_dev *, void __iomem *); int pci_domain_nr(struct pci_bus *); int pci_proc_domain(struct pci_bus *); +/* MSI arch hooks */ +#define arch_setup_msi_irqsarch_setup_msi_irqs +#define arch_teardown_msi_irqs arch_teardown_msi_irqs + #define ZPCI_BUS_NR0 /* default bus number */ #define ZPCI_DEVFN 0 /* default device number */ @@ -29,6 +33,15 @@ int pci_proc_domain(struct pci_bus *); #define ZPCI_FC_BLOCKED0x20 #define ZPCI_FC_DMA_ENABLED0x10 +struct msi_map { + unsigned long irq; + struct msi_desc *msi; + struct hlist_node msi_chain; +}; + +#define ZPCI_NR_MSI_VECS 64 +#define ZPCI_MSI_MASK (ZPCI_NR_MSI_VECS - 1) + enum zpci_state { ZPCI_FN_STATE_RESERVED, ZPCI_FN_STATE_STANDBY, @@ -56,6 +69,12 @@ struct zpci_dev { u8 pfgid; /* function group ID */ u16 domain; + /* IRQ stuff */ + u64 msi_addr; /* MSI address */ + struct zdev_irq_map *irq_map; + struct msi_map *msi_map[ZPCI_NR_MSI_VECS]; + unsigned intaisb; /* number of the summary bit */ + struct zpci_bar_struct bars[PCI_BAR_COUNT]; enum pci_bus_speed max_bus_speed; @@ -83,6 +102,14 @@ int clp_add_pci_device(u32, u32, int); int clp_enable_fh(struct zpci_dev *, u8);
[RFC PATCH 05/10] s390/pci: DMA support
Add DMA IOMMU support using 4K page table entries. Implement dma_map_ops. Signed-off-by: Jan Glauber --- arch/s390/include/asm/dma-mapping.h | 76 ++ arch/s390/include/asm/dma.h | 19 +- arch/s390/include/asm/pci.h | 21 ++ arch/s390/include/asm/pci_dma.h | 196 ++ arch/s390/pci/Makefile | 2 +- arch/s390/pci/pci.c | 36 +++ arch/s390/pci/pci_clp.c | 4 + arch/s390/pci/pci_dma.c | 505 8 files changed, 848 insertions(+), 11 deletions(-) create mode 100644 arch/s390/include/asm/dma-mapping.h create mode 100644 arch/s390/include/asm/pci_dma.h create mode 100644 arch/s390/pci/pci_dma.c diff --git a/arch/s390/include/asm/dma-mapping.h b/arch/s390/include/asm/dma-mapping.h new file mode 100644 index 000..aab2158 --- /dev/null +++ b/arch/s390/include/asm/dma-mapping.h @@ -0,0 +1,76 @@ +#ifndef _ASM_S390_DMA_MAPPING_H +#define _ASM_S390_DMA_MAPPING_H + +#include +#include +#include +#include +#include +#include +#include + +#define DMA_ERROR_CODE (~(dma_addr_t) 0x0) + +extern struct dma_map_ops s390_dma_ops; + +static inline struct dma_map_ops *get_dma_ops(struct device *dev) +{ + return &s390_dma_ops; +} + +extern int dma_set_mask(struct device *dev, u64 mask); +extern int dma_is_consistent(struct device *dev, dma_addr_t dma_handle); +extern void dma_cache_sync(struct device *dev, void *vaddr, size_t size, + enum dma_data_direction direction); + +#define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f) +#define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h) + +#include + +static inline int dma_supported(struct device *dev, u64 mask) +{ + struct dma_map_ops *dma_ops = get_dma_ops(dev); + + if (dma_ops->dma_supported == NULL) + return 1; + return dma_ops->dma_supported(dev, mask); +} + +static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size) +{ + if (!dev->dma_mask) + return 0; + return addr + size - 1 <= *dev->dma_mask; +} + +static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) +{ + struct dma_map_ops *dma_ops = get_dma_ops(dev); + + if (dma_ops->mapping_error) + return dma_ops->mapping_error(dev, dma_addr); + return (dma_addr == 0UL); +} + +static inline void *dma_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_handle, gfp_t flag) +{ + struct dma_map_ops *ops = get_dma_ops(dev); + void *ret; + + ret = ops->alloc(dev, size, dma_handle, flag, NULL); + debug_dma_alloc_coherent(dev, size, *dma_handle, ret); + return ret; +} + +static inline void dma_free_coherent(struct device *dev, size_t size, +void *cpu_addr, dma_addr_t dma_handle) +{ + struct dma_map_ops *dma_ops = get_dma_ops(dev); + + dma_ops->free(dev, size, cpu_addr, dma_handle, NULL); + debug_dma_free_coherent(dev, size, cpu_addr, dma_handle); +} + +#endif /* _ASM_S390_DMA_MAPPING_H */ diff --git a/arch/s390/include/asm/dma.h b/arch/s390/include/asm/dma.h index 6fb6de4..de015d8 100644 --- a/arch/s390/include/asm/dma.h +++ b/arch/s390/include/asm/dma.h @@ -1,14 +1,13 @@ -/* - * S390 version - */ - -#ifndef _ASM_DMA_H -#define _ASM_DMA_H +#ifndef _ASM_S390_DMA_H +#define _ASM_S390_DMA_H -#include /* need byte IO */ +#include +/* + * MAX_DMA_ADDRESS is ambiguous because on s390 its completely unrelated + * to DMA. It _is_ used for the s390 memory zone split at 2GB caused + * by the 31 bit heritage. + */ #define MAX_DMA_ADDRESS 0x8000 -#define free_dma(x)do { } while (0) - -#endif /* _ASM_DMA_H */ +#endif /* _ASM_S390_DMA_H */ diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index a9aadeb..ca48caf 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -75,8 +75,23 @@ struct zpci_dev { struct msi_map *msi_map[ZPCI_NR_MSI_VECS]; unsigned intaisb; /* number of the summary bit */ + /* DMA stuff */ + unsigned long *dma_table; + spinlock_t dma_table_lock; + int tlb_refresh; + + spinlock_t iommu_bitmap_lock; + unsigned long *iommu_bitmap; + unsigned long iommu_size; + unsigned long iommu_pages; + unsigned intnext_bit; + struct zpci_bar_struct bars[PCI_BAR_COUNT]; + u64 start_dma; /* Start of available DMA addresses */ + u64 end_dma;/* End of available DMA addresses */ + u64 dma_mask; /* DMA address space mask */ + enum pci_bus_speed max_bus_speed; }; @@ -95,6 +110,8 @@ int zpci_enable_device(struct zpci_dev *); void zpci_stop_device(struct zpci
[RFC PATCH 00/10] s390/pci: PCI support on System z
Hi, this patchset based on 3.7-rc4 aims at adding PCI(e) support to the System z (s390) architecture. PCI is currently not available on s390, so this is early prototype code. I'm posting in the hope to get some feedback and direction on the more humble aspects of it. Patches 01-05 are only split to ease the review, they belong functional together and don't make much sense without each other. PCI on the mainframe differs in some aspects from PCI on other platforms. The most notable difference is the use of instructions for memory mapped IO. So on s390 the BAR spaces cannot be mapped to memory. To access the BAR spaces privileged instructions must be used (how that will be addressed for user-space is not part of this patchset). See patch 01 for how the read/write pci primitives are mapped to the s390 instructions. Another difference is that the addresses of BARs from two different PCI functions may be identical. That means we can't distinguish from an iomem address which PCI device the address belongs to. Therefore s390 needs a custom implementation of pci_iomap to remap iomem addresses using the bar and device information provided by pci_iomap. Device naming - the topology on System z is that every PCI function is attached directly to a PCI root complex. Therefore we decided to use a domain number per function and leave the bus, slot and function 0. Next one - IRQs. s390 will use its existing adapter interrupt infrastructure to deliver MSI/MSI-X interrupts. Legacy INTs are _not_ supported at all. The only thing that s390 needs is to close the gap between adapter interrupts and MSI, in other words to locate a struct msi_desc from an IRQ number. For that I've implemented a simple hash, see patch 04. Then there is Kconfig, see patch 09, which gives s390 a bunch of options that we would rather avoid. Like VGA support, sound subsystem, etc. I do not have a solution how to disable these subsystems. We would like to limit the subsystems that become available on s390 because the usable PCI hardware will be limited to a small number of hand-picked and supported PCI cards. There are several options I can imagine: a) Don't disable anything - that would result in lots of drivers builtable on s390 that are not supported and weird error reports b) Introduce negative options for whole subsystems like we have already with NO_IOPORT, so maybe NO_USB, NO_SOUND, ... c) Introduce a special CONFIG_PCI_S390 and add that explicitely to all subsystems that we want to enable on s390 None of the above looks like the obvious winner to me. Hope to get some feedback, TIA, Jan Jan Glauber (10): s390/pci: base support s390/pci: CLP interface s390/bitops: find leftmost bit instruction support s390/pci: PCI adapter interrupts for MSI/MSI-X s390/pci: DMA support s390/pci: CHSC PCI support for error and availability events s390/pci: PCI hotplug support via SCLP s390/pci: s390 specific PCI sysfs attributes s390/pci: add PCI Kconfig options vga: compile fix, disable vga for s390 arch/s390/Kbuild|1 + arch/s390/Kconfig | 56 +- arch/s390/include/asm/bitops.h | 81 +++ arch/s390/include/asm/clp.h | 28 + arch/s390/include/asm/dma-mapping.h | 76 +++ arch/s390/include/asm/dma.h | 19 +- arch/s390/include/asm/hw_irq.h | 22 + arch/s390/include/asm/io.h | 55 +- arch/s390/include/asm/irq.h | 12 + arch/s390/include/asm/isc.h |1 + arch/s390/include/asm/pci.h | 156 - arch/s390/include/asm/pci_clp.h | 182 ++ arch/s390/include/asm/pci_dma.h | 196 +++ arch/s390/include/asm/pci_insn.h| 280 + arch/s390/include/asm/pci_io.h | 194 +++ arch/s390/include/asm/sclp.h|2 + arch/s390/kernel/dis.c | 15 + arch/s390/kernel/irq.c |2 + arch/s390/pci/Makefile |6 + arch/s390/pci/pci.c | 1098 +++ arch/s390/pci/pci_clp.c | 324 +++ arch/s390/pci/pci_dma.c | 505 arch/s390/pci/pci_event.c | 93 +++ arch/s390/pci/pci_msi.c | 141 + arch/s390/pci/pci_sysfs.c | 86 +++ drivers/gpu/vga/Kconfig |2 +- drivers/pci/hotplug/Kconfig | 11 + drivers/pci/hotplug/Makefile|1 + drivers/pci/hotplug/s390_pci_hpc.c | 252 drivers/pci/msi.c | 10 +- drivers/s390/char/sclp.h|3 +- drivers/s390/char/sclp_cmd.c| 64 +- drivers/s390/cio/chsc.c | 156 +++-- include/asm-generic/io.h| 21 +- include/linux/irq.h | 10 +- include/video/vga.h |2 + 36 files changed, 4084 insertions(+), 79 deletions(-) create mode 100644 arch/s390/include/asm/clp.h create mode 100644 arch/s390/include/asm/d
[RFC PATCH 01/10] s390/pci: base support
Add PCI support for s390, (only 64 bit mode is supported by hardware): - PCI facility tests - PCI instructions: pcilg, pcistg, pcistb, stpcifc, mpcifc, rpcit - map readb/w/l/q and writeb/w/l/q to pcilg and pcistg instructions - pci_iomap implementation - memcpy_fromio/toio - pci_root_ops using special pcilg/pcistg - device, bus and domain allocation Signed-off-by: Jan Glauber --- arch/s390/Kbuild | 1 + arch/s390/include/asm/io.h | 55 +++- arch/s390/include/asm/pci.h | 82 +- arch/s390/include/asm/pci_insn.h | 280 arch/s390/include/asm/pci_io.h | 194 ++ arch/s390/kernel/dis.c | 15 ++ arch/s390/pci/Makefile | 5 + arch/s390/pci/pci.c | 557 +++ include/asm-generic/io.h | 21 +- 9 files changed, 1201 insertions(+), 9 deletions(-) create mode 100644 arch/s390/include/asm/pci_insn.h create mode 100644 arch/s390/include/asm/pci_io.h create mode 100644 arch/s390/pci/Makefile create mode 100644 arch/s390/pci/pci.c diff --git a/arch/s390/Kbuild b/arch/s390/Kbuild index cc45d25..647c3ec 100644 --- a/arch/s390/Kbuild +++ b/arch/s390/Kbuild @@ -6,3 +6,4 @@ obj-$(CONFIG_S390_HYPFS_FS) += hypfs/ obj-$(CONFIG_APPLDATA_BASE)+= appldata/ obj-$(CONFIG_MATHEMU) += math-emu/ obj-y += net/ +obj-$(CONFIG_PCI) += pci/ diff --git a/arch/s390/include/asm/io.h b/arch/s390/include/asm/io.h index 559e921..5f3766b 100644 --- a/arch/s390/include/asm/io.h +++ b/arch/s390/include/asm/io.h @@ -9,9 +9,9 @@ #ifndef _S390_IO_H #define _S390_IO_H +#include #include - -#define IO_SPACE_LIMIT 0x +#include /* * Change virtual addresses to physical addresses and vv. @@ -24,10 +24,11 @@ static inline unsigned long virt_to_phys(volatile void * address) " lra %0,0(%1)\n" " jz 0f\n" " la %0,0\n" - "0:" +"0:" : "=a" (real_address) : "a" (address) : "cc"); -return real_address; + return real_address; } +#define virt_to_phys virt_to_phys static inline void * phys_to_virt(unsigned long address) { @@ -42,4 +43,50 @@ void unxlate_dev_mem_ptr(unsigned long phys, void *addr); */ #define xlate_dev_kmem_ptr(p) p +#define IO_SPACE_LIMIT 0 + +#ifdef CONFIG_PCI + +#define ioremap_nocache(addr, size)ioremap(addr, size) +#define ioremap_wc ioremap_nocache + +/* TODO: s390 cannot support io_remap_pfn_range... */ +#define io_remap_pfn_range(vma, vaddr, pfn, size, prot)\ + remap_pfn_range(vma, vaddr, pfn, size, prot) + +static inline void __iomem *ioremap(unsigned long offset, unsigned long size) +{ + return (void __iomem *) offset; +} + +static inline void iounmap(volatile void __iomem *addr) +{ +} + +/* + * s390 needs a private implementation of pci_iomap since ioremap with its + * offset parameter isn't sufficient. That's because BAR spaces are not + * disjunctive on s390 so we need the bar parameter of pci_iomap to find + * the corresponding device and create the mapping cookie. + */ +#define pci_iomap pci_iomap +#define pci_iounmap pci_iounmap + +#define memcpy_fromio(dst, src, count) zpci_memcpy_fromio(dst, src, count) +#define memcpy_toio(dst, src, count) zpci_memcpy_toio(dst, src, count) +#define memset_io(dst, val, count) zpci_memset_io(dst, val, count) + +#define __raw_readbzpci_read_u8 +#define __raw_readwzpci_read_u16 +#define __raw_readlzpci_read_u32 +#define __raw_readqzpci_read_u64 +#define __raw_writeb zpci_write_u8 +#define __raw_writew zpci_write_u16 +#define __raw_writel zpci_write_u32 +#define __raw_writeq zpci_write_u64 + +#endif /* CONFIG_PCI */ + +#include + #endif diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h index 42a145c..5e286ce 100644 --- a/arch/s390/include/asm/pci.h +++ b/arch/s390/include/asm/pci.h @@ -1,10 +1,84 @@ #ifndef __ASM_S390_PCI_H #define __ASM_S390_PCI_H -/* S/390 systems don't have a PCI bus. This file is just here because some stupid .c code - * includes it even if CONFIG_PCI is not set. - */ +/* must be set before including asm-generic/pci.h */ #define PCI_DMA_BUS_IS_PHYS (0) +/* must be set before including pci_clp.h */ +#define PCI_BAR_COUNT 6 -#endif /* __ASM_S390_PCI_H */ +#include +#include +#define PCIBIOS_MIN_IO 0x1000 +#define PCIBIOS_MIN_MEM0x1000 + +#define pcibios_assign_all_busses()(0) + +void __iomem *pci_iomap(struct pci_dev *, int, unsigned long); +void pci_iounmap(struct pci_dev *, void __iomem *); +int pci_domain_nr(struct pci_bus *); +int pci_proc_domain(struct pci_bus *); + +#define ZPCI_BUS_NR0 /* default bus number */ +#define ZPCI_DEVFN
Re: Ambigiuous thread stack annotation in /proc/pid/[s]maps
On Thu, Jun 27, 2013 at 10:00:51PM +0530, Siddhesh Poyarekar wrote: > On 27 June 2013 21:32, Jan Glauber wrote: > > But isn't that confusing to the user? At least it is to me. Imagine someone > > who uses the maps or smaps output to determine the size of code, data and > > stack of a process. Maybe it would be better to not print the stack:tid data > > at all if the kernel cannot distinguish the vma's? > > Is that behaviour documented anywhere? > > I'm afraid the documentation update I made to proc.txt did not mention > this. In fact I went through the discussion thread and I don't think > I or anyone mentioned this in the thread either. I think I assumed > back then that this was accepted since there was a point made that > vmas used by makecontext/swapcontext should also get marked correctly > with [stack]. > > However, I agree that you have a point about it being misleading. > Avoiding a vma merge is a possible solution, but we don't have flags > available any more to do that. I'll try to think of another way. In > the mean time I could add a note to the proc.txt documentation and > even adjust the language. Would that be good enough or do you think > the patch should be reverted until I or someone else comes up with a > better solution? I think we should try to solve the problem first. If it turns out that it is not possible to prevent these vma merges than we should document it. Removing the annotation would IMHO be bad since it is really useful information to the user. Can anyone from the mm folks comment on the vm_flags situation (CC-ing linux-mm) ? Would it be possible to re-use one of the defined flags? Or should we come up with a crude hack? > > Never seen that makecontext stuff before. Do you have an example output how > > the maps would look like if that is used? > > This is a sample program I cribbed from the makecontext man page and > modified slightly. The stack is in the data area in this example. > There could be a case of a stack being with the main program stack or > even in the heap. > > #include > #include > #include > > static ucontext_t uctx_main, uctx_func1, uctx_func2; > > #define handle_error(msg) \ >do { perror(msg); exit(EXIT_FAILURE); } while (0) > > static char func1_stack[16384]; > static char func2_stack[16384]; > > static void func1(void) > { > printf("func1: started\n"); > printf("func1: swapcontext(&uctx_func1, &uctx_func2)\n"); > if (swapcontext(&uctx_func1, &uctx_func2) == -1) > handle_error("swapcontext"); > > sleep (1); > printf("func1: returning\n"); > } > > static void func2(void) > { > printf("func2: started\n"); > printf("func2: swapcontext(&uctx_func2, &uctx_func1)\n"); > if (swapcontext(&uctx_func2, &uctx_func1) == -1) > handle_error("swapcontext"); > sleep (1); > printf("func2: returning\n"); > } > > int main(int argc, char *argv[]) > { > if (getcontext(&uctx_func1) == -1) > handle_error("getcontext"); > uctx_func1.uc_stack.ss_sp = func1_stack; > uctx_func1.uc_stack.ss_size = sizeof(func1_stack); > uctx_func1.uc_link = &uctx_main; > makecontext(&uctx_func1, func1, 0); > > if (getcontext(&uctx_func2) == -1) > handle_error("getcontext"); > uctx_func2.uc_stack.ss_sp = func2_stack; > uctx_func2.uc_stack.ss_size = sizeof(func2_stack); > /* Successor context is f1(), unless argc > 1 */ > uctx_func2.uc_link = (argc > 1) ? NULL : &uctx_func1; > makecontext(&uctx_func2, func2, 0); > > printf("main: swapcontext(&uctx_main, &uctx_func2)\n"); > if (swapcontext(&uctx_main, &uctx_func2) == -1) > handle_error("swapcontext"); > > printf("main: exiting\n"); > exit(EXIT_SUCCESS); > } > > > $./a.out > $ cat /proc/$(pgrep a.out)/maps > > 0040-00401000 r-xp fd:00 1704114 > /tmp/a.out > 00601000-00602000 rw-p 1000 fd:00 1704114 > /tmp/a.out > 00602000-0060a000 rw-p 00:00 0 > [stack:7352] > 7fac662d2000-7fac6647e000 r-xp fd:00 524922 > /usr/lib64/libc-2.15.so > 7fac6647e000-7fac6667e000 ---p 001ac000 fd:00 524922 > /usr/lib64/libc-2.15.so > 7fac6667e000-7fac66682000 r--p 001ac000 fd:00 524922 > /usr/lib64/libc-2.15.so > 7fac66682000-7fac66684000 rw-p 001b
Ambigiuous thread stack annotation in /proc/pid/[s]maps
Commit b764375 added annotations of thread stacks. This annotation can be wrong depending on the memory layout of a task (will probably only happen under 32 bit). If a large allocation happens before the creation of a thread you can get: root@box:~# cat /proc/2032/maps 08048000-08049000 r-xp 08:01 3407931/root/pt 08049000-0804a000 r--p 08:01 3407931/root/pt 0804a000-0804b000 rw-p 1000 08:01 3407931/root/pt 093b8000-093d9000 rw-p 00:00 0 [heap] 363fe000-363ff000 ---p 00:00 0 363ff000-b6c0 rw-p 00:00 0 [stack:2036] b6c0-b6c21000 rw-p 00:00 0 b6c21000-b6d0 ---p 00:00 0 b6d31000-b6d32000 ---p 00:00 0 b6d32000-b7534000 rw-p 00:00 0 [stack:2033] b7534000-b76d7000 r-xp 08:01 3145793 /lib/i386-linux-gnu/libc-2.15.so b76d7000-b76d9000 r--p 001a3000 08:01 3145793 /lib/i386-linux-gnu/libc-2.15.so b76d9000-b76da000 rw-p 001a5000 08:01 3145793 /lib/i386-linux-gnu/libc-2.15.so b76da000-b76dd000 rw-p 00:00 0 b76dd000-b76f4000 r-xp 08:01 3149883 /lib/i386-linux-gnu/libpthread-2.15.so b76f4000-b76f5000 r--p 00016000 08:01 3149883 /lib/i386-linux-gnu/libpthread-2.15.so b76f5000-b76f6000 rw-p 00017000 08:01 3149883 /lib/i386-linux-gnu/libpthread-2.15.so b76f6000-b76f8000 rw-p 00:00 0 b7707000-b770b000 rw-p 00:00 0 b770b000-b770c000 r-xp 00:00 0 [vdso] b770c000-b772c000 r-xp 08:01 3149887/lib/i386-linux-gnu/ld-2.15.so b772c000-b772d000 r--p 0001f000 08:01 3149887/lib/i386-linux-gnu/ld-2.15.so b772d000-b772e000 rw-p 0002 08:01 3149887/lib/i386-linux-gnu/ld-2.15.so bfb15000-bfb36000 rw-p 00:00 0 [stack] Now the stack for thread 2036 is amazingly large for a pthread. Actually it is the result of: 1. malloc(2048 * 1024 * 1024) 2. pthread_create (with attr=NULL) My theory is that this can happen anytime because a thread stack is just created with mmap (MAP_PRIVATE | MAP_ANONYMOUS). The 1. malloc also resulted in a mmap (libc fallback on brk failure) with the same flags (MAP_PRIVATE | MAP_ANONYMOUS). The kernel does not care about thread stacks and tries to merge vma's which seems to happen here and which is correct from the kernel POV. Any ideas how that can be fixed? The only solution that comes to my mind is to prevent merging vma's that are used for thread stacks. There is already a flag (MAP_STACK) which is set by the libc for mmap'ing thread stacks but the kernel currently does not care. If the kernel gets an mmap request with MAP_STACK set we could mark the VMA and avoid merging it with others. Unfortunately there seems to be no bit left in the vm_flags to store the MAP_STACK information... --Jan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Ambigiuous thread stack annotation in /proc/pid/[s]maps
On Wed, Jun 26, 2013 at 10:05:41PM +0530, Siddhesh Poyarekar wrote: > On 26 June 2013 17:13, Jan Glauber wrote: > > Any ideas how that can be fixed? The only solution that comes to my mind > > is to prevent merging vma's that are used for thread stacks. There is > > already a > > flag (MAP_STACK) which is set by the libc for mmap'ing thread stacks but the > > kernel currently does not care. If the kernel gets an mmap request with > > MAP_STACK set we could mark the VMA and avoid merging it with others. > > > > Unfortunately there seems to be no bit left in the vm_flags to store the > > MAP_STACK information... > > The annotations essentially point out that the vma contains the stack > and not that the vma *is* the stack. We'd get similar output with But isn't that confusing to the user? At least it is to me. Imagine someone who uses the maps or smaps output to determine the size of code, data and stack of a process. Maybe it would be better to not print the stack:tid data at all if the kernel cannot distinguish the vma's? Is that behaviour documented anywhere? > makecontext/getcontext, where the stack may just be a portion of > memory in another vma. I don't remember if I had explicitly mentioned > that during the original discussion. Never seen that makecontext stuff before. Do you have an example output how the maps would look like if that is used? --Jan > Siddhesh > -- > http://siddhesh.in -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 01/10] s390/pci: base support
On Mon, 2012-12-10 at 14:14 -0700, Bjorn Helgaas wrote: > On Wed, Nov 14, 2012 at 2:41 AM, Jan Glauber wrote: > > Add PCI support for s390, (only 64 bit mode is supported by hardware): > > - PCI facility tests > > - PCI instructions: pcilg, pcistg, pcistb, stpcifc, mpcifc, rpcit > > - map readb/w/l/q and writeb/w/l/q to pcilg and pcistg instructions > > - pci_iomap implementation > > - memcpy_fromio/toio > > - pci_root_ops using special pcilg/pcistg > > - device, bus and domain allocation > > > > Signed-off-by: Jan Glauber > > I think these patches are in -next already, so I just have some > general comments & questions. Yes, since the feedback was manageable we decided to give the patches some exposure in -next and if no one complains we'll just go for the next merge window. BTW, Sebastian & Gerald (on CC:) will continue the work on the PCI code. > My overall impression is that these are exceptionally well done. > They're easy to read, well organized, and well documented. It's a > refreshing change from a lot of the stuff that's posted. Thanks Björn! > As with other arches that run on top of hypervisors, you have > arch-specific code that enumerates PCI devices using hypervisor calls, > and you hook into the PCI core at a lower level than > pci_scan_root_bus(). That is, you call pci_create_root_bus(), > pci_scan_single_device(), pci_bus_add_devices(), etc., directly from > the arch code. This is the typical approach, but it does make more > dependencies between the arch code and the PCI core than I'd like. > > Did you consider hiding any of the hypervisor details behind the PCI > config accessor interface? If that could be done, the overall > structure might be more similar to other architectures. You mean pci_root_ops? I'm not sure I understand how that can be used to hide hipervisor details. One reason why we use the lower level functions is that we need to create the root bus (and its resources) much earlier then the pci_dev. For instance pci_hp_register wants a pci_bus to create the PCI slot and the slot can exist without a pci_dev. > The current config accessors only work for dev/fn 00.0 (they fail when > "devfn != ZPCI_DEVFN"). Why is that? It looks like it precludes > multi-function devices and basically prevents you from building an > arbitrary PCI hierarchy. Our hypervisor does not support multi-function devices. In fact the hypervisor will limit the reported PCI devices to a hand-picked selection so we can be sure that there will be no unsupported devices. The PCI hierarchy is hidden by the hipervisor. We only use the PCI domain number, bus and devfn are always zero. So it looks like every function is directly attached to a PCI root complex. That was the reason for the sanity check, but thinking about it I could remove it since although we don't support multi-function devices I think the s390 code should be more agnostic to these limitations. > zpci_map_resources() is very unusual. The struct pci_dev resource[] > table normally contains CPU physical addresses, but > zpci_map_resources() fills it with virtual addresses. I suspect this > has something to do with the "BAR spaces are not disjunctive on s390" > comment. It almost sounds like that's describing host bridges where > the PCI bus address is not equal to the CPU physical address -- in > that case, device A and device B may have the same values in their > BARs, but they can still be distinct if they're under host bridges > that apply different CPU-to-bus address translations. Yeah, you've found it... I've had 3 or 4 tries on different implementations but all failed. If we use the resources as they are we cannot map them to the instructions (and ioremap does not help because there we cannot find out which device the resource belongs to). If we change the BARs on the card MMIO stops to work. I don't know about host bridges - if we would use a host bridge at which point in the translation process would it kick in? Jan > Bjorn > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: man page for s390_runtime_instr syscall
On Sat, 2012-12-15 at 19:38 +0100, Michael Kerrisk (man-pages) wrote: > Thanks for this page. The man-pages package is the right place for it, > but a few things need fixing. Could you see below and resubmit please? I've addressed all of your comments but for the example. We could add the example later when the official documentation is released which did not yet happen. thanks, Jan --- /dev/null 2012-12-17 11:58:11.967183723 +0100 +++ s390_runtime_instr.22012-12-17 12:55:20.942872393 +0100 @@ -0,0 +1,91 @@ +.\" Copyright IBM Corp. 2012 +.\" Author: Jan Glauber +.\" Copyright (c) 2012, IBM Corp. +.\" +.\" This is free documentation; you can redistribute it and/or +.\" modify it under the terms of the GNU General Public License as +.\" published by the Free Software Foundation; either version 2 of +.\" the License, or (at your option) any later version. +.\" +.\" The GNU General Public License's references to "object code" +.\" and "executables" are to be interpreted as the output of any +.\" document formatting or typesetting system, including +.\" intermediate and printed output. +.\" +.\" This manual is distributed in the hope that it will be useful, +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +.\" GNU General Public License for more details. +.\" +.\" You should have received a copy of the GNU General Public +.\" License along with this manual; if not, see +.\" <http://www.gnu.org/licenses/>. +.TH S390_RUNTIME_INSTR 2 2012-12-17 "Linux Programmer's Manual" +.SH NAME +s390_runtime_instr \- enable/disable s390 CPU runtime instrumentation +.SH SYNOPSIS +.nf +.B #include + +.BI "int s390_runtime_instr(int " command ", int " signum "); +.fi + +.SH DESCRIPTION +The +.BR s390_runtime_instr () +system call starts or stops CPU runtime instrumentation for the current thread. + +The +.IR command +argument controls whether runtime instrumentation is started +( 1 ) or stopped ( 2 ) for the current thread. + +The +.IR signum +argument specifies the number of a real-time signal. +The real-time signal is sent to the thread if the runtime instrumentation +buffer is full or if the runtime-instrumentation-halted interrupt +occurred. + +.SH RETURN VALUE +On success +.BR s390_runtime_instr () +returns 0 and enables the thread for +runtime instrumentation by assigning the thread a default runtime +instrumentation control block. +The caller can then read and modify the control block and start the runtime +instrumentation. +On error, -1 is returned and +.IR errno +is set to one of the error codes listed below. + +.SH ERRORS +.TP +.B EOPNOTSUPP +The runtime instrumentation facility is not available. +.TP +.B EINVAL +The value specified in +.IR command +is not a valid command or the value specified in +.IR signum +is not a real-time signal number. +.TP +.B ENOMEM +Allocating memory for the runtime instrumentation control block failed. + +.SH VERSIONS +This system call is available since Linux 3.7. + +.SH CONFORMING TO +This Linux-specific system call is only available on the s390 architecture. +The runtime instrumentation facility is available beginning with System z EC12. + +.SH NOTES +Glibc does not provide a wrapper for this system call, use +.BR syscall (2) +to call it. + +.SH SEE ALSO +.BR syscall (2), +.BR signal (7) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Module init call vs symbols exporting race?
On Sat, 2007-11-10 at 18:27 +1100, Rusty Russell wrote: > We fail rather than sleep in the "dependency isn't ready" case. Partially > because it's not happened before, but partially because we risk nasty loops. If we fail since we have to in the "dependency isn't ready" case then the warning seems quite useless. I'll send a patch in a seperate mail to remove the warning. -jang - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] module loader should not complain about unknown symbol
If module A depends on module B and module B has not yet finished its init() the module loader may print a warning about an unknown symbol. This happens if module B is still in state MODULE_STATE_COMING, as module A runs into resolve_symbol() for a symbol from module B. resolve_symbol() return 0 in that case and causes the warning. --- Signed-off-by: Jan Glauber <[EMAIL PROTECTED]> Index: linux-2.6/kernel/module.c === --- linux-2.6.orig/kernel/module.c 2007-11-12 13:53:44.0 + +++ linux-2.6/kernel/module.c 2007-11-12 13:54:33.0 + @@ -1411,8 +1411,8 @@ if (ELF_ST_BIND(sym[i].st_info) == STB_WEAK) break; - printk(KERN_WARNING "%s: Unknown symbol %s\n", - mod->name, strtab + sym[i].st_name); + DEBUGP("%s: Unknown symbol %s\n", mod->name, + strtab + sym[i].st_name); ret = -ENOENT; break; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] module loader should not complain about unknown symbol
On Tue, 2007-11-13 at 13:52 +1100, Rusty Russell wrote: > On Tuesday 13 November 2007 09:23:12 Rusty Russell wrote: > > Better might be to put in a waitqueue and wake it up whenever a module is > > deleted or changes status. Then use_module() can wait if > > strong_try_module_get() returns -EBUSY (up to 30 seconds, then print a > > warning and fail). > > And here it is. Does it work for you Jan? No, it hangs at boot time after starting udev. All CPUs are in cpu_idle(). Unfortunately I've had no time for further debugging and I'll be offline for the next 10 days... -jang - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] virtual sched_clock() for s390
On Thu, 2007-07-19 at 21:38 +0200, Ingo Molnar wrote: > * Jan Glauber <[EMAIL PROTECTED]> wrote: > > > > still, CFS needs time measurement across idle periods as well, for > > > another purpose: to be able to do precise task statistics for /proc. > > > (for top, ps, etc.) So it's still true that sched_clock() should > > > include idle periods too. > > > > I'm not sure, s390 already has an implemetation for precise accounting > > in the architecture code, does CFS also improve accounting data? > > what kind of precise accounting does s390 have in the architecture code? > CFS changes task (and load) accounting to be sched_clock() driven in > essence. s390 has per-process accounting that is aware of virtual cpu time, implemented in arch/s390/kernel/time.c: account_ticks() and arch/s390/kernel/vtime.c: account_*_vtime(). Timestamps are taken in entry.S for system calls, interrupts and other system entries and are accounted later, we don't call update_process_times(). Jan > Ingo > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] virtual sched_clock() for s390
On Fri, 2007-07-20 at 09:22 +0200, Ingo Molnar wrote: > * Paul Mackerras <[EMAIL PROTECTED]> wrote: > > As with s390, 64-bit PowerPC also uses CONFIG_VIRT_CPU_ACCOUNTING. > > That affects how tsk->utime and tsk->stime are accumulated (we call > > account_user_time and account_system_time directly rather than calling > > update_process_times) as well as the system hardirq/softirq time, idle > > time, and stolen time. > > tsk->utime and tsk->stime is only used for a single purpose: to > determine the 'split' factor of how to split up the precise total time > between user and system time. > > > When you say "precise task statistics for /proc", where are they > > accumulated? I don't see any changes to the way that tsk->utime and > > ctime are computed. > > we now use p->se.sum_exec_runtime that measures (in nanoseconds) the > precise amount of time spent executing (sum of system and user time) - > and ->stime and ->utime is used to determine the 'split'. [this allows > us to gather ->stime and ->utime via low-resolution sampling, while > keeping the 'total' precise. Accounting at every system entry point > would be quite expensive on most platforms.] Using se.sum_exec_runtime to generate ->utime and ->stime breaks the process accounting we have on s390 (and probably on PowerPC too). With CONFIG_VIRT_CPU_ACCOUNTING we already have precise values in ->utime and ->stime. Can we make the calculation of the CFS-based time values conditional by CONFIG_VIRT_CPU_ACCOUNTING? Jan > Ingo > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH 0/2] use virtual time for CFS on s390
Hi Ingo, an outcome from the previous discussion about a virtual sched_clock() on s390 was that scheduler_tick() should also be called based on virtual time. The second patch changes the scheduler_tick() call to only happen after a tick passed for the virtual cpu. The patches cause nothing obvious to break, numbers from top look sane but I've seen something strange... For a simple make -j6 workload top reports processes very often to be in state . Thats' not terribly wrong since this can happens also without the patches but I wonder if this indicates that the scheduler behaves badly and does less often schedule the parent processes that gathers the waiting zombie processes? Are there any indicators in the sched_debug output that I could look for? Jan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH 1/2] virtual sched_clock() for s390
From: Jan Glauber <[EMAIL PROTECTED]> From: Christian Borntraeger <[EMAIL PROTECTED]> This patch introduces a cpu time clock for s390 (only ticking if the virtual cpu is running) and bases the s390 implementation of sched_clock() on it. The time slice length on a virtual cpu can be anything between the calculated time slice and zero. In reality this doesn't seem to be problem, since the scheduler is fair enough to not let a single process starve but the current implementation can lead to inefficient short time slices. By providing a 'virtual' sched_clock() we improve the scheduler fairness, regardless of scheduling decisions from the hypervisor. We also lay a foundation to base process accouting on virtual cpu time without CONFIG_VIRT_CPU_ACCOUNTING. Signed-off-by: Jan Glauber <[EMAIL PROTECTED]> Signed-off-by: Christian Borntraeger <[EMAIL PROTECTED]> CC: Martin Schwidefsky <[EMAIL PROTECTED]> --- arch/s390/kernel/time.c | 30 +-- arch/s390/kernel/vtime.c | 52 --- include/asm-s390/timer.h |2 + 3 files changed, 70 insertions(+), 14 deletions(-) Index: linux-2.6/arch/s390/kernel/time.c === --- linux-2.6.orig/arch/s390/kernel/time.c +++ linux-2.6/arch/s390/kernel/time.c @@ -62,21 +62,27 @@ static u64 jiffies_timer_cc; static u64 xtime_cc; /* - * Scheduler clock - returns current time in nanosec units. + * Monotonic_clock - returns # of nanoseconds passed since time_init() */ -unsigned long long sched_clock(void) +unsigned long long monotonic_clock(void) { - return ((get_clock() - jiffies_timer_cc) * 125) >> 9; + return ((get_clock() - jiffies_timer_cc) >> 9) * 125; } +EXPORT_SYMBOL(monotonic_clock); /* - * Monotonic_clock - returns # of nanoseconds passed since time_init() + * Scheduler clock - returns current time in nanosec units. + * Now based on virtual cpu time to only account time the guest + * was actually running. */ -unsigned long long monotonic_clock(void) +unsigned long long sched_clock(void) { - return sched_clock(); +#ifdef CONFIG_VIRT_TIMER + return s390_cpu_clock(); +#else + return monotonic_clock(); +#endif } -EXPORT_SYMBOL(monotonic_clock); void tod_to_timeval(__u64 todval, struct timespec *xtime) { @@ -348,13 +354,15 @@ void __init time_init(void) /* Enable TOD clock interrupts on the boot cpu. */ init_cpu_timer(); -#ifdef CONFIG_NO_IDLE_HZ - nohz_init(); -#endif - + /* The virtual timer must be ready on the first tick, therefore, + The vtime notifier callback must be registered first */ #ifdef CONFIG_VIRT_TIMER vtime_init(); #endif + +#ifdef CONFIG_NO_IDLE_HZ + nohz_init(); +#endif } /* Index: linux-2.6/arch/s390/kernel/vtime.c === --- linux-2.6.orig/arch/s390/kernel/vtime.c +++ linux-2.6/arch/s390/kernel/vtime.c @@ -3,8 +3,9 @@ *Virtual cpu timer based timer functions. * * S390 version - *Copyright (C) 2004 IBM Deutschland Entwicklung GmbH, IBM Corporation + *Copyright 2004,2007 IBM Corp. *Author(s): Jan Glauber <[EMAIL PROTECTED]> + * Christian Borntraeger <[EMAIL PROTECTED]> */ #include @@ -26,6 +27,44 @@ static ext_int_info_t ext_int_info_timer; static DEFINE_PER_CPU(struct vtimer_queue, virt_cpu_timer); +static DEFINE_PER_CPU(struct vtimer_list, cpu_clock_timer); + +/* + * read the remaining time of a virtual timer running on the current cpu + */ +static unsigned long long read_cpu_timer(struct vtimer_list *timer) +{ + struct vtimer_queue *vt_list; + unsigned long flags; + __u64 done; + + local_irq_save(flags); + + BUG_ON(timer->cpu != smp_processor_id()); + + vt_list = &per_cpu(virt_cpu_timer, timer->cpu); + asm volatile ("STPT %0" : "=m" (done)); + + done = timer->expires - vt_list->offset - (vt_list->to_expire - done); + local_irq_restore(flags); + return done; +} + +/* + * Cpu clock, returns cpu time in nanosec units. + * Must be called with preemption disabled. + */ +unsigned long long s390_cpu_clock(void) +{ + __u64 remaining = read_cpu_timer(&__get_cpu_var(cpu_clock_timer)); + return ((VTIMER_MAX_SLICE - remaining) >> 9) * 125; +} + +/* expire after 142 years ... */ +static void cpu_clock_timer_callback(unsigned long data) +{ + BUG(); +} #ifdef CONFIG_VIRT_CPU_ACCOUNTING /* @@ -148,7 +187,7 @@ static void start_cpu_timer(void) vt_list = &__get_cpu_var(virt_cpu_timer); /* CPU timer interrupt is pending, don't reprogramm it */ - if (vt_list->idle & 1LL<<63) + if ((s64) vt_list->idle < 0) return; if (!list_empty(&vt_list->
[RFC PATCH 2/2] base scheduler_tick() on virtual time
Make the scheduler_tick() dependent on the s390 cpu timer so it gets only called after a virtual cpu has completed a tick. Together with the virtual sched_clock() this should make the scheduler decisions for s390 based on virtual time. Signed-off-by: Jan Glauber <[EMAIL PROTECTED]> Signed-off-by: Martin Schwidefsky <[EMAIL PROTECTED]> --- arch/s390/kernel/time.c|4 arch/s390/kernel/vtime.c |7 ++- include/asm-s390/lowcore.h |8 ++-- include/asm-s390/timer.h |4 4 files changed, 16 insertions(+), 7 deletions(-) Index: linux-2.6/include/asm-s390/timer.h === --- linux-2.6.orig/include/asm-s390/timer.h +++ linux-2.6/include/asm-s390/timer.h @@ -14,6 +14,10 @@ #include +/* change this if you have some constant time drift */ +#define USECS_PER_JIFFY ((unsigned long) 100/HZ) +#define CLK_TICKS_PER_JIFFY ((unsigned long) USECS_PER_JIFFY << 12) + #define VTIMER_MAX_SLICE (0x7000LL) struct vtimer_list { Index: linux-2.6/include/asm-s390/lowcore.h === --- linux-2.6.orig/include/asm-s390/lowcore.h +++ linux-2.6/include/asm-s390/lowcore.h @@ -83,6 +83,7 @@ #define __LC_JIFFY_TIMER 0xC80 #define __LC_CURRENT 0xC90 #define __LC_INT_CLOCK 0xC98 +#define __LC_VIRT_TICK 0xCA0 #else /* __s390x__ */ #define __LC_IRB 0x210 #define __LC_SYNC_ENTER_TIMER 0x250 @@ -106,6 +107,7 @@ #define __LC_JIFFY_TIMER 0xDC0 #define __LC_CURRENT 0xDD8 #define __LC_INT_CLOCK 0xDE8 +#define __LC_VIRT_TICK 0xDF0 #endif /* __s390x__ */ @@ -282,7 +284,8 @@ struct _lowcore __u32current_task; /* 0xc90 */ __u32softirq_pending; /* 0xc94 */ __u64int_clock;/* 0xc98 */ -__u8 pad11[0xe00-0xca0]; /* 0xca0 */ + __u64virt_tick;/* 0xca0 */ + __u8 pad11[0xe00-0xca8]; /* 0xca8 */ /* 0xe00 is used as indicator for dump tools */ /* whether the kernel died with panic() or not */ @@ -374,7 +377,8 @@ struct _lowcore __u64current_task; /* 0xdd8 */ __u64softirq_pending; /* 0xde0 */ __u64int_clock;/* 0xde8 */ -__u8 pad12[0xe00-0xdf0]; /* 0xdf0 */ + __u64virt_tick;/* 0xdf0 */ + __u8 pad12[0xe00-0xdf8]; /* 0xdf8 */ /* 0xe00 is used as indicator for dump tools */ /* whether the kernel died with panic() or not */ Index: linux-2.6/arch/s390/kernel/time.c === --- linux-2.6.orig/arch/s390/kernel/time.c +++ linux-2.6/arch/s390/kernel/time.c @@ -40,10 +40,6 @@ #include #include -/* change this if you have some constant time drift */ -#define USECS_PER_JIFFY ((unsigned long) 100/HZ) -#define CLK_TICKS_PER_JIFFY ((unsigned long) USECS_PER_JIFFY << 12) - /* The value of the TOD clock for 1.1.1970. */ #define TOD_UNIX_EPOCH 0x7d91048bca00ULL Index: linux-2.6/arch/s390/kernel/vtime.c === --- linux-2.6.orig/arch/s390/kernel/vtime.c +++ linux-2.6/arch/s390/kernel/vtime.c @@ -85,6 +85,7 @@ void account_tick_vtime(struct task_stru "=m" (S390_lowcore.last_update_clock) ); S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer; S390_lowcore.steal_clock += S390_lowcore.last_update_clock - clock; + S390_lowcore.virt_tick += timer - S390_lowcore.last_update_timer; cputime = S390_lowcore.user_timer >> 12; rcu_user_flag = cputime != 0; @@ -107,7 +108,11 @@ void account_tick_vtime(struct task_stru run_local_timers(); if (rcu_pending(smp_processor_id())) rcu_check_callbacks(smp_processor_id(), rcu_user_flag); - scheduler_tick(); + + while (S390_lowcore.virt_tick >= CLK_TICKS_PER_JIFFY) { + S390_lowcore.virt_tick -= CLK_TICKS_PER_JIFFY; + scheduler_tick(); + } run_posix_cpu_timers(tsk); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Module init call vs symbols exporting race?
Hi Rusty, I've seen a symbol-resolving race on s390. The qeth module uses symbols from qdio and although the loading order seems correct and the qdio symbols should be available the following error appears: qdio: loading QDIO base support version 2 qeth: Unknown symbol qdio_synchronize qeth: Unknown symbol do_QDIO qeth: Unknown symbol qdio_initialize qeth: Unknown symbol qdio_cleanup qeth: Unknown symbol qdio_activate qeth: Unknown symbol qdio_synchronize qeth: Unknown symbol do_QDIO qeth: Unknown symbol qdio_initialize qeth: Unknown symbol qdio_cleanup qeth: Unknown symbol qdio_activate qeth: loading qeth S/390 OSA-Express driver qeth: Device 0.0.f5f0/0.0.f5f1/0.0.f5f2 is a OSD Express card (level: 087a) with link type OSD_1000 (portname: OSAPORT) qeth: Hardware IP fragmentation not supported on eth0 qeth: VLAN enabled qeth: Multicast enabled qeth: IPV6 enabled qeth: Broadcast enabled qeth: Using SW checksumming on eth0. qeth: Outbound TSO enabled After that both drivers work fine but I'm curious why this happens. Cheers, Jan On Tue, 2007-11-06 at 23:41 +1100, Rusty Russell wrote: > On Tuesday 06 November 2007 20:20:58 Pavel Emelyanov wrote: > > Hi. > > > > I looked at the sys_init_module() and found that the ->init callback > > for the module is called without the module_mutex held and *after* > > the module's symbols are exported. Doesn't this create the race when > > loading two modules in parallel? Like this. > > Hi Pavel, > > In a word, no. See "strong_try_module_get()". > > Cheers, > Rusty. > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Module init call vs symbols exporting race?
On Thu, 2007-11-08 at 13:10 +1100, Rusty Russell wrote: > On Wednesday 07 November 2007 21:01:30 Jan Glauber wrote: > > Hi Rusty, > > > > I've seen a symbol-resolving race on s390. The qeth module uses symbols > > from qdio and although the loading order seems correct and the qdio > > symbols should be available the following error appears: > > > > qdio: loading QDIO base support version 2 > > qeth: Unknown symbol qdio_synchronize > > Looks like qdio does something which triggers qeth to load, but of course > qdio > isn't finished initializing yet so its symbols aren't available. > > It's not obvious what's triggering the load, but you could probably find it > by > using printk's through qdio.c's init_QDIO(). Digging through the module loader I found what triggers the error... CPU0 (sys_init_module for qdio) CPU1 (sys_init_module for qeth) mutex_lock() -> load_module() mod->state = COMING mutex_unlock() mutex_lock() init()...takes some timeload_module() -> resolve_symbols() -> use_module() -> stong_try_module_get() bails out because state == COMING -> simplify_symbols() complains with the warning mutex_lock() mod->state = LIVE mutex_unlock() So is it correct that sys_init_module() is called for qeth even if qdio is not yet in MODULE_STATE_LIVE? -jang - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Pseudo-random number generator
On Tue, 2006-12-05 at 14:07 +0100, Jan Glauber wrote: > Yes, if an attacker knows the initial clock value a brute-force attack > would be feasible to predict the output. But I don't know if the > hardware completely relies on the clock values or if there is any > internal state which is not visible by an attacker. I will try to find > out more details... OK, after some research I got a new proposal. What the hardware does internally is just a Triple-DES operation with 3 different keys. That means with my old implementation the input for a TDES operation is the buffer full of timestamps. If an attacker can guess the first timestamp he knows the entire input, since the timestamp loop indeed produces timestamps with a constant offset. Therefore I'd like to replace the timestamp loop with only a single timestamp, which has the following advantages: - it is as safe as the loop (since the additional timestamps contain very low entropy) - its faster, since we don't have to overwrite the entire buffer - the one timestamp still guarantees unique, non-repeating output values Even if an attacker knows the input buffer for the next operation (known-plaintext attack) it turns out that currently this attack is not feasible against TDES. Therefore, as long as the key is not compromised there is no way to guess the next output from the generator. Jan -- arch/s390/crypto/Makefile |1 arch/s390/crypto/crypt_s390.h |3 arch/s390/crypto/crypt_s390_query.c |2 arch/s390/crypto/prng.c | 209 arch/s390/defconfig |1 drivers/s390/Kconfig|7 + 6 files changed, 222 insertions(+), 1 deletion(-) diff -urNp linux-2.6.orig/arch/s390/crypto/Makefile linux-2.6/arch/s390/crypto/Makefile --- linux-2.6.orig/arch/s390/crypto/Makefile2006-11-17 10:26:35.0 +0100 +++ linux-2.6/arch/s390/crypto/Makefile 2007-01-16 13:03:12.0 +0100 @@ -6,5 +6,6 @@ obj-$(CONFIG_CRYPTO_SHA1_S390) += sha1_s obj-$(CONFIG_CRYPTO_SHA256_S390) += sha256_s390.o obj-$(CONFIG_CRYPTO_DES_S390) += des_s390.o des_check_key.o obj-$(CONFIG_CRYPTO_AES_S390) += aes_s390.o +obj-$(CONFIG_S390_PRNG) += prng.o obj-$(CONFIG_CRYPTO_TEST) += crypt_s390_query.o diff -urNp linux-2.6.orig/arch/s390/crypto/crypt_s390.h linux-2.6/arch/s390/crypto/crypt_s390.h --- linux-2.6.orig/arch/s390/crypto/crypt_s390.h2006-11-17 10:26:35.0 +0100 +++ linux-2.6/arch/s390/crypto/crypt_s390.h 2007-01-16 13:03:12.0 +0100 @@ -68,6 +68,7 @@ enum crypt_s390_kmc_func { KMC_AES_192_DECRYPT = CRYPT_S390_KMC | 0x13 | 0x80, KMC_AES_256_ENCRYPT = CRYPT_S390_KMC | 0x14, KMC_AES_256_DECRYPT = CRYPT_S390_KMC | 0x14 | 0x80, + KMC_PRNG = CRYPT_S390_KMC | 0x43, }; /* function codes for KIMD (COMPUTE INTERMEDIATE MESSAGE DIGEST) @@ -147,7 +148,7 @@ crypt_s390_km(long func, void* param, u8 * @param src: address of source memory area * @param src_len: length of src operand in bytes * @returns < zero for failure, 0 for the query func, number of processed bytes - * for encryption/decryption funcs + * for encryption/decryption funcs */ static inline int crypt_s390_kmc(long func, void* param, u8* dest, const u8* src, long src_len) diff -urNp linux-2.6.orig/arch/s390/crypto/crypt_s390_query.c linux-2.6/arch/s390/crypto/crypt_s390_query.c --- linux-2.6.orig/arch/s390/crypto/crypt_s390_query.c 2006-11-17 10:26:35.0 +0100 +++ linux-2.6/arch/s390/crypto/crypt_s390_query.c 2007-01-16 13:03:12.0 +0100 @@ -54,6 +54,8 @@ static void query_available_functions(vo crypt_s390_func_available(KMC_AES_192_ENCRYPT)); printk(KERN_INFO "KMC_AES_256: %d\n", crypt_s390_func_available(KMC_AES_256_ENCRYPT)); + printk(KERN_INFO "KMC_PRNG: %d\n", + crypt_s390_func_available(KMC_PRNG)); /* query available KIMD functions */ printk(KERN_INFO "KIMD_QUERY: %d\n", diff -urNp linux-2.6.orig/arch/s390/crypto/prng.c linux-2.6/arch/s390/crypto/prng.c --- linux-2.6.orig/arch/s390/crypto/prng.c 1970-01-01 01:00:00.0 +0100 +++ linux-2.6/arch/s390/crypto/prng.c 2007-01-16 13:04:45.0 +0100 @@ -0,0 +1,209 @@ +/* + * Copyright 2006 IBM Corporation + * Author(s): Jan Glauber <[EMAIL PROTECTED]> + * Driver for the s390 pseudo random number generator + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "crypt_s390.h" + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Jan Glauber <[EMAIL PROTECTED]>"); +MODULE_DESCRIPTION("s390 PRNG interface"); + +static int prng_chunk_size = 256; +module_param(prng_chunk_size, int, 0); +MODULE_PARM_DESC(prng_chunk_size, "PRNG read chunk size in bytes"); + +static int prng_entropy_l
Re: [patch] i386/x86_64: smp_call_function locking inconsistency
On Fri, 2007-02-09 at 09:42 +0100, Heiko Carstens wrote: > I just want to avoid that s390 has different semantics for > smp_call_functiom*() than any other architecture. But then again it > will probably not hurt since we allow more. > Another thing that comes into my mind is smp_call_function together > with cpu hotplug. Who is responsible that preemption and with that > cpu hotplug is disabled? > Is it the caller or smp_call_function itself? I think the caller must disable preemption since smp_call_function() means "do something on all but the current cpu". If the preempt_disable would happen only in smp_call_function() it could already be running on a different cpu, which is not what the caller wants. If preemption must be disabled before smp_call_function() we should have the same semantics for all smp_call_function_* variants. Jan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Pseudo-random number generator
On Mon, 2006-12-04 at 11:15 -0500, [EMAIL PROTECTED] wrote: > On Fri, 01 Dec 2006 14:19:15 +0100, Jan Glauber said: > > New s390 machines have hardware support for the generation of pseudo-random > > numbers. This patch implements a simple char driver that exports this > > numbers > > to user-space. Other possible implementations would have been: > > > + for (i = 0; i < 16; i++) { > > + entropy[0] = get_clock(); > > + entropy[1] = get_clock(); > > + entropy[2] = get_clock(); > > + entropy[3] = get_clock(); > > By the time this loop completes, we'll have done 64 get_clock() - and if an > attacker has a good estimate of what the system clock has in it, they'll be > able to guess all 64 values, since each pass through the loop will have fairly > predictable timing. So as a result, the pseudo-random stream will be a *lot* > less random than one would hope for... I completely agree. Filling the input buffer with timestamps looks quite uncomfortable but was exactly what the hardware specification suggested. At least for the initialisation of the PRNG I preferred get_random_bytes() over get_clock to get a good initial seed. But get_random_bytes cannot be used during normal operation since the PRNG read should not block. > > + /* > > +* It shouldn't weaken the quality of the random numbers > > +* passing the full 16 bytes from STCKE to the generator. > > +*/ > > As long as you realize that probably 12 or 13 or even more of those 16 bytes > are likely predictable (depending exactly how fast the hardware clock ticks), > and as a result the output stream will also be predictable. Yes, if an attacker knows the initial clock value a brute-force attack would be feasible to predict the output. But I don't know if the hardware completely relies on the clock values or if there is any internal state which is not visible by an attacker. I will try to find out more details... Jan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Pseudo-random number generator
On Thu, 2006-12-07 at 16:06 +0100, Arnd Bergmann wrote: > On Friday 01 December 2006 14:19, Jan Glauber wrote: > > I've chosen the char driver since it allows the user to decide which > > pseudo-random > > numbers he wants to use. That means there is a new interface for the s390 > > PRNG, called /dev/prandom. > > > > I would like to know if there are any objections, especially with the > > chosen device > > name. > > This may be a stupid question, but what is it _good_ for? My understanding is > that the crypt_s390_kmc() opcodes work in user mode as well as kernel mode, so > you should not need a character device at all, but maybe just a small tool > that spits prandom data to stdout. Hm, why is /dev/urandom implemented in the kernel? It could be done completely in user-space (like libica already does) but I think having a device node where you can read from is the simplest implementation. Also, if we can solve the security flaw we could use it as replacement for /dev/urandom. Jan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Pseudo-random number generator
On Thu, 2006-12-07 at 19:43 +0100, Arnd Bergmann wrote: > On Thursday 07 December 2006 16:19, Jan Glauber wrote: > > Hm, why is /dev/urandom implemented in the kernel? > > > > It could be done completely in user-space (like libica already does) > > but I think having a device node where you can read from is the simplest > > implementation. Also, if we can solve the security flaw we could use it > > as replacement for /dev/urandom. > > urandom is more useful, because can't be implemented in user space at > all. /dev/urandom will use the real randomness from the kernel as a seed > without depleting the entropy pool. How does your /dev/prandom device > compare to /dev/urandom performance-wise? If it can be made to use > the same input data and it turns out to be significantly faster, I can > see some use for it. The performance of the PRNG without constantly adding entropy is up tp factor 40 faster than /dev/urandom ;- , depending on the block size of the read. With the current patch it performs not so well because of the STCKE loop before every KMC. I think about removing them and changing the periodically seed to use get_random_bytes instead. Jan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH] Pseudo-random number generator
New s390 machines have hardware support for the generation of pseudo-random numbers. This patch implements a simple char driver that exports this numbers to user-space. Other possible implementations would have been: * using the new hwrandom number generator API PRO: reuse of an existing interface CON: this API is for "truly" random data, not for pseudo-random numbers * merging the s390 PRNG with the random pool implementation PRO: no new interface, random numbers can be read through /dev/urandom CON: complex implementation, could only use traditional /dev/urandom algorithm or hardware-accelerated implementation I've chosen the char driver since it allows the user to decide which pseudo-random numbers he wants to use. That means there is a new interface for the s390 PRNG, called /dev/prandom. I would like to know if there are any objections, especially with the chosen device name. Jan --- arch/s390/crypto/Makefile |1 arch/s390/crypto/crypt_s390.h |3 arch/s390/crypto/crypt_s390_query.c |2 arch/s390/crypto/prng.c | 206 arch/s390/defconfig |1 drivers/s390/Kconfig|7 + include/asm-s390/timex.h| 12 ++ 7 files changed, 231 insertions(+), 1 deletion(-) diff -urNp linux-2.5/arch/s390/crypto/Makefile linux-2.5_prng/arch/s390/crypto/Makefile --- linux-2.5/arch/s390/crypto/Makefile 2005-12-06 16:11:12.0 +0100 +++ linux-2.5_prng/arch/s390/crypto/Makefile2006-12-01 13:04:27.0 +0100 @@ -6,5 +6,6 @@ obj-$(CONFIG_CRYPTO_SHA1_S390) += sha1_s obj-$(CONFIG_CRYPTO_SHA256_S390) += sha256_s390.o obj-$(CONFIG_CRYPTO_DES_S390) += des_s390.o des_check_key.o obj-$(CONFIG_CRYPTO_AES_S390) += aes_s390.o +obj-$(CONFIG_S390_PRNG) += prng.o obj-$(CONFIG_CRYPTO_TEST) += crypt_s390_query.o diff -urNp linux-2.5/arch/s390/crypto/crypt_s390.h linux-2.5_prng/arch/s390/crypto/crypt_s390.h --- linux-2.5/arch/s390/crypto/crypt_s390.h 2006-12-01 12:54:23.0 +0100 +++ linux-2.5_prng/arch/s390/crypto/crypt_s390.h2006-12-01 13:05:26.0 +0100 @@ -68,6 +68,7 @@ enum crypt_s390_kmc_func { KMC_AES_192_DECRYPT = CRYPT_S390_KMC | 0x13 | 0x80, KMC_AES_256_ENCRYPT = CRYPT_S390_KMC | 0x14, KMC_AES_256_DECRYPT = CRYPT_S390_KMC | 0x14 | 0x80, + KMC_PRNG = CRYPT_S390_KMC | 0x43, }; /* function codes for KIMD (COMPUTE INTERMEDIATE MESSAGE DIGEST) @@ -147,7 +148,7 @@ crypt_s390_km(long func, void* param, u8 * @param src: address of source memory area * @param src_len: length of src operand in bytes * @returns < zero for failure, 0 for the query func, number of processed bytes - * for encryption/decryption funcs + * for encryption/decryption funcs */ static inline int crypt_s390_kmc(long func, void* param, u8* dest, const u8* src, long src_len) diff -urNp linux-2.5/arch/s390/crypto/crypt_s390_query.c linux-2.5_prng/arch/s390/crypto/crypt_s390_query.c --- linux-2.5/arch/s390/crypto/crypt_s390_query.c 2006-06-19 14:01:10.0 +0200 +++ linux-2.5_prng/arch/s390/crypto/crypt_s390_query.c 2006-12-01 13:05:46.0 +0100 @@ -54,6 +54,8 @@ static void query_available_functions(vo crypt_s390_func_available(KMC_AES_192_ENCRYPT)); printk(KERN_INFO "KMC_AES_256: %d\n", crypt_s390_func_available(KMC_AES_256_ENCRYPT)); + printk(KERN_INFO "KMC_PRNG: %d\n", + crypt_s390_func_available(KMC_PRNG)); /* query available KIMD functions */ printk(KERN_INFO "KIMD_QUERY: %d\n", diff -urNp linux-2.5/arch/s390/crypto/prng.c linux-2.5_prng/arch/s390/crypto/prng.c --- linux-2.5/arch/s390/crypto/prng.c 1970-01-01 01:00:00.0 +0100 +++ linux-2.5_prng/arch/s390/crypto/prng.c 2006-12-01 13:06:50.0 +0100 @@ -0,0 +1,206 @@ +/* + * Copyright 2006 IBM Corporation + * Author(s): Jan Glauber <[EMAIL PROTECTED]> + * Driver for the s390 pseudo random number generator + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "crypt_s390.h" + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Jan Glauber <[EMAIL PROTECTED]>"); +MODULE_DESCRIPTION("s390 PRNG interface"); + +static int prng_chunk_size = 32; +module_param(prng_chunk_size, int, 0); +MODULE_PARM_DESC(prng_chunk_size, "PRNG read chunk size in bytes"); + +static int prng_entropy_limit = 4096; +module_param(prng_entropy_limit, int, 0); +MODULE_PARM_DESC(prng_entropy_limit, "PRNG add entropy after that much bytes were produced"); + +/* + * Any one who considers arithmetical methods of producing random digits is, + * of course, in a state of sin. -- John von Neumann + */ + +struct s390_prng_data { + unsigned long count; /* how many bytes were produced */ + char *
Re: [RFC][PATCH] Pseudo-random number generator
On Fri, 2006-12-01 at 13:39 +, Alan wrote: > > * merging the s390 PRNG with the random pool implementation > > PRO: no new interface, random numbers can be read through /dev/urandom > > CON: complex implementation, could only use traditional /dev/urandom > > algorithm > > or hardware-accelerated implementation > > Also PRO: Can be verified by non-IBM people, high resistance if there is > a flaw (accidental or US government 8)) in the 390 hardware. No, the interesting stuff is done internally in the hardware, we cannot see it with both implementations. > > I've chosen the char driver since it allows the user to decide which > > pseudo-random > > numbers he wants to use. That means there is a new interface for the s390 > > PRNG, called /dev/prandom. > > And people can pipe this into the urandom layer if they wish. Yes, a user can just symlink urandom to prandom and will have a faster generator. > > +/* copied from libica, use a non-zero initial parameter block */ > > +unsigned char parm_block[32] = { > > +0x0F,0x2B,0x8E,0x63,0x8C,0x8E,0xD2,0x52,0x64,0xB7,0xA0,0x7B,0x75,0x28,0xB8,0xF4, > > +0x75,0x5F,0xD2,0xA6,0x8D,0x97,0x11,0xFF,0x49,0xD8,0x23,0xF3,0x7E,0x21,0xEC,0xA0, > > +}; > > + > > Global variables called "p" and "parm_block". Plus parm_block appears to > be const Argh. I'll make them static. > Also your register the misc device before allocating the buffer so it can > be opened in theory before the alloc is done and crash. Fixed. The misc_register is now done at the very end. thanks, Jan --- arch/s390/crypto/Makefile |1 arch/s390/crypto/crypt_s390.h |3 arch/s390/crypto/crypt_s390_query.c |2 arch/s390/crypto/prng.c | 203 arch/s390/defconfig |1 drivers/s390/Kconfig|7 + include/asm-s390/timex.h| 12 ++ 7 files changed, 228 insertions(+), 1 deletion(-) diff -urNp linux-2.5/arch/s390/crypto/Makefile linux-2.5_prng/arch/s390/crypto/Makefile --- linux-2.5/arch/s390/crypto/Makefile 2006-12-01 15:40:54.0 +0100 +++ linux-2.5_prng/arch/s390/crypto/Makefile2006-11-30 18:30:23.0 +0100 @@ -6,5 +6,6 @@ obj-$(CONFIG_CRYPTO_SHA1_S390) += sha1_s obj-$(CONFIG_CRYPTO_SHA256_S390) += sha256_s390.o obj-$(CONFIG_CRYPTO_DES_S390) += des_s390.o des_check_key.o obj-$(CONFIG_CRYPTO_AES_S390) += aes_s390.o +obj-$(CONFIG_S390_PRNG) += prng.o obj-$(CONFIG_CRYPTO_TEST) += crypt_s390_query.o diff -urNp linux-2.5/arch/s390/crypto/crypt_s390.h linux-2.5_prng/arch/s390/crypto/crypt_s390.h --- linux-2.5/arch/s390/crypto/crypt_s390.h 2006-12-01 15:40:54.0 +0100 +++ linux-2.5_prng/arch/s390/crypto/crypt_s390.h2006-12-01 15:38:00.0 +0100 @@ -68,6 +68,7 @@ enum crypt_s390_kmc_func { KMC_AES_192_DECRYPT = CRYPT_S390_KMC | 0x13 | 0x80, KMC_AES_256_ENCRYPT = CRYPT_S390_KMC | 0x14, KMC_AES_256_DECRYPT = CRYPT_S390_KMC | 0x14 | 0x80, + KMC_PRNG = CRYPT_S390_KMC | 0x43, }; /* function codes for KIMD (COMPUTE INTERMEDIATE MESSAGE DIGEST) @@ -147,7 +148,7 @@ crypt_s390_km(long func, void* param, u8 * @param src: address of source memory area * @param src_len: length of src operand in bytes * @returns < zero for failure, 0 for the query func, number of processed bytes - * for encryption/decryption funcs + * for encryption/decryption funcs */ static inline int crypt_s390_kmc(long func, void* param, u8* dest, const u8* src, long src_len) diff -urNp linux-2.5/arch/s390/crypto/crypt_s390_query.c linux-2.5_prng/arch/s390/crypto/crypt_s390_query.c --- linux-2.5/arch/s390/crypto/crypt_s390_query.c 2006-12-01 15:40:54.0 +0100 +++ linux-2.5_prng/arch/s390/crypto/crypt_s390_query.c 2006-12-01 15:37:27.0 +0100 @@ -54,6 +54,8 @@ static void query_available_functions(vo crypt_s390_func_available(KMC_AES_192_ENCRYPT)); printk(KERN_INFO "KMC_AES_256: %d\n", crypt_s390_func_available(KMC_AES_256_ENCRYPT)); + printk(KERN_INFO "KMC_PRNG: %d\n", + crypt_s390_func_available(KMC_PRNG)); /* query available KIMD functions */ printk(KERN_INFO "KIMD_QUERY: %d\n", diff -urNp linux-2.5/arch/s390/crypto/prng.c linux-2.5_prng/arch/s390/crypto/prng.c --- linux-2.5/arch/s390/crypto/prng.c 1970-01-01 01:00:00.0 +0100 +++ linux-2.5_prng/arch/s390/crypto/prng.c 2006-12-01 16:14:22.0 +0100 @@ -0,0 +1,203 @@ +/* + * Copyright 2006 IBM Corporation + * Author(s): Jan Glauber <[EMAIL PROTECTED]> + * Driver for the s390 pseudo random number generator + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "crypt_s390.h&q
[PATCH] virtual sched_clock() for s390
This patch introduces a cpu time clock for s390 (only ticking if the virtual cpu is running) and bases the s390 implementation of sched_clock() on it. The times lice length on a virtual cpu can be anything between the calculated time slice and zero. In reality this doesn't seem to be problem, since the scheduler is fair enough to not let a single process starve but the current implementation can lead to inefficient short time slices. By providing a 'virtual' sched_clock() we guarantee that a process can get its time slice regardless of scheduling decisions from the hypervisor. Patch applies to 2.6.22 git and works fine with CFS. Jan -- arch/s390/kernel/time.c | 18 -- arch/s390/kernel/vtime.c | 45 + include/asm-s390/timer.h |2 ++ 3 files changed, 59 insertions(+), 6 deletions(-) --- ./include/asm-s390/timer.h.cpu_clock2007-07-18 13:43:53.0 +0200 +++ ./include/asm-s390/timer.h 2007-07-18 20:41:13.0 +0200 @@ -48,6 +48,8 @@ extern void init_cpu_vtimer(void); extern void vtime_init(void); +extern unsigned long long cpu_clock(void); + #endif /* __KERNEL__ */ #endif /* _ASM_S390_TIMER_H */ --- ./arch/s390/kernel/time.c.cpu_clock 2007-07-18 13:43:35.0 +0200 +++ ./arch/s390/kernel/time.c 2007-07-18 21:01:07.0 +0200 @@ -62,21 +62,27 @@ static u64 xtime_cc; /* - * Scheduler clock - returns current time in nanosec units. + * Monotonic_clock - returns # of nanoseconds passed since time_init() */ -unsigned long long sched_clock(void) +unsigned long long monotonic_clock(void) { return ((get_clock() - jiffies_timer_cc) * 125) >> 9; } +EXPORT_SYMBOL(monotonic_clock); /* - * Monotonic_clock - returns # of nanoseconds passed since time_init() + * Scheduler clock - returns current time in nanosec units. + * Now based on virtual cpu time to only account time the guest + * was actually running. */ -unsigned long long monotonic_clock(void) +unsigned long long sched_clock(void) { - return sched_clock(); +#ifdef CONFIG_VIRT_TIMER + return cpu_clock(); +#else + return monotonic_clock(); +#endif } -EXPORT_SYMBOL(monotonic_clock); void tod_to_timeval(__u64 todval, struct timespec *xtime) { --- ./arch/s390/kernel/vtime.c.cpu_clock2007-07-18 13:43:44.0 +0200 +++ ./arch/s390/kernel/vtime.c 2007-07-18 20:52:14.0 +0200 @@ -26,6 +26,44 @@ static ext_int_info_t ext_int_info_timer; static DEFINE_PER_CPU(struct vtimer_queue, virt_cpu_timer); +static DEFINE_PER_CPU(struct vtimer_list, cpu_clock_timer); + +/* + * read the remaining time of a virtual timer running on the current cpu + */ +static unsigned long long read_cpu_timer(struct vtimer_list *timer) +{ + struct vtimer_queue *vt_list; + unsigned long flags; + __u64 done; + + local_irq_save(flags); + local_irq_disable(); + + BUG_ON(timer->cpu != smp_processor_id()); + + vt_list = &per_cpu(virt_cpu_timer, timer->cpu); + asm volatile ("STPT %0" : "=m" (done)); + + done = vt_list->to_expire + vt_list->offset - done; + local_irq_restore(flags); + return done; +} + +/* + * Cpu clock, returns cpu time in nanosec units. + * Must be called with preemption disabled. + */ +unsigned long long cpu_clock(void) +{ + return ((read_cpu_timer(&__get_cpu_var(cpu_clock_timer)) * 125) >> 9); +} + +/* expire after 142 years ... */ +static void cpu_clock_timer_callback(unsigned long data) +{ + BUG(); +} #ifdef CONFIG_VIRT_CPU_ACCOUNTING /* @@ -522,6 +560,7 @@ void init_cpu_vtimer(void) { struct vtimer_queue *vt_list; + struct vtimer_list *timer; /* kick the virtual timer */ S390_lowcore.exit_timer = VTIMER_MAX_SLICE; @@ -539,6 +578,12 @@ vt_list->offset = 0; vt_list->idle = 0; + /* add dummy timers needed for cpu_clock */ + timer = &__get_cpu_var(cpu_clock_timer); + init_virt_timer(timer); + timer->expires = VTIMER_MAX_SLICE; + timer->function = cpu_clock_timer_callback; + add_virt_timer(timer); } static int vtimer_idle_notify(struct notifier_block *self, - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] virtual sched_clock() for s390
On Thu, 2007-07-19 at 18:00 +0200, Ingo Molnar wrote: > * Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote: > > > > /* > > > - * Monotonic_clock - returns # of nanoseconds passed since time_init() > > > + * Scheduler clock - returns current time in nanosec units. > > > + * Now based on virtual cpu time to only account time the guest > > > + * was actually running. > > > > Runn*ing*? Does it include time the VCPU spends idle/blocked? If > > not, then the scheduler won't be able to tell how long a process has > > been asleep. Maybe this doesn't matter (I had this problem in a > > version of Xen's sched_clock, and I can't say I saw an ill effects > > from it). No, it does not include idle time, if we're going idle the cpu timer gets stopped. > CFS does measure time elapsed across task-sleep periods (and does > something similar to what the old scheduler's 'sleep average' > interactivity mechanism did), but that mechanism measures "time spent > running during sleep", not "time spent idling". > > still, CFS needs time measurement across idle periods as well, for > another purpose: to be able to do precise task statistics for /proc. > (for top, ps, etc.) So it's still true that sched_clock() should include > idle periods too. I'm not sure, s390 already has an implemetation for precise accounting in the architecture code, does CFS also improve accounting data? Jan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Why is arch/s390/crypto/Kconfig sourced when building for another arch ?
Hi Thomas, > with 2.6.20.4 it works great, but when switching to 2.6.21-rcX it breaks > with this: > > drivers/crypto/Kconfig:55: can't open file "arch/s390/crypto/Kconfig" arch/s390/crypto/Kconfig is included there since that is the right place for the config options to show up. > I tried to fix drivers/crypto/Kconfig by changing the code to: > > if S390 > source "arch/s390/crypto/Kconfig" > endif > > but it still gets sourced... You would need something like: source "arch/s390/crypto/Kconfig" depends on S390 But that is not implemented and I doubt it will since deleting parts of the kernel tree is not something that is required to work. Jan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Why is arch/s390/crypto/Kconfig sourced when building for another arch ?
On Fri, 2007-03-30 at 05:55 -0400, Robert P. J. Day wrote: > i'm betting the S390 folks would *really* hate that idea but, if you > look closely, the generic Kconfig file *already* has some > arch-dependent content: > > ... > config CRYPTO_DEV_PADLOCK > tristate "Support for VIA PadLock ACE" > depends on X86_32 <- > ... Yes, but the padlock driver is located under drivers/crypto. The s390 crypto stuff is not. It is under arch/s390/crypto, thats why the Kconfig file is there... Both solutions (the current and your proposed) are somehow ugly. I don't care too much, where the Kconfig entries are, as long as it works. So if you're interested in changing it go forward and post a patch... Jan > i think it's a matter of deciding how to be consistent. either you > allow individual architectures to define their own additional Kconfig > files or you don't. mixing the two approaches is a recipe for > confusion. > > rday > > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/6] spi: octeon: Move include file from arch/mips to drivers/spi
Move the register definitions to the drivers directory because they are only used there. Signed-off-by: Jan Glauber Tested-by: Steven J. Hill --- .../cvmx-mpi-defs.h => drivers/spi/spi-cavium.h| 32 +- drivers/spi/spi-octeon.c | 3 +- 2 files changed, 3 insertions(+), 32 deletions(-) rename arch/mips/include/asm/octeon/cvmx-mpi-defs.h => drivers/spi/spi-cavium.h (84%) diff --git a/arch/mips/include/asm/octeon/cvmx-mpi-defs.h b/drivers/spi/spi-cavium.h similarity index 84% rename from arch/mips/include/asm/octeon/cvmx-mpi-defs.h rename to drivers/spi/spi-cavium.h index 4615b10..d41dba5 100644 --- a/arch/mips/include/asm/octeon/cvmx-mpi-defs.h +++ b/drivers/spi/spi-cavium.h @@ -1,32 +1,4 @@ -/***license start*** - * Author: Cavium Networks - * - * Contact: supp...@caviumnetworks.com - * This file is part of the OCTEON SDK - * - * Copyright (c) 2003-2012 Cavium Networks - * - * This file is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License, Version 2, as - * published by the Free Software Foundation. - * - * This file is distributed in the hope that it will be useful, but - * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or - * NONINFRINGEMENT. See the GNU General Public License for more - * details. - * - * You should have received a copy of the GNU General Public License - * along with this file; if not, write to the Free Software - * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA - * or visit http://www.gnu.org/licenses/. - * - * This file may also be available under a different license from Cavium. - * Contact Cavium Networks for more information - ***license end**/ - -#ifndef __CVMX_MPI_DEFS_H__ -#define __CVMX_MPI_DEFS_H__ +/* MPI register descriptions */ #define CVMX_MPI_CFG (CVMX_ADD_IO_SEG(0x000107001000ull)) #define CVMX_MPI_DATX(offset) (CVMX_ADD_IO_SEG(0x000107001080ull) + ((offset) & 15) * 8) @@ -324,5 +296,3 @@ union cvmx_mpi_tx { struct cvmx_mpi_tx_s cn66xx; struct cvmx_mpi_tx_cn61xx cnf71xx; }; - -#endif diff --git a/drivers/spi/spi-octeon.c b/drivers/spi/spi-octeon.c index 209eddc..2180176 100644 --- a/drivers/spi/spi-octeon.c +++ b/drivers/spi/spi-octeon.c @@ -15,7 +15,8 @@ #include #include -#include + +#include "spi-cavium.h" #define OCTEON_SPI_MAX_BYTES 9 -- 2.9.0.rc0.21.g322
[PATCH 0/6] SPI ThunderX driver
Hi Mark, This series adds support for SPI on Cavium's ThunderX (arm64). The SPI hardware is the same as on MIPS Octeon, the only difference is that the device appears as a PCI device. To avoid copy and paste of the Octeon driver I've moved the common parts into a shared file. Patches #1-5 prepare the Octeon driver for re-use. Patch #6 adds the ThunderX driver. The series was tested on MIPS (Edge Router PRO and cn71xx) and ThunderX. Feedback welcome! thanks, Jan Jan Glauber (5): spi: octeon: Store system clock freqency in struct octeon_spi spi: octeon: Put register offsets into a struct spi: octeon: Move include file from arch/mips to drivers/spi spi: octeon: Split driver into Octeon specific and common parts spi: octeon: Add thunderx driver Steven J. Hill (1): spi: octeon: Convert driver to use readq()/writeq() functions drivers/spi/Kconfig| 7 + drivers/spi/Makefile | 3 + drivers/spi/spi-cavium-octeon.c| 104 + drivers/spi/spi-cavium-thunderx.c | 158 + drivers/spi/spi-cavium.c | 151 .../cvmx-mpi-defs.h => drivers/spi/spi-cavium.h| 62 ++--- drivers/spi/spi-octeon.c | 255 - 7 files changed, 456 insertions(+), 284 deletions(-) create mode 100644 drivers/spi/spi-cavium-octeon.c create mode 100644 drivers/spi/spi-cavium-thunderx.c create mode 100644 drivers/spi/spi-cavium.c rename arch/mips/include/asm/octeon/cvmx-mpi-defs.h => drivers/spi/spi-cavium.h (84%) delete mode 100644 drivers/spi/spi-octeon.c -- 2.9.0.rc0.21.g322
[PATCH 3/6] spi: octeon: Put register offsets into a struct
Instead of hard-coding the register offsets put them into a struct and set them in the probe function. Signed-off-by: Jan Glauber Tested-by: Steven J. Hill --- drivers/spi/spi-octeon.c | 41 +++-- 1 file changed, 27 insertions(+), 14 deletions(-) diff --git a/drivers/spi/spi-octeon.c b/drivers/spi/spi-octeon.c index e722040..209eddc 100644 --- a/drivers/spi/spi-octeon.c +++ b/drivers/spi/spi-octeon.c @@ -17,22 +17,30 @@ #include #include -#define OCTEON_SPI_CFG 0 -#define OCTEON_SPI_STS 0x08 -#define OCTEON_SPI_TX 0x10 -#define OCTEON_SPI_DAT0 0x80 - #define OCTEON_SPI_MAX_BYTES 9 #define OCTEON_SPI_MAX_CLOCK_HZ 1600 +struct octeon_spi_regs { + int config; + int status; + int tx; + int data; +}; + struct octeon_spi { void __iomem *register_base; u64 last_cfg; u64 cs_enax; int sys_freq; + struct octeon_spi_regs regs; }; +#define OCTEON_SPI_CFG(x) (x->regs.config) +#define OCTEON_SPI_STS(x) (x->regs.status) +#define OCTEON_SPI_TX(x) (x->regs.tx) +#define OCTEON_SPI_DAT0(x) (x->regs.data) + static void octeon_spi_wait_ready(struct octeon_spi *p) { union cvmx_mpi_sts mpi_sts; @@ -41,7 +49,7 @@ static void octeon_spi_wait_ready(struct octeon_spi *p) do { if (loops++) __delay(500); - mpi_sts.u64 = readq(p->register_base + OCTEON_SPI_STS); + mpi_sts.u64 = readq(p->register_base + OCTEON_SPI_STS(p)); } while (mpi_sts.s.busy); } @@ -83,7 +91,7 @@ static int octeon_spi_do_transfer(struct octeon_spi *p, if (mpi_cfg.u64 != p->last_cfg) { p->last_cfg = mpi_cfg.u64; - writeq(mpi_cfg.u64, p->register_base + OCTEON_SPI_CFG); + writeq(mpi_cfg.u64, p->register_base + OCTEON_SPI_CFG(p)); } tx_buf = xfer->tx_buf; rx_buf = xfer->rx_buf; @@ -95,19 +103,19 @@ static int octeon_spi_do_transfer(struct octeon_spi *p, d = *tx_buf++; else d = 0; - writeq(d, p->register_base + OCTEON_SPI_DAT0 + (8 * i)); + writeq(d, p->register_base + OCTEON_SPI_DAT0(p) + (8 * i)); } mpi_tx.u64 = 0; mpi_tx.s.csid = spi->chip_select; mpi_tx.s.leavecs = 1; mpi_tx.s.txnum = tx_buf ? OCTEON_SPI_MAX_BYTES : 0; mpi_tx.s.totnum = OCTEON_SPI_MAX_BYTES; - writeq(mpi_tx.u64, p->register_base + OCTEON_SPI_TX); + writeq(mpi_tx.u64, p->register_base + OCTEON_SPI_TX(p)); octeon_spi_wait_ready(p); if (rx_buf) for (i = 0; i < OCTEON_SPI_MAX_BYTES; i++) { - u64 v = readq(p->register_base + OCTEON_SPI_DAT0 + (8 * i)); + u64 v = readq(p->register_base + OCTEON_SPI_DAT0(p) + (8 * i)); *rx_buf++ = (u8)v; } len -= OCTEON_SPI_MAX_BYTES; @@ -119,7 +127,7 @@ static int octeon_spi_do_transfer(struct octeon_spi *p, d = *tx_buf++; else d = 0; - writeq(d, p->register_base + OCTEON_SPI_DAT0 + (8 * i)); + writeq(d, p->register_base + OCTEON_SPI_DAT0(p) + (8 * i)); } mpi_tx.u64 = 0; @@ -130,12 +138,12 @@ static int octeon_spi_do_transfer(struct octeon_spi *p, mpi_tx.s.leavecs = !xfer->cs_change; mpi_tx.s.txnum = tx_buf ? len : 0; mpi_tx.s.totnum = len; - writeq(mpi_tx.u64, p->register_base + OCTEON_SPI_TX); + writeq(mpi_tx.u64, p->register_base + OCTEON_SPI_TX(p)); octeon_spi_wait_ready(p); if (rx_buf) for (i = 0; i < len; i++) { - u64 v = readq(p->register_base + OCTEON_SPI_DAT0 + (8 * i)); + u64 v = readq(p->register_base + OCTEON_SPI_DAT0(p) + (8 * i)); *rx_buf++ = (u8)v; } @@ -194,6 +202,11 @@ static int octeon_spi_probe(struct platform_device *pdev) p->register_base = reg_base; p->sys_freq = octeon_get_io_clock_rate(); + p->regs.config = 0; + p->regs.status = 0x08; + p->regs.tx = 0x10; + p->regs.data = 0x80; + master->num_chipselect = 4; master->mode_bits = SPI_CPHA | SPI_CPOL | @@ -226,7 +239,7 @@ static int octeon_spi_remove(struct platform_device *pdev) struct octeon_spi *p = spi_master_get_devdata(master); /* Clear the CSENA* and put everything in a known state. */ - writeq(0, p->register_base + OCTEON_SPI_CFG); + writeq(0, p->register_base + OCTEON_SPI_CFG(p)); return 0; } -- 2.9.0.rc0.21.g322
[PATCH 6/6] spi: octeon: Add thunderx driver
Add ThunderX SPI driver using the shared part from the Octeon driver. The main difference of the ThunderX driver is that it is a PCI device so probing is different. The system clock settings can be specified in device tree. Signed-off-by: Jan Glauber --- drivers/spi/Kconfig | 7 ++ drivers/spi/Makefile | 2 + drivers/spi/spi-cavium-thunderx.c | 158 ++ drivers/spi/spi-cavium.h | 3 + 4 files changed, 170 insertions(+) create mode 100644 drivers/spi/spi-cavium-thunderx.c diff --git a/drivers/spi/Kconfig b/drivers/spi/Kconfig index 4b931ec..db02ba7 100644 --- a/drivers/spi/Kconfig +++ b/drivers/spi/Kconfig @@ -630,6 +630,13 @@ config SPI_TEGRA20_SLINK help SPI driver for Nvidia Tegra20/Tegra30 SLINK Controller interface. +config SPI_THUNDERX + tristate "Cavium ThunderX SPI controller" + depends on 64BIT && PCI && !CAVIUM_OCTEON_SOC + help + SPI host driver for the hardware found on Cavium ThunderX + SOCs. + config SPI_TOPCLIFF_PCH tristate "Intel EG20T PCH/LAPIS Semicon IOH(ML7213/ML7223/ML7831) SPI" depends on PCI && (X86_32 || MIPS || COMPILE_TEST) diff --git a/drivers/spi/Makefile b/drivers/spi/Makefile index 185367e..133364b 100644 --- a/drivers/spi/Makefile +++ b/drivers/spi/Makefile @@ -91,6 +91,8 @@ obj-$(CONFIG_SPI_TEGRA114)+= spi-tegra114.o obj-$(CONFIG_SPI_TEGRA20_SFLASH) += spi-tegra20-sflash.o obj-$(CONFIG_SPI_TEGRA20_SLINK)+= spi-tegra20-slink.o obj-$(CONFIG_SPI_TLE62X0) += spi-tle62x0.o +spi-thunderx-objs := spi-cavium.o spi-cavium-thunderx.o +obj-$(CONFIG_SPI_THUNDERX) += spi-thunderx.o obj-$(CONFIG_SPI_TOPCLIFF_PCH) += spi-topcliff-pch.o obj-$(CONFIG_SPI_TXX9) += spi-txx9.o obj-$(CONFIG_SPI_XCOMM)+= spi-xcomm.o diff --git a/drivers/spi/spi-cavium-thunderx.c b/drivers/spi/spi-cavium-thunderx.c new file mode 100644 index 000..7eb9141 --- /dev/null +++ b/drivers/spi/spi-cavium-thunderx.c @@ -0,0 +1,158 @@ +/* + * Cavium ThunderX SPI driver. + * + * Copyright (C) 2016 Cavium Inc. + * Authors: Jan Glauber + */ + +#include +#include +#include +#include +#include + +#include "spi-cavium.h" + +#define DRV_NAME "spi-thunderx" + +#define SYS_FREQ_DEFAULT 7 + +static void thunderx_spi_clock_enable(struct device *dev, struct octeon_spi *p) +{ + int ret; + + p->clk = devm_clk_get(dev, NULL); + if (IS_ERR(p->clk)) { + p->clk = NULL; + goto skip; + } + + ret = clk_prepare_enable(p->clk); + if (ret) + goto skip; + p->sys_freq = clk_get_rate(p->clk); + +skip: + if (!p->sys_freq) + p->sys_freq = SYS_FREQ_DEFAULT; + + dev_info(dev, "Set system clock to %u\n", p->sys_freq); +} + +static void thunderx_spi_clock_disable(struct device *dev, struct clk *clk) +{ + if (!clk) + return; + clk_disable_unprepare(clk); + devm_clk_put(dev, clk); +} + +static int thunderx_spi_probe(struct pci_dev *pdev, + const struct pci_device_id *ent) +{ + struct device *dev = &pdev->dev; + struct spi_master *master; + struct octeon_spi *p; + int ret = -ENOENT; + + master = spi_alloc_master(dev, sizeof(struct octeon_spi)); + if (!master) + return -ENOMEM; + p = spi_master_get_devdata(master); + + ret = pci_enable_device(pdev); + if (ret) { + dev_err(dev, "Failed to enable PCI device\n"); + goto out_free; + } + + ret = pci_request_regions(pdev, DRV_NAME); + if (ret) { + dev_err(dev, "PCI request regions failed 0x%x\n", ret); + goto out_disable; + } + + p->register_base = pci_ioremap_bar(pdev, 0); + if (!p->register_base) { + dev_err(dev, "Cannot map reg base\n"); + ret = -EINVAL; + goto out_region; + } + + p->regs.config = 0x1000; + p->regs.status = 0x1008; + p->regs.tx = 0x1010; + p->regs.data = 0x1080; + + thunderx_spi_clock_enable(dev, p); + + master->num_chipselect = 4; + master->mode_bits = SPI_CPHA | SPI_CPOL | SPI_CS_HIGH | + SPI_LSB_FIRST | SPI_3WIRE; + master->transfer_one_message = octeon_spi_transfer_one_message; + master->bits_per_word_mask = SPI_BPW_MASK(8); + master->max_speed_hz = OCTEON_SPI_MAX_CLOCK_HZ; + master->dev.of_node = pdev->dev.of_node; + + pci_set_drvdata(pdev, master); + ret = devm_spi_register_master(dev, master); + if (ret) { + dev_e
[PATCH 2/6] spi: octeon: Store system clock freqency in struct octeon_spi
Storing the system clock frequency in struct octeon_spi avoids calling the MIPS specific octeon_get_io_clock_rate() for every transfer. Signed-off-by: Jan Glauber Tested-by: Steven J. Hill --- drivers/spi/spi-octeon.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/spi/spi-octeon.c b/drivers/spi/spi-octeon.c index b53ba53..e722040 100644 --- a/drivers/spi/spi-octeon.c +++ b/drivers/spi/spi-octeon.c @@ -30,6 +30,7 @@ struct octeon_spi { void __iomem *register_base; u64 last_cfg; u64 cs_enax; + int sys_freq; }; static void octeon_spi_wait_ready(struct octeon_spi *p) @@ -53,7 +54,6 @@ static int octeon_spi_do_transfer(struct octeon_spi *p, union cvmx_mpi_cfg mpi_cfg; union cvmx_mpi_tx mpi_tx; unsigned int clkdiv; - unsigned int speed_hz; int mode; bool cpha, cpol; const u8 *tx_buf; @@ -65,9 +65,7 @@ static int octeon_spi_do_transfer(struct octeon_spi *p, cpha = mode & SPI_CPHA; cpol = mode & SPI_CPOL; - speed_hz = xfer->speed_hz; - - clkdiv = octeon_get_io_clock_rate() / (2 * speed_hz); + clkdiv = p->sys_freq / (2 * xfer->speed_hz); mpi_cfg.u64 = 0; @@ -194,6 +192,7 @@ static int octeon_spi_probe(struct platform_device *pdev) } p->register_base = reg_base; + p->sys_freq = octeon_get_io_clock_rate(); master->num_chipselect = 4; master->mode_bits = SPI_CPHA | -- 2.9.0.rc0.21.g322
[PATCH 1/6] spi: octeon: Convert driver to use readq()/writeq() functions
From: "Steven J. Hill" Remove all calls to cvmx_read_csr()/cvmx_write_csr() and use the portable readq()/writeq() functions. Signed-off-by: Steven J. Hill Signed-off-by: Jan Glauber --- drivers/spi/spi-octeon.c | 23 +++ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/drivers/spi/spi-octeon.c b/drivers/spi/spi-octeon.c index 3b17009..b53ba53 100644 --- a/drivers/spi/spi-octeon.c +++ b/drivers/spi/spi-octeon.c @@ -27,7 +27,7 @@ #define OCTEON_SPI_MAX_CLOCK_HZ 1600 struct octeon_spi { - u64 register_base; + void __iomem *register_base; u64 last_cfg; u64 cs_enax; }; @@ -40,7 +40,7 @@ static void octeon_spi_wait_ready(struct octeon_spi *p) do { if (loops++) __delay(500); - mpi_sts.u64 = cvmx_read_csr(p->register_base + OCTEON_SPI_STS); + mpi_sts.u64 = readq(p->register_base + OCTEON_SPI_STS); } while (mpi_sts.s.busy); } @@ -85,7 +85,7 @@ static int octeon_spi_do_transfer(struct octeon_spi *p, if (mpi_cfg.u64 != p->last_cfg) { p->last_cfg = mpi_cfg.u64; - cvmx_write_csr(p->register_base + OCTEON_SPI_CFG, mpi_cfg.u64); + writeq(mpi_cfg.u64, p->register_base + OCTEON_SPI_CFG); } tx_buf = xfer->tx_buf; rx_buf = xfer->rx_buf; @@ -97,19 +97,19 @@ static int octeon_spi_do_transfer(struct octeon_spi *p, d = *tx_buf++; else d = 0; - cvmx_write_csr(p->register_base + OCTEON_SPI_DAT0 + (8 * i), d); + writeq(d, p->register_base + OCTEON_SPI_DAT0 + (8 * i)); } mpi_tx.u64 = 0; mpi_tx.s.csid = spi->chip_select; mpi_tx.s.leavecs = 1; mpi_tx.s.txnum = tx_buf ? OCTEON_SPI_MAX_BYTES : 0; mpi_tx.s.totnum = OCTEON_SPI_MAX_BYTES; - cvmx_write_csr(p->register_base + OCTEON_SPI_TX, mpi_tx.u64); + writeq(mpi_tx.u64, p->register_base + OCTEON_SPI_TX); octeon_spi_wait_ready(p); if (rx_buf) for (i = 0; i < OCTEON_SPI_MAX_BYTES; i++) { - u64 v = cvmx_read_csr(p->register_base + OCTEON_SPI_DAT0 + (8 * i)); + u64 v = readq(p->register_base + OCTEON_SPI_DAT0 + (8 * i)); *rx_buf++ = (u8)v; } len -= OCTEON_SPI_MAX_BYTES; @@ -121,7 +121,7 @@ static int octeon_spi_do_transfer(struct octeon_spi *p, d = *tx_buf++; else d = 0; - cvmx_write_csr(p->register_base + OCTEON_SPI_DAT0 + (8 * i), d); + writeq(d, p->register_base + OCTEON_SPI_DAT0 + (8 * i)); } mpi_tx.u64 = 0; @@ -132,12 +132,12 @@ static int octeon_spi_do_transfer(struct octeon_spi *p, mpi_tx.s.leavecs = !xfer->cs_change; mpi_tx.s.txnum = tx_buf ? len : 0; mpi_tx.s.totnum = len; - cvmx_write_csr(p->register_base + OCTEON_SPI_TX, mpi_tx.u64); + writeq(mpi_tx.u64, p->register_base + OCTEON_SPI_TX); octeon_spi_wait_ready(p); if (rx_buf) for (i = 0; i < len; i++) { - u64 v = cvmx_read_csr(p->register_base + OCTEON_SPI_DAT0 + (8 * i)); + u64 v = readq(p->register_base + OCTEON_SPI_DAT0 + (8 * i)); *rx_buf++ = (u8)v; } @@ -193,7 +193,7 @@ static int octeon_spi_probe(struct platform_device *pdev) goto fail; } - p->register_base = (u64)reg_base; + p->register_base = reg_base; master->num_chipselect = 4; master->mode_bits = SPI_CPHA | @@ -225,10 +225,9 @@ static int octeon_spi_remove(struct platform_device *pdev) { struct spi_master *master = platform_get_drvdata(pdev); struct octeon_spi *p = spi_master_get_devdata(master); - u64 register_base = p->register_base; /* Clear the CSENA* and put everything in a known state. */ - cvmx_write_csr(register_base + OCTEON_SPI_CFG, 0); + writeq(0, p->register_base + OCTEON_SPI_CFG); return 0; } -- 2.9.0.rc0.21.g322
[PATCH 5/6] spi: octeon: Split driver into Octeon specific and common parts
Separate driver probing from SPI transfer functions. Signed-off-by: Jan Glauber Tested-by: Steven J. Hill --- drivers/spi/Makefile | 1 + drivers/spi/spi-cavium-octeon.c| 104 + drivers/spi/{spi-octeon.c => spi-cavium.c} | 120 + drivers/spi/spi-cavium.h | 31 4 files changed, 138 insertions(+), 118 deletions(-) create mode 100644 drivers/spi/spi-cavium-octeon.c rename drivers/spi/{spi-octeon.c => spi-cavium.c} (55%) diff --git a/drivers/spi/Makefile b/drivers/spi/Makefile index 3c74d00..185367e 100644 --- a/drivers/spi/Makefile +++ b/drivers/spi/Makefile @@ -56,6 +56,7 @@ obj-$(CONFIG_SPI_MT65XX)+= spi-mt65xx.o obj-$(CONFIG_SPI_MXS) += spi-mxs.o obj-$(CONFIG_SPI_NUC900) += spi-nuc900.o obj-$(CONFIG_SPI_OC_TINY) += spi-oc-tiny.o +spi-octeon-objs:= spi-cavium.o spi-cavium-octeon.o obj-$(CONFIG_SPI_OCTEON) += spi-octeon.o obj-$(CONFIG_SPI_OMAP_UWIRE) += spi-omap-uwire.o obj-$(CONFIG_SPI_OMAP_100K)+= spi-omap-100k.o diff --git a/drivers/spi/spi-cavium-octeon.c b/drivers/spi/spi-cavium-octeon.c new file mode 100644 index 000..ee4703e --- /dev/null +++ b/drivers/spi/spi-cavium-octeon.c @@ -0,0 +1,104 @@ +/* + * This file is subject to the terms and conditions of the GNU General Public + * License. See the file "COPYING" in the main directory of this archive + * for more details. + * + * Copyright (C) 2011, 2012 Cavium, Inc. + */ + +#include +#include +#include +#include +#include + +#include + +#include "spi-cavium.h" + +static int octeon_spi_probe(struct platform_device *pdev) +{ + struct resource *res_mem; + void __iomem *reg_base; + struct spi_master *master; + struct octeon_spi *p; + int err = -ENOENT; + + master = spi_alloc_master(&pdev->dev, sizeof(struct octeon_spi)); + if (!master) + return -ENOMEM; + p = spi_master_get_devdata(master); + platform_set_drvdata(pdev, master); + + res_mem = platform_get_resource(pdev, IORESOURCE_MEM, 0); + reg_base = devm_ioremap_resource(&pdev->dev, res_mem); + if (IS_ERR(reg_base)) { + err = PTR_ERR(reg_base); + goto fail; + } + + p->register_base = reg_base; + p->sys_freq = octeon_get_io_clock_rate(); + + p->regs.config = 0; + p->regs.status = 0x08; + p->regs.tx = 0x10; + p->regs.data = 0x80; + + master->num_chipselect = 4; + master->mode_bits = SPI_CPHA | + SPI_CPOL | + SPI_CS_HIGH | + SPI_LSB_FIRST | + SPI_3WIRE; + + master->transfer_one_message = octeon_spi_transfer_one_message; + master->bits_per_word_mask = SPI_BPW_MASK(8); + master->max_speed_hz = OCTEON_SPI_MAX_CLOCK_HZ; + + master->dev.of_node = pdev->dev.of_node; + err = devm_spi_register_master(&pdev->dev, master); + if (err) { + dev_err(&pdev->dev, "register master failed: %d\n", err); + goto fail; + } + + dev_info(&pdev->dev, "OCTEON SPI bus driver\n"); + + return 0; +fail: + spi_master_put(master); + return err; +} + +static int octeon_spi_remove(struct platform_device *pdev) +{ + struct spi_master *master = platform_get_drvdata(pdev); + struct octeon_spi *p = spi_master_get_devdata(master); + + /* Clear the CSENA* and put everything in a known state. */ + writeq(0, p->register_base + OCTEON_SPI_CFG(p)); + + return 0; +} + +static const struct of_device_id octeon_spi_match[] = { + { .compatible = "cavium,octeon-3010-spi", }, + {}, +}; +MODULE_DEVICE_TABLE(of, octeon_spi_match); + +static struct platform_driver octeon_spi_driver = { + .driver = { + .name = "spi-octeon", + .of_match_table = octeon_spi_match, + }, + .probe = octeon_spi_probe, + .remove = octeon_spi_remove, +}; + +module_platform_driver(octeon_spi_driver); + +MODULE_DESCRIPTION("Cavium, Inc. OCTEON SPI bus driver"); +MODULE_AUTHOR("David Daney"); +MODULE_LICENSE("GPL"); diff --git a/drivers/spi/spi-octeon.c b/drivers/spi/spi-cavium.c similarity index 55% rename from drivers/spi/spi-octeon.c rename to drivers/spi/spi-cavium.c index 2180176..5aaf215 100644 --- a/drivers/spi/spi-octeon.c +++ b/drivers/spi/spi-cavium.c @@ -6,42 +6,13 @@ * Copyright (C) 2011, 2012 Cavium, Inc. */ -#include -#include #include #include #include #include -#include - -#include #include "spi-cavium.h" -#define OCTEON_SPI
[PATCH 3/5] i2c: octeon,thunderx: Fix high-level controller status check
In case the high-level controller (HLC) is used the status code is reported at a different location. Check that location after HLC write operations if the ready bit is not set and return an appropriate error code instead of always returning -EAGAIN. Signed-off-by: Jan Glauber --- drivers/i2c/busses/i2c-octeon-core.c | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/drivers/i2c/busses/i2c-octeon-core.c b/drivers/i2c/busses/i2c-octeon-core.c index 7d4df83..5e63b17 100644 --- a/drivers/i2c/busses/i2c-octeon-core.c +++ b/drivers/i2c/busses/i2c-octeon-core.c @@ -215,7 +215,16 @@ static int octeon_i2c_hlc_wait(struct octeon_i2c *i2c) static int octeon_i2c_check_status(struct octeon_i2c *i2c, int final_read) { - u8 stat = octeon_i2c_stat_read(i2c); + u8 stat; + + /* +* This is ugly... in HLC mode the status is not in the status register +* but in the lower 8 bits of SW_TWSI. +*/ + if (i2c->hlc_enabled) + stat = __raw_readq(i2c->twsi_base + SW_TWSI(i2c)); + else + stat = octeon_i2c_stat_read(i2c); switch (stat) { /* Everything is fine */ @@ -453,7 +462,7 @@ static int octeon_i2c_hlc_read(struct octeon_i2c *i2c, struct i2c_msg *msgs) cmd = __raw_readq(i2c->twsi_base + SW_TWSI(i2c)); if ((cmd & SW_TWSI_R) == 0) - return -EAGAIN; + return octeon_i2c_check_status(i2c, false); for (i = 0, j = msgs[0].len - 1; i < msgs[0].len && i < 4; i++, j--) msgs[0].buf[j] = (cmd >> (8 * i)) & 0xff; @@ -506,9 +515,7 @@ static int octeon_i2c_hlc_write(struct octeon_i2c *i2c, struct i2c_msg *msgs) cmd = __raw_readq(i2c->twsi_base + SW_TWSI(i2c)); if ((cmd & SW_TWSI_R) == 0) - return -EAGAIN; - - ret = octeon_i2c_check_status(i2c, false); + return octeon_i2c_check_status(i2c, false); err: return ret; @@ -553,7 +560,7 @@ static int octeon_i2c_hlc_comp_read(struct octeon_i2c *i2c, struct i2c_msg *msgs cmd = __raw_readq(i2c->twsi_base + SW_TWSI(i2c)); if ((cmd & SW_TWSI_R) == 0) - return -EAGAIN; + return octeon_i2c_check_status(i2c, false); for (i = 0, j = msgs[1].len - 1; i < msgs[1].len && i < 4; i++, j--) msgs[1].buf[j] = (cmd >> (8 * i)) & 0xff; @@ -617,9 +624,7 @@ static int octeon_i2c_hlc_comp_write(struct octeon_i2c *i2c, struct i2c_msg *msg cmd = __raw_readq(i2c->twsi_base + SW_TWSI(i2c)); if ((cmd & SW_TWSI_R) == 0) - return -EAGAIN; - - ret = octeon_i2c_check_status(i2c, false); + return octeon_i2c_check_status(i2c, false); err: return ret; -- 1.9.1
[PATCH 2/5] i2c: octeon,thunderx: Avoid sending STOP during recovery
From: Dmitry Bazhenov Due to a bug in the ThunderX I2C hardware sending STOP during a recovery attempt could lock up the hardware. To work around this problem do not send STOP at the beginning of the recovery but use the override registers to bring the TWSI including the high-level controller out of the bad state. Signed-off-by: Dmitry Bazhenov Signed-off-by: Jan Glauber [Changed commit message] --- drivers/i2c/busses/i2c-octeon-core.c | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/drivers/i2c/busses/i2c-octeon-core.c b/drivers/i2c/busses/i2c-octeon-core.c index f322242..7d4df83 100644 --- a/drivers/i2c/busses/i2c-octeon-core.c +++ b/drivers/i2c/busses/i2c-octeon-core.c @@ -783,13 +783,14 @@ static void octeon_i2c_prepare_recovery(struct i2c_adapter *adap) { struct octeon_i2c *i2c = i2c_get_adapdata(adap); + octeon_i2c_hlc_disable(i2c); + /* -* The stop resets the state machine, does not _transmit_ STOP unless -* engine was active. +* Bring control register to a good state regardless +* of HLC state. */ - octeon_i2c_stop(i2c); + octeon_i2c_ctl_write(i2c, TWSI_CTL_ENAB); - octeon_i2c_hlc_disable(i2c); octeon_i2c_write_int(i2c, 0); } @@ -797,6 +798,15 @@ static void octeon_i2c_unprepare_recovery(struct i2c_adapter *adap) { struct octeon_i2c *i2c = i2c_get_adapdata(adap); + /* +* Generate STOP to finish the unfinished transaction. +* Can't generate STOP via the TWSI CTL register +* since it could bring the TWSI controller into an inoperable state. +*/ + octeon_i2c_write_int(i2c, TWSI_INT_SDA_OVR | TWSI_INT_SCL_OVR); + udelay(5); + octeon_i2c_write_int(i2c, TWSI_INT_SDA_OVR); + udelay(5); octeon_i2c_write_int(i2c, 0); } -- 1.9.1
[PATCH 1/5] i2c: octeon,thunderx: Fix set SCL recovery function
From: Dmitry Bazhenov The set SCL recovery function unconditionally pulls the SCL line low. Only pull SCL line low according to val parameter. Signed-off-by: Dmitry Bazhenov Signed-off-by: Jan Glauber [Changed commit message] --- drivers/i2c/busses/i2c-octeon-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/i2c/busses/i2c-octeon-core.c b/drivers/i2c/busses/i2c-octeon-core.c index f45ea5e..f322242 100644 --- a/drivers/i2c/busses/i2c-octeon-core.c +++ b/drivers/i2c/busses/i2c-octeon-core.c @@ -767,7 +767,7 @@ static void octeon_i2c_set_scl(struct i2c_adapter *adap, int val) { struct octeon_i2c *i2c = i2c_get_adapdata(adap); - octeon_i2c_write_int(i2c, TWSI_INT_SCL_OVR); + octeon_i2c_write_int(i2c, val ? 0 : TWSI_INT_SCL_OVR); } static int octeon_i2c_get_sda(struct i2c_adapter *adap) -- 1.9.1
[PATCH 0/5] i2c: octeon,thunderx: Recovery fixes and improvements
Hi Wolfram, the ThunderX i2c driver got more exposure with IPMI/BMC testing and we've found several issues that I'd like to address with this series. The patches are all related to recovery or improve stability in the presence of hardware/firmware bugs. Patches are against linux-next and were tested on ThunderX and Octeon (by Steven J. Hill). thanks, Jan -- Dmitry Bazhenov (2): i2c: octeon,thunderx: Fix set SCL recovery function i2c: octeon,thunderx: Avoid sending STOP during recovery Jan Glauber (3): i2c: octeon,thunderx: Fix high-level controller status check i2c: octeon,thunderx: Check bus state before starting a transaction i2c: octeon,thunderx: Limit register access retries drivers/i2c/busses/i2c-octeon-core.c | 67 drivers/i2c/busses/i2c-octeon-core.h | 21 --- 2 files changed, 68 insertions(+), 20 deletions(-) -- 1.9.1
[PATCH 4/5] i2c: octeon,thunderx: Check bus state before starting a transaction
Add an additional status check before starting a transaction and, if required, trigger the recovery if the check fails. Signed-off-by: Jan Glauber --- drivers/i2c/busses/i2c-octeon-core.c | 20 1 file changed, 20 insertions(+) diff --git a/drivers/i2c/busses/i2c-octeon-core.c b/drivers/i2c/busses/i2c-octeon-core.c index 5e63b17..b21aa3e 100644 --- a/drivers/i2c/busses/i2c-octeon-core.c +++ b/drivers/i2c/busses/i2c-octeon-core.c @@ -630,6 +630,22 @@ static int octeon_i2c_hlc_comp_write(struct octeon_i2c *i2c, struct i2c_msg *msg return ret; } +static int octeon_i2c_check_bus(struct octeon_i2c *i2c) +{ + int stat, lines; + + stat = octeon_i2c_stat_read(i2c); + + /* get I2C line state */ + lines = octeon_i2c_read_int(i2c) & (TWSI_INT_SCL | TWSI_INT_SDA); + + if (stat == STAT_IDLE && lines == (TWSI_INT_SCL | TWSI_INT_SDA)) + return 0; + + /* bus check failed, try to recover */ + return octeon_i2c_recovery(i2c); +} + /** * octeon_i2c_xfer - The driver's master_xfer function * @adap: Pointer to the i2c_adapter structure @@ -643,6 +659,10 @@ int octeon_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num) struct octeon_i2c *i2c = i2c_get_adapdata(adap); int i, ret = 0; + ret = octeon_i2c_check_bus(i2c); + if (ret) + goto out; + if (num == 1) { if (msgs[0].len > 0 && msgs[0].len <= 8) { if (msgs[0].flags & I2C_M_RD) -- 1.9.1
[PATCH 5/5] i2c: octeon,thunderx: Limit register access retries
Do not infinitely retry register readq and writeq operations in order to not lock up the CPU in case the TWSI gets stuck. Return -EIO in case of a failed data read. For all other cases just return so subsequent operations will fail and trigger the recovery. Signed-off-by: Jan Glauber --- drivers/i2c/busses/i2c-octeon-core.c | 4 +++- drivers/i2c/busses/i2c-octeon-core.h | 21 - 2 files changed, 19 insertions(+), 6 deletions(-) diff --git a/drivers/i2c/busses/i2c-octeon-core.c b/drivers/i2c/busses/i2c-octeon-core.c index b21aa3e..6ba0146 100644 --- a/drivers/i2c/busses/i2c-octeon-core.c +++ b/drivers/i2c/busses/i2c-octeon-core.c @@ -381,7 +381,9 @@ static int octeon_i2c_read(struct octeon_i2c *i2c, int target, if (result) return result; - data[i] = octeon_i2c_data_read(i2c); + data[i] = octeon_i2c_data_read(i2c, &result); + if (result) + return result; if (recv_len && i == 0) { if (data[i] > I2C_SMBUS_BLOCK_MAX + 1) return -EPROTO; diff --git a/drivers/i2c/busses/i2c-octeon-core.h b/drivers/i2c/busses/i2c-octeon-core.h index 87151ea..177d4cc 100644 --- a/drivers/i2c/busses/i2c-octeon-core.h +++ b/drivers/i2c/busses/i2c-octeon-core.h @@ -141,11 +141,14 @@ static inline void octeon_i2c_writeq_flush(u64 val, void __iomem *addr) */ static inline void octeon_i2c_reg_write(struct octeon_i2c *i2c, u64 eop_reg, u8 data) { + int tries = 100; u64 tmp; __raw_writeq(SW_TWSI_V | eop_reg | data, i2c->twsi_base + SW_TWSI(i2c)); do { tmp = __raw_readq(i2c->twsi_base + SW_TWSI(i2c)); + if (--tries < 0) + return; } while ((tmp & SW_TWSI_V) != 0); } @@ -163,24 +166,32 @@ static inline void octeon_i2c_reg_write(struct octeon_i2c *i2c, u64 eop_reg, u8 * * The I2C core registers are accessed indirectly via the SW_TWSI CSR. */ -static inline u8 octeon_i2c_reg_read(struct octeon_i2c *i2c, u64 eop_reg) +static inline int octeon_i2c_reg_read(struct octeon_i2c *i2c, u64 eop_reg, + int *error) { + int tries = 100; u64 tmp; __raw_writeq(SW_TWSI_V | eop_reg | SW_TWSI_R, i2c->twsi_base + SW_TWSI(i2c)); do { tmp = __raw_readq(i2c->twsi_base + SW_TWSI(i2c)); + if (--tries < 0) { + /* signal that the returned data is invalid */ + if (error) + *error = -EIO; + return 0; + } } while ((tmp & SW_TWSI_V) != 0); return tmp & 0xFF; } #define octeon_i2c_ctl_read(i2c) \ - octeon_i2c_reg_read(i2c, SW_TWSI_EOP_TWSI_CTL) -#define octeon_i2c_data_read(i2c) \ - octeon_i2c_reg_read(i2c, SW_TWSI_EOP_TWSI_DATA) + octeon_i2c_reg_read(i2c, SW_TWSI_EOP_TWSI_CTL, NULL) +#define octeon_i2c_data_read(i2c, error) \ + octeon_i2c_reg_read(i2c, SW_TWSI_EOP_TWSI_DATA, error) #define octeon_i2c_stat_read(i2c) \ - octeon_i2c_reg_read(i2c, SW_TWSI_EOP_TWSI_STAT) + octeon_i2c_reg_read(i2c, SW_TWSI_EOP_TWSI_STAT, NULL) /** * octeon_i2c_read_int - read the TWSI_INT register -- 1.9.1
Re: [PATCH 4/5] i2c: octeon,thunderx: Check bus state before starting a transaction
On Wed, Sep 21, 2016 at 10:55:41PM +0200, Wolfram Sang wrote: > On Wed, Sep 21, 2016 at 08:51:05AM +0200, Jan Glauber wrote: > > Add an additional status check before starting a transaction and, > > if required, trigger the recovery if the check fails. > > > > Signed-off-by: Jan Glauber > > Won't this break multi-master setups? > I'm afraid yes. I don't have a multi-master setup, but agree that we should not break it. How about re-checking if the bus is idle until a timeout (i2c->adap.timeout ?) happens and only then recover? Additionaly we can check for arbitration loss as Dmitry did in his original patch. --Jan
Re: [PATCH 5/5] i2c: octeon,thunderx: Limit register access retries
On Wed, Sep 21, 2016 at 11:03:35PM +0200, Wolfram Sang wrote: > On Wed, Sep 21, 2016 at 08:51:06AM +0200, Jan Glauber wrote: > > Do not infinitely retry register readq and writeq operations > > in order to not lock up the CPU in case the TWSI gets stuck. > > > > Return -EIO in case of a failed data read. For all other > > cases just return so subsequent operations will fail > > and trigger the recovery. > > > > Signed-off-by: Jan Glauber > > I didn't really check, but have you considered using > readq_poll_timeout() from iopoll.h? > Indeed, readq_poll_timeout() fits quite well here. It will lose some cycles on mips but I'm not convinced that matters with i2c. That would be the first user of readq_poll_timeout() in the kernel :) --Jan
[PATCH v2 0/2] i2c: octeon: thunderx: Recovery fixes and improvements
Hi Wolfram, here are the remaining two patches with iopoll.h usage and a timed wait before entering the recovery on a failed bus check. thanks, Jan - Jan Glauber (2): i2c: octeon: thunderx: Check bus state before starting a transaction i2c: octeon: thunderx: Limit register access retries drivers/i2c/busses/i2c-octeon-core.c | 33 - drivers/i2c/busses/i2c-octeon-core.h | 27 --- 2 files changed, 48 insertions(+), 12 deletions(-) -- 1.9.1
[PATCH v2 2/2] i2c: octeon: thunderx: Limit register access retries
Do not infinitely retry register readq and writeq operations in order to not lock up the CPU in case the TWSI gets stuck. Return -ETIMEDOUT in case of a failed data read. For all other cases just return so subsequent operations will fail and trigger the recovery. Signed-off-by: Jan Glauber --- drivers/i2c/busses/i2c-octeon-core.c | 4 +++- drivers/i2c/busses/i2c-octeon-core.h | 27 --- 2 files changed, 19 insertions(+), 12 deletions(-) diff --git a/drivers/i2c/busses/i2c-octeon-core.c b/drivers/i2c/busses/i2c-octeon-core.c index f526511..1dfc4b8 100644 --- a/drivers/i2c/busses/i2c-octeon-core.c +++ b/drivers/i2c/busses/i2c-octeon-core.c @@ -381,7 +381,9 @@ static int octeon_i2c_read(struct octeon_i2c *i2c, int target, if (result) return result; - data[i] = octeon_i2c_data_read(i2c); + data[i] = octeon_i2c_data_read(i2c, &result); + if (result) + return result; if (recv_len && i == 0) { if (data[i] > I2C_SMBUS_BLOCK_MAX + 1) return -EPROTO; diff --git a/drivers/i2c/busses/i2c-octeon-core.h b/drivers/i2c/busses/i2c-octeon-core.h index 87151ea..1db7c83 100644 --- a/drivers/i2c/busses/i2c-octeon-core.h +++ b/drivers/i2c/busses/i2c-octeon-core.h @@ -5,6 +5,7 @@ #include #include #include +#include #include #include @@ -144,9 +145,9 @@ static inline void octeon_i2c_reg_write(struct octeon_i2c *i2c, u64 eop_reg, u8 u64 tmp; __raw_writeq(SW_TWSI_V | eop_reg | data, i2c->twsi_base + SW_TWSI(i2c)); - do { - tmp = __raw_readq(i2c->twsi_base + SW_TWSI(i2c)); - } while ((tmp & SW_TWSI_V) != 0); + + readq_poll_timeout(i2c->twsi_base + SW_TWSI(i2c), tmp, tmp & SW_TWSI_V, + I2C_OCTEON_EVENT_WAIT, i2c->adap.timeout); } #define octeon_i2c_ctl_write(i2c, val) \ @@ -163,24 +164,28 @@ static inline void octeon_i2c_reg_write(struct octeon_i2c *i2c, u64 eop_reg, u8 * * The I2C core registers are accessed indirectly via the SW_TWSI CSR. */ -static inline u8 octeon_i2c_reg_read(struct octeon_i2c *i2c, u64 eop_reg) +static inline int octeon_i2c_reg_read(struct octeon_i2c *i2c, u64 eop_reg, + int *error) { u64 tmp; + int ret; __raw_writeq(SW_TWSI_V | eop_reg | SW_TWSI_R, i2c->twsi_base + SW_TWSI(i2c)); - do { - tmp = __raw_readq(i2c->twsi_base + SW_TWSI(i2c)); - } while ((tmp & SW_TWSI_V) != 0); + ret = readq_poll_timeout(i2c->twsi_base + SW_TWSI(i2c), tmp, +tmp & SW_TWSI_V, I2C_OCTEON_EVENT_WAIT, +i2c->adap.timeout); + if (error) + *error = ret; return tmp & 0xFF; } #define octeon_i2c_ctl_read(i2c) \ - octeon_i2c_reg_read(i2c, SW_TWSI_EOP_TWSI_CTL) -#define octeon_i2c_data_read(i2c) \ - octeon_i2c_reg_read(i2c, SW_TWSI_EOP_TWSI_DATA) + octeon_i2c_reg_read(i2c, SW_TWSI_EOP_TWSI_CTL, NULL) +#define octeon_i2c_data_read(i2c, error) \ + octeon_i2c_reg_read(i2c, SW_TWSI_EOP_TWSI_DATA, error) #define octeon_i2c_stat_read(i2c) \ - octeon_i2c_reg_read(i2c, SW_TWSI_EOP_TWSI_STAT) + octeon_i2c_reg_read(i2c, SW_TWSI_EOP_TWSI_STAT, NULL) /** * octeon_i2c_read_int - read the TWSI_INT register -- 1.9.1
[PATCH v2 1/2] i2c: octeon: thunderx: Check bus state before starting a transaction
Add an additional status check before starting a transaction. If the check fails wait for some time to tolerate multi-master mode. After the timeout expires trigger the recovery. Signed-off-by: Jan Glauber --- drivers/i2c/busses/i2c-octeon-core.c | 29 + 1 file changed, 29 insertions(+) diff --git a/drivers/i2c/busses/i2c-octeon-core.c b/drivers/i2c/busses/i2c-octeon-core.c index 5e63b17..f526511 100644 --- a/drivers/i2c/busses/i2c-octeon-core.c +++ b/drivers/i2c/busses/i2c-octeon-core.c @@ -630,6 +630,31 @@ static int octeon_i2c_hlc_comp_write(struct octeon_i2c *i2c, struct i2c_msg *msg return ret; } +static int octeon_i2c_check_bus(struct octeon_i2c *i2c) +{ + u64 end = get_jiffies_64() + i2c->adap.timeout; + int stat, lines; + + while (time_before64(get_jiffies_64(), end)) { + stat = octeon_i2c_stat_read(i2c); + + /* get I2C line state */ + lines = octeon_i2c_read_int(i2c) & (TWSI_INT_SCL | TWSI_INT_SDA); + + if (stat == STAT_IDLE && lines == (TWSI_INT_SCL | TWSI_INT_SDA)) + return 0; + + if (stat == STAT_LOST_ARB_38 || stat == STAT_LOST_ARB_68 || + stat == STAT_LOST_ARB_78 || stat == STAT_LOST_ARB_B0) + break; + + usleep_range(I2C_OCTEON_EVENT_WAIT / 2, I2C_OCTEON_EVENT_WAIT); + } + + /* bus check failed, try to recover */ + return octeon_i2c_recovery(i2c); +} + /** * octeon_i2c_xfer - The driver's master_xfer function * @adap: Pointer to the i2c_adapter structure @@ -643,6 +668,10 @@ int octeon_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num) struct octeon_i2c *i2c = i2c_get_adapdata(adap); int i, ret = 0; + ret = octeon_i2c_check_bus(i2c); + if (ret) + goto out; + if (num == 1) { if (msgs[0].len > 0 && msgs[0].len <= 8) { if (msgs[0].flags & I2C_M_RD) -- 1.9.1
Re: [PATCH v2 1/2] i2c: octeon: thunderx: Check bus state before starting a transaction
On Sat, Sep 24, 2016 at 11:24:19AM +0200, Wolfram Sang wrote: > On Fri, Sep 23, 2016 at 11:40:38AM +0200, Jan Glauber wrote: > > Add an additional status check before starting a transaction. If the > > check fails wait for some time to tolerate multi-master mode. After the > > timeout expires trigger the recovery. > > > > Signed-off-by: Jan Glauber > > Need to think more about it, needs to wait for next cycle. > OK, please share your thoughts when you get to it. --Jan
Re: [PATCH v2] spi: octeon: Add thunderx driver
On Wed, Jul 27, 2016 at 07:08:24PM +0100, Mark Brown wrote: > On Mon, Jul 25, 2016 at 07:56:22PM +0200, Jan Glauber wrote: > > Add ThunderX SPI driver using the shared part from the Octeon > > driver. The main difference of the ThunderX driver is that it > > is a PCI device so probing is different. The system clock settings > > can be specified in device tree. > > Don't send individual patches in reply to the middle of threads, it > makes it really confusing what's going on. I now have multiple patches > from you for this driver completely unthreaded in my inbox with no > indication of ordering or anything. Please resend anything that's > pending as a proper patch series. With multiple being exactly two. I thought it to be easier this way around and the ordering to be obvious (if you use threading), but of course I can resend the two patches as a new series.
[PATCH v2 0/2] SPI ThunderX driver
This series adds support for SPI on Cavium's ThunderX (arm64). The SPI hardware is the same as on MIPS Octeon, the only difference is that the device appears as a PCI device. To avoid copy and paste of the Octeon driver I've moved the common parts into a shared file. The series was tested on MIPS (Edge Router PRO and cn71xx) and ThunderX. Changes to v1: - Changed Kconfig depencency for ThunderX - Merged clock setup functions with main probe/remove - Fail if SPI DT entry misses clock reference - Removed debug prints at end of probe - Removed unneeded includes from spi-cavium.c - Dropped merged patches thanks, Jan ------ Jan Glauber (2): spi: octeon: Split driver into Octeon specific and common parts spi: octeon: Add thunderx driver drivers/spi/Kconfig| 7 ++ drivers/spi/Makefile | 3 + drivers/spi/spi-cavium-octeon.c| 102 + drivers/spi/spi-cavium-thunderx.c | 140 + drivers/spi/{spi-octeon.c => spi-cavium.c} | 122 + drivers/spi/spi-cavium.h | 34 +++ 6 files changed, 288 insertions(+), 120 deletions(-) create mode 100644 drivers/spi/spi-cavium-octeon.c create mode 100644 drivers/spi/spi-cavium-thunderx.c rename drivers/spi/{spi-octeon.c => spi-cavium.c} (54%) -- 2.9.0.rc0.21.g322
[PATCH v2 2/2] spi: octeon: Add thunderx driver
Add ThunderX SPI driver using the shared part from the Octeon driver. The main difference of the ThunderX driver is that it is a PCI device so probing is different. The system clock settings can be specified in device tree. Signed-off-by: Jan Glauber --- drivers/spi/Kconfig | 7 ++ drivers/spi/Makefile | 2 + drivers/spi/spi-cavium-thunderx.c | 140 ++ drivers/spi/spi-cavium.h | 3 + 4 files changed, 152 insertions(+) create mode 100644 drivers/spi/spi-cavium-thunderx.c diff --git a/drivers/spi/Kconfig b/drivers/spi/Kconfig index 4b931ec..e0ee112 100644 --- a/drivers/spi/Kconfig +++ b/drivers/spi/Kconfig @@ -630,6 +630,13 @@ config SPI_TEGRA20_SLINK help SPI driver for Nvidia Tegra20/Tegra30 SLINK Controller interface. +config SPI_THUNDERX + tristate "Cavium ThunderX SPI controller" + depends on (ARM64 || CONFIG_TEST) && 64BIT && PCI + help + SPI host driver for the hardware found on Cavium ThunderX + SOCs. + config SPI_TOPCLIFF_PCH tristate "Intel EG20T PCH/LAPIS Semicon IOH(ML7213/ML7223/ML7831) SPI" depends on PCI && (X86_32 || MIPS || COMPILE_TEST) diff --git a/drivers/spi/Makefile b/drivers/spi/Makefile index 185367e..133364b 100644 --- a/drivers/spi/Makefile +++ b/drivers/spi/Makefile @@ -91,6 +91,8 @@ obj-$(CONFIG_SPI_TEGRA114)+= spi-tegra114.o obj-$(CONFIG_SPI_TEGRA20_SFLASH) += spi-tegra20-sflash.o obj-$(CONFIG_SPI_TEGRA20_SLINK)+= spi-tegra20-slink.o obj-$(CONFIG_SPI_TLE62X0) += spi-tle62x0.o +spi-thunderx-objs := spi-cavium.o spi-cavium-thunderx.o +obj-$(CONFIG_SPI_THUNDERX) += spi-thunderx.o obj-$(CONFIG_SPI_TOPCLIFF_PCH) += spi-topcliff-pch.o obj-$(CONFIG_SPI_TXX9) += spi-txx9.o obj-$(CONFIG_SPI_XCOMM)+= spi-xcomm.o diff --git a/drivers/spi/spi-cavium-thunderx.c b/drivers/spi/spi-cavium-thunderx.c new file mode 100644 index 000..28c3dcc --- /dev/null +++ b/drivers/spi/spi-cavium-thunderx.c @@ -0,0 +1,140 @@ +/* + * Cavium ThunderX SPI driver. + * + * Copyright (C) 2016 Cavium Inc. + * Authors: Jan Glauber + */ + +#include +#include +#include +#include +#include + +#include "spi-cavium.h" + +#define DRV_NAME "spi-thunderx" + +#define SYS_FREQ_DEFAULT 7 /* 700 Mhz */ + +static int thunderx_spi_probe(struct pci_dev *pdev, + const struct pci_device_id *ent) +{ + struct device *dev = &pdev->dev; + struct spi_master *master; + struct octeon_spi *p; + int ret = -ENOENT; + + master = spi_alloc_master(dev, sizeof(struct octeon_spi)); + if (!master) + return -ENOMEM; + p = spi_master_get_devdata(master); + + ret = pci_enable_device(pdev); + if (ret) { + dev_err(dev, "Failed to enable PCI device\n"); + goto out_free; + } + + ret = pci_request_regions(pdev, DRV_NAME); + if (ret) { + dev_err(dev, "PCI request regions failed 0x%x\n", ret); + goto out_disable; + } + + p->register_base = pci_ioremap_bar(pdev, 0); + if (!p->register_base) { + dev_err(dev, "Cannot map reg base\n"); + ret = -EINVAL; + goto out_region; + } + + p->regs.config = 0x1000; + p->regs.status = 0x1008; + p->regs.tx = 0x1010; + p->regs.data = 0x1080; + + p->clk = devm_clk_get(dev, NULL); + if (IS_ERR(p->clk)) + goto out_unmap; + + ret = clk_prepare_enable(p->clk); + if (ret) + goto out_clock_devm; + + p->sys_freq = clk_get_rate(p->clk); + if (!p->sys_freq) + p->sys_freq = SYS_FREQ_DEFAULT; + dev_info(dev, "Set system clock to %u\n", p->sys_freq); + + master->num_chipselect = 4; + master->mode_bits = SPI_CPHA | SPI_CPOL | SPI_CS_HIGH | + SPI_LSB_FIRST | SPI_3WIRE; + master->transfer_one_message = octeon_spi_transfer_one_message; + master->bits_per_word_mask = SPI_BPW_MASK(8); + master->max_speed_hz = OCTEON_SPI_MAX_CLOCK_HZ; + master->dev.of_node = pdev->dev.of_node; + + pci_set_drvdata(pdev, master); + ret = devm_spi_register_master(dev, master); + if (ret) { + dev_err(&pdev->dev, "Register master failed: %d\n", ret); + goto out_clock; + } + + return 0; + +out_clock: + clk_disable_unprepare(p->clk); +out_clock_devm: + devm_clk_put(dev, p->clk); +out_unmap: + iounmap(p->register_base); +out_region: + pci_release_regions(pdev); +out_disable: + pci_disa
[PATCH v2 1/2] spi: octeon: Split driver into Octeon specific and common parts
Separate driver probing from SPI transfer functions. Signed-off-by: Jan Glauber --- drivers/spi/Makefile | 1 + drivers/spi/spi-cavium-octeon.c| 102 drivers/spi/{spi-octeon.c => spi-cavium.c} | 122 + drivers/spi/spi-cavium.h | 31 4 files changed, 136 insertions(+), 120 deletions(-) create mode 100644 drivers/spi/spi-cavium-octeon.c rename drivers/spi/{spi-octeon.c => spi-cavium.c} (54%) diff --git a/drivers/spi/Makefile b/drivers/spi/Makefile index 3c74d00..185367e 100644 --- a/drivers/spi/Makefile +++ b/drivers/spi/Makefile @@ -56,6 +56,7 @@ obj-$(CONFIG_SPI_MT65XX)+= spi-mt65xx.o obj-$(CONFIG_SPI_MXS) += spi-mxs.o obj-$(CONFIG_SPI_NUC900) += spi-nuc900.o obj-$(CONFIG_SPI_OC_TINY) += spi-oc-tiny.o +spi-octeon-objs:= spi-cavium.o spi-cavium-octeon.o obj-$(CONFIG_SPI_OCTEON) += spi-octeon.o obj-$(CONFIG_SPI_OMAP_UWIRE) += spi-omap-uwire.o obj-$(CONFIG_SPI_OMAP_100K)+= spi-omap-100k.o diff --git a/drivers/spi/spi-cavium-octeon.c b/drivers/spi/spi-cavium-octeon.c new file mode 100644 index 000..97310c1 --- /dev/null +++ b/drivers/spi/spi-cavium-octeon.c @@ -0,0 +1,102 @@ +/* + * This file is subject to the terms and conditions of the GNU General Public + * License. See the file "COPYING" in the main directory of this archive + * for more details. + * + * Copyright (C) 2011, 2012 Cavium, Inc. + */ + +#include +#include +#include +#include +#include + +#include + +#include "spi-cavium.h" + +static int octeon_spi_probe(struct platform_device *pdev) +{ + struct resource *res_mem; + void __iomem *reg_base; + struct spi_master *master; + struct octeon_spi *p; + int err = -ENOENT; + + master = spi_alloc_master(&pdev->dev, sizeof(struct octeon_spi)); + if (!master) + return -ENOMEM; + p = spi_master_get_devdata(master); + platform_set_drvdata(pdev, master); + + res_mem = platform_get_resource(pdev, IORESOURCE_MEM, 0); + reg_base = devm_ioremap_resource(&pdev->dev, res_mem); + if (IS_ERR(reg_base)) { + err = PTR_ERR(reg_base); + goto fail; + } + + p->register_base = reg_base; + p->sys_freq = octeon_get_io_clock_rate(); + + p->regs.config = 0; + p->regs.status = 0x08; + p->regs.tx = 0x10; + p->regs.data = 0x80; + + master->num_chipselect = 4; + master->mode_bits = SPI_CPHA | + SPI_CPOL | + SPI_CS_HIGH | + SPI_LSB_FIRST | + SPI_3WIRE; + + master->transfer_one_message = octeon_spi_transfer_one_message; + master->bits_per_word_mask = SPI_BPW_MASK(8); + master->max_speed_hz = OCTEON_SPI_MAX_CLOCK_HZ; + + master->dev.of_node = pdev->dev.of_node; + err = devm_spi_register_master(&pdev->dev, master); + if (err) { + dev_err(&pdev->dev, "register master failed: %d\n", err); + goto fail; + } + + return 0; +fail: + spi_master_put(master); + return err; +} + +static int octeon_spi_remove(struct platform_device *pdev) +{ + struct spi_master *master = platform_get_drvdata(pdev); + struct octeon_spi *p = spi_master_get_devdata(master); + + /* Clear the CSENA* and put everything in a known state. */ + writeq(0, p->register_base + OCTEON_SPI_CFG(p)); + + return 0; +} + +static const struct of_device_id octeon_spi_match[] = { + { .compatible = "cavium,octeon-3010-spi", }, + {}, +}; +MODULE_DEVICE_TABLE(of, octeon_spi_match); + +static struct platform_driver octeon_spi_driver = { + .driver = { + .name = "spi-octeon", + .of_match_table = octeon_spi_match, + }, + .probe = octeon_spi_probe, + .remove = octeon_spi_remove, +}; + +module_platform_driver(octeon_spi_driver); + +MODULE_DESCRIPTION("Cavium, Inc. OCTEON SPI bus driver"); +MODULE_AUTHOR("David Daney"); +MODULE_LICENSE("GPL"); diff --git a/drivers/spi/spi-octeon.c b/drivers/spi/spi-cavium.c similarity index 54% rename from drivers/spi/spi-octeon.c rename to drivers/spi/spi-cavium.c index 2180176..8857e7d 100644 --- a/drivers/spi/spi-octeon.c +++ b/drivers/spi/spi-cavium.c @@ -6,42 +6,11 @@ * Copyright (C) 2011, 2012 Cavium, Inc. */ -#include -#include #include -#include #include -#include -#include - -#include #include "spi-cavium.h" -#define OCTEON_SPI_MAX_BYTES 9 - -#define OCTEON_SPI_MAX_CLOCK_HZ 1600 - -struct octeon_spi_regs { - int c
Re: [PATCH v10 4/8] i2c: thunderx: Add SMBUS alert support
On Tue, Aug 23, 2016 at 10:57:25PM +0200, Wolfram Sang wrote: > On Wed, Jun 15, 2016 at 03:51:30PM +0200, Jan Glauber wrote: > > Add SMBUS alert interrupt support. For now only device tree is > > supported for specifying the alert. In case of ACPI an error > > is returned. > > > > Signed-off-by: Jan Glauber > > What about 'select I2C_SMBUS' in Kconfig and skip all the ifdeffery? > Wouldn't that prevent a distribution that has I2C_HELPER_AUTO set from enabling the ThunderX i2c driver at all? --Jan
Re: [PATCH v10 4/8] i2c: thunderx: Add SMBUS alert support
On Tue, Aug 23, 2016 at 11:39:48PM +0200, Wolfram Sang wrote: > On Tue, Aug 23, 2016 at 11:28:59PM +0200, Jan Glauber wrote: > > On Tue, Aug 23, 2016 at 10:57:25PM +0200, Wolfram Sang wrote: > > > On Wed, Jun 15, 2016 at 03:51:30PM +0200, Jan Glauber wrote: > > > > Add SMBUS alert interrupt support. For now only device tree is > > > > supported for specifying the alert. In case of ACPI an error > > > > is returned. > > > > > > > > Signed-off-by: Jan Glauber > > > > > > What about 'select I2C_SMBUS' in Kconfig and skip all the ifdeffery? > > > > > > > Wouldn't that prevent a distribution that has I2C_HELPER_AUTO set > > from enabling the ThunderX i2c driver at all? > > From commit e2ca307439fb9df922c3e8891289a2ac05812fb7: > > ... > Bus drivers which implement SMBus alert should select this option, so > in most cases this option is hidden and the user doesn't have to care > about it. > OK, makes sense. I'll remove the ifdef's.
Re: [PATCH v10 3/8] i2c: thunderx: Add i2c driver for ThunderX SOC
On Tue, Aug 23, 2016 at 10:36:29PM +0200, Wolfram Sang wrote: > > > i2c-octeon-objs := i2c-cavium.o i2c-octeon-core.o > > obj-$(CONFIG_I2C_OCTEON) += i2c-octeon.o > > +i2c-thunderx-objs := i2c-cavium.o i2c-thunderx-core.o > > +obj-$(CONFIG_I2C_THUNDERX) += i2c-thunderx.o > > Shouldn't that rather be "i2c-cavium-core.o", "i2c-octeon-platdrv.o", > and "i2c-thunderx-pcidrv.o" for the -objs? I mean, the cavium part is > the core-part... > > > +skip: > > + if (!i2c->sys_freq) > > + i2c->sys_freq = SYS_FREQ_DEFAULT; > > + > > + dev_info(dev, "Set system clock to %u\n", i2c->sys_freq); > > Too many dev_info IMO, see later. OK. I was working on an update where I dropped most of the messages. > > + i2c->adap.class = I2C_CLASS_HWMON | I2C_CLASS_SPD; > > Do you really need that? Not sure, just copy'n paste from what most of the other bus drivers do. Looking at i2c-core.c it might be a legacy thing, so I can probably remove it. > > + ret = i2c_add_adapter(&i2c->adap); > > + if (ret < 0) { > > + dev_err(dev, "Failed to add i2c adapter\n"); > > I2C core does error reporting for you meanwhile. OK. > > + goto out_irq; > > + } > > + > > + dev_info(i2c->dev, "probed\n"); > > I'd think all nice to know dev_info should go here in condensed form. > > > +out_irq: > > + devm_free_irq(dev, i2c->i2c_msix.vector, i2c); > > If you need to free irq manually here anyhow, then you don't need the > devm variant. Yes. I paid attention to the devm documentation in the meantime and switched all probing to the managed fucntions and got rid of the goto's completely. > > +out_msix: > > + pci_disable_msix(pdev); > > +out_unmap: > > + iounmap(i2c->twsi_base); > > + thunder_i2c_clock_disable(dev, i2c->clk); > > +out_release_regions: > > + pci_release_regions(pdev); > > +out_disable_device: > > + pci_disable_device(pdev); > > +out_free_i2c: > > + pci_set_drvdata(pdev, NULL); > > Similar to devm_*, there is also pcim_* for PCI devices. You might want > to have a look for those. See above. > > + devm_kfree(dev, i2c); > > Not needed. Yes. > > + return ret; > > +} > > + > > +static void thunder_i2c_remove_pci(struct pci_dev *pdev) > > +{ > > + struct octeon_i2c *i2c = pci_get_drvdata(pdev); > > + struct device *dev; > > + > > + if (!i2c) > > + return; > > How should that happen? Just a safety net, can be removed. > > + > > +module_pci_driver(thunder_i2c_pci_driver); > > + > > +MODULE_LICENSE("GPL"); > > Please add a minimal GPL header at the top of the file, so it is clear > if you mean "v2" or "v2 or later". OK. > > +MODULE_AUTHOR("Fred Martin "); > > +MODULE_DESCRIPTION("I2C-Bus adapter for Cavium ThunderX SOC"); > > -- > > 1.9.1 > >
Re: [PATCH v10 3/8] i2c: thunderx: Add i2c driver for ThunderX SOC
On Tue, Aug 23, 2016 at 10:39:41PM +0200, Wolfram Sang wrote: > On Tue, Aug 23, 2016 at 10:36:29PM +0200, Wolfram Sang wrote: > > > > > i2c-octeon-objs := i2c-cavium.o i2c-octeon-core.o > > > obj-$(CONFIG_I2C_OCTEON) += i2c-octeon.o > > > +i2c-thunderx-objs := i2c-cavium.o i2c-thunderx-core.o > > > +obj-$(CONFIG_I2C_THUNDERX) += i2c-thunderx.o > > > > Shouldn't that rather be "i2c-cavium-core.o", > > Thinking of it again, it should probably even be "i2c-octeon-core.o" to > avoid confusion because all the functions start with octeon_* > > > "i2c-octeon-platdrv.o", and "i2c-thunderx-pcidrv.o" for the -objs? > > Those names still make sense :) > Agreed, the naming you propose looks much better. --Jan
[PATCH v11 3/8] i2c: thunderx: Add i2c driver for ThunderX SOC
The ThunderX SOC uses the same i2c block as the Octeon SOC. The main difference is that on ThunderX the device is a PCI device so device probing is done via PCI, interrupts are MSI-X. The clock rates can be set via device tree or ACPI. Signed-off-by: Jan Glauber --- drivers/i2c/busses/Kconfig | 10 ++ drivers/i2c/busses/Makefile | 2 + drivers/i2c/busses/i2c-octeon-core.h | 16 ++- drivers/i2c/busses/i2c-thunderx-pcidrv.c | 209 +++ 4 files changed, 234 insertions(+), 3 deletions(-) create mode 100644 drivers/i2c/busses/i2c-thunderx-pcidrv.c diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig index 5c3993b..d69a342 100644 --- a/drivers/i2c/busses/Kconfig +++ b/drivers/i2c/busses/Kconfig @@ -956,6 +956,16 @@ config I2C_OCTEON This driver can also be built as a module. If so, the module will be called i2c-octeon. +config I2C_THUNDERX + tristate "Cavium ThunderX I2C bus support" + depends on 64BIT && PCI && (ARM64 || COMPILE_TEST) + help + Say yes if you want to support the I2C serial bus on Cavium + ThunderX SOC. + + This driver can also be built as a module. If so, the module + will be called i2c-thunderx. + config I2C_XILINX tristate "Xilinx I2C Controller" depends on HAS_IOMEM diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile index f975263..29764cc 100644 --- a/drivers/i2c/busses/Makefile +++ b/drivers/i2c/busses/Makefile @@ -93,6 +93,8 @@ obj-$(CONFIG_I2C_VERSATILE) += i2c-versatile.o obj-$(CONFIG_I2C_WMT) += i2c-wmt.o i2c-octeon-objs := i2c-octeon-core.o i2c-octeon-platdrv.o obj-$(CONFIG_I2C_OCTEON) += i2c-octeon.o +i2c-thunderx-objs := i2c-octeon-core.o i2c-thunderx-pcidrv.o +obj-$(CONFIG_I2C_THUNDERX) += i2c-thunderx.o obj-$(CONFIG_I2C_XILINX) += i2c-xiic.o obj-$(CONFIG_I2C_XLR) += i2c-xlr.o obj-$(CONFIG_I2C_XLP9XX) += i2c-xlp9xx.o diff --git a/drivers/i2c/busses/i2c-octeon-core.h b/drivers/i2c/busses/i2c-octeon-core.h index 81c6a81..33c7e1f 100644 --- a/drivers/i2c/busses/i2c-octeon-core.h +++ b/drivers/i2c/busses/i2c-octeon-core.h @@ -8,9 +8,15 @@ #include /* Register offsets */ -#define SW_TWSI0x00 -#define TWSI_INT 0x10 -#define SW_TWSI_EXT0x18 +#if IS_ENABLED(CONFIG_I2C_THUNDERX) + #define SW_TWSI 0x1000 + #define TWSI_INT0x1010 + #define SW_TWSI_EXT 0x1018 +#else + #define SW_TWSI 0x00 + #define TWSI_INT0x10 + #define SW_TWSI_EXT 0x18 +#endif /* Controller command patterns */ #define SW_TWSI_V BIT_ULL(63) /* Valid bit */ @@ -94,6 +100,7 @@ struct octeon_i2c { wait_queue_head_t queue; struct i2c_adapter adap; + struct clk *clk; int irq; int hlc_irq;/* For cn7890 only */ u32 twsi_freq; @@ -109,6 +116,9 @@ struct octeon_i2c { void (*hlc_int_disable)(struct octeon_i2c *); atomic_t int_enable_cnt; atomic_t hlc_int_enable_cnt; +#if IS_ENABLED(CONFIG_I2C_THUNDERX) + struct msix_entry i2c_msix; +#endif }; static inline void octeon_i2c_writeq_flush(u64 val, void __iomem *addr) diff --git a/drivers/i2c/busses/i2c-thunderx-pcidrv.c b/drivers/i2c/busses/i2c-thunderx-pcidrv.c new file mode 100644 index 000..0dcebe2 --- /dev/null +++ b/drivers/i2c/busses/i2c-thunderx-pcidrv.c @@ -0,0 +1,209 @@ +/* + * Cavium ThunderX i2c driver. + * + * Copyright (C) 2015,2016 Cavium Inc. + * Authors: Fred Martin + * Jan Glauber + * + * This file is licensed under the terms of the GNU General Public + * License version 2. This program is licensed "as is" without any + * warranty of any kind, whether express or implied. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "i2c-octeon-core.h" + +#define DRV_NAME "i2c-thunderx" + +#define PCI_DEVICE_ID_THUNDER_TWSI 0xa012 + +#define SYS_FREQ_DEFAULT 7 + +#define TWSI_INT_ENA_W1C 0x1028 +#define TWSI_INT_ENA_W1S 0x1030 + +/* + * Enable the CORE interrupt. + * The interrupt will be asserted when there is non-STAT_IDLE state in the + * SW_TWSI_EOP_TWSI_STAT register. + */ +static void thunder_i2c_int_enable(struct octeon_i2c *i2c) +{ + octeon_i2c_writeq_flush(TWSI_INT_CORE_INT, + i2c->twsi_base + TWSI_INT_ENA_W1S); +} + +/* + * Disable the CORE interrupt. + */ +static void thunder_i2c_int_disable(struct octeon_i2c *i2c) +{ + octeon_i2c_writeq_flush(TWSI_INT_CORE_INT, + i2c->twsi_base + TWSI_INT_ENA_W1C); +} + +static void thunder_i2c_hlc_int_enable(struct octeon_i2c *i
[PATCH v11 0/8] i2c-octeon and i2c-thunderx driver
Hi Wolfram, this update should address all comments. The probe funtion is much easier to read now after using the managed device functions. We also had an issue with the 5ms timeout and some ipmi/ssif commands so I dropped the timeout setting to get the 1s default. Changes to v10: - Proper use of managed device functions - Renamed files - Removed i2c class setting - Droped timeout setting - Improved Kconfig dependency - Removed most probing kernel messages - select SMBUS instead of if-defing - Removed bogus i2c pointer check - Explicitely mention GPL version - Drop whitespace fixes from MAINTAINERS patch Changes to v9: - rebased on top of upstreamed octeon i2c fixes - reduced default sclk to 700Mhz Changes to v8: - Use device property for clock-frequency setting in thunderx, get rid of of_find_node_by_name - Simplify adap.name by using device name - SMBUS ACPI handling - Re-phrase SMBUS error/not-specified message Thanks, Jan - Jan Glauber (8): i2c: octeon: Rename driver to prepare for split i2c: octeon: Split the driver into two parts i2c: thunderx: Add i2c driver for ThunderX SOC i2c: thunderx: Add SMBUS alert support i2c: octeon,thunderx: Move register offsets to struct i2c: octeon: Sort include files alphabetically i2c: octeon: Use booleon values for booleon variables i2c: octeon: thunderx: Add MAINTAINERS entry MAINTAINERS|8 + drivers/i2c/busses/Kconfig | 11 + drivers/i2c/busses/Makefile|3 + .../i2c/busses/{i2c-octeon.c => i2c-octeon-core.c} | 1039 ++-- drivers/i2c/busses/i2c-octeon-core.h | 211 drivers/i2c/busses/i2c-octeon-platdrv.c| 288 ++ drivers/i2c/busses/i2c-thunderx-pcidrv.c | 259 + 7 files changed, 1077 insertions(+), 742 deletions(-) rename drivers/i2c/busses/{i2c-octeon.c => i2c-octeon-core.c} (57%) create mode 100644 drivers/i2c/busses/i2c-octeon-core.h create mode 100644 drivers/i2c/busses/i2c-octeon-platdrv.c create mode 100644 drivers/i2c/busses/i2c-thunderx-pcidrv.c -- 2.9.0.rc0.21.g322
[PATCH v11 7/8] i2c: octeon: Use booleon values for booleon variables
Initialize booleon values with true instead of 1. Signed-off-by: Jan Glauber --- drivers/i2c/busses/i2c-octeon-core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/i2c/busses/i2c-octeon-core.c b/drivers/i2c/busses/i2c-octeon-core.c index a327a5f..f45ea5e 100644 --- a/drivers/i2c/busses/i2c-octeon-core.c +++ b/drivers/i2c/busses/i2c-octeon-core.c @@ -63,7 +63,7 @@ static bool octeon_i2c_test_ready(struct octeon_i2c *i2c, bool *first) static int octeon_i2c_wait(struct octeon_i2c *i2c) { long time_left; - bool first = 1; + bool first = true; /* * Some chip revisions don't assert the irq in the interrupt @@ -176,7 +176,7 @@ static void octeon_i2c_hlc_disable(struct octeon_i2c *i2c) */ static int octeon_i2c_hlc_wait(struct octeon_i2c *i2c) { - bool first = 1; + bool first = true; int time_left; /* -- 2.9.0.rc0.21.g322
[PATCH v11 4/8] i2c: thunderx: Add SMBUS alert support
Add SMBUS alert interrupt support. For now only device tree is supported for specifying the alert. In case of ACPI an error is returned. Signed-off-by: Jan Glauber --- drivers/i2c/busses/Kconfig | 1 + drivers/i2c/busses/i2c-octeon-core.h | 3 +++ drivers/i2c/busses/i2c-thunderx-pcidrv.c | 46 3 files changed, 50 insertions(+) diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig index d69a342..1f3239e 100644 --- a/drivers/i2c/busses/Kconfig +++ b/drivers/i2c/busses/Kconfig @@ -959,6 +959,7 @@ config I2C_OCTEON config I2C_THUNDERX tristate "Cavium ThunderX I2C bus support" depends on 64BIT && PCI && (ARM64 || COMPILE_TEST) + select I2C_SMBUS help Say yes if you want to support the I2C serial bus on Cavium ThunderX SOC. diff --git a/drivers/i2c/busses/i2c-octeon-core.h b/drivers/i2c/busses/i2c-octeon-core.h index 33c7e1f..2ed6f7a 100644 --- a/drivers/i2c/busses/i2c-octeon-core.h +++ b/drivers/i2c/busses/i2c-octeon-core.h @@ -3,6 +3,7 @@ #include #include #include +#include #include #include #include @@ -119,6 +120,8 @@ struct octeon_i2c { #if IS_ENABLED(CONFIG_I2C_THUNDERX) struct msix_entry i2c_msix; #endif + struct i2c_smbus_alert_setup alert_data; + struct i2c_client *ara; }; static inline void octeon_i2c_writeq_flush(u64 val, void __iomem *addr) diff --git a/drivers/i2c/busses/i2c-thunderx-pcidrv.c b/drivers/i2c/busses/i2c-thunderx-pcidrv.c index 0dcebe2..e8c3ce0 100644 --- a/drivers/i2c/busses/i2c-thunderx-pcidrv.c +++ b/drivers/i2c/busses/i2c-thunderx-pcidrv.c @@ -14,9 +14,11 @@ #include #include #include +#include #include #include #include +#include #include #include "i2c-octeon-core.h" @@ -107,6 +109,44 @@ static void thunder_i2c_clock_disable(struct device *dev, struct clk *clk) clk_put(clk); } +static int thunder_i2c_smbus_setup_of(struct octeon_i2c *i2c, + struct device_node *node) +{ + u32 type; + + if (!node) + return -EINVAL; + + i2c->alert_data.irq = irq_of_parse_and_map(node, 0); + if (!i2c->alert_data.irq) + return -EINVAL; + + type = irqd_get_trigger_type(irq_get_irq_data(i2c->alert_data.irq)); + i2c->alert_data.alert_edge_triggered = + (type & IRQ_TYPE_LEVEL_MASK) ? 1 : 0; + + i2c->ara = i2c_setup_smbus_alert(&i2c->adap, &i2c->alert_data); + if (!i2c->ara) + return -ENODEV; + return 0; +} + +static int thunder_i2c_smbus_setup(struct octeon_i2c *i2c, + struct device_node *node) +{ + /* TODO: ACPI support */ + if (!acpi_disabled) + return -EOPNOTSUPP; + + return thunder_i2c_smbus_setup_of(i2c, node); +} + +static void thunder_i2c_smbus_remove(struct octeon_i2c *i2c) +{ + if (i2c->ara) + i2c_unregister_device(i2c->ara); +} + static int thunder_i2c_probe_pci(struct pci_dev *pdev, const struct pci_device_id *ent) { @@ -173,6 +213,11 @@ static int thunder_i2c_probe_pci(struct pci_dev *pdev, goto error; dev_info(i2c->dev, "Probed. Set system clock to %u\n", i2c->sys_freq); + + ret = thunder_i2c_smbus_setup(i2c, pdev->dev.of_node); + if (ret) + dev_info(dev, "SMBUS alert not active on this bus\n"); + return 0; error: @@ -184,6 +229,7 @@ static void thunder_i2c_remove_pci(struct pci_dev *pdev) { struct octeon_i2c *i2c = pci_get_drvdata(pdev); + thunder_i2c_smbus_remove(i2c); thunder_i2c_clock_disable(&pdev->dev, i2c->clk); i2c_del_adapter(&i2c->adap); } -- 2.9.0.rc0.21.g322
[PATCH v11 8/8] i2c: octeon: thunderx: Add MAINTAINERS entry
The i2c Octeon and ThunderX drivers are maintained by Cavium. While at it fix the whitespace errors of the next entry. Signed-off-by: Jan Glauber Acked-by: David Daney --- MAINTAINERS | 8 1 file changed, 8 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index a306795..11cba03 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2878,6 +2878,14 @@ S: Maintained F: drivers/iio/light/cm* F: Documentation/devicetree/bindings/i2c/trivial-devices.txt +CAVIUM I2C DRIVER +M: Jan Glauber +M: David Daney +W: http://www.cavium.com +S: Supported +F: drivers/i2c/busses/i2c-octeon* +F: drivers/i2c/busses/i2c-thunderx* + CAVIUM LIQUIDIO NETWORK DRIVER M: Derek Chickles M: Satanand Burla -- 2.9.0.rc0.21.g322
[PATCH v11 5/8] i2c: octeon,thunderx: Move register offsets to struct
The register offsets are different between Octeon and ThunderX so move them into the algorithm struct and get rid of the define. Signed-off-by: Jan Glauber --- drivers/i2c/busses/i2c-octeon-core.c | 28 - drivers/i2c/busses/i2c-octeon-core.h | 35 drivers/i2c/busses/i2c-octeon-platdrv.c | 4 drivers/i2c/busses/i2c-thunderx-pcidrv.c | 4 4 files changed, 40 insertions(+), 31 deletions(-) diff --git a/drivers/i2c/busses/i2c-octeon-core.c b/drivers/i2c/busses/i2c-octeon-core.c index 23384ef..a327a5f 100644 --- a/drivers/i2c/busses/i2c-octeon-core.c +++ b/drivers/i2c/busses/i2c-octeon-core.c @@ -99,7 +99,7 @@ static int octeon_i2c_wait(struct octeon_i2c *i2c) static bool octeon_i2c_hlc_test_valid(struct octeon_i2c *i2c) { - return (__raw_readq(i2c->twsi_base + SW_TWSI) & SW_TWSI_V) == 0; + return (__raw_readq(i2c->twsi_base + SW_TWSI(i2c)) & SW_TWSI_V) == 0; } static bool octeon_i2c_hlc_test_ready(struct octeon_i2c *i2c, bool *first) @@ -446,12 +446,12 @@ static int octeon_i2c_hlc_read(struct octeon_i2c *i2c, struct i2c_msg *msgs) else cmd |= SW_TWSI_OP_7; - octeon_i2c_writeq_flush(cmd, i2c->twsi_base + SW_TWSI); + octeon_i2c_writeq_flush(cmd, i2c->twsi_base + SW_TWSI(i2c)); ret = octeon_i2c_hlc_wait(i2c); if (ret) goto err; - cmd = __raw_readq(i2c->twsi_base + SW_TWSI); + cmd = __raw_readq(i2c->twsi_base + SW_TWSI(i2c)); if ((cmd & SW_TWSI_R) == 0) return -EAGAIN; @@ -459,7 +459,7 @@ static int octeon_i2c_hlc_read(struct octeon_i2c *i2c, struct i2c_msg *msgs) msgs[0].buf[j] = (cmd >> (8 * i)) & 0xff; if (msgs[0].len > 4) { - cmd = __raw_readq(i2c->twsi_base + SW_TWSI_EXT); + cmd = __raw_readq(i2c->twsi_base + SW_TWSI_EXT(i2c)); for (i = 0; i < msgs[0].len - 4 && i < 4; i++, j--) msgs[0].buf[j] = (cmd >> (8 * i)) & 0xff; } @@ -496,15 +496,15 @@ static int octeon_i2c_hlc_write(struct octeon_i2c *i2c, struct i2c_msg *msgs) for (i = 0; i < msgs[0].len - 4 && i < 4; i++, j--) ext |= (u64)msgs[0].buf[j] << (8 * i); - octeon_i2c_writeq_flush(ext, i2c->twsi_base + SW_TWSI_EXT); + octeon_i2c_writeq_flush(ext, i2c->twsi_base + SW_TWSI_EXT(i2c)); } - octeon_i2c_writeq_flush(cmd, i2c->twsi_base + SW_TWSI); + octeon_i2c_writeq_flush(cmd, i2c->twsi_base + SW_TWSI(i2c)); ret = octeon_i2c_hlc_wait(i2c); if (ret) goto err; - cmd = __raw_readq(i2c->twsi_base + SW_TWSI); + cmd = __raw_readq(i2c->twsi_base + SW_TWSI(i2c)); if ((cmd & SW_TWSI_R) == 0) return -EAGAIN; @@ -539,19 +539,19 @@ static int octeon_i2c_hlc_comp_read(struct octeon_i2c *i2c, struct i2c_msg *msgs cmd |= SW_TWSI_EIA; ext = (u64)msgs[0].buf[0] << SW_TWSI_IA_SHIFT; cmd |= (u64)msgs[0].buf[1] << SW_TWSI_IA_SHIFT; - octeon_i2c_writeq_flush(ext, i2c->twsi_base + SW_TWSI_EXT); + octeon_i2c_writeq_flush(ext, i2c->twsi_base + SW_TWSI_EXT(i2c)); } else { cmd |= (u64)msgs[0].buf[0] << SW_TWSI_IA_SHIFT; } octeon_i2c_hlc_int_clear(i2c); - octeon_i2c_writeq_flush(cmd, i2c->twsi_base + SW_TWSI); + octeon_i2c_writeq_flush(cmd, i2c->twsi_base + SW_TWSI(i2c)); ret = octeon_i2c_hlc_wait(i2c); if (ret) goto err; - cmd = __raw_readq(i2c->twsi_base + SW_TWSI); + cmd = __raw_readq(i2c->twsi_base + SW_TWSI(i2c)); if ((cmd & SW_TWSI_R) == 0) return -EAGAIN; @@ -559,7 +559,7 @@ static int octeon_i2c_hlc_comp_read(struct octeon_i2c *i2c, struct i2c_msg *msgs msgs[1].buf[j] = (cmd >> (8 * i)) & 0xff; if (msgs[1].len > 4) { - cmd = __raw_readq(i2c->twsi_base + SW_TWSI_EXT); + cmd = __raw_readq(i2c->twsi_base + SW_TWSI_EXT(i2c)); for (i = 0; i < msgs[1].len - 4 && i < 4; i++, j--) msgs[1].buf[j] = (cmd >> (8 * i)) & 0xff; } @@ -606,16 +606,16 @@ static int octeon_i2c_hlc_comp_write(struct octeon_i2c *i2c, struct i2c_msg *msg set_ext = true; } if (set_ext) - octeon_i2c_writeq_flush(ext, i2c->twsi_base + SW_TWSI_EXT); + octeon_i2c_writeq_flush(ext, i2c->twsi_base + SW_TWSI_EXT(i2c)); octeon_i2c_hlc_int_clear(i2c); - octeon_i2c_writeq_flush(cmd, i2c->twsi_base + SW_TWSI); + octeon_i2c_writeq_flush(cmd, i2c->twsi_base +
[PATCH v11 2/8] i2c: octeon: Split the driver into two parts
Move common functionality into a separate file in preparation of the re-use from the ThunderX i2c driver. Functions are slightly re-ordered but no other changes are included. Signed-off-by: Jan Glauber --- drivers/i2c/busses/Makefile | 3 +- drivers/i2c/busses/i2c-octeon-core.c| 810 ++ drivers/i2c/busses/i2c-octeon-core.h| 197 +++ drivers/i2c/busses/i2c-octeon-platdrv.c | 973 +--- 4 files changed, 1010 insertions(+), 973 deletions(-) create mode 100644 drivers/i2c/busses/i2c-octeon-core.c create mode 100644 drivers/i2c/busses/i2c-octeon-core.h diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile index f2c9c6f..f975263 100644 --- a/drivers/i2c/busses/Makefile +++ b/drivers/i2c/busses/Makefile @@ -91,7 +91,8 @@ obj-$(CONFIG_I2C_UNIPHIER)+= i2c-uniphier.o obj-$(CONFIG_I2C_UNIPHIER_F) += i2c-uniphier-f.o obj-$(CONFIG_I2C_VERSATILE)+= i2c-versatile.o obj-$(CONFIG_I2C_WMT) += i2c-wmt.o -obj-$(CONFIG_I2C_OCTEON) += i2c-octeon-platdrv.o +i2c-octeon-objs := i2c-octeon-core.o i2c-octeon-platdrv.o +obj-$(CONFIG_I2C_OCTEON) += i2c-octeon.o obj-$(CONFIG_I2C_XILINX) += i2c-xiic.o obj-$(CONFIG_I2C_XLR) += i2c-xlr.o obj-$(CONFIG_I2C_XLP9XX) += i2c-xlp9xx.o diff --git a/drivers/i2c/busses/i2c-octeon-core.c b/drivers/i2c/busses/i2c-octeon-core.c new file mode 100644 index 000..23384ef --- /dev/null +++ b/drivers/i2c/busses/i2c-octeon-core.c @@ -0,0 +1,810 @@ +/* + * (C) Copyright 2009-2010 + * Nokia Siemens Networks, michael.lawnick@nsn.com + * + * Portions Copyright (C) 2010 - 2016 Cavium, Inc. + * + * This file contains the shared part of the driver for the i2c adapter in + * Cavium Networks' OCTEON processors and ThunderX SOCs. + * + * This file is licensed under the terms of the GNU General Public + * License version 2. This program is licensed "as is" without any + * warranty of any kind, whether express or implied. + */ + +#include +#include +#include +#include +#include + +#include "i2c-octeon-core.h" + +/* interrupt service routine */ +irqreturn_t octeon_i2c_isr(int irq, void *dev_id) +{ + struct octeon_i2c *i2c = dev_id; + + i2c->int_disable(i2c); + wake_up(&i2c->queue); + + return IRQ_HANDLED; +} + +static bool octeon_i2c_test_iflg(struct octeon_i2c *i2c) +{ + return (octeon_i2c_ctl_read(i2c) & TWSI_CTL_IFLG); +} + +static bool octeon_i2c_test_ready(struct octeon_i2c *i2c, bool *first) +{ + if (octeon_i2c_test_iflg(i2c)) + return true; + + if (*first) { + *first = false; + return false; + } + + /* +* IRQ has signaled an event but IFLG hasn't changed. +* Sleep and retry once. +*/ + usleep_range(I2C_OCTEON_EVENT_WAIT, 2 * I2C_OCTEON_EVENT_WAIT); + return octeon_i2c_test_iflg(i2c); +} + +/** + * octeon_i2c_wait - wait for the IFLG to be set + * @i2c: The struct octeon_i2c + * + * Returns 0 on success, otherwise a negative errno. + */ +static int octeon_i2c_wait(struct octeon_i2c *i2c) +{ + long time_left; + bool first = 1; + + /* +* Some chip revisions don't assert the irq in the interrupt +* controller. So we must poll for the IFLG change. +*/ + if (i2c->broken_irq_mode) { + u64 end = get_jiffies_64() + i2c->adap.timeout; + + while (!octeon_i2c_test_iflg(i2c) && + time_before64(get_jiffies_64(), end)) + usleep_range(I2C_OCTEON_EVENT_WAIT / 2, I2C_OCTEON_EVENT_WAIT); + + return octeon_i2c_test_iflg(i2c) ? 0 : -ETIMEDOUT; + } + + i2c->int_enable(i2c); + time_left = wait_event_timeout(i2c->queue, octeon_i2c_test_ready(i2c, &first), + i2c->adap.timeout); + i2c->int_disable(i2c); + + if (i2c->broken_irq_check && !time_left && + octeon_i2c_test_iflg(i2c)) { + dev_err(i2c->dev, "broken irq connection detected, switching to polling mode.\n"); + i2c->broken_irq_mode = true; + return 0; + } + + if (!time_left) + return -ETIMEDOUT; + + return 0; +} + +static bool octeon_i2c_hlc_test_valid(struct octeon_i2c *i2c) +{ + return (__raw_readq(i2c->twsi_base + SW_TWSI) & SW_TWSI_V) == 0; +} + +static bool octeon_i2c_hlc_test_ready(struct octeon_i2c *i2c, bool *first) +{ + /* check if valid bit is cleared */ + if (octeon_i2c_hlc_test_valid(i2c)) + return true; + + if (*first) { + *first = false; + return false; + } + + /* +* IRQ has signaled an event but valid bit isn't cleared. +* Sleep and retry once. +*/ + us
[PATCH v11 6/8] i2c: octeon: Sort include files alphabetically
Sort include files alphabetically to reduce probability of merge conflicts. Signed-off-by: Jan Glauber --- drivers/i2c/busses/i2c-octeon-platdrv.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/i2c/busses/i2c-octeon-platdrv.c b/drivers/i2c/busses/i2c-octeon-platdrv.c index 916ebee..ce2765f 100644 --- a/drivers/i2c/busses/i2c-octeon-platdrv.c +++ b/drivers/i2c/busses/i2c-octeon-platdrv.c @@ -12,16 +12,16 @@ */ #include -#include +#include +#include #include +#include #include #include -#include +#include +#include #include #include -#include -#include -#include #include #include "i2c-octeon-core.h" -- 2.9.0.rc0.21.g322
[PATCH v11 1/8] i2c: octeon: Rename driver to prepare for split
This is an intermediate commit in preparation of the driver split. The module rename in this commit will be reverted in the next patch, this is just done to make the series bisectible. Signed-off-by: Jan Glauber --- drivers/i2c/busses/Makefile | 2 +- drivers/i2c/busses/{i2c-octeon.c => i2c-octeon-platdrv.c} | 0 2 files changed, 1 insertion(+), 1 deletion(-) rename drivers/i2c/busses/{i2c-octeon.c => i2c-octeon-platdrv.c} (100%) diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile index 37f2819..f2c9c6f 100644 --- a/drivers/i2c/busses/Makefile +++ b/drivers/i2c/busses/Makefile @@ -91,7 +91,7 @@ obj-$(CONFIG_I2C_UNIPHIER)+= i2c-uniphier.o obj-$(CONFIG_I2C_UNIPHIER_F) += i2c-uniphier-f.o obj-$(CONFIG_I2C_VERSATILE)+= i2c-versatile.o obj-$(CONFIG_I2C_WMT) += i2c-wmt.o -obj-$(CONFIG_I2C_OCTEON) += i2c-octeon.o +obj-$(CONFIG_I2C_OCTEON) += i2c-octeon-platdrv.o obj-$(CONFIG_I2C_XILINX) += i2c-xiic.o obj-$(CONFIG_I2C_XLR) += i2c-xlr.o obj-$(CONFIG_I2C_XLP9XX) += i2c-xlp9xx.o diff --git a/drivers/i2c/busses/i2c-octeon.c b/drivers/i2c/busses/i2c-octeon-platdrv.c similarity index 100% rename from drivers/i2c/busses/i2c-octeon.c rename to drivers/i2c/busses/i2c-octeon-platdrv.c -- 2.9.0.rc0.21.g322
[PATCH v2 3/3] vfio/pci: Don't probe devices that can't be reset
If a PCI device supports neither function-level reset, nor slot or bus reset then refuse to probe it. A line is printed to inform the user. Without this change starting qemu with a vfio-pci device can lead to a kernel panic on some Cavium cn8xxx systems, depending on the used device. Signed-off-by: Jan Glauber --- drivers/vfio/pci/vfio_pci.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 063c1ce..029ba13 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -1196,6 +1196,12 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL) return -EINVAL; + ret = pci_probe_reset_bus(pdev->bus); + if (ret) { + dev_warn(&pdev->dev, "Refusing to probe because reset is not possible.\n"); + return ret; + } + group = vfio_iommu_group_get(&pdev->dev); if (!group) return -EINVAL; -- 2.9.0.rc0.21.g322
[PATCH v2 0/3] Workaround for bus reset on Cavium cn8xxx root ports
I've picked this up from David, keeping his patches but preventing to probe vfio-pci devices that can't be reset. Without this series starting qemu with a vfio-pci enabled device can lead to a kernel panic on Cavium systems, depending on the used hardware. Changes to v1 (https://lkml.org/lkml/2017/5/15/934): - Prevent probing by vfio-pci David Daney (2): PCI: Allow PCI_DEV_FLAGS_NO_BUS_RESET to be used on bus device PCI: Avoid bus reset for Cavium cn8xxx root ports Jan Glauber (1): vfio/pci: Don't probe devices that can't be reset drivers/pci/pci.c | 4 drivers/pci/quirks.c| 8 drivers/vfio/pci/vfio_pci.c | 6 ++ 3 files changed, 18 insertions(+) -- 2.9.0.rc0.21.g322
[PATCH v2 2/3] PCI: Avoid bus reset for Cavium cn8xxx root ports
From: David Daney Root ports of cn8xxx do not function after bus reset when used with some e1000e and LSI HBA devices. Add a quirk to prevent bus reset on these root ports. Signed-off-by: David Daney [jglau...@cavium.com: fixed typo and whitespaces] Signed-off-by: Jan Glauber --- drivers/pci/quirks.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 6967c6b..85191b8 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3364,6 +3364,14 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0032, quirk_no_bus_reset); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0033, quirk_no_bus_reset); +/* + * Root port on some Cavium CN8xxx chips do not successfully complete + * a bus reset when used with certain types of child devices. Config + * space access to the child may quit responding. Flag the root port + * as not supporting bus reset. + */ +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset); + static void quirk_no_pm_reset(struct pci_dev *dev) { /* -- 2.9.0.rc0.21.g322
[PATCH v2 1/3] PCI: Allow PCI_DEV_FLAGS_NO_BUS_RESET to be used on bus device
From: David Daney When checking to see if a PCI bus can safely be reset, we check to see if any of the children have their PCI_DEV_FLAGS_NO_BUS_RESET flag set. As these devices are known not to behave well after a bus reset. Some PCIe root port bridges also do not behave well after a bus reset, sometimes causing the devices behind the bridge to become unusable. Add a check for the PCI_DEV_FLAGS_NO_BUS_RESET flag being set in the bridge device to allow these bridges to be flagged, and prevent their buses from being reset. A follow on patch will add a quirk for this type of bridge. Signed-off-by: David Daney [jglau...@cavium.com: fixed typo] Signed-off-by: Jan Glauber --- drivers/pci/pci.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index af0cc34..d9abbc9 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4290,6 +4290,10 @@ static bool pci_bus_resetable(struct pci_bus *bus) { struct pci_dev *dev; + + if (bus->self && (bus->self->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)) + return false; + list_for_each_entry(dev, &bus->devices, bus_list) { if (dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET || (dev->subordinate && !pci_bus_resetable(dev->subordinate))) -- 2.9.0.rc0.21.g322
Re: net/sunrpc: v4.14-rc4 lockdep warning
Hi Trond, is there a patch available for this issue? I'm running into with 4.14-rc5 on my ARM64 board. thanks, Jan 2017-10-11 19:49 GMT+02:00 Trond Myklebust : > On Tue, 2017-10-10 at 10:19 -0700, t...@kernel.org wrote: >> Hello, >> >> On Tue, Oct 10, 2017 at 04:48:57PM +, Trond Myklebust wrote: >> > Thanks for the explanation. What I'm not really understanding here >> > though, is how the work item could be queued at all. We have a >> > wait_on_bit_lock() in xprt_destroy() that should mean the xprt- >> > > task_cleanup work item has completed running, and that it cannot >> > > be >> > >> > requeued. >> > >> > Is there a possibility that the flush_queue() might be triggered >> > despite the work item not being queued? >> >> Yeah, for sure. The lockdep annotations don't distinguish those >> cases and assume the worst case. >> > > OK. Let's just remove that call to cancel_work_sync() then. As I said, > it should be redundant due to the wait_on_bit_lock(). > > -- > Trond Myklebust > Linux NFS client maintainer, PrimaryData > trond.mykleb...@primarydata.com
[PATCH] MAINTAINERS: Split Cavium EDAC entry and add myself
Split the Cavium EDAC entry into MIPS and ARM drivers because they have different maintainers and mailing lists. Add myself as additional maintainer to the ThunderX driver. Signed-off-by: Jan Glauber --- MAINTAINERS | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index a74227a..9ea8b89 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4906,13 +4906,19 @@ L: linux-e...@vger.kernel.org S: Maintained F: drivers/edac/highbank* -EDAC-CAVIUM +EDAC-CAVIUM OCTEON M: Ralf Baechle M: David Daney L: linux-e...@vger.kernel.org L: linux-m...@linux-mips.org S: Supported F: drivers/edac/octeon_edac* + +EDAC-CAVIUM THUNDERX +M: David Daney +M: Jan Glauber +L: linux-e...@vger.kernel.org +S: Supported F: drivers/edac/thunderx_edac* EDAC-CORE -- 1.9.1
Re: [PATCH 3/3] i2c: xlp9xx: Add support for SMBAlert
On Tue, Feb 27, 2018 at 01:26:20PM +, George Cherian wrote: > Add support for SMBus alert mechanism to i2c-xlp9xx driver. > The second interrupt is parsed to use for SMBus alert. > The first interrupt is the i2c controller main interrupt. > > Signed-off-by: Kamlakant Patel > Signed-off-by: George Cherian > --- > drivers/i2c/busses/i2c-xlp9xx.c | 26 ++ > 1 file changed, 26 insertions(+) > > diff --git a/drivers/i2c/busses/i2c-xlp9xx.c b/drivers/i2c/busses/i2c-xlp9xx.c > index eb8913e..9462eab 100644 > --- a/drivers/i2c/busses/i2c-xlp9xx.c > +++ b/drivers/i2c/busses/i2c-xlp9xx.c > @@ -10,6 +10,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -84,6 +85,8 @@ struct xlp9xx_i2c_dev { > struct device *dev; > struct i2c_adapter adapter; > struct completion msg_complete; > + struct i2c_smbus_alert_setup alert_data; > + struct i2c_client *ara; > int irq; > bool msg_read; > bool len_recv; > @@ -447,6 +450,21 @@ static int xlp9xx_i2c_get_frequency(struct > platform_device *pdev, > return 0; > } > > +static int xlp9xx_i2c_smbus_setup(struct xlp9xx_i2c_dev *priv, > + struct platform_device *pdev) > +{ > + if (!priv->alert_data.irq) > + return -EINVAL; > + > + priv->alert_data.alert_edge_triggered = 0; Hi George, I think this is not needed anymore, see: 9b9f2b8bc2ac i2c: i2c-smbus: Use threaded irq for smbalert --Jan > + > + priv->ara = i2c_setup_smbus_alert(&priv->adapter, &priv->alert_data); > + if (!priv->ara) > + return -ENODEV; > + > + return 0; > +} > + > static int xlp9xx_i2c_probe(struct platform_device *pdev) > { > struct xlp9xx_i2c_dev *priv; > @@ -467,6 +485,10 @@ static int xlp9xx_i2c_probe(struct platform_device *pdev) > dev_err(&pdev->dev, "invalid irq!\n"); > return priv->irq; > } > + /* SMBAlert irq */ > + priv->alert_data.irq = platform_get_irq(pdev, 1); > + if (priv->alert_data.irq <= 0) > + priv->alert_data.irq = 0; > > xlp9xx_i2c_get_frequency(pdev, priv); > xlp9xx_i2c_init(priv); > @@ -493,6 +515,10 @@ static int xlp9xx_i2c_probe(struct platform_device *pdev) > if (err) > return err; > > + err = xlp9xx_i2c_smbus_setup(priv, pdev); > + if (err) > + dev_info(&pdev->dev, "No active SMBus alert %d\n", err); > + > platform_set_drvdata(pdev, priv); > dev_dbg(&pdev->dev, "I2C bus:%d added\n", priv->adapter.nr); > > -- > 2.1.4
Re: [PATCHv2 1/3] i2c: xlp9xx: Check for Bus state before every transfer
I don't know how valuable same-company reviewed-by's are in the end, but the patches look good to me with the small change in SMBalert, so you could add: Reviewed-by: Jan Glauber --Jan On Tue, Feb 27, 2018 at 01:26:18PM +, George Cherian wrote: > I2C bus enters the STOP condition after the DATA_DONE interrupt is raised. > Essentially the driver should be checking the bus state before sending > any transaction. In case a transaction is initiated while the > bus is busy, the prior transaction's stop condition is not achieved. > Add the check to make sure the bus is not busy before every transaction. > > Signed-off-by: George Cherian > --- > drivers/i2c/busses/i2c-xlp9xx.c | 32 > 1 file changed, 32 insertions(+) > > diff --git a/drivers/i2c/busses/i2c-xlp9xx.c b/drivers/i2c/busses/i2c-xlp9xx.c > index 1f6d780..42dd1fa 100644 > --- a/drivers/i2c/busses/i2c-xlp9xx.c > +++ b/drivers/i2c/busses/i2c-xlp9xx.c > @@ -16,6 +16,7 @@ > #include > #include > #include > +#include > > #define XLP9XX_I2C_DIV 0x0 > #define XLP9XX_I2C_CTRL 0x1 > @@ -36,6 +37,8 @@ > #define XLP9XX_I2C_TIMEOUT 0X10 > #define XLP9XX_I2C_GENCALLADDR 0x11 > > +#define XLP9XX_I2C_STATUS_BUSY BIT(0) > + > #define XLP9XX_I2C_CMD_START BIT(7) > #define XLP9XX_I2C_CMD_STOP BIT(6) > #define XLP9XX_I2C_CMD_READ BIT(5) > @@ -71,6 +74,7 @@ > #define XLP9XX_I2C_HIGH_FREQ 40 > #define XLP9XX_I2C_FIFO_SIZE 0x80U > #define XLP9XX_I2C_TIMEOUT_MS1000 > +#define XLP9XX_I2C_BUSY_TIMEOUT 50 > > #define XLP9XX_I2C_FIFO_WCNT_MASK0xff > #define XLP9XX_I2C_STATUS_ERRMASK(XLP9XX_I2C_INTEN_ARLOST | \ > @@ -241,6 +245,26 @@ static irqreturn_t xlp9xx_i2c_isr(int irq, void *dev_id) > return IRQ_HANDLED; > } > > +static int xlp9xx_i2c_check_bus_status(struct xlp9xx_i2c_dev *priv) > +{ > + u32 status; > + u32 busy_timeout = XLP9XX_I2C_BUSY_TIMEOUT; > + > + while (busy_timeout) { > + status = xlp9xx_read_i2c_reg(priv, XLP9XX_I2C_STATUS); > + if ((status & XLP9XX_I2C_STATUS_BUSY) == 0) > + break; > + > + busy_timeout--; > + usleep_range(1000, 1100); > + } > + > + if (!busy_timeout) > + return -EIO; > + > + return 0; > +} > + > static int xlp9xx_i2c_init(struct xlp9xx_i2c_dev *priv) > { > u32 prescale; > @@ -363,6 +387,14 @@ static int xlp9xx_i2c_xfer(struct i2c_adapter *adap, > struct i2c_msg *msgs, > int i, ret; > struct xlp9xx_i2c_dev *priv = i2c_get_adapdata(adap); > > + ret = xlp9xx_i2c_check_bus_status(priv); > + if (ret) { > + xlp9xx_i2c_init(priv); > + ret = xlp9xx_i2c_check_bus_status(priv); > + if (ret) > + return ret; > + } > + > for (i = 0; i < num; i++) { > ret = xlp9xx_i2c_xfer_msg(priv, &msgs[i], i == num - 1); > if (ret != 0) > -- > 2.1.4
Re: [PATCH 2/2] arm64: defconfig: Raise NR_CPUS to 256
On Tue, Mar 06, 2018 at 02:12:29PM +0100, Arnd Bergmann wrote: > On Fri, Mar 2, 2018 at 3:37 PM, Jan Glauber wrote: > > ThunderX1 dual socket has 96 CPUs and ThunderX2 has 224 CPUs. > > Are you sure about those numbers? From my counting, I would have expected > twice that number in both cases: 48 cores, 2 chips and 2x SMT for ThunderX > vs 52 Cores, 2 chips and 4x SMT for ThunderX2. That's what I have on those machines. I counted SMT as normal CPUs as it doesn't make a difference for the config. I've not seen SMT on ThunderX. The ThunderX2 number of 224 is already with 4x SMT (and 2 chips) but there may be other versions planned that I'm not aware of. > > Therefore raise the default number of CPUs from 64 to 256 > > by adding an arm64 specific option to override the generic default. > > Regardless of what the correct numbers for your chips are, I'd like > to hear some other opinions on how high we should raise that default > limit, both in arch/arm64/Kconfig and in the defconfig file. > > As I remember it, there is a noticeable cost for taking the limit beyond > BITS_PER_LONG, both in terms of memory consumption and also > runtime performance (copying and comparing CPU masks). OK, that explains the default. My unverified assumption is that increasing the CPU masks wont be a noticable performance hit. Also, I don't think that anyone who wants performance will use defconfig. All server distributions would bump up the NR_CPUS anyway and really small systems will probably need to tune the config anyway. For me defconfig should produce a usable system, not with every last driver configured but with all the basics like CPUs, networking, etc. fully present. > I'm sure someone will keep coming up with even larger configurations > in the future, so we should try to decide how far we can take the > defaults for the moment without impacting users of the smallest > systems. Alternatively, you could add some measurements that > show how much memory and CPU time is used up on a typical > configuration for a small system (4 cores, no SMT, 512 MB RAM). > If that's low enough, we could just do it anyway. OK, I'll take a look. --Jan
Re: [PATCH 2/2] arm64: defconfig: Raise NR_CPUS to 256
On Tue, Mar 06, 2018 at 03:02:01PM +0100, Jan Glauber wrote: > On Tue, Mar 06, 2018 at 02:12:29PM +0100, Arnd Bergmann wrote: > > On Fri, Mar 2, 2018 at 3:37 PM, Jan Glauber wrote: > > > ThunderX1 dual socket has 96 CPUs and ThunderX2 has 224 CPUs. > > > > Are you sure about those numbers? From my counting, I would have expected > > twice that number in both cases: 48 cores, 2 chips and 2x SMT for ThunderX > > vs 52 Cores, 2 chips and 4x SMT for ThunderX2. > > That's what I have on those machines. I counted SMT as normal CPUs as it > doesn't make a difference for the config. I've not seen SMT on ThunderX. > > The ThunderX2 number of 224 is already with 4x SMT (and 2 chips) but > there may be other versions planned that I'm not aware of. > > > > Therefore raise the default number of CPUs from 64 to 256 > > > by adding an arm64 specific option to override the generic default. > > > > Regardless of what the correct numbers for your chips are, I'd like > > to hear some other opinions on how high we should raise that default > > limit, both in arch/arm64/Kconfig and in the defconfig file. > > > > As I remember it, there is a noticeable cost for taking the limit beyond > > BITS_PER_LONG, both in terms of memory consumption and also > > runtime performance (copying and comparing CPU masks). > > OK, that explains the default. My unverified assumption is that > increasing the CPU masks wont be a noticable performance hit. > > Also, I don't think that anyone who wants performance will use > defconfig. All server distributions would bump up the NR_CPUS anyway > and really small systems will probably need to tune the config > anyway. > > For me defconfig should produce a usable system, not with every last > driver configured but with all the basics like CPUs, networking, etc. > fully present. > > > I'm sure someone will keep coming up with even larger configurations > > in the future, so we should try to decide how far we can take the > > defaults for the moment without impacting users of the smallest > > systems. Alternatively, you could add some measurements that > > show how much memory and CPU time is used up on a typical > > configuration for a small system (4 cores, no SMT, 512 MB RAM). > > If that's low enough, we could just do it anyway. > > OK, I'll take a look. I've made some measurements on a 4 core board (Cavium 81xx) with NR_CPUS set to 64 or 256: - vmlinux grows by 0.04 % with 256 CPUs - Kernel compile time was a bit faster with 256 CPUS (which does not make sense, but at least is seems to not suffer from the change). Is there a benchmark that will be better suited? Maybe even a microbenchmark that will suffer from the longer cpumasks? - Available memory decreased by 0.13% (restricted memory to 512 MB), BSS increased 5.3 % Cheers, Jan
Re: [PATCH 2/2] arm64: defconfig: Raise NR_CPUS to 256
On Mon, Mar 26, 2018 at 11:28:28AM +0200, Arnd Bergmann wrote: > On Mon, Mar 26, 2018 at 10:52 AM, Jan Glauber > wrote: > > On Tue, Mar 06, 2018 at 03:02:01PM +0100, Jan Glauber wrote: > >> On Tue, Mar 06, 2018 at 02:12:29PM +0100, Arnd Bergmann wrote: > >> > On Fri, Mar 2, 2018 at 3:37 PM, Jan Glauber wrote: > >> > > ThunderX1 dual socket has 96 CPUs and ThunderX2 has 224 CPUs. > >> > > >> > Are you sure about those numbers? From my counting, I would have expected > >> > twice that number in both cases: 48 cores, 2 chips and 2x SMT for > >> > ThunderX > >> > vs 52 Cores, 2 chips and 4x SMT for ThunderX2. > >> > >> That's what I have on those machines. I counted SMT as normal CPUs as it > >> doesn't make a difference for the config. I've not seen SMT on ThunderX. > >> > >> The ThunderX2 number of 224 is already with 4x SMT (and 2 chips) but > >> there may be other versions planned that I'm not aware of. > >> > >> > > Therefore raise the default number of CPUs from 64 to 256 > >> > > by adding an arm64 specific option to override the generic default. > >> > > >> > Regardless of what the correct numbers for your chips are, I'd like > >> > to hear some other opinions on how high we should raise that default > >> > limit, both in arch/arm64/Kconfig and in the defconfig file. > >> > > >> > As I remember it, there is a noticeable cost for taking the limit beyond > >> > BITS_PER_LONG, both in terms of memory consumption and also > >> > runtime performance (copying and comparing CPU masks). > >> > >> OK, that explains the default. My unverified assumption is that > >> increasing the CPU masks wont be a noticable performance hit. > >> > >> Also, I don't think that anyone who wants performance will use > >> defconfig. All server distributions would bump up the NR_CPUS anyway > >> and really small systems will probably need to tune the config > >> anyway. > >> > >> For me defconfig should produce a usable system, not with every last > >> driver configured but with all the basics like CPUs, networking, etc. > >> fully present. > >> > >> > I'm sure someone will keep coming up with even larger configurations > >> > in the future, so we should try to decide how far we can take the > >> > defaults for the moment without impacting users of the smallest > >> > systems. Alternatively, you could add some measurements that > >> > show how much memory and CPU time is used up on a typical > >> > configuration for a small system (4 cores, no SMT, 512 MB RAM). > >> > If that's low enough, we could just do it anyway. > >> > >> OK, I'll take a look. > > > > I've made some measurements on a 4 core board (Cavium 81xx) with > > NR_CPUS set to 64 or 256: > > > > - vmlinux grows by 0.04 % with 256 CPUs > > Ok. Is this both with CONFIG_CPUMASK_OFFSTACK=n? Yes. > > - Kernel compile time was a bit faster with 256 CPUS (which does > > not make sense, but at least is seems to not suffer from the change). > > Do you mean compiling the same kernel configuration while running > on a system with less than 64 CPUs on either a CONFIG_NR_CPUS=64 > or CONFIG_NR_PCUS=256 kernel, or do you mean the time to compile > a kernel with either CONFIG_NR_CPUS=64 or CONFIG_NR_CPUS=256, > while running on the same host? The former, compiling everything on a 4-core system using two different kernels to compile the same thing. > I assume the former, which is a very interesting result, possibly > pointing to us doing something wrong in the NR_CPUS=64 case > that could be optimized. > > If you ran with CONFIG_CPUMASK_OFFSTACK, that may have made > a significant difference, but I would expect it to be faster without it. > > To get more insight to what is happening, you could rerun the same > test with 'perf record' and then compare the profiles. How significant > is the runtime difference compared to the jitter you get between normal > runs on the same configuration? I did retry once but the odd case that CONFIG_NR_CPUS=256 was faster was consistent. The difference was very small though so it may be completely due to jitter. > > Is there a benchmark that will be better suited? Maybe even a > > microbenchmark that will suffer from the longer cpumasks? > > Good question. > > > - Available memory decreased by 0.13% (restricted memory to 512 MB), > > BSS increased 5.3 % > > 0.13% of a few hundred megabytes is still several hundred kb, right? I'd > like to hear some other opinions on that, but it seems to be in the > range of enabling many additional device drivers, which is something > we don't do lightly. Agreed, available memory was reduced by 128 KB. --Jan
[PATCH] crypto: testmgr: Allow different compression results
From: Mahipal Challa The following error is triggered by the ThunderX ZIP driver if the testmanager is enabled: [ 199.069437] ThunderX-ZIP :03:00.0: Found ZIP device 0 177d:a01a on Node 0 [ 199.073573] alg: comp: Compression test 1 failed for deflate-generic: output len = 37 The reason for this error is the verification of the compression results. Verifying the compression result only works if all algorithm parameters are identical, in this case to the software implementation. Different compression engines like the ThunderX ZIP coprocessor might yield different compression results by tuning the algorithm parameters. In our case the compressed result is shorter than the test vector. We should not forbid different compression results but only check that compression -> decompression yields the same result. This is done already in the acomp test. Do something similar for test_comp(). Signed-off-by: Mahipal Challa Signed-off-by: Balakrishna Bhamidipati [jglau...@cavium.com: removed unrelated printk changes, rewrote commit msg, fixed whitespace and unneeded initialization] Signed-off-by: Jan Glauber --- crypto/testmgr.c | 50 +- 1 file changed, 37 insertions(+), 13 deletions(-) diff --git a/crypto/testmgr.c b/crypto/testmgr.c index af4a01c..627e82e 100644 --- a/crypto/testmgr.c +++ b/crypto/testmgr.c @@ -1342,19 +1342,30 @@ static int test_comp(struct crypto_comp *tfm, int ctcount, int dtcount) { const char *algo = crypto_tfm_alg_driver_name(crypto_comp_tfm(tfm)); + char *output, *decomp_output; unsigned int i; - char result[COMP_BUF_SIZE]; int ret; + output = kmalloc(COMP_BUF_SIZE, GFP_KERNEL); + if (!output) + return -ENOMEM; + + decomp_output = kmalloc(COMP_BUF_SIZE, GFP_KERNEL); + if (!decomp_output) { + kfree(output); + return -ENOMEM; + } + for (i = 0; i < ctcount; i++) { int ilen; unsigned int dlen = COMP_BUF_SIZE; - memset(result, 0, sizeof (result)); + memset(output, 0, sizeof(COMP_BUF_SIZE)); + memset(decomp_output, 0, sizeof(COMP_BUF_SIZE)); ilen = ctemplate[i].inlen; ret = crypto_comp_compress(tfm, ctemplate[i].input, - ilen, result, &dlen); + ilen, output, &dlen); if (ret) { printk(KERN_ERR "alg: comp: compression failed " "on test %d for %s: ret=%d\n", i + 1, algo, @@ -1362,7 +1373,17 @@ static int test_comp(struct crypto_comp *tfm, goto out; } - if (dlen != ctemplate[i].outlen) { + ilen = dlen; + dlen = COMP_BUF_SIZE; + ret = crypto_comp_decompress(tfm, output, +ilen, decomp_output, &dlen); + if (ret) { + pr_err("alg: comp: compression failed: decompress: on test %d for %s failed: ret=%d\n", + i + 1, algo, -ret); + goto out; + } + + if (dlen != ctemplate[i].inlen) { printk(KERN_ERR "alg: comp: Compression test %d " "failed for %s: output len = %d\n", i + 1, algo, dlen); @@ -1370,10 +1391,11 @@ static int test_comp(struct crypto_comp *tfm, goto out; } - if (memcmp(result, ctemplate[i].output, dlen)) { - printk(KERN_ERR "alg: comp: Compression test %d " - "failed for %s\n", i + 1, algo); - hexdump(result, dlen); + if (memcmp(decomp_output, ctemplate[i].input, + ctemplate[i].inlen)) { + pr_err("alg: comp: compression failed: output differs: on test %d for %s\n", + i + 1, algo); + hexdump(decomp_output, dlen); ret = -EINVAL; goto out; } @@ -1383,11 +1405,11 @@ static int test_comp(struct crypto_comp *tfm, int ilen; unsigned int dlen = COMP_BUF_SIZE; - memset(result, 0, sizeof (result)); + memset(decomp_output, 0, sizeof(COMP_BUF_SIZE)); ilen = dtemplate[i].inlen; ret = crypto_comp_decompress(tfm, dtemplate[i].input, -ilen, result, &dlen); +ilen, decomp_output, &dlen); if (ret) {
[PATCH v2 0/5] ThunderX ZIP driver bug fixes
Some bug fixes for this driver after it stopped working with virtual mapped stacks. I think the first two patches qualify for stable. Jan Glauber (5): crypto: thunderx_zip: Fix fallout from CONFIG_VMAP_STACK crypto: thunderx_zip: Limit result reading attempts crypto: thunderx_zip: Prevent division by zero crypto: thunderx_zip: Fix statistics pending request value crypto: thunderx_zip: Fix smp_processor_id() warnings drivers/crypto/cavium/zip/common.h | 21 + drivers/crypto/cavium/zip/zip_crypto.c | 22 ++ drivers/crypto/cavium/zip/zip_deflate.c | 4 ++-- drivers/crypto/cavium/zip/zip_device.c | 4 ++-- drivers/crypto/cavium/zip/zip_inflate.c | 4 ++-- drivers/crypto/cavium/zip/zip_main.c| 24 +++- drivers/crypto/cavium/zip/zip_main.h| 1 - 7 files changed, 52 insertions(+), 28 deletions(-) -- 2.16.2
[PATCH v2 2/5] crypto: thunderx_zip: Limit result reading attempts
After issuing a request an endless loop was used to read the completion state from memory which is asynchronously updated by the ZIP coprocessor. Add an upper bound to the retry attempts to prevent a CPU getting stuck forever in case of an error. Additionally, add a read memory barrier and a small delay between the reading attempts. Signed-off-by: Jan Glauber Reviewed-by: Robert Richter Cc: stable # 4.14 --- drivers/crypto/cavium/zip/common.h | 21 + drivers/crypto/cavium/zip/zip_deflate.c | 4 ++-- drivers/crypto/cavium/zip/zip_inflate.c | 4 ++-- 3 files changed, 25 insertions(+), 4 deletions(-) diff --git a/drivers/crypto/cavium/zip/common.h b/drivers/crypto/cavium/zip/common.h index dc451e0a43c5..58fb3ed6e644 100644 --- a/drivers/crypto/cavium/zip/common.h +++ b/drivers/crypto/cavium/zip/common.h @@ -46,8 +46,10 @@ #ifndef __COMMON_H__ #define __COMMON_H__ +#include #include #include +#include #include #include #include @@ -149,6 +151,25 @@ struct zip_operation { u32 sizeofzops; }; +static inline int zip_poll_result(union zip_zres_s *result) +{ + int retries = 1000; + + while (!result->s.compcode) { + if (!--retries) { + pr_err("ZIP ERR: request timed out"); + return -ETIMEDOUT; + } + udelay(10); + /* +* Force re-reading of compcode which is updated +* by the ZIP coprocessor. +*/ + rmb(); + } + return 0; +} + /* error messages */ #define zip_err(fmt, args...) pr_err("ZIP ERR:%s():%d: " \ fmt "\n", __func__, __LINE__, ## args) diff --git a/drivers/crypto/cavium/zip/zip_deflate.c b/drivers/crypto/cavium/zip/zip_deflate.c index 9a944b8c1e29..d7133f857d67 100644 --- a/drivers/crypto/cavium/zip/zip_deflate.c +++ b/drivers/crypto/cavium/zip/zip_deflate.c @@ -129,8 +129,8 @@ int zip_deflate(struct zip_operation *zip_ops, struct zip_state *s, /* Stats update for compression requests submitted */ atomic64_inc(&zip_dev->stats.comp_req_submit); - while (!result_ptr->s.compcode) - continue; + /* Wait for completion or error */ + zip_poll_result(result_ptr); /* Stats update for compression requests completed */ atomic64_inc(&zip_dev->stats.comp_req_complete); diff --git a/drivers/crypto/cavium/zip/zip_inflate.c b/drivers/crypto/cavium/zip/zip_inflate.c index 50cbdd83dbf2..7e0d73e2f89e 100644 --- a/drivers/crypto/cavium/zip/zip_inflate.c +++ b/drivers/crypto/cavium/zip/zip_inflate.c @@ -143,8 +143,8 @@ int zip_inflate(struct zip_operation *zip_ops, struct zip_state *s, /* Decompression requests submitted stats update */ atomic64_inc(&zip_dev->stats.decomp_req_submit); - while (!result_ptr->s.compcode) - continue; + /* Wait for completion or error */ + zip_poll_result(result_ptr); /* Decompression requests completed stats update */ atomic64_inc(&zip_dev->stats.decomp_req_complete); -- 2.16.2
[PATCH v2 4/5] crypto: thunderx_zip: Fix statistics pending request value
The pending request counter was read from the wrong register. While at it, there is no need to use an atomic for it as it is only read localy in a loop. Signed-off-by: Jan Glauber Reviewed-by: Robert Richter --- drivers/crypto/cavium/zip/zip_main.c | 13 + drivers/crypto/cavium/zip/zip_main.h | 1 - 2 files changed, 5 insertions(+), 9 deletions(-) diff --git a/drivers/crypto/cavium/zip/zip_main.c b/drivers/crypto/cavium/zip/zip_main.c index 79b449e0f955..ae5b20c695ca 100644 --- a/drivers/crypto/cavium/zip/zip_main.c +++ b/drivers/crypto/cavium/zip/zip_main.c @@ -469,6 +469,8 @@ static int zip_show_stats(struct seq_file *s, void *unused) struct zip_stats *st; for (index = 0; index < MAX_ZIP_DEVICES; index++) { + u64 pending = 0; + if (zip_dev[index]) { zip = zip_dev[index]; st = &zip->stats; @@ -476,10 +478,8 @@ static int zip_show_stats(struct seq_file *s, void *unused) /* Get all the pending requests */ for (q = 0; q < ZIP_NUM_QUEUES; q++) { val = zip_reg_read((zip->reg_base + - ZIP_DBG_COREX_STA(q))); - val = (val >> 32); - val = val & 0xff; - atomic64_add(val, &st->pending_req); + ZIP_DBG_QUEX_STA(q))); + pending += val >> 32 & 0xff; } val = atomic64_read(&st->comp_req_complete); @@ -514,10 +514,7 @@ static int zip_show_stats(struct seq_file *s, void *unused) (u64)atomic64_read(&st->decomp_in_bytes), (u64)atomic64_read(&st->decomp_out_bytes), (u64)atomic64_read(&st->decomp_bad_reqs), - (u64)atomic64_read(&st->pending_req)); - - /* Reset pending requests count */ - atomic64_set(&st->pending_req, 0); + pending); } } return 0; diff --git a/drivers/crypto/cavium/zip/zip_main.h b/drivers/crypto/cavium/zip/zip_main.h index 64e051f60784..e1e4fa92ce80 100644 --- a/drivers/crypto/cavium/zip/zip_main.h +++ b/drivers/crypto/cavium/zip/zip_main.h @@ -74,7 +74,6 @@ struct zip_stats { atomic64_tcomp_req_complete; atomic64_tdecomp_req_submit; atomic64_tdecomp_req_complete; - atomic64_tpending_req; atomic64_tcomp_in_bytes; atomic64_tcomp_out_bytes; atomic64_tdecomp_in_bytes; -- 2.16.2
[PATCH v2 3/5] crypto: thunderx_zip: Prevent division by zero
Avoid two potential divisions by zero when calculating average values for the zip statistics. Signed-off-by: Jan Glauber Reviewed-by: Robert Richter --- drivers/crypto/cavium/zip/zip_main.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/crypto/cavium/zip/zip_main.c b/drivers/crypto/cavium/zip/zip_main.c index 1cd8aa488185..79b449e0f955 100644 --- a/drivers/crypto/cavium/zip/zip_main.c +++ b/drivers/crypto/cavium/zip/zip_main.c @@ -482,10 +482,11 @@ static int zip_show_stats(struct seq_file *s, void *unused) atomic64_add(val, &st->pending_req); } - avg_chunk = (atomic64_read(&st->comp_in_bytes) / -atomic64_read(&st->comp_req_complete)); - avg_cr = (atomic64_read(&st->comp_in_bytes) / - atomic64_read(&st->comp_out_bytes)); + val = atomic64_read(&st->comp_req_complete); + avg_chunk = (val) ? atomic64_read(&st->comp_in_bytes) / val : 0; + + val = atomic64_read(&st->comp_out_bytes); + avg_cr = (val) ? atomic64_read(&st->comp_in_bytes) / val : 0; seq_printf(s, "ZIP Device %d Stats\n" "---\n" "Comp Req Submitted: \t%lld\n" -- 2.16.2
[PATCH v2 5/5] crypto: thunderx_zip: Fix smp_processor_id() warnings
Switch to raw_smp_processor_id() to prevent a number of warnings from kernel debugging. We do not care about preemption here, as the CPU number is only used as a poor mans load balancing or device selection. If preemption happens during a compress/decompress operation a small performance hit will occur but everything will continue to work, so just ignore it. Signed-off-by: Jan Glauber Reviewed-by: Robert Richter --- drivers/crypto/cavium/zip/zip_device.c | 4 ++-- drivers/crypto/cavium/zip/zip_main.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/crypto/cavium/zip/zip_device.c b/drivers/crypto/cavium/zip/zip_device.c index ccf21fb91513..f174ec29ed69 100644 --- a/drivers/crypto/cavium/zip/zip_device.c +++ b/drivers/crypto/cavium/zip/zip_device.c @@ -87,12 +87,12 @@ u32 zip_load_instr(union zip_inst_s *instr, * Distribute the instructions between the enabled queues based on * the CPU id. */ - if (smp_processor_id() % 2 == 0) + if (raw_smp_processor_id() % 2 == 0) queue = 0; else queue = 1; - zip_dbg("CPU Core: %d Queue number:%d", smp_processor_id(), queue); + zip_dbg("CPU Core: %d Queue number:%d", raw_smp_processor_id(), queue); /* Take cmd buffer lock */ spin_lock(&zip_dev->iq[queue].lock); diff --git a/drivers/crypto/cavium/zip/zip_main.c b/drivers/crypto/cavium/zip/zip_main.c index ae5b20c695ca..be055b9547f6 100644 --- a/drivers/crypto/cavium/zip/zip_main.c +++ b/drivers/crypto/cavium/zip/zip_main.c @@ -113,7 +113,7 @@ struct zip_device *zip_get_device(int node) */ int zip_get_node_id(void) { - return cpu_to_node(smp_processor_id()); + return cpu_to_node(raw_smp_processor_id()); } /* Initializes the ZIP h/w sub-system */ -- 2.16.2
[PATCH v2 1/5] crypto: thunderx_zip: Fix fallout from CONFIG_VMAP_STACK
Enabling virtual mapped kernel stacks breaks the thunderx_zip driver. On compression or decompression the executing CPU hangs in an endless loop. The reason for this is the usage of __pa by the driver which does no longer work for an address that is not part of the 1:1 mapping. The zip driver allocates a result struct on the stack and needs to tell the hardware the physical address within this struct that is used to signal the completion of the request. As the hardware gets the wrong address after the broken __pa conversion it writes to an arbitrary address. The zip driver then waits forever for the completion byte to contain a non-zero value. Allocating the result struct from 1:1 mapped memory resolves this bug. Signed-off-by: Jan Glauber Reviewed-by: Robert Richter Cc: stable # 4.14 --- drivers/crypto/cavium/zip/zip_crypto.c | 22 ++ 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/drivers/crypto/cavium/zip/zip_crypto.c b/drivers/crypto/cavium/zip/zip_crypto.c index 8df4d26cf9d4..b92b6e7e100f 100644 --- a/drivers/crypto/cavium/zip/zip_crypto.c +++ b/drivers/crypto/cavium/zip/zip_crypto.c @@ -124,7 +124,7 @@ int zip_compress(const u8 *src, unsigned int slen, struct zip_kernel_ctx *zip_ctx) { struct zip_operation *zip_ops = NULL; - struct zip_state zip_state; + struct zip_state *zip_state; struct zip_device *zip = NULL; int ret; @@ -135,20 +135,23 @@ int zip_compress(const u8 *src, unsigned int slen, if (!zip) return -ENODEV; - memset(&zip_state, 0, sizeof(struct zip_state)); + zip_state = kzalloc(sizeof(*zip_state), GFP_ATOMIC); + if (!zip_state) + return -ENOMEM; + zip_ops = &zip_ctx->zip_comp; zip_ops->input_len = slen; zip_ops->output_len = *dlen; memcpy(zip_ops->input, src, slen); - ret = zip_deflate(zip_ops, &zip_state, zip); + ret = zip_deflate(zip_ops, zip_state, zip); if (!ret) { *dlen = zip_ops->output_len; memcpy(dst, zip_ops->output, *dlen); } - + kfree(zip_state); return ret; } @@ -157,7 +160,7 @@ int zip_decompress(const u8 *src, unsigned int slen, struct zip_kernel_ctx *zip_ctx) { struct zip_operation *zip_ops = NULL; - struct zip_state zip_state; + struct zip_state *zip_state; struct zip_device *zip = NULL; int ret; @@ -168,7 +171,10 @@ int zip_decompress(const u8 *src, unsigned int slen, if (!zip) return -ENODEV; - memset(&zip_state, 0, sizeof(struct zip_state)); + zip_state = kzalloc(sizeof(*zip_state), GFP_ATOMIC); + if (!zip_state) + return -ENOMEM; + zip_ops = &zip_ctx->zip_decomp; memcpy(zip_ops->input, src, slen); @@ -179,13 +185,13 @@ int zip_decompress(const u8 *src, unsigned int slen, zip_ops->input_len = slen; zip_ops->output_len = *dlen; - ret = zip_inflate(zip_ops, &zip_state, zip); + ret = zip_inflate(zip_ops, zip_state, zip); if (!ret) { *dlen = zip_ops->output_len; memcpy(dst, zip_ops->output, *dlen); } - + kfree(zip_state); return ret; } -- 2.16.2
WARN_ON after gic_reserve_range
Hi Marc, with 4.20-rc3 I see two WARN_ON's firing on a ThunderX2 system that come from commit 3fb68faee867 (irqchip/gic-v3-its: Register LPI tables with EFI config table). Global efi_memreserve_root is NULL as it will only be set when early initcalls are running, but from the backtrace the WARN_ON's are running even earlier (init_IRQ). Am I the only one seeing this? [0.00] WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c:1696 its_init+0x36c/0x608 [0.00] Modules linked in: [0.00] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc3-jang+ #69 [0.00] pstate: 6089 (nZCv daIf -PAN -UAO) [0.00] pc : its_init+0x36c/0x608 [0.00] lr : its_init+0x368/0x608 [0.00] sp : 08e53c60 [0.00] x29: 08e53c60 x28: 7dfffe6807a4 [0.00] x27: 0001 x26: 80267b4c2100 [0.00] x25: 08ffe930 x24: 08e5d9c8 [0.00] x23: 80267bc10300 x22: 08e5d9c8 [0.00] x21: 08e59000 x20: 08f12000 [0.00] x19: 08ffe000 x18: [0.00] x17: 0009 x16: [0.00] x15: 08e59648 x14: 88fa616f [0.00] x13: 08fa617d x12: 08e7e000 [0.00] x11: 08e53900 x10: 08e59eb0 [0.00] x9 : 08e3e018 x8 : 0875ac58 [0.00] x7 : 206f6e203a746e65 x6 : 018c [0.00] x5 : 0001 x4 : [0.00] x3 : 0001 x2 : 4f419eae5e0eb800 [0.00] x1 : x0 : ffed [0.00] Call trace: [0.00] its_init+0x36c/0x608 [0.00] gic_init_bases+0x288/0x300 [0.00] gic_acpi_init+0x124/0x248 [0.00] acpi_match_madt+0x4c/0x88 [0.00] acpi_table_parse_entries_array+0x134/0x220 [0.00] acpi_table_parse_entries+0x70/0x98 [0.00] acpi_table_parse_madt+0x40/0x50 [0.00] __acpi_probe_device_table+0x88/0xe4 [0.00] irqchip_init+0x38/0x40 [0.00] init_IRQ+0xcc/0x100 [0.00] start_kernel+0x2fc/0x490 [0.00] ---[ end trace 41868d15bb5cf8f6 ]--- --Jan
Re: dcache_readdir NULL inode oops
On Tue, Nov 20, 2018 at 07:03:17PM +, Will Deacon wrote: > On Tue, Nov 20, 2018 at 06:28:54PM +, Will Deacon wrote: > > On Sat, Nov 10, 2018 at 11:17:03AM +0000, Jan Glauber wrote: > > > On Fri, Nov 09, 2018 at 03:58:56PM +, Will Deacon wrote: > > > > On Fri, Nov 09, 2018 at 02:37:51PM +, Jan Glauber wrote: > > > > > I'm seeing the following oops reproducible with upstream kernel on > > > > > arm64 > > > > > (ThunderX2): > > > > > > > > [...] > > > > > > > > > It happens after 1-3 hours of running 'stress-ng --dev 128'. This > > > > > testcase > > > > > does a scandir of /dev and then calls random stuff like ioctl, lseek, > > > > > open/close etc. on the entries. I assume no files are deleted under > > > > > /dev > > > > > during the testcase. > > > > > > > > > > The NULL pointer is the inode pointer of next. The next > > > > > dentry->d_flags is > > > > > DCACHE_RCUACCESS when this happens. > > > > > > > > > > Any hints on how to further debug this? > > > > > > > > Can you reproduce the issue with vanilla -rc1 and do you have a "known > > > > good" > > > > kernel? > > > > > > I can try out -rc1, but IIRC this wasn't bisectible as the bug was > > > present at > > > least back to 4.14. I need to double check that as there were other issues > > > that are resolved now so I may confuse things here. I've defintely seen > > > the same bug with 4.18. > > > > > > Unfortunately I lost access to the machine as our data center seems to be > > > moving currently so it might take some days until I can try -rc1. > > > > Ok, I've just managed to reproduce this in a KVM guest running v4.20-rc3 on > > both the host and the guest, so if anybody has any ideas of things to try > > then > > I'm happy to give them a shot. In the meantime, I'll try again with a bunch > > of > > debug checks enabled. Hi Will, good that you can reproduce the issue. I've verified that the issue is indeed reproducible with 4.14. > > Weee, I eventually hit a use-after-free from KASAN. See below. I ran KASAN (and all the other debug stuff) but didn't trigger anything in the host. --Jan
Re: dcache_readdir NULL inode oops
On Wed, Nov 28, 2018 at 08:08:06PM +, Will Deacon wrote: > I spent some more time looking at this today... > > On Fri, Nov 23, 2018 at 06:05:25PM +, Will Deacon wrote: > > Doing some more debugging, it looks like the usual failure case is where > > one CPU clears the inode field in the dentry via: > > > > devpts_pty_kill() > > -> d_delete() // dentry->d_lockref.count == 1 > > -> dentry_unlink_inode() > > > > whilst another CPU gets a pointer to the dentry via: > > > > sys_getdents64() > > -> iterate_dir() > > -> dcache_readdir() > > -> next_positive() > > > > and explodes on the subsequent inode dereference when trying to pass the > > inode number to dir_emit(): > > > > if (!dir_emit(..., d_inode(next)->i_ino, ...)) > > > > Indeed, the hack below triggers a warning, indicating that the inode > > is being cleared concurrently. > > > > I can't work out whether the getdents64() path should hold a refcount > > to stop d_delete() in its tracks, or whether devpts_pty_kill() shouldn't > > be calling d_delete() like this at all. > > So the issue is that opening /dev/pts/ptmx creates a new pty in /dev/pts, > which disappears when you close /dev/pts/ptmx. Consequently, when we tear > down the dentry for the magic new file, we have to take the i_node rwsem of > the *parent* so that concurrent path walkers don't trip over it whilst its > being freed. I wrote a simple concurrent program to getdents(/dev/pts/) in > one thread, whilst another opens and closes /dev/pts/ptmx: it crashes the > kernel in seconds. I also made a testcase and verified that your fix is fine. I also tried replacing open-close on /dev/ptmx with mkdir-rmdir but that does not trigger the error. > Patch below, but I'd still like somebody else to look at this, please. I wonder why no inode_lock on parent is needed for devpts_pty_new(), but I'm obviously not a VFS expert... So your patch looks good to me and clearly solves the issue. thanks, Jan > Will > > --->8 > > diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c > index c53814539070..50ddb95ff84c 100644 > --- a/fs/devpts/inode.c > +++ b/fs/devpts/inode.c > @@ -619,11 +619,17 @@ void *devpts_get_priv(struct dentry *dentry) > */ > void devpts_pty_kill(struct dentry *dentry) > { > - WARN_ON_ONCE(dentry->d_sb->s_magic != DEVPTS_SUPER_MAGIC); > + struct super_block *sb = dentry->d_sb; > + struct dentry *parent = sb->s_root; > > + WARN_ON_ONCE(sb->s_magic != DEVPTS_SUPER_MAGIC); > + > + inode_lock(parent->d_inode); > dentry->d_fsdata = NULL; > drop_nlink(dentry->d_inode); > d_delete(dentry); > + inode_unlock(parent->d_inode); > + > dput(dentry); /* d_alloc_name() in devpts_pty_new() */ > }
dcache_readdir NULL inode oops
Hi Al, I'm seeing the following oops reproducible with upstream kernel on arm64 (ThunderX2): [ 5428.795719] Unable to handle kernel NULL pointer dereference at virtual address 0040 [ 5428.813838] Mem abort info: [ 5428.820721] ESR = 0x9606 [ 5428.828476] Exception class = DABT (current EL), IL = 32 bits [ 5428.841590] SET = 0, FnV = 0 [ 5428.848939] EA = 0, S1PTW = 0 [ 5428.855941] Data abort info: [ 5428.862422] ISV = 0, ISS = 0x0006 [ 5428.870787] CM = 0, WnR = 0 [ 5428.877359] user pgtable: 4k pages, 48-bit VAs, pgdp = 52f9e034 [ 5428.891098] [0040] pgd=007ebb0d6003, pud=007ed3073003, pmd= [ 5428.909251] Internal error: Oops: 9606 [#1] SMP [ 5428.919122] Modules linked in: xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ipmi_ssif ip_tables x_tables ipv6 crc32_ce bnx2x crct10dif_ce igb nvme nvme_core i2c_algo_bit mdio gpio_xlp i2c_xlp9xx [ 5428.972724] CPU: 45 PID: 220018 Comm: stress-ng-dev Not tainted 4.19.0-jang+ #45 [ 5428.987664] Hardware name: To be filled by O.E.M. Saber/To be filled by O.E.M., BIOS 0ACKL018 03/30/2018 [ 5429.006819] pstate: 6049 (nZCv daif +PAN -UAO) [ 5429.016567] pc : dcache_readdir+0xfc/0x1a8 [ 5429.024903] lr : dcache_readdir+0x134/0x1a8 [ 5429.033376] sp : 2d553d70 [ 5429.040101] x29: 2d553d70 x28: 807db4988000 [ 5429.050892] x27: x26: [ 5429.061679] x25: 5600 x24: 8024577106c0 [ 5429.072457] x23: x22: 80267b92a480 [ 5429.083248] x21: 80267b92a520 x20: 8024575e5e00 [ 5429.094029] x19: 2d553e40 x18: [ 5429.104805] x17: x16: [ 5429.115553] x15: x14: [ 5429.126332] x13: x12: [ 5429.137096] x11: x10: 80266b398228 [ 5429.147849] x9 : 80266b398000 x8 : 7e4e [ 5429.158580] x7 : x6 : 0830d190 [ 5429.169362] x5 : x4 : 0d7506a8 [ 5429.180123] x3 : 0002 x2 : 0002 [ 5429.190890] x1 : 8024575e5e38 x0 : 2d553e40 [ 5429.201715] Process stress-ng-dev (pid: 220018, stack limit = 0x9437ac28) [ 5429.216828] Call trace: [ 5429.221855] dcache_readdir+0xfc/0x1a8 [ 5429.229459] iterate_dir+0x8c/0x1a0 [ 5429.236561] ksys_getdents64+0xa4/0x188 [ 5429.244357] __arm64_sys_getdents64+0x28/0x38 [ 5429.253201] el0_svc_handler+0x7c/0x100 [ 5429.260989] el0_svc+0x8/0xc [ 5429.266878] Code: a9429681 aa1303e0 b9402682 a9400e66 (f94020a4) [ 5429.279192] ---[ end trace 5c1e28c07cf016c5 ]--- It happens after 1-3 hours of running 'stress-ng --dev 128'. This testcase does a scandir of /dev and then calls random stuff like ioctl, lseek, open/close etc. on the entries. I assume no files are deleted under /dev during the testcase. The NULL pointer is the inode pointer of next. The next dentry->d_flags is DCACHE_RCUACCESS when this happens. Any hints on how to further debug this? --Jan
Re: dcache_readdir NULL inode oops
On Fri, Nov 09, 2018 at 03:58:56PM +, Will Deacon wrote: > On Fri, Nov 09, 2018 at 02:37:51PM +0000, Jan Glauber wrote: > > I'm seeing the following oops reproducible with upstream kernel on arm64 > > (ThunderX2): > > [...] > > > It happens after 1-3 hours of running 'stress-ng --dev 128'. This testcase > > does a scandir of /dev and then calls random stuff like ioctl, lseek, > > open/close etc. on the entries. I assume no files are deleted under /dev > > during the testcase. > > > > The NULL pointer is the inode pointer of next. The next dentry->d_flags is > > DCACHE_RCUACCESS when this happens. > > > > Any hints on how to further debug this? > > Can you reproduce the issue with vanilla -rc1 and do you have a "known good" > kernel? I can try out -rc1, but IIRC this wasn't bisectible as the bug was present at least back to 4.14. I need to double check that as there were other issues that are resolved now so I may confuse things here. I've defintely seen the same bug with 4.18. Unfortunately I lost access to the machine as our data center seems to be moving currently so it might take some days until I can try -rc1. thanks, Jan
Re: [PATCH] EDAC, thunderx: Remove VLA usage
On Fri, Jun 29, 2018 at 11:48:50AM -0700, Kees Cook wrote: > In the quest to remove all stack VLA usage from the kernel[1], this > switches to using a kmalloc-allocated buffer instead of stack space. > This should be fine since the existing routine is allocating memory > too. > > [1] > https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com > > Cc: David Daney > Cc: Jan Glauber > Cc: Borislav Petkov > Cc: Mauro Carvalho Chehab > Cc: linux-e...@vger.kernel.org > Signed-off-by: Kees Cook Acked-by: Jan Glauber thanks, Jan > --- > drivers/edac/thunderx_edac.c | 14 +- > 1 file changed, 9 insertions(+), 5 deletions(-) > > diff --git a/drivers/edac/thunderx_edac.c b/drivers/edac/thunderx_edac.c > index 4803c6468bab..c009d94f40c5 100644 > --- a/drivers/edac/thunderx_edac.c > +++ b/drivers/edac/thunderx_edac.c > @@ -408,26 +408,29 @@ static ssize_t thunderx_lmc_inject_ecc_write(struct > file *file, >size_t count, loff_t *ppos) > { > struct thunderx_lmc *lmc = file->private_data; > - > unsigned int cline_size = cache_line_size(); > - > - u8 tmp[cline_size]; > + u8 *tmp; > void __iomem *addr; > unsigned int offs, timeout = 10; > > atomic_set(&lmc->ecc_int, 0); > > lmc->mem = alloc_pages_node(lmc->node, GFP_KERNEL, 0); > - > if (!lmc->mem) > return -ENOMEM; > > + tmp = kmalloc(cline_size, GFP_KERNEL); > + if (!tmp) { > + __free_pages(lmc->mem, 0); > + return -ENOMEM; > + } > + > addr = page_address(lmc->mem); > > while (!atomic_read(&lmc->ecc_int) && timeout--) { > stop_machine(inject_ecc_fn, lmc, NULL); > > - for (offs = 0; offs < PAGE_SIZE; offs += sizeof(tmp)) { > + for (offs = 0; offs < PAGE_SIZE; offs += cline_size) { > /* >* Do a load from the previously rigged location >* This should generate an error interrupt. > @@ -437,6 +440,7 @@ static ssize_t thunderx_lmc_inject_ecc_write(struct file > *file, > } > } > > + kfree(tmp); > __free_pages(lmc->mem, 0); > > return count; > -- > 2.17.1 > > > -- > Kees Cook > Pixel Security
Re: KASAN: use-after-scope in ext4_group_desc_csum
On Fri, Oct 05, 2018 at 05:32:07PM +0200, Dmitry Vyukov wrote: [...] > This all makes me think that somebody else has left these 0xf8 in > shadow before ext4_map_blocks started executing. > Unfortunately debugging garbage in stack shadow is not completely > trivial and there is no common recipe. I don't have setup to run arm64 > kernel at the moment. I would try to locate that garbage in stack > shadow earlier, e.g. calling another function before ext4_map_blocks, > implementing that function in mm/kasan/kasan.c (non-instrumented > itself) and then try to scan stack and verify presence of 0xf8 > garbage. If this works out, then try to catch garbage earlier and/or > try to figure out what function left that garbage (that's possible by > locating 0x41b58ab3 magic: > https://bugzilla.kernel.org/show_bug.cgi?id=198435). Thanks a lot for your analysis! I'll try to debug this further but as you pointed out it might be difficult to catch who writes beforehand to that location. --Jan
KASAN: use-after-scope in ext4_group_desc_csum
Hi, I'm getting below warning when I enable CONFIG_KASAN_EXTRA=y on a arm64 ThunderX2 system. As far as I can tell this is present since KASAN_EXTRA was introduced (4.16). [ 64.547333] == [ 64.561933] BUG: KASAN: use-after-scope in ext4_es_lookup_extent+0x130/0x980 [ 64.576105] Write of size 4 at addr 80222d81f0ec by task exe/4075 [ 64.592044] CPU: 102 PID: 4075 Comm: exe Not tainted 4.19.0-rc6-jang+ #29 [ 64.605690] Hardware name: To be filled by O.E.M. Saber/To be filled by O.E.M., BIOS 0ACKL018 03/30/2018 [ 64.624750] Call trace: [ 64.629666] dump_backtrace+0x0/0x360 [ 64.637024] show_stack+0x24/0x30 [ 64.643687] dump_stack+0x12c/0x1b4 [ 64.650699] print_address_description+0x68/0x2c8 [ 64.660152] kasan_report+0x130/0x300 [ 64.667509] __asan_store4+0x84/0xa8 [ 64.674693] ext4_es_lookup_extent+0x130/0x980 [ 64.683623] ext4_map_blocks+0xe0/0x990 [ 64.691330] _ext4_get_block+0x130/0x2b8 [ 64.699211] ext4_get_block+0x40/0x50 [ 64.706571] generic_block_bmap+0x104/0x178 [ 64.714977] ext4_bmap+0xc4/0x198 [ 64.721636] bmap+0x54/0x70 [ 64.727250] jbd2_journal_init_inode+0x2c/0x208 [ 64.736355] ext4_fill_super+0x5080/0x5c90 [ 64.744587] mount_bdev+0x1e0/0x228 [ 64.751597] ext4_mount+0x44/0x58 [ 64.758255] mount_fs+0x58/0x1b8 [ 64.764740] vfs_kern_mount.part.2+0xc0/0x2a8 [ 64.773495] do_mount+0x7a8/0x13e8 [ 64.780327] ksys_mount+0x9c/0x110 [ 64.787160] __arm64_sys_mount+0x70/0x88 [ 64.795043] el0_svc_handler+0xac/0x150 [ 64.802749] el0_svc+0x8/0xc [ 64.811521] The buggy address belongs to the page: [ 64.821149] page:7e0088b607c0 count:0 mapcount:0 mapping: index:0x0 [ 64.837249] flags: 0x1000() [ 64.844959] raw: 1000 7e0088b607c8 7e0088b607c8 [ 64.860527] raw: [ 64.876093] page dumped because: kasan: bad access detected [ 64.890278] Memory state around the buggy address: [ 64.899907] 80222d81ef80: f2 f2 f2 f2 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 [ 64.914426] 80222d81f000: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 [ 64.928945] >80222d81f080: f8 f8 f8 f8 f8 f8 f1 f1 f1 f1 f8 f8 f8 f8 00 f2 [ 64.943463] ^ [ 64.956759] 80222d81f100: f2 f2 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 [ 64.971278] 80222d81f180: f8 f8 f8 f8 f1 f1 f1 f1 00 00 00 f2 f8 f8 f8 f8 [ 64.985795] == [ 65.000312] Disabling lock debugging due to kernel taint [ 65.037509] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null) I'm not seeing any issues like filesystem corruption or misbehaviour that could be related the warning. Is this a false positive? Any thoughts? --Jan