Re: mm: WARNING in __delete_from_page_cache
On Tue, 2016-01-26 at 14:36 +0100, Jan Kara wrote: > On Tue 26-01-16 07:54:56, Matthew Wilcox wrote: > > On Tue, Jan 26, 2016 at 03:42:34AM +, Williams, Dan J wrote: > > > @@ -2907,7 +2912,12 @@ extern void replace_mount_options(struct > > > super_block *sb, char *options); > > > > > > static inline bool io_is_direct(struct file *filp) > > > { > > > - return (filp->f_flags & O_DIRECT) || > > > IS_DAX(file_inode(filp)); > > > > I think this should just be a one-liner: > > > > - return (filp->f_flags & O_DIRECT) || > > IS_DAX(file_inode(filp)); > > + return (filp->f_flags & O_DIRECT) || IS_DAX(filp- > > >f_mapping->host); > > > > This does the right thing for block device inodes and filesystem > > inodes. > > (see the opening stanzas of __dax_fault for an example). > > Ah, right. This looks indeed better. > Oh, yeah, looks good. 8< (git am --scissors) Subject: fs, block: force direct-I/O for dax-enabled block devices From: Dan Williams Similar to the file I/O path, redirect all I/O to the DAX path for I/O to a block-device special file. Both regular files and device special files can use the common filp->f_mapping->host lookup to determine whether DAX is enabled.
Otherwise, we confuse the DAX code that does not expect to find live data in the page cache: [ cut here ] WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217 __delete_from_page_cache+0x9f6/0xb60() Modules linked in: CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 88006d3f7738 82999e2d 8800620a 86473d20 88006d3f7778 81352089 81658d36 86473d20 00d9 ea009d60 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [] dump_stack+0x6f/0xa2 lib/dump_stack.c:50 [] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482 [] warn_slowpath_null+0x29/0x30 kernel/panic.c:515 [] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217 [] delete_from_page_cache+0x112/0x200 mm/filemap.c:244 [] __dax_fault+0x859/0x1800 fs/dax.c:487 [] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730 [< inline >] wp_pfn_shared mm/memory.c:2208 [] do_wp_page+0xc85/0x14f0 mm/memory.c:2307 [< inline >] handle_pte_fault mm/memory.c:3323 [< inline >] __handle_mm_fault mm/memory.c:3417 [] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446 [] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238 [] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331 [] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264 [] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986 [] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185 ---[ end trace dae21e0f85f1f98c ]--- Cc: Ross Zwisler Fixes: 5a023cdba50c ("block: enable dax for raw block devices") Reported-by: Dmitry Vyukov Reported-by: Kirill A. 
Shutemov Suggested-by: Jan Kara Reviewed-by: Jan Kara Suggested-by: Matthew Wilcox Signed-off-by: Dan Williams --- include/linux/fs.h |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 1a2046275cdf..b10002d4a5f5 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2907,7 +2907,7 @@ extern void replace_mount_options(struct super_block *sb, char *options); static inline bool io_is_direct(struct file *filp) { - return (filp->f_flags & O_DIRECT) || IS_DAX(file_inode(filp)); + return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host); } static inline int iocb_flags(struct file *file)
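The user-space sketch below shows why the one-liner matters. For a block-device special file, `file_inode(filp)` is the device node, for example the inode under /dev. `filp->f_mapping->host` is the block device's own inode, which is where DAX is recorded. For a regular file the two are normally the same inode. The `*_model` structures and flag values are hypothetical simplifications, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define S_DAX_MODEL    0x1u    /* hypothetical flag value */
#define O_DIRECT_MODEL 0x4000u /* hypothetical, not the real O_DIRECT */

struct inode_model {
	unsigned int i_flags;
};

struct mapping_model {
	struct inode_model *host;      /* inode that owns the page cache */
};

struct file_model {
	unsigned int f_flags;
	struct inode_model *f_inode;   /* what file_inode() would return */
	struct mapping_model *f_mapping;
};

static bool is_dax(const struct inode_model *inode)
{
	return (inode->i_flags & S_DAX_MODEL) != 0;
}

/* Old check: looks at the inode the file was opened through. */
static bool io_is_direct_old(const struct file_model *filp)
{
	assert(filp && filp->f_inode);
	return (filp->f_flags & O_DIRECT_MODEL) || is_dax(filp->f_inode);
}

/* New check: looks at the inode that owns the mapping. */
static bool io_is_direct_new(const struct file_model *filp)
{
	assert(filp && filp->f_mapping && filp->f_mapping->host);
	return (filp->f_flags & O_DIRECT_MODEL) || is_dax(filp->f_mapping->host);
}
```

With a block-device file, only the new check sees the DAX flag. For a regular file, both checks agree.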
Re: [PATCH] spi: atmel: fix handling of cs_change set on non-last xfer
On 18/01/2016 02:25, Mans Rullgard wrote: > The driver does the wrong thing when cs_change is set on a non-last > xfer in a message. When cs_change is set, the driver deactivates the > CS and leaves it off until a later xfer again has cs_change set whereas > it should be briefly toggling CS off and on again. > > This patch brings the behaviour of the driver back in line with the > documentation and common sense. The delay of 10 us is the same as is > used by the default spi_transfer_one_message() function in spi.c. > > Fixes: 8090d6d1a415 ("spi: atmel: Refactor spi-atmel to use SPI framework > queue") > Signed-off-by: Mans Rullgard Hi Mans, Yes, it seems to be a sensible thing to do: Acked-by: Nicolas Ferre Thanks, best regards. > --- > drivers/spi/spi-atmel.c | 10 +++--- > 1 file changed, 3 insertions(+), 7 deletions(-) > > diff --git a/drivers/spi/spi-atmel.c b/drivers/spi/spi-atmel.c > index aebad36391c9..4b8ccf1f897e 100644 > --- a/drivers/spi/spi-atmel.c > +++ b/drivers/spi/spi-atmel.c > @@ -315,7 +315,6 @@ struct atmel_spi { > struct atmel_spi_dma dma; > > bool keep_cs; > - bool cs_active; > > u32 fifo_size; > }; > @@ -1406,11 +1405,9 @@ static int atmel_spi_one_transfer(struct spi_master > *master, > &msg->transfers)) { > as->keep_cs = true; > } else { > - as->cs_active = !as->cs_active; > - if (as->cs_active) > - cs_activate(as, msg->spi); > - else > - cs_deactivate(as, msg->spi); > + cs_deactivate(as, msg->spi); > + udelay(10); > + cs_activate(as, msg->spi); > } > } > > @@ -1433,7 +1430,6 @@ static int atmel_spi_transfer_one_message(struct > spi_master *master, > atmel_spi_lock(as); > cs_activate(as, spi); > > - as->cs_active = true; > as->keep_cs = false; > > msg->status = 0; > -- Nicolas Ferre
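The behavioural difference can be modelled as a trace of chip-select events for one message. In the example, the first of three transfers sets `cs_change`. This is a hypothetical user-space model: the `xfer`/`trace` types and the event letters are invented here. It covers only the handling of `cs_change` on non-last transfers, not `keep_cs` or the end-of-message logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Events: 'A' = cs_activate, 'D' = cs_deactivate, 'W' = udelay(10). */
struct xfer {
	bool cs_change;
};

struct trace {
	char ev[32];
	size_t n;
};

/* Append one event, keeping ev NUL-terminated. */
static void emit(struct trace *t, char c)
{
	assert(t->n + 1 < sizeof(t->ev));
	t->ev[t->n++] = c;
	t->ev[t->n] = '\0';
}

/* Before the patch: cs_change on a non-last transfer flips a cs_active
 * flag, so CS stays off until a later transfer sets cs_change again. */
static void run_old(const struct xfer *x, size_t n, struct trace *t)
{
	bool cs_active = true;

	emit(t, 'A');
	for (size_t i = 0; i < n; i++) {
		bool last = (i == n - 1);

		if (x[i].cs_change && !last) {
			cs_active = !cs_active;
			emit(t, cs_active ? 'A' : 'D');
		}
	}
}

/* After the patch: CS is briefly pulsed off and back on. */
static void run_new(const struct xfer *x, size_t n, struct trace *t)
{
	emit(t, 'A');
	for (size_t i = 0; i < n; i++) {
		bool last = (i == n - 1);

		if (x[i].cs_change && !last) {
			emit(t, 'D');
			emit(t, 'W');
			emit(t, 'A');
		}
	}
}
```

For the example message, the old code leaves CS deasserted for transfers two and three ("AD"). The new code reasserts it after the 10 us gap ("ADWA").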
Re: [PATCH V4 16/16] ARM64: tegra: select PM_GENERIC_DOMAINS
Hi Arnd, Ulf, On 14/01/16 17:16, Jon Hunter wrote: > > On 14/01/16 09:21, Arnd Bergmann wrote: >> On Thursday 14 January 2016 09:57:14 Ulf Hansson wrote: >>> On 13 January 2016 at 21:43, Arnd Bergmann wrote: On Wednesday 13 January 2016 18:03:24 Thierry Reding wrote: > On Fri, Dec 04, 2015 at 02:57:17PM +, Jon Hunter wrote: >> Enable PM_GENERIC_DOMAINS for tegra 64-bit devices. To ensure that >> devices >> dependent upon a particular power-domain are only probed when that power >> domain has been powered up, requires that PM is made mandatory for tegra >> 64-bit devices and so select this option for tegra as well. >> >> Signed-off-by: Jon Hunter >> --- >> arch/arm64/Kconfig.platforms | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms >> index 9806324fa215..e0b5bd0aff0f 100644 >> --- a/arch/arm64/Kconfig.platforms >> +++ b/arch/arm64/Kconfig.platforms >> @@ -93,6 +93,8 @@ config ARCH_TEGRA >> select GENERIC_CLOCKEVENTS >> select HAVE_CLK >> select PINCTRL >> + select PM >> + select PM_GENERIC_DOMAINS >> select RESET_CONTROLLER >> help >> This enables support for the NVIDIA Tegra SoC family. > > This has potential consequences for multi-platform builds, doesn't it? > All of a sudden any combination of builds that includes Tegra won't be > possible to build without PM support. > > Adding linux-arm-ker...@lists.infradead.org for visibility. > > Agreed, it would be better to add 'depends on PM_GENERIC_DOMAINS' dependencies in the drivers that require it. >>> >>> The problem with that approach is that those drivers may be cross-SoC >>> drivers. In some cases PM isn't needed and in others it is. >>> >>> Of course I don't have the in-depth knowledge about the drivers being >>> used in Tegra which may need PM, perhaps it's not that many? >>> >>> Anyway, to me it seems like ARCH_TEGRA should depend on PM instead. >>> Would that work?
>> >> That seems a little over-restrictive, as it prevents you from >> building a tegra kernel even if none of the drivers that rely >> on the pm domains are used, but it would work. >> >> I've looked again at how other platforms (on arm32) do it, and >> a lot of them use "select PM_GENERIC_DOMAINS if PM", so they don't >> automatically enable PM, but they enable the pmdomain code if >> PM is already set. No driver really "depends on PM_GENERIC_DOMAINS", >> so we shouldn't really start that now or we end up with circular >> dependencies in the long run. > > What I am not a fan of in the current gen-pd implementation is that if we > have !PM but the platform has power-domains, then there is no way to > determine if a device within a power-domain can be probed safely. Some > arm platforms force all the power-domains on during early init in the > case of !PM. IMO this is still not ideal, because if a power-domain > failed to turn on during early init, then you should probably call > BUG(). Ideally the kernel should be able to boot and only probe the > devices that you know can be probed safely. > > So for platforms that use PM_GENERIC_DOMAINS, I think really they should > select PM and not "select PM_GENERIC_DOMAINS if PM". IMO, "select > PM_GENERIC_DOMAINS if PM" seems fragile. Any more thoughts on this? I have been discussing with Thierry and we think that selecting PM for tegra still makes the most sense. The question is, is this ok for multi-configs? The only other suggestion/thought I have is to allow PM_GENERIC_DOMAINS_OF to be selected independently of PM so that we can have minimal support for PM domains that allows you to register PM domains with the kernel and their current state, but does not allow you to control them, etc. This way tegra could always select PM_GENERIC_DOMAINS_OF regardless of PM, and we would be able to determine if we can probe a device safely.
I am not sure that Rafael is too keen on this approach but that is the only alternative I have come up with. I have a rough outline of a patch for this here [0] FWIW. Cheers Jon [0] https://github.com/jonhunter/linux/commits/gpd
Re: [PATCH 10/10] vfio: allow the user to register reserved iova range for MSI mapping
Hi Eric, [auto build test ERROR on v4.5-rc1] [also build test ERROR on next-20160125] [cannot apply to iommu/next] [if your patch is applied to the wrong git tree, please drop us a note to help improving the system] url: https://github.com/0day-ci/linux/commits/Eric-Auger/KVM-PCIe-MSI-passthrough-on-ARM-ARM64/20160126-211921 config: x86_64-randconfig-s3-01262306 (attached as .config) reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): ERROR: "alloc_iova" [drivers/vfio/vfio_iommu_type1.ko] undefined! >> ERROR: "init_iova_domain" [drivers/vfio/vfio_iommu_type1.ko] undefined! >> ERROR: "put_iova_domain" [drivers/vfio/vfio_iommu_type1.ko] undefined! ERROR: "free_iova" [drivers/vfio/vfio_iommu_type1.ko] undefined! --- 0-DAY kernel test infrastructure, Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz
Re: [PATCH] ubsan: fix tree-wide -Wmaybe-uninitialized false positives
On 01/26/2016 12:41 AM, Andrew Morton wrote: > On Mon, 25 Jan 2016 19:01:34 +0300 Andrey Ryabinin > wrote: > >> -fsanitize=* options make GCC less smart than usual and increase the number >> of 'maybe-uninitialized' false positives. So this patch does two things: >> * Add -Wno-maybe-uninitialized to CFLAGS_UBSAN which will disable all >> such warnings for instrumented files. >> * Remove CONFIG_UBSAN_SANITIZE_ALL from all[yes|mod]config builds. So >> the all[yes|mod]config build goes without -fsanitize=* and still with >> -Wmaybe-uninitialized. > > hm, that's a bit sad. > > We have no means of working out whether we should re-enable > maybe-uninitialized for later gcc's, as they become smarter about this. > What do we do, just "remember" to try it later on? > I don't see anything bad about it. Note that CONFIG_UBSAN_SANITIZE_ALL=y *only* adds -fsanitize=* to CFLAGS, and this patch removes only CONFIG_UBSAN_SANITIZE_ALL from allyesconfig, not CONFIG_UBSAN. So now we do the allyesconfig build without CONFIG_UBSAN_SANITIZE_ALL (iow without -fsanitize=*), but still with CONFIG_UBSAN=y. This means that we still build lib/ubsan.c (and with -Wmaybe-uninitialized). > Do you know if this issue is on the gcc developers' radar? > I don't know, but it's unlikely that anything will change here. -Wmaybe-uninitialized will always be prone to false positives, simply by its definition (if GCC could prove that a variable is uninitialized, it would issue a different warning, -Wuninitialized). And since -fsanitize=* causes significant changes in the generated code, its influence on -Wmaybe-uninitialized will likely remain.
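The function below shows a pattern commonly reported as a -Wmaybe-uninitialized false positive. It is an illustration only, not code from the patch. `val` is written only when `have_val` is true and read only under the same condition, so it is never actually used uninitialized. Whether a given GCC version warns on it, with or without -fsanitize=*, is not guaranteed.

```c
#include <assert.h>
#include <stdbool.h>

/* Proving `val` is initialized requires correlating the two identical
 * conditions; instrumentation can make that analysis harder. */
static int scaled_or_default(bool have_val, int input, int def)
{
	int val;

	if (have_val)
		val = input * 2;

	/* ... unrelated code the compiler must reason across ... */

	if (have_val)
		return val;
	return def;
}
```

By contrast, -Wuninitialized targets uses the compiler can show are definitely uninitialized, which this function does not contain.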
[PATCH v2] Add hard/soft lockup debugger entry points
This patch adds an export which can be set by system debuggers to direct the hard-lockup and soft-lockup detectors to trigger a breakpoint exception and enter a debugger if one is active. It is assumed that if someone sets this variable, then a breakpoint handler of some sort will be actively loaded or registered via the die notifier chain. This addition is extremely useful for debugging hard and soft lockups in real time and quickly from a console debugger. Signed-off-by: Jeff Merkey --- kernel/watchdog.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index b3ace6e..c28e58c 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -23,6 +23,7 @@ #include #include +#include #include #include #include @@ -108,6 +109,9 @@ static DEFINE_PER_CPU(struct perf_event *, watchdog_ev); #endif static unsigned long soft_lockup_nmi_warn; +int debug_watchdog_lockups; +EXPORT_SYMBOL_GPL(debug_watchdog_lockups); + /* boot commands */ /* * Should we panic when a soft-lockup or hard-lockup occurs: @@ -358,6 +362,9 @@ static void watchdog_overflow_callback(struct perf_event *event, else dump_stack(); + if (debug_watchdog_lockups) + arch_kgdb_breakpoint(); + /* * Perform all-CPU dump only once to avoid multiple hardlockups * generating interleaving traces @@ -478,6 +485,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) else dump_stack(); + if (debug_watchdog_lockups) + arch_kgdb_breakpoint(); + if (softlockup_all_cpu_backtrace) { /* Avoid generating two back traces for current * given that one is already made above -- 1.8.3.1
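The control flow the patch adds can be modelled in user space. The detector keeps its existing reporting. Only when the exported flag is non-zero does it also trap into the debugger. `fake_breakpoint()` and `breakpoints_hit` are hypothetical stand-ins for `arch_kgdb_breakpoint()`, so the sketch runs without a debugger attached.

```c
#include <assert.h>

static int debug_watchdog_lockups;  /* set by a debugger to opt in */
static int breakpoints_hit;         /* stand-in bookkeeping, not in the patch */

/* Hypothetical replacement for arch_kgdb_breakpoint(). */
static void fake_breakpoint(void)
{
	breakpoints_hit++;
}

/* Called when the (modelled) detector decides a CPU is locked up. */
static void report_lockup(void)
{
	/* ... print the warning and dump the stack, as before ... */
	if (debug_watchdog_lockups)
		fake_breakpoint();
}
```

By default nothing changes. After a debugger sets the flag, each reported lockup also stops in the breakpoint.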
Re: [PATCH v4 2/6] drm/dsi: Refactor device creation
On 1/21/2016 9:16 PM, Thierry Reding wrote: On Thu, Dec 10, 2015 at 06:11:36PM +0530, Archit Taneja wrote: Simplify the mipi dsi device creation process. device_initialize and "MIPI" and "DSI", please. Sure, I'll replace with these and in the other patches. device_add don't need to be called separately when creating mipi_dsi_device's. Use device_register instead to simplify things. Create a helper function mipi_dsi_device_new which takes in struct mipi_dsi_device_info and mipi_dsi_host. It clubs the functions mipi_dsi_device_alloc and mipi_dsi_device_add into one. mipi_dsi_device_info acts as a template to populate the dsi device information. This is populated by of_mipi_dsi_device_add and passed to mipi_dsi_device_new. Later on, we'll provide mipi_dsi_device_new as a standalone way to create a dsi device not available via DT. The new device creation process tries to closely follow what's been done in i2c_new_device in i2c-core. Reviewed-by: Andrzej Hajda Signed-off-by: Archit Taneja --- drivers/gpu/drm/drm_mipi_dsi.c | 61 +- include/drm/drm_mipi_dsi.h | 15 +++ 2 files changed, 40 insertions(+), 36 deletions(-) To be honest, I'm not sure I like this. If you want to have a simpler helper, why not implement it using the lower-level helpers. Really the only thing you're doing here is add a high-level helper that takes an info struct, whereas previously the same would be done by storing the info directly in the structure between allocation and addition of the device. Initially the implementation was following that of platform devices, I see no reason to deviate from that. What you want here can easily be I don't see why we need to call device_initialize and device_add separately for DSI devices. From my (limited) understanding, we should call these separately if we want to take a reference (using get_device()), or set up some private data before the bus's notifier kicks in. 
Since the main purpose of the series is not to simplify the device creation code, I can drop this. done by something like: struct mipi_dsi_device * mipi_dsi_device_register_full(struct mipi_dsi_host *host, const struct mipi_dsi_device_info *info) { struct mipi_dsi_device *dsi; int err; dsi = mipi_dsi_device_alloc(host); if (IS_ERR(dsi)) return dsi; dsi->dev.of_node = info->node; dsi->channel = info->channel; err = mipi_dsi_device_add(dsi); if (err < 0) { ... } return dsi; } Thierry This does look less intrusive. I'll consider switching to this. Thanks, Archit -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
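This user-space sketch shows the layering Thierry describes: a high-level helper built from a separate allocation step and a registration step, with the info struct used as a template in between. Several parts are assumptions made so the sketch runs: the `dsi_*` names, the `ERR_PTR`-style stand-ins, the channel check and the `free()` in the error path. The quoted suggestion leaves its error branch elided. In the kernel, a failed add would drop the device reference rather than call `free()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space stand-ins modelled on the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct dsi_host { unsigned int nr_channels; };

struct dsi_device {
	struct dsi_host *host;
	const char *of_node;   /* stands in for dev.of_node */
	unsigned int channel;
	int registered;
};

struct dsi_device_info {   /* template filled in by the caller */
	const char *node;
	unsigned int channel;
};

/* Low-level step 1: allocate only. */
static struct dsi_device *dsi_device_alloc(struct dsi_host *host)
{
	struct dsi_device *dsi;

	assert(host);
	dsi = calloc(1, sizeof(*dsi));
	if (!dsi)
		return ERR_PTR(-ENOMEM);
	dsi->host = host;
	return dsi;
}

/* Low-level step 2: make the device visible (hypothetical check). */
static int dsi_device_add(struct dsi_device *dsi)
{
	if (dsi->channel >= dsi->host->nr_channels)
		return -EINVAL;
	dsi->registered = 1;
	return 0;
}

/* High-level helper built purely from the two steps above. */
static struct dsi_device *
dsi_device_register_full(struct dsi_host *host,
			 const struct dsi_device_info *info)
{
	struct dsi_device *dsi;
	int err;

	dsi = dsi_device_alloc(host);
	if (IS_ERR(dsi))
		return dsi;

	dsi->of_node = info->node;
	dsi->channel = info->channel;

	err = dsi_device_add(dsi);
	if (err < 0) {
		free(dsi);  /* assumption: the kernel would put_device() */
		return ERR_PTR(err);
	}
	return dsi;
}
```

Keeping alloc and add separate leaves room to set up fields between the two steps. The helper simply packages the common case.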
[PATCH] scripts/dtc: Update to upstream commit b06e55c88b9b
Sync to upstream dtc commit b06e55c88b9b ("Prevent crash on modulo by zero"). This adds the following commits from upstream: b06e55c Prevent crash on modulo by zero b433450 Fix some bugs in processing of line directives d728ad5 Fix crash on nul character in string escape sequence 1ab2205 Gracefully handle bad octal literals 1937095 Prevent crash on division by zero d0b3ab0 libfdt: Fix undefined behaviour in fdt_offset_ptr() d4c7c25 libfdt: check for potential overrun in _fdt_splice() f58799b libfdt: Add some missing symbols to version.lds af9f26d Remove duplicated -Werror in dtc Makefile 604e61e fdt: Add functions to retrieve strings 8702bd1 fdt: Add a function to get the index of a string 2218387 fdt: Add a function to count strings 554fde2 libfdt: fix comment block of fdt_get_property_namelen() e5e6df7 fdtdump: Fix bug printing bytestrings with negative values 067829e Remove redundant fdtdump test code 897a429 Move fdt_path_offset alias tests to right tests section 2d1417c Add simple .travis.yml f6dbc6c guess output file format 5e78dff guess input file format based on file content or file name 8b927bf tests: convert `echo -n` to `printf` 64c46b0 Fix crash with poorly defined #size-cells Cc: Frank Rowand Cc: Grant Likely Signed-off-by: Rob Herring --- scripts/dtc/checks.c | 2 +- scripts/dtc/dtc-lexer.l | 39 +- scripts/dtc/dtc-lexer.lex.c_shipped | 101 +++ scripts/dtc/dtc-parser.tab.c_shipped | 84 + scripts/dtc/dtc-parser.y | 20 ++- scripts/dtc/dtc.c| 62 - scripts/dtc/libfdt/fdt.c | 13 ++--- scripts/dtc/libfdt/fdt_ro.c | 100 ++ scripts/dtc/libfdt/fdt_rw.c | 2 + scripts/dtc/libfdt/libfdt.h | 73 +++-- scripts/dtc/util.c | 3 +- scripts/dtc/version_gen.h| 2 +- 12 files changed, 390 insertions(+), 111 deletions(-) Generated from script, so just sending the diffstat and log for review. Rob
Re: [PATCH 2/3] input: touchscreen: ad7879: fix default x/y axis assignment
On 2016-01-25 23:58, Michael Hennerich wrote: > On 01/26/2016 04:04 AM, Stefan Agner wrote: >> The measurements read from the controller, which are temporarily stored >> in conversion_data, are interpreted incorrectly. The first measurement X+ >> contains the Y position, and the second measurement Y+ the X position >> (see also Table 11 Register Table in the data sheet). >> >> The problem is already known and a swap option has been introduced: >> commit 6680884a4420 ("Input: ad7879 - add option to correct xy axis") >> >> However, with that the meaning of the new boolean is inverted since >> the underlying values are already swapped. With this change, a true >> in swap_xy actually swaps the two axes. >> >> Signed-off-by: Stefan Agner >> --- >> Hi Michael, >> >> It seems that swap_xy is not used in any board which is in mainline, >> hence swap_xy is always false. Therefore, up until now all boards >> actually used swapped axes. However, I doubt that the blackfin boards >> really have those axes swapped, it is probably more likely that the >> userspace calibration took care of it. >> >> However, if they are really swapped, we should set the swap_xy flag >> to 1 for those boards... >> >> Do you happen to know what is the case with those boards? >> > > > Hi Stefan, > > I would be hesitant to invert the default behaviour of the driver. > Too many people in the field are already using it as it is. Afaik, we should be able to change in-kernel APIs (especially if they are wrong) since we do not guarantee any API... > > An XY swap can have multiple reasons. > > Lots of small VGA/QVGA TFTs have the option to switch the scan > direction from Landscape to Portrait. In addition you can also rotate > and flip or mirror using VDMA options. So it really depends on the use > case, how the touch panel is mounted to the screen or how it is wired. Ok, I see the reason for that functionality. I am mainly concerned about the new DT bindings.
The touchscreen binding documents specify touchscreen-swapped-x-y, see: https://www.kernel.org/doc/Documentation/devicetree/bindings/input/touchscreen/touchscreen.txt I would like to make sure that this property really swaps the axes (and is not needed if the hardware is implemented according to the datasheet...) We could also implement a workaround to keep the platform data behavior as is (invert the swap_xy flag)... -- Stefan >> -- >> Stefan >> >> drivers/input/touchscreen/ad7879.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/input/touchscreen/ad7879.c >> b/drivers/input/touchscreen/ad7879.c >> index a73934b..e290e7b 100644 >> --- a/drivers/input/touchscreen/ad7879.c >> +++ b/drivers/input/touchscreen/ad7879.c >> @@ -94,8 +94,8 @@ >> #define AD7879_TEMP_BIT (1<<1) >> >> enum { >> -AD7879_SEQ_XPOS = 0, >> -AD7879_SEQ_YPOS = 1, >> +AD7879_SEQ_YPOS = 0, >> +AD7879_SEQ_XPOS = 1, >> AD7879_SEQ_Z1 = 2, >> AD7879_SEQ_Z2 = 3, >> AD7879_NR_SENSE = 4, >>
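The sketch below models what the reordered enum achieves. Per the thread, the first conversion slot carries Y and the second carries X. Once the indices match that, `swap_xy == false` reports the native orientation and `true` really swaps the axes. The names are hypothetical simplifications, and the swap step is an assumption about how such a flag is typically applied, not a copy of the driver.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	SEQ_YPOS = 0,   /* first result ("X+" slot) holds Y */
	SEQ_XPOS = 1,   /* second result ("Y+" slot) holds X */
	SEQ_Z1   = 2,
	SEQ_Z2   = 3,
	NR_SENSE = 4,
};

struct point {
	unsigned int x, y;
};

static struct point decode(const unsigned short conv[NR_SENSE], bool swap_xy)
{
	struct point p = { conv[SEQ_XPOS], conv[SEQ_YPOS] };

	if (swap_xy) {
		unsigned int tmp = p.x;

		p.x = p.y;
		p.y = tmp;
	}
	return p;
}
```

A DT property such as touchscreen-swapped-x-y would then map directly onto `swap_xy` without an extra inversion.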
Re: [RFC PATCH] mm: support CONFIG_ZONE_DEVICE + CONFIG_ZONE_DMA
On Mon, Jan 25, 2016 at 10:00 PM, Sudip Mukherjee wrote: > On Mon, Jan 25, 2016 at 04:06:40PM -0800, Dan Williams wrote: >> It appears devices requiring ZONE_DMA are still prevalent (see link >> below). For this reason the proposal to require turning off ZONE_DMA to >> enable ZONE_DEVICE is untenable in the short term. We want a single >> kernel image to be able to support legacy devices as well as next >> generation persistent memory platforms. >> >> Towards this end, alias ZONE_DMA and ZONE_DEVICE to work around needing >> to maintain a unique zone number for ZONE_DEVICE. Record the geometry >> of ZONE_DMA at init (->init_spanned_pages) and use that information in >> is_zone_device_page() to differentiate pages allocated via >> devm_memremap_pages() vs true ZONE_DMA pages. Otherwise, use the >> simpler definition of is_zone_device_page() when ZONE_DMA is turned off. >> >> Note that this also teaches the memory hot remove path that the zone may >> not have sections for all pfn spans (->zone_dyn_start_pfn). >> >> A user visible implication of this change is potentially an unexpectedly >> high "spanned" value in /proc/zoneinfo for the DMA zone. >> >> Cc: H. Peter Anvin >> Cc: Ingo Molnar >> Cc: Rik van Riel >> Cc: Mel Gorman >> Cc: Jerome Glisse >> Cc: Christoph Hellwig >> Cc: Dave Hansen >> Link: https://bugzilla.kernel.org/show_bug.cgi?id=110931 >> Fixes: 033fbae988fc ("mm: ZONE_DEVICE for "device memory"") >> Reported-by: Sudip Mukherjee > > It should actually be Reported-by: Mark > > Hi Mark, > Can you please test this patch available at > https://patchwork.kernel.org/patch/8116991/ > in your setup.. Note this patch is on top of 4.5-rc1 and is likely not suitable for a -stable backport to 4.3/4.4. For 4.3 and 4.4, distributions that want to support legacy devices should leave ZONE_DEVICE disabled, as it is by default.
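The user-space sketch below gives one plausible reading of the `init_spanned_pages` idea. ZONE_DMA's span is recorded at boot. When `devm_memremap_pages()` later grows the aliased zone, any pfn beyond that boot-time span is treated as device memory. `zone_model` and `is_device_pfn()` are hypothetical and are not the patch's actual `is_zone_device_page()`. The numbers assume 4 KiB pages, so a 16 MiB ZONE_DMA spans 4096 pfns.

```c
#include <assert.h>
#include <stdbool.h>

struct zone_model {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;       /* grows when device memory is added */
	unsigned long init_spanned_pages;  /* boot-time ZONE_DMA geometry */
};

/* Inside the zone but past the boot-time span: device memory.
 * Inside the boot-time span: a true ZONE_DMA page. */
static bool is_device_pfn(const struct zone_model *z, unsigned long pfn)
{
	unsigned long end = z->zone_start_pfn + z->spanned_pages;
	unsigned long init_end = z->zone_start_pfn + z->init_spanned_pages;

	if (pfn < z->zone_start_pfn || pfn >= end)
		return false;   /* not in this zone at all */
	return pfn >= init_end;
}
```

Growing `spanned_pages` to cover far-away device memory also shows where the unexpectedly high "spanned" value in /proc/zoneinfo would come from.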
Re: [PATCH V8 20/23] perf tools: making function set_max_cpu_num() non static
On 25 January 2016 at 14:29, Arnaldo Carvalho de Melo wrote: > On Mon, Jan 25, 2016 at 06:12:42PM -0300, Arnaldo Carvalho de Melo wrote: >> On Mon, Jan 25, 2016 at 01:46:22PM -0700, Mathieu Poirier wrote: >> > On 14 January 2016 at 14:46, Mathieu Poirier >> > wrote: >> > > For memory allocation purposes, code located in places other >> > > than util/cpumap.c may want to know how many CPUs the system has. >> > > >> > > This patch is making function set_max_cpu_num() available to >> > > other parts of the perf tool so that global variable >> > > 'max_cpu_num' gets the right value when referenced by cpu__max_cpu(). >> > > >> > > Cc: Peter Zijlstra >> > > Cc: Ingo Molnar >> > > Cc: Arnaldo Carvalho de Melo >> > > Signed-off-by: Mathieu Poirier >> > > --- >> > > tools/perf/util/cpumap.c | 2 +- >> > > tools/perf/util/cpumap.h | 1 + >> > > 2 files changed, 2 insertions(+), 1 deletion(-) >> > > >> > > diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c >> > > index 10af1e7524fb..ae179320c0c0 100644 >> > > --- a/tools/perf/util/cpumap.c >> > > +++ b/tools/perf/util/cpumap.c >> > > @@ -380,7 +380,7 @@ out: >> > > } >> > > >> > > /* Determine highest possible cpu in the system for sparse allocation */ >> > > -static void set_max_cpu_num(void) >> > > +void set_max_cpu_num(void) >> > > { >> > > const char *mnt; >> > > char path[PATH_MAX]; >> > > diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h >> > > index 85f7772457fa..45fa963345eb 100644 >> > > --- a/tools/perf/util/cpumap.h >> > > +++ b/tools/perf/util/cpumap.h >> > > @@ -14,6 +14,7 @@ struct cpu_map { >> > > int map[]; >> > > }; >> > > >> > > +void set_max_cpu_num(void); >> > > struct cpu_map *cpu_map__new(const char *cpu_list); >> > > struct cpu_map *cpu_map__empty_new(int nr); >> > > struct cpu_map *cpu_map__dummy_new(void); >> > > -- >> > > 2.1.4 >> > > >> > >> > Arnaldo, >> > >> > I can't queue this patch for 4.6 without at least a reviewed by from you.
>> >> This one I remember, looks ugly, the name set_max_cpu_num() looks >> strange, when that was restricted (static) to that cpumap.c file, it >> wasn't a problem, exporting it for wider usage looks bad. >> >> You've been waiting for this for quite a while, it seems, lemme stop >> what I am doing to check this... > > So, please check the patch below, what you need then is just to use > cpu__max_cpu(). I like your approach - thanks for the review. I will spin V9 when I have received Adrian's comments. Mathieu > > - Arnaldo > > diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c > index fa935093a599..9bcf2bed3a6d 100644 > --- a/tools/perf/util/cpumap.c > +++ b/tools/perf/util/cpumap.c > @@ -8,6 +8,10 @@ > #include > #include "asm/bug.h" > > +static int max_cpu_num; > +static int max_node_num; > +static int *cpunode_map; > + > static struct cpu_map *cpu_map__default_new(void) > { > struct cpu_map *cpus; > @@ -486,6 +490,32 @@ out: > pr_err("Failed to read max nodes, using default of %d\n", > max_node_num); > } > > +int cpu__max_node(void) > +{ > + if (unlikely(!max_node_num)) > + set_max_node_num(); > + > + return max_node_num; > +} > + > +int cpu__max_cpu(void) > +{ > + if (unlikely(!max_cpu_num)) > + set_max_cpu_num(); > + > + return max_cpu_num; > +} > + > +int cpu__get_node(int cpu) > +{ > + if (unlikely(cpunode_map == NULL)) { > + pr_debug("cpu_map not initialized\n"); > + return -1; > + } > + > + return cpunode_map[cpu]; > +} > + > static int init_cpunode_map(void) > { > int i; > diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h > index 71c41b9efabb..81a2562aaa2b 100644 > --- a/tools/perf/util/cpumap.h > +++ b/tools/perf/util/cpumap.h > @@ -57,37 +57,11 @@ static inline bool cpu_map__empty(const struct cpu_map > *map) > return map ? 
map->map[0] == -1 : true; > } > > -int max_cpu_num; > -int max_node_num; > -int *cpunode_map; > - > int cpu__setup_cpunode_map(void); > > -static inline int cpu__max_node(void) > -{ > - if (unlikely(!max_node_num)) > - pr_debug("cpu_map not initialized\n"); > - > - return max_node_num; > -} > - > -static inline int cpu__max_cpu(void) > -{ > - if (unlikely(!max_cpu_num)) > - pr_debug("cpu_map not initialized\n"); > - > - return max_cpu_num; > -} > - > -static inline int cpu__get_node(int cpu) > -{ > - if (unlikely(cpunode_map == NULL)) { > - pr_debug("cpu_map not initialized\n"); > - return -1; > - } > - > - return cpunode_map[cpu]; > -} > +int cpu__max_node(void); > +int cpu__max_cpu(void); > +int cpu__get_node(int cpu); > > int cpu_map__build_map(struct cpu_map *cpus, struct cpu_map **res, >int (*f)(struct cpu_map *map, int cpu, void *data),
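Here is the pattern Arnaldo proposes, reduced to its essentials. The global becomes `static` to cpumap.c, and callers go through an accessor that initialises it on first use, so no other file needs to call `set_max_cpu_num()`. `probe_possible_cpus()` and `init_calls` are hypothetical. The real code parses sysfs and wraps the check in `unlikely()`.

```c
#include <assert.h>

static int max_cpu_num;   /* private to this file, as in the patch */
static int init_calls;    /* hypothetical: only to observe lazy init */

/* Hypothetical stand-in for perf's sysfs parsing. */
static int probe_possible_cpus(void)
{
	return 8;
}

static void set_max_cpu_num(void)
{
	init_calls++;
	max_cpu_num = probe_possible_cpus();
}

/* Public accessor: initialises on first use, then returns the cache. */
int cpu__max_cpu(void)
{
	if (!max_cpu_num)
		set_max_cpu_num();
	return max_cpu_num;
}
```

Because initialisation hides behind the accessor, callers such as the code in the original patch only need `cpu__max_cpu()`.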
Re: [PATCH 6/7] [media] em28xx: add MEDIA_TUNER dependency
On Tue, 26 Jan 2016 17:51:11 +0100 Arnd Bergmann wrote: > On Tuesday 26 January 2016 14:36:44 Mauro Carvalho Chehab wrote: > > On Tue, 26 Jan 2016 16:53:38 +0100 > > Arnd Bergmann wrote: > > > On Tuesday 26 January 2016 12:33:08 Mauro Carvalho Chehab wrote: > > > > On Tue, 26 Jan 2016 15:10:00 +0100 > > > > Advanced users may, instead, manually select the media tuner that his > > > > hardware needs. In such case, it doesn't matter if MEDIA_TUNER > > > > is enabled or not. > > > > > > > > As this is due to a Kconfig limitation, I've no idea how to fix or get > > > > rid of it, but making em28xx dependent on MEDIA_TUNER is wrong. > > > > > > I don't understand what limitation you see here. > > > > Before MEDIA_TUNER, what we had was something like: > > > > config MEDIA_driver_foo > > select VIDEO_tuner_bar if MEDIA_SUBDRV_AUTOSELECT > > select MEDIA_frontend_foobar if MEDIA_SUBDRV_AUTOSELECT > > ... > > > > However, as different I2C drivers had different dependencies, this > > used to cause lots of trouble. So, one of the Kbuild maintainers > > came up with the idea of converting from select into depends on. > > The MEDIA_TUNER is just an ancillary invisible option to make it > > work at the tuner's side, as usually what we want is to have all > > tuners selected, as we don't have a one to one mapping about what > > driver supports what tuner (nor we wanted to do it, as this would > > mean lots of work for not much gain). > > Ok > > > > The definition > > > of the VIDEO_TUNER symbol is an empty 'tristate' symbol with a > > > dependency on MEDIA_TUNER to ensure we get a warning if MEDIA_TUNER > > > is not enabled, and to ensure it is set to 'm' if MEDIA_TUNER=m and > > > a "bool" driver selects VIDEO_TUNER. > > > > No, VIDEO_TUNER is there because we wanted to be able to use select > > to enable V4L2 tuner core support and let people manually select > > the needed I2C devices with MEDIA_SUBDRV_AUTOSELECT unselected.
> > I meant what the dependency is there for, not the symbol itself. > It's clear what the symbol does. > > > > diff --git a/drivers/media/v4l2-core/Kconfig > > > b/drivers/media/v4l2-core/Kconfig > > > index 9beece00869b..1050bdf1848f 100644 > > > --- a/drivers/media/v4l2-core/Kconfig > > > +++ b/drivers/media/v4l2-core/Kconfig > > > @@ -37,7 +37,11 @@ config VIDEO_PCI_SKELETON > > > # Used by drivers that need tuner.ko > > > config VIDEO_TUNER > > > tristate > > > - depends on MEDIA_TUNER > > > + > > > +config VIDEO_TUNER_MODULE > > > + tristate # must not be built-in if MEDIA_TUNER=m because of I2C > > > + default y if VIDEO_TUNER=y || MEDIA_TUNER=y > > > + default m if VIDEO_TUNER=m > > > > No need to worry about that, because all drivers that select > > VIDEO_TUNER > > depend on I2C: > > > > Ok, then the dependency does not do anything other than generate a > warning. > > > diff --git a/drivers/media/v4l2-core/Kconfig > > b/drivers/media/v4l2-core/Kconfig > > index 9beece00869b..b30e1c879a57 100644 > > --- a/drivers/media/v4l2-core/Kconfig > > +++ b/drivers/media/v4l2-core/Kconfig > > @@ -37,7 +37,7 @@ config VIDEO_PCI_SKELETON > > # Used by drivers that need tuner.ko > > config VIDEO_TUNER > > tristate > > - depends on MEDIA_TUNER > > + default MEDIA_TUNER > > > > # Used by drivers that need v4l2-mem2mem.ko > > config V4L2_MEM2MEM_DEV > > So this means it's now enabled if MEDIA_TUNER is enabled, whereas > before it was only enabled when explicitly selected. That sounds > like a useful change, but it also seems unrelated to the warning > fix, and should probably be a separate change, while for now > we can simply remove the 'depends on' line without any replacement. True. > > Ok, if we have platform drivers for analog TV using the I2C bus > > directly in the SoC, then your solution is better, but the tuner core > > driver may not be the best way of doing it. So, for now, I would use > > the simpler version. > > Ok.
Do you want me to submit a new version or do you prefer to write > one yourself? With or without the 'default'? Feel free to submit a new version without the default. Thanks! Mauro
Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
On Tue, 2016-01-26 at 10:26 -0600, Christoph Lameter wrote: > On Tue, 26 Jan 2016, Mike Galbraith wrote: > > > > Why would the deferring cause this overhead? > > > > Because we schedule to idle cores aggressively, thus we may pop in and > > out of idle at high frequency. > > Whats the point of going idle if you have things to do soon? When a task schedules off, how do you know it'll be back at all, much less soon? -Mike
[PATCH v4 03/22] arm64: pgtable: implement static [pte|pmd|pud]_offset variants
The page table accessors pte_offset(), pud_offset() and pmd_offset() rely on __va translations, so they can only be used after the linear mapping has been installed. For the early fixmap and kasan init routines, whose page tables are allocated statically in the kernel image, these functions will return bogus values. So implement pte_offset_kimg(), pmd_offset_kimg() and pud_offset_kimg(), which can be used instead before any page tables have been allocated dynamically. Reviewed-by: Mark Rutland Signed-off-by: Ard Biesheuvel --- arch/arm64/include/asm/pgtable.h | 13 + 1 file changed, 13 insertions(+) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 2fb94ef881f5..e9eaf6e9262d 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -446,6 +446,9 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd) #define pmd_page(pmd) pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK)) +/* use ONLY for statically allocated translation tables */ +#define pte_offset_kimg(dir,addr) ((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr + /* * Conversion functions: convert a page and protection to a page entry, * and a page entry and page directory to the page they refer to. 
@@ -489,6 +492,9 @@ static inline phys_addr_t pud_page_paddr(pud_t pud) #define pud_page(pud) pfn_to_page(__phys_to_pfn(pud_val(pud) & PHYS_MASK)) +/* use ONLY for statically allocated translation tables */ +#define pmd_offset_kimg(dir,addr) ((pmd_t *)__phys_to_kimg(pmd_offset_phys((dir), (addr + #else #define pud_page_paddr(pud)({ BUILD_BUG(); 0; }) @@ -498,6 +504,8 @@ static inline phys_addr_t pud_page_paddr(pud_t pud) #define pmd_set_fixmap_offset(pudp, addr) ((pmd_t *)pudp) #define pmd_clear_fixmap() +#define pmd_offset_kimg(dir,addr) ((pmd_t *)dir) + #endif /* CONFIG_PGTABLE_LEVELS > 2 */ #if CONFIG_PGTABLE_LEVELS > 3 @@ -536,6 +544,9 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd) #define pgd_page(pgd) pfn_to_page(__phys_to_pfn(pgd_val(pgd) & PHYS_MASK)) +/* use ONLY for statically allocated translation tables */ +#define pud_offset_kimg(dir,addr) ((pud_t *)__phys_to_kimg(pud_offset_phys((dir), (addr + #else #define pgd_page_paddr(pgd)({ BUILD_BUG(); 0;}) @@ -545,6 +556,8 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd) #define pud_set_fixmap_offset(pgdp, addr) ((pud_t *)pgdp) #define pud_clear_fixmap() +#define pud_offset_kimg(dir,addr) ((pud_t *)dir) + #endif /* CONFIG_PGTABLE_LEVELS > 3 */ #define pgd_ERROR(pgd) __pgd_error(__FILE__, __LINE__, pgd_val(pgd)) -- 2.5.0
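The arithmetic behind the _kimg variants, and the reason they avoid __va(), can be modelled in userspace C. This is a sketch under assumptions. The constants KIMG_PHYS_BASE, KIMG_VIRT_BASE and TABLE_ADDR_MASK, and the helpers phys_to_kimg() and table_offset_kimg(), are hypothetical stand-ins, not the arm64 definitions. The next-level table's physical address comes from the parent entry and the index comes from the virtual address. The result is then translated by the fixed kernel-image offset rather than through the linear map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout values, for illustration only. */
#define KIMG_PHYS_BASE  0x80080000ULL           /* where the image was loaded */
#define KIMG_VIRT_BASE  0xffff000008080000ULL   /* where the image was linked */
#define PAGE_SHIFT      12
#define PTRS_PER_TABLE  512
#define TABLE_ADDR_MASK 0x0000fffffffff000ULL   /* output-address bits of an entry */

typedef uint64_t entry_t;

/* Like __phys_to_kimg(): only valid for addresses inside the kernel image. */
static uint64_t phys_to_kimg(uint64_t phys)
{
	return phys - KIMG_PHYS_BASE + KIMG_VIRT_BASE;
}

/* Index of 'addr' in a table whose entries each cover 2^shift bytes. */
static unsigned int table_index(uint64_t addr, unsigned int shift)
{
	return (addr >> shift) & (PTRS_PER_TABLE - 1);
}

/*
 * Compute the physical address of the entry for 'addr' in the table that
 * 'parent' points to, then translate it via the kernel image mapping
 * (not via the linear map, which may not exist yet).
 */
static uint64_t table_offset_kimg(entry_t parent, uint64_t addr, unsigned int shift)
{
	uint64_t table_phys = parent & TABLE_ADDR_MASK;
	uint64_t entry_phys = table_phys + table_index(addr, shift) * sizeof(entry_t);

	assert(entry_phys >= KIMG_PHYS_BASE); /* statically allocated tables only */
	return phys_to_kimg(entry_phys);
}
```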
[PATCH v4 07/22] arm64: move kernel image to base of vmalloc area
This moves the module area to right before the vmalloc area, and moves the kernel image to the base of the vmalloc area. This is an intermediate step towards implementing KASLR, which allows the kernel image to be located anywhere in the vmalloc area. Signed-off-by: Ard Biesheuvel --- arch/arm64/include/asm/kasan.h | 2 +- arch/arm64/include/asm/memory.h | 21 +++-- arch/arm64/include/asm/pgtable.h | 10 +- arch/arm64/mm/dump.c | 12 +-- arch/arm64/mm/init.c | 23 ++--- arch/arm64/mm/kasan_init.c | 18 +++- arch/arm64/mm/mmu.c | 97 +--- 7 files changed, 116 insertions(+), 67 deletions(-) diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h index de0d21211c34..71ad0f93eb71 100644 --- a/arch/arm64/include/asm/kasan.h +++ b/arch/arm64/include/asm/kasan.h @@ -14,7 +14,7 @@ * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses. */ #define KASAN_SHADOW_START (VA_START) -#define KASAN_SHADOW_END(KASAN_SHADOW_START + (1UL << (VA_BITS - 3))) +#define KASAN_SHADOW_END(KASAN_SHADOW_START + KASAN_SHADOW_SIZE) /* * This value is used to map an address to the corresponding shadow diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h index aebc739f5a11..4388651d1f0d 100644 --- a/arch/arm64/include/asm/memory.h +++ b/arch/arm64/include/asm/memory.h @@ -45,16 +45,15 @@ * VA_START - the first kernel virtual address. * TASK_SIZE - the maximum size of a user space task. * TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area. - * The module space lives between the addresses given by TASK_SIZE - * and PAGE_OFFSET - it must be within 128MB of the kernel text. 
*/ #define VA_BITS(CONFIG_ARM64_VA_BITS) #define VA_START (UL(0x) << VA_BITS) #define PAGE_OFFSET(UL(0x) << (VA_BITS - 1)) -#define KIMAGE_VADDR (PAGE_OFFSET) -#define MODULES_END(KIMAGE_VADDR) -#define MODULES_VADDR (MODULES_END - SZ_64M) -#define PCI_IO_END (MODULES_VADDR - SZ_2M) +#define KIMAGE_VADDR (MODULES_END) +#define MODULES_END(MODULES_VADDR + MODULES_VSIZE) +#define MODULES_VADDR (VA_START + KASAN_SHADOW_SIZE) +#define MODULES_VSIZE (SZ_64M) +#define PCI_IO_END (PAGE_OFFSET - SZ_2M) #define PCI_IO_START (PCI_IO_END - PCI_IO_SIZE) #define FIXADDR_TOP(PCI_IO_START - SZ_2M) #define TASK_SIZE_64 (UL(1) << VA_BITS) @@ -72,6 +71,16 @@ #define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 4)) /* + * The size of the KASAN shadow region. This should be 1/8th of the + * size of the entire kernel virtual address space. + */ +#ifdef CONFIG_KASAN +#define KASAN_SHADOW_SIZE (UL(1) << (VA_BITS - 3)) +#else +#define KASAN_SHADOW_SIZE (0) +#endif + +/* * Physical vs virtual RAM address space conversion. These are * private definitions which should NOT be used outside memory.h * files. Use virt_to_phys/phys_to_virt/__pa/__va instead. diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index e9eaf6e9262d..550d21574d0d 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -36,19 +36,13 @@ * * VMEMAP_SIZE: allows the whole VA space to be covered by a struct page array * (rounded up to PUD_SIZE). 
- * VMALLOC_START: beginning of the kernel VA space + * VMALLOC_START: beginning of the kernel vmalloc space * VMALLOC_END: extends to the available space below vmmemmap, PCI I/O space, * fixed mappings and modules */ #define VMEMMAP_SIZE ALIGN((1UL << (VA_BITS - PAGE_SHIFT)) * sizeof(struct page), PUD_SIZE) -#ifndef CONFIG_KASAN -#define VMALLOC_START (VA_START) -#else -#include -#define VMALLOC_START (KASAN_SHADOW_END + SZ_64K) -#endif - +#define VMALLOC_START (MODULES_END) #define VMALLOC_END(PAGE_OFFSET - PUD_SIZE - VMEMMAP_SIZE - SZ_64K) #define vmemmap((struct page *)(VMALLOC_END + SZ_64K)) diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c index 5a22a119a74c..e83ffb00560c 100644 --- a/arch/arm64/mm/dump.c +++ b/arch/arm64/mm/dump.c @@ -35,7 +35,9 @@ struct addr_marker { }; enum address_markers_idx { - VMALLOC_START_NR = 0, + MODULES_START_NR = 0, + MODULES_END_NR, + VMALLOC_START_NR, VMALLOC_END_NR, #ifdef CONFIG_SPARSEMEM_VMEMMAP VMEMMAP_START_NR, @@ -45,12 +47,12 @@ enum address_markers_idx { FIXADDR_END_NR, PCI_START_NR, PCI_END_NR, - MODULES_START_NR, - MODUELS_END_NR, KERNEL_SPACE_NR, }; static struct addr_marker address_markers[] = { + { MODULES_VADDR,"Modules start" }, + { MODULES_END, "Modules end" }, { VMALLOC_START,"vmalloc() Area" }, { VMALLOC_END,
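A small calculation shows the new ordering at the bottom of the kernel VA space. The sketch below assumes that VA_START and PAGE_OFFSET are an all-ones value shifted left by VA_BITS and VA_BITS - 1 respectively. It models only the macros in the memory.h hunk. struct va_layout and compute_layout() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64M (64ULL << 20)
#define SZ_2M  (2ULL << 20)

struct va_layout {
	uint64_t va_start, page_offset, kasan_shadow_size;
	uint64_t modules_vaddr, modules_end, kimage_vaddr, pci_io_end;
};

/* [KASAN shadow][modules][kernel image = VMALLOC_START ...] ... [linear map] */
static struct va_layout compute_layout(unsigned int va_bits, int kasan)
{
	struct va_layout l;

	l.va_start = ~0ULL << va_bits;
	l.page_offset = ~0ULL << (va_bits - 1);
	l.kasan_shadow_size = kasan ? 1ULL << (va_bits - 3) : 0; /* 1/8 of the VA space */
	l.modules_vaddr = l.va_start + l.kasan_shadow_size;
	l.modules_end = l.modules_vaddr + SZ_64M;                /* MODULES_VSIZE */
	l.kimage_vaddr = l.modules_end;                          /* also VMALLOC_START */
	l.pci_io_end = l.page_offset - SZ_2M;
	return l;
}
```

With VA_BITS = 48 and KASAN enabled, the modules start at 0xffff200000000000, directly above the 32 TiB shadow region. Without KASAN, they start at VA_START itself.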
[PATCH v4 16/22] kallsyms: add support for relative offsets in kallsyms address table
Similar to how relative extables are implemented, it is possible to emit the kallsyms table in such a way that it contains offsets relative to some anchor point in the kernel image rather than absolute addresses. On 64-bit architectures, it cuts the size of the kallsyms address table in half, since offsets between kernel symbols can typically be expressed in 32 bits. This saves several hundred kilobytes of permanent .rodata on average. In addition, the kallsyms address table is no longer subject to dynamic relocation when CONFIG_RELOCATABLE is in effect, so the relocation work done after decompression now doesn't have to do relocation updates for all these values. This saves up to 24 bytes (i.e., the size of an ELF64 RELA relocation table entry) per value, which easily adds up to a couple of megabytes of uncompressed __init data on ppc64 or arm64. Even if these relocation entries typically compress well, the combined size reduction of 2.8 MB uncompressed for a ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500 KB space saving in the compressed image. Since it is useful for some architectures (like x86) to retain the ability to emit absolute values as well, this patch adds support for both, by emitting absolute addresses as non-negative 32-bit values, and addresses relative to the lowest encountered relative symbol as negative values, which are subtracted from the runtime address of this base symbol to produce the actual address. Support for the above is enabled by default for all architectures except IA-64, whose symbols are too far apart to capture in this manner. Signed-off-by: Ard Biesheuvel Tested-by: Guenter Roeck Reviewed-by: Kees Cook Tested-by: Kees Cook Cc: Heiko Carstens Cc: Michael Ellerman Cc: Ingo Molnar Cc: H.
Peter Anvin Cc: Benjamin Herrenschmidt Cc: Michal Marek Cc: Rusty Russell Cc: Arnd Bergmann Signed-off-by: Andrew Morton --- init/Kconfig| 16 kernel/kallsyms.c | 38 +++-- scripts/kallsyms.c | 88 +--- scripts/link-vmlinux.sh | 4 + scripts/namespace.pl| 2 + 5 files changed, 129 insertions(+), 19 deletions(-) diff --git a/init/Kconfig b/init/Kconfig index 22320804fbaf..1cc72a068afc 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1420,6 +1420,22 @@ config KALLSYMS_ALL Say N unless you really need all symbols. +config KALLSYMS_BASE_RELATIVE + bool + depends on KALLSYMS + default !IA64 + help + Instead of emitting them as absolute values in the native word size, + emit the symbol references in the kallsyms table as 32-bit entries, + each containing either an absolute value in the range [0, S32_MAX] or + a relative value in the range [base, base + S32_MAX], where base is + the lowest relative symbol address encountered in the image. + + On 64-bit builds, this reduces the size of the address table by 50%, + but more importantly, it results in entries whose values are build + time constants, and no relocation pass is required at runtime to fix + up the entries based on the runtime load address of the kernel. + config PRINTK default y bool "Enable support for printk" if EXPERT diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index 5c5987f10819..10a8af9d5744 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -38,6 +38,7 @@ * during the second link stage. 
*/ extern const unsigned long kallsyms_addresses[] __weak; +extern const int kallsyms_offsets[] __weak; extern const u8 kallsyms_names[] __weak; /* @@ -47,6 +48,9 @@ extern const u8 kallsyms_names[] __weak; extern const unsigned long kallsyms_num_syms __attribute__((weak, section(".rodata"))); +extern const unsigned long kallsyms_relative_base +__attribute__((weak, section(".rodata"))); + extern const u8 kallsyms_token_table[] __weak; extern const u16 kallsyms_token_index[] __weak; @@ -176,6 +180,19 @@ static unsigned int get_symbol_offset(unsigned long pos) return name - kallsyms_names; } +static unsigned long kallsyms_sym_address(int idx) +{ + if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE)) + return kallsyms_addresses[idx]; + + /* positive offsets are absolute values */ + if (kallsyms_offsets[idx] >= 0) + return kallsyms_offsets[idx]; + + /* negative offsets are relative to kallsyms_relative_base - 1 */ + return kallsyms_relative_base - 1 - kallsyms_offsets[idx]; +} + /* Lookup the address for this symbol. Returns 0 if not found. */ unsigned long kallsyms_lookup_name(const char *name) { @@ -187,7 +204,7 @@ unsigned long kallsyms_lookup_name(const char *name) off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf)); if (strcmp(namebuf, name) == 0) - return kallsyms_addresses[i]; + return kallsyms_sym_address(i); } return mo
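The encoding can be sketched as a pair of helpers. kallsyms_decode() follows the same rule as kallsyms_sym_address() in the patch. kallsyms_encode() is a hypothetical inverse written for illustration; it is not taken from scripts/kallsyms.c. A relative address at distance d above the base is stored as -1 - d. This means the base itself becomes -1 and the largest reachable distance is S32_MAX.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode one address for a base-relative table.
 * Returns 0 on success, -1 if the value cannot be represented.
 */
static int kallsyms_encode(uint64_t addr, int absolute, uint64_t base, int32_t *out)
{
	if (absolute) {
		if (addr > INT32_MAX)
			return -1;
		*out = (int32_t)addr;                 /* stored as a non-negative value */
		return 0;
	}
	if (addr < base || addr - base > INT32_MAX)
		return -1;
	*out = (int32_t)(-1 - (int64_t)(addr - base)); /* lands in [INT32_MIN, -1] */
	return 0;
}

/* Same decoding rule as kallsyms_sym_address(). */
static uint64_t kallsyms_decode(int32_t off, uint64_t base)
{
	if (off >= 0)
		return (uint64_t)off;                 /* absolute value */
	return base - 1 - (uint64_t)(int64_t)off;     /* relative to base - 1 */
}
```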
[PATCH v4 21/22] efi: stub: use high allocation for converted command line
Before moving the command line processing ahead of the allocation of the kernel (which is required for detecting the 'nokaslr' option that controls that allocation), move the converted command line higher up in memory, so that it cannot interfere with the kernel itself. Since x86 needs the address to fit in 32 bits, use UINT_MAX as the upper bound there. Otherwise, use ULONG_MAX (i.e., no limit). Reviewed-by: Matt Fleming Signed-off-by: Ard Biesheuvel --- arch/x86/include/asm/efi.h | 2 ++ drivers/firmware/efi/libstub/efi-stub-helper.c | 7 ++- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 0010c78c4998..08b1f2f6ea50 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -25,6 +25,8 @@ #define EFI32_LOADER_SIGNATURE "EL32" #define EFI64_LOADER_SIGNATURE "EL64" +#define MAX_CMDLINE_ADDRESS UINT_MAX + #ifdef CONFIG_X86_32 diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c index f07d4a67fa76..29ed2f9b218c 100644 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c @@ -649,6 +649,10 @@ static u8 *efi_utf16_to_utf8(u8 *dst, const u16 *src, int n) return dst; } +#ifndef MAX_CMDLINE_ADDRESS +#define MAX_CMDLINE_ADDRESS ULONG_MAX +#endif + /* * Convert the unicode UEFI command line to ASCII to pass to kernel. * Size of memory allocated return in *cmd_line_len. @@ -684,7 +688,8 @@ char *efi_convert_cmdline(efi_system_table_t *sys_table_arg, options_bytes++; /* NUL termination */ - status = efi_low_alloc(sys_table_arg, options_bytes, 0, &cmdline_addr); + status = efi_high_alloc(sys_table_arg, options_bytes, 0, + &cmdline_addr, MAX_CMDLINE_ADDRESS); if (status != EFI_SUCCESS) return NULL; -- 2.5.0
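A simplified model of a top-down allocation illustrates the effect of the new upper bound: pick the highest suitably aligned address at which the buffer still fits below the limit. This is a sketch under assumptions, not the efi_high_alloc() implementation. struct free_region and pick_high_addr() are hypothetical, and the real stub works on UEFI memory descriptors measured in pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified free range [base, base + len). */
struct free_region {
	uint64_t base, len;
};

/*
 * Highest 'align'-aligned start such that all 'size' bytes lie inside a
 * region and at or below 'max_addr' (an inclusive bound such as UINT_MAX).
 * Returns 0 and stores the address in *out, or -1 if nothing fits.
 */
static int pick_high_addr(const struct free_region *map, size_t n, uint64_t size,
			  uint64_t align, uint64_t max_addr, uint64_t *out)
{
	int found = 0;
	uint64_t best = 0;

	assert(size > 0 && align && (align & (align - 1)) == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t last, start;

		if (map[i].len == 0)
			continue;
		last = map[i].base + map[i].len - 1;    /* last usable byte */
		if (last > max_addr)
			last = max_addr;
		if (last < map[i].base || last - map[i].base + 1 < size)
			continue;
		start = (last - size + 1) & ~(align - 1);
		if (start < map[i].base)
			continue;
		if (!found || start > best) {
			best = start;
			found = 1;
		}
	}
	if (!found)
		return -1;
	*out = best;
	return 0;
}
```

With max_addr = 0xffffffff (the UINT_MAX bound used on x86), a region that lies entirely above 4 GB is never chosen. With no limit (UINT64_MAX), the same region wins because it is the highest one.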
[PATCH v4 12/22] arm64: avoid dynamic relocations in early boot code
Before implementing KASLR for arm64 by building a self-relocating PIE executable, we have to ensure that values we use before the relocation routine is executed are not subject to dynamic relocation themselves. This applies not only to virtual addresses, but also to values that are supplied by the linker at build time and relocated using R_AARCH64_ABS64 relocations. So instead, use assemble time constants, or force the use of static relocations by folding the constants into the instructions. Reviewed-by: Mark Rutland Signed-off-by: Ard Biesheuvel --- arch/arm64/kernel/efi-entry.S | 2 +- arch/arm64/kernel/head.S | 39 +--- 2 files changed, 27 insertions(+), 14 deletions(-) diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S index a773db92908b..f82036e02485 100644 --- a/arch/arm64/kernel/efi-entry.S +++ b/arch/arm64/kernel/efi-entry.S @@ -61,7 +61,7 @@ ENTRY(entry) */ mov x20, x0 // DTB address ldr x0, [sp, #16] // relocated _text address - ldr x21, =stext_offset + movzx21, #:abs_g0:stext_offset add x21, x0, x21 /* diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 2a39d4ab02bf..ab4c8e93e31c 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -67,12 +67,11 @@ * in the entry routines. */ __HEAD - +_head: /* * DO NOT MODIFY. Image header expected by Linux boot-loaders. */ #ifdef CONFIG_EFI -efi_head: /* * This add instruction has no meaningful effect except that * its opcode forms the magic "MZ" signature required by UEFI. @@ -94,14 +93,14 @@ efi_head: .byte 0x4d .byte 0x64 #ifdef CONFIG_EFI - .long pe_header - efi_head// Offset to the PE header. + .long pe_header - _head // Offset to the PE header. 
#else .word 0 // reserved #endif #ifdef CONFIG_EFI .globl __efistub_stext_offset - .set__efistub_stext_offset, stext - efi_head + .set__efistub_stext_offset, stext - _head .align 3 pe_header: .ascii "PE" @@ -124,7 +123,7 @@ optional_header: .long _end - stext// SizeOfCode .long 0 // SizeOfInitializedData .long 0 // SizeOfUninitializedData - .long __efistub_entry - efi_head // AddressOfEntryPoint + .long __efistub_entry - _head // AddressOfEntryPoint .long __efistub_stext_offset // BaseOfCode extra_header_fields: @@ -139,7 +138,7 @@ extra_header_fields: .short 0 // MinorSubsystemVersion .long 0 // Win32VersionValue - .long _end - efi_head // SizeOfImage + .long _end - _head// SizeOfImage // Everything before the kernel image is considered part of the header .long __efistub_stext_offset // SizeOfHeaders @@ -219,11 +218,13 @@ ENTRY(stext) * On return, the CPU will be ready for the MMU to be turned on and * the TCR will have been set. */ - ldr x27, =__mmap_switched // address to jump to after + ldr x27, 0f // address to jump to after // MMU has been enabled adr_l lr, __enable_mmu// return (PIC) address b __cpu_setup // initialise processor ENDPROC(stext) + .align 3 +0: .quad __mmap_switched - (_head - TEXT_OFFSET) + KIMAGE_VADDR /* * Preserve the arguments passed by the bootloader in x0 .. 
x3 @@ -391,7 +392,8 @@ __create_page_tables: mov x0, x26 // swapper_pg_dir ldr x5, =KIMAGE_VADDR create_pgd_entry x0, x5, x3, x6 - ldr x6, =KERNEL_END // __va(KERNEL_END) + ldr w6, kernel_img_size + add x6, x6, x5 mov x3, x24 // phys offset create_block_map x0, x7, x3, x5, x6 @@ -408,6 +410,9 @@ __create_page_tables: mov lr, x27 ret ENDPROC(__create_page_tables) + +kernel_img_size: + .long _end - (_head - TEXT_OFFSET) .ltorg /* @@ -415,6 +420,10 @@ ENDPROC(__create_page_tables) */ .setinitial_sp, init_thread_union + THREAD_START_SP __mmap_switched: + adr_l x8, vectors // load VBAR_EL1 with virtual + msr vbar_el1, x8// vector table address + isb + // Clear BSS adr_l x0, __bss_start mov x1, xzr @@ -601,13 +610,19 @@ ENTRY(secondary_startup) adrpx26, swapper_pg_dir bl __cpu_setup // initiali
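head.S now computes the value for __mmap_switched and the image size as offsets from the start of the image, (_head - TEXT_OFFSET). The sketch below reproduces that arithmetic in C with hypothetical constants. It shows that the result depends only on a symbol's distance from _head, not on where the image itself was linked. image_offset() and kimg_virt() are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values; in head.S these come from the build. */
#define TEXT_OFFSET  0x80000ULL
#define KIMAGE_VADDR 0xffff000008000000ULL

/* Offset of 'sym' from the image base, i.e. sym - (_head - TEXT_OFFSET). */
static uint64_t image_offset(uint64_t sym, uint64_t head)
{
	return sym - (head - TEXT_OFFSET);
}

/* Same shape as ".quad __mmap_switched - (_head - TEXT_OFFSET) + KIMAGE_VADDR". */
static uint64_t kimg_virt(uint64_t sym, uint64_t head)
{
	return image_offset(sym, head) + KIMAGE_VADDR;
}
```

The kernel_img_size word added to __create_page_tables corresponds to image_offset(_end, _head). Adding it to KIMAGE_VADDR gives the end of the block mapping, so the old KERNEL_END literal is no longer needed.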
[PATCH v4 19/22] efi: stub: implement efi_get_random_bytes() based on EFI_RNG_PROTOCOL
This exposes the firmware's implementation of EFI_RNG_PROTOCOL via a new function efi_get_random_bytes(). Reviewed-by: Matt Fleming Signed-off-by: Ard Biesheuvel --- drivers/firmware/efi/libstub/Makefile | 2 +- drivers/firmware/efi/libstub/efistub.h | 3 ++ drivers/firmware/efi/libstub/random.c | 35 include/linux/efi.h| 5 ++- 4 files changed, 43 insertions(+), 2 deletions(-) diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile index aaf9c0bab42e..ad077944aa0e 100644 --- a/drivers/firmware/efi/libstub/Makefile +++ b/drivers/firmware/efi/libstub/Makefile @@ -36,7 +36,7 @@ lib-$(CONFIG_EFI_ARMSTUB) += arm-stub.o fdt.o string.o \ $(patsubst %.c,lib-%.o,$(arm-deps)) lib-$(CONFIG_ARM) += arm32-stub.o -lib-$(CONFIG_ARM64)+= arm64-stub.o +lib-$(CONFIG_ARM64)+= arm64-stub.o random.o CFLAGS_arm64-stub.o:= -DTEXT_OFFSET=$(TEXT_OFFSET) # diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h index 6b6548fda089..206b7252b9d1 100644 --- a/drivers/firmware/efi/libstub/efistub.h +++ b/drivers/firmware/efi/libstub/efistub.h @@ -43,4 +43,7 @@ void efi_get_virtmap(efi_memory_desc_t *memory_map, unsigned long map_size, unsigned long desc_size, efi_memory_desc_t *runtime_map, int *count); +efi_status_t efi_get_random_bytes(efi_system_table_t *sys_table, + unsigned long size, u8 *out); + #endif diff --git a/drivers/firmware/efi/libstub/random.c b/drivers/firmware/efi/libstub/random.c new file mode 100644 index ..97941ee5954f --- /dev/null +++ b/drivers/firmware/efi/libstub/random.c @@ -0,0 +1,35 @@ +/* + * Copyright (C) 2016 Linaro Ltd; + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ * + */ + +#include +#include + +#include "efistub.h" + +struct efi_rng_protocol { + efi_status_t (*get_info)(struct efi_rng_protocol *, +unsigned long *, efi_guid_t *); + efi_status_t (*get_rng)(struct efi_rng_protocol *, + efi_guid_t *, unsigned long, u8 *out); +}; + +efi_status_t efi_get_random_bytes(efi_system_table_t *sys_table_arg, + unsigned long size, u8 *out) +{ + efi_guid_t rng_proto = EFI_RNG_PROTOCOL_GUID; + efi_status_t status; + struct efi_rng_protocol *rng; + + status = efi_call_early(locate_protocol, &rng_proto, NULL, + (void **)&rng); + if (status != EFI_SUCCESS) + return status; + + return rng->get_rng(rng, NULL, size, out); +} diff --git a/include/linux/efi.h b/include/linux/efi.h index 569b5a866bb1..13783fdc9bdd 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -299,7 +299,7 @@ typedef struct { void *open_protocol_information; void *protocols_per_handle; void *locate_handle_buffer; - void *locate_protocol; + efi_status_t (*locate_protocol)(efi_guid_t *, void *, void **); void *install_multiple_protocol_interfaces; void *uninstall_multiple_protocol_interfaces; void *calculate_crc32; @@ -599,6 +599,9 @@ void efi_native_runtime_setup(void); #define EFI_PROPERTIES_TABLE_GUID \ EFI_GUID( 0x880aaca3, 0x4adc, 0x4a04, 0x90, 0x79, 0xb7, 0x47, 0x34, 0x08, 0x25, 0xe5 ) +#define EFI_RNG_PROTOCOL_GUID \ +EFI_GUID( 0x3152bca5, 0xeade, 0x433d, 0x86, 0x2e, 0xc0, 0x1c, 0xdc, 0x29, 0x1f, 0x44 ) + typedef struct { efi_guid_t guid; u64 table; -- 2.5.0
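efi_get_random_bytes() follows the usual UEFI pattern: ask the firmware to locate a protocol by GUID, then call through the function pointers in the returned interface. The sketch below mocks that flow in userspace. Everything in it is hypothetical. The mock protocol returns a deterministic counter, not randomness. The status values are placeholders rather than the real EFI encodings, and the GUID and algorithm arguments are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned long efi_status_t;
#define EFI_SUCCESS   0UL
#define EFI_NOT_FOUND 14UL   /* placeholder value for this sketch */

/* Hypothetical, cut-down protocol interface: a table of function pointers. */
struct rng_protocol {
	efi_status_t (*get_rng)(struct rng_protocol *self, unsigned long size, uint8_t *out);
	uint8_t state;
};

/* Mock implementation: NOT random, just a deterministic byte counter. */
static efi_status_t mock_get_rng(struct rng_protocol *self, unsigned long size, uint8_t *out)
{
	for (unsigned long i = 0; i < size; i++)
		out[i] = self->state++;
	return EFI_SUCCESS;
}

static struct rng_protocol mock_rng = { mock_get_rng, 0 };
static int firmware_has_rng = 1;

/* Stand-in for the locate_protocol boot service. */
static efi_status_t locate_rng(struct rng_protocol **proto)
{
	if (!firmware_has_rng)
		return EFI_NOT_FOUND;
	*proto = &mock_rng;
	return EFI_SUCCESS;
}

/* Same shape as efi_get_random_bytes(): locate, then call through the table. */
static efi_status_t get_random_bytes(unsigned long size, uint8_t *out)
{
	struct rng_protocol *rng;
	efi_status_t status = locate_rng(&rng);

	if (status != EFI_SUCCESS)
		return status;  /* e.g. firmware without EFI_RNG_PROTOCOL */
	return rng->get_rng(rng, size, out);
}
```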
Re: [PATCH 3/3] input: touchscreen: ad7879: add device tree support
On 2016-01-26 00:14, Michael Hennerich wrote: > On 01/26/2016 04:04 AM, Stefan Agner wrote: >> Add device tree support for the I2C variant of AD7879 (AD7879-1). This >> allows to specify the touchscreen controller as a I2C client node. >> Most of the options available as platform data are also available as >> device tree properties. Exporting the GPIO is currently not possible >> through device tree. >> >> Signed-off-by: Stefan Agner > > > Hi Stefan, > > Thanks for the patch - > There is something similar in our tree but I forgot to send it > mainline a long time ago. > > https://github.com/analogdevicesinc/linux/commit/69b16d4b616a4bbe9001d3f67d3ff54f3deb85ce Yeah, I saw that patch. I tried to use the standard touchscreen properties where applicable. Also, my implementation currently covers only I2C. However, now that I think about it, I might as well move my code into ad7879.c, which would allow the same bindings to be used for SPI devices too. > > There are some build issues can you have a look? Yeah, I saw them, shame on me. Will fix them. > > I also don't understand why "exporting the GPIO is not possible > through device tree"? > > Can you explain? I should have written "not implemented" instead of "not possible". Implementing proper device tree GPIO bindings would need some more changes. I did not look into that, so the current device tree bindings do not allow enabling the GPIO functionality.
-- Stefan > > Regards, > Michael > > >> --- >> .../bindings/input/touchscreen/ad7879-i2c.txt | 47 >> drivers/input/touchscreen/ad7879-i2c.c | 63 >> +- >> drivers/input/touchscreen/ad7879-spi.c | 3 +- >> drivers/input/touchscreen/ad7879.c | 2 +- >> drivers/input/touchscreen/ad7879.h | 1 + >> 5 files changed, 113 insertions(+), 3 deletions(-) >> create mode 100644 >> Documentation/devicetree/bindings/input/touchscreen/ad7879-i2c.txt >> >> diff --git >> a/Documentation/devicetree/bindings/input/touchscreen/ad7879-i2c.txt >> b/Documentation/devicetree/bindings/input/touchscreen/ad7879-i2c.txt >> new file mode 100644 >> index 000..bf169a2 >> --- /dev/null >> +++ b/Documentation/devicetree/bindings/input/touchscreen/ad7879-i2c.txt >> @@ -0,0 +1,47 @@ >> +* Analog Devices AD7879-1/AD7889-1 touchscreen interface (I2C) >> + >> +Required properties: >> +- compatible: must be "adi,ad7879-1" >> +- reg: i2c slave address >> +- interrupt-parent: the phandle for the interrupt controller >> +- interrupts: touch controller interrupt >> +- resistance-plate-x: total resistance of X-plate (for >> pressure >> + calculation) >> +- touchscreen-max-pressure : maximum reported pressure >> +- touchscreen-swapped-x-y : X and Y axis are swapped (boolean) >> + Swapping is done after inverting the axis >> +Optional properties: >> +- first-conversion-delay: 0-12 in 128us steps (starting with 128us) >> + 13: 2.560ms >> + 14: 3.584ms >> + 15: 4.096ms >> +- acquisition-time : 0: 2us >> + 1: 4us >> + 2: 8us >> + 3: 16us >> +- median-filter-size: 0: disabled >> + 1: 4 measurements >> + 2: 8 measurements >> + 3: 16 measurements >> +- averaging : 0: 2 middle values (1 if median disabled) >> + 1: 4 middle values >> + 2: 8 middle values >> + 3: 16 values >> +- conversion-interval: : 0: convert one time only >> + 1-255: 515us + val * 35us (up to 9.440ms) >> + >> +Example: >> + >> +ad7879@2c { >> +compatible = "adi,ad7879-1"; >> +reg = <0x2c>; >> +interrupt-parent = <&gpio1>; >> +interrupts = <13 
IRQ_TYPE_EDGE_FALLING>; >> +resistance-plate-x = <120>; >> +touchscreen-max-pressure = <4096>; >> +first-conversion-delay = /bits/ 8 <3>; >> +acquisition-time = /bits/ 8 <1>; >> +median-filter-size = /bits/ 8 <2>; >> +averaging = /bits/ 8 <1>; >> +conversion-interval = /bits/ 8 <255>; >> +}; >> diff --git a/drivers/input/touchscreen/ad7879-i2c.c >> b/drivers/input/touchscreen/ad7879-i2c.c >> index d66962c..08a2c9a 100644 >> --- a/drivers/input/touchscreen/ad7879-i2c.c >> +++ b/drivers/input/touchscreen/ad7879-i2c.c >> @@ -11,6 +11,7 @@ >> #include >> #include >> #include >> +#include >> >> #include "ad7879.h" >> >> @@ -54,9 +55,50 @@ static const struct ad7879_bus_ops ad7879_i2c_bus_ops = { >> .write = ad7879_i2c_write, >> }; >> >> +static struct ad7879_
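Taken literally, the binding text maps the raw property values to times as sketched below. These helpers are hypothetical and reflect one reading of the documentation, in particular that first-conversion-delay = 0 means 128us. They are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* first-conversion-delay: 0-12 in 128us steps from 128us; 13-15 from a table. */
static int first_conversion_delay_us(uint8_t val)
{
	static const int tail[] = { 2560, 3584, 4096 };

	if (val <= 12)
		return 128 * (val + 1);
	if (val <= 15)
		return tail[val - 13];
	return -1;                      /* out of range */
}

/* acquisition-time: 0..3 -> 2, 4, 8, 16 us */
static int acquisition_time_us(uint8_t val)
{
	return val <= 3 ? 2 << val : -1;
}

/* conversion-interval: 0 = convert once (no interval), 1-255 -> 515us + val * 35us */
static int conversion_interval_us(uint8_t val)
{
	return val ? 515 + 35 * val : 0;
}
```

For the example node, first-conversion-delay = 3 gives 512us, acquisition-time = 1 gives 4us, and conversion-interval = 255 gives the documented maximum of 9.440ms.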
[PATCH v4 22/22] arm64: efi: invoke EFI_RNG_PROTOCOL to supply KASLR randomness
Since arm64 does not use a decompressor that supplies an execution environment where it is feasible to some extent to provide a source of randomness, the arm64 KASLR kernel depends on the bootloader to supply some random bits in the /chosen/kaslr-seed DT property upon kernel entry. On UEFI systems, we can use the EFI_RNG_PROTOCOL, if supplied, to obtain some random bits. At the same time, use it to randomize the offset of the kernel Image in physical memory. Signed-off-by: Ard Biesheuvel --- arch/arm64/Kconfig| 5 ++ drivers/firmware/efi/libstub/arm-stub.c | 40 ++ drivers/firmware/efi/libstub/arm64-stub.c | 78 ++-- drivers/firmware/efi/libstub/fdt.c| 9 +++ 4 files changed, 97 insertions(+), 35 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index d7e31454d421..c6b5f71996e0 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -786,6 +786,11 @@ config RANDOMIZE_BASE It is the bootloader's job to provide entropy, by passing a random u64 value in /chosen/kaslr-seed at kernel entry. + When booting via the UEFI stub, it will invoke the firmware's + EFI_RNG_PROTOCOL implementation (if available) to supply entropy + to the kernel proper. In addition, it will randomise the physical + location of the kernel Image as well. + If unsure, say N. 
endmenu diff --git a/drivers/firmware/efi/libstub/arm-stub.c b/drivers/firmware/efi/libstub/arm-stub.c index 3397902e4040..4deb3e7faa0e 100644 --- a/drivers/firmware/efi/libstub/arm-stub.c +++ b/drivers/firmware/efi/libstub/arm-stub.c @@ -18,6 +18,8 @@ #include "efistub.h" +bool __nokaslr; + static int efi_secureboot_enabled(efi_system_table_t *sys_table_arg) { static efi_guid_t const var_guid = EFI_GLOBAL_VARIABLE_GUID; @@ -207,14 +209,6 @@ unsigned long efi_entry(void *handle, efi_system_table_t *sys_table, pr_efi_err(sys_table, "Failed to find DRAM base\n"); goto fail; } - status = handle_kernel_image(sys_table, image_addr, &image_size, -&reserve_addr, -&reserve_size, -dram_base, image); - if (status != EFI_SUCCESS) { - pr_efi_err(sys_table, "Failed to relocate kernel\n"); - goto fail; - } /* * Get the command line from EFI, using the LOADED_IMAGE @@ -224,7 +218,28 @@ unsigned long efi_entry(void *handle, efi_system_table_t *sys_table, cmdline_ptr = efi_convert_cmdline(sys_table, image, &cmdline_size); if (!cmdline_ptr) { pr_efi_err(sys_table, "getting command line via LOADED_IMAGE_PROTOCOL\n"); - goto fail_free_image; + goto fail; + } + + /* check whether 'nokaslr' was passed on the command line */ + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) { + static const u8 default_cmdline[] = CONFIG_CMDLINE; + const u8 *str, *cmdline = cmdline_ptr; + + if (IS_ENABLED(CONFIG_CMDLINE_FORCE)) + cmdline = default_cmdline; + str = strstr(cmdline, "nokaslr"); + if (str == cmdline || (str > cmdline && *(str - 1) == ' ')) + __nokaslr = true; + } + + status = handle_kernel_image(sys_table, image_addr, &image_size, +&reserve_addr, +&reserve_size, +dram_base, image); + if (status != EFI_SUCCESS) { + pr_efi_err(sys_table, "Failed to relocate kernel\n"); + goto fail_free_cmdline; } status = efi_parse_options(cmdline_ptr); @@ -244,7 +259,7 @@ unsigned long efi_entry(void *handle, efi_system_table_t *sys_table, if (status != EFI_SUCCESS) { pr_efi_err(sys_table, "Failed to load device 
tree!\n"); - goto fail_free_cmdline; + goto fail_free_image; } } @@ -286,12 +301,11 @@ unsigned long efi_entry(void *handle, efi_system_table_t *sys_table, efi_free(sys_table, initrd_size, initrd_addr); efi_free(sys_table, fdt_size, fdt_addr); -fail_free_cmdline: - efi_free(sys_table, cmdline_size, (unsigned long)cmdline_ptr); - fail_free_image: efi_free(sys_table, image_size, *image_addr); efi_free(sys_table, reserve_size, reserve_addr); +fail_free_cmdline: + efi_free(sys_table, cmdline_size, (unsigned long)cmdline_ptr); fail: return EFI_ERROR; } diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c index 78dfbd34b6bf..e0e6b74fef8f 100644 --- a/drivers/firmware/efi/libstub/arm64-stub.c +++ b/drivers/firmware/efi/libstub/arm64-stub.c @@ -13,6 +13,10 @@ #include #inclu
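The 'nokaslr' test in efi_entry() can be lifted out into a standalone function for experimentation. The sketch mirrors the patch's logic: a match counts only at the start of the string or after a space. It also adds an explicit NULL check. cmdline_has_nokaslr() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* "nokaslr" at the start of the command line or directly after a space. */
static bool cmdline_has_nokaslr(const char *cmdline)
{
	const char *str = strstr(cmdline, "nokaslr");

	return str && (str == cmdline || str[-1] == ' ');
}
```

Like the patch, this examines only the first occurrence and does not look at the character after the match. As a result, cmdline_has_nokaslr("xnokaslr nokaslr") returns false, while cmdline_has_nokaslr("nokaslrfoo") returns true.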
[PATCH v4 20/22] efi: stub: add implementation of efi_random_alloc()
This implements efi_random_alloc(), which allocates a chunk of memory of a certain size at a certain alignment, and uses the random_seed argument it receives to randomize the address of the allocation. This is implemented by iterating over the UEFI memory map, counting the number of suitable slots (aligned offsets) within each region, and picking a random number between 0 and 'number of slots - 1' to select the slot. This should guarantee that each possible offset is chosen with equal probability. Suggested-by: Kees Cook Cc: Matt Fleming Signed-off-by: Ard Biesheuvel --- drivers/firmware/efi/libstub/efistub.h | 4 + drivers/firmware/efi/libstub/random.c | 100 2 files changed, 104 insertions(+) diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h index 206b7252b9d1..5ed3d3f38166 100644 --- a/drivers/firmware/efi/libstub/efistub.h +++ b/drivers/firmware/efi/libstub/efistub.h @@ -46,4 +46,8 @@ void efi_get_virtmap(efi_memory_desc_t *memory_map, unsigned long map_size, efi_status_t efi_get_random_bytes(efi_system_table_t *sys_table, unsigned long size, u8 *out); +efi_status_t efi_random_alloc(efi_system_table_t *sys_table_arg, + unsigned long size, unsigned long align, + unsigned long *addr, unsigned long random_seed); + #endif diff --git a/drivers/firmware/efi/libstub/random.c b/drivers/firmware/efi/libstub/random.c index 97941ee5954f..b98346350230 100644 --- a/drivers/firmware/efi/libstub/random.c +++ b/drivers/firmware/efi/libstub/random.c @@ -33,3 +33,103 @@ efi_status_t efi_get_random_bytes(efi_system_table_t *sys_table_arg, return rng->get_rng(rng, NULL, size, out); } + +/* + * Return the number of slots covered by this entry, i.e., the number of + * addresses it covers that are suitably aligned and supply enough room + * for the allocation.
+ */ +static unsigned long get_entry_num_slots(efi_memory_desc_t *md, +unsigned long size, +unsigned long align) +{ + u64 start, end; + + if (md->type != EFI_CONVENTIONAL_MEMORY) + return 0; + + start = round_up(md->phys_addr, align); + end = round_down(md->phys_addr + md->num_pages * EFI_PAGE_SIZE - size, +align); + + if (start > end) + return 0; + + return (end - start + 1) / align; +} + +/* + * The UEFI memory descriptors have a virtual address field that is only used + * when installing the virtual mapping using SetVirtualAddressMap(). Since it + * is unused here, we can reuse it to keep track of each descriptor's slot + * count. + */ +#define MD_NUM_SLOTS(md) ((md)->virt_addr) + +efi_status_t efi_random_alloc(efi_system_table_t *sys_table_arg, + unsigned long size, + unsigned long align, + unsigned long *addr, + unsigned long random_seed) +{ + unsigned long map_size, desc_size, total_slots = 0, target_slot; + efi_status_t status = EFI_NOT_FOUND; + efi_memory_desc_t *memory_map; + int map_offset; + + status = efi_get_memory_map(sys_table_arg, &memory_map, &map_size, + &desc_size, NULL, NULL); + if (status != EFI_SUCCESS) + return status; + + if (align < EFI_ALLOC_ALIGN) + align = EFI_ALLOC_ALIGN; + + /* count the suitable slots in each memory map entry */ + for (map_offset = 0; map_offset < map_size; map_offset += desc_size) { + efi_memory_desc_t *md = (void *)memory_map + map_offset; + unsigned long slots; + + slots = get_entry_num_slots(md, size, align); + MD_NUM_SLOTS(md) = slots; + total_slots += slots; + } + + /* find a random number between 0 and total_slots */ + target_slot = (total_slots * (u16)random_seed) >> 16; + + /* +* target_slot is now a value in the range [0, total_slots), and so +* it corresponds with exactly one of the suitable slots we recorded +* when iterating over the memory map the first time around. 
+* +* So iterate over the memory map again, subtracting the number of +* slots of each entry at each iteration, until we have found the entry +* that covers our chosen slot. Use the residual value of target_slot +* to calculate the randomly chosen address, and allocate it directly +* using EFI_ALLOCATE_ADDRESS. +*/ + for (map_offset = 0; map_offset < map_size; map_offset += desc_size) { + efi_memory_desc_t *md = (void *)memory_map + map_offset; + efi_physical_addr_t target; + unsigned long pages; + + if (target_slot >= MD_NUM_SLOTS(md)) { +
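The two-pass slot selection described in the commit message can be modelled in user-space C. This is a sketch under stated assumptions: `struct region`, the helper names and the 0/-1 return convention are hypothetical stand-ins for the EFI memory descriptors and efi_status_t. The sketch counts both endpoints of each region, so a region holds (last - first) / align + 1 slots, and it scales a 16-bit seed into [0, total) the same way the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one EFI_CONVENTIONAL_MEMORY descriptor. */
struct region {
	uint64_t base;  /* physical start address */
	uint64_t size;  /* length in bytes */
};

static uint64_t round_up_u64(uint64_t x, uint64_t a)   { return (x + a - 1) / a * a; }
static uint64_t round_down_u64(uint64_t x, uint64_t a) { return x / a * a; }

/* Number of 'align'-aligned start addresses in r that leave room for 'size'. */
static uint64_t region_slots(const struct region *r, uint64_t size, uint64_t align)
{
	uint64_t first, last;

	if (r->size < size)
		return 0;
	first = round_up_u64(r->base, align);
	last = round_down_u64(r->base + r->size - size, align);
	if (first > last)
		return 0;
	return (last - first) / align + 1;    /* both endpoints are usable */
}

/*
 * Pass 1 counts all slots; the 16-bit seed is scaled into [0, total) with
 * (total * seed) >> 16; pass 2 walks the regions again, subtracting each
 * region's slot count until the chosen slot is found.
 * Returns 0 and sets *addr on success, -1 if nothing fits.
 */
static int pick_random_address(const struct region *map, size_t n,
			       uint64_t size, uint64_t align, uint16_t seed,
			       uint64_t *addr)
{
	uint64_t total = 0, target;
	size_t i;

	assert(align != 0 && addr != NULL);
	for (i = 0; i < n; i++)
		total += region_slots(&map[i], size, align);
	if (total == 0)
		return -1;

	target = (total * seed) >> 16;
	for (i = 0; i < n; i++) {
		uint64_t slots = region_slots(&map[i], size, align);

		if (target < slots) {
			*addr = round_up_u64(map[i].base, align) + target * align;
			return 0;
		}
		target -= slots;
	}
	return -1; /* unreachable: target < total */
}
```

Because (total * seed) >> 16 is always strictly below total, the second pass is guaranteed to find a region.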
Re: stty blocks forever when the line is already opened (and the tx buffer can't be flushed)
Hi Richard, On 01/26/2016 08:19 AM, Richard Genoud wrote: > [ sorry for the noise, I forgot to Cc the lkml ] > > Hi, > I've found a case where calling > stty -F /dev/ttyS1 clocal > blocks forever. > And I don't know if it's a very old bug or if it's meant to be like that. > > Here is how to reproduce the lock: > NB: there's NO modem on ttyS1 > stty -F /dev/ttyS1 clocal cread crtscts > cat < /dev/ttyS1 > > #on another terminal : > echo "dummy" > /dev/ttyS1 # This call doesn't block > > stty -F /dev/ttyS1 -crtscts # this blocks forever on ioctl(TCSETSW ) > > > looking at tty_port_close_start(), it's pretty clear that nothing is > flushed until the last user, so it explains why the "echo dummy" > returns directly, despite the crtscts flags. > And in tty_mode_ioctl(), there are the lines: > case TCSETSW: > return set_termios(real_tty, p, TERMIOS_WAIT | TERMIOS_OLD); > That explains why the stty blocks. > > But this behavior seems really strange. > ... Or it's meant to be like that ? Yeah, meant to be like that. When mgetty writes the login prompt but h/w flow control is enabled and nothing's connected, the output is buffered. Since stty uses tcsetattr(TCSADRAIN), the attempt to turn off h/w flow control blocks, waiting for output to empty. In this situation, stopping mgetty will allow the other process to unblock and advance. Hmmm, I could add a -f,--force flag to stty so it uses tcsetattr(TCSANOW)... Regards, Peter Hurley > Regards, > Richard > > NB: This is actually a real life use case with mgetty, a modem losing > its power and another process trying to speak to the modem. >
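The difference between the two tcsetattr() modes mentioned above can be shown with a minimal C sketch. The helper name and its `force` switch are hypothetical, loosely modelled on the suggested stty -f/--force. CRTSCTS is a BSD/Linux extension rather than POSIX, which is why the sketch defines _DEFAULT_SOURCE.

```c
#define _DEFAULT_SOURCE /* CRTSCTS is a BSD/Linux extension, not POSIX */
#include <assert.h>
#include <errno.h>
#include <termios.h>

/*
 * Clear hardware flow control on an open terminal. With TCSADRAIN,
 * tcsetattr() first waits for queued output to be transmitted, which cannot
 * happen while CRTSCTS holds it back; TCSANOW applies the change at once.
 * 'force' is a hypothetical analogue of the proposed stty -f/--force.
 * Returns 0 on success or a negative errno value.
 */
static int clear_crtscts(int fd, int force)
{
	struct termios t;

	assert(force == 0 || force == 1);
	if (tcgetattr(fd, &t) != 0)
		return -errno;          /* e.g. -ENOTTY if fd is not a terminal */
	t.c_cflag &= ~CRTSCTS;
	if (tcsetattr(fd, force ? TCSANOW : TCSADRAIN, &t) != 0)
		return -errno;
	return 0;
}
```

Calling clear_crtscts(fd, 0) on the blocked ttyS1 from the report would block in the same way stty does. With force set, the new settings take effect immediately and the buffered "dummy" output stays queued.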
[PATCH v4 09/22] extable: add support for relative extables to search and sort routines
This adds support to the generic search_extable() and sort_extable() implementations for dealing with exception table entries whose fields contain relative offsets rather than absolute addresses. Acked-by: Helge Deller Acked-by: Heiko Carstens Acked-by: H. Peter Anvin Acked-by: Tony Luck Acked-by: Will Deacon Signed-off-by: Ard Biesheuvel --- lib/extable.c | 50 1 file changed, 41 insertions(+), 9 deletions(-) diff --git a/lib/extable.c b/lib/extable.c index 4cac81ec225e..0be02ad561e9 100644 --- a/lib/extable.c +++ b/lib/extable.c @@ -14,7 +14,37 @@ #include #include +#ifndef ARCH_HAS_RELATIVE_EXTABLE +#define ex_to_insn(x) ((x)->insn) +#else +static inline unsigned long ex_to_insn(const struct exception_table_entry *x) +{ + return (unsigned long)&x->insn + x->insn; +} +#endif + #ifndef ARCH_HAS_SORT_EXTABLE +#ifndef ARCH_HAS_RELATIVE_EXTABLE +#define swap_ex NULL +#else +static void swap_ex(void *a, void *b, int size) +{ + struct exception_table_entry *x = a, *y = b, tmp; + int delta = b - a; + + tmp = *x; + x->insn = y->insn + delta; + y->insn = tmp.insn - delta; + +#ifdef swap_ex_entry_fixup + swap_ex_entry_fixup(x, y, tmp, delta); +#else + x->fixup = y->fixup + delta; + y->fixup = tmp.fixup - delta; +#endif +} +#endif /* ARCH_HAS_RELATIVE_EXTABLE */ + /* * The exception table needs to be sorted so that the binary * search that we use to find entries in it works properly.
@@ -26,9 +56,9 @@ static int cmp_ex(const void *a, const void *b) const struct exception_table_entry *x = a, *y = b; /* avoid overflow */ - if (x->insn > y->insn) + if (ex_to_insn(x) > ex_to_insn(y)) return 1; - if (x->insn < y->insn) + if (ex_to_insn(x) < ex_to_insn(y)) return -1; return 0; } @@ -37,7 +67,7 @@ void sort_extable(struct exception_table_entry *start, struct exception_table_entry *finish) { sort(start, finish - start, sizeof(struct exception_table_entry), -cmp_ex, NULL); +cmp_ex, swap_ex); } #ifdef CONFIG_MODULES @@ -48,13 +78,15 @@ void sort_extable(struct exception_table_entry *start, void trim_init_extable(struct module *m) { /*trim the beginning*/ - while (m->num_exentries && within_module_init(m->extable[0].insn, m)) { + while (m->num_exentries && + within_module_init(ex_to_insn(&m->extable[0]), m)) { m->extable++; m->num_exentries--; } /*trim the end*/ while (m->num_exentries && - within_module_init(m->extable[m->num_exentries-1].insn, m)) + within_module_init(ex_to_insn(&m->extable[m->num_exentries - 1]), + m)) m->num_exentries--; } #endif /* CONFIG_MODULES */ @@ -81,13 +113,13 @@ search_extable(const struct exception_table_entry *first, * careful, the distance between value and insn * can be larger than MAX_LONG: */ - if (mid->insn < value) + if (ex_to_insn(mid) < value) first = mid + 1; - else if (mid->insn > value) + else if (ex_to_insn(mid) > value) last = mid - 1; else return mid; -} -return NULL; + } + return NULL; } #endif -- 2.5.0
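The reason swap_ex() adjusts the offsets can be checked with a small user-space model. The struct and function names below are hypothetical. The model shows that a relative entry moved from slot b to slot a must have delta = b - a added to its offsets to keep resolving to the same absolute addresses, which a plain byte-wise swap would not do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of a relative exception table entry. */
struct rel_entry {
	int32_t insn;   /* target address minus &entry->insn */
	int32_t fixup;  /* target address minus &entry->fixup */
};

/* Absolute addresses, computed the way ex_to_insn() does in the patch. */
static uintptr_t rel_insn(const struct rel_entry *e)
{
	return (uintptr_t)&e->insn + e->insn;
}

static uintptr_t rel_fixup(const struct rel_entry *e)
{
	return (uintptr_t)&e->fixup + e->fixup;
}

/* Store offsets so that the entry resolves to 'insn' and 'fixup'. */
static void rel_set(struct rel_entry *e, uintptr_t insn, uintptr_t fixup)
{
	intptr_t di = (intptr_t)insn - (intptr_t)(uintptr_t)&e->insn;
	intptr_t df = (intptr_t)fixup - (intptr_t)(uintptr_t)&e->fixup;

	assert(di >= INT32_MIN && di <= INT32_MAX);
	assert(df >= INT32_MIN && df <= INT32_MAX);
	e->insn = (int32_t)di;
	e->fixup = (int32_t)df;
}

/*
 * Swap two entries as swap_ex() does: an offset that moves from b to a
 * must grow by delta = b - a to keep pointing at the same target.
 */
static void rel_swap(struct rel_entry *a, struct rel_entry *b)
{
	int32_t delta = (int32_t)((char *)b - (char *)a);
	struct rel_entry tmp = *a;

	a->insn = b->insn + delta;
	b->insn = tmp.insn - delta;
	a->fixup = b->fixup + delta;
	b->fixup = tmp.fixup - delta;
}
```

After rel_swap(), both entries still resolve to their original targets, so sort() can reorder the table by ex_to_insn() without corrupting it.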
[PATCH v4 15/22] scripts/sortextable: add support for ET_DYN binaries
Add support to scripts/sortextable for handling relocatable (PIE) executables, whose ELF type is ET_DYN, not ET_EXEC. Other than adding support for the new type, no changes are needed. Signed-off-by: Ard Biesheuvel --- scripts/sortextable.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/scripts/sortextable.c b/scripts/sortextable.c index af247c70fb66..19d83647846c 100644 --- a/scripts/sortextable.c +++ b/scripts/sortextable.c @@ -266,9 +266,9 @@ do_file(char const *const fname) break; } /* end switch */ if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0 - || r2(&ehdr->e_type) != ET_EXEC + || (r2(&ehdr->e_type) != ET_EXEC && r2(&ehdr->e_type) != ET_DYN) || ehdr->e_ident[EI_VERSION] != EV_CURRENT) { - fprintf(stderr, "unrecognized ET_EXEC file %s\n", fname); + fprintf(stderr, "unrecognized ET_EXEC/ET_DYN file %s\n", fname); fail_file(); } @@ -304,7 +304,7 @@ do_file(char const *const fname) if (r2(&ehdr->e_ehsize) != sizeof(Elf32_Ehdr) || r2(&ehdr->e_shentsize) != sizeof(Elf32_Shdr)) { fprintf(stderr, - "unrecognized ET_EXEC file: %s\n", fname); + "unrecognized ET_EXEC/ET_DYN file: %s\n", fname); fail_file(); } do32(ehdr, fname, custom_sort); @@ -314,7 +314,7 @@ do_file(char const *const fname) if (r2(&ghdr->e_ehsize) != sizeof(Elf64_Ehdr) || r2(&ghdr->e_shentsize) != sizeof(Elf64_Shdr)) { fprintf(stderr, - "unrecognized ET_EXEC file: %s\n", fname); + "unrecognized ET_EXEC/ET_DYN file: %s\n", fname); fail_file(); } do64(ghdr, fname, custom_sort); -- 2.5.0
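Below is a minimal sketch of the relaxed header check, written against the host's <elf.h>. The function name is hypothetical. Unlike do_file(), the sketch omits the r2() endianness accessor and assumes a native-endian 64-bit header.

```c
#include <assert.h>
#include <elf.h>
#include <string.h>

/*
 * Accept both ET_EXEC and ET_DYN (PIE) images, in the spirit of the
 * updated do_file() check. Assumes a native-endian Elf64 header.
 */
static int elf_type_ok(const Elf64_Ehdr *ehdr)
{
	assert(ehdr != NULL);
	if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
		return 0;
	if (ehdr->e_type != ET_EXEC && ehdr->e_type != ET_DYN)
		return 0;
	return ehdr->e_ident[EI_VERSION] == EV_CURRENT;
}
```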
[PATCH v4 17/22] arm64: add support for building the kernel as a relocate PIE binary
This implements CONFIG_RELOCATABLE, which links the final vmlinux image with a dynamic relocation section, allowing the early boot code to perform a relocation to a different virtual address at runtime. This is a prerequisite for KASLR (CONFIG_RANDOMIZE_BASE). Signed-off-by: Ard Biesheuvel --- arch/arm64/Kconfig | 12 arch/arm64/Makefile | 4 +++ arch/arm64/include/asm/elf.h| 2 ++ arch/arm64/kernel/head.S| 32 arch/arm64/kernel/vmlinux.lds.S | 16 ++ 5 files changed, 66 insertions(+) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 141f65ab0ed5..6aa86f86fd10 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -84,6 +84,7 @@ config ARM64 select IOMMU_DMA if IOMMU_SUPPORT select IRQ_DOMAIN select IRQ_FORCED_THREADING + select KALLSYMS_TEXT_RELATIVE select MODULES_USE_ELF_RELA select NO_BOOTMEM select OF @@ -762,6 +763,17 @@ config ARM64_MODULE_PLTS select ARM64_MODULE_CMODEL_LARGE select HAVE_MOD_ARCH_SPECIFIC +config RELOCATABLE + bool + help + This builds the kernel as a Position Independent Executable (PIE), + which retains all relocation metadata required to relocate the + kernel binary at runtime to a different virtual address than the + address it was linked at. + Since AArch64 uses the RELA relocation format, this requires a + relocation pass at runtime even if the kernel is loaded at the + same address it was linked at.
+ endmenu menu "Boot options" diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index db462980c6be..c3eaa03f9020 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -15,6 +15,10 @@ CPPFLAGS_vmlinux.lds = -DTEXT_OFFSET=$(TEXT_OFFSET) OBJCOPYFLAGS :=-O binary -R .note -R .note.gnu.build-id -R .comment -S GZFLAGS :=-9 +ifneq ($(CONFIG_RELOCATABLE),) +LDFLAGS_vmlinux += -pie +endif + KBUILD_DEFCONFIG := defconfig # Check for binutils support for specific extensions diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h index 435f55952e1f..24ed037f09fd 100644 --- a/arch/arm64/include/asm/elf.h +++ b/arch/arm64/include/asm/elf.h @@ -77,6 +77,8 @@ #define R_AARCH64_MOVW_PREL_G2_NC 292 #define R_AARCH64_MOVW_PREL_G3 293 +#define R_AARCH64_RELATIVE 1027 + /* * These are used to set parameters in the core dumps. */ diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index b4be53923942..92f9c26632f3 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include @@ -432,6 +433,37 @@ __mmap_switched: bl __pi_memset dsb ishst // Make zero page visible to PTW +#ifdef CONFIG_RELOCATABLE + + /* +* Iterate over each entry in the relocation table, and apply the +* relocations in place. +*/ + adr_l x8, __dynsym_start // start of symbol table + adr_l x9, __reloc_start // start of reloc table + adr_l x10, __reloc_end // end of reloc table + +0: cmp x9, x10 + b.hs 2f + ldp x11, x12, [x9], #24 + ldr x13, [x9, #-8] + cmp w12, #R_AARCH64_RELATIVE + b.ne 1f + str x13, [x11] + b 0b + +1: cmp w12, #R_AARCH64_ABS64 + b.ne 0b + add x12, x12, x12, lsl #1 // symtab offset: 24x top word + add x12, x8, x12, lsr #(32 - 3) // ...
shifted into bottom word + ldr x15, [x12, #8] // Elf64_Sym::st_value + add x15, x13, x15 + str x15, [x11] + b 0b + +2: +#endif + adr_l sp, initial_sp, x4 mov x4, sp and x4, x4, #~(THREAD_SIZE - 1) diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S index 282e3e64a17e..e3f6cd740ea3 100644 --- a/arch/arm64/kernel/vmlinux.lds.S +++ b/arch/arm64/kernel/vmlinux.lds.S @@ -87,6 +87,7 @@ SECTIONS EXIT_CALL *(.discard) *(.discard.*) + *(.interp .dynamic) } . = KIMAGE_VADDR + TEXT_OFFSET; @@ -149,6 +150,21 @@ SECTIONS .altinstr_replacement : { *(.altinstr_replacement) } + .rela : ALIGN(8) { + __reloc_start = .; + *(.rela .rela*) + __reloc_end = .; + } + .dynsym : ALIGN(8) { + __dynsym_start = .; + *(.dynsym) + } + .dynstr : { + *(.dynstr) + } + .hash : { + *(.hash) + } . = ALIGN(PAGE_SIZE); __init_end = .; -- 2.5.0
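For readers less fluent in AArch64 assembly, the head.S loop can be mirrored in C roughly as follows. This is a user-space sketch under stated assumptions: `apply_rela`, the `image`/`link_base` parameters and the bounds assert are hypothetical. The sketch follows the loop in handling only R_AARCH64_RELATIVE (store the addend) and R_AARCH64_ABS64 (store st_value + addend), and like the loop it adds no load displacement. It also shows why a RELA kernel needs this pass even at its link address: the relocated places hold zero until it runs.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fallbacks in case the host <elf.h> lacks the AArch64 constants. */
#ifndef R_AARCH64_ABS64
#define R_AARCH64_ABS64 257
#endif
#ifndef R_AARCH64_RELATIVE
#define R_AARCH64_RELATIVE 1027
#endif

/*
 * Apply RELA entries to 'image', a copy of a binary linked to run at
 * 'link_base'. Other relocation types are skipped, as in head.S.
 */
static void apply_rela(uint8_t *image, size_t image_size, uint64_t link_base,
		       const Elf64_Rela *rela, size_t count,
		       const Elf64_Sym *dynsym)
{
	for (size_t i = 0; i < count; i++) {
		uint64_t type = ELF64_R_TYPE(rela[i].r_info);
		uint64_t place = rela[i].r_offset - link_base;
		uint64_t val;

		if (type == R_AARCH64_RELATIVE)
			val = (uint64_t)rela[i].r_addend;
		else if (type == R_AARCH64_ABS64)
			val = dynsym[ELF64_R_SYM(rela[i].r_info)].st_value +
			      (uint64_t)rela[i].r_addend;
		else
			continue;

		assert(rela[i].r_offset >= link_base && place + 8 <= image_size);
		memcpy(image + place, &val, sizeof(val)); /* 64-bit store */
	}
}
```

The symbol index that the assembly extracts with the add/lsr sequence (top 32 bits of r_info, scaled by sizeof(Elf64_Sym) == 24) corresponds to ELF64_R_SYM() here.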
[PATCH v4 18/22] arm64: add support for kernel ASLR
This adds support for KASLR, based on entropy provided by the bootloader in the /chosen/kaslr-seed DT property. Depending on the size of the address space (VA_BITS) and the page size, the entropy in the virtual displacement ranges from up to 13 bits (16k/2 levels) to up to 25 bits (all 4 levels), with the caveat that displacements that would result in the kernel image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB granule kernels, respectively) are not allowed, and will be rounded up to an acceptable value. The module region is randomized by choosing a page-aligned 128 MB region inside the interval [_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of entropy (depending on page size), independently of the kernel randomization, but still guarantees that modules are within the range of relative branch and jump instructions (with the caveat that, since the module region is shared with other uses of the vmalloc area, modules may need to be loaded further away if the module region is exhausted). Signed-off-by: Ard Biesheuvel --- arch/arm64/Kconfig | 14 ++ arch/arm64/include/asm/memory.h | 5 +- arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/head.S| 59 ++- arch/arm64/kernel/kaslr.c | 169 arch/arm64/kernel/module.c | 8 +- arch/arm64/kernel/setup.c | 29 arch/arm64/mm/mmu.c | 33 ++-- 8 files changed, 298 insertions(+), 20 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 6aa86f86fd10..d7e31454d421 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -774,6 +774,20 @@ config RELOCATABLE relocation pass at runtime even if the kernel is loaded at the same address it was linked at. +config RANDOMIZE_BASE + bool "Randomize the address of the kernel image" + select ARM64_MODULE_PLTS + select RELOCATABLE + help + Randomizes the virtual address at which the kernel image is + loaded, as a security feature that deters exploit attempts + relying on knowledge of the location of kernel internals.
+ + It is the bootloader's job to provide entropy, by passing a + random u64 value in /chosen/kaslr-seed at kernel entry. + + If unsure, say N. + endmenu menu "Boot options" diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h index 61005e7dd6cb..083361531a61 100644 --- a/arch/arm64/include/asm/memory.h +++ b/arch/arm64/include/asm/memory.h @@ -52,7 +52,7 @@ #define KIMAGE_VADDR (MODULES_END) #define MODULES_END (MODULES_VADDR + MODULES_VSIZE) #define MODULES_VADDR (VA_START + KASAN_SHADOW_SIZE) -#define MODULES_VSIZE (SZ_64M) +#define MODULES_VSIZE (SZ_128M) #define PCI_IO_END (PAGE_OFFSET - SZ_2M) #define PCI_IO_START (PCI_IO_END - PCI_IO_SIZE) #define FIXADDR_TOP (PCI_IO_START - SZ_2M) @@ -127,6 +127,9 @@ extern phys_addr_t memstart_addr; /* PHYS_OFFSET - the physical address of the start of memory. */ #define PHYS_OFFSET ({ memstart_addr; }) +/* the virtual base of the kernel image (minus TEXT_OFFSET) */ +extern u64 kimage_vaddr; + /* the offset between the kernel virtual and physical mappings */ extern u64 kimage_voffset; diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index e2f0a755beaa..c9aaecddb941 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -43,6 +43,7 @@ arm64-obj-$(CONFIG_PCI) += pci.o arm64-obj-$(CONFIG_ARMV8_DEPRECATED) += armv8_deprecated.o arm64-obj-$(CONFIG_ACPI) += acpi.o arm64-obj-$(CONFIG_PARAVIRT) += paravirt.o +arm64-obj-$(CONFIG_RANDOMIZE_BASE) += kaslr.o obj-y += $(arm64-obj-y) vdso/ obj-m += $(arm64-obj-m) diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 92f9c26632f3..8712a38c3de7 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -210,6 +210,7 @@ section_table: ENTRY(stext) bl preserve_boot_args bl el2_setup // Drop to EL1, w20=cpu_boot_mode + mov x23, xzr // KASLR offset, defaults to 0 adrp x24, __PHYS_OFFSET bl set_cpu_boot_mode_flag bl __create_page_tables // x25=TTBR0, x26=TTBR1 @@ -313,7 +314,7 @@
ENDPROC(preserve_boot_args) __create_page_tables: adrp x25, idmap_pg_dir adrp x26, swapper_pg_dir - mov x27, lr + mov x28, lr /* * Invalidate the idmap and swapper page tables to avoid potential @@ -392,6 +393,7 @@ __create_page_tables: */ mov x0, x26 // swapper_pg_dir ldr x5, =KIMAGE_VADDR + ad
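The commit message states that a displacement which would make the image straddle a 1GB/32MB/512MB boundary is rounded up to an acceptable value. One plausible reading of that rule is sketched below as a hypothetical helper. The name, the parameters and the exact rounding policy are assumptions, because the kaslr.c hunk is not shown above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: if an image of 'size' bytes placed at 'offset' would
 * cross a 'boundary'-aligned line (1 GB / 32 MB / 512 MB for 4 KB / 16 KB /
 * 64 KB granules, per the commit message), round the offset up to the next
 * boundary. 'boundary' must be a power of two no smaller than 'size'.
 */
static uint64_t kaslr_avoid_straddle(uint64_t offset, uint64_t size,
				     uint64_t boundary)
{
	uint64_t mask = boundary - 1;

	assert(boundary != 0 && (boundary & mask) == 0 && size <= boundary);
	if ((offset & mask) + size > boundary)
		offset = (offset + mask) & ~mask;   /* round up to the boundary */
	return offset;
}
```

An image that ends exactly on a boundary does not straddle it, which is why the comparison is a strict `>`.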
[PATCH v4 13/22] arm64: allow kernel Image to be loaded anywhere in physical memory
This relaxes the kernel Image placement requirements, so that it may be placed at any 2 MB aligned offset in physical memory. This is accomplished by ignoring PHYS_OFFSET when installing memblocks, and accounting for the apparent virtual offset of the kernel Image. As a result, virtual address references below PAGE_OFFSET are correctly mapped onto physical references into the kernel Image regardless of where it sits in memory. Note that limiting memory using mem= is not unambiguous anymore after this change, considering that the kernel may be at the top of physical memory, and clipping from the bottom rather than the top will discard any 32-bit DMA addressable memory first. To deal with this, the handling of mem= is reimplemented to clip top down, but take special care not to clip memory that covers the kernel image. Since mem= should not be considered a production feature, a panic notifier handler is installed that dumps the memory limit at panic time if one was set. Signed-off-by: Ard Biesheuvel --- Documentation/arm64/booting.txt | 20 ++-- arch/arm64/include/asm/boot.h | 6 ++ arch/arm64/include/asm/kernel-pgtable.h | 11 +++ arch/arm64/include/asm/kvm_asm.h| 2 +- arch/arm64/include/asm/memory.h | 15 +-- arch/arm64/kernel/head.S| 6 +- arch/arm64/kernel/image.h | 13 ++- arch/arm64/mm/init.c| 96 +++- arch/arm64/mm/mmu.c | 3 + 9 files changed, 150 insertions(+), 22 deletions(-) diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt index 701d39d3171a..67484067ce4f 100644 --- a/Documentation/arm64/booting.txt +++ b/Documentation/arm64/booting.txt @@ -109,7 +109,13 @@ Header notes: 1 - 4K 2 - 16K 3 - 64K - Bits 3-63: Reserved. + Bit 3: Kernel physical placement + 0 - 2MB aligned base should be as close as possible + to the base of DRAM, since memory below it is not + accessible + 1 - 2MB aligned base may be anywhere in physical + memory + Bits 4-63: Reserved. 
- When image_size is zero, a bootloader should attempt to keep as much memory as possible free for use by the kernel immediately after the @@ -117,14 +123,14 @@ Header notes: depending on selected features, and is effectively unbound. The Image must be placed text_offset bytes from a 2MB aligned base -address near the start of usable system RAM and called there. Memory -below that base address is currently unusable by Linux, and therefore it -is strongly recommended that this location is the start of system RAM. -The region between the 2 MB aligned base address and the start of the -image has no special significance to the kernel, and may be used for -other purposes. +address anywhere in usable system RAM and called there. The region +between the 2 MB aligned base address and the start of the image has no +special significance to the kernel, and may be used for other purposes. At least image_size bytes from the start of the image must be free for use by the kernel. +NOTE: versions prior to v4.6 cannot make use of memory below the +physical offset of the Image so it is recommended that the Image be +placed as close as possible to the start of system RAM. 
Any memory described to the kernel (even that below the start of the image) which is not marked as reserved from the kernel (e.g., with a diff --git a/arch/arm64/include/asm/boot.h b/arch/arm64/include/asm/boot.h index 81151b67b26b..ebf2481889c3 100644 --- a/arch/arm64/include/asm/boot.h +++ b/arch/arm64/include/asm/boot.h @@ -11,4 +11,10 @@ #define MIN_FDT_ALIGN 8 #define MAX_FDT_SIZE SZ_2M +/* + * arm64 requires the kernel image to be placed + * TEXT_OFFSET bytes beyond a 2 MB aligned base + */ +#define MIN_KIMG_ALIGN SZ_2M + #endif diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h index a459714ee29e..33bd22956cba 100644 --- a/arch/arm64/include/asm/kernel-pgtable.h +++ b/arch/arm64/include/asm/kernel-pgtable.h @@ -79,5 +79,16 @@ #define SWAPPER_MM_MMUFLAGS (PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS) #endif +/* + * To make optimal use of block mappings when laying out the linear mapping, + * round down the base of physical memory to a size that can be mapped + * efficiently, i.e., either PUD_SIZE (4k) or PMD_SIZE (64k), or a multiple that + * can be mapped using contiguous bits in the page tables: 32 * PMD_SIZE (16k) + */ +#ifdef CONFIG_ARM64_64K_PAGES +#define ARM64_MEMSTART_ALIGN SZ_512M +#else +#define ARM64_MEMSTART_ALIGN SZ_1G +#endif #endif /* __ASM_KERNEL_PGTABLE_H */ diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index 30e0e04c9f16..054ac25e7c2e 100644 --- a/arch/arm64/include/asm/kvm_asm.h
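A bootloader that honours the new header bit might decode the flags field and check the placement rule roughly as follows. This is a hypothetical user-space sketch: the struct, the function names and the MIN_KIMG_ALIGN copy are illustrative. The bit layout (bit 0 endianness, bits 1-2 page size, bit 3 physical placement) follows booting.txt as amended above.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_KIMG_ALIGN (UINT64_C(2) << 20) /* mirrors SZ_2M in asm/boot.h */

/* Decoded arm64 Image header 'flags' field as documented in booting.txt. */
struct image_flags {
	int big_endian;          /* bit 0 */
	unsigned page_size_kb;   /* bits 1-2: 0 = unspecified, else 4/16/64 */
	int place_anywhere;      /* bit 3: 2MB aligned base may be anywhere */
};

static struct image_flags decode_image_flags(uint64_t flags)
{
	static const unsigned sizes[4] = { 0, 4, 16, 64 };
	struct image_flags f;

	f.big_endian = (int)(flags & 1);
	f.page_size_kb = sizes[(flags >> 1) & 3];
	f.place_anywhere = (int)((flags >> 3) & 1);   /* bits 4-63 reserved */
	return f;
}

/* The Image must start text_offset bytes above a 2 MB aligned base. */
static int image_load_addr_ok(uint64_t load_addr, uint64_t text_offset)
{
	assert(load_addr >= text_offset);
	return ((load_addr - text_offset) & (MIN_KIMG_ALIGN - 1)) == 0;
}
```

When bit 3 is clear, a loader should still keep the 2 MB aligned base as close as possible to the start of DRAM, as the documentation hunk describes.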
[PATCH v4 10/22] arm64: switch to relative exception tables
Instead of using absolute addresses for both the exception location and the fixup, use offsets relative to the exception table entry values. Not only does this cut the size of the exception table in half, it is also a prerequisite for KASLR, since absolute exception table entries are subject to dynamic relocation, which is incompatible with the sorting of the exception table that occurs at build time. This patch also introduces the _ASM_EXTABLE preprocessor macro (which exists on x86 as well) and its _asm_extable assembly counterpart, as shorthands to emit exception table entries. Acked-by: Will Deacon Signed-off-by: Ard Biesheuvel --- arch/arm64/include/asm/assembler.h | 15 +++--- arch/arm64/include/asm/futex.h | 12 +++- arch/arm64/include/asm/uaccess.h| 30 +++- arch/arm64/include/asm/word-at-a-time.h | 7 ++--- arch/arm64/kernel/armv8_deprecated.c| 7 ++--- arch/arm64/mm/extable.c | 2 +- scripts/sortextable.c | 2 +- 7 files changed, 38 insertions(+), 37 deletions(-) diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index bb7b72734c24..d8bfcc1ce923 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -94,12 +94,19 @@ dmb \opt .endm +/* + * Emit an entry into the exception table + */ + .macro _asm_extable, from, to + .pushsection __ex_table, "a" + .align 3 + .long (\from - .), (\to - .) + .popsection + .endm + #define USER(l, x...) \ : x; \ - .section __ex_table,"a";\ - .align 3; \ - .quad b,l;\ - .previous + _asm_extable b, l /* * Register aliases.
diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h index 007a69fc4f40..1ab15a3b5a0e 100644 --- a/arch/arm64/include/asm/futex.h +++ b/arch/arm64/include/asm/futex.h @@ -42,10 +42,8 @@ "4:mov %w0, %w5\n" \ " b 3b\n" \ " .popsection\n" \ -" .pushsection __ex_table,\"a\"\n"\ -" .align 3\n"\ -" .quad 1b, 4b, 2b, 4b\n" \ -" .popsection\n" \ + _ASM_EXTABLE(1b, 4b)\ + _ASM_EXTABLE(2b, 4b)\ ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN,\ CONFIG_ARM64_PAN) \ : "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp) \ @@ -133,10 +131,8 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, "4:mov %w0, %w6\n" " b 3b\n" " .popsection\n" -" .pushsection __ex_table,\"a\"\n" -" .align 3\n" -" .quad 1b, 4b, 2b, 4b\n" -" .popsection\n" + _ASM_EXTABLE(1b, 4b) + _ASM_EXTABLE(2b, 4b) : "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp) : "r" (oldval), "r" (newval), "Ir" (-EFAULT) : "memory"); diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h index b2ede967fe7d..dc11577fab7e 100644 --- a/arch/arm64/include/asm/uaccess.h +++ b/arch/arm64/include/asm/uaccess.h @@ -36,11 +36,11 @@ #define VERIFY_WRITE 1 /* - * The exception table consists of pairs of addresses: the first is the - * address of an instruction that is allowed to fault, and the second is - * the address at which the program should continue. No registers are - * modified, so it is entirely up to the continuation code to figure out - * what to do. + * The exception table consists of pairs of relative offsets: the first + * is the relative offset to an instruction that is allowed to fault, + * and the second is the relative offset at which the program should + * continue. No registers are modified, so it is entirely up to the + * continuation code to figure out what to do. * * All the routines below use bits of fixup code that are out of line * with the main instruction path. 
This means when everything is well, @@ -50,9 +50,11 @@ struct exception_table_entry { - unsigned long insn, fixup; + int insn, fixup; }; +#define ARCH_HAS_RELATIVE_EXTABLE + extern int fixup_exception(struct pt_regs *regs); #define KERNEL_DS (-1UL) @@ -105,6 +107,12 @@ static inline void set_fs(mm_segment_t fs) #define access_ok(type, addr, size)__range_ok(addr, size) #define user_addr_max get_fs +#define _ASM_EXTABLE(from, to)
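The effect of switching arm64 to two 32-bit place-relative offsets can be modelled in user space: each entry shrinks to 8 bytes, and a lookup resolves both fields relative to their own addresses. The sketch below is an assumption-laden stand-in for search_extable() plus fixup_exception(). `ex_set` mimics what `.long (\from - .), (\to - .)` stores, and `lookup_fixup` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the new arm64 entry: two 32-bit place-relative offsets. */
struct exception_table_entry {
	int insn, fixup;
};

static uintptr_t ex_insn(const struct exception_table_entry *x)
{
	return (uintptr_t)&x->insn + x->insn;
}

static uintptr_t ex_fixup(const struct exception_table_entry *x)
{
	return (uintptr_t)&x->fixup + x->fixup;
}

/* What ".long (\from - .), (\to - .)" stores, computed at runtime here. */
static void ex_set(struct exception_table_entry *x, uintptr_t from, uintptr_t to)
{
	intptr_t di = (intptr_t)from - (intptr_t)(uintptr_t)&x->insn;
	intptr_t df = (intptr_t)to - (intptr_t)(uintptr_t)&x->fixup;

	assert(di >= INT32_MIN && di <= INT32_MAX);
	assert(df >= INT32_MIN && df <= INT32_MAX);
	x->insn = (int)di;
	x->fixup = (int)df;
}

/* Binary search a table sorted by instruction address; 0 if not found. */
static uintptr_t lookup_fixup(const struct exception_table_entry *tbl,
			      size_t n, uintptr_t pc)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		uintptr_t insn = ex_insn(&tbl[mid]);

		if (insn < pc)
			lo = mid + 1;
		else if (insn > pc)
			hi = mid;
		else
			return ex_fixup(&tbl[mid]);
	}
	return 0;
}
```

Because the stored values are differences between two locations in the same image, they remain valid wherever the image is loaded. This is why the entries no longer need dynamic relocations.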
[PATCH v4 14/22] arm64: make asm/elf.h available to asm files
This reshuffles some code in asm/elf.h and puts a #ifndef __ASSEMBLY__ around its C definitions so that the CPP defines can be used in asm source files as well. Signed-off-by: Ard Biesheuvel --- arch/arm64/include/asm/elf.h | 22 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h index faad6df49e5b..435f55952e1f 100644 --- a/arch/arm64/include/asm/elf.h +++ b/arch/arm64/include/asm/elf.h @@ -24,15 +24,6 @@ #include #include -typedef unsigned long elf_greg_t; - -#define ELF_NGREG (sizeof(struct user_pt_regs) / sizeof(elf_greg_t)) -#define ELF_CORE_COPY_REGS(dest, regs) \ - *(struct user_pt_regs *)&(dest) = (regs)->user_regs; - -typedef elf_greg_t elf_gregset_t[ELF_NGREG]; -typedef struct user_fpsimd_state elf_fpregset_t; - /* * AArch64 static relocation types. */ @@ -127,6 +118,17 @@ typedef struct user_fpsimd_state elf_fpregset_t; */ #define ELF_ET_DYN_BASE (2 * TASK_SIZE_64 / 3) +#ifndef __ASSEMBLY__ + +typedef unsigned long elf_greg_t; + +#define ELF_NGREG (sizeof(struct user_pt_regs) / sizeof(elf_greg_t)) +#define ELF_CORE_COPY_REGS(dest, regs) \ + *(struct user_pt_regs *)&(dest) = (regs)->user_regs; + +typedef elf_greg_t elf_gregset_t[ELF_NGREG]; +typedef struct user_fpsimd_state elf_fpregset_t; + /* * When the program starts, a1 contains a pointer to a function to be * registered with atexit, as per the SVR4 ABI. A value of 0 means we have no @@ -186,4 +188,6 @@ extern int aarch32_setup_vectors_page(struct linux_binprm *bprm, #endif /* CONFIG_COMPAT */ +#endif /* !__ASSEMBLY__ */ + #endif -- 2.5.0
[PATCH v4 11/22] arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
Unfortunately, the current way of using the linker to emit build time constants into the Image header will no longer work once we switch to the use of PIE executables. The reason is that such constants are emitted into the binary using R_AARCH64_ABS64 relocations, which are resolved at runtime, not at build time, and the places targeted by those relocations will contain zeroes before that. So refactor the endian swapping linker script constant generation code so that it emits the upper and lower 32-bit words separately. Signed-off-by: Ard Biesheuvel --- arch/arm64/include/asm/assembler.h | 11 +++ arch/arm64/kernel/head.S | 6 ++-- arch/arm64/kernel/image.h | 32 3 files changed, 33 insertions(+), 16 deletions(-) diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index d8bfcc1ce923..70f7b9e04598 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -222,4 +222,15 @@ lr .req x30 // link register .size __pi_##x, . - x;\ ENDPROC(x) + /* +* Emit a 64-bit absolute little endian symbol reference in a way that +* ensures that it will be resolved at build time, even when building a +* PIE binary. This requires cooperation from the linker script, which +* must emit the lo32/hi32 halves individually.
+*/ + .macro le64sym, sym + .long \sym\()_lo32 + .long \sym\()_hi32 + .endm + #endif /* __ASM_ASSEMBLER_H */ diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 85181cb60f46..2a39d4ab02bf 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -83,9 +83,9 @@ efi_head: b stext // branch to kernel start, magic .long 0 // reserved #endif - .quad _kernel_offset_le // Image load offset from start of RAM, little-endian - .quad _kernel_size_le // Effective size of kernel image, little-endian - .quad _kernel_flags_le// Informative flags, little-endian + le64sym _kernel_offset_le // Image load offset from start of RAM, little-endian + le64sym _kernel_size_le // Effective size of kernel image, little-endian + le64sym _kernel_flags_le// Informative flags, little-endian .quad 0 // reserved .quad 0 // reserved .quad 0 // reserved diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h index bc2abb8b1599..b64c9b0a4492 100644 --- a/arch/arm64/kernel/image.h +++ b/arch/arm64/kernel/image.h @@ -26,21 +26,27 @@ * There aren't any ELF relocations we can use to endian-swap values known only * at link time (e.g. the subtraction of two symbol addresses), so we must get * the linker to endian-swap certain values before emitting them. + * + * Note that, in order for this to work when building the ELF64 PIE executable + * (for KASLR), these values should not be referenced via R_AARCH64_ABS64 + * relocations, since these are fixed up at runtime rather than at build time + * when PIE is in effect. So we need to split them up in 32-bit high and low + * words. 
*/ #ifdef CONFIG_CPU_BIG_ENDIAN -#define DATA_LE64(data) \ - ((((data) & 0x00000000000000ff) << 56) | \ - (((data) & 0x000000000000ff00) << 40) | \ - (((data) & 0x0000000000ff0000) << 24) | \ - (((data) & 0x00000000ff000000) << 8) | \ - (((data) & 0x000000ff00000000) >> 8) | \ - (((data) & 0x0000ff0000000000) >> 24) | \ - (((data) & 0x00ff000000000000) >> 40) | \ - (((data) & 0xff00000000000000) >> 56)) +#define DATA_LE32(data) \ + ((((data) & 0x000000ff) << 24) | \ + (((data) & 0x0000ff00) << 8) | \ + (((data) & 0x00ff0000) >> 8) | \ + (((data) & 0xff000000) >> 24)) #else -#define DATA_LE64(data) ((data) & 0xffffffffffffffff) +#define DATA_LE32(data) ((data) & 0xffffffff) #endif +#define DEFINE_IMAGE_LE64(sym, data) \ + sym##_lo32 = DATA_LE32((data) & 0xffffffff); \ + sym##_hi32 = DATA_LE32((data) >> 32) + #ifdef CONFIG_CPU_BIG_ENDIAN #define __HEAD_FLAG_BE 1 #else @@ -58,9 +64,9 @@ * endian swapped in head.S, all are done here for consistency. */ #define HEAD_SYMBOLS \ - _kernel_size_le = DATA_LE64(_end - _text); \ - _kernel_offset_le = DATA_LE64(TEXT_OFFSET); \ - _kernel_flags_le = DATA_LE64(__HEAD_FLAGS); + DEFINE_IMAGE_LE64(_kernel_size_le, _end - _text); \ + DEFINE_IMAGE_
[PATCH v4 04/22] arm64: decouple early fixmap init from linear mapping
Since the early fixmap page tables are populated using pages that are part of the static footprint of the kernel, they are covered by the initial kernel mapping, and we can refer to them without using __va/__pa translations, which are tied to the linear mapping. Since the fixmap page tables are disjoint from the kernel mapping up to the top level pgd entry, we can refer to bm_pte[] directly, and there is no need to walk the page tables and perform __pa()/__va() translations at each step. Reviewed-by: Mark Rutland Signed-off-by: Ard Biesheuvel --- arch/arm64/mm/mmu.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 7711554a94f4..cb3a7bdb4e23 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -583,7 +583,7 @@ static inline pud_t * fixmap_pud(unsigned long addr) BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd)); - return pud_offset(pgd, addr); + return pud_offset_kimg(pgd, addr); } static inline pmd_t * fixmap_pmd(unsigned long addr) @@ -592,16 +592,12 @@ static inline pmd_t * fixmap_pmd(unsigned long addr) BUG_ON(pud_none(*pud) || pud_bad(*pud)); - return pmd_offset(pud, addr); + return pmd_offset_kimg(pud, addr); } static inline pte_t * fixmap_pte(unsigned long addr) { - pmd_t *pmd = fixmap_pmd(addr); - - BUG_ON(pmd_none(*pmd) || pmd_bad(*pmd)); - - return pte_offset_kernel(pmd, addr); + return &bm_pte[pte_index(addr)]; } void __init early_fixmap_init(void) @@ -613,14 +609,14 @@ void __init early_fixmap_init(void) pgd = pgd_offset_k(addr); pgd_populate(&init_mm, pgd, bm_pud); - pud = pud_offset(pgd, addr); + pud = fixmap_pud(addr); pud_populate(&init_mm, pud, bm_pmd); - pmd = pmd_offset(pud, addr); + pmd = fixmap_pmd(addr); pmd_populate_kernel(&init_mm, pmd, bm_pte); /* * The boot-ioremap range spans multiple pmds, for which -* we are not preparted: +* we are not prepared: */ BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT) != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT)); -- 2.5.0
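The simplified fixmap_pte() works because a single static table covers the whole fixmap PMD, so the PTE for an address is just an index into bm_pte[]. Below is a user-space model assuming a 4 KB granule. The constants are local copies, and `fixmap_pte_slot` and its `pmd_addr` parameter are hypothetical. The assert plays the role of the BUILD_BUG_ON that keeps the boot-ioremap range inside one PMD.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KB granule: 512 eight-byte descriptors per table, 2 MB per PMD. */
#define PAGE_SHIFT   12
#define PMD_SHIFT    (PAGE_SHIFT + 9)
#define PTRS_PER_PTE (1UL << (PAGE_SHIFT - 3))

/* Slot of 'addr' within its last-level table. */
static unsigned long pte_index(uint64_t addr)
{
	return (unsigned long)((addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1));
}

/*
 * Model of the simplified fixmap_pte(): one static table maps the whole
 * fixmap PMD, so no table walk (and no __va()) is needed. 'pmd_addr' is any
 * address inside that PMD; the assert stands in for the BUILD_BUG_ON.
 */
static uint64_t *fixmap_pte_slot(uint64_t *bm_pte, uint64_t addr,
				 uint64_t pmd_addr)
{
	assert((addr >> PMD_SHIFT) == (pmd_addr >> PMD_SHIFT));
	return &bm_pte[pte_index(addr)];
}
```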
[PATCH v4 08/22] arm64: add support for module PLTs
This adds support for emitting PLTs at module load time for relative branches that are out of range. This is a prerequisite for KASLR, which may place the kernel and the modules anywhere in the vmalloc area, making it more likely that branch target offsets exceed the maximum range of +/- 128 MB. Signed-off-by: Ard Biesheuvel --- In this version, I removed the distinction between relocations against .init executable sections and ordinary executable sections. The reason is that it is hardly worth the trouble, given that .init.text usually does not contain that many far branches, and this version now only reserves PLT entry space for jump and call relocations against undefined symbols (since symbols defined in the same module can be assumed to be within +/- 128 MB). For example, the mac80211.ko module (which is fairly sizable at ~400 KB) built with -mcmodel=large gives the following relocation counts:

                  relocs  branches  unique  !local
  .text             3925      3347     518     219
  .init.text          11         8       7       1
  .exit.text           4         4       4       1
  .text.unlikely      81        67      36      17

('unique' means branches to unique type/symbol/addend combos, of which !local is the subset referring to undefined symbols) IOW, we are only emitting a single PLT entry for the .init sections, and we are better off just adding it to the core PLT section instead.
--- arch/arm64/Kconfig | 9 + arch/arm64/Makefile | 6 +- arch/arm64/include/asm/module.h | 11 ++ arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/module-plts.c | 201 arch/arm64/kernel/module.c | 12 ++ arch/arm64/kernel/module.lds| 3 + 7 files changed, 242 insertions(+), 1 deletion(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index cd767fa3037a..141f65ab0ed5 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -394,6 +394,7 @@ config ARM64_ERRATUM_843419 bool "Cortex-A53: 843419: A load or store might access an incorrect address" depends on MODULES default y + select ARM64_MODULE_CMODEL_LARGE help This option builds kernel modules using the large memory model in order to avoid the use of the ADRP instruction, which can cause @@ -753,6 +754,14 @@ config ARM64_LSE_ATOMICS endmenu +config ARM64_MODULE_CMODEL_LARGE + bool + +config ARM64_MODULE_PLTS + bool + select ARM64_MODULE_CMODEL_LARGE + select HAVE_MOD_ARCH_SPECIFIC + endmenu menu "Boot options" diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index cd822d8454c0..db462980c6be 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -41,10 +41,14 @@ endif CHECKFLAGS += -D__aarch64__ -ifeq ($(CONFIG_ARM64_ERRATUM_843419), y) +ifeq ($(CONFIG_ARM64_MODULE_CMODEL_LARGE), y) KBUILD_CFLAGS_MODULE += -mcmodel=large endif +ifeq ($(CONFIG_ARM64_MODULE_PLTS),y) +KBUILD_LDFLAGS_MODULE += -T $(srctree)/arch/arm64/kernel/module.lds +endif + # Default value head-y := arch/arm64/kernel/head.o diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h index e80e232b730e..8652fb613304 100644 --- a/arch/arm64/include/asm/module.h +++ b/arch/arm64/include/asm/module.h @@ -20,4 +20,15 @@ #define MODULE_ARCH_VERMAGIC "aarch64" +#ifdef CONFIG_ARM64_MODULE_PLTS +struct mod_arch_specific { + struct elf64_shdr *plt; + int plt_num_entries; + int plt_max_entries; +}; +#endif + +u64 module_emit_plt_entry(struct module *mod, const Elf64_Rela *rela, + Elf64_Sym *sym); + #endif /* __ASM_MODULE_H 
*/ diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index 83cd7e68e83b..e2f0a755beaa 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -30,6 +30,7 @@ arm64-obj-$(CONFIG_COMPAT)+= sys32.o kuser32.o signal32.o \ ../../arm/kernel/opcodes.o arm64-obj-$(CONFIG_FUNCTION_TRACER)+= ftrace.o entry-ftrace.o arm64-obj-$(CONFIG_MODULES)+= arm64ksyms.o module.o +arm64-obj-$(CONFIG_ARM64_MODULE_PLTS) += module-plts.o arm64-obj-$(CONFIG_PERF_EVENTS)+= perf_regs.o perf_callchain.o arm64-obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o arm64-obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c new file mode 100644 index ..1ce90d8450ae --- /dev/null +++ b/arch/arm64/kernel/module-plts.c @@ -0,0 +1,201 @@ +/* + * Copyright (C) 2014-2016 Linaro Ltd. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include +#include +#include +#include + +struct
Re: [kernel-hardening] Re: [PATCH 0/2] sysctl: allow CLONE_NEWUSER to be disabled
Quoting Josh Boyer (jwbo...@fedoraproject.org): > On Mon, Jan 25, 2016 at 11:57 PM, Eric W. Biederman > wrote: > > Kees Cook writes: > > > >> On Mon, Jan 25, 2016 at 11:33 AM, Eric W. Biederman > >> wrote: > >>> Kees Cook writes: > > Well, I don't know about less weird, but it would leave a unneeded > hole in the permission checks. > >>> > >>> To be clear the current patch has my: > >>> > >>> Nacked-by: "Eric W. Biederman" > >>> > >>> The code is buggy, and poorly thought through. Your lack of interest in > >>> fixing the bugs in your patch is distressing. > >> > >> I'm not sure where you see me having a "lack of interest". The > >> existing cap-checking sysctls have a corner-case bug, which is > >> orthogonal to this change. > > > > That certainly doesn't sound like you have any plans to change anything > > there. > > > >>> So broken code, not willing to fix. No. We are not merging this sysctl. > >> > >> I think you're jumping to conclusions. :) > > > > I think I am the maintainer. > > > > What you are proposing is very much something that is only of interst to > > people who are not using user namespaces. It is fatally flawed as > > a way to avoid new attack surfaces for people who don't care as the > > sysctl leaves user namespaces enabled by default. It is fatally flawed > > as remediation to recommend to people to change if a new user namespace > > related but is discovered. Any running process that happens to be > > created while user namespace creation was enabled will continue to > > exist. Effectively a reboot will be required as part of a mitigation. > > Many sysadmins will get that wrong. > > > > I can't possibly see your sysctl as proposed achieving it's goals. A > > person has to be entirely too aware of subtlety and nuance to use it > > effectively. > > What you're saying is true for the "oh crap" case of a new userns > related CVE being found. 
However, there is the case where sysadmins > know for a fact that a set of machines should not allow user > namespaces to be enabled. Currently they have 2 choices, 1) use their

Hi, can you give a specific example of this? (That is, where users really should not be able to use them, not merely where they might not need them.) I think it would help the discussion tremendously, because so far the only good arguments I've seen have been about actual bugs in the user namespaces, which would not warrant a designed-in permanent disable switch. If there are good use cases where such a disable switch will always be needed (and that compiling it out can't satisfy), that would be helpful.

thanks,
-serge
Re: [PATCH v9 2/6] Documentation, dt, arm64/arm: dt bindings for numa.
Hi Rob, Mark, On Wed, Jan 20, 2016 at 7:48 PM, Rob Herring wrote: > On Mon, Jan 18, 2016 at 10:06:01PM +0530, Ganapatrao Kulkarni wrote: >> DT bindings for numa mapping of memory, cores and IOs. >> >> Reviewed-by: Robert Richter >> Signed-off-by: Ganapatrao Kulkarni >> --- >> Documentation/devicetree/bindings/arm/numa.txt | 272 >> + >> 1 file changed, 272 insertions(+) >> create mode 100644 Documentation/devicetree/bindings/arm/numa.txt > > This is looks okay to me, but some cosmetic things on the example. Can I have your Ack, please? > >> +== >> +4 - Example dts >> +== >> + >> +2 sockets system consists of 2 boards connected through ccn bus and >> +each board having one socket/soc of 8 cpus, memory and pci bus. >> + >> + memory@00c0 { > > Drop the leading 0s on unit addresses. I will correct these in the next version. > >> + device_type = "memory"; >> + reg = <0x0 0x00c0 0x0 0x8000>; >> + /* node 0 */ >> + numa-node-id = <0>; >> + }; >> + >> + memory@100 { >> + device_type = "memory"; >> + reg = <0x100 0x 0x0 0x8000>; >> + /* node 1 */ >> + numa-node-id = <1>; >> + }; >> + >> + cpus { >> + #address-cells = <2>; >> + #size-cells = <0>; >> + >> + cpu@000 { > > Same here (leaving one of course). > >> + device_type = "cpu"; >> + compatible = "arm,armv8"; >> + reg = <0x0 0x000>; >> + enable-method = "psci"; >> + /* node 0 */ >> + numa-node-id = <0>; >> + }; >> + cpu@001 { > > and so on... > >> + device_type = "cpu"; >> + compatible = "arm,armv8"; >> + reg = <0x0 0x001>; > > Either all leading 0s or none. > >> + reg = <0x0 0x008>; >> + enable-method = "psci"; >> + /* node 1 */ > > Kind of a pointless comment. > > Wouldn't each cluster of cpus for a given numa node be in a different > cpu affinity? Certainly not required by the architecture, but the common > case at least. > >> + numa-node-id = <1>; >> + }; > > [...] > >> + pcie0: pcie0@0x8480, { > > Drop the 0x and the comma.
> >> + compatible = "arm,armv8"; >> + device_type = "pci"; >> + bus-range = <0 255>; >> + #size-cells = <2>; >> + #address-cells = <3>; >> + reg = <0x8480 0x 0 0x1000>; /* Configuration >> space */ >> + ranges = <0x0300 0x8010 0x 0x8010 0x 0x70 >> 0x>; >> + /* node 0 */ >> + numa-node-id = <0>; >> +}; >> + >> + pcie1: pcie1@0x9480, { > > ditto > >> + compatible = "arm,armv8"; >> + device_type = "pci"; >> + bus-range = <0 255>; >> + #size-cells = <2>; >> + #address-cells = <3>; >> + reg = <0x9480 0x 0 0x1000>; /* Configuration >> space */ >> + ranges = <0x0300 0x9010 0x 0x9010 0x 0x70 >> 0x>; >> + /* node 1 */ >> + numa-node-id = <1>; >> +}; >> + >> + distance-map { >> + compatible = "numa-distance-map-v1"; >> + distance-matrix = <0 0 10>, >> + <0 1 20>, >> + <1 1 10>; >> + }; >> -- >> 1.8.1.4 thanks Ganapat
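The distance-map example lists each NUMA distance as a <from to distance> triplet and omits <1 0 20>. A consumer therefore has to fill in the reverse direction itself. The sketch below assumes distances are symmetric, as the example implies. It turns an already-decoded array of u32 cells into a square matrix. MAX_NODES and the function name are hypothetical, and this is not the kernel's DT parsing code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 4   /* hypothetical upper bound for the sketch */

/* Fill dist[][] from flattened <from to distance> triplets, mirroring each
 * entry so that dist[a][b] == dist[b][a]. Symmetry is an assumption drawn
 * from the binding example, which omits <1 0 20>.
 * Returns 0 on success, -1 on a malformed property. */
static int parse_distance_matrix(const uint32_t *cells, size_t ncells,
				 uint32_t dist[MAX_NODES][MAX_NODES])
{
	size_t i;

	if (ncells % 3 != 0)
		return -1;
	for (i = 0; i < ncells; i += 3) {
		uint32_t from = cells[i], to = cells[i + 1], d = cells[i + 2];

		if (from >= MAX_NODES || to >= MAX_NODES)
			return -1;
		dist[from][to] = d;
		dist[to][from] = d;
	}
	return 0;
}
```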
[PATCH v4 06/22] arm64: add support for ioremap() block mappings
This wires up the existing generic huge-vmap feature, which allows ioremap() to use PMD or PUD sized block mappings. Signed-off-by: Ard Biesheuvel --- Documentation/features/vm/huge-vmap/arch-support.txt | 2 +- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/memory.h | 6 +++ arch/arm64/mm/mmu.c | 41 4 files changed, 49 insertions(+), 1 deletion(-) diff --git a/Documentation/features/vm/huge-vmap/arch-support.txt b/Documentation/features/vm/huge-vmap/arch-support.txt index af6816bccb43..df1d1f3c9af2 100644 --- a/Documentation/features/vm/huge-vmap/arch-support.txt +++ b/Documentation/features/vm/huge-vmap/arch-support.txt @@ -9,7 +9,7 @@ | alpha: | TODO | | arc: | TODO | | arm: | TODO | -| arm64: | TODO | +| arm64: | ok | | avr32: | TODO | |blackfin: | TODO | | c6x: | TODO | diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 8cc62289a63e..cd767fa3037a 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -49,6 +49,7 @@ config ARM64 select HAVE_ALIGNED_STRUCT_PAGE if SLUB select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_BITREVERSE + select HAVE_ARCH_HUGE_VMAP select HAVE_ARCH_JUMP_LABEL select HAVE_ARCH_KASAN if SPARSEMEM_VMEMMAP && !(ARM64_16K_PAGES && ARM64_VA_BITS_48) select HAVE_ARCH_KGDB diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h index bea9631b34a8..aebc739f5a11 100644 --- a/arch/arm64/include/asm/memory.h +++ b/arch/arm64/include/asm/memory.h @@ -106,6 +106,12 @@ #define MT_S2_NORMAL 0xf #define MT_S2_DEVICE_nGnRE 0x1 +#ifdef CONFIG_ARM64_4K_PAGES +#define IOREMAP_MAX_ORDER (PUD_SHIFT) +#else +#define IOREMAP_MAX_ORDER (PMD_SHIFT) +#endif + #ifndef __ASSEMBLY__ extern phys_addr_t memstart_addr; diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index cb3a7bdb4e23..b84915723ea0 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -710,3 +710,44 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys) return dt_virt; } + +int __init arch_ioremap_pud_supported(void) +{ + /* only 4k granule supports level 1 
block mappings */ + return IS_ENABLED(CONFIG_ARM64_4K_PAGES); +} + +int __init arch_ioremap_pmd_supported(void) +{ + return 1; +} + +int pud_set_huge(pud_t *pud, phys_addr_t phys, pgprot_t prot) +{ + BUG_ON(phys & ~PUD_MASK); + set_pud(pud, __pud(phys | PUD_TYPE_SECT | pgprot_val(mk_sect_prot(prot)))); + return 1; +} + +int pmd_set_huge(pmd_t *pmd, phys_addr_t phys, pgprot_t prot) +{ + BUG_ON(phys & ~PMD_MASK); + set_pmd(pmd, __pmd(phys | PMD_TYPE_SECT | pgprot_val(mk_sect_prot(prot)))); + return 1; +} + +int pud_clear_huge(pud_t *pud) +{ + if (!pud_sect(*pud)) + return 0; + pud_clear(pud); + return 1; +} + +int pmd_clear_huge(pmd_t *pmd) +{ + if (!pmd_sect(*pmd)) + return 0; + pmd_clear(pmd); + return 1; +} -- 2.5.0
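With HAVE_ARCH_HUGE_VMAP selected, the generic ioremap code may install a block entry via pmd_set_huge()/pud_set_huge() instead of a table of PTEs. It can only do so when the chunk being mapped spans a whole PMD (or PUD) and the physical address is suitably aligned. The sketch below approximates that decision for a 4K granule (PMD_SHIFT = 21, PUD_SHIFT = 30). It is a simplified userspace model with hypothetical names, not a copy of the generic ioremap code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 4K-granule block sizes (assumption for the sketch). */
#define PMD_SHIFT 21                     /* 2 MB blocks */
#define PUD_SHIFT 30                     /* 1 GB blocks */
#define PMD_SIZE  (1ULL << PMD_SHIFT)
#define PUD_SIZE  (1ULL << PUD_SHIFT)

enum map_level { MAP_PTE, MAP_PMD_BLOCK, MAP_PUD_BLOCK };

static bool block_fits(uint64_t virt, uint64_t phys, uint64_t remaining,
		       uint64_t block)
{
	/* Both addresses must sit on a block boundary and the whole block
	 * must lie inside the requested range. */
	return (virt & (block - 1)) == 0 && (phys & (block - 1)) == 0 &&
	       remaining >= block;
}

/* Pick the largest mapping granule usable at this step. pud_ok models
 * arch_ioremap_pud_supported(), which the patch ties to 4K pages;
 * PMD blocks are always allowed, as arch_ioremap_pmd_supported() returns 1. */
static enum map_level choose_level(uint64_t virt, uint64_t phys,
				   uint64_t remaining, bool pud_ok)
{
	if (pud_ok && block_fits(virt, phys, remaining, PUD_SIZE))
		return MAP_PUD_BLOCK;
	if (block_fits(virt, phys, remaining, PMD_SIZE))
		return MAP_PMD_BLOCK;
	return MAP_PTE;
}
```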
[PATCH v4 02/22] arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
This introduces the preprocessor symbol KIMAGE_VADDR which will serve as the symbolic virtual base of the kernel region, i.e., the kernel's virtual offset will be KIMAGE_VADDR + TEXT_OFFSET. For now, we define it as being equal to PAGE_OFFSET, but in the future, it will be moved below it once we move the kernel virtual mapping out of the linear mapping. Reviewed-by: Mark Rutland Signed-off-by: Ard Biesheuvel --- arch/arm64/include/asm/memory.h | 10 -- arch/arm64/kernel/head.S| 2 +- arch/arm64/kernel/vmlinux.lds.S | 4 ++-- 3 files changed, 11 insertions(+), 5 deletions(-) diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h index 853953cd1f08..bea9631b34a8 100644 --- a/arch/arm64/include/asm/memory.h +++ b/arch/arm64/include/asm/memory.h @@ -51,7 +51,8 @@ #define VA_BITS(CONFIG_ARM64_VA_BITS) #define VA_START (UL(0x) << VA_BITS) #define PAGE_OFFSET(UL(0x) << (VA_BITS - 1)) -#define MODULES_END(PAGE_OFFSET) +#define KIMAGE_VADDR (PAGE_OFFSET) +#define MODULES_END(KIMAGE_VADDR) #define MODULES_VADDR (MODULES_END - SZ_64M) #define PCI_IO_END (MODULES_VADDR - SZ_2M) #define PCI_IO_START (PCI_IO_END - PCI_IO_SIZE) @@ -75,8 +76,13 @@ * private definitions which should NOT be used outside memory.h * files. Use virt_to_phys/phys_to_virt/__pa/__va instead. */ -#define __virt_to_phys(x) (((phys_addr_t)(x) - PAGE_OFFSET + PHYS_OFFSET)) +#define __virt_to_phys(x) ({ \ + phys_addr_t __x = (phys_addr_t)(x); \ + __x >= PAGE_OFFSET ? 
(__x - PAGE_OFFSET + PHYS_OFFSET) :\ +(__x - KIMAGE_VADDR + PHYS_OFFSET); }) + #define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET)) +#define __phys_to_kimg(x) ((unsigned long)((x) - PHYS_OFFSET + KIMAGE_VADDR)) /* * Convert a page to/from a physical address diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 51370f8808bc..85181cb60f46 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -389,7 +389,7 @@ __create_page_tables: * Map the kernel image (starting with PHYS_OFFSET). */ mov x0, x26 // swapper_pg_dir - mov x5, #PAGE_OFFSET + ldr x5, =KIMAGE_VADDR create_pgd_entry x0, x5, x3, x6 ldr x6, =KERNEL_END // __va(KERNEL_END) mov x3, x24 // phys offset diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S index b78a3c772294..282e3e64a17e 100644 --- a/arch/arm64/kernel/vmlinux.lds.S +++ b/arch/arm64/kernel/vmlinux.lds.S @@ -89,7 +89,7 @@ SECTIONS *(.discard.*) } - . = PAGE_OFFSET + TEXT_OFFSET; + . = KIMAGE_VADDR + TEXT_OFFSET; .head.text : { _text = .; @@ -186,4 +186,4 @@ ASSERT(__idmap_text_end - (__idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K, /* * If padding is applied before .head.text, virt<->phys conversions will fail. */ -ASSERT(_text == (PAGE_OFFSET + TEXT_OFFSET), "HEAD is misaligned") +ASSERT(_text == (KIMAGE_VADDR + TEXT_OFFSET), "HEAD is misaligned") -- 2.5.0
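The reworked __virt_to_phys() chooses the base to subtract based on which region an address falls in. Addresses at or above PAGE_OFFSET are linear-map addresses, and anything below is treated as part of the kernel image mapping at KIMAGE_VADDR. In this patch the two bases are still equal. The sketch below therefore uses hypothetical values that place KIMAGE_VADDR below PAGE_OFFSET, as the series intends later. The concrete constants (48-bit VA layout, a 64 MB gap, a PHYS_OFFSET of 2 GB) are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 48-bit VAs, kernel image mapped 64 MB below the
 * linear map, DRAM starting at 2 GB. None of these are the kernel's values. */
#define PAGE_OFFSET_DEMO  0xffff800000000000ULL
#define KIMAGE_VADDR_DEMO (PAGE_OFFSET_DEMO - 0x4000000ULL)
#define PHYS_OFFSET_DEMO  0x80000000ULL

/* Mirrors the two-way __virt_to_phys(): linear-map addresses and kernel
 * image addresses are translated with different bases. */
static uint64_t virt_to_phys_demo(uint64_t va)
{
	return va >= PAGE_OFFSET_DEMO ? va - PAGE_OFFSET_DEMO + PHYS_OFFSET_DEMO
				      : va - KIMAGE_VADDR_DEMO + PHYS_OFFSET_DEMO;
}

/* Mirrors __phys_to_kimg(): the kernel-image alias of a physical address. */
static uint64_t phys_to_kimg_demo(uint64_t pa)
{
	return pa - PHYS_OFFSET_DEMO + KIMAGE_VADDR_DEMO;
}
```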
[PATCH v4 05/22] arm64: kvm: deal with kernel symbols outside of linear mapping
KVM on arm64 uses a fixed offset between the linear mapping at EL1 and the HYP mapping at EL2. Before we can move the kernel virtual mapping out of the linear mapping, we have to make sure that references to kernel symbols that are accessed via the HYP mapping are translated to their linear equivalent. Reviewed-by: Mark Rutland Signed-off-by: Ard Biesheuvel --- arch/arm/include/asm/kvm_asm.h| 2 ++ arch/arm/kvm/arm.c| 8 +--- arch/arm64/include/asm/kvm_asm.h | 2 ++ arch/arm64/include/asm/kvm_host.h | 8 +--- arch/arm64/kvm/hyp.S | 6 +++--- 5 files changed, 17 insertions(+), 9 deletions(-) diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h index 194c91b610ff..c35c349da069 100644 --- a/arch/arm/include/asm/kvm_asm.h +++ b/arch/arm/include/asm/kvm_asm.h @@ -79,6 +79,8 @@ #define rr_lo_hi(a1, a2) a1, a2 #endif +#define kvm_ksym_ref(kva) (kva) + #ifndef __ASSEMBLY__ struct kvm; struct kvm_vcpu; diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index dda1959f0dde..975da6cfbf59 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -982,7 +982,7 @@ static void cpu_init_hyp_mode(void *dummy) pgd_ptr = kvm_mmu_get_httbr(); stack_page = __this_cpu_read(kvm_arm_hyp_stack_page); hyp_stack_ptr = stack_page + PAGE_SIZE; - vector_ptr = (unsigned long)__kvm_hyp_vector; + vector_ptr = (unsigned long)kvm_ksym_ref(__kvm_hyp_vector); __cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr); @@ -1074,13 +1074,15 @@ static int init_hyp_mode(void) /* * Map the Hyp-code called directly from the host */ - err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end); + err = create_hyp_mappings(kvm_ksym_ref(__kvm_hyp_code_start), + kvm_ksym_ref(__kvm_hyp_code_end)); if (err) { kvm_err("Cannot map world-switch code\n"); goto out_free_mappings; } - err = create_hyp_mappings(__start_rodata, __end_rodata); + err = create_hyp_mappings(kvm_ksym_ref(__start_rodata), + kvm_ksym_ref(__end_rodata)); if (err) { kvm_err("Cannot map rodata section\n"); 
goto out_free_mappings; diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index 52b777b7d407..30e0e04c9f16 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -26,6 +26,8 @@ #define KVM_ARM64_DEBUG_DIRTY_SHIFT0 #define KVM_ARM64_DEBUG_DIRTY (1 << KVM_ARM64_DEBUG_DIRTY_SHIFT) +#define kvm_ksym_ref(sym) ((void *)&sym - KIMAGE_VADDR + PAGE_OFFSET) + #ifndef __ASSEMBLY__ struct kvm; struct kvm_vcpu; diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 689d4c95e12f..e3d67ff8798b 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -307,7 +307,7 @@ static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm, struct kvm_vcpu *kvm_arm_get_running_vcpu(void); struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void); -u64 kvm_call_hyp(void *hypfn, ...); +u64 __kvm_call_hyp(void *hypfn, ...); void force_vm_exit(const cpumask_t *mask); void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot); @@ -328,8 +328,8 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr, * Call initialization code, and switch to the full blown * HYP code. */ - kvm_call_hyp((void *)boot_pgd_ptr, pgd_ptr, -hyp_stack_ptr, vector_ptr); + __kvm_call_hyp((void *)boot_pgd_ptr, pgd_ptr, + hyp_stack_ptr, vector_ptr); } static inline void kvm_arch_hardware_disable(void) {} @@ -343,4 +343,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu); void kvm_arm_clear_debug(struct kvm_vcpu *vcpu); void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu); +#define kvm_call_hyp(f, ...) 
__kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__) + #endif /* __ARM64_KVM_HOST_H__ */ diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S index 0ccdcbbef3c2..870578f84b1c 100644 --- a/arch/arm64/kvm/hyp.S +++ b/arch/arm64/kvm/hyp.S @@ -20,7 +20,7 @@ #include /* - * u64 kvm_call_hyp(void *hypfn, ...); + * u64 __kvm_call_hyp(void *hypfn, ...); * * This is not really a variadic function in the classic C-way and care must * be taken when calling this to ensure parameters are passed in registers @@ -37,7 +37,7 @@ * used to implement __hyp_get_vectors in the same way as in * arch/arm64/kernel/hyp_stub.S. */ -ENTRY(kvm_call_hyp) +ENTRY(__kvm_call_hyp) hvc #0 ret -ENDPROC(kvm_call_hyp) +ENDPROC(__kvm_call_hyp) -- 2.5.0
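On arm64, kvm_ksym_ref() rebases a kernel symbol's image address onto its linear-map alias, because the HYP mapping is derived from the linear map by a fixed offset. The kvm_call_hyp() wrapper macro then lets existing call sites stay unchanged while __kvm_call_hyp() receives the translated pointer. The userspace sketch below models both steps, with integers standing in for addresses. The constants, fake_call_hyp_impl() and last_hyp_target are hypothetical. ##__VA_ARGS__ is the same GNU extension the patch relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bases; in the patch they are still equal, which makes
 * kvm_ksym_ref() a no-op until the kernel image moves. */
#define PAGE_OFFSET_DEMO  0xffff800000000000ULL
#define KIMAGE_VADDR_DEMO 0xffff7ffffc000000ULL

/* Same arithmetic as the arm64 kvm_ksym_ref(): image address -> linear alias.
 * (The real macro takes a symbol and uses &sym; here we pass an address.) */
#define ksym_ref_demo(addr) ((uint64_t)(addr) - KIMAGE_VADDR_DEMO + PAGE_OFFSET_DEMO)

static uint64_t last_hyp_target;   /* records what the "HYP call" received */

/* Stand-in for __kvm_call_hyp(): the real one issues an HVC; this one
 * only remembers the target address. Extra arguments are ignored. */
static uint64_t fake_call_hyp_impl(uint64_t hypfn, ...)
{
	last_hyp_target = hypfn;
	return 0;
}

/* Same shape as the patch's kvm_call_hyp(): translate, then forward. */
#define fake_call_hyp(f, ...) fake_call_hyp_impl(ksym_ref_demo(f), ##__VA_ARGS__)
```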
[PATCH v4 01/22] of/fdt: make memblock minimum physical address arch configurable
By default, early_init_dt_add_memory_arch() ignores memory below the base of the kernel image since it won't be addressable via the linear mapping. However, this is not appropriate anymore once we decouple the kernel text mapping from the linear mapping, so archs may want to drop the low limit entirely. So allow the minimum to be overridden by setting MIN_MEMBLOCK_ADDR. Acked-by: Mark Rutland Acked-by: Rob Herring Signed-off-by: Ard Biesheuvel --- drivers/of/fdt.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 655f79db7899..1f98156f8996 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -976,13 +976,16 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname, } #ifdef CONFIG_HAVE_MEMBLOCK +#ifndef MIN_MEMBLOCK_ADDR +#define MIN_MEMBLOCK_ADDR __pa(PAGE_OFFSET) +#endif #ifndef MAX_MEMBLOCK_ADDR #define MAX_MEMBLOCK_ADDR ((phys_addr_t)~0) #endif void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size) { - const u64 phys_offset = __pa(PAGE_OFFSET); + const u64 phys_offset = MIN_MEMBLOCK_ADDR; if (!PAGE_ALIGNED(base)) { if (size < PAGE_SIZE - (base & ~PAGE_MASK)) { -- 2.5.0
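The #ifndef default lets an architecture header predefine MIN_MEMBLOCK_ADDR, for example as 0 once the kernel image no longer has to live in the linear map. Other architectures keep __pa(PAGE_OFFSET). The sketch below models the effect on a memory range reported by the DT: anything below the minimum is trimmed, or ignored if the whole range lies below it. It only approximates early_init_dt_add_memory_arch()'s low-limit handling. The function name is hypothetical, and the default value stands in for __pa(PAGE_OFFSET).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An arch header could "#define MIN_MEMBLOCK_ADDR 0" before this point;
 * otherwise the generic default applies. 0x80000000 stands in for
 * __pa(PAGE_OFFSET) (hypothetical value). */
#ifndef MIN_MEMBLOCK_ADDR
#define MIN_MEMBLOCK_ADDR 0x80000000ULL
#endif

/* Clip [*base, *base + *size) so that it starts at or above the minimum.
 * Returns false if the whole range lies below it and should be ignored. */
static bool clip_to_min(uint64_t *base, uint64_t *size)
{
	const uint64_t min = MIN_MEMBLOCK_ADDR;

	if (*base + *size <= min)
		return false;
	if (*base < min) {
		*size -= min - *base;
		*base = min;
	}
	return true;
}
```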
[PATCH v4 00/22] arm64: implement support for KASLR
This series implements KASLR for arm64 by building the kernel as a PIE executable that can relocate itself at runtime, and moving it to a random offset in the vmalloc area. v2 and up also implement physical randomization, i.e., they allow the kernel to deal with being loaded at any physical offset (modulo the required alignment), and invoke the EFI_RNG_PROTOCOL from the UEFI stub to obtain random bits and perform the actual randomization of the physical load address.

Changes since v3:
- Implemented base relative kallsyms address tables. This saves 250 KB of permanent .rodata (for a defconfig build), but more importantly, saves about 1.4 MB of __init data in the dynamic relocation table. This patch has been picked up by akpm in the meantime, but it is reproduced here for completeness.
- Reimplemented the KASLR init code in C. This has a couple of benefits: we can now parse the 'nokaslr' command line option in a timely fashion, and we can pass the KASLR random seed via /chosen/kaslr-seed rather than via x1, which means fewer changes to bootloaders. This is implemented using a two-pass approach, i.e., the kernel is booted without KASLR to a state where it can invoke ordinary C code, and it re-enters the early boot code to recreate the kernel mappings at an offset if it finds the prerequisite data in the FDT. Note that this requires some mild refactoring to ensure that early_fixmap_init and fixmap_remap_fdt can cope with being called twice.
- Added a patch to enable the inappropriately named huge-vmap feature for arm64, which allows ioremap() (but not vmap or vmalloc) to use block mappings. This should be an improvement in itself, but the significance for this series is that it also allows the __init region to be unmapped entirely via unmap_kernel_range() (which complains about block mappings without this feature enabled).
- Split the implementation of CONFIG_RELOCATABLE and CONFIG_RANDOMIZE_BASE into separate patches.
- Randomize the module region independently from the core kernel. It is chosen such that it covers the [_stext, _etext] interval of the core kernel to avoid using PLT entries unless we really have to.
- Update the module PLT patch to replace the O(n^2) searches with sorting, and use a single .plt section for __init and ordinary code.
- Added panic notifiers to report the KASLR offset, and whether a mem= limit is in effect.
- Replaced Mark Rutland's asm/elf.h split-off patch with one that puts the C declarations between #ifndef __ASSEMBLY__/#endif.
- Incorporated feedback (and tags) from Mark Rutland and Matt Fleming.
- Minor tweaks and fixes.

Changes since v2:
- Incorporated feedback from Marc Zyngier into the KVM patch (#5).
- Dropped the pgdir section and the patch that memblock_reserve()'s the kernel sections at a smaller granularity. This is no longer necessary with the pgdir section gone. This also fixes an issue spotted by James Morse where the fixmap page tables are not zeroed correctly; these have been moved back to the .bss section.
- Got rid of all ifdef'ery regarding the number of translation levels in the changed .c files, by introducing new definitions in pgtable.h (#3, #6).
- Fixed KAsan support, which was broken by all earlier versions.
- Moved the module region along with the virtually randomized kernel, so that module addresses become unpredictable as well, and we only have to rely on veneers in the PLTs when the module region is exhausted (which is somewhat more likely since the module region is now shared with other uses of the vmalloc area).
- Added support for the 'nokaslr' command line option. This affects the randomization performed by the stub, and results in a warning if passed while the bootloader also presented a random seed for virtual KASLR in register x1.
- The .text/.rodata sections of the kernel are no longer aliased in the linear region with a writable mapping.
- Added a separate image header flag for kernel images that may be loaded at any 2 MB aligned offset (+ TEXT_OFFSET).
- The KASLR displacement is now corrected if it results in the kernel image intersecting a PUD/PMD boundary (4k and 16k/64k granule kernels, respectively).
- Split out UEFI stub random routines into separate patches.
- Implemented a weight-based EFI random allocation routine so that each suitable offset in available memory is equally likely to be selected (as suggested by Kees Cook).
- Reused CONFIG_RELOCATABLE and CONFIG_RANDOMIZE_BASE instead of introducing new Kconfig symbols to describe the same functionality.
- Reimplemented mem= logic so memory is clipped from the top first.

Changes since v1/RFC:
- This series now implements fully independent virtual and physical address randomization at load time. I have recycled some patches from this series: http://thread.gmane.org/gmane.linux.ports.arm.kernel/455151, and updated the final UEFI stub patch to randomize the physical address as well.
- Added a patch to de
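The weight-based allocation mentioned in the v2 changes can be pictured as follows. Each usable memory region contributes one candidate slot per aligned offset at which the image fits. One random number is drawn over the total slot count and then mapped back to a region and offset. Because every slot is counted exactly once, every suitable offset is equally likely, regardless of region size. This is a userspace sketch with hypothetical types and an explicit random value as input. It is not the UEFI stub's code, and it ignores EFI memory-type filtering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct region {            /* hypothetical: one free physical range */
	uint64_t start;
	uint64_t size;
};

/* Number of align-aligned start addresses in r where 'image' bytes fit.
 * align must be a power of two. */
static uint64_t slots_in(const struct region *r, uint64_t image, uint64_t align)
{
	uint64_t first = (r->start + align - 1) & ~(align - 1);
	uint64_t end = r->start + r->size;

	if (first >= end || end - first < image)
		return 0;
	return (end - first - image) / align + 1;
}

/* Map a random value onto one slot, uniformly over all regions.
 * Returns 0 if nothing fits (a real loader would report failure). */
static uint64_t pick_load_addr(const struct region *regs, size_t n,
			       uint64_t image, uint64_t align, uint64_t rnd)
{
	uint64_t total = 0, target;
	size_t i;

	for (i = 0; i < n; i++)
		total += slots_in(&regs[i], image, align);
	if (total == 0)
		return 0;

	target = rnd % total;   /* modulo bias ignored for brevity */
	for (i = 0; i < n; i++) {
		uint64_t s = slots_in(&regs[i], image, align);
		uint64_t first = (regs[i].start + align - 1) & ~(align - 1);

		if (target < s)
			return first + target * align;
		target -= s;
	}
	return 0;   /* not reached */
}
```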
Re: [RFC PATCH V2 3/8] genirq: Add runtime power management support for IRQ chips
On Fri, 22 Jan 2016, Ulf Hansson wrote:
> Here's a small collection of drivers that I easily picked up as
> candidates for using these new APIs.
> In principle, they would invoke these new APIs from their runtime PM
> callbacks.
>
> drivers/spi/spi-atmel.c
> drivers/spi/spi-pl022.c
> drivers/i2c/busses/i2c-omap.c
> drivers/i2c/busses/i2c-nomadik.c
> drivers/i2c/busses/i2c-sh_mobile.c
> drivers/mmc/host/mtk-sd.c
> drivers/mmc/host/mmci.c

Instead of adding those calls to each driver, we can be smart and flag the interrupt as AUTO_RUNTIME_SUSPEND or such, so the runtime PM core can handle it when invoking the dev_pm_ops->runtime_suspend()/resume() callbacks.

Unfortunately, the devres infrastructure is exceptionally ill-suited for this, but with some surgery it should be doable.

Thanks,

	tglx
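One way to picture the suggestion: drivers mark an interrupt once, and the runtime PM core, not each driver, masks every marked interrupt of the device around its ->runtime_suspend() callback and unmasks them after ->runtime_resume(). The sketch below is a userspace model. The flag name follows the mail's "AUTO_RUNTIME_SUSPEND or such". Every struct and function here is hypothetical, not an existing genirq or PM API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IRQF_AUTO_RUNTIME_SUSPEND 0x1u   /* hypothetical flag */

struct fake_irq {
	unsigned int flags;
	bool enabled;
};

struct fake_dev {
	struct fake_irq *irqs;
	size_t nr_irqs;
	int (*runtime_suspend)(struct fake_dev *dev);   /* driver callback */
	int (*runtime_resume)(struct fake_dev *dev);    /* driver callback */
};

static void set_flagged_irqs(struct fake_dev *dev, bool enable)
{
	size_t i;

	for (i = 0; i < dev->nr_irqs; i++)
		if (dev->irqs[i].flags & IRQF_AUTO_RUNTIME_SUSPEND)
			dev->irqs[i].enabled = enable;
}

/* Core-side wrappers: quiesce flagged IRQs before the driver powers down,
 * and re-enable them only once the driver has powered back up. */
static int core_runtime_suspend(struct fake_dev *dev)
{
	int ret;

	set_flagged_irqs(dev, false);
	ret = dev->runtime_suspend ? dev->runtime_suspend(dev) : 0;
	if (ret)
		set_flagged_irqs(dev, true);   /* suspend aborted: undo */
	return ret;
}

static int core_runtime_resume(struct fake_dev *dev)
{
	int ret = dev->runtime_resume ? dev->runtime_resume(dev) : 0;

	if (!ret)
		set_flagged_irqs(dev, true);
	return ret;
}
```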
Re: [kernel-hardening] Re: [PATCH 0/2] sysctl: allow CLONE_NEWUSER to be disabled
Quoting Josh Boyer (jwbo...@fedoraproject.org): > On Tue, Jan 26, 2016 at 9:46 AM, Austin S. Hemmelgarn > wrote: > > On 2016-01-26 09:38, Josh Boyer wrote: > >> > >> On Mon, Jan 25, 2016 at 11:57 PM, Eric W. Biederman > >> wrote: > >>> > >>> Kees Cook writes: > >>> > On Mon, Jan 25, 2016 at 11:33 AM, Eric W. Biederman > wrote: > > > > Kees Cook writes: > >> > >> > >> Well, I don't know about less weird, but it would leave a unneeded > >> hole in the permission checks. > > > > > > To be clear the current patch has my: > > > > Nacked-by: "Eric W. Biederman" > > > > The code is buggy, and poorly thought through. Your lack of interest > > in > > fixing the bugs in your patch is distressing. > > > I'm not sure where you see me having a "lack of interest". The > existing cap-checking sysctls have a corner-case bug, which is > orthogonal to this change. > >>> > >>> > >>> That certainly doesn't sound like you have any plans to change anything > >>> there. > >>> > > So broken code, not willing to fix. No. We are not merging this > > sysctl. > > > I think you're jumping to conclusions. :) > >>> > >>> > >>> I think I am the maintainer. > >>> > >>> What you are proposing is very much something that is only of interst to > >>> people who are not using user namespaces. It is fatally flawed as > >>> a way to avoid new attack surfaces for people who don't care as the > >>> sysctl leaves user namespaces enabled by default. It is fatally flawed > >>> as remediation to recommend to people to change if a new user namespace > >>> related but is discovered. Any running process that happens to be > >>> created while user namespace creation was enabled will continue to > >>> exist. Effectively a reboot will be required as part of a mitigation. > >>> Many sysadmins will get that wrong. > >>> > >>> I can't possibly see your sysctl as proposed achieving it's goals. A > >>> person has to be entirely too aware of subtlety and nuance to use it > >>> effectively. 
> >> > >> > >> What you're saying is true for the "oh crap" case of a new userns > >> related CVE being found. However, there is the case where sysadmins > >> know for a fact that a set of machines should not allow user > >> namespaces to be enabled. Currently they have 2 choices, 1) use their > >> distro kernel as-is, which may not meet their goal of having userns > >> disabled, or 2) rebuild their kernel to disable it, which may > >> invalidate any support contracts they have. > >> > >> I tend to agree with you on the lack of value around runtime > >> mitigation, but allowing an admin to toggle this as a blatant on/off > >> switch on reboot does have value. > >> > This feature is already implemented by two distros, and likely wanted > by others. We cannot ignore that. The sysctl default doesn't change > the existing behavior, so this doesn't get in your way at all. Can you > please respond to my earlier email where I rebutted each of your > arguments against it? Just saying "no" and putting words in my mouth > isn't very productive. > >>> > >>> > >>> Calling people who make mistakes insane is not a rebuttal. In security > >>> usability matters, and your sysctl has low usability. > >>> > >>> Further you seem to have missed something crucial in your understanding. > >>> As was explained earlier the sysctl was added to ubuntu to allow early > >>> adopters to experiment not as a long term way of managing user > >>> namespaces. > >>> > >>> > >>> What sounds like a generally useful feature that would cover your use > >>> case and many others is a per user limit on the number of user > >>> namespaces users may create. > >> > >> > >> Where that number may be zero? I don't see how that is really any > >> better than a sysctl. Could you elaborate? > > > > It's a better option because it would allow better configurability. Take for > > example a single user desktop system with some network daemons. 
On such a > > system, the actual login used for the graphical environment by the user > > should be allowed at least a few user namespaces, because some software > > depends on them for security (Chrome for example, as well as some distro's > > build systems), but system users should be limited to at most one if they > > need it, and ideally zero, so that remote exploits couldn't give access to a > > user namespace. > > > > Conversely, on a server system, it's not unreasonable to completely disable > > user namespaces for almost everything, except for giving one to services > > that use them properly for sand-boxing. > > OK, so better granularity. Fine. > > > I will state though that I only feel this is a better solution given that > > two criteria are met: > > 1. You can set 0 as the limit. > > 2. You can configure this without needing some special software (this in > > particular means that seccomp is not an option). > > I
[PATCH] hisi_sas: fix v1 hw check for slot error
Completion header bit CMPLT_HDR_RSPNS_XFRD indicates whether the response frame has been received into host memory, not whether the response frame reports an error. As such, change how we decide whether a slot has an error. Also, remove the redundant check on CMPLT_HDR_CMD_CMPLT_MSK. Fixes: 27a3f229 ("hisi_sas: Add cq interrupt handler") Signed-off-by: John Garry Tested-by: Ricardo Salveti diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c index 057fdeb..eea24d7 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c @@ -1289,13 +1289,10 @@ static int slot_complete_v1_hw(struct hisi_hba *hisi_hba, goto out; } - if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK) { - if (!(cmplt_hdr_data & CMPLT_HDR_CMD_CMPLT_MSK) || - !(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) - ts->stat = SAS_DATA_OVERRUN; - else - slot_err_v1_hw(hisi_hba, task, slot); + if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK && + !(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) { + slot_err_v1_hw(hisi_hba, task, slot); goto out; } -- 1.9.1
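The behavioural change is easiest to see as a predicate over the completion header word. The sketch below contrasts the old and new decisions as they appear in the diff. The bit positions given to the three masks are hypothetical placeholders (the real values live in hisi_sas_v1_hw.c), and the enum only names the outcomes.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions, NOT the hardware's real layout. */
#define CMPLT_HDR_CMD_CMPLT_MSK     (1u << 17)
#define CMPLT_HDR_ERR_RCRD_XFRD_MSK (1u << 18)
#define CMPLT_HDR_RSPNS_XFRD_MSK    (1u << 19)

enum slot_outcome { SLOT_NORMAL, SLOT_DATA_OVERRUN, SLOT_ERR_HANDLER };

/* Decision before the patch. */
static enum slot_outcome old_decision(uint32_t hdr)
{
	if (hdr & CMPLT_HDR_ERR_RCRD_XFRD_MSK) {
		if (!(hdr & CMPLT_HDR_CMD_CMPLT_MSK) ||
		    !(hdr & CMPLT_HDR_RSPNS_XFRD_MSK))
			return SLOT_DATA_OVERRUN;
		return SLOT_ERR_HANDLER;
	}
	return SLOT_NORMAL;
}

/* Decision after the patch: an error record without a transferred response
 * frame goes to slot_err_v1_hw(); everything else falls through to the rest
 * of slot_complete_v1_hw(). */
static enum slot_outcome new_decision(uint32_t hdr)
{
	if ((hdr & CMPLT_HDR_ERR_RCRD_XFRD_MSK) &&
	    !(hdr & CMPLT_HDR_RSPNS_XFRD_MSK))
		return SLOT_ERR_HANDLER;
	return SLOT_NORMAL;
}
```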
Re: [PATCH v2] Add hard/soft lockup debugger entry points
On 01/26/2016 12:04 PM, Jeff Merkey wrote: This patch adds an export which can be set by system debuggers to direct the hard lockup and soft lockup detector to trigger a breakpoint exception and enter a debugger if one is active. It is assumed that if someone sets this variable, then a breakpoint handler of some sort will be actively loaded or registered via the notify die handler chain. This addition is extremely useful for debugging hard and soft lockups real time and quickly from a console debugger. Signed-off-by: Jeff Merkey --- kernel/watchdog.c | 10 ++ 1 file changed, 10 insertions(+) You probably should reach out to someone who uses this stuff more regularly - I actually wonder if the kgdb_breakpoint() API is the right thing, though, not the internal arch_kgdb_breakpoint(). Of course any of these strategies also assume you are building the kernel with CONFIG_KGDB set, and I'm pretty sure will cause your build to fail if it isn't. You likely need to guard this stuff locally within watchdog.c for !CONFIG_KGDB. -- Chris Metcalf, EZChip Semiconductor http://www.ezchip.com
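One way to do the local guard Chris suggests is to confine the #ifdef to a tiny helper, so that the lockup path itself stays unconditional. The userspace sketch below assumes a stubbed kgdb_breakpoint() and a hypothetical helper name, watchdog_debugger_break(). Only debug_watchdog_lockups comes from the patch; you can define CONFIG_KGDB at compile time to model a KGDB-enabled build.

```c
#include <assert.h>

/* Build with -DCONFIG_KGDB to mimic a kernel that has KGDB built in. */

static int debugger_entries;	/* counts simulated debugger entries */

#ifdef CONFIG_KGDB
/* Stub standing in for the kernel's kgdb_breakpoint(). */
static void kgdb_breakpoint(void)
{
	debugger_entries++;
}
#endif

/* Hypothetical helper local to watchdog.c: without CONFIG_KGDB the call
 * compiles away, so there is no reference to an undeclared function. */
static inline void watchdog_debugger_break(void)
{
#ifdef CONFIG_KGDB
	kgdb_breakpoint();
#endif
}

static int debug_watchdog_lockups;	/* the knob added by the patch */

static void report_lockup(void)
{
	if (debug_watchdog_lockups)
		watchdog_debugger_break();
}
```

Without CONFIG_KGDB, enabling the knob is harmless: report_lockup() simply does nothing extra.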
Re: [v3,11/41] mips: reuse asm-generic/barrier.h
On Wed, Jan 27, 2016 at 12:52:07AM +0800, Boqun Feng wrote: > I recall that last time you and Linus came to a conclusion that even > on Alpha, a barrier for read->write with data dependency is unnecessary: > > http://article.gmane.org/gmane.linux.kernel/2077661 > > And in an earlier mail of that thread, Linus made his point that > smp_read_barrier_depends() should only be used to order read->read. > > So right now, are we going to extend the semantics of > smp_read_barrier_depends()? Can we just make smp_read_barrier_depends() > still only work for read->read, and assume all the architectures won't > reorder read->write with data dependency, so that the code above having > a smp_rmb() also works? That discussion was about control dependencies. So writes that _depend_ on a prior read having an explicit value. So something like: struct foo *x = READ_ONCE(*ptr); smp_read_barrier_depends(); if (x->val == 5) x->bar = 5; In that case, the load of x->val must be complete and its value determined _before_ the store to x->bar can happen. This is distinct from: struct foo *x = READ_ONCE(*ptr); smp_read_barrier_depends(); x->bar = 5; And it's the second case where smp_read_barrier_depends() read->write order matters.
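The two cases can be written as a loose userspace C11 analogy. This is only a sketch: C11 has no direct equivalent of smp_read_barrier_depends(). The closest analogue is memory_order_consume, which orders accesses that depend on the loaded pointer, and most compilers strengthen it to acquire. The test below runs on a single thread, so it checks the logic of each case but not any cross-CPU ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct foo {
	int val;
	int bar;
};

static _Atomic(struct foo *) ptr;

/* Control dependency: the store to x->bar happens only if the loaded
 * x->val compares equal, so that load must be resolved first. */
static void control_dep(void)
{
	struct foo *x = atomic_load_explicit(&ptr, memory_order_consume);

	if (x->val == 5)
		x->bar = 5;
}

/* Address dependency only: the store's address comes from the loaded
 * pointer. This is the read->write case discussed in the mail. */
static void address_dep(void)
{
	struct foo *x = atomic_load_explicit(&ptr, memory_order_consume);

	x->bar = 5;
}
```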
Re: [PATCH v2] perf: Synchronously cleanup child events
On Tue, Jan 26, 2016 at 05:16:37PM +0100, Peter Zijlstra wrote: > > +struct file *perf_event_get(unsigned int fd) > > { > > + struct file *file; > > > > + file = fget_raw(fd); > > fget_raw() to guarantee the return value isn't NULL? afaict the O_PATH > stuff does not apply to perf events, so you'd put any fd for which the > distinction matters anyway. > > > + if (file->f_op != &perf_fops) { > > + fput(file); > > + return ERR_PTR(-EBADF); > > + } > > > > + return file; > > } Is it not possible for one thread to concurrently call close() while this thread tries to fget()? If so, don't we have to check the return value anyway?
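fget_raw() returns NULL when the descriptor is not open, for example because a racing close() got there first. The function therefore needs a NULL check before it dereferences file->f_op. Below is a userspace sketch of perf_event_get() with that check added. Everything other than the function body is a hypothetical stub: the tiny fd table, fget_raw(), fput(), and the ERR_PTR()/IS_ERR() helpers that mimic the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for kernel types and helpers (hypothetical stubs). */
struct file_operations { int unused; };
struct file { const struct file_operations *f_op; int refs; };

static const struct file_operations perf_fops, other_fops;

#define NR_FDS 4
static struct file fd_table[NR_FDS];	/* f_op == NULL means "not open" */

#define MAX_ERRNO 4095
static void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static int IS_ERR(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }

static struct file *fget_raw(unsigned int fd)
{
	if (fd >= NR_FDS || fd_table[fd].f_op == NULL)
		return NULL;	/* never opened, or a racing close() won */
	fd_table[fd].refs++;
	return &fd_table[fd];
}

static void fput(struct file *file) { file->refs--; }

struct file *perf_event_get(unsigned int fd)
{
	struct file *file = fget_raw(fd);

	if (!file)		/* the check the question is about */
		return ERR_PTR(-EBADF);

	if (file->f_op != &perf_fops) {
		fput(file);
		return ERR_PTR(-EBADF);
	}
	return file;
}
```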
Re: [PATCH 1/2] timer: add setup_deferrable_timer macro
On Tue, 12 Jan 2016, Lucas Stach wrote: > Add the trivial missing macro to setup a deferrable timer. > > Signed-off-by: Lucas Stach Acked-by: Thomas Gleixner @Rafael: Feel free to pick that up along with the other one.
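A "trivial missing macro" like this typically has the same shape as setup_timer(), with a deferrable flag passed through. The userspace sketch below assumes that shape. The struct layout, the flag value and setup_timer_with_flags() (which stands in for the kernel's internal helper) are all hypothetical. It is not the text of the patch.

```c
#include <assert.h>

/* Hypothetical userspace model; the real struct timer_list and the
 * kernel's flag encoding differ. */
#define TIMER_DEFERRABLE 0x1u

struct timer_list {
	void (*function)(unsigned long);
	unsigned long data;
	unsigned int flags;
};

/* Stands in for the kernel's internal timer setup helper. */
static void setup_timer_with_flags(struct timer_list *timer,
				   void (*fn)(unsigned long),
				   unsigned long data, unsigned int flags)
{
	timer->function = fn;
	timer->data = data;
	timer->flags = flags;
}

#define setup_timer(timer, fn, data) \
	setup_timer_with_flags((timer), (fn), (data), 0)

/* The deferrable variant, sketched the same way. */
#define setup_deferrable_timer(timer, fn, data) \
	setup_timer_with_flags((timer), (fn), (data), TIMER_DEFERRABLE)

static void poll_fn(unsigned long data)
{
	(void)data;	/* example callback; does nothing */
}
```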
RE: [PATCH 00/10] KVM PCIe/MSI passthrough on ARM/ARM64
Hello! I'd just like to clarify some things for myself and better wrap my head around them... > On x86 all accesses to the 1MB PA region [FEE0_0000h - FEF0_0000h] are directed > as interrupt messages: accesses to this special PA window directly target the > APIC configuration space and not DRAM, meaning the downstream IOMMU is > bypassed. So, this is effectively the same as always having hardwired 1:1 mappings on all IOMMUs, isn't it? If so, can't we just do the same, just by forcing a similar 1:1 mapping? This is what I tried to do in my patchset. All of you are talking about a situation which arises when we are emulating a different machine with a different physical address layout. E.g. if our host has MSI at 0xABADCAFE, our target could have valid RAM at the same location, and we need to handle it somehow, therefore we have to move our MSI window out of the target's RAM. But how does this work on a PC then? What if our host is a PC, and we want to emulate some ARM board which has RAM at FE00? Or does it mean that the PC architecture is flawed and can reliably handle PCI passthrough only for itself? Kind regards, Pavel Fedin Senior Engineer Samsung Electronics Research center Russia
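The clash Pavel describes comes down to two range checks. First, does a DMA address fall inside the x86 interrupt-message window (1MB starting at 0xFEE00000)? Second, does a guest RAM range overlap that window? The sketch below only does the arithmetic. The function names are hypothetical, and it says nothing about how VFIO or KVM actually handle the overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 interrupt-message window: 1MB starting at 0xFEE00000. */
#define X86_MSI_BASE 0xFEE00000ULL
#define X86_MSI_END  0xFEF00000ULL	/* exclusive */

/* Would a DMA write to this bus address be treated as an interrupt
 * message (bypassing the IOMMU) instead of reaching DRAM? */
static bool hits_msi_window(uint64_t addr)
{
	return addr >= X86_MSI_BASE && addr < X86_MSI_END;
}

/* Does a guest RAM range [start, start + len) overlap that window? */
static bool guest_ram_overlaps_msi(uint64_t start, uint64_t len)
{
	return start < X86_MSI_END && start + len > X86_MSI_BASE;
}
```

For example, an emulated board with RAM at 0xF0000000 to 0xFFFFFFFF would overlap the host window. RAM at 0x80000000 to 0xBFFFFFFF would not.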
Re: N900 sleep mode (in 4.5-rc0, if that matters)
* Pavel Machek [160126 06:01]: > > It seems like I have rather a lot of blocking bits: > > 1fff 48005020 (fa005020) cm_idlest_per blocking bits: 0007e000 Looks like most of these are for GPIO banks; that's OK, those get saved and restored in the idle loop. Here bit 18 UART4 is a mystery though.. It's uart4 on 36xx but reserved on 34xx. I do have that too on my n900, but it's hitting off mode with v4.4. > ffdffe8d 48004a20 (fa004a20) cm_idlest1_core blocking bits: 00200072 > 000d 48004a28 (fa004a28) cm_idlest3_core > > cm_idlest1_core changes periodically and often, to 00218072. The rest seems > constant. For cm_idlest1_core 42 is the answer.. Here you have bits 4 and 5 blocking, which are for OTG and its PHY. That's a known issue with musb and setting pm_runtime_irq_safe() on the MUSB parent. If you do rmmod omap2430 and phy-twl4030usb chances are the LEDs will start going off assuming the McSPI bit goes low with WLAN idling. Looks like we have some regression with v4.5-rc1 where n900 is not hitting deeper idle states though. I'll run git bisect between v4.4..v4.5-rc1. Regards, Tony
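Reading these masks comes down to listing which bit positions are set. The small helper below does only that. It is a hypothetical debugging aid, and mapping each position to a module name still requires the OMAP34xx/36xx TRM.

```c
#include <assert.h>
#include <stdint.h>

/* Collect the positions of the set bits in a CM_IDLEST blocking mask.
 * Returns how many positions were written to out[] (at most max). */
static int list_blocking_bits(uint32_t mask, int *out, int max)
{
	int n = 0;

	for (int bit = 0; bit < 32; bit++)
		if ((mask & (1u << bit)) && n < max)
			out[n++] = bit;
	return n;
}
```

For the per-domain mask 0x0007e000 this yields bits 13 to 18; the last of these is the bit 18 (UART4) mentioned above. For the core mask 0x00200072 it yields bits 1, 4, 5, 6 and 21, which include the bits 4 and 5 attributed to OTG and its PHY.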
Re: Nokia N900: musb is in wrong state after boot
* Pali Rohár [160126 06:35]: > On Thursday 21 January 2016 12:30:13 Tony Lindgren wrote: > > * joerg Reisenweber [160121 11:35]: > > > On Thu 21 January 2016 11:21:13 Tony Lindgren wrote: > > > > Do you have some pointer > > > > to the "certain resistor value on ID to GND" spec? Is it maybe part of > > > > the carkit related parts of the USB spec? > > > > > > ""Three additional ID pin states are defined[4] at the nominal resistance > > > values of 124 kΩ, 68 kΩ, and 36.5 kΩ, with respect to the ground pin. > > > These > > > permit the device to work with USB Accessory Charger Adapters that allows > > > the > > > OTG device to be attached to both a charger and another device > > > simultaneously. > > > [6]"" > > > https://en.wikipedia.org/wiki/USB_On-The-Go#OTG_micro_plugs > > > > OK thanks. So it's the "accessory charger" part of the > > battery charging specification 1.1. > > So, Tony, do you have some idea what needs to be changed and how to fix > peripheral mode after boot on Nokia N900? No, I'm waiting to hear an educated guess from Felipe on this one. > First I would like to have fully working peripheral mode on Nokia N900 > and then we can try to hack host mode (if possible). > > But peripheral mode is a must due to development, because it provides > usb network or usb tty. Totally. Regards, Tony
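As a rough illustration of what "three additional ID pin states" means for a driver, the sketch below classifies a measured ID-to-GND resistance against the three nominal values quoted above. The ±10% window is an assumption made for illustration, not the specification's limits. The RID_A/B/C labels follow the accessory charger naming as far as I know, so treat them as an assumption too.

```c
#include <assert.h>

/* Nominal ID-to-GND resistances quoted above, in ohms. The +/-10%
 * window is an illustrative assumption, not the spec's tolerance. */
enum id_state { ID_OTHER, ID_RID_A, ID_RID_B, ID_RID_C };

static int within(double ohms, double nominal)
{
	return ohms >= nominal * 0.9 && ohms <= nominal * 1.1;
}

static enum id_state classify_id(double ohms)
{
	if (within(ohms, 124000.0))
		return ID_RID_A;
	if (within(ohms, 68000.0))
		return ID_RID_B;
	if (within(ohms, 36500.0))
		return ID_RID_C;
	return ID_OTHER;	/* e.g. floating ID or ID grounded */
}
```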
Re: [PATCH v2] Add hard/soft lockup debugger entry points
On 1/26/16, Chris Metcalf wrote: > On 01/26/2016 12:04 PM, Jeff Merkey wrote: >> This patch adds an export which can be set by system debuggers to direct >> the hard lockup and soft lockup detector to trigger a breakpoint >> exception >> and enter a debugger if one is active. It is assumed that if someone >> sets this variable, then a breakpoint handler of some sort will be >> actively >> loaded or registered via the notify die handler chain. >> >> This addition is extremely useful for debugging hard and soft lockups >> real time and quickly from a console debugger. >> >> Signed-off-by: Jeff Merkey >> --- >> kernel/watchdog.c | 10 ++ >> 1 file changed, 10 insertions(+) > > You probably should reach out to someone who uses this stuff more > regularly - I actually wonder if the kgdb_breakpoint() API is the > right thing, though, not the internal arch_kgdb_breakpoint(). > > Of course any of these strategies also assume you are building > the kernel with CONFIG_KGDB set, and I'm pretty sure will cause > your build to fail if it isn't. You likely need to guard this > stuff locally within watchdog.c for !CONFIG_KGDB. > > -- > Chris Metcalf, EZChip Semiconductor > http://www.ezchip.com > > I'll go review that code carefully. What would be ideal is to have the breakpoint abstraction independent of kgdb entirely. That would not be a simple patch but a more involved one. We really just need the breakpoint instruction for each arch, not all of kgdb, and it should not depend on KGDB being enabled. I looked over those includes and this may work, but you are correct, it's not squeaky clean to do it that way. I'll get to work on v3. :-) Jeff
Re: [Openipmi-developer] ipmi_si feature request: SMBIOS-based autoloading
On Tue, Jan 26, 2016 at 5:43 AM, Corey Minyard wrote: > > On 01/26/2016 07:32 AM, Corey Minyard wrote: >> >> On 01/24/2016 07:45 PM, Andy Lutomirski wrote: >>> >>> ipmi_si doesn't autoload on systems where it's found via SMBIOS. >>> Could that be fixed? >> >> I'm not really sure. I kind of assumed this was handled in userland >> like the ACPI tables. I don't think there are many systems that have >> SMBIOS and not ACPI, so I'm not sure of the impact here or what >> to do. I've never seen it handled in userland adequately on Fedora, Ubuntu, or CentOS. FWIW, it might pay to have ipmi_si pull in ipmi_devintf as well. Then ipmitool would work out of the box. >> >>> If I were doing it, I'd suggest rigging up some code that's compiled >>> in to the main kernel even if ipmi_si is a module that creates the >>> platform device if the dmi device is there and then set up a modalias >>> so that the platform device causes ipmi_si to load. >>> >>> (In general, having the same driver create the platform device and >>> register the platform driver means that autoloading is unlikely to >>> work right. See arch/x86/kernel/pmem.c for an example of a weird >>> legacy device that gets this right.) >> >> This sounds like kind of a hack. It's a bit of a hack in that case. It does preserve the general driver model approach where a lower-level thing enumerates the system and instantiates devices and then a higher-level driver binds to the devices. >> >>> Alternatively, maybe /sys/firmware/dmi could learn how to advertise >>> modaliases. But that might be a giant mess to solve a tiny problem. >> >> This sounds like the right way, but you are probably right. Are >> there any other resources that could benefit from this? I'm >> guessing not. No clue. Jean might know. Jean? >> >> There is already a "dmi_save_ipmi_device" function that gets called >> when scanning the SMBIOS table (see drivers/firmware/dmi_scan.c). >> Maybe a tie-in there?
That happens pretty early, though, I'm not >> sure if it's too early. >> >> Of course it would be easy to have a file like pmem.c that detects >> if an IPMI device is in the SMBIOS table and create a platform >> device for it. >> >> Are you willing to do this work? I'm willing to do some plumbing, but I'm not sure I want to dig deeply into the innards of ipmi_si initialization. >> >> -corey >> > Actually, there is some cleanup that has to occur here, let me look at this > a little bit. It looks like the driver currently decides how to talk to the hardware and then instantiates the platform device. For my approach to work, it would have to be refactored a bit: instantiate the platform device with the info about how to talk to hardware and then have the platform driver fish that info back out of the platform device. Is that what you're talking about? I also don't understand the distinction between ipmi_si and ipmi_bmc. --Andy
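Detecting "an IPMI device in the SMBIOS table" means finding a type 38 (IPMI Device Information) record. The userspace sketch below walks a raw SMBIOS structure table, for example a copy read from /sys/firmware/dmi/tables/DMI, and returns the offset of the first such record. Each structure has a 4-byte header (type, length, handle) and a formatted area of `length` bytes, followed by a string set that ends with two NUL bytes. The in-kernel code would instead sit on top of the existing dmi_scan.c machinery; find_ipmi_record() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMBIOS_TYPE_IPMI_DEV 38		/* IPMI Device Information */
#define SMBIOS_TYPE_END      127	/* End-of-table marker */

/* Return the offset of the first type-38 structure in tbl, or -1. */
static long find_ipmi_record(const uint8_t *tbl, size_t len)
{
	size_t off = 0;

	while (off + 4 <= len) {
		uint8_t type = tbl[off];
		uint8_t hlen = tbl[off + 1];	/* formatted-area length */

		if (hlen < 4)
			return -1;		/* malformed structure */
		if (type == SMBIOS_TYPE_IPMI_DEV)
			return (long)off;
		if (type == SMBIOS_TYPE_END)
			return -1;

		/* Skip the formatted area, then the string set, which is
		 * terminated by two consecutive NUL bytes. */
		size_t p = off + hlen;
		while (p + 1 < len && !(tbl[p] == 0 && tbl[p + 1] == 0))
			p++;
		off = p + 2;
	}
	return -1;
}
```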
Re: [PATCH V4 16/16] ARM64: tegra: select PM_GENERIC_DOMAINS
On Thu, Jan 14, 2016 at 12:11:39PM +0100, Arnd Bergmann wrote: > On Thursday 14 January 2016 11:29:24 Thierry Reding wrote: > > > > It just occurred to me that none of these options really make much of a > > difference. As Jon mentioned once we merge this series a lot of features > > on Tegra will start to rely on PM_GENERIC_DOMAINS and hence PM. So if we > > do want to build a kernel with a maximum of Tegra features enabled (and > > I think a multi_v7_defconfig should include that) we'll end up with a PM > > dependency anyway, whether forced via select or implied via depends on. > > > > I'm beginning to wonder if PM really should be an option these days. The > > disadvantages of making it optional do outweigh the advantages in my > > opinion. I'm not saying that, in general, it's totally useless to build > > a kernel that has no PM support, but for the more specific case where > > you would want to enable multi-platform support I don't think there's > > much practical advantage in allowing !PM. One of the most common build > > warnings are triggered because of this option. Also multi-platform > > kernels are really big already, so much so that I doubt it would make a > > significant difference if we unconditionally built PM support. Also the > > chances are that we'll be seeing more and more SoCs support PM and rely > > on it, much like Tegra would with the addition of this series. > > > > I imagine that we could save ourselves a lot of headaches by simply > > enabling PM by default, whether that be via the PM Kconfig option or by > > selecting it from ARCH_TEGRA and any other architectures that may come > > to rely on it. Doing so would also reduce the amount of test coverage > > that we need to do, both at compile- and runtime. > > I think this needs some investigation. As a general policy, we should > not grow the kernel image size when moving from a traditional ARM > platform to an ARCH_MULTIPLATFORM one. 
If we make ARCH_TEGRA select PM, then moving to a multi-platform kernel isn't automatically going to increase the image size. The image size is only going to increase if you select ARCH_TEGRA to be part of the multi platform image. > This is somewhat contradicted by how we already require CONFIG_OF > to be set for multiplatform kernels, and that adds around 80kb > to the image size. Yeah, there's also a fair amount of per-SoC code that can't be built as a module and which will be included in multi-platform images when the corresponding ARCH_* symbol is enabled. But I think that's inevitable given the purpose of multi-platform images. > Looking at just the defconfig files, these are the ones that currently > do not set CONFIG_PM: > > build/acs5k_defconfig/.config:# CONFIG_PM is not set > build/acs5k_tiny_defconfig/.config:# CONFIG_PM is not set > build/axm55xx_defconfig/.config:# CONFIG_PM is not set > build/bcm2835_defconfig/.config:# CONFIG_PM is not set > build/clps711x_defconfig/.config:# CONFIG_PM is not set > build/ebsa110_defconfig/.config:# CONFIG_PM is not set > build/footbridge_defconfig/.config:# CONFIG_PM is not set > build/ks8695_defconfig/.config:# CONFIG_PM is not set > build/netwinder_defconfig/.config:# CONFIG_PM is not set > build/rpc_defconfig/.config:# CONFIG_PM is not set > build/u300_defconfig/.config:# CONFIG_PM is not set > build/vf610m4_defconfig/.config:# CONFIG_PM is not set > > The only ones among these that are actually multiplatform are axm55xx, > bcm2835, and u300. I see no downsides of force-enabling PM for > any of those, so we could decide to 'select PM' from > CONFIG_ARCH_MULTIPLATFORM. ARCH_MULTIPLATFORM selecting PM would include PM unconditionally, even if none of the selected platforms require it. In my opinion an explicit select from platforms that require PM would be cleaner. It could be that once we start doing that for a single platform others might follow.
When this becomes commonplace it might be worth moving it up a level, but I think explicit dependencies would be better for starters. > The one usecase where we may want to have a modern machine without > CONFIG_PM is a minimal MACH_VIRT kernel for running in a virtual > machine or QEMU with minimal memory requirements, e.g. trying to > squeeze a large number of guests on a single host system. Right, but those won't be multi-platform kernels, right? If you know exactly that the kernel will run in a virtual machine and that you want to run lots of virtual machines, you probably want to build a custom kernel. Thierry
Re: [PATCH] qe_ic: fix a buffer overflow error and add check elsewhere
On Thu, 21 Jan 2016, Zhao Qiang wrote: > 127 is the theoretical upper bound of the QEIC number; > in fact there are only 44 qe_ic_info entries now. > Add an overflow check for qe_ic_info. How do you trigger that overflow? The above does not explain WHY we need these checks. > diff --git a/drivers/soc/fsl/qe/qe_ic.c b/drivers/soc/fsl/qe/qe_ic.c > index 5419527..90c00b7 100644 > --- a/drivers/soc/fsl/qe/qe_ic.c > +++ b/drivers/soc/fsl/qe/qe_ic.c Sigh. Another dumping ground for SOC stuff? irq chip drivers belong in drivers/irqchip. Thanks, tglx
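For reference, an overflow check of the kind the patch describes usually looks like the bounds test below. The table has 44 populated entries, but hardware numbers can reach 127. This is a userspace sketch: the struct fields and qe_ic_lookup() are hypothetical. It does not answer tglx's question of how an out-of-range number could arrive in the first place.

```c
#include <assert.h>
#include <errno.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical model: 44 populated entries although the hardware
 * number space goes up to 127. Field names are illustrative. */
struct qe_ic_info {
	unsigned int mask;
	unsigned int mask_reg;
};

static struct qe_ic_info qe_ic_info[44];

/* Reject out-of-range hardware numbers instead of indexing past the
 * end of the table. */
static int qe_ic_lookup(unsigned int src, struct qe_ic_info **out)
{
	if (src >= ARRAY_SIZE(qe_ic_info))
		return -EINVAL;
	*out = &qe_ic_info[src];
	return 0;
}
```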
Re: [PATCH v5 10/14] serial: pic32_uart: Add PIC32 UART driver
On Tue, Jan 26, 2016 at 10:04:41AM -0700, Joshua Henderson wrote: > Hi Greg and Jiri, > > Ping! Need an ack for this or pull it upstream. The merge window _just_ ended, please give us a chance to catch up on patches to be reviewed. There's no reason you need a response for this right away, it can't be merged until 4.6-rc1, right? thanks, greg k-h
[PATCH] tile kgdb: fix bug in copy to gdb regs, and optimize memset
David Binderman pointed out that we were doing a full memset() of the gdb register buffer and then doing a memcpy() to it that was almost as big. This commit optimizes that by only doing a memset() of the registers that are intended to be zero. While making this change I noticed that we were not copying the link register (LR, number 55) due to a fencepost error in commit f419e6f63c5a ("arch: tile: kernel: kgdb.c: Use memcpy() instead of pointer copy one by one"), and I've corrected that as well. Reported-by: David Binderman Signed-off-by: Chris Metcalf --- arch/tile/kernel/kgdb.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/tile/kernel/kgdb.c b/arch/tile/kernel/kgdb.c index a506c2c28943..6ad99925900e 100644 --- a/arch/tile/kernel/kgdb.c +++ b/arch/tile/kernel/kgdb.c @@ -126,15 +126,15 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *task) { struct pt_regs *thread_regs; + const int NGPRS = TREG_LAST_GPR + 1; if (task == NULL) return; - /* Initialize to zero. */ - memset(gdb_regs, 0, NUMREGBYTES); - thread_regs = task_pt_regs(task); - memcpy(gdb_regs, thread_regs, TREG_LAST_GPR * sizeof(unsigned long)); + memcpy(gdb_regs, thread_regs, NGPRS * sizeof(unsigned long)); + memset(&gdb_regs[NGPRS], 0, + (TILEGX_PC_REGNUM - NGPRS) * sizeof(unsigned long)); gdb_regs[TILEGX_PC_REGNUM] = thread_regs->pc; gdb_regs[TILEGX_FAULTNUM_REGNUM] = thread_regs->faultnum; } -- 2.1.2
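The fencepost is easiest to see with concrete indices. The commit message puts LR at number 55, so TREG_LAST_GPR is 55 and there are 56 GPRs. The old memcpy of TREG_LAST_GPR entries stopped one short. In the userspace sketch below, TREG_LAST_GPR follows that statement. The PC and faultnum slot numbers (64 and 65) and the pt_regs layout are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* TREG_LAST_GPR = 55 follows from "LR, number 55"; the PC and faultnum
 * slot numbers below are illustrative assumptions. */
#define TREG_LAST_GPR          55
#define TILEGX_PC_REGNUM       64
#define TILEGX_FAULTNUM_REGNUM 65
#define NUMREGS                66

struct pt_regs {
	unsigned long regs[TREG_LAST_GPR + 1];
	unsigned long pc;
	unsigned long faultnum;
};

static void to_gdb_regs(unsigned long *gdb_regs, const struct pt_regs *t)
{
	const int NGPRS = TREG_LAST_GPR + 1;	/* GPRs 0..55 inclusive */

	memcpy(gdb_regs, t->regs, NGPRS * sizeof(unsigned long));
	/* Zero only the slots between the last GPR and the PC. */
	memset(&gdb_regs[NGPRS], 0,
	       (TILEGX_PC_REGNUM - NGPRS) * sizeof(unsigned long));
	gdb_regs[TILEGX_PC_REGNUM] = t->pc;
	gdb_regs[TILEGX_FAULTNUM_REGNUM] = t->faultnum;
}
```

With the old length of TREG_LAST_GPR entries, slot 55 (LR) would never be written and would keep the zero from the old full-buffer memset.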
Re: [PATCH] workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue
On Thu, Dec 03, 2015 at 02:26:16PM -0500, Tejun Heo wrote: > Task or work item involved in memory reclaim trying to flush a > non-WQ_MEM_RECLAIM workqueue or one of its work items can lead to > deadlock. Trigger WARN_ONCE() if such conditions are detected. > > Signed-off-by: Tejun Heo > Cc: Peter Zijlstra > --- > Hello, > > So, something like this. Seems to work fine here. If there's no > objection, I'm gonna push it through wq/for-4.5. > > Thanks. > > kernel/workqueue.c | 35 +++ > 1 file changed, 35 insertions(+) > > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -2330,6 +2330,37 @@ repeat: > goto repeat; > } > > +/** > + * check_flush_dependency - check for flush dependency sanity > + * @target_wq: workqueue being flushed > + * @target_work: work item being flushed (NULL for workqueue flushes) > + * > + * %current is trying to flush the whole @target_wq or @target_work on it. > + * If @target_wq doesn't have %WQ_MEM_RECLAIM, verify that %current is not > + * reclaiming memory or running on a workqueue which doesn't have > + * %WQ_MEM_RECLAIM as that can break forward-progress guarantee leading to > + * a deadlock. > + */ > +static void check_flush_dependency(struct workqueue_struct *target_wq, > +struct work_struct *target_work) > +{ > + work_func_t target_func = target_work ? 
target_work->func : NULL; > + struct worker *worker; > + > + if (target_wq->flags & WQ_MEM_RECLAIM) > + return; > + > + worker = current_wq_worker(); > + > + WARN_ONCE(current->flags & PF_MEMALLOC, > + "workqueue: PF_MEMALLOC task %d(%s) is flushing > !WQ_MEM_RECLAIM %s:%pf", > + current->pid, current->comm, target_wq->name, target_func); > + WARN_ONCE(worker && (worker->current_pwq->wq->flags & WQ_MEM_RECLAIM), > + "workqueue: WQ_MEM_RECLAIM %s:%pf is flushing !WQ_MEM_RECLAIM > %s:%pf", > + worker->current_pwq->wq->name, worker->current_func, > + target_wq->name, target_func); > +} > + > struct wq_barrier { > struct work_struct work; > struct completion done; > @@ -2539,6 +2570,8 @@ void flush_workqueue(struct workqueue_st > list_add_tail(&this_flusher.list, &wq->flusher_overflow); > } > > + check_flush_dependency(wq, NULL); > + > mutex_unlock(&wq->mutex); > > wait_for_completion(&this_flusher.done); > @@ -2711,6 +2744,8 @@ static bool start_flush_work(struct work > pwq = worker->current_pwq; > } > > + check_flush_dependency(pwq->wq, work); > + > insert_wq_barrier(pwq, barr, work, worker); > spin_unlock_irq(&pool->lock); > I've started noticing the following during boot on some of the devices I work with: [4.723705] WARNING: CPU: 0 PID: 6 at kernel/workqueue.c:2361 check_flush_dependency+0x138/0x144() [4.736818] workqueue: WQ_MEM_RECLAIM deferwq:deferred_probe_work_func is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu [4.748099] Modules linked in: [4.751342] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.5.0-rc1-00018-g420fc292d9c7 #1 [4.759504] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree) [4.765762] Workqueue: deferwq deferred_probe_work_func [4.771004] [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [4.778746] [] (show_stack) from [] (dump_stack+0x94/0xd4) [4.785966] [] (dump_stack) from [] (warn_slowpath_common+0x80/0xb0) [4.794048] [] (warn_slowpath_common) from [] (warn_slowpath_fmt+0x30/0x40) [4.802736] [] (warn_slowpath_fmt) from [] 
(check_flush_dependency+0x138/0x144) [4.811769] [] (check_flush_dependency) from [] (flush_work+0x50/0x15c) [4.820112] [] (flush_work) from [] (lru_add_drain_all+0x130/0x180) [4.828110] [] (lru_add_drain_all) from [] (migrate_prep+0x8/0x10) [4.836018] [] (migrate_prep) from [] (alloc_contig_range+0xd8/0x338) [4.844186] [] (alloc_contig_range) from [] (cma_alloc+0xe0/0x1ac) [4.852093] [] (cma_alloc) from [] (__alloc_from_contiguous+0x38/0xd8) [4.860346] [] (__alloc_from_contiguous) from [] (__dma_alloc+0x240/0x278) [4.868944] [] (__dma_alloc) from [] (arm_dma_alloc+0x54/0x5c) [4.876506] [] (arm_dma_alloc) from [] (dmam_alloc_coherent+0xc0/0xec) [4.884764] [] (dmam_alloc_coherent) from [] (ahci_port_start+0x150/0x1dc) [4.893367] [] (ahci_port_start) from [] (ata_host_start.part.3+0xc8/0x1c8) [4.902055] [] (ata_host_start.part.3) from [] (ata_host_activate+0x50/0x148) [4.910919] [] (ata_host_activate) from [] (ahci_host_activate+0x44/0x114) [4.919523] [] (ahci_host_activate) from [] (ahci_platform_init_host+0x1d8/0x3c8) [4.928733] [] (ahci_platform_init_host) from [] (tegra_ahci_probe+0x448/0x4e8) [4.937770] [] (tegra_ahci_probe) from [] (platform_drv_probe+0x50/0xac) [4.946197] [] (platform_drv_probe) from [] (driver_probe_device+0x21
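The warning in this trace is the second condition in check_flush_dependency(). A worker running on a WQ_MEM_RECLAIM workqueue (deferwq) flushes work on a workqueue without that flag (events). The userspace model below reproduces both conditions from the patch. The flag values and check_flush() are illustrative. Unlike the kernel function, which can fire both WARN_ONCE()s independently, this sketch reports only the first condition that matches.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values, not the kernel's. */
#define WQ_MEM_RECLAIM (1u << 3)
#define PF_MEMALLOC    (1u << 11)

struct workqueue {
	const char *name;
	unsigned int flags;
};

enum flush_verdict { FLUSH_OK, WARN_MEMALLOC_TASK, WARN_RECLAIM_WORKER };

/* task_flags: flags of the flushing task. cur_wq: workqueue the caller
 * is running on, or NULL if it is not a workqueue worker. */
static enum flush_verdict check_flush(const struct workqueue *target,
				      unsigned int task_flags,
				      const struct workqueue *cur_wq)
{
	if (target->flags & WQ_MEM_RECLAIM)
		return FLUSH_OK;	/* target has a rescuer */
	if (task_flags & PF_MEMALLOC)
		return WARN_MEMALLOC_TASK;
	if (cur_wq && (cur_wq->flags & WQ_MEM_RECLAIM))
		return WARN_RECLAIM_WORKER;
	return FLUSH_OK;
}
```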
Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
On Tue, 2016-01-26 at 10:26 -0600, Christoph Lameter wrote: > On Tue, 26 Jan 2016, Mike Galbraith wrote: > > > On Tue, 2016-01-26 at 03:14 +0100, Mike Galbraith wrote: > > > > > Perf and RT say we don't want quiet_vmstat() in the idle loop > > > either. > > > > BTW, the perf numbers were not from an RT kernel, they were from my > > PREEMPT_VOLUNTARY desktop kernel. > > Can we move quiet_vmstat() elsewhere after we have checked that really > nothing else is going on soon? How would you check? Precognition doesn't work for mortals. -Mike
Re: [PATCH] Revert "Staging: panel: usleep_range is preferred over udelay"
On Mon, Jan 25, 2016 at 03:13:21PM +0530, Sudip Mukherjee wrote: > Apart from the mail which Ying Huang sent to me last week for another error > (which actually turned out to be this one), i saw the first report by > Ying Huang on November. > https://lkml.org/lkml/2015/11/2/93 Ying, could you CC the subsystem list for these reports? This one was CC'd to Sirnam, Greg and LKML. Sirnam is too new to understand what they mean, Greg is too busy, and only Sudip and Alan Cox read LKML. regards, dan carpenter
Re: [PATCH] ASoC: fsl: add imx-cs427x machine driver
On Tue, Jan 26, 2016 at 09:05:42AM -0200, Fabio Estevam wrote: > On Tue, Jan 26, 2016 at 9:01 AM, Felipe Ferreri Tonello > wrote: > > > Actually yes, thanks! I didn't know about the existence of fsl-asoc-card. > > > > I get some errors but I don't think they actually matter: > > [ 19.734494] fsl-asrc 2034000.asrc: driver registered > > [ 19.738707] fsl-asoc-card sound: ASoC: CPU DAI (null) not registered > > [ 19.738717] fsl-asoc-card sound: snd_soc_register_card failed (-517) > > [ 19.741556] fsl-asoc-card sound: ASoC: CPU DAI (null) not registered > > [ 19.741564] fsl-asoc-card sound: snd_soc_register_card failed (-517) Deferred probes shouldn't be a problem. > > [ 19.774591] fsl-asoc-card sound: cs4271-hifi <-> 2028000.ssi mapping ok Link is mapped. > > [ 19.781507] fsl-asoc-card sound: ASoC: no source widget found for > > ASRC-Playback > > [ 19.790065] fsl-asoc-card sound: ASoC: Failed to add route > > ASRC-Playback -> direct -> CPU-Playback > > [ 19.805349] fsl-asoc-card sound: ASoC: no sink widget found for > > ASRC-Capture > > [ 19.817222] fsl-asoc-card sound: ASoC: Failed to add route > > CPU-Capture -> direct -> ASRC-Capture You may ignore these "failures" if you don't have ASRC at all. It's optional based on the SoC design or platform requirement. Refer to: Documentation/devicetree/bindings/sound/fsl-asoc-card.txt But I think the log over here could be less confusing. I may try to clean it later. Thanks Nicolin
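The -517 in the log is -EPROBE_DEFER, the kernel-internal errno a probe returns when a resource it needs (here the CPU DAI) has not registered yet. The driver core then retries the probe later. The toy model below shows the same pattern as the log: two deferrals, then success. card_probe() and probe_with_retries() are hypothetical, and the real retry is triggered by other drivers binding rather than by a loop.

```c
#include <assert.h>
#include <stdbool.h>

#define EPROBE_DEFER 517	/* kernel-internal errno, the -517 above */

static bool cpu_dai_registered;

/* Hypothetical card probe: defers until the CPU DAI exists. */
static int card_probe(void)
{
	if (!cpu_dai_registered)
		return -EPROBE_DEFER;
	return 0;
}

/* Toy model of deferred-probe retries; the DAI appears on attempt
 * number dai_ready_after. */
static int probe_with_retries(int max_tries, int dai_ready_after)
{
	for (int attempt = 0; attempt < max_tries; attempt++) {
		if (attempt == dai_ready_after)
			cpu_dai_registered = true;
		int ret = card_probe();
		if (ret != -EPROBE_DEFER)
			return ret;
	}
	return -EPROBE_DEFER;
}
```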
Re: [PATCH v4 14/22] arm64: make asm/elf.h available to asm files
On Tue, Jan 26, 2016 at 06:10:41PM +0100, Ard Biesheuvel wrote: > This reshuffles some code in asm/elf.h and puts a #ifndef __ASSEMBLY__ > around its C definitions so that the CPP defines can be used in asm > source files as well. > > Signed-off-by: Ard Biesheuvel Acked-by: Mark Rutland Mark. > --- > arch/arm64/include/asm/elf.h | 22 > 1 file changed, 13 insertions(+), 9 deletions(-) > > diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h > index faad6df49e5b..435f55952e1f 100644 > --- a/arch/arm64/include/asm/elf.h > +++ b/arch/arm64/include/asm/elf.h > @@ -24,15 +24,6 @@ > #include > #include > > -typedef unsigned long elf_greg_t; > - > -#define ELF_NGREG (sizeof(struct user_pt_regs) / sizeof(elf_greg_t)) > -#define ELF_CORE_COPY_REGS(dest, regs) \ > - *(struct user_pt_regs *)&(dest) = (regs)->user_regs; > - > -typedef elf_greg_t elf_gregset_t[ELF_NGREG]; > -typedef struct user_fpsimd_state elf_fpregset_t; > - > /* > * AArch64 static relocation types. > */ > @@ -127,6 +118,17 @@ typedef struct user_fpsimd_state elf_fpregset_t; > */ > #define ELF_ET_DYN_BASE (2 * TASK_SIZE_64 / 3) > > +#ifndef __ASSEMBLY__ > + > +typedef unsigned long elf_greg_t; > + > +#define ELF_NGREG (sizeof(struct user_pt_regs) / sizeof(elf_greg_t)) > +#define ELF_CORE_COPY_REGS(dest, regs) \ > + *(struct user_pt_regs *)&(dest) = (regs)->user_regs; > + > +typedef elf_greg_t elf_gregset_t[ELF_NGREG]; > +typedef struct user_fpsimd_state elf_fpregset_t; > + > /* > * When the program starts, a1 contains a pointer to a function to be > * registered with atexit, as per the SVR4 ABI. A value of 0 means we have > no > @@ -186,4 +188,6 @@ extern int aarch32_setup_vectors_page(struct linux_binprm > *bprm, > > #endif /* CONFIG_COMPAT */ > > +#endif /* !__ASSEMBLY__ */ > + > #endif > -- > 2.5.0 >
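The pattern used in this patch is a header shared between C and assembly. Plain #defines stay outside the guard, and typedefs, which the assembler cannot parse, go inside #ifndef __ASSEMBLY__. Kbuild passes -D__ASSEMBLY__ when it builds .S files. The EXAMPLE_ names below are hypothetical. The relocation number matches the AArch64 ELF ABI value for R_AARCH64_ABS64, and 34 is the number of 64-bit slots in struct user_pt_regs (31 GPRs plus sp, pc and pstate).

```c
#include <assert.h>

/* Visible to both C and assembly sources. */
#define EXAMPLE_R_AARCH64_ABS64 257

#ifndef __ASSEMBLY__
/* C-only part: the assembler would choke on these typedefs. */
typedef unsigned long example_greg_t;
typedef example_greg_t example_gregset_t[34];
#endif /* !__ASSEMBLY__ */
```

An assembly file that includes such a header sees only the #define, while C code sees both parts.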
Re: [PATCH v8 4/5] iommu/mediatek: Add mt8173 IOMMU driver
On 26/01/16 04:12, Yong Wu wrote: This patch adds support for mediatek m4u (MultiMedia Memory Management Unit). Whilst I can't speak for the hardware specifics, I think we've got the API aspects and general shape of the code looking pretty much right by now - I don't see anything worth complaining about at this point. Reviewed-by: Robin Murphy Signed-off-by: Yong Wu --- drivers/iommu/Kconfig | 16 + drivers/iommu/Makefile| 1 + drivers/iommu/mtk_iommu.c | 732 ++ 3 files changed, 749 insertions(+) create mode 100644 drivers/iommu/mtk_iommu.c diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index a1e75cb..4922aa8 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -318,4 +318,20 @@ config S390_IOMMU help Support for the IOMMU API for s390 PCI devices. +config MTK_IOMMU + bool "MTK IOMMU Support" + depends on ARM || ARM64 + depends on ARCH_MEDIATEK || COMPILE_TEST + select IOMMU_API + select IOMMU_DMA + select IOMMU_IO_PGTABLE_ARMV7S + select MEMORY + select MTK_SMI + help + Support for the M4U on certain Mediatek SOCs. M4U is MultiMedia + Memory Management Unit. This option enables remapping of DMA memory + accesses for the multimedia subsystem. + + If unsure, say N here. + endif # IOMMU_SUPPORT diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 42fc0c2..44ae2e0 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -16,6 +16,7 @@ obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o obj-$(CONFIG_IRQ_REMAP) += intel_irq_remapping.o irq_remapping.o +obj-$(CONFIG_MTK_IOMMU) += mtk_iommu.o obj-$(CONFIG_OMAP_IOMMU) += omap-iommu.o obj-$(CONFIG_OMAP_IOMMU_DEBUG) += omap-iommu-debug.o obj-$(CONFIG_ROCKCHIP_IOMMU) += rockchip-iommu.o diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c new file mode 100644 index 000..60fe97b --- /dev/null +++ b/drivers/iommu/mtk_iommu.c @@ -0,0 +1,732 @@ +/* + * Copyright (c) 2015-2016 MediaTek Inc. 
+ * Author: Yong Wu + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "io-pgtable.h" + +#define REG_MMU_PT_BASE_ADDR 0x000 + +#define REG_MMU_INVALIDATE 0x020 +#define F_ALL_INVLD 0x2 +#define F_MMU_INV_RANGE 0x1 + +#define REG_MMU_INVLD_START_A 0x024 +#define REG_MMU_INVLD_END_A 0x028 + +#define REG_MMU_INV_SEL 0x038 +#define F_INVLD_EN0 BIT(0) +#define F_INVLD_EN1 BIT(1) + +#define REG_MMU_STANDARD_AXI_MODE 0x048 +#define REG_MMU_DCM_DIS 0x050 + +#define REG_MMU_CTRL_REG 0x110 +#define F_MMU_PREFETCH_RT_REPLACE_MOD BIT(4) +#define F_MMU_TF_PROTECT_SEL(prot) (((prot) & 0x3) << 5) + +#define REG_MMU_IVRP_PADDR 0x114 +#define F_MMU_IVRP_PA_SET(pa) ((pa) >> 1) + +#define REG_MMU_INT_CONTROL0 0x120 +#define F_L2_MULIT_HIT_EN BIT(0) +#define F_TABLE_WALK_FAULT_INT_EN BIT(1) +#define F_PREETCH_FIFO_OVERFLOW_INT_EN BIT(2) +#define F_MISS_FIFO_OVERFLOW_INT_EN BIT(3) +#define F_PREFETCH_FIFO_ERR_INT_EN BIT(5) +#define F_MISS_FIFO_ERR_INT_EN BIT(6) +#define F_INT_CLR_BIT BIT(12) + +#define REG_MMU_INT_MAIN_CONTROL 0x124 +#define F_INT_TRANSLATION_FAULT BIT(0) +#define F_INT_MAIN_MULTI_HIT_FAULT BIT(1) +#define F_INT_INVALID_PA_FAULT BIT(2) +#define F_INT_ENTRY_REPLACEMENT_FAULT BIT(3) +#define F_INT_TLB_MISS_FAULT BIT(4) +#define F_INT_MISS_TRANSACTION_FIFO_FAULT BIT(5) +#define F_INT_PRETETCH_TRANSATION_FIFO_FAULT BIT(6) + +#define REG_MMU_CPE_DONE 0x12C + +#define REG_MMU_FAULT_ST1 0x134 + +#define REG_MMU_FAULT_VA
0x13c +#define F_MMU_FAULT_VA_MSK 0xf000 +#define F_MMU_FAULT_VA_WRITE_BIT BIT(1) +#define F_MMU_FAULT_VA_LAYER
Re: [PATCH v4 05/22] arm64: kvm: deal with kernel symbols outside of linear mapping
On 26/01/16 17:10, Ard Biesheuvel wrote: > KVM on arm64 uses a fixed offset between the linear mapping at EL1 and > the HYP mapping at EL2. Before we can move the kernel virtual mapping > out of the linear mapping, we have to make sure that references to kernel > symbols that are accessed via the HYP mapping are translated to their > linear equivalent. > > Reviewed-by: Mark Rutland > Signed-off-by: Ard Biesheuvel Acked-by: Marc Zyngier M.
Re: [PATCH v2] Add hard/soft lockup debugger entry points
Hi Jeff,

[auto build test ERROR on v4.5-rc1]
[also build test ERROR on next-20160125]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Jeff-Merkey/Add-hard-soft-lockup-debugger-entry-points/20160127-010930
config: xtensa-allyesconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa

All errors (new ones prefixed by >>):

   kernel/watchdog.c: In function 'watchdog_timer_fn':
>> kernel/watchdog.c:489:4: error: implicit declaration of function 'arch_kgdb_breakpoint' [-Werror=implicit-function-declaration]
       arch_kgdb_breakpoint();
       ^
   cc1: some warnings being treated as errors

vim +/arch_kgdb_breakpoint +489 kernel/watchdog.c

   483			if (regs)
   484				show_regs(regs);
   485			else
   486				dump_stack();
   487	
   488			if (debug_watchdog_lockups)
 > 489				arch_kgdb_breakpoint();
   490	
   491			if (softlockup_all_cpu_backtrace) {
   492				/* Avoid generating two back traces for current

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
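Below is a minimal sketch of one way to keep this call site building on architectures that lack KGDB: provide a no-op fallback, and skip the call when debugger support is compiled out. It assumes `CONFIG_KGDB` is the relevant guard. `HAVE_KGDB_BREAKPOINT` and `maybe_enter_debugger()` are hypothetical names used only for illustration. This is userspace C that models the watchdog path; it is not the actual fix.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * CONFIG_KGDB is deliberately left undefined here, modelling a build
 * (such as xtensa allyesconfig) where no arch_kgdb_breakpoint() exists.
 */
#ifdef CONFIG_KGDB
void arch_kgdb_breakpoint(void);        /* supplied by the architecture */
#define HAVE_KGDB_BREAKPOINT 1
#else
#define HAVE_KGDB_BREAKPOINT 0
/* No-op fallback so the call site below still compiles. */
static inline void arch_kgdb_breakpoint(void) { }
#endif

/* Stand-in for the debug_watchdog_lockups knob added by the patch. */
static bool debug_watchdog_lockups = true;

/* Returns true only if a debugger breakpoint was actually requested. */
static bool maybe_enter_debugger(void)
{
	if (debug_watchdog_lockups && HAVE_KGDB_BREAKPOINT) {
		arch_kgdb_breakpoint();
		return true;
	}
	return false;
}
```

Without `CONFIG_KGDB`, the condition is a compile-time false. The call still type-checks against the stub, and no undefined symbol is referenced.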
Re: [RFC] kconfig: a new command line tool to set configs
On Mon, Jun 15, 2015 at 02:55:46PM +0200, Michal Marek wrote: > On 2015-05-12 12:31, Dan Carpenter wrote: > > This is an ugly hack job I made last night and it barely works. It > > does two things: > > > > 1) Sometimes I want to search for a config so I have to load > > menuconfig, then search for the config entry, then exit. With > > this script I simply run: > > > > ./scripts/kconfig/kconfig search COMEDI > > > > 2) I quite often try to enable something by doing: > > > > echo CONFIG_FOO=y >> .config > > make oldconfig > > grep CONFIG_FOO .config > > > > The grep is to see if the setting worked. Now I can do: > > > > ./scripts/kconfig/kconfig set CONFIG_FOO=y > > The second use-case is provided by scripts/config already. It is a lot > simpler shell script, but it's maybe good enough for such a task. The scripts/config file doesn't check that the config is valid. It's the same as doing "echo CONFIG_FOO=y >> .config" which I was trying to fix. regards, dan carpenter
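The verification step Dan describes checks that `make oldconfig` actually kept the requested value. It can be sketched as an exact-line match over the text of `.config`. `config_has_line()` is a hypothetical helper and is not part of scripts/kconfig or scripts/config. Unlike a plain `grep CONFIG_FOO`, it does not count `CONFIG_FOO_BAR=y` or `# CONFIG_FOO is not set` as a hit.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `config` (the full text of a
 * .config file) contains `line` as a complete line, e.g. "CONFIG_FOO=y".
 */
static bool config_has_line(const char *config, const char *line)
{
	size_t len = strlen(line);
	const char *p = config;

	if (len == 0)
		return false;

	while ((p = strstr(p, line)) != NULL) {
		bool at_start = (p == config) || (p[-1] == '\n');
		bool at_end = (p[len] == '\n') || (p[len] == '\0');

		if (at_start && at_end)
			return true;
		p++;    /* partial match: keep scanning after it */
	}
	return false;
}
```

Suppose this check runs after `echo CONFIG_FOO=y >> .config` and `make oldconfig`. A false result then means Kconfig did not keep the value, for example because a dependency is unmet.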
Re: [PATCH] cpufreq: Fix NULL reference crash while accessing policy->governor_data
On 26/01/16 09:57, Juri Lelli wrote: > Hi Viresh, > > On 25/01/16 22:33, Viresh Kumar wrote: > > There is a little race discovered by Juri, where we are able to: > > - create and read a sysfs file before policy->governor_data is being set > > to a non NULL value. > > OR > > - set policy->governor_data to NULL, and reading a file before being > > destroyed. > > > > And so such a crash is reported: > > > > Unable to handle kernel NULL pointer dereference at virtual address 000c > > pgd = edfc8000 > > [000c] *pgd=bfc8c835 > > Internal error: Oops: 17 [#1] SMP ARM > > Modules linked in: > > CPU: 4 PID: 1730 Comm: cat Not tainted 4.5.0-rc1+ #463 > > Hardware name: ARM-Versatile Express > > task: ee8e8480 ti: ee93 task.ti: ee93 > > PC is at show_ignore_nice_load_gov_pol+0x24/0x34 > > LR is at show+0x4c/0x60 > > pc : []lr : []psr: a0070013 > > sp : ee931dd0 ip : ee931de0 fp : ee931ddc > > r10: ee4bc290 r9 : 1000 r8 : ef2cb000 > > r7 : ee4bc200 r6 : ef2cb000 r5 : c0af57b0 r4 : ee4bc2e0 > > r3 : r2 : r1 : c0928df4 r0 : ef2cb000 > > Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none > > Control: 10c5387d Table: adfc806a DAC: 0051 > > Process cat (pid: 1730, stack limit = 0xee930210) > > Stack: (0xee931dd0 to 0xee932000) > > 1dc0: ee931dfc ee931de0 c058ae88 > > c058f1a4 > > 1de0: edce3bc0 c07bfca4 edce3ac0 1000 ee931e24 ee931e00 c01fcb90 > > c058ae48 > > 1e00: 0001 edce3bc0 0001 ee931e50 ee8ff480 ee931e34 > > ee931e28 > > 1e20: c01fb33c c01fcb0c ee931e8c ee931e38 c01a5210 c01fb314 ee931e9c > > ee931e48 > > 1e40: edce3bf0 befe4a00 ee931f78 01e4 > > > > 1e60: c00545a8 edce3ac0 1000 1000 befe4a00 ee931f78 > > 1000 > > 1e80: ee931ed4 ee931e90 c01fbed8 c01a5038 ed085a58 0002 > > > > 1ea0: c0ad72e4 ee931f78 ee8ff488 ee8ff480 c077f3fc 1000 befe4a00 > > ee931f78 > > 1ec0: 1000 ee931f44 ee931ed8 c017c328 c01fbdc4 1000 > > > > 1ee0: ee8ff480 1000 ee931f44 ee931ef8 c017c65c c03deb10 ee931fac > > ee931f08 > > 1f00: c0009270 c001f290 c0a8d968 ef2cb000 ef2cb000 ee8ff480 0020 > > 
ee8ff480 > > 1f20: ee8ff480 befe4a00 1000 ee931f78 ee931f74 > > ee931f48 > > 1f40: c017d1ec c017c2f8 c019c724 c019c684 ee8ff480 ee8ff480 1000 > > befe4a00 > > 1f60: ee931fa4 ee931f78 c017d2a8 c017d160 > > > > 1f80: 000a9f20 1000 befe4a00 0003 c000ffe4 ee93 > > ee931fa8 > > 1fa0: c000fe40 c017d264 000a9f20 1000 0003 befe4a00 1000 > > > > Unable to handle kernel NULL pointer dereference at virtual address 000c > > 1fc0: 000a9f20 1000 befe4a00 0003 0003 > > 0001 > > pgd = edfc4000 > > [000c] *pgd=bfcac835 > > 1fe0: befe49dc 000197f8 b6e35dfc 60070010 0003 3065b49d > > 134ac2c9 > > > > [] (show_ignore_nice_load_gov_pol) from [] > > (show+0x4c/0x60) > > [] (show) from [] (sysfs_kf_seq_show+0x90/0xfc) > > [] (sysfs_kf_seq_show) from [] > > (kernfs_seq_show+0x34/0x38) > > [] (kernfs_seq_show) from [] (seq_read+0x1e4/0x4e4) > > [] (seq_read) from [] (kernfs_fop_read+0x120/0x1a0) > > [] (kernfs_fop_read) from [] (__vfs_read+0x3c/0xe0) > > [] (__vfs_read) from [] (vfs_read+0x98/0x104) > > [] (vfs_read) from [] (SyS_read+0x50/0x90) > > [] (SyS_read) from [] (ret_fast_syscall+0x0/0x1c) > > Code: e5903044 e1a1 e3081df4 e34c1092 (e593300c) > > ---[ end trace 5994b9a5111f35ee ]--- > > > > Fix that by making sure, policy->governor_data is updated at the right > > places only. > > > > This patch fixes the crash I was seeing. > > Tested-by: Juri Lelli > > However, it exposes another problem (running the concurrent lockdep test > that you merged in your tests). After the test is finished there is > always at least one task spinning. Do you think it might be related to > the race we are already discussing in the thread related to my cleanups > patches? 
This is what I see: > > [ 37.963599] == > [ 37.982113] [ INFO: possible circular locking dependency detected ] > [ 38.000890] 4.5.0-rc1+ #468 Not tainted > [ 38.012368] --- > [ 38.031137] runme.sh/1710 is trying to acquire lock: > [ 38.045999] (s_active#41){.+}, at: [] > kernfs_remove_by_name_ns+0x4c/0x94 > [ 38.070063] > [ 38.070063] but task is already holding lock: > [ 38.087530] (od_dbs_cdata.mutex){+.+.+.}, at: [] > cpufreq_governor_dbs+0x34/0x5d0 > [ 38.112615] > [ 38.112615] which lock already depends on the new lock. > [ 38.112615] > [ 38.137114] > [ 38.137114] the existing dependency chain (in reverse order) is: > [ 38.159528] > -> #2 (od_dbs_cdata.mutex){+.+.+.}: > [ 38.173664][] mutex_lock_nested+0x7c/0x420 > [ 38.19063
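The patch aims for a specific ordering: publish `policy->governor_data` before the governor's sysfs files appear, and clear it only after they are gone. That ordering can be modelled in a few lines. This is a single-threaded sketch with hypothetical, simplified types. It says nothing about the locking or the circular dependency reported above, where the real kernfs removal semantics matter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the governor structures. */
struct dbs_data { unsigned int ignore_nice_load; };

struct policy {
	struct dbs_data *governor_data;
	bool attrs_visible;     /* models whether the sysfs files exist */
};

/* A show() callback can only run while the files exist. */
static unsigned int show_ignore_nice_load(const struct policy *p)
{
	assert(p->attrs_visible && p->governor_data != NULL);
	return p->governor_data->ignore_nice_load;
}

static void governor_init(struct policy *p, struct dbs_data *d)
{
	p->governor_data = d;       /* publish the data first ... */
	p->attrs_visible = true;    /* ... then create the files */
}

static void governor_exit(struct policy *p)
{
	p->attrs_visible = false;   /* remove the files first ... */
	p->governor_data = NULL;    /* ... then clear the pointer */
}
```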
Re: [PATCH v4 3/6] drm/dsi: Try to match non-DT dsi devices
On 1/21/2016 9:35 PM, Thierry Reding wrote: On Thu, Dec 10, 2015 at 06:11:37PM +0530, Archit Taneja wrote: Add a device name field in mipi_dsi_device. This name is different from the actual dev name (which is of the format "hostname.reg"). When the device is created via DT, this name is set to the modalias string. Why? What's the use of setting this to the modalias string? There is no use in setting it in the DT case. It's just set for the sake of consistency between the non-DT and DT devices. For now, dsi->name is just used for device/driver matching for non-DT devices. There's no harm in setting it to a valid name for DT devices. In the non-DT case, the driver creating the DSI device provides the name by populating a field in mipi_dsi_device_info. Matching for DT case would be as it was before. For the non-DT case, we compare the device and driver names. Other buses (like i2c/spi) "I2C" and "SPI", please. perform a non-DT match by comparing the device name and entries in the driver's id_table. Such a mechanism isn't used for the dsi bus. "DSI", please. Reviewed-by: Andrzej Hajda Signed-off-by: Archit Taneja --- drivers/gpu/drm/drm_mipi_dsi.c | 25 - include/drm/drm_mipi_dsi.h | 6 ++ 2 files changed, 30 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_mipi_dsi.c b/drivers/gpu/drm/drm_mipi_dsi.c index 9434585..5a46802 100644 --- a/drivers/gpu/drm/drm_mipi_dsi.c +++ b/drivers/gpu/drm/drm_mipi_dsi.c @@ -45,9 +45,26 @@ * subset of the MIPI DCS command set. */ +static const struct device_type mipi_dsi_device_type; + static int mipi_dsi_device_match(struct device *dev, struct device_driver *drv) { - return of_driver_match_device(dev, drv); + struct mipi_dsi_device *dsi; + + if (dev->type == &mipi_dsi_device_type) + dsi = to_mipi_dsi_device(dev); + else + return 0; I think this check is redundant. I'm not aware of any case where the bus ->match() callback is called on a device that isn't on said bus. You're right. I'll drop this.
+ /* attempt OF style match */ + if (of_driver_match_device(dev, drv)) + return 1; + + /* compare dsi device and driver names */ "DSI", please. + if (!strcmp(dsi->name, drv->name)) + return 1; + + return 0; } static const struct dev_pm_ops mipi_dsi_device_pm_ops = { @@ -125,6 +142,7 @@ struct mipi_dsi_device *mipi_dsi_device_new(struct mipi_dsi_host *host, dsi->dev.type = &mipi_dsi_device_type; dsi->dev.of_node = info->node; dsi->channel = info->reg; + strlcpy(dsi->name, info->type, sizeof(dsi->name)); Don't you need to check info->type != NULL before doing this? It's not needed with the way struct mipi_dsi_device_info is currently defined. dev_set_name(&dsi->dev, "%s.%d", dev_name(host->dev), info->reg); @@ -148,6 +166,11 @@ of_mipi_dsi_device_add(struct mipi_dsi_host *host, struct device_node *node) int ret; u32 reg; + if (of_modalias_node(node, info.type, sizeof(info.type)) < 0) { + dev_err(dev, "modalias failure on %s\n", node->full_name); + return ERR_PTR(-EINVAL); + } + ret = of_property_read_u32(node, "reg", &reg); if (ret) { dev_err(dev, "device node %s has no valid reg property: %d\n", diff --git a/include/drm/drm_mipi_dsi.h b/include/drm/drm_mipi_dsi.h index 90f4f3c..cb084af 100644 --- a/include/drm/drm_mipi_dsi.h +++ b/include/drm/drm_mipi_dsi.h @@ -139,8 +139,11 @@ enum mipi_dsi_pixel_format { MIPI_DSI_FMT_RGB565, }; +#define DSI_DEV_NAME_SIZE 20 + /** * struct mipi_dsi_device_info - template for creating a mipi_dsi_device + * @type: dsi peripheral chip type * @reg: DSI virtual channel assigned to peripheral * @node: pointer to OF device node * @@ -148,6 +151,7 @@ enum mipi_dsi_pixel_format { * DSI device */ struct mipi_dsi_device_info { + char type[DSI_DEV_NAME_SIZE]; Why limit ourselves to 20 characters? And why even so complicated? Isn't the type always static when someone specifies this? Couldn't we simply use a const char *name here instead?
In the case where the device is registered via DT, we would need space allocated for 'type' to copy the modalias string into it. Having const char *type would make it a bit complicated for the DT path. The mipi_dsi_device_info struct was based on the i2c_board_info/spi_board_info structs, and they have type/modalias members declared as arrays of chars. I kind of followed suit without putting too much thought into the member type. Archit Thierry -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
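Here is a sketch of the non-DT matching path as described above. The device name is copied into a fixed-size array, which is what lets the DT path write the modalias string in place. The bus then falls back to comparing that name with the driver name. The structures are simplified stand-ins, `snprintf()` replaces the kernel's `strlcpy()`, the OF match step is omitted, and `panel-foo` is a made-up name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DSI_DEV_NAME_SIZE 20    /* size proposed in the patch */

/* Simplified stand-ins for the structures touched by the patch. */
struct mipi_dsi_device_info {
	char type[DSI_DEV_NAME_SIZE];
	unsigned int reg;
};

struct mipi_dsi_device {
	char name[DSI_DEV_NAME_SIZE];
	unsigned int channel;
};

struct dsi_driver { const char *name; };

static void dsi_device_init(struct mipi_dsi_device *dsi,
			    const struct mipi_dsi_device_info *info)
{
	/* Bounded, always NUL-terminated copy (strlcpy() in the kernel). */
	snprintf(dsi->name, sizeof(dsi->name), "%s", info->type);
	dsi->channel = info->reg;
}

/* Non-DT fallback: a device matches a driver with the same name. */
static int dsi_name_match(const struct mipi_dsi_device *dsi,
			  const struct dsi_driver *drv)
{
	return strcmp(dsi->name, drv->name) == 0;
}
```

Because `type` is an array rather than a pointer, it can never be NULL. That is why the `strlcpy()` in the patch needs no NULL check.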
Re: [PATCH] qe_ic: fix a buffer overflow error and add check elsewhere
On Tue, 2016-01-26 at 18:31 +0100, Thomas Gleixner wrote: > On Thu, 21 Jan 2016, Zhao Qiang wrote: > > > 127 is the theoretical up boundary of QEIC number, > > in fact there only be 44 qe_ic_info now. > > add check to overflow for qe_ic_info > > How do you trigger that overflow? The above does not explain WHY we need > these > checks. The check in qe_ic_host_map can be triggered by bad data in a device tree. The set_priority functions do not appear to be used at all. > > > diff --git a/drivers/soc/fsl/qe/qe_ic.c b/drivers/soc/fsl/qe/qe_ic.c > > index 5419527..90c00b7 100644 > > --- a/drivers/soc/fsl/qe/qe_ic.c > > +++ b/drivers/soc/fsl/qe/qe_ic.c > > Sigh. Another dump ground for SOC stuff? Another? Where are the others, besides arch? > irq chip drivers belong into drivers/irqchip. Yes. This stuff was recently moved out of arch/powerpc to work toward being able to use it on ARM. I'm expecting followup patches to move things like this that belong elsewhere. -Scott
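The check under discussion rejects a hardware IRQ number from a bad device tree before it indexes `qe_ic_info`. It might look like the sketch below. The table contents and `qe_ic_check_hw()` are hypothetical; only the 44-entry size comes from the patch description. Treating a zero mask as a reserved slot is an assumption, modelled loosely on the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table: 44 slots as in the patch description, mostly unused here. */
struct qe_ic_info { unsigned int mask; };

static const struct qe_ic_info qe_ic_info[44] = {
	[1]  = { .mask = 0x00008000 },
	[43] = { .mask = 0x00000001 },
};

/* Validate a hwirq (e.g. from a device tree specifier) before indexing. */
static int qe_ic_check_hw(unsigned long hw)
{
	if (hw >= ARRAY_SIZE(qe_ic_info))
		return -EINVAL;     /* past the end: would overflow the table */
	if (qe_ic_info[hw].mask == 0)
		return -EINVAL;     /* reserved slot (assumed convention) */
	return 0;
}
```

Deriving the bound from `ARRAY_SIZE()` rather than the theoretical 127 keeps the check correct if the table grows or shrinks.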
Re: [kernel-hardening] Re: [PATCH 0/2] sysctl: allow CLONE_NEWUSER to be disabled
On 2016-01-26 12:15, Serge Hallyn wrote: Quoting Josh Boyer (jwbo...@fedoraproject.org): On Mon, Jan 25, 2016 at 11:57 PM, Eric W. Biederman wrote: Kees Cook writes: On Mon, Jan 25, 2016 at 11:33 AM, Eric W. Biederman wrote: Kees Cook writes: Well, I don't know about less weird, but it would leave an unneeded hole in the permission checks. To be clear the current patch has my: Nacked-by: "Eric W. Biederman" The code is buggy, and poorly thought through. Your lack of interest in fixing the bugs in your patch is distressing. I'm not sure where you see me having a "lack of interest". The existing cap-checking sysctls have a corner-case bug, which is orthogonal to this change. That certainly doesn't sound like you have any plans to change anything there. So broken code, not willing to fix. No. We are not merging this sysctl. I think you're jumping to conclusions. :) I think I am the maintainer. What you are proposing is very much something that is only of interest to people who are not using user namespaces. It is fatally flawed as a way to avoid new attack surfaces for people who don't care as the sysctl leaves user namespaces enabled by default. It is fatally flawed as remediation to recommend to people to change if a new user namespace related bug is discovered. Any running process that happens to be created while user namespace creation was enabled will continue to exist. Effectively a reboot will be required as part of a mitigation. Many sysadmins will get that wrong. I can't possibly see your sysctl as proposed achieving its goals. A person has to be entirely too aware of subtlety and nuance to use it effectively. What you're saying is true for the "oh crap" case of a new userns related CVE being found. However, there is the case where sysadmins know for a fact that a set of machines should not allow user namespaces to be enabled. Currently they have 2 choices, 1) use their Hi - can you give a specific example of this?
(Where users really should not be able to use them - not where they might not need them) I think it'll help the discussion tremendously. Because so far the only good arguments I've seen have been about actual bugs in the user namespaces, which would not warrant a designed-in permanent disable switch. If there are good use cases where such a disable switch will always be needed (and compiling out can't satisfy) that'd be helpful. In general, if a particular daemon provides a network service and does not use user namespaces for sand-boxing, it should not be allowed to use user namespaces, because those then become something else to potentially land an exploit through. ntpd, postfix, and most other regularly used network servers fall into this category. If you're hosting a shared system providing terminal server like usage where the users actually have shell access, then they probably should not be able to use user namespaces on the server. In essence, if there are cases where you know for certain that users do not need user namespaces, they should not be allowed to use them.
Loan?
Do you need a business or personal loan? We are able to provide business and personal credit/loans to companies and individuals at 3% interest. Our company is based in China, Europe and America. For more information, contact e-mail: premsfinancial...@outlook.com Note: send your reply only to this e-mail: premsfinancial...@outlook.com Thank you PFS / China Loans
Re: net/irda: use-after-free in ircomm_param_request
On Mon, Jan 25, 2016 at 7:59 AM, Dmitry Vyukov wrote: > It seems that skb can be freed after skb_put() and spinlock unlock, > but ircomm_param_request reads skb->len afterwards: > > int ircomm_param_request(struct ircomm_tty_cb *self, __u8 pi, int flush) > { > ... > skb_put(skb, count); > spin_unlock_irqrestore(&self->spinlock, flags); > pr_debug("%s(), skb->len=%d\n", __func__ , skb->len); > This looks correct to me. We can either get rid of that debugging print or move it under spinlock.
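Below is a minimal userspace model of the second option: read the length while the lock still protects the buffer, and print only the local copy afterwards. A `pthread_mutex_t` stands in for the spinlock. `struct buf`, `struct ctrl` and `param_request()` are hypothetical simplifications of the sk_buff and ircomm code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct sk_buff and the ircomm control block. */
struct buf { unsigned int len; };

struct ctrl {
	pthread_mutex_t lock;   /* plays the role of self->spinlock */
	struct buf *skb;        /* may be consumed and freed once unlocked */
};

/* Grow the buffer by `count` and report the new length safely. */
static unsigned int param_request(struct ctrl *self, unsigned int count)
{
	unsigned int len;

	pthread_mutex_lock(&self->lock);
	self->skb->len += count;        /* stands in for skb_put(skb, count) */
	len = self->skb->len;           /* snapshot while skb is still pinned */
	pthread_mutex_unlock(&self->lock);

	/* Only the local copy is used after the unlock. */
	fprintf(stderr, "%s(), skb->len=%u\n", __func__, len);
	return len;
}
```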
Re: [PATCH v6 4/5] Add ioctl to retrieve USBTMC-USB488 capabilities
On Sun, Jan 24, 2016 at 08:42:54PM -0800, Greg KH wrote: > On Sun, Nov 29, 2015 at 01:35:51PM +0100, Dave Penkler wrote: > > This is a convenience function to obtain an instrument's > > capabilities from its file descriptor without having to access sysfs > > from the user program. > > > > Signed-off-by: Dave Penkler > > --- > > drivers/usb/class/usbtmc.c | 12 > > include/uapi/linux/usb/tmc.h | 21 ++--- > > 2 files changed, 30 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/usb/class/usbtmc.c b/drivers/usb/class/usbtmc.c > > index 3b85ef5..3a3264c 100644 > > --- a/drivers/usb/class/usbtmc.c > > +++ b/drivers/usb/class/usbtmc.c > > @@ -102,6 +102,9 @@ struct usbtmc_device_data { > > u16 iin_wMaxPacketSize; > > atomic_t srq_asserted; > > > > + /* coalesced usb488_caps from usbtmc_dev_capabilities */ > > + u8 usb488_caps; > > + > > u8 rigol_quirk; > > > > /* attributes from the USB TMC spec for this device */ > > @@ -992,6 +995,7 @@ static int get_capabilities(struct usbtmc_device_data > > *data) > > data->capabilities.device_capabilities = buffer[5]; > > data->capabilities.usb488_interface_capabilities = buffer[14]; > > data->capabilities.usb488_device_capabilities = buffer[15]; > > + data->usb488_caps = (buffer[14] & 0x07) | ((buffer[15] & 0x0f) << 4); > > rv = 0; > > > > err_out: > > @@ -1167,6 +1171,14 @@ static long usbtmc_ioctl(struct file *file, unsigned > > int cmd, unsigned long arg) > > retval = usbtmc_ioctl_abort_bulk_in(data); > > break; > > > > + case USBTMC488_IOCTL_GET_CAPS: > > + retval = copy_to_user((void __user *)arg, > > + &data->usb488_caps, > > + sizeof(data->usb488_caps)); > > + if (retval) > > + retval = -EFAULT; > > + break; > > + > > case USBTMC488_IOCTL_READ_STB: > > retval = usbtmc488_ioctl_read_stb(data, arg); > > break; > > diff --git a/include/uapi/linux/usb/tmc.h b/include/uapi/linux/usb/tmc.h > > index 7e5ced8..1dc3af1 100644 > > --- a/include/uapi/linux/usb/tmc.h > > +++ b/include/uapi/linux/usb/tmc.h > > @@ -2,12 +2,14 @@ >
> * Copyright (C) 2007 Stefan Kopp, Gechingen, Germany > > * Copyright (C) 2008 Novell, Inc. > > * Copyright (C) 2008 Greg Kroah-Hartman > > + * Copyright (C) 2015 Dave Penkler > > * > > * This file holds USB constants defined by the USB Device Class > > - * Definition for Test and Measurement devices published by the USB-IF. > > + * and USB488 Subclass Definitions for Test and Measurement devices > > + * published by the USB-IF. > > * > > - * It also has the ioctl definitions for the usbtmc kernel driver that > > - * userspace needs to know about. > > + * It also has the ioctl and capability definitions for the > > + * usbtmc kernel driver that userspace needs to know about. > > */ > > > > #ifndef __LINUX_USB_TMC_H > > @@ -40,6 +42,19 @@ > > #define USBTMC_IOCTL_ABORT_BULK_IN _IO(USBTMC_IOC_NR, 4) > > #define USBTMC_IOCTL_CLEAR_OUT_HALT _IO(USBTMC_IOC_NR, 6) > > #define USBTMC_IOCTL_CLEAR_IN_HALT _IO(USBTMC_IOC_NR, 7) > > +#define USBTMC488_IOCTL_GET_CAPS _IO(USBTMC_IOC_NR, 17) > > Shouldn't there be a data structure mentioned here that gets passed back > to userspace? And are you sure the direction is correct as well? > Oops, it should read: #define USBTMC488_IOCTL_GET_CAPS _IOR(USBTMC_IOC_NR, 17, unsigned char) Thanks, -dave > thanks, > > greg k-h
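The capability byte returned by the patch packs two fields. Bits 0-2 of the USB488 interface capabilities (`buffer[14]`) land in bits 0-2. Bits 0-3 of the device capabilities (`buffer[15]`) are shifted into bits 4-7. The sketch below shows that packing; the helper name is hypothetical. With Dave's corrected `_IOR(USBTMC_IOC_NR, 17, unsigned char)`, userspace would read this single byte by passing a pointer to an `unsigned char` to `ioctl()`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack the USB488 capability bytes of a GET_CAPABILITIES response the
 * way the patch does: bits 0-2 of buffer[14] land in bits 0-2, bits 0-3
 * of buffer[15] land in bits 4-7, and bit 3 is always clear.
 */
static uint8_t usb488_coalesce_caps(const uint8_t *buffer)
{
	return (uint8_t)((buffer[14] & 0x07) | ((buffer[15] & 0x0f) << 4));
}
```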
Re: [patch] hfs: fix hfs_readdir()
On Tue, 2016-01-26 at 12:26 +0300, Dan Carpenter wrote: > I was looking through static analysis warnings and we seem to be copying > garbage into &rd->key. This goes back to before the start of git... > > Signed-off-by: Dan Carpenter > --- > Not tested. Please review carefully. > > diff --git a/fs/hfs/dir.c b/fs/hfs/dir.c > index 70788e0..66485d7 100644 > --- a/fs/hfs/dir.c > +++ b/fs/hfs/dir.c > @@ -163,7 +163,7 @@ static int hfs_readdir(struct file *file, struct > dir_context *ctx) > rd->file = file; > list_add(&rd->list, &HFS_I(inode)->open_dir_list); > } > - memcpy(&rd->key, &fd.key, sizeof(struct hfs_cat_key)); > + memcpy(&rd->key, &fd.key->cat, sizeof(struct hfs_cat_key));

The field "key" is a union:

typedef union hfs_btree_key {
	u8 key_len;			/* number of bytes in the key */
	struct hfs_cat_key cat;
	struct hfs_ext_key ext;
} hfs_btree_key;

struct hfs_cat_key is the biggest member, so its size determines the size of the union:

struct hfs_ext_key {
	u8 key_len;		/* number of bytes in the key */
	u8 FkType;		/* HFS_FK_{DATA,RSRC} */
	__be32 FNum;		/* The File ID of the file */
	__be16 FABN;		/* allocation blocks number*/
} __packed;

struct hfs_cat_key {
	u8 key_len;		/* number of bytes in the key */
	u8 reserved;		/* padding */
	__be32 ParID;		/* CNID of the parent dir */
	struct hfs_name CName;	/* The filename of the entry */
} __packed;

because:

#define HFS_NAMELEN	31	/* maximum length of an HFS filename */

struct hfs_name {
	u8 len;
	u8 name[HFS_NAMELEN];
} __packed;

If we use sizeof(struct hfs_cat_key), it looks like we could potentially miss one byte of the union when copying the catalog key. But if we copy a struct hfs_ext_key, we will copy some amount of "garbage" anyway. So I don't think this is a good fix for the issue. What do you think?

Another worry could be the "search_key" field of struct hfs_find_data.

Thanks,
Vyacheslav Dubeyko.
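The size argument can be checked mechanically. Below, the quoted structures are mirrored in userspace C. Fixed-width integers replace `__be32`/`__be16`, which does not affect sizes, and the GCC/Clang `packed` attribute replaces `__packed`. `struct find_data_model` is a hypothetical one-field model. Since the patch writes `fd.key->cat`, `fd.key` has to be a pointer, so `&fd.key` is the address of that pointer rather than of the key it points to.

```c
#include <assert.h>
#include <stdint.h>

#define HFS_NAMELEN 31

/* Userspace mirrors of the quoted on-disk structures. */
struct hfs_name {
	uint8_t len;
	uint8_t name[HFS_NAMELEN];
} __attribute__((packed));

struct hfs_cat_key {
	uint8_t key_len;
	uint8_t reserved;
	uint32_t ParID;
	struct hfs_name CName;
} __attribute__((packed));

struct hfs_ext_key {
	uint8_t key_len;
	uint8_t FkType;
	uint32_t FNum;
	uint16_t FABN;
} __attribute__((packed));

typedef union hfs_btree_key {
	uint8_t key_len;
	struct hfs_cat_key cat;
	struct hfs_ext_key ext;
} hfs_btree_key;

static_assert(sizeof(struct hfs_cat_key) == 38, "1 + 1 + 4 + 32 bytes");
static_assert(sizeof(struct hfs_ext_key) == 8, "1 + 1 + 4 + 2 bytes");
/* The catalog key is the largest member, so it alone sets the union size. */
static_assert(sizeof(hfs_btree_key) == sizeof(struct hfs_cat_key),
	      "catalog key covers the whole union");

/* Hypothetical one-field model of struct hfs_find_data. */
struct find_data_model { hfs_btree_key *key; };
```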
Re: [PATCH/RFC 3/3] s390: query dynamic DEBUG_PAGEALLOC setting
On Tue, Jan 26, 2016 at 10:18:25AM +0100, Christian Borntraeger wrote: > We can use debug_pagealloc_enabled() to check if we can map > the identity mapping with 1MB/2GB pages as well as to print > the current setting in dump_stack. > > Signed-off-by: Christian Borntraeger > --- > arch/s390/kernel/dumpstack.c | 4 +++- > arch/s390/mm/vmem.c | 10 -- > 2 files changed, 7 insertions(+), 7 deletions(-) > > diff --git a/arch/s390/kernel/dumpstack.c b/arch/s390/kernel/dumpstack.c > index dc8e204..a1c0530 100644 > --- a/arch/s390/kernel/dumpstack.c > +++ b/arch/s390/kernel/dumpstack.c > @@ -11,6 +11,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -186,7 +187,8 @@ void die(struct pt_regs *regs, const char *str) > printk("SMP "); > #endif > #ifdef CONFIG_DEBUG_PAGEALLOC > - printk("DEBUG_PAGEALLOC"); > + printk("DEBUG_PAGEALLOC(%s)", > + debug_pagealloc_enabled() ? "enabled" : "disabled"); > #endif I'd prefer if you change this to if (debug_pagealloc_enabled()) printk("DEBUG_PAGEALLOC"); That way we can get rid of yet another ifdef. Having "DEBUG_PAGEALLOC(disabled)" doesn't seem to be very helpful.
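Here is a sketch of the suggested shape: the token is printed only when the feature is active, so the `#ifdef` can go. It assumes `debug_pagealloc_enabled()` has a stub that returns false when `CONFIG_DEBUG_PAGEALLOC` is off. Here that is modelled by a hypothetical flag, and `format_banner()` is an illustrative helper rather than the s390 `die()` code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag modelling the runtime debug_pagealloc setting. */
static bool pagealloc_debug_on;

/* Stand-in for debug_pagealloc_enabled(); assumed to return false when
 * CONFIG_DEBUG_PAGEALLOC is not built in, which removes the need for #ifdef. */
static bool debug_pagealloc_enabled(void)
{
	return pagealloc_debug_on;
}

/* Append the token only when the feature is active; returns the length. */
static int format_banner(char *buf, size_t size)
{
	return snprintf(buf, size, "SMP %s",
			debug_pagealloc_enabled() ? "DEBUG_PAGEALLOC" : "");
}
```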
Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
On Tue, 26 Jan 2016, Mike Galbraith wrote: > On Tue, 2016-01-26 at 10:26 -0600, Christoph Lameter wrote: > > On Tue, 26 Jan 2016, Mike Galbraith wrote: > > > > > On Tue, 2016-01-26 at 03:14 +0100, Mike Galbraith wrote: > > > > > > > Perf and RT say we don't want quiet_vmstat() in the idle loop > > > > either. > > > > > > BTW, the perf numbers were not from an RT kernel, they were from my > > > PREEMPT_VOLUNTARY desktop kernel. > > > > Can we move quiet_vmstat() elsewhere after we have checked that really > > nothing else is going on soon? > > How would you check? Precognition doesn't work for mortals. Don't we have some decision mechanism to go into higher levels of power savings when the system is idle for longer times?
Re: [PATCH] ntb: perf test: fix address space confusion
On Tue, 2016-01-26 at 10:31 +0100, Arnd Bergmann wrote: > The ntb driver assigns between pointers and __iomem tokens, and > also casts them to 64-bit integers, which results in compiler > warnings on 32-bit systems: > > drivers/ntb/test/ntb_perf.c: In function 'perf_copy': > drivers/ntb/test/ntb_perf.c:213:10: error: cast from pointer to > integer of different size [-Werror=pointer-to-int-cast] > vbase = (u64)(u64 *)mw->vbase; > ^ > drivers/ntb/test/ntb_perf.c:214:14: error: cast from pointer to > integer of different size [-Werror=pointer-to-int-cast] > dst_vaddr = (u64)(u64 *)dst; > ^ > > This adds __iomem annotations where needed and changes the temporary > variables to iomem pointers to avoid casting them to u64. I did not > see the problem in linux-next earlier, but it showed up in > 4.5-rc1. > > Signed-off-by: Arnd Bergmann Acked-by: Dave Jiang > Fixes: 8a7b6a778a85 ("ntb: ntb perf tool") > --- > drivers/ntb/test/ntb_perf.c | 21 +++-- > 1 file changed, 11 insertions(+), 10 deletions(-) > > diff --git a/drivers/ntb/test/ntb_perf.c > b/drivers/ntb/test/ntb_perf.c > index c8a37ba4b4f9..6bdc1e7b7503 100644 > --- a/drivers/ntb/test/ntb_perf.c > +++ b/drivers/ntb/test/ntb_perf.c > @@ -178,7 +178,7 @@ static void perf_copy_callback(void *data) > atomic_dec(&pctx->dma_sync); > } > > -static ssize_t perf_copy(struct pthr_ctx *pctx, char *dst, > +static ssize_t perf_copy(struct pthr_ctx *pctx, char __iomem *dst, > char *src, size_t size) > { > struct perf_ctx *perf = pctx->perf; > @@ -189,7 +189,8 @@ static ssize_t perf_copy(struct pthr_ctx *pctx, > char *dst, > dma_cookie_t cookie; > size_t src_off, dst_off; > struct perf_mw *mw = &perf->mw; > - u64 vbase, dst_vaddr; > + void __iomem *vbase; > + void __iomem *dst_vaddr; > dma_addr_t dst_phys; > int retries = 0; > > @@ -204,14 +205,14 @@ static ssize_t perf_copy(struct pthr_ctx *pctx, > char *dst, > } > > device = chan->device; > - src_off = (size_t)src & ~PAGE_MASK; > - dst_off = (size_t)dst & ~PAGE_MASK; > +
src_off = (uintptr_t)src & ~PAGE_MASK; > + dst_off = (uintptr_t __force)dst & ~PAGE_MASK; > > if (!is_dma_copy_aligned(device, src_off, dst_off, size)) > return -ENODEV; > > - vbase = (u64)(u64 *)mw->vbase; > - dst_vaddr = (u64)(u64 *)dst; > + vbase = mw->vbase; > + dst_vaddr = dst; > dst_phys = mw->phys_addr + (dst_vaddr - vbase); > > unmap = dmaengine_get_unmap_data(device->dev, 1, > GFP_NOWAIT); > @@ -261,13 +262,13 @@ err_get_unmap: > return 0; > } > > -static int perf_move_data(struct pthr_ctx *pctx, char *dst, char > *src, > +static int perf_move_data(struct pthr_ctx *pctx, char __iomem *dst, > char *src, > u64 buf_size, u64 win_size, u64 total) > { > int chunks, total_chunks, i; > int copied_chunks = 0; > u64 copied = 0, result; > - char *tmp = dst; > + char __iomem *tmp = dst; > u64 perf, diff_us; > ktime_t kstart, kstop, kdiff; > > @@ -324,7 +325,7 @@ static int ntb_perf_thread(void *data) > struct perf_ctx *perf = pctx->perf; > struct pci_dev *pdev = perf->ntb->pdev; > struct perf_mw *mw = &perf->mw; > - char *dst; > + char __iomem *dst; > u64 win_size, buf_size, total; > void *src; > int rc, node, i; > @@ -364,7 +365,7 @@ static int ntb_perf_thread(void *data) > if (buf_size > MAX_TEST_SIZE) > buf_size = MAX_TEST_SIZE; > > - dst = (char *)mw->vbase; > + dst = (char __iomem *)mw->vbase; > > atomic_inc(&perf->tsync); > while (atomic_read(&perf->tsync) != perf->perf_threads)
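The line `dst_phys = mw->phys_addr + (dst_vaddr - vbase);` translates a CPU address inside the mapped window into the matching bus address by reusing its offset from the window base. Below is a minimal sketch with hypothetical names. It uses `char *` so the subtraction is standard C. Subtracting two `void __iomem *` values, as the patch does, instead relies on GCC's extension that treats `void *` arithmetic like `char *`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a CPU pointer inside a memory window to the
 * matching bus address by reusing its offset from the window base.
 */
static uint64_t window_phys(const char *vbase, uint64_t phys_base,
			    const char *vaddr)
{
	ptrdiff_t off = vaddr - vbase;  /* char * keeps this standard C */

	assert(off >= 0);               /* vaddr must lie inside the window */
	return phys_base + (uint64_t)off;
}
```

No pointer is cast to a 64-bit integer here, which is also what removes the `-Wpointer-to-int-cast` errors on 32-bit builds.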
Re: [PATCHv8 0/5] Driver for new "VMD" device
On Tue, Jan 26, 2016 at 08:46:09AM -0800, Christoph Hellwig wrote: > On Wed, Jan 20, 2016 at 02:43:08PM -0600, Bjorn Helgaas wrote: > > I saw responses from Keith and Bryan, and I hope they answer your > > questions. As far as I can tell, the VMD driver is grossly similar to > > other host bridge drivers we've already merged, and I don't think we > > have public specs for all of them. > > > > Unless you have further concerns, I'm going to ask Linus to pull this > > tomorrow, along with the rest of the PCI changes for v4.5. > > I still think it's a bad idea to merge something odd like this without > a good explanation or showing what devices can actually sit under it. > > But you're the maintainer in the end.. Any PCIe devices and bridges should work with existing upstream drivers. The only exceptions would be anything dependent on INTx or IO ports.
Re: [PATCH] PCI: iproc: Fix BCMA PCIe bus scanning regression
Hi Ray, On Wed, Jan 20, 2016 at 02:55:10PM -0800, Ray Jui wrote: > Commit 943ebae781f5 ("PCI: iproc: Add PAXC interface support") causes > regression on EP device detection on BCMA based platforms. This patch > fixes the issue by allowing multiple devices to be configured on the > same bus, for all PAXB based child buses > > Reported-by: Rafal Milecki > Fixes: 943ebae781f5 ("PCI: iproc: Add PAXC interface support") > Signed-off-by: Ray Jui > --- > drivers/pci/host/pcie-iproc.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c > index 5816bce..4627561 100644 > --- a/drivers/pci/host/pcie-iproc.c > +++ b/drivers/pci/host/pcie-iproc.c > @@ -171,10 +171,11 @@ static inline void iproc_pcie_ob_write(struct > iproc_pcie *pcie, > } > > static inline bool iproc_pcie_device_is_valid(struct iproc_pcie *pcie, > + unsigned int busnum, > unsigned int slot, > unsigned int fn) > { > - if (slot > 0) > + if ((pcie->type == IPROC_PCIE_PAXC || busnum == 0) && slot > 0) > return false; > > /* PAXC can only support limited number of functions */ I don't understand this. Here's the whole function (with this patch applied): static inline bool iproc_pcie_device_is_valid(struct iproc_pcie *pcie, unsigned int busnum, unsigned int slot, unsigned int fn) { if ((pcie->type == IPROC_PCIE_PAXC || busnum == 0) && slot > 0) return false; /* PAXC can only support limited number of functions */ if (pcie->type == IPROC_PCIE_PAXC && fn >= MAX_NUM_PAXC_PF) return false; return true; } This says: - On bus 00, device 0 is the only valid device. That seems plausible because the devices on bus 00 are probably built-in to the SoC. - On PAXC-based systems, device 0 is the only valid device on *any* bus. Is that really true? If there's any way to add a plug-in card, this seems overly restrictive. 
PCIe devices are generally all device 0, but this would mean you cannot plug in a PCIe-to-PCI bridge leading to a PCI device with a non-zero device number. I think it also means you could not plug in a PCIe device with ARI enabled, because I think we store the upper 5 bits of the 8-bit ARI function number in the PCI_SLOT bits. - On PAXC-based systems, only functions 0, 1, 2, and 3 are valid anywhere in the hierarchy. I think this again restricts what cards can be plugged in. If iProc only supports devices built directly into the SoC, maybe these constraints are valid. But if it supports any plugin or external devices, they don't seem to make sense. Also, is it the case that an iProc root bus is always bus number zero? That's certainly not the case for many other host controllers, but maybe you only have one possible host controller per system and the base number is not programmable. Bjorn
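The cases Bjorn walks through can be exercised against a userspace copy of the patched check. `MAX_NUM_PAXC_PF` is taken as 4, from his reading that only functions 0-3 are valid. `PCI_SLOT()`/`PCI_FUNC()` follow the usual devfn encoding. Everything else is a hypothetical stand-in for the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)
#define MAX_NUM_PAXC_PF 4   /* inferred: functions 0-3 only */

enum iproc_pcie_type { IPROC_PCIE_PAXB, IPROC_PCIE_PAXC };

/* Userspace copy of the patched iproc_pcie_device_is_valid() logic. */
static bool device_is_valid(enum iproc_pcie_type type, unsigned int busnum,
			    unsigned int slot, unsigned int fn)
{
	if ((type == IPROC_PCIE_PAXC || busnum == 0) && slot > 0)
		return false;
	if (type == IPROC_PCIE_PAXC && fn >= MAX_NUM_PAXC_PF)
		return false;
	return true;
}
```

On PAXB, only bus 0 is restricted, so a bridge's secondary bus can carry any device number. On PAXC, an ARI function number of 8 or more decodes to a non-zero slot (function 8 is devfn 8, i.e. slot 1, function 0) and is rejected. That is the restriction Bjorn points out.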
Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
On Tue, 26 Jan 2016, Mike Galbraith wrote: > On Tue, 2016-01-26 at 10:26 -0600, Christoph Lameter wrote: > > On Tue, 26 Jan 2016, Mike Galbraith wrote: > > > > > > Why would the deferring cause this overhead? > > > > > > Because we schedule to idle cores aggressively, thus we may pop in and > > > out of idle at high frequency. > > > > Whats the point of going idle if you have things to do soon? > > When a task schedules off, how do you know it'll be back at all, much > less soon? Ok so you are running an artificial benchmark that always gets the system running again when it decides to go idle?
Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the interrupt is not single-destination
2016-01-26 09:44+0800, Yang Zhang: > On 2016/1/25 21:59, rkrc...@redhat.com wrote: >>2016-01-25 09:49+0800, Yang Zhang: >>>On 2016/1/22 21:31, rkrc...@redhat.com wrote: 2016-01-22 10:03+0800, Yang Zhang: >Not so complicated. We can reuse the wake up vector and check whether the >interrupt is multicast when one of destination vcpu handles it. I'm not sure what you mean now ... I guess it is: - Deliver the interrupt to a guest VCPU and relay the multicast to other VCPUs. No, it's strictly worse than intercepting it in the host. >>> >>>It is still handled in host context not guest context. The wakeup event >>>cannot be consumed like posted event. >> >>Ok. ("when one of destination vcpu handles it" confused me into >>thinking that you'd like to handle it with the notification vector.) > > Sorry for my poor english. :( It's good. Ambiguity is hard to avoid if a reader doesn't want to assume only the most likely meaning. Also, if wakeup vector were used for wakeup and multicast, we'd be uselessly doing work, because we can't tell which reason triggered the interrupt before finishing one part -- using separate vectors for that would be a bit nicer. >> >>(imprecise -- we would always have to check for ON bit of all PIDs from >> blocked VCPUs, for the original meaning of wakeup vector, and always > > This is what KVM does currently. Yep. >> either read the PIRR or check for ON bit of all PIDs that encode >> multicast interrupts; then we have to clear ON bits for multicasts.) > > Also, most part of work is covered by current logic except checking the > multicast. We could reuse the setup that gets us to wakeup_handler, but there is nothing to share in the handler itself. Sharing a handler means that we always have to execute both parts. We must create new PID anyway and compared to the extra work needed for multicast handling, a new vector + handler is a relatively small code investment that adds clarity to the design (and performance). 
(Taking the vector splitting to the extreme, we'd improve performance if we added a vector per assigned device. That is practically the same as non-posted mode, just more complicated.) >>--- >>There might be a benefit of using posted interrupts for host interrupts >>when we run out of free interrupt vectors: we could start using vectors >>by multiple sources through posted interrupts, if using posted > > Do you mean per vcpu posted interrupts? I mean using posting for host device interrupts (no virt involved). Let's say we have 300 devices for one CPU and CPU has 200 useable vectors. We have 100 device interrupts that need to be shared in some vectors and using posting might be faster than directly checking multiple devices. (I couldn't come up with a plausible scenario where we might want to use posting for host interrupts.)
Re: [kernel-hardening] Re: [PATCH 0/2] sysctl: allow CLONE_NEWUSER to be disabled
On Tue, Jan 26, 2016 at 10:09 AM, Austin S. Hemmelgarn wrote: > On 2016-01-26 12:15, Serge Hallyn wrote: >> >> Quoting Josh Boyer (jwbo...@fedoraproject.org): >>> >>> On Mon, Jan 25, 2016 at 11:57 PM, Eric W. Biederman >>> wrote: Kees Cook writes: > On Mon, Jan 25, 2016 at 11:33 AM, Eric W. Biederman > wrote: >> >> Kees Cook writes: >>> >>> >>> Well, I don't know about less weird, but it would leave a unneeded >>> hole in the permission checks. >> >> >> To be clear the current patch has my: >> >> Nacked-by: "Eric W. Biederman" >> >> The code is buggy, and poorly thought through. Your lack of interest >> in >> fixing the bugs in your patch is distressing. > > > I'm not sure where you see me having a "lack of interest". The > existing cap-checking sysctls have a corner-case bug, which is > orthogonal to this change. That certainly doesn't sound like you have any plans to change anything there. >> So broken code, not willing to fix. No. We are not merging this >> sysctl. > > > I think you're jumping to conclusions. :) I think I am the maintainer. What you are proposing is very much something that is only of interst to people who are not using user namespaces. It is fatally flawed as a way to avoid new attack surfaces for people who don't care as the sysctl leaves user namespaces enabled by default. It is fatally flawed as remediation to recommend to people to change if a new user namespace related but is discovered. Any running process that happens to be created while user namespace creation was enabled will continue to exist. Effectively a reboot will be required as part of a mitigation. Many sysadmins will get that wrong. I can't possibly see your sysctl as proposed achieving it's goals. A person has to be entirely too aware of subtlety and nuance to use it effectively. >>> >>> >>> What you're saying is true for the "oh crap" case of a new userns >>> related CVE being found. 
However, there is the case where sysadmins >>> know for a fact that a set of machines should not allow user >>> namespaces to be enabled. Currently they have 2 choices, 1) use their >> >> >> Hi - can you give a specific example of this? (Where users really should >> not be able to use them - not where they might not need them) I think >> it'll help the discussion tremendously. Because so far the only good >> arguments I've seen have been about actual bugs in the user namespaces, >> which would not warrant a designed-in permanent disable switch. If >> there are good use cases where such a disable switch will always be >> needed (and compiling out can't satisfy) that'd be helpful. > > In general, if a particular daemon provides a network service and does not > use user namespaces for sand-boxing, it should not be allowed to use user > namespaces, because those then become something else to potentially land an > exploit through. ntpd, postfix, and most other regularly used network > servers fall into this category. seccomp handles this issue quite nicely. > > If you're hosting a shared system providing terminal server like usage where > the users actually have shell access, then they probably should not be able > to use user namespaces on the server. > Au contraire. If they have user ns access, then they can sandbox their own programs. --Andy
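Andy's point that seccomp already covers the per-daemon case can be sketched as a filter that a service installs on itself, so that unshare() and clone() with CLONE_NEWUSER fail with EPERM. This is an illustration only, not something proposed in the thread. deny_new_userns() is a hypothetical name, and the supported-architecture list is an assumption. The filter reads only the low 32 bits of the flags argument, which is where CLONE_NEWUSER lives on these little-endian targets. It does not cover clone3(), whose flags sit in a userspace struct that a seccomp filter cannot dereference, or the x86 x32 ABI.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sched.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "sketch only covers x86-64 and arm64"
#endif

/* Hypothetical helper: deny new user namespaces for this process and its
 * children. Returns 0 on success, -1 if the filter could not be installed. */
int deny_new_userns(void)
{
	struct sock_filter filter[] = {
		/* [0-2] Kill on an unexpected syscall ABI. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
		/* [3-5] Only unshare() and clone() are inspected further. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_unshare, 1, 0),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_clone, 0, 3),
		/* [6-8] Low 32 bits of the flags argument: EPERM if CLONE_NEWUSER. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
		BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, CLONE_NEWUSER, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
		/* [9] Everything else is allowed. */
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
		.filter = filter,
	};

	/* Required so an unprivileged process may install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
		return -1;
	return 0;
}
```

Unlike a global sysctl, this lets each daemon opt out for itself while other users on the same machine keep user namespaces for their own sandboxing, which is the split Andy argues for.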
Re: [PATCH v4 05/13] mmc: sdhci-of-arasan: fix clk issue in sdhci_arasan_remove()
On Tue, 2016-01-26 at 06:15PM +0800, Jisheng Zhang wrote: > sdhci_pltfm_unregister() could operate host's registers, it will cause > problems if the clk is already disabled and unprepared. Fix this issue > by moving the clk_disable_unprepare() call to the end of remove > function. > > Signed-off-by: Jisheng Zhang Acked-by: Sören Brinkmann Sören
[PATCH 00/11] sync framework de-staging: part 2 - de-stage
From: Gustavo Padovan This patch series de-stages the sync framework and is a follow-up to the clean-up series I sent last week: http://thread.gmane.org/gmane.comp.video.dri.devel/145509 Now in part 2 we finish de-staging the sync framework. It starts with moving sync_file from staging to drivers/dma-buf, followed by a bunch of clean-ups of sync_timeline and sw_sync. Finally we de-stage the latter two plus the debug routines. Gustavo Padovan (11): dma-buf/sync_file: de-stage sync_file staging/android: store last signaled value on sync timeline staging/android: remove .fill_driver_data() timeline ops staging/android: remove .{fence,timeline}_value_str() from timeline_ops staging/android: remove struct sync_timeline_ops staging/android: remove sw_sync_timeline and sw_sync_pt staging/android: remove sw_sync.[ch] files staging/android: rename android_fence to timeline_fence dma-buf/sync_timeline: de-stage sync_timeline dma-buf/sync_file: bring debug back to sync file dma-buf/sync_file: bring sync_dump() back drivers/Kconfig| 2 + drivers/dma-buf/Kconfig| 21 ++ drivers/dma-buf/Makefile | 2 + drivers/dma-buf/sw_sync.h | 32 ++ drivers/dma-buf/sync_debug.c | 374 +++ drivers/dma-buf/sync_debug.h | 35 ++ drivers/dma-buf/sync_file.c| 451 +++ drivers/dma-buf/sync_timeline.c| 222 +++ drivers/staging/android/Kconfig| 19 - drivers/staging/android/Makefile | 1 - drivers/staging/android/sw_sync.c | 103 -- drivers/staging/android/sw_sync.h | 59 --- drivers/staging/android/sync.c | 652 - drivers/staging/android/sync.h | 261 - drivers/staging/android/sync_debug.c | 372 --- drivers/staging/android/trace/sync.h | 82 - drivers/staging/android/uapi/sw_sync.h | 32 -- drivers/staging/android/uapi/sync.h| 97 - include/linux/sync_file.h | 123 +++ include/linux/sync_timeline.h | 114 ++ include/trace/events/sync_file.h | 57 +++ include/trace/events/sync_timeline.h | 31 ++ include/uapi/linux/sync.h | 97 + 23 files changed, 1561 insertions(+), 1678 deletions(-) create mode 100644
drivers/dma-buf/Kconfig create mode 100644 drivers/dma-buf/sw_sync.h create mode 100644 drivers/dma-buf/sync_debug.c create mode 100644 drivers/dma-buf/sync_debug.h create mode 100644 drivers/dma-buf/sync_file.c create mode 100644 drivers/dma-buf/sync_timeline.c delete mode 100644 drivers/staging/android/sw_sync.c delete mode 100644 drivers/staging/android/sw_sync.h delete mode 100644 drivers/staging/android/sync.c delete mode 100644 drivers/staging/android/sync.h delete mode 100644 drivers/staging/android/sync_debug.c delete mode 100644 drivers/staging/android/trace/sync.h delete mode 100644 drivers/staging/android/uapi/sw_sync.h delete mode 100644 drivers/staging/android/uapi/sync.h create mode 100644 include/linux/sync_file.h create mode 100644 include/linux/sync_timeline.h create mode 100644 include/trace/events/sync_file.h create mode 100644 include/trace/events/sync_timeline.h create mode 100644 include/uapi/linux/sync.h -- 2.5.0
[PATCH 08/11] staging/android: rename android_fence to timeline_fence
From: Gustavo Padovan We are moving out of staging/android, so rename it to a name that is no longer related to Android. Signed-off-by: Gustavo Padovan --- drivers/staging/android/sync.c | 40 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c index 07fe995..ea816dd 100644 --- a/drivers/staging/android/sync.c +++ b/drivers/staging/android/sync.c @@ -28,7 +28,7 @@ #define CREATE_TRACE_POINTS #include "trace/sync.h" -static const struct fence_ops android_fence_ops; +static const struct fence_ops timeline_fence_ops; struct sync_timeline *sync_timeline_create(int size, const char *drv_name, const char *name) @@ -126,7 +126,7 @@ struct fence *sync_pt_create(struct sync_timeline *obj, int size, spin_lock_irqsave(&obj->child_list_lock, flags); sync_timeline_get(obj); - fence_init(fence, &android_fence_ops, &obj->child_list_lock, + fence_init(fence, &timeline_fence_ops, &obj->child_list_lock, obj->context, value); list_add_tail(&fence->child_list, &obj->child_list_head); INIT_LIST_HEAD(&fence->active_list); @@ -135,21 +135,21 @@ struct fence *sync_pt_create(struct sync_timeline *obj, int size, } EXPORT_SYMBOL(sync_pt_create); -static const char *android_fence_get_driver_name(struct fence *fence) +static const char *timeline_fence_get_driver_name(struct fence *fence) { struct sync_timeline *parent = fence_parent(fence); return parent->drv_name; } -static const char *android_fence_get_timeline_name(struct fence *fence) +static const char *timeline_fence_get_timeline_name(struct fence *fence) { struct sync_timeline *parent = fence_parent(fence); return parent->name; } -static void android_fence_release(struct fence *fence) +static void timeline_fence_release(struct fence *fence) { struct sync_timeline *parent = fence_parent(fence); unsigned long flags; @@ -164,25 +164,25 @@ static void android_fence_release(struct fence *fence) fence_free(fence); } -static bool 
android_fence_signaled(struct fence
*fence) +static bool timeline_fence_signaled(struct fence *fence) { struct sync_timeline *parent = fence_parent(fence); return (fence->seqno > parent->value) ? false : true; } -static bool android_fence_enable_signaling(struct fence *fence) +static bool timeline_fence_enable_signaling(struct fence *fence) { struct sync_timeline *parent = fence_parent(fence); - if (android_fence_signaled(fence)) + if (timeline_fence_signaled(fence)) return false; list_add_tail(&fence->active_list, &parent->active_list_head); return true; } -static int android_fence_fill_driver_data(struct fence *fence, +static int timeline_fence_fill_driver_data(struct fence *fence, void *data, int size) { if (size < sizeof(fence->seqno)) @@ -193,13 +193,13 @@ static int android_fence_fill_driver_data(struct fence *fence, return sizeof(fence->seqno); } -static void android_fence_value_str(struct fence *fence, +static void timeline_fence_value_str(struct fence *fence, char *str, int size) { snprintf(str, size, "%d", fence->seqno); } -static void android_fence_timeline_value_str(struct fence *fence, +static void timeline_fence_timeline_value_str(struct fence *fence, char *str, int size) { struct sync_timeline *parent = fence_parent(fence); @@ -207,15 +207,15 @@ static void android_fence_timeline_value_str(struct fence *fence, snprintf(str, size, "%d", parent->value); } -static const struct fence_ops android_fence_ops = { - .get_driver_name = android_fence_get_driver_name, - .get_timeline_name = android_fence_get_timeline_name, - .enable_signaling = android_fence_enable_signaling, - .signaled = android_fence_signaled, +static const struct fence_ops timeline_fence_ops = { + .get_driver_name = timeline_fence_get_driver_name, + .get_timeline_name = timeline_fence_get_timeline_name, + .enable_signaling = timeline_fence_enable_signaling, + .signaled = timeline_fence_signaled, .wait = fence_default_wait, - .release = android_fence_release, - .fill_driver_data = android_fence_fill_driver_data, - 
.fence_value_str = android_fence_value_str, - .timeline_value_str = android_fence_timeline_value_str, + .release = timeline_fence_release, + .fill_driver_data = timeline_fence_fill_driver_data, + .fence_value_str = timeline_fence_value_str, + .timeline_value_str = timeline_fence_timeline_value_str, }; -- 2.5.0
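The renamed fence ops rest on one rule. A fence on a timeline counts as signaled once the timeline's last signaled value has reached the fence's seqno. Incrementing the timeline, as sw_sync_timeline_inc() does through sync_timeline_signal(), advances that value. The userspace model below is a sketch of that rule only, assuming sync_timeline_signal() adds inc to the stored value. model_timeline, model_fence and the helpers are hypothetical stand-ins that use uint32_t throughout. The model leaves out the locking, the active list and the callbacks that run when a fence signals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sync_timeline. */
struct model_timeline {
	uint32_t value;         /* last signaled value */
};

/* Hypothetical stand-in for a fence created on that timeline. */
struct model_fence {
	const struct model_timeline *parent;
	uint32_t seqno;         /* timeline point this fence waits for */
};

/* Same comparison as timeline_fence_signaled(), without the ternary. */
static bool model_fence_signaled(const struct model_fence *f)
{
	return f->seqno <= f->parent->value;
}

/* Advance the timeline by inc, as the sw_sync increment path does. */
static void model_timeline_inc(struct model_timeline *tl, uint32_t inc)
{
	tl->value += inc;
}
```

Like the original comparison, this one does not account for the counter wrapping past UINT32_MAX.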
[PATCH 06/11] staging/android: remove sw_sync_timeline and sw_sync_pt
From: Gustavo Padovan Now that value storage has moved to sync_timeline and fence, these two structs are no longer needed and can be removed. Signed-off-by: Gustavo Padovan --- drivers/staging/android/sw_sync.c| 24 +++- drivers/staging/android/sw_sync.h| 24 ++-- drivers/staging/android/sync_debug.c | 12 ++-- 3 files changed, 19 insertions(+), 41 deletions(-) diff --git a/drivers/staging/android/sw_sync.c b/drivers/staging/android/sw_sync.c index c5e92c6..461dbd9 100644 --- a/drivers/staging/android/sw_sync.c +++ b/drivers/staging/android/sw_sync.c @@ -25,31 +25,21 @@ #include "sw_sync.h" -struct fence *sw_sync_pt_create(struct sw_sync_timeline *obj, u32 value) +struct fence *sw_sync_pt_create(struct sync_timeline *obj, u32 value) { - struct sw_sync_pt *pt; - - pt = (struct sw_sync_pt *) - sync_pt_create(&obj->obj, sizeof(struct sw_sync_pt), value); - - pt->value = value; - - return (struct fence *)pt; + return sync_pt_create(obj, sizeof(struct fence), value); } EXPORT_SYMBOL(sw_sync_pt_create); -struct sw_sync_timeline *sw_sync_timeline_create(const char *name) +struct sync_timeline *sw_sync_timeline_create(const char *name) { - struct sw_sync_timeline *obj = (struct sw_sync_timeline *) - sync_timeline_create(sizeof(struct sw_sync_timeline), -"sw_sync", name); - - return obj; + return sync_timeline_create(sizeof(struct sync_timeline), + "sw_sync", name); } EXPORT_SYMBOL(sw_sync_timeline_create); -void sw_sync_timeline_inc(struct sw_sync_timeline *obj, u32 inc) +void sw_sync_timeline_inc(struct sync_timeline *obj, u32 inc) { - sync_timeline_signal(&obj->obj, inc); + sync_timeline_signal(obj, inc); } EXPORT_SYMBOL(sw_sync_timeline_inc); diff --git a/drivers/staging/android/sw_sync.h b/drivers/staging/android/sw_sync.h index e18667b..9f26c62 100644 --- a/drivers/staging/android/sw_sync.h +++ b/drivers/staging/android/sw_sync.h @@ -22,34 +22,22 @@ #include "sync.h" #include "uapi/sw_sync.h" -struct sw_sync_timeline { - struct sync_timeline obj; - - 
u32 value; -}; - -struct
sw_sync_pt { - struct fencept; - - u32 value; -}; - #if IS_ENABLED(CONFIG_SW_SYNC) -struct sw_sync_timeline *sw_sync_timeline_create(const char *name); -void sw_sync_timeline_inc(struct sw_sync_timeline *obj, u32 inc); +struct sync_timeline *sw_sync_timeline_create(const char *name); +void sw_sync_timeline_inc(struct sync_timeline *obj, u32 inc); -struct fence *sw_sync_pt_create(struct sw_sync_timeline *obj, u32 value); +struct fence *sw_sync_pt_create(struct sync_timeline *obj, u32 value); #else -static inline struct sw_sync_timeline *sw_sync_timeline_create(const char *name) +static inline struct sync_timeline *sw_sync_timeline_create(const char *name) { return NULL; } -static inline void sw_sync_timeline_inc(struct sw_sync_timeline *obj, u32 inc) +static inline void sw_sync_timeline_inc(struct sync_timeline *obj, u32 inc) { } -static inline struct fence *sw_sync_pt_create(struct sw_sync_timeline *obj, +static inline struct fence *sw_sync_pt_create(struct sync_timeline *obj, u32 value) { return NULL; diff --git a/drivers/staging/android/sync_debug.c b/drivers/staging/android/sync_debug.c index 26ba9e86..e984955 100644 --- a/drivers/staging/android/sync_debug.c +++ b/drivers/staging/android/sync_debug.c @@ -211,7 +211,7 @@ static const struct file_operations sync_info_debugfs_fops = { /* opening sw_sync create a new sync obj */ static int sw_sync_debugfs_open(struct inode *inode, struct file *file) { - struct sw_sync_timeline *obj; + struct sync_timeline *obj; char task_comm[TASK_COMM_LEN]; get_task_comm(task_comm, current); @@ -227,13 +227,13 @@ static int sw_sync_debugfs_open(struct inode *inode, struct file *file) static int sw_sync_debugfs_release(struct inode *inode, struct file *file) { - struct sw_sync_timeline *obj = file->private_data; + struct sync_timeline *obj = file->private_data; - sync_timeline_destroy(&obj->obj); + sync_timeline_destroy(obj); return 0; } -static long sw_sync_ioctl_create_fence(struct sw_sync_timeline *obj, +static long 
sw_sync_ioctl_create_fence(struct sync_timeline *obj, unsigned long arg) { int fd = get_unused_fd_flags(O_CLOEXEC); @@ -280,7 +280,7 @@ err: return err; } -static long sw_sync_ioctl_inc(struct sw_sync_timeline *obj, unsigned long arg) +static long sw_sync_ioctl_inc(struct sync_timeline *obj, unsigned long arg) { u32 value; @@ -295,7 +295,7 @@ static long sw_sync_ioctl_inc(struct sw_sync_timeline *obj, unsigned long arg) static long sw_sync_ioctl(struct file *file
[PATCH 10/11] dma-buf/sync_file: bring debug back to sync file
From: Gustavo Padovan Enable reports of sync_files through /sync/info Signed-off-by: Gustavo Padovan --- drivers/dma-buf/sync_file.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c index 92474dd..aa1215d 100644 --- a/drivers/dma-buf/sync_file.c +++ b/drivers/dma-buf/sync_file.c @@ -29,6 +29,8 @@ #define CREATE_TRACE_POINTS #include +#include "sync_debug.h" + static const struct file_operations sync_file_fops; static struct sync_file *sync_file_alloc(int size, const char *name) @@ -87,6 +89,8 @@ struct sync_file *sync_file_create(const char *name, struct fence *fence) fence_check_cb_func)) atomic_dec(&sync_file->status); + sync_file_debug_add(sync_file); + return sync_file; } EXPORT_SYMBOL(sync_file_create); @@ -188,6 +192,7 @@ struct sync_file *sync_file_merge(const char *name, atomic_sub(num_fences - i, &sync_file->status); sync_file->num_fences = i; + sync_file_debug_add(sync_file); return sync_file; } EXPORT_SYMBOL(sync_file_merge); @@ -246,6 +251,8 @@ static int sync_file_release(struct inode *inode, struct file *file) { struct sync_file *sync_file = file->private_data; + sync_file_debug_remove(sync_file); + kref_put(&sync_file->kref, sync_file_free); return 0; } -- 2.5.0
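This patch registers every sync_file for debug reporting once it is created or merged and unregisters it on release. The sketch below assumes the global-list-plus-lock pattern that the truncated sync_debug.c in patch 09 hints at, since it declares sync_timeline_list_head and sync_timeline_list_lock. It is a userspace model with hypothetical names. A pthread mutex stands in for the kernel spinlock, and a hand-rolled circular list stands in for struct list_head.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical intrusive list node, standing in for struct list_head. */
struct model_node {
	struct model_node *prev, *next;
};

/* Hypothetical stand-in for struct sync_file; only the linkage matters. */
struct model_sync_file {
	const char *name;
	struct model_node debug_node;
};

/* Global registry: an empty circular list head and the lock guarding it. */
static struct model_node registry = { &registry, &registry };
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add at the tail once the object is fully set up (cf. sync_file_debug_add()). */
static void model_debug_add(struct model_sync_file *f)
{
	pthread_mutex_lock(&registry_lock);
	f->debug_node.prev = registry.prev;
	f->debug_node.next = &registry;
	registry.prev->next = &f->debug_node;
	registry.prev = &f->debug_node;
	pthread_mutex_unlock(&registry_lock);
}

/* Unlink before the object is freed (cf. sync_file_debug_remove()). */
static void model_debug_remove(struct model_sync_file *f)
{
	pthread_mutex_lock(&registry_lock);
	f->debug_node.prev->next = f->debug_node.next;
	f->debug_node.next->prev = f->debug_node.prev;
	f->debug_node.prev = f->debug_node.next = NULL;
	pthread_mutex_unlock(&registry_lock);
}

/* What a debugfs "info" reader would do: walk the list under the lock. */
static size_t model_debug_count(void)
{
	size_t n = 0;

	pthread_mutex_lock(&registry_lock);
	for (struct model_node *p = registry.next; p != &registry; p = p->next)
		n++;
	pthread_mutex_unlock(&registry_lock);
	return n;
}
```

The ordering in the patch matters for the same reason it matters in the model. Adding only after creation or merge has finished, and removing before kref_put() can free the object, keeps a concurrent reader from seeing a half-built or freed entry.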
Re: [f2fs-dev] [PATCH 1/2] f2fs: avoid multiple node page writes due to inline_data
Hi Chao, On Tue, Jan 26, 2016 at 02:58:53PM +0800, Chao Yu wrote: > Hi Jaegeuk, > > > -Original Message- > > From: Jaegeuk Kim [mailto:jaeg...@kernel.org] > > Sent: Tuesday, January 26, 2016 3:18 AM > > To: Chao Yu > > Cc: linux-kernel@vger.kernel.org; linux-fsde...@vger.kernel.org; > > linux-f2fs-de...@lists.sourceforge.net > > Subject: Re: [f2fs-dev] [PATCH 1/2] f2fs: avoid multiple node page writes > > due to inline_data > > > > Hi Chao, > > > > On Mon, Jan 25, 2016 at 05:42:40PM +0800, Chao Yu wrote: > > > Hi Jaegeuk, > > > > > > > -Original Message- > > > > From: Jaegeuk Kim [mailto:jaeg...@kernel.org] > > > > Sent: Sunday, January 24, 2016 4:16 AM > > > > To: linux-kernel@vger.kernel.org; linux-fsde...@vger.kernel.org; > > > > linux-f2fs-de...@lists.sourceforge.net > > > > Cc: Jaegeuk Kim > > > > Subject: [f2fs-dev] [PATCH 1/2] f2fs: avoid multiple node page writes > > > > due to inline_data > > > > > > > > The sceanrio is: > > > > 1. create fully node blocks > > > > 2. flush node blocks > > > > 3. write inline_data for all the node blocks again > > > > 4. 
flush node blocks redundantly > > > > > > > > Signed-off-by: Jaegeuk Kim > > > > --- > > > > fs/f2fs/data.c | 14 +++--- > > > > 1 file changed, 11 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c > > > > index 8d0d9ec..011456e 100644 > > > > --- a/fs/f2fs/data.c > > > > +++ b/fs/f2fs/data.c > > > > @@ -1622,14 +1622,22 @@ static int f2fs_write_end(struct file *file, > > > > > > > > trace_f2fs_write_end(inode, pos, len, copied); > > > > > > > > - set_page_dirty(page); > > > > - > > > > if (pos + copied > i_size_read(inode)) { > > > > i_size_write(inode, pos + copied); > > > > mark_inode_dirty(inode); > > > > - update_inode_page(inode); > > > > } > > > > > > > > + if (f2fs_has_inline_data(inode) && > > > > + is_inode_flag_set(F2FS_I(inode), > > > > FI_DATA_EXIST)) { > > > > + int err = f2fs_write_inline_data(inode, page); > > > > > > Oh, I'm sure this can fix that issue, but IMO: > > > a) this implementation has side-effect, it triggers inline data copying > > > between data page and node page whenever user write inline datas, so if > > > user updates inline data frequently, write-through approach would cause > > > memory copy overhead. > > > > Agreed. > > > > > b) inline storm should be a rare case, as we didn't get any report about > > > problem for long time until Dave's, and write_end is a hot path, I think > > > it's better to be cautious to change our inline data cache policy for > > > fixing a rare issue in hot path. > > > > > > What about delaying the merge operation? like: > > > 1) as I proposed before, merging inline page into inode page when > > > detecting free_sections <= (node_secs + 2 * dent_secs + inline_secs). > > > 2) merge inline page into inode page before writeback inode page in > > > sync_node_pages. > > > > Okay, I'm thinking more general way where we can get rid of every > > inlien_data > > write when we flush node pages. > > I encountered deadlock issue, could you have a look at it? 
Yeah, I've been stabilizing this for a while. Please check f2fs.git/dev-test. Thanks, > > == > [ INFO: possible circular locking dependency detected ] > 4.5.0-rc1 #45 Tainted: G O > --- > fstrim/15301 is trying to acquire lock: > (sb_internal#2){..}, at: [] __sb_start_write+0xda/0xf0 > > but task is already holding lock: > (&sbi->cp_rwsem){..}, at: [] > block_operations+0x82/0x130 [f2fs] > > which lock already depends on the new lock. > > > the existing dependency chain (in reverse order) is: > > -> #1 (&sbi->cp_rwsem){..}: > [] lock_acquire+0xb7/0x130 > [] down_read+0x39/0x50 > [] f2fs_evict_inode+0x26f/0x370 [f2fs] > [] evict+0xdd/0x1d0 > [] iput+0x19f/0x250 > [] do_unlinkat+0x20d/0x310 > [] SyS_unlinkat+0x22/0x40 > [] entry_SYSCALL_64_fastpath+0x12/0x6f > > -> #0 (sb_internal#2){..}: > [] __lock_acquire+0x132b/0x1770 > [] lock_acquire+0xb7/0x130 > [] percpu_down_read+0x3c/0x80 > [] __sb_start_write+0xda/0xf0 > [] f2fs_evict_inode+0x221/0x370 [f2fs] > [] evict+0xdd/0x1d0 > [] iput+0x19f/0x250 > [] sync_node_pages+0x703/0x900 [f2fs] > [] block_operations+0x10a/0x130 [f2fs] > [] write_checkpoint+0xc4/0xb80 [f2fs] > [] f2fs_trim_fs+0x122/0x1d0 [f2fs] > [] f2fs_ioctl+0x7fa/0x9d0 [f2fs] > [] vfs_ioctl+0x18/0x40 > [] do_vfs_ioctl+0x96/0x680 > [] SyS_ioctl+0x92/0xa0 > [] entry_SYSCALL_64_fastpath+0x12/0x6f > > other info that might help us debug this: > > Possible unsafe locking scenario: > > CPU0
Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
On Tue, 2016-01-26 at 10:25 -0600, Christoph Lameter wrote: > On Mon, 25 Jan 2016, Michal Hocko wrote: > > > > Why would the deferring cause this overhead? > > > > I guess the profile speaks for itself, doesn't it? > > But the system is going idle? Why would this impact performance? We enter/exit idle a lot. Your reluctance to move it seems to suggest that 99.99% of CPUs on the planet chewing up cycles (measured) doing what for most is useless work on every micro-idle is a perfectly fine price to pay to ensure that .01% (or whatever tiny minority) get what they want. I disagree. You're burning electrons for no benefit at all to me on my box. You want to do high speed trading, that's fine, but I expect my box to be able to pop in and out of idle without having to pay a toll to the high speed trading bandits of the world, thank you very much. This specialty thing does not belong in the generic fast path. -Mike
[PATCH 09/11] dma-buf/sync_timeline: de-stage sync_timeline
From: Gustavo Padovan De-stage the remaining bits of the sync framework: sync_timeline and sw_sync, plus some debugging routines. Signed-off-by: Gustavo Padovan --- drivers/dma-buf/Kconfig| 10 + drivers/dma-buf/Makefile | 3 +- drivers/dma-buf/sw_sync.h | 32 +++ drivers/dma-buf/sync_debug.c | 374 + drivers/dma-buf/sync_debug.h | 35 +++ drivers/dma-buf/sync_timeline.c| 222 +++ drivers/staging/android/Kconfig| 20 -- drivers/staging/android/sync.c | 221 --- drivers/staging/android/sync.h | 130 drivers/staging/android/sync_debug.c | 373 drivers/staging/android/trace/sync.h | 32 --- drivers/staging/android/uapi/sw_sync.h | 32 --- include/linux/sync_timeline.h | 114 ++ include/trace/events/sync_timeline.h | 31 +++ 14 files changed, 820 insertions(+), 809 deletions(-) create mode 100644 drivers/dma-buf/sw_sync.h create mode 100644 drivers/dma-buf/sync_debug.c create mode 100644 drivers/dma-buf/sync_debug.h create mode 100644 drivers/dma-buf/sync_timeline.c delete mode 100644 drivers/staging/android/sync.c delete mode 100644 drivers/staging/android/sync.h delete mode 100644 drivers/staging/android/sync_debug.c delete mode 100644 drivers/staging/android/trace/sync.h delete mode 100644 drivers/staging/android/uapi/sw_sync.h create mode 100644 include/linux/sync_timeline.h create mode 100644 include/trace/events/sync_timeline.h diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig index 9824bc4..73df024 100644 --- a/drivers/dma-buf/Kconfig +++ b/drivers/dma-buf/Kconfig @@ -8,4 +8,14 @@ config SYNC_FILE ---help--- This option enables the fence framework synchronization to export sync_files to userspace that can represent one or more fences. + +config SW_SYNC + bool "Software synchronization objects" + default n + depends on SYNC_FILE + ---help--- + A sync object driver that uses a 32bit counter to coordinate + synchronization. Useful when there is no hardware primitive backing + the synchronization.
+ endmenu diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile index 4a424ec..78d8ec4 100644 --- a/drivers/dma-buf/Makefile +++ b/drivers/dma-buf/Makefile @@ -1,2 +1,3 @@ obj-y := dma-buf.o fence.o reservation.o seqno-fence.o -obj-$(CONFIG_SYNC_FILE)+= sync_file.o +obj-$(CONFIG_SYNC_FILE)+= sync_file.o sync_debug.o +obj-$(CONFIG_SW_SYNC) += sync_timeline.o diff --git a/drivers/dma-buf/sw_sync.h b/drivers/dma-buf/sw_sync.h new file mode 100644 index 000..9b5d486 --- /dev/null +++ b/drivers/dma-buf/sw_sync.h @@ -0,0 +1,32 @@ +/* + * Copyright (C) 2012 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#ifndef _UAPI_LINUX_SW_SYNC_H +#define _UAPI_LINUX_SW_SYNC_H + +#include + +struct sw_sync_create_fence_data { + __u32 value; + charname[32]; + __s32 fence; /* fd of new fence */ +}; + +#define SW_SYNC_IOC_MAGIC 'W' + +#define SW_SYNC_IOC_CREATE_FENCE _IOWR(SW_SYNC_IOC_MAGIC, 0,\ + struct sw_sync_create_fence_data) +#define SW_SYNC_IOC_INC_IOW(SW_SYNC_IOC_MAGIC, 1, __u32) + +#endif /* _UAPI_LINUX_SW_SYNC_H */ diff --git a/drivers/dma-buf/sync_debug.c b/drivers/dma-buf/sync_debug.c new file mode 100644 index 000..7da9ff5 --- /dev/null +++ b/drivers/dma-buf/sync_debug.c @@ -0,0 +1,374 @@ +/* + * drivers/base/sync.c + * + * Copyright (C) 2012 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "sync_debug.h" +#include "sw_sync.h" + +#ifdef CONFIG_DEBUG_FS + +static struct dentry *dbgfs; + +static LIST_HEAD(sync_timeline_list_head); +static DEFINE_SPINLOCK(sync_timeline_list_lock); +static LIST_HE
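The sw_sync.h header added by this patch is the ioctl ABI that userspace uses to drive a software timeline. SW_SYNC_IOC_CREATE_FENCE takes a value and a name and, per the header's comment, returns the fd of the new fence in the fence field. SW_SYNC_IOC_INC takes a __u32 increment. The sketch below copies the struct and macros from the header and adds a hypothetical helper that prepares a create request. The struct and macros come from the patch. The helper and the -1 placeholder are assumptions. The file descriptor such a request would be sent to (the sw_sync debugfs file) is not shown here.

```c
#include <linux/ioctl.h>
#include <linux/types.h>
#include <stdio.h>
#include <string.h>

/* Copied from drivers/dma-buf/sw_sync.h as added by this patch. */
struct sw_sync_create_fence_data {
	__u32 value;
	char name[32];
	__s32 fence; /* fd of new fence */
};

#define SW_SYNC_IOC_MAGIC 'W'

#define SW_SYNC_IOC_CREATE_FENCE _IOWR(SW_SYNC_IOC_MAGIC, 0,\
		struct sw_sync_create_fence_data)
#define SW_SYNC_IOC_INC _IOW(SW_SYNC_IOC_MAGIC, 1, __u32)

/* Hypothetical helper: fill a request for SW_SYNC_IOC_CREATE_FENCE.
 * The name is truncated to fit and always NUL-terminated. */
static void sw_sync_prepare_create(struct sw_sync_create_fence_data *data,
				   __u32 value, const char *name)
{
	memset(data, 0, sizeof(*data));
	data->value = value;
	snprintf(data->name, sizeof(data->name), "%s", name);
	data->fence = -1; /* placeholder; the kernel fills in the new fence fd */
}
```

A caller would then pass the prepared struct to ioctl(fd, SW_SYNC_IOC_CREATE_FENCE, &data) on the timeline's fd, and later advance the timeline with ioctl(fd, SW_SYNC_IOC_INC, &inc).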
[PATCH 11/11] dma-buf/sync_file: bring sync_dump() back
From: Gustavo Padovan During the de-stage of the sync framework it was easy to leave sync_dump() out to avoid de-staging all of the debug code early, but now that sync_debug.c has been de-staged, bring sync_dump() back. Signed-off-by: Gustavo Padovan --- drivers/dma-buf/sync_file.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c index aa1215d..fd7e3b9 100644 --- a/drivers/dma-buf/sync_file.c +++ b/drivers/dma-buf/sync_file.c @@ -218,15 +218,19 @@ int sync_file_wait(struct sync_file *sync_file, long timeout) if (ret < 0) { return ret; } else if (ret == 0) { - if (timeout) + if (timeout) { pr_info("sync_file timeout on [%p] after %dms\n", sync_file, jiffies_to_msecs(timeout)); + sync_dump(); + } return -ETIME; } ret = atomic_read(&sync_file->status); - if (ret) + if (ret) { pr_info("sync_file error %ld on [%p]\n", ret, sync_file); + sync_dump(); + } return ret; } -- 2.5.0
[PATCH 03/11] staging/android: remove .fill_driver_data() timeline ops
From: Gustavo Padovan The .fill_driver_data() op was just a useless abstraction over the fence_ops op of the same name. Now that we use fence->seqno to store the value, it is cleaner to remove the abstraction and fill the data directly. Signed-off-by: Gustavo Padovan --- drivers/staging/android/sw_sync.c | 14 -- drivers/staging/android/sync.c| 9 + drivers/staging/android/sync.h| 7 --- 3 files changed, 5 insertions(+), 25 deletions(-) diff --git a/drivers/staging/android/sw_sync.c b/drivers/staging/android/sw_sync.c index b9d53d3..428e22c 100644 --- a/drivers/staging/android/sw_sync.c +++ b/drivers/staging/android/sw_sync.c @@ -38,19 +38,6 @@ struct fence *sw_sync_pt_create(struct sw_sync_timeline *obj, u32 value) } EXPORT_SYMBOL(sw_sync_pt_create); -static int sw_sync_fill_driver_data(struct fence *fence, - void *data, int size) -{ - struct sw_sync_pt *pt = (struct sw_sync_pt *)fence; - - if (size < sizeof(pt->value)) - return -ENOMEM; - - memcpy(data, &pt->value, sizeof(pt->value)); - - return sizeof(pt->value); -} - static void sw_sync_timeline_value_str(struct sync_timeline *sync_timeline, char *str, int size) { @@ -68,7 +55,6 @@ static void sw_sync_fence_value_str(struct fence *fence, char *str, int size) static struct sync_timeline_ops sw_sync_timeline_ops = { .driver_name = "sw_sync", - .fill_driver_data = sw_sync_fill_driver_data, .timeline_value_str = sw_sync_timeline_value_str, .fence_value_str = sw_sync_fence_value_str, }; diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c index 4eea5c3..39af5fb 100644 --- a/drivers/staging/android/sync.c +++ b/drivers/staging/android/sync.c @@ -185,11 +185,12 @@ static bool android_fence_enable_signaling(struct fence *fence) static int android_fence_fill_driver_data(struct fence *fence, void *data, int size) { - struct sync_timeline *parent = fence_parent(fence); + if (size < sizeof(fence->seqno)) + return -ENOMEM; + + memcpy(data, &fence->seqno, sizeof(fence->seqno)); - if
(!parent->ops->fill_driver_data) - return 0; - return parent->ops->fill_driver_data(fence, data, size); + return sizeof(fence->seqno); } static void android_fence_value_str(struct fence *fence, diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h index 4d3dfbf..500838b 100644 --- a/drivers/staging/android/sync.h +++ b/drivers/staging/android/sync.h @@ -27,10 +27,6 @@ struct sync_timeline; /** * struct sync_timeline_ops - sync object implementation ops * @driver_name: name of the implementation - * @fill_driver_data: write implementation specific driver data to data. - * should return an error if there is not enough room - * as specified by size. This information is returned - * to userspace by SYNC_IOC_FENCE_INFO. * @timeline_value_str: fill str with the value of the sync_timeline's counter * @fence_value_str: fill str with the value of the fence */ @@ -38,9 +34,6 @@ struct sync_timeline_ops { const char *driver_name; /* optional */ - int (*fill_driver_data)(struct fence *fence, void *data, int size); - - /* optional */ void (*timeline_value_str)(struct sync_timeline *timeline, char *str, int size); -- 2.5.0
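A minimal userspace sketch of what the reworked android_fence_fill_driver_data() does: copy the fence's seqno into the caller's buffer, or return -ENOMEM when the buffer is too small. struct demo_fence and demo_fill_driver_data() are hypothetical stand-ins, not kernel API. The explicit (int) cast is a choice made here so that a negative size is rejected instead of being converted to a large unsigned value when compared with sizeof.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct fence; only seqno matters. */
struct demo_fence {
	unsigned int seqno;
};

/*
 * Copy the fence's seqno into @data, or fail with -ENOMEM if @size
 * bytes cannot hold it. Returns the number of bytes written.
 */
static int demo_fill_driver_data(const struct demo_fence *fence,
				 void *data, int size)
{
	/* Cast keeps a negative size from being promoted to a huge unsigned. */
	if (size < (int)sizeof(fence->seqno))
		return -ENOMEM;

	memcpy(data, &fence->seqno, sizeof(fence->seqno));

	return sizeof(fence->seqno);
}
```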
[PATCH 05/11] staging/android: remove struct sync_timeline_ops
From: Gustavo Padovan

Move drv_name, the last field of sync_timeline_ops, to sync_timeline and remove sync_timeline_ops. struct sync_timeline_ops was just an extra abstraction on top of fence_ops, and in the last few commits we removed all its ops in favor of cleaner fence_ops.

Signed-off-by: Gustavo Padovan
--- drivers/staging/android/sw_sync.c| 9 ++--- drivers/staging/android/sync.c | 8 drivers/staging/android/sync.h | 28 +--- drivers/staging/android/sync_debug.c | 3 +-- 4 files changed, 16 insertions(+), 32 deletions(-) diff --git a/drivers/staging/android/sw_sync.c b/drivers/staging/android/sw_sync.c index 4200b12..c5e92c6 100644 --- a/drivers/staging/android/sw_sync.c +++ b/drivers/staging/android/sw_sync.c @@ -38,16 +38,11 @@ struct fence *sw_sync_pt_create(struct sw_sync_timeline *obj, u32 value) } EXPORT_SYMBOL(sw_sync_pt_create); -static struct sync_timeline_ops sw_sync_timeline_ops = { - .driver_name = "sw_sync", -}; - struct sw_sync_timeline *sw_sync_timeline_create(const char *name) { struct sw_sync_timeline *obj = (struct sw_sync_timeline *) - sync_timeline_create(&sw_sync_timeline_ops, -sizeof(struct sw_sync_timeline), -name); + sync_timeline_create(sizeof(struct sw_sync_timeline), +"sw_sync", name); return obj; } diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c index f2d298c..07fe995 100644 --- a/drivers/staging/android/sync.c +++ b/drivers/staging/android/sync.c @@ -30,8 +30,8 @@ static const struct fence_ops android_fence_ops; -struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops, - int size, const char *name) +struct sync_timeline *sync_timeline_create(int size, const char *drv_name, + const char *name) { struct sync_timeline *obj; @@ -43,9 +43,9 @@ struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops, return NULL; kref_init(&obj->kref); - obj->ops = ops; obj->context = fence_context_alloc(1); strlcpy(obj->name, name, sizeof(obj->name)); + strlcpy(obj->drv_name,
drv_name, sizeof(obj->drv_name)); INIT_LIST_HEAD(&obj->child_list_head); INIT_LIST_HEAD(&obj->active_list_head); @@ -139,7 +139,7 @@ static const char *android_fence_get_driver_name(struct fence *fence) { struct sync_timeline *parent = fence_parent(fence); - return parent->ops->driver_name; + return parent->drv_name; } static const char *android_fence_get_timeline_name(struct fence *fence) diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h index b1a4b06..be94a80 100644 --- a/drivers/staging/android/sync.h +++ b/drivers/staging/android/sync.h @@ -22,20 +22,10 @@ #include #include -struct sync_timeline; - -/** - * struct sync_timeline_ops - sync object implementation ops - * @driver_name: name of the implementation - */ -struct sync_timeline_ops { - const char *driver_name; -}; - /** * struct sync_timeline - sync object * @kref: reference count on fence. - * @ops: ops that define the implementation of the sync_timeline + * @drv_name: drv_name of the driver using the sync_timeline * @name: name of the sync_timeline. Useful for debugging * @destroyed: set when sync_timeline is destroyed * @child_list_head: list of children sync_pts for this sync_timeline @@ -46,7 +36,7 @@ struct sync_timeline_ops { */ struct sync_timeline { struct kref kref; - const struct sync_timeline_ops *ops; + chardrv_name[32]; charname[32]; /* protected by child_list_lock */ @@ -75,17 +65,17 @@ static inline struct sync_timeline *fence_parent(struct fence *fence) /** * sync_timeline_create() - creates a sync object - * @ops: specifies the implementation ops for the object * @size: size to allocate for this obj + * @drv_name: sync_timeline driver name * @name: sync_timeline name * - * Creates a new sync_timeline which will use the implementation specified by - * @ops. @size bytes will be allocated allowing for implementation specific - * data to be kept after the generic sync_timeline struct. Returns the - * sync_timeline object or NULL in case of error. 
+ * Creates a new sync_timeline. @size bytes will be allocated allowing + * for implementation specific data to be kept after the generic + * sync_timeline struct. Returns the sync_timeline object or NULL in + * case of error. */ -struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops, - int size, const char *
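A rough userspace sketch of the reworked sync_timeline_create(size, drv_name, name) contract: @size bytes are allocated so a driver can place private data after the generic struct, and both names are copied into fixed 32-byte arrays with truncation. All demo_* names are hypothetical, calloc() stands in for the kernel allocation, and demo_copy_name() only approximates strlcpy().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct sync_timeline. */
struct demo_timeline {
	char drv_name[32];
	char name[32];
	unsigned int value;
};

/* A driver timeline embeds the generic one as its first member. */
struct demo_driver_timeline {
	struct demo_timeline obj;
	int driver_private;	/* implementation-specific data */
};

/* Approximates strlcpy(): truncates if needed, always NUL-terminates.
 * Assumes dst_size > 0. */
static void demo_copy_name(char *dst, size_t dst_size, const char *src)
{
	size_t len = strlen(src);

	if (len >= dst_size)
		len = dst_size - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/* Allocate @size bytes (at least the generic struct) and fill in names. */
static struct demo_timeline *demo_timeline_create(size_t size,
						  const char *drv_name,
						  const char *name)
{
	struct demo_timeline *obj;

	if (size < sizeof(*obj))
		return NULL;

	obj = calloc(1, size);	/* zero-filled, so private data starts at 0 */
	if (!obj)
		return NULL;

	demo_copy_name(obj->drv_name, sizeof(obj->drv_name), drv_name);
	demo_copy_name(obj->name, sizeof(obj->name), name);

	return obj;
}
```

A driver would call it with sizeof(struct demo_driver_timeline) and cast the result back to its own type, which works because the generic struct is the first member.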
[PATCH 07/11] staging/android: remove sw_sync.[ch] files
From: Gustavo Padovan We can glue the sw_sync file operations directly on the sync framework without the need to pass through sw_sync wrappers. It only builds sw_sync debugfs file support if CONFIG_SW_SYNC is enabled. Signed-off-by: Gustavo Padovan --- drivers/staging/android/Makefile | 1 - drivers/staging/android/sw_sync.c| 45 -- drivers/staging/android/sw_sync.h| 47 drivers/staging/android/sync_debug.c | 17 ++--- 4 files changed, 13 insertions(+), 97 deletions(-) delete mode 100644 drivers/staging/android/sw_sync.c delete mode 100644 drivers/staging/android/sw_sync.h diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile index c7b6c99..2c1d97f 100644 --- a/drivers/staging/android/Makefile +++ b/drivers/staging/android/Makefile @@ -7,4 +7,3 @@ obj-$(CONFIG_ANDROID_TIMED_OUTPUT) += timed_output.o obj-$(CONFIG_ANDROID_TIMED_GPIO) += timed_gpio.o obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)+= lowmemorykiller.o obj-$(CONFIG_SYNC) += sync.o sync_debug.o -obj-$(CONFIG_SW_SYNC) += sw_sync.o diff --git a/drivers/staging/android/sw_sync.c b/drivers/staging/android/sw_sync.c deleted file mode 100644 index 461dbd9..000 --- a/drivers/staging/android/sw_sync.c +++ /dev/null @@ -1,45 +0,0 @@ -/* - * drivers/base/sw_sync.c - * - * Copyright (C) 2012 Google, Inc. - * - * This software is licensed under the terms of the GNU General Public - * License version 2, as published by the Free Software Foundation, and - * may be copied, distributed, and modified under those terms. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. 
- * - */ - -#include -#include -#include -#include -#include -#include -#include -#include - -#include "sw_sync.h" - -struct fence *sw_sync_pt_create(struct sync_timeline *obj, u32 value) -{ - return sync_pt_create(obj, sizeof(struct fence), value); -} -EXPORT_SYMBOL(sw_sync_pt_create); - -struct sync_timeline *sw_sync_timeline_create(const char *name) -{ - return sync_timeline_create(sizeof(struct sync_timeline), - "sw_sync", name); -} -EXPORT_SYMBOL(sw_sync_timeline_create); - -void sw_sync_timeline_inc(struct sync_timeline *obj, u32 inc) -{ - sync_timeline_signal(obj, inc); -} -EXPORT_SYMBOL(sw_sync_timeline_inc); diff --git a/drivers/staging/android/sw_sync.h b/drivers/staging/android/sw_sync.h deleted file mode 100644 index 9f26c62..000 --- a/drivers/staging/android/sw_sync.h +++ /dev/null @@ -1,47 +0,0 @@ -/* - * include/linux/sw_sync.h - * - * Copyright (C) 2012 Google, Inc. - * - * This software is licensed under the terms of the GNU General Public - * License version 2, as published by the Free Software Foundation, and - * may be copied, distributed, and modified under those terms. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. 
- * - */ - -#ifndef _LINUX_SW_SYNC_H -#define _LINUX_SW_SYNC_H - -#include -#include -#include "sync.h" -#include "uapi/sw_sync.h" - -#if IS_ENABLED(CONFIG_SW_SYNC) -struct sync_timeline *sw_sync_timeline_create(const char *name); -void sw_sync_timeline_inc(struct sync_timeline *obj, u32 inc); - -struct fence *sw_sync_pt_create(struct sync_timeline *obj, u32 value); -#else -static inline struct sync_timeline *sw_sync_timeline_create(const char *name) -{ - return NULL; -} - -static inline void sw_sync_timeline_inc(struct sync_timeline *obj, u32 inc) -{ -} - -static inline struct fence *sw_sync_pt_create(struct sync_timeline *obj, - u32 value) -{ - return NULL; -} -#endif /* IS_ENABLED(CONFIG_SW_SYNC) */ - -#endif /* _LINUX_SW_SYNC_H */ diff --git a/drivers/staging/android/sync_debug.c b/drivers/staging/android/sync_debug.c index e984955..9312e6f 100644 --- a/drivers/staging/android/sync_debug.c +++ b/drivers/staging/android/sync_debug.c @@ -28,7 +28,11 @@ #include #include #include -#include "sw_sync.h" +#include +#include + +#include "uapi/sw_sync.h" +#include "sync.h" #ifdef CONFIG_DEBUG_FS @@ -202,6 +206,7 @@ static const struct file_operations sync_info_debugfs_fops = { .release= single_release, }; +#if IS_ENABLED(CONFIG_SW_SYNC) /* * *WARNING* * @@ -216,7 +221,7 @@ static int sw_sync_debugfs_open(struct inode *inode, struct file *file) get_task_comm(task_comm, current); - obj = sw_sync_timeline_create(task_comm); + obj = sync_timeline_create(sizeof(*obj), "sw_sync", task_comm); if (!obj)
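The deleted sw_sync.h used the usual config-stub idiom: real prototypes when the option is enabled, static inline stubs returning NULL otherwise, so callers compile either way. A minimal sketch of that idiom, assuming a hypothetical CONFIG_DEMO_SW_SYNC macro in place of the kernel's IS_ENABLED(CONFIG_SW_SYNC). This patch drops the stubs and instead wraps the sw_sync debugfs code itself in #if IS_ENABLED(CONFIG_SW_SYNC).

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sync_timeline. */
struct demo_timeline {
	unsigned int value;
};

#ifdef CONFIG_DEMO_SW_SYNC
/* Option enabled: the real implementation lives in another file. */
struct demo_timeline *demo_sw_sync_timeline_create(const char *name);
#else
/* Option disabled: callers still compile and simply get NULL back. */
static inline struct demo_timeline *
demo_sw_sync_timeline_create(const char *name)
{
	(void)name;
	return NULL;
}
#endif
```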
[PATCH 01/11] dma-buf/sync_file: de-stage sync_file
From: Gustavo Padovan sync_file is useful to connect one or more fences to the file. The file is used by userspace to track fences. Signed-off-by: Gustavo Padovan --- drivers/Kconfig | 2 + drivers/dma-buf/Kconfig | 11 + drivers/dma-buf/Makefile | 1 + drivers/dma-buf/sync_file.c | 440 +++ drivers/staging/android/Kconfig | 1 + drivers/staging/android/sync.c | 419 - drivers/staging/android/sync.h | 105 + drivers/staging/android/sync_debug.c | 1 + drivers/staging/android/trace/sync.h | 44 drivers/staging/android/uapi/sync.h | 97 include/linux/sync_file.h| 123 ++ include/trace/events/sync_file.h | 57 + include/uapi/linux/sync.h| 97 13 files changed, 735 insertions(+), 663 deletions(-) create mode 100644 drivers/dma-buf/Kconfig create mode 100644 drivers/dma-buf/sync_file.c delete mode 100644 drivers/staging/android/uapi/sync.h create mode 100644 include/linux/sync_file.h create mode 100644 include/trace/events/sync_file.h create mode 100644 include/uapi/linux/sync.h diff --git a/drivers/Kconfig b/drivers/Kconfig index d2ac339..430f761 100644 --- a/drivers/Kconfig +++ b/drivers/Kconfig @@ -114,6 +114,8 @@ source "drivers/rtc/Kconfig" source "drivers/dma/Kconfig" +source "drivers/dma-buf/Kconfig" + source "drivers/dca/Kconfig" source "drivers/auxdisplay/Kconfig" diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig new file mode 100644 index 000..9824bc4 --- /dev/null +++ b/drivers/dma-buf/Kconfig @@ -0,0 +1,11 @@ +menu "DMABUF options" + +config SYNC_FILE + bool "sync_file support for fences" + default n + select ANON_INODES + select DMA_SHARED_BUFFER + ---help--- + This option enables the fence framework synchronization to export + sync_files to userspace that can represent one or more fences. 
+endmenu diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile index 57a675f..4a424ec 100644 --- a/drivers/dma-buf/Makefile +++ b/drivers/dma-buf/Makefile @@ -1 +1,2 @@ obj-y := dma-buf.o fence.o reservation.o seqno-fence.o +obj-$(CONFIG_SYNC_FILE)+= sync_file.o diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c new file mode 100644 index 000..92474dd --- /dev/null +++ b/drivers/dma-buf/sync_file.c @@ -0,0 +1,440 @@ +/* + * drivers/dma-buf/sync_file.c + * + * Copyright (C) 2012 Google, Inc. + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define CREATE_TRACE_POINTS +#include + +static const struct file_operations sync_file_fops; + +static struct sync_file *sync_file_alloc(int size, const char *name) +{ + struct sync_file *sync_file; + + sync_file = kzalloc(size, GFP_KERNEL); + if (!sync_file) + return NULL; + + sync_file->file = anon_inode_getfile("sync_file", &sync_file_fops, +sync_file, 0); + if (IS_ERR(sync_file->file)) + goto err; + + kref_init(&sync_file->kref); + strlcpy(sync_file->name, name, sizeof(sync_file->name)); + + init_waitqueue_head(&sync_file->wq); + + return sync_file; + +err: + kfree(sync_file); + return NULL; +} + +static void fence_check_cb_func(struct fence *f, struct fence_cb *cb) +{ + struct sync_file_cb *check; + struct sync_file *sync_file; + + check = container_of(cb, struct sync_file_cb, cb); + sync_file = check->sync_file; + + if (atomic_dec_and_test(&sync_file->status)) + 
wake_up_all(&sync_file->wq); +} + +/* TODO: implement a create which takes more that one fence */ +struct sync_file *sync_file_create(const char *name, struct fence *fence) +{ + struct sync_file *sync_file; + + sync_file = sync_file_alloc(offsetof(struct sync_file, cbs[1]), + name); + if (!sync_file) + return NULL; + + sync_file->num_fences = 1; + atomic_set(&sync_file->status, 1); + + sync_file->cbs[0].fence = fence; + sync_file->cbs[0].sync_file = sync_file; + if (fence_add_callback(fence, &sync_file->cbs[0].cb, + fence_check_cb_func)) + atomic_dec(&sync_file->status); + + return sync_file; +} +EXPO
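sync_file_create() sets sync_file->status to the number of fences, and fence_check_cb_func() wakes waiters only when atomic_dec_and_test() drops it to zero. A minimal single-threaded userspace model of that counting, assuming C11 <stdatomic.h>; the woken flag is a hypothetical stand-in for wake_up_all(&sync_file->wq). In the patch, if fence_add_callback() fails because the fence has already signaled, the counter is decremented immediately instead.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of the sync_file "status" counter. */
struct demo_sync_file {
	atomic_int status;	/* number of fences not yet signaled */
	bool woken;		/* stands in for wake_up_all(&sync_file->wq) */
};

static void demo_sync_file_init(struct demo_sync_file *sf, int num_fences)
{
	atomic_init(&sf->status, num_fences);
	sf->woken = false;
}

/*
 * Called once per fence as it signals, like fence_check_cb_func().
 * atomic_fetch_sub() returns the previous value, so a result of 1 means
 * this call dropped the counter to zero, as atomic_dec_and_test() reports.
 */
static void demo_fence_signaled(struct demo_sync_file *sf)
{
	if (atomic_fetch_sub(&sf->status, 1) == 1)
		sf->woken = true;
}
```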
Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
On Tue, 26 Jan 2016, Mike Galbraith wrote:

> I disagree. You're burning electrons for no benefit at all to me on my
> box. You want to do high speed trading, that's fine, but I expect my
> box to be able to pop in and out of idle without having to pay a toll
> to the high speed trading bandits of the world, thank you very much.
>
> This specialty thing does not belong in the generic fast path.

The system going idle is a fastpath. Mind boggling.
[PATCH 04/11] staging/android: remove .{fence,timeline}_value_str() from timeline_ops
From: Gustavo Padovan

Now that the values of the fence and the timeline are no longer stored by sw_sync, we can remove this extra abstraction for retrieving them. This patch changes both fence_ops (.fence_value_str and .timeline_value_str) to fill in the string directly. It also cleans up struct sync_timeline_ops by removing both ops from it.

Signed-off-by: Gustavo Padovan
--- drivers/staging/android/sw_sync.c| 17 - drivers/staging/android/sync.c | 16 ++-- drivers/staging/android/sync.h | 9 - drivers/staging/android/sync_debug.c | 12 ++-- drivers/staging/android/trace/sync.h | 12 +++- 5 files changed, 7 insertions(+), 59 deletions(-) diff --git a/drivers/staging/android/sw_sync.c b/drivers/staging/android/sw_sync.c index 428e22c..4200b12 100644 --- a/drivers/staging/android/sw_sync.c +++ b/drivers/staging/android/sw_sync.c @@ -38,25 +38,8 @@ struct fence *sw_sync_pt_create(struct sw_sync_timeline *obj, u32 value) } EXPORT_SYMBOL(sw_sync_pt_create); -static void sw_sync_timeline_value_str(struct sync_timeline *sync_timeline, - char *str, int size) -{ - struct sw_sync_timeline *timeline = - (struct sw_sync_timeline *)sync_timeline; - snprintf(str, size, "%d", timeline->value); -} - -static void sw_sync_fence_value_str(struct fence *fence, char *str, int size) -{ - struct sw_sync_pt *pt = (struct sw_sync_pt *)fence; - - snprintf(str, size, "%d", pt->value); -} - static struct sync_timeline_ops sw_sync_timeline_ops = { .driver_name = "sw_sync", - .timeline_value_str = sw_sync_timeline_value_str, - .fence_value_str = sw_sync_fence_value_str, }; struct sw_sync_timeline *sw_sync_timeline_create(const char *name) diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c index 39af5fb..f2d298c 100644 --- a/drivers/staging/android/sync.c +++ b/drivers/staging/android/sync.c @@ -196,14 +196,7 @@ static int android_fence_fill_driver_data(struct fence *fence, static void android_fence_value_str(struct fence *fence, char *str, int size) { - struct sync_timeline
*parent = fence_parent(fence); - - if (!parent->ops->fence_value_str) { - if (size) - *str = 0; - return; - } - parent->ops->fence_value_str(fence, str, size); + snprintf(str, size, "%d", fence->seqno); } static void android_fence_timeline_value_str(struct fence *fence, @@ -211,12 +204,7 @@ static void android_fence_timeline_value_str(struct fence *fence, { struct sync_timeline *parent = fence_parent(fence); - if (!parent->ops->timeline_value_str) { - if (size) - *str = 0; - return; - } - parent->ops->timeline_value_str(parent, str, size); + snprintf(str, size, "%d", parent->value); } static const struct fence_ops android_fence_ops = { diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h index 500838b..b1a4b06 100644 --- a/drivers/staging/android/sync.h +++ b/drivers/staging/android/sync.h @@ -27,18 +27,9 @@ struct sync_timeline; /** * struct sync_timeline_ops - sync object implementation ops * @driver_name: name of the implementation - * @timeline_value_str: fill str with the value of the sync_timeline's counter - * @fence_value_str: fill str with the value of the fence */ struct sync_timeline_ops { const char *driver_name; - - /* optional */ - void (*timeline_value_str)(struct sync_timeline *timeline, char *str, - int size); - - /* optional */ - void (*fence_value_str)(struct fence *fence, char *str, int size); }; /** diff --git a/drivers/staging/android/sync_debug.c b/drivers/staging/android/sync_debug.c index b37412d..7517fb3 100644 --- a/drivers/staging/android/sync_debug.c +++ b/drivers/staging/android/sync_debug.c @@ -134,16 +134,8 @@ static void sync_print_obj(struct seq_file *s, struct sync_timeline *obj) struct list_head *pos; unsigned long flags; - seq_printf(s, "%s %s", obj->name, obj->ops->driver_name); - - if (obj->ops->timeline_value_str) { - char value[64]; - - obj->ops->timeline_value_str(obj, value, sizeof(value)); - seq_printf(s, ": %s", value); - } - - seq_puts(s, "\n"); + seq_printf(s, "%s %s: %d\n", obj->name, 
obj->ops->driver_name, + obj->value); spin_lock_irqsave(&obj->child_list_lock, flags); list_for_each(pos, &obj->child_list_head) { diff --git a/drivers/staging/android/trace/sync.h b/drivers/staging/android/trace/sync.h index a0f80f4..d7f6457f 100644 --- a/drivers/staging/android/trace/sync.h +++ b/drivers/staging/android/trace/sync.h @@ -15,21 +15,15 @@ TRACE_EVENT(sync_timeline, TP_S
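With the timeline ops gone, both value strings come straight from the fence and its parent timeline via snprintf(). A minimal sketch with hypothetical demo_* types. The fields are declared unsigned here, so the sketch uses %u where the patch uses %d. snprintf() truncates and NUL-terminates when the buffer is small, and writes nothing when size is 0.

```c
#include <stdio.h>

/* Hypothetical stand-ins for struct sync_timeline and struct fence. */
struct demo_timeline {
	unsigned int value;	/* last signaled value */
};

struct demo_fence {
	unsigned int seqno;
	struct demo_timeline *parent;
};

/* Like android_fence_value_str(): print the fence's own seqno. */
static void demo_fence_value_str(const struct demo_fence *fence,
				 char *str, int size)
{
	snprintf(str, size, "%u", fence->seqno);
}

/* Like android_fence_timeline_value_str(): print the parent's value. */
static void demo_timeline_value_str(const struct demo_fence *fence,
				    char *str, int size)
{
	snprintf(str, size, "%u", fence->parent->value);
}
```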
[PATCH 02/11] staging/android: store last signaled value on sync timeline
From: Gustavo Padovan

Now the fence timeline is aware of the last signaled fence, as it receives the increment to the current value in sync_timeline_signal(). That allows us to remove .has_signaled() from timeline_ops, as we can compare timeline->value and fence->seqno directly in sync.c.

Signed-off-by: Gustavo Padovan
--- drivers/staging/android/sw_sync.c | 16 ++-- drivers/staging/android/sync.c| 15 +++ drivers/staging/android/sync.h| 14 +- 3 files changed, 14 insertions(+), 31 deletions(-) diff --git a/drivers/staging/android/sw_sync.c b/drivers/staging/android/sw_sync.c index 3bee959..b9d53d3 100644 --- a/drivers/staging/android/sw_sync.c +++ b/drivers/staging/android/sw_sync.c @@ -30,7 +30,7 @@ struct fence *sw_sync_pt_create(struct sw_sync_timeline *obj, u32 value) struct sw_sync_pt *pt; pt = (struct sw_sync_pt *) - sync_pt_create(&obj->obj, sizeof(struct sw_sync_pt)); + sync_pt_create(&obj->obj, sizeof(struct sw_sync_pt), value); pt->value = value; @@ -38,15 +38,6 @@ struct fence *sw_sync_pt_create(struct sw_sync_timeline *obj, u32 value) } EXPORT_SYMBOL(sw_sync_pt_create); -static int sw_sync_fence_has_signaled(struct fence *fence) -{ - struct sw_sync_pt *pt = (struct sw_sync_pt *)fence; - struct sw_sync_timeline *obj = - (struct sw_sync_timeline *)fence_parent(fence); - - return (pt->value > obj->value) ?
0 : 1; -} - static int sw_sync_fill_driver_data(struct fence *fence, void *data, int size) { @@ -77,7 +68,6 @@ static void sw_sync_fence_value_str(struct fence *fence, char *str, int size) static struct sync_timeline_ops sw_sync_timeline_ops = { .driver_name = "sw_sync", - .has_signaled = sw_sync_fence_has_signaled, .fill_driver_data = sw_sync_fill_driver_data, .timeline_value_str = sw_sync_timeline_value_str, .fence_value_str = sw_sync_fence_value_str, @@ -96,8 +86,6 @@ EXPORT_SYMBOL(sw_sync_timeline_create); void sw_sync_timeline_inc(struct sw_sync_timeline *obj, u32 inc) { - obj->value += inc; - - sync_timeline_signal(&obj->obj); + sync_timeline_signal(&obj->obj, inc); } EXPORT_SYMBOL(sw_sync_timeline_inc); diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c index 1e1c009..4eea5c3 100644 --- a/drivers/staging/android/sync.c +++ b/drivers/staging/android/sync.c @@ -90,7 +90,7 @@ void sync_timeline_destroy(struct sync_timeline *obj) } EXPORT_SYMBOL(sync_timeline_destroy); -void sync_timeline_signal(struct sync_timeline *obj) +void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc) { unsigned long flags; struct fence *fence, *next; @@ -99,6 +99,8 @@ void sync_timeline_signal(struct sync_timeline *obj) spin_lock_irqsave(&obj->child_list_lock, flags); + obj->value += inc; + list_for_each_entry_safe(fence, next, &obj->active_list_head, active_list) { if (fence_is_signaled_locked(fence)) @@ -109,7 +111,8 @@ void sync_timeline_signal(struct sync_timeline *obj) } EXPORT_SYMBOL(sync_timeline_signal); -struct fence *sync_pt_create(struct sync_timeline *obj, int size) +struct fence *sync_pt_create(struct sync_timeline *obj, int size, +unsigned int value) { unsigned long flags; struct fence *fence; @@ -124,7 +127,7 @@ struct fence *sync_pt_create(struct sync_timeline *obj, int size) spin_lock_irqsave(&obj->child_list_lock, flags); sync_timeline_get(obj); fence_init(fence, &android_fence_ops, &obj->child_list_lock, - obj->context, 
++obj->value); + obj->context, value); list_add_tail(&fence->child_list, &obj->child_list_head); INIT_LIST_HEAD(&fence->active_list); spin_unlock_irqrestore(&obj->child_list_lock, flags); @@ -164,12 +167,8 @@ static void android_fence_release(struct fence *fence) static bool android_fence_signaled(struct fence *fence) { struct sync_timeline *parent = fence_parent(fence); - int ret; - ret = parent->ops->has_signaled(fence); - if (ret < 0) - fence->status = ret; - return ret; + return (fence->seqno > parent->value) ? false : true; } static bool android_fence_enable_signaling(struct fence *fence) diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h index fb209fc..4d3dfbf 100644 --- a/drivers/staging/android/sync.h +++ b/drivers/staging/android/sync.h @@ -27,10 +27,6 @@ struct sync_timeline; /** * struct sync_timeline_ops - sync object implementation ops * @driver_name: name of the implementation - * @has_signaled: returns: - * 1 if pt has signaled - * 0 if pt has not signaled - * <0 on error * @fill_driver_da
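After this patch, sync_timeline_signal() advances obj->value under child_list_lock and android_fence_signaled() compares it against fence->seqno. The expression (fence->seqno > parent->value) ? false : true is equivalent to the plain comparison fence->seqno <= parent->value. A minimal single-threaded sketch with hypothetical demo_* types. It uses no locking and, like the patch, does not handle 32-bit wraparound.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct sync_timeline and struct fence. */
struct demo_timeline {
	unsigned int value;	/* last signaled value */
};

struct demo_fence {
	unsigned int seqno;	/* value at which this fence signals */
	struct demo_timeline *parent;
};

/* Mirrors sync_timeline_signal(obj, inc): advance the timeline by @inc. */
static void demo_timeline_signal(struct demo_timeline *tl, unsigned int inc)
{
	tl->value += inc;
}

/* Mirrors android_fence_signaled(): signaled once value reaches seqno. */
static bool demo_fence_signaled(const struct demo_fence *fence)
{
	return fence->seqno <= fence->parent->value;
}
```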
Re: [PATCH v1 04/12] xen/hvmlite: Bootstrap HVMlite guest
On Mon, Jan 25, 2016 at 05:28:08PM -0500, Boris Ostrovsky wrote:
> On 01/25/2016 04:21 PM, H. Peter Anvin wrote:
> >On 01/25/16 13:12, Luis R. Rodriguez wrote:
> >>>Perhaps, but someone would still have to set hardware_subarch. And
> >>>it's hvmlite_bootparams() that does it.
> >>No, Xen would do it as well, essentially all of hvmlite_bootparams() could be
> >>done in Xen.
> >>
> >Or a stub code.
>
> This patch in fact is the stub for Xen HVMlite guests, after we are
> done with it we jump to bare-metal startup code (i.e startup_32|64)

Right, the point is that the stub need not be in Linux. I'll explain in the other thread, where I provided more details on the different known approaches.

Luis
Re: [PATCH 4/4] perf hists browser: Check script context menu
Em Sat, Jan 23, 2016 at 10:31:42PM +0900, Namhyung Kim escreveu:
> The script and data-switch context menu are only meaningful when it
> deals with a data file. So add a check so that it cannot be shown when
> perf-top is run.
>
> Signed-off-by: Namhyung Kim
> ---
> tools/perf/ui/browsers/hists.c | 12 +++-
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
> index 05e94feba3cb..0aeed89c 100644
> --- a/tools/perf/ui/browsers/hists.c
> +++ b/tools/perf/ui/browsers/hists.c
> @@ -2309,7 +2309,7 @@ skip_annotation:
>  socked_id);
>  }
>  /* perf script support */

I instead used:

	if (!is_report_browser(hbt))
		goto skip_scripting;

> - if (browser->he_selection) {
> + if (is_report_browser(hbt) && browser->he_selection) {
>  if (sort__has_thread) {
>  nr_options += add_script_opt(browser,
>  &actions[nr_options],
> @@ -2332,10 +2332,12 @@ skip_annotation:
>  NULL, browser->selection->sym);
>  }
>  }
> - nr_options += add_script_opt(browser, &actions[nr_options],
> - &options[nr_options], NULL, NULL);
> - nr_options += add_switch_opt(browser, &actions[nr_options],
> - &options[nr_options]);
> + if (is_report_browser(hbt)) {
> + nr_options += add_script_opt(browser, &actions[nr_options],
> + &options[nr_options], NULL, NULL);
> + nr_options += add_switch_opt(browser, &actions[nr_options],
> + &options[nr_options]);
> + }

skip_scripting:

>  nr_options += add_exit_opt(browser, &actions[nr_options],
>  &options[nr_options]);
>

Also, the other patches in this series were already done in my tree, carved out from your initial patch but checking things in add_foo_opt() instead, in most cases. I'll push it to Ingo to work on another batch, after Jiri's questions are sorted out,
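The goto skip_scripting approach leaves the existing option-building code at its current indentation and simply jumps past it when the browser is not the report browser (e.g. perf top). A minimal sketch of that pattern; demo_build_menu(), the browser kinds and the option strings are hypothetical, not the perf browser's API.

```c
/* Hypothetical model of the hists browser's context-menu builder. */
enum demo_browser_kind { DEMO_REPORT, DEMO_TOP };

/* Fill @opts with menu entries and return how many were added. */
static int demo_build_menu(enum demo_browser_kind kind, const char **opts)
{
	int nr = 0;

	opts[nr++] = "annotate";

	/* Scripts and data-file switching only make sense with a data file. */
	if (kind != DEMO_REPORT)
		goto skip_scripting;

	opts[nr++] = "run script";
	opts[nr++] = "switch data file";

skip_scripting:
	opts[nr++] = "exit";
	return nr;
}
```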
[PATCH] isdn: Remove unnecessary cast in kfree
Remove unnecessary casts in the argument to kfree. Found using Coccinelle. The semantic patch used to find this is as follows:

//
@@
type T;
expression *f;
@@

- kfree((T *)(f));
+ kfree(f);
//

Signed-off-by: Amitoj Kaur Chawla
---
 drivers/isdn/hisax/fsm.c | 2 +-
 drivers/isdn/mISDN/fsm.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/isdn/hisax/fsm.c b/drivers/isdn/hisax/fsm.c
index c7a9471..497f815 100644
--- a/drivers/isdn/hisax/fsm.c
+++ b/drivers/isdn/hisax/fsm.c
@@ -45,7 +45,7 @@ FsmNew(struct Fsm *fsm, struct FsmNode *fnlist, int fncount)
 void
 FsmFree(struct Fsm *fsm)
 {
-	kfree((void *) fsm->jumpmatrix);
+	kfree(fsm->jumpmatrix);
 }
 
 int
diff --git a/drivers/isdn/mISDN/fsm.c b/drivers/isdn/mISDN/fsm.c
index 26477d4..5ac2dac 100644
--- a/drivers/isdn/mISDN/fsm.c
+++ b/drivers/isdn/mISDN/fsm.c
@@ -51,7 +51,7 @@ EXPORT_SYMBOL(mISDN_FsmNew);
 void
 mISDN_FsmFree(struct Fsm *fsm)
 {
-	kfree((void *) fsm->jumpmatrix);
+	kfree(fsm->jumpmatrix);
 }
 EXPORT_SYMBOL(mISDN_FsmFree);
 
--
1.9.1
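In C, any object pointer converts implicitly to void *, and kfree() takes a const void *, so the (void *) cast adds nothing. A small userspace sketch of the same shape, with malloc()/free() standing in for the kernel allocators and a hypothetical struct demo_fsm in place of struct Fsm.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct Fsm; only the allocated matrix matters. */
struct demo_fsm {
	long *jumpmatrix;
};

static int demo_fsm_new(struct demo_fsm *fsm, size_t n)
{
	/* malloc() returns void *, which converts implicitly: no cast needed. */
	fsm->jumpmatrix = malloc(n * sizeof(*fsm->jumpmatrix));
	return fsm->jumpmatrix ? 0 : -1;
}

static void demo_fsm_free(struct demo_fsm *fsm)
{
	/* long * converts implicitly to void *, so no (void *) cast here. */
	free(fsm->jumpmatrix);
	fsm->jumpmatrix = NULL;
}
```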