Re: mm: BUG in do_huge_pmd_wp_page
On Thu, Apr 11, 2013 at 04:18:13PM +0300, Kirill A. Shutemov wrote:
> Minchan Kim wrote:
> > On Fri, Mar 29, 2013 at 09:04:16AM -0400, Sasha Levin wrote:
> > > Hi all,
> > >
> > > While fuzzing with trinity inside a KVM tools guest running latest -next
> > > kernel, I've stumbled on the following.
> > >
> > > It seems that the code in do_huge_pmd_wp_page() was recently modified in
> > > "thp: do_huge_pmd_wp_page(): handle huge zero page".
> > >
> > > Here's the trace:
> > >
> > > [ 246.244708] BUG: unable to handle kernel paging request at 88009c422000
> > > [ 246.245743] IP: [] copy_page_rep+0x5/0x10
> > > [ 246.250569] PGD 7232067 PUD 7235067 PMD bfefe067 PTE 80009c422060
> > > [ 246.251529] Oops: [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > [ 246.252325] Dumping ftrace buffer:
> > > [ 246.252791] (ftrace buffer empty)
> > > [ 246.252869] Modules linked in:
> > > [ 246.252869] CPU 3
> > > [ 246.252869] Pid: 11985, comm: trinity-child12 Tainted: GW 3.9.0-rc4-next-20130328-sasha-00014-g91a3267 #319
> > > [ 246.252869] RIP: 0010:[] [] copy_page_rep+0x5/0x10
> > > [ 246.252869] RSP: 0018:8815bc40 EFLAGS: 00010286
> > > [ 246.252869] RAX: 8815bfd8 RBX: 02710880 RCX: 0200
> > > [ 246.252869] RDX: RSI: 88009c422000 RDI: 88009a422000
> > > [ 246.252869] RBP: 8815bc98 R08: 02718000 R09: 0001
> > > [ 246.252869] R10: 0001 R11: R12: 8800
> > > [ 246.252869] R13: 8815bfd8 R14: 8815bfd8 R15: fff8
> > > [ 246.252869] FS: 7f53db93f700() GS:8800bba0() knlGS:
> > > [ 246.252869] CS: 0010 DS: ES: CR0: 80050033
> > > [ 246.252869] CR2: 88009c422000 CR3: 00159000 CR4: 000406e0
> > > [ 246.252869] DR0: DR1: DR2:
> > > [ 246.252869] DR3: DR6: 0ff0 DR7: 0400
> > > [ 246.252869] Process trinity-child12 (pid: 11985, threadinfo 8815a000, task 88009c60b000)
> > > [ 246.252869] Stack:
> > > [ 246.252869] 81234aae 8815bc88 81273639 00a0
> > > [ 246.252869] 02718000 8800ab36d050 88153800 ea000269
> > > [ 246.252869] 00a0 8800ab36d000 ea000271 8815bd48
> > > [ 246.252869] Call Trace:
> > > [ 246.252869] [] ? copy_user_huge_page+0x1de/0x240
> > > [ 246.252869] [] ? mem_cgroup_charge_common+0xa9/0xc0
> > > [ 246.252869] [] do_huge_pmd_wp_page+0x9f7/0xc60
> > > [ 246.252869] [] ? __const_udelay+0x29/0x30
> > > [ 246.252869] [] handle_mm_fault+0x26e/0x650
> > > [ 246.252869] [] ? __lock_is_held+0x5a/0x80
> > > [ 246.252869] [] ? __do_page_fault+0x514/0x5e0
> > > [ 246.252869] [] __do_page_fault+0x570/0x5e0
> > > [ 246.252869] [] ? rcu_eqs_exit_common+0x60/0x260
> > > [ 246.252869] [] ? rcu_eqs_enter_common+0x33e/0x3b0
> > > [ 246.252869] [] ? rcu_eqs_exit+0x9c/0xb0
> > > [ 246.252869] [] do_page_fault+0x32/0x50
> > > [ 246.252869] [] do_async_page_fault+0x30/0xc0
> > > [ 246.252869] [] async_page_fault+0x28/0x30
> > > [ 246.252869] Code: 90 90 90 90 90 90 9c fa 65 48 3b 06 75 14 65 48 3b 56 08 75 0d 65 48 89 1e 65 48 89 4e 08 9d b0 01 c3 9d 30 c0 c3 b9 00 02 00 00 48 a5 c3 0f 1f 80 00 00 00 00 eb ee 66 66 66 90 66 66 66 90
> > > [ 246.252869] RIP [] copy_page_rep+0x5/0x10
> > > [ 246.252869] RSP
> > > [ 246.252869] CR2: 88009c422000
> > > [ 246.252869] ---[ end trace 09fbe37b108d5766 ]---
> > >
> > > And this is the code:
> > >
> > > if (is_huge_zero_pmd(orig_pmd))
> > >         clear_huge_page(new_page, haddr, HPAGE_PMD_NR);
> > > else
> > >         copy_user_huge_page(new_page, page, haddr, vma, HPAGE_PMD_NR); <--- this
> > >
> > >
> > > Thanks,
> > > Sasha
> >
> > I don't know whether this issue has already been resolved. If so, my reply
> > becomes just a question to Kirill, regardless of this BUG.
> >
> > While looking at the code, I wondered about the reference handling logic
> > of GHZP (aka get_huge_zero_page). The logic depends on the page allocator
> > never allocating PFN 0.
> >
> > Who guarantees that? What happens if the allocator allocates PFN 0?
> > I don't know whether every architecture guarantees it.
> > Have you checked it for all arches?
> >
> > If not,
> >
> > CPU 1                       CPU 2                       CPU 3
> >
> > shrink_huge_zero_page
> > huge_zero_refcount = 0;
> >                             GHZP
> >                             pfn_0_zero_page = alloc_pages
> >                                                         GHZP
> >                                                         pfn_some_zero_page
> >                                                         = alloc_page
[GIT PULL REQUEST] watchdog - v3.9-rc6 Fixes
Hi Linus,

Please pull from 'master' branch of
	git://www.linux-watchdog.org/linux-watchdog.git

It will fix compile errors for the at91rm9200_wdt driver.

This will update the following files:

 Kconfig |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

with these Changes:

commit 09549cd01726a7ff8b102a93e46b059531583ab6
Author: Nicolas Ferre
Date:   Wed Apr 10 14:36:22 2013 +0200

    watchdog: Revert the AT91RM9200_WATCHDOG dependency

    Compiling the at91rm9200_wdt.c driver without at91rm9200 support was
    leading to several errors:

    drivers/built-in.o: In function `at91_wdt_close':
    at91_adc.c:(.text+0xc9fe4): undefined reference to `at91_st_base'
    drivers/built-in.o: In function `at91_wdt_write':
    at91_adc.c:(.text+0xca004): undefined reference to `at91_st_base'
    drivers/built-in.o: In function `at91wdt_shutdown':
    at91_adc.c:(.text+0xca01c): undefined reference to `at91_st_base'
    drivers/built-in.o: In function `at91wdt_suspend':
    at91_adc.c:(.text+0xca038): undefined reference to `at91_st_base'
    drivers/built-in.o: In function `at91_wdt_open':
    at91_adc.c:(.text+0xca0cc): undefined reference to `at91_st_base'
    drivers/built-in.o:at91_adc.c:(.text+0xca2c8): more undefined references to `at91_st_base' follow

    So, reverting the modification of the "depends" Kconfig line
    introduced by patch a6a1bcd37 (watchdog: at91rm9200: add DT support)
    seems to be the good solution.

    Signed-off-by: Nicolas Ferre
    Acked-by: Guenter Roeck
    Signed-off-by: Wim Van Sebroeck

For completeness, I added the overall diff below.

Greetings,
Wim.

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 9fcc70c..e89fc31 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -117,7 +117,7 @@ config ARM_SP805_WATCHDOG
 
 config AT91RM9200_WATCHDOG
 	tristate "AT91RM9200 watchdog"
-	depends on ARCH_AT91
+	depends on ARCH_AT91RM9200
 	help
 	  Watchdog timer embedded into AT91RM9200 chips. This will reboot your
 	  system when the timeout is reached.
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Return value of __mm_populate
Hi,

On 14/04/2013 02:18, KOSAKI Motohiro wrote:
> (4/13/13 5:14 AM), Marco Stornelli wrote:
>> Hi,
>>
>> I was looking at the code of __mm_populate (in -next) and I have a doubt
>> about the return value. The function __mlock_posix_error_return should
>> return a proper error for mlock, converting the return value from
>> __get_user_pages. It checks for EFAULT and ENOMEM. Actually
>> __get_user_pages could return, in addition, ERESTARTSYS and EHWPOISON.
>
> __get_user_pages doesn't return EHWPOISON if FOLL_HWPOISON is not specified.
> I'm not an expert on ERESTARTSYS. If I understand correctly, ERESTARTSYS is
> only returned when a signal is received, and the signal handling routine
> (e.g. do_signal) modifies EIP and hides ERESTARTSYS from userland
> generically.

Yep, you're right, the "magic" is inside the signal management. Thanks!!

Marco
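For readers following the thread, here is a minimal userspace sketch of the kind of errno translation being discussed. It assumes that -EFAULT is mapped to -ENOMEM and -ENOMEM to -EAGAIN, with every other value passed through unchanged; the function name is made up for illustration and this is not a copy of the kernel source.

```c
#include <assert.h>
#include <errno.h>

/*
 * Userspace sketch of the errno translation discussed above.
 * Assumption (not taken from the thread): -EFAULT becomes -ENOMEM,
 * -ENOMEM becomes -EAGAIN, anything else is returned unchanged.
 * The name is hypothetical.
 */
long mlock_posix_error_sketch(long retval)
{
	if (retval == -EFAULT)
		return -ENOMEM;
	if (retval == -ENOMEM)
		return -EAGAIN;
	/* Other values (including success) pass through untouched. */
	return retval;
}
```

Under these assumptions, any error code the helper does not know about reaches the caller as-is, which is the case the original question was about.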
Re: [RFC v7 00/11] Support vrange for anonymous page
Hi KOSAKI,

On Thu, Apr 11, 2013 at 11:01:11AM -0400, KOSAKI Motohiro wrote:
> >>>> and adding new syscall invocation is unwelcome.
> >>>
> >>> Sure. But one more system call could be cheaper than page-granularity
> >>> operation on purged range.
> >>
> >> I don't think vrange(VOLATILE) cost is related to this discussion.
> >> Whether sending SIGBUS or just nuking the pte, purge should be done on
> >> vmscan, not in the vrange() syscall.
> >
> > Again, please see MADV_FREE. http://lwn.net/Articles/230799/
> > It changes pte and page flags on all pages of the range through
> > zap_pte_range. So it would make vrange(VOLATILE) expensive, and
> > the bigger the range is, the bigger the cost is.
>
> This hadn't crossed my mind. Now try_to_discard_one() inserts a vrange
> for making SIGBUS. Then we can insert pte_none() at the same cost too. Am
> I missing something?

For your requirement, we need some tracking model to detect whether a page
is currently being used by the process before the VM discards it, *if* we
don't provide the paired vrange(NOVOLATILE) system call (see below).
So the tracking model would have to be set up in the vrange(VOLATILE)
system call context.

>
> I couldn't imagine why the pte should be zapped on vrange(VOLATILE).

Sorry, my explanation was too poor to follow. I will try again.

First of all, what you want is almost like MADV_FREE, so let's look at that
first.

If you call madvise(range, MADV_FREE), the VM has to examine every page
mapped in the page table for the range (start, start + len). So we need a
page table lookup for the range, and we need to mark a flag (e.g.
PG_lazyfree) in every page descriptor to hint to the kernel that it should
discard the page instead of swapping it out when reclaim happens.

Another thing we need is to clear the dirty bit in the PTE, so that we can
detect whether a page has been dirtied since madvise(range, MADV_FREE) was
called: we can't discard pages that some process has been using since it
called madvise. So if the VM finds a page with PG_lazyfree set but sees, by
peeking at the PTE, that the page was dirtied recently, the VM can't
discard the page.

So the overhead of the madvise(MADV_FREE) system call is as follows:

1. look up all pages in the page table for the range.
2. mark a bit (PG_lazyfree) in the page descriptors of the pages mapped in
   the range.
3. clear the dirty bit and flush the TLB.

So madvise(MADV_FREE) would be better than madvise(DONTNEED) because it can
avoid page faults if memory pressure doesn't happen, but the system call
overhead can still be huge; in particular, it grows in proportion to the
range size.

Now let's talk about vrange(range, VOLATILE).

Its overhead is very small: it just marks a flag in the structure that
represents the range (i.e. struct vrange). When the VM wants to reclaim
pages and finds a page mapped in a VOLATILE area, it can discard the page
instead of swapping it out. This moves the overhead from the system call
itself to the VM reclaim path, which is a very slow path in the system
anyway, and I think that is a desirable design (and that's why we have
rmap).

But a problem remains. The VM can't detect that a page is being used by the
process after it calls vrange(range, VOLATILE), because we didn't do
anything in vrange(VOLATILE), so the VM might discard the page underneath
the process. That doesn't happen with madvise(MADV_FREE) because it clears
the dirty bit in the PTE to detect whether the page has been used since
madvise was called.

The solution in vrange is to add a new vrange(range, NOVOLATILE) system
call, which hints to the kernel that it must no longer discard pages in the
range. The cost of vrange(range, NOVOLATILE) is very small, too. It just
clears the flag in the struct vrange which represents the range.

So I think calling the pair of volatile system calls would be cheaper than
a single madvise(MADV_FREE).

I hope this helps your understanding, but I'm not sure, because I am
writing this in an airport, where it is very hard to focus on my work. :(

-- 
Kind regards,
Minchan Kim
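The cost argument in the message above can be illustrated with a small toy model. This is only a sketch: every type and function name below is hypothetical, nothing here is kernel API, and the "cost" is just the number of steps each marking operation takes. It assumes that an MADV_FREE-style operation touches every page in the range while a vrange-style operation updates one flag on a range descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and names are hypothetical, not kernel API. */
struct toy_page {
	bool lazyfree;	/* stands in for a PG_lazyfree-style hint */
	bool dirty;	/* stands in for the PTE dirty bit */
};

struct toy_vrange {
	size_t first_page;
	size_t nr_pages;
	bool is_volatile;
};

/* MADV_FREE-like marking: one step per page; returns pages touched. */
size_t toy_madv_free(struct toy_page *pages, size_t first, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		pages[first + i].lazyfree = true;
		pages[first + i].dirty = false;
	}
	return nr;
}

/* vrange-like marking: a single flag update; returns steps taken. */
size_t toy_vrange_mark(struct toy_vrange *vr, bool make_volatile)
{
	vr->is_volatile = make_volatile;
	return 1;
}

/* Reclaim-time check in the MADV_FREE-like model. */
bool toy_madv_can_discard(const struct toy_page *p)
{
	return p->lazyfree && !p->dirty;
}

/* Reclaim-time check in the vrange-like model. */
bool toy_vrange_can_discard(const struct toy_vrange *vr, size_t page)
{
	return vr->is_volatile &&
	       page >= vr->first_page &&
	       page < vr->first_page + vr->nr_pages;
}
```

In this model the per-page variant notices a later write through the dirty flag, while the range variant has no per-page state at all, so it only stops discarding once the range is marked non-volatile again. That is the trade-off the message describes.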
Re: [PATCH] Documentation: cfq-iosched: update documentation help for cfq tunnables
On Sat, Apr 13 2013, Rob Landley wrote:
> Cleaning out "look at this" directory, I don't see this applied upstream but
> it may already be in Jens' tree. (That's the tree it should go in
> through...)

It's already included, see:

http://git.kernel.dk/?p=linux-block.git;a=commit;h=fdc6fdc52e4630f5020281ce5450be7cc1887de2

I changed some of the wording. But since this is your forte, please do
send any incremental patches against what is already in there.

-- 
Jens Axboe
Re: [PATCH 2/2] mtip32xx: mtip32xx: Disable TRIM support
On Fri, Apr 12 2013, Asai Thambi S P wrote:
>
> Temporarily disabling TRIM support until TRIM related issues
> are addressed in the firmware.

How serious is this? We do have released kernels out there with the
driver, you might want to consider a stable backport too.

Anyway, applied for 3.10.

-- 
Jens Axboe
Re: [PATCH 1/2] mtip32xx: fix a smatch warning
On Fri, Apr 12 2013, Asai Thambi S P wrote:
>
> Reported smatch warning:
> drivers/block/mtip32xx/mtip32xx.c:4163 mtip_block_shutdown() warn: variable
> dereferenced before check 'dd->disk' (see line 4159)
>
> dd->disk->disk_name accessed before the check if dd->disk is NULL. Fixed this
> and access of dd->queue/dd->disk->queue.

Applied for 3.10, thanks.

-- 
Jens Axboe
[git pull] drm fixes
Hi Linus,

one fix for a hotplug locking regression, and one fix for an oops if you
unplug the monitor at an inopportune moment on the udl device.

Dave.

The following changes since commit cfb63bafdb87bbcdc5d6dbbca623d3f69475f118:

  Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma (2013-04-11 20:35:11 -0700)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux drm-fixes

for you to fetch changes up to 89ced125472b8551c65526934b7f6c733a6864fa:

  drm/fb-helper: Fix locking in drm_fb_helper_hotplug_event (2013-04-12 14:21:12 +1000)

Daniel Vetter (1):
      drm/fb-helper: Fix locking in drm_fb_helper_hotplug_event

Dave Airlie (1):
      udl: handle EDID failure properly.

 drivers/gpu/drm/drm_fb_helper.c     | 8 +---
 drivers/gpu/drm/udl/udl_connector.c | 4
 2 files changed, 9 insertions(+), 3 deletions(-)
Re: [PATCH] x86: Add a Kconfig shortcut for a kvm-bootable kernel
Hello,

On 4/12/13 9:19 PM, Borislav Petkov wrote:
> so I'm currently experimenting with my randconfig build scripts and
> thought that maybe it would be a cool thing to not only do the random
> builds but also boot-test them in kvm. Which reminded me that we
> have that KVMTOOL_TEST_ENABLE config option in the kvmtool with which we
> can select all the stuff needed to boot the kernel in kvm. So I copied
> it.
>
> I now have an all.config in the repo with CONFIG_KVM_TEST_ENABLE=y in it
> so that the random builds can have the required support.
>
> So what do people think? It is pretty helpful for such testing; AFAICT
> Fengguang is doing his testing with kvm so he probably could use it too.
> And regardless, there are more and more reasons to boot the kernel in
> kvm so having a single option which selects the needed support makes
> more sense with time.
>
> And I haven't picked up the 'make kvmconfig' functionality because it is
> not strictly needed (yet) but it wouldn't hurt if we took it because
> someone has a good reason for needing it.

I obviously support having something like this in mainline. I wonder
though if we could just call this "default standalone KVM guest
config" instead of emphasizing testing angle.

			Pekka
[PATCH v6] mmc: core: Add support for idle time BKOPS
Devices have various maintenance operations they need to perform
internally. In order to reduce latencies during time-critical operations
like read and write, it is better to execute maintenance operations at
other times, when the host is not being serviced. Such operations are
called Background operations (BKOPS).
The device reports its need for BKOPS by updating BKOPS_STATUS
(EXT_CSD byte [246]).

According to the standard, a host that supports BKOPS shall check the
status periodically and start background operations as needed, so that
the device has enough time for its maintenance operations.

This patch adds support for this periodic check of the BKOPS status.
Since foreground operations are of higher priority than background
operations, the host will check the need for BKOPS when it is idle
(in runtime suspend), and in case of an incoming request the BKOPS
operation will be interrupted.

If the card raised an exception indicating a need for urgent BKOPS
(level 2/3), a flag will be set to tell MMC to start the BKOPS activity
when it becomes idle.

Since running BKOPS too often can impact the eMMC endurance, the card's
need for BKOPS is not checked on every runtime suspend. In order to
estimate the best time to check for BKOPS need, the host will take into
account the card capacity and the percentage of changed sectors in the
card. A future enhancement could be to check the card's need for BKOPS
only in case of random activity.

Signed-off-by: Maya Erez
---
This patch depends on the following patches:
  [PATCH V2 1/2] mmc: core: Add bus_ops for runtime pm callbacks
  [PATCH V2 2/2] mmc: block: Enable runtime pm for mmc blkdevice
---
diff --git a/Documentation/mmc/mmc-dev-attrs.txt b/Documentation/mmc/mmc-dev-attrs.txt
index 189bab0..8257aa6 100644
--- a/Documentation/mmc/mmc-dev-attrs.txt
+++ b/Documentation/mmc/mmc-dev-attrs.txt
@@ -8,6 +8,15 @@ The following attributes are read/write.
 
 	force_ro		Enforce read-only access even if write protect switch is off.
 
+	bkops_check_threshold	This attribute is used to determine whether
+	the status bit that indicates the need for BKOPS should be checked.
+	The value should be given in percentages of the card size.
+	This value is used to calculate the minimum number of sectors that
+	needs to be changed in the device (written or discarded) in order to
+	require the status-bit of BKOPS to be checked.
+	The value can modified via sysfs by writing the required value to:
+	/sys/block//bkops_check_threshold
+
 SD and MMC Device Attributes
 
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index 536331a..ef42117 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -116,6 +116,7 @@ struct mmc_blk_data {
 	unsigned int	part_curr;
 	struct device_attribute force_ro;
 	struct device_attribute power_ro_lock;
+	struct device_attribute bkops_check_threshold;
 	int	area_type;
 };
 
@@ -287,6 +288,65 @@ out:
 	return ret;
 }
 
+static ssize_t
+bkops_check_threshold_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct mmc_blk_data *md = mmc_blk_get(dev_to_disk(dev));
+	struct mmc_card *card = md->queue.card;
+	int ret;
+
+	if (!card)
+		ret = -EINVAL;
+	else
+		ret = snprintf(buf, PAGE_SIZE, "%d\n",
+			card->bkops_info.size_percentage_to_start_bkops);
+
+	mmc_blk_put(md);
+	return ret;
+}
+
+static ssize_t
+bkops_check_threshold_store(struct device *dev,
+		struct device_attribute *attr,
+		const char *buf, size_t count)
+{
+	int value;
+	struct mmc_blk_data *md = mmc_blk_get(dev_to_disk(dev));
+	struct mmc_card *card = md->queue.card;
+	unsigned int card_size;
+	int ret = count;
+
+	if (!card) {
+		ret = -EINVAL;
+		goto exit;
+	}
+
+	sscanf(buf, "%d", &value);
+	if ((value <= 0) || (value >= 100)) {
+		ret = -EINVAL;
+		goto exit;
+	}
+
+	card_size = (unsigned int)get_capacity(md->disk);
+	if (card_size <= 0) {
+		ret = -EINVAL;
+		goto exit;
+	}
+	card->bkops_info.size_percentage_to_start_bkops = value;
+	card->bkops_info.min_sectors_to_start_bkops =
+		(card_size * value) / 100;
+
+	pr_debug("%s: size_percentage = %d, min_sectors = %d",
+		mmc_hostname(card->host),
+		card->bkops_info.size_percentage_to_start_bkops,
+		card->bkops_info.min_sectors_to_start_bkops);
+
+exit:
+	mmc_blk_put(md);
+	return count;
+}
+
 static int mmc_blk_open(struct block_device *bdev, fmode_t mode)
 {
 	struct mmc_blk_data *md = mmc_blk_get(bdev->bd_disk);
@@ -
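As a side illustration of the percentage-to-sectors calculation in the store handler above, here is a small standalone sketch. The helper name is hypothetical and not part of the patch. It applies the same 1..99 range check, but does the multiplication in 64 bits, on the assumption that `unsigned int` is 32 bits wide and the product of sector count and percentage could otherwise wrap.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch: convert a threshold given as a percentage of the card size
 * into a minimum number of changed sectors. Hypothetical helper, not
 * part of the patch. Returns -1 for percentages outside 1..99 or for
 * an empty card, mirroring the range checks in the store handler.
 */
int64_t bkops_min_sectors_sketch(uint64_t card_sectors, int percent)
{
	if (percent <= 0 || percent >= 100 || card_sectors == 0)
		return -1;
	/* 64-bit intermediate: card_sectors * percent may exceed 32 bits. */
	return (int64_t)((card_sectors * (uint64_t)percent) / 100);
}
```

For example, a 32 GiB card has 67108864 sectors of 512 bytes, and 67108864 * 99 is larger than a 32-bit unsigned value can hold, so the wider intermediate matters for large thresholds on cards of that size.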
Re: [PATCH] x86: Add a Kconfig shortcut for a kvm-bootable kernel
On Sun, Apr 14, 2013 at 12:31:12PM +0300, Pekka Enberg wrote:
> I obviously support having something like this in mainline. I wonder
> though if we could just call this "default standalone KVM guest
> config" instead of emphasizing testing angle.

/me nods agreeingly...

And it should be under HYPERVISOR_GUEST where the rest of this stuff
resides.

Good point.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Re: [GIT PULL REQUEST] watchdog - v3.9-rc6 Fixes
On Sun, Apr 14, 2013 at 09:17:03AM +0200, Wim Van Sebroeck wrote:
> Hi Linus,
>
> Please pull from 'master' branch of
> 	git://www.linux-watchdog.org/linux-watchdog.git
>
> It will fix compile errors for the at91rm9200_wdt driver.
>
> This will update the following files:
>
>  Kconfig |    2 +-
>  1 files changed, 1 insertion(+), 1 deletion(-)
>
> with these Changes:
>
> commit 09549cd01726a7ff8b102a93e46b059531583ab6
> Author: Nicolas Ferre
> Date:   Wed Apr 10 14:36:22 2013 +0200
>

Hi Wim,

What is your take on "watchdog: Fix race condition in registration code" [1] ?

Thanks,
Guenter

[1] http://www.spinics.net/lists/linux-watchdog/msg02291.html,
    https://patchwork.kernel.org/patch/2400801/
Re: [PATCH 08/30] i2c: s3c2410: make header file local
On Thu, Apr 11, 2013 at 02:04:50AM +0200, Arnd Bergmann wrote:
> No other file in the kernel besides i2c-s3c2410.c uses the current
> plat/regs-iic.h, so we can simply move the header file to live in the
> same directory as the driver, as a preparation to multiplatform builds.

What about putting the regs in the driver itself?
Re: [PATCH] doc: Hold down best practices for pull requests
On 04/13/13 23:46, Rob Landley wrote:
> On 04/06/2013 03:55:26 PM, Randy Dunlap wrote:
>> On 03/03/13 04:43, Borislav Petkov wrote:
>> > From: Borislav Petkov
>> >
>> >  Documentation/SubmittingPullRequests | 148 +++
>> >  1 file changed, 148 insertions(+)
>> >  create mode 100644 Documentation/SubmittingPullRequests
>> >
>> > diff --git a/Documentation/SubmittingPullRequests b/Documentation/SubmittingPullRequests
>> > new file mode 100644
>> > index ..d123745e0cf5
>> > --- /dev/null
>> > +++ b/Documentation/SubmittingPullRequests
>> > @@ -0,0 +1,148 @@
>
>> > +1.) The patchset going to an upper level maintainer should NOT be based
>> > +on some random, potentially completely broken commit in the middle of a
>> > +merge window, or some other random point in the tree history.
>> > +
>> > +Tangential to that, it shouldn't contain back-merges - not to "next"
>> > +trees, and not to a "random commit of the day" in Linus' tree.
>
> Could you do positive advice first instead of negative advice? "Base your
> tree on a release version, and never re-pull between releases without a damn
> good reason."
>
> Not "don't do this, don't do this, don't do this" and make them figure out
> what they _should_ do by process of elimination.

agreed.

>> > +Here's Linus counting the ways why you shouldn't make merges yourself:
>> > +
>> > +" - I'm usually a day or two behind in my merge queue anyway, partly
>> > +because I get tons of pull requests in a short while and I just want
>> > +to get a feel for what's going on, and partly because I tend to do
>> > +pulls in waves of "ok, I'm going filesystems now, then I'll look at
>>
>> doing ?
>>
>> > +drivers".
>
> Given that he's quoting linus, it would be "[doing]".

ack.

>> > +8.) After the maintainer has pulled, it is always a good idea to take a
>> > +look at the merge and verify it has happened as you've expected it to,
>> > +maybe even run your tests on it to double-check everything went fine.
>> > +
>> > +Further reading: Documentation/development-process/*
>> >
>>
>> Looks good and useful overall.
>
> Looks longer than necessary to me, and if we have a
> Documentation/development-process why isn't this going in there instead of at
> the top level? (Although really why isn't it just another couple bullet
> points under submittingpatches?)

Well, yes, my first thought was actually why not update SubmittingPatches
instead of adding this new file.

-- 
~Randy
Re: [PATCH v3 2/2] pps: new client driver using GPIO
Hi James,

I am a newbie to Linux kernel device drivers and would like to use this
client driver on my uClinux system running on the NIOS2. Could you kindly
point me in the right direction? I am using a device tree, and I believe
this driver doesn't support device tree, right? What do I need to add or
modify so that I can use an input GPIO as a source?

I saw a post (found via Google) saying to add this code to (?? an unknown
location) and then call pps_init in the configuration routine (?? not sure
what that means):

/* PPS-GPIO platform data */
static struct pps_gpio_platform_data pps_gpio_info = {
	.assert_falling_edge	= false,
	.capture_clear		= false,
	.gpio_pin		= 63,
	.gpio_label		= "PPS",
};

static struct platform_device pps_gpio_device = {
	.name	= "pps-gpio",
	.id	= -1,
	.dev	= {
		.platform_data = &pps_gpio_info
	},
};

static void pps_init(int evm_id, int profile)
{
	int err;

	err = platform_device_register(&pps_gpio_device);
	if (err) {
		pr_warning("Could not register PPS_GPIO device\n");
	}
}

Thanks in advance for any help,
Yeung
Re: [PATCH v6] pstore/ram: Add ramoops support for the Flattened Device Tree.
On Mon, Apr 08, 2013 at 12:54:01PM -0700, Bryan Freed wrote:
[...]
> And as a more general question, why should we try not to put
> configuration in the device tree? It seems like a great (and
> portable) place to put this stuff.
> It certainly seems better to have it there than hardwired in the
> kernel or tacked onto the kernel command line.

But then we have two in-kernel APIs to pass kernel parameters? So we'll
have to maintain two ways of passing the options for each driver. That is
hardly a good solution.

If you would like to see a convenient way to pass kernel/module options
via the device tree, I would suggest implementing something like this:

	chosen {
		kernel-options {
			linux,pstore.record-size = 123;
			linux,foo = "bar";
		};
	};

And then let the kernel translate all these to module_param_*().

I am still not sure about placing the options along with the device
layout, but if we go this route, then that is also viable:

	pstore-node {
		linux,pstore.record-size = 123;
	};

And translate these "linux,*" properties to module_param_*().

How does that sound?

Thanks,

Anton
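To make the proposed translation step concrete, here is a minimal userspace sketch. It is purely illustrative: the helper name and its behaviour are assumptions, not an existing kernel interface. It strips the "linux," prefix from a property name and joins the remainder with the value in the "name=value" form that options take on the kernel command line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: turn a device tree
 * property name of the form "linux,<param>" plus its value into a
 * "<param>=<value>" string. Returns 0 on success, -1 if the prefix
 * is missing or the output buffer is too small.
 */
int dt_prop_to_param_sketch(const char *prop, const char *value,
			    char *out, size_t outlen)
{
	static const char prefix[] = "linux,";
	size_t plen = sizeof(prefix) - 1;
	int n;

	/* Only properties in the "linux," namespace are translated. */
	if (strncmp(prop, prefix, plen) != 0)
		return -1;

	n = snprintf(out, outlen, "%s=%s", prop + plen, value);
	if (n < 0 || (size_t)n >= outlen)
		return -1;	/* output would have been truncated */
	return 0;
}
```

With the first example node, "linux,pstore.record-size" and "123" would come out as "pstore.record-size=123". A real implementation would still have to hand that string to the parameter parsing code, which this sketch does not attempt.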
[GIT PULL] perf fixes
Linus,

Please pull the latest perf-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-urgent-for-linus

   HEAD: c481420248c6730246d2a1b1773d5d7007ae0835 perf: Fix error return code

Misc fixlets.

 Thanks,

	Ingo

-->
Chen Gang (3):
      perf: Fix strncpy() use, always make sure it's NUL terminated
      perf: Fix strncpy() use, use strlcpy() instead of strncpy()
      ftrace: Fix strncpy() use, use strlcpy() instead of strncpy()

Stephane Eranian (2):
      perf/x86: Fix uninitialized pt_regs in intel_pmu_drain_bts_buffer()
      perf: Fix ring_buffer perf_output_space() boundary calculation

Wei Yongjun (1):
      perf: Fix error return code

 arch/x86/kernel/cpu/perf_event_intel_ds.c | 3 ++-
 kernel/events/core.c                      | 4 +++-
 kernel/events/internal.h                  | 2 +-
 kernel/events/ring_buffer.c               | 22 ++
 kernel/trace/ftrace.c                     | 4 ++--
 kernel/trace/trace.c                      | 4 ++--
 6 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 826054a..f71c9f0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -314,10 +314,11 @@ int intel_pmu_drain_bts_buffer(void)
 	if (top <= at)
 		return 0;
 
+	memset(&regs, 0, sizeof(regs));
+
 	ds->bts_index = ds->bts_buffer_base;
 
 	perf_sample_data_init(&data, 0, event->hw.last_period);
-	regs.ip = 0;
 
 	/*
 	 * Prepare a generic sample, i.e. fill in the invariant fields.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 59412d0..7e0962e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4737,7 +4737,8 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
 	} else {
 		if (arch_vma_name(mmap_event->vma)) {
 			name = strncpy(tmp, arch_vma_name(mmap_event->vma),
-				       sizeof(tmp));
+				       sizeof(tmp) - 1);
+			tmp[sizeof(tmp) - 1] = '\0';
 			goto got_name;
 		}
 
@@ -5986,6 +5987,7 @@ skip_type:
 	if (pmu->pmu_cpu_context)
 		goto got_cpu_context;
 
+	ret = -ENOMEM;
 	pmu->pmu_cpu_context = alloc_percpu(struct perf_cpu_context);
 	if (!pmu->pmu_cpu_context)
 		goto free_dev;
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index d56a64c..eb675c4 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -16,7 +16,7 @@ struct ring_buffer {
 	int				page_order;	/* allocation order */
 #endif
 	int				nr_pages;	/* nr of data pages */
-	int				writable;	/* are we writable */
+	int				overwrite;	/* can overwrite itself */
 
 	atomic_t			poll;		/* POLL_ for wakeups */
 
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 23cb34f..97fddb0 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -18,12 +18,24 @@ static bool perf_output_space(struct ring_buffer *rb, unsigned long tail,
 			      unsigned long offset, unsigned long head)
 {
-	unsigned long mask;
+	unsigned long sz = perf_data_size(rb);
+	unsigned long mask = sz - 1;
 
-	if (!rb->writable)
+	/*
+	 * check if user-writable
+	 * overwrite : over-write its own tail
+	 * !overwrite: buffer possibly drops events.
+	 */
+	if (rb->overwrite)
 		return true;
 
-	mask = perf_data_size(rb) - 1;
+	/*
+	 * verify that payload is not bigger than buffer
+	 * otherwise masking logic may fail to detect
+	 * the "not enough space" condition
+	 */
+	if ((head - offset) > sz)
+		return false;
 
 	offset = (offset - tail) & mask;
 	head = (head - tail) & mask;
@@ -212,7 +224,9 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 		rb->watermark = max_size / 2;
 
 	if (flags & RING_BUFFER_WRITABLE)
-		rb->writable = 1;
+		rb->overwrite = 0;
+	else
+		rb->overwrite = 1;
 
 	atomic_set(&rb->refcount, 1);
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 6893d5a..db14374 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3441,14 +3441,14 @@ static char ftrace_filter_buf[FTRACE_FILTER_SIZE] __initdata;
 
 static int __init set_ftrace_notrace(char *str)
 {
-	strncpy(ftrace_notrace_buf, str, FTRACE_FILTER_SIZE);
+	strlcpy(ftrace_notrace_buf, str, FTRACE_FILTER_SIZE);
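The strncpy() fixes in this pull all deal with the same property: strncpy() does not write a terminating NUL when the source is at least as long as the limit. Below is a minimal userspace sketch of the "copy at most size - 1 bytes, then terminate explicitly" pattern used in the kernel/events/core.c hunk. The helper name is hypothetical and only standard C library calls are used.

```c
#include <assert.h>
#include <string.h>

/*
 * Sketch of the "strncpy() plus explicit terminator" pattern.
 * copy_terminated_sketch() is a hypothetical helper, not a kernel
 * function. dstlen must be at least 1.
 */
char *copy_terminated_sketch(char *dst, const char *src, size_t dstlen)
{
	/* Copy at most dstlen - 1 bytes so the last byte stays free. */
	strncpy(dst, src, dstlen - 1);
	/* strncpy() does not terminate when src fills the limit. */
	dst[dstlen - 1] = '\0';
	return dst;
}
```

With an 8-byte buffer and a 10-character source, the result is the first seven characters followed by the terminator, so later string functions never read past the end of the buffer.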
[GIT PULL] scheduler fixes
Linus, Please pull the latest sched-urgent-for-linus git tree from: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-urgent-for-linus HEAD: e614b3332a4f3f264a26da28e5a1f4cc3aea3974 sched/cputime: Fix accounting on multi-threaded processes Misc fixlets. Thanks, Ingo --> Stanislaw Gruszka (1): sched/cputime: Fix accounting on multi-threaded processes Tejun Heo (1): sched: Convert BUG_ON()s in try_to_wake_up_local() to WARN_ON_ONCE()s Thomas Gleixner (1): sched_clock: Prevent 64bit inatomicity on 32bit systems libin (1): sched/debug: Fix sd->*_idx limit range avoiding overflow kernel/sched/clock.c | 26 ++ kernel/sched/core.c| 8 +--- kernel/sched/cputime.c | 2 +- 3 files changed, 32 insertions(+), 4 deletions(-) diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c index c685e31..c3ae144 100644 --- a/kernel/sched/clock.c +++ b/kernel/sched/clock.c @@ -176,10 +176,36 @@ static u64 sched_clock_remote(struct sched_clock_data *scd) u64 this_clock, remote_clock; u64 *ptr, old_val, val; +#if BITS_PER_LONG != 64 +again: + /* +* Careful here: The local and the remote clock values need to +* be read out atomic as we need to compare the values and +* then update either the local or the remote side. So the +* cmpxchg64 below only protects one readout. +* +* We must reread via sched_clock_local() in the retry case on +* 32bit as an NMI could use sched_clock_local() via the +* tracer and hit between the readout of +* the low32bit and the high 32bit portion. +*/ + this_clock = sched_clock_local(my_scd); + /* +* We must enforce atomic readout on 32bit, otherwise the +* update on the remote cpu can hit inbetween the readout of +* the low32bit and the high 32bit portion. +*/ + remote_clock = cmpxchg64(&scd->clock, 0, 0); +#else + /* +* On 64bit the read of [my]scd->clock is atomic versus the +* update, so we can avoid the above 32bit dance. 
+*/ sched_clock_local(my_scd); again: this_clock = my_scd->clock; remote_clock = scd->clock; +#endif /* * Use the opportunity that we have both locks diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b7b03cd..fa07792 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1488,8 +1488,10 @@ static void try_to_wake_up_local(struct task_struct *p) { struct rq *rq = task_rq(p); - BUG_ON(rq != this_rq()); - BUG_ON(p == current); + if (WARN_ON_ONCE(rq != this_rq()) || + WARN_ON_ONCE(p == current)) + return; + lockdep_assert_held(&rq->lock); if (!raw_spin_trylock(&p->pi_lock)) { @@ -4931,7 +4933,7 @@ static void sd_free_ctl_entry(struct ctl_table **tablep) } static int min_load_idx = 0; -static int max_load_idx = CPU_LOAD_IDX_MAX; +static int max_load_idx = CPU_LOAD_IDX_MAX-1; static void set_table_entry(struct ctl_table *entry, diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index ed12cbb..e93cca9 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -310,7 +310,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) t = tsk; do { - task_cputime(tsk, &utime, &stime); + task_cputime(t, &utime, &stime); times->utime += utime; times->stime += stime; times->sum_exec_runtime += task_sched_runtime(t); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
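The cmpxchg64(&scd->clock, 0, 0) readout in the sched_clock_remote() hunk above can be illustrated with C11 atomics. This is a sketch, not the kernel primitive: read_u64_atomic() is a hypothetical name, and atomic_compare_exchange_strong() stands in for cmpxchg64().

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Read a 64-bit value in one atomic step by "exchanging 0 for 0".
 * If *p is 0, the exchange stores 0 again (no visible change).
 * Otherwise the exchange fails and the current value of *p is
 * copied into expected. Either way expected ends up holding *p.
 */
static uint64_t read_u64_atomic(_Atomic uint64_t *p)
{
	uint64_t expected = 0;

	atomic_compare_exchange_strong(p, &expected, 0);
	return expected;
}
```

The point of the trick is that the value is obtained through a single atomic operation, so a concurrent writer cannot be observed between the low and the high 32-bit halves.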
[GIT PULL] x86 fixes
Linus, Please pull the latest x86-urgent-for-linus git tree from: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-for-linus HEAD: 26564600c9e88c6572a5e6ef5ae9121907edfb7f x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set Misc fixes. Thanks, Ingo --> Andrea Arcangeli (2): x86/mm/cpa: Convert noop to functional fix x86/mm/cpa/selftest: Fix false positive in CPA self test Boris Ostrovsky (2): x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set Samu Kallio (1): x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates arch/x86/include/asm/paravirt.h | 5 - arch/x86/include/asm/paravirt_types.h | 2 ++ arch/x86/kernel/paravirt.c| 25 + arch/x86/lguest/boot.c| 1 + arch/x86/mm/fault.c | 6 -- arch/x86/mm/pageattr-test.c | 2 +- arch/x86/mm/pageattr.c| 12 +++- arch/x86/xen/mmu.c| 1 + 8 files changed, 33 insertions(+), 21 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 5edd174..7361e47 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -703,7 +703,10 @@ static inline void arch_leave_lazy_mmu_mode(void) PVOP_VCALL0(pv_mmu_ops.lazy_mode.leave); } -void arch_flush_lazy_mmu_mode(void); +static inline void arch_flush_lazy_mmu_mode(void) +{ + PVOP_VCALL0(pv_mmu_ops.lazy_mode.flush); +} static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx, phys_addr_t phys, pgprot_t flags) diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 142236e..b3b0ec1 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -91,6 +91,7 @@ struct pv_lazy_ops { /* Set deferred update mode, used for batching operations. 
*/ void (*enter)(void); void (*leave)(void); + void (*flush)(void); }; struct pv_time_ops { @@ -679,6 +680,7 @@ void paravirt_end_context_switch(struct task_struct *next); void paravirt_enter_lazy_mmu(void); void paravirt_leave_lazy_mmu(void); +void paravirt_flush_lazy_mmu(void); void _paravirt_nop(void); u32 _paravirt_ident_32(u32); diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 17fff18..8bfb335 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -263,6 +263,18 @@ void paravirt_leave_lazy_mmu(void) leave_lazy(PARAVIRT_LAZY_MMU); } +void paravirt_flush_lazy_mmu(void) +{ + preempt_disable(); + + if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) { + arch_leave_lazy_mmu_mode(); + arch_enter_lazy_mmu_mode(); + } + + preempt_enable(); +} + void paravirt_start_context_switch(struct task_struct *prev) { BUG_ON(preemptible()); @@ -292,18 +304,6 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void) return this_cpu_read(paravirt_lazy_mode); } -void arch_flush_lazy_mmu_mode(void) -{ - preempt_disable(); - - if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) { - arch_leave_lazy_mmu_mode(); - arch_enter_lazy_mmu_mode(); - } - - preempt_enable(); -} - struct pv_info pv_info = { .name = "bare hardware", .paravirt_enabled = 0, @@ -475,6 +475,7 @@ struct pv_mmu_ops pv_mmu_ops = { .lazy_mode = { .enter = paravirt_nop, .leave = paravirt_nop, + .flush = paravirt_nop, }, .set_fixmap = native_set_fixmap, diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c index 1cbd89c..7114c63 100644 --- a/arch/x86/lguest/boot.c +++ b/arch/x86/lguest/boot.c @@ -1334,6 +1334,7 @@ __init void lguest_init(void) pv_mmu_ops.read_cr3 = lguest_read_cr3; pv_mmu_ops.lazy_mode.enter = paravirt_enter_lazy_mmu; pv_mmu_ops.lazy_mode.leave = lguest_leave_lazy_mmu_mode; + pv_mmu_ops.lazy_mode.flush = paravirt_flush_lazy_mmu; pv_mmu_ops.pte_update = lguest_pte_update; pv_mmu_ops.pte_update_defer = lguest_pte_update; diff --git a/arch/x86/mm/fault.c 
b/arch/x86/mm/fault.c index 2b97525..0e88336 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -378,10 +378,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address) if (pgd_none(*pgd_ref)) return -1; - if (pgd_none(*pgd)) + if (pgd_none(*pgd)) { set_pgd(pgd, *pgd_ref); - else + arch_flush_lazy_mmu_mode(); + } else { BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref)); + } /* * Below here mismatches are bugs because these lower tables diff --git a/arch/x86/mm/pageat
Re: [patch v7 0/21] sched: power aware scheduling
On Sun, Apr 14, 2013 at 09:28:50AM +0800, Alex Shi wrote:
> Even some scenario the total energy cost more, at least the avg watts
> dropped in that scenarios.

Ok, what's wrong with x = 32 then? So basically if you're looking at
avg watts, you don't want to have more than 16 threads, otherwise
powersaving sucks on that particular uarch and platform. Can you say
that for all platforms out there?

Also, I've added in the columns below the Energy = Power * Time thing.

And the funny thing is, exactly there where avg watts is better in
powersaving, energy for workload retire is worse. And the other way
around. Basically, avg watts vs retire energy is reciprocal. Great :-\.

> Len said he has low p-state which can work there. but that's is
> different. I had sent some data in another email list to show the
> difference:
>
> The following is 2 times kbuild testing result for 3 kinds condiation on
> SNB EP box, the middle column is the lowest p-state testing result, we
> can see, it has the lowest power consumption, also has the lowest
> performance/watts value.
> At least for kbuild benchmark, powersaving policy has the best
> compromise on powersaving and power efficient. Further more, due to cpu
> boost feature, it has better performance in some scenarios.
>
>          powersaving + ondemand  userspace + fixed 1.2GHz  performance+ondemand
> x = 8    231.318 /75 57          165.063 /166 36           253.552 /63 62
> x = 16   280.357 /49 72          174.408 /106 54           296.776 /41 82
> x = 32   325.206 /34 90          178.675 /90 62            314.153 /37 86
>
> x = 8    233.623 /74 57          164.507 /168 36           254.775 /65 60
> x = 16   272.54 /38 96           174.364 /106 54           297.731 /42 79
> x = 32   320.758 /34 91          177.917 /91 61            317.875 /35 89
> x = 64   326.837 /33 92          179.037 /90 62            320.615 /36 86

           17348.850               27400.458                 15973.776
           13737.493               18487.248                 12167.816
           11057.004               16080.750                 11623.661

           17288.102               27637.176                 16560.375
           10356.52                18482.584                 12504.702
           10905.772               16190.447                 11125.625
           10785.621               16113.330                 11542.140

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk.
Formatting is fine.
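The Energy = Power * Time figures quoted in the message above can be reproduced with a few lines of C. This is a sketch only: workload_energy() is a hypothetical helper, and the unit of the time column is not stated in the thread, so the result is simply watts multiplied by whatever unit that column uses.

```c
/*
 * Energy needed to retire the workload: average power times run time.
 * The time unit is an assumption; the thread only gives bare numbers.
 */
static double workload_energy(double avg_watts, double run_time)
{
	return avg_watts * run_time;
}
```

For the first x = 8 row, powersaving draws fewer average watts than performance (231.318 vs 253.552) but runs longer (75 vs 63), so its energy comes out higher (about 17348.85 vs 15973.78), which is the inversion the message points out.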
[tip:perf/urgent] perf: Fix error return code
Commit-ID:  c481420248c6730246d2a1b1773d5d7007ae0835
Gitweb:     http://git.kernel.org/tip/c481420248c6730246d2a1b1773d5d7007ae0835
Author:     Wei Yongjun
AuthorDate: Fri, 12 Apr 2013 11:05:54 +0800
Committer:  Ingo Molnar
CommitDate: Fri, 12 Apr 2013 06:33:56 +0200

perf: Fix error return code

Fix to return -ENOMEM in the allocation error case instead of 0
(if pmu_bus_running == 1), as done elsewhere in this function.

Signed-off-by: Wei Yongjun
Cc: a.p.zijls...@chello.nl
Cc: pau...@samba.org
Cc: a...@ghostprotocols.net
Link: http://lkml.kernel.org/r/capglhd8j_fwcgqe%3dklwjpbj%2b%3do0pw6z-seq%3dntpu08c2w1t...@mail.gmail.com
[ Tweaked the error code setting placement and the changelog. ]
Signed-off-by: Ingo Molnar
---
 kernel/events/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7f0d67e..7e0962e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5987,6 +5987,7 @@ skip_type:
 	if (pmu->pmu_cpu_context)
 		goto got_cpu_context;
 
+	ret = -ENOMEM;
 	pmu->pmu_cpu_context = alloc_percpu(struct perf_cpu_context);
 	if (!pmu->pmu_cpu_context)
 		goto free_dev;
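The failure mode fixed by this commit, a goto-based cleanup path returning a stale 0, can be shown in a small user-space sketch. All names here (struct pmu_sketch, register_pmu_sketch(), fail_percpu) are hypothetical, and malloc() merely stands in for the kernel allocators.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the object being registered. */
struct pmu_sketch {
	void *dev;
	void *cpu_context;
};

/* fail_percpu simulates a failing per-cpu allocation. */
static int register_pmu_sketch(struct pmu_sketch *pmu, int fail_percpu)
{
	int ret;

	pmu->dev = malloc(16);
	if (!pmu->dev)
		return -ENOMEM;
	ret = 0;	/* an earlier step succeeded, so ret is 0 here */

	ret = -ENOMEM;	/* without this line, free_dev would return 0 */
	pmu->cpu_context = fail_percpu ? NULL : malloc(16);
	if (!pmu->cpu_context)
		goto free_dev;

	return 0;

free_dev:
	free(pmu->dev);
	pmu->dev = NULL;
	return ret;
}
```

Setting ret immediately before the allocation keeps the error code next to the operation that can fail, so the shared cleanup label never has to guess why it was reached.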
[tip:x86/urgent] x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set
Commit-ID:  26564600c9e88c6572a5e6ef5ae9121907edfb7f
Gitweb:     http://git.kernel.org/tip/26564600c9e88c6572a5e6ef5ae9121907edfb7f
Author:     Boris Ostrovsky
AuthorDate: Thu, 11 Apr 2013 13:59:52 -0400
Committer:  Ingo Molnar
CommitDate: Fri, 12 Apr 2013 07:19:19 +0200

x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set

When CONFIG_DEBUG_PAGEALLOC is set page table updates made by
kernel_map_pages() are not made visible (via TLB flush)
immediately if lazy MMU is on. In environments that support lazy
MMU (e.g. Xen) this may lead to fatal page faults, for example,
when zap_pte_range() needs to allocate pages in
__tlb_remove_page() -> tlb_next_batch().

Signed-off-by: Boris Ostrovsky
Cc: konrad.w...@oracle.com
Link: http://lkml.kernel.org/r/1365703192-2089-1-git-send-email-boris.ostrov...@oracle.com
Signed-off-by: Ingo Molnar
---
 arch/x86/mm/pageattr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 7896f71..fb4e73e 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1413,6 +1413,8 @@ void kernel_map_pages(struct page *page, int numpages, int enable)
 	 * but that can deadlock->flush only current cpu:
 	 */
 	__flush_tlb_all();
+
+	arch_flush_lazy_mmu_mode();
 }
 
 #ifdef CONFIG_HIBERNATION
[tip:x86/urgent] x86/mm/cpa/selftest: Fix false positive in CPA self test
Commit-ID:  18699739b60cb60230153ff5475b2ba92be185f9
Gitweb:     http://git.kernel.org/tip/18699739b60cb60230153ff5475b2ba92be185f9
Author:     Andrea Arcangeli
AuthorDate: Thu, 11 Apr 2013 15:36:09 +0200
Committer:  Ingo Molnar
CommitDate: Fri, 12 Apr 2013 06:39:20 +0200

x86/mm/cpa/selftest: Fix false positive in CPA self test

If the pmd is not present, _PAGE_PSE will not be set anymore.
Fix the false positive.

Reported-by: Ingo Molnar
Signed-off-by: Andrea Arcangeli
Cc: Stefan Bader
Cc: Andy Whitcroft
Cc: Mel Gorman
Cc: Borislav Petkov
Link: http://lkml.kernel.org/r/1365687369-30802-1-git-send-email-aarca...@redhat.com
Signed-off-by: Ingo Molnar
---
 arch/x86/mm/pageattr-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/pageattr-test.c b/arch/x86/mm/pageattr-test.c
index b008656..0e38951 100644
--- a/arch/x86/mm/pageattr-test.c
+++ b/arch/x86/mm/pageattr-test.c
@@ -68,7 +68,7 @@ static int print_split(struct split_state *s)
 			s->gpg++;
 			i += GPS/PAGE_SIZE;
 		} else if (level == PG_LEVEL_2M) {
-			if (!(pte_val(*pte) & _PAGE_PSE)) {
+			if ((pte_val(*pte) & _PAGE_PRESENT) && !(pte_val(*pte) & _PAGE_PSE)) {
 				printk(KERN_ERR
 					"%lx level %d but not PSE %Lx\n",
 					addr, level, (u64)pte_val(*pte));
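The corrected self-test condition boils down to a two-bit predicate, sketched below. The macro and function names are hypothetical; the bit positions are the x86 page-table ones (present is bit 0, PSE is bit 7).

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_PRESENT	0x001ULL	/* x86 page-table bit 0 */
#define SKETCH_PAGE_PSE		0x080ULL	/* x86 page-table bit 7 */

/*
 * A 2M-level entry is only suspicious when it is present and still
 * lacks the PSE bit. A non-present entry is not expected to carry PSE.
 */
static bool pse_missing(uint64_t pte)
{
	return (pte & SKETCH_PAGE_PRESENT) && !(pte & SKETCH_PAGE_PSE);
}
```

Before the fix the check was effectively !(pte & PSE) alone, so a non-present pmd (value 0) tripped the error message even though nothing was wrong.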
[PATCH 1/5] ptrace/x86: Revert "hw_breakpoints: Fix racy access to ptrace breakpoints"
This reverts commit 87dc669ba25777b67796d7262c569429e58b1ed4. The patch was fine but we can no longer race with SIGKILL after 9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL", the __TASK_TRACED tracee can't be woken up and ->ptrace_bps[] can't go away. The patch only removes ptrace_get_breakpoints/ptrace_put_breakpoints and does a couple of "while at it" cleanups, it doesn't remove other changes from the reverted commit. Signed-off-by: Oleg Nesterov --- arch/x86/kernel/ptrace.c | 28 +--- 1 files changed, 5 insertions(+), 23 deletions(-) diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c index 29a8120..7a98b21 100644 --- a/arch/x86/kernel/ptrace.c +++ b/arch/x86/kernel/ptrace.c @@ -641,9 +641,6 @@ static int ptrace_write_dr7(struct task_struct *tsk, unsigned long data) unsigned len, type; struct perf_event *bp; - if (ptrace_get_breakpoints(tsk) < 0) - return -ESRCH; - data &= ~DR_CONTROL_RESERVED; old_dr7 = ptrace_get_dr7(thread->ptrace_bps); restore: @@ -692,9 +689,7 @@ restore: goto restore; } - ptrace_put_breakpoints(tsk); - - return ((orig_ret < 0) ? orig_ret : rc); + return orig_ret < 0 ? 
orig_ret : rc; } /* @@ -706,18 +701,10 @@ static unsigned long ptrace_get_debugreg(struct task_struct *tsk, int n) unsigned long val = 0; if (n < HBP_NUM) { - struct perf_event *bp; + struct perf_event *bp = thread->ptrace_bps[n]; - if (ptrace_get_breakpoints(tsk) < 0) - return -ESRCH; - - bp = thread->ptrace_bps[n]; - if (!bp) - val = 0; - else + if (bp) val = bp->hw.info.address; - - ptrace_put_breakpoints(tsk); } else if (n == 6) { val = thread->debugreg6; } else if (n == 7) { @@ -734,9 +721,6 @@ static int ptrace_set_breakpoint_addr(struct task_struct *tsk, int nr, struct perf_event_attr attr; int err = 0; - if (ptrace_get_breakpoints(tsk) < 0) - return -ESRCH; - if (!t->ptrace_bps[nr]) { ptrace_breakpoint_init(&attr); /* @@ -762,7 +746,7 @@ static int ptrace_set_breakpoint_addr(struct task_struct *tsk, int nr, */ if (IS_ERR(bp)) { err = PTR_ERR(bp); - goto put; + goto out; } t->ptrace_bps[nr] = bp; @@ -773,9 +757,7 @@ static int ptrace_set_breakpoint_addr(struct task_struct *tsk, int nr, attr.bp_addr = addr; err = modify_user_hw_breakpoint(bp, &attr); } - -put: - ptrace_put_breakpoints(tsk); +out: return err; } -- 1.5.5.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/5] ptrace/arm: Revert "hw_breakpoints: Fix racy access to ptrace breakpoints"
This reverts commit bf0b8f4b55e591ba417c2dbaff42769e1fc773b0.

The patch was fine but we can no longer race with SIGKILL after
9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL", the __TASK_TRACED tracee can't be woken up and
->ptrace_bps[] can't go away.

Signed-off-by: Oleg Nesterov
Cc: Russell King
Cc: Will Deacon
---
 arch/arm/kernel/ptrace.c |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 03deeff..41668e5 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -886,20 +886,12 @@ long arch_ptrace(struct task_struct *child, long request,
 
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 		case PTRACE_GETHBPREGS:
-			if (ptrace_get_breakpoints(child) < 0)
-				return -ESRCH;
-
 			ret = ptrace_gethbpregs(child, addr,
 						(unsigned long __user *)data);
-			ptrace_put_breakpoints(child);
 			break;
 		case PTRACE_SETHBPREGS:
-			if (ptrace_get_breakpoints(child) < 0)
-				return -ESRCH;
-
 			ret = ptrace_sethbpregs(child, addr,
 						(unsigned long __user *)data);
-			ptrace_put_breakpoints(child);
 			break;
 #endif
 
-- 
1.5.5.1
[PATCH 4/5] ptrace/sh: Revert "hw_breakpoints: Fix racy access to ptrace breakpoints"
This reverts commit e0ac8457d020c0289ea566917267da9e5e6d9865.

The patch was fine but we can no longer race with SIGKILL after
9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL", the __TASK_TRACED tracee can't be woken up and
->ptrace_bps[] can't go away.

Signed-off-by: Oleg Nesterov
Cc: Paul Mundt
---
 arch/sh/kernel/ptrace_32.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
index 81f999a..668c816 100644
--- a/arch/sh/kernel/ptrace_32.c
+++ b/arch/sh/kernel/ptrace_32.c
@@ -117,11 +117,7 @@ void user_enable_single_step(struct task_struct *child)
 
 	set_tsk_thread_flag(child, TIF_SINGLESTEP);
 
-	if (ptrace_get_breakpoints(child) < 0)
-		return;
-
 	set_single_step(child, pc);
-	ptrace_put_breakpoints(child);
 }
 
 void user_disable_single_step(struct task_struct *child)
-- 
1.5.5.1
[PATCH 5/5] ptrace: Revert "Prepare to fix racy accesses on task breakpoints"
This reverts commit bf26c018490c2fce7fe9b629083b96ce0e6ad019. The patch was fine but we can no longer race with SIGKILL after 9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL", the __TASK_TRACED tracee can't be woken up and ->ptrace_bps[] can't go away. Now that ptrace_get_breakpoints/ptrace_put_breakpoints have no callers, we can kill them and remove task->ptrace_bp_refcnt. Signed-off-by: Oleg Nesterov --- include/linux/ptrace.h | 10 -- include/linux/sched.h |3 --- kernel/exit.c |2 +- kernel/ptrace.c| 16 4 files changed, 1 insertions(+), 30 deletions(-) diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index 89573a3..07d0df6 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -142,9 +142,6 @@ static inline void ptrace_init_task(struct task_struct *child, bool ptrace) { INIT_LIST_HEAD(&child->ptrace_entry); INIT_LIST_HEAD(&child->ptraced); -#ifdef CONFIG_HAVE_HW_BREAKPOINT - atomic_set(&child->ptrace_bp_refcnt, 1); -#endif child->jobctl = 0; child->ptrace = 0; child->parent = child->real_parent; @@ -351,11 +348,4 @@ extern int task_current_syscall(struct task_struct *target, long *callno, unsigned long args[6], unsigned int maxargs, unsigned long *sp, unsigned long *pc); -#ifdef CONFIG_HAVE_HW_BREAKPOINT -extern int ptrace_get_breakpoints(struct task_struct *tsk); -extern void ptrace_put_breakpoints(struct task_struct *tsk); -#else -static inline void ptrace_put_breakpoints(struct task_struct *tsk) { } -#endif /* CONFIG_HAVE_HW_BREAKPOINT */ - #endif diff --git a/include/linux/sched.h b/include/linux/sched.h index d35d2b6..89dc3e4 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1570,9 +1570,6 @@ struct task_struct { } memcg_batch; unsigned int memcg_kmem_skip_account; #endif -#ifdef CONFIG_HAVE_HW_BREAKPOINT - atomic_t ptrace_bp_refcnt; -#endif #ifdef CONFIG_UPROBES struct uprobe_task *utask; #endif diff --git a/kernel/exit.c b/kernel/exit.c index 60bc027..0a66f6d 100644 --- 
a/kernel/exit.c +++ b/kernel/exit.c @@ -819,7 +819,7 @@ void do_exit(long code) /* * FIXME: do that only when needed, using sched_exit tracepoint */ - ptrace_put_breakpoints(tsk); + flush_ptrace_hw_breakpoint(tsk); exit_notify(tsk, group_dead); #ifdef CONFIG_NUMA diff --git a/kernel/ptrace.c b/kernel/ptrace.c index acbd284..776ab3b 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -1098,19 +1098,3 @@ asmlinkage long compat_sys_ptrace(compat_long_t request, compat_long_t pid, return ret; } #endif /* CONFIG_COMPAT */ - -#ifdef CONFIG_HAVE_HW_BREAKPOINT -int ptrace_get_breakpoints(struct task_struct *tsk) -{ - if (atomic_inc_not_zero(&tsk->ptrace_bp_refcnt)) - return 0; - - return -1; -} - -void ptrace_put_breakpoints(struct task_struct *tsk) -{ - if (atomic_dec_and_test(&tsk->ptrace_bp_refcnt)) - flush_ptrace_hw_breakpoint(tsk); -} -#endif /* CONFIG_HAVE_HW_BREAKPOINT */ -- 1.5.5.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/5] kill ptrace_{get,put}_breakpoints()
Hello.

Kill ptrace_{get,put}_breakpoints and task_struct->ptrace_bp_refcnt,
9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL" made this all unneeded.

Benjamin, Paul, arch_dup_task_struct()->flush_ptrace_hw_breakpoint(src)
on powerpc looks "obviously wrong". Don't we need

	- flush_ptrace_hw_breakpoint(src);
	+ dst->thread->ptrace_bps[0] = NULL;

?

Oleg.
[tip:sched/core] sched: Document task_struct::personality field
Commit-ID:  9b89f6ba2ab56e4d9c00e7e591d6bc333137895e
Gitweb:     http://git.kernel.org/tip/9b89f6ba2ab56e4d9c00e7e591d6bc333137895e
Author:     Andrei Epure
AuthorDate: Thu, 11 Apr 2013 20:30:29 +0300
Committer:  Ingo Molnar
CommitDate: Fri, 12 Apr 2013 07:20:27 +0200

sched: Document task_struct::personality field

Signed-off-by: Andrei Epure
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1365701429-4721-1-git-send-email-epure.and...@gmail.com
Signed-off-by: Ingo Molnar
---
 include/linux/sched.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9004f6e..6bdaa73 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1105,8 +1105,10 @@ struct task_struct {
 	int exit_code, exit_signal;
 	int pdeath_signal; /* The signal sent when the parent dies */
 	unsigned int jobctl;	/* JOBCTL_*, siglock protected */
-	/* ??? */
+
+	/* Used for emulating ABI behavior of previous Linux versions */
 	unsigned int personality;
+
 	unsigned did_exec:1;
 	unsigned in_execve:1;	/* Tell the LSMs that the process is doing an
 				 * execve */
[PATCH 2/5] ptrace/powerpc: Revert "hw_breakpoints: Fix racy access to ptrace breakpoints"
This reverts commit 07fa7a0a8a586c01a8b416358c7012dcb9dc688d and removes ptrace_get/put_breakpoints() added by other commits. The patch was fine but we can no longer race with SIGKILL after 9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL", the __TASK_TRACED tracee can't be woken up and ->ptrace_bps[] can't go away. Signed-off-by: Oleg Nesterov Cc: Benjamin Herrenschmidt Cc: Paul Mackerras --- arch/powerpc/kernel/ptrace.c | 20 1 files changed, 0 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c index f9b30c6..d278e43 100644 --- a/arch/powerpc/kernel/ptrace.c +++ b/arch/powerpc/kernel/ptrace.c @@ -969,16 +969,12 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned long addr, hw_brk.type = (data & HW_BRK_TYPE_DABR) | HW_BRK_TYPE_PRIV_ALL; hw_brk.len = 8; #ifdef CONFIG_HAVE_HW_BREAKPOINT - if (ptrace_get_breakpoints(task) < 0) - return -ESRCH; - bp = thread->ptrace_bps[0]; if ((!data) || !(hw_brk.type & HW_BRK_TYPE_RDWR)) { if (bp) { unregister_hw_breakpoint(bp); thread->ptrace_bps[0] = NULL; } - ptrace_put_breakpoints(task); return 0; } if (bp) { @@ -991,11 +987,9 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned long addr, ret = modify_user_hw_breakpoint(bp, &attr); if (ret) { - ptrace_put_breakpoints(task); return ret; } thread->ptrace_bps[0] = bp; - ptrace_put_breakpoints(task); thread->hw_brk = hw_brk; return 0; } @@ -1010,12 +1004,9 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned long addr, ptrace_triggered, NULL, task); if (IS_ERR(bp)) { thread->ptrace_bps[0] = NULL; - ptrace_put_breakpoints(task); return PTR_ERR(bp); } - ptrace_put_breakpoints(task); - #endif /* CONFIG_HAVE_HW_BREAKPOINT */ task->thread.hw_brk = hw_brk; #else /* CONFIG_PPC_ADV_DEBUG_REGS */ @@ -1434,9 +1425,6 @@ static long ppc_set_hwdebug(struct task_struct *child, if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_WRITE) brk.type |= HW_BRK_TYPE_WRITE; #ifdef 
CONFIG_HAVE_HW_BREAKPOINT - if (ptrace_get_breakpoints(child) < 0) - return -ESRCH; - /* * Check if the request is for 'range' breakpoints. We can * support it if range < 8 bytes. @@ -1444,12 +1432,10 @@ static long ppc_set_hwdebug(struct task_struct *child, if (bp_info->addr_mode == PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE) { len = bp_info->addr2 - bp_info->addr; } else if (bp_info->addr_mode != PPC_BREAKPOINT_MODE_EXACT) { - ptrace_put_breakpoints(child); return -EINVAL; } bp = thread->ptrace_bps[0]; if (bp) { - ptrace_put_breakpoints(child); return -ENOSPC; } @@ -1463,11 +1449,9 @@ static long ppc_set_hwdebug(struct task_struct *child, ptrace_triggered, NULL, child); if (IS_ERR(bp)) { thread->ptrace_bps[0] = NULL; - ptrace_put_breakpoints(child); return PTR_ERR(bp); } - ptrace_put_breakpoints(child); return 1; #endif /* CONFIG_HAVE_HW_BREAKPOINT */ @@ -1511,16 +1495,12 @@ static long ppc_del_hwdebug(struct task_struct *child, long data) return -EINVAL; #ifdef CONFIG_HAVE_HW_BREAKPOINT - if (ptrace_get_breakpoints(child) < 0) - return -ESRCH; - bp = thread->ptrace_bps[0]; if (bp) { unregister_hw_breakpoint(bp); thread->ptrace_bps[0] = NULL; } else ret = -ENOENT; - ptrace_put_breakpoints(child); return ret; #else /* CONFIG_HAVE_HW_BREAKPOINT */ if (child->thread.hw_brk.address == 0) -- 1.5.5.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/mm] x86/mm/fixmap: Remove unused FIX_CYCLONE_TIMER
Commit-ID:  a7e6567585e513cb4e44387831cb75eb5b562cbb
Gitweb:     http://git.kernel.org/tip/a7e6567585e513cb4e44387831cb75eb5b562cbb
Author:     Paul Bolle
AuthorDate: Thu, 11 Apr 2013 18:49:42 +0200
Committer:  Ingo Molnar
CommitDate: Fri, 12 Apr 2013 07:21:18 +0200

x86/mm/fixmap: Remove unused FIX_CYCLONE_TIMER

The last users of FIX_CYCLONE_TIMER were removed in v2.6.18. We
can remove this unneeded constant.

Signed-off-by: Paul Bolle
Link: http://lkml.kernel.org/r/1365698982.1427.3.camel@x61.thuisdomein
Signed-off-by: Ingo Molnar
---
 arch/x86/include/asm/fixmap.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index a09c285..83ea2fb 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -107,9 +107,6 @@ enum fixed_addresses {
 #ifdef CONFIG_X86_F00F_BUG
 	FIX_F00F_IDT,	/* Virtual mapping for IDT */
 #endif
-#ifdef CONFIG_X86_CYCLONE_TIMER
-	FIX_CYCLONE_TIMER, /*cyclone timer register*/
-#endif
 #ifdef CONFIG_X86_32
 	FIX_KMAP_BEGIN,	/* reserved pte's for temporary kernel mappings */
 	FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1,
[PATCH] algif_skcipher: Avoid crash if buffer is not multiple of cipher block size
When user requests encryption (or decryption) of block which
is not aligned to cipher block size through userspace crypto
interface, an OOps like this can happen:

[ 112.738285] BUG: unable to handle kernel paging request at e1c44840
[ 112.738407] IP: [] scatterwalk_done+0x53/0x70
...
[ 112.740515] Call Trace:
[ 112.740588] [] blkcipher_walk_done+0x160/0x1e0
[ 112.740663] [] blkcipher_walk_next+0x318/0x3c0
[ 112.740737] [] blkcipher_walk_first+0x70/0x160
[ 112.740811] [] blkcipher_walk_virt+0x17/0x20
[ 112.740886] [] cbc_encrypt+0x29/0x100 [aesni_intel]
[ 112.740968] [] ? get_user_pages_fast+0x123/0x150
[ 112.741046] [] ? trace_hardirqs_on+0xb/0x10
[ 112.741119] [] __ablk_encrypt+0x39/0x40 [ablk_helper]
[ 112.741198] [] ablk_encrypt+0x1a/0x70 [ablk_helper]
[ 112.741275] [] skcipher_recvmsg+0x20c/0x400 [algif_skcipher]
[ 112.741359] [] ? sched_clock_cpu+0x11d/0x1a0
[ 112.741435] [] ? find_get_page+0x79/0xc0
[ 112.741509] [] sock_aio_read+0x104/0x140
[ 112.741580] [] ? __do_fault+0x248/0x420
[ 112.741650] [] do_sync_read+0x97/0xd0
[ 112.741719] [] vfs_read+0x11d/0x140
[ 112.741789] [] ? sys_socketcall+0x2a3/0x320
[ 112.741861] [] sys_read+0x42/0x90
[ 112.742578] [] sysenter_do_call+0x12/0x32

Patch fixes it by simply rejecting buffer which is not multiple of
cipher block.

(Bug is present in all stable kernels as well.)

Signed-off-by: Milan Broz
---
 crypto/algif_skcipher.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index 6a6dfc0..5f7713b 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -463,7 +463,7 @@ static int skcipher_recvmsg(struct kiocb *unused, struct socket *sock,
 			used -= used % bs;
 
 		err = -EINVAL;
-		if (!used)
+		if (!used || used % bs)
 			goto free;
 
 		ablkcipher_request_set_crypt(&ctx->req, sg,
-- 
1.7.10.4
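The length check added by this patch can be sketched as a small stand-alone helper. This is an illustration under assumptions: usable_length() and its more flag are hypothetical, and the sketch assumes that the rounding-down line shown as context in the hunk sits under a condition that is not visible there (otherwise the added used % bs test could never fire).

```c
#include <errno.h>
#include <stddef.h>

/*
 * Decide how many bytes may be handed to a block cipher.
 *   used: bytes available for this request
 *   bs:   cipher block size (non-zero)
 *   more: hypothetical flag, non-zero when further data will follow,
 *         in which case a partial trailing block is left for later
 * Returns the usable length, or -EINVAL if nothing block-aligned is left.
 */
static long usable_length(size_t used, size_t bs, int more)
{
	if (more)
		used -= used % bs;	/* keep whole blocks, defer the rest */

	if (!used || used % bs)		/* the patched check */
		return -EINVAL;

	return (long)used;
}
```

With a 16-byte block size, a final 20-byte request is rejected with -EINVAL instead of being passed on unaligned, while the same 20 bytes with more data pending are trimmed to 16.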
Re: [PATCH] algif_skcipher: Avoid crash if buffer is not multiple of cipher block size
On 04/14/2013 06:12 PM, Milan Broz wrote:
> When user requests encryption (or decryption) of block which
> is not aligned to cipher block size through userspace crypto
> interface, an OOps like this can happen

And this is a reproducer for the problem above...

Milan

/*
 * Check for unaligned buffer to block cipher size in kernel crypto API
 * fixed by patch: http://article.gmane.org/gmane.linux.kernel.cryptoapi/7980
 *
 * Compile with gcc test.c -o tst
 */

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

#ifndef AF_ALG
#define AF_ALG 38
#endif
#ifndef SOL_ALG
#define SOL_ALG 279
#endif

static int kernel_crypt(int opfd, const char *in, char *out, size_t length,
			const char *iv, size_t iv_length, uint32_t direction)
{
	int r = 0;
	ssize_t len;
	struct af_alg_iv *alg_iv;
	struct cmsghdr *header;
	uint32_t *type;
	struct iovec iov = {
		.iov_base = (void*)(uintptr_t)in,
		.iov_len = length,
	};
	int iv_msg_size = iv ? CMSG_SPACE(sizeof(*alg_iv) + iv_length) : 0;
	/* the operation type is a uint32_t, so size it as *type */
	char buffer[CMSG_SPACE(sizeof(*type)) + iv_msg_size];
	struct msghdr msg = {
		.msg_control = buffer,
		.msg_controllen = sizeof(buffer),
		.msg_iov = &iov,
		.msg_iovlen = 1,
	};

	if (!in || !out || !length)
		return -EINVAL;

	if ((!iv && iv_length) || (iv && !iv_length))
		return -EINVAL;

	memset(buffer, 0, sizeof(buffer));

	/* Set encrypt/decrypt operation */
	header = CMSG_FIRSTHDR(&msg);
	header->cmsg_level = SOL_ALG;
	header->cmsg_type = ALG_SET_OP;
	header->cmsg_len = CMSG_LEN(sizeof(*type));
	type = (void*)CMSG_DATA(header);
	*type = direction;

	/* Set IV */
	if (iv) {
		header = CMSG_NXTHDR(&msg, header);
		header->cmsg_level = SOL_ALG;
		header->cmsg_type = ALG_SET_IV;
		header->cmsg_len = iv_msg_size;
		alg_iv = (void*)CMSG_DATA(header);
		alg_iv->ivlen = iv_length;
		memcpy(alg_iv->iv, iv, iv_length);
	}

	/* Queue the input data together with the control messages */
	len = sendmsg(opfd, &msg, 0);
	if (len != (ssize_t)length) {
		r = -EIO;
		goto bad;
	}

	/* Reading the result is what triggers the cipher operation */
	len = read(opfd, out, length);
	if (len != (ssize_t)length)
		r = -EIO;
bad:
	memset(buffer, 0, sizeof(buffer));
	return r;
}

int main (int argc, char *argv[])
{
	const char key[32] = "0123456789abcdef0123456789abcdef";
	const char iv[16] = "0001";
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type = "skcipher",
		.salg_name = "cbc(aes)"
	};
	int tfmfd, opfd;
	char *data;

	if (posix_memalign((void*)&data, 4096, 32)) {
		printf("Cannot alloc memory.\n");
		return 1;
	}

	tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
	if (tfmfd == -1)
		goto bad;

	if (bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa)) == -1)
		goto bad;

	opfd = accept(tfmfd, NULL, 0);
	if (opfd == -1)
		goto bad;

	if (setsockopt(tfmfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key)) == -1)
		goto bad;

	/* length 1 is deliberately not a multiple of the AES block size */
	if (kernel_crypt(opfd, data, data, 1, iv, sizeof(iv), ALG_OP_ENCRYPT) < 0)
		printf("Cannot encrypt data.\n");

	close(tfmfd);
	close(opfd);
	free(data);
	return 0;
bad:
	printf("Cannot initialise cipher.\n");
	return 1;
}
Re: [PATCH 4/8] dmaengine: ste_dma40: Do not configure channels during an channel allocation
2013/4/12 Lee Jones :

> So I need to devise another way, as this function cannot be called
> here either. Using the dmaengine API, allocating a channel and
> configuring it are to be completed using different calls. Using the
> API correctly, there is no way the driver can setup the channel
> with all of the relevant information during allocation time.
>
> The steps are as follows:
>
> dma_request_channel() - here we only allot a channel number and
>                         allocate the appropriate resources for the
>                         channel.
>
> dma_slave_config()    - this is where we're meant to configure the
>                         channel, so d40_config_write() needs to be
>                         called here, as only dma_slave_config() will
>                         carry the information required so as
>                         d40_*_cfg() can make the correct decisions.

Whether a physical or a logical channel is used is not something that
can be configured via dma_slave_config(). d40_config_write() needs only
that information, and it is already available in dma_request_channel().
Therefore dma_slave_config() provides no further information relevant to
d40_config_write().

Hence d40_config_write() can be called in dma_request_channel().
[PATCH 0/8] workqueue: advance concurrency management
I found that the early increase of nr_running in wq_worker_waking_up() is
useless in many cases. It tries to avoid waking up idle workers for
pending work items, but delaying the increase of nr_running does not
cause more idle workers to be woken up.

So we delay the increase, remove wq_worker_waking_up(), and ... enjoy a
simpler concurrency management.

Lai Jiangshan (8):
  workqueue: remove @cpu from wq_worker_sleeping()
  workqueue: use create_and_start_worker() in manage_workers()
  workqueue: remove cpu_intensive from process_one_work()
  workqueue: quit cm mode when sleeping
  workqueue: remove disabled wq_worker_waking_up()
  workqueue: make nr_running non-atomic
  workqueue: move worker->flags up
  workqueue: rename ->nr_running to ->nr_cm_workers

 kernel/sched/core.c         |    6 +-
 kernel/workqueue.c          |  234 +++---
 kernel/workqueue_internal.h |    9 +-
 3 files changed, 89 insertions(+), 160 deletions(-)

--
1.7.7.6
[PATCH 1/8] workqueue: remove @cpu from wq_worker_sleeping()
WARN_ON_ONCE(cpu != raw_smp_processor_id()) in wq_worker_sleeping() is
useless: the caller ensures cpu == raw_smp_processor_id().

We should use WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()) to do the
intended test.

As a result, @cpu is removed from wq_worker_sleeping().

Signed-off-by: Lai Jiangshan
---
 kernel/sched/core.c         |    2 +-
 kernel/workqueue.c          |    7 +++
 kernel/workqueue_internal.h |    2 +-
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 23606ee..ffc06ad 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2907,7 +2907,7 @@ need_resched:
 		if (prev->flags & PF_WQ_WORKER) {
 			struct task_struct *to_wakeup;

-			to_wakeup = wq_worker_sleeping(prev, cpu);
+			to_wakeup = wq_worker_sleeping(prev);
 			if (to_wakeup)
 				try_to_wake_up_local(to_wakeup);
 		}
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c273376..b3095ad 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -807,7 +807,6 @@ void wq_worker_waking_up(struct task_struct *task, int cpu)
 /**
  * wq_worker_sleeping - a worker is going to sleep
  * @task: task going to sleep
- * @cpu: CPU in question, must be the current CPU number
  *
  * This function is called during schedule() when a busy worker is
  * going to sleep. Worker on the same cpu can be woken up by
@@ -817,9 +816,9 @@ void wq_worker_waking_up(struct task_struct *task, int cpu)
  * spin_lock_irq(rq->lock)
  *
  * RETURNS:
- * Worker task on @cpu to wake up, %NULL if none.
+ * Worker task on the same pool to wake up, %NULL if none.
  */
-struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu)
+struct task_struct *wq_worker_sleeping(struct task_struct *task)
 {
 	struct worker *worker = kthread_data(task), *to_wakeup = NULL;
 	struct worker_pool *pool;
@@ -835,7 +834,7 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu)
 	pool = worker->pool;

 	/* this can only happen on the local cpu */
-	if (WARN_ON_ONCE(cpu != raw_smp_processor_id()))
+	if (WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()))
 		return NULL;

 	/*
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 84ab6e1..aec8df4 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -57,6 +57,6 @@ static inline struct worker *current_wq_worker(void)
  * sched.c and workqueue.c.
  */
 void wq_worker_waking_up(struct task_struct *task, int cpu);
-struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu);
+struct task_struct *wq_worker_sleeping(struct task_struct *task);

 #endif /* _KERNEL_WORKQUEUE_INTERNAL_H */
--
1.7.7.6
[PATCH 2/8] workqueue: use create_and_start_worker() in manage_workers()
After we allocate a worker, we are free to access it without any
protection until it is visible/published.

In the old code, the worker was published by start_worker() and became
visible only after start_worker(). In the current code, however, it is
visible to for_each_pool_worker() after
"idr_replace(&pool->worker_idr, worker, worker->id);".

This means the step of publishing a worker is not atomic, which is very
fragile (although I did not find any bug caused by it in the current
code). It should be fixed.

It can be fixed by moving
"idr_replace(&pool->worker_idr, worker, worker->id);" to start_worker(),
or by folding start_worker() into create_worker().

I chose the second one. It makes the code much simpler.

Signed-off-by: Lai Jiangshan
---
 kernel/workqueue.c |   62 +++
 1 files changed, 18 insertions(+), 44 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b3095ad..d1e10c5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -64,7 +64,7 @@ enum {
 	 *
 	 * Note that DISASSOCIATED should be flipped only while holding
 	 * manager_mutex to avoid changing binding state while
-	 * create_worker() is in progress.
+	 * create_and_start_worker_locked() is in progress.
 	 */
 	POOL_MANAGE_WORKERS	= 1 << 0,	/* need to manage workers */
 	POOL_DISASSOCIATED	= 1 << 2,	/* cpu can't serve workers */
@@ -1542,7 +1542,10 @@ static void worker_enter_idle(struct worker *worker)
 			 (worker->hentry.next || worker->hentry.pprev)))
 		return;

-	/* can't use worker_set_flags(), also called from start_worker() */
+	/*
+	 * can't use worker_set_flags(), also called from
+	 * create_and_start_worker_locked().
+	 */
 	worker->flags |= WORKER_IDLE;
 	pool->nr_idle++;
 	worker->last_active = jiffies;
@@ -1663,12 +1666,10 @@ static struct worker *alloc_worker(void)
 }

 /**
- * create_worker - create a new workqueue worker
+ * create_and_start_worker_locked - create and start a worker for a pool
  * @pool: pool the new worker will belong to
  *
- * Create a new worker which is bound to @pool. The returned worker
- * can be started by calling start_worker() or destroyed using
- * destroy_worker().
+ * Create a new worker which is bound to @pool and start it.
  *
  * CONTEXT:
  * Might sleep. Does GFP_KERNEL allocations.
@@ -1676,7 +1677,7 @@ static struct worker *alloc_worker(void)
  * RETURNS:
  * Pointer to the newly created worker.
  */
-static struct worker *create_worker(struct worker_pool *pool)
+static struct worker *create_and_start_worker_locked(struct worker_pool *pool)
 {
 	struct worker *worker = NULL;
 	int id = -1;
@@ -1734,9 +1735,15 @@ static struct worker *create_worker(struct worker_pool *pool)
 	if (pool->flags & POOL_DISASSOCIATED)
 		worker->flags |= WORKER_UNBOUND;

-	/* successful, commit the pointer to idr */
 	spin_lock_irq(&pool->lock);
+	/* successful, commit the pointer to idr */
 	idr_replace(&pool->worker_idr, worker, worker->id);
+
+	/* start worker */
+	worker->flags |= WORKER_STARTED;
+	worker->pool->nr_workers++;
+	worker_enter_idle(worker);
+	wake_up_process(worker->task);
 	spin_unlock_irq(&pool->lock);

 	return worker;
@@ -1752,23 +1759,6 @@ fail:
 }

 /**
- * start_worker - start a newly created worker
- * @worker: worker to start
- *
- * Make the pool aware of @worker and start it.
- *
- * CONTEXT:
- * spin_lock_irq(pool->lock).
- */
-static void start_worker(struct worker *worker)
-{
-	worker->flags |= WORKER_STARTED;
-	worker->pool->nr_workers++;
-	worker_enter_idle(worker);
-	wake_up_process(worker->task);
-}
-
-/**
  * create_and_start_worker - create and start a worker for a pool
  * @pool: the target pool
  *
@@ -1779,14 +1769,7 @@ static int create_and_start_worker(struct worker_pool *pool)
 	struct worker *worker;

 	mutex_lock(&pool->manager_mutex);
-
-	worker = create_worker(pool);
-	if (worker) {
-		spin_lock_irq(&pool->lock);
-		start_worker(worker);
-		spin_unlock_irq(&pool->lock);
-	}
-
+	worker = create_and_start_worker_locked(pool);
 	mutex_unlock(&pool->manager_mutex);

 	return worker ? 0 : -ENOMEM;
@@ -1934,17 +1917,8 @@ restart:
 	mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INITIAL_TIMEOUT);

 	while (true) {
-		struct worker *worker;
-
-		worker = create_worker(pool);
-		if (worker) {
-			del_timer_sync(&pool->mayday_timer);
-			spin_lock_irq(&pool->lock);
-			start_worker(worker);
-			if (WARN_ON_ONCE(need_to_create_worker(pool)))
-				goto restart;
-			return true;
-
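The idea of publishing and starting a worker in one critical section can be
sketched in userspace. This is only a toy model under stated assumptions:
the toy_* names are invented for the illustration, a fixed array stands in
for pool->worker_idr, and a C11 atomic_flag spinlock stands in for
pool->lock. It is not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WORKERS 8

struct toy_worker {
	int id;
	bool started;
};

struct toy_pool {
	atomic_flag lock;				/* stand-in for pool->lock */
	struct toy_worker *slots[TOY_MAX_WORKERS];	/* stand-in for worker_idr */
	int nr_workers;
	int nr_idle;
};

static void toy_pool_init(struct toy_pool *pool)
{
	size_t i;

	atomic_flag_clear(&pool->lock);
	for (i = 0; i < TOY_MAX_WORKERS; i++)
		pool->slots[i] = NULL;
	pool->nr_workers = 0;
	pool->nr_idle = 0;
}

static void toy_lock(struct toy_pool *pool)
{
	while (atomic_flag_test_and_set_explicit(&pool->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void toy_unlock(struct toy_pool *pool)
{
	atomic_flag_clear_explicit(&pool->lock, memory_order_release);
}

/*
 * Publish the worker and account for it inside a single locked section,
 * so nobody walking the slots under the lock can see a worker that is
 * published but not yet started.  Returns the slot id, or -1 if full.
 */
static int toy_create_and_start(struct toy_pool *pool, struct toy_worker *w)
{
	int id, ret = -1;

	toy_lock(pool);
	for (id = 0; id < TOY_MAX_WORKERS; id++) {
		if (!pool->slots[id]) {
			w->id = id;
			w->started = true;	/* "start" ... */
			pool->slots[id] = w;	/* ... and publish together */
			pool->nr_workers++;
			pool->nr_idle++;
			ret = id;
			break;
		}
	}
	toy_unlock(pool);
	return ret;
}
```

A reader that takes the same lock sees either no entry for the worker or a
fully started one, which is the property the commit message asks for.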
[PATCH 3/8] workqueue: remove cpu_intensive from process_one_work()
In process_one_work(), we can use "worker->flags & WORKER_CPU_INTENSIVE"
instead of "cpu_intensive". Because worker->flags is a hot field
(accessed when processing each work item), this change will not cause
any performance regression.

It also prepares for clearing WORKER_QUIT_CM in the same place.

Signed-off-by: Lai Jiangshan
---
 kernel/workqueue.c |    7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d1e10c5..a4bc589 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2068,7 +2068,6 @@ __acquires(&pool->lock)
 {
 	struct pool_workqueue *pwq = get_work_pwq(work);
 	struct worker_pool *pool = worker->pool;
-	bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
 	int work_color;
 	struct worker *collision;
 #ifdef CONFIG_LOCKDEP
@@ -2118,7 +2117,7 @@ __acquires(&pool->lock)
 	 * CPU intensive works don't participate in concurrency
 	 * management. They're the scheduler's responsibility.
 	 */
-	if (unlikely(cpu_intensive))
+	if (unlikely(pwq->wq->flags & WQ_CPU_INTENSIVE))
 		worker_set_flags(worker, WORKER_CPU_INTENSIVE, true);

 	/*
@@ -2161,8 +2160,8 @@ __acquires(&pool->lock)

 	spin_lock_irq(&pool->lock);

-	/* clear cpu intensive status */
-	if (unlikely(cpu_intensive))
+	/* clear cpu intensive status if it is set */
+	if (unlikely(worker->flags & WORKER_CPU_INTENSIVE))
 		worker_clr_flags(worker, WORKER_CPU_INTENSIVE);

 	/* we're done with it, release */
--
1.7.7.6
[PATCH 4/8] workqueue: quit cm mode when sleeping
When a worker is woken up from sleeping, it makes very little sense to
still consider this worker RUNNING (from the viewpoint of concurrency
management):
  o If the work goes to sleep again, the worker is not RUNNING again.
  o If the work runs long without sleeping, the worker should be
    considered CPU_INTENSIVE.
  o If the work runs for a short time without sleeping, we can still
    consider this worker not RUNNING for this harmless short time, and
    fix it up before the next work.
  o In almost all cases, the increase does not raise nr_running from 0.
    There are other RUNNING workers, and in many cases they will very
    probably not go to sleep before this worker finishes the work, so
    this early increase makes little sense.

So we don't need to consider this worker RUNNING so early, and we can
delay increasing nr_running a little: we increase it after the work has
finished.

This is done by adding a new worker flag, WORKER_QUIT_CM. It is used for
disabling the nr_running increase in wq_worker_waking_up(), and for
increasing nr_running after the work has finished.

This change may cause us to wake up (or create) more workers in rare
cases, but this is not incorrect.

It makes the concurrency management much simpler.

Signed-off-by: Lai Jiangshan
---
 kernel/workqueue.c          |   20 ++--
 kernel/workqueue_internal.h |    2 +-
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a4bc589..668e9b7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -75,11 +75,13 @@ enum {
 	WORKER_DIE		= 1 << 1,	/* die die die */
 	WORKER_IDLE		= 1 << 2,	/* is idle */
 	WORKER_PREP		= 1 << 3,	/* preparing to run works */
+	WORKER_QUIT_CM		= 1 << 4,	/* quit concurrency managed */
 	WORKER_CPU_INTENSIVE	= 1 << 6,	/* cpu intensive */
 	WORKER_UNBOUND		= 1 << 7,	/* worker is unbound */
 	WORKER_REBOUND		= 1 << 8,	/* worker was rebound */

-	WORKER_NOT_RUNNING	= WORKER_PREP | WORKER_CPU_INTENSIVE |
+	WORKER_NOT_RUNNING	= WORKER_PREP | WORKER_QUIT_CM |
+				  WORKER_CPU_INTENSIVE |
 				  WORKER_UNBOUND | WORKER_REBOUND,

 	NR_STD_WORKER_POOLS	= 2,		/* # standard pools per cpu */
@@ -122,6 +124,10 @@ enum {
  *    cpu or grabbing pool->lock is enough for read access. If
  *    POOL_DISASSOCIATED is set, it's identical to L.
  *
+ * LI: If POOL_DISASSOCIATED is NOT set, read/modification access should be
+ *     done with local IRQ-disabled and only from local cpu.
+ *     If POOL_DISASSOCIATED is set, it's identical to L.
+ *
  * MG: pool->manager_mutex and pool->lock protected. Writes require both
  *     locks. Reads can happen under either lock.
  *
@@ -843,11 +849,13 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task)
 	 * Please read comment there.
 	 *
 	 * NOT_RUNNING is clear. This means that we're bound to and
-	 * running on the local cpu w/ rq lock held and preemption
+	 * running on the local cpu w/ rq lock held and preemption/irq
 	 * disabled, which in turn means that none else could be
 	 * manipulating idle_list, so dereferencing idle_list without pool
-	 * lock is safe.
+	 * lock is safe. And which in turn also means that we can
+	 * manipulating worker->flags.
 	 */
+	worker->flags |= WORKER_QUIT_CM;
 	if (atomic_dec_and_test(&pool->nr_running) &&
 	    !list_empty(&pool->worklist))
 		to_wakeup = first_worker(pool);
@@ -2160,9 +2168,9 @@ __acquires(&pool->lock)

 	spin_lock_irq(&pool->lock);

-	/* clear cpu intensive status if it is set */
-	if (unlikely(worker->flags & WORKER_CPU_INTENSIVE))
-		worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
+	/* clear cpu intensive status or WORKER_QUIT_CM if they are set */
+	if (unlikely(worker->flags & (WORKER_CPU_INTENSIVE | WORKER_QUIT_CM)))
+		worker_clr_flags(worker, WORKER_CPU_INTENSIVE | WORKER_QUIT_CM);

 	/* we're done with it, release */
 	hash_del(&worker->hentry);
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index aec8df4..1713ae7 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -35,7 +35,7 @@ struct worker {
 						/* L: for rescuers */
 	/* 64 bytes boundary on 64bit, 32 on 32bit */
 	unsigned long		last_active;	/* L: last active timestamp */
-	unsigned int		flags;		/* X: flags */
+	unsigned int		flags;		/* LI: flags */
 	int			id;		/* I: worker id */

 	/* used only by rescuers to point to the target workqueue */
--
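The accounting described in this patch can be followed with a small
userspace model. This is a sketch under assumptions, not the kernel code:
the toy_* names are invented here, locking is left out entirely, and only
the flag bits needed for the illustration are modelled.

```c
#include <stdbool.h>

enum {
	TOY_WORKER_PREP		 = 1 << 3,
	TOY_WORKER_QUIT_CM	 = 1 << 4,
	TOY_WORKER_CPU_INTENSIVE = 1 << 6,
	TOY_WORKER_NOT_RUNNING	 = TOY_WORKER_PREP | TOY_WORKER_QUIT_CM |
				   TOY_WORKER_CPU_INTENSIVE,
};

struct toy_pool {
	int nr_running;		/* workers counted by concurrency management */
};

struct toy_worker {
	unsigned int flags;
	struct toy_pool *pool;
};

/*
 * The worker blocks inside a work item: it leaves concurrency management
 * and stops being counted.  A second sleep in the same work item is a
 * no-op because QUIT_CM is part of NOT_RUNNING.
 */
static void toy_worker_sleeping(struct toy_worker *w)
{
	if (w->flags & TOY_WORKER_NOT_RUNNING)
		return;
	w->flags |= TOY_WORKER_QUIT_CM;
	w->pool->nr_running--;
}

/*
 * Waking up does nothing in this model: the increase of nr_running is
 * delayed until the current work item has finished.
 */
static void toy_work_done(struct toy_worker *w)
{
	unsigned int oflags = w->flags;

	w->flags &= ~(TOY_WORKER_CPU_INTENSIVE | TOY_WORKER_QUIT_CM);

	/* count the worker again only if it just left NOT_RUNNING */
	if ((oflags & TOY_WORKER_NOT_RUNNING) &&
	    !(w->flags & TOY_WORKER_NOT_RUNNING))
		w->pool->nr_running++;
}
```

In this model a worker that sleeps, wakes, and sleeps again during one work
item changes nr_running exactly once in each direction.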
[PATCH 5/8] workqueue: remove disabled wq_worker_waking_up()
When a worker is sleeping, its flags include WORKER_QUIT_CM, which means
worker->flags & WORKER_NOT_RUNNING is always non-zero, which in turn
means wq_worker_waking_up() is disabled.

So we remove wq_worker_waking_up(). (The access to worker->flags in
wq_worker_waking_up() was not protected by "LI"; after this change, it is
always protected by "LI".)

The patch also makes these changes after the removal:

1) Because wq_worker_waking_up() is removed, we don't need schedule()
   before zapping nr_running in wq_unbind_fn(), and don't need to
   release/regain pool->lock.

2) The sanity check in worker_enter_idle() is changed to also check
   unbound/disassociated pools (because of the above change, nr_running
   is now expected to be always reliable in worker_enter_idle()).

Signed-off-by: Lai Jiangshan
---
 kernel/sched/core.c         |    4 ---
 kernel/workqueue.c          |   58 +++---
 kernel/workqueue_internal.h |    3 +-
 3 files changed, 11 insertions(+), 54 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ffc06ad..18f95884 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1276,10 +1276,6 @@ static void ttwu_activate(struct rq *rq, struct task_struct *p, int en_flags)
 {
 	activate_task(rq, p, en_flags);
 	p->on_rq = 1;
-
-	/* if a worker is waking up, notify workqueue */
-	if (p->flags & PF_WQ_WORKER)
-		wq_worker_waking_up(p, cpu_of(rq));
 }

 /*
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 668e9b7..9f1ebdf 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -790,27 +790,6 @@ static void wake_up_worker(struct worker_pool *pool)
 }

 /**
- * wq_worker_waking_up - a worker is waking up
- * @task: task waking up
- * @cpu: CPU @task is waking up to
- *
- * This function is called during try_to_wake_up() when a worker is
- * being awoken.
- *
- * CONTEXT:
- * spin_lock_irq(rq->lock)
- */
-void wq_worker_waking_up(struct task_struct *task, int cpu)
-{
-	struct worker *worker = kthread_data(task);
-
-	if (!(worker->flags & WORKER_NOT_RUNNING)) {
-		WARN_ON_ONCE(worker->pool->cpu != cpu);
-		atomic_inc(&worker->pool->nr_running);
-	}
-}
-
-/**
  * wq_worker_sleeping - a worker is going to sleep
  * @task: task going to sleep
  *
@@ -1564,14 +1543,8 @@ static void worker_enter_idle(struct worker *worker)
 	if (too_many_workers(pool) && !timer_pending(&pool->idle_timer))
 		mod_timer(&pool->idle_timer, jiffies + IDLE_WORKER_TIMEOUT);

-	/*
-	 * Sanity check nr_running. Because wq_unbind_fn() releases
-	 * pool->lock between setting %WORKER_UNBOUND and zapping
-	 * nr_running, the warning may trigger spuriously. Check iff
-	 * unbind is not in progress.
-	 */
-	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
-		     pool->nr_workers == pool->nr_idle &&
+	/* Sanity check nr_running. */
+	WARN_ON_ONCE(pool->nr_workers == pool->nr_idle &&
 		     atomic_read(&pool->nr_running));
 }

@@ -4385,24 +4358,12 @@ static void wq_unbind_fn(struct work_struct *work)

 		pool->flags |= POOL_DISASSOCIATED;

-		spin_unlock_irq(&pool->lock);
-		mutex_unlock(&pool->manager_mutex);
-
 		/*
-		 * Call schedule() so that we cross rq->lock and thus can
-		 * guarantee sched callbacks see the %WORKER_UNBOUND flag.
-		 * This is necessary as scheduler callbacks may be invoked
-		 * from other cpus.
-		 */
-		schedule();
-
-		/*
-		 * Sched callbacks are disabled now. Zap nr_running.
-		 * After this, nr_running stays zero and need_more_worker()
-		 * and keep_working() are always true as long as the
-		 * worklist is not empty. This pool now behaves as an
-		 * unbound (in terms of concurrency management) pool which
-		 * are served by workers tied to the pool.
+		 * Zap nr_running. After this, nr_running stays zero
+		 * and need_more_worker() and keep_working() are always true
+		 * as long as the worklist is not empty. This pool now
+		 * behaves as an unbound (in terms of concurrency management)
+		 * pool which are served by workers tied to the pool.
 		 */
 		atomic_set(&pool->nr_running, 0);

@@ -4411,9 +4372,9 @@ static void wq_unbind_fn(struct work_struct *work)
 		 * worker blocking could lead to lengthy stalls. Kick off
 		 * unbound chain execution of currently pending work items.
 		 */
-		spin_lock_irq(&pool->lock);
 		wake_up_worker(pool);
 		spin_unlock_irq(&pool->lock);
+		mutex_unlock(&pool->manager_mutex);
 	}
 }

@@ -4466,9 +4427,10 @@ static void
[PATCH 6/8] workqueue: make nr_running non-atomic
Now nr_running is accessed only with local IRQs disabled and only from
the local cpu if the pool is associated (except for the read access in
insert_work()).

So we convert it to a non-atomic variable to reduce the overhead of
atomic operations. It is protected by "LI".

Signed-off-by: Lai Jiangshan
---
 kernel/workqueue.c |   49 +
 1 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9f1ebdf..25e2e5a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -150,6 +150,7 @@ struct worker_pool {
 	int			node;		/* I: the associated node ID */
 	int			id;		/* I: pool ID */
 	unsigned int		flags;		/* X: flags */
+	int			nr_running;	/* LI: count for running */

 	struct list_head	worklist;	/* L: list of pending works */
 	int			nr_workers;	/* L: total number of workers */
@@ -175,13 +176,6 @@ struct worker_pool {
 	int			refcnt;		/* PL: refcnt for unbound pools */

 	/*
-	 * The current concurrency level. As it's likely to be accessed
-	 * from other CPUs during try_to_wake_up(), put it in a separate
-	 * cacheline.
-	 */
-	atomic_t		nr_running ____cacheline_aligned_in_smp;
-
-	/*
 	 * Destruction of pool is sched-RCU protected to allow dereferences
 	 * from get_work_pool().
 	 */
@@ -700,7 +694,7 @@ static bool work_is_canceling(struct work_struct *work)

 static bool __need_more_worker(struct worker_pool *pool)
 {
-	return !atomic_read(&pool->nr_running);
+	return !pool->nr_running;
 }

 /*
@@ -725,8 +719,7 @@ static bool may_start_working(struct worker_pool *pool)
 /* Do I need to keep working? Called from currently running workers. */
 static bool keep_working(struct worker_pool *pool)
 {
-	return !list_empty(&pool->worklist) &&
-		atomic_read(&pool->nr_running) <= 1;
+	return !list_empty(&pool->worklist) && pool->nr_running <= 1;
 }

 /* Do we need a new worker? Called from manager. */
@@ -823,21 +816,24 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task)
 		return NULL;

 	/*
-	 * The counterpart of the following dec_and_test, implied mb,
-	 * worklist not empty test sequence is in insert_work().
-	 * Please read comment there.
-	 *
 	 * NOT_RUNNING is clear. This means that we're bound to and
 	 * running on the local cpu w/ rq lock held and preemption/irq
 	 * disabled, which in turn means that none else could be
 	 * manipulating idle_list, so dereferencing idle_list without pool
 	 * lock is safe. And which in turn also means that we can
-	 * manipulating worker->flags.
+	 * manipulating worker->flags and pool->nr_running.
 	 */
 	worker->flags |= WORKER_QUIT_CM;
-	if (atomic_dec_and_test(&pool->nr_running) &&
-	    !list_empty(&pool->worklist))
-		to_wakeup = first_worker(pool);
+	if (--pool->nr_running == 0) {
+		/*
+		 * This smp_mb() forces a mb between decreasing nr_running
+		 * and reading worklist. It paires with the smp_mb() in
+		 * insert_work(). Please read comment there.
+		 */
+		smp_mb();
+		if (!list_empty(&pool->worklist))
+			to_wakeup = first_worker(pool);
+	}
 	return to_wakeup ? to_wakeup->task : NULL;
 }

@@ -868,12 +864,10 @@ static inline void worker_set_flags(struct worker *worker, unsigned int flags,
 	 */
 	if ((flags & WORKER_NOT_RUNNING) &&
 	    !(worker->flags & WORKER_NOT_RUNNING)) {
-		if (wakeup) {
-			if (atomic_dec_and_test(&pool->nr_running) &&
-			    !list_empty(&pool->worklist))
-				wake_up_worker(pool);
-		} else
-			atomic_dec(&pool->nr_running);
+		pool->nr_running--;
+		if (wakeup && !pool->nr_running &&
+		    !list_empty(&pool->worklist))
+			wake_up_worker(pool);
 	}

 	worker->flags |= flags;
@@ -905,7 +899,7 @@ static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
 	 */
 	if ((flags & WORKER_NOT_RUNNING) && (oflags & WORKER_NOT_RUNNING))
 		if (!(worker->flags & WORKER_NOT_RUNNING))
-			atomic_inc(&pool->nr_running);
+			pool->nr_running++;
 }

 /**
@@ -1544,8 +1538,7 @@ static void worker_enter_idle(struct worker *worker)
 		mod_timer(&pool->idle_timer, jiffies + IDLE_WORKER_TIMEOUT);

 	/* Sanity check nr_running. */
-	WARN_ON_ONCE(pool->nr_workers == pool->nr_idle &&
-
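The barrier pairing between the sleeping path and insert_work() mentioned
in the diff can be modelled in userspace with C11 fences. This is a hedged
sketch only: the toy_* names are invented, a pending counter stands in for
the worklist, and relaxed atomics are used for the two variables because
plain cross-thread accesses would be a data race in ISO C, unlike the
kernel's own memory model.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_pool {
	atomic_int nr_running;	/* stand-in for pool->nr_running */
	atomic_int nr_pending;	/* stand-in for !list_empty(&pool->worklist) */
};

/*
 * Queueing side: publish the work, full fence, then look at nr_running.
 * Returns true if the caller should wake up a worker.
 */
static bool toy_insert_work(struct toy_pool *pool)
{
	atomic_fetch_add_explicit(&pool->nr_pending, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* plays the role of smp_mb() */
	return atomic_load_explicit(&pool->nr_running,
				    memory_order_relaxed) == 0;
}

/*
 * Sleeping side: drop nr_running, full fence, then look at the worklist.
 * Returns true if an idle worker should be woken up.
 */
static bool toy_worker_sleeping(struct toy_pool *pool)
{
	int left = atomic_fetch_sub_explicit(&pool->nr_running, 1,
					     memory_order_relaxed) - 1;

	if (left != 0)
		return false;

	atomic_thread_fence(memory_order_seq_cst);	/* pairs with the fence above */
	return atomic_load_explicit(&pool->nr_pending,
				    memory_order_relaxed) > 0;
}
```

With a full fence on both sides, the two paths cannot both miss each
other's update: if the last running worker goes to sleep while work is
being queued, at least one of the two functions returns true.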
[PATCH 7/8] workqueue: move worker->flags up
worker->flags is a hot field (accessed when processing each work item).
Move it up into the first 64 bytes (32 bytes on 32-bit), which hold the
hot fields.

Also move the colder field worker->task down to ensure worker->pool is
still in the first 64 bytes.

Signed-off-by: Lai Jiangshan
---
 kernel/workqueue_internal.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index e9fd05f..63cfac7 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -20,6 +20,7 @@ struct worker_pool;
  * Only to be used in workqueue and async.
  */
 struct worker {
+	unsigned int		flags;		/* LI: flags */
 	/* on idle list while idle, on busy hash table while busy */
 	union {
 		struct list_head	entry;	/* L: while idle */
@@ -30,12 +31,11 @@ struct worker {
 	work_func_t		current_func;	/* L: current_work's fn */
 	struct pool_workqueue	*current_pwq; /* L: current_work's pwq */
 	struct list_head	scheduled;	/* L: scheduled works */
-	struct task_struct	*task;		/* I: worker task */
 	struct worker_pool	*pool;		/* I: the associated pool */
 						/* L: for rescuers */
 	/* 64 bytes boundary on 64bit, 32 on 32bit */
+	struct task_struct	*task;		/* I: worker task */
 	unsigned long		last_active;	/* L: last active timestamp */
-	unsigned int		flags;		/* LI: flags */
 	int			id;		/* I: worker id */

 	/* used only by rescuers to point to the target workqueue */
--
1.7.7.6
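Layout claims like the one above can be checked with offsetof(). The
following is a sketch on a toy struct, not the kernel's struct worker: the
toy_* names are invented, void pointers stand in for the kernel pointer
types, and the result depends on pointer size and padding, so the helper
reports a yes/no answer instead of assuming one.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_list_head {
	struct toy_list_head *next, *prev;
};

/* Field order follows the patched struct worker; types are stand-ins. */
struct toy_worker {
	unsigned int flags;
	struct toy_list_head entry;	/* a union with hentry in the kernel */
	void *current_work;
	void *current_func;
	void *current_pwq;
	struct toy_list_head scheduled;
	void *pool;
	void *task;
	unsigned long last_active;
	int id;
};

/* True if a field at [offset, offset + size) fits in the first hot_bytes. */
static bool toy_in_hot_area(size_t offset, size_t size, size_t hot_bytes)
{
	return offset + size <= hot_bytes;
}
```

A call such as toy_in_hot_area(offsetof(struct toy_worker, pool),
sizeof(void *), 64) then answers the question for the platform at hand. On
a real kernel build, a tool such as pahole can report the actual layout of
struct worker.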
[PATCH 8/8] workqueue: rename ->nr_running to ->nr_cm_workers
nr_running is not a good name, the reviewers may think they are non-sleeping busy workers. nr_running is actually a counter for concurrency managed workers. renaming it to nr_cm_workers would be better. s/nr_running/nr_cm_workers/ s/NOT_RUNNING/NOT_CM/ manually tune a little(indent and the comment for nr_cm_workers) Signed-off-by: Lai Jiangshan --- kernel/workqueue.c | 69 +-- 1 files changed, 34 insertions(+), 35 deletions(-) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 25e2e5a..25e028c 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -80,7 +80,7 @@ enum { WORKER_UNBOUND = 1 << 7, /* worker is unbound */ WORKER_REBOUND = 1 << 8, /* worker was rebound */ - WORKER_NOT_RUNNING = WORKER_PREP | WORKER_QUIT_CM | + WORKER_NOT_CM = WORKER_PREP | WORKER_QUIT_CM | WORKER_CPU_INTENSIVE | WORKER_UNBOUND | WORKER_REBOUND, @@ -150,7 +150,7 @@ struct worker_pool { int node; /* I: the associated node ID */ int id; /* I: pool ID */ unsigned intflags; /* X: flags */ - int nr_running; /* LI: count for running */ + int nr_cm_workers; /* LI: count for cm workers */ struct list_headworklist; /* L: list of pending works */ int nr_workers; /* L: total number of workers */ @@ -694,14 +694,14 @@ static bool work_is_canceling(struct work_struct *work) static bool __need_more_worker(struct worker_pool *pool) { - return !pool->nr_running; + return !pool->nr_cm_workers; } /* * Need to wake up a worker? Called from anything but currently * running workers. * - * Note that, because unbound workers never contribute to nr_running, this + * Note that, because unbound workers never contribute to nr_cm_workers, this * function will always return %true for unbound pools as long as the * worklist isn't empty. */ @@ -719,7 +719,7 @@ static bool may_start_working(struct worker_pool *pool) /* Do I need to keep working? Called from currently running workers. 
*/ static bool keep_working(struct worker_pool *pool) { - return !list_empty(&pool->worklist) && pool->nr_running <= 1; + return !list_empty(&pool->worklist) && pool->nr_cm_workers <= 1; } /* Do we need a new worker? Called from manager. */ @@ -804,9 +804,9 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task) /* * Rescuers, which may not have all the fields set up like normal * workers, also reach here, let's not access anything before -* checking NOT_RUNNING. +* checking NOT_CM. */ - if (worker->flags & WORKER_NOT_RUNNING) + if (worker->flags & WORKER_NOT_CM) return NULL; pool = worker->pool; @@ -816,17 +816,17 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task) return NULL; /* -* NOT_RUNNING is clear. This means that we're bound to and +* NOT_CM is clear. This means that we're bound to and * running on the local cpu w/ rq lock held and preemption/irq * disabled, which in turn means that none else could be * manipulating idle_list, so dereferencing idle_list without pool * lock is safe. And which in turn also means that we can -* manipulating worker->flags and pool->nr_running. +* manipulating worker->flags and pool->nr_cm_workers. */ worker->flags |= WORKER_QUIT_CM; - if (--pool->nr_running == 0) { + if (--pool->nr_cm_workers == 0) { /* -* This smp_mb() forces a mb between decreasing nr_running +* This smp_mb() forces a mb between decreasing nr_cm_workers * and reading worklist. It paires with the smp_mb() in * insert_work(). Please read comment there. */ @@ -838,13 +838,13 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task) } /** - * worker_set_flags - set worker flags and adjust nr_running accordingly + * worker_set_flags - set worker flags and adjust nr_cm_workers accordingly * @worker: self * @flags: flags to set * @wakeup: wakeup an idle worker if necessary * - * Set @flags in @worker->flags and adjust nr_running accordingly. 
If - * nr_running becomes zero and @wakeup is %true, an idle worker is + * Set @flags in @worker->flags and adjust nr_cm_workers accordingly. If + * nr_cm_workers becomes zero and @wakeup is %true, an idle worker is * woken up. * * CONTEXT: @@ -858,14 +858,13 @@ static inline void worker_set_flags(struct worker *worker, unsigned int flags, WARN_ON_ONCE(worker->task != current); /*
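For readers following the rename, the counter logic touched by this patch can be modelled in plain userspace C. This is only a sketch: struct pool_model, nr_pending and the model_* helpers are hypothetical stand-ins for struct worker_pool, the worklist and the kernel helpers, and locking and memory barriers are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct worker_pool. */
struct pool_model {
	int nr_cm_workers;	/* concurrency-managed workers currently running */
	int nr_pending;		/* stand-in for a non-empty pool->worklist */
};

/* Modelled on need_more_worker(): work is queued and no CM worker runs. */
static bool model_need_more_worker(const struct pool_model *pool)
{
	return pool->nr_pending > 0 && pool->nr_cm_workers == 0;
}

/* Modelled on keep_working(): work is queued and at most one CM worker runs. */
static bool model_keep_working(const struct pool_model *pool)
{
	return pool->nr_pending > 0 && pool->nr_cm_workers <= 1;
}

/*
 * Modelled on the counter update in wq_worker_sleeping(): a CM worker is
 * about to sleep, so it stops counting. Returns true when the count hit
 * zero while work is still pending, i.e. an idle worker should be woken.
 */
static bool model_worker_sleeping(struct pool_model *pool)
{
	assert(pool->nr_cm_workers > 0);
	return --pool->nr_cm_workers == 0 && pool->nr_pending > 0;
}
```

The point of the new name shows up in the model: the counter only tracks workers that take part in concurrency management, so it says nothing about how many tasks are busy in total.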
Re: [PATCH 08/30] i2c: s3c2410: make header file local
On Sunday, 14 April 2013, 14:20:35, Wolfram Sang wrote:
> On Thu, Apr 11, 2013 at 02:04:50AM +0200, Arnd Bergmann wrote:
> > No other file in the kernel besides i2c-s3c2410.c uses the current
> > plat/regs-iic.h, so we can simply move the header file to live in the
> > same directory as the driver, as a preparation to multiplatform builds.
>
> What about putting the regs in the driver itself?

They already are :-) [0] and Arnd will drop this patch in his next
iteration [1].

Heiko

[0] https://git.kernel.org/cgit/linux/kernel/git/wsa/linux.git/commit/?h=i2c/for-next&id=e636602ac2613da8c1777cb42443223994be4107
[1] Message-Id: <201304121011.13028.a...@arndb.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ARM: KVM: Add missing break;
Hi Joe,

On Sat, 13 Apr 2013 22:55:45 -0700, Joe Perches wrote:
> commit 3401d54696f ("KVM: ARM: Introduce KVM_ARM_SET_DEVICE_ADDR ioctl")
> added the case, but omitted adding break;

[...]

Already reported here:
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-April/160127.html

> $ grep -rP --include=*.[ch]
> "\b(\w+)\s*=[^;]+;\s*(?:case\s+\w+:|default:)\s*\1\s*="

Cool regexp! :-)

Cheers,

	M.
-- 
Fast, cheap, reliable. Pick two.
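As an illustration of the bug class that regexp looks for (an assignment, immediately followed by another case label that assigns the same variable again), here is a minimal self-contained sketch. The enum and function names are made up for the example and have nothing to do with the KVM code in question.

```c
#include <assert.h>

enum model_kind { MODEL_KIND_A, MODEL_KIND_B };

/*
 * Buggy variant: the MODEL_KIND_A case has no break, so control falls
 * through and the value assigned there is silently overwritten.
 */
static int model_pick_buggy(enum model_kind kind)
{
	int ret = -1;

	switch (kind) {
	case MODEL_KIND_A:
		ret = 10;
	case MODEL_KIND_B:
		ret = 20;
		break;
	}
	return ret;
}

/* Fixed variant: each case ends with its own break. */
static int model_pick_fixed(enum model_kind kind)
{
	int ret = -1;

	switch (kind) {
	case MODEL_KIND_A:
		ret = 10;
		break;
	case MODEL_KIND_B:
		ret = 20;
		break;
	}
	return ret;
}
```

The buggy variant compiles without complaint under default warning settings, which is why a textual search like the one quoted above can be a cheap way to find candidates.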
[PATCH v2] UCB1400: Pass ucb1400-gpio data through ac97 bus
Cc: Linus Walleij Cc: Jean Delvare Cc: Samuel Ortiz Cc: Mark Brown Cc: Guenter Roeck Cc: linux-kernel Cc: Grant Likely Signed-off-by: Marek Vasut --- drivers/gpio/gpio-ucb1400.c | 19 ++- drivers/mfd/ucb1400_core.c |5 + include/linux/ucb1400.h | 18 ++ 3 files changed, 17 insertions(+), 25 deletions(-) v2: Rebase patch from: http://lists.infradead.org/pipermail/linux-arm-kernel/2010-October/028656.html NOTE: I didn't even compile-test this, but the fix was plenty straightforward. diff --git a/drivers/gpio/gpio-ucb1400.c b/drivers/gpio/gpio-ucb1400.c index 26405ef..6d0feb2 100644 --- a/drivers/gpio/gpio-ucb1400.c +++ b/drivers/gpio/gpio-ucb1400.c @@ -12,8 +12,6 @@ #include #include -struct ucb1400_gpio_data *ucbdata; - static int ucb1400_gpio_dir_in(struct gpio_chip *gc, unsigned off) { struct ucb1400_gpio *gpio; @@ -50,7 +48,7 @@ static int ucb1400_gpio_probe(struct platform_device *dev) struct ucb1400_gpio *ucb = dev->dev.platform_data; int err = 0; - if (!(ucbdata && ucbdata->gpio_offset)) { + if (!(ucb && ucb->gpio_offset)) { err = -EINVAL; goto err; } @@ -58,7 +56,7 @@ static int ucb1400_gpio_probe(struct platform_device *dev) platform_set_drvdata(dev, ucb); ucb->gc.label = "ucb1400_gpio"; - ucb->gc.base = ucbdata->gpio_offset; + ucb->gc.base = ucb->gpio_offset; ucb->gc.ngpio = 10; ucb->gc.owner = THIS_MODULE; @@ -72,8 +70,8 @@ static int ucb1400_gpio_probe(struct platform_device *dev) if (err) goto err; - if (ucbdata && ucbdata->gpio_setup) - err = ucbdata->gpio_setup(&dev->dev, ucb->gc.ngpio); + if (ucb && ucb->gpio_setup) + err = ucb->gpio_setup(&dev->dev, ucb->gc.ngpio); err: return err; @@ -85,8 +83,8 @@ static int ucb1400_gpio_remove(struct platform_device *dev) int err = 0; struct ucb1400_gpio *ucb = platform_get_drvdata(dev); - if (ucbdata && ucbdata->gpio_teardown) { - err = ucbdata->gpio_teardown(&dev->dev, ucb->gc.ngpio); + if (ucb && ucb->gpio_teardown) { + err = ucb->gpio_teardown(&dev->dev, ucb->gc.ngpio); if (err) return err; } @@ -103,11 +101,6 @@ 
static struct platform_driver ucb1400_gpio_driver = { }, }; -void __init ucb1400_gpio_set_data(struct ucb1400_gpio_data *data) -{ - ucbdata = data; -} - module_platform_driver(ucb1400_gpio_driver); MODULE_DESCRIPTION("Philips UCB1400 GPIO driver"); diff --git a/drivers/mfd/ucb1400_core.c b/drivers/mfd/ucb1400_core.c index daf6952..e9031fa 100644 --- a/drivers/mfd/ucb1400_core.c +++ b/drivers/mfd/ucb1400_core.c @@ -75,6 +75,11 @@ static int ucb1400_core_probe(struct device *dev) /* GPIO */ ucb_gpio.ac97 = ac97; + if (pdata) { + ucb_gpio.gpio_setup = pdata->gpio_setup; + ucb_gpio.gpio_teardown = pdata->gpio_teardown; + ucb_gpio.gpio_offset = pdata->gpio_offset; + } ucb->ucb1400_gpio = platform_device_alloc("ucb1400_gpio", -1); if (!ucb->ucb1400_gpio) { err = -ENOMEM; diff --git a/include/linux/ucb1400.h b/include/linux/ucb1400.h index d21b33c..2e9ee4d 100644 --- a/include/linux/ucb1400.h +++ b/include/linux/ucb1400.h @@ -83,15 +83,12 @@ #define UCB_ID 0x7e #define UCB_ID_1400 0x4304 -struct ucb1400_gpio_data { - int gpio_offset; - int (*gpio_setup)(struct device *dev, int ngpio); - int (*gpio_teardown)(struct device *dev, int ngpio); -}; - struct ucb1400_gpio { struct gpio_chipgc; struct snd_ac97 *ac97; + int gpio_offset; + int (*gpio_setup)(struct device *dev, int ngpio); + int (*gpio_teardown)(struct device *dev, int ngpio); }; struct ucb1400_ts { @@ -110,6 +107,9 @@ struct ucb1400 { struct ucb1400_pdata { int irq; + int gpio_offset; + int (*gpio_setup)(struct device *dev, int ngpio); + int (*gpio_teardown)(struct device *dev, int ngpio); }; static inline u16 ucb1400_reg_read(struct snd_ac97 *ac97, u16 reg) @@ -162,10 +162,4 @@ static inline void ucb1400_adc_disable(struct snd_ac97 *ac97) unsigned int ucb1400_adc_read(struct snd_ac97 *ac97, u16 adc_channel, int adcsync); -#ifdef CONFIG_GPIO_UCB1400 -void __init ucb1400_gpio_set_data(struct ucb1400_gpio_data *data); -#else -static inline void ucb1400_gpio_set_data(struct ucb1400_gpio_data *data) {} -#endif - #endif 
-- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.
[PATCH 2/4] cgroup: drop hierarchy_id_lock
Now that hierarchy_id alloc / free are protected by the cgroup mutexes,
there's no need for this separate lock. Drop it.

Signed-off-by: Tejun Heo
---
 kernel/cgroup.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 823cb56..e15bdb7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -237,9 +237,13 @@ struct cgroup_event {
 static LIST_HEAD(roots);
 static int root_count;
 
+/*
+ * Hierarchy ID allocation and mapping. It follows the same exclusion
+ * rules as other root ops - both cgroup_mutex and cgroup_root_mutex for
+ * writes, either for reads.
+ */
 static DEFINE_IDA(hierarchy_ida);
 static int next_hierarchy_id;
-static DEFINE_SPINLOCK(hierarchy_id_lock);
 
 /* dummytop is a shorthand for the dummy hierarchy's top cgroup */
 #define dummytop (&rootnode.top_cgroup)
@@ -1456,10 +1460,12 @@ static int cgroup_init_root_id(struct cgroupfs_root *root)
 {
 	int ret;
 
+	lockdep_assert_held(&cgroup_mutex);
+	lockdep_assert_held(&cgroup_root_mutex);
+
 	do {
 		if (!ida_pre_get(&hierarchy_ida, GFP_KERNEL))
 			return -ENOMEM;
-		spin_lock(&hierarchy_id_lock);
 		/* Try to allocate the next unused ID */
 		ret = ida_get_new_above(&hierarchy_ida, next_hierarchy_id,
 					&root->hierarchy_id);
@@ -1472,18 +1478,17 @@ static int cgroup_init_root_id(struct cgroupfs_root *root)
 			/* Can only get here if the 31-bit IDR is full ... */
 			BUG_ON(ret);
 		}
-		spin_unlock(&hierarchy_id_lock);
 	} while (ret);
 	return 0;
 }
 
 static void cgroup_exit_root_id(struct cgroupfs_root *root)
 {
+	lockdep_assert_held(&cgroup_mutex);
+	lockdep_assert_held(&cgroup_root_mutex);
+
 	if (root->hierarchy_id) {
-		spin_lock(&hierarchy_id_lock);
 		ida_remove(&hierarchy_ida, root->hierarchy_id);
-		spin_unlock(&hierarchy_id_lock);
-
 		root->hierarchy_id = 0;
 	}
 }
@@ -4656,8 +4661,14 @@ int __init cgroup_init(void)
 	hash_add(css_set_table, &init_css_set.hlist, key);
 
 	/* allocate id for the dummy hierarchy */
+	mutex_lock(&cgroup_mutex);
+	mutex_lock(&cgroup_root_mutex);
+
 	BUG_ON(cgroup_init_root_id(&rootnode));
 
+	mutex_unlock(&cgroup_root_mutex);
+	mutex_unlock(&cgroup_mutex);
+
 	cgroup_kobj = kobject_create_and_add("cgroup", fs_kobj);
 	if (!cgroup_kobj) {
 		err = -ENOMEM;
-- 
1.8.1.4
[PATCH 4/4] cgroup: implement task_cgroup_path_from_hierarchy()
kdbus folks want a sane way to determine the cgroup path that a given
task belongs to on a given hierarchy, which is a reasonable thing to
expect from cgroup core.

Implement task_cgroup_path_from_hierarchy().

Signed-off-by: Tejun Heo
Cc: Kay Sievers
Cc: Greg Kroah-Hartman
Cc: Lennart Poettering
Cc: Daniel Mack
---
 include/linux/cgroup.h |  2 ++
 kernel/cgroup.c        | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 17ed818..ee83af2 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -443,6 +443,8 @@ int cgroup_is_removed(const struct cgroup *cgrp);
 bool cgroup_is_descendant(struct cgroup *cgrp, struct cgroup *ancestor);
 
 int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen);
+int task_cgroup_path_from_hierarchy(struct task_struct *task, int hierarchy_id,
+				    char *buf, size_t buflen);
 
 int cgroup_task_count(const struct cgroup *cgrp);
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 75d85e8..5184fcd 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1842,6 +1842,39 @@ out:
 }
 EXPORT_SYMBOL_GPL(cgroup_path);
 
+/**
+ * task_cgroup_path_from_hierarchy - cgroup path of a task on a hierarchy
+ * @task: target task
+ * @hierarchy_id: the hierarchy to look up @task's cgroup from
+ * @buf: the buffer to write the path into
+ * @buflen: the length of the buffer
+ *
+ * Determine @task's cgroup on the hierarchy specified by @hierarchy_id and
+ * copy its path into @buf. This function grabs cgroup_mutex and shouldn't
+ * be used inside locks used by cgroup controller callbacks.
+ */
+int task_cgroup_path_from_hierarchy(struct task_struct *task, int hierarchy_id,
+				    char *buf, size_t buflen)
+{
+	struct cgroupfs_root *root;
+	struct cgroup *cgrp = NULL;
+	int ret = -ENOENT;
+
+	mutex_lock(&cgroup_mutex);
+
+	root = idr_find(&cgroup_hierarchy_idr, hierarchy_id);
+	if (root) {
+		cgrp = task_cgroup_from_root(task, root);
+		if (cgrp)
+			ret = cgroup_path(cgrp, buf, buflen);
+	}
+
+	mutex_unlock(&cgroup_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(task_cgroup_path_from_hierarchy);
+
 /*
  * Control Group taskset
  */
-- 
1.8.1.4
[PATCH 1/4] cgroup: refactor hierarchy_id handling
We're planning to converting hierarchy_ida to an idr and use it to look up hierarchy from its id. As we want the mapping to happen atomically with cgroupfs_root registration, this patch refactors hierarchy_id init / exit so that ida operations happen inside cgroup_[root_]mutex. * s/init_root_id()/cgroup_init_root_id()/ and make it return 0 or -errno like a normal function. * Move hierarchy_id initialization from cgroup_root_from_opts() into cgroup_mount() block where the root is confirmed to be used and being registered while holding both mutexes. * Split cgroup_drop_id() into cgroup_exit_root_id() and cgroup_free_root(), so that ID release can happen before dropping the mutexes in cgroup_kill_sb(). The latter expects hierarchy_id to be exited before being invoked. Signed-off-by: Tejun Heo --- kernel/cgroup.c | 56 +++- 1 file changed, 35 insertions(+), 21 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index a790409..823cb56 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1452,13 +1452,13 @@ static void init_cgroup_root(struct cgroupfs_root *root) list_add_tail(&cgrp->allcg_node, &root->allcg_list); } -static bool init_root_id(struct cgroupfs_root *root) +static int cgroup_init_root_id(struct cgroupfs_root *root) { - int ret = 0; + int ret; do { if (!ida_pre_get(&hierarchy_ida, GFP_KERNEL)) - return false; + return -ENOMEM; spin_lock(&hierarchy_id_lock); /* Try to allocate the next unused ID */ ret = ida_get_new_above(&hierarchy_ida, next_hierarchy_id, @@ -1474,7 +1474,18 @@ static bool init_root_id(struct cgroupfs_root *root) } spin_unlock(&hierarchy_id_lock); } while (ret); - return true; + return 0; +} + +static void cgroup_exit_root_id(struct cgroupfs_root *root) +{ + if (root->hierarchy_id) { + spin_lock(&hierarchy_id_lock); + ida_remove(&hierarchy_ida, root->hierarchy_id); + spin_unlock(&hierarchy_id_lock); + + root->hierarchy_id = 0; + } } static int cgroup_test_super(struct super_block *sb, void *data) @@ -1508,10 +1519,6 @@ static struct 
cgroupfs_root *cgroup_root_from_opts(struct cgroup_sb_opts *opts) if (!root) return ERR_PTR(-ENOMEM); - if (!init_root_id(root)) { - kfree(root); - return ERR_PTR(-ENOMEM); - } init_cgroup_root(root); root->subsys_mask = opts->subsys_mask; @@ -1526,17 +1533,15 @@ static struct cgroupfs_root *cgroup_root_from_opts(struct cgroup_sb_opts *opts) return root; } -static void cgroup_drop_root(struct cgroupfs_root *root) +static void cgroup_free_root(struct cgroupfs_root *root) { - if (!root) - return; + if (root) { + /* hierarhcy ID shoulid already have been released */ + WARN_ON_ONCE(root->hierarchy_id); - BUG_ON(!root->hierarchy_id); - spin_lock(&hierarchy_id_lock); - ida_remove(&hierarchy_ida, root->hierarchy_id); - spin_unlock(&hierarchy_id_lock); - ida_destroy(&root->cgroup_ida); - kfree(root); + ida_destroy(&root->cgroup_ida); + kfree(root); + } } static int cgroup_set_super(struct super_block *sb, void *data) @@ -1623,7 +1628,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type, sb = sget(fs_type, cgroup_test_super, cgroup_set_super, 0, &opts); if (IS_ERR(sb)) { ret = PTR_ERR(sb); - cgroup_drop_root(opts.new_root); + cgroup_free_root(opts.new_root); goto drop_modules; } @@ -1667,6 +1672,10 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type, if (ret) goto unlock_drop; + ret = cgroup_init_root_id(root); + if (ret) + goto unlock_drop; + ret = rebind_subsystems(root, root->subsys_mask); if (ret == -EBUSY) { free_cg_links(&tmp_cg_links); @@ -1710,7 +1719,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type, * We re-used an existing hierarchy - the new root (if * any) is not needed */ - cgroup_drop_root(opts.new_root); + cgroup_free_root(opts.new_root); /* no subsys rebinding, so refcounts don't change */ drop_parsed_module_refcounts(opts.subsys_mask); } @@ -1720,6 +1729,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type, return dget(sb->s_root); unlock_drop: + cgroup_exit_root_id(root); 
mutex_unlock(&cgroup_root_mutex); mutex_unl
[PATCH 3/4] cgroup: make hierarchy_id use cyclic idr
We want to be able to look up a hierarchy from its id, and cyclic
allocation is a whole lot simpler with idr. Convert to idr and use
idr_alloc_cyclic().

Signed-off-by: Tejun Heo
---
 kernel/cgroup.c | 28 ++++++++--------------------
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e15bdb7..75d85e8 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -242,8 +242,7 @@ static int root_count;
  * rules as other root ops - both cgroup_mutex and cgroup_root_mutex for
  * writes, either for reads.
  */
-static DEFINE_IDA(hierarchy_ida);
-static int next_hierarchy_id;
+static DEFINE_IDR(cgroup_hierarchy_idr);
 
 /* dummytop is a shorthand for the dummy hierarchy's top cgroup */
 #define dummytop (&rootnode.top_cgroup)
@@ -1458,27 +1457,16 @@ static void init_cgroup_root(struct cgroupfs_root *root)
 
 static int cgroup_init_root_id(struct cgroupfs_root *root)
 {
-	int ret;
+	int id;
 
 	lockdep_assert_held(&cgroup_mutex);
 	lockdep_assert_held(&cgroup_root_mutex);
 
-	do {
-		if (!ida_pre_get(&hierarchy_ida, GFP_KERNEL))
-			return -ENOMEM;
-		/* Try to allocate the next unused ID */
-		ret = ida_get_new_above(&hierarchy_ida, next_hierarchy_id,
-					&root->hierarchy_id);
-		if (ret == -ENOSPC)
-			/* Try again starting from 0 */
-			ret = ida_get_new(&hierarchy_ida, &root->hierarchy_id);
-		if (!ret) {
-			next_hierarchy_id = root->hierarchy_id + 1;
-		} else if (ret != -EAGAIN) {
-			/* Can only get here if the 31-bit IDR is full ... */
-			BUG_ON(ret);
-		}
-	} while (ret);
+	id = idr_alloc_cyclic(&cgroup_hierarchy_idr, root, 2, 0, GFP_KERNEL);
+	if (id < 0)
+		return id;
+
+	root->hierarchy_id = id;
 	return 0;
 }
 
@@ -1488,7 +1476,7 @@ static void cgroup_exit_root_id(struct cgroupfs_root *root)
 	lockdep_assert_held(&cgroup_root_mutex);
 
 	if (root->hierarchy_id) {
-		ida_remove(&hierarchy_ida, root->hierarchy_id);
+		idr_remove(&cgroup_hierarchy_idr, root->hierarchy_id);
 		root->hierarchy_id = 0;
 	}
 }
-- 
1.8.1.4
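To make the "cyclic" part concrete, here is a small userspace model of the allocation behaviour the patch relies on: IDs are handed out starting after the most recently allocated one and wrap back to the minimum, rather than always reusing the lowest free ID. It is a sketch only. struct model_idr and the model_* functions are hypothetical, the table is a fixed array with an artificially small upper bound, and none of this reflects how the kernel idr is actually implemented.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_ID_MIN 2	/* the patch passes 2 as the start argument */
#define MODEL_ID_END 8	/* exclusive bound, artificially small for the demo */

/* Hypothetical fixed-size stand-in for an idr: id -> pointer. */
struct model_idr {
	void *slot[MODEL_ID_END];
	int next;	/* where the next cyclic search starts */
};

/* Allocate the first free id at or after ->next, wrapping to MODEL_ID_MIN. */
static int model_alloc_cyclic(struct model_idr *idr, void *ptr)
{
	int span = MODEL_ID_END - MODEL_ID_MIN;
	int start = idr->next < MODEL_ID_MIN ? MODEL_ID_MIN : idr->next;
	int n;

	assert(ptr != NULL);
	for (n = 0; n < span; n++) {
		int id = MODEL_ID_MIN + (start - MODEL_ID_MIN + n) % span;

		if (!idr->slot[id]) {
			idr->slot[id] = ptr;
			idr->next = id + 1;
			return id;
		}
	}
	return -ENOSPC;
}

/* Look up the pointer registered for @id, or NULL if there is none. */
static void *model_find(const struct model_idr *idr, int id)
{
	if (id < MODEL_ID_MIN || id >= MODEL_ID_END)
		return NULL;
	return idr->slot[id];
}

/* Release @id so that a later cyclic pass may hand it out again. */
static void model_remove(struct model_idr *idr, int id)
{
	if (id >= MODEL_ID_MIN && id < MODEL_ID_END)
		idr->slot[id] = NULL;
}
```

Because the table maps an id to a pointer, the same structure also gives the id-to-root lookup that a plain ida could not provide.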
[PATCHSET] cgroup: implement task_cgroup_path_from_hierarchy()
kdbus folks want a sane way to determine the cgroup path that a given
task belongs to on a given hierarchy, which is a reasonable thing to
expect from cgroup core.

This patchset makes hierarchy_id allocation use idr instead of ida and
implements task_cgroup_path_from_hierarchy(). In the process, the yucky
ida cyclic allocation is replaced with idr_alloc_cyclic().

 0001-cgroup-refactor-hierarchy_id-handling.patch
 0002-cgroup-drop-hierarchy_id_lock.patch
 0003-cgroup-make-hierarchy_id-use-cyclic-idr.patch
 0004-cgroup-implement-task_cgroup_path_from_hierarchy.patch

0001-0002 prepare for conversion to idr, which 0003 does. 0004
implements the new function.

This patchset is on top of next-20130412 as the idr_alloc_cyclic() patch
is currently in -mm. Given that this isn't an urgent thing and the merge
window is just around the corner, it'd probably be best to route these
through cgroup/for-3.11 once v3.10-rc1 drops.

These patches are also available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-task_cgroup_path_from_hierarchy

And it actually reduces LOC. Woot Woot.

 include/linux/cgroup.h |   2
 kernel/cgroup.c        | 128 +
 2 files changed, 89 insertions(+), 41 deletions(-)

--
tejun
Re: [PATCH v2] UCB1400: Pass ucb1400-gpio data through ac97 bus
Dear Marek Vasut,

> Cc: Linus Walleij
> Cc: Jean Delvare
> Cc: Samuel Ortiz
> Cc: Mark Brown
> Cc: Guenter Roeck
> Cc: linux-kernel
> Cc: Grant Likely
> Signed-off-by: Marek Vasut
> ---
>  drivers/gpio/gpio-ucb1400.c |   19 ++-
>  drivers/mfd/ucb1400_core.c  |    5 +
>  include/linux/ucb1400.h     |   18 ++
>  3 files changed, 17 insertions(+), 25 deletions(-)
>
> v2: Rebase patch from:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2010-October/028656.html
>
> NOTE: I didn't even compile-test this, but the fix was plenty
> straightforward.

But damn, this code is ugly. I'm retrospectively-ashamed.

Best regards,
Marek Vasut
[PATCH 1/2] ptrace/x86: simplify the "disable" logic in ptrace_write_dr7()
ptrace_write_dr7() looks unnecessarily overcomplicated. We can factor
out the ptrace_modify_breakpoint() call and avoid doing "continue"
twice; we just need to pass the proper "disabled" argument to
ptrace_modify_breakpoint().

Signed-off-by: Oleg Nesterov
---
 arch/x86/kernel/ptrace.c | 40 +++++++++++++++-------------------------
 1 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 7a98b21..0649f16 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -637,9 +637,7 @@ static int ptrace_write_dr7(struct task_struct *tsk, unsigned long data)
 	struct thread_struct *thread = &(tsk->thread);
 	unsigned long old_dr7;
 	int i, orig_ret = 0, rc = 0;
-	int enabled, second_pass = 0;
-	unsigned len, type;
-	struct perf_event *bp;
+	int second_pass = 0;
 
 	data &= ~DR_CONTROL_RESERVED;
 	old_dr7 = ptrace_get_dr7(thread->ptrace_bps);
@@ -649,30 +647,22 @@ restore:
 	 * appropriate changes to each.
 	 */
 	for (i = 0; i < HBP_NUM; i++) {
-		enabled = decode_dr7(data, i, &len, &type);
-		bp = thread->ptrace_bps[i];
-
-		if (!enabled) {
-			if (bp) {
-				/*
-				 * Don't unregister the breakpoints right-away,
-				 * unless all register_user_hw_breakpoint()
-				 * requests have succeeded. This prevents
-				 * any window of opportunity for debug
-				 * register grabbing by other users.
-				 */
-				if (!second_pass)
-					continue;
-
-				rc = ptrace_modify_breakpoint(bp, len, type,
-							      tsk, 1);
-				if (rc)
-					break;
-			}
-			continue;
+		unsigned len, type;
+		bool disabled = !decode_dr7(data, i, &len, &type);
+		struct perf_event *bp = thread->ptrace_bps[i];
+
+		if (disabled) {
+			/*
+			 * Don't unregister the breakpoints right-away, unless
+			 * all register_user_hw_breakpoint() requests have
+			 * succeeded. This prevents any window of opportunity
+			 * for debug register grabbing by other users.
+			 */
+			if (!bp || !second_pass)
+				continue;
 		}
 
-		rc = ptrace_modify_breakpoint(bp, len, type, tsk, 0);
+		rc = ptrace_modify_breakpoint(bp, len, type, tsk, disabled);
 		if (rc)
 			break;
 	}
-- 
1.5.5.1
Re: [PATCH 4/4] cgroup: implement task_cgroup_path_from_hierarchy()
On Sun, Apr 14, 2013 at 11:36:59AM -0700, Tejun Heo wrote:
> kdbus folks want a sane way to determine the cgroup path that a given
> task belongs to on a given hierarchy, which is a reasonable thing to
> expect from cgroup core.
>
> Implement task_cgroup_path_from_hierarchy().
>
> Signed-off-by: Tejun Heo
> Cc: Kay Sievers
> Cc: Greg Kroah-Hartman
> Cc: Lennart Poettering
> Cc: Daniel Mack

Thanks so much for doing this.

Acked-by: Greg Kroah-Hartman
[PATCH 2/2] ptrace/x86: dont delay perf_event_disable() till second pass in ptrace_write_dr7()
ptrace_write_dr7() skips ptrace_modify_breakpoint(disabled => true)
unless second_pass, this buys nothing but complicates the code and means
that we always do the main loop twice even if "disabled" was never true.

The comment says:

	Don't unregister the breakpoints right-away,
	unless all register_user_hw_breakpoint()
	requests have succeeded.

I think this logic was always wrong, hw_breakpoint_del() does not free
the slot so perf_event_disable() can't hurt. But in any case this looks
unneeded nowadays, and contrary to what the comment says we do not do
register_user_hw_breakpoint(), this was removed by 24f1e32c
"hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events".

Remove the "second_pass" check from the main loop and simplify the code.
Since we have to check "bp != NULL" anyway, the patch also removes the
same check in ptrace_modify_breakpoint() and moves the comment into
ptrace_write_dr7().

Signed-off-by: Oleg Nesterov
---
 arch/x86/kernel/ptrace.c | 46 +++++++++++++++++-----------------------------
 1 files changed, 17 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 0649f16..6814f27 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -609,14 +609,6 @@ ptrace_modify_breakpoint(struct perf_event *bp, int len, int type,
 	int gen_len, gen_type;
 	struct perf_event_attr attr;
 
-	/*
-	 * We should have at least an inactive breakpoint at this
-	 * slot. It means the user is writing dr7 without having
-	 * written the address register first
-	 */
-	if (!bp)
-		return -EINVAL;
-
 	err = arch_bp_generic_fields(len, type, &gen_len, &gen_type);
 	if (err)
 		return err;
@@ -634,10 +626,10 @@ ptrace_modify_breakpoint(struct perf_event *bp, int len, int type,
  */
 static int ptrace_write_dr7(struct task_struct *tsk, unsigned long data)
 {
-	struct thread_struct *thread = &(tsk->thread);
+	struct thread_struct *thread = &tsk->thread;
 	unsigned long old_dr7;
-	int i, orig_ret = 0, rc = 0;
-	int second_pass = 0;
+	int i, ret = 0, rc = 0;
+	bool second_pass = false;
 
 	data &= ~DR_CONTROL_RESERVED;
 	old_dr7 = ptrace_get_dr7(thread->ptrace_bps);
@@ -651,35 +643,31 @@ restore:
 		bool disabled = !decode_dr7(data, i, &len, &type);
 		struct perf_event *bp = thread->ptrace_bps[i];
 
-		if (disabled) {
+		if (!bp) {
+			if (disabled)
+				continue;
 			/*
-			 * Don't unregister the breakpoints right-away, unless
-			 * all register_user_hw_breakpoint() requests have
-			 * succeeded. This prevents any window of opportunity
-			 * for debug register grabbing by other users.
+			 * We should have at least an inactive breakpoint at
+			 * this slot. It means the user is writing dr7 without
+			 * having written the address register first.
 			 */
-			if (!bp || !second_pass)
-				continue;
+			rc = -EINVAL;
+			break;
 		}
 
 		rc = ptrace_modify_breakpoint(bp, len, type, tsk, disabled);
 		if (rc)
 			break;
 	}
-	/*
-	 * Make a second pass to free the remaining unused breakpoints
-	 * or to restore the original breakpoints if an error occurred.
-	 */
-	if (!second_pass) {
-		second_pass = 1;
-		if (rc < 0) {
-			orig_ret = rc;
-			data = old_dr7;
-		}
+	/* Make a second pass to restore the original breakpoints if failed */
+	if (!second_pass && rc) {
+		second_pass = true;
+		ret = rc;
+		data = old_dr7;
 		goto restore;
 	}
 
-	return orig_ret < 0 ? orig_ret : rc;
+	return ret;
 }
 
 /*
-- 
1.5.5.1
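The control flow that results from this patch (apply the new DR7 value slot by slot, and on failure replay the old value once to roll back) can be exercised in a userspace model. This is a sketch under simplifying assumptions: struct model_bp and the model_* functions are hypothetical, only the enable bits of DR7 are modelled, and modifying a present breakpoint is assumed never to fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_HBP_NUM 4

/* Hypothetical stand-in for one thread->ptrace_bps[] entry. */
struct model_bp {
	bool present;	/* an address was written, so the slot has a bp */
	bool disabled;
};

/* Simplified decode: a slot is enabled if either of its two bits is set. */
static bool model_slot_enabled(unsigned long dr7, int i)
{
	return ((dr7 >> (i * 2)) & 0x3) != 0;
}

/* Rebuild a DR7-like value from the current state of the slots. */
static unsigned long model_get_dr7(const struct model_bp *bps)
{
	unsigned long dr7 = 0;
	int i;

	for (i = 0; i < MODEL_HBP_NUM; i++)
		if (bps[i].present && !bps[i].disabled)
			dr7 |= 1UL << (i * 2);
	return dr7;
}

/* Same shape as the patched loop: one pass, plus one restore pass on error. */
static int model_write_dr7(struct model_bp *bps, unsigned long data)
{
	unsigned long old_dr7 = model_get_dr7(bps);
	int i, ret = 0, rc = 0;
	bool second_pass = false;

restore:
	for (i = 0; i < MODEL_HBP_NUM; i++) {
		bool disabled = !model_slot_enabled(data, i);

		if (!bps[i].present) {
			if (disabled)
				continue;
			/* enabling a slot whose address was never written */
			rc = -EINVAL;
			break;
		}
		bps[i].disabled = disabled;
	}
	/* Replay the old value once if the first pass failed. */
	if (!second_pass && rc) {
		second_pass = true;
		ret = rc;
		data = old_dr7;
		goto restore;
	}
	return ret;
}
```

In the model, a write that enables a slot with no address first flips the earlier slots, then fails, and the restore pass puts the earlier slots back the way they were.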
[PATCH 0/2] ptrace/x86: simplify ptrace_write_dr7()
Hello.

On top of "[PATCH 0/5] kill ptrace_{get,put}_breakpoints()".

Cleanup and preparation for the potential fix, see below.

--

Now the question. Initially I was going to make more patches and fix
the regression introduced by 24f1e32c (although I am not 100% sure
exactly which patch should be blamed).

See https://bugzilla.redhat.com/show_bug.cgi?id=660204 for details.

ptrace_write_dr7() does not create bp if it is zero, the comment says:

	/*
	 * We should have at least an inactive breakpoint at
	 * this slot. It means the user is writing dr7 without
	 * having written the address register first.
	 */

and this looks logical. However, at least until 72f674d2,
ptrace_set_debugreg(n => 7) worked even if addr wasn't set by
ptrace_set_debugreg(n => 0|1|2|3) before.

And note that ptrace_get_debugreg() does not fail if !ptrace_bps[n],
it just returns zero as if the address register was written. And there
is no way to know if the address was actually set, which is not good
and not consistent.

Jan, Frederic, et al. What do you think we should do?

	1. Change ptrace_write_dr7() to do register_user_hw_breakpoint()
	   if necessary.

	   This is what I was going to do, but I am no longer sure
	   we want this. For what? Unlikely it is very useful to use
	   the "default" addr == 0 for debugging.

	2. Change ptrace_get_debugreg(0-4) to return -ESOMETHING if
	   ptrace_bps[n] == NULL.

	   This will match ptrace_set_debugreg(), but this can break
	   something else...

	3. Do nothing.

I am inclined to do "1", but please comment.

Oleg.
[PATCH 1/3] iommu: Move swap_pci_ref function to pci.h.
The swap_pci_ref() function is used by the IOMMU API code for swapping
pci device pointers while determining the iommu group for the device.
Currently this function is implemented separately in each IOMMU driver.
This patch moves the function to pci.h so that the implementation can be
shared across the various IOMMU drivers.

Signed-off-by: Varun Sethi
---
 drivers/iommu/amd_iommu.c   |    6 ------
 drivers/iommu/intel-iommu.c |    6 ------
 include/linux/pci.h         |    8 ++++++++
 3 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index a7f6b04..c36c046 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -263,12 +263,6 @@ static bool check_device(struct device *dev)
 	return true;
 }
 
-static void swap_pci_ref(struct pci_dev **from, struct pci_dev *to)
-{
-	pci_dev_put(*from);
-	*from = to;
-}
-
 static struct pci_bus *find_hosted_bus(struct pci_bus *bus)
 {
 	while (!bus->self) {
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6e0b9ff..8d7c979 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4137,12 +4137,6 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
 	return 0;
 }
 
-static void swap_pci_ref(struct pci_dev **from, struct pci_dev *to)
-{
-	pci_dev_put(*from);
-	*from = to;
-}
-
 #define REQ_ACS_FLAGS (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)
 
 static int intel_iommu_add_device(struct device *dev)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2461033a..41511de 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1850,6 +1850,14 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
 }
 #endif
 
+#ifdef CONFIG_IOMMU_API
+static inline void swap_pci_ref(struct pci_dev **from, struct pci_dev *to)
+{
+	pci_dev_put(*from);
+	*from = to;
+}
+#endif
+
 /**
  * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device
  * @pdev: the PCI device
-- 
1.7.4.1
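The reference-counting contract behind swap_pci_ref() is easy to get wrong, so here is a userspace model of it. The assumption, labelled as such, is that the caller already holds a reference on *from and passes in a `to` pointer whose reference it has already taken; the helper drops the old reference and hands the new one over. struct model_dev and the model_* functions are hypothetical stand-ins for struct pci_dev, pci_dev_get() and pci_dev_put().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted struct pci_dev. */
struct model_dev {
	int refcount;
};

/* Take a reference; NULL is passed through unchanged. */
static struct model_dev *model_dev_get(struct model_dev *dev)
{
	if (dev)
		dev->refcount++;
	return dev;
}

/* Drop a reference; NULL is tolerated. */
static void model_dev_put(struct model_dev *dev)
{
	if (dev) {
		assert(dev->refcount > 0);
		dev->refcount--;
	}
}

/* Same shape as swap_pci_ref(): drop the old reference, adopt the new one. */
static void model_swap_ref(struct model_dev **from, struct model_dev *to)
{
	model_dev_put(*from);
	*from = to;
}
```

A caller walking from a device to a related one would typically write something like model_swap_ref(&cur, model_dev_get(other)), so that exactly one reference is held through `cur` before and after the call.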
[PATCH 2/3 v12] iommu/fsl: Add additional iommu attributes required by the PAMU driver.
Added the following domain attributes for the FSL PAMU driver:
1. Added new iommu stash attribute, which allows setting of the
   LIODN specific stash id parameter through IOMMU API.
2. Added an attribute for enabling/disabling DMA to a particular
   memory window.
3. Added domain attribute to check for PAMUV1 specific constraints.

Signed-off-by: Varun Sethi
---
- v12 changes:
  - Moved PAMU specific stash ids and structures to PAMU header file.
- no change in v11.
- no change in v10.

 include/linux/iommu.h |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2727810..c5dc2b9 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -57,10 +57,26 @@ struct iommu_domain {
 #define IOMMU_CAP_CACHE_COHERENCY	0x1
 #define IOMMU_CAP_INTR_REMAP		0x2	/* isolates device intrs */
 
+/*
+ * Following constraints are specific to PAMUV1:
+ *  -aperture must be power of 2, and naturally aligned
+ *  -number of windows must be power of 2, and address space size
+ *   of each window is determined by aperture size / # of windows
+ *  -the actual size of the mapped region of a window must be power
+ *   of 2 starting with 4KB and physical address must be naturally
+ *   aligned.
+ * DOMAIN_ATTR_FSL_PAMUV1 corresponds to the above mentioned constraints.
+ * The caller can invoke iommu_domain_get_attr to check if the underlying
+ * iommu implementation supports these constraints.
+ */
+
 enum iommu_attr {
 	DOMAIN_ATTR_GEOMETRY,
 	DOMAIN_ATTR_PAGING,
 	DOMAIN_ATTR_WINDOWS,
+	DOMAIN_ATTR_PAMU_STASH,
+	DOMAIN_ATTR_PAMU_ENABLE,
+	DOMAIN_ATTR_FSL_PAMUV1,
 	DOMAIN_ATTR_MAX,
 };
 
-- 
1.7.4.1
Re: [PATCH 0/2] ptrace/x86: simplify ptrace_write_dr7()
On Sun, 14 Apr 2013 21:12:05 +0200, Oleg Nesterov wrote:
> Jan, Frederic, et all. What do you think we should do?
>
> 1. Change ptrace_write_dr7() to do register_user_hw_breakpoint()
>    if necessary.
>
>    This is what I was going to do, but I am no longer sure
>    we want this. For what? Unlikely it is very useful to use
>    the "default" addr == 0 for debugging.

I do not understand how these functions map to the PTRACE_* syscall.

But this was a regression from the application point of view, as some
applications did/do:
 * waitpid - get the process to: t (tracing stop)
 * PTRACE_POKEUSER DR7, enableDR0
 * PTRACE_POKEUSER DR0, address
 * PTRACE_CONT

This was perfectly valid before, there is no "default" addr == 0 used for any
debugging. The applications just did not care about PTRACE_POKEUSER ordering.
This is also how the bug was found.

Thanks,
Jan
[PATCH 0/2] [RFC] blkdev: flush generation optimization
Some filesystems try to optimize barrier flushes by maintaining fs-specific
generation counters, but if we introduce a generic flush generation counter
for the block device, filesystems may use it for fdatasync(2) optimization.
The optimization should work if userspace performs multi-threaded IO with a
lot of fdatasync() calls.

Here are graphs for a test where each task performs random buffered writes to
a dedicated file and performs fdatasync(2) after each operation.
Axis: x=nr_tasks, y=write_iops

# Chunk server simulation workload
# Files 'chunk.$NUM_JOB.0' should be precreated before the test
#
[global]
bs=4k
ioengine=psync
filesize=64M
size=8G
direct=0
runtime=30
directory=/mnt
fdatasync=1
group_reporting=1

[chunk]
overwrite=1
new_group=1
write_bw_log=bw.log
rw=randwrite
numjobs=${NUM_JOBS}
fsync=1
stonewall

TOC:
0001 blkdev: add flush generation counter
0002 ext4: Add fdatasync scalability optimization
[PATCH 2/2] ext4: Add fdatasync scalability optimization
Track blkdev's flush generation counter on per-inode basis and update inside end_io. If inode's flush generation counter is older than current blkdev's flush counter inode's data was already flushed to stable media, so we can skip explicit barrier. Optimization is safe only when inode's end_io was called before flush request was QUEUED and COMPLETED. With that optimization we do not longer need jbd2 flush optimization. Signed-off-by: Dmitry Monakhov --- fs/ext4/ext4.h |1 + fs/ext4/ext4_jbd2.h | 10 +- fs/ext4/fsync.c | 16 +++- fs/ext4/inode.c |3 ++- fs/ext4/page-io.c |2 +- 5 files changed, 24 insertions(+), 8 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 75b2326..e2ec980 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -932,6 +932,7 @@ struct ext4_inode_info { */ tid_t i_sync_tid; tid_t i_datasync_tid; + atomic_t i_flush_tag; /* Precomputed uuid+inum+igen checksum for seeding inode checksums */ __u32 i_csum_seed; diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index c8c6885..46943ed 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -365,7 +365,15 @@ static inline void ext4_update_inode_fsync_trans(handle_t *handle, ei->i_sync_tid = handle->h_transaction->t_tid; if (datasync) ei->i_datasync_tid = handle->h_transaction->t_tid; - } + } else { + struct request_queue *q = bdev_get_queue(inode->i_sb->s_bdev); + if (q) + atomic_set(&EXT4_I(inode)->i_flush_tag, + atomic_read(&q->flush_tag)); + else + atomic_set(&EXT4_I(inode)->i_flush_tag, UINT_MAX); + } + } /* super.c */ diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c index 8a0dee8..b02d1ec 100644 --- a/fs/ext4/fsync.c +++ b/fs/ext4/fsync.c @@ -116,10 +116,10 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync) struct inode *inode = file->f_mapping->host; struct ext4_inode_info *ei = EXT4_I(inode); journal_t *journal = EXT4_SB(inode->i_sb)->s_journal; + bool needs_barrier = journal->j_flags & JBD2_BARRIER; + struct request_queue *q = 
bdev_get_queue(inode->i_sb->s_bdev); int ret, err; tid_t commit_tid; - bool needs_barrier = false; - J_ASSERT(ext4_journal_current_handle() == NULL); trace_ext4_sync_file_enter(file, datasync); @@ -163,10 +163,16 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync) } commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid; - if (journal->j_flags & JBD2_BARRIER && - !jbd2_trans_will_send_data_barrier(journal, &commit_tid)) - needs_barrier = true; ret = jbd2_complete_transaction(journal, commit_tid); + /* +* We must send a barrier unless we can guarantee that: +* Latest io-requst for given inode was completed before +* new flush request was QUEUED and COMPLETED by blkdev. +*/ + if (q && ((unsigned int)atomic_read(&q->flush_tag) & ~1U) + > (((unsigned int)atomic_read(&ei->i_flush_tag) + 1U) & (~1U))) + needs_barrier = 0; + if (needs_barrier) { err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); if (!ret) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 1be5827..761513c 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3073,11 +3073,12 @@ static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset, size); iocb->private = NULL; - /* if not aio dio with unwritten extents, just free io and return */ if (!(io_end->flag & EXT4_IO_END_UNWRITTEN)) { ext4_free_io_end(io_end); out: + if (size) + ext4_update_inode_fsync_trans(NULL, inode, 1); inode_dio_done(inode); if (is_async) aio_complete(iocb, ret, 0); diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c index 047a6de..8a2a09b 100644 --- a/fs/ext4/page-io.c +++ b/fs/ext4/page-io.c @@ -282,7 +282,7 @@ static void ext4_end_bio(struct bio *bio, int error) } io_end->num_io_pages = 0; inode = io_end->inode; - + ext4_update_inode_fsync_trans(NULL, inode, 1); if (error) { io_end->flag |= EXT4_IO_END_ERROR; ext4_warning(inode->i_sb, "I/O error writing to inode %lu " -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to 
majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
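The barrier-skip test in the ext4_sync_file() hunk above can be read as a generation check. Patch 1/2 increments flush_tag once when a flush is queued and once when it completes, so an even value means no flush is in flight. Below is a minimal userspace sketch of that comparison, assuming plain unsigned counters in place of the kernel's atomic_t; the helper name can_skip_barrier is hypothetical and not part of the posted patch.

```c
#include <stdbool.h>

/*
 * queue_tag: current value of the queue's flush counter (q->flush_tag).
 * inode_tag: value recorded when the inode's last write completed.
 *
 * Rounding inode_tag up to the next even value steps over any flush that
 * was already in flight when the write completed. The barrier is only
 * redundant if a later flush has since been both queued and completed.
 * Counter wraparound is ignored in this sketch.
 */
static bool can_skip_barrier(unsigned int queue_tag, unsigned int inode_tag)
{
	return (queue_tag & ~1U) > ((inode_tag + 1U) & ~1U);
}
```

With inode_tag = 4 (no flush in flight at write completion) the barrier can be skipped once queue_tag reaches 6. With inode_tag = 5 (a flush was already in flight) it can be skipped only from 8 onward, because the in-flight flush may have started before the write reached the device.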
[PATCH 1/2] blkdev: add flush generation counter
Callers may use this counter to optimize flushes Signed-off-by: Dmitry Monakhov --- block/blk-core.c |1 + block/blk-flush.c |3 ++- include/linux/blkdev.h |1 + 3 files changed, 4 insertions(+), 1 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 074b758..afb5a4b 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -537,6 +537,7 @@ void blk_cleanup_queue(struct request_queue *q) spin_unlock_irq(lock); mutex_unlock(&q->sysfs_lock); + atomic_set(&q->flush_tag, 0); /* * Drain all requests queued before DYING marking. Set DEAD flag to * prevent that q->request_fn() gets invoked after draining finished. diff --git a/block/blk-flush.c b/block/blk-flush.c index cc2b827..b1adc75 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -203,7 +203,7 @@ static void flush_end_io(struct request *flush_rq, int error) /* account completion of the flush request */ q->flush_running_idx ^= 1; elv_completed_request(q, flush_rq); - + atomic_inc(&q->flush_tag); /* and push the waiting requests to the next stage */ list_for_each_entry_safe(rq, n, running, flush.list) { unsigned int seq = blk_flush_cur_seq(rq); @@ -268,6 +268,7 @@ static bool blk_kick_flush(struct request_queue *q) q->flush_rq.end_io = flush_end_io; q->flush_pending_idx ^= 1; + atomic_inc(&q->flush_tag); list_add_tail(&q->flush_rq.queuelist, &q->queue_head); return true; } diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 78feda9..e079fbd 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -416,6 +416,7 @@ struct request_queue { unsigned intflush_queue_delayed:1; unsigned intflush_pending_idx:1; unsigned intflush_running_idx:1; + atomic_tflush_tag; unsigned long flush_pending_since; struct list_headflush_queue[2]; struct list_headflush_data_in_flight; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the 
FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/3] freescale: Update logging style
From: Joe Perches
Date: Sat, 13 Apr 2013 22:03:16 -0700

> Convert various printk logging styles to current styles.
>
> Uncompiled, untested.
>
> Joe Perches (3):
>   fec: Convert printks to netdev_
>   gianfar: Use netdev_ when possible
>   ucc_geth: Convert ugeth_ to pr_

All applied.
Re: [PATCH 0/2] ptrace/x86: simplify ptrace_write_dr7()
On 04/14, Jan Kratochvil wrote:
>
> On Sun, 14 Apr 2013 21:12:05 +0200, Oleg Nesterov wrote:
> > Jan, Frederic, et all. What do you think we should do?
> >
> > 1. Change ptrace_write_dr7() to do register_user_hw_breakpoint()
> >    if necessary.
> >
> >    This is what I was going to do, but I am no longer sure
> >    we want this. For what? Unlikely it is very useful to use
> >    the "default" addr == 0 for debugging.
>
> I do not understand how these functions map to the PTRACE_* syscall.
>
> But this was a regression from the application point of view as some
> application did/do:
> * waitpid - get the process to: t (tracing stop)
> * PTRACE_POKEUSER DR7, enableDR0
> * PTRACE_POKEUSER DR0, address
> * PTRACE_CONT
>
> This was perfectly valid before, there is no "default" addr == 0 used for any
> debugging. Just the applications did not care about PTRACE_POKEUSER ordering.
> This is also how the bug was found.

Yes, exactly. Except for 'there is no "default" addr == 0': the first
"PTRACE_POKEUSER DR7, enableDR0" used addr == 0, and then it was changed by
"PTRACE_POKEUSER DR0".

And once again, I am ready to make the patch; it should be simple. I am just
not sure it is worth the trouble, so I decided to ask first. Nobody noticed
this problem(?) except you, and this was broken long ago.

	PTRACE_POKEUSER DR0, address
	PTRACE_POKEUSER DR7, enableDR0

should work, and this looks better: we do not enable the bp until it has the
correct address set. Of course this doesn't really matter if the tracee does
not run in between, but still...

Oleg.
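The ordering recommended above (DR0 first, then DR7) might look roughly like this from the debugger side. This is only a sketch under assumptions: it targets x86 Linux, takes the debug register offsets from struct user in <sys/user.h>, and expects the tracee to be in a ptrace stop already. The helper names dr7_local_enable and set_dr0_breakpoint are hypothetical.

```c
#include <stddef.h>

/* DR7 local-enable bit for debug register idx (0..3): L0 is bit 0, L1 is bit 2, ... */
static unsigned long dr7_local_enable(int idx)
{
	return 1UL << (idx * 2);
}

#if defined(__linux__) && (defined(__x86_64__) || defined(__i386__))
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/user.h>

/*
 * Arm an execute breakpoint in DR0 of a stopped tracee.
 * Returns 0 on success, -1 on failure (errno is set by ptrace).
 */
static int set_dr0_breakpoint(pid_t pid, unsigned long addr)
{
	/* 1. Program the address first ... */
	if (ptrace(PTRACE_POKEUSER, pid,
		   offsetof(struct user, u_debugreg[0]), addr) == -1)
		return -1;

	/* 2. ... and only then enable the breakpoint in DR7. */
	if (ptrace(PTRACE_POKEUSER, pid,
		   offsetof(struct user, u_debugreg[7]),
		   dr7_local_enable(0)) == -1)
		return -1;

	return 0;
}
#endif
```

Writing DR0 before DR7 means the breakpoint is never enabled with a stale or zero address, which is the property discussed in the reply.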
Re: [PATCH V4 1/6] clk: OMAP: introduce device tree binding to kernel clock data
On 10:22-20130413, Tony Lindgren wrote: > * Nishanth Menon [130412 16:43]: > > Thanks for checking up. Fixed all of them below, will post part of > > series again, only if I need to address further comments in other > > patches.. > > Thanks it seems that the other ones are ready to go, just one > more comment below. > [...] > > +static struct clk *omap_clk_src_get(struct of_phandle_args *clkspec, void > > *data) > > +{ > > + struct clk *clk; > > + char clk_name[32]; > > + struct device_node *np = clkspec->np; > > + > > + snprintf(clk_name, 32, "%s_ck", np->name); > > + clk = clk_get(NULL, clk_name); > > + if (IS_ERR(clk)) { > > + pr_err("%s: could not get clock %s(%ld)\n", __func__, > > + clk_name, PTR_ERR(clk)); > > + goto out; > > + } > > + clk_put(clk); > > It seems that clk_put() is actually wrong here. That's because > of_clk_get() should boild down to just the look up of the clock > and then clk_get() on it, so no double clk_get() is done in this > case. Once the consumer driver is done, it will just call clk_put() > on it. Yep - updated version below. >From d0bf3fce235cff46feac7f5ef1a40e2fa0f2aa12 Mon Sep 17 00:00:00 2001 From: Nishanth Menon Date: Tue, 9 Apr 2013 19:26:40 -0500 Subject: [PATCH V5.1 1/6] clk: OMAP: introduce device tree binding to kernel clock data OMAP clock data is located in arch/arm/mach-omap2/cclockXYZ_data.c. However, this presents an obstacle for using these clock nodes in Device Tree definitions. This is especially true for board specific clocks initially. The fixed clocks are currently found via clock aliases table. There are many possible approaches to this problem as discussed in the following thread: http://marc.info/?t=13637032569&r=1&w=2. Highlights of the options: a) device specific clk_add_alias: cons: driver handling required b) using an generic clk node and indexing to reach the clock required. This is similar in approach taken by tegra and few other platforms. 
Example usage: clock = <&clk 5>; cons: potential to have mismatches in indexed table and associated dtb data. In addition, managing continued documentation in bindings as clock indexing increases. Even though readability angle could be improved by using preprocessing of DT using macros, indexed approach is inherently risky from cases like the following: clk indexes in kernel: 1 - mpu_dpll 2 - aux_clk1 3 - core_clk DT entry for peripheral X uses <&clk 2> to reach aux_clk1. Now, let's say kernel updates indices to: 1 - mpu_dpll 2 - per_dpll 3 - aux_clk1 4 - core_clk using the old dtb(or dts missing an update), on new kernel which has updated indices will result in per_dpll now controlled for peripheral X without warning or any potential error detection. Even though we could claim this is user error, such errors are hard to track down and fix. An alternate approach introduced here is to introduce device tree bindings corresponding to the clock nodes required in DT definition for SoC which automatically maps back to the definitions in cclockXYZ_data.c. The driver introduced here to do this mapping will eventually be the place where the clock handling will migrate to. We need to consider this angle as well so that the solution will be an valid transition point for moving the clock data out of kernel image (into device tree or firmware load etc..). Overall strategy introduced here is simple: a clock node described in device tree blob is used to identify the exact clock provided in the SoC specific data. This is then linked back using of_clk_add_provider to the device node to be accessible by of_clk_get. Based on discussion contributions from Roger Quadros, Grygorii Strashko and others. 
Cc: Kevin Hilman Cc: Mike Turquette Cc: Paul Walmsley [t...@atomide.com: co-developed] Signed-off-by: Tony Lindgren Signed-off-by: Nishanth Menon --- .../devicetree/bindings/clock/omap-clock.txt | 40 + drivers/clk/Makefile |1 + drivers/clk/omap/Makefile |1 + drivers/clk/omap/clk.c | 91 4 files changed, 133 insertions(+) create mode 100644 Documentation/devicetree/bindings/clock/omap-clock.txt create mode 100644 drivers/clk/omap/Makefile create mode 100644 drivers/clk/omap/clk.c diff --git a/Documentation/devicetree/bindings/clock/omap-clock.txt b/Documentation/devicetree/bindings/clock/omap-clock.txt new file mode 100644 index 000..047c1e7 --- /dev/null +++ b/Documentation/devicetree/bindings/clock/omap-clock.txt @@ -0,0 +1,40 @@ +Device Tree Clock bindings for Texas Instrument's OMAP compatible platforms + +This binding is an initial minimal binding that may be enhanced as part of +transitioning OMAP clock data out of kernel image. + +This binding uses the common clock binding[1]. + +[1] Documentation/devicetree/bindings/clock/clock-bindings.txt
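To illustrate why a name-based binding is less fragile than the indexed table discussed in the commit message, here is a small userspace sketch. It reuses the "<node name>_ck" naming rule from omap_clk_src_get(), but the table contents and the helper lookup_clock_by_node_name are hypothetical stand-ins for the SoC clock data.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical clock table after per_dpll has been inserted at index 1. */
static const char *const soc_clocks[] = {
	"mpu_dpll_ck", "per_dpll_ck", "aux_clk1_ck", "core_clk_ck",
};

/* Return the table index of the clock matching a DT node name, or -1. */
static int lookup_clock_by_node_name(const char *node_name)
{
	char clk_name[32];
	size_t i;

	/* Same naming rule as omap_clk_src_get(): "<node name>_ck". */
	snprintf(clk_name, sizeof(clk_name), "%s_ck", node_name);

	for (i = 0; i < sizeof(soc_clocks) / sizeof(soc_clocks[0]); i++)
		if (strcmp(soc_clocks[i], clk_name) == 0)
			return (int)i;

	return -1;
}
```

A device tree node named aux_clk1 still resolves to aux_clk1_ck after the table is reordered, and a name that no longer exists fails visibly instead of silently selecting a different clock.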
[PATCH] drivers: dma: Use devm_request_and_ioremap
Use devm_request_and_ioremap function which provides more consistent error
handling.

Signed-off-by: Alexandru Gheorghiu
---
 drivers/dma/txx9dmac.c |    6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/dma/txx9dmac.c b/drivers/dma/txx9dmac.c
index 913f55c..471f9f1 100644
--- a/drivers/dma/txx9dmac.c
+++ b/drivers/dma/txx9dmac.c
@@ -1217,11 +1217,7 @@ static int __init txx9dmac_probe(struct platform_device *pdev)
 	if (!ddev)
 		return -ENOMEM;
 
-	if (!devm_request_mem_region(&pdev->dev, io->start, resource_size(io),
-				     dev_name(&pdev->dev)))
-		return -EBUSY;
-
-	ddev->regs = devm_ioremap(&pdev->dev, io->start, resource_size(io));
+	ddev->regs = devm_request_and_ioremap(&pdev->dev, io);
 	if (!ddev->regs)
 		return -ENOMEM;
 	ddev->have_64bit_regs = pdata->have_64bit_regs;
-- 
1.7.9.5
Re: [ANNOUNCE] 3.6.11.1-rt32
Hi Steven,

I'm pleased to announce the 3.6.11.1-rt32 stable release.

Unfortunately, there is another compile error:

drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_gem_wait_for_error’:
drivers/gpu/drm/i915/i915_gem.c:118:3: warning: passing argument 1 of ‘rt_spin_lock’ from incompatible pointer type [enabled by default]
In file included from include/linux/spinlock.h:273:0,
                 from include/linux/wait.h:24,
                 from include/linux/fs.h:396,
                 from include/drm/drmP.h:47,
                 from drivers/gpu/drm/i915/i915_gem.c:28:
[..]

I would propose to adopt the mechanism that Sebastian introduced in 3.8.4-rt2
(https://lkml.org/lkml/2013/3/26/600). The kernel compiles and runs without any
problem with the below patch on a system that requires the i915 driver module.

Thanks Carsten, I'll be updating this later today.

Thank you.

BTW, did you get any core dumps from the work queue race that we've been
seeing?

No, not yet. Originally, the farm systems did not use crashkernels by default.
I understood that it does no harm but could help in cases like this one.
Therefore, I've started to reconfigure all farm systems with crashkernels,
starting with the two systems that had the work queue race crashes. The kernel
messages here (one is a 12-core, the other one a 32-core box) look exactly
like the one (https://lkml.org/lkml/2013/3/18/325) you saw in your 40-core
machine (https://lkml.org/lkml/2013/3/18/430). We'll need to wait for the next
crash that will give us a core dump we may then dissect.

-Carsten.
[PATCH] lib: digsig: Use ERR_CAST function
Use ERR_CAST function instead of ERR_PTR and PTR_ERR.
Patch found using coccinelle.

Signed-off-by: Alexandru Gheorghiu
---
 lib/digsig.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/digsig.c b/lib/digsig.c
index 2f31e6a..8793aed 100644
--- a/lib/digsig.c
+++ b/lib/digsig.c
@@ -209,7 +209,7 @@ int digsig_verify(struct key *keyring, const char *sig, int siglen,
 		kref = keyring_search(make_key_ref(keyring, 1UL),
 				      &key_type_user, name);
 		if (IS_ERR(kref))
-			key = ERR_PTR(PTR_ERR(kref));
+			key = ERR_CAST(kref);
 		else
 			key = key_ref_to_ptr(kref);
 	} else {
-- 
1.7.9.5
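For readers unfamiliar with the error-pointer helpers, the following userspace sketch shows why the two forms in the patch are equivalent. The definitions are simplified re-creations of the helpers in include/linux/err.h (kernel annotations such as __must_check and __force are dropped), so treat them as an approximation rather than the kernel's exact code.

```c
#include <stdbool.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers live in the top MAX_ERRNO bytes of the address space. */
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Re-type an error pointer without a round trip through a long. */
static inline void *ERR_CAST(const void *ptr)
{
	return (void *)ptr;
}
```

ERR_PTR(PTR_ERR(p)) and ERR_CAST(p) yield the same pointer value; ERR_CAST just states the intent (propagating an error across pointer types) directly.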
Cannot add new efi boot entry
Hi,

after the update to 3.8, every update of the kernel ends up in an unbootable
machine. It is due to the following commit:

commit 68d929862e29a8b52a7f2f2f86a0600423b093cd
Author: Matthew Garrett
Date:   Sat Mar 2 19:40:17 2013 -0500

    efi: be more paranoid about available space when creating variables

efibootmgr tries to add an entry and silently fails when writing to
/sys/firmware/efi/vars/new_var with -ENOSPC. There are many entries in there:

# efibootmgr
BootCurrent: 000D
Timeout: 0 seconds
BootOrder: 0018,,0001,0002,0003,0007,0008,0009,000A,000B,000C,000D,000E,000F,0010,0011,0012
Boot Setup
Boot0001 Boot Menu
Boot0002 Diagnostic Splash Screen
Boot0003 Lenovo Diagnostics
Boot0004 Startup Interrupt Menu
Boot0005 ME Configuration Menu
Boot0006 Rescue and Recovery
Boot0007* USB CD
Boot0008* USB FDD
Boot0009* ATAPI CD0
Boot000A* ATA HDD0
Boot000B* ATA HDD1
Boot000C* ATA HDD2
Boot000D* USB HDD
Boot000E* PCI LAN
Boot000F* ATAPI CD1
Boot0010 Other CD
Boot0011* ATA HDD3
Boot0012 Other HDD
Boot0013* IDER BOOT CDROM
Boot0014* IDER BOOT Floppy
Boot0015* ATA HDD
Boot0016* ATAPI CD:
Boot0017* PCI LAN
Boot0018* Linux

The remaining size is about 20k, the added entry size is hundreds of bytes,
and the store size is 64k. Obviously, lowering the limitation from 1/2 to 1/4
fixes the problem for me, because it always worked on my setup to store a new
entry...

Any ideas how to overcome that? It would be better to blacklist bad machines
rather than whitelist good ones, right?

thanks,
-- 
js
suse labs
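The numbers in the report can be checked with a small calculation. The sketch below assumes the space check refuses a write when the space left after the write would drop below a fixed fraction of the store; that form is inferred from the "1/2" limit described above and is not copied from the commit. The function name write_allowed and the 300-byte entry size are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Allow a write only if the space remaining afterwards is still at least
 * storage / divisor (divisor = 2 models the "1/2" limit, 4 the "1/4" one).
 */
static bool write_allowed(uint64_t storage, uint64_t remaining,
			  uint64_t size, uint64_t divisor)
{
	if (storage == 0 || divisor == 0 || size > remaining)
		return false;

	return remaining - size >= storage / divisor;
}
```

With a 64k store, about 20k remaining, and an entry of a few hundred bytes, the 1/2 limit requires 32k to stay free, so the write is refused. The 1/4 limit requires only 16k, so the same write goes through, which matches the behaviour described in the report.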
[PATCH] drivers: usb: gadget: Use ERR_CAST function
Use ERR_CAST function instead of ERR_PTR and PTR_ERR.
Patch found using coccinelle.

Signed-off-by: Alexandru Gheorghiu
---
 drivers/usb/gadget/composite.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/composite.c b/drivers/usb/gadget/composite.c
index 7c821de..5b73a74 100644
--- a/drivers/usb/gadget/composite.c
+++ b/drivers/usb/gadget/composite.c
@@ -1138,7 +1138,7 @@ struct usb_string *usb_gstrings_attach(struct usb_composite_dev *cdev,
 	uc = copy_gadget_strings(sp, n_gstrings, n_strings);
 	if (IS_ERR(uc))
-		return ERR_PTR(PTR_ERR(uc));
+		return ERR_CAST(uc);
 	n_gs = get_containers_gs(uc);
 	ret = usb_string_ids_tab(cdev, n_gs[0]->strings);
-- 
1.7.9.5
[PATCH] fs: reiserfs: Use kstrdup function
Use kstrdup function instead of kmalloc and strcpy.
Patch found using coccinelle.

Signed-off-by: Alexandru Gheorghiu
---
 fs/reiserfs/super.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index 194113b..f8a23c3 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -1147,8 +1147,7 @@ static int reiserfs_parse_options(struct super_block *s, char *options, /* strin
 						 "on filesystem root.");
 					return 0;
 				}
-				qf_names[qtype] =
-				    kmalloc(strlen(arg) + 1, GFP_KERNEL);
+				qf_names[qtype] = kstrdup(arg, GFP_KERNEL);
 				if (!qf_names[qtype]) {
 					reiserfs_warning(s, "reiserfs-2502",
 							 "not enough memory "
@@ -1156,7 +1155,6 @@ static int reiserfs_parse_options(struct super_block *s, char *options, /* strin
 							 "quotafile name.");
 					return 0;
 				}
-				strcpy(qf_names[qtype], arg);
 				if (qtype == USRQUOTA)
 					*mount_options |= 1 << REISERFS_USRQUOTA;
 				else
-- 
1.7.9.5
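What the patch gains is that allocation and copy can no longer drift apart. A userspace sketch of the same idea follows, using malloc() in place of kmalloc(); the function name sketch_strdup is hypothetical, and the body only approximates what kstrdup() does.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate a string in one call: size the buffer from the source,
 * allocate it, and copy including the terminating NUL.
 * Returns NULL for a NULL input or when allocation fails.
 */
static char *sketch_strdup(const char *s)
{
	size_t len;
	char *buf;

	if (!s)
		return NULL;

	len = strlen(s) + 1;
	buf = malloc(len);
	if (buf)
		memcpy(buf, s, len);

	return buf;
}
```

The caller still has to check for NULL, as the unchanged error path in the patch does, and release the copy when done.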
[PATCH] staging: ramster: add how-to for ramster
From: Dan Magenheimer Add how-to for ramster. Singed-off-by: Dan Magenheimer Signed-off-by: Wanpeng Li --- drivers/staging/zcache/ramster/HOWTO.txt | 249 ++ 1 file changed, 249 insertions(+) create mode 100644 drivers/staging/zcache/ramster/HOWTO.txt diff --git a/drivers/staging/zcache/ramster/HOWTO.txt b/drivers/staging/zcache/ramster/HOWTO.txt new file mode 100644 index 000..e6387e8 --- /dev/null +++ b/drivers/staging/zcache/ramster/HOWTO.txt @@ -0,0 +1,249 @@ +Version: 130309 + Dan Magenheimer + +This is a how-to document for RAMster. It applies to the March 9, 2013 +version of RAMster, re-merged with the new zcache codebase, built and tested +on the 3.9 tree and submitted for the staging tree for 3.9. + +Note that this document was created from notes taken earlier. I would +appreciate any feedback from anyone who follows the process as described +to confirm that it works and to clarify any possible misunderstandings, +or to report problems. + +A. PRELIMINARY + +1) Install two or more Linux systems that are known to work when upgraded + to a recent upstream Linux kernel version (e.g. v3.9). I used Oracle + Linux 6 ("OL6") on two Dell Optiplex 790s. Note that it should be possible + to use ocfs2 as a filesystem on your systems but this hasn't been + tested thoroughly, so if you do use ocfs2 and run into problems, please + report them. Up to eight nodes should work, but not much testing has + been done with more than three nodes. + +On each system: + +2) Configure, build and install then boot Linux (e.g. 3.9), just to ensure it + can be done with an unmodified upstream kernel. Confirm you booted + the upstream kernel with "uname -a". + +3) Install ramster-tools. The src.rpm and an OL6 rpm are available + in this directory. I'm not very good at userspace stuff and + would welcome any help in turning ramster-tools into more + distributable rpms/debs for a wider range of distros. + +B. 
BUILDING RAMSTER INTO THE KERNEL + +Do the following on each system: + +1) Ensure you have the new codebase for drivers/staging/zcache in your source. + +2) Change your .config to have: + + CONFIG_CLEANCACHE=y + CONFIG_FRONTSWAP=y + CONFIG_STAGING=y + CONFIG_ZCACHE=y + CONFIG_RAMSTER=y + + You may have to reconfigure your kernel multiple times to ensure + all of these are set properly. I use: + + # yes "" | make oldconfig + + and then manually check the .config file to ensure my selections + have "taken". + + Do not bother to build the kernel until you are certain all of + the above config selections will stick for the build. + +3) Build this kernel and "make install" so that you have a new kernel + in /etc/grub.conf + +4) Add "ramster" to the kernel boot line in /etc/grub.conf. + +5) Reboot and check dmesg to ensure there are some messages from ramster + and that "ramster_enabled=1" appears. + + # dmesg | grep ramster + + You should also see a lot of files in: + + # ls /sys/kernel/debug/zcache + # ls /sys/kernel/debug/ramster + + and a few files in: + + # ls /sys/kernel/mm/ramster + + RAMster now will act as a single-system zcache but doesn't yet + know anything about the cluster so can't do anything remotely. + +C. BUILDING THE RAMSTER CLUSTER + +This is the error prone part unless you are a clustering expert. We need +to describe the cluster in /etc/ramster.conf file and the init scripts +that parse it are extremely picky about the syntax. + +1) Create the /etc/ramster.conf file and ensure it is identical + on both systems. 
There is a good amount of similar documentation + for ocfs2 /etc/cluster.conf that can be googled for this, but I use: + + cluster: + name = ramster + node_count = 2 + node: + name = system1 + cluster = ramster + number = 0 + ip_address = my.ip.ad.r1 + ip_port = + node: + name = system2 + cluster = ramster + number = 0 + ip_address = my.ip.ad.r2 + ip_port = + + You must ensure that the "name" field in the file exactly matches + the output of "hostname" on each system. The following assumes + you use "ramster" as the name of your cluster. + +2) Enable the ramster service and configure it: + + # chkconfig --add ramster + # service ramster configure + + Set "load on boot" to "y", cluster to start is "ramster" (or whatever + name you chose in ramster.conf), heartbeat dead threshold as "500", + network idle timeout as "100". Leave the others as default. + +4) Reboot. After reboot, try: + + # service ramster status + + You should see "Checking ramster cluster ramster: Online". If you do + not, something is wrong and RAMster will not work. Note that you + should also see that the driver for "configfs" is loaded a
Re: [PATCH] staging: ramster: add how-to for ramster
On Mon, Apr 15, 2013 at 07:56:56AM +0800, Wanpeng Li wrote: > +This is a how-to document for RAMster. It applies to the March 9, 2013 > +version of RAMster, re-merged with the new zcache codebase, built and tested > +on the 3.9 tree and submitted for the staging tree for 3.9. This is not needed at all, given that it should just reflect the state of the code in the kernel that this file is present in. Please remove it. > +Note that this document was created from notes taken earlier. I would > +appreciate any feedback from anyone who follows the process as described > +to confirm that it works and to clarify any possible misunderstandings, > +or to report problems. Is this needed? > +A. PRELIMINARY > + > +1) Install two or more Linux systems that are known to work when upgraded > + to a recent upstream Linux kernel version (e.g. v3.9). I used Oracle > + Linux 6 ("OL6") on two Dell Optiplex 790s. Note that it should be > possible > + to use ocfs2 as a filesystem on your systems but this hasn't been > + tested thoroughly, so if you do use ocfs2 and run into problems, please > + report them. Up to eight nodes should work, but not much testing has > + been done with more than three nodes. > + > +On each system: > + > +2) Configure, build and install then boot Linux (e.g. 3.9), just to ensure it > + can be done with an unmodified upstream kernel. Confirm you booted > + the upstream kernel with "uname -a". > + > +3) Install ramster-tools. The src.rpm and an OL6 rpm are available > + in this directory. I'm not very good at userspace stuff and > + would welcome any help in turning ramster-tools into more > + distributable rpms/debs for a wider range of distros. This isn't true, the rpms are not here. > +B. BUILDING RAMSTER INTO THE KERNEL > + > +Do the following on each system: > + > +1) Ensure you have the new codebase for drivers/staging/zcache in your > source. 
> + > +2) Change your .config to have: > + > + CONFIG_CLEANCACHE=y > + CONFIG_FRONTSWAP=y > + CONFIG_STAGING=y > + CONFIG_ZCACHE=y > + CONFIG_RAMSTER=y > + > + You may have to reconfigure your kernel multiple times to ensure > + all of these are set properly. I use: > + > + # yes "" | make oldconfig > + > + and then manually check the .config file to ensure my selections > + have "taken". This last bit isn't needed at all. Just stick to the "these are the settings you need enabled." > + Do not bother to build the kernel until you are certain all of > + the above config selections will stick for the build. > + > +3) Build this kernel and "make install" so that you have a new kernel > + in /etc/grub.conf Don't assume 'make install' works for all distros, nor that /etc/grub.conf is a grub config file (hint, it usually isn't, and what about all the people not even using grub for their bootloader?) > +4) Add "ramster" to the kernel boot line in /etc/grub.conf. Again, drop grub.conf reference > +5) Reboot and check dmesg to ensure there are some messages from ramster > + and that "ramster_enabled=1" appears. > + > + # dmesg | grep ramster Are you sure ramster still spits out messages? If so, provide an example of what it should look like. > + You should also see a lot of files in: > + > + # ls /sys/kernel/debug/zcache > + # ls /sys/kernel/debug/ramster You forgot to mention that debugfs needs to be mounted. > + and a few files in: > + > + # ls /sys/kernel/mm/ramster > + > + RAMster now will act as a single-system zcache but doesn't yet > + know anything about the cluster so can't do anything remotely. > + > +C. BUILDING THE RAMSTER CLUSTER > + > +This is the error prone part unless you are a clustering expert. We need > +to describe the cluster in /etc/ramster.conf file and the init scripts > +that parse it are extremely picky about the syntax. > + > +1) Create the /etc/ramster.conf file and ensure it is identical > + on both systems. 
There is a good amount of similar documentation > + for ocfs2 /etc/cluster.conf that can be googled for this, but I use: > + > + cluster: > + name = ramster > + node_count = 2 > + node: > + name = system1 > + cluster = ramster > + number = 0 > + ip_address = my.ip.ad.r1 > + ip_port = > + node: > + name = system2 > + cluster = ramster > + number = 0 > + ip_address = my.ip.ad.r2 > + ip_port = > + > + You must ensure that the "name" field in the file exactly matches > + the output of "hostname" on each system. The following assumes > + you use "ramster" as the name of your cluster. > + > +2) Enable the ramster service and configure it: > + > + # chkconfig --add ramster > + # service ramster configure That's a huge assumption as to how your system config/startup scripts work, right? Not all the world is using old-style system V init any
RE: [PATCH 3.8-stable] gpio: fix wrong checking condition for gpio range
Dear Haojian Zhuang. > > This patch looks like it should be in the 3.8-stable tree, should we > apply > > it? > > > > It could be merged into 3.8-stable tree. > Thanks~ Best Regards. > -Original Message- > From: Haojian Zhuang [mailto:haojian.zhu...@linaro.org] > Sent: Sunday, April 14, 2013 12:27 AM > To: Jonghwan Choi > Cc: Linus Walleij; sta...@vger.kernel.org; linux-kernel@vger.kernel.org; > Jonghwan Choi > Subject: Re: [PATCH 3.8-stable] gpio: fix wrong checking condition for > gpio range > > On 13 April 2013 22:46, Jonghwan Choi wrote: > > From: Haojian Zhuang > > > > This patch looks like it should be in the 3.8-stable tree, should we > apply > > it? > > > > It could be merged into 3.8-stable tree. > > Regards > Haojian -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 3/4] x86, kdump: Change crashkernel_high/low= to crashkernel=,high/low
(2013/04/12 15:54), Yinghai Lu wrote:
> Index: linux-2.6/kernel/kexec.c
> ===
> --- linux-2.6.orig/kernel/kexec.c
> +++ linux-2.6/kernel/kexec.c
> @@ -1368,35 +1368,114 @@ static int __init parse_crashkernel_simp
>  	return 0;
>  }
>
> +#define SUFFIX_HIGH 0
> +#define SUFFIX_LOW  1
> +#define SUFFIX_NULL 2
> +static __initdata char *suffix_tbl[] = {
> +	[SUFFIX_HIGH] = ",high",
> +	[SUFFIX_LOW]  = ",low",
> +	[SUFFIX_NULL] = NULL,
> +};
> +
>  /*
> - * That function is the entry point for command line parsing and should be
> - * called from the arch-specific code.
> + * That function parses "suffix" crashkernel command lines like
> + *
> + *	crashkernel=size,[high|low]
> + *
> + * It returns 0 on success and -EINVAL on failure.
>   */
> +static int __init parse_crashkernel_suffix(char *cmdline,
> +					   unsigned long long *crash_size,
> +					   unsigned long long *crash_base,
> +					   const char *suffix)
> +{
> +	char *cur = cmdline;
> +
> +	*crash_size = memparse(cmdline, &cur);
> +	if (cmdline == cur) {
> +		pr_warn("crashkernel: memory value expected\n");
> +		return -EINVAL;
> +	}
> +
> +	/* check with suffix */
> +	if (strncmp(cur, suffix, strlen(suffix))) {
> +		pr_warn("crashkernel: unrecognized char\n");
> +		return -EINVAL;
> +	}
> +	cur += strlen(suffix);
> +	if (*cur != ' ' && *cur != '\0') {
> +		pr_warn("crashkernel: unrecognized char\n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}

Thanks, looks good to me.

--
Thanks.
HATAYAMA, Daisuke
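The checks in the quoted parse_crashkernel_suffix() can be tried out in userspace. What follows is only a minimal sketch under stated assumptions: parse_mem() is a hypothetical, simplified stand-in for the kernel's memparse() (it only knows the K, M and G suffixes), parse_suffix() is an illustrative name, and the pr_warn() calls and the unused crash_base argument are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the kernel's memparse(): a number
 * followed by an optional K/M/G suffix. Not the kernel implementation.
 */
static unsigned long long parse_mem(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Same three checks as the quoted function: a size must be present, it
 * must be followed by exactly the expected suffix, and the suffix must be
 * followed by a space or the end of the string.
 */
static int parse_suffix(const char *cmdline, unsigned long long *size,
			const char *suffix)
{
	char *cur;

	assert(cmdline && size && suffix);

	*size = parse_mem(cmdline, &cur);
	if (cur == cmdline)
		return -EINVAL;		/* no memory value */
	if (strncmp(cur, suffix, strlen(suffix)))
		return -EINVAL;		/* wrong or missing suffix */
	cur += strlen(suffix);
	if (*cur != ' ' && *cur != '\0')
		return -EINVAL;		/* trailing garbage such as ",highx" */
	return 0;
}
```

With this sketch, parse_suffix("64M,high", &size, ",high") succeeds and stores 64 MiB in size, while "64M,highx", ",high" and a bare "64M" are all rejected with -EINVAL.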
RE: [PATCH 1/2] watchdog: introduce new watchdog AUTOSTART option
Hi Guenter

> I really don't like that idea. It defeats a significant part of the purpose
> for having a watchdog, which is to prevent user-space hangups.
>
> To make this a driver option is even more odd - it forces every user of this
> driver to use it in-kernel only, and makes /dev/watchdog quite useless.
>
> I mean, really, if you have such a watchdog, what is the point of using the
> watchdog infrastructure in the first place ? Just make it a kernel thread or
> timer-activated platform code which pings your watchdog once in a while. No
> need to get the watchdog infrastructure involved in the first place.
>
> Am I missing something ?

I wanted to enable the watchdog timer without the watchdog application, to make sure the system is alive.
However, I think I misunderstood the purpose of the watchdog driver.
The watchdog is for detecting user-space hangups rather than kernel stalls. Is that correct?
If yes, this patch is totally wrong.

Thanks!
Milo
Sleeping process from kernel space: not doing schedule?
Hi all,

First of all, I am not subscribed to the mailing list and I'd like to get the answers directly to my email. Thank you!

I am working on a project where I am trying to detect the Out Of Memory machine state and collect some data from the machine. I've created an LKM that hacks the do_brk call and checks whether there is enough memory to perform the call. I successfully detect the OOM state, and my next step is sending a signal to a process in user space that writes info to a log file (this way, I avoid the necessity of opening files in kernel space). After sending the signal, the LKM puts the current process to sleep, using the function schedule_timeout_uninterruptible().

What I expect is to see the user-space process run for some time while the current process is sleeping but, instead, I see that the user process does not run until the sleeping process has finished... I suspect that it is not scheduling, or something similar.

Am I missing something? Is this the correct way of putting a process to sleep from kernel space?

Thank you!!!
Jose
Re: [PATCH 1/4] cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix
Quoting Tejun Heo (t...@kernel.org): > There's no reason to be using bitops, which tends to be more > cumbersome, to handle root flags. Convert them to masks. Also, as > they'll be moved to include/linux/cgroup.h and it's generally a good > idea, add CGRP_ prefix. > > Note that flags are assigned from (1 << 1). The first bit will be > used by a flag which will be added soon. > > Signed-off-by: Tejun Heo This *is* much nicer to read, thanks. Acked-by: Serge E. Hallyn > --- > kernel/cgroup.c | 21 ++--- > 1 file changed, 10 insertions(+), 11 deletions(-) > > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index 678a22c..a372eaa 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -296,10 +296,10 @@ bool cgroup_is_descendant(struct cgroup *cgrp, struct > cgroup *ancestor) > } > EXPORT_SYMBOL_GPL(cgroup_is_descendant); > > -/* bits in struct cgroupfs_root flags field */ > +/* cgroupfs_root->flags */ > enum { > - ROOT_NOPREFIX, /* mounted subsystems have no named prefix */ > - ROOT_XATTR, /* supports extended attributes */ > + CGRP_ROOT_NOPREFIX = (1 << 1), /* mounted subsystems have no named > prefix */ > + CGRP_ROOT_XATTR = (1 << 2), /* supports extended attributes */ > }; > > static int cgroup_is_releasable(const struct cgroup *cgrp) > @@ -1137,9 +1137,9 @@ static int cgroup_show_options(struct seq_file *seq, > struct dentry *dentry) > mutex_lock(&cgroup_root_mutex); > for_each_subsys(root, ss) > seq_printf(seq, ",%s", ss->name); > - if (test_bit(ROOT_NOPREFIX, &root->flags)) > + if (root->flags & CGRP_ROOT_NOPREFIX) > seq_puts(seq, ",noprefix"); > - if (test_bit(ROOT_XATTR, &root->flags)) > + if (root->flags & CGRP_ROOT_XATTR) > seq_puts(seq, ",xattr"); > if (strlen(root->release_agent_path)) > seq_printf(seq, ",release_agent=%s", root->release_agent_path); > @@ -1202,7 +1202,7 @@ static int parse_cgroupfs_options(char *data, struct > cgroup_sb_opts *opts) > continue; > } > if (!strcmp(token, "noprefix")) { > - set_bit(ROOT_NOPREFIX, &opts->flags); > + 
opts->flags |= CGRP_ROOT_NOPREFIX; > continue; > } > if (!strcmp(token, "clone_children")) { > @@ -1210,7 +1210,7 @@ static int parse_cgroupfs_options(char *data, struct > cgroup_sb_opts *opts) > continue; > } > if (!strcmp(token, "xattr")) { > - set_bit(ROOT_XATTR, &opts->flags); > + opts->flags |= CGRP_ROOT_XATTR; > continue; > } > if (!strncmp(token, "release_agent=", 14)) { > @@ -1293,8 +1293,7 @@ static int parse_cgroupfs_options(char *data, struct > cgroup_sb_opts *opts) >* with the old cpuset, so we allow noprefix only if mounting just >* the cpuset subsystem. >*/ > - if (test_bit(ROOT_NOPREFIX, &opts->flags) && > - (opts->subsys_mask & mask)) > + if ((opts->flags & CGRP_ROOT_NOPREFIX) && (opts->subsys_mask & mask)) > return -EINVAL; > > > @@ -2523,7 +2522,7 @@ static struct simple_xattrs *__d_xattrs(struct dentry > *dentry) > static inline int xattr_enabled(struct dentry *dentry) > { > struct cgroupfs_root *root = dentry->d_sb->s_fs_info; > - return test_bit(ROOT_XATTR, &root->flags); > + return root->flags & CGRP_ROOT_XATTR; > } > > static bool is_valid_xattr(const char *name) > @@ -2695,7 +2694,7 @@ static int cgroup_add_file(struct cgroup *cgrp, struct > cgroup_subsys *subsys, > > simple_xattrs_init(&cft->xattrs); > > - if (subsys && !test_bit(ROOT_NOPREFIX, &cgrp->root->flags)) { > + if (subsys && !(cgrp->root->flags & CGRP_ROOT_NOPREFIX)) { > strcpy(name, subsys->name); > strcat(name, "."); > } > -- > 1.8.1.4 > > ___ > Containers mailing list > contain...@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/containers -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
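The conversion in the patch above, from bit numbers used with set_bit()/test_bit() to plain masks, can be illustrated outside the kernel. This is a minimal sketch, assuming a hypothetical struct opts that carries only the flags field; opts_set() and opts_test() are illustrative names, not kernel helpers. The enum values are the ones from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Mask-style flags, numbered as in the quoted patch: bit 0 is left free
 * for a flag that a later patch in the series adds.
 */
enum {
	CGRP_ROOT_NOPREFIX = (1 << 1),	/* mounted subsystems have no named prefix */
	CGRP_ROOT_XATTR    = (1 << 2),	/* supports extended attributes */
};

/* Hypothetical stand-in for the options struct; only the flags field. */
struct opts {
	unsigned long flags;
};

/* Plays the role that set_bit(nr, &o->flags) had before the patch. */
static void opts_set(struct opts *o, unsigned long mask)
{
	assert(o);
	o->flags |= mask;
}

/* Plays the role that test_bit(nr, &o->flags) had before the patch. */
static bool opts_test(const struct opts *o, unsigned long mask)
{
	assert(o);
	return (o->flags & mask) != 0;
}
```

Two details are worth keeping in mind when reading the diff. A raw masked value such as flags & CGRP_ROOT_XATTR is 4 rather than 1 when the flag is set, so it should only be tested for truth. And unlike set_bit(), a plain |= is not atomic, which is fine only where the flags are not updated concurrently.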
Re: [PATCH 1/2] hfs/hfsplus: Convert dprint to hfs_dbg
--- On Mon, 8/4/13, Joe Perches wrote:
> Use a more current logging style.
>
> Rename macro and uses.
> Add do {} while (0) to macro.
> Add DBG_ to macro.
> Add and use hfs_dbg_cont variant where appropriate.
>
> Signed-off-by: Joe Perches

> +++ b/fs/hfs/hfs_fs.h
> @@ -34,8 +34,18 @@
>  //#define DBG_MASK	(DBG_CAT_MOD|DBG_BNODE_REFS|DBG_INODE|DBG_EXTENT)
>  #define DBG_MASK	(0)
>
> -#define dprint(flg, fmt, args...) \
> -	if (flg & DBG_MASK) printk(fmt , ## args)
> +#define hfs_dbg(flg, fmt, ...)					\
> +do {								\
> +	if (DBG_##flg & DBG_MASK)				\
> +		printk(KERN_DEBUG fmt, ##__VA_ARGS__);		\
> +} while (0)
> +
> +#define hfs_dbg_cont(flg, fmt, ...)				\
> +do {								\
> +	if (DBG_##flg & DBG_MASK)				\
> +		printk(KERN_CONT fmt, ##__VA_ARGS__);		\
> +} while (0)
> +
>
>  /*
>   * struct hfs_inode_info

This set of changes seems to be somewhat zealous - it doesn't offer any benefits other than possibly satisfying somebody's idea of code-purity.

FWIW, I have been sitting on a patch which changes this part of the code to dynamic debugging, and it is much simpler. Just:

=
diff --git a/fs/hfsplus/hfsplus_fs.h b/fs/hfsplus/hfsplus_fs.h
index e298b83..55d211d 100644
--- a/fs/hfsplus/hfsplus_fs.h
+++ b/fs/hfsplus/hfsplus_fs.h
@@ -45,8 +25,7 @@
 #define HFSPLUS_JOURNAL_SWAP 1

 #define dprint(flg, fmt, args...) \
-	if (flg & DBG_MASK) \
-		printk(fmt , ## args)
+	pr_debug(fmt , ## args)

 /* Runtime config options */
 #define HFSPLUS_DEF_CR_TYPE	0x3F3F3F3F /* '' */
=

(and you can then remove all the DBG_* defines before that, since they then don't have any effect any more).

The benefit of this alternative is that it does not break any out-of-tree patches, while making it easier to debug said patches... and I am still sitting on a rather substantial set of the journal changes, plus all the other issues that come out of it, like the folder count patch for case-sensitive file systems.

I think one needs to think very carefully about making bulk changes like this, which serve no real purpose other than satisfying somebody's idea of code purity. The problem with such bulk "stylistic" changes is that they force people who are working on real functionality and bug fixes to rebase their work, and to spend time on doing so, at the risk of introducing new bugs while rebasing.

I know I am writing with a somewhat selfish purpose: if I need to rebase my work due to others' bug fixes or enhancements, etc, then fair enough, but I'd prefer not to rebase for the sake of others' preferences for, and attempts at, re-arranging the style of the debug statements, when the debugging output means little to them.
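For readers weighing the "Add do {} while (0) to macro" item in the quoted patch description, the hazard it addresses can be shown in a few lines of userspace C. This is a sketch under assumptions: fprintf() stands in for printk(), the macro and function names (dprint_old, dbg_new, else_ran_old, else_ran_new, demo) are made up for the illustration, and ##__VA_ARGS__ is the same GNU extension the kernel macros rely on. A compiler may warn about the ambiguous else in else_ran_old(); that warning is exactly the point.

```c
#include <assert.h>
#include <stdio.h>

#define DBG_INODE	0x1
#define DBG_EXTENT	0x2
#define DBG_MASK	(DBG_INODE)	/* only inode messages enabled */

/* Old style, like the removed dprint(): a bare `if` as the macro body. */
#define dprint_old(flg, fmt, ...) \
	if ((flg) & DBG_MASK) fprintf(stderr, fmt, ##__VA_ARGS__)

/* New style, like the quoted hfs_dbg(): one self-contained statement. */
#define dbg_new(flg, fmt, ...)					\
do {								\
	if (DBG_##flg & DBG_MASK)				\
		fprintf(stderr, fmt, ##__VA_ARGS__);		\
} while (0)

/* Returns 1 if the caller's else branch ran. */
static int else_ran_old(int cond)
{
	int ran = 0;

	if (cond)
		dprint_old(DBG_EXTENT, "extent\n");
	else	/* binds to the `if` hidden inside the macro */
		ran = 1;
	return ran;
}

static int else_ran_new(int cond)
{
	int ran = 0;

	if (cond)
		dbg_new(EXTENT, "extent\n");
	else	/* binds to `if (cond)`, as written */
		ran = 1;
	return ran;
}

static void demo(void)
{
	/* With the wrapped macro the else behaves as it reads. */
	assert(else_ran_new(1) == 0 && else_ran_new(0) == 1);
	/* The bare-if macro steals the else: it runs when cond is true. */
	assert(else_ran_old(1) == 1 && else_ran_old(0) == 0);
}
```

This only illustrates the macro-hygiene argument. It says nothing about the rebase cost raised in the reply, and the pr_debug() alternative proposed there is a single expression statement, so it does not have the dangling-else problem either.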
Re: [PATCH 2/4] move cgroupfs_root to include/linux/cgroup.h
Quoting Tejun Heo (t...@kernel.org):
> While controllers shouldn't be accessing cgroupfs_root directly, it
> being hidden inside kern/cgroup.c makes somethings pretty silly. This
> makes routing hierarchy-wide settings which need to be visible to
> controllers cumbersome.
>
> We're gonna add another hierarchy-wide setting which needs to be
> accessed from controllers. Move cgroupfs_root and its flags to the
> header file so that we can access root settings with inline helpers.
>
> Signed-off-by: Tejun Heo

Acked-by: Serge E. Hallyn
Re: [PATCH 3/4] cgroup: introduce sane_behavior mount option
Quoting Tejun Heo (t...@kernel.org): > It's a sad fact that at this point various cgroup controllers are > carrying so many idiosyncrasies and pure insanities that it simply > isn't possible to reach any sort of sane consistent behavior while > maintaining staying fully compatible with what already has been > exposed to userland. > > As we can't break exposed userland interface, transitioning to sane > behaviors can only be done in steps while maintaining backwards > compatibility. This patch introduces a new mount option - > __DEVEL__sane_behavior - which disables crazy features and enforces > consistent behaviors in cgroup core proper and various controllers. > As exactly which behaviors it changes are still being determined, the > mount option, at this point, is useful only for development of the new > behaviors. As such, the mount option is prefixed with __DEVEL__ and > generates a warning message when used. > > Eventually, once we get to the point where all controller's behaviors > are consistent enough to implement unified hierarchy, the __DEVEL__ > prefix will be dropped, and more importantly, unified-hierarchy will > enforce sane_behavior by default. Maybe we'll able to completely drop > the crazy stuff after a while, maybe not, but we at least have a > strategy to move on to saner behaviors. > > This patch introduces the mount option and changes the following > behaviors in cgroup core. > > * Mount options "noprefix" and "clone_children" are disallowed. Also, > cgroupfs file cgroup.clone_children is not created. > > * When mounting an existing superblock, mount options should match. > This is currently pretty crazy. If one mounts a cgroup, creates a > subdirectory, unmounts it and then mount it again with different > option, it looks like the new options are applied but they aren't. > > * Remount is disallowed. 
> > The behaviors changes are documented in the comment above > CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different > controllers are converted and planned improvements progress. > > Signed-off-by: Tejun Heo Acked-by: Serge E. Hallyn > Cc: Li Zefan > Cc: Michal Hocko > Cc: Vivek Goyal > --- > include/linux/cgroup.h | 43 +++ > kernel/cgroup.c| 49 + > 2 files changed, 92 insertions(+) > > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > index b21881e..9c300ad 100644 > --- a/include/linux/cgroup.h > +++ b/include/linux/cgroup.h > @@ -156,6 +156,8 @@ enum { >* specified at mount time and thus is implemented here. >*/ > CGRP_CPUSET_CLONE_CHILDREN, > + /* see the comment above CGRP_ROOT_SANE_BEHAVIOR for details */ > + CGRP_SANE_BEHAVIOR, > }; > > struct cgroup_name { > @@ -243,6 +245,37 @@ struct cgroup { > > /* cgroupfs_root->flags */ > enum { > + /* > + * Unfortunately, cgroup core and various controllers are riddled > + * with idiosyncrasies and pointless options. The following flag, > + * when set, will force sane behavior - some options are forced on, > + * others are disallowed, and some controllers will change their > + * hierarchical or other behaviors. > + * > + * The set of behaviors affected by this flag are still being > + * determined and developed and the mount option for this flag is > + * prefixed with __DEVEL__. The prefix will be dropped once we > + * reach the point where all behaviors are compatible with the > + * planned unified hierarchy, which will automatically turn on this > + * flag. > + * > + * The followings are the behaviors currently affected this flag. > + * > + * - Mount options "noprefix" and "clone_children" are disallowed. > + * Also, cgroupfs file cgroup.clone_children is not created. > + * > + * - When mounting an existing superblock, mount options should > + * match. > + * > + * - Remount is disallowed. > + * > + * The followings are planned changes. 
> + * > + * - release_agent will be disallowed once replacement notification > + * mechanism is implemented. > + */ > + CGRP_ROOT_SANE_BEHAVIOR = (1 << 0), > + > CGRP_ROOT_NOPREFIX = (1 << 1), /* mounted subsystems have no named > prefix */ > CGRP_ROOT_XATTR = (1 << 2), /* supports extended attributes */ > }; > @@ -360,6 +393,7 @@ struct cgroup_map_cb { > /* cftype->flags */ > #define CFTYPE_ONLY_ON_ROOT (1U << 0) /* only create on root cg */ > #define CFTYPE_NOT_ON_ROOT (1U << 1) /* don't create on root cg */ > +#define CFTYPE_INSANE(1U << 2) /* don't create if > sane_behavior */ > > #define MAX_CFTYPE_NAME 64 > > @@ -486,6 +520,15 @@ struct cgroup_scanner { > void *data; > }; > > +/* > + * See the comment above CGRP_ROOT_SANE_BEHAVIOR for details. This > + * function can be c
Re: [PATCH 4/4] memcg: force use_hierarchy if sane_behavior
Quoting Tejun Heo (t...@kernel.org): > Turn on use_hierarchy by default if sane_behavior is specified and > don't create .use_hierarchy file. > > It is debatable whether to remove .use_hierarchy file or make it ro as > the former could make transition easier in certain cases; however, the > behavior changes which will be gated by sane_behavior are intensive > including changing basic meaning of certain control knobs in a few > controllers and I don't really think keeping this piece would make > things easier in any noticeable way, so let's remove it. > > Signed-off-by: Tejun Heo Acked-by: Serge E. Hallyn > Cc: Michal Hocko > Cc: KAMEZAWA Hiroyuki > --- > include/linux/cgroup.h | 3 +++ > mm/memcontrol.c| 13 + > 2 files changed, 16 insertions(+) > > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > index 9c300ad..c562e33 100644 > --- a/include/linux/cgroup.h > +++ b/include/linux/cgroup.h > @@ -269,6 +269,9 @@ enum { >* >* - Remount is disallowed. >* > + * - memcg: use_hierarchy is on by default and the cgroup file for > + * the flag is not created. > + * >* The followings are planned changes. >* >* - release_agent will be disallowed once replacement notification > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 9715c0c..a651131 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -5814,6 +5814,7 @@ static struct cftype mem_cgroup_files[] = { > }, > { > .name = "use_hierarchy", > + .flags = CFTYPE_INSANE, > .write_u64 = mem_cgroup_hierarchy_write, > .read_u64 = mem_cgroup_hierarchy_read, > }, > @@ -6784,6 +6785,17 @@ static void mem_cgroup_move_task(struct cgroup *cont, > } > #endif > > +/* > + * Cgroup retains root cgroups across [un]mount cycles making it necessary > + * to verify sane_behavior flag on each mount attempt. 
> + */ > +static void mem_cgroup_bind(struct cgroup *root) > +{ > + /* use_hierarchy is forced with sane_behavior */ > + if (cgroup_sane_behavior(root)) > + mem_cgroup_from_cont(root)->use_hierarchy = true; > +} > + > struct cgroup_subsys mem_cgroup_subsys = { > .name = "memory", > .subsys_id = mem_cgroup_subsys_id, > @@ -6794,6 +6806,7 @@ struct cgroup_subsys mem_cgroup_subsys = { > .can_attach = mem_cgroup_can_attach, > .cancel_attach = mem_cgroup_cancel_attach, > .attach = mem_cgroup_move_task, > + .bind = mem_cgroup_bind, > .base_cftypes = mem_cgroup_files, > .early_init = 0, > .use_id = 1, > -- > 1.8.1.4 > > ___ > Containers mailing list > contain...@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/containers -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] memcg: force use_hierarchy if sane_behavior
Quoting Tejun Heo (t...@kernel.org):
> Turn on use_hierarchy by default if sane_behavior is specified and
> don't create .use_hierarchy file.
>
> It is debatable whether to remove .use_hierarchy file or make it ro as
> the former could make transition easier in certain cases; however, the
> behavior changes which will be gated by sane_behavior are intensive
> including changing basic meaning of certain control knobs in a few
> controllers and I don't really think keeping this piece would make
> things easier in any noticeable way, so let's remove it.

Hi Tejun,

this actually reminds me of something that's been on my todo list to report for some time, but I haven't had time to find the source of the bug... And maybe it's already been reported... but If I do

	cd /sys/fs/cgroup/memory
	mkdir b
	cd b
	echo 1 > memory.use_hierarchy
	echo 5000 > memory.limit_in_bytes
	cat memory.limit_in_bytes
	8192
	mkdir c
	cd c
	cat memory.use_hierarchy
	1
	cat memory.limit_in_bytes
	9223372036854775807
	echo $$ > tasks
	bash

So it seems the hierarchy is being enforced, but not reported in child limit_in_bytes files.

(Last tested tonight on 3.8.0-17-generic #27-Ubuntu fwiw)

-serge
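The two numbers in the session above (5000 written but 8192 read back, and a child that reports the maximum while still being constrained) fit a simple model. The sketch below is that model only, not the memcg implementation: struct memcg, page_align() and effective_limit() are hypothetical names, and the 4 KiB page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages, as on x86 */

/* Hypothetical model of a memory cgroup: its own limit and its parent. */
struct memcg {
	unsigned long long limit;	/* what memory.limit_in_bytes reports */
	const struct memcg *parent;
};

/* Round a requested limit up to a whole number of pages. */
static unsigned long long page_align(unsigned long long bytes)
{
	return (bytes + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
}

/*
 * With use_hierarchy set, the limit that actually binds is the smallest
 * one on the path to the root, whatever the group's own file reports.
 */
static unsigned long long effective_limit(const struct memcg *cg)
{
	unsigned long long min;

	assert(cg);
	min = cg->limit;
	for (cg = cg->parent; cg; cg = cg->parent)
		if (cg->limit < min)
			min = cg->limit;
	return min;
}
```

In this model, group b stores page_align(5000), which is 8192, and group c keeps its own limit of 9223372036854775807 while effective_limit() for c is 8192. That matches the observation that the hierarchy is enforced but not reported in the child's limit_in_bytes file.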
Linux 3.9-rc7
Another week, another -rc. This is mostly random one-liners, with a few slightly larger driver fixes.

The most interesting (to me, probably to nobody else) fix is a fix for a rather subtle TLB invalidate bug that only hits 32-bit PAE due to the weird way that works. Even then it only hits you if you have some particularly insane mapping patterns, but we *suspect* that this one might be the cause behind google chrome having triggered bugs like

    chrome: Corrupted page table at address 34a03000
    *pdpt = *pde =
    Bad pagetable: 000f [#1] PREEMPT SMP

however, the problem is so rare that we haven't been able to verify that this really fixes it.

That said, this bug is much more common (and by "much more common" I mean "still basically impossible to hit unless you were really unlucky") on newer machines that have bigger TLB's and that could happily have run in 64-bit mode without the disgusting abortion that is x86 PAE, and a small part of me feels that anybody who hit this problem on such a machine probably got whatever they deserved.

But if you've seen messages like this, and you still run PAE, give the new -rc a try. The rest of the fixes are probably more relevant to most people, but hey, the PAE one tickles my fancy.
Anyway, go out and test regardless of the PAE issue,

             Linus

---

Al Viro (3):
      ecryptfs: close rmmod race
      procfs: add proc_remove_subtree()
      palinfo fixes

Alban Bedel (1):
      ASoC: wm8903: Fix the bypass to HP/LINEOUT when no DAC or ADC is running

Alex Williamson (1):
      vfio-pci: Fix possible integer overflow

Alexandre Belloni (1):
      gpio: pca953x: fix irq_domain_add_simple usage

Alexey Khoroshilov (1):
      tty: mxser: fix cycle termination condition in mxser_probe() and mxser_module_init()

Alexey Pelykh (1):
      OMAP/serial: Revert bad fix of Rx FIFO threshold granularity

Andrea Arcangeli (2):
      x86/mm/cpa: Convert noop to functional fix
      x86/mm/cpa/selftest: Fix false positive in CPA self test

Andrey Vagin (1):
      mnt: release locks on error path in do_loopback

Arnd Bergmann (1):
      block: avoid using uninitialized value in from queue_var_store

Artem Savkov (1):
      cfg80211: sched_scan_mtx lock in cfg80211_conn_work()

Arun Easi (1):
      [SCSI] qla2xxx: Fix crash during firmware dump procedure.

Asai Thambi S P (3):
      mtip32xx: recovery from command timeout
      mtip32xx: return 0 from pci probe in case of rebuild
      mtip32xx: Add debugfs entry device_status

Asias He (7):
      tcm_vhost: Use ACCESS_ONCE for vs->vs_tpg[target] access
      tcm_vhost: Use vq->private_data to indicate if the endpoint is setup
      tcm_vhost: Initialize vq->last_used_idx when set endpoint
      tcm_vhost: Remove double check of response
      tcm_vhost: Fix tv_cmd leak in vhost_scsi_handle_vq
      tcm_vhost: Add vhost_scsi_send_bad_target() helper
      tcm_vhost: Send bad target to guest when cmd fails

Bing Zhao (1):
      mwifiex: complete last internal scan

Boris Ostrovsky (2):
      x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
      x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set

Brian King (1):
      [SCSI] ibmvscsi: Fix slave_configure deadlock

Calvin Owens (1):
      drm/nouveau: fix unconditional return waiting on memory

Charles Keepax (1):
      ASoC: compress: Cancel delayed power down if needed

Chen Gang (3):
      perf: Fix strncpy() use, always make sure it's NUL terminated
      perf: Fix strncpy() use, use strlcpy() instead of strncpy()
      ftrace: Fix strncpy() use, use strlcpy() instead of strncpy()

Chris Metcalf (1):
      tile: comment assumption about __insn_mtspr for

Christian Ruppert (1):
      ARC: Add implicit compiler barrier to raw_local_irq* functions

Christoph Paasch (1):
      ipv6/tcp: Stop processing ICMPv6 redirect messages

Christopher Harvey (1):
      drm/mgag200: Index 24 in extended CRTC registers is 24 in hex, not decimal.

Chuck Lever (1):
      SUNRPC: Remove extra xprt_put()

Daniel Vetter (1):
      drm/fb-helper: Fix locking in drm_fb_helper_hotplug_event

Dave Airlie (1):
      udl: handle EDID failure properly.

Dave Hansen (1):
      x86-32: Fix possible incomplete TLB invalidate with PAE pagetables

David Woodhouse (1):
      libata: fix DMA to stack in reading devslp_timing parameters

Dirk Behme (1):
      ARM i.MX6: Fix ldb_di clock selection

Dirk Brandewie (1):
      cpufreq / intel_pstate: Set timer timeout correctly

Dmitry Tarnyagin (1):
      remoteproc/ste: fix memory leak on shutdown

Eldad Zack (1):
      ALSA: usb-audio: fix endianness bug in snd_nativeinstruments_*

Eric Dumazet (1):
      selinux: add a skb_owned_by() hook

Franky Lin (1):
      brcmfmac: do not proceed if fail to download nvram to dongle

Gabor Juhos (1):
      rt2x00: rt2x00pci: fix build error on Ralink RT3x5x SoCs

Greg Ungerer (1):
      m68k: define a local gpio_request_one() function
Re: [PATCHv3 1/3] thermal: introduce thermal_zone_get_zone_by_name helper function
On Fri, 2013-04-05 at 08:32 -0400, Eduardo Valentin wrote: > This patch adds a helper function to get a reference of > a thermal zone, based on the zone type name. > > It will perform a zone name lookup and return a reference > to a thermal zone device that matches the name requested. > In case the zone is not found or when several zones match > same name or if the required parameters are invalid, it will return > the corresponding error code (ERR_PTR). > > Cc: Durgadoss R > Signed-off-by: Eduardo Valentin refreshed the patch to modify drivers/thermal/thermal_core.c instead of drivers/thermal/thermal_sys.c and applied to thermal -next. thanks, rui > --- > drivers/thermal/thermal_sys.c | 38 ++ > include/linux/thermal.h |1 + > 2 files changed, 39 insertions(+), 0 deletions(-) > > diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c > index 5bd95d4..e9b636b 100644 > --- a/drivers/thermal/thermal_sys.c > +++ b/drivers/thermal/thermal_sys.c > @@ -1790,6 +1790,44 @@ void thermal_zone_device_unregister(struct > thermal_zone_device *tz) > } > EXPORT_SYMBOL_GPL(thermal_zone_device_unregister); > > +/** > + * thermal_zone_get_zone_by_name() - search for a zone and returns its ref > + * @name: thermal zone name to fetch the temperature > + * > + * When only one zone is found with the passed name, returns a reference to > it. > + * > + * Return: On success returns a reference to an unique thermal zone with > + * matching name equals to @name, an ERR_PTR otherwise (-EINVAL for invalid > + * paramenters, -ENODEV for not found and -EEXIST for multiple matches). 
> + */ > +struct thermal_zone_device *thermal_zone_get_zone_by_name(const char *name) > +{ > + struct thermal_zone_device *pos = NULL, *ref = ERR_PTR(-EINVAL); > + unsigned int found = 0; > + > + if (!name) > + goto exit; > + > + mutex_lock(&thermal_list_lock); > + list_for_each_entry(pos, &thermal_tz_list, node) > + if (!strnicmp(name, pos->type, THERMAL_NAME_LENGTH)) { > + found++; > + ref = pos; > + } > + mutex_unlock(&thermal_list_lock); > + > + /* nothing has been found, thus an error code for it */ > + if (found == 0) > + ref = ERR_PTR(-ENODEV); > + else if (found > 1) > + /* Success only when an unique zone is found */ > + ref = ERR_PTR(-EEXIST); > + > +exit: > + return ref; > +} > +EXPORT_SYMBOL_GPL(thermal_zone_get_zone_by_name); > + > #ifdef CONFIG_NET > static struct genl_family thermal_event_genl_family = { > .id = GENL_ID_GENERATE, > diff --git a/include/linux/thermal.h b/include/linux/thermal.h > index 542a39c..0cf9eb5 100644 > --- a/include/linux/thermal.h > +++ b/include/linux/thermal.h > @@ -237,6 +237,7 @@ void thermal_zone_device_update(struct > thermal_zone_device *); > struct thermal_cooling_device *thermal_cooling_device_register(char *, void > *, > const struct thermal_cooling_device_ops *); > void thermal_cooling_device_unregister(struct thermal_cooling_device *); > +struct thermal_zone_device *thermal_zone_get_zone_by_name(const char *name); > > int thermal_zone_trend_get(struct thermal_zone_device *, int); > struct thermal_instance *thermal_instance_get(struct thermal_zone_device *, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
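The lookup rule in the quoted thermal_zone_get_zone_by_name() (succeed only when exactly one zone matches, otherwise report why it failed) can be sketched in userspace. The assumptions are loud ones: struct zone, zone_by_name() and NAME_LEN are hypothetical stand-ins, a plain array replaces the locked kernel list, POSIX strncasecmp() replaces the kernel's strnicmp(), and an int return plus an out parameter replaces the ERR_PTR convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <strings.h>

#define NAME_LEN 20	/* arbitrary stand-in for THERMAL_NAME_LENGTH */

/* Hypothetical zone record; only the type name matters here. */
struct zone {
	char type[NAME_LEN];
};

/*
 * Returns 0 and stores the zone in *out when exactly one entry matches
 * @name (case-insensitively); -EINVAL for bad arguments, -ENODEV when
 * nothing matches, -EEXIST when several entries match.
 */
static int zone_by_name(const struct zone *zones, size_t n,
			const char *name, const struct zone **out)
{
	const struct zone *ref = NULL;
	size_t i, found = 0;

	if (!zones || !name || !out)
		return -EINVAL;

	/* Keep scanning after the first hit so duplicates are detected. */
	for (i = 0; i < n; i++) {
		if (!strncasecmp(name, zones[i].type, NAME_LEN)) {
			found++;
			ref = &zones[i];
		}
	}
	assert(found == 0 || ref != NULL);

	if (found == 0)
		return -ENODEV;
	if (found > 1)
		return -EEXIST;

	*out = ref;
	return 0;
}
```

The design point carried over from the patch is that the scan does not stop at the first match: an ambiguous name is treated as an error instead of silently returning whichever zone happens to be registered first.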
Re: [PATCHv3 2/3] thermal: expose thermal_zone_get_temp API
On Fri, 2013-04-05 at 08:32 -0400, Eduardo Valentin wrote: > This patch exports the thermal_zone_get_temp API so that driver > writers can fetch temperature of thermal zones managed by other > drivers. > > Acked-by: Durgadoss R > Signed-off-by: Eduardo Valentin refreshed the patch to modify drivers/thermal/thermal_core.c instead of drivers/thermal/thermal_sys.c and applied to thermal -next. thanks, rui > --- > drivers/thermal/thermal_sys.c | 20 +--- > include/linux/thermal.h |1 + > 2 files changed, 18 insertions(+), 3 deletions(-) > > diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c > index e9b636b..83bfa0d 100644 > --- a/drivers/thermal/thermal_sys.c > +++ b/drivers/thermal/thermal_sys.c > @@ -371,16 +371,28 @@ static void handle_thermal_trip(struct > thermal_zone_device *tz, int trip) > monitor_thermal_zone(tz); > } > > -static int thermal_zone_get_temp(struct thermal_zone_device *tz, > - unsigned long *temp) > +/** > + * thermal_zone_get_temp() - returns its the temperature of thermal zone > + * @tz: a valid pointer to a struct thermal_zone_device > + * @temp: a valid pointer to where to store the resulting temperature. > + * > + * When a valid thermal zone reference is passed, it will fetch its > + * temperature and fill @temp. 
> + * > + * Return: On success returns 0, an error code otherwise > + */ > +int thermal_zone_get_temp(struct thermal_zone_device *tz, unsigned long > *temp) > { > - int ret = 0; > + int ret = -EINVAL; > #ifdef CONFIG_THERMAL_EMULATION > int count; > unsigned long crit_temp = -1UL; > enum thermal_trip_type type; > #endif > > + if (IS_ERR_OR_NULL(tz)) > + goto exit; > + > mutex_lock(&tz->lock); > > ret = tz->ops->get_temp(tz, temp); > @@ -404,8 +416,10 @@ static int thermal_zone_get_temp(struct > thermal_zone_device *tz, > skip_emul: > #endif > mutex_unlock(&tz->lock); > +exit: > return ret; > } > +EXPORT_SYMBOL_GPL(thermal_zone_get_temp); > > static void update_temperature(struct thermal_zone_device *tz) > { > diff --git a/include/linux/thermal.h b/include/linux/thermal.h > index 0cf9eb5..8eea86c 100644 > --- a/include/linux/thermal.h > +++ b/include/linux/thermal.h > @@ -238,6 +238,7 @@ struct thermal_cooling_device > *thermal_cooling_device_register(char *, void *, > const struct thermal_cooling_device_ops *); > void thermal_cooling_device_unregister(struct thermal_cooling_device *); > struct thermal_zone_device *thermal_zone_get_zone_by_name(const char *name); > +int thermal_zone_get_temp(struct thermal_zone_device *tz, unsigned long > *temp); > > int thermal_zone_trend_get(struct thermal_zone_device *, int); > struct thermal_instance *thermal_instance_get(struct thermal_zone_device *, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
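The IS_ERR_OR_NULL(tz) guard added in the patch above matters because the lookup helper from patch 1/3 hands back an error-encoded pointer on failure rather than NULL. The sketch below imitates that convention in userspace so the interaction can be seen in isolation. Everything in it is a stand-in: err_ptr(), ptr_err(), is_err() and is_err_or_null() only mimic the kernel's ERR_PTR helpers, and struct zone with zone_get_temp() is a hypothetical reduction of the real function (no locking, no emulation support).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer, and decode it again. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static bool is_err_or_null(const void *ptr)
{
	return !ptr || is_err(ptr);
}

/* Hypothetical zone: just a stored temperature. */
struct zone {
	unsigned long temp;
};

/* Rejects NULL and error-encoded zones before touching the object. */
static int zone_get_temp(const struct zone *tz, unsigned long *temp)
{
	assert(temp);

	if (is_err_or_null(tz))
		return -EINVAL;

	*temp = tz->temp;
	return 0;
}
```

A caller can therefore pass the result of a failed lookup straight through: zone_get_temp(err_ptr(-ENODEV), &t) returns -EINVAL instead of dereferencing a bogus address.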
Re: [PATCHv3 3/3] staging: ti-soc-thermal: remove external heat while extrapolating hotspot
On Fri, 2013-04-05 at 08:32 -0400, Eduardo Valentin wrote:
> For boards that provide a PCB sensor close to SoC junction
> temperature, it is possible to remove the cumulative heat
> reported by the SoC temperature sensor.
>
> This patch changes the extrapolation computation to consider
> an external sensor in the extrapolation equations.
>
> Signed-off-by: Eduardo Valentin

hmm, who should take this patch?

thanks,
rui

> ---
>  drivers/staging/ti-soc-thermal/ti-thermal-common.c | 30 +--
>  1 files changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/staging/ti-soc-thermal/ti-thermal-common.c b/drivers/staging/ti-soc-thermal/ti-thermal-common.c
> index 231c549..780368b 100644
> --- a/drivers/staging/ti-soc-thermal/ti-thermal-common.c
> +++ b/drivers/staging/ti-soc-thermal/ti-thermal-common.c
> @@ -38,6 +38,7 @@
>  /* common data structures */
>  struct ti_thermal_data {
>  	struct thermal_zone_device *ti_thermal;
> +	struct thermal_zone_device *pcb_tz;
>  	struct thermal_cooling_device *cool_dev;
>  	struct ti_bandgap *bgp;
>  	enum thermal_device_mode mode;
> @@ -77,10 +78,12 @@ static inline int ti_thermal_hotspot_temperature(int t, int s, int c)
>  static inline int ti_thermal_get_temp(struct thermal_zone_device *thermal,
>  				      unsigned long *temp)
>  {
> +	struct thermal_zone_device *pcb_tz = NULL;
>  	struct ti_thermal_data *data = thermal->devdata;
>  	struct ti_bandgap *bgp;
>  	const struct ti_temp_sensor *s;
> -	int ret, tmp, pcb_temp, slope, constant;
> +	int ret, tmp, slope, constant;
> +	unsigned long pcb_temp;
>
>  	if (!data)
>  		return 0;
> @@ -92,16 +95,22 @@ static inline int ti_thermal_get_temp(struct thermal_zone_device *thermal,
>  	if (ret)
>  		return ret;
>
> -	pcb_temp = 0;
> -	/* TODO: Introduce pcb temperature lookup */
> +	/* Default constants */
> +	slope = s->slope;
> +	constant = s->constant;
> +
> +	pcb_tz = data->pcb_tz;
>  	/* In case pcb zone is available, use the extrapolation rule with it */
> -	if (pcb_temp) {
> -		tmp -= pcb_temp;
> -		slope = s->slope_pcb;
> -		constant = s->constant_pcb;
> -	} else {
> -		slope = s->slope;
> -		constant = s->constant;
> +	if (!IS_ERR_OR_NULL(pcb_tz)) {
> +		ret = thermal_zone_get_temp(pcb_tz, &pcb_temp);
> +		if (!ret) {
> +			tmp -= pcb_temp; /* got a valid PCB temp */
> +			slope = s->slope_pcb;
> +			constant = s->constant_pcb;
> +		} else {
> +			dev_err(bgp->dev,
> +				"Failed to read PCB state. Using defaults\n");
> +		}
>  	}
>  	*temp = ti_thermal_hotspot_temperature(tmp, slope, constant);
>
> @@ -248,6 +257,7 @@ static struct ti_thermal_data
>  	data->sensor_id = id;
>  	data->bgp = bgp;
>  	data->mode = THERMAL_DEVICE_ENABLED;
> +	data->pcb_tz = thermal_zone_get_zone_by_name("pcb");
>  	INIT_WORK(&data->thermal_wq, ti_thermal_work);
>
>  	return data;
Re: [PATCH v2] backlight: platform_lcd: introduce probe callback
On Friday, April 12, 2013 5:25 AM, Andrew Bresticker wrote:
>
> Platform LCD devices may need to do some device-specific
> initialization before they can be used (regulator or GPIO setup,
> for example), but currently the driver does not support any way of
> doing this. This patch adds a probe() callback to plat_lcd_data
> which platform LCD devices can set to indicate that device-specific
> initialization is needed.
>
> Signed-off-by: Andrew Bresticker

CC'ed Andrew Morton,

It looks good.

Acked-by: Jingoo Han

Best regards,
Jingoo Han

> ---
>  drivers/video/backlight/platform_lcd.c | 6 ++
>  include/video/platform_lcd.h           | 1 +
>  2 files changed, 7 insertions(+)
>
> diff --git a/drivers/video/backlight/platform_lcd.c b/drivers/video/backlight/platform_lcd.c
> index 17a6b83..f46180e 100644
> --- a/drivers/video/backlight/platform_lcd.c
> +++ b/drivers/video/backlight/platform_lcd.c
> @@ -86,6 +86,12 @@ static int platform_lcd_probe(struct platform_device *pdev)
>  		return -EINVAL;
>  	}
>
> +	if (pdata->probe) {
> +		err = pdata->probe(pdata);
> +		if (err)
> +			return err;
> +	}
> +
>  	plcd = devm_kzalloc(&pdev->dev, sizeof(struct platform_lcd),
>  			    GFP_KERNEL);
>  	if (!plcd) {
> diff --git a/include/video/platform_lcd.h b/include/video/platform_lcd.h
> index ad3bdfe..23864b2 100644
> --- a/include/video/platform_lcd.h
> +++ b/include/video/platform_lcd.h
> @@ -15,6 +15,7 @@ struct plat_lcd_data;
>  struct fb_info;
>
>  struct plat_lcd_data {
> +	int	(*probe)(struct plat_lcd_data *);
>  	void	(*set_power)(struct plat_lcd_data *, unsigned int power);
>  	int	(*match_fb)(struct plat_lcd_data *, struct fb_info *);
>  };
> --
> 1.8.1.3
>
Re: [PATCH 1/2] hfs/hfsplus: Convert dprint to hfs_dbg
On Mon, 2013-04-15 at 01:53 +0100, Hin-Tak Leung wrote:
> --- On Mon, 8/4/13, Joe Perches wrote:
> > Use a more current logging style.
[]
> I have been sitting on a patch which changes this part of the code to dynamic
> debugging, and it is much simplier. Just:
> #define dprint(flg, fmt, args...) \
> -	if (flg & DBG_MASK) \
> -		printk(fmt , ## args)
> +	pr_debug(fmt , ## args)

This change wouldn't work well as it would make a mess
of output that uses no prefix (ie: emits at KERN_DEFAULT)
with output that uses KERN_DEBUG

That's the reason for _dbg and _dbg_cont.
[PATCH mainline] btrfs: fix minor typo in comment
In the comment describing the sync_writers field of the btrfs_inode struct,
"fsyncing" was misspelled "fsycing."

Signed-off-by: Nathaniel Yazdani
---
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index d9b97d4..08b286b 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -93,7 +93,7 @@ struct btrfs_inode {

 	unsigned long runtime_flags;

-	/* Keep track of who's O_SYNC/fsycing currently */
+	/* Keep track of who's O_SYNC/fsyncing currently */
 	atomic_t sync_writers;

 	/* full 64 bit generation number, struct vfs_inode doesn't have a big
Re: [PATCH] process cputimer is moving faster than its corresponding clock
On Fri, 2013-04-12 at 11:16 +0200, Peter Zijlstra wrote:
> On Wed, 2013-04-10 at 11:48 -0400, Olivier Langlois wrote:
> > Please explain how expensive it is. All I am seeing is a couple of
> > additions.
>
> Let me start with this, since your earlier argument also refers to
> this.
>
> So yes it does look simple and straight fwd, only one addition. However
> its an atomic operation across all threads of the same process. Imagine
> a single process with 512 threads, all running on a separate cpu.
>

Peter,

It now makes perfect sense. Thank you for your explanation. It is
showing me an aspect that I did overlook.

Greetings,
Re: [PATCH 1/2] hfs/hfsplus: Convert dprint to hfs_dbg
--- On Mon, 15/4/13, Joe Perches wrote:
> On Mon, 2013-04-15 at 01:53 +0100, Hin-Tak Leung wrote:
> > --- On Mon, 8/4/13, Joe Perches wrote:
> > > Use a more current logging style.
> []
> > I have been sitting on a patch which changes this part
> > of the code to dynamic debugging, and it is much simplier.
> > Just:
> > #define dprint(flg, fmt, args...) \
> > -	if (flg & DBG_MASK) \
> > -		printk(fmt , ## args)
> > +	pr_debug(fmt , ## args)
>
> This change wouldn't work well as it would make a mess
> of output that uses no prefix (ie: emits at KERN_DEFAULT)
> with output that uses KERN_DEBUG
>
> That's the reason for _dbg and _dbg_cont.

Hmm, I don't get it. Is there any *existing* use of dprint in the hfsplus
code which is affected by your comment? Or is this another general
stylistic comment, i.e. "this does not work in general"?
Re: [PATCH v5 1/4] x86, kdump: Set crashkernel_low automatically
On 04/11/2013 11:54 PM, Yinghai Lu wrote:
> +	/*
> +	 * two parts from lib/swiotlb.c:
> +	 * swiotlb size: user specified with swiotlb= or default.
> +	 * swiotlb overflow buffer: now is hardcoded to 32k,
> +	 * round to 8M to cover more others.
> +	 */

This comment is incomprehensible. "Cover more others"?

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
Re: [PATCH tip/core/rcu 6/7] rcu: Drive quiescent-state-forcing delay from HZ
On Fri, Apr 12, 2013 at 11:38:04PM -0700, Paul E. McKenney wrote:
> On Fri, Apr 12, 2013 at 04:54:02PM -0700, Josh Triplett wrote:
> > On Fri, Apr 12, 2013 at 04:19:13PM -0700, Paul E. McKenney wrote:
> > > From: "Paul E. McKenney"
> > >
> > > Systems with HZ=100 can have slow bootup times due to the default
> > > three-jiffy delays between quiescent-state forcing attempts. This
> > > commit therefore auto-tunes the RCU_JIFFIES_TILL_FORCE_QS value based
> > > on the value of HZ. However, this would break very large systems that
> > > require more time between quiescent-state forcing attempts. This
> > > commit therefore also ups the default delay by one jiffy for each
> > > 256 CPUs that might be on the system (based off of nr_cpu_ids at
> > > runtime, -not- NR_CPUS at build time).
> > >
> > > Reported-by: Paul Mackerras
> > > Signed-off-by: Paul E. McKenney
> >
> > Something seems very wrong if RCU regularly hits the fqs code during
> > boot; feels like there's some more straightforward solution we're
> > missing. What causes these CPUs to fall under RCU's scrutiny during
> > boot yet not actually hit the RCU codepaths naturally?
>
> The problem is that they are running HZ=100, so that RCU will often
> take 30-60 milliseconds per grace period. At that point, you only
> need 16-30 grace periods to chew up a full second, so it is not all
> that hard to eat up the additional 8-12 seconds of boot time that
> they were seeing. IIRC, UP boot was costing them 4 seconds.

I added some instrumentation, which counted 202 calls to
synchronize_sched() during boot (Fedora 17 minimal install +
development tools) with a 3.8.0 kernel on a 4-cpu KVM virtual machine
on a POWER7. Without this patch, those 202 calls take up a total of
4.32 seconds; with it, they take up 3.6 seconds. The kernel is
compiled with HZ=100 and NR_CPUS=1024, like the standard Fedora
kernel.

I suspect a lot of the calls are in udevd and related processes.
Interestingly there were no calls to synchronize_rcu_bh or
synchronize_sched_expedited.

Paul.
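[Annotation] The figures quoted in this exchange can be checked with a few lines of arithmetic. The helpers below are a minimal user-space sketch, not kernel code; the function names are made up for illustration. They only restate the numbers from the thread: HZ=100, grace periods of 30-60 ms, and 202 synchronize_sched() calls taking 4.32 s without the patch and 3.6 s with it.

```c
#include <assert.h>

/* Length of one jiffy in milliseconds for a given tick rate (HZ). */
double jiffy_ms(int hz)
{
	assert(hz > 0);
	return 1000.0 / hz;
}

/* How many grace periods of the given length fit into one second. */
double grace_periods_per_second(double gp_ms)
{
	assert(gp_ms > 0.0);
	return 1000.0 / gp_ms;
}

/* Average cost of one call in milliseconds, from a total and a count. */
double avg_call_ms(double total_seconds, int calls)
{
	assert(calls > 0);
	return total_seconds * 1000.0 / calls;
}
```

With HZ=100 a jiffy is 10 ms, so the default three-jiffy delay is 30 ms. Grace periods of 60 ms and 30 ms give roughly 17 and 33 per second, which is in line with the rough "16-30" quoted above. avg_call_ms(4.32, 202) comes to about 21.4 ms per call and avg_call_ms(3.6, 202) to about 17.8 ms, so the measured saving is on the order of 3.6 ms per synchronize_sched() call.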
Re: [PATCH 1/2] hfs/hfsplus: Convert dprint to hfs_dbg
On Mon, 2013-04-15 at 02:56 +0100, Hin-Tak Leung wrote:
> --- On Mon, 15/4/13, Joe Perches wrote:
> > On Mon, 2013-04-15 at 01:53 +0100, Hin-Tak Leung wrote:
> > > --- On Mon, 8/4/13, Joe Perches wrote:
> > > > Use a more current logging style.
> > []
> > > I have been sitting on a patch which changes this part
> > > of the code to dynamic debugging, and it is much simplier.
[]
> > This change wouldn't work well as it would make a mess
> > of output that uses no prefix (ie: emits at KERN_DEFAULT)
> > with output that uses KERN_DEBUG
> >
> > That's the reason for _dbg and _dbg_cont.
>
> Hmm, I don't get it. Is there any *existing* use of dprint
> in the hfplus code which is affected by your comment?

Code like this currently prints out on a single line
at KERN_DEFAULT.

@@ -138,16 +138,16 @@ void hfs_bnode_dump(struct hfs_bnode *node)
[]
 	for (i = be16_to_cpu(desc.num_recs); i >= 0; off -= 2, i--) {
 		key_off = hfs_bnode_read_u16(node, off);
-		dprint(DBG_BNODE_MOD, " %d", key_off);
+		hfs_dbg_cont(BNODE_MOD, " %d", key_off);

If this dprint() were converted to pr_debug(), it would print
out on multiple lines, one for each read.

That's why it should use a mechanism like dbg_cont.

btw: there is no current pr_debug_cont mechanism.
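[Annotation] The difference Joe describes can be modelled in user space. The sketch below is not the kernel's printk machinery; struct log_buf, log_append(), dump_cont(), dump_records() and count_lines() are hypothetical names invented here. It assumes only what the message states: a continuation-style print appends to the current line, while a record-style print (one pr_debug() per value) ends up on its own line.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Tiny in-memory log, standing in for the kernel log in this model. */
struct log_buf {
	char text[256];
	size_t len;
};

/* Append formatted text; output is silently truncated if it does not fit. */
void log_append(struct log_buf *b, const char *fmt, ...)
{
	size_t avail = sizeof(b->text) - b->len;
	va_list ap;
	int n;

	if (avail <= 1)
		return;
	va_start(ap, fmt);
	n = vsnprintf(b->text + b->len, avail, fmt, ap);
	va_end(ap);
	if (n < 0)
		return;
	b->len = ((size_t)n >= avail) ? sizeof(b->text) - 1 : b->len + (size_t)n;
}

/* Continuation style: all fragments share one line, ended once at the end. */
void dump_cont(struct log_buf *b, const int *offs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		log_append(b, " %d", offs[i]);
	log_append(b, "\n");
}

/* Record style: every call is a complete record and ends its own line. */
void dump_records(struct log_buf *b, const int *offs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		log_append(b, " %d\n", offs[i]);
}

/* Number of completed lines in the log. */
size_t count_lines(const struct log_buf *b)
{
	size_t lines = 0;

	for (size_t i = 0; i < b->len; i++)
		if (b->text[i] == '\n')
			lines++;
	return lines;
}
```

Dumping three key offsets with dump_cont() leaves one line in the buffer; the same three offsets through dump_records() leave three lines, which is the "one for each read" output the message warns about.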
Re: [PATCH 1/2] watchdog: introduce new watchdog AUTOSTART option
On Mon, Apr 15, 2013 at 12:43:31AM +, Kim, Milo wrote:
> Hi Guenter
>
> > I really don't like that idea. It defeats a significant part of the purpose
> > for having a watchdog, which is to prevent user-space hangups.
> >
> > To make this a driver option is even more odd - it forces every user of this
> > driver to use it in-kernel only, and makes /dev/watchdog quite useless.
> >
> > I mean, really, if you have such a watchdog, what is the point of using the
> > watchdog infrastructure in the first place ? Just make it a kernel thread or
> > timer-activated platform code which pings your watchdog once in a while. No
> > need to get the watchdog infrastructure involved in the first place.
> >
> > Am I missing something ?
>
> I wanted to enable the watchdog timer without the watchdog application for
> making sure the system alive.
> However, I think I misunderstood the purpose of the watchdog driver.
> The watchdog is for detecting user-space hangups rather than kernel stall.
> Is it correct? If yes, this patch is totally wrong.
>

Correct. After all, if the kernel stalls, user space will stall as well, so by
covering user space it covers both. Covering kernel alone doesn't help much,
since most of the stalls (at least in my experience) happen in user space.

Guenter
[PATCH v3] tracepoints: prevents null probe from being added
From: Sahara

Somehow tracepoint_entry_add_probe function allows a null probe function.
And, this may lead to unexpected result since the number of probe
functions in an entry can be counted by checking whether probe is null or
not in for-loop.
This patch prevents the null probe from being added.
In tracepoint_entry_remove_probe function, checking probe parameter
within for-loop is moved out for code efficiency leaving the null probe
feature which removes all probe functions in the entry.

Signed-off-by: Sahara
Reviewed-by: Steven Rostedt
Reviewed-by: Mathieu Desnoyers
---
 kernel/tracepoint.c | 21 +
 1 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 0c05a45..29f2654 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -112,7 +112,8 @@ tracepoint_entry_add_probe(struct tracepoint_entry *entry,
 	int nr_probes = 0;
 	struct tracepoint_func *old, *new;

-	WARN_ON(!probe);
+	if (WARN_ON(!probe))
+		return ERR_PTR(-EINVAL);

 	debug_print_probes(entry);
 	old = entry->funcs;
@@ -152,13 +153,18 @@ tracepoint_entry_remove_probe(struct tracepoint_entry *entry,

 	debug_print_probes(entry);
 	/* (N -> M), (N > 1, M >= 0) probes */
-	for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
-		if (!probe ||
-		    (old[nr_probes].func == probe &&
-		     old[nr_probes].data == data))
-			nr_del++;
+	if (probe) {
+		for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
+			if (old[nr_probes].func == probe &&
+			     old[nr_probes].data == data)
+				nr_del++;
+		}
 	}

+	/*
+	 * If probe is NULL, then nr_probes = nr_del = 0, and then the
+	 * entire entry will be removed.
+	 */
 	if (nr_probes - nr_del == 0) {
 		/* N -> 0, (N > 1) */
 		entry->funcs = NULL;
@@ -173,8 +179,7 @@ tracepoint_entry_remove_probe(struct tracepoint_entry *entry,
 		if (new == NULL)
 			return ERR_PTR(-ENOMEM);
 		for (i = 0; old[i].func; i++)
-			if (probe &&
-			    (old[i].func != probe || old[i].data != data))
+			if (old[i].func != probe || old[i].data != data)
 				new[j++] = old[i];
 		new[nr_probes - nr_del].func = NULL;
 		entry->refcount = nr_probes - nr_del;
--
1.7.1
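[Annotation] The counting logic in the removal hunk can be exercised outside the kernel. What follows is a simplified user-space model, not the kernel function: struct probe_func, remaining_after_remove() and the two sample probes are hypothetical stand-ins. It mirrors only the patched behaviour described above, where a NULL probe skips the loop, so nr_probes and nr_del both stay 0 and the whole entry is treated as removed.

```c
#include <stddef.h>

/* Simplified stand-in for struct tracepoint_func: callback plus data. */
struct probe_func {
	void (*func)(void);
	void *data;
};

/* Two dummy callbacks so the model has distinct function pointers. */
void sample_probe_a(void) { }
void sample_probe_b(void) { }

/*
 * Return how many probes would remain after removing (probe, data) from
 * the NULL-terminated array 'old'. A NULL probe means "remove all", so
 * the loop is skipped and the result is 0.
 */
size_t remaining_after_remove(const struct probe_func *old,
			      void (*probe)(void), void *data)
{
	size_t nr_probes = 0, nr_del = 0;

	if (probe) {
		for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
			if (old[nr_probes].func == probe &&
			    old[nr_probes].data == data)
				nr_del++;
		}
	}
	return nr_probes - nr_del;
}
```

For an entry holding three probes, removing one matching (func, data) pair leaves two, removing a pair that is not registered leaves three, and passing a NULL probe yields zero, which corresponds to the "entire entry will be removed" case in the added comment.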
RE: [PATCH 1/2] watchdog: introduce new watchdog AUTOSTART option
> -----Original Message-----
> From: Guenter Roeck [mailto:li...@roeck-us.net]
> Sent: Monday, April 15, 2013 11:07 AM
> To: Kim, Milo
> Cc: w...@iguana.be; linux-watch...@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 1/2] watchdog: introduce new watchdog AUTOSTART option
>
> On Mon, Apr 15, 2013 at 12:43:31AM +, Kim, Milo wrote:
> > Hi Guenter
> >
> > > I really don't like that idea. It defeats a significant part of the purpose
> > > for having a watchdog, which is to prevent user-space hangups.
> > >
> > > To make this a driver option is even more odd - it forces every user of this
> > > driver to use it in-kernel only, and makes /dev/watchdog quite useless.
> > >
> > > I mean, really, if you have such a watchdog, what is the point of using the
> > > watchdog infrastructure in the first place ? Just make it a kernel thread or
> > > timer-activated platform code which pings your watchdog once in a while. No
> > > need to get the watchdog infrastructure involved in the first place.
> > >
> > > Am I missing something ?
> >
> > I wanted to enable the watchdog timer without the watchdog application for
> > making sure the system alive.
> > However, I think I misunderstood the purpose of the watchdog driver.
> > The watchdog is for detecting user-space hangups rather than kernel stall.
> > Is it correct? If yes, this patch is totally wrong.
> >
> Correct. After all, if the kernel stalls, user space will stall as well, so by
> covering user space it covers both. Covering kernel alone doesn't help much,
> since most of the stalls (at least in my experience) happen in user space.

Got it. I nearly spoiled it due to my misunderstanding ;)
Many thanks!

Milo
[Bug fix PATCH] resource: Reusing a resource structure allocated by bootmem
When hot-removing memory that was present at boot time, the following
messages are shown:

[ 296.867031] [ cut here ]
[ 296.922273] kernel BUG at mm/slub.c:3409!
[ 296.970229] invalid opcode: [#1] SMP
[ 297.019453] Modules linked in: ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge stp llc ipmi_devintf ipmi_msghandler sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc vfat fat dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr sg i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core e1000e ptp pps_core tpm_infineon ioatdma dca sr_mod cdrom sd_mod crc_t10dif usb_storage megaraid_sas lpfc scsi_transport_fc scsi_tgt scsi_mod
[ 297.747808] CPU 0
[ 297.769764] Pid: 5091, comm: kworker/0:2 Tainted: G W 3.9.0-rc6+ #15
[ 297.897917] RIP: 0010:[] [] kfree+0x232/0x240
[ 297.988634] RSP: 0018:88084678d968 EFLAGS: 00010246
[ 298.052196] RAX: 00600400 RBX: 8987fea0 RCX:
[ 298.137595] RDX: 8107a5ae RSI: 0001 RDI: 8987fea0
[ 298.222994] RBP: 88084678d998 R08: 8200 R09: 0001
[ 298.308390] R10: R11: R12: 0300
[ 298.393792] R13: ea061fc0 R14: 0303 R15: 0080
[ 298.479190] FS: () GS:88085aa0() knlGS:
[ 298.576030] CS: 0010 DS: ES: CR0: 80050033
[ 298.644791] CR2: 025d3f78 CR3: 01c0c000 CR4: 001407f0
[ 298.730192] DR0: DR1: DR2:
[ 298.815590] DR3: DR6: 0ff0 DR7: 0400
[ 298.900997] Process kworker/0:2 (pid: 5091, threadinfo 88084678c000, task 88083928ca80)
[ 299.005121] Stack:
[ 299.029156] 0303 8987fea0 0300 8987fe90
[ 299.118116] 0303 0080 88084678d9c8 8107a5d4
[ 299.207084] 3000 8987fffb2680 0080 3000
[ 299.296045] Call Trace:
[ 299.325288] [] __release_region+0xd4/0xe0
[ 299.393020] [] __remove_pages+0x52/0x110
[ 299.459707] [] arch_remove_memory+0x89/0xd0
[ 299.529505] [] remove_memory+0xc4/0x100
[ 299.595145] [] acpi_memory_device_remove+0x6d/0xb1
[ 299.672230] [] acpi_device_remove+0x89/0xab
[ 299.742033] [] __device_release_driver+0x7c/0xf0
[ 299.817048] [] device_release_driver+0x2f/0x50
[ 299.889972] [] acpi_bus_device_detach+0x6c/0x70
[ 299.963938] [] acpi_ns_walk_namespace+0x11a/0x250
[ 300.039982] [] ? power_state_show+0x36/0x36
[ 300.109800] [] ? power_state_show+0x36/0x36
[ 300.179612] [] acpi_walk_namespace+0xee/0x137
[ 300.251492] [] acpi_bus_trim+0x33/0x7a
[ 300.316089] [] ? mutex_lock_nested+0x4a/0x60
[ 300.386927] [] acpi_bus_hot_remove_device+0xc4/0x1a1
[ 300.466096] [] acpi_os_execute_deferred+0x27/0x34
[ 300.542137] [] process_one_work+0x1f7/0x590
[ 300.611940] [] ? process_one_work+0x185/0x590
[ 300.683823] [] worker_thread+0x11a/0x370
[ 300.750502] [] ? manage_workers+0x180/0x180
[ 300.820308] [] kthread+0xee/0x100
[ 300.879714] [] ? __lock_release+0x12b/0x190
[ 300.949512] [] ? __init_kthread_worker+0x70/0x70
[ 301.024517] [] ret_from_fork+0x7c/0xb0
[ 301.089135] [] ? __init_kthread_worker+0x70/0x70
[ 301.164138] Code: 89 ef e8 c2 2c fb ff e9 0b ff ff ff 4d 8b 6d 30 e9 5c fe ff ff 4c 89 f1 48 89 da 4c 89 ee 4c 89 e7 e8 03 f9 ff ff e9 ec fe ff ff <0f> 0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec
[ 301.397214] RIP [] kfree+0x232/0x240
[ 301.459855] RSP
[ 301.501675] ---[ end trace 8679967aa8606ed8 ]---

These messages appear because a resource structure that was allocated by
bootmem is released with kfree(). So when we release a resource structure,
we should check whether it was allocated by bootmem. But even if we know
that a resource structure was allocated by bootmem, we cannot free it,
since SLxB cannot handle it. So, in order to reuse such a resource
structure, this patch remembers it in bootmem_resource, as follows:

When releasing a resource structure by free_resource(), free_resource()
checks whether the resource structure was allocated by bootmem. If it was
allocated by bootmem, free_resource() adds it to bootmem_resource. If it
was not allocated by bootmem, free_resource() releases it with kfree().

When getting a new resource structure by get_resource(), get_resource()
checks whether bootmem_resource holds any released resource structures.
If there is a released resource structure, get_resource() returns it. If
there is no released resource structure, get_resource() returns a new
resource structure allocated by kzalloc
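[Annotation] The reuse scheme described above can be sketched in user space. This is a model under loud assumptions, not the patch itself: a static array (boot_pool) stands in for bootmem, an address-range check stands in for however the kernel detects a bootmem allocation, calloc()/free() stand in for kzalloc()/kfree(), and get_resource() takes no arguments here. The names free_resource(), get_resource() and the bootmem_resource idea come from the text; boot_pool, boot_free_list and from_boot_pool() are hypothetical. There is no locking in this sketch, which a real implementation would presumably need.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct resource {
	unsigned long start, end;
	struct resource *sibling;	/* doubles as the free-list link here */
};

/* Stand-in for structures handed out by the boot-time allocator. */
struct resource boot_pool[4];

/* Released boot-time structures waiting for reuse (bootmem_resource). */
static struct resource *boot_free_list;

/* Model-only check: does 'res' live inside the boot-time pool? */
int from_boot_pool(const struct resource *res)
{
	uintptr_t p = (uintptr_t)res;

	return p >= (uintptr_t)boot_pool && p < (uintptr_t)(boot_pool + 4);
}

/* Boot-time structures are parked for reuse; everything else is freed. */
void free_resource(struct resource *res)
{
	if (!res)
		return;
	if (from_boot_pool(res)) {
		res->sibling = boot_free_list;
		boot_free_list = res;
	} else {
		free(res);
	}
}

/* Prefer a parked boot-time structure; otherwise allocate a zeroed one. */
struct resource *get_resource(void)
{
	struct resource *res = boot_free_list;

	if (res) {
		boot_free_list = res->sibling;
		memset(res, 0, sizeof(*res));
		return res;
	}
	return calloc(1, sizeof(*res));
}
```

Releasing &boot_pool[1] and then calling get_resource() hands back the same structure, zeroed; a second get_resource() call finds the list empty and falls through to the heap allocation, which free_resource() later passes to free().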
[ 00/11] 3.0.74-stable review
This is the start of the stable review cycle for the 3.0.74 release.
There are 11 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Apr 17 02:05:34 UTC 2013.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.0.74-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman
    Linux 3.0.74-rc1

Hayes Wang
    r8169: fix auto speed down issue

Linus Torvalds
    mtdchar: fix offset overflow detection

Boris Ostrovsky
    x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal

Samu Kallio
    x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates

Thomas Gleixner
    sched_clock: Prevent 64bit inatomicity on 32bit systems

Nicholas Bellinger
    target: Fix incorrect fallthrough of ALUA Standby/Offline/Transition CDBs

Huacai Chen
    PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()

Namhyung Kim
    tracing: Fix double free when function profile init failed

Alban Bedel
    ASoC: wm8903: Fix the bypass to HP/LINEOUT when no DAC or ADC is running

Dave Hansen
    x86-32, mm: Rip out x86_32 NUMA remapping code

Eldad Zack
    ALSA: usb-audio: fix endianness bug in snd_nativeinstruments_*

-------------

Diffstat:

 Makefile                              |   4 +-
 arch/x86/include/asm/paravirt.h       |   5 +-
 arch/x86/include/asm/paravirt_types.h |   2 +
 arch/x86/kernel/paravirt.c            |  25 +++---
 arch/x86/lguest/boot.c                |   1 +
 arch/x86/mm/fault.c                   |   6 +-
 arch/x86/mm/numa_32.c                 | 161 --
 arch/x86/xen/mmu.c                    |   1 +
 drivers/mtd/mtdchar.c                 |  48 --
 drivers/net/r8169.c                   |  30 ++-
 drivers/target/target_core_alua.c     |   3 +
 kernel/sched_clock.c                  |  26 ++
 kernel/sys.c                          |   3 +-
 kernel/trace/ftrace.c                 |   1 -
 sound/soc/codecs/wm8903.c             |   2 +
 sound/usb/mixer_quirks.c              |   4 +-
 sound/usb/quirks.c                    |   2 +-
 17 files changed, 131 insertions(+), 193 deletions(-)