2.4.2 seems to break loopback and/or mount
Please CC me on replies. I just joined the list and don't want to miss any replies.

I have been running 2.4.1-pre10 for quite some time with no problems. I just upgraded to 2.4.2 and everything seemed to work fine until I ran (as root, of course):

mount -t iso9660 -o loop,ro mycdimage.iso /mnt/cdrom

at which point the mount process hung in an uninterruptible sleep. After that I can no longer successfully issue any other mount commands, including non-loopback mounts. I can mount/unmount regular partitions before mounting anything via loopback.

Any ideas as to what is wrong? The only thing I can think of is that my modutils is v2.3.19, but I doubt that is the cause, as the loop module and other modules load fine.

If anybody has an idea as to what I broke, please let me know. I will upgrade modutils tomorrow and see if the problem goes away while I wait for a possibly more accurate response.

Thank you,
Jeff Wiegley
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
the editing is that you need
We would like to check whether your photos need editing; we can do it for you. Our image editing covers web store photos, jewelry images, beauty and portrait photos, and so on. It includes cut-out and clipping path work, and retouching where needed. We can do a test on your photos: just send us a photo and we will start work on it. Thanks, Jeff Allen
Re: Recent kernel "mount" slow
On Sun, Nov 25, 2012 at 7:23 AM, Jeff Chua wrote:
> On Sun, Nov 25, 2012 at 5:09 AM, Mikulas Patocka wrote:
>> So it's better to slow down mount.
>
> I am quite proud of the linux boot time pitting against other OS. Even
> with 10 partitions. Linux can boot up in just a few seconds, but now
> you're saying that we need to do this semaphore check at boot up. By
> doing so, it's inducing additional 4 seconds during boot up.

By the way, I'm using a pretty fast SSD (Samsung PM830) and fast CPU
(2.8GHz). I wonder if those on slower hard disk or slower CPU, what
kind of degradation would this cause or just the same?

Thanks,
Jeff
Re: Recent kernel "mount" slow
Jens,

Limited access now at Incheon Airport. Will try the patch out when I
arrive.

Thanks,
Jeff

On 11/27/12, Jens Axboe wrote:
> On 2012-11-27 08:38, Jens Axboe wrote:
>> On 2012-11-27 06:57, Jeff Chua wrote:
>>> On Sun, Nov 25, 2012 at 7:23 AM, Jeff Chua
>>> wrote:
>>>> On Sun, Nov 25, 2012 at 5:09 AM, Mikulas Patocka
>>>> wrote:
>>>>> So it's better to slow down mount.
>>>>
>>>> I am quite proud of the linux boot time pitting against other OS. Even
>>>> with 10 partitions. Linux can boot up in just a few seconds, but now
>>>> you're saying that we need to do this semaphore check at boot up. By
>>>> doing so, it's inducing additional 4 seconds during boot up.
>>>
>>> By the way, I'm using a pretty fast SSD (Samsung PM830) and fast CPU
>>> (2.8GHz). I wonder if those on slower hard disk or slower CPU, what
>>> kind of degradation would this cause or just the same?
>>
>> It'd likely be the same slow down time wise, but as a percentage it
>> would appear smaller on a slower disk.
>>
>> Could you please test Mikulas' suggestion of changing
>> synchronize_sched() in include/linux/percpu-rwsem.h to
>> synchronize_sched_expedited()?
>>
>> linux-next also has a re-write of the per-cpu rw sems, out of Andrews
>> tree. It would be a good data point it you could test that, too.
>>
>> In any case, the slow down definitely isn't acceptable. Fixing an
>> obscure issue like block sizes changing while O_DIRECT is in flight
>> definitely does NOT warrant a mount slow down.
>
> Here's Olegs patch, might be easier for you than switching to
> linux-next. Please try that.
>
> From: Oleg Nesterov
> Subject: percpu_rw_semaphore: reimplement to not block the readers
> unnecessarily
>
> Currently the writer does msleep() plus synchronize_sched() 3 times to
> acquire/release the semaphore, and during this time the readers are
> blocked completely. Even if the "write" section was not actually started
> or if it was already finished.
>
> With this patch down_write/up_write does synchronize_sched() twice and
> down_read/up_read are still possible during this time, just they use the
> slow path.
>
> percpu_down_write() first forces the readers to use rw_semaphore and
> increment the "slow" counter to take the lock for reading, then it
> takes that rw_semaphore for writing and blocks the readers.
>
> Also. With this patch the code relies on the documented behaviour of
> synchronize_sched(), it doesn't try to pair synchronize_sched() with
> barrier.
>
> Signed-off-by: Oleg Nesterov
> Reviewed-by: Paul E. McKenney
> Cc: Linus Torvalds
> Cc: Mikulas Patocka
> Cc: Peter Zijlstra
> Cc: Ingo Molnar
> Cc: Srikar Dronamraju
> Cc: Ananth N Mavinakayanahalli
> Cc: Anton Arapov
> Cc: Jens Axboe
> Signed-off-by: Andrew Morton
> ---
>
>  include/linux/percpu-rwsem.h |   85 +++--
>  lib/Makefile                 |    2
>  lib/percpu-rwsem.c           |  123 +
>  3 files changed, 138 insertions(+), 72 deletions(-)
>
> diff -puN
> include/linux/percpu-rwsem.h~percpu_rw_semaphore-reimplement-to-not-block-the-readers-unnecessarily
> include/linux/percpu-rwsem.h
> ---
> a/include/linux/percpu-rwsem.h~percpu_rw_semaphore-reimplement-to-not-block-the-readers-unnecessarily
> +++ a/include/linux/percpu-rwsem.h
> @@ -2,82 +2,25 @@
>  #define _LINUX_PERCPU_RWSEM_H
>
>  #include
> +#include
>  #include
> -#include
> -#include
> +#include
>
>  struct percpu_rw_semaphore {
> -	unsigned __percpu *counters;
> -	bool locked;
> -	struct mutex mtx;
> +	unsigned int __percpu	*fast_read_ctr;
> +	struct mutex		writer_mutex;
> +	struct rw_semaphore	rw_sem;
> +	atomic_t		slow_read_ctr;
> +	wait_queue_head_t	write_waitq;
>  };
>
> -#define light_mb()	barrier()
> -#define heavy_mb()	synchronize_sched()
> +extern void percpu_down_read(struct percpu_rw_semaphore *);
> +extern void percpu_up_read(struct percpu_rw_semaphore *);
>
> -static inline void percpu_down_read(struct percpu_rw_semaphore *p)
> -{
> -	rcu_read_lock_sched();
> -	if (unlikely(p->locked)) {
> -		rcu_read_unlock_sched();
> -		mutex_lock(&p->mtx);
> -		this_cpu_inc(*p->counters);
> -		mutex_unlock(&p->mtx);
> -		return;
> -	}
> -	this_cpu_inc(*p->counters);
> -	rcu_read_unlock_sched();
> -	light_mb();
Re: Recent kernel "mount" slow
On Tue, Nov 27, 2012 at 3:38 PM, Jens Axboe wrote:
> On 2012-11-27 06:57, Jeff Chua wrote:
>> On Sun, Nov 25, 2012 at 7:23 AM, Jeff Chua wrote:
>>> On Sun, Nov 25, 2012 at 5:09 AM, Mikulas Patocka
>>> wrote:
>>>> So it's better to slow down mount.
>>>
>>> I am quite proud of the linux boot time pitting against other OS. Even
>>> with 10 partitions. Linux can boot up in just a few seconds, but now
>>> you're saying that we need to do this semaphore check at boot up. By
>>> doing so, it's inducing additional 4 seconds during boot up.
>>
>> By the way, I'm using a pretty fast SSD (Samsung PM830) and fast CPU
>> (2.8GHz). I wonder if those on slower hard disk or slower CPU, what
>> kind of degradation would this cause or just the same?
>
> It'd likely be the same slow down time wise, but as a percentage it
> would appear smaller on a slower disk.
>
> Could you please test Mikulas' suggestion of changing
> synchronize_sched() in include/linux/percpu-rwsem.h to
> synchronize_sched_expedited()?

Tested. It seems as fast as before, but may be a "tick" slower. Just
perception. I was getting pretty much 0.012s with everything reverted.
With synchronize_sched_expedited(), it seems to be 0.012s ~ 0.013s.
So, it's good.

> linux-next also has a re-write of the per-cpu rw sems, out of Andrews
> tree. It would be a good data point it you could test that, too.

Tested. It's slower. 0.350s. But still faster than 0.500s without the
patch.

# time mount /dev/sda1 /mnt; sync; sync; umount /mnt

So, here's the comparison ...

0.500s 3.7.0-rc7
0.168s 3.7.0-rc2
0.012s 3.6.0
0.013s 3.7.0-rc7 + synchronize_sched_expedited()
0.350s 3.7.0-rc7 + Oleg's patch.

Thanks,
Jeff.
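To put the comparison above in relative terms, here is a small sketch that expresses each reported time as a multiple of the 3.6.0 figure. The numbers are copied from the table in the message; nothing is measured here, and the struct and function names (mount_timing, slowdown_factor, print_slowdowns) are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the timing table quoted in the message. */
struct mount_timing {
	const char *kernel;	/* kernel version or patch description */
	double seconds;		/* time reported for the mount/sync/umount run */
};

/* Figures copied from the comparison; this program measures nothing. */
static const struct mount_timing timings[] = {
	{ "3.7.0-rc7",                                 0.500 },
	{ "3.7.0-rc2",                                 0.168 },
	{ "3.6.0",                                     0.012 },
	{ "3.7.0-rc7 + synchronize_sched_expedited()", 0.013 },
	{ "3.7.0-rc7 + Oleg's patch",                  0.350 },
};

#define NR_TIMINGS (sizeof(timings) / sizeof(timings[0]))
#define BASELINE_SECONDS 0.012	/* the 3.6.0 row */

/* Slowdown of one measurement relative to a baseline time. */
static double slowdown_factor(double seconds, double baseline)
{
	assert(baseline > 0.0);
	return seconds / baseline;
}

/* Print every row together with its slowdown relative to 3.6.0. */
static void print_slowdowns(void)
{
	for (size_t i = 0; i < NR_TIMINGS; i++)
		printf("%-45s %.3fs %6.1fx\n", timings[i].kernel,
		       timings[i].seconds,
		       slowdown_factor(timings[i].seconds, BASELINE_SECONDS));
}
```

Calling print_slowdowns() from a main() lists all five rows; by this measure the unpatched 3.7.0-rc7 figure is roughly 42 times the 3.6.0 one.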
[patch,v3,repost 03/10] scsi: make scsi_alloc_sdev numa-aware
Use the numa node id set in the Scsi_Host to allocate the sdev structure
on the device-local numa node.

Reviewed-by: Bart Van Assche
Signed-off-by: Jeff Moyer
---
 drivers/scsi/scsi_scan.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3e58b22..d91749d 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -232,8 +232,8 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
 	extern void scsi_evt_thread(struct work_struct *work);
 	extern void scsi_requeue_run_queue(struct work_struct *work);
 
-	sdev = kzalloc(sizeof(*sdev) + shost->transportt->device_size,
-		       GFP_ATOMIC);
+	sdev = kzalloc_node(sizeof(*sdev) + shost->transportt->device_size,
+			    GFP_ATOMIC, scsi_host_get_numa_node(shost));
 	if (!sdev)
 		goto out;
 
-- 
1.7.1
[patch,v3,repost 07/10] megaraid_sas: use scsi_host_alloc_node
Signed-off-by: Jeff Moyer
---
 drivers/scsi/megaraid/megaraid_sas_base.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index d2c5366..707a6cd 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -4020,8 +4020,9 @@ megasas_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (megasas_set_dma_mask(pdev))
 		goto fail_set_dma_mask;
 
-	host = scsi_host_alloc(&megasas_template,
-			       sizeof(struct megasas_instance));
+	host = scsi_host_alloc_node(&megasas_template,
+				    sizeof(struct megasas_instance),
+				    dev_to_node(&pdev->dev));
 
 	if (!host) {
 		printk(KERN_DEBUG "megasas: scsi_host_alloc failed\n");
-- 
1.7.1
[patch,v3,repost 06/10] ata: use scsi_host_alloc_node
Acked-by: Jeff Garzik
Signed-off-by: Jeff Moyer
---
 drivers/ata/libata-scsi.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index e3bda07..9d5dd09 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3586,7 +3586,8 @@ int ata_scsi_add_hosts(struct ata_host *host, struct scsi_host_template *sht)
 		struct Scsi_Host *shost;
 
 		rc = -ENOMEM;
-		shost = scsi_host_alloc(sht, sizeof(struct ata_port *));
+		shost = scsi_host_alloc_node(sht, sizeof(struct ata_port *),
+					     dev_to_node(host->dev));
 		if (!shost)
 			goto err_alloc;
 
-- 
1.7.1
[patch,v3,repost 10/10] cciss: use blk_init_queue_node
Signed-off-by: Jeff Moyer
---
 drivers/block/cciss.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index b0f553b..5fe5546 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -1930,7 +1930,8 @@ static void cciss_get_serial_no(ctlr_info_t *h, int logvol,
 static int cciss_add_disk(ctlr_info_t *h, struct gendisk *disk,
 				int drv_index)
 {
-	disk->queue = blk_init_queue(do_cciss_request, &h->lock);
+	disk->queue = blk_init_queue_node(do_cciss_request, &h->lock,
+					  dev_to_node(&h->dev));
 	if (!disk->queue)
 		goto init_queue_failure;
 	sprintf(disk->disk_name, "cciss/c%dd%d", h->ctlr, drv_index);
-- 
1.7.1
[patch,v3,repost 01/10] scsi: add scsi_host_alloc_node
Allow an LLD to specify on which numa node to allocate scsi data
structures. Thanks to Bart Van Assche for the suggestion.

Reviewed-by: Bart Van Assche
Signed-off-by: Jeff Moyer
---
 drivers/scsi/hosts.c     |   13 +++++++++++--
 include/scsi/scsi_host.h |   28 ++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 593085a..06ce602 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -336,16 +336,25 @@ static struct device_type scsi_host_type = {
  **/
 struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 {
+	return scsi_host_alloc_node(sht, privsize, NUMA_NO_NODE);
+}
+EXPORT_SYMBOL(scsi_host_alloc);
+
+struct Scsi_Host *scsi_host_alloc_node(struct scsi_host_template *sht,
+				       int privsize, int node)
+{
 	struct Scsi_Host *shost;
 	gfp_t gfp_mask = GFP_KERNEL;
 
 	if (sht->unchecked_isa_dma && privsize)
 		gfp_mask |= __GFP_DMA;
 
-	shost = kzalloc(sizeof(struct Scsi_Host) + privsize, gfp_mask);
+	shost = kzalloc_node(sizeof(struct Scsi_Host) + privsize,
+			     gfp_mask, node);
 	if (!shost)
 		return NULL;
 
+	scsi_host_set_numa_node(shost, node);
 	shost->host_lock = &shost->default_lock;
 	spin_lock_init(shost->host_lock);
 	shost->shost_state = SHOST_CREATED;
@@ -443,7 +452,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	kfree(shost);
 	return NULL;
 }
-EXPORT_SYMBOL(scsi_host_alloc);
+EXPORT_SYMBOL(scsi_host_alloc_node);
 
 struct Scsi_Host *scsi_register(struct scsi_host_template *sht, int privsize)
 {
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 4908480..438856d 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -732,6 +732,14 @@ struct Scsi_Host {
 	 */
 	struct device *dma_dev;
 
+#ifdef CONFIG_NUMA
+	/*
+	 * Numa node this device is closest to, used for allocating
+	 * data structures locally.
+	 */
+	int numa_node;
+#endif
+
 	/*
 	 * We should ensure that this is aligned, both for better performance
 	 * and also because some compilers (m68k) don't automatically force
@@ -776,6 +784,8 @@ extern int scsi_queue_work(struct Scsi_Host *, struct work_struct *);
 extern void scsi_flush_work(struct Scsi_Host *);
 
 extern struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *, int);
+extern struct Scsi_Host *scsi_host_alloc_node(struct scsi_host_template *,
+					      int, int);
 extern int __must_check scsi_add_host_with_dma(struct Scsi_Host *,
 					       struct device *,
 					       struct device *);
@@ -919,6 +929,24 @@ static inline unsigned char scsi_host_get_guard(struct Scsi_Host *shost)
 	return shost->prot_guard_type;
 }
 
+#ifdef CONFIG_NUMA
+static inline int scsi_host_get_numa_node(struct Scsi_Host *shost)
+{
+	return shost->numa_node;
+}
+
+static inline void scsi_host_set_numa_node(struct Scsi_Host *shost, int node)
+{
+	shost->numa_node = node;
+}
+#else /* CONFIG_NUMA */
+static inline int scsi_host_get_numa_node(struct Scsi_Host *shost)
+{
+	return NUMA_NO_NODE;
+}
+static inline void scsi_host_set_numa_node(struct Scsi_Host *shost, int node) {}
+#endif
+
 /* legacy interfaces */
 extern struct Scsi_Host *scsi_register(struct scsi_host_template *, int);
 extern void scsi_unregister(struct Scsi_Host *);
-- 
1.7.1
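The shape of this patch (keep the old entry point as a thin wrapper that passes "no preference", and record the node in the object for later allocations) can be tried out in userspace. The following is only a sketch of that pattern, not the kernel code: struct host, zalloc_node(), host_alloc_node(), host_alloc() and NO_NODE are hypothetical stand-ins, and the stand-in allocator ignores the node because plain calloc() has no notion of NUMA placement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NO_NODE (-1)	/* hypothetical stand-in for NUMA_NO_NODE */

/* Hypothetical stand-in for struct Scsi_Host. */
struct host {
	int numa_node;		/* node recorded for later allocations */
	size_t privsize;	/* size of the private area after the struct */
};

/*
 * Hypothetical stand-in for kzalloc_node(). Userspace calloc() cannot
 * place memory on a node, so the node argument is deliberately ignored.
 */
static void *zalloc_node(size_t size, int node)
{
	(void)node;
	return calloc(1, size);
}

/* New entry point: the caller says which node the host is closest to. */
static struct host *host_alloc_node(size_t privsize, int node)
{
	struct host *h = zalloc_node(sizeof(*h) + privsize, node);

	if (!h)
		return NULL;
	h->numa_node = node;
	h->privsize = privsize;
	return h;
}

/* Old entry point, kept for existing callers: no node preference. */
static struct host *host_alloc(size_t privsize)
{
	return host_alloc_node(privsize, NO_NODE);
}

/* Accessor used by later allocations that want to stay node-local. */
static int host_get_numa_node(const struct host *h)
{
	assert(h != NULL);
	return h->numa_node;
}
```

Existing callers of host_alloc() keep working unchanged, while a converted driver would call host_alloc_node() with whatever node its device reports.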
[patch,v3,repost 09/10] lpfc: use scsi_host_alloc_node
Acked-By: James Smart
Signed-off-by: Jeff Moyer
---
 drivers/scsi/lpfc/lpfc_init.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 7dc4218..65956d3 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -3051,11 +3051,13 @@ lpfc_create_port(struct lpfc_hba *phba, int instance, struct device *dev)
 	int error = 0;
 
 	if (dev != &phba->pcidev->dev)
-		shost = scsi_host_alloc(&lpfc_vport_template,
-					sizeof(struct lpfc_vport));
+		shost = scsi_host_alloc_node(&lpfc_vport_template,
+					     sizeof(struct lpfc_vport),
+					     dev_to_node(&phba->pcidev->dev));
 	else
-		shost = scsi_host_alloc(&lpfc_template,
-					sizeof(struct lpfc_vport));
+		shost = scsi_host_alloc_node(&lpfc_template,
+					     sizeof(struct lpfc_vport),
+					     dev_to_node(&phba->pcidev->dev));
 	if (!shost)
 		goto out;
 
-- 
1.7.1
[patch,v3,repost 08/10] mpt2sas: use scsi_host_alloc_node
Signed-off-by: Jeff Moyer
---
 drivers/scsi/mpt2sas/mpt2sas_scsih.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index af4e6c4..a4d6b36 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -8011,8 +8011,8 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	struct MPT2SAS_ADAPTER *ioc;
 	struct Scsi_Host *shost;
 
-	shost = scsi_host_alloc(&scsih_driver_template,
-	    sizeof(struct MPT2SAS_ADAPTER));
+	shost = scsi_host_alloc_node(&scsih_driver_template,
+	    sizeof(struct MPT2SAS_ADAPTER), dev_to_node(&pdev->dev));
 	if (!shost)
 		return -ENODEV;
 
-- 
1.7.1
[patch,v3,repost 02/10] scsi: make __scsi_alloc_queue numa-aware
Pass the numa node id set in the Scsi_Host on to blk_init_queue_node
in order to keep all allocations local to the numa node the device is
closest to.

Reviewed-by: Bart Van Assche
Signed-off-by: Jeff Moyer
---
 drivers/scsi/scsi_lib.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index da36a3a..ebad5e8 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1664,7 +1664,8 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
 	struct request_queue *q;
 	struct device *dev = shost->dma_dev;
 
-	q = blk_init_queue(request_fn, NULL);
+	q = blk_init_queue_node(request_fn, NULL,
+				scsi_host_get_numa_node(shost));
 	if (!q)
 		return NULL;
 
-- 
1.7.1
[patch,v3,repost 05/10] sd: use alloc_disk_node
Reviewed-by: Bart Van Assche
Signed-off-by: Jeff Moyer
---
 drivers/scsi/sd.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 12f6fdf..a5dae6b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2714,7 +2714,7 @@ static int sd_probe(struct device *dev)
 	if (!sdkp)
 		goto out;
 
-	gd = alloc_disk(SD_MINORS);
+	gd = alloc_disk_node(SD_MINORS, scsi_host_get_numa_node(sdp->host));
 	if (!gd)
 		goto out_free;
 
-- 
1.7.1
[patch,v3,repost 00/10] make I/O path allocations more numa-friendly
Hi,

This patch set makes memory allocations for data structures used in
the I/O path more numa friendly by allocating them from the same numa
node as the storage device. I've only converted a handful of drivers
at this point. My testing is limited by the hardware I have on hand.
Using these patches, I was able to max out the bandwidth of the storage
controller when issuing I/O from any node on my 4 node system. Without
the patch, I/O from nodes remote to the storage device would suffer a
penalty ranging from 6-12%. Given my relatively low-end setup[1], I
wouldn't be surprised if others could show a more significant performance
advantage.

This is a repost of the last posting. The only changes are additional
reviewed-by/acked-by tags. I think this version is ready for inclusion.
James, would you mind taking a look?

Cheers,
Jeff

[1] LSI Megaraid SAS controller with 1GB battery-backed cache,
fronting a RAID 6 10+2. The workload I used was tuned to not
have to hit disk. Fio file attached.

--
changes from v2->v3:
- Made the numa_node Scsi_Host structure member dependent on CONFIG_NUMA
- Got rid of a GFP_ZERO I added accidentally

changes from v1->v2:
- got rid of the vfs patch, as Al pointed out some fundamental
  problems with it
- credited Bart van Assche properly

Jeff Moyer (10):
  scsi: add scsi_host_alloc_node
  scsi: make __scsi_alloc_queue numa-aware
  scsi: make scsi_alloc_sdev numa-aware
  scsi: allocate scsi_cmnd-s from the device's local numa node
  sd: use alloc_disk_node
  ata: use scsi_host_alloc_node
  megaraid_sas: use scsi_host_alloc_node
  mpt2sas: use scsi_host_alloc_node
  lpfc: use scsi_host_alloc_node
  cciss: use blk_init_queue_node

 drivers/ata/libata-scsi.c                 |    3 ++-
 drivers/block/cciss.c                     |    3 ++-
 drivers/scsi/hosts.c                      |   13 +++++++++++--
 drivers/scsi/lpfc/lpfc_init.c             |   10 ++++++----
 drivers/scsi/megaraid/megaraid_sas_base.c |    5 +++--
 drivers/scsi/mpt2sas/mpt2sas_scsih.c      |    4 ++--
 drivers/scsi/scsi.c                       |   16 ++++++++++------
 drivers/scsi/scsi_lib.c                   |    3 ++-
 drivers/scsi/scsi_scan.c                  |    4 ++--
 drivers/scsi/sd.c                         |    2 +-
 include/scsi/scsi_host.h                  |   28 ++++++++++++++++++++++++++++
 11 files changed, 69 insertions(+), 22 deletions(-)
[patch,v3,repost 04/10] scsi: allocate scsi_cmnd-s from the device's local numa node
Reviewed-by: Bart Van Assche
Signed-off-by: Jeff Moyer
---
 drivers/scsi/scsi.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 2936b44..1750702 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -173,16 +173,19 @@ static DEFINE_MUTEX(host_cmd_pool_mutex);
  * NULL on failure
  */
 static struct scsi_cmnd *
-scsi_pool_alloc_command(struct scsi_host_cmd_pool *pool, gfp_t gfp_mask)
+scsi_pool_alloc_command(struct scsi_host_cmd_pool *pool, gfp_t gfp_mask,
+			int node)
 {
 	struct scsi_cmnd *cmd;
 
-	cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask | pool->gfp_mask);
+	cmd = kmem_cache_alloc_node(pool->cmd_slab,
+				    gfp_mask | pool->gfp_mask | __GFP_ZERO,
+				    node);
 	if (!cmd)
 		return NULL;
 
-	cmd->sense_buffer = kmem_cache_alloc(pool->sense_slab,
-					     gfp_mask | pool->gfp_mask);
+	cmd->sense_buffer = kmem_cache_alloc_node(pool->sense_slab,
+					gfp_mask | pool->gfp_mask, node);
 	if (!cmd->sense_buffer) {
 		kmem_cache_free(pool->cmd_slab, cmd);
 		return NULL;
@@ -223,7 +226,8 @@ scsi_host_alloc_command(struct Scsi_Host *shost, gfp_t gfp_mask)
 {
 	struct scsi_cmnd *cmd;
 
-	cmd = scsi_pool_alloc_command(shost->cmd_pool, gfp_mask);
+	cmd = scsi_pool_alloc_command(shost->cmd_pool, gfp_mask,
+				      scsi_host_get_numa_node(shost));
 	if (!cmd)
 		return NULL;
 
@@ -435,7 +439,7 @@ struct scsi_cmnd *scsi_allocate_command(gfp_t gfp_mask)
 	if (!pool)
 		return NULL;
 
-	return scsi_pool_alloc_command(pool, gfp_mask);
+	return scsi_pool_alloc_command(pool, gfp_mask, NUMA_NO_NODE);
 }
 EXPORT_SYMBOL(scsi_allocate_command);
 
-- 
1.7.1
Re: [PATCH 2/2 v5] loop: Limit the number of requests in the bio list
Lukas Czerner writes:

> Currently there is no limitation of number of requests in the loop bio
> list. This can lead into some nasty situations when the caller spawns
> tons of bio requests taking huge amount of memory. This is even more
> obvious with discard where blkdev_issue_discard() will submit all bios
> for the range and wait for them to finish afterwards. On really big loop
> devices and slow backing file system this can lead to OOM situation as
> reported by Dave Chinner.
>
> With this patch we will wait in loop_make_request() if the number of
> bios in the loop bio list would exceed 'nr_congestion_on'.
> We'll wake up the process as we process the bios from the list. Some
> threshold hysteresis is in place to avoid high frequency oscillation.
>
> Signed-off-by: Lukas Czerner
> Reported-by: Dave Chinner

Acked-by: Jeff Moyer
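For readers who want to see what "threshold hysteresis" buys, here is a single-threaded toy model of the idea described in the commit message. It is a sketch under assumptions, not the loop driver code: the struct and function names are invented, the lower threshold (congestion_off) is a hypothetical counterpart to the 'nr_congestion_on' named above, and a real implementation would sleep on a wait queue where this model just returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a bounded bio list with separate on/off thresholds. */
struct bio_limiter {
	unsigned int queued;		/* bios currently in the list */
	unsigned int congestion_on;	/* start throttling at this depth */
	unsigned int congestion_off;	/* hypothetical: resume at or below this */
	bool congested;
};

static void limiter_init(struct bio_limiter *l,
			 unsigned int on, unsigned int off)
{
	assert(off < on);	/* the gap between the two is the hysteresis */
	l->queued = 0;
	l->congestion_on = on;
	l->congestion_off = off;
	l->congested = false;
}

/*
 * Try to queue one bio. Returns false where the real submitter would
 * have to wait; this simplified model never blocks.
 */
static bool limiter_try_queue(struct bio_limiter *l)
{
	if (l->congested)
		return false;
	l->queued++;
	if (l->queued >= l->congestion_on)
		l->congested = true;
	return true;
}

/* Called when the worker has processed one bio from the list. */
static void limiter_complete_one(struct bio_limiter *l)
{
	assert(l->queued > 0);
	l->queued--;
	/* Only un-throttle once the list has drained to the lower mark. */
	if (l->congested && l->queued <= l->congestion_off)
		l->congested = false;
}
```

With a single threshold, a submitter would be woken and put back to sleep on every completed bio once the list is full; with two thresholds it is woken only after the list has drained by the gap between them.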
Re: Recent kernel "mount" slow
On Wed, Nov 28, 2012 at 4:33 PM, Jens Axboe wrote:
> On 2012-11-28 04:57, Mikulas Patocka wrote:
>>
>>
>> On Tue, 27 Nov 2012, Jens Axboe wrote:
>>
>>> On 2012-11-27 11:06, Jeff Chua wrote:
>>>> On Tue, Nov 27, 2012 at 3:38 PM, Jens Axboe wrote:
>>>>> On 2012-11-27 06:57, Jeff Chua wrote:
>>>>>> On Sun, Nov 25, 2012 at 7:23 AM, Jeff Chua
>>>>>> wrote:
>>>>>>> On Sun, Nov 25, 2012 at 5:09 AM, Mikulas Patocka
>>>>>>> wrote:
>>>>>>>> So it's better to slow down mount.
>>>>>>>
>>>>>>> I am quite proud of the linux boot time pitting against other OS. Even
>>>>>>> with 10 partitions. Linux can boot up in just a few seconds, but now
>>>>>>> you're saying that we need to do this semaphore check at boot up. By
>>>>>>> doing so, it's inducing additional 4 seconds during boot up.
>>>>>>
>>>>>> By the way, I'm using a pretty fast SSD (Samsung PM830) and fast CPU
>>>>>> (2.8GHz). I wonder if those on slower hard disk or slower CPU, what
>>>>>> kind of degradation would this cause or just the same?
>>>>>
>>>>> It'd likely be the same slow down time wise, but as a percentage it
>>>>> would appear smaller on a slower disk.
>>>>>
>>>>> Could you please test Mikulas' suggestion of changing
>>>>> synchronize_sched() in include/linux/percpu-rwsem.h to
>>>>> synchronize_sched_expedited()?
>>>>
>>>> Tested. It seems as fast as before, but may be a "tick" slower. Just
>>>> perception. I was getting pretty much 0.012s with everything reverted.
>>>> With synchronize_sched_expedited(), it seems to be 0.012s ~ 0.013s.
>>>> So, it's good.
>>>
>>> Excellent
>>>
>>>>> linux-next also has a re-write of the per-cpu rw sems, out of Andrews
>>>>> tree. It would be a good data point it you could test that, too.
>>>>
>>>> Tested. It's slower. 0.350s. But still faster than 0.500s without the
>>>> patch.
>>>
>>> Makes sense, it's 2 synchronize_sched() instead of 3. So it doesn't fix
>>> the real issue, which is having to do synchronize_sched() in the first
>>> place.
>>>
>>>> # time mount /dev/sda1 /mnt; sync; sync; umount /mnt
>>>>
>>>>
>>>> So, here's the comparison ...
>>>>
>>>> 0.500s 3.7.0-rc7
>>>> 0.168s 3.7.0-rc2
>>>> 0.012s 3.6.0
>>>> 0.013s 3.7.0-rc7 + synchronize_sched_expedited()
>>>> 0.350s 3.7.0-rc7 + Oleg's patch.
>>>
>>> I wonder how many of them are due to changing to the same block size.
>>> Does the below patch make a difference?
>>
>> This patch is wrong because you must check if the device is mapped while
>> holding bdev->bd_block_size_semaphore (because
>> bdev->bd_block_size_semaphore prevents new mappings from being created)
>
> No it doesn't. If you read the patch, that was moved to i_mmap_mutex.
>
>> I'm sending another patch that has the same effect.
>>
>>
>> Note that ext[234] filesystems set blocksize to 1024 temporarily during
>> mount, so it doesn't help much (it only helps for other filesystems, such
>> as jfs). For ext[234], you have a device with default block size 4096, the
>> filesystem sets block size to 1024 during mount, reads the super block and
>> sets it back to 4096.
>
> That is true, hence I was hesitant to think it'll actually help. In any
> case, basically any block device will have at least one blocksize
> transitioned when being mounted for the first time. I wonder if we just
> shouldn't default to having a 4kb soft block size to avoid that one,
> though it is working around the issue to some degree.

I tested on reiserfs. It helped. 0.012s as in 3.6.0, but as Mikulas
mentioned, it didn't really improve much for ext2.

Jeff.
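The difference between the ext[234] case and the other filesystems can be counted with a toy model. This is a sketch only: toy_bdev, toy_set_blocksize() and toy_mount() are invented names, the "slow path" counter stands in for taking the write lock, and the sequence of block sizes is the one Mikulas describes above (4096 default, 1024 to read the super block, back to 4096).

```c
#include <assert.h>
#include <stddef.h>

/* Toy block device: tracks the soft block size and slow-path entries. */
struct toy_bdev {
	int block_size;
	unsigned int slow_path_calls;	/* times the write lock would be taken */
};

/* Model of "don't take the write lock if the block size doesn't change". */
static void toy_set_blocksize(struct toy_bdev *bdev, int size)
{
	if (size == bdev->block_size)
		return;			/* unchanged: cheap path */
	bdev->slow_path_calls++;	/* changed: expensive path */
	bdev->block_size = size;
}

/* Apply the block sizes a mount would set; return slow-path entries. */
static unsigned int toy_mount(struct toy_bdev *bdev,
			      const int *sizes, size_t n)
{
	unsigned int before;

	assert(bdev != NULL);
	before = bdev->slow_path_calls;
	for (size_t i = 0; i < n; i++)
		toy_set_blocksize(bdev, sizes[i]);
	return bdev->slow_path_calls - before;
}
```

In this model a mount that sets 1024 and then 4096 on a 4096 device still enters the slow path twice, while a mount that only ever sets 4096 never does, which is consistent with the reiserfs versus ext2 results reported in the message.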
Re: [PATCH 1/2] percpu-rwsem: use synchronize_sched_expedited
On Wed, Nov 28, 2012 at 11:59 AM, Mikulas Patocka wrote:
>
>
> On Tue, 27 Nov 2012, Jeff Chua wrote:
>
>> On Tue, Nov 27, 2012 at 3:38 PM, Jens Axboe wrote:
>> > On 2012-11-27 06:57, Jeff Chua wrote:
>> >> On Sun, Nov 25, 2012 at 7:23 AM, Jeff Chua wrote:
>> >>> On Sun, Nov 25, 2012 at 5:09 AM, Mikulas Patocka wrote:
>> >>>> So it's better to slow down mount.
>> >>>
>> >>> I am quite proud of the linux boot time pitting against other OS. Even
>> >>> with 10 partitions. Linux can boot up in just a few seconds, but now
>> >>> you're saying that we need to do this semaphore check at boot up. By
>> >>> doing so, it's inducing additional 4 seconds during boot up.
>> >>
>> >> By the way, I'm using a pretty fast SSD (Samsung PM830) and fast CPU
>> >> (2.8GHz). I wonder if those on slower hard disk or slower CPU, what
>> >> kind of degradation would this cause or just the same?
>> >
>> > It'd likely be the same slow down time wise, but as a percentage it
>> > would appear smaller on a slower disk.
>> >
>> > Could you please test Mikulas' suggestion of changing
>> > synchronize_sched() in include/linux/percpu-rwsem.h to
>> > synchronize_sched_expedited()?
>>
>> Tested. It seems as fast as before, but may be a "tick" slower. Just
>> perception. I was getting pretty much 0.012s with everything reverted.
>> With synchronize_sched_expedited(), it seems to be 0.012s ~ 0.013s.
>> So, it's good.
>>
>>
>> > linux-next also has a re-write of the per-cpu rw sems, out of Andrew's
>> > tree. It would be a good data point if you could test that, too.
>>
>> Tested. It's slower. 0.350s. But still faster than 0.500s without the patch.
>>
>> # time mount /dev/sda1 /mnt; sync; sync; umount /mnt
>>
>>
>> So, here's the comparison ...
>>
>> 0.500s 3.7.0-rc7
>> 0.168s 3.7.0-rc2
>> 0.012s 3.6.0
>> 0.013s 3.7.0-rc7 + synchronize_sched_expedited()
>> 0.350s 3.7.0-rc7 + Oleg's patch.
>>
>>
>> Thanks,
>> Jeff.
>
> OK, I'm sending two patches to reduce mount times. If it is possible to
> put them to 3.7.0, put them there.
>
> Mikulas
>
> ---
>
> percpu-rwsem: use synchronize_sched_expedited
>
> Use synchronize_sched_expedited() instead of synchronize_sched()
> to improve mount speed.
>
> This patch improves mount time from 0.500s to 0.013s.
>
> Note: if realtime people complain about the use of
> synchronize_sched_expedited() and synchronize_rcu_expedited(), I suggest
> that they introduce an option CONFIG_REALTIME or
> /proc/sys/kernel/realtime and turn off these *_expedited functions if
> the option is enabled (i.e. turn synchronize_sched_expedited into
> synchronize_sched and synchronize_rcu_expedited into synchronize_rcu).
>
> Signed-off-by: Mikulas Patocka
>
> ---
> include/linux/percpu-rwsem.h |4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux-3.7-rc7/include/linux/percpu-rwsem.h
> ===
> --- linux-3.7-rc7.orig/include/linux/percpu-rwsem.h 2012-11-28 02:41:03.0 +0100
> +++ linux-3.7-rc7/include/linux/percpu-rwsem.h 2012-11-28 02:41:15.0 +0100
> @@ -13,7 +13,7 @@ struct percpu_rw_semaphore {
>  };
>
>  #define light_mb() barrier()
> -#define heavy_mb() synchronize_sched()
> +#define heavy_mb() synchronize_sched_expedited()
>
>  static inline void percpu_down_read(struct percpu_rw_semaphore *p)
>  {
> @@ -51,7 +51,7 @@ static inline void percpu_down_write(str
>  {
>  	mutex_lock(&p->mtx);
>  	p->locked = true;
> -	synchronize_sched(); /* make sure that all readers exit the rcu_read_lock_sched region */
> +	synchronize_sched_expedited(); /* make sure that all readers exit the rcu_read_lock_sched region */
>  	while (__percpu_count(p->counters))
>  		msleep(1);
>  	heavy_mb(); /* C, between read of p->counter and write to data, paired with B */

Mikulas,

Tested this one and this is good! Back to 3.6.0 behavior.

As for the 2nd patch (block_dev.c), it didn't really make any difference for ext2/3/4, but for reiserfs, it does. So, won't just the patch above (synchronize_sched_expedited) be good enough?

Thanks,
Jeff
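The note in the quoted patch proposes letting realtime builds fall back to the non-expedited primitive. A minimal userspace sketch of that compile-time switch follows. It assumes a hypothetical CONFIG_REALTIME macro (the option only proposed in the note, not an existing one) and uses stub functions that merely count calls in place of the kernel's synchronize_sched() and synchronize_sched_expedited():

```c
/* Stubs standing in for the kernel primitives; they only count calls. */
int normal_calls;
int expedited_calls;

void synchronize_sched_stub(void)           { normal_calls++; }
void synchronize_sched_expedited_stub(void) { expedited_calls++; }

/* CONFIG_REALTIME is hypothetical: it is the option proposed in the note. */
#ifdef CONFIG_REALTIME
#define heavy_mb() synchronize_sched_stub()
#else
#define heavy_mb() synchronize_sched_expedited_stub()
#endif

/* Issue one heavy_mb() and report which variant it expanded to. */
const char *heavy_mb_variant(void)
{
	int before = expedited_calls;

	heavy_mb();
	return expedited_calls > before ? "expedited" : "normal";
}
```

Building with -DCONFIG_REALTIME would select the slower, non-expedited stub instead.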
Re: [PATCH 2/2] block_dev: don't take the write lock if block size doesn't change
On Wed, Nov 28, 2012 at 12:01 PM, Mikulas Patocka wrote:
> block_dev: don't take the write lock if block size doesn't change
>
> Taking the write lock has a big performance impact on the whole system
> (because of synchronize_sched_expedited). This patch avoids taking the
> write lock if the block size doesn't change (i.e. when mounting
> filesystem with block size equal to the default block size).
>
> The logic to test if the block device is mapped was moved to a separate
> function is_bdev_mapped to avoid code duplication.
>
> Signed-off-by: Mikulas Patocka
>
> ---
> fs/block_dev.c | 25 ++---
> 1 file changed, 18 insertions(+), 7 deletions(-)
>
> Index: linux-3.7-rc7/fs/block_dev.c
> ===
> --- linux-3.7-rc7.orig/fs/block_dev.c 2012-11-28 04:09:01.0 +0100
> +++ linux-3.7-rc7/fs/block_dev.c 2012-11-28 04:13:53.0 +0100
> @@ -114,10 +114,18 @@ void invalidate_bdev(struct block_device
>  }
>  EXPORT_SYMBOL(invalidate_bdev);
>
> -int set_blocksize(struct block_device *bdev, int size)
> +static int is_bdev_mapped(struct block_device *bdev)
>  {
> -	struct address_space *mapping;
> +	int ret_val;
> +	struct address_space *mapping = bdev->bd_inode->i_mapping;
> +	mutex_lock(&mapping->i_mmap_mutex);
> +	ret_val = mapping_mapped(mapping);
> +	mutex_unlock(&mapping->i_mmap_mutex);
> +	return ret_val;
> +}
>
> +int set_blocksize(struct block_device *bdev, int size)
> +{
>  	/* Size must be a power of two, and between 512 and PAGE_SIZE */
>  	if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size))
>  		return -EINVAL;
> @@ -126,18 +134,21 @@ int set_blocksize(struct block_device *b
>  	if (size < bdev_logical_block_size(bdev))
>  		return -EINVAL;
>
> +	/*
> +	 * If the block size doesn't change, don't take the write lock.
> +	 * We check for is_bdev_mapped anyway, for consistent behavior.
> +	 */
> +	if (size == bdev->bd_block_size)
> +		return is_bdev_mapped(bdev) ? -EBUSY : 0;
> +
>  	/* Prevent starting I/O or mapping the device */
>  	percpu_down_write(&bdev->bd_block_size_semaphore);
>
>  	/* Check that the block device is not memory mapped */
> -	mapping = bdev->bd_inode->i_mapping;
> -	mutex_lock(&mapping->i_mmap_mutex);
> -	if (mapping_mapped(mapping)) {
> -		mutex_unlock(&mapping->i_mmap_mutex);
> +	if (is_bdev_mapped(bdev)) {
>  		percpu_up_write(&bdev->bd_block_size_semaphore);
>  		return -EBUSY;
>  	}
> -	mutex_unlock(&mapping->i_mmap_mutex);
>
>  	/* Don't change the size if it is same as current */
>  	if (bdev->bd_block_size != size) {

This patch didn't really make any difference for ext2/3/4 but for reiserfs it does. With the synchronize_sched_expedited() patch applied, it didn't make any difference.

Thanks,
Jeff
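The control flow of the quoted patch can be sketched in userspace roughly as follows. This is only an illustration, not the kernel code: struct fake_bdev, SKETCH_PAGE_SIZE and sketch_set_blocksize() are made-up names, the page size is assumed to be 4 KiB, and the percpu write lock is reduced to a counter so the fast path is visible.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct block_device. */
struct fake_bdev {
	int block_size;		/* current block size */
	int logical_block_size;	/* smallest size the device accepts */
	bool mapped;		/* true if the device is memory mapped */
	int write_locks_taken;	/* counts how often the slow path ran */
};

static bool sketch_is_power_of_2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

int sketch_set_blocksize(struct fake_bdev *bdev, int size)
{
	/* Size must be a power of two, and between 512 and the page size. */
	if (size > SKETCH_PAGE_SIZE || size < 512 ||
	    !sketch_is_power_of_2((unsigned int)size))
		return -EINVAL;
	if (size < bdev->logical_block_size)
		return -EINVAL;

	/* Fast path: nothing changes, so the write lock is skipped. */
	if (size == bdev->block_size)
		return bdev->mapped ? -EBUSY : 0;

	/* Slow path: the real code takes the percpu write lock here. */
	bdev->write_locks_taken++;
	if (bdev->mapped)
		return -EBUSY;
	bdev->block_size = size;
	return 0;
}
```

Mounting with the block size the device already has leaves write_locks_taken untouched, which is the case the patch is meant to speed up.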
[PATCH] vfs: remove DCACHE_NEED_LOOKUP
The code that relied on that flag was ripped out of btrfs quite some time ago, and never added back. Josef indicated that he was going to take a different approach to the problem in btrfs, and that we could just eliminate this flag. Cc: Josef Bacik Signed-off-by: Jeff Layton --- fs/btrfs/inode.c | 16 +--- fs/dcache.c| 33 + fs/namei.c | 11 +-- include/linux/dcache.h | 8 4 files changed, 3 insertions(+), 65 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 95542a1..0e5ca81 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4219,16 +4219,7 @@ struct inode *btrfs_lookup_dentry(struct inode *dir, struct dentry *dentry) if (dentry->d_name.len > BTRFS_NAME_LEN) return ERR_PTR(-ENAMETOOLONG); - if (unlikely(d_need_lookup(dentry))) { - memcpy(&location, dentry->d_fsdata, sizeof(struct btrfs_key)); - kfree(dentry->d_fsdata); - dentry->d_fsdata = NULL; - /* This thing is hashed, drop it for now */ - d_drop(dentry); - } else { - ret = btrfs_inode_by_name(dir, dentry, &location); - } - + ret = btrfs_inode_by_name(dir, dentry, &location); if (ret < 0) return ERR_PTR(ret); @@ -4298,11 +4289,6 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry, struct dentry *ret; ret = d_splice_alias(btrfs_lookup_dentry(dir, dentry), dentry); - if (unlikely(d_need_lookup(dentry))) { - spin_lock(&dentry->d_lock); - dentry->d_flags &= ~DCACHE_NEED_LOOKUP; - spin_unlock(&dentry->d_lock); - } return ret; } diff --git a/fs/dcache.c b/fs/dcache.c index 3a463d0..1782be3 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -455,24 +455,6 @@ void d_drop(struct dentry *dentry) EXPORT_SYMBOL(d_drop); /* - * d_clear_need_lookup - drop a dentry from cache and clear the need lookup flag - * @dentry: dentry to drop - * - * This is called when we do a lookup on a placeholder dentry that needed to be - * looked up. 
The dentry should have been hashed in order for it to be found by - * the lookup code, but now needs to be unhashed while we do the actual lookup - * and clear the DCACHE_NEED_LOOKUP flag. - */ -void d_clear_need_lookup(struct dentry *dentry) -{ - spin_lock(&dentry->d_lock); - __d_drop(dentry); - dentry->d_flags &= ~DCACHE_NEED_LOOKUP; - spin_unlock(&dentry->d_lock); -} -EXPORT_SYMBOL(d_clear_need_lookup); - -/* * Finish off a dentry we've decided to kill. * dentry->d_lock must be held, returns with it unlocked. * If ref is non-zero, then decrement the refcount too. @@ -565,13 +547,7 @@ repeat: if (d_unhashed(dentry)) goto kill_it; - /* -* If this dentry needs lookup, don't set the referenced flag so that it -* is more likely to be cleaned up by the dcache shrinker in case of -* memory pressure. -*/ - if (!d_need_lookup(dentry)) - dentry->d_flags |= DCACHE_REFERENCED; + dentry->d_flags |= DCACHE_REFERENCED; dentry_lru_add(dentry); dentry->d_count--; @@ -1737,13 +1713,6 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode, } /* -* We are going to instantiate this dentry, unhash it and clear the -* lookup flag so we can do that. -*/ - if (unlikely(d_need_lookup(found))) - d_clear_need_lookup(found); - - /* * Negative dentry: instantiate it unless the inode is a directory and * already has a dentry. 
*/ diff --git a/fs/namei.c b/fs/namei.c index 937f9d5..9738f97 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1275,9 +1275,7 @@ static struct dentry *lookup_dcache(struct qstr *name, struct dentry *dir, *need_lookup = false; dentry = d_lookup(dir, name); if (dentry) { - if (d_need_lookup(dentry)) { - *need_lookup = true; - } else if (dentry->d_flags & DCACHE_OP_REVALIDATE) { + if (dentry->d_flags & DCACHE_OP_REVALIDATE) { error = d_revalidate(dentry, flags); if (unlikely(error <= 0)) { if (error < 0) { @@ -1383,8 +1381,6 @@ static int lookup_fast(struct nameidata *nd, struct qstr *name, return -ECHILD; nd->seq = seq; - if (unlikely(d_need_lookup(dentry))) - goto unlazy; if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE)) { status = d_revalidate(dentry, nd->flags); if (unlike
Re: [PATCH 02/16] ata: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
On 10/28/2012 04:05 AM, Joe Perches wrote:

dev_<level> calls take less code than dev_printk(KERN_<LEVEL> and reducing object size is good. Coalesce formats for easier grep.

Signed-off-by: Joe Perches
---
drivers/ata/pata_cmd64x.c |6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

applied
Re: [PATCH] ata_piix: reenable MS Virtual PC guests
On 09/18/2012 11:48 AM, Olaf Hering wrote: An earlier commit cd006086fa5d91414d8ff9ff2b78fbb593878e3c ("ata_piix: defer disks to the Hyper-V drivers by default") broke MS Virtual PC guests. Hyper-V guests and Virtual PC guests have nearly identical DMI info. As a result the driver does currently ignore the emulated hardware in Virtual PC guests and defers the handling to hv_blkvsc. Since Virtual PC does not offer paravirtualized drivers no disks will be found in the guest. One difference in the DMI info is the product version. This patch adds a match for MS Virtual PC 2007 and "unignores" the emulated hardware. This was reported for openSuSE 12.1 in bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=737532 Here is a detailed list of DMI info from example guests: hwinfo --bios: virtual pc guest: System Info: #1 Manufacturer: "Microsoft Corporation" Product: "Virtual Machine" Version: "VS2005R2" Serial: "3178-9905-1533-4840-9282-0569-59" UUID: undefined, but settable Wake-up: 0x06 (Power Switch) Board Info: #2 Manufacturer: "Microsoft Corporation" Product: "Virtual Machine" Version: "5.0" Serial: "3178-9905-1533-4840-9282-0569-59" Chassis Info: #3 Manufacturer: "Microsoft Corporation" Version: "5.0" Serial: "3178-9905-1533-4840-9282-0569-59" Asset Tag: "7188-3705-6309-9738-9645-0364-00" Type: 0x03 (Desktop) Bootup State: 0x03 (Safe) Power Supply State: 0x03 (Safe) Thermal State: 0x01 (Other) Security Status: 0x01 (Other) win2k8 guest: System Info: #1 Manufacturer: "Microsoft Corporation" Product: "Virtual Machine" Version: "7.0" Serial: "9106-3420-9819-5495-1514-2075-48" UUID: undefined, but settable Wake-up: 0x06 (Power Switch) Board Info: #2 Manufacturer: "Microsoft Corporation" Product: "Virtual Machine" Version: "7.0" Serial: "9106-3420-9819-5495-1514-2075-48" Chassis Info: #3 Manufacturer: "Microsoft Corporation" Version: "7.0" Serial: "9106-3420-9819-5495-1514-2075-48" Asset Tag: "7076-9522-6699-1042-9501-1785-77" Type: 0x03 (Desktop) Bootup State: 0x03 
(Safe) Power Supply State: 0x03 (Safe) Thermal State: 0x01 (Other) Security Status: 0x01 (Other) win2k12 guest: System Info: #1 Manufacturer: "Microsoft Corporation" Product: "Virtual Machine" Version: "7.0" Serial: "8179-1954-0187-0085-3868-2270-14" UUID: undefined, but settable Wake-up: 0x06 (Power Switch) Board Info: #2 Manufacturer: "Microsoft Corporation" Product: "Virtual Machine" Version: "7.0" Serial: "8179-1954-0187-0085-3868-2270-14" Chassis Info: #3 Manufacturer: "Microsoft Corporation" Version: "7.0" Serial: "8179-1954-0187-0085-3868-2270-14" Asset Tag: "8374-0485-4557-6331-0620-5845-25" Type: 0x03 (Desktop) Bootup State: 0x03 (Safe) Power Supply State: 0x03 (Safe) Thermal State: 0x01 (Other) Security Status: 0x01 (Other) Signed-off-by: Olaf Hering

applied. Apologies for missing this one. It had accidentally shifted into the low-priority pile.
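The distinction described above (same vendor and product, different system product version) can be sketched in userspace as below. The patch itself is not quoted in this message, so this is not its code: struct dmi_sample and looks_like_virtual_pc() are hypothetical names, and the strings are the ones from the hwinfo listings.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for the three DMI fields that matter here. */
struct dmi_sample {
	const char *sys_vendor;
	const char *product_name;
	const char *product_version;
};

/*
 * Virtual PC and Hyper-V guests share vendor and product strings;
 * only the system product version ("VS2005R2" vs. "7.0") differs.
 */
bool looks_like_virtual_pc(const struct dmi_sample *d)
{
	return strcmp(d->sys_vendor, "Microsoft Corporation") == 0 &&
	       strcmp(d->product_name, "Virtual Machine") == 0 &&
	       strcmp(d->product_version, "VS2005R2") == 0;
}
```

With the listings above, the Virtual PC guest matches while the win2k8 and win2k12 guests (version "7.0") do not.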
[patch] bdi: add a user-tunable cpu_list for the bdi flusher threads
Hi, In realtime environments, it may be desirable to keep the per-bdi flusher threads from running on certain cpus. This patch adds a cpu_list file to /sys/class/bdi/* to enable this. The default is to tie the flusher threads to the same numa node as the backing device (though I could be convinced to make it a mask of all cpus to avoid a change in behaviour). Comments, as always, are appreciated. Cheers, Jeff Signed-off-by: Jeff Moyer diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index 2a9a9ab..68263e0 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -18,6 +18,7 @@ #include #include #include +#include struct page; struct device; @@ -105,6 +106,9 @@ struct backing_dev_info { struct timer_list laptop_mode_wb_timer; + cpumask_t *flusher_cpumask; /* used for writeback thread scheduling */ + struct mutex flusher_cpumask_mutex; + #ifdef CONFIG_DEBUG_FS struct dentry *debug_dir; struct dentry *debug_stats; diff --git a/mm/backing-dev.c b/mm/backing-dev.c index d3ca2b3..c4f7dde 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -10,6 +10,7 @@ #include #include #include +#include #include static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0); @@ -221,12 +222,59 @@ static ssize_t max_ratio_store(struct device *dev, } BDI_SHOW(max_ratio, bdi->max_ratio) +static ssize_t cpu_list_store(struct device *dev, + struct device_attribute *attr, const char *buf, size_t count) +{ + struct backing_dev_info *bdi = dev_get_drvdata(dev); + struct bdi_writeback *wb = &bdi->wb; + cpumask_var_t newmask; + ssize_t ret; + struct task_struct *task; + + if (!alloc_cpumask_var(&newmask, GFP_KERNEL)) + return -ENOMEM; + + ret = cpulist_parse(buf, newmask); + if (!ret) { + spin_lock(&bdi->wb_lock); + task = wb->task; + get_task_struct(task); + spin_unlock(&bdi->wb_lock); + if (task) + ret = set_cpus_allowed_ptr(task, newmask); + put_task_struct(task); + if (ret == 0) { + mutex_lock(&bdi->flusher_cpumask_mutex); + 
cpumask_copy(bdi->flusher_cpumask, newmask); + mutex_unlock(&bdi->flusher_cpumask_mutex); + ret = count; + } + } + free_cpumask_var(newmask); + + return ret; +} + +static ssize_t cpu_list_show(struct device *dev, + struct device_attribute *attr, char *page) +{ + struct backing_dev_info *bdi = dev_get_drvdata(dev); + ssize_t ret; + + mutex_lock(&bdi->flusher_cpumask_mutex); + ret = cpulist_scnprintf(page, PAGE_SIZE-1, bdi->flusher_cpumask); + mutex_unlock(&bdi->flusher_cpumask_mutex); + + return ret; +} + #define __ATTR_RW(attr) __ATTR(attr, 0644, attr##_show, attr##_store) static struct device_attribute bdi_dev_attrs[] = { __ATTR_RW(read_ahead_kb), __ATTR_RW(min_ratio), __ATTR_RW(max_ratio), + __ATTR_RW(cpu_list), __ATTR_NULL, }; @@ -428,6 +476,7 @@ static int bdi_forker_thread(void *ptr) writeback_inodes_wb(&bdi->wb, 1024, WB_REASON_FORKER_THREAD); } else { + int ret; /* * The spinlock makes sure we do not lose * wake-ups when racing with 'bdi_queue_work()'. @@ -437,6 +486,14 @@ static int bdi_forker_thread(void *ptr) spin_lock_bh(&bdi->wb_lock); bdi->wb.task = task; spin_unlock_bh(&bdi->wb_lock); + mutex_lock(&bdi->flusher_cpumask_mutex); + ret = set_cpus_allowed_ptr(task, + bdi->flusher_cpumask); + mutex_unlock(&bdi->flusher_cpumask_mutex); + if (ret) + printk_once("%s: failed to bind flusher" + " thread %s, error %d\n", + __func__, task->comm, ret); wake_up_process(task); } bdi_clear_pending(bdi); @@ -509,6 +566,17 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent, dev_name(dev)); if (IS_ERR(wb->task)) return PTR_ERR(wb-
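For readers unfamiliar with the format written to the proposed cpu_list file: it is the usual kernel CPU list syntax, comma-separated numbers and ranges such as "0-3,5". The following userspace parser is only a sketch of that syntax, not the kernel's cpulist_parse(); the name sketch_cpulist_parse() is made up and the sketch is limited to CPUs 0 through 63.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a list such as "0-3,5" into a 64-bit mask (bit n set = CPU n).
 * Returns 0 on success, -1 on malformed input or CPU numbers above 63.
 */
int sketch_cpulist_parse(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	while (*s && *s != '\n') {
		char *end;
		unsigned long lo, hi;

		if (!isdigit((unsigned char)*s))
			return -1;
		lo = hi = strtoul(s, &end, 10);
		s = end;
		if (*s == '-') {	/* a range "lo-hi" */
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;
			hi = strtoul(s, &end, 10);
			s = end;
		}
		if (lo > hi || hi > 63)
			return -1;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			m |= (uint64_t)1 << cpu;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}
	*mask = m;
	return 0;
}
```

For example, "0-3,5" sets bits 0 to 3 and bit 5.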
Re: [PATCH] tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)
On 11/29/2012 12:15 PM, Jim Meyering wrote:
> Hugh Dickins wrote:
>> On Thu, 29 Nov 2012, Jaegeuk Hanse wrote:
> ...
>>> But this time in which scenario will use it?
>>
>> I was not very convinced by the grep argument from Jim and Paul:
>> that seemed to be grep holding on to a no-arbitrary-limits dogma,
>> at the expense of its users, causing an absurd line-length issue,
>> which use of SEEK_DATA happens to avoid in some cases.
>>
>> The cp of sparse files from Jeff and Dave was more convincing;
>> but I still didn't see why little old tmpfs needed to be ahead
>> of the pack.
>>
>> But at LinuxCon/Plumbers in San Diego in August, a more convincing
>> case was made: I was hoping you would not ask, because I did not take
>> notes, and cannot pass on the details - was it rpm building on tmpfs?
>> I was convinced enough to promise support on tmpfs when support on
>> ext4 goes in.
>
> Re the cp-vs-sparse-file case, the current FIEMAP-based code in GNU
> cp is ugly and complicated enough that until recently it harbored a
> hard-to-reproduce data-corrupting bug[*]. Now that SEEK_DATA/SEEK_HOLE
> support work will work also for tmpfs and ext4, we can plan to remove
> the FIEMAP-based code in favor of a simpler SEEK_DATA/SEEK_HOLE-based
> implementation.

How do we teach du(1) to be aware of the real disk footprint with Btrfs clone or OCFS2 reflinked files if we remove the FIEMAP-based code?

How about if we still keep it there, and introduce SEEK_DATA/SEEK_HOLE code to the extent-scan module which is dedicated to dealing with sparse files?

Thanks,
-Jeff

>
> With the rise of virtualization, copying sparse images efficiently
> (probably searching, too) is becoming more and more important.
>
> So, yes, GNU cp will soon use this feature.
>
> [*] https://plus.google.com/u/0/114228401647637059102/posts/FDV3JEaYsKD
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Re: [PATCH] tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)
On 11/29/2012 02:53 PM, Jim Meyering wrote: > Jeff Liu wrote: > >> On 11/29/2012 12:15 PM, Jim Meyering wrote: >>> Hugh Dickins wrote: >>>> On Thu, 29 Nov 2012, Jaegeuk Hanse wrote: >>> ... >>>>> But this time in which scenario will use it? >>>> >>>> I was not very convinced by the grep argument from Jim and Paul: >>>> that seemed to be grep holding on to a no-arbitrary-limits dogma, >>>> at the expense of its users, causing an absurd line-length issue, >>>> which use of SEEK_DATA happens to avoid in some cases. >>>> >>>> The cp of sparse files from Jeff and Dave was more convincing; >>>> but I still didn't see why little old tmpfs needed to be ahead >>>> of the pack. >>>> >>>> But at LinuxCon/Plumbers in San Diego in August, a more convincing >>>> case was made: I was hoping you would not ask, because I did not take >>>> notes, and cannot pass on the details - was it rpm building on tmpfs? >>>> I was convinced enough to promise support on tmpfs when support on >>>> ext4 goes in. >>> >>> Re the cp-vs-sparse-file case, the current FIEMAP-based code in GNU >>> cp is ugly and complicated enough that until recently it harbored a >>> hard-to-reproduce data-corrupting bug[*]. Now that SEEK_DATA/SEEK_HOLE >>> support work will work also for tmpfs and ext4, we can plan to remove >>> the FIEMAP-based code in favor of a simpler SEEK_DATA/SEEK_HOLE-based >>> implementation. >> How do we teach du(1) to aware of the real disk footprint with Btrfs >> clone or OCFS2 reflinked files if we remove the FIEMAP-based code? >> >> How about if we still keep it there, and introduce SEEK_DATA/SEEK_HOLE >> code to the extent-scan module which is dedicated to deal with sparse files? > > Hi Jeff, > By "removing the FIEMAP-based code" I mean the uses in copy.c. > All of that should remain independent of how du does its job, > so if FIEMAP is required for your planned du enhancement, > then feel free to use it. Hi Jim, Thanks for the clarification, that's fine. 
:) Regards, -Jeff
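As background for the cp discussion, this is roughly what a SEEK_DATA/SEEK_HOLE scan looks like from userspace. It is a minimal sketch, not the GNU cp or extent-scan code: count_data_extents() is a made-up helper, and it assumes Linux with a C library that exposes SEEK_DATA and SEEK_HOLE under _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Count the data extents of an open file by alternating SEEK_DATA and
 * SEEK_HOLE. Returns the number of extents, or -1 on an unexpected error.
 */
long count_data_extents(int fd)
{
	long extents = 0;
	off_t pos = 0;

	for (;;) {
		off_t data = lseek(fd, pos, SEEK_DATA);
		if (data < 0)	/* ENXIO means no data at or after pos */
			return errno == ENXIO ? extents : -1;

		/* Every file has an implicit hole at EOF, so this succeeds. */
		off_t hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;

		extents++;
		pos = hole;
	}
}
```

A copy loop would read only the [data, hole) ranges and seek over the rest. On filesystems without native support the kernel reports the whole file as one data extent, so the result is a lower bound on sparseness rather than an exact map.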
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
On Thu, Nov 29, 2012 at 2:45 PM, Al Viro wrote:
> On Wed, Nov 28, 2012 at 10:37:27PM -0800, Linus Torvalds wrote:
>> On Wed, Nov 28, 2012 at 10:30 PM, Al Viro wrote:
>> >
>> > Note that sync_blockdev() a few lines prior to that is good only if we
>> > have no other processes doing write(2) (or dirtying the mmapped pages,
>> > for that matter). The window isn't too wide, but...
>>
>> So with Mikulas' patches, the write actually would block (at write
>> level) due to the locking. The mmap'ed patches may be around and
>> flushed, but the logic to not allow currently *active* mmaps (with the
>> rather nasty random -EBUSY return value) should mean that there is no
>> race.
>>
>> Or rather, there's a race, but it results in that EBUSY thing.
>
> Same as with fs mounted on it, or the sucker having been claimed for
> RAID array, etc. Frankly, I'm more than slightly tempted to make
> bdev mmap() just claim the sodding device exclusive for as long as
> it's mmapped...
>
> In principle, I agree, but... I still have nightmares from mmap/truncate
> races way back. You are stepping into what used to be a really nasty
> minefield. I'll look into that, but it's *definitely* not -rc8 fodder.

Just let me know which relevant patch(es) you want me to test or break.

Thanks,
Jeff
Re: [RFQ PATCH] cifs: Change default security error message
On Thu, 29 Nov 2012 18:30:53 +0100 Jesper Nilsson wrote: > Hi! > > Connecting with a default security mechanism prompts an KERN_ERROR > output warning to the user that the default mechanism will be changed > in Linux 3.3. > > We're now at 3.7, so we either could remove the warning completely > (if the default has been changed), or we could bump the number to > what our current target for the change is. > > > The below patch changes the cERROR (which turns into a printk with KERN_ERROR) > into a straight printk with KERN_WARNING and changes the text to indicate > that it was changed in 3.3. > > I expect that the patch is incorrect and that we should choose > another of the alternative solutions above, but I'd like to get > some input on this. > > Not-Signed-off-by: Jesper Nilsson > --- > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c > index c83f5b65..968456f 100644 > --- a/fs/cifs/connect.c > +++ b/fs/cifs/connect.c > @@ -2480,9 +2480,9 @@ cifs_get_smb_ses(struct TCP_Server_Info *server, struct > smb_vol *volume_info) > supported for many years, time to update default security mechanism */ > if ((volume_info->secFlg == 0) && warned_on_ntlm == false) { > warned_on_ntlm = true; > - cERROR(1, "default security mechanism requested. The default " > - "security mechanism will be upgraded from ntlm to " > - "ntlmv2 in kernel release 3.3"); > + printk(KERN_WARNING "default security mechanism requested. " > + "The default security mechanism was changed " > + " from ntlm to ntlmv2 in kernel release 3.3"); > } > ses->overrideSecFlg = volume_info->secFlg; > > > > /^JN - Jesper Nilsson I think this warning has lived long enough and needs to go away. Steve supposedly has a patch that finally makes this change, but it hasn't been sent to the list yet... Steve? 
-- Jeff Layton
Re: [RFQ PATCH] cifs: Change default security error message
On Thu, 29 Nov 2012 12:54:41 -0600 Steve French wrote: > On Thu, Nov 29, 2012 at 12:25 PM, Jeff Layton wrote: > > On Thu, 29 Nov 2012 18:30:53 +0100 > > Jesper Nilsson wrote: > > > >> Hi! > >> > >> Connecting with a default security mechanism prompts an KERN_ERROR > >> output warning to the user that the default mechanism will be changed > >> in Linux 3.3. > >> > >> We're now at 3.7, so we either could remove the warning completely > >> (if the default has been changed), or we could bump the number to > >> what our current target for the change is. > >> > >> > >> The below patch changes the cERROR (which turns into a printk with > >> KERN_ERROR) > >> into a straight printk with KERN_WARNING and changes the text to indicate > >> that it was changed in 3.3. > >> > >> I expect that the patch is incorrect and that we should choose > >> another of the alternative solutions above, but I'd like to get > >> some input on this. > >> > >> Not-Signed-off-by: Jesper Nilsson > >> --- > >> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c > >> index c83f5b65..968456f 100644 > >> --- a/fs/cifs/connect.c > >> +++ b/fs/cifs/connect.c > >> @@ -2480,9 +2480,9 @@ cifs_get_smb_ses(struct TCP_Server_Info *server, > >> struct smb_vol *volume_info) > >> supported for many years, time to update default security mechanism > >> */ > >> if ((volume_info->secFlg == 0) && warned_on_ntlm == false) { > >> warned_on_ntlm = true; > >> - cERROR(1, "default security mechanism requested. The > >> default " > >> - "security mechanism will be upgraded from ntlm to " > >> - "ntlmv2 in kernel release 3.3"); > >> + printk(KERN_WARNING "default security mechanism requested. " > >> + "The default security mechanism was changed " > >> + " from ntlm to ntlmv2 in kernel release 3.3"); > >> } > >> ses->overrideSecFlg = volume_info->secFlg; > >> > >> > >> > >> /^JN - Jesper Nilsson > > > > I think this warning has lived long enough and needs to go away. 
Steve > > supposedly has a patch that finally makes this change, but it hasn't > > been sent to the list yet... Steve? > > It was posted to list on November 25th (and you even included it in > your git tree on samba.org ?!) > Oops, my mistake. You're quite correct... -- Jeff Layton
Re: [PATCH 2/2] fs/aio.c: use get_user_pages_non_movable() to pin ring pages when support memory hotremove
Lin Feng writes:

> This patch gets around the aio ring pages can't be migrated bug caused by
> get_user_pages() via using the new function. It only works as configed with
> CONFIG_MEMORY_HOTREMOVE, otherwise it uses the old version of
> get_user_pages().
>
> Cc: Benjamin LaHaise
> Cc: Alexander Viro
> Cc: Andrew Morton
> Reviewed-by: Tang Chen
> Reviewed-by: Gu Zheng
> Signed-off-by: Lin Feng
> ---
> fs/aio.c | 6 ++
> 1 file changed, 6 insertions(+)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 71f613c..0e9b30a 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -138,9 +138,15 @@ static int aio_setup_ring(struct kioctx *ctx)
>  	}
>
>  	dprintk("mmap address: 0x%08lx\n", info->mmap_base);
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +	info->nr_pages = get_user_pages_non_movable(current, ctx->mm,
> +					info->mmap_base, nr_pages,
> +					1, 0, info->ring_pages, NULL);
> +#else
>  	info->nr_pages = get_user_pages(current, ctx->mm,
>  					info->mmap_base, nr_pages,
>  					1, 0, info->ring_pages, NULL);
> +#endif

Can't you hide this in your 1/2 patch, by providing this function as just a static inline wrapper around get_user_pages when CONFIG_MEMORY_HOTREMOVE is not enabled?

Cheers,
Jeff
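The suggestion above is the usual header pattern for optional features: declare the real function when the option is on, and provide a static inline fallback otherwise, so callers need no #ifdef. A userspace sketch under simplified assumptions (the sketch_ names and the reduced two-argument signature are made up; the real get_user_pages() takes more parameters):

```c
/* Stub standing in for get_user_pages(); it just reports the page count. */
int sketch_get_user_pages(unsigned long start, int nr_pages)
{
	(void)start;
	return nr_pages;
}

#ifdef CONFIG_MEMORY_HOTREMOVE
/* Real implementation would live elsewhere when the option is enabled. */
int sketch_get_user_pages_non_movable(unsigned long start, int nr_pages);
#else
/* Option disabled: the "non movable" variant is a plain wrapper. */
static inline int sketch_get_user_pages_non_movable(unsigned long start,
						    int nr_pages)
{
	return sketch_get_user_pages(start, nr_pages);
}
#endif

/* The caller is now free of #ifdef clutter. */
int sketch_pin_ring(unsigned long mmap_base, int nr_pages)
{
	return sketch_get_user_pages_non_movable(mmap_base, nr_pages);
}
```

With this shape, aio_setup_ring() could call the non-movable variant unconditionally.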
Re: [PATCH 2/4] SUNRPC: remove cache_detail->cache_upcall callback
s/dns_resolve.o
fs/nfs/dns_resolve.c: In function ‘nfs_dns_resolver_cache_init’:
fs/nfs/dns_resolve.c:377:4: error: ‘struct cache_detail’ has no member named ‘cache_upcall’
fs/nfs/dns_resolve.c:377:35: warning: left-hand operand of comma expression has no effect [-Wunused-value]
fs/nfs/dns_resolve.c:377:35: warning: value computed is not used [-Wunused-value]
fs/nfs/dns_resolve.c:377:35: warning: value computed is not used [-Wunused-value]
fs/nfs/dns_resolve.c:377:35: warning: value computed is not used [-Wunused-value]
fs/nfs/dns_resolve.c:377:35: warning: value computed is not used [-Wunused-value]
fs/nfs/dns_resolve.c:377:35: warning: value computed is not used [-Wunused-value]
fs/nfs/dns_resolve.c:377:35: warning: value computed is not used [-Wunused-value]
fs/nfs/dns_resolve.c: At top level:
fs/nfs/dns_resolve.c:129:13: warning: ‘nfs_dns_request’ defined but not used [-Wunused-function]
make[1]: *** [fs/nfs/dns_resolve.o] Error 1
make: *** [_module_fs/nfs] Error 2

...looks like you need to convert that cache_detail to use your new scheme as well?

-- Jeff Layton
Re: [RFC] setxattr bugs
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 2/2/13 11:30 PM, Al Viro wrote: > * JFS, since 2005: setxattr(name, "system.posix_acl_access", NULL, > 0, 0) succeeds, creating an empty EA with "system.posix_acl_access" > as name. Validity checks should apply _after_ if (value == NULL) { > /* empty EA, do not remove */ value = ""; value_len = 0; } and not > before it. * reiserfs, since 2009: setxattr(name, attr_name, NULL, > 0, 0) is treated as removexattr(name, attr_name), not as emptying > given xattr. > > The question is, does either of those cross into "established > weirdness in ABI" or are they still at the "bugs to be fixed" > stage? Since the behavior changed once already in 2009 I'd call it a bug. That code was in the SLES kernel for a while before then and I still haven't seen a bug report on it. - -Jeff > FWIW, I'm seriously tempted to stop passing NULL as the third > argument of ->setxattr(), essentially taking all those if (!value) > value = ""; pieces from individual ->setxattr() instances to > __vfs_setxattr_noperm() (all other callers of ->setxattr() never > pass NULL data or 0 size, so it's irrelevant for them). Would fix > both jfs and reiserfs weirdness > > Objections? 
- -- Jeff Mahoney SUSE Labs -BEGIN PGP SIGNATURE- Version: GnuPG/MacGPG2 v2.0.18 (Darwin) Comment: GPGTools - http://gpgtools.org iQIcBAEBAgAGBQJREGrwAAoJEB57S2MheeWyHvMP/3kpy3Y4U0KNavnPaeL12LXe RC6vIb/dPkoSemFiZ5om26aT70M7MdXJY2ZPCwgtlNpKV6aT0NFchtwiWos2lLLN XndvFZ4M/kQLd9yDEmlcTDZn7p4fhU2Tn7FYrhPLRmOO3zP6fnUxLozSebOnGTO/ xEwV7Qtx7D4Au37khFW/hJvsAJE2Q3NrLgueIJLiTmFvSiOourZNmriNcB73MUeb vYx5gc/bJexS2oFWeQqD6WiL8UQXg4XEKRk4inNVrJWpLV365w45Kpf2zBlvCQwQ W8mdHcHoityOcQJtiXvnVDurUNpFwsthrhVquVgIopovlcvOjNtcpffH8YI9khP/ yol7+57ZDuVx2TY5DrEOa+TOTUrg5ghqagSSmOVDsOVeMngpdFNs8351QcX0IWBn Xt8/eq46g/R7EHI3I1eYJHlMIie0hP1GDc66OP94hcKEWaHbPeKwkSTOlqYH++4h ncSJcxHXWLUTGuV4b61whYTlJ2vBWwEvIteVaQmmXKaOTr41lajZBCWZDeUlzna8 XyJHE5FrcKDLzTNP1R7UNEj863fN0OUma1AKaT/6jNYMqFXOk39emTgZL5QfxP9X uLWG1OVDf87uw5nYOKubNQiORpxl8iSIsQWvZeF9SvvmFA/JzpgZgtLlNqYa78Yv oEq501m9BSEWVSGKxHcu =G2cU -END PGP SIGNATURE-
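Al's proposal in the quoted text is to normalize a NULL value once in the common caller instead of in every filesystem's ->setxattr(). A userspace sketch of that idea follows; all names are hypothetical stand-ins (sketch_vfs_setxattr() for __vfs_setxattr_noperm(), recording_setxattr() for a filesystem hook), and the hook only records what it was handed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical signature of a filesystem's setxattr hook. */
typedef int (*setxattr_fn)(const char *name, const void *value, size_t size);

/* Demo hook: records whether it ever saw NULL and the last size passed. */
bool saw_null_value;
size_t last_size;

int recording_setxattr(const char *name, const void *value, size_t size)
{
	(void)name;
	if (value == NULL)
		saw_null_value = true;
	last_size = size;
	return 0;
}

/*
 * Common entry point: a NULL value means "set an empty attribute",
 * so the hook always receives a valid pointer and a zero size.
 */
int sketch_vfs_setxattr(setxattr_fn fs_setxattr, const char *name,
			const void *value, size_t size)
{
	if (value == NULL) {
		value = "";
		size = 0;
	}
	return fs_setxattr(name, value, size);
}
```

With the normalization done centrally, a filesystem can no longer mistake "empty value" for "remove", which is the reiserfs behaviour discussed above.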
Re: [RFC][PATCH] Entropy generator with 100 kB/s throughput
On Sat, Feb 09, 2013 at 01:06:29PM -0500, Theodore Ts'o wrote: > For that reasons, what I would suggest doing first is generate a > series of outputs of jitterentropy_get_nstime() followed by > schedule(). Look and see if there is any pattern. That's the problem > with the FIPS 140-2 tests. Passing those tests are necessary, but > *NOT* sufficient to prove that you have a good cryptographic > generator. Even the tiniest amount of post-processing, even if they > aren't cryptographic, can result in an utterly predictable series of > numbers to pass the FIPS 140-2 tests. In fact, Stephan's 'xor and shift a counter' design, even with zero input entropy (a counter incrementing at a constant rate), passes FIPS and a surprising fraction of dieharder tests (though some of the tests, such as diehard_count_1s_str, have this generator dead to rights); it also gives an ent entropy well in excess of 7. bits per byte. This means it's hard to be confident that the entropy measured by ent is coming from the input entropy as opposed to the (exceedingly minimal-seeming on the surface!) amount of mixing done by the xor-and-shift... It appears the entropy counted is equal to the log2 of the difference between successive samples (minus one?), but even if you have a good argument why the ones bit is unpredictable it doesn't seem an argument that applies as strongly to the weight-128 bit. When the jitterrand loop runs 10 times, the LSB of that first loop has only gotten up to the 30th bit, so there are 20+ MSBs of the register that have not yet had bits rolled into them that are 'entropic' under the definition being used. Finally, in the 'turbid' random number generator (http://www.av8n.com/turbid/), the author develops a concept of hash saturation. He concludes that if you have a noise source with a known (or assumed!) entropy and a has function that is well-distributed over N bits, you'd better put in M > N bits of entropy in order to be confident that the output register has N bits. 
He appears to suggest adding around 10 extra bits of randomness, or 74 bits of randomness for a 64-bit hash, relatively independently of the size of the hash. This design gathers only 64 bits if you trust the input entropy calculation, which according to the hash saturation calculation means that the output will only have about 63.2 bits of randomness per 64 bits output.

Here's my 'absolutely zero entropy' version of the jitter random algorithm as I understand it:

#include <stdint.h>
#include <unistd.h>

const uint64_t dt = 65309;
uint64_t t, r;

static inline uint64_t rol64(uint64_t word, unsigned int shift)
{
	return (word << shift) | (word >> (64 - shift));
}

uint64_t jitterrand()
{
	int i;
	// each sample from the 'stuck counter' will be accounted as 7 bits of
	// entropy, so 10 cycles to get >= 63 bits of randomness
	for(i=0; i<10; i++) {
		t += dt;
		r = rol64(r ^ t, 3);
	}
	return r;
}

int main()
{
	while(1) {
		uint64_t val = jitterrand();
		ssize_t res = write(1, &val, sizeof(val));
		if(res < 0) break;
	}
	return 0;
}

// Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Entropy generator with 100 kB/s throughput
OK, my original reading of the mixing code was not accurate. This time around, I started with the original posted tarball and turned the use of the CPU clock into a very simple and clearly bad "clock" that will provide no entropy.

--- jitterentropy-0.1/jitterentropy.c 2013-02-08 15:22:22.0 -0600
+++ jitterentropy-0.1-me/jitterentropy.c2013-02-10 09:45:07.0 -0600
@@ -270,12 +270,13 @@ typedef uint64_t __u64;
 static int fips_enabled = 0;
-#define jitterentropy_schedule sched_yield()
+#define jitterentropy_schedule (0)
 static inline void jitterentropy_get_nstime(__u64 *out)
 {
- struct timespec time;
- if (clock_gettime(CLOCK_REALTIME, &time) == 0)
- *out = time.tv_nsec;
+static __u64 t = 0;
+const __u64 delta2 = 257;
+static __u64 delta;
+*out = (t += (delta += delta2));
 }
 /* note: these helper functions are shamelessly stolen from the kernel :-) */

This gives a generator that has Entropy = 7.07 bits per byte and fails 6 in 1 FIPS 140-2 tests. It also passes some (but not all) dieharder tests.

Jeff
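For anyone who wants to look at the raw "clock" values without building the whole tarball, here is a minimal standalone sketch of the same arithmetic as the patched jitterentropy_get_nstime(). The struct and function names (fake_clock, fake_clock_next) are made up for illustration and are not part of jitterentropy; only the constant 257 and the update rule come from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state holder; zero-initialise before first use. */
struct fake_clock {
	uint64_t t;      /* current fake "time" */
	uint64_t delta;  /* current step, grows by DELTA2 on every call */
};

#define DELTA2 257u  /* constant second difference, taken from the diff */

/*
 * Same update as the patched clock:
 *     *out = (t += (delta += delta2));
 * The output is a fully predictable quadratic sequence.
 */
static uint64_t fake_clock_next(struct fake_clock *clk)
{
	assert(clk != NULL);
	clk->delta += DELTA2;
	clk->t += clk->delta;
	return clk->t;
}
```

Starting from a zeroed struct, the successive differences grow by exactly 257 per call, so every value is determined by the call count alone.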
Re: [PATCH 23/32] Generic dynamic per cpu refcounting
Kent Overstreet writes: > On Fri, Feb 08, 2013 at 03:49:02PM +0100, Jens Axboe wrote: [...] >> I'd feel a lot better deferring the whole aio/dio performance series for >> one merge window. There's very little point in rushing it, and I don't >> think it's been reviewed/tested enough yet. > > It could probably use more review, but it has been sitting in linux-next > and the issues that showed up there are all fixed. You going to help > review it? :) > > I'm not really set on it going in this merge cycle, but testing wise I > do think it's in pretty good shape and I'm not sure where we're going to > get more testing from before it goes in. > > And Andrew - apologies for not getting you the benchmarks you asked for, > getting hardware for it has turned out to be more troublesome than I > expected. Still don't know what's going on with that. I'll try to get some benchmarking numbers for this patch set. -Jeff -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: build failure after merge of the final tree (nfsd tree related)
On Wed, 3 Apr 2013 17:42:19 +1100 Stephen Rothwell wrote: > Hi all, > > After merging the final tree, today's linux-next build (arm defconfig) > failed like this: > > fs/built-in.o: In function `nfsd_reply_cache_stats_show': > super.c:(.text+0x87308): undefined reference to `__udivdi3' > > Probably caused by commit 187da2f90879 ("nfsd: keep track of the max and > average time to search the cache") which adds a divide by u64 > (num_searches). > > I have reverted that commit and the following one for today. Thanks, known problem... Looks like Bruce's tree has an older version of that patch series. I think we just need to get him to drop that one and merge the new one. -- Jeff Layton signature.asc Description: PGP signature
Re: linux-next: build failure after merge of the final tree (nfsd tree related)
On Wed, 3 Apr 2013 07:33:01 -0400 "J. Bruce Fields" wrote: > On Wed, Apr 03, 2013 at 07:10:54AM -0400, Jeff Layton wrote: > > On Wed, 3 Apr 2013 17:42:19 +1100 > > Stephen Rothwell wrote: > > > > > Hi all, > > > > > > After merging the final tree, today's linux-next build (arm defconfig) > > > failed like this: > > > > > > fs/built-in.o: In function `nfsd_reply_cache_stats_show': > > > super.c:(.text+0x87308): undefined reference to `__udivdi3' > > > > > > Probably caused by commit 187da2f90879 ("nfsd: keep track of the max and > > > average time to search the cache") which adds a divide by u64 > > > (num_searches). > > > > > > I have reverted that commit and the following one for today. > > > > Thanks, known problem... > > > > Looks like Bruce's tree has an older version of that patch series. I > > think we just need to get him to drop that one and merge the new one. > > Arrgh, sorry--could you remind me which is the new one? > It was the one I sent on 3/19. Those patches (plus a couple more) are also in the current nfsd-3.10 branch of my git tree too, so it may be easiest to just pick them from there. Thanks, -- Jeff Layton -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: build failure after merge of the final tree (nfsd tree related)
On Wed, 3 Apr 2013 14:05:19 -0400 "J. Bruce Fields" wrote: > On Wed, Apr 03, 2013 at 07:38:57AM -0400, Jeff Layton wrote: > > On Wed, 3 Apr 2013 07:33:01 -0400 > > "J. Bruce Fields" wrote: > > > > > On Wed, Apr 03, 2013 at 07:10:54AM -0400, Jeff Layton wrote: > > > > On Wed, 3 Apr 2013 17:42:19 +1100 > > > > Stephen Rothwell wrote: > > > > > > > > > Hi all, > > > > > > > > > > After merging the final tree, today's linux-next build (arm defconfig) > > > > > failed like this: > > > > > > > > > > fs/built-in.o: In function `nfsd_reply_cache_stats_show': > > > > > super.c:(.text+0x87308): undefined reference to `__udivdi3' > > > > > > > > > > Probably caused by commit 187da2f90879 ("nfsd: keep track of the max > > > > > and > > > > > average time to search the cache") which adds a divide by u64 > > > > > (num_searches). > > > > > > > > > > I have reverted that commit and the following one for today. > > > > > > > > Thanks, known problem... > > > > > > > > Looks like Bruce's tree has an older version of that patch series. I > > > > think we just need to get him to drop that one and merge the new one. > > > > > > Arrgh, sorry--could you remind me which is the new one? > > > > > > > It was the one I sent on 3/19. Those patches (plus a couple more) are > > also in the current nfsd-3.10 branch of my git tree too, so it may be > > easiest to just pick them from there. > > I hate rewriting that branch, but OK, done: does my for-3.10 look right > to you now? > > (It's still missing some of your latest patches.) > > --b. Yeah, sorry for that... I didn't find the problem with __udivdi3 until after I had asked you to merge the earlier set. Mea culpa... Latest branch looks good. It would be good to get the later patches in too, but those are less important than the DRC ones. 
Thanks, -- Jeff Layton
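Background on the `__udivdi3' failure in this thread, as a hedged sketch: on 32-bit targets such as the arm defconfig, a plain u64 / u64 expression is compiled into a call to a libgcc helper, which the kernel does not link in. The routine below is only a userspace illustration of what such a helper has to do (shift-and-subtract division); the name udiv64_soft is made up, and in-kernel code would normally use the existing helpers such as do_div() or div64_u64() instead. I have not checked which of those the revised patch series uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative restoring (shift-and-subtract) division.
 * Returns dividend / divisor without using the '/' operator on
 * 64-bit operands, i.e. the work hidden behind __udivdi3.
 */
static uint64_t udiv64_soft(uint64_t dividend, uint64_t divisor)
{
	uint64_t quotient = 0;
	uint64_t remainder = 0;
	int bit;

	assert(divisor != 0);

	for (bit = 63; bit >= 0; bit--) {
		/* bring down the next dividend bit */
		remainder = (remainder << 1) | ((dividend >> bit) & 1);
		if (remainder >= divisor) {
			remainder -= divisor;
			quotient |= (uint64_t)1 << bit;
		}
	}
	return quotient;
}
```

The loop cost is one reason the kernel asks callers to pick a division helper explicitly rather than hiding it behind an operator.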
Re: [PATCH v3] ata: Fix DVD not dectected at some Haswell platforms
On 03/06/2013 10:49 AM, Youquan Song wrote: There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d "ata_piix: make DVD Drive recognisable on systems with Intel Sandybridge chipsets(v2)" fixing the 4 ports IDE controller 32bit PIO mode. We've hit a problem with DVD not recognized on Haswell Desktop platform which includes Lynx Point 2-port SATA controller. This quirk patch disables 32bit PIO on this controller in IDE mode. v2: Change spelling error in statememnt pointed by Sergei Shtylyov. v3: Change comment statememnt and spliting line over 80 characters pointed by Libor Pechacek and also rebase the patch against 3.8-rc7 kernel. Tested-by: Lee, Chun-Yi Signed-off-by: Youquan Song Cc: sta...@vger.kernel.org --- drivers/ata/ata_piix.c | 14 +- 1 files changed, 13 insertions(+), 1 deletions(-) applied
Re: ata: HDIO_DRIVE_* ioctl() Linux 3.9 regression
On 03/27/2013 08:51 AM, Krzysztof Mazur wrote:

On Mon, Mar 25, 2013 at 06:26:50PM +0100, Ronald wrote:

In reply to [1]: I have the same issue. Git bisect took 50+ rebuilds xD Smartd does not work anymore since 84a9a8cd9 ([libata] Set proper SK when CK_COND is set.). I hope I'm not stepping on anyone's toe's by chosing the same title. I'm not subscribed to this list. Just wanted to add a 'me2'

[1] http://www.spinics.net/lists/linux-ide/msg45268.html

It seems that the SAM_STAT_CHECK_CONDITION is not cleared causing -EIO, because that patch modified sensebuf and the check for clearing SAM_STAT_CHECK_CONDITION is no longer valid.

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 318b413..ff44787 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -532,8 +532,8 @@ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg)
 struct scsi_sense_hdr sshdr;
 scsi_normalize_sense(sensebuf, SCSI_SENSE_BUFFERSIZE, &sshdr);
- if (sshdr.sense_key == 0 &&
- sshdr.asc == 0 && sshdr.ascq == 0)
+ if (sshdr.sense_key == RECOVERED_ERROR &&
+ sshdr.asc == 0 && sshdr.ascq == 0x1d)
 cmd_result &= ~SAM_STAT_CHECK_CONDITION;
 }
@@ -618,8 +618,8 @@ int ata_task_ioctl(struct scsi_device *scsidev, void __user *arg)
 struct scsi_sense_hdr sshdr;
 scsi_normalize_sense(sensebuf, SCSI_SENSE_BUFFERSIZE, &sshdr);
- if (sshdr.sense_key == 0 &&
- sshdr.asc == 0 && sshdr.ascq == 0)
+ if (sshdr.sense_key == RECOVERED_ERROR &&
+ sshdr.asc == 0 && sshdr.ascq == 0x1d)
 cmd_result &= ~SAM_STAT_CHECK_CONDITION;

applied
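The check changed by the diff above can be tried out in isolation with a small userspace sketch. The struct and function names here (sense_hdr, filter_cmd_result) are simplified stand-ins invented for illustration, not the kernel's struct scsi_sense_hdr or any libata function; the numeric values of RECOVERED_ERROR (0x01) and SAM_STAT_CHECK_CONDITION (0x02) are the usual SCSI ones as far as I know, and ASC/ASCQ 00h/1Dh is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAM_STAT_CHECK_CONDITION 0x02
#define RECOVERED_ERROR          0x01

/* Simplified stand-in for the three fields the check looks at. */
struct sense_hdr {
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/*
 * Clear CHECK CONDITION only when the sense data is the
 * "task registers returned" marker used after commit 84a9a8cd9:
 * RECOVERED_ERROR with ASC/ASCQ 00h/1Dh.
 */
static int filter_cmd_result(int cmd_result, const struct sense_hdr *sshdr)
{
	assert(sshdr != NULL);
	if (sshdr->sense_key == RECOVERED_ERROR &&
	    sshdr->asc == 0x00 && sshdr->ascq == 0x1d)
		cmd_result &= ~SAM_STAT_CHECK_CONDITION;
	return cmd_result;
}
```

With the old all-zero comparison, sense data of the new form never matches, the status bit stays set, and the ioctl ends up returning -EIO, which is the smartd regression reported here.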
Re: [PATCH] [libata] Fix HDIO_DRIVE_CMD ioctl sense data check
On 03/29/2013 01:56 AM, Gwendal Grignou wrote: commit 84a9a8cd9d0aa93c17e5815ab8a9cc4c0a765c63 changed the sense key used for returning task registers, but HDIO_DRIVE_CMD ioctl was not changed accordingly. Tested: check that SMART ENABLE sent using HDIO_DRIVE_CMD returns 0 instead of EIO. Signed-off-by: Gwendal Grignou --- drivers/ata/libata-scsi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) applied the version from Krzysztof Mazur, which covered both cases
Re: [PATCH] drivers: ata: Use resource_size function
On 03/16/2013 10:32 AM, Alexandru Gheorghiu wrote: Use resource_size function instead of explicit computation. Patch found using coccinelle. Signed-off-by: Alexandru Gheorghiu --- drivers/ata/pata_octeon_cf.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) applied
Re: [PATCH RFC 1/1] AHCI: Optimize interrupt processing
On 03/06/2013 06:26 AM, Alexander Gordeev wrote: Split interrupt service routine into hardware context handler and threaded context handler. That allows to protect ports with individual locks rather than with a single host-wide lock, which results in better parallelism. Signed-off-by: Alexander Gordeev --- drivers/ata/acard-ahci.c|8 ++--- drivers/ata/ahci.c | 54 ++- drivers/ata/ahci.h | 10 +++-- drivers/ata/ahci_platform.c |3 +- drivers/ata/libahci.c | 74 +-- 5 files changed, 85 insertions(+), 64 deletions(-) applied
Re: [PATCH v0] Add SHA-3 hash algorithm
On 10/03/2012 01:45 AM, Jeff Garzik wrote: Whee -- SHA-3 is out! I wanted to explore the new toy a bit, and so, here is a blatantly untested rough draft of SHA-3 kernel support. Why rough draft? Because answers to the questions below will inform a more polished version. Just to update people... this has been in a holding pattern, because apparently there are revisions to SHA-3 coming down the pipe. They want to address preimage resistance, and make things faster in hardware. Random quote from NIST, on the NIST hash-forum, which doesn't provide detail but does summarize general feeling: "As best we can tell, continuing to pay that performance penalty for all future uses of SHA3 has no benefit. (All this is a longwinded way of saying: we were wrong, but hopefully we got better.)" Jeff -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 6/7] NFSv4: Add O_DENY* open flags support
On Thu, 4 Apr 2013 14:30:12 +0400 Pavel Shilovsky wrote: > 2013/3/12 Jeff Layton : > > On Mon, 11 Mar 2013 14:54:34 -0400 > > Jeff Layton wrote: > > > >> On Thu, 28 Feb 2013 19:25:32 +0400 > >> Pavel Shilovsky wrote: > >> > >> > by passing these flags to NFSv4 open request. > >> > > >> > Signed-off-by: Pavel Shilovsky > >> > --- > >> > fs/nfs/nfs4xdr.c | 24 > >> > 1 file changed, 20 insertions(+), 4 deletions(-) > >> > > >> > diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c > >> > index 26b1439..58ddc74 100644 > >> > --- a/fs/nfs/nfs4xdr.c > >> > +++ b/fs/nfs/nfs4xdr.c > >> > @@ -1325,7 +1325,8 @@ static void encode_lookup(struct xdr_stream *xdr, > >> > const struct qstr *name, struc > >> > encode_string(xdr, name->len, name->name); > >> > } > >> > > >> > -static void encode_share_access(struct xdr_stream *xdr, fmode_t fmode) > >> > +static void encode_share_access(struct xdr_stream *xdr, fmode_t fmode, > >> > + int open_flags) > >> > { > >> > __be32 *p; > >> > > >> > @@ -1343,7 +1344,22 @@ static void encode_share_access(struct xdr_stream > >> > *xdr, fmode_t fmode) > >> > default: > >> > *p++ = cpu_to_be32(0); > >> > } > >> > - *p = cpu_to_be32(0);/* for linux, share_deny = 0 always > >> > */ > >> > + if (open_flags & O_DENYMAND) { > >> > >> > >> As Bruce mentioned, I think a mount option to enable this on a per-fs > >> basis would be a better approach than this new O_DENYMAND flag. 
> >> > >> > >> > + switch (open_flags & (O_DENYREAD|O_DENYWRITE)) { > >> > + case O_DENYREAD: > >> > + *p = cpu_to_be32(NFS4_SHARE_DENY_READ); > >> > + break; > >> > + case O_DENYWRITE: > >> > + *p = cpu_to_be32(NFS4_SHARE_DENY_WRITE); > >> > + break; > >> > + case O_DENYREAD|O_DENYWRITE: > >> > + *p = cpu_to_be32(NFS4_SHARE_DENY_BOTH); > >> > + break; > >> > + default: > >> > + *p = cpu_to_be32(0); > >> > + } > >> > + } else > >> > + *p = cpu_to_be32(0); > >> > } > >> > > >> > static inline void encode_openhdr(struct xdr_stream *xdr, const struct > >> > nfs_openargs *arg) > >> > @@ -1354,7 +1370,7 @@ static inline void encode_openhdr(struct > >> > xdr_stream *xdr, const struct nfs_opena > >> > * owner 4 = 32 > >> > */ > >> > encode_nfs4_seqid(xdr, arg->seqid); > >> > - encode_share_access(xdr, arg->fmode); > >> > + encode_share_access(xdr, arg->fmode, arg->open_flags); > >> > p = reserve_space(xdr, 36); > >> > p = xdr_encode_hyper(p, arg->clientid); > >> > *p++ = cpu_to_be32(24); > >> > @@ -1491,7 +1507,7 @@ static void encode_open_downgrade(struct > >> > xdr_stream *xdr, const struct nfs_close > >> > encode_op_hdr(xdr, OP_OPEN_DOWNGRADE, decode_open_downgrade_maxsz, > >> > hdr); > >> > encode_nfs4_stateid(xdr, arg->stateid); > >> > encode_nfs4_seqid(xdr, arg->seqid); > >> > - encode_share_access(xdr, arg->fmode); > >> > + encode_share_access(xdr, arg->fmode, 0); > >> > } > >> > > >> > static void > >> > >> > >> Other than that, this seems reasonable. > >> > >> Acked-by: Jeff Layton > > > > Oh duh... > > > > Please ignore my comment on patch #7 to add a patch for the NFS client. > > This one does that. That said, there may be a potential problem here > > that you need to consider. > > > > In the case of a local filesystem you'll want to set deny locks using > > deny_lock_file(). For a network filesystem like CIFS or NFS though, > > the server will handle that atomically during the open. 
You need to > > ensure that you don't go trying to set LOCK_MAND locks on the file once > > that's done. > > > > Perhaps you can use a fstype flag to indicate that the filesystem > > handles this during the open and doesn't need to try and set a flock > > lock? > > Also, we can simply mask off O_DENY* flags in open (and atomic_open) > codepath of filesystems that support these flags: > > ... > do open request to the storage > ... > file->f_flags &= ~(O_DENYREAD | O_DENYWRITE | O_DENYDELETE); > ... > return to VFS > ... > > Thoughts? > I'd probably still stick with a FS_* flag for this... That sort of mechanism would work (for now) but sounds like the sort of subtle behavior that's difficult for filesystem authors to get right. It would also be subject to subtle breakage later. Also, suppose there are changes in the future that require you to determine this before calling into ->open? Then you'll have to go back and somehow mark the fs anyway... -- Jeff Layton -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
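To make the mapping in the quoted encode_share_access() hunk easier to follow, here is a hedged userspace sketch of just the flag translation. The EX_O_DENY* bit values are invented for this example (the real values come from the O_DENY* patch series, which I am not reproducing), the function name is hypothetical, and the byte-order conversion done by cpu_to_be32() in the patch is left out. The NFS4_SHARE_DENY_* numbers are the NFSv4 protocol values as I understand them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define EX_O_DENYREAD   0x1u
#define EX_O_DENYWRITE  0x2u
#define EX_O_DENYMAND   0x4u
#define EX_O_DENY_ALL   (EX_O_DENYREAD | EX_O_DENYWRITE | EX_O_DENYMAND)

/* NFSv4 share_deny values. */
#define NFS4_SHARE_DENY_NONE  0u
#define NFS4_SHARE_DENY_READ  1u
#define NFS4_SHARE_DENY_WRITE 2u
#define NFS4_SHARE_DENY_BOTH  3u

/* Translate open flags into the share_deny word sent in OPEN. */
static uint32_t share_deny_from_flags(unsigned int open_flags)
{
	/* this sketch only understands the three bits defined above */
	assert((open_flags & ~EX_O_DENY_ALL) == 0);

	/* without the opt-in flag, behave as before: deny nothing */
	if (!(open_flags & EX_O_DENYMAND))
		return NFS4_SHARE_DENY_NONE;

	switch (open_flags & (EX_O_DENYREAD | EX_O_DENYWRITE)) {
	case EX_O_DENYREAD:
		return NFS4_SHARE_DENY_READ;
	case EX_O_DENYWRITE:
		return NFS4_SHARE_DENY_WRITE;
	case EX_O_DENYREAD | EX_O_DENYWRITE:
		return NFS4_SHARE_DENY_BOTH;
	default:
		return NFS4_SHARE_DENY_NONE;
	}
}
```

If the opt-in moved to a mount option, as suggested in the review, the first check would presumably test per-filesystem state instead of a per-open flag.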
Re: [PATCH V6 00/30] loop: Issue O_DIRECT aio using bio_vec
Dave Kleikamp writes: > Al, > I'd like to push this patchset to linux-next. Would you like to pull it > into your vfs tree, would you rather I submitted it separately, or do > you have any issues with it before including it? I'm still chasing one regression in this patchset. If you use the ext4 driver for ext2 file systems, and you run the libaio test harness, then you will be able to successfully write beyond the maximum file size in a file (see test case 8). Cheers, Jeff
[PATCH] ptp: PTP_1588_CLOCK_PCH depends on x86
The EG20T PCH is only compatible with Intel Atom processors so it should depend on x86.

Cc: Richard Cochran
Signed-off-by: Jeff Mahoney
---
 drivers/ptp/Kconfig |1 +
 1 file changed, 1 insertion(+)

--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -72,6 +72,7 @@ config DP83640_PHY
 config PTP_1588_CLOCK_PCH
 	tristate "Intel PCH EG20T as PTP clock"
+	depends on X86
 	select PTP_1588_CLOCK
 	help
 	  This driver adds support for using the PCH EG20T as a PTP

--
Jeff Mahoney
SUSE Labs
Re: [PATCH V6 00/30] loop: Issue O_DIRECT aio using bio_vec
Dave Kleikamp writes: > On 01/29/2013 12:42 PM, Jeff Moyer wrote: >> Dave Kleikamp writes: >> >>> Al, >>> I'd like to push this patchset to linux-next. Would you like to pull it >>> into your vfs tree, would you rather I submitted it separately, or do >>> you have any issues with it before including it? >> >> I'm still chasing one regression in this patchset. If you use the ext4 >> driver for ext2 file systems, and you run the libaio test harness, then >> you will be able to successfully write beyond the maximum file size in a >> file (see test case 8). > > I found the problem. iov_iter_shorten() wasn't setting i->count to the new > value. > > This fixes it. I'll fix the patchset tomorrow. I just re-ran the test, and I can confirm it fixed it as well. Thanks! Jeff
[RESEND] [PATCH] kernel/res_counter.c: remove useless return statement at res_counter_member()
The return statement after BUG() is unreachable, so move BUG() into the default case of the switch.

Signed-off-by: Jie Liu
---
 kernel/res_counter.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index ff55247..748a3bc 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -135,10 +135,9 @@ res_counter_member(struct res_counter *counter, int member)
 		return &counter->failcnt;
 	case RES_SOFT_LIMIT:
 		return &counter->soft_limit;
+	default:
+		BUG();
 	};
-
-	BUG();
-	return NULL;
 }

 ssize_t res_counter_read(struct res_counter *counter, int member,
--
1.7.9.5
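The shape of the change can be illustrated outside the kernel with a small sketch. Everything here is a stand-in: the ex_ names and the reduced member list are invented, and abort() plays the role of BUG() because both never return, which is why no trailing return statement is needed after the switch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced set of members for illustration. */
enum { EX_RES_USAGE, EX_RES_LIMIT, EX_RES_FAILCNT };

struct ex_counter {
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long failcnt;
};

/* Return a pointer to the requested member; unknown members are fatal. */
static unsigned long long *ex_counter_member(struct ex_counter *counter,
					     int member)
{
	assert(counter != NULL);

	switch (member) {
	case EX_RES_USAGE:
		return &counter->usage;
	case EX_RES_LIMIT:
		return &counter->limit;
	case EX_RES_FAILCNT:
		return &counter->failcnt;
	default:
		abort();	/* userspace stand-in for BUG(); does not return */
	}
}
```

Keeping the fatal path inside the switch also lets the compiler see that every path either returns a member or never returns at all.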
Re: next-20130117 - kernel BUG with aio
Zach Brown writes: >> No, I didn't see that bug until after I'd fixed the other three, but as >> far as I can tell everything's fixed with the patches I'm about to mail >> out - my test VM has been running for the past two days without errors, >> it's kill -9'ing a process that's got iocbs in flight to a loopback >> device every two seconds. > > I'm really worried that this patch series hasn't seen significant enough > testing to justify being queued. > > I'll be first in line for blame for not finding the time to finish my > review of the series. > > What specific tests has this gone through? The aio tests in xfstests / > ltp? (And as you discovered while chasing this bug, whatever platform > you were running on doesn't seem slow enough to catch some paths.. run > all the tests over loop?) > > Jeff, can you suggest a more modern testing regime for the aio core? > It's been so long since I had to hammer on this stuff.. Modern? No. ;-) I usually use xfstests (all of them, not just the aio group), the libaio test harness, and then hand it off to our performance team to stress the code under benchmarking workloads. Oh, and usually targeted testing for the thing that I'm working on. I'll put a couple of kernels together to hand off to our performance team, though I don't know how much time they have at present. Cheers, Jeff -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: IBM RamSan 70/80 device driver.
"Philip J. Kelleher" writes: > From: Joshua H Morris > Philip J Kelleher > > This patch includes the device driver for the IBM RamSan family > of PCI SSD flash storage cards. This driver will inlcude support for the > RamSan 70 and 80. The driver presents a block device for device I/O. Hi, Your driver does not handle REQ_FLUSH. Does that mean that the supported cards do not have a volatile write-back cache? > + blk_size = rsxx_get_logical_block_size(card); > + > + blk_queue_make_request(card->queue, rsxx_make_request); > + blk_queue_bounce_limit(card->queue, BLK_BOUNCE_ANY); > + blk_queue_dma_alignment(card->queue, blk_size - 1); > + blk_queue_max_hw_sectors(card->queue, blkdev_max_hw_sectors); > + blk_queue_logical_block_size(card->queue, blk_size); > + blk_queue_physical_block_size(card->queue, RSXX_HW_BLK_SIZE); > + blk_queue_max_discard_sectors(card->queue, RSXX_HW_BLK_SIZE >> 9); Did you mean to set max_discard_sectors inside the below for loop? Either way, do you really only support a single hardware sector discard? > + queue_flag_set_unlocked(QUEUE_FLAG_NONROT, card->queue); > + if (rsxx_discard_supported(card)) { > + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, card->queue); > + card->queue->limits.discard_granularity = RSXX_HW_BLK_SIZE; > + card->queue->limits.discard_alignment = RSXX_HW_BLK_SIZE; > + card->queue->limits.discard_zeroes_data = 1; > + } Cheers, Jeff -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/3] ahci: AHCI-mode SATA patch for Intel Avoton DeviceIDs
On 01/25/2013 03:01 PM, Seth Heasley wrote: This patch adds the AHCI and RAID-mode SATA DeviceIDs for the Intel Avoton SOC. Signed-off-by: Seth Heasley --- drivers/ata/ahci.c | 16 1 file changed, 16 insertions(+) applied 1-2
[PATCH] checkpatch.pl: Fix warnings on code comments
The following commit:

commit 058806007450489bb8f457b275e5cb5c946320c1
Author: Joe Perches
Date: Thu Oct 4 17:13:35 2012 -0700

    checkpatch: check networking specific block comment style

produces warnings on code comments which follow the Linux coding style guide. While the desired code comment style for networking may differ from the rest of the kernel, both styles should be permitted. This patch reverts a portion of the commit to allow multi-line code comments to use either style.

Signed-off-by: Jeff Kirsher
Tested-by: Jeff Pieper
---
 scripts/checkpatch.pl | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 4d2c7df..d3ffec5 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1878,13 +1878,6 @@ sub process {
 		}

 		if ($realfile =~ m@^(drivers/net/|net/)@ &&
-		    $rawline =~ /^\+[ \t]*\/\*[ \t]*$/ &&
-		    $prevrawline =~ /^\+[ \t]*$/) {
-			WARN("NETWORKING_BLOCK_COMMENT_STYLE",
-			     "networking block comments don't use an empty /* line, use /* Comment...\n" . $hereprev);
-		}
-
-		if ($realfile =~ m@^(drivers/net/|net/)@ &&
 		    $rawline !~ m@^\+[ \t]*\*/[ \t]*$@ &&	#trailing */
 		    $rawline !~ m@^\+.*/\*.*\*/[ \t]*$@ &&	#inline /*...*/
 		    $rawline !~ m@^\+.*\*{2,}/[ \t]*$@ &&	#trailing **/
--
1.7.11.7
Re: [PATCH] checkpatch.pl: Fix warnings on code comments
On Sun, 2013-01-27 at 18:59 -0500, David Miller wrote:
> From: Jeff Kirsher
> Date: Sun, 27 Jan 2013 03:35:39 -0800
>
> > Produces warnings on code comments which follow the Linux coding style
> > guide. While the desired code comment style for networking my differ
> > from the rest of the kernel, both styles should be permitted.
>
> I was actually going to mention to you guys that I've been lackadasical
> about enforcing the comment style I want with the Intel drivers.
>
> That was a mistake, I should have enforced it strictly, as I do for
> the other drivers and the core networking code, from the beginning.
>
> And it's clearly a mistake if you feel the need to take out the very
> checkpatch working that's meant to enforce this comment style in all
> of the networking drivers and core.
>
> Do not revert this, follow it's advice instead.

Ok, I am fine with that. I just had not seen any emails/responses that this was the direction you wanted to go.

So will you be fine with cleanup patches which go through and convert all the existing code comments to the desired format? If so, I will get started on patches to clean up and convert the Intel drivers to the desired code comment style.
Re: [PATCH] checkpatch.pl: Fix warnings on code comments
On Mon, 2013-01-28 at 09:30 -0800, Joe Perches wrote: > On Mon, 2013-01-28 at 17:17 +, Allan, Bruce W wrote: > > David Miller Sent: Sunday, January 27, 2013 7:07 PM > > > From: Jeff Kirsher > > > > So will you be fine with cleanup patches which go through and > > > > convert all the existing code comments to the desired format? > > > Sure. > > Not all Intel drivers...e1000e already conforms to the comment style :-) > > In case anyone cares, here's a perl regex > to do network comment style conversion. > > $text =~ s@/\*[ \t]*\n[ \t]*\*@/*@g; > $text =~ s@/\*([ \t]*)([^\n]+)\n[ \t]*\*/@/\*$1$2 \*/@g; > > (assumes the entire file is in $text) > > It creates a ~220KB diff for drivers/net/ethernet/intel/ > that I won't post. > Thanks Joe, I will get patches to take care of the Intel drivers (minus e1000e since Bruce has already completed that work). signature.asc Description: This is a digitally signed message part
Re: Reproduceable SATA lockup on 3.7.8 with SSD
On 02/25/2013 07:27 PM, Marc MERLIN wrote: Howdy, I seem to have the same problem (or similar) as Mathieu Desnoyers in https://lkml.org/lkml/2013/2/22/437 I can reliably get my SSD to drop from the SATA bus given the right workload on linux. How can I tell if it's linux's fault of the drive's fault? Manually force speed to 3.0 Gbps, then 1.5 Gbps, and see what happens. Try module/kernel parameter libata.force=1.5Gbps or libata.force=3.0Gbps Jeff -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] ACPI and power management fixes for v3.9-rc1
On 02/26/2013 11:58 AM, Tejun Heo wrote: On Tue, Feb 26, 2013 at 08:47:30AM -0800, Linus Torvalds wrote: Anyway, in the US it is definitely not a common term for normal people. Googling "odd" doesn't give anything on optical drives on the first page. On the other hand, >70% is about optical drives on naver.com. The discrepancy is funny given that most computer terms in Korea come from US. Maybe it's because the character combination "odd" doesn't have any other meaning. Even then, I'm surprised there's no optical drive result at all in the first page of google search. Definitely doesn't seem like a common term in US. There is just a lot more "odd" goings-on in the US. Korea is simply less odd than the US :) Will send a patch to fix... Jeff -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] CIFS: Decrease reconnection delay when switching nics
On Wed, 27 Feb 2013 12:06:14 +0100
"Stefan (metze) Metzmacher" wrote:

> Hi Dave,
>
> > When messages are currently in queue awaiting a response, decrease amount of
> > time before attempting cifs_reconnect to SMB_MAX_RTT = 10 seconds. The current
> > wait time before attempting to reconnect is currently 2*SMB_ECHO_INTERVAL(120
> > seconds) since the last response was received. This does not take into account
> > the fact that messages waiting for a response should be serviced within a
> > reasonable round trip time.
>
> Wouldn't that mean that the client will disconnect a good connection,
> if the server doesn't respond within 10 seconds?
> Reads and Writes can take longer than 10 seconds...
>

Where does this magic value of 10s come from? Note that a slow server
can take *minutes* to respond to writes that are long past the EOF.

> > This fixes the issue where user moves from wired to wireless or vice versa
> > causing the mount to hang for 120 seconds, when it could reconnect considerably
> > faster. After this fix it will take SMB_MAX_RTT (10 seconds) from the last
> > time the user attempted to access the volume or SMB_MAX_RTT after the last
> > echo. The worst case of the latter scenario being
> > 2*SMB_ECHO_INTERVAL+SMB_MAX_RTT+small scheduling delay (about 130 seconds).
> > Statistically speaking it would normally reconnect sooner. However in the best
> > case where the user changes nics, and immediately tries to access the cifs
> > share it will take SMB_MAX_RTT=10 seconds.
>
> I think it would be better to detect the broken connection
> by using an AF_NETLINK socket listening for RTM_DELADDR
> messages?
>
> metze
>

Ick -- that sounds horrid ;)

Dave, this problem sounds very similar to the one that your colleague
Chris J Arges was trying to solve several months ago. You may want to
go back and review that thread. Perhaps you can solve both problems at
the same time here...
-- Jeff Layton
Re: [PATCH] CIFS: Decrease reconnection delay when switching nics
On Wed, 27 Feb 2013 16:24:07 -0600 Dave Chiluk wrote: > On 02/27/2013 10:34 AM, Jeff Layton wrote: > > On Wed, 27 Feb 2013 12:06:14 +0100 > > "Stefan (metze) Metzmacher" wrote: > > > >> Hi Dave, > >> > >>> When messages are currently in queue awaiting a response, decrease amount > >>> of > >>> time before attempting cifs_reconnect to SMB_MAX_RTT = 10 seconds. The > >>> current > >>> wait time before attempting to reconnect is currently > >>> 2*SMB_ECHO_INTERVAL(120 > >>> seconds) since the last response was recieved. This does not take into > >>> account > >>> the fact that messages waiting for a response should be serviced within a > >>> reasonable round trip time. > >> > >> Wouldn't that mean that the client will disconnect a good connection, > >> if the server doesn't response within 10 seconds? > >> Reads and Writes can take longer than 10 seconds... > >> > > > > Where does this magic value of 10s come from? Note that a slow server > > can take *minutes* to respond to writes that are long past the EOF. > It comes from the desire to decrease the reconnection delay to something > better than a random number between 60 and 120 seconds. I am not > committed to this number, and it is open for discussion. Additionally > if you look closely at the logic it's not 10 seconds per request, but > actually when requests have been in flight for more than 10 seconds make > sure we've heard from the server in the last 10 seconds. > > Can you explain more fully your use case of writes that are long past > the EOF? Perhaps with a test-case or script that I can test? As far as > I know writes long past EOF will just result in a sparse file, and > return in a reasonable round trip time *(that's at least what I'm seeing > with my testing). dd if=/dev/zero of=/mnt/cifs/a bs=1M count=100 > seek=10, starts receiving responses from the server in about .05 > seconds with subsequent responses following at roughly .002-.01 second > intervals. This is well within my 10 second value. 
Even adding the > latency of AT&T's 2g cell network brings it up to only 1s. Still 10x > less than my 10 second value. > > The new logic goes like this > if( we've been expecting a response from the server (in_flight), and > message has been in_flight for more than 10 seconds and > we haven't had any other contact from the server in that time > reconnect > That will break writes long past the EOF. Note too that reconnects on CIFS are horrifically expensive and problematic. Much of the state on a CIFS mount is tied to the connection. When that drops, open files are closed and things like locks are dropped. SMB1 has no real mechanism for state recovery, so that can really be a problem. > On a side note, I discovered a small race condition in the previous > logic while working on this, that my new patch also fixes. > 1s request > 2s response > 61.995 echo job pops > 121.995 echo job pops and sends echo > 122 server_unresponsive called. Finds no response and attempts to >reconnect > 122.95 response to echo received > Sure, here's a reproducer. Do this against a windows server, preferably one exporting NTFS on relatively slow storage. Make sure that "testfile" doesn't exist first: $ dd if=/dev/zero of=/path/to/cifs/share/testfile bs=1M count=1 seek=3192 NTFS doesn't support sparse files, so the OS has to zero-fill up to the point where you're writing. That can take a long time on slow storage (minutes even). What we do now is periodically send a SMB echo to make sure the server is alive rather than trying to time out a particular call. The logic that handles that today is somewhat sub-optimal though. We send an echo every 60s whether there are any calls in flight or not and wait for 60s until we decide that the server isn't there. What would be better is to only send one when we've been waiting a long time for a response. 
That "long time" is debatable -- 10s would be fine with me but the logic needs to be fixed not to send echoes unless there is an outstanding request first.

I think though that you're trying to use this mechanism to do something that it wasn't really designed to do. A better method might be to try and detect whether the TCP connection is really dead somehow. That would be more immediate, but I'm unclear on how best to do that. Probably it'll mean groveling around down in the TCP layer...

FWIW, there was a thread on the linux-cifs mailing list started on Dec 3, 2010 entitled "cifs client timeouts and hard/soft mounts" that lays out the rationale for the current reconnection behavior. You may want to look over that before you go making changes here...

-- Jeff Layton
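To make the two policies in this exchange easier to compare, here is a toy model in Python. It is not the kernel code: `SMB_ECHO_INTERVAL` and `SMB_MAX_RTT` take the values quoted in the thread, while the function names and the use of plain seconds for timestamps are assumptions made purely for illustration.

```python
from typing import Optional

SMB_ECHO_INTERVAL = 60.0  # seconds, as quoted in the thread
SMB_MAX_RTT = 10.0        # seconds, the value proposed in the patch


def unresponsive_current(now: float, last_response: float) -> bool:
    """Existing policy as described: reconnect once nothing has been
    heard from the server for 2 * SMB_ECHO_INTERVAL."""
    return now - last_response >= 2 * SMB_ECHO_INTERVAL


def unresponsive_proposed(now: float, last_response: float,
                          oldest_in_flight: Optional[float]) -> bool:
    """Proposed policy as described: reconnect only if a request has been
    in flight for more than SMB_MAX_RTT and the server has been silent
    for that long as well."""
    if oldest_in_flight is None:
        return False  # nothing outstanding, so nothing to time out
    waited_too_long = now - oldest_in_flight > SMB_MAX_RTT
    server_silent = now - last_response > SMB_MAX_RTT
    return waited_too_long and server_silent


if __name__ == "__main__":
    # Race from the timeline: last response at 2s, echo sent at 121.995s,
    # check runs at 122s before the echo reply (122.95s) can arrive.
    print(unresponsive_current(122.0, 2.0))
    print(unresponsive_proposed(122.0, 2.0, 121.995))
    # Slow zero-fill: write sent at 0s, server still busy 30s later.
    print(unresponsive_proposed(30.0, 0.0, 0.0))
```

The last case is the objection raised above: with a 10s threshold the model reconnects while the server is still working on a legitimate request, unless an echo reply refreshes `last_response` in the meantime.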
Re: [PATCH] CIFS: Decrease reconnection delay when switching nics
On Thu, 28 Feb 2013 10:04:36 -0600 Steve French wrote: > On Thu, Feb 28, 2013 at 9:26 AM, Jeff Layton wrote: > > On Wed, 27 Feb 2013 16:24:07 -0600 > > Dave Chiluk wrote: > > > >> On 02/27/2013 10:34 AM, Jeff Layton wrote: > >> > On Wed, 27 Feb 2013 12:06:14 +0100 > >> > "Stefan (metze) Metzmacher" wrote: > >> > > >> >> Hi Dave, > >> >> > >> >>> When messages are currently in queue awaiting a response, decrease > >> >>> amount of > >> >>> time before attempting cifs_reconnect to SMB_MAX_RTT = 10 seconds. The > >> >>> current > >> >>> wait time before attempting to reconnect is currently > >> >>> 2*SMB_ECHO_INTERVAL(120 > >> >>> seconds) since the last response was recieved. This does not take > >> >>> into account > >> >>> the fact that messages waiting for a response should be serviced > >> >>> within a > >> >>> reasonable round trip time. > >> >> > >> >> Wouldn't that mean that the client will disconnect a good connection, > >> >> if the server doesn't response within 10 seconds? > >> >> Reads and Writes can take longer than 10 seconds... > >> >> > >> > > >> > Where does this magic value of 10s come from? Note that a slow server > >> > can take *minutes* to respond to writes that are long past the EOF. > >> It comes from the desire to decrease the reconnection delay to something > >> better than a random number between 60 and 120 seconds. I am not > >> committed to this number, and it is open for discussion. Additionally > >> if you look closely at the logic it's not 10 seconds per request, but > >> actually when requests have been in flight for more than 10 seconds make > >> sure we've heard from the server in the last 10 seconds. > >> > >> Can you explain more fully your use case of writes that are long past > >> the EOF? Perhaps with a test-case or script that I can test? As far as > >> I know writes long past EOF will just result in a sparse file, and > >> return in a reasonable round trip time *(that's at least what I'm seeing > >> with my testing). 
dd if=/dev/zero of=/mnt/cifs/a bs=1M count=100 > >> seek=10, starts receiving responses from the server in about .05 > >> seconds with subsequent responses following at roughly .002-.01 second > >> intervals. This is well within my 10 second value. Even adding the > >> latency of AT&T's 2g cell network brings it up to only 1s. Still 10x > >> less than my 10 second value. > >> > >> The new logic goes like this > >> if( we've been expecting a response from the server (in_flight), and > >> message has been in_flight for more than 10 seconds and > >> we haven't had any other contact from the server in that time > >> reconnect > >> > > > > That will break writes long past the EOF. Note too that reconnects on > > CIFS are horrifically expensive and problematic. Much of the state on a > > CIFS mount is tied to the connection. When that drops, open files are > > closed and things like locks are dropped. SMB1 has no real mechanism > > for state recovery, so that can really be a problem. > > > >> On a side note, I discovered a small race condition in the previous > >> logic while working on this, that my new patch also fixes. > >> 1s request > >> 2s response > >> 61.995 echo job pops > >> 121.995 echo job pops and sends echo > >> 122 server_unresponsive called. Finds no response and attempts to > >>reconnect > >> 122.95 response to echo received > >> > > > > Sure, here's a reproducer. Do this against a windows server, preferably > > one exporting NTFS on relatively slow storage. Make sure that > > "testfile" doesn't exist first: > > > > $ dd if=/dev/zero of=/path/to/cifs/share/testfile bs=1M count=1 > > seek=3192 > > > > NTFS doesn't support sparse files, so the OS has to zero-fill up to the > > point where you're writing. That can take a long time on slow > > storage (minutes even). What we do now is periodically send a SMB echo > > to make sure the server is alive rather than trying to time out a > > particular call. 
> > Writing past end of file in Windows can be very slow, but note that it > is pos
Re: [PATCH] CIFS: Decrease reconnection delay when switching nics
On Thu, 28 Feb 2013 11:31:54 -0600 Dave Chiluk wrote: > On 02/28/2013 10:47 AM, Jeff Layton wrote: > > On Thu, 28 Feb 2013 10:04:36 -0600 > > Steve French wrote: > > > >> On Thu, Feb 28, 2013 at 9:26 AM, Jeff Layton wrote: > >>> On Wed, 27 Feb 2013 16:24:07 -0600 > >>> Dave Chiluk wrote: > >>> > >>>> On 02/27/2013 10:34 AM, Jeff Layton wrote: > >>>>> On Wed, 27 Feb 2013 12:06:14 +0100 > >>>>> "Stefan (metze) Metzmacher" wrote: > >>>>> > >>>>>> Hi Dave, > >>>>>> > >>>>>>> When messages are currently in queue awaiting a response, decrease > >>>>>>> amount of > >>>>>>> time before attempting cifs_reconnect to SMB_MAX_RTT = 10 seconds. > >>>>>>> The current > >>>>>>> wait time before attempting to reconnect is currently > >>>>>>> 2*SMB_ECHO_INTERVAL(120 > >>>>>>> seconds) since the last response was recieved. This does not take > >>>>>>> into account > >>>>>>> the fact that messages waiting for a response should be serviced > >>>>>>> within a > >>>>>>> reasonable round trip time. > >>>>>> > >>>>>> Wouldn't that mean that the client will disconnect a good connection, > >>>>>> if the server doesn't response within 10 seconds? > >>>>>> Reads and Writes can take longer than 10 seconds... > >>>>>> > >>>>> > >>>>> Where does this magic value of 10s come from? Note that a slow server > >>>>> can take *minutes* to respond to writes that are long past the EOF. > >>>> It comes from the desire to decrease the reconnection delay to something > >>>> better than a random number between 60 and 120 seconds. I am not > >>>> committed to this number, and it is open for discussion. Additionally > >>>> if you look closely at the logic it's not 10 seconds per request, but > >>>> actually when requests have been in flight for more than 10 seconds make > >>>> sure we've heard from the server in the last 10 seconds. > >>>> > >>>> Can you explain more fully your use case of writes that are long past > >>>> the EOF? Perhaps with a test-case or script that I can test? 
As far as > >>>> I know writes long past EOF will just result in a sparse file, and > >>>> return in a reasonable round trip time *(that's at least what I'm seeing > >>>> with my testing). dd if=/dev/zero of=/mnt/cifs/a bs=1M count=100 > >>>> seek=10, starts receiving responses from the server in about .05 > >>>> seconds with subsequent responses following at roughly .002-.01 second > >>>> intervals. This is well within my 10 second value. Even adding the > >>>> latency of AT&T's 2g cell network brings it up to only 1s. Still 10x > >>>> less than my 10 second value. > >>>> > >>>> The new logic goes like this > >>>> if( we've been expecting a response from the server (in_flight), and > >>>> message has been in_flight for more than 10 seconds and > >>>> we haven't had any other contact from the server in that time > >>>> reconnect > >>>> > >>> > >>> That will break writes long past the EOF. Note too that reconnects on > >>> CIFS are horrifically expensive and problematic. Much of the state on a > >>> CIFS mount is tied to the connection. When that drops, open files are > >>> closed and things like locks are dropped. SMB1 has no real mechanism > >>> for state recovery, so that can really be a problem. > >>> > >>>> On a side note, I discovered a small race condition in the previous > >>>> logic while working on this, that my new patch also fixes. > >>>> 1s request > >>>> 2s response > >>>> 61.995 echo job pops > >>>> 121.995 echo job pops and sends echo > >>>> 122 server_unresponsive called. Finds no response and attempts to > >>>>reconnect > >>>> 122.95 response to echo received > >>>> > >>> > >>> Sure, here's a reproducer. Do this against a windows server, preferably > >&
Re: [PATCH 2/2] ACPI / glue: Drop .find_bridge() callback from struct acpi_bus_type
On 02/28/2013 04:53 PM, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki
>
> After PCI and USB have stopped using the .find_bridge() callback in
> struct acpi_bus_type, the only remaining user of it is SATA, but SATA
> only pretends to be a user, because it points that callback to a stub
> always returning -ENODEV.
>
> For this reason, drop the SATA's dummy .find_bridge() callback and
> remove .find_bridge(), which is not used any more, from struct
> acpi_bus_type entirely.
>
> Signed-off-by: Rafael J. Wysocki
> ---
>  drivers/acpi/glue.c       | 26 +-
>  drivers/ata/libata-acpi.c |  6 --
>  include/acpi/acpi_bus.h   |  3 ---
>  3 files changed, 1 insertion(+), 34 deletions(-)

patches 1-2

Acked-by: Jeff Garzik
Re: [PATCH] CIFS: Decrease reconnection delay when switching nics
On Thu, 28 Feb 2013 23:54:13 +0100
Björn JACKE wrote:

> On 2013-02-28 at 07:26 -0800 Jeff Layton sent off:
> > NTFS doesn't support sparse files, so the OS has to zero-fill up to the
> > point where you're writing. That can take a long time on slow
> > storage (minutes even).
>
> but you are talking about FAT here, right? NTFS does support sparse files if
> the sparse bit has been explicitly set on it. But even if the sparse bit
> is not set filling a file with zeros by writing after a seek long beyond the
> end of the file is very fast because NTFS supports that feature what Unix
> filesystems like xfs call extents.
>
> If writing beyond the end of a file is really slow via cifs vfs in the test
> case against a ntfs volume then I wonder if that operation is being really done
> optimally over the wire. ntfs really isn't that bad with handling this kind of
> files.
>

I'm not sure since I don't know the internals of NTFS. I had always assumed that it didn't really handle sparse files well (hence the "rabbit-pellet" thing that windows clients do).

All I can say however is that writes long past the EOF can take a *really* long time to run. Typically we just issue a SMB_COM_WRITEX at the offset to which we want to put the data. Is there some other way we ought to be doing this?

In any case, it doesn't really change the fact that there is no guaranteed time of response from CIFS servers. They can easily take a really long time to respond to certain requests. The best method we have to deal with that is to periodically "ping" the server with an echo to see if it's still there.

-- Jeff Layton
Re: [PATCH v3 6/7] NFSv4: Add O_DENY* open flags support
On Mon, 11 Mar 2013 14:54:34 -0400 Jeff Layton wrote: > On Thu, 28 Feb 2013 19:25:32 +0400 > Pavel Shilovsky wrote: > > > by passing these flags to NFSv4 open request. > > > > Signed-off-by: Pavel Shilovsky > > --- > > fs/nfs/nfs4xdr.c | 24 > > 1 file changed, 20 insertions(+), 4 deletions(-) > > > > diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c > > index 26b1439..58ddc74 100644 > > --- a/fs/nfs/nfs4xdr.c > > +++ b/fs/nfs/nfs4xdr.c > > @@ -1325,7 +1325,8 @@ static void encode_lookup(struct xdr_stream *xdr, > > const struct qstr *name, struc > > encode_string(xdr, name->len, name->name); > > } > > > > -static void encode_share_access(struct xdr_stream *xdr, fmode_t fmode) > > +static void encode_share_access(struct xdr_stream *xdr, fmode_t fmode, > > + int open_flags) > > { > > __be32 *p; > > > > @@ -1343,7 +1344,22 @@ static void encode_share_access(struct xdr_stream > > *xdr, fmode_t fmode) > > default: > > *p++ = cpu_to_be32(0); > > } > > - *p = cpu_to_be32(0);/* for linux, share_deny = 0 always */ > > + if (open_flags & O_DENYMAND) { > > > As Bruce mentioned, I think a mount option to enable this on a per-fs > basis would be a better approach than this new O_DENYMAND flag. 
> > > > + switch (open_flags & (O_DENYREAD|O_DENYWRITE)) { > > + case O_DENYREAD: > > + *p = cpu_to_be32(NFS4_SHARE_DENY_READ); > > + break; > > + case O_DENYWRITE: > > + *p = cpu_to_be32(NFS4_SHARE_DENY_WRITE); > > + break; > > + case O_DENYREAD|O_DENYWRITE: > > + *p = cpu_to_be32(NFS4_SHARE_DENY_BOTH); > > + break; > > + default: > > + *p = cpu_to_be32(0); > > + } > > + } else > > + *p = cpu_to_be32(0); > > } > > > > static inline void encode_openhdr(struct xdr_stream *xdr, const struct > > nfs_openargs *arg) > > @@ -1354,7 +1370,7 @@ static inline void encode_openhdr(struct xdr_stream > > *xdr, const struct nfs_opena > > * owner 4 = 32 > > */ > > encode_nfs4_seqid(xdr, arg->seqid); > > - encode_share_access(xdr, arg->fmode); > > + encode_share_access(xdr, arg->fmode, arg->open_flags); > > p = reserve_space(xdr, 36); > > p = xdr_encode_hyper(p, arg->clientid); > > *p++ = cpu_to_be32(24); > > @@ -1491,7 +1507,7 @@ static void encode_open_downgrade(struct xdr_stream > > *xdr, const struct nfs_close > > encode_op_hdr(xdr, OP_OPEN_DOWNGRADE, decode_open_downgrade_maxsz, hdr); > > encode_nfs4_stateid(xdr, arg->stateid); > > encode_nfs4_seqid(xdr, arg->seqid); > > - encode_share_access(xdr, arg->fmode); > > + encode_share_access(xdr, arg->fmode, 0); > > } > > > > static void > > > Other than that, this seems reasonable. > > Acked-by: Jeff Layton Oh duh... Please ignore my comment on patch #7 to add a patch for the NFS client. This one does that. That said, there may be a potential problem here that you need to consider. In the case of a local filesystem you'll want to set deny locks using deny_lock_file(). For a network filesystem like CIFS or NFS though, the server will handle that atomically during the open. You need to ensure that you don't go trying to set LOCK_MAND locks on the file once that's done. Perhaps you can use a fstype flag to indicate that the filesystem handles this during the open and doesn't need to try and set a flock lock? 
-- Jeff Layton
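The flag-to-share_deny mapping in the quoted encode_share_access() hunk can be restated as a small Python sketch. The NFS4_SHARE_DENY_* numbers are the protocol's share_deny values; the O_DENY* bit values below are placeholders invented for this example, since the real constants come from elsewhere in the patch series and are not shown in this message.

```python
# NFSv4 share_deny values as defined by the protocol.
NFS4_SHARE_DENY_NONE = 0
NFS4_SHARE_DENY_READ = 1
NFS4_SHARE_DENY_WRITE = 2
NFS4_SHARE_DENY_BOTH = 3

# Placeholder bits for illustration only (assumed, not the patch's values).
O_DENYREAD = 0x1
O_DENYWRITE = 0x2
O_DENYMAND = 0x4


def share_deny(open_flags: int) -> int:
    """Mirror the switch in the quoted hunk: deny bits are honoured only
    when O_DENYMAND is set, otherwise share_deny stays 0."""
    if not open_flags & O_DENYMAND:
        return NFS4_SHARE_DENY_NONE
    deny = open_flags & (O_DENYREAD | O_DENYWRITE)
    if deny == O_DENYREAD:
        return NFS4_SHARE_DENY_READ
    if deny == O_DENYWRITE:
        return NFS4_SHARE_DENY_WRITE
    if deny == (O_DENYREAD | O_DENYWRITE):
        return NFS4_SHARE_DENY_BOTH
    return NFS4_SHARE_DENY_NONE
```

If the O_DENYMAND gate were replaced by a per-filesystem mount option, as suggested in the review, only the first check would change.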
Re: [PATCH 1/4] UAPI: Fix endianness conditionals in linux/aio_abi.h
Benjamin LaHaise writes:

> On Wed, Mar 06, 2013 at 08:47:33PM +, David Howells wrote:
>> In the UAPI header files, __BIG_ENDIAN and __LITTLE_ENDIAN must be compared
>> against __BYTE_ORDER in preprocessor conditionals where these are exposed to
>> userspace (that is they're not inside __KERNEL__ conditionals).
>>
>> However, in the main kernel the norm is to check for "defined(__XXX_ENDIAN)"
>> rather than comparing against __BYTE_ORDER and this has incorrectly leaked
>> into the userspace headers.
>>
>> The definition of PADDED() in linux/aio_abi.h is wrong in this way. Note that
>> userspace will likely interpret this and thus the order of fields in struct
>> iocb incorrectly as the little-endian variant on big-endian machines -
>> depending on header inclusion order.
>>
>> [!!!] NOTE [!!!] This patch may adversely change the userspace API. It might
>> be better to fix the ordering of aio_key and aio_reserved1 in struct iocb.
>
> It is unlikely that anyone has used the existing kernel headers and hit this
> issue given that most existing users use the libaio.h include (which does not
> get the endianness tests wrong). Given that the kernel has always used the
> correct endian mappings, this change is correct.

Agreed.

Acked-by: Jeff Moyer
Re: [PATCH] igb: SR-IOV init reordering
On Tue, 2013-03-12 at 15:25 -0600, Alex Williamson wrote:
> igb is ineffective at setting a lower total VFs because:
>
> int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
> {
> 	...
> 	/* Shouldn't change if VFs already enabled */
> 	if (dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE)
> 		return -EBUSY;
>
> Swap init ordering.
>
> Signed-off-by: Alex Williamson
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

I have added the patch to my igb queue, thanks!
Re: [PATCH] igb: Fix null pointer dereference
On Tue, 2013-03-12 at 14:09 -0600, Alex Williamson wrote:
> The max_vfs= option has always been self limiting to the number of VFs
> supported by the device. fa44f2f1 added SR-IOV configuration via
> sysfs, but in the process broke this self correction factor. The
> failing path is:
>
> igb_probe
>   igb_sw_init
>     if (max_vfs > 7) {
>       adapter->vfs_allocated_count = 7;
>   ...
>   igb_probe_vfs
>     igb_enable_sriov(, max_vfs)
>       if (num_vfs > 7) {
>         err = -EPERM;
>         ...
>
> This leaves vfs_allocated_count = 7 and vf_data = NULL, so we bomb out
> when igb_probe finally calls igb_reset. It seems like a really bad
> idea, and somewhat pointless, to set vfs_allocated_count separate from
> vf_data, but limiting max_vfs is enough to avoid the null pointer.
>
> Signed-off-by: Alex Williamson
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

I have added the patch to my igb queue, thanks!
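The failing path described in the patch can be mimicked with a small Python model. This is a loose illustration, not driver code: `probe_broken`, `probe_fixed` and the tuple they return are invented here, and only the limit of 7 and the -EPERM result come from the message above.

```python
import errno

IGB_MAX_VFS = 7  # limit quoted in the failing path


def probe_broken(max_vfs: int):
    """Model of the failing path: the VF count is clamped in one place,
    but the unclamped request is still passed on to SR-IOV setup."""
    vfs_allocated_count = min(max_vfs, IGB_MAX_VFS)  # igb_sw_init step
    if max_vfs > IGB_MAX_VFS:                        # igb_enable_sriov step
        return vfs_allocated_count, None, -errno.EPERM
    vf_data = [{} for _ in range(max_vfs)]
    return vfs_allocated_count, vf_data, 0


def probe_fixed(max_vfs: int):
    """Model of the fix as described: limit max_vfs itself, so the count
    and the per-VF data can no longer disagree."""
    max_vfs = min(max_vfs, IGB_MAX_VFS)
    vf_data = [{} for _ in range(max_vfs)]
    return max_vfs, vf_data, 0


if __name__ == "__main__":
    count, vf_data, err = probe_broken(8)
    # count is 7 while vf_data is None: any later loop over the count
    # that touches vf_data would fail.
    print(count, vf_data, err)
```

In the broken model a request for 8 VFs ends with a non-zero count and no per-VF data, which corresponds to the state the patch description says igb_reset trips over.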
Re: [PATCH] cifs: Rename cERROR and cifserror to cifs_vfs_err
On Wed, 13 Mar 2013 04:36:54 -0700
Joe Perches wrote:

> On Tue, 2013-03-12 at 15:44 -0700, Joe Perches wrote:
> > The cERROR macro is always used as cERROR(1, and cifserror
> > is just a printk(KERN_ERR "CIFS VFS: ".
> >
> > Make a cifs_vfs_err function that uses the vsprintf %pV
> > extension to avoid duplicating the "CIFS VFS: " prefix.
> >
> > Remove the cERROR macro and use cifs_vfs_err directly.
>
> Perhaps a better idea than this patch is to
> change both the cERROR and cFYI macros to
> a new use of cifs_dbg(type, fmt, ...)
>
> cERROR(set, fmt, ...) -> cifs_dbg(VFS, fmt, ...)
> cFYI(set, fmt, ...) -> cifs_dbg(FYI, fmt, ...)
>
> This conversion would mark both these macros
> as debug statements as they are only enabled
> with CONFIG_CIFS_DEBUG.
>
> Also CONFIG_CIFS_DEBUG2 use of DBG could also
> be integrated with the same style.
>
> cFYI(DBG2, fmt, ...) -> cifs_dbg(NOISY, fmt, ...)
>
> The reduced object size would still apply.
>
> This would also enable an easier conversion to
> dynamic debugging of these debug macros.
>
> I'd prefer to move the newline from the macro
> to the format as that is more consistent with
> the rest of the kernel.
>
> Thoughts?
>

I like this change overall, but the size of the patch is pretty daunting. If you could change the code that underlies cERROR() and cFYI() without needing to touch all of their call sites, it might be a simpler initial step.

OTOH, I would also prefer to move the newline into the format and that's impossible without touching most of these call sites.

-- Jeff Layton
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, 6 Mar 2013 13:40:16 -0800 Tejun Heo wrote: > On Wed, Mar 06, 2013 at 01:36:36PM -0800, Tejun Heo wrote: > > On Wed, Mar 06, 2013 at 01:31:10PM -0800, Linus Torvalds wrote: > > > So I do agree that we probably have *too* many of the stupid "let's > > > check if we can freeze", and I suspect that the NFS code should get > > > rid of the "freezable_schedule()" that is causing this warning > > > (because I also agree that you should *not* freeze while holding > > > locks, because it really can cause deadlocks), but I do suspect that > > > network filesystems do need to have a few places where they check for > > > freezing on their own... Exactly because freezing isn't *quite* like a > > > signal. > > > > Well, I don't really know much about nfs so I can't really tell, but > > for most other cases, dealing with freezing like a signal should work > > fine from what I've seen although I can't be sure before actually > > trying. Trond, Bruce, can you guys please chime in? > > So, I think the question here would be, in nfs, how many of the > current freezer check points would be difficult to conver to signal > handling model after excluding the ones which are performed while > holding some locks which we need to get rid of anyway? > I think we can do this, but it isn't trivial. Here's what I'd envision, but there are still many details that would need to be worked out... Basically what we'd need is a way to go into TASK_KILLABLE sleep, but still allow the freezer to wake these processes up. I guess that likely means we need a new sleep state (TASK_FREEZABLE?). We'd also need a way to return an -ERESTARTSYS like error (-EFREEZING?) that tells the upper syscall handling layers to freeze the task and then restart the syscall after waking back up. Maybe we don't need a new error at all and -ERESTARTSYS is fine here? We also need to consider the effects vs. audit code here, but that may be OK with the overhaul that Al merged a few months ago. 
Assuming we have those, then we need to fix up the NFS and RPC layers to use this stuff:

1/ Prior to submitting a new RPC, we'd look and see whether "current" is being frozen. If it is, then return -EFREEZING immediately without doing anything.

2/ We might also need to retrofit certain stages in the RPC FSM to return -EFREEZING too if it's a synchronous RPC and the task is being frozen.

3/ If a task is waiting for an RPC reply on an async RPC, we'd need to use this new sleep state. If the process wakes up because something wants it to freeze, then have it go back to sleep for a short period of time to try and wait for the reply (up to 15s or so?). If we get the reply, great -- return to userland and freeze the task there. If the reply never comes in, give up on it and just return -EFREEZING and hope for the best.

We might have to make this latter behavior contingent on a new mount option (maybe "-o slushy" like Trond recommended). The current "hard" and "soft" semantics don't really fit this situation correctly.

Of course, this is all a lot of work, and not something we can shove into the kernel for 3.9 at this point. In the meantime, while Mandeep's warning is correctly pointing out a problem, I think we ought to back it out until we can fix this properly. We're already getting a ton of reports on the mailing lists and in the fedora bug tracker for this warning. Part of the problem is the verbiage -- "BUG" makes people think "Oops", but this is really just a warning.

We should also note that this is a problem too in the CIFS code since it uses a similar mechanism for allowing the kernel to suspend while waiting on SMB replies.

-- Jeff Layton
Re: [PATCH] cifs: Rename cERROR and cFYI to cifs_dbg
On Thu, 14 Mar 2013 12:24:37 -0700
Joe Perches wrote:

> It's not obvious from reading the macro names that these macros
> are for debugging. Convert the names to a single more typical
> kernel style cifs_dbg macro.
>
> cERROR(1, ...) -> cifs_dbg(VFS, ...)
> cFYI(1, ...) -> cifs_dbg(FYI, ...)
> cFYI(DBG2, ...) -> cifs_dbg(NOISY, ...)
>
> Move the terminating format newline from the macro to the call site.
>
> Add CONFIG_CIFS_DEBUG function cifs_vfs_err to emit the
> "CIFS VFS: " prefix for VFS messages.
>
> Size is reduced ~ 1% when CONFIG_CIFS_DEBUG is set (default y)
>
> $ size fs/cifs/cifs.ko*
>    text    data     bss     dec     hex filename
>  265245    2525     132  267902   4167e fs/cifs/cifs.ko.new
>  268359    2525     132  271016   422a8 fs/cifs/cifs.ko.old
>

This all looks like good stuff. I am a bit concerned about mashing all of these cleanups into the same patch though.

> Other miscellaneous changes around these conversions:
>
> o Miscellaneous typo fixes
> o Add terminating \n's to almost all formats and remove them
>   from the macros to be more kernel style like. A few formats
>   previously had defective \n's
> o Remove unnecessary OOM messages as kmalloc() calls dump_stack
> o Coalesce formats to make grep easier,
>   added missing spaces when coalescing formats
> o Use %s, __func__ instead of embedded function name
> o Removed unnecessary "cifs: " prefixes
> o Convert kzalloc with multiply to kcalloc
> o Remove unused cifswarn macro
>

-- Jeff Layton
Re: [PATCH] cifs: Rename cERROR and cFYI to cifs_dbg
On Thu, 14 Mar 2013 12:24:37 -0700
Joe Perches wrote:

> It's not obvious from reading the macro names that these macros
> are for debugging. Convert the names to a single more typical
> kernel style cifs_dbg macro.
>
>   cERROR(1, ...)   -> cifs_dbg(VFS, ...)
>   cFYI(1, ...)     -> cifs_dbg(FYI, ...)
>   cFYI(DBG2, ...)  -> cifs_dbg(NOISY, ...)
>
> Move the terminating format newline from the macro to the call site.
>
> Add CONFIG_CIFS_DEBUG function cifs_vfs_err to emit the
> "CIFS VFS: " prefix for VFS messages.
>
> Size is reduced ~ 1% when CONFIG_CIFS_DEBUG is set (default y)
>
> $ size fs/cifs/cifs.ko*
>    text    data     bss     dec     hex filename
>  265245    2525     132  267902   4167e fs/cifs/cifs.ko.new
>  268359    2525     132  271016   422a8 fs/cifs/cifs.ko.old
>

(my apologies -- my MUA has a mind of its own sometimes)

This all looks like good stuff. I am a bit concerned about mashing all
of these cleanups into the same patch though.

> Other miscellaneous changes around these conversions:
>
> o Miscellaneous typo fixes
> o Add terminating \n's to almost all formats and remove them
>   from the macros to be more kernel style like. A few formats
>   previously had defective \n's
> o Remove unnecessary OOM messages as kmalloc() calls dump_stack
> o Coalesce formats to make grep easier,
>   added missing spaces when coalescing formats
> o Use %s, __func__ instead of embedded function name
> o Removed unnecessary "cifs: " prefixes
> o Convert kzalloc with multiply to kcalloc
  ^^^
Things like this really ought to be a separate patch, even though it is
a trivial change. That's a minor nit though...

> o Remove unused cifswarn macro
>

I think we ought to go ahead and take this for 3.10. I do have some
minor concern about having to deal with backports of later patches to
kernels that don't have these changes, but hey, that's the price of
dealing with old kernels.

The sooner Steve merges this into his for-next tree, the better. This is
bound to give us all sorts of merge conflicts for the 3.10 window, so we
want to make sure that people know what to base their work on.

Acked-by: Jeff Layton
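The "Convert kzalloc with multiply to kcalloc" item quoted above matters because an open-coded n * size can wrap around before the allocator ever sees it, whereas kcalloc() checks the multiplication. A minimal userspace sketch of that check follows; zalloc_array() is a made-up helper for illustration, not a kernel or libc function.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: zeroed allocation of n elements of the given
 * size, refusing requests where n * size would overflow size_t.
 */
static void *zalloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* the product would wrap around */
	return calloc(n, size);
}
```

With an unchecked multiply, a request such as SIZE_MAX elements of 2 bytes would silently turn into a tiny allocation; here it fails cleanly instead.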
Re: [PATCH] Remove CONFIG_EXPERIMENTAL
On Mon, Aug 27, 2012 at 5:53 PM, Kees Cook wrote:
> This config item has not carried much meaning for a while now and is
> almost always enabled by default. Remove it and adjust various config
> logic and documentation.

It does have meaning... !CONFIG_EXPERIMENTAL means more stable. In the
past things would get CONFIG_EXPERIMENTAL until they've been tried in the
field or otherwise hit some goal in the developer's mind.

Is this a practical distinction? Probably not, as the markers often go
unmaintained...

Jeff
Re: [PATCH] NFS: Fix Oopses in nfs_lookup_revalidate and nfs4_lookup_revalidate
On Mon, 27 Aug 2012 13:23:11 -0700 Greg KH wrote: > On Mon, Aug 27, 2012 at 08:16:09PM +, Myklebust, Trond wrote: > > On Mon, 2012-08-27 at 13:09 -0700, Greg KH wrote: > > > On Wed, Aug 22, 2012 at 04:08:17PM -0400, Trond Myklebust wrote: > > > > Fix the following Oops in 3.5.1: > > > > > > > > BUG: unable to handle kernel NULL pointer dereference at > > > > 0038 > > > > IP: [] nfs_lookup_revalidate+0x2d/0x480 [nfs] > > > > PGD 337c63067 PUD 0 > > > > Oops: [#1] SMP > > > > CPU 5 > > > > Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc > > > > af_packet binfmt_misc cpufreq_conservative cpufreq_userspace > > > > cpufreq_powersave dm_mod acpi_cpufreq mperf coretemp gpio_ich kvm_intel > > > > joydev kvm ioatdma hid_generic igb lpc_ich i7core_edac edac_core ptp > > > > serio_raw dca pcspkr i2c_i801 mfd_core sg pps_core usbhid crc32c_intel > > > > microcode button autofs4 uhci_hcd ttm drm_kms_helper drm i2c_algo_bit > > > > sysimgblt sysfillrect syscopyarea ehci_hcd usbcore usb_common > > > > scsi_dh_rdac scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh edd fan > > > > ata_piix thermal processor thermal_sys > > > > > > > > Pid: 30431, comm: java Not tainted 3.5.1-2-default #1 Supermicro > > > > X8DTT/X8DTT > > > > RIP: 0010:[] [] > > > > nfs_lookup_revalidate+0x2d/0x480 [nfs] > > > > RSP: 0018:8801b418bd38 EFLAGS: 00010292 > > > > RAX: fff6 RBX: 88032016d800 RCX: 0020 > > > > RDX: RSI: RDI: 8801824a7b00 > > > > RBP: 8801b418bdf8 R08: 7f0034323030 R09: f04c03ed > > > > R10: 8801824a7b00 R11: 0002 R12: 8801824a7b00 > > > > R13: 8801824a7b00 R14: R15: 8803201725d0 > > > > FS: 2b53a46cb700() GS:88033fc2() > > > > knlGS: > > > > CS: 0010 DS: ES: CR0: 80050033 > > > > CR2: 0038 CR3: 00020a426000 CR4: 07e0 > > > > DR0: DR1: DR2: > > > > DR3: DR6: 0ff0 DR7: 0400 > > > > Process java (pid: 30431, threadinfo 8801b418a000, task > > > > 8801b5d20600) > > > > Stack: > > > > 8801b418be44 88032016d800 8801b418bdf8 > > > > 8801824a7b00 8801b418bdd7 8803201725d0 
8116a9c0 > > > > 8801b5c38dc0 0007 88032016d800 > > > > Call Trace: > > > > [] lookup_dcache+0x80/0xe0 > > > > [] __lookup_hash+0x23/0x90 > > > > [] lookup_one_len+0xc5/0x100 > > > > [] nfs_sillyrename+0xe3/0x210 [nfs] > > > > [] vfs_unlink.part.25+0x7f/0xe0 > > > > [] do_unlinkat+0x1ac/0x1d0 > > > > [] system_call_fastpath+0x16/0x1b > > > > [<2b5348b5f527>] 0x2b5348b5f526 > > > > Code: ec 38 b8 f6 ff ff ff 4c 89 64 24 18 4c 89 74 24 28 49 89 fc 48 > > > > 89 5c 24 08 48 89 6c 24 10 49 89 f6 4c 89 6c 24 20 4c 89 7c 24 30 > > > > 46 38 40 0f 85 d1 00 00 00 e8 c4 c4 df e0 48 8b 58 30 49 89 > > > > RIP [] nfs_lookup_revalidate+0x2d/0x480 [nfs] > > > > RSP > > > > CR2: 0038 > > > > ---[ end trace 845113ed191985dd ]--- > > > > > > > > This Oops affects 3.5 kernels and older, and is due to lookup_one_len() > > > > calling down to the dentry revalidation code with a NULL pointer > > > > to struct nameidata. > > > > > > > > It is fixed upstream by commit 0b728e1911c (stop passing nameidata * > > > > to ->d_revalidate()) > > > > > > So this is just a nfs-only backport of the larger patch 0b728e1911c, > > > right? Should we also do this for other filesystems as well? Or just > > > backport the whole commit? > > > > The larger patch involves a VFS api change (the atomic open code) which > > has a bunch of pre- and post-requirements. I'd assume that is a too > > large change for stable. I think that the smaller per-filesystem changes > > are probably more appropriate. The list of filesystems that care are > > likely to be small. Off the top of my head, I can only think of NFS, > > CIFS, FUSE and possibly ceph. > > Ok, I'll take this one for NFS, care to break this up also for FUSE and > CIFS and send me a patch for it? > A similar problem was already fixed quite some time ago in cifs in commit f5bc1e755d, shortly after the RCU lookup code went in. 
-- Jeff Layton
Re: [PATCH] block: Fix bad range check in bio_sector_offset
"Martin K. Petersen" writes:

> DM would occasionally end up splitting data integrity-enabled requests
> incorrectly. The culprit was a bad range check in bio_sector_offset.

The patch looks ok to me, but what is the user visible behavior when
this happens?

Cheers,
Jeff
Re: [PATCH] block: Fix bad range check in bio_sector_offset
"Martin K. Petersen" writes:

>>>>>> "Jeff" == Jeff Moyer writes:
>
>>> DM would occasionally end up splitting data integrity-enabled
>>> requests incorrectly. The culprit was a bad range check in
>>> bio_sector_offset.
>
> Jeff> The patch looks ok to me, but what is the user visible behavior
> Jeff> when this happens?
>
> We'd occasionally end up mapping a bad integrity scatterlist and the HBA
> would abort the I/O with a protection information error.

Thanks for the explanation, Martin!

Acked-by: Jeff Moyer
Re: WARNING: at fs/inode.c:280 drop_nlink+0x31/0x33()
On Wed, 29 Aug 2012 09:25:27 -0700 Nick Pasich wrote: > > I'm using kernel 3.5.3 ... > > It happens on 3.5.1 and 3.5.2 also. > > I know that Nick Bowler has already reported this... > > I'm experiencing the same thing. > > It happens when moving files from one directory to another > on the same partition (NFS). > > --( Nick Pasich )-- > > > # > ## > ## Happens when PSTs are moved from one directory to another on the ISCSI ... > ## > # > > Aug 29 08:06:16 localhost kernel: [ cut here ] > Aug 29 08:06:16 localhost kernel: WARNING: at fs/inode.c:280 > drop_nlink+0x31/0x33() > Aug 29 08:06:16 localhost kernel: Hardware name: To Be Filled By O.E.M. > Aug 29 08:06:16 localhost kernel: Modules linked in: ecb md4 cifs w83627hf > eeprom asb100 hwmon_vid hwmon nfsd exportfs ipv6 psmouse usb_storage > io_edgeport usbserial sg r8169 mii evdev intel_agp uhci_hcd i2c_i801 i2c_core > shpchp intel_gtt agpgart ehci_hcd microcode serio_raw > Aug 29 08:06:16 localhost kernel: Pid: 31477, comm: rm Tainted: GW > 3.5.3 #1 > Aug 29 08:06:16 localhost kernel: Call Trace: > Aug 29 08:06:16 localhost kernel: [] ? drop_nlink+0x31/0x33 > Aug 29 08:06:16 localhost kernel: [] ? > warn_slowpath_common+0x7b/0x90 > Aug 29 08:06:16 localhost kernel: [] ? drop_nlink+0x31/0x33 > Aug 29 08:06:16 localhost kernel: [] ? warn_slowpath_null+0x1b/0x1f > Aug 29 08:06:16 localhost kernel: [] ? drop_nlink+0x31/0x33 > Aug 29 08:06:16 localhost kernel: [] ? cifs_unlink+0x134/0x63d > [cifs] > Aug 29 08:06:16 localhost kernel: [] ? dput+0x11/0x117 > Aug 29 08:06:16 localhost kernel: [] ? mntput_no_expire+0xf/0xf7 > Aug 29 08:06:16 localhost kernel: [] ? vfs_unlink+0x4e/0xb6 > Aug 29 08:06:16 localhost kernel: [] ? __lookup_hash+0x54/0xac > Aug 29 08:06:16 localhost kernel: [] ? do_unlinkat+0x10a/0x12d > Aug 29 08:06:16 localhost kernel: [] ? sys_ioctl+0x34/0x57 > Aug 29 08:06:16 localhost kernel: [] ? 
syscall_call+0x7/0xb > Aug 29 08:06:16 localhost kernel: ---[ end trace 756b427e3bd671f9 ]--- > (cc'ing linux-cifs ml) This stack trace comes from cifs, not nfs. Steve French has a patch queued in his tree to silence this warning that I believe he intends to send to Linus for 3.6. Perhaps we should consider backporting it for 3.5.z too? -- Jeff Layton
Re: Drop support for x86-32
On Wed, Aug 29, 2012 at 7:03 PM, Mark Lord wrote:
> On 12-08-26 10:15 AM, wbrana wrote:
>> On 8/26/12, Mark Lord wrote:
>>> Here are a couple of real scenarios you don't seem to have thought about.
>>> A 32-bit kernel on a legacy (or even new) system in 2017 will still need
>>> regular kernel updates (not "long term" unmaintained kernels)
>>> in order to work with new USB devices, new 4KB+ sector hard drives,
>>> newer generations of SSDs, etc..
>> 12-years-old machine is trash.
>
> There you go making assumptions again.
> Who said anything about a 12-year old machine?
>
> Much more likely is a 5-year old software installation
> that gets moved to a new box.

Or a brand new software installation into a 32-bit virtual machine.

Jeff
Re: [QUESTION] about NFS sub system between Public Kernel and Red Hat Kernel.
On Fri, 31 Aug 2012 13:40:16 +0800 gchen wrote: > Hi linux-...@vger.kernel.org > > I have 1 question, and also 2 conclusions which need confirm. > > > 1) Question: > > Jeff Layton said in Red Hat Bugzilla (bug 848706): > "Have configuration where the same host is acting as both NFS client > and server. That's a configuration known to cause deadlocks." > > Does it mean that the public Linux kernel (not Red Hat) also can cause > deadlocks if NFS client and server are under the same machine ? > Yes. > > 2) Confirm 1: (better by Jeff Layton) > > For function nfs_commit_set_lock in ./fs/nfs/write.c > > for latest public kernel version: > the parameters of out_of_line_wait_on_bit_lock() are > (&nfsi->flags, NFS_INO_COMMIT, nfs_wait_killable, TASK_KILLABLE) > for Red Hat kernel version: kernel-2.6.18-308.4.1.el5 > the parameters of out_of_line_wait_on_bit_lock() are > (&nfsi->flags, NFS_INO_COMMIT, > nfs_wait_bit_uninterruptible, TASK_UNINTERRUPTIBLE) > > It means for red hat version: > when deadlock occurs, we can not boot machine in normal way > (it is true for my test machine, the deadlock task can not be killed) > It means for public kernel version: > "Assume deadlock occurs", we can still boot machine in normal way, > because the task can be killed. > > Is what I said above correct ? > Not sure I understand your question. RHEL5 doesn't have support for TASK_KILLABLE, and I didn't backport it, hence the difference in that function. > > 3) Confirm 2: > > Is LTP (Linux Test Project) still a suitable test tools for public kernel ? > (for ltp-full-20100331.gz stress test, it mounts NFS on local machine, > and the latest LTP ltp-full-20120401.bz2 also seems the same). > That I'm not sure of. All I can tell you is that mounts over loopback (or similar configurations) are easily deadlockable under load. 
-- Jeff Layton
Re: WARNING: at fs/inode.c:280 drop_nlink+0x31/0x33()
On Fri, 31 Aug 2012 08:32:06 -0700 Nick Pasich wrote: > On Fri, Aug 31, 2012 at 12:00:26PM +0400, Pavel Shilovsky wrote: > > 2012/8/31 Nick Pasich : > > > Jeff, > > > > > > I applied this patch to Kernel 3.5.3 from Pavel and the > > > the warning is gone with no problems. > > > > > > Thanks, > > > > > > --( Nick Pasich > > > > > > ## > > > > > > From df2d6b1fbf2401c5ee04f2ac143ea0954e3a87a6 Mon Sep 17 00:00:00 2001 > > > From: Pavel Shilovsky > > > Date: Fri, 13 Jul 2012 11:59:45 +0400 > > > Subject: [PATCH] CIFS: Protect i_nlink from being negative > > > > > > that can cause warning messages. > > > > > > Signed-off-by: Pavel Shilovsky > > > --- > > > fs/cifs/inode.c | 13 +++-- > > > 1 files changed, 11 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c > > > index 7354877..88afb1a 100644 > > > --- a/fs/cifs/inode.c > > > +++ b/fs/cifs/inode.c > > > @@ -1110,6 +1110,15 @@ undo_setattr: > > > goto out_close; > > > } > > > > > > +/* copied from fs/nfs/dir.c with small changes */ > > > +static void > > > +cifs_drop_nlink(struct inode *inode) > > > +{ > > > + spin_lock(&inode->i_lock); > > > + if (inode->i_nlink > 0) > > > + drop_nlink(inode); > > > + spin_unlock(&inode->i_lock); > > > +} > > > > > > /* > > > * If dentry->d_inode is null (usually meaning the cached dentry > > > @@ -1166,13 +1175,13 @@ retry_std_delete: > > > psx_del_no_retry: > > > if (!rc) { > > > if (inode) > > > - drop_nlink(inode); > > > + cifs_drop_nlink(inode); > > > } else if (rc == -ENOENT) { > > > d_drop(dentry); > > > } else if (rc == -ETXTBSY) { > > > rc = > > > cifs_rename_pending_delete(full_path, dentry, xid); > > > if (rc == 0) > > > - drop_nlink(inode); > > > + cifs_drop_nlink(inode); > > > } else if ((rc == -EACCES) && (dosattr == 0) && inode) { > > > attrs = kzalloc(sizeof(*attrs), > > > GFP_KERNEL); > > > if (attrs == NULL) { > > > -- > > > 1.7.3.3 > > > > > > ## > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe 
linux-cifs" in > > > the body of a message to majord...@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > This one fixes only a part of the problem. Now we have another patch > > for this problem: > > > > https://git.samba.org/sfrench/?p=sfrench/cifs-2.6.git;a=commitdiff;h=b7ca69289680cf631fb20b7d436467c4ec1153cd;hp=6dab7ede9390d4d937cb89feca932e4fd575d2da > > > > -- > > Best regards, > > Pavel Shilovsky. > > > > Since I'm using kernel 3.5.3 , I get an error on hunk 7 of the patch. > > I can do it by hand... But I want to check with you first. > > Thanks, > > --( Nick Pasich )-- > If you fix it up by hand, consider submitting it as a backport for the stable series as well. -- Jeff Layton
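The idea behind the quoted cifs_drop_nlink() helper, decrementing the link count only while it is still positive, can be modelled in userspace. The sketch below is an illustration only: struct toy_inode and toy_drop_nlink() are made-up stand-ins, and a pthread mutex takes the place of the kernel's inode->i_lock spinlock.

```c
#include <pthread.h>

/* Toy stand-in for the two struct inode fields the patch touches. */
struct toy_inode {
	pthread_mutex_t i_lock;
	unsigned int i_nlink;
};

/* Drop one link, but never let the count go below zero. */
static void toy_drop_nlink(struct toy_inode *inode)
{
	pthread_mutex_lock(&inode->i_lock);
	if (inode->i_nlink > 0)
		inode->i_nlink--;
	pthread_mutex_unlock(&inode->i_lock);
}
```

Calling it twice on an inode with a single link leaves the count at zero rather than wrapping around, which is the condition the warning in drop_nlink() complains about.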
Re: [PATCH 0/4] Fix a crash when block device is read and block size is changed at the same time
Mikulas Patocka writes: > On Fri, 31 Aug 2012, Mikulas Patocka wrote: > >> Hi >> >> This is a series of patches to prevent a crash when when someone is >> reading block device and block size is changed simultaneously. (the crash >> is already happening in the production environment) >> >> The first patch adds a rw-lock to struct block_device, but doesn't use the >> lock anywhere. The reason why I submit this as a separate patch is that on >> my computer adding an unused field to this structure affects performance >> much more than any locking changes. >> >> The second patch uses the rw-lock. The lock is locked for read when doing >> I/O on the block device and it is locked for write when changing block >> size. >> >> The third patch converts the rw-lock to a percpu rw-lock for better >> performance, to avoid cache line bouncing. >> >> The fourth patch is an alternate percpu rw-lock implementation using RCU >> by Eric Dumazet. It avoids any atomic instruction in the hot path. >> >> Mikulas > > I tested performance of patches. I created 4GB ramdisk, I initially filled > it with zeros (so that ramdisk allocation-on-demand doesn't affect the > results). > > I ran fio to perform 8 concurrent accesses on 8 core machine (two > Barcelona Opterons): > time fio --rw=randrw --size=4G --bs=512 --filename=/dev/ram0 --direct=1 > --name=job1 --name=job2 --name=job3 --name=job4 --name=job5 --name=job6 > --name=job7 --name=job8 > > The results actually show that the size of struct block_device and > alignment of subsequent fields in struct inode have far more effect on > result that the type of locking used. (struct inode is placed just after > struct block_device in "struct bdev_inode" in fs/block-dev.c) > > plain kernel 3.5.3: 57.9s > patch 1: 43.4s > patches 1,2: 43.7s > patches 1,2,3: 38.5s > patches 1,2,3,4: 58.6s > > You can see that patch 1 improves the time by 14.5 seconds, but all that > patch 1 does is adding an unused structure field. 
> > Patch 3 is 4.9 seconds faster than patch 1, although patch 1 does no > locking at all and patch 3 does per-cpu locking. So, the reason for the > speedup is the different size of struct block_device (and subsequently, > different alignment of struct inode), rather than locking improvement. How many runs did you do? Did you see much run to run variation? > I would be interested if other people did performance testing of the > patches too. I'll do some testing next week, but don't expect to get to it before Wednesday. Cheers, Jeff
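The layout effect described in the quoted results (an unused field in struct block_device moving the struct inode that follows it in struct bdev_inode) can be shown with a small C sketch. The struct names below are made up and far smaller than the kernel's; the only claim is the general one that growing the first member shifts the member placed after it.

```c
#include <stddef.h>

/* Toy stand-ins; the real structures are much larger. */
struct toy_bdev      { long size; };
struct toy_bdev_lock { long size; int unused_lock; };	/* one extra field */
struct toy_vfs_inode { long ino; };

/* Same layout idea as struct bdev_inode: the inode follows the bdev. */
struct toy_bdev_inode {
	struct toy_bdev bdev;
	struct toy_vfs_inode vfs_inode;
};

struct toy_bdev_inode_lock {
	struct toy_bdev_lock bdev;
	struct toy_vfs_inode vfs_inode;
};

/* Byte offset of the embedded inode in each variant. */
static size_t inode_offset_plain(void)
{
	return offsetof(struct toy_bdev_inode, vfs_inode);
}

static size_t inode_offset_with_lock(void)
{
	return offsetof(struct toy_bdev_inode_lock, vfs_inode);
}
```

Whether such a shift helps or hurts depends on where the hot fields land relative to cache-line boundaries, which this sketch does not measure.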
Re: [PATCH 15/16] ARM: samsung: move platform_data definitions
On 09/11/2012 09:02 AM, Arnd Bergmann wrote: Platform data for device drivers should be defined in include/linux/platform_data/*.h, not in the architecture and platform specific directories. This moves such data out of the samsung include directories Signed-off-by: Arnd Bergmann Cc: Kukjin Kim Cc: Kyungmin Park Cc: Ben Dooks Cc: Mark Brown Cc: Jeff Garzik Cc: Guenter Roeck Cc: "Wolfram Sang (embedded platforms)" Cc: Dmitry Torokhov Cc: Bryan Wu Cc: Richard Purdie Cc: Sylwester Nawrocki Cc: Mauro Carvalho Chehab Cc: Chris Ball Cc: David Woodhouse Cc: Grant Likely Cc: Felipe Balbi Cc: Greg Kroah-Hartman Cc: Alan Stern Cc: Sangbeom Kim Cc: Liam Girdwood Cc: linux-samsung-...@vger.kernel.org --- arch/arm/mach-exynos/dev-audio.c |2 +- arch/arm/mach-exynos/dev-ohci.c|2 +- arch/arm/mach-exynos/mach-nuri.c |6 +++--- arch/arm/mach-exynos/mach-origen.c |6 +++--- arch/arm/mach-exynos/mach-smdk4x12.c |2 +- arch/arm/mach-exynos/mach-smdkv310.c |6 +++--- arch/arm/mach-exynos/mach-universal_c210.c |4 ++-- arch/arm/mach-exynos/setup-i2c0.c |2 +- arch/arm/mach-exynos/setup-i2c1.c |2 +- arch/arm/mach-exynos/setup-i2c2.c |2 +- arch/arm/mach-exynos/setup-i2c3.c |2 +- arch/arm/mach-exynos/setup-i2c4.c |2 +- arch/arm/mach-exynos/setup-i2c5.c |2 +- arch/arm/mach-exynos/setup-i2c6.c |2 +- arch/arm/mach-exynos/setup-i2c7.c |2 +- arch/arm/mach-s3c24xx/common-smdk.c|4 ++-- arch/arm/mach-s3c24xx/mach-amlm5900.c |2 +- arch/arm/mach-s3c24xx/mach-anubis.c|6 +++--- arch/arm/mach-s3c24xx/mach-at2440evb.c |6 +++--- arch/arm/mach-s3c24xx/mach-bast.c |8 arch/arm/mach-s3c24xx/mach-gta02.c | 10 +- arch/arm/mach-s3c24xx/mach-h1940.c |8 arch/arm/mach-s3c24xx/mach-jive.c |6 +++--- arch/arm/mach-s3c24xx/mach-mini2440.c | 10 +- arch/arm/mach-s3c24xx/mach-n30.c |8 arch/arm/mach-s3c24xx/mach-nexcoder.c |2 +- arch/arm/mach-s3c24xx/mach-osiris.c|4 ++-- arch/arm/mach-s3c24xx/mach-otom.c |2 +- arch/arm/mach-s3c24xx/mach-qt2410.c|8 arch/arm/mach-s3c24xx/mach-rx1950.c| 10 +- arch/arm/mach-s3c24xx/mach-rx3715.c|2 +- 
arch/arm/mach-s3c24xx/mach-smdk2410.c |2 +- arch/arm/mach-s3c24xx/mach-smdk2413.c |4 ++-- arch/arm/mach-s3c24xx/mach-smdk2416.c |8 arch/arm/mach-s3c24xx/mach-smdk2440.c |2 +- arch/arm/mach-s3c24xx/mach-smdk2443.c |2 +- arch/arm/mach-s3c24xx/mach-tct_hammer.c|2 +- arch/arm/mach-s3c24xx/mach-vr1000.c|6 +++--- arch/arm/mach-s3c24xx/mach-vstms.c |4 ++-- arch/arm/mach-s3c24xx/setup-i2c.c |2 +- arch/arm/mach-s3c24xx/simtec-audio.c |2 +- arch/arm/mach-s3c24xx/simtec-usb.c |2 +- arch/arm/mach-s3c64xx/dev-audio.c |2 +- arch/arm/mach-s3c64xx/mach-anw6410.c |2 +- arch/arm/mach-s3c64xx/mach-crag6410-module.c |2 +- arch/arm/mach-s3c64xx/mach-crag6410.c |4 ++-- arch/arm/mach-s3c64xx/mach-hmt.c |4 ++-- arch/arm/mach-s3c64xx/mach-mini6410.c |4 ++-- arch/arm/mach-s3c64xx/mach-ncp.c |2 +- arch/arm/mach-s3c64xx/mach-real6410.c |4 ++-- arch/arm/mach-s3c64xx/mach-smartq.c|8 arch/arm/mach-s3c64xx/mach-smdk6400.c |2 +- arch/arm/mach-s3c64xx/mach-smdk6410.c |6 +++--- arch/arm/mach-s3c64xx/setup-i2c0.c |2 +- arch/arm/mach-s3c64xx/setup-i2c1.c |2 +- arch/arm/mach-s3c64xx/setup-ide.c |2 +- arch/arm/mach-s5p64x0/dev-audio.c |2 +- arch/arm/mach-s5p64x0/mach-smdk6440.c |4 ++-- arch/arm/mach-s5p64x0/mach-smdk6450.c |4 ++-- arch/arm/mach-s5p64x0/setup-i2c0.c |2 +- arch/arm/mach-s5p64x0/setup-i2c1.c |2 +- arch/arm/mach-s5pc100/dev-audio.c |2 +- arch/arm/mach-s5pc100/mach-smdkc100.c |8 arch/arm/mach-s5pc100/setup-i2c0.c |2 +- arch/arm/mach-s5pc100/setup-i2c1.c |2 +- arch/arm/mach-s5pv210/d
Re: [PATCH 2/2] [trivial] Documentation: broken URL in libata
On 02/13/2012 12:22 PM, Randy Dunlap wrote:
> On 02/13/2012 01:09 AM, Michael Opdenacker wrote:
>> Fix broken link to license text:
>> http://www.opensource.org/licenses/osl-1.1.txt
>>
>> The text for version 1.1 of the Open Software License doesn't
>> seem to be available anywhere on http://www.opensource.org/
>> any more.
>>
>> Replace it with a snapshot from the Internet Wayback Machine.
>
> That's one option. Too bad opensource.org doesn't provide archives.
>
> OSL v1.1 is also available here:
> http://fedoraproject.org/wiki/Licensing:OSL1.1
> and here:
> http://www.samurajdata.se/opensource/mirror/licenses/osl.php
>
> Jeff, I don't suppose there is any chance of changing this file's license?
> (since the Debian people found it to be a problem .. long ago)

Yeah, that's fine...
[git patches] libata new PCI IDs
Please pull 7b4f6ecacb14f384adc1a5a67ad95eb082c02bd1 from
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git tags/upstream-linus

to receive the following updates:

 drivers/ata/ahci.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

Alan Cox (2):
      ahci: Add alternate identifier for the 88SE9172
      ahci: Add identifiers for ASM106x devices

Ben Hutchings (1):
      ahci: Add JMicron 362 device IDs

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 50d5dea..7862d17 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -268,6 +268,9 @@ static const struct pci_device_id ahci_pci_tbl[] = {
 	/* JMicron 360/1/3/5/6, match class to avoid IDE function */
 	{ PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
 	  PCI_CLASS_STORAGE_SATA_AHCI, 0xff, board_ahci_ign_iferr },
+	/* JMicron 362B and 362C have an AHCI function with IDE class code */
+	{ PCI_VDEVICE(JMICRON, 0x2362), board_ahci_ign_iferr },
+	{ PCI_VDEVICE(JMICRON, 0x236f), board_ahci_ign_iferr },
 
 	/* ATI */
 	{ PCI_VDEVICE(ATI, 0x4380), board_ahci_sb600 }, /* ATI SB600 */
@@ -393,6 +396,8 @@ static const struct pci_device_id ahci_pci_tbl[] = {
 	  .driver_data = board_ahci_yes_fbs },	/* 88se9125 */
 	{ PCI_DEVICE(0x1b4b, 0x917a),
 	  .driver_data = board_ahci_yes_fbs },	/* 88se9172 */
+	{ PCI_DEVICE(0x1b4b, 0x9192),
+	  .driver_data = board_ahci_yes_fbs },	/* 88se9172 on some Gigabyte */
 	{ PCI_DEVICE(0x1b4b, 0x91a3),
 	  .driver_data = board_ahci_yes_fbs },
 
@@ -400,7 +405,10 @@ static const struct pci_device_id ahci_pci_tbl[] = {
 	{ PCI_VDEVICE(PROMISE, 0x3f20), board_ahci },	/* PDC42819 */
 
 	/* Asmedia */
-	{ PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci },	/* ASM1061 */
+	{ PCI_VDEVICE(ASMEDIA, 0x0601), board_ahci },	/* ASM1060 */
+	{ PCI_VDEVICE(ASMEDIA, 0x0602), board_ahci },	/* ASM1060 */
+	{ PCI_VDEVICE(ASMEDIA, 0x0611), board_ahci },	/* ASM1061 */
+	{ PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci },	/* ASM1062 */
 
 	/* Generic, PCI class code for AHCI */
 	{ PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
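Why the JMicron 362 parts need explicit device IDs, even though a class-matched JMicron entry already exists, can be seen from a toy model of ID-table matching. Everything below is a simplified sketch: the toy_ names are made up, the table holds only three entries, the vendor ID, class values and 24-bit class mask are assumptions chosen for illustration, and the real struct pci_device_id also carries subvendor and subdevice fields.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ANY_ID		0xffffffffu	/* wildcard, in the role of PCI_ANY_ID */
#define TOY_CLASS_AHCI		0x010601u	/* SATA AHCI class code */
#define TOY_CLASS_IDE		0x010185u	/* some IDE class code (illustrative) */
#define TOY_VENDOR_JMICRON	0x197bu		/* assumed vendor ID */

enum { TOY_NO_MATCH = -1, TOY_BOARD_IGN_IFERR = 1 };

struct toy_pci_id {
	uint32_t vendor, device;
	uint32_t dev_class, class_mask;
	int board;
};

static const struct toy_pci_id toy_tbl[] = {
	/* any JMicron function that advertises the AHCI class */
	{ TOY_VENDOR_JMICRON, TOY_ANY_ID, TOY_CLASS_AHCI, 0xffffffu,
	  TOY_BOARD_IGN_IFERR },
	/* explicit IDs for functions that report an IDE class code */
	{ TOY_VENDOR_JMICRON, 0x2362, 0, 0, TOY_BOARD_IGN_IFERR },
	{ TOY_VENDOR_JMICRON, 0x236f, 0, 0, TOY_BOARD_IGN_IFERR },
};

/* Return the board type of the first matching entry, or TOY_NO_MATCH. */
static int toy_match(uint32_t vendor, uint32_t device, uint32_t dev_class)
{
	for (size_t i = 0; i < sizeof(toy_tbl) / sizeof(toy_tbl[0]); i++) {
		const struct toy_pci_id *id = &toy_tbl[i];

		if ((id->vendor == TOY_ANY_ID || id->vendor == vendor) &&
		    (id->device == TOY_ANY_ID || id->device == device) &&
		    !((id->dev_class ^ dev_class) & id->class_mask))
			return id->board;
	}
	return TOY_NO_MATCH;
}
```

A function with device ID 0x2362 and an IDE class code misses the class-matched entry and is only picked up by the explicit one, which is the situation the new table comment describes.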
Re: [GIT PULL] sound fixes for 3.6-rc6
On Thu, Sep 13, 2012 at 02:28:51PM +0200, Takashi Iwai wrote:

> > FWIW, it was an output from git-pull-request, which fell back to the
> > equivalent branch. Usually I check it manually but I forgot it at
> > this time just before going to a meeting.
> >
> > This was with git 1.7.11.5. I'll check whether this still happens
> > with 1.7.12.
>
> The same problem still happens with git 1.7.12.
> This is more annoying than useful.

I can't reproduce here. What is your exact request-pull invocation? Is
request-pull showing a warning like:

  warn: You locally have sound-3.6 but it does not (yet)
  warn: appear to be at git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
  warn: Do you want to push it there, perhaps?

(it should do so since v1.7.11.2). Maybe we need to make it possible to
bump that warning to a fatal error?

-Peff
Re: [2.6 patch] remove smbfs
On Wed, 30 Jan 2008 22:16:13 +0100
Guenter Kukkukk <[EMAIL PROTECTED]> wrote:

> On Monday, 28 January 2008, Adrian Bunk wrote:
> > I remember that there were some small things missing in CIFS for
> > completely replacing the unmaintained smbfs when we discussed
> > removing smbfs back in 2005 due to smbfs being unmaintained.
> >
> > CIFS has improved since, smbfs is still unmaintained, and it's
> > becoming time to finally remove smbfs.
> >
> > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> >
>
> "... unmaintained smbfs ..." is not quite right, see
>    http://lkml.org/lkml/2007/11/6/94
> Before removing it now completely, drop
>    Jeff Layton <[EMAIL PROTECTED]>
> a note.
> Afaik, Redhat still has customers which rely on smbfs.
>

Some of our older products use smbfs, but our newer stuff (RHEL5 and
up) have smbfs disabled. Fedora has had smbfs disabled for quite some
time as well. I've heard very few complaints (though maybe they're just
not getting to me).

I have no problem with targeting smbfs for removal, but I thought
Andrew had an unofficial policy that we should first mark things to be
deprecated, and then remove them 2 releases later. That seems like a
sensible policy to me. If we mark it deprecated in 2.6.25 then we can
remove it after 2.6.26 is released.

It might not even hurt to have a nice loud printk when the smbfs module
is plugged in to warn users that it's slated to be removed, and that
they should move to CIFS as soon as possible.

> In addition, cifs cannot completely replace smbfs atm.
> Even NAS boxes sold today (often running ancient
> samba-2.x.x) work only with smbfs on the client side.

It would be ideal if someone were to report these problems as bugs. I
remember some of those in the past, but haven't heard of any cases of
that sort of thing for some time. When I have, Steve has generally been
very good about tracking down the cause and fixing it.

Cheers,
--
Jeff Layton <[EMAIL PROTECTED]>
Re: [2.6 patch] remove smbfs
On Thu, 31 Jan 2008 00:58:10 +0200
Adrian Bunk <[EMAIL PROTECTED]> wrote:

> On Wed, Jan 30, 2008 at 05:41:03PM -0500, Jeff Layton wrote:
> > On Wed, 30 Jan 2008 22:16:13 +0100
> > Guenter Kukkukk <[EMAIL PROTECTED]> wrote:
> >
> > > On Monday, 28 January 2008, Adrian Bunk wrote:
> > > > I remember that there were some small things missing in CIFS
> > > > for completely replacing the unmaintained smbfs when we
> > > > discussed removing smbfs back in 2005 due to smbfs being
> > > > unmaintained.
> > > >
> > > > CIFS has improved since, smbfs is still unmaintained, and it's
> > > > becoming time to finally remove smbfs.
> > > >
> > > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> > > >
> > >
> > > "... unmaintained smbfs ..." is not quite right, see
> > >    http://lkml.org/lkml/2007/11/6/94
> > > Before removing it now completely, drop
> > >    Jeff Layton <[EMAIL PROTECTED]>
> > > a note.
> > > Afaik, Redhat still has customers which rely on smbfs.
> > >
> >
> > Some of our older products use smbfs, but our newer stuff (RHEL5 and
> > up) have smbfs disabled. Fedora has had smbfs disabled for quite
> > some time as well. I've heard very few complaints (though maybe
> > they're just not getting to me).
> >
> > I have no problem with targeting smbfs for removal, but I thought
> > Andrew had an unofficial policy that we should first mark things to
> > be deprecated, and then remove them 2 releases later. That seems
> > like a sensible policy to me. If we mark it deprecated in 2.6.25
> > then we can remove it after 2.6.26 is released.
> >
> > It might not even hurt to have a nice loud printk when the smbfs
> > module is plugged in to warn users that it's slated to be removed,
> > and that they should move to CIFS as soon as possible.
>
> Andrew has this with a target date of December 2006 (sic) for the
> removal buried in -mm...
>

True, but most users don't run -mm. I think we should have this marked
as deprecated in mainline kernels before removing it.

I rather like the idea of a runtime warning too... smbfs has the
unfortunate quality of momentum. A lot of users aren't aware of CIFS at
all since smbfs basically does what they need it to do. Some extra
warning for those users would be nice.

> > > In addition, cifs cannot completely replace smbfs atm.
> > > Even NAS boxes sold today (often running ancient
> > > samba-2.x.x) work only with smbfs on the client side.
> >
> > It would be ideal if someone were to report these problems as bugs.
> > I remember some of those in the past, but haven't heard of any
> > cases of that sort of thing for some time. When I have, Steve has
> > generally been very good about tracking down the cause and fixing
> > it.
>
> More exactly, one of the main advantages of removing redundant code
> like smbfs is that people are finally forced to report their bugs.
>

Indeed. I'm all for removing it, but I think we should try to have a
clear transition path to avoid some of the "WTF happened to smbfs?"
emails we're bound to get. Marking it deprecated in mainline and
stating that it'll be removed in version 2.6.26 (or whenever) seems
like a reasonable thing to do.

Just my $.02...

--
Jeff Layton <[EMAIL PROTECTED]>
Re: [2.6 patch] remove smbfs
On Thu, 31 Jan 2008 02:47:17 +0200
Adrian Bunk <[EMAIL PROTECTED]> wrote:

> On Wed, Jan 30, 2008 at 07:34:12PM -0500, Jeff Layton wrote:
> > On Thu, 31 Jan 2008 00:58:10 +0200
> > Adrian Bunk <[EMAIL PROTECTED]> wrote:
> >
> > > On Wed, Jan 30, 2008 at 05:41:03PM -0500, Jeff Layton wrote:
> > > > On Wed, 30 Jan 2008 22:16:13 +0100
> > > > Guenter Kukkukk <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > On Monday, 28 January 2008, Adrian Bunk wrote:
> > > > > > I remember that there were some small things missing in CIFS
> > > > > > for completely replacing the unmaintained smbfs when we
> > > > > > discussed removing smbfs back in 2005 due to smbfs being
> > > > > > unmaintained.
> > > > > >
> > > > > > CIFS has improved since, smbfs is still unmaintained, and
> > > > > > it's becoming time to finally remove smbfs.
> > > > > >
> > > > > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> > > > >
> > > > > "... unmaintained smbfs ..." is not quite right, see
> > > > > http://lkml.org/lkml/2007/11/6/94
> > > > > Before removing it now completely, drop
> > > > > Jeff Layton <[EMAIL PROTECTED]>
> > > > > a note.
> > > > > Afaik, Redhat still has customers which rely on smbfs.
> > > >
> > > > Some of our older products use smbfs, but our newer stuff
> > > > (RHEL5 and up) has smbfs disabled. Fedora has had smbfs
> > > > disabled for quite some time as well. I've heard very few
> > > > complaints (though maybe they're just not getting to me).
> > > >
> > > > I have no problem with targeting smbfs for removal, but I
> > > > thought Andrew had an unofficial policy that we should first
> > > > mark things to be deprecated, and then remove them 2 releases
> > > > later. That seems like a sensible policy to me. If we mark it
> > > > deprecated in 2.6.25 then we can remove it after 2.6.26 is
> > > > released.
> > > >
> > > > It might not even hurt to have a nice loud printk when the smbfs
> > > > module is plugged in to warn users that it's slated to be
> > > > removed, and that they should move to CIFS as soon as possible.
> > >
> > > Andrew has this with a target date of December 2006 (sic) for the
> > > removal buried in -mm...
> >
> > True, but most users don't run -mm. I think we should have this
> > marked as deprecated in mainline kernels before removing it. I
> > rather like the idea of a runtime warning too...
>
> drivers/pcmcia/pcmcia_ioctl.c was scheduled for removal in November
> 2005 and has had such a printk since 2005.
>
> Without any good reason why it's still in the kernel.
>
> > smbfs has the unfortunate quality of momentum. A lot of users aren't
> > aware of CIFS at all since smbfs basically does what they need it to
> > do. Some extra warning for those users would be nice.
>
> And many users will start whining loudly that the not deprecated
> driver (in this case cifs) has this or that bug not before the patch
> to finally remove the deprecated feature got applied or at least
> posted.
>
> And will demand that it therefore does not get removed.

Sucks for them then. They're always welcome to take the old code and
maintain it out of tree if they wish.

> > > > > In addition, cifs cannot completely replace smbfs atm.
> > > > > Even NAS boxes sold today (often running ancient
> > > > > samba-2.x.x) work only with smbfs on the client side.
> > > >
> > > > It would be ideal if someone were to report these problems as
> > > > bugs. I remember some of those in the past, but haven't heard
> > > > of any cases of that sort of thing for some time. When I have,
> > > > Steve has generally been very good about tracking down the
> > > > cause and fixing it.
> > >
> > > More exactly, one of the main advantages of removing redundant
> > > code like smbfs is that people are finally forced to report their
> > > bugs.
> >
> > Indeed. I'm all for removing it, but I think we should try to have a
> > clear transition path to avoid some of the "WTF happened to smbfs?"
> > emails we're bound to get. Marking it deprecated in mainline and
> > stating that it'll be removed in version 2.6.26 (or whenever) seems
> > like a reasonable thing to do.
>
> How many "WTF happened to smbfs?" emails did you get at RedHat?

Not too many, though we did have customers who demanded that we fix
smbfs and refused to move to cifs. We did that for some of the older
releases but they're out of luck on the new ones.

I don't feel strongly about this either way, really. In general, I
think offering some warning is the better approach, but if you think
that removing smbfs immediately is the right one then go for it...

--
Jeff Layton <[EMAIL PROTECTED]>
Re: cups slow on linux-2.6.24
On Jan 30, 2008 9:47 PM, Patrick McHardy <[EMAIL PROTECTED]> wrote:
> A binary dump would be more useful:
>
> tcpdump -i lo -w
>
> and I guess Jozsef also wants "-s 0" so the full packets are included.

Attached. Again, both runs used this command to print:

for((i=1; i<1001;i++)); do echo $i | lpr -Plp; done

In the good file (lo.good), look for timestamp 08:38:11.818587, where it paused; it then continued at 08:38:22.477261. That's almost 11 seconds of "sleep" (maybe a feature of TCP/IP?).

In the bad file (lo.bad), look for 08:47:55.434722, where it paused; it then continued at 08:48:24.449176. That's about 29 seconds of sleep. But after it continued, lpstat shows it printing one job every 3 seconds. All 100 jobs take approx 1400 seconds to complete, compared to under 100 seconds for the good run.

Again, both runs used the latest linux, one with 17311393f969090ab060540bd9dbe7dc885a76d5 reverted, and the other without.

Thanks,
Jeff.
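As a quick arithmetic check on the pauses described above, the gaps between the quoted timestamps can be computed directly. This is only an illustrative sketch: the helper name pause_seconds is hypothetical, and the four timestamps are copied from the message, not read from the attached capture files.

```python
from datetime import datetime

def pause_seconds(start, end):
    """Gap in seconds between two HH:MM:SS.ffffff timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

# Timestamps as quoted in the message for lo.good and lo.bad.
good = pause_seconds("08:38:11.818587", "08:38:22.477261")
bad = pause_seconds("08:47:55.434722", "08:48:24.449176")

print(f"good run pause: {good:.3f} s")  # 10.659 s
print(f"bad run pause:  {bad:.3f} s")   # 29.014 s
```

The good-run stall comes out just under 11 seconds and the bad-run stall at roughly 29 seconds, so the bad run's initial pause is nearly three times as long before the steady one-job-per-3-seconds pattern begins.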
Re: cups slow on linux-2.6.24
On Jan 31, 2008 10:41 AM, Patrick McHardy <[EMAIL PROTECTED]> wrote:
> Thanks. In the dump we can see that connections reusing ports
> always have their first SYN dropped and retransmitted three
> seconds later. I'm not sure what's causing this yet, do you have
> any firewall rules that affect loopback traffic?

No firewall. And the "lp" is just to print to a [EMAIL PROTECTED]

# lpadmin -p lp -i /etc/cups/interfaces/lp -v lpd://localhost/file -o printer-error-policy=retry-job
# lpadmin -p file -i /etc/cups/interfaces/file -v file:/dev/null -o printer-error-policy=retry-job

Filter for lp is:
cat $6

Filter for file is:
cat $6 >/tmp/$$

Thanks,
Jeff.
Re: cups slow on linux-2.6.24
On Jan 31, 2008 11:25 AM, Patrick McHardy <[EMAIL PROTECTED]> wrote:
> Actually it's probably the SYN/ACK that is dropped. Please try whether
>
> modprobe ipt_LOG
> echo 255 >/proc/sys/net/netfilter/nf_conntrack_log_invalid

On the good run, I don't get any message, which is good.

On the bad run, I got the following message:

boston kernel: nf_ct_tcp: invalid packed ignored IN= OUT= SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=8162 DF PROTO=TCP SPT=1016 DPT=515 SEQ=3834958843 ACK=0 WINDOW=32792 RES=0x00 SYN URGP=0 OPT (0204400C0402080ACC1901030307) UID=0 GID=65534

This message is displayed repeatedly after the job got "stuck", once every 3 seconds, coinciding with the one-job-every-3-seconds rate of the print jobs.

Hope this helps.

Jeff.
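To make the logged packet easier to read, the line can be split into its KEY=VALUE fields and bare TCP flag tokens. This is a minimal sketch, not a general netfilter log parser: the function name parse_log and the flag list are assumptions for illustration, and the log text is copied from the message as posted (including its "packed" spelling).

```python
LOG_LINE = (
    "nf_ct_tcp: invalid packed ignored IN= OUT= SRC=127.0.0.1 DST=127.0.0.1 "
    "LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=8162 DF PROTO=TCP SPT=1016 DPT=515 "
    "SEQ=3834958843 ACK=0 WINDOW=32792 RES=0x00 SYN URGP=0 "
    "OPT (0204400C0402080ACC1901030307) UID=0 GID=65534"
)

# Flag names as they appear as bare tokens in the log line (assumed list).
TCP_FLAGS = ("URG", "ACK", "PSH", "RST", "SYN", "FIN")

def parse_log(line):
    """Split a log line into a dict of KEY=VALUE fields and a list of bare TCP flags."""
    fields, flags = {}, []
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value      # e.g. "DPT=515" -> fields["DPT"] = "515"
        elif token in TCP_FLAGS:
            flags.append(token)      # "ACK=0" is a field, a bare "ACK" would be a flag
    return fields, flags

fields, flags = parse_log(LOG_LINE)
print(fields["SRC"], fields["SPT"], "->", fields["DST"], fields["DPT"])
print("flags:", flags)
```

For this line the only flag token is SYN, and the destination port is 515 (the lpd port that the "lp" queue targets), so the packet being logged as invalid is an initial SYN on loopback.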
Re: [2.6 patch] ata_piix.c: make piix_merge_scr() static
Adrian Bunk wrote:

piix_merge_scr() can become static.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

applied
Re: Are Section mismatches out of control?
Andrew Morton wrote:

On Fri, 1 Feb 2008 11:47:18 +0100 Sam Ravnborg <[EMAIL PROTECTED]> wrote:

James said in a related posting that the Section mismatch warnings were getting out of control.

eh. They're easy - the build system tells you about them!

The list is here:

Question is: why do people keep adding new ones when they are so easy to detect and fix?

Answer: because neither they nor their patch integrators are doing adequate compilation testing.

I will look at drivers/isdn as next step.

Thanks.

Another way to look at it...

All of a sudden, unlike 2.6.24, a kernel 2.6.25-git build spews so many warnings that I need to disable section mismatch checking completely, because there is so much noise that __normal build messages scroll off the screen__.

Jeff
Re: [PATCH 1/4] [libata] Blackfin pata-bf54x driver: Remove obsolete PM function
Bryan Wu wrote:

From: Sonic Zhang <[EMAIL PROTECTED]>

Signed-off-by: Sonic Zhang <[EMAIL PROTECTED]>
Signed-off-by: Bryan Wu <[EMAIL PROTECTED]>
---
 drivers/ata/pata_bf54x.c | 4
 1 files changed, 0 insertions(+), 4 deletions(-)

applied 1-4
Re: [PATCH 0/5] isdn: fix section mismatch warnings in isdn
Sam Ravnborg wrote:

I know Jeff Garzik has some ISDN clean-up patches pending but rather than waiting forever to have them acked I decided to fix the warnings in the current kernel.

Please do... Those patches are not going to be submitted for the current merge window, as I had higher priorities.

The PCI hotplug conversion is complete, but a few "rough edges" remain to be cleaned up -- then we must test, since none of this work is tested at all yet.

For anyone else curious about the ISDN PCI hotplug API conversion, it is available on the 'isdn-pci' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git

Jeff
Re: [PATCH] sata_nv: fix for completion handling
Robert Hancock wrote:

This patch is based on an original patch from Kuan Luo of NVIDIA, posted under subject "fixed a bug of adma in rhel4u5 with HDS7250SASUN500G". His description follows. I've reworked it a bit to avoid some unnecessary repeated checks but it should be functionally identical.

"The patch is to solve the error message "ata1: CPB flags CMD err, flags=0x11" when testing HDS7250SASUN500G in rhel4u5. I tested this hd in 2.6.24-rc7, which needed the mask removed from the blacklist to run ncq, and the same error also showed up.

I traced the bug and found that the interrupt finished a command (for example, tag=0) when the driver saw that the adma status was NV_ADMA_STAT_DONE and cpb->resp_flags was NV_CPB_RESP_DONE. However, for this hd, the drive may not have cleared bit 0 at that moment. This meant the hardware had not completely finished the command. If at the same time the driver freed the command (tag 0) and sent another command (tag 0), the error happened.

The notifier register is a 32-bit register containing the notifier value. The value is a bit vector containing one bit per tag number (0-31) in corresponding bit positions (bit 0 is for tag 0, etc). When a bit is set, ADMA indicates that the command with the corresponding tag number completed execution.

So I added the notifier check code. Sometimes I saw that the notifier reg had some bits set, but the adma status was NV_ADMA_STAT_CMD_COMPLETE, not NV_ADMA_STAT_DONE. So I added the NV_ADMA_STAT_CMD_COMPLETE check code."

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

applied, thanks all for investigating this stuff
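The notifier layout described in the patch text (one bit per tag, bit 0 for tag 0) can be sketched as plain bit arithmetic. This is not the sata_nv driver code; the names completed_tags and safe_to_complete are hypothetical, and the example register values are made up purely to show the decoding.

```python
NUM_TAGS = 32  # the notifier is described as a 32-bit register, one bit per tag

def completed_tags(notifier):
    """Return the tag numbers whose bit is set in the notifier value."""
    return [tag for tag in range(NUM_TAGS) if (notifier >> tag) & 1]

def safe_to_complete(tag, notifier):
    """True only if the notifier bit for this tag is set."""
    return bool((notifier >> tag) & 1)

# Made-up register values for illustration only.
print(completed_tags(0b101))        # [0, 2]
print(safe_to_complete(0, 0b100))   # False: tag 0 has not been signalled yet
```

The point of the check in the description is the second case: a tag should not be freed and reused while its notifier bit is still clear, even if other status flags suggest the command is done.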
Re: [PATCH] PCI: modify SB700 SATA MSI quirk
Tejun Heo wrote:

From: Shane Huang <[EMAIL PROTECTED]>

The SB700 SATA MSI bug will be fixed in SB700 revision A21 at the hardware level, but SB700 revisions older than A21 will also be found in the market. This patch modifies the original quirk commit bc38b411fe696fad32b261f492cb4afbf1835256 instead of withdrawing it. The patch also removes the quirk for 0x4395 because 0x4395 is the SB800 device ID.

Signed-off-by: Shane Huang <[EMAIL PROTECTED]>
Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
Okay, here's the reformatted in-line version. Shane, please invest some time into setting up your email environment. Sending patches via email is an important part of the linux kernel development process and if you're gonna submit patches, you're just gonna have to do it.

 drivers/pci/quirks.c | 29 ++---
 1 file changed, 22 insertions(+), 7 deletions(-)

FWIW, I'm happy with whatever this thread results in... it sounds like Tejun and Shane are iterating towards a satisfactory final result.

Just let me know if I need to merge something, since I'm assuming that GregKH will push this through the PCI tree.
Re: [Patch] arch/um/include/init.h: Fix missing macro definitions
On Thu, Jan 31, 2008 at 11:06:34PM +0800, WANG Cong wrote:
> This patch fixed the following build error in current -git tree.
>
> arch/um/kernel/config.c:10: error: expected declaration specifiers or '...'
> before '.' token
> ...

This is close to uml-arch-um-include-inith-needs-a-definition-of-__used.patch that's currently in -mm.

Andrew, could you replace uml-arch-um-include-inith-needs-a-definition-of-__used.patch with the version below and push it to Linus?

Jeff

--
Work email - jdike at linux dot intel dot com

init.h started breaking now for some reason. It turns out that there wasn't a definition of __used. Fixed this by copying the relevant stuff from compiler.h in the userspace case, and including compiler.h in the kernel case.

From WANG Cong <[EMAIL PROTECTED]> - added definition of __section

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
Cc: WANG Cong <[EMAIL PROTECTED]>
---
 arch/um/include/init.h | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

Index: linux-2.6-git/arch/um/include/init.h
===
--- linux-2.6-git.orig/arch/um/include/init.h 2008-02-01 10:41:14.0 -0500
+++ linux-2.6-git/arch/um/include/init.h 2008-02-01 10:52:34.0 -0500
@@ -40,6 +40,20 @@
 typedef int (*initcall_t)(void);
 typedef void (*exitcall_t)(void);
 
+#ifndef __KERNEL__
+#ifndef __section
+# define __section(S) __attribute__ ((__section__(#S)))
+#endif
+
+#if __GNUC_MINOR__ >= 3
+# define __used __attribute__((__used__))
+#else
+# define __used __attribute__((__unused__))
+#endif
+
+#else
+#include
+#endif
 /* These are for everybody (although not all archs will actually
    discard it in modules) */
 #define __init __section(.init.text)
@@ -127,14 +141,3 @@ extern struct uml_param __uml_setup_star
 #endif
 
 #endif /* _LINUX_UML_INIT_H */
-
-/*
- * Overrides for Emacs so that we follow Linus's tabbing style.
- * Emacs will notice this stuff at the end of the file and automatically
- * adjust the settings for this buffer only. This must remain at the end
- * of the file.
- * ---
- * Local variables:
- * c-file-style: "linux"
- * End:
- */
Re: [PATCH #upstream] libata: implement libata.force module parameter
Tejun Heo wrote:

This patch implements the libata.force module parameter, which can selectively override ATA port, link and device configurations including cable type, SATA PHY SPD limit, transfer mode and NCQ.

For example, you can say "use 1.5Gbps for all fan-out ports attached to the second port but allow 3.0Gbps for the PMP device itself, oh, the device attached to the third fan-out port chokes on NCQ and shouldn't go over UDMA4" by the following.

libata.force=2:1.5g,2.15:3.0g,2.03:noncq,udma4

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
I guess it's about time we add something like this. More than anything else this should help debugging and can serve as a last resort to work around problems. Thanks.

 Documentation/kernel-parameters.txt | 35 +++
 drivers/ata/libata-core.c | 375 +++-
 drivers/ata/libata-eh.c | 8
 drivers/ata/libata.h | 1
 4 files changed, 415 insertions(+), 4 deletions(-)

ACK, but it breaks the build due to section type conflicts:

drivers/ata/libata-core.c:108: error: ata_force_param_buf causes a section type conflict

Given that the data is marked __initdata and the code is marked __init, I cannot see the problem.
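To illustrate how the example string above breaks down, here is a rough sketch that splits it into (port, device, value) entries. It is not the kernel's parser: the function name parse_force is hypothetical, and the rule that a value without an ID reuses the previous entry's ID is inferred from the example itself (udma4 following 2.03:noncq).

```python
def parse_force(param):
    """Split a libata.force-style string into (port, device, value) tuples.

    Assumed grammar, inferred from the example in the patch description:
    comma-separated entries of the form [PORT[.DEVICE]:]VALUE, where an
    entry without an ID reuses the ID of the previous entry.
    """
    entries = []
    port, device = None, None
    for item in param.split(","):
        ident, sep, value = item.partition(":")
        if sep:
            # Entry carries its own ID, e.g. "2.15:3.0g" or "2:1.5g".
            port_str, _, dev_str = ident.partition(".")
            port = int(port_str)
            device = int(dev_str) if dev_str else None
        else:
            # No ID given: the whole item is the value, ID is inherited.
            value = item
        entries.append((port, device, value))
    return entries

for entry in parse_force("2:1.5g,2.15:3.0g,2.03:noncq,udma4"):
    print(entry)
```

Read this way, the example yields four settings: 1.5g for all of port 2, 3.0g for device 15 on port 2, and both noncq and udma4 for device 3 on port 2, which matches the sentence in the patch description.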
Re: [git Patch] UML: a build error fix
On Thu, Jan 31, 2008 at 11:17:41PM +0800, WANG Cong wrote:
> This patch fixed this error:
>
> arch/um/kernel/skas/syscall.c: In function 'handle_syscall':
> arch/um/kernel/skas/syscall.c:33: error: 'NR_syscalls' undeclared (first use
> in this function)

That works, but I think doing things the way that i386 does them is cleaner.

Andrew, can you stick the patch below into -mm and push it to Linus?

Jeff

--
Work email - jdike at linux dot intel dot com

Redo the calculation of NR_syscalls since that disappeared from i386 and use a similar mechanism on x86_64.

We now figure out the size of the system call table in arch code and stick that in syscall_table_size. arch/um/kernel/skas/syscall.c defines NR_syscalls in terms of that since it's the only thing that needs to know how many system calls there are.

The old mechanism that was used on x86_64 is gone.

arch/um/include/sysdep-i386/syscalls.h got some formatting since I was looking at it.

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
Cc: WANG Cong <[EMAIL PROTECTED]>
---
 arch/um/include/sysdep-i386/syscalls.h | 5 +++--
 arch/um/include/sysdep-x86_64/kernel-offsets.h | 9 -
 arch/um/include/sysdep-x86_64/syscalls.h | 2 --
 arch/um/kernel/skas/syscall.c | 3 +++
 arch/um/sys-i386/sys_call_table.S | 5 +
 arch/um/sys-x86_64/syscall_table.c | 17 ++---
 6 files changed, 25 insertions(+), 16 deletions(-)

Index: linux-2.6-git/arch/um/include/sysdep-x86_64/syscalls.h
===
--- linux-2.6-git.orig/arch/um/include/sysdep-x86_64/syscalls.h 2008-02-01 11:24:32.0 -0500
+++ linux-2.6-git/arch/um/include/sysdep-x86_64/syscalls.h 2008-02-01 11:47:51.0 -0500
@@ -30,6 +30,4 @@ extern long old_mmap(unsigned long addr,
 extern syscall_handler_t sys_modify_ldt;
 extern syscall_handler_t sys_arch_prctl;
 
-#define NR_syscalls (UM_NR_syscall_max + 1)
-
 #endif
Index: linux-2.6-git/arch/um/kernel/skas/syscall.c
===
--- linux-2.6-git.orig/arch/um/kernel/skas/syscall.c 2008-02-01 11:24:32.0 -0500
+++ linux-2.6-git/arch/um/kernel/skas/syscall.c 2008-02-01 11:48:02.0 -0500
@@ -9,6 +9,9 @@
 #include "sysdep/ptrace.h"
 #include "sysdep/syscalls.h"
 
+extern int syscall_table_size;
+#define NR_syscalls (syscall_table_size / sizeof(void *))
+
 void handle_syscall(struct uml_pt_regs *r)
 {
 	struct pt_regs *regs = container_of(r, struct pt_regs, regs);
Index: linux-2.6-git/arch/um/sys-i386/sys_call_table.S
===
--- linux-2.6-git.orig/arch/um/sys-i386/sys_call_table.S 2008-02-01 11:24:32.0 -0500
+++ linux-2.6-git/arch/um/sys-i386/sys_call_table.S 2008-02-01 12:08:17.0 -0500
@@ -9,4 +9,9 @@
 
 #define old_mmap old_mmap_i386
 
+.section .rodata,"a"
+
 #include "../../x86/kernel/syscall_table_32.S"
+
+ENTRY(syscall_table_size)
+.long .-sys_call_table
Index: linux-2.6-git/arch/um/include/sysdep-i386/syscalls.h
===
--- linux-2.6-git.orig/arch/um/include/sysdep-i386/syscalls.h 2007-11-28 13:01:17.0 -0500
+++ linux-2.6-git/arch/um/include/sysdep-i386/syscalls.h 2008-02-01 11:48:02.0 -0500
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2000 Jeff Dike ([EMAIL PROTECTED])
+ * Copyright (C) 2000 - 2008 Jeff Dike ([EMAIL PROTECTED],linux.intel}.com)
  * Licensed under the GPL
  */
 
@@ -18,7 +18,8 @@ extern syscall_handler_t old_mmap_i386;
 extern syscall_handler_t *sys_call_table[];
 
 #define EXECUTE_SYSCALL(syscall, regs) \
-	((long (*)(struct syscall_args)) (*sys_call_table[syscall]))(SYSCALL_ARGS(&regs->regs))
+	((long (*)(struct syscall_args)) \
+	 (*sys_call_table[syscall]))(SYSCALL_ARGS(&regs->regs))
 
 extern long sys_mmap2(unsigned long addr, unsigned long len,
 		      unsigned long prot, unsigned long flags,
Index: linux-2.6-git/arch/um/include/sysdep-x86_64/kernel-offsets.h
===
--- linux-2.6-git.orig/arch/um/include/sysdep-x86_64/kernel-offsets.h 2007-12-03 23:56:34.0 -0500
+++ linux-2.6-git/arch/um/include/sysdep-x86_64/kernel-offsets.h 2008-02-01 11:48:01.0 -0500
@@ -17,16 +17,7 @@
 #define OFFSET(sym, str, mem) \
 	DEFINE(sym, offsetof(struct str, mem));
 
-#define __NO_STUBS 1
-#undef __SYSCALL
-#undef _ASM_X86_64_UNISTD_H_
-#define __SYSCALL(nr, sym) [nr] = 1,
-static char syscalls[] = {
-#include
-};
-
 void foo(void)
 {
 #include
-DEFINE(UM_NR_syscall_max, sizeof(syscalls) - 1);
 }
Index: linux-2.6-git/arch/um/sys-x86_64/syscall_table.c =