2.4.2 seems to break loopback and/or mount

2001-02-22 Thread jeff

Please CC me on replies. I just joined the list and don't want
to miss any replies.

I have been running 2.4.1-pre10 for quite some time with no
problems. I just upgraded to 2.4.2 and everything seemed to work
fine until I ran the following (as root, of course):

mount -t iso9660 -o loop,ro mycdimage.iso /mnt/cdrom

at which point the mount process hung in an uninterruptible sleep.
After that I can no longer successfully issue any other mount
commands, including non-loopback mounts. Before attempting the
loopback mount, I can mount and unmount regular partitions fine.

Any ideas as to what is wrong?
The only thing I can think of is that my modutils is v2.3.19
but I doubt that is doing it as the loop module and other modules
are loaded fine.

If anybody has an idea as to what I broke please let me know.
I will upgrade modutils tomorrow and see if the problem goes
away while I wait for a possibly more accurate response.

Thank you,

Jeff Wiegley
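
A report like the one above can be confirmed by checking the state letter
that the kernel exposes in /proc/<pid>/stat, where 'D' means
uninterruptible sleep. The following is only a minimal sketch of parsing
that field; the function names are hypothetical helpers made up for this
illustration, and the pid of the hung mount process has to be found
separately (for example with ps).

```c
#include <string.h>

/*
 * Extract the one-character process state from a /proc/<pid>/stat line.
 * The line has the form "pid (comm) state ...". The comm field may itself
 * contain spaces or parentheses, so look for the LAST ')' in the line.
 * Returns '\0' if the line does not have the expected shape.
 */
char proc_state_from_stat(const char *line)
{
    const char *close = strrchr(line, ')');

    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';
    return close[2];
}

/* Nonzero if the state letter means uninterruptible sleep. */
int is_uninterruptible(char state)
{
    return state == 'D';
}
```

A caller would read the first line of /proc/<pid>/stat into a buffer and
pass it to proc_state_from_stat(); a result of 'D' that never changes
matches the hang described above.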

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Recent kernel "mount" slow

2012-11-26 Thread Jeff Chua
On Sun, Nov 25, 2012 at 7:23 AM, Jeff Chua  wrote:
> On Sun, Nov 25, 2012 at 5:09 AM, Mikulas Patocka  wrote:
>> So it's better to slow down mount.
>
> I am quite proud of Linux's boot time compared with other OSes. Even
> with 10 partitions, Linux can boot up in just a few seconds, but now
> you're saying that we need to do this semaphore check at boot. Doing
> so adds an extra 4 seconds to boot.

By the way, I'm using a pretty fast SSD (Samsung PM830) and a fast CPU
(2.8GHz). I wonder what kind of degradation this would cause for those
on a slower hard disk or slower CPU, or whether it would be just the same.

Thanks,
Jeff


Re: Recent kernel "mount" slow

2012-11-27 Thread Jeff Chua
Jens,

Limited access now at Incheon Airport. Will try the patch out when I arrive.

Thanks,
Jeff

On 11/27/12, Jens Axboe  wrote:
> On 2012-11-27 08:38, Jens Axboe wrote:
>> On 2012-11-27 06:57, Jeff Chua wrote:
>>> On Sun, Nov 25, 2012 at 7:23 AM, Jeff Chua 
>>> wrote:
>>>> On Sun, Nov 25, 2012 at 5:09 AM, Mikulas Patocka 
>>>> wrote:
>>>>> So it's better to slow down mount.
>>>>
>>>> I am quite proud of the linux boot time pitting against other OS. Even
>>>> with 10 partitions. Linux can boot up in just a few seconds, but now
>>>> you're saying that we need to do this semaphore check at boot up. By
>>>> doing so, it's inducing additional 4 seconds during boot up.
>>>
>>> By the way, I'm using a pretty fast SSD (Samsung PM830) and fast CPU
>>> (2.8GHz). I wonder if those on slower hard disk or slower CPU, what
>>> kind of degradation would this cause or just the same?
>>
>> It'd likely be the same slow down time wise, but as a percentage it
>> would appear smaller on a slower disk.
>>
>> Could you please test Mikulas' suggestion of changing
>> synchronize_sched() in include/linux/percpu-rwsem.h to
>> synchronize_sched_expedited()?
>>
>> linux-next also has a re-write of the per-cpu rw sems, out of Andrew's
>> tree. It would be a good data point if you could test that, too.
>>
>> In any case, the slow down definitely isn't acceptable. Fixing an
>> obscure issue like block sizes changing while O_DIRECT is in flight
>> definitely does NOT warrant a mount slow down.
>
> Here's Oleg's patch, which might be easier for you than switching to
> linux-next. Please try that.
>
> From: Oleg Nesterov 
> Subject: percpu_rw_semaphore: reimplement to not block the readers
> unnecessarily
>
> Currently the writer does msleep() plus synchronize_sched() 3 times to
> acquire/release the semaphore, and during this time the readers are
> blocked completely.  Even if the "write" section was not actually started
> or if it was already finished.
>
> With this patch down_write/up_write does synchronize_sched() twice and
> down_read/up_read are still possible during this time, just they use the
> slow path.
>
> percpu_down_write() first forces the readers to use rw_semaphore and
> increment the "slow" counter to take the lock for reading, then it
> takes that rw_semaphore for writing and blocks the readers.
>
> Also.  With this patch the code relies on the documented behaviour of
> synchronize_sched(), it doesn't try to pair synchronize_sched() with
> barrier.
>
> Signed-off-by: Oleg Nesterov 
> Reviewed-by: Paul E. McKenney 
> Cc: Linus Torvalds 
> Cc: Mikulas Patocka 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Srikar Dronamraju 
> Cc: Ananth N Mavinakayanahalli 
> Cc: Anton Arapov 
> Cc: Jens Axboe 
> Signed-off-by: Andrew Morton 
> ---
>
>  include/linux/percpu-rwsem.h |   85 +++---
>  lib/Makefile |2
>  lib/percpu-rwsem.c   |  123 +
>  3 files changed, 138 insertions(+), 72 deletions(-)
>
> diff -puN
> include/linux/percpu-rwsem.h~percpu_rw_semaphore-reimplement-to-not-block-the-readers-unnecessarily
> include/linux/percpu-rwsem.h
> ---
> a/include/linux/percpu-rwsem.h~percpu_rw_semaphore-reimplement-to-not-block-the-readers-unnecessarily
> +++ a/include/linux/percpu-rwsem.h
> @@ -2,82 +2,25 @@
>  #define _LINUX_PERCPU_RWSEM_H
>
>  #include 
> +#include 
>  #include 
> -#include 
> -#include 
> +#include 
>
>  struct percpu_rw_semaphore {
> - unsigned __percpu *counters;
> - bool locked;
> - struct mutex mtx;
> + unsigned int __percpu   *fast_read_ctr;
> + struct mutex            writer_mutex;
> + struct rw_semaphore     rw_sem;
> + atomic_t                slow_read_ctr;
> + wait_queue_head_t       write_waitq;
>  };
>
> -#define light_mb()   barrier()
> -#define heavy_mb()   synchronize_sched()
> +extern void percpu_down_read(struct percpu_rw_semaphore *);
> +extern void percpu_up_read(struct percpu_rw_semaphore *);
>
> -static inline void percpu_down_read(struct percpu_rw_semaphore *p)
> -{
> - rcu_read_lock_sched();
> - if (unlikely(p->locked)) {
> - rcu_read_unlock_sched();
> - mutex_lock(&p->mtx);
> - this_cpu_inc(*p->counters);
> - mutex_unlock(&p->mtx);
> - return;
> - }
> - this_cpu_inc(*p->counters);
> - rcu_read_unlock_sched();
> - light_mb();

Re: Recent kernel "mount" slow

2012-11-27 Thread Jeff Chua
On Tue, Nov 27, 2012 at 3:38 PM, Jens Axboe  wrote:
> On 2012-11-27 06:57, Jeff Chua wrote:
>> On Sun, Nov 25, 2012 at 7:23 AM, Jeff Chua  wrote:
>>> On Sun, Nov 25, 2012 at 5:09 AM, Mikulas Patocka  
>>> wrote:
>>>> So it's better to slow down mount.
>>>
>>> I am quite proud of the linux boot time pitting against other OS. Even
>>> with 10 partitions. Linux can boot up in just a few seconds, but now
>>> you're saying that we need to do this semaphore check at boot up. By
>>> doing so, it's inducing additional 4 seconds during boot up.
>>
>> By the way, I'm using a pretty fast SSD (Samsung PM830) and fast CPU
>> (2.8GHz). I wonder if those on slower hard disk or slower CPU, what
>> kind of degradation would this cause or just the same?
>
> It'd likely be the same slow down time wise, but as a percentage it
> would appear smaller on a slower disk.
>
> Could you please test Mikulas' suggestion of changing
> synchronize_sched() in include/linux/percpu-rwsem.h to
> synchronize_sched_expedited()?

Tested. It seems as fast as before, but may be a "tick" slower. Just
perception. I was getting pretty much 0.012s with everything reverted.
With synchronize_sched_expedited(), it seems to be 0.012s ~ 0.013s.
So, it's good.


> linux-next also has a re-write of the per-cpu rw sems, out of Andrew's
> tree. It would be a good data point if you could test that, too.

Tested. It's slower. 0.350s. But still faster than 0.500s without the patch.

# time mount /dev/sda1 /mnt; sync; sync; umount /mnt


So, here's the comparison ...

0.500s 3.7.0-rc7
0.168s 3.7.0-rc2
0.012s 3.6.0
0.013s 3.7.0-rc7 + synchronize_sched_expedited()
0.350s 3.7.0-rc7 + Oleg's patch.


Thanks,
Jeff.
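
The timings above are easier to compare as ratios against the 3.6.0
baseline of 0.012s. The helper below is just a sketch for doing that
arithmetic; the function name is made up for this illustration and is not
part of any kernel or tool mentioned in the thread. With the numbers
reported above it gives roughly 42x for 3.7.0-rc7, 14x for 3.7.0-rc2,
29x with Oleg's patch, and about 1.1x with
synchronize_sched_expedited().

```c
/*
 * Ratio of a measured mount time to a baseline mount time, both in
 * seconds. Returns a negative value if the baseline is not positive,
 * since the ratio would be meaningless in that case.
 */
double slowdown_factor(double measured_s, double baseline_s)
{
    if (baseline_s <= 0.0)
        return -1.0;
    return measured_s / baseline_s;
}
```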


[patch,v3,repost 03/10] scsi: make scsi_alloc_sdev numa-aware

2012-11-27 Thread Jeff Moyer
Use the numa node id set in the Scsi_Host to allocate the sdev structure
on the device-local numa node.

Reviewed-by: Bart Van Assche 
Signed-off-by: Jeff Moyer 
---
 drivers/scsi/scsi_scan.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3e58b22..d91749d 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -232,8 +232,8 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
extern void scsi_evt_thread(struct work_struct *work);
extern void scsi_requeue_run_queue(struct work_struct *work);
 
-   sdev = kzalloc(sizeof(*sdev) + shost->transportt->device_size,
-  GFP_ATOMIC);
+   sdev = kzalloc_node(sizeof(*sdev) + shost->transportt->device_size,
+   GFP_ATOMIC, scsi_host_get_numa_node(shost));
if (!sdev)
goto out;
 
-- 
1.7.1



[patch,v3,repost 07/10] megaraid_sas: use scsi_host_alloc_node

2012-11-27 Thread Jeff Moyer

Signed-off-by: Jeff Moyer 
---
 drivers/scsi/megaraid/megaraid_sas_base.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index d2c5366..707a6cd 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -4020,8 +4020,9 @@ megasas_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
if (megasas_set_dma_mask(pdev))
goto fail_set_dma_mask;
 
-   host = scsi_host_alloc(&megasas_template,
-  sizeof(struct megasas_instance));
+   host = scsi_host_alloc_node(&megasas_template,
+   sizeof(struct megasas_instance),
+   dev_to_node(&pdev->dev));
 
if (!host) {
printk(KERN_DEBUG "megasas: scsi_host_alloc failed\n");
-- 
1.7.1



[patch,v3,repost 06/10] ata: use scsi_host_alloc_node

2012-11-27 Thread Jeff Moyer
Acked-by: Jeff Garzik 
Signed-off-by: Jeff Moyer 
---
 drivers/ata/libata-scsi.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index e3bda07..9d5dd09 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3586,7 +3586,8 @@ int ata_scsi_add_hosts(struct ata_host *host, struct scsi_host_template *sht)
struct Scsi_Host *shost;
 
rc = -ENOMEM;
-   shost = scsi_host_alloc(sht, sizeof(struct ata_port *));
+   shost = scsi_host_alloc_node(sht, sizeof(struct ata_port *),
+dev_to_node(host->dev));
if (!shost)
goto err_alloc;
 
-- 
1.7.1



[patch,v3,repost 10/10] cciss: use blk_init_queue_node

2012-11-27 Thread Jeff Moyer

Signed-off-by: Jeff Moyer 
---
 drivers/block/cciss.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index b0f553b..5fe5546 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -1930,7 +1930,8 @@ static void cciss_get_serial_no(ctlr_info_t *h, int logvol,
 static int cciss_add_disk(ctlr_info_t *h, struct gendisk *disk,
int drv_index)
 {
-   disk->queue = blk_init_queue(do_cciss_request, &h->lock);
+   disk->queue = blk_init_queue_node(do_cciss_request, &h->lock,
+ dev_to_node(&h->dev));
if (!disk->queue)
goto init_queue_failure;
sprintf(disk->disk_name, "cciss/c%dd%d", h->ctlr, drv_index);
-- 
1.7.1



[patch,v3,repost 01/10] scsi: add scsi_host_alloc_node

2012-11-27 Thread Jeff Moyer
Allow an LLD to specify on which numa node to allocate scsi data
structures.  Thanks to Bart Van Assche for the suggestion.

Reviewed-by: Bart Van Assche 
Signed-off-by: Jeff Moyer 
---
 drivers/scsi/hosts.c |   13 +++--
 include/scsi/scsi_host.h |   28 
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 593085a..06ce602 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -336,16 +336,25 @@ static struct device_type scsi_host_type = {
  **/
 struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 {
+   return scsi_host_alloc_node(sht, privsize, NUMA_NO_NODE);
+}
+EXPORT_SYMBOL(scsi_host_alloc);
+
+struct Scsi_Host *scsi_host_alloc_node(struct scsi_host_template *sht,
+  int privsize, int node)
+{
struct Scsi_Host *shost;
gfp_t gfp_mask = GFP_KERNEL;
 
if (sht->unchecked_isa_dma && privsize)
gfp_mask |= __GFP_DMA;
 
-   shost = kzalloc(sizeof(struct Scsi_Host) + privsize, gfp_mask);
+   shost = kzalloc_node(sizeof(struct Scsi_Host) + privsize,
+gfp_mask, node);
if (!shost)
return NULL;
 
+   scsi_host_set_numa_node(shost, node);
shost->host_lock = &shost->default_lock;
spin_lock_init(shost->host_lock);
shost->shost_state = SHOST_CREATED;
@@ -443,7 +452,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
kfree(shost);
return NULL;
 }
-EXPORT_SYMBOL(scsi_host_alloc);
+EXPORT_SYMBOL(scsi_host_alloc_node);
 
 struct Scsi_Host *scsi_register(struct scsi_host_template *sht, int privsize)
 {
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 4908480..438856d 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -732,6 +732,14 @@ struct Scsi_Host {
 */
struct device *dma_dev;
 
+#ifdef CONFIG_NUMA
+   /*
+* Numa node this device is closest to, used for allocating
+* data structures locally.
+*/
+   int numa_node;
+#endif
+
/*
 * We should ensure that this is aligned, both for better performance
 * and also because some compilers (m68k) don't automatically force
@@ -776,6 +784,8 @@ extern int scsi_queue_work(struct Scsi_Host *, struct work_struct *);
 extern void scsi_flush_work(struct Scsi_Host *);
 
 extern struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *, int);
+extern struct Scsi_Host *scsi_host_alloc_node(struct scsi_host_template *,
+ int, int);
 extern int __must_check scsi_add_host_with_dma(struct Scsi_Host *,
   struct device *,
   struct device *);
@@ -919,6 +929,24 @@ static inline unsigned char scsi_host_get_guard(struct Scsi_Host *shost)
return shost->prot_guard_type;
 }
 
+#ifdef CONFIG_NUMA
+static inline int scsi_host_get_numa_node(struct Scsi_Host *shost)
+{
+   return shost->numa_node;
+}
+
+static inline void scsi_host_set_numa_node(struct Scsi_Host *shost, int node)
+{
+   shost->numa_node = node;
+}
+#else /* CONFIG_NUMA */
+static inline int scsi_host_get_numa_node(struct Scsi_Host *shost)
+{
+   return NUMA_NO_NODE;
+}
+static inline void scsi_host_set_numa_node(struct Scsi_Host *shost, int node) {}
+#endif
+
 /* legacy interfaces */
 extern struct Scsi_Host *scsi_register(struct scsi_host_template *, int);
 extern void scsi_unregister(struct Scsi_Host *);
-- 
1.7.1
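
The patch above keeps the existing scsi_host_alloc() entry point and turns
it into a thin wrapper around the new node-aware function, passing
NUMA_NO_NODE when the caller has no preference. The same pattern can be
shown in user space. This is only a sketch: every model_* name and
MODEL_NO_NODE are hypothetical stand-ins invented for this illustration,
calloc() replaces kzalloc_node(), and actual memory placement on a node
is not modeled at all.

```c
#include <stdlib.h>

#define MODEL_NO_NODE (-1)  /* stands in for the kernel's NUMA_NO_NODE */

struct model_host {
    int numa_node;          /* node the host's data should be local to */
    size_t privsize;        /* size of the driver-private area that follows */
};

/* Node-aware allocator: remembers which node the caller asked for. */
struct model_host *model_host_alloc_node(size_t privsize, int node)
{
    /* calloc() zeroes the memory, as kzalloc_node() does in the patch */
    struct model_host *h = calloc(1, sizeof(*h) + privsize);

    if (h == NULL)
        return NULL;
    h->numa_node = node;
    h->privsize = privsize;
    return h;
}

/* Legacy entry point: unchanged signature, no node preference. */
struct model_host *model_host_alloc(size_t privsize)
{
    return model_host_alloc_node(privsize, MODEL_NO_NODE);
}

/* Accessor used by later allocations that want to stay node-local. */
int model_host_get_numa_node(const struct model_host *h)
{
    return h->numa_node;
}
```

Keeping the old function as a wrapper means unconverted drivers compile
and behave as before, while converted drivers opt in by calling the
node-aware variant.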



[patch,v3,repost 09/10] lpfc: use scsi_host_alloc_node

2012-11-27 Thread Jeff Moyer
Acked-By: James Smart  
Signed-off-by: Jeff Moyer 
---
 drivers/scsi/lpfc/lpfc_init.c |   10 ++
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 7dc4218..65956d3 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -3051,11 +3051,13 @@ lpfc_create_port(struct lpfc_hba *phba, int instance, struct device *dev)
int error = 0;
 
if (dev != &phba->pcidev->dev)
-   shost = scsi_host_alloc(&lpfc_vport_template,
-   sizeof(struct lpfc_vport));
+   shost = scsi_host_alloc_node(&lpfc_vport_template,
+sizeof(struct lpfc_vport),
+dev_to_node(&phba->pcidev->dev));
else
-   shost = scsi_host_alloc(&lpfc_template,
-   sizeof(struct lpfc_vport));
+   shost = scsi_host_alloc_node(&lpfc_template,
+sizeof(struct lpfc_vport),
+dev_to_node(&phba->pcidev->dev));
if (!shost)
goto out;
 
-- 
1.7.1



[patch,v3,repost 08/10] mpt2sas: use scsi_host_alloc_node

2012-11-27 Thread Jeff Moyer

Signed-off-by: Jeff Moyer 
---
 drivers/scsi/mpt2sas/mpt2sas_scsih.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index af4e6c4..a4d6b36 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -8011,8 +8011,8 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
struct MPT2SAS_ADAPTER *ioc;
struct Scsi_Host *shost;
 
-   shost = scsi_host_alloc(&scsih_driver_template,
-   sizeof(struct MPT2SAS_ADAPTER));
+   shost = scsi_host_alloc_node(&scsih_driver_template,
+   sizeof(struct MPT2SAS_ADAPTER), dev_to_node(&pdev->dev));
if (!shost)
return -ENODEV;
 
-- 
1.7.1



[patch,v3,repost 02/10] scsi: make __scsi_alloc_queue numa-aware

2012-11-27 Thread Jeff Moyer
Pass the numa node id set in the Scsi_Host on to blk_init_queue_node
in order to keep all allocations local to the numa node the device is
closest to.

Reviewed-by: Bart Van Assche 
Signed-off-by: Jeff Moyer 
---
 drivers/scsi/scsi_lib.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index da36a3a..ebad5e8 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1664,7 +1664,8 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
struct request_queue *q;
struct device *dev = shost->dma_dev;
 
-   q = blk_init_queue(request_fn, NULL);
+   q = blk_init_queue_node(request_fn, NULL,
+   scsi_host_get_numa_node(shost));
if (!q)
return NULL;
 
-- 
1.7.1



[patch,v3,repost 05/10] sd: use alloc_disk_node

2012-11-27 Thread Jeff Moyer
Reviewed-by: Bart Van Assche 
Signed-off-by: Jeff Moyer 
---
 drivers/scsi/sd.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 12f6fdf..a5dae6b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2714,7 +2714,7 @@ static int sd_probe(struct device *dev)
if (!sdkp)
goto out;
 
-   gd = alloc_disk(SD_MINORS);
+   gd = alloc_disk_node(SD_MINORS, scsi_host_get_numa_node(sdp->host));
if (!gd)
goto out_free;
 
-- 
1.7.1



[patch,v3,repost 00/10] make I/O path allocations more numa-friendly

2012-11-27 Thread Jeff Moyer
Hi,

This patch set makes memory allocations for data structures used in
the I/O path more numa friendly by allocating them from the same numa
node as the storage device.  I've only converted a handful of drivers
at this point.  My testing is limited by the hardware I have on hand.
Using these patches, I was able to max out the bandwidth of the storage
controller when issuing I/O from any node on my 4 node system.  Without
the patch, I/O from nodes remote to the storage device would suffer a
penalty ranging from 6-12%.  Given my relatively low-end setup[1], I
wouldn't be surprised if others could show a more significant performance
advantage.

This is a repost of the last posting.  The only changes are additional
reviewed-by/acked-by tags.  I think this version is ready for inclusion.
James, would you mind taking a look?

Cheers,
Jeff

[1] LSI Megaraid SAS controller with 1GB battery-backed cache,
fronting a RAID 6 10+2.  The workload I used was tuned to not
have to hit disk.  Fio file attached.

--
changes from v2->v3:
- Made the numa_node Scsi_Host structure member dependent on CONFIG_NUMA
- Got rid of a GFP_ZERO I added accidentally
changes from v1->v2:
- got rid of the vfs patch, as Al pointed out some fundamental
  problems with it
- credited Bart van Assche properly


Jeff Moyer (10):
  scsi: add scsi_host_alloc_node
  scsi: make __scsi_alloc_queue numa-aware
  scsi: make scsi_alloc_sdev numa-aware
  scsi: allocate scsi_cmnd-s from the device's local numa node
  sd: use alloc_disk_node
  ata: use scsi_host_alloc_node
  megaraid_sas: use scsi_host_alloc_node
  mpt2sas: use scsi_host_alloc_node
  lpfc: use scsi_host_alloc_node
  cciss: use blk_init_queue_node

 drivers/ata/libata-scsi.c |3 ++-
 drivers/block/cciss.c |3 ++-
 drivers/scsi/hosts.c  |   13 +++--
 drivers/scsi/lpfc/lpfc_init.c |   10 ++
 drivers/scsi/megaraid/megaraid_sas_base.c |5 +++--
 drivers/scsi/mpt2sas/mpt2sas_scsih.c  |4 ++--
 drivers/scsi/scsi.c   |   16 ++--
 drivers/scsi/scsi_lib.c   |3 ++-
 drivers/scsi/scsi_scan.c  |4 ++--
 drivers/scsi/sd.c |2 +-
 include/scsi/scsi_host.h  |   28 
 11 files changed, 69 insertions(+), 22 deletions(-)


[patch,v3,repost 04/10] scsi: allocate scsi_cmnd-s from the device's local numa node

2012-11-27 Thread Jeff Moyer
Reviewed-by: Bart Van Assche 
Signed-off-by: Jeff Moyer 
---
 drivers/scsi/scsi.c |   16 ++--
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 2936b44..1750702 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -173,16 +173,19 @@ static DEFINE_MUTEX(host_cmd_pool_mutex);
  * NULL on failure
  */
 static struct scsi_cmnd *
-scsi_pool_alloc_command(struct scsi_host_cmd_pool *pool, gfp_t gfp_mask)
+scsi_pool_alloc_command(struct scsi_host_cmd_pool *pool, gfp_t gfp_mask,
+   int node)
 {
struct scsi_cmnd *cmd;
 
-   cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask | pool->gfp_mask);
+   cmd = kmem_cache_alloc_node(pool->cmd_slab,
+   gfp_mask | pool->gfp_mask | __GFP_ZERO,
+   node);
if (!cmd)
return NULL;
 
-   cmd->sense_buffer = kmem_cache_alloc(pool->sense_slab,
-gfp_mask | pool->gfp_mask);
+   cmd->sense_buffer = kmem_cache_alloc_node(pool->sense_slab,
+   gfp_mask | pool->gfp_mask, node);
if (!cmd->sense_buffer) {
kmem_cache_free(pool->cmd_slab, cmd);
return NULL;
@@ -223,7 +226,8 @@ scsi_host_alloc_command(struct Scsi_Host *shost, gfp_t gfp_mask)
 {
struct scsi_cmnd *cmd;
 
-   cmd = scsi_pool_alloc_command(shost->cmd_pool, gfp_mask);
+   cmd = scsi_pool_alloc_command(shost->cmd_pool, gfp_mask,
+ scsi_host_get_numa_node(shost));
if (!cmd)
return NULL;
 
@@ -435,7 +439,7 @@ struct scsi_cmnd *scsi_allocate_command(gfp_t gfp_mask)
if (!pool)
return NULL;
 
-   return scsi_pool_alloc_command(pool, gfp_mask);
+   return scsi_pool_alloc_command(pool, gfp_mask, NUMA_NO_NODE);
 }
 EXPORT_SYMBOL(scsi_allocate_command);
 
-- 
1.7.1



Re: [PATCH 2/2 v5] loop: Limit the number of requests in the bio list

2012-11-27 Thread Jeff Moyer
Lukas Czerner  writes:

> Currently there is no limit on the number of requests in the loop bio
> list. This can lead to some nasty situations when the caller spawns
> tons of bio requests taking a huge amount of memory. This is even more
> obvious with discard, where blkdev_issue_discard() will submit all bios
> for the range and wait for them to finish afterwards. On really big loop
> devices and a slow backing file system this can lead to an OOM situation,
> as reported by Dave Chinner.
>
> With this patch we will wait in loop_make_request() if the number of
> bios in the loop bio list would exceed 'nr_congestion_on'.
> We'll wake up the process as we process the bios from the list. Some
> threshold hysteresis is in place to avoid high frequency oscillation.
>
> Signed-off-by: Lukas Czerner 
> Reported-by: Dave Chinner 
Acked-by: Jeff Moyer 
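
The hysteresis mentioned in the quoted description can be sketched as a
pair of thresholds: submitters start waiting once the queue reaches the
"on" mark and are only released after it drains to a lower "off" mark.
The code below is a single-threaded model written for this illustration,
not the loop driver's code. The struct, its fields, and the limiter_*
functions are all hypothetical; only the idea of an upper threshold
(called 'nr_congestion_on' in the description) comes from the patch.

```c
#include <stdbool.h>

struct bio_limiter {
    unsigned int pending;  /* bios currently queued */
    unsigned int on;       /* start blocking at this many pending bios */
    unsigned int off;      /* stop blocking once pending drops to this */
    bool congested;        /* true while submitters must wait */
};

void limiter_init(struct bio_limiter *l, unsigned int on, unsigned int off)
{
    l->pending = 0;
    l->on = on;
    l->off = off;
    l->congested = false;
}

/* Returns true if the bio was queued, false if the caller has to wait. */
bool limiter_try_submit(struct bio_limiter *l)
{
    if (l->pending >= l->on)
        l->congested = true;
    if (l->congested)
        return false;
    l->pending++;
    return true;
}

/* Called when one bio finishes. Returns true if waiters should be woken. */
bool limiter_complete(struct bio_limiter *l)
{
    if (l->pending > 0)
        l->pending--;
    if (l->congested && l->pending <= l->off) {
        l->congested = false;
        return true;
    }
    return false;
}
```

Because the release point is lower than the blocking point, a queue that
hovers around the upper limit does not flip between blocked and unblocked
on every single completion.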


Re: Recent kernel "mount" slow

2012-11-28 Thread Jeff Chua
On Wed, Nov 28, 2012 at 4:33 PM, Jens Axboe  wrote:
> On 2012-11-28 04:57, Mikulas Patocka wrote:
>>
>>
>> On Tue, 27 Nov 2012, Jens Axboe wrote:
>>
>>> On 2012-11-27 11:06, Jeff Chua wrote:
>>>> On Tue, Nov 27, 2012 at 3:38 PM, Jens Axboe  wrote:
>>>>> On 2012-11-27 06:57, Jeff Chua wrote:
>>>>>> On Sun, Nov 25, 2012 at 7:23 AM, Jeff Chua  
>>>>>> wrote:
>>>>>>> On Sun, Nov 25, 2012 at 5:09 AM, Mikulas Patocka  
>>>>>>> wrote:
>>>>>>>> So it's better to slow down mount.
>>>>>>>
>>>>>>> I am quite proud of the linux boot time pitting against other OS. Even
>>>>>>> with 10 partitions. Linux can boot up in just a few seconds, but now
>>>>>>> you're saying that we need to do this semaphore check at boot up. By
>>>>>>> doing so, it's inducing additional 4 seconds during boot up.
>>>>>>
>>>>>> By the way, I'm using a pretty fast SSD (Samsung PM830) and fast CPU
>>>>>> (2.8GHz). I wonder if those on slower hard disk or slower CPU, what
>>>>>> kind of degradation would this cause or just the same?
>>>>>
>>>>> It'd likely be the same slow down time wise, but as a percentage it
>>>>> would appear smaller on a slower disk.
>>>>>
>>>>> Could you please test Mikulas' suggestion of changing
>>>>> synchronize_sched() in include/linux/percpu-rwsem.h to
>>>>> synchronize_sched_expedited()?
>>>>
>>>> Tested. It seems as fast as before, but may be a "tick" slower. Just
>>>> perception. I was getting pretty much 0.012s with everything reverted.
>>>> With synchronize_sched_expedited(), it seems to be 0.012s ~ 0.013s.
>>>> So, it's good.
>>>
>>> Excellent
>>>
>>>>> linux-next also has a re-write of the per-cpu rw sems, out of Andrews
>>>>> tree. It would be a good data point it you could test that, too.
>>>>
>>>> Tested. It's slower. 0.350s. But still faster than 0.500s without the 
>>>> patch.
>>>
>>> Makes sense, it's 2 synchronize_sched() instead of 3. So it doesn't fix
>>> the real issue, which is having to do synchronize_sched() in the first
>>> place.
>>>
>>>> # time mount /dev/sda1 /mnt; sync; sync; umount /mnt
>>>>
>>>>
>>>> So, here's the comparison ...
>>>>
>>>> 0.500s 3.7.0-rc7
>>>> 0.168s 3.7.0-rc2
>>>> 0.012s 3.6.0
>>>> 0.013s 3.7.0-rc7 + synchronize_sched_expedited()
>>>> 0.350s 3.7.0-rc7 + Oleg's patch.
>>>
>>> I wonder how many of them are due to changing to the same block size.
>>> Does the below patch make a difference?
>>
>> This patch is wrong because you must check if the device is mapped while
>> holding bdev->bd_block_size_semaphore (because
>> bdev->bd_block_size_semaphore prevents new mappings from being created)
>
> No it doesn't. If you read the patch, that was moved to i_mmap_mutex.
>
>> I'm sending another patch that has the same effect.
>>
>>
>> Note that ext[234] filesystems set blocksize to 1024 temporarily during
>> mount, so it doesn't help much (it only helps for other filesystems, such
>> as jfs). For ext[234], you have a device with default block size 4096, the
>> filesystem sets block size to 1024 during mount, reads the super block and
>> sets it back to 4096.
>
> That is true, hence I was hesitant to think it'll actually help. In any
> case, basically any block device will have at least one blocksize
> transitioned when being mounted for the first time. I wonder if we just
> shouldn't default to having a 4kb soft block size to avoid that one,
> though it is working around the issue to some degree.

I tested on reiserfs. It helped. 0.012s as in 3.6.0, but as Mikulas
mentioned, it didn't really improve much for ext2.

Jeff.
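
Mikulas's point about ext[234] can be made concrete by counting how many
block size requests during a mount actually change the size. A device at
4096 that is set to 1024 and then back to 4096 sees two real changes, so
skipping the expensive path for same-size requests saves nothing there,
while a filesystem that only ever asks for the current size sees none.
The function below is just a counting sketch with a made-up name; it does
not model the semaphore or any kernel code.

```c
#include <stddef.h>

/*
 * Count how many entries in `requested` differ from the block size in
 * effect at that moment, starting from `initial`. A request for the
 * size already in effect is treated as a no-op.
 */
size_t count_blocksize_changes(unsigned int initial,
                               const unsigned int *requested, size_t n)
{
    unsigned int cur = initial;
    size_t changes = 0;

    for (size_t i = 0; i < n; i++) {
        if (requested[i] != cur) {
            cur = requested[i];
            changes++;
        }
    }
    return changes;
}
```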


Re: [PATCH 1/2] percpu-rwsem: use synchronize_sched_expedited

2012-11-28 Thread Jeff Chua
On Wed, Nov 28, 2012 at 11:59 AM, Mikulas Patocka  wrote:
>
>
> On Tue, 27 Nov 2012, Jeff Chua wrote:
>
>> On Tue, Nov 27, 2012 at 3:38 PM, Jens Axboe  wrote:
>> > On 2012-11-27 06:57, Jeff Chua wrote:
>> >> On Sun, Nov 25, 2012 at 7:23 AM, Jeff Chua  
>> >> wrote:
>> >>> On Sun, Nov 25, 2012 at 5:09 AM, Mikulas Patocka  
>> >>> wrote:
>> >>>> So it's better to slow down mount.
>> >>>
>> >>> I am quite proud of the linux boot time pitting against other OS. Even
>> >>> with 10 partitions. Linux can boot up in just a few seconds, but now
>> >>> you're saying that we need to do this semaphore check at boot up. By
>> >>> doing so, it's inducing additional 4 seconds during boot up.
>> >>
>> >> By the way, I'm using a pretty fast SSD (Samsung PM830) and fast CPU
>> >> (2.8GHz). I wonder if those on slower hard disk or slower CPU, what
>> >> kind of degradation would this cause or just the same?
>> >
>> > It'd likely be the same slow down time wise, but as a percentage it
>> > would appear smaller on a slower disk.
>> >
>> > Could you please test Mikulas' suggestion of changing
>> > synchronize_sched() in include/linux/percpu-rwsem.h to
>> > synchronize_sched_expedited()?
>>
>> Tested. It seems as fast as before, but may be a "tick" slower. Just
>> perception. I was getting pretty much 0.012s with everything reverted.
>> With synchronize_sched_expedited(), it seems to be 0.012s ~ 0.013s.
>> So, it's good.
>>
>>
>> > linux-next also has a re-write of the per-cpu rw sems, out of Andrew's
>> > tree. It would be a good data point if you could test that, too.
>>
>> Tested. It's slower. 0.350s. But still faster than 0.500s without the patch.
>>
>> # time mount /dev/sda1 /mnt; sync; sync; umount /mnt
>>
>>
>> So, here's the comparison ...
>>
>> 0.500s 3.7.0-rc7
>> 0.168s 3.7.0-rc2
>> 0.012s 3.6.0
>> 0.013s 3.7.0-rc7 + synchronize_sched_expedited()
>> 0.350s 3.7.0-rc7 + Oleg's patch.
>>
>>
>> Thanks,
>> Jeff.
>
> OK, I'm sending two patches to reduce mount times. If it is possible to
> put them into 3.7.0, put them there.
>
> Mikulas
>
> ---
>
> percpu-rwsem: use synchronize_sched_expedited
>
> Use synchronize_sched_expedited() instead of synchronize_sched()
> to improve mount speed.
>
> This patch improves mount time from 0.500s to 0.013s.
>
> Note: if realtime people complain about the use of
> synchronize_sched_expedited() and synchronize_rcu_expedited(), I suggest
> that they introduce an option CONFIG_REALTIME or
> /proc/sys/kernel/realtime and turn off these *_expedited functions if
> the option is enabled (i.e. turn synchronize_sched_expedited into
> synchronize_sched and synchronize_rcu_expedited into synchronize_rcu).
>
> Signed-off-by: Mikulas Patocka 
>
> ---
>  include/linux/percpu-rwsem.h |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux-3.7-rc7/include/linux/percpu-rwsem.h
> ===
> --- linux-3.7-rc7.orig/include/linux/percpu-rwsem.h 2012-11-28 02:41:03.0 +0100
> +++ linux-3.7-rc7/include/linux/percpu-rwsem.h  2012-11-28 02:41:15.0 +0100
> @@ -13,7 +13,7 @@ struct percpu_rw_semaphore {
>  };
>
>  #define light_mb() barrier()
> -#define heavy_mb() synchronize_sched()
> +#define heavy_mb() synchronize_sched_expedited()
>
>  static inline void percpu_down_read(struct percpu_rw_semaphore *p)
>  {
> @@ -51,7 +51,7 @@ static inline void percpu_down_write(str
>  {
> mutex_lock(&p->mtx);
> p->locked = true;
> -   synchronize_sched(); /* make sure that all readers exit the rcu_read_lock_sched region */
> +   synchronize_sched_expedited(); /* make sure that all readers exit the rcu_read_lock_sched region */
> while (__percpu_count(p->counters))
> msleep(1);
> heavy_mb(); /* C, between read of p->counter and write to data, paired with B */


Mikulas,

Tested this one and this is good! Back to 3.6.0 behavior.

As for the 2nd patch (block_dev.c), it didn't really make any
difference for ext2/3/4, but for reiserfs it does. So, wouldn't the
synchronize_sched_expedited() patch alone be good enough?


Thanks,
Jeff


Re: [PATCH 2/2] block_dev: don't take the write lock if block size doesn't change

2012-11-28 Thread Jeff Chua
On Wed, Nov 28, 2012 at 12:01 PM, Mikulas Patocka  wrote:
> block_dev: don't take the write lock if block size doesn't change
>
> Taking the write lock has a big performance impact on the whole system
> (because of synchronize_sched_expedited). This patch avoids taking the
> write lock if the block size doesn't change (i.e. when mounting
> filesystem with block size equal to the default block size).
>
> The logic to test if the block device is mapped was moved to a separate
> function is_bdev_mapped to avoid code duplication.
>
> Signed-off-by: Mikulas Patocka 
>
> ---
>  fs/block_dev.c |   25 ++---
>  1 file changed, 18 insertions(+), 7 deletions(-)
>
> Index: linux-3.7-rc7/fs/block_dev.c
> ===
> --- linux-3.7-rc7.orig/fs/block_dev.c   2012-11-28 04:09:01.0 +0100
> +++ linux-3.7-rc7/fs/block_dev.c   2012-11-28 04:13:53.0 +0100
> @@ -114,10 +114,18 @@ void invalidate_bdev(struct block_device
>  }
>  EXPORT_SYMBOL(invalidate_bdev);
>
> -int set_blocksize(struct block_device *bdev, int size)
> +static int is_bdev_mapped(struct block_device *bdev)
>  {
> -   struct address_space *mapping;
> +   int ret_val;
> +   struct address_space *mapping = bdev->bd_inode->i_mapping;
> +   mutex_lock(&mapping->i_mmap_mutex);
> +   ret_val = mapping_mapped(mapping);
> +   mutex_unlock(&mapping->i_mmap_mutex);
> +   return ret_val;
> +}
>
> +int set_blocksize(struct block_device *bdev, int size)
> +{
> /* Size must be a power of two, and between 512 and PAGE_SIZE */
> if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size))
> return -EINVAL;
> @@ -126,18 +134,21 @@ int set_blocksize(struct block_device *b
> if (size < bdev_logical_block_size(bdev))
> return -EINVAL;
>
> +   /*
> +* If the block size doesn't change, don't take the write lock.
> +* We check for is_bdev_mapped anyway, for consistent behavior.
> +*/
> +   if (size == bdev->bd_block_size)
> +   return is_bdev_mapped(bdev) ? -EBUSY : 0;
> +
> /* Prevent starting I/O or mapping the device */
> percpu_down_write(&bdev->bd_block_size_semaphore);
>
> /* Check that the block device is not memory mapped */
> -   mapping = bdev->bd_inode->i_mapping;
> -   mutex_lock(&mapping->i_mmap_mutex);
> -   if (mapping_mapped(mapping)) {
> -   mutex_unlock(&mapping->i_mmap_mutex);
> +   if (is_bdev_mapped(bdev)) {
> percpu_up_write(&bdev->bd_block_size_semaphore);
> return -EBUSY;
> }
> -   mutex_unlock(&mapping->i_mmap_mutex);
>
> /* Don't change the size if it is same as current */
> if (bdev->bd_block_size != size) {


This patch didn't really make any difference for ext2/3/4 but for
reiserfs it does.

With the synchronize_sched_expedited() patch applied, it didn't make
any difference.


Thanks,
Jeff


[PATCH] vfs: remove DCACHE_NEED_LOOKUP

2012-11-28 Thread Jeff Layton
The code that relied on that flag was ripped out of btrfs quite some
time ago, and never added back. Josef indicated that he was going to
take a different approach to the problem in btrfs, and that we
could just eliminate this flag.

Cc: Josef Bacik 
Signed-off-by: Jeff Layton 
---
 fs/btrfs/inode.c   | 16 +---
 fs/dcache.c| 33 +
 fs/namei.c | 11 +--
 include/linux/dcache.h |  8 
 4 files changed, 3 insertions(+), 65 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 95542a1..0e5ca81 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4219,16 +4219,7 @@ struct inode *btrfs_lookup_dentry(struct inode *dir, struct dentry *dentry)
if (dentry->d_name.len > BTRFS_NAME_LEN)
return ERR_PTR(-ENAMETOOLONG);
 
-   if (unlikely(d_need_lookup(dentry))) {
-   memcpy(&location, dentry->d_fsdata, sizeof(struct btrfs_key));
-   kfree(dentry->d_fsdata);
-   dentry->d_fsdata = NULL;
-   /* This thing is hashed, drop it for now */
-   d_drop(dentry);
-   } else {
-   ret = btrfs_inode_by_name(dir, dentry, &location);
-   }
-
+   ret = btrfs_inode_by_name(dir, dentry, &location);
if (ret < 0)
return ERR_PTR(ret);
 
@@ -4298,11 +4289,6 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
struct dentry *ret;
 
ret = d_splice_alias(btrfs_lookup_dentry(dir, dentry), dentry);
-   if (unlikely(d_need_lookup(dentry))) {
-   spin_lock(&dentry->d_lock);
-   dentry->d_flags &= ~DCACHE_NEED_LOOKUP;
-   spin_unlock(&dentry->d_lock);
-   }
return ret;
 }
 
diff --git a/fs/dcache.c b/fs/dcache.c
index 3a463d0..1782be3 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -455,24 +455,6 @@ void d_drop(struct dentry *dentry)
 EXPORT_SYMBOL(d_drop);
 
 /*
- * d_clear_need_lookup - drop a dentry from cache and clear the need lookup flag
- * @dentry: dentry to drop
- *
- * This is called when we do a lookup on a placeholder dentry that needed to be
- * looked up.  The dentry should have been hashed in order for it to be found by
- * the lookup code, but now needs to be unhashed while we do the actual lookup
- * and clear the DCACHE_NEED_LOOKUP flag.
- */
-void d_clear_need_lookup(struct dentry *dentry)
-{
-   spin_lock(&dentry->d_lock);
-   __d_drop(dentry);
-   dentry->d_flags &= ~DCACHE_NEED_LOOKUP;
-   spin_unlock(&dentry->d_lock);
-}
-EXPORT_SYMBOL(d_clear_need_lookup);
-
-/*
  * Finish off a dentry we've decided to kill.
  * dentry->d_lock must be held, returns with it unlocked.
  * If ref is non-zero, then decrement the refcount too.
@@ -565,13 +547,7 @@ repeat:
if (d_unhashed(dentry))
goto kill_it;
 
-   /*
-* If this dentry needs lookup, don't set the referenced flag so that it
-* is more likely to be cleaned up by the dcache shrinker in case of
-* memory pressure.
-*/
-   if (!d_need_lookup(dentry))
-   dentry->d_flags |= DCACHE_REFERENCED;
+   dentry->d_flags |= DCACHE_REFERENCED;
dentry_lru_add(dentry);
 
dentry->d_count--;
@@ -1737,13 +1713,6 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
}
 
/*
-* We are going to instantiate this dentry, unhash it and clear the
-* lookup flag so we can do that.
-*/
-   if (unlikely(d_need_lookup(found)))
-   d_clear_need_lookup(found);
-
-   /*
 * Negative dentry: instantiate it unless the inode is a directory and
 * already has a dentry.
 */
diff --git a/fs/namei.c b/fs/namei.c
index 937f9d5..9738f97 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1275,9 +1275,7 @@ static struct dentry *lookup_dcache(struct qstr *name, struct dentry *dir,
*need_lookup = false;
dentry = d_lookup(dir, name);
if (dentry) {
-   if (d_need_lookup(dentry)) {
-   *need_lookup = true;
-   } else if (dentry->d_flags & DCACHE_OP_REVALIDATE) {
+   if (dentry->d_flags & DCACHE_OP_REVALIDATE) {
error = d_revalidate(dentry, flags);
if (unlikely(error <= 0)) {
if (error < 0) {
@@ -1383,8 +1381,6 @@ static int lookup_fast(struct nameidata *nd, struct qstr *name,
return -ECHILD;
nd->seq = seq;
 
-   if (unlikely(d_need_lookup(dentry)))
-   goto unlazy;
if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE)) {
status = d_revalidate(dentry, nd->flags);
if (unlike

Re: [PATCH 02/16] ata: Convert dev_printk(KERN_<LEVEL> to dev_<level>(

2012-11-28 Thread Jeff Garzik

On 10/28/2012 04:05 AM, Joe Perches wrote:

dev_<level> calls take less code than dev_printk(KERN_<LEVEL>
and reducing object size is good.
Coalesce formats for easier grep.

Signed-off-by: Joe Perches 
---
  drivers/ata/pata_cmd64x.c |6 +++---
  1 files changed, 3 insertions(+), 3 deletions(-)


applied





Re: [PATCH] ata_piix: reenable MS Virtual PC guests

2012-11-28 Thread Jeff Garzik

On 09/18/2012 11:48 AM, Olaf Hering wrote:

An earlier commit cd006086fa5d91414d8ff9ff2b78fbb593878e3c ("ata_piix:
defer disks to the Hyper-V drivers by default") broke MS Virtual PC
guests. Hyper-V guests and Virtual PC guests have nearly identical DMI
info. As a result the driver does currently ignore the emulated hardware
in Virtual PC guests and defers the handling to hv_blkvsc. Since Virtual
PC does not offer paravirtualized drivers no disks will be found in the
guest.

One difference in the DMI info is the product version. This patch adds a
match for MS Virtual PC 2007 and "unignores" the emulated hardware.

This was reported for openSuSE 12.1 in bugzilla:
https://bugzilla.novell.com/show_bug.cgi?id=737532

Here is a detailed list of DMI info from example guests:

hwinfo --bios:

virtual pc guest:

   System Info: #1
 Manufacturer: "Microsoft Corporation"
 Product: "Virtual Machine"
 Version: "VS2005R2"
 Serial: "3178-9905-1533-4840-9282-0569-59"
 UUID: undefined, but settable
 Wake-up: 0x06 (Power Switch)
   Board Info: #2
 Manufacturer: "Microsoft Corporation"
 Product: "Virtual Machine"
 Version: "5.0"
 Serial: "3178-9905-1533-4840-9282-0569-59"
   Chassis Info: #3
 Manufacturer: "Microsoft Corporation"
 Version: "5.0"
 Serial: "3178-9905-1533-4840-9282-0569-59"
 Asset Tag: "7188-3705-6309-9738-9645-0364-00"
 Type: 0x03 (Desktop)
 Bootup State: 0x03 (Safe)
 Power Supply State: 0x03 (Safe)
 Thermal State: 0x01 (Other)
 Security Status: 0x01 (Other)

win2k8 guest:

   System Info: #1
 Manufacturer: "Microsoft Corporation"
 Product: "Virtual Machine"
 Version: "7.0"
 Serial: "9106-3420-9819-5495-1514-2075-48"
 UUID: undefined, but settable
 Wake-up: 0x06 (Power Switch)
   Board Info: #2
 Manufacturer: "Microsoft Corporation"
 Product: "Virtual Machine"
 Version: "7.0"
 Serial: "9106-3420-9819-5495-1514-2075-48"
   Chassis Info: #3
 Manufacturer: "Microsoft Corporation"
 Version: "7.0"
 Serial: "9106-3420-9819-5495-1514-2075-48"
 Asset Tag: "7076-9522-6699-1042-9501-1785-77"
 Type: 0x03 (Desktop)
 Bootup State: 0x03 (Safe)
 Power Supply State: 0x03 (Safe)
 Thermal State: 0x01 (Other)
 Security Status: 0x01 (Other)

win2k12 guest:

   System Info: #1
 Manufacturer: "Microsoft Corporation"
 Product: "Virtual Machine"
 Version: "7.0"
 Serial: "8179-1954-0187-0085-3868-2270-14"
 UUID: undefined, but settable
 Wake-up: 0x06 (Power Switch)
   Board Info: #2
 Manufacturer: "Microsoft Corporation"
 Product: "Virtual Machine"
 Version: "7.0"
 Serial: "8179-1954-0187-0085-3868-2270-14"
   Chassis Info: #3
 Manufacturer: "Microsoft Corporation"
 Version: "7.0"
 Serial: "8179-1954-0187-0085-3868-2270-14"
 Asset Tag: "8374-0485-4557-6331-0620-5845-25"
 Type: 0x03 (Desktop)
 Bootup State: 0x03 (Safe)
 Power Supply State: 0x03 (Safe)
 Thermal State: 0x01 (Other)
 Security Status: 0x01 (Other)

Signed-off-by: Olaf Hering 


applied.  Apologies for missing this one.  It was accidentally shifted
into the low-priority pile.
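
The distinction the patch description relies on (same manufacturer and product, different product version) can be sketched as a small userspace C check. This is a simplified illustration, not the driver's code: `struct sketch_dmi` and both function names are hypothetical, exact string comparison is an assumption of the sketch, and the real driver uses the kernel's DMI match tables, which are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified view of the DMI "System Info" fields. */
struct sketch_dmi {
    const char *manufacturer;
    const char *product;
    const char *version;
};

/* True for anything that identifies itself as a Microsoft virtual machine. */
static bool sketch_looks_like_ms_vm(const struct sketch_dmi *d)
{
    return strcmp(d->manufacturer, "Microsoft Corporation") == 0 &&
           strcmp(d->product, "Virtual Machine") == 0;
}

/* True when the emulated IDE disks should be left to the Hyper-V driver. */
static bool sketch_defer_disks_to_hyperv(const struct sketch_dmi *d)
{
    assert(d && d->manufacturer && d->product && d->version);

    if (!sketch_looks_like_ms_vm(d))
        return false;
    /* The Virtual PC guest above reports "VS2005R2" and offers no
     * paravirtualized disks, so the emulated hardware must be used. */
    if (strcmp(d->version, "VS2005R2") == 0)
        return false;
    return true;
}
```

Fed the three System Info blocks listed above, the Virtual PC guest is not deferred, while the win2k8 and win2k12 guests (both version "7.0") still are.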






[patch] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-11-28 Thread Jeff Moyer
Hi,

In realtime environments, it may be desirable to keep the per-bdi
flusher threads from running on certain cpus.  This patch adds a
cpu_list file to /sys/class/bdi/* to enable this.  The default is to tie
the flusher threads to the same numa node as the backing device (though
I could be convinced to make it a mask of all cpus to avoid a change in
behaviour).

Comments, as always, are appreciated.

Cheers,
Jeff
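
For readers unfamiliar with the list format that a cpu_list file would accept, here is a rough userspace sketch of parsing a string such as "0-3,8" into a bitmask. It is not the kernel's cpulist_parse(), which the patch calls; `sketch_cpulist_parse()` is a made-up name, the 64-CPU limit is an assumption to keep the mask in one integer, and rejecting an empty list is a choice of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a list such as "0-3,8" into a 64-bit mask.
 * Returns 0 on success or -EINVAL on malformed input. */
static int sketch_cpulist_parse(const char *buf, uint64_t *mask)
{
    assert(buf != NULL && mask != NULL);

    uint64_t result = 0;
    const char *p = buf;

    /* This sketch rejects an empty list. */
    if (*p == '\0' || *p == '\n')
        return -EINVAL;

    while (*p != '\0' && *p != '\n') {
        char *end;
        unsigned long first, last;

        if (*p < '0' || *p > '9')
            return -EINVAL;
        first = strtoul(p, &end, 10);
        last = first;
        p = end;

        if (*p == '-') {                 /* a range "first-last" */
            p++;
            if (*p < '0' || *p > '9')
                return -EINVAL;
            last = strtoul(p, &end, 10);
            p = end;
        }
        if (first > last || last > 63)
            return -EINVAL;

        for (unsigned long cpu = first; cpu <= last; cpu++)
            result |= UINT64_C(1) << cpu;

        if (*p == ',') {
            p++;
            if (*p == '\0' || *p == '\n')
                return -EINVAL;          /* trailing comma */
        } else if (*p != '\0' && *p != '\n') {
            return -EINVAL;
        }
    }

    *mask = result;
    return 0;
}
```

A trailing newline is tolerated because a write from a shell redirection usually carries one.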

Signed-off-by: Jeff Moyer 

diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 2a9a9ab..68263e0 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct page;
 struct device;
@@ -105,6 +106,9 @@ struct backing_dev_info {
 
struct timer_list laptop_mode_wb_timer;
 
+   cpumask_t *flusher_cpumask; /* used for writeback thread scheduling */
+   struct mutex flusher_cpumask_mutex;
+
 #ifdef CONFIG_DEBUG_FS
struct dentry *debug_dir;
struct dentry *debug_stats;
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index d3ca2b3..c4f7dde 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0);
@@ -221,12 +222,59 @@ static ssize_t max_ratio_store(struct device *dev,
 }
 BDI_SHOW(max_ratio, bdi->max_ratio)
 
+static ssize_t cpu_list_store(struct device *dev,
+   struct device_attribute *attr, const char *buf, size_t count)
+{
+   struct backing_dev_info *bdi = dev_get_drvdata(dev);
+   struct bdi_writeback *wb = &bdi->wb;
+   cpumask_var_t newmask;
+   ssize_t ret;
+   struct task_struct *task;
+
+   if (!alloc_cpumask_var(&newmask, GFP_KERNEL))
+   return -ENOMEM;
+
+   ret = cpulist_parse(buf, newmask);
+   if (!ret) {
+   spin_lock(&bdi->wb_lock);
+   task = wb->task;
+   get_task_struct(task);
+   spin_unlock(&bdi->wb_lock);
+   if (task)
+   ret = set_cpus_allowed_ptr(task, newmask);
+   put_task_struct(task);
+   if (ret == 0) {
+   mutex_lock(&bdi->flusher_cpumask_mutex);
+   cpumask_copy(bdi->flusher_cpumask, newmask);
+   mutex_unlock(&bdi->flusher_cpumask_mutex);
+   ret = count;
+   }
+   }
+   free_cpumask_var(newmask);
+
+   return ret;
+}
+
+static ssize_t cpu_list_show(struct device *dev,
+   struct device_attribute *attr, char *page)
+{
+   struct backing_dev_info *bdi = dev_get_drvdata(dev);
+   ssize_t ret;
+
+   mutex_lock(&bdi->flusher_cpumask_mutex);
+   ret = cpulist_scnprintf(page, PAGE_SIZE-1, bdi->flusher_cpumask);
+   mutex_unlock(&bdi->flusher_cpumask_mutex);
+
+   return ret;
+}
+
 #define __ATTR_RW(attr) __ATTR(attr, 0644, attr##_show, attr##_store)
 
 static struct device_attribute bdi_dev_attrs[] = {
__ATTR_RW(read_ahead_kb),
__ATTR_RW(min_ratio),
__ATTR_RW(max_ratio),
+   __ATTR_RW(cpu_list),
__ATTR_NULL,
 };
 
@@ -428,6 +476,7 @@ static int bdi_forker_thread(void *ptr)
writeback_inodes_wb(&bdi->wb, 1024,
WB_REASON_FORKER_THREAD);
} else {
+   int ret;
/*
 * The spinlock makes sure we do not lose
 * wake-ups when racing with 'bdi_queue_work()'.
@@ -437,6 +486,14 @@ static int bdi_forker_thread(void *ptr)
spin_lock_bh(&bdi->wb_lock);
bdi->wb.task = task;
spin_unlock_bh(&bdi->wb_lock);
+   mutex_lock(&bdi->flusher_cpumask_mutex);
+   ret = set_cpus_allowed_ptr(task,
+   bdi->flusher_cpumask);
+   mutex_unlock(&bdi->flusher_cpumask_mutex);
+   if (ret)
+   printk_once("%s: failed to bind flusher"
+   " thread %s, error %d\n",
+   __func__, task->comm, ret);
wake_up_process(task);
}
bdi_clear_pending(bdi);
@@ -509,6 +566,17 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
dev_name(dev));
if (IS_ERR(wb->task))
return PTR_ERR(wb-

Re: [PATCH] tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)

2012-11-28 Thread Jeff Liu
On 11/29/2012 12:15 PM, Jim Meyering wrote:
> Hugh Dickins wrote:
>> On Thu, 29 Nov 2012, Jaegeuk Hanse wrote:
> ...
>>> But this time in which scenario will use it?
>>
>> I was not very convinced by the grep argument from Jim and Paul:
>> that seemed to be grep holding on to a no-arbitrary-limits dogma,
>> at the expense of its users, causing an absurd line-length issue,
>> which use of SEEK_DATA happens to avoid in some cases.
>>
>> The cp of sparse files from Jeff and Dave was more convincing;
>> but I still didn't see why little old tmpfs needed to be ahead
>> of the pack.
>>
>> But at LinuxCon/Plumbers in San Diego in August, a more convincing
>> case was made: I was hoping you would not ask, because I did not take
>> notes, and cannot pass on the details - was it rpm building on tmpfs?
>> I was convinced enough to promise support on tmpfs when support on
>> ext4 goes in.
> 
> Re the cp-vs-sparse-file case, the current FIEMAP-based code in GNU
> cp is ugly and complicated enough that until recently it harbored a
> hard-to-reproduce data-corrupting bug[*].  Now that SEEK_DATA/SEEK_HOLE
> support will also work for tmpfs and ext4, we can plan to remove
> the FIEMAP-based code in favor of a simpler SEEK_DATA/SEEK_HOLE-based
> implementation.
How do we teach du(1) to be aware of the real disk footprint with Btrfs
clone or OCFS2 reflinked files if we remove the FIEMAP-based code?

How about if we still keep it there, and introduce SEEK_DATA/SEEK_HOLE
code to the extent-scan module, which is dedicated to dealing with sparse files?

Thanks,
-Jeff
> 
> With the rise of virtualization, copying sparse images efficiently
> (probably searching, too) is becoming more and more important.
> 
> So, yes, GNU cp will soon use this feature.
> 
> [*] https://plus.google.com/u/0/114228401647637059102/posts/FDV3JEaYsKD
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



Re: [PATCH] tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)

2012-11-28 Thread Jeff Liu
On 11/29/2012 02:53 PM, Jim Meyering wrote:
> Jeff Liu wrote:
> 
>> On 11/29/2012 12:15 PM, Jim Meyering wrote:
>>> Hugh Dickins wrote:
>>>> On Thu, 29 Nov 2012, Jaegeuk Hanse wrote:
>>> ...
>>>>> But this time in which scenario will use it?
>>>>
>>>> I was not very convinced by the grep argument from Jim and Paul:
>>>> that seemed to be grep holding on to a no-arbitrary-limits dogma,
>>>> at the expense of its users, causing an absurd line-length issue,
>>>> which use of SEEK_DATA happens to avoid in some cases.
>>>>
>>>> The cp of sparse files from Jeff and Dave was more convincing;
>>>> but I still didn't see why little old tmpfs needed to be ahead
>>>> of the pack.
>>>>
>>>> But at LinuxCon/Plumbers in San Diego in August, a more convincing
>>>> case was made: I was hoping you would not ask, because I did not take
>>>> notes, and cannot pass on the details - was it rpm building on tmpfs?
>>>> I was convinced enough to promise support on tmpfs when support on
>>>> ext4 goes in.
>>>
>>> Re the cp-vs-sparse-file case, the current FIEMAP-based code in GNU
>>> cp is ugly and complicated enough that until recently it harbored a
>>> hard-to-reproduce data-corrupting bug[*].  Now that SEEK_DATA/SEEK_HOLE
>>> support will also work for tmpfs and ext4, we can plan to remove
>>> the FIEMAP-based code in favor of a simpler SEEK_DATA/SEEK_HOLE-based
>>> implementation.
>> How do we teach du(1) to be aware of the real disk footprint with Btrfs
>> clone or OCFS2 reflinked files if we remove the FIEMAP-based code?
>>
>> How about if we still keep it there, and introduce SEEK_DATA/SEEK_HOLE
>> code to the extent-scan module, which is dedicated to dealing with sparse files?
> 
> Hi Jeff,
> By "removing the FIEMAP-based code" I mean the uses in copy.c.
> All of that should remain independent of how du does its job,
> so if FIEMAP is required for your planned du enhancement,
> then feel free to use it.
Hi Jim,
Thanks for the clarification, that's fine. :)

Regards,
-Jeff
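
As background for the SEEK_DATA/SEEK_HOLE approach discussed in this thread, the loop below is a minimal userspace sketch of walking a file's data regions with lseek(2). It is not the GNU cp implementation; the function names are hypothetical, and the fallback values 3 and 4 are the Linux values, used only if the headers do not expose the macros.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3   /* Linux values, used only if the headers hide them */
#define SEEK_HOLE 4
#endif

/* Sum the bytes of fd that lie in data regions, skipping holes.
 * Returns the byte count, or -1 with errno set on failure. */
static off_t sketch_count_data_bytes(int fd)
{
    off_t total = 0;
    off_t pos = 0;

    assert(fd >= 0);
    for (;;) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO)      /* no data at or after pos */
                return total;
            return -1;
        }
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        if (hole <= data) {          /* defensive: avoid looping forever */
            errno = EIO;
            return -1;
        }
        total += hole - data;        /* a copy tool would copy [data, hole) */
        pos = hole;
    }
}

/* Convenience wrapper that opens the file by path. */
static off_t sketch_count_data_bytes_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    off_t n;

    if (fd < 0)
        return -1;
    n = sketch_count_data_bytes(fd);
    close(fd);
    return n;
}
```

On a filesystem without real hole tracking the whole file is typically reported as one data region, so the loop still terminates and simply counts the file size. Note that this only locates data and holes in a single file; it says nothing about blocks shared between reflinked files, which is the du(1) concern raised above.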



Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)

2012-11-29 Thread Jeff Chua
On Thu, Nov 29, 2012 at 2:45 PM, Al Viro  wrote:
> On Wed, Nov 28, 2012 at 10:37:27PM -0800, Linus Torvalds wrote:
>> On Wed, Nov 28, 2012 at 10:30 PM, Al Viro  wrote:
>> >
>> > Note that sync_blockdev() a few lines prior to that is good only if we
>> > have no other processes doing write(2) (or dirtying the mmapped pages,
>> > for that matter).  The window isn't too wide, but...
>>
>> So with Mikulas' patches, the write actually would block (at write
>> level) due to the locking. The mmap'ed patches may be around and
>> flushed, but the logic to not allow currently *active* mmaps (with the
>> rather nasty random -EBUSY return value) should mean that there is no
>> race.
>>
>> Or rather, there's a race, but it results in that EBUSY thing.
>
> Same as with fs mounted on it, or the sucker having been claimed for
> RAID array, etc.  Frankly, I'm more than slightly tempted to make
> bdev mmap() just claim the sodding device exclusive for as long as
> it's mmapped...
>
> In principle, I agree, but...  I still have nightmares from mmap/truncate
> races way back.  You are stepping into what used to be a really nasty
> minefield.  I'll look into that, but it's *definitely* not -rc8 fodder.

Just let me know which relevant patch(es) you want me to test or break.

Thanks,
Jeff


Re: [RFQ PATCH] cifs: Change default security error message

2012-11-29 Thread Jeff Layton
On Thu, 29 Nov 2012 18:30:53 +0100
Jesper Nilsson  wrote:

> Hi!
> 
> Connecting with a default security mechanism prompts a KERN_ERR
> output warning to the user that the default mechanism will be changed
> in Linux 3.3.
> 
> We're now at 3.7, so we either could remove the warning completely
> (if the default has been changed), or we could bump the number to
> what our current target for the change is.
> 
> 
> The below patch changes the cERROR (which turns into a printk with KERN_ERR)
> into a straight printk with KERN_WARNING and changes the text to indicate
> that it was changed in 3.3.
> 
> I expect that the patch is incorrect and that we should choose
> another of the alternative solutions above, but I'd like to get
> some input on this.
> 
> Not-Signed-off-by: Jesper Nilsson 
> ---
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index c83f5b65..968456f 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -2480,9 +2480,9 @@ cifs_get_smb_ses(struct TCP_Server_Info *server, struct smb_vol *volume_info)
>   supported for many years, time to update default security mechanism */
>   if ((volume_info->secFlg == 0) && warned_on_ntlm == false) {
>   warned_on_ntlm = true;
> - cERROR(1, "default security mechanism requested.  The default "
> - "security mechanism will be upgraded from ntlm to "
> - "ntlmv2 in kernel release 3.3");
> + printk(KERN_WARNING "default security mechanism requested.  "
> + "The default security mechanism was changed "
> + " from ntlm to ntlmv2 in kernel release 3.3");
>   }
>   ses->overrideSecFlg = volume_info->secFlg;
>  
> 
> 
> /^JN - Jesper Nilsson

I think this warning has lived long enough and needs to go away. Steve
supposedly has a patch that finally makes this change, but it hasn't
been sent to the list yet... Steve?

-- 
Jeff Layton 


Re: [RFQ PATCH] cifs: Change default security error message

2012-11-29 Thread Jeff Layton
On Thu, 29 Nov 2012 12:54:41 -0600
Steve French  wrote:

> On Thu, Nov 29, 2012 at 12:25 PM, Jeff Layton  wrote:
> > On Thu, 29 Nov 2012 18:30:53 +0100
> > Jesper Nilsson  wrote:
> >
> >> Hi!
> >>
> >> Connecting with a default security mechanism prompts a KERN_ERR
> >> output warning to the user that the default mechanism will be changed
> >> in Linux 3.3.
> >>
> >> We're now at 3.7, so we either could remove the warning completely
> >> (if the default has been changed), or we could bump the number to
> >> what our current target for the change is.
> >>
> >>
> >> The below patch changes the cERROR (which turns into a printk with KERN_ERR)
> >> into a straight printk with KERN_WARNING and changes the text to indicate
> >> that it was changed in 3.3.
> >>
> >> I expect that the patch is incorrect and that we should choose
> >> another of the alternative solutions above, but I'd like to get
> >> some input on this.
> >>
> >> Not-Signed-off-by: Jesper Nilsson 
> >> ---
> >> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> >> index c83f5b65..968456f 100644
> >> --- a/fs/cifs/connect.c
> >> +++ b/fs/cifs/connect.c
> >> @@ -2480,9 +2480,9 @@ cifs_get_smb_ses(struct TCP_Server_Info *server, struct smb_vol *volume_info)
> >>   supported for many years, time to update default security mechanism */
> >>   if ((volume_info->secFlg == 0) && warned_on_ntlm == false) {
> >>   warned_on_ntlm = true;
> >> - cERROR(1, "default security mechanism requested.  The default "
> >> - "security mechanism will be upgraded from ntlm to "
> >> - "ntlmv2 in kernel release 3.3");
> >> + printk(KERN_WARNING "default security mechanism requested.  "
> >> + "The default security mechanism was changed "
> >> + " from ntlm to ntlmv2 in kernel release 3.3");
> >>   }
> >>   ses->overrideSecFlg = volume_info->secFlg;
> >>
> >>
> >>
> >> /^JN - Jesper Nilsson
> >
> > I think this warning has lived long enough and needs to go away. Steve
> > supposedly has a patch that finally makes this change, but it hasn't
> > been sent to the list yet... Steve?
> 
> It was posted to list on November 25th (and you even included it in
> your git tree on samba.org ?!)
> 

Oops, my mistake. You're quite correct...

-- 
Jeff Layton 


Re: [PATCH 2/2] fs/aio.c: use get_user_pages_non_movable() to pin ring pages when support memory hotremove

2013-02-04 Thread Jeff Moyer
Lin Feng  writes:

> This patch works around the bug where aio ring pages can't be migrated
> because they are pinned by get_user_pages(), by using the new function
> instead. It only takes effect when CONFIG_MEMORY_HOTREMOVE is configured;
> otherwise the old get_user_pages() is used.
>
> Cc: Benjamin LaHaise 
> Cc: Alexander Viro 
> Cc: Andrew Morton 
> Reviewed-by: Tang Chen 
> Reviewed-by: Gu Zheng 
> Signed-off-by: Lin Feng 
> ---
>  fs/aio.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 71f613c..0e9b30a 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -138,9 +138,15 @@ static int aio_setup_ring(struct kioctx *ctx)
>   }
>  
>   dprintk("mmap address: 0x%08lx\n", info->mmap_base);
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> + info->nr_pages = get_user_pages_non_movable(current, ctx->mm,
> + info->mmap_base, nr_pages,
> + 1, 0, info->ring_pages, NULL);
> +#else
>   info->nr_pages = get_user_pages(current, ctx->mm,
>   info->mmap_base, nr_pages, 
>   1, 0, info->ring_pages, NULL);
> +#endif

Can't you hide this in your 1/2 patch, by providing this function as
just a static inline wrapper around get_user_pages when
CONFIG_MEMORY_HOTREMOVE is not enabled?

Cheers,
Jeff
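
The wrapper pattern suggested above can be sketched in plain C. Everything here is hypothetical and only mirrors the shape of the idea: `sketch_get_pages()` stands in for get_user_pages(), `SKETCH_MEMORY_HOTREMOVE` stands in for the config option, and the simplified signatures are assumptions, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for get_user_pages(): "pins" nr pages and
 * returns how many were pinned. */
static inline long sketch_get_pages(void **pages, long nr)
{
    assert(pages != NULL || nr == 0);
    for (long i = 0; i < nr; i++)
        pages[i] = NULL;             /* placeholder for a page pointer */
    return nr;
}

#ifdef SKETCH_MEMORY_HOTREMOVE
/* Feature on: a real implementation would live elsewhere and would move
 * the pages out of movable memory before pinning them. */
long sketch_get_pages_non_movable(void **pages, long nr);
#else
/* Feature off: the name still exists but only forwards to the plain call. */
static inline long sketch_get_pages_non_movable(void **pages, long nr)
{
    return sketch_get_pages(pages, nr);
}
#endif

/* The caller needs no #ifdef at all. */
static long sketch_setup_ring(void **ring_pages, long nr_pages)
{
    return sketch_get_pages_non_movable(ring_pages, nr_pages);
}
```

The design point is that the conditional compilation lives in one header, so call sites such as aio_setup_ring() stay free of #ifdef blocks.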


Re: [PATCH 2/4] SUNRPC: remove cache_detail->cache_upcall callback

2013-02-04 Thread Jeff Layton
s/dns_resolve.o
fs/nfs/dns_resolve.c: In function ‘nfs_dns_resolver_cache_init’:
fs/nfs/dns_resolve.c:377:4: error: ‘struct cache_detail’ has no member named ‘cache_upcall’
fs/nfs/dns_resolve.c:377:35: warning: left-hand operand of comma expression has no effect [-Wunused-value]
fs/nfs/dns_resolve.c:377:35: warning: value computed is not used [-Wunused-value]
fs/nfs/dns_resolve.c:377:35: warning: value computed is not used [-Wunused-value]
fs/nfs/dns_resolve.c:377:35: warning: value computed is not used [-Wunused-value]
fs/nfs/dns_resolve.c:377:35: warning: value computed is not used [-Wunused-value]
fs/nfs/dns_resolve.c:377:35: warning: value computed is not used [-Wunused-value]
fs/nfs/dns_resolve.c:377:35: warning: value computed is not used [-Wunused-value]
fs/nfs/dns_resolve.c: At top level:
fs/nfs/dns_resolve.c:129:13: warning: ‘nfs_dns_request’ defined but not used [-Wunused-function]
make[1]: *** [fs/nfs/dns_resolve.o] Error 1
make: *** [_module_fs/nfs] Error 2

...looks like you need to convert that cache_detail to use your new
scheme as well?

-- 
Jeff Layton 


Re: [RFC] setxattr bugs

2013-02-04 Thread Jeff Mahoney
On 2/2/13 11:30 PM, Al Viro wrote:
> * JFS, since 2005: setxattr(name, "system.posix_acl_access", NULL,
>   0, 0) succeeds, creating an empty EA with "system.posix_acl_access"
>   as name. Validity checks should apply _after_
>       if (value == NULL) {
>               /* empty EA, do not remove */
>               value = "";
>               value_len = 0;
>       }
>   and not before it.
> * reiserfs, since 2009: setxattr(name, attr_name, NULL, 0, 0) is
>   treated as removexattr(name, attr_name), not as emptying given
>   xattr.
> 
> The question is, does either of those cross into "established 
> weirdness in ABI" or are they still at the "bugs to be fixed"
> stage?

Since the behavior changed once already in 2009 I'd call it a bug.
That code was in the SLES kernel for a while before then and I still
haven't seen a bug report on it.

- -Jeff

> FWIW, I'm seriously tempted to stop passing NULL as the third
> argument of ->setxattr(), essentially taking all those if (!value)
> value = ""; pieces from individual ->setxattr() instances to
> __vfs_setxattr_noperm() (all other callers of ->setxattr() never 
> pass NULL data or 0 size, so it's irrelevant for them).  Would fix 
> both jfs and reiserfs weirdness
> 
> Objections?
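
A small user-space sketch of the hoisting proposed in the quoted text: the common caller normalizes a NULL value once, so that individual ->setxattr() instances never see NULL. Both function names below are hypothetical stand-ins (vfs_setxattr_sketch() plays the role of __vfs_setxattr_noperm()), and the argument lists are trimmed for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-filesystem hook. After the change it may assume that
 * value is never NULL. It returns the length it was asked to store. */
static int fs_setxattr(const char *name, const void *value, size_t size)
{
	assert(name != NULL);
	assert(value != NULL);
	return (int)size;
}

/* Stand-in for the common caller: do the NULL check in one place. */
static int vfs_setxattr_sketch(const char *name, const void *value,
			       size_t size)
{
	if (value == NULL) {
		/* empty EA, do not remove */
		value = "";
		size = 0;
	}
	return fs_setxattr(name, value, size);
}
```

With the check done once in the caller, a filesystem can no longer run its validity checks before the normalization or mistake a NULL value for a removal request.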




- -- 
Jeff Mahoney
SUSE Labs


Re: [RFC][PATCH] Entropy generator with 100 kB/s throughput

2013-02-09 Thread Jeff Epler
On Sat, Feb 09, 2013 at 01:06:29PM -0500, Theodore Ts'o wrote:
> For that reason, what I would suggest doing first is generate a
> series of outputs of jitterentropy_get_nstime() followed by
> schedule().  Look and see if there is any pattern.  That's the problem
> with the FIPS 140-2 tests.  Passing those tests is necessary, but
> *NOT* sufficient to prove that you have a good cryptographic
> generator.  Even the tiniest amount of post-processing, even if it
> isn't cryptographic, can cause an utterly predictable series of
> numbers to pass the FIPS 140-2 tests.

In fact, Stephan's 'xor and shift a counter' design, even with zero
input entropy (a counter incrementing at a constant rate), passes FIPS
and a surprising fraction of dieharder tests (though some of the tests,
such as diehard_count_1s_str, have this generator dead to rights); it
also gives an ent entropy well in excess of 7. bits per byte.  This
means it's hard to be confident that the entropy measured by ent is
coming from the input entropy as opposed to the (exceedingly
minimal-seeming on the surface!) amount of mixing done by the
xor-and-shift...

It appears the entropy counted is equal to the log2 of the difference
between successive samples (minus one?), but even if you have a good
argument why the ones bit is unpredictable it doesn't seem an argument
that applies as strongly to the weight-128 bit.

When the jitterrand loop runs 10 times, the LSB of that first loop has
only gotten up to the 30th bit, so there are 20+ MSBs of the register
that have not yet had bits rolled into them that are 'entropic' under
the definition being used.
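
The bit-position claim can be checked with a short sketch. It feeds a lone 1 bit in on the first pass and zeros afterwards, then applies the same r = rol64(r ^ t, 3) step; the helper names bit_index() and lsb_position_after() are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

static inline uint64_t rol64(uint64_t word, unsigned int shift)
{
	return (word << shift) | (word >> (64 - shift));
}

/* Index of the single set bit in word (exactly one bit must be set). */
static int bit_index(uint64_t word)
{
	int i = 0;

	assert(word != 0 && (word & (word - 1)) == 0);
	while (!(word & 1)) {
		word >>= 1;
		i++;
	}
	return i;
}

/* Track where the LSB of the first sample ends up after `rounds` passes
 * of the mixing step, with all later samples forced to zero. */
static int lsb_position_after(int rounds)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < rounds; i++) {
		uint64_t t = (i == 0) ? 1 : 0;

		r = rol64(r ^ t, 3);
	}
	return bit_index(r);
}
```

Each pass moves the bit up by three positions, so ten passes put it at bit 30.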

Finally, in the 'turbid' random number generator
(http://www.av8n.com/turbid/), the author develops a
concept of hash saturation.  He concludes that if you have a noise
source with a known (or assumed!) entropy and a hash function that is
well-distributed over N bits, you'd better put in M > N bits of entropy
in order to be confident that the output register has N bits.  He
appears to suggest adding around 10 extra bits of randomness, or 74 bits
of randomness for a 64-bit hash, relatively independently of the size of
the hash.  This design gathers only 64 bits if you trust the input entropy
calculation, which according to the hash saturation calculation means
that the output will only have about 63.2 bits of randomness per 64
bits output.


Here's my 'absolutely zero entropy' version of the jitter random
algorithm as I understand it:

#include <stdint.h>
#include <unistd.h>

const uint64_t dt = 65309;
uint64_t t, r;

static inline uint64_t rol64(uint64_t word, unsigned int shift)
{
    return (word << shift) | (word >> (64 - shift));
}

uint64_t jitterrand() {
    int i;
    // each sample from the 'stuck counter' will be accounted as 7 bits of
    // entropy, so 10 cycles to get >= 63 bits of randomness
    for(i=0; i<10; i++) {
        t += dt;
        r = rol64(r ^ t, 3);
    }
    return r;
}

int main() {
    while(1) {
        uint64_t val = jitterrand();
        ssize_t res = write(1, &val, sizeof(val));
        if(res < 0) break;
    }
    return 0;
}

// Jeff


Re: [RFC][PATCH] Entropy generator with 100 kB/s throughput

2013-02-10 Thread Jeff Epler
OK, my original reading of the mixing code was not accurate.  This time
around, I started with the original posted tarball and turned the use of
the CPU clock into a very simple and clearly bad "clock" that will
provide no entropy.


--- jitterentropy-0.1/jitterentropy.c   2013-02-08 15:22:22.0 -0600
+++ jitterentropy-0.1-me/jitterentropy.c   2013-02-10 09:45:07.0 -0600
@@ -270,12 +270,13 @@
 typedef uint64_t __u64;
 
 static int fips_enabled = 0;
-#define jitterentropy_schedule sched_yield()
+#define jitterentropy_schedule (0)
 static inline void jitterentropy_get_nstime(__u64 *out)
 {
-   struct timespec time;
-   if (clock_gettime(CLOCK_REALTIME, &time) == 0)
-   *out = time.tv_nsec;
+static __u64 t = 0;
+const __u64 delta2 = 257;
+static __u64 delta;
+*out = (t += (delta += delta2));
 }
 
 /* note: these helper functions are shamelessly stolen from the kernel :-) */


This gives a generator that has Entropy = 7.07 bits per byte
and fails 6 in 1 FIPS 140-2 tests.  It also passes some (but not
all) dieharder tests.
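
For reference, the replacement "clock" from the diff can be run on its own. The sketch below uses the same arithmetic with uint64_t in place of __u64; the name fake_get_nstime() is made up here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as the patched jitterentropy_get_nstime(): the "time"
 * advances by a step that itself grows by 257 on every call, so the
 * sequence is fully determined and carries no entropy. */
static void fake_get_nstime(uint64_t *out)
{
	static uint64_t t = 0;
	static uint64_t delta = 0;
	const uint64_t delta2 = 257;

	assert(out != NULL);
	*out = (t += (delta += delta2));
}
```

The first three values are 257, 771 and 1542: the differences between successive samples vary, but only in a fixed arithmetic progression.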

Jeff


Re: [PATCH 23/32] Generic dynamic per cpu refcounting

2013-02-11 Thread Jeff Moyer
Kent Overstreet  writes:

> On Fri, Feb 08, 2013 at 03:49:02PM +0100, Jens Axboe wrote:
[...]
>> I'd feel a lot better deferring the whole aio/dio performance series for
>> one merge window. There's very little point in rushing it, and I don't
>> think it's been reviewed/tested enough yet.
>
> It could probably use more review, but it has been sitting in linux-next
> and the issues that showed up there are all fixed. You going to help
> review it? :)
>
> I'm not really set on it going in this merge cycle, but testing wise I
> do think it's in pretty good shape and I'm not sure where we're going to
> get more testing from before it goes in.
>
> And Andrew - apologies for not getting you the benchmarks you asked for,
> getting hardware for it has turned out to be more troublesome than I
> expected. Still don't know what's going on with that.

I'll try to get some benchmarking numbers for this patch set.

-Jeff


Re: linux-next: build failure after merge of the final tree (nfsd tree related)

2013-04-03 Thread Jeff Layton
On Wed, 3 Apr 2013 17:42:19 +1100
Stephen Rothwell  wrote:

> Hi all,
> 
> After merging the final tree, today's linux-next build (arm defconfig)
> failed like this:
> 
> fs/built-in.o: In function `nfsd_reply_cache_stats_show':
> super.c:(.text+0x87308): undefined reference to `__udivdi3'
> 
> Probably caused by commit 187da2f90879 ("nfsd: keep track of the max and
> average time to search the cache") which adds a divide by u64
> (num_searches).
> 
> I have reverted that commit and the following one for today.

Thanks, known problem...

Looks like Bruce's tree has an older version of that patch series. I
think we just need to get him to drop that one and merge the new one.
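
Some background on the failure, as a hedged sketch: on 32-bit targets the compiler lowers a plain 64-bit '/' into a call to a library routine (__udivdi3 here), and the kernel does not link against the library that provides it. Kernel code usually routes such divisions through helpers from <linux/math64.h>, for example div64_u64(). The block below is user-space C with a stand-in of the same name; the struct and its fields are hypothetical rather than nfsd's, and whether the newer series took exactly this route is not stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for the kernel's div64_u64(). In the kernel the
 * helper exists so that callers never emit a bare 64-bit '/' themselves. */
static inline uint64_t div64_u64(uint64_t dividend, uint64_t divisor)
{
	return dividend / divisor;
}

/* Hypothetical statistics record, for illustration only. */
struct search_stats {
	uint64_t total_ns;	/* summed search time */
	uint64_t num_searches;	/* number of searches performed */
};

static uint64_t average_search_ns(const struct search_stats *s)
{
	if (s->num_searches == 0)
		return 0;	/* avoid dividing by zero before any search */
	return div64_u64(s->total_ns, s->num_searches);
}
```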

-- 
Jeff Layton 




Re: linux-next: build failure after merge of the final tree (nfsd tree related)

2013-04-03 Thread Jeff Layton
On Wed, 3 Apr 2013 07:33:01 -0400
"J. Bruce Fields"  wrote:

> On Wed, Apr 03, 2013 at 07:10:54AM -0400, Jeff Layton wrote:
> > On Wed, 3 Apr 2013 17:42:19 +1100
> > Stephen Rothwell  wrote:
> > 
> > > Hi all,
> > > 
> > > After merging the final tree, today's linux-next build (arm defconfig)
> > > failed like this:
> > > 
> > > fs/built-in.o: In function `nfsd_reply_cache_stats_show':
> > > super.c:(.text+0x87308): undefined reference to `__udivdi3'
> > > 
> > > Probably caused by commit 187da2f90879 ("nfsd: keep track of the max and
> > > average time to search the cache") which adds a divide by u64
> > > (num_searches).
> > > 
> > > I have reverted that commit and the following one for today.
> > 
> > Thanks, known problem...
> > 
> > Looks like Bruce's tree has an older version of that patch series. I
> > think we just need to get him to drop that one and merge the new one.
> 
> Arrgh, sorry--could you remind me which is the new one?
> 

It was the one I sent on 3/19. Those patches (plus a couple more) are
also in the current nfsd-3.10 branch of my git tree too, so it may be
easiest to just pick them from there.

Thanks,
-- 
Jeff Layton 


Re: linux-next: build failure after merge of the final tree (nfsd tree related)

2013-04-03 Thread Jeff Layton
On Wed, 3 Apr 2013 14:05:19 -0400
"J. Bruce Fields"  wrote:

> On Wed, Apr 03, 2013 at 07:38:57AM -0400, Jeff Layton wrote:
> > On Wed, 3 Apr 2013 07:33:01 -0400
> > "J. Bruce Fields"  wrote:
> > 
> > > On Wed, Apr 03, 2013 at 07:10:54AM -0400, Jeff Layton wrote:
> > > > On Wed, 3 Apr 2013 17:42:19 +1100
> > > > Stephen Rothwell  wrote:
> > > > 
> > > > > Hi all,
> > > > > 
> > > > > After merging the final tree, today's linux-next build (arm defconfig)
> > > > > failed like this:
> > > > > 
> > > > > fs/built-in.o: In function `nfsd_reply_cache_stats_show':
> > > > > super.c:(.text+0x87308): undefined reference to `__udivdi3'
> > > > > 
> > > > > Probably caused by commit 187da2f90879 ("nfsd: keep track of the max
> > > > > and average time to search the cache") which adds a divide by u64
> > > > > (num_searches).
> > > > > 
> > > > > I have reverted that commit and the following one for today.
> > > > 
> > > > Thanks, known problem...
> > > > 
> > > > Looks like Bruce's tree has an older version of that patch series. I
> > > > think we just need to get him to drop that one and merge the new one.
> > > 
> > > Arrgh, sorry--could you remind me which is the new one?
> > > 
> > 
> > It was the one I sent on 3/19. Those patches (plus a couple more) are
> > also in the current nfsd-3.10 branch of my git tree too, so it may be
> > easiest to just pick them from there.
> 
> I hate rewriting that branch, but OK, done: does my for-3.10 look right
> to you now?
> 
> (It's still missing some of your latest patches.)
> 
> --b.

Yeah, sorry for that... I didn't find the problem with __udivdi3
until after I had asked you to merge the earlier set. Mea culpa...

Latest branch looks good. It would be good to get the later patches in
too, but those are less important than the DRC ones.

Thanks,
-- 
Jeff Layton 


Re: [PATCH v3] ata: Fix DVD not dectected at some Haswell platforms

2013-04-03 Thread Jeff Garzik

On 03/06/2013 10:49 AM, Youquan Song wrote:

There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d
"ata_piix: make DVD Drive recognisable on systems with Intel Sandybridge
  chipsets(v2)" fixing the 4 ports IDE controller 32bit PIO mode.

We've hit a problem with DVD not recognized on Haswell Desktop platform which
includes Lynx Point 2-port SATA controller.

This quirk patch disables 32bit PIO on this controller in IDE mode.

v2: Fix spelling error in statement pointed out by Sergei Shtylyov.
v3: Change comment statement and split line over 80 characters, as pointed
 out by Libor Pechacek, and also rebase the patch against 3.8-rc7 kernel.

Tested-by: Lee, Chun-Yi 
Signed-off-by: Youquan Song 
Cc: sta...@vger.kernel.org
---
  drivers/ata/ata_piix.c |   14 +-
  1 files changed, 13 insertions(+), 1 deletions(-)


applied





Re: ata: HDIO_DRIVE_* ioctl() Linux 3.9 regression

2013-04-03 Thread Jeff Garzik

On 03/27/2013 08:51 AM, Krzysztof Mazur wrote:

On Mon, Mar 25, 2013 at 06:26:50PM +0100, Ronald wrote:

In reply to [1]: I have the same issue. Git bisect took 50+ rebuilds xD

Smartd does not work anymore since 84a9a8cd9 ([libata] Set proper SK
when CK_COND is set.).

I hope I'm not stepping on anyone's toes by choosing the same title.
I'm not subscribed to this list.

Just wanted to add a 'me2'

[1] http://www.spinics.net/lists/linux-ide/msg45268.html


It seems that the SAM_STAT_CHECK_CONDITION is not cleared
causing -EIO, because that patch modified sensebuf and
the check for clearing SAM_STAT_CHECK_CONDITION is no longer valid.

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 318b413..ff44787 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -532,8 +532,8 @@ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg)
struct scsi_sense_hdr sshdr;
scsi_normalize_sense(sensebuf, SCSI_SENSE_BUFFERSIZE,
 &sshdr);
-   if (sshdr.sense_key == 0 &&
-   sshdr.asc == 0 && sshdr.ascq == 0)
+   if (sshdr.sense_key == RECOVERED_ERROR &&
+   sshdr.asc == 0 && sshdr.ascq == 0x1d)
cmd_result &= ~SAM_STAT_CHECK_CONDITION;
}

@@ -618,8 +618,8 @@ int ata_task_ioctl(struct scsi_device *scsidev, void __user *arg)
struct scsi_sense_hdr sshdr;
scsi_normalize_sense(sensebuf, SCSI_SENSE_BUFFERSIZE,
&sshdr);
-   if (sshdr.sense_key == 0 &&
-   sshdr.asc == 0 && sshdr.ascq == 0)
+   if (sshdr.sense_key == RECOVERED_ERROR &&
+   sshdr.asc == 0 && sshdr.ascq == 0x1d)
cmd_result &= ~SAM_STAT_CHECK_CONDITION;
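
The effect of the changed test can be seen in isolation with a user-space sketch. struct sense_hdr is a trimmed stand-in for struct scsi_sense_hdr and filter_cmd_result() is a made-up name; the two constants carry the values used by the SCSI headers. ASC/ASCQ 00h/1Dh is the "ATA pass through information available" code.

```c
#include <assert.h>
#include <stdint.h>

#define SAM_STAT_CHECK_CONDITION 0x02
#define RECOVERED_ERROR          0x01

/* Trimmed stand-in for struct scsi_sense_hdr. */
struct sense_hdr {
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* Clear CHECK CONDITION only when the sense data merely carries the
 * returned task file registers, i.e. the command itself succeeded. */
static int filter_cmd_result(int cmd_result, const struct sense_hdr *sshdr)
{
	if (sshdr->sense_key == RECOVERED_ERROR &&
	    sshdr->asc == 0 && sshdr->ascq == 0x1d)
		cmd_result &= ~SAM_STAT_CHECK_CONDITION;
	return cmd_result;
}
```

Sense data of the new form (key 1, 00h/1Dh) clears the status bit, while any other sense data leaves CHECK CONDITION set and the ioctl still reports an error.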


applied





Re: [PATCH] [libata] Fix HDIO_DRIVE_CMD ioctl sense data check

2013-04-03 Thread Jeff Garzik

On 03/29/2013 01:56 AM, Gwendal Grignou wrote:

commit 84a9a8cd9d0aa93c17e5815ab8a9cc4c0a765c63 changed the sense key
used for returning task registers, but HDIO_DRIVE_CMD ioctl was
not changed accordingly.

Tested: check that SMART ENABLE sent using HDIO_DRIVE_CMD returns 0
instead of EIO.

Signed-off-by: Gwendal Grignou 
---
  drivers/ata/libata-scsi.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)


applied the version from Krzysztof Mazur, which covered both cases





Re: [PATCH] drivers: ata: Use resource_size function

2013-04-03 Thread Jeff Garzik

On 03/16/2013 10:32 AM, Alexandru Gheorghiu wrote:

Use resource_size function instead of explicit computation.
Patch found using coccinelle.
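
For readers unfamiliar with the helper, a minimal user-space sketch of what resource_size() computes. struct resource is trimmed here to the two fields that matter; in the kernel the bounds have type resource_size_t.

```c
#include <assert.h>
#include <stdint.h>

/* Trimmed stand-in for the kernel's struct resource. Both bounds are
 * inclusive addresses. */
struct resource {
	uint64_t start;
	uint64_t end;
};

/* Same computation as the kernel helper: because end is inclusive, the
 * size is end - start + 1, which is the "+ 1" that open-coded versions
 * tend to forget. */
static inline uint64_t resource_size(const struct resource *res)
{
	return res->end - res->start + 1;
}
```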

Signed-off-by: Alexandru Gheorghiu 
---
  drivers/ata/pata_octeon_cf.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


applied





Re: [PATCH RFC 1/1] AHCI: Optimize interrupt processing

2013-04-03 Thread Jeff Garzik

On 03/06/2013 06:26 AM, Alexander Gordeev wrote:

Split the interrupt service routine into a hardware context handler and a
threaded context handler. That allows ports to be protected with individual
locks rather than with a single host-wide lock, which results in better
parallelism.

Signed-off-by: Alexander Gordeev 
---
  drivers/ata/acard-ahci.c|8 ++---
  drivers/ata/ahci.c  |   54 ++-
  drivers/ata/ahci.h  |   10 +++--
  drivers/ata/ahci_platform.c |3 +-
  drivers/ata/libahci.c   |   74 +--
  5 files changed, 85 insertions(+), 64 deletions(-)


applied





Re: [PATCH v0] Add SHA-3 hash algorithm

2013-04-03 Thread Jeff Garzik

On 10/03/2012 01:45 AM, Jeff Garzik wrote:


Whee -- SHA-3 is out!   I wanted to explore the new toy a bit, and
so, here is a blatantly untested rough draft of SHA-3 kernel support.

Why rough draft?  Because answers to the questions below will inform a
more polished version.


Just to update people...  this has been in a holding pattern, because 
apparently there are revisions to SHA-3 coming down the pipe.  They want 
to address preimage resistance, and make things faster in hardware.


Random quote from NIST, on the NIST hash-forum, which doesn't provide 
detail but does summarize general feeling: "As best we can tell, 
continuing to pay that performance penalty for all future uses of SHA3 
has no benefit.  (All this is a longwinded way of saying: we were wrong, 
but hopefully we got better.)"


Jeff






Re: [PATCH v3 6/7] NFSv4: Add O_DENY* open flags support

2013-04-04 Thread Jeff Layton
On Thu, 4 Apr 2013 14:30:12 +0400
Pavel Shilovsky  wrote:

> 2013/3/12 Jeff Layton :
> > On Mon, 11 Mar 2013 14:54:34 -0400
> > Jeff Layton  wrote:
> >
> >> On Thu, 28 Feb 2013 19:25:32 +0400
> >> Pavel Shilovsky  wrote:
> >>
> >> > by passing these flags to NFSv4 open request.
> >> >
> >> > Signed-off-by: Pavel Shilovsky 
> >> > ---
> >> >  fs/nfs/nfs4xdr.c | 24 
> >> >  1 file changed, 20 insertions(+), 4 deletions(-)
> >> >
> >> > diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> >> > index 26b1439..58ddc74 100644
> >> > --- a/fs/nfs/nfs4xdr.c
> >> > +++ b/fs/nfs/nfs4xdr.c
> >> > @@ -1325,7 +1325,8 @@ static void encode_lookup(struct xdr_stream *xdr, const struct qstr *name, struc
> >> > encode_string(xdr, name->len, name->name);
> >> >  }
> >> >
> >> > -static void encode_share_access(struct xdr_stream *xdr, fmode_t fmode)
> >> > +static void encode_share_access(struct xdr_stream *xdr, fmode_t fmode,
> >> > +   int open_flags)
> >> >  {
> >> > __be32 *p;
> >> >
> >> > @@ -1343,7 +1344,22 @@ static void encode_share_access(struct xdr_stream *xdr, fmode_t fmode)
> >> > default:
> >> > *p++ = cpu_to_be32(0);
> >> > }
> >> > -   *p = cpu_to_be32(0);/* for linux, share_deny = 0 always */
> >> > +   if (open_flags & O_DENYMAND) {
> >>
> >>
> >> As Bruce mentioned, I think a mount option to enable this on a per-fs
> >> basis would be a better approach than this new O_DENYMAND flag.
> >>
> >>
> >> > +   switch (open_flags & (O_DENYREAD|O_DENYWRITE)) {
> >> > +   case O_DENYREAD:
> >> > +   *p = cpu_to_be32(NFS4_SHARE_DENY_READ);
> >> > +   break;
> >> > +   case O_DENYWRITE:
> >> > +   *p = cpu_to_be32(NFS4_SHARE_DENY_WRITE);
> >> > +   break;
> >> > +   case O_DENYREAD|O_DENYWRITE:
> >> > +   *p = cpu_to_be32(NFS4_SHARE_DENY_BOTH);
> >> > +   break;
> >> > +   default:
> >> > +   *p = cpu_to_be32(0);
> >> > +   }
> >> > +   } else
> >> > +   *p = cpu_to_be32(0);
> >> >  }
> >> >
> >> >  static inline void encode_openhdr(struct xdr_stream *xdr, const struct nfs_openargs *arg)
> >> > @@ -1354,7 +1370,7 @@ static inline void encode_openhdr(struct xdr_stream *xdr, const struct nfs_opena
> >> >   * owner 4 = 32
> >> >   */
> >> > encode_nfs4_seqid(xdr, arg->seqid);
> >> > -   encode_share_access(xdr, arg->fmode);
> >> > +   encode_share_access(xdr, arg->fmode, arg->open_flags);
> >> > p = reserve_space(xdr, 36);
> >> > p = xdr_encode_hyper(p, arg->clientid);
> >> > *p++ = cpu_to_be32(24);
> >> > @@ -1491,7 +1507,7 @@ static void encode_open_downgrade(struct xdr_stream *xdr, const struct nfs_close
> >> > encode_op_hdr(xdr, OP_OPEN_DOWNGRADE, decode_open_downgrade_maxsz, hdr);
> >> > encode_nfs4_stateid(xdr, arg->stateid);
> >> > encode_nfs4_seqid(xdr, arg->seqid);
> >> > -   encode_share_access(xdr, arg->fmode);
> >> > +   encode_share_access(xdr, arg->fmode, 0);
> >> >  }
> >> >
> >> >  static void
> >>
> >>
> >> Other than that, this seems reasonable.
> >>
> >> Acked-by: Jeff Layton 
> >
> > Oh duh...
> >
> > Please ignore my comment on patch #7 to add a patch for the NFS client.
> > This one does that. That said, there may be a potential problem here
> > that you need to consider.
> >
> > In the case of a local filesystem you'll want to set deny locks using
> > deny_lock_file(). For a network filesystem like CIFS or NFS though,
> > the server will handle that atomically during the open. You need to
> > ensure that you don't go trying to set LOCK_MAND locks on the file once
> > that's done.
> >
> > Perhaps you can use a fstype flag to indicate that the filesystem
> > handles this during the open and doesn't need to try and set a flock
> > lock?
> 
> Also, we can simply mask off O_DENY* flags in open (and atomic_open)
> codepath of filesystems that support these flags:
> 
> ...
> do open request to the storage
> ...
> file->f_flags &= ~(O_DENYREAD | O_DENYWRITE | O_DENYDELETE);
> ...
> return to VFS
> ...
> 
> Thoughts?
> 

I'd probably still stick with a FS_* flag for this...

That sort of mechanism would work (for now) but sounds like the sort of
subtle behavior that's difficult for filesystem authors to get right.
It would also be subject to subtle breakage later.

Also, suppose there are changes in the future that require you to
determine this before calling into ->open? Then you'll have to go back
and somehow mark the fs anyway...
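
A rough sketch of the FS_* flag idea, in user-space C. Every name and value below is hypothetical: this thread gives no numeric values for the O_DENY* flags, FS_HANDLES_DENY is an invented flag name, and struct fs_type is a trimmed stand-in for struct file_system_type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values, chosen only so the sketch compiles. */
#define O_DENYREAD      0x1
#define O_DENYWRITE     0x2
#define O_DENYDELETE    0x4
#define FS_HANDLES_DENY 0x100	/* invented fstype flag */

/* Trimmed stand-in for struct file_system_type. */
struct fs_type {
	int fs_flags;
};

/* Should the VFS take the deny lock itself (for example through
 * deny_lock_file()), or has the filesystem already applied the deny
 * mode atomically as part of its open? */
static bool vfs_must_set_deny_lock(const struct fs_type *type, int open_flags)
{
	if (!(open_flags & (O_DENYREAD | O_DENYWRITE | O_DENYDELETE)))
		return false;	/* no deny mode requested */
	return !(type->fs_flags & FS_HANDLES_DENY);
}
```

The decision depends only on the filesystem type and the open flags, so it can be made before calling into ->open, which is the case raised above.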

-- 
Jeff Layton 


Re: [PATCH V6 00/30] loop: Issue O_DIRECT aio using bio_vec

2013-01-29 Thread Jeff Moyer
Dave Kleikamp  writes:

> Al,
> I'd like to push this patchset to linux-next. Would you like to pull it
> into your vfs tree, would you rather I submitted it separately, or do
> you have any issues with it before including it?

I'm still chasing one regression in this patchset.  If you use the ext4
driver for ext2 file systems, and you run the libaio test harness, then
you will be able to successfully write beyond the maximum file size in a
file (see test case 8).

Cheers,
Jeff


[PATCH] ptp: PTP_1588_CLOCK_PCH depends on x86

2013-01-30 Thread Jeff Mahoney
The EG20T PCH is only compatible with Intel Atom processors so it
should depend on x86.

Cc: Richard Cochran 
Signed-off-by: Jeff Mahoney 
---
 drivers/ptp/Kconfig |1 +
 1 file changed, 1 insertion(+)

--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -72,6 +72,7 @@ config DP83640_PHY
 
 config PTP_1588_CLOCK_PCH
tristate "Intel PCH EG20T as PTP clock"
+   depends on X86
select PTP_1588_CLOCK
help
  This driver adds support for using the PCH EG20T as a PTP

-- 
Jeff Mahoney
SUSE Labs


Re: [PATCH V6 00/30] loop: Issue O_DIRECT aio using bio_vec

2013-01-30 Thread Jeff Moyer
Dave Kleikamp  writes:

> On 01/29/2013 12:42 PM, Jeff Moyer wrote:
>> Dave Kleikamp  writes:
>> 
>>> Al,
>>> I'd like to push this patchset to linux-next. Would you like to pull it
>>> into your vfs tree, would you rather I submitted it separately, or do
>>> you have any issues with it before including it?
>> 
>> I'm still chasing one regression in this patchset.  If you use the ext4
>> driver for ext2 file systems, and you run the libaio test harness, then
>> you will be able to successfully write beyond the maximum file size in a
>> file (see test case 8).
>
> I found the problem. iov_iter_shorten() wasn't setting i->count to the new
> value.
>
> This fixes it. I'll fix the patchset tomorrow.

I just re-ran the test, and I can confirm it fixed it as well.

Thanks!
Jeff


[RESEND] [PATCH] kernel/res_counter.c: remove useless return statement at res_counter_member()

2013-02-01 Thread Jeff Liu
The return statement after BUG() is useless; move BUG() to the default case
of the switch.
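
The resulting shape, as a user-space sketch: abort() stands in for BUG(), and the struct and enum are trimmed, illustrative versions rather than the real res_counter definitions. Because abort() never returns, the compiler does not need a trailing return statement after the switch.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative member selectors, not the full kernel list. */
enum { RES_USAGE, RES_LIMIT };

/* Trimmed stand-in for struct res_counter. */
struct counter {
	unsigned long long usage;
	unsigned long long limit;
};

static unsigned long long *counter_member(struct counter *c, int member)
{
	switch (member) {
	case RES_USAGE:
		return &c->usage;
	case RES_LIMIT:
		return &c->limit;
	default:
		abort();	/* stand-in for BUG(): does not return */
	}
}
```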

Signed-off-by: Jie Liu 
---
 kernel/res_counter.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index ff55247..748a3bc 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -135,10 +135,9 @@ res_counter_member(struct res_counter *counter, int member)
return &counter->failcnt;
case RES_SOFT_LIMIT:
return &counter->soft_limit;
+   default:
+   BUG();
};
-
-   BUG();
-   return NULL;
 }
 
 ssize_t res_counter_read(struct res_counter *counter, int member,
-- 
1.7.9.5


Re: next-20130117 - kernel BUG with aio

2013-01-24 Thread Jeff Moyer
Zach Brown  writes:

>> No, I didn't see that bug until after I'd fixed the other three, but as
>> far as I can tell everything's fixed with the patches I'm about to mail
>> out - my test VM has been running for the past two days without errors,
>> it's kill -9'ing a process that's got iocbs in flight to a loopback
>> device every two seconds.
>
> I'm really worried that this patch series hasn't seen significant enough
> testing to justify being queued.
>
> I'll be first in line for blame for not finding the time to finish my
> review of the series.
>
> What specific tests has this gone through?  The aio tests in xfstests /
> ltp?  (And as you discovered while chasing this bug, whatever platform
> you were running on doesn't seem slow enough to catch some paths.. run
> all the tests over loop?)
>
> Jeff, can you suggest a more modern testing regime for the aio core?
> It's been so long since I had to hammer on this stuff..

Modern?  No.  ;-) I usually use xfstests (all of them, not just the aio
group), the libaio test harness, and then hand it off to our performance
team to stress the code under benchmarking workloads.  Oh, and usually
targeted testing for the thing that I'm working on.

I'll put a couple of kernels together to hand off to our performance
team, though I don't know how much time they have at present.

Cheers,
Jeff


Re: [PATCH] block: IBM RamSan 70/80 device driver.

2013-01-25 Thread Jeff Moyer
"Philip J. Kelleher"  writes:

> From: Joshua H Morris 
>   Philip J Kelleher 
>
> This patch includes the device driver for the IBM RamSan family
> of PCI SSD flash storage cards. This driver will include support for the
> RamSan 70 and 80. The driver presents a block device for device I/O.

Hi,

Your driver does not handle REQ_FLUSH.  Does that mean that the
supported cards do not have a volatile write-back cache?

> + blk_size = rsxx_get_logical_block_size(card);
> +
> + blk_queue_make_request(card->queue, rsxx_make_request);
> + blk_queue_bounce_limit(card->queue, BLK_BOUNCE_ANY);
> + blk_queue_dma_alignment(card->queue, blk_size - 1);
> + blk_queue_max_hw_sectors(card->queue, blkdev_max_hw_sectors);
> + blk_queue_logical_block_size(card->queue, blk_size);
> + blk_queue_physical_block_size(card->queue, RSXX_HW_BLK_SIZE);
> + blk_queue_max_discard_sectors(card->queue, RSXX_HW_BLK_SIZE >> 9);

Did you mean to set max_discard_sectors inside the below for loop?
Either way, do you really only support a single hardware sector discard?
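
To make the unit conversion behind that question concrete: the block layer counts in 512-byte sectors, so a byte count shifted right by 9 yields sectors. The 4096 used in the usage note is only an example value, since the actual RSXX_HW_BLK_SIZE is not shown in this mail, and bytes_to_sectors() is a made-up helper.

```c
#include <assert.h>

#define SECTOR_SHIFT 9	/* 1 << 9 == 512 bytes per sector */

/* Convert a size in bytes to a count of 512-byte sectors. */
static unsigned int bytes_to_sectors(unsigned int bytes)
{
	return bytes >> SECTOR_SHIFT;
}
```

If the hardware block were 4096 bytes, the limit passed to blk_queue_max_discard_sectors() would be 8 sectors, that is, one hardware block per discard request.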

> + queue_flag_set_unlocked(QUEUE_FLAG_NONROT, card->queue);
> + if (rsxx_discard_supported(card)) {
> + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, card->queue);
> + card->queue->limits.discard_granularity = RSXX_HW_BLK_SIZE;
> + card->queue->limits.discard_alignment   = RSXX_HW_BLK_SIZE;
> + card->queue->limits.discard_zeroes_data = 1;
> + }

Cheers,
Jeff


Re: [PATCH 2/3] ahci: AHCI-mode SATA patch for Intel Avoton DeviceIDs

2013-01-25 Thread Jeff Garzik

On 01/25/2013 03:01 PM, Seth Heasley wrote:

This patch adds the AHCI and RAID-mode SATA DeviceIDs for the Intel Avoton SOC.

Signed-off-by: Seth Heasley 
---
  drivers/ata/ahci.c | 16 
  1 file changed, 16 insertions(+)


applied 1-2





[PATCH] checkpatch.pl: Fix warnings on code comments

2013-01-27 Thread Jeff Kirsher
The following commit:
commit 058806007450489bb8f457b275e5cb5c946320c1
Author: Joe Perches 
Date:   Thu Oct 4 17:13:35 2012 -0700

checkpatch: check networking specific block comment style

Produces warnings on code comments which follow the Linux coding style
guide.  While the desired code comment style for networking may differ
from the rest of the kernel, both styles should be permitted.

This patch reverts a portion of the commit to allow multi-line code
comments to use either style.

Signed-off-by: Jeff Kirsher 
Tested-by: Jeff Pieper 
---
 scripts/checkpatch.pl | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 4d2c7df..d3ffec5 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1878,13 +1878,6 @@ sub process {
}
 
if ($realfile =~ m@^(drivers/net/|net/)@ &&
-   $rawline =~ /^\+[ \t]*\/\*[ \t]*$/ &&
-   $prevrawline =~ /^\+[ \t]*$/) {
-   WARN("NETWORKING_BLOCK_COMMENT_STYLE",
-"networking block comments don't use an empty /* line, use /* Comment...\n" . $hereprev);
-   }
-
-   if ($realfile =~ m@^(drivers/net/|net/)@ &&
$rawline !~ m@^\+[ \t]*\*/[ \t]*$@ &&   #trailing */
$rawline !~ m@^\+.*/\*.*\*/[ \t]*$@ &&  #inline /*...*/
$rawline !~ m@^\+.*\*{2,}/[ \t]*$@ &&   #trailing **/
-- 
1.7.11.7



Re: [PATCH] checkpatch.pl: Fix warnings on code comments

2013-01-27 Thread Jeff Kirsher
On Sun, 2013-01-27 at 18:59 -0500, David Miller wrote:
> From: Jeff Kirsher 
> Date: Sun, 27 Jan 2013 03:35:39 -0800
> 
> > Produces warnings on code comments which follow the Linux coding style
> > guide.  While the desired code comment style for networking may differ
> > from the rest of the kernel, both styles should be permitted.
> 
> I was actually going to mention to you guys that I've been lackadaisical
> about enforcing the comment style I want with the Intel drivers.
> 
> That was a mistake, I should have enforced it strictly, as I do for
> the other drivers and the core networking code, from the beginning.
> 
> And it's clearly a mistake if you feel the need to take out the very
> checkpatch working that's meant to enforce this comment style in all
> of the networking drivers and core.
> 
> Do not revert this, follow its advice instead.

Ok, I am fine with that.  I just had not seen any emails/responses saying
that this was the direction you wanted to go.

So will you be fine with cleanup patches which go through and convert
all the existing code comments to the desired format?  If so, I will get
started on patches to clean up and convert the Intel drivers to the
desired code comment style.




Re: [PATCH] checkpatch.pl: Fix warnings on code comments

2013-01-28 Thread Jeff Kirsher
On Mon, 2013-01-28 at 09:30 -0800, Joe Perches wrote:
> On Mon, 2013-01-28 at 17:17 +, Allan, Bruce W wrote:
> > David Miller Sent: Sunday, January 27, 2013 7:07 PM
> > > From: Jeff Kirsher 
> > > > So will you be fine with cleanup patches which go through and
> > > > convert all the existing code comments to the desired format?
> > > Sure.
> > Not all Intel drivers...e1000e already conforms to the comment style :-)
> 
> In case anyone cares, here's a perl regex
> to do network comment style conversion.
> 
> $text =~ s@/\*[ \t]*\n[ \t]*\*@/*@g;
> $text =~ s@/\*([ \t]*)([^\n]+)\n[ \t]*\*/@/\*$1$2 \*/@g;
> 
> (assumes the entire file is in $text)
> 
> It creates a ~220KB diff for drivers/net/ethernet/intel/
> that I won't post.
> 

Thanks Joe, I will get patches to take care of the Intel drivers (minus
e1000e since Bruce has already completed that work).
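
For anyone who would rather not run Perl, the two substitutions Joe quotes translate directly to Python's `re` module. This is a sketch of the same two regexes and nothing more; the function name is invented here, and like the original it assumes the whole file has been read into one string.

```python
import re


def to_net_comment_style(text: str) -> str:
    """Apply the two substitutions from the quoted Perl snippet, in order."""
    # 1) Drop the empty "/*" opener line: "/*\n * text" becomes "/* text".
    text = re.sub(r"/\*[ \t]*\n[ \t]*\*", "/*", text)
    # 2) Fold a comment whose text is followed only by a "*/" line
    #    into a single line: "/* text\n */" becomes "/* text */".
    text = re.sub(r"/\*([ \t]*)([^\n]+)\n[ \t]*\*/", r"/*\1\2 */", text)
    return text


if __name__ == "__main__":
    sample = "/*\n * line one\n * line two\n */\n"
    print(to_net_comment_style(sample), end="")
```

A multi-line block keeps its trailing `*/` on its own line, while a block with a single line of text collapses to a one-line comment.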




Re: Reproduceable SATA lockup on 3.7.8 with SSD

2013-02-26 Thread Jeff Garzik

On 02/25/2013 07:27 PM, Marc MERLIN wrote:

Howdy,

I seem to have the same problem (or similar) as Mathieu Desnoyers in
https://lkml.org/lkml/2013/2/22/437

I can reliably get my SSD to drop from the SATA bus given the right workload
on linux.

How can I tell if it's linux's fault or the drive's fault?


Manually force speed to 3.0 Gbps, then 1.5 Gbps, and see what happens.

Try module/kernel parameter libata.force=1.5Gbps or libata.force=3.0Gbps

Jeff






Re: [GIT PULL] ACPI and power management fixes for v3.9-rc1

2013-02-26 Thread Jeff Garzik

On 02/26/2013 11:58 AM, Tejun Heo wrote:

On Tue, Feb 26, 2013 at 08:47:30AM -0800, Linus Torvalds wrote:

Anyway, in the US it is definitely not a common term for normal people.


Googling "odd" doesn't give anything on optical drives on the first
page.  On the other hand, >70% is about optical drives on naver.com.
The discrepancy is funny given that most computer terms in Korea come
from US.  Maybe it's because the character combination "odd" doesn't
have any other meaning.  Even then, I'm surprised there's no optical
drive result at all in the first page of google search.  Definitely
doesn't seem like a common term in US.


There is just a lot more "odd" goings-on in the US.  Korea is simply 
less odd than the US :)


Will send a patch to fix...

Jeff






Re: [PATCH] CIFS: Decrease reconnection delay when switching nics

2013-02-27 Thread Jeff Layton
On Wed, 27 Feb 2013 12:06:14 +0100
"Stefan (metze) Metzmacher"  wrote:

> Hi Dave,
> 
> > When messages are currently in queue awaiting a response, decrease amount of
> > time before attempting cifs_reconnect to SMB_MAX_RTT = 10 seconds. The current
> > wait time before attempting to reconnect is currently 2*SMB_ECHO_INTERVAL(120
> > seconds) since the last response was received.  This does not take into account
> > the fact that messages waiting for a response should be serviced within a
> > reasonable round trip time.
> 
> Wouldn't that mean that the client will disconnect a good connection,
> if the server doesn't respond within 10 seconds?
> Reads and Writes can take longer than 10 seconds...
> 

Where does this magic value of 10s come from? Note that a slow server
can take *minutes* to respond to writes that are long past the EOF.

> > This fixes the issue where user moves from wired to wireless or vice versa
> > causing the mount to hang for 120 seconds, when it could reconnect considerably
> > faster.  After this fix it will take SMB_MAX_RTT (10 seconds) from the last
> > time the user attempted to access the volume or SMB_MAX_RTT after the last
> > echo.  The worst case of the latter scenario being
> > 2*SMB_ECHO_INTERVAL+SMB_MAX_RTT+small scheduling delay (about 130 seconds).
> > Statistically speaking it would normally reconnect sooner.  However in the best
> > case where the user changes nics, and immediately tries to access the cifs
> > share it will take SMB_MAX_RTT=10 seconds.
> 
> I think it would be better to detect the broken connection
> by using an AF_NETLINK socket listening for RTM_DELADDR
> messages?
> 
> metze
> 

Ick -- that sounds horrid ;)

Dave, this problem sounds very similar to the one that your colleague
Chris J Arges was trying to solve several months ago. You may want to
go back and review that thread. Perhaps you can solve both problems at
the same time here...

-- 
Jeff Layton 


Re: [PATCH] CIFS: Decrease reconnection delay when switching nics

2013-02-28 Thread Jeff Layton
On Wed, 27 Feb 2013 16:24:07 -0600
Dave Chiluk  wrote:

> On 02/27/2013 10:34 AM, Jeff Layton wrote:
> > On Wed, 27 Feb 2013 12:06:14 +0100
> > "Stefan (metze) Metzmacher"  wrote:
> > 
> >> Hi Dave,
> >>
> >>> When messages are currently in queue awaiting a response, decrease amount of
> >>> time before attempting cifs_reconnect to SMB_MAX_RTT = 10 seconds. The current
> >>> wait time before attempting to reconnect is currently 2*SMB_ECHO_INTERVAL(120
> >>> seconds) since the last response was received.  This does not take into account
> >>> the fact that messages waiting for a response should be serviced within a
> >>> reasonable round trip time.
> >>
> >> Wouldn't that mean that the client will disconnect a good connection,
> >> if the server doesn't respond within 10 seconds?
> >> Reads and Writes can take longer than 10 seconds...
> >>
> > 
> > Where does this magic value of 10s come from? Note that a slow server
> > can take *minutes* to respond to writes that are long past the EOF.
> It comes from the desire to decrease the reconnection delay to something
> better than a random number between 60 and 120 seconds.  I am not
> committed to this number, and it is open for discussion.  Additionally
> if you look closely at the logic it's not 10 seconds per request, but
> actually when requests have been in flight for more than 10 seconds make
> sure we've heard from the server in the last 10 seconds.
> 
> Can you explain more fully your use case of writes that are long past
> the EOF?  Perhaps with a test-case or script that I can test?  As far as
> I know writes long past EOF will just result in a sparse file, and
> return in a reasonable round trip time *(that's at least what I'm seeing
> with my testing).  dd if=/dev/zero of=/mnt/cifs/a bs=1M count=100
> seek=10, starts receiving responses from the server in about .05
> seconds with subsequent responses following at roughly .002-.01 second
> intervals.  This is well within my 10 second value.  Even adding the
> latency of AT&T's 2g cell network brings it up to only 1s.  Still 10x
> less than my 10 second value.
> 
> The new logic goes like this
> if( we've been expecting a response from the server (in_flight), and
>  message has been in_flight for more than 10 seconds and
>  we haven't had any other contact from the server in that time
>   reconnect
> 

That will break writes long past the EOF. Note too that reconnects on
CIFS are horrifically expensive and problematic. Much of the state on a
CIFS mount is tied to the connection. When that drops, open files are
closed and things like locks are dropped. SMB1 has no real mechanism
for state recovery, so that can really be a problem.
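
The check Dave describes can be written down as a small predicate. This is only a sketch of the quoted pseudo-logic, not the patch itself: the function and parameter names are invented for illustration, and 10 seconds is the SMB_MAX_RTT value from the patch description.

```python
SMB_MAX_RTT = 10.0  # seconds, the value proposed in the patch description


def should_reconnect(now, oldest_in_flight_since, last_server_contact,
                     max_rtt=SMB_MAX_RTT):
    """Return True when the quoted 'new logic' would trigger a reconnect.

    oldest_in_flight_since: time the oldest unanswered request was sent,
                            or None when nothing is outstanding.
    last_server_contact:    time anything at all was last received.
    """
    if oldest_in_flight_since is None:
        return False  # not expecting a response, so nothing to time out
    waited_too_long = now - oldest_in_flight_since > max_rtt
    server_silent = now - last_server_contact > max_rtt
    return waited_too_long and server_silent


if __name__ == "__main__":
    # A live server that is busy with one slow write and has sent nothing
    # for 15 seconds is treated the same as a dead link.
    print(should_reconnect(now=100.0, oldest_in_flight_since=85.0,
                           last_server_contact=85.0))
```

The predicate cannot tell a slow server from an unreachable one, which is the objection raised above about writes long past the EOF.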

> On a side note, I discovered a small race condition in the previous
> logic while working on this, that my new patch also fixes.
> 1s  request
> 2s  response
> 61.995 echo job pops
> 121.995 echo job pops and sends echo
> 122 server_unresponsive called.  Finds no response and attempts to
>reconnect
> 122.95 response to echo received
> 

Sure, here's a reproducer. Do this against a windows server, preferably
one exporting NTFS on relatively slow storage. Make sure that
"testfile" doesn't exist first:

 $ dd if=/dev/zero of=/path/to/cifs/share/testfile bs=1M count=1 seek=3192

NTFS doesn't support sparse files, so the OS has to zero-fill up to the
point where you're writing. That can take a long time on slow
storage (minutes even). What we do now is periodically send a SMB echo
to make sure the server is alive rather than trying to time out a
particular call.
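
If dd is not handy, the same access pattern can be produced from a script. The sketch below is a rough equivalent of the dd command above, under the same precondition that the target file does not exist yet; the helper name is invented, and the path in the usage note is a placeholder.

```python
MIB = 1024 * 1024


def write_past_eof(path, seek_blocks, block_size=MIB, count=1):
    """Roughly `dd if=/dev/zero of=path bs=block_size count=count seek=seek_blocks`.

    Opens with mode "xb", so it fails if the file already exists, then
    seeks past the (empty) end of file and writes zeros there.
    Returns the resulting file size in bytes.
    """
    offset = seek_blocks * block_size
    with open(path, "xb") as f:
        f.seek(offset)                       # jump far beyond EOF
        f.write(bytes(block_size) * count)   # one or more blocks of zeros
    return offset + block_size * count
```

Calling `write_past_eof("/path/to/cifs/share/testfile", 3192)` issues a single 1 MiB write at an offset of roughly 3.1 GiB, which is the case where the server may take a long time to answer.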

The logic that handles that today is somewhat sub-optimal though. We
send an echo every 60s whether there are any calls in flight or not and
wait for 60s until we decide that the server isn't there. What would be
better is to only send one when we've been waiting a long time for a
response.

That "long time" is debatable -- 10s would be fine with me but the
logic needs to be fixed not to send echoes unless there is an
outstanding request first.
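
As a sketch of that suggestion: the echo becomes a probe that is sent only when a request is already outstanding and the server has been quiet for the "long time". The names and the 10 second default are placeholders for illustration, not existing CIFS code.

```python
def should_send_echo(now, oldest_in_flight_since, last_server_contact,
                     long_wait=10.0):
    """Decide whether to probe the server with an SMB echo.

    No echo is sent while nothing is in flight. Otherwise one is sent once
    both the oldest request and the server's silence exceed long_wait.
    """
    if oldest_in_flight_since is None:
        return False  # idle connection: do not send periodic echoes
    return (now - oldest_in_flight_since >= long_wait
            and now - last_server_contact >= long_wait)
```

Only an unanswered echo would then count as evidence that the server is gone, so a slow write on a live server is left alone.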

I think though that you're trying to use this mechanism to do something
that it wasn't really designed to do. A better method might be to try
and detect whether the TCP connection is really dead somehow. That
would be more immediate, but I'm unclear on how best to do that.
Probably it'll mean groveling around down in the TCP layer...

FWIW, there was a thread on the linux-cifs mailing list started on Dec
3, 2010 entitled "cifs client timeouts and hard/soft mounts" that lays
out the rationale for the current reconnection behavior. You may want
to look over that before you go making changes here...

-- 
Jeff Layton 


Re: [PATCH] CIFS: Decrease reconnection delay when switching nics

2013-02-28 Thread Jeff Layton
On Thu, 28 Feb 2013 10:04:36 -0600
Steve French  wrote:

> On Thu, Feb 28, 2013 at 9:26 AM, Jeff Layton  wrote:
> > On Wed, 27 Feb 2013 16:24:07 -0600
> > Dave Chiluk  wrote:
> >
> >> On 02/27/2013 10:34 AM, Jeff Layton wrote:
> >> > On Wed, 27 Feb 2013 12:06:14 +0100
> >> > "Stefan (metze) Metzmacher"  wrote:
> >> >
> >> >> Hi Dave,
> >> >>
> >> >>> When messages are currently in queue awaiting a response, decrease amount of
> >> >>> time before attempting cifs_reconnect to SMB_MAX_RTT = 10 seconds. The current
> >> >>> wait time before attempting to reconnect is currently 2*SMB_ECHO_INTERVAL(120
> >> >>> seconds) since the last response was received.  This does not take into account
> >> >>> the fact that messages waiting for a response should be serviced within a
> >> >>> reasonable round trip time.
> >> >>
> >> >> Wouldn't that mean that the client will disconnect a good connection,
> >> >> if the server doesn't respond within 10 seconds?
> >> >> Reads and Writes can take longer than 10 seconds...
> >> >>
> >> >
> >> > Where does this magic value of 10s come from? Note that a slow server
> >> > can take *minutes* to respond to writes that are long past the EOF.
> >> It comes from the desire to decrease the reconnection delay to something
> >> better than a random number between 60 and 120 seconds.  I am not
> >> committed to this number, and it is open for discussion.  Additionally
> >> if you look closely at the logic it's not 10 seconds per request, but
> >> actually when requests have been in flight for more than 10 seconds make
> >> sure we've heard from the server in the last 10 seconds.
> >>
> >> Can you explain more fully your use case of writes that are long past
> >> the EOF?  Perhaps with a test-case or script that I can test?  As far as
> >> I know writes long past EOF will just result in a sparse file, and
> >> return in a reasonable round trip time *(that's at least what I'm seeing
> >> with my testing).  dd if=/dev/zero of=/mnt/cifs/a bs=1M count=100
> >> seek=10, starts receiving responses from the server in about .05
> >> seconds with subsequent responses following at roughly .002-.01 second
> >> intervals.  This is well within my 10 second value.  Even adding the
> >> latency of AT&T's 2g cell network brings it up to only 1s.  Still 10x
> >> less than my 10 second value.
> >>
> >> The new logic goes like this
> >> if( we've been expecting a response from the server (in_flight), and
> >>  message has been in_flight for more than 10 seconds and
> >>  we haven't had any other contact from the server in that time
> >>   reconnect
> >>
> >
> > That will break writes long past the EOF. Note too that reconnects on
> > CIFS are horrifically expensive and problematic. Much of the state on a
> > CIFS mount is tied to the connection. When that drops, open files are
> > closed and things like locks are dropped. SMB1 has no real mechanism
> > for state recovery, so that can really be a problem.
> >
> >> On a side note, I discovered a small race condition in the previous
> >> logic while working on this, that my new patch also fixes.
> >> 1s  request
> >> 2s  response
> >> 61.995 echo job pops
> >> 121.995 echo job pops and sends echo
> >> 122 server_unresponsive called.  Finds no response and attempts to
> >>reconnect
> >> 122.95 response to echo received
> >>
> >
> > Sure, here's a reproducer. Do this against a windows server, preferably
> > one exporting NTFS on relatively slow storage. Make sure that
> > "testfile" doesn't exist first:
> >
> >  $ dd if=/dev/zero of=/path/to/cifs/share/testfile bs=1M count=1 seek=3192
> >
> > NTFS doesn't support sparse files, so the OS has to zero-fill up to the
> > point where you're writing. That can take a long time on slow
> > storage (minutes even). What we do now is periodically send a SMB echo
> > to make sure the server is alive rather than trying to time out a
> > particular call.
> 
> Writing past end of file in Windows can be very slow, but note that it
> is pos

Re: [PATCH] CIFS: Decrease reconnection delay when switching nics

2013-02-28 Thread Jeff Layton
On Thu, 28 Feb 2013 11:31:54 -0600
Dave Chiluk  wrote:

> On 02/28/2013 10:47 AM, Jeff Layton wrote:
> > On Thu, 28 Feb 2013 10:04:36 -0600
> > Steve French  wrote:
> > 
> >> On Thu, Feb 28, 2013 at 9:26 AM, Jeff Layton  wrote:
> >>> On Wed, 27 Feb 2013 16:24:07 -0600
> >>> Dave Chiluk  wrote:
> >>>
> >>>> On 02/27/2013 10:34 AM, Jeff Layton wrote:
> >>>>> On Wed, 27 Feb 2013 12:06:14 +0100
> >>>>> "Stefan (metze) Metzmacher"  wrote:
> >>>>>
> >>>>>> Hi Dave,
> >>>>>>
> >>>>>>> When messages are currently in queue awaiting a response, decrease amount of
> >>>>>>> time before attempting cifs_reconnect to SMB_MAX_RTT = 10 seconds. The current
> >>>>>>> wait time before attempting to reconnect is currently 2*SMB_ECHO_INTERVAL(120
> >>>>>>> seconds) since the last response was received.  This does not take into account
> >>>>>>> the fact that messages waiting for a response should be serviced within a
> >>>>>>> reasonable round trip time.
> >>>>>>
> >>>>>> Wouldn't that mean that the client will disconnect a good connection,
> >>>>>> if the server doesn't respond within 10 seconds?
> >>>>>> Reads and Writes can take longer than 10 seconds...
> >>>>>>
> >>>>>
> >>>>> Where does this magic value of 10s come from? Note that a slow server
> >>>>> can take *minutes* to respond to writes that are long past the EOF.
> >>>> It comes from the desire to decrease the reconnection delay to something
> >>>> better than a random number between 60 and 120 seconds.  I am not
> >>>> committed to this number, and it is open for discussion.  Additionally
> >>>> if you look closely at the logic it's not 10 seconds per request, but
> >>>> actually when requests have been in flight for more than 10 seconds make
> >>>> sure we've heard from the server in the last 10 seconds.
> >>>>
> >>>> Can you explain more fully your use case of writes that are long past
> >>>> the EOF?  Perhaps with a test-case or script that I can test?  As far as
> >>>> I know writes long past EOF will just result in a sparse file, and
> >>>> return in a reasonable round trip time *(that's at least what I'm seeing
> >>>> with my testing).  dd if=/dev/zero of=/mnt/cifs/a bs=1M count=100
> >>>> seek=10, starts receiving responses from the server in about .05
> >>>> seconds with subsequent responses following at roughly .002-.01 second
> >>>> intervals.  This is well within my 10 second value.  Even adding the
> >>>> latency of AT&T's 2g cell network brings it up to only 1s.  Still 10x
> >>>> less than my 10 second value.
> >>>>
> >>>> The new logic goes like this
> >>>> if( we've been expecting a response from the server (in_flight), and
> >>>>  message has been in_flight for more than 10 seconds and
> >>>>  we haven't had any other contact from the server in that time
> >>>>   reconnect
> >>>>
> >>>
> >>> That will break writes long past the EOF. Note too that reconnects on
> >>> CIFS are horrifically expensive and problematic. Much of the state on a
> >>> CIFS mount is tied to the connection. When that drops, open files are
> >>> closed and things like locks are dropped. SMB1 has no real mechanism
> >>> for state recovery, so that can really be a problem.
> >>>
> >>>> On a side note, I discovered a small race condition in the previous
> >>>> logic while working on this, that my new patch also fixes.
> >>>> 1s  request
> >>>> 2s  response
> >>>> 61.995 echo job pops
> >>>> 121.995 echo job pops and sends echo
> >>>> 122 server_unresponsive called.  Finds no response and attempts to
> >>>>reconnect
> >>>> 122.95 response to echo received
> >>>>
> >>>
> >>> Sure, here's a reproducer. Do this against a windows server, preferably
> >&

Re: [PATCH 2/2] ACPI / glue: Drop .find_bridge() callback from struct acpi_bus_type

2013-02-28 Thread Jeff Garzik

On 02/28/2013 04:53 PM, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

After PCI and USB have stopped using the .find_bridge() callback in
struct acpi_bus_type, the only remaining user of it is SATA, but SATA
only pretends to be a user, because it points that callback to a stub
always returning -ENODEV.

For this reason, drop the SATA's dummy .find_bridge() callback and
remove .find_bridge(), which is not used any more, from struct
acpi_bus_type entirely.

Signed-off-by: Rafael J. Wysocki 
---
  drivers/acpi/glue.c   |   26 +-
  drivers/ata/libata-acpi.c |6 --
  include/acpi/acpi_bus.h   |3 ---
  3 files changed, 1 insertion(+), 34 deletions(-)


patches 1-2 Acked-by: Jeff Garzik 





Re: [PATCH] CIFS: Decrease reconnection delay when switching nics

2013-02-28 Thread Jeff Layton
On Thu, 28 Feb 2013 23:54:13 +0100
Björn JACKE  wrote:

> On 2013-02-28 at 07:26 -0800 Jeff Layton sent off:
> > NTFS doesn't support sparse files, so the OS has to zero-fill up to the
> > point where you're writing. That can take a long time on slow
> > storage (minutes even).
> 
> but you are talking about FAT here, right? NTFS does support sparse files if
> the sparse bit has explicitly been set on the file. But even if the sparse bit
> is not set, filling a file with zeros by writing after a seek long beyond the
> end of the file is very fast, because NTFS supports the feature that Unix
> filesystems like xfs call extents.
> 
> If writing beyond the end of a file is really slow via cifs vfs in the test
> case against an ntfs volume then I wonder if that operation is really being
> done optimally over the wire. ntfs really isn't that bad at handling this
> kind of file.
> 

I'm not sure since I don't know the internals of NTFS. I had always
assumed that it didn't really handle sparse files well (hence the
"rabbit-pellet" thing that windows clients do).

All I can say however is that writes long past the EOF can take a
*really* long time to run. Typically we just issue a SMB_COM_WRITEX at
the offset to which we want to put the data. Is there some other way we
ought to be doing this?

In any case, it doesn't really change the fact that there is no
guaranteed time of response from CIFS servers. They can easily take a
really long time to respond to certain requests. The best method we
have to deal with that is to periodically "ping" the server with an
echo to see if it's still there.

-- 
Jeff Layton 


Re: [PATCH v3 6/7] NFSv4: Add O_DENY* open flags support

2013-03-12 Thread Jeff Layton
On Mon, 11 Mar 2013 14:54:34 -0400
Jeff Layton  wrote:

> On Thu, 28 Feb 2013 19:25:32 +0400
> Pavel Shilovsky  wrote:
> 
> > by passing these flags to NFSv4 open request.
> > 
> > Signed-off-by: Pavel Shilovsky 
> > ---
> >  fs/nfs/nfs4xdr.c | 24 
> >  1 file changed, 20 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> > index 26b1439..58ddc74 100644
> > --- a/fs/nfs/nfs4xdr.c
> > +++ b/fs/nfs/nfs4xdr.c
> > @@ -1325,7 +1325,8 @@ static void encode_lookup(struct xdr_stream *xdr, const struct qstr *name, struc
> > encode_string(xdr, name->len, name->name);
> >  }
> >  
> > -static void encode_share_access(struct xdr_stream *xdr, fmode_t fmode)
> > +static void encode_share_access(struct xdr_stream *xdr, fmode_t fmode,
> > +   int open_flags)
> >  {
> > __be32 *p;
> >  
> > @@ -1343,7 +1344,22 @@ static void encode_share_access(struct xdr_stream *xdr, fmode_t fmode)
> > default:
> > *p++ = cpu_to_be32(0);
> > }
> > -   *p = cpu_to_be32(0);/* for linux, share_deny = 0 always */
> > +   if (open_flags & O_DENYMAND) {
> 
> 
> As Bruce mentioned, I think a mount option to enable this on a per-fs
> basis would be a better approach than this new O_DENYMAND flag. 
> 
> 
> > +   switch (open_flags & (O_DENYREAD|O_DENYWRITE)) {
> > +   case O_DENYREAD:
> > +   *p = cpu_to_be32(NFS4_SHARE_DENY_READ);
> > +   break;
> > +   case O_DENYWRITE:
> > +   *p = cpu_to_be32(NFS4_SHARE_DENY_WRITE);
> > +   break;
> > +   case O_DENYREAD|O_DENYWRITE:
> > +   *p = cpu_to_be32(NFS4_SHARE_DENY_BOTH);
> > +   break;
> > +   default:
> > +   *p = cpu_to_be32(0);
> > +   }
> > +   } else
> > +   *p = cpu_to_be32(0);
> >  }
> >  
> >  static inline void encode_openhdr(struct xdr_stream *xdr, const struct nfs_openargs *arg)
> > @@ -1354,7 +1370,7 @@ static inline void encode_openhdr(struct xdr_stream *xdr, const struct nfs_opena
> >   * owner 4 = 32
> >   */
> > encode_nfs4_seqid(xdr, arg->seqid);
> > -   encode_share_access(xdr, arg->fmode);
> > +   encode_share_access(xdr, arg->fmode, arg->open_flags);
> > p = reserve_space(xdr, 36);
> > p = xdr_encode_hyper(p, arg->clientid);
> > *p++ = cpu_to_be32(24);
> > @@ -1491,7 +1507,7 @@ static void encode_open_downgrade(struct xdr_stream *xdr, const struct nfs_close
> > encode_op_hdr(xdr, OP_OPEN_DOWNGRADE, decode_open_downgrade_maxsz, hdr);
> > encode_nfs4_stateid(xdr, arg->stateid);
> > encode_nfs4_seqid(xdr, arg->seqid);
> > -   encode_share_access(xdr, arg->fmode);
> > +   encode_share_access(xdr, arg->fmode, 0);
> >  }
> >  
> >  static void
> 
> 
> Other than that, this seems reasonable.
> 
> Acked-by: Jeff Layton 

Oh duh...

Please ignore my comment on patch #7 to add a patch for the NFS client.
This one does that. That said, there may be a potential problem here
that you need to consider.

In the case of a local filesystem you'll want to set deny locks using
deny_lock_file(). For a network filesystem like CIFS or NFS though,
the server will handle that atomically during the open. You need to
ensure that you don't go trying to set LOCK_MAND locks on the file once
that's done.

Perhaps you can use a fstype flag to indicate that the filesystem
handles this during the open and doesn't need to try and set a flock
lock?
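
For readers skimming the quoted hunk, the flag-to-share_deny mapping it implements can be sketched as a lookup. The NFS4_SHARE_DENY_* numbers are the protocol's share_deny values; the O_DENY* bit values below are placeholders invented for this sketch, since the real ones come from an earlier patch in the series that is not quoted here.

```python
# share_deny values as defined by the NFSv4 protocol.
NFS4_SHARE_DENY_NONE = 0
NFS4_SHARE_DENY_READ = 1
NFS4_SHARE_DENY_WRITE = 2
NFS4_SHARE_DENY_BOTH = 3

# ASSUMPTION: placeholder bits, NOT the values defined by the patch series.
O_DENYREAD = 0x1
O_DENYWRITE = 0x2
O_DENYMAND = 0x4

_DENY_TABLE = {
    O_DENYREAD: NFS4_SHARE_DENY_READ,
    O_DENYWRITE: NFS4_SHARE_DENY_WRITE,
    O_DENYREAD | O_DENYWRITE: NFS4_SHARE_DENY_BOTH,
}


def share_deny(open_flags: int) -> int:
    """Mirror the quoted switch: deny bits only count when O_DENYMAND is set."""
    if not open_flags & O_DENYMAND:
        return NFS4_SHARE_DENY_NONE
    return _DENY_TABLE.get(open_flags & (O_DENYREAD | O_DENYWRITE),
                           NFS4_SHARE_DENY_NONE)
```

With a per-filesystem mount option instead of O_DENYMAND, the first test would consult the mount's setting rather than the open flags.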

-- 
Jeff Layton 


Re: [PATCH 1/4] UAPI: Fix endianness conditionals in linux/aio_abi.h

2013-03-12 Thread Jeff Moyer
Benjamin LaHaise  writes:

> On Wed, Mar 06, 2013 at 08:47:33PM +, David Howells wrote:
>> In the UAPI header files, __BIG_ENDIAN and __LITTLE_ENDIAN must be compared
>> against __BYTE_ORDER in preprocessor conditionals where these are exposed to
>> userspace (that is they're not inside __KERNEL__ conditionals).
>> 
>> However, in the main kernel the norm is to check for "defined(__XXX_ENDIAN)"
>> rather than comparing against __BYTE_ORDER and this has incorrectly leaked
>> into the userspace headers.
>> 
>> The definition of PADDED() in linux/aio_abi.h is wrong in this way.  Note that
>> userspace will likely interpret this and thus the order of fields in struct
>> iocb incorrectly as the little-endian variant on big-endian machines -
>> depending on header inclusion order.
>> 
>> [!!!] NOTE [!!!]  This patch may adversely change the userspace API.  It might
>> be better to fix the ordering of aio_key and aio_reserved1 in struct iocb.
>
> It is unlikely that anyone has used the existing kernel headers and hit this
> issue given that most existing users use the libaio.h include (which does not
> get the endianness tests wrong).  Given that the kernel has always used the
> correct endian mappings, this change is correct.

Agreed.

Acked-by: Jeff Moyer 
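
To see why the `defined()` form goes wrong in userspace: glibc's <endian.h> defines both __LITTLE_ENDIAN and __BIG_ENDIAN as constants on every machine and selects one through __BYTE_ORDER. The toy model below treats the preprocessor state as a dictionary. The field order returned for each variant is an assumption about what PADDED() expands to, not something quoted in this thread.

```python
# Constants as glibc's <endian.h> defines them; both exist on every machine.
LITTLE, BIG = 1234, 4321

LITTLE_VARIANT = ("aio_key", "aio_reserved1")   # assumed little-endian order
BIG_VARIANT = ("aio_reserved1", "aio_key")      # assumed big-endian order


def padded_by_defined(macros):
    """Kernel-internal style: #if defined(__LITTLE_ENDIAN) ... #elif defined(__BIG_ENDIAN)."""
    if "__LITTLE_ENDIAN" in macros:
        return LITTLE_VARIANT
    if "__BIG_ENDIAN" in macros:
        return BIG_VARIANT
    raise ValueError("unknown byte order")


def padded_by_byte_order(macros):
    """UAPI style: compare __BYTE_ORDER against the two constants."""
    if macros["__BYTE_ORDER"] == macros["__LITTLE_ENDIAN"]:
        return LITTLE_VARIANT
    if macros["__BYTE_ORDER"] == macros["__BIG_ENDIAN"]:
        return BIG_VARIANT
    raise ValueError("unknown byte order")


# Macro state seen by a userspace program on a big-endian machine.
BIG_ENDIAN_USERSPACE = {
    "__LITTLE_ENDIAN": LITTLE,
    "__BIG_ENDIAN": BIG,
    "__BYTE_ORDER": BIG,
}
```

On `BIG_ENDIAN_USERSPACE` the `defined()` test picks the little-endian layout because both macros exist, while the `__BYTE_ORDER` comparison picks the big-endian one.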


Re: [PATCH] igb: SR-IOV init reordering

2013-03-13 Thread Jeff Kirsher
On Tue, 2013-03-12 at 15:25 -0600, Alex Williamson wrote:
> igb is ineffective at setting a lower total VFs because:
> 
> int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
> {
> ...
> /* Shouldn't change if VFs already enabled */
> if (dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE)
> return -EBUSY;
> 
> Swap init ordering.
> 
> Signed-off-by: Alex Williamson 
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-) 

I have added the patch to my igb queue, thanks!
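
The ordering problem in the quoted check can be shown with a toy model. This is an illustration only: the class stands in for the one piece of state the quoted snippet tests, and none of the names below are kernel APIs.

```python
import errno


class FakeSriovDev:
    """Toy stand-in for the SR-IOV state that the quoted check looks at."""

    def __init__(self, total_vfs):
        self.total_vfs = total_vfs
        self.vfs_enabled = False  # models the PCI_SRIOV_CTRL_VFE bit

    def set_totalvfs(self, numvfs):
        # Same rule as the quoted snippet: no changes once VFs are enabled.
        if self.vfs_enabled:
            return -errno.EBUSY
        self.total_vfs = numvfs
        return 0

    def enable_vfs(self):
        self.vfs_enabled = True


def init_enable_first(dev, limit):
    """Old order: enable VFs, then try to lower the total."""
    dev.enable_vfs()
    return dev.set_totalvfs(limit)


def init_limit_first(dev, limit):
    """Swapped order: lower the total, then enable VFs."""
    rc = dev.set_totalvfs(limit)
    dev.enable_vfs()
    return rc
```

With the old order the limit is rejected with -EBUSY and the total stays unchanged; with the swapped order it takes effect before the VFs come up.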




Re: [PATCH] igb: Fix null pointer dereference

2013-03-13 Thread Jeff Kirsher
On Tue, 2013-03-12 at 14:09 -0600, Alex Williamson wrote:
> The max_vfs= option has always been self limiting to the number of VFs
> supported by the device.  fa44f2f1 added SR-IOV configuration via
> sysfs, but in the process broke this self correction factor.  The
> failing path is:
> 
> igb_probe
>   igb_sw_init
> if (max_vfs > 7) {
> adapter->vfs_allocated_count = 7;
> ...
> igb_probe_vfs
> igb_enable_sriov(, max_vfs)
>   if (num_vfs > 7) {
> err = -EPERM;
> ...
> 
> This leaves vfs_allocated_count = 7 and vf_data = NULL, so we bomb out
> when igb_probe finally calls igb_reset.  It seems like a really bad
> idea, and somewhat pointless, to set vfs_allocated_count separate from
> vf_data, but limiting max_vfs is enough to avoid the null pointer.
> 
> Signed-off-by: Alex Williamson 
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-) 

I have added the patch to my igb queue, thanks!
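
A small model of the failing path makes the mismatch visible. It is a sketch of the sequence in the quoted description, with invented names; the real one-line change is not shown in this thread, so the `clamp_max_vfs` switch only stands for the idea of "limiting max_vfs".

```python
IGB_VF_LIMIT = 7  # the limit that appears in the quoted failing path


def probe(max_vfs, clamp_max_vfs):
    """Return (vfs_allocated_count, vf_data) after the modelled probe."""
    if clamp_max_vfs and max_vfs > IGB_VF_LIMIT:
        max_vfs = IGB_VF_LIMIT  # the idea of the fix: limit max_vfs itself

    # igb_sw_init: the count is clamped on its own
    vfs_allocated_count = min(max_vfs, IGB_VF_LIMIT)

    # igb_enable_sriov(max_vfs): refuses anything above the limit
    if max_vfs > IGB_VF_LIMIT:
        vf_data = None  # models the -EPERM path, nothing allocated
    else:
        vf_data = [{} for _ in range(max_vfs)]

    return vfs_allocated_count, vf_data
```

Without the clamp, `probe(8, False)` leaves a count of 7 with no vf_data behind it, which is the state that the later reset trips over.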




Re: [PATCH] cifs: Rename cERROR and cifserror to cifs_vfs_err

2013-03-13 Thread Jeff Layton
On Wed, 13 Mar 2013 04:36:54 -0700
Joe Perches  wrote:

> On Tue, 2013-03-12 at 15:44 -0700, Joe Perches wrote:
> > The cERROR macro is always used as cERROR(1, and cifserror
> > is just a printk(KERN_ERR "CIFS VFS: ".
> > 
> > Make a cifs_vfs_err function that uses the vsprintf %pV
> > extension to avoid duplicating the "CIFS VFS: " prefix.
> > 
> > Remove the cERROR macro and use cifs_vfs_err directly.
> 
> Perhaps a better idea than this patch is to
> change both the cERROR and cFYI macros to
> a new use of cifs_dbg(type, fmt, ...)
> 
> cERROR(set, fmt, ...)   -> cifs_dbg(VFS, fmt, ...)
> cFYI(set, fmt, ...) -> cifs_dbg(FYI, fmt, ...)
> 
> This conversion would mark both these macros
> as debug statements as they are only enabled
> with CONFIG_CIFS_DEBUG.
> 
> Also CONFIG_CIFS_DEBUG2 use of DBG could also
> be integrated with the same style.
> 
> cFYI(DBG2, fmt, ...)-> cifs_dbg(NOISY, fmt, ...)
> 
> The reduced object size would still apply.
> 
> This would also enable an easier conversion to
> dynamic debugging of these debug macros.
> 
> I'd prefer to move the newline from the macro
> to the format as that is more consistent with
> the rest of the kernel.
> 
> Thoughts?
> 

I like this change overall, but the size of the patch is pretty
daunting. If you could change the code that underlies cERROR() and
cFYI() without needing to touch all of their call sites, it might be
a simpler initial step.

OTOH, I would also prefer to move the newline into the format and
that's impossible without touching most of these call sites.
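
To make the shape of the proposal concrete, here is a rough userspace sketch of a single typed entry point with the newline supplied by the caller. It is not the kernel implementation: demo_dbg, the DEMO_* types and the two enable switches are all invented here, and it formats into a caller-supplied buffer instead of calling printk.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical message types mirroring VFS / FYI / NOISY above. */
enum demo_dbg_type { DEMO_FYI, DEMO_VFS, DEMO_NOISY };

/* Runtime switches standing in for the debug configuration. */
static int demo_fyi_enabled;
static int demo_noisy_enabled;

/*
 * Format one message into out. VFS messages get the "CIFS VFS: "
 * prefix; FYI and NOISY messages are dropped unless enabled. The
 * caller puts the trailing newline in fmt, as proposed above.
 * Returns 1 if a message was emitted, 0 if it was suppressed.
 */
static int demo_dbg(char *out, size_t len, enum demo_dbg_type type,
                    const char *fmt, ...)
{
    va_list ap;
    int used = 0;

    if (len == 0)
        return 0;
    out[0] = '\0';

    if (type == DEMO_FYI && !demo_fyi_enabled)
        return 0;
    if (type == DEMO_NOISY && !demo_noisy_enabled)
        return 0;

    if (type == DEMO_VFS)
        used = snprintf(out, len, "CIFS VFS: ");
    if (used < 0 || (size_t)used >= len)
        return 1;   /* prefix failed or already filled the buffer */

    va_start(ap, fmt);
    vsnprintf(out + used, len - (size_t)used, fmt, ap);
    va_end(ap);
    return 1;
}
```

A call site would then read demo_dbg(buf, sizeof(buf), DEMO_VFS, "bad rc %d\n", rc), with the "\n" living in the format string.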

-- 
Jeff Layton 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!

2013-03-13 Thread Jeff Layton
On Wed, 6 Mar 2013 13:40:16 -0800
Tejun Heo  wrote:

> On Wed, Mar 06, 2013 at 01:36:36PM -0800, Tejun Heo wrote:
> > On Wed, Mar 06, 2013 at 01:31:10PM -0800, Linus Torvalds wrote:
> > > So I do agree that we probably have *too* many of the stupid "let's
> > > check if we can freeze", and I suspect that the NFS code should get
> > > rid of the "freezable_schedule()" that is causing this warning
> > > (because I also agree that you should *not* freeze while holding
> > > locks, because it really can cause deadlocks), but I do suspect that
> > > network filesystems do need to have a few places where they check for
> > > freezing on their own... Exactly because freezing isn't *quite* like a
> > > signal.
> > 
> > Well, I don't really know much about nfs so I can't really tell, but
> > for most other cases, dealing with freezing like a signal should work
> > fine from what I've seen although I can't be sure before actually
> > trying.  Trond, Bruce, can you guys please chime in?
> 
> So, I think the question here would be, in nfs, how many of the
> current freezer check points would be difficult to convert to signal
> handling model after excluding the ones which are performed while
> holding some locks which we need to get rid of anyway?
> 

I think we can do this, but it isn't trivial. Here's what I'd envision,
but there are still many details that would need to be worked out...

Basically what we'd need is a way to go into TASK_KILLABLE sleep, but
still allow the freezer to wake these processes up. I guess that likely
means we need a new sleep state (TASK_FREEZABLE?).

We'd also need a way to return an -ERESTARTSYS like error (-EFREEZING?)
that tells the upper syscall handling layers to freeze the task and
then restart the syscall after waking back up. Maybe we don't need a
new error at all and -ERESTARTSYS is fine here? We also need to
consider the effects vs. audit code here, but that may be OK with the
overhaul that Al merged a few months ago.

Assuming we have those, then we need to fix up the NFS and RPC layers
to use this stuff:

1/ Prior to submitting a new RPC, we'd look and see whether "current"
is being frozen. If it is, then return -EFREEZING immediately without
doing anything.

2/ We might also need to retrofit certain stages in the RPC FSM to
return -EFREEZING too if it's a synchronous RPC and the task is being
frozen.

3/ If a task is waiting for an RPC reply on an async RPC, we'd need to use
this new sleep state. If the process wakes up because something wants
it to freeze, then have it go back to sleep for a short period of time
to try and wait for the reply (up to 15s or so?). If we get the reply,
great -- return to userland and freeze the task there. If the reply
never comes in, give up on it and just return -EFREEZING and hope for the
best.
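
As a very rough sketch of the decision logic in steps 1/ and 3/ (userspace only, nothing here exists in the kernel): the demo_* names are invented, the "freezing" flag stands in for whatever check the freezer would provide, and the grace period is reduced to a boolean saying whether the reply arrived before it ran out.

```c
#include <stdbool.h>

/* Hypothetical outcomes; DEMO_FREEZE_AND_RESTART plays the role of
 * the -EFREEZING idea (freeze the task, restart the syscall later). */
enum demo_outcome {
    DEMO_PROCEED,
    DEMO_COMPLETED,
    DEMO_KEEP_WAITING,
    DEMO_FREEZE_AND_RESTART
};

struct demo_task {
    bool freezing;   /* the freezer wants this task to stop */
};

/* Step 1/: do not start a new RPC while the task is being frozen. */
static enum demo_outcome demo_before_submit(const struct demo_task *task)
{
    return task->freezing ? DEMO_FREEZE_AND_RESTART : DEMO_PROCEED;
}

/*
 * Step 3/: a task already waiting on a reply gets a grace period.
 * reply_arrived says whether the reply showed up before it expired.
 */
static enum demo_outcome demo_after_grace(const struct demo_task *task,
                                          bool reply_arrived)
{
    if (reply_arrived)
        return DEMO_COMPLETED;            /* finish, freeze in userland */
    if (task->freezing)
        return DEMO_FREEZE_AND_RESTART;   /* give up on the reply */
    return DEMO_KEEP_WAITING;             /* normal wait continues */
}
```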

We might have to make this latter behavior contingent on a new mount
option (maybe "-o slushy" like Trond recommended). The current "hard"
and "soft" semantics don't really fit this situation correctly.

Of course, this is all a lot of work, and not something we can shove
into the kernel for 3.9 at this point. In the meantime, while Mandeep's
warning is correctly pointing out a problem, I think we ought to back
it out until we can fix this properly.

We're already getting a ton of reports on the mailing lists and in the
fedora bug tracker for this warning. Part of the problem is the
verbiage -- "BUG" makes people think "Oops", but this is really just a
warning.

We should also note that this is a problem too in the CIFS code since
it uses a similar mechanism for allowing the kernel to suspend while
waiting on SMB replies.

-- 
Jeff Layton 


Re: [PATCH] cifs: Rename cERROR and cFYI to cifs_dbg

2013-03-15 Thread Jeff Layton
On Thu, 14 Mar 2013 12:24:37 -0700
Joe Perches  wrote:

> It's not obvious from reading the macro names that these macros
> are for debugging.  Convert the names to a single more typical
> kernel style cifs_dbg macro.
> 
>   cERROR(1, ...)   -> cifs_dbg(VFS, ...)
>   cFYI(1, ...) -> cifs_dbg(FYI, ...)
>   cFYI(DBG2, ...)  -> cifs_dbg(NOISY, ...)
> 
> Move the terminating format newline from the macro to the call site.
> 
> Add CONFIG_CIFS_DEBUG function cifs_vfs_err to emit the
> "CIFS VFS: " prefix for VFS messages.
> 
> Size is reduced ~ 1% when CONFIG_CIFS_DEBUG is set (default y)
> 
> $ size fs/cifs/cifs.ko*
>    text    data     bss     dec     hex filename
>  265245    2525     132  267902   4167e fs/cifs/cifs.ko.new
>  268359    2525     132  271016   422a8 fs/cifs/cifs.ko.old
> 


This all looks like good stuff. I am a bit concerned about mashing all
of these cleanups into the same patch though.
> Other miscellaneous changes around these conversions:
> 
> o Miscellaneous typo fixes
> o Add terminating \n's to almost all formats and remove them
>   from the macros to be more kernel style like.  A few formats
>   previously had defective \n's
> o Remove unnecessary OOM messages as kmalloc() calls dump_stack
> o Coalesce formats to make grep easier,
>   added missing spaces when coalescing formats
> o Use %s, __func__ instead of embedded function name
> o Removed unnecessary "cifs: " prefixes
> o Convert kzalloc with multiply to kcalloc
> o Remove unused cifswarn macro
> 


-- 
Jeff Layton 


Re: [PATCH] cifs: Rename cERROR and cFYI to cifs_dbg

2013-03-15 Thread Jeff Layton
On Thu, 14 Mar 2013 12:24:37 -0700
Joe Perches  wrote:

> It's not obvious from reading the macro names that these macros
> are for debugging.  Convert the names to a single more typical
> kernel style cifs_dbg macro.
> 
>   cERROR(1, ...)   -> cifs_dbg(VFS, ...)
>   cFYI(1, ...) -> cifs_dbg(FYI, ...)
>   cFYI(DBG2, ...)  -> cifs_dbg(NOISY, ...)
> 
> Move the terminating format newline from the macro to the call site.
> 
> Add CONFIG_CIFS_DEBUG function cifs_vfs_err to emit the
> "CIFS VFS: " prefix for VFS messages.
> 
> Size is reduced ~ 1% when CONFIG_CIFS_DEBUG is set (default y)
> 
> $ size fs/cifs/cifs.ko*
>    text    data     bss     dec     hex filename
>  265245    2525     132  267902   4167e fs/cifs/cifs.ko.new
>  268359    2525     132  271016   422a8 fs/cifs/cifs.ko.old
> 

(my apologies -- my MUA has a mind of its own sometimes)

This all looks like good stuff. I am a bit concerned about mashing all
of these cleanups into the same patch though.

> Other miscellaneous changes around these conversions:
> 
> o Miscellaneous typo fixes
> o Add terminating \n's to almost all formats and remove them
>   from the macros to be more kernel style like.  A few formats
>   previously had defective \n's
> o Remove unnecessary OOM messages as kmalloc() calls dump_stack
> o Coalesce formats to make grep easier,
>   added missing spaces when coalescing formats
> o Use %s, __func__ instead of embedded function name
> o Removed unnecessary "cifs: " prefixes

> o Convert kzalloc with multiply to kcalloc
^^^
Things like this really ought to be a separate patch, even though it is
a trivial change. That's a minor nit though...
 
> o Remove unused cifswarn macro
> 

I think we ought to go ahead and take this for 3.10. I do have some
minor concern about having to deal with backports of later patches to
kernels that don't have these changes, but hey, that's the price of
dealing with old kernels.

The sooner Steve merges this into his for-next tree, the better. This
is bound to give us all sorts of merge conflicts for the 3.10 window, so
we want to make sure that people know what to base their work on.

Acked-by: Jeff Layton 


Re: [PATCH] Remove CONFIG_EXPERIMENTAL

2012-08-28 Thread Jeff Garzik
On Mon, Aug 27, 2012 at 5:53 PM, Kees Cook  wrote:
> This config item has not carried much meaning for a while now and is
> almost always enabled by default. Remove it and adjust various config
> logic and documentation.

It does have meaning...  !CONFIG_EXPERIMENTAL means more stable.  In
the past things would get CONFIG_EXPERIMENTAL until they've been tried
in the field or otherwise hit some goal in the developer's mind.

Is this a practical distinction?  Probably not, as the markers often
go unmaintained...

Jeff


Re: [PATCH] NFS: Fix Oopses in nfs_lookup_revalidate and nfs4_lookup_revalidate

2012-08-28 Thread Jeff Layton
On Mon, 27 Aug 2012 13:23:11 -0700
Greg KH  wrote:

> On Mon, Aug 27, 2012 at 08:16:09PM +, Myklebust, Trond wrote:
> > On Mon, 2012-08-27 at 13:09 -0700, Greg KH wrote:
> > > On Wed, Aug 22, 2012 at 04:08:17PM -0400, Trond Myklebust wrote:
> > > > Fix the following Oops in 3.5.1:
> > > > 
> > > >  BUG: unable to handle kernel NULL pointer dereference at 
> > > > 0038
> > > >  IP: [] nfs_lookup_revalidate+0x2d/0x480 [nfs]
> > > >  PGD 337c63067 PUD 0
> > > >  Oops:  [#1] SMP
> > > >  CPU 5
> > > >  Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc 
> > > > af_packet binfmt_misc cpufreq_conservative cpufreq_userspace 
> > > > cpufreq_powersave dm_mod acpi_cpufreq mperf coretemp gpio_ich kvm_intel 
> > > > joydev kvm ioatdma hid_generic igb lpc_ich i7core_edac edac_core ptp 
> > > > serio_raw dca pcspkr i2c_i801 mfd_core sg pps_core usbhid crc32c_intel 
> > > > microcode button autofs4 uhci_hcd ttm drm_kms_helper drm i2c_algo_bit 
> > > > sysimgblt sysfillrect syscopyarea ehci_hcd usbcore usb_common 
> > > > scsi_dh_rdac scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh edd fan 
> > > > ata_piix thermal processor thermal_sys
> > > > 
> > > >  Pid: 30431, comm: java Not tainted 3.5.1-2-default #1 Supermicro 
> > > > X8DTT/X8DTT
> > > >  RIP: 0010:[]  [] 
> > > > nfs_lookup_revalidate+0x2d/0x480 [nfs]
> > > >  RSP: 0018:8801b418bd38  EFLAGS: 00010292
> > > >  RAX: fff6 RBX: 88032016d800 RCX: 0020
> > > >  RDX:  RSI:  RDI: 8801824a7b00
> > > >  RBP: 8801b418bdf8 R08: 7f0034323030 R09: f04c03ed
> > > >  R10: 8801824a7b00 R11: 0002 R12: 8801824a7b00
> > > >  R13: 8801824a7b00 R14:  R15: 8803201725d0
> > > >  FS:  2b53a46cb700() GS:88033fc2() 
> > > > knlGS:
> > > >  CS:  0010 DS:  ES:  CR0: 80050033
> > > >  CR2: 0038 CR3: 00020a426000 CR4: 07e0
> > > >  DR0:  DR1:  DR2: 
> > > >  DR3:  DR6: 0ff0 DR7: 0400
> > > >  Process java (pid: 30431, threadinfo 8801b418a000, task 
> > > > 8801b5d20600)
> > > >  Stack:
> > > >   8801b418be44 88032016d800 8801b418bdf8 
> > > >   8801824a7b00 8801b418bdd7 8803201725d0 8116a9c0
> > > >   8801b5c38dc0 0007 88032016d800 
> > > >  Call Trace:
> > > >   [] lookup_dcache+0x80/0xe0
> > > >   [] __lookup_hash+0x23/0x90
> > > >   [] lookup_one_len+0xc5/0x100
> > > >   [] nfs_sillyrename+0xe3/0x210 [nfs]
> > > >   [] vfs_unlink.part.25+0x7f/0xe0
> > > >   [] do_unlinkat+0x1ac/0x1d0
> > > >   [] system_call_fastpath+0x16/0x1b
> > > >   [<2b5348b5f527>] 0x2b5348b5f526
> > > >  Code: ec 38 b8 f6 ff ff ff 4c 89 64 24 18 4c 89 74 24 28 49 89 fc 48 
> > > > 89 5c 24 08 48 89 6c 24 10 49 89 f6 4c 89 6c 24 20 4c 89 7c 24 30  
> > > > 46 38 40 0f 85 d1 00 00 00 e8 c4 c4 df e0 48 8b 58 30 49 89
> > > >  RIP  [] nfs_lookup_revalidate+0x2d/0x480 [nfs]
> > > >   RSP 
> > > >  CR2: 0038
> > > >  ---[ end trace 845113ed191985dd ]---
> > > > 
> > > > This Oops affects 3.5 kernels and older, and is due to lookup_one_len()
> > > > calling down to the dentry revalidation code with a NULL pointer
> > > > to struct nameidata.
> > > > 
> > > > It is fixed upstream by commit 0b728e1911c (stop passing nameidata *
> > > > to ->d_revalidate())
> > > 
> > > So this is just a nfs-only backport of the larger patch 0b728e1911c,
> > > right?  Should we also do this for other filesystems as well?  Or just
> > > backport the whole commit?
> > 
> > The larger patch involves a VFS api change (the atomic open code) which
> > has a bunch of pre- and post-requirements. I'd assume that is a too
> > large change for stable. I think that the smaller per-filesystem changes
> > are probably more appropriate. The list of filesystems that care are
> > likely to be small. Off the top of my head, I can only think of NFS,
> > CIFS, FUSE and possibly ceph.
> 
> Ok, I'll take this one for NFS, care to break this up also for FUSE and
> CIFS and send me a patch for it?
> 

A similar problem was already fixed quite some time ago in cifs in
commit f5bc1e755d, shortly after the RCU lookup code went in.
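
For reference, the class of bug is a revalidate routine dereferencing a nameidata pointer that some callers (lookup_one_len() here) pass as NULL. A minimal userspace sketch of the guard follows; the demo_* names, the flag value and the error value are placeholders for illustration, not the kernel's definitions, and this is not the actual patch.

```c
#include <stddef.h>

#define DEMO_LOOKUP_RCU 0x1u   /* placeholder flag bit */
#define DEMO_ECHILD     10     /* placeholder error value */

/* Hypothetical stand-in for the lookup context passed to revalidate. */
struct demo_nameidata {
    unsigned int flags;
};

/*
 * Returns -DEMO_ECHILD when asked to revalidate in RCU mode, and 1
 * ("dentry still valid") otherwise. The nd != NULL test is the point:
 * callers without a lookup context must not cause a dereference.
 */
static int demo_revalidate(const struct demo_nameidata *nd)
{
    if (nd != NULL && (nd->flags & DEMO_LOOKUP_RCU))
        return -DEMO_ECHILD;
    return 1;
}
```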

-- 
Jeff Layton 


Re: [PATCH] block: Fix bad range check in bio_sector_offset

2012-08-29 Thread Jeff Moyer
"Martin K. Petersen"  writes:

> DM would occasionally end up splitting data integrity-enabled requests
> incorrectly. The culprit was a bad range check in bio_sector_offset.

The patch looks ok to me, but what is the user visible behavior when
this happens?

Cheers,
Jeff


Re: [PATCH] block: Fix bad range check in bio_sector_offset

2012-08-29 Thread Jeff Moyer
"Martin K. Petersen"  writes:

>>>>>> "Jeff" == Jeff Moyer  writes:
>
>>> DM would occasionally end up splitting data integrity-enabled
>>> requests incorrectly. The culprit was a bad range check in
>>> bio_sector_offset.
>
> Jeff> The patch looks ok to me, but what is the user visible behavior
> Jeff> when this happens?
>
> We'd occasionally end up mapping a bad integrity scatterlist and the HBA
> would abort the I/O with a protection information error.

Thanks for the explanation, Martin!

Acked-by: Jeff Moyer 


Re: WARNING: at fs/inode.c:280 drop_nlink+0x31/0x33()

2012-08-29 Thread Jeff Layton
On Wed, 29 Aug 2012 09:25:27 -0700
Nick Pasich  wrote:

> 
> I'm using kernel 3.5.3 ... 
> 
> It happens on 3.5.1 and 3.5.2 also.
> 
> I know that Nick Bowler has already reported this...
> 
> I'm experiencing the same thing.
> 
> It happens when moving files from one directory to another
> on the same partition (NFS). 
> 
>   --( Nick Pasich )--
> 
> 
> #
> ##
> ## Happens when PSTs are moved from one directory to another on the ISCSI ...
> ##
> #
> 
> Aug 29 08:06:16 localhost kernel: [ cut here ]
> Aug 29 08:06:16 localhost kernel: WARNING: at fs/inode.c:280 
> drop_nlink+0x31/0x33()
> Aug 29 08:06:16 localhost kernel: Hardware name: To Be Filled By O.E.M.
> Aug 29 08:06:16 localhost kernel: Modules linked in: ecb md4 cifs w83627hf 
> eeprom asb100 hwmon_vid hwmon nfsd exportfs ipv6 psmouse usb_storage 
> io_edgeport usbserial sg r8169 mii evdev intel_agp uhci_hcd i2c_i801 i2c_core 
> shpchp intel_gtt agpgart ehci_hcd microcode serio_raw
> Aug 29 08:06:16 localhost kernel: Pid: 31477, comm: rm Tainted: GW
> 3.5.3 #1
> Aug 29 08:06:16 localhost kernel: Call Trace:
> Aug 29 08:06:16 localhost kernel:  [] ? drop_nlink+0x31/0x33
> Aug 29 08:06:16 localhost kernel:  [] ? 
> warn_slowpath_common+0x7b/0x90
> Aug 29 08:06:16 localhost kernel:  [] ? drop_nlink+0x31/0x33
> Aug 29 08:06:16 localhost kernel:  [] ? warn_slowpath_null+0x1b/0x1f
> Aug 29 08:06:16 localhost kernel:  [] ? drop_nlink+0x31/0x33
> Aug 29 08:06:16 localhost kernel:  [] ? cifs_unlink+0x134/0x63d 
> [cifs]
> Aug 29 08:06:16 localhost kernel:  [] ? dput+0x11/0x117
> Aug 29 08:06:16 localhost kernel:  [] ? mntput_no_expire+0xf/0xf7
> Aug 29 08:06:16 localhost kernel:  [] ? vfs_unlink+0x4e/0xb6
> Aug 29 08:06:16 localhost kernel:  [] ? __lookup_hash+0x54/0xac
> Aug 29 08:06:16 localhost kernel:  [] ? do_unlinkat+0x10a/0x12d
> Aug 29 08:06:16 localhost kernel:  [] ? sys_ioctl+0x34/0x57
> Aug 29 08:06:16 localhost kernel:  [] ? syscall_call+0x7/0xb
> Aug 29 08:06:16 localhost kernel: ---[ end trace 756b427e3bd671f9 ]---
> 

(cc'ing linux-cifs ml)

This stack trace comes from cifs, not nfs.

Steve French has a patch queued in his tree to silence this warning
that I believe he intends to send to Linus for 3.6. Perhaps we should
consider backporting it for 3.5.z too?

-- 
Jeff Layton 


Re: Drop support for x86-32

2012-08-29 Thread Jeff Garzik
On Wed, Aug 29, 2012 at 7:03 PM, Mark Lord  wrote:
> On 12-08-26 10:15 AM, wbrana wrote:
>> On 8/26/12, Mark Lord  wrote:
>>> Here are a couple of real scenarios you don't seem to have thought about.
>>> A 32-bit kernel on a legacy (or even new) system in 2017 will still need
>>> regular kernel updates (not "long term" unmaintained kernels)
>>> in order to work with new USB devices, new 4KB+ sector hard drives,
>>> newer generations of SSDs, etc..
>> 12-years-old machine is trash.
>
> There you go making assumptions again.
> Who said anything about a 12-year old machine?
>
> Much more likely is a 5-year old software installation
> that gets moved to a new box.

Or a brand new software installation into a 32-bit virtual machine.

 Jeff


Re: [QUESTION] about NFS sub system between Public Kernel and Red Hat Kernel.

2012-08-31 Thread Jeff Layton
On Fri, 31 Aug 2012 13:40:16 +0800
gchen  wrote:

> Hi linux-...@vger.kernel.org
> 
> I have 1 question, and also 2 conclusions which need confirmation.
> 
> 
> 1) Question:
> 
> Jeff Layton said in Red Hat Bugzilla (bug 848706):
> "Have configuration where the same host is acting as both NFS client
> and server. That's a configuration known to cause deadlocks."
> 
> Does it mean that the public Linux kernel (not Red Hat) also can cause
> deadlocks if NFS client and server are under the same machine ?
> 

Yes.

> 
> 2) Confirm 1: (better by Jeff Layton)
> 
> For function nfs_commit_set_lock in ./fs/nfs/write.c
> 
> for latest public kernel version:
> the parameters of out_of_line_wait_on_bit_lock() are
> (&nfsi->flags, NFS_INO_COMMIT, nfs_wait_killable, TASK_KILLABLE)
> for Red Hat kernel version: kernel-2.6.18-308.4.1.el5
> the parameters of out_of_line_wait_on_bit_lock() are
> (&nfsi->flags, NFS_INO_COMMIT,
> nfs_wait_bit_uninterruptible, TASK_UNINTERRUPTIBLE)
> 
> It means for red hat version:
> when deadlock occurs, we can not boot machine in normal way
> (it is true for my test machine, the deadlock task can not be killed)
> It means for public kernel version:
> "Assume deadlock occurs", we can still boot machine in normal way,
> because the task can be killed.
> 
> Is what I said above correct ?
> 

Not sure I understand your question. RHEL5 doesn't have support for
TASK_KILLABLE, and I didn't backport it, hence the difference in that
function.

> 
> 3) Confirm 2:
> 
> Is LTP (Linux Test Project) still a suitable test tool for the public kernel ?
> (for ltp-full-20100331.gz stress test, it mounts NFS on local machine,
> and the latest LTP ltp-full-20120401.bz2 also seems the same).
> 

That I'm not sure of. All I can tell you is that mounts over loopback
(or similar configurations) are easily deadlockable under load.

-- 
Jeff Layton 


Re: WARNING: at fs/inode.c:280 drop_nlink+0x31/0x33()

2012-08-31 Thread Jeff Layton
On Fri, 31 Aug 2012 08:32:06 -0700
Nick Pasich  wrote:

> On Fri, Aug 31, 2012 at 12:00:26PM +0400, Pavel Shilovsky wrote:
> > 2012/8/31 Nick Pasich :
> > > Jeff,
> > >
> > > > I applied this patch to Kernel 3.5.3 from Pavel and the
> > > > warning is gone with no problems.
> > >
> > > Thanks,
> > >
> > > --( Nick Pasich
> > >
> > > ##
> > >
> > > From df2d6b1fbf2401c5ee04f2ac143ea0954e3a87a6 Mon Sep 17 00:00:00 2001
> > > From: Pavel Shilovsky 
> > > Date: Fri, 13 Jul 2012 11:59:45 +0400
> > > Subject: [PATCH] CIFS: Protect i_nlink from being negative
> > >
> > > that can cause warning messages.
> > >
> > > Signed-off-by: Pavel Shilovsky 
> > > ---
> > >  fs/cifs/inode.c |   13 +++--
> > >  1 files changed, 11 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
> > > index 7354877..88afb1a 100644
> > > --- a/fs/cifs/inode.c
> > > +++ b/fs/cifs/inode.c
> > > @@ -1110,6 +1110,15 @@ undo_setattr:
> > > goto out_close;
> > >  }
> > >
> > > +/* copied from fs/nfs/dir.c with small changes */
> > > +static void
> > > +cifs_drop_nlink(struct inode *inode)
> > > +{
> > > +   spin_lock(&inode->i_lock);
> > > +   if (inode->i_nlink > 0)
> > > +   drop_nlink(inode);
> > > +   spin_unlock(&inode->i_lock);
> > > +}
> > >
> > >  /*
> > >   * If dentry->d_inode is null (usually meaning the cached dentry
> > > @@ -1166,13 +1175,13 @@ retry_std_delete:
> > >  psx_del_no_retry:
> > > if (!rc) {
> > > if (inode)
> > > -   drop_nlink(inode);
> > > +   cifs_drop_nlink(inode);
> > > } else if (rc == -ENOENT) {
> > > d_drop(dentry);
> > > } else if (rc == -ETXTBSY) {
> > > rc = 
> > > cifs_rename_pending_delete(full_path, dentry, xid);
> > > if (rc == 0)
> > > -   drop_nlink(inode);
> > > +   cifs_drop_nlink(inode);
> > > } else if ((rc == -EACCES) && (dosattr == 0) && inode) {
> > > attrs = kzalloc(sizeof(*attrs), 
> > > GFP_KERNEL);
> > > if (attrs == NULL) {
> > > --
> > > 1.7.3.3
> > >
> > > ##
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> > > the body of a message to majord...@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > This one fixes only a part of the problem. Now we have another patch
> > for this problem:
> > 
> > https://git.samba.org/sfrench/?p=sfrench/cifs-2.6.git;a=commitdiff;h=b7ca69289680cf631fb20b7d436467c4ec1153cd;hp=6dab7ede9390d4d937cb89feca932e4fd575d2da
> > 
> > -- 
> > Best regards,
> > Pavel Shilovsky.
> 
> 
> 
> Since I'm using kernel 3.5.3 , I get an error on hunk 7 of the patch.
> 
> I can do it by hand... But I want to check with you first.
> 
> Thanks,
> 
> --( Nick Pasich )--
> 

If you fix it up by hand, consider submitting it as a backport for the
stable series as well.
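
For readers skimming the thread, the effect of the guard in the quoted patch can be shown with a small userspace model. This is a sketch under assumptions: demo_drop_nlink() is a simplified stand-in for drop_nlink() (warn when the count is already zero, then decrement anyway), the warnings counter replaces the real WARN, and the i_lock locking from the patch is left out.

```c
/* Hypothetical inode model: a link count plus a warning counter. */
struct demo_inode {
    unsigned int nlink;
    unsigned int warnings;
};

/* Simplified model of drop_nlink(): warn on zero, decrement anyway. */
static void demo_drop_nlink(struct demo_inode *inode)
{
    if (inode->nlink == 0)
        inode->warnings++;
    inode->nlink--;
}

/* Guarded variant in the spirit of cifs_drop_nlink(); no locking here. */
static void demo_guarded_drop_nlink(struct demo_inode *inode)
{
    if (inode->nlink > 0)
        demo_drop_nlink(inode);
}
```

Calling the unguarded helper on a zero count both warns and wraps the unsigned counter; the guarded one leaves the inode untouched.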

-- 
Jeff Layton 


Re: [PATCH 0/4] Fix a crash when block device is read and block size is changed at the same time

2012-08-31 Thread Jeff Moyer
Mikulas Patocka  writes:

> On Fri, 31 Aug 2012, Mikulas Patocka wrote:
>
>> Hi
>> 
>> This is a series of patches to prevent a crash when someone is 
>> reading block device and block size is changed simultaneously. (the crash 
>> is already happening in the production environment)
>> 
>> The first patch adds a rw-lock to struct block_device, but doesn't use the 
>> lock anywhere. The reason why I submit this as a separate patch is that on 
>> my computer adding an unused field to this structure affects performance 
>> much more than any locking changes.
>> 
>> The second patch uses the rw-lock. The lock is locked for read when doing 
>> I/O on the block device and it is locked for write when changing block 
>> size.
>> 
>> The third patch converts the rw-lock to a percpu rw-lock for better 
>> performance, to avoid cache line bouncing.
>> 
>> The fourth patch is an alternate percpu rw-lock implementation using RCU 
>> by Eric Dumazet. It avoids any atomic instruction in the hot path.
>> 
>> Mikulas
>
> I tested performance of patches. I created 4GB ramdisk, I initially filled 
> it with zeros (so that ramdisk allocation-on-demand doesn't affect the 
> results).
>
> I ran fio to perform 8 concurrent accesses on 8 core machine (two 
> Barcelona Opterons):
> time fio --rw=randrw --size=4G --bs=512 --filename=/dev/ram0 --direct=1 
> --name=job1 --name=job2 --name=job3 --name=job4 --name=job5 --name=job6 
> --name=job7 --name=job8
>
> The results actually show that the size of struct block_device and 
> alignment of subsequent fields in struct inode have far more effect on 
> the result than the type of locking used. (struct inode is placed just after 
> struct block_device in "struct bdev_inode" in fs/block_dev.c)
>
> plain kernel 3.5.3: 57.9s
> patch 1: 43.4s
> patches 1,2: 43.7s
> patches 1,2,3: 38.5s
> patches 1,2,3,4: 58.6s
>
> You can see that patch 1 improves the time by 14.5 seconds, but all that 
> patch 1 does is add an unused structure field.
>
> Patch 3 is 4.9 seconds faster than patch 1, although patch 1 does no 
> locking at all and patch 3 does per-cpu locking. So, the reason for the 
> speedup is the different size of struct block_device (and subsequently, 
> different alignment of struct inode), rather than locking improvement.

How many runs did you do?  Did you see much run to run variation?

> I would be interested if other people did performance testing of the 
> patches too.

I'll do some testing next week, but don't expect to get to it before
Wednesday.
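
In case it helps anyone reproduce the layout effect Mikulas mentions, here is a small userspace sketch of how growing the first member of a container shifts the member that follows it. The struct names and field sizes are invented and far smaller than the real struct block_device and struct inode; it only illustrates the offset shift and says nothing about cache-line behaviour on any particular machine.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct inode. */
struct demo_inode {
    long fields[4];
};

/* Stand-ins for struct block_device without and with the extra field. */
struct demo_bdev_plain {
    long state[5];
};

struct demo_bdev_locked {
    long state[5];
    long unused_lock;   /* the added, otherwise unused field */
};

/* Same container shape as bdev_inode: block device first, inode after. */
struct demo_bdev_inode_plain {
    struct demo_bdev_plain bdev;
    struct demo_inode vfs_inode;
};

struct demo_bdev_inode_locked {
    struct demo_bdev_locked bdev;
    struct demo_inode vfs_inode;
};

/* Byte offset of the embedded inode in each variant. */
static size_t demo_inode_offset_plain(void)
{
    return offsetof(struct demo_bdev_inode_plain, vfs_inode);
}

static size_t demo_inode_offset_locked(void)
{
    return offsetof(struct demo_bdev_inode_locked, vfs_inode);
}
```

Printing the two offsets, or running pahole on the real structures, shows which inode fields end up sharing a cache line in each build.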

Cheers,
Jeff


Re: [PATCH 15/16] ARM: samsung: move platform_data definitions

2012-09-11 Thread Jeff Garzik

On 09/11/2012 09:02 AM, Arnd Bergmann wrote:

Platform data for device drivers should be defined in
include/linux/platform_data/*.h, not in the architecture
and platform specific directories.

This moves such data out of the samsung include directories

Signed-off-by: Arnd Bergmann 
Cc: Kukjin Kim 
Cc: Kyungmin Park 
Cc: Ben Dooks 
Cc: Mark Brown 
Cc: Jeff Garzik 
Cc: Guenter Roeck 
Cc: "Wolfram Sang (embedded platforms)" 
Cc: Dmitry Torokhov 
Cc: Bryan Wu 
Cc: Richard Purdie 
Cc: Sylwester Nawrocki 
Cc: Mauro Carvalho Chehab 
Cc: Chris Ball 
Cc: David Woodhouse 
Cc: Grant Likely 
Cc: Felipe Balbi 
Cc: Greg Kroah-Hartman 
Cc: Alan Stern 
Cc: Sangbeom Kim 
Cc: Liam Girdwood 
Cc: linux-samsung-...@vger.kernel.org
---
  arch/arm/mach-exynos/dev-audio.c   |2 +-
  arch/arm/mach-exynos/dev-ohci.c|2 +-
  arch/arm/mach-exynos/mach-nuri.c   |6 +++---
  arch/arm/mach-exynos/mach-origen.c |6 +++---
  arch/arm/mach-exynos/mach-smdk4x12.c   |2 +-
  arch/arm/mach-exynos/mach-smdkv310.c   |6 +++---
  arch/arm/mach-exynos/mach-universal_c210.c |4 ++--
  arch/arm/mach-exynos/setup-i2c0.c  |2 +-
  arch/arm/mach-exynos/setup-i2c1.c  |2 +-
  arch/arm/mach-exynos/setup-i2c2.c  |2 +-
  arch/arm/mach-exynos/setup-i2c3.c  |2 +-
  arch/arm/mach-exynos/setup-i2c4.c  |2 +-
  arch/arm/mach-exynos/setup-i2c5.c  |2 +-
  arch/arm/mach-exynos/setup-i2c6.c  |2 +-
  arch/arm/mach-exynos/setup-i2c7.c  |2 +-
  arch/arm/mach-s3c24xx/common-smdk.c|4 ++--
  arch/arm/mach-s3c24xx/mach-amlm5900.c  |2 +-
  arch/arm/mach-s3c24xx/mach-anubis.c|6 +++---
  arch/arm/mach-s3c24xx/mach-at2440evb.c |6 +++---
  arch/arm/mach-s3c24xx/mach-bast.c  |8 
  arch/arm/mach-s3c24xx/mach-gta02.c |   10 +-
  arch/arm/mach-s3c24xx/mach-h1940.c |8 
  arch/arm/mach-s3c24xx/mach-jive.c  |6 +++---
  arch/arm/mach-s3c24xx/mach-mini2440.c  |   10 +-
  arch/arm/mach-s3c24xx/mach-n30.c   |8 
  arch/arm/mach-s3c24xx/mach-nexcoder.c  |2 +-
  arch/arm/mach-s3c24xx/mach-osiris.c|4 ++--
  arch/arm/mach-s3c24xx/mach-otom.c  |2 +-
  arch/arm/mach-s3c24xx/mach-qt2410.c|8 
  arch/arm/mach-s3c24xx/mach-rx1950.c|   10 +-
  arch/arm/mach-s3c24xx/mach-rx3715.c|2 +-
  arch/arm/mach-s3c24xx/mach-smdk2410.c  |2 +-
  arch/arm/mach-s3c24xx/mach-smdk2413.c  |4 ++--
  arch/arm/mach-s3c24xx/mach-smdk2416.c  |8 
  arch/arm/mach-s3c24xx/mach-smdk2440.c  |2 +-
  arch/arm/mach-s3c24xx/mach-smdk2443.c  |2 +-
  arch/arm/mach-s3c24xx/mach-tct_hammer.c|2 +-
  arch/arm/mach-s3c24xx/mach-vr1000.c|6 +++---
  arch/arm/mach-s3c24xx/mach-vstms.c |4 ++--
  arch/arm/mach-s3c24xx/setup-i2c.c  |2 +-
  arch/arm/mach-s3c24xx/simtec-audio.c   |2 +-
  arch/arm/mach-s3c24xx/simtec-usb.c |2 +-
  arch/arm/mach-s3c64xx/dev-audio.c  |2 +-
  arch/arm/mach-s3c64xx/mach-anw6410.c   |2 +-
  arch/arm/mach-s3c64xx/mach-crag6410-module.c   |2 +-
  arch/arm/mach-s3c64xx/mach-crag6410.c  |4 ++--
  arch/arm/mach-s3c64xx/mach-hmt.c   |4 ++--
  arch/arm/mach-s3c64xx/mach-mini6410.c  |4 ++--
  arch/arm/mach-s3c64xx/mach-ncp.c   |2 +-
  arch/arm/mach-s3c64xx/mach-real6410.c  |4 ++--
  arch/arm/mach-s3c64xx/mach-smartq.c|8 
  arch/arm/mach-s3c64xx/mach-smdk6400.c  |2 +-
  arch/arm/mach-s3c64xx/mach-smdk6410.c  |6 +++---
  arch/arm/mach-s3c64xx/setup-i2c0.c |2 +-
  arch/arm/mach-s3c64xx/setup-i2c1.c |2 +-
  arch/arm/mach-s3c64xx/setup-ide.c  |2 +-
  arch/arm/mach-s5p64x0/dev-audio.c  |2 +-
  arch/arm/mach-s5p64x0/mach-smdk6440.c  |4 ++--
  arch/arm/mach-s5p64x0/mach-smdk6450.c  |4 ++--
  arch/arm/mach-s5p64x0/setup-i2c0.c |2 +-
  arch/arm/mach-s5p64x0/setup-i2c1.c |2 +-
  arch/arm/mach-s5pc100/dev-audio.c  |2 +-
  arch/arm/mach-s5pc100/mach-smdkc100.c  |8 
  arch/arm/mach-s5pc100/setup-i2c0.c |2 +-
  arch/arm/mach-s5pc100/setup-i2c1.c |2 +-
  arch/arm/mach-s5pv210/d

Re: [PATCH 2/2] [trivial] Documentation: broken URL in libata

2012-09-12 Thread Jeff Garzik

On 02/13/2012 12:22 PM, Randy Dunlap wrote:

On 02/13/2012 01:09 AM, Michael Opdenacker wrote:

Fix broken link to license text:
http://www.opensource.org/licenses/osl-1.1.txt
The text for version 1.1 of the Open Software license doesn't seem
to be available anywhere on http://www.opensource.org/ any more.
Replace it with a snapshot from the Internet Wayback Machine.


That's one option.
Too bad opensource.org doesn't provide archives.

OSL v1.1 is also available here:
http://fedoraproject.org/wiki/Licensing:OSL1.1

and here:
http://www.samurajdata.se/opensource/mirror/licenses/osl.php

Jeff, I don't suppose there is any chance of changing this file's license?
(since the Debian people found it to be a problem .. long ago)


Yeah, that's fine...





[git patches] libata new PCI IDs

2012-09-12 Thread Jeff Garzik

Please pull 7b4f6ecacb14f384adc1a5a67ad95eb082c02bd1 from
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git tags/upstream-linus


to receive the following updates:

 drivers/ata/ahci.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

Alan Cox (2):
  ahci: Add alternate identifier for the 88SE9172
  ahci: Add identifiers for ASM106x devices

Ben Hutchings (1):
  ahci: Add JMicron 362 device IDs

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 50d5dea..7862d17 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -268,6 +268,9 @@ static const struct pci_device_id ahci_pci_tbl[] = {
/* JMicron 360/1/3/5/6, match class to avoid IDE function */
{ PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
  PCI_CLASS_STORAGE_SATA_AHCI, 0xff, board_ahci_ign_iferr },
+   /* JMicron 362B and 362C have an AHCI function with IDE class code */
+   { PCI_VDEVICE(JMICRON, 0x2362), board_ahci_ign_iferr },
+   { PCI_VDEVICE(JMICRON, 0x236f), board_ahci_ign_iferr },
 
/* ATI */
{ PCI_VDEVICE(ATI, 0x4380), board_ahci_sb600 }, /* ATI SB600 */
@@ -393,6 +396,8 @@ static const struct pci_device_id ahci_pci_tbl[] = {
  .driver_data = board_ahci_yes_fbs },  /* 88se9125 */
{ PCI_DEVICE(0x1b4b, 0x917a),
  .driver_data = board_ahci_yes_fbs },  /* 88se9172 */
+   { PCI_DEVICE(0x1b4b, 0x9192),
+ .driver_data = board_ahci_yes_fbs },  /* 88se9172 on some Gigabyte */
{ PCI_DEVICE(0x1b4b, 0x91a3),
  .driver_data = board_ahci_yes_fbs },
 
@@ -400,7 +405,10 @@ static const struct pci_device_id ahci_pci_tbl[] = {
{ PCI_VDEVICE(PROMISE, 0x3f20), board_ahci },   /* PDC42819 */
 
/* Asmedia */
-   { PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci },   /* ASM1061 */
+   { PCI_VDEVICE(ASMEDIA, 0x0601), board_ahci },   /* ASM1060 */
+   { PCI_VDEVICE(ASMEDIA, 0x0602), board_ahci },   /* ASM1060 */
+   { PCI_VDEVICE(ASMEDIA, 0x0611), board_ahci },   /* ASM1061 */
+   { PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci },   /* ASM1062 */
 
/* Generic, PCI class code for AHCI */
{ PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
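
For readers wondering why the JMicron 362B/362C need explicit device IDs: as the comment in the hunk above says, the generic JMicron entry matches on the AHCI class code, so a function that reports an IDE class code is not caught by it. Below is a minimal userspace sketch of that matching rule. It is not the PCI core's code; the demo_* names and the cut-down structs are made up for illustration, and the full 0xffffff class mask and the sample IDE class value 0x010185 are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ANY_ID         0xffffffffu  /* wildcard, standing in for PCI_ANY_ID */
#define DEMO_VENDOR_JMICRON 0x197bu
#define DEMO_CLASS_AHCI     0x010601u    /* mass storage / SATA / AHCI */

/* Hypothetical, cut-down ID table entry and device description. */
struct demo_pci_id  { uint32_t vendor, device, class_code, class_mask; };
struct demo_pci_dev { uint32_t vendor, device, class_code; };

static const struct demo_pci_id demo_tbl[] = {
	/* any JMicron device, but only if its class code says AHCI */
	{ DEMO_VENDOR_JMICRON, DEMO_ANY_ID, DEMO_CLASS_AHCI, 0xffffffu },
	/* explicit IDs: class_mask 0 means the class code is ignored */
	{ DEMO_VENDOR_JMICRON, 0x2362, 0, 0 },
	{ DEMO_VENDOR_JMICRON, 0x236f, 0, 0 },
};

/* An entry matches when vendor and device agree (or are wildcards) and the
 * class codes agree in every bit selected by class_mask. */
static bool demo_match_one(const struct demo_pci_id *id,
			   const struct demo_pci_dev *dev)
{
	return (id->vendor == DEMO_ANY_ID || id->vendor == dev->vendor) &&
	       (id->device == DEMO_ANY_ID || id->device == dev->device) &&
	       ((id->class_code ^ dev->class_code) & id->class_mask) == 0;
}

/* Index of the first matching table entry, or -1 if nothing matches. */
static int demo_match(const struct demo_pci_dev *dev)
{
	assert(dev != NULL);
	for (size_t i = 0; i < sizeof(demo_tbl) / sizeof(demo_tbl[0]); i++)
		if (demo_match_one(&demo_tbl[i], dev))
			return (int)i;
	return -1;
}
```

In this sketch a JMicron function with device ID 0x2362 and an IDE class code only matches through its explicit entry, while an IDE-class function with some other device ID matches nothing.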


Re: [GIT PULL] sound fixes for 3.6-rc6

2012-09-13 Thread Jeff King
On Thu, Sep 13, 2012 at 02:28:51PM +0200, Takashi Iwai wrote:

> > FWIW, it was an output from git-pull-request, which fell back to the
> > equivalent branch.  Usually I check it manually but I forgot it at
> > this time just before going to a meeting.
> > 
> > This was with git 1.7.11.5.  I'll check whether this still happens
> > with 1.7.12.
> 
> The same problem still happens with git 1.7.12.
> This is more annoying than useful.

I can't reproduce here. What is your exact request-pull invocation? Is
request-pull showing a warning like:

  warn: You locally have sound-3.6 but it does not (yet)
  warn: appear to be at git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
  warn: Do you want to push it there, perhaps?

(it should do so since v1.7.11.2). Maybe we need to make it possible to
bump that warning to a fatal error?

-Peff


Re: [2.6 patch] remove smbfs

2008-01-30 Thread Jeff Layton
On Wed, 30 Jan 2008 22:16:13 +0100
Guenter Kukkukk <[EMAIL PROTECTED]> wrote:

> Am Montag, 28. Januar 2008 schrieb Adrian Bunk:
> > I remember that there were some small things missing in CIFS for 
> > completely replacing the unmaintained smbfs when we discussed
> > removing smbfs back in 2005 due to smbfs being unmaintained.
> > 
> > CIFS has improved since, smbfs is still unmaintained, and it's
> > becoming time to finally remove smbfs.
> > 
> > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> > 
> 
> "... unmaintained smbfs ..." is not quite right, see
>http://lkml.org/lkml/2007/11/6/94
> Before removing it now completely, drop
>   Jeff Layton <[EMAIL PROTECTED]>
> a note.
> Afaik, Redhat still has customers which rely on smbfs.
> 

Some of our older products use smbfs, but our newer stuff (RHEL5 and
up) has smbfs disabled. Fedora has had smbfs disabled for quite some
time as well. I've heard very few complaints (though maybe they're just
not getting to me).

I have no problem with targeting smbfs for removal, but I thought
Andrew had an unofficial policy that we should first mark things to be
deprecated, and then remove them 2 releases later. That seems like a
sensible policy to me. If we mark it deprecated in 2.6.25 then we can
remove it after 2.6.26 is released.

It might not even hurt to have a nice loud printk when the smbfs
module is plugged in to warn users that it's slated to be removed,
and that they should move to CIFS as soon as possible.

> In addition, cifs cannot completely replace smbfs atm.
> Even NAS boxes sold today (often running ancient 
> samba-2.x.x) work only with smbfs on the client side.

It would be ideal if someone were to report these problems as bugs. I
remember some of those in the past, but haven't heard of any cases of
that sort of thing for some time. When I have, Steve has generally
been very good about tracking down the cause and fixing it.

Cheers,
-- 
Jeff Layton <[EMAIL PROTECTED]>


Re: [2.6 patch] remove smbfs

2008-01-30 Thread Jeff Layton
On Thu, 31 Jan 2008 00:58:10 +0200
Adrian Bunk <[EMAIL PROTECTED]> wrote:

> On Wed, Jan 30, 2008 at 05:41:03PM -0500, Jeff Layton wrote:
> > On Wed, 30 Jan 2008 22:16:13 +0100
> > Guenter Kukkukk <[EMAIL PROTECTED]> wrote:
> > 
> > > Am Montag, 28. Januar 2008 schrieb Adrian Bunk:
> > > > I remember that there were some small things missing in CIFS
> > > > for completely replacing the unmaintained smbfs when we
> > > > discussed removing smbfs back in 2005 due to smbfs being
> > > > unmaintained.
> > > > 
> > > > CIFS has improved since, smbfs is still unmaintained, and it's
> > > > becoming time to finally remove smbfs.
> > > > 
> > > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> > > > 
> > > 
> > > "... unmaintained smbfs ..." is not quite right, see
> > >http://lkml.org/lkml/2007/11/6/94
> > > Before removing it now completely, drop
> > >   Jeff Layton <[EMAIL PROTECTED]>
> > > a note.
> > > Afaik, Redhat still has customers which rely on smbfs.
> > > 
> > 
> > Some of our older products use smbfs, but our newer stuff (RHEL5 and
> > up) has smbfs disabled. Fedora has had smbfs disabled for quite
> > some time as well. I've heard very few complaints (though maybe
> > they're just not getting to me).
> > 
> > I have no problem with targeting smbfs for removal, but I thought
> > Andrew had an unofficial policy that we should first mark things to
> > be deprecated, and then remove them 2 releases later. That seems
> > like a sensible policy to me. If we mark it deprecated in 2.6.25
> > then we can remove it after 2.6.26 is released.
> > 
> > It might not even hurt to have a nice loud printk when the smbfs
> > module is plugged in to warn users that it's slated to be removed,
> > and that they should move to CIFS as soon as possible.
> 
> Andrew has this with a target date of December 2006 (sic) for the 
> removal buried in -mm...
> 

True, but most users don't run -mm. I think we should have this marked
as deprecated in mainline kernels before removing it. I rather like the
idea of a runtime warning too...

smbfs has the unfortunate quality of momentum. A lot of users aren't
aware of CIFS at all since smbfs basically does what they need it to
do. Some extra warning for those users would be nice.

> > > In addition, cifs cannot completely replace smbfs atm.
> > > Even NAS boxes sold today (often running ancient 
> > > samba-2.x.x) work only with smbfs on the client side.
> > 
> > It would be ideal if someone were to report these problems as bugs.
> > I remember some of those in the past, but haven't heard of any
> > cases of that sort of thing for some time. When I have, Steve has
> > generally been very good about tracking down the cause and fixing
> > it.
> 
> More exactly, one of the main advantages of removing redundant code
> like smbfs is that people are finally forced to report their bugs.
> 

Indeed. I'm all for removing it, but I think we should try to have a
clear transition path to avoid some of the "WTF happened to smbfs?"
emails we're bound to get. Marking it deprecated in mainline and
stating that it'll be removed in version 2.6.26 (or whenever) seems like
a reasonable thing to do.

Just my $.02...

-- 
Jeff Layton <[EMAIL PROTECTED]>


Re: [2.6 patch] remove smbfs

2008-01-30 Thread Jeff Layton
On Thu, 31 Jan 2008 02:47:17 +0200
Adrian Bunk <[EMAIL PROTECTED]> wrote:

> On Wed, Jan 30, 2008 at 07:34:12PM -0500, Jeff Layton wrote:
> > On Thu, 31 Jan 2008 00:58:10 +0200
> > Adrian Bunk <[EMAIL PROTECTED]> wrote:
> > 
> > > On Wed, Jan 30, 2008 at 05:41:03PM -0500, Jeff Layton wrote:
> > > > On Wed, 30 Jan 2008 22:16:13 +0100
> > > > Guenter Kukkukk <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > Am Montag, 28. Januar 2008 schrieb Adrian Bunk:
> > > > > > I remember that there were some small things missing in CIFS
> > > > > > for completely replacing the unmaintained smbfs when we
> > > > > > discussed removing smbfs back in 2005 due to smbfs being
> > > > > > unmaintained.
> > > > > > 
> > > > > > CIFS has improved since, smbfs is still unmaintained, and
> > > > > > it's becoming time to finally remove smbfs.
> > > > > > 
> > > > > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> > > > > > 
> > > > > 
> > > > > "... unmaintained smbfs ..." is not quite right, see
> > > > >http://lkml.org/lkml/2007/11/6/94
> > > > > Before removing it now completely, drop
> > > > >   Jeff Layton <[EMAIL PROTECTED]>
> > > > > a note.
> > > > > Afaik, Redhat still has customers which rely on smbfs.
> > > > > 
> > > > 
> > > > Some of our older products use smbfs, but our newer stuff
> > > > (RHEL5 and up) has smbfs disabled. Fedora has had smbfs
> > > > disabled for quite some time as well. I've heard very few
> > > > complaints (though maybe they're just not getting to me).
> > > > 
> > > > I have no problem with targeting smbfs for removal, but I
> > > > thought Andrew had an unofficial policy that we should first
> > > > mark things to be deprecated, and then remove them 2 releases
> > > > later. That seems like a sensible policy to me. If we mark it
> > > > deprecated in 2.6.25 then we can remove it after 2.6.26 is
> > > > released.
> > > > 
> > > > It might not even hurt to have a nice loud printk when the smbfs
> > > > module is plugged in to warn users that it's slated to be
> > > > removed, and that they should move to CIFS as soon as possible.
> > > 
> > > Andrew has this with a target date of December 2006 (sic) for the 
> > > removal buried in -mm...
> > > 
> > 
> > True, but most users don't run -mm. I think we should have this
> > marked as deprecated in mainline kernels before removing it. I
> > rather like the idea of a runtime warning too...
> 
> drivers/pcmcia/pcmcia_ioctl.c was scheduled for removal in November
> 2005 and has such a printk since 2005.
> 
> Without any good reason why it's still in the kernel.
> 
> > smbfs has the unfortunate quality of momentum. A lot of users aren't
> > aware of CIFS at all since smbfs basically does what they need it to
> > do. Some extra warning for those users would be nice.
> 
> And many users will not start whining loudly that the non-deprecated
> driver (in this case cifs) has this or that bug until the patch
> to finally remove the deprecated feature gets applied or at least
> posted.
> 
> And will demand that it therefore does not get removed.
> 

Sucks for them then. They're always welcome to take the old code and
maintain it out of tree if they wish.

> > > > > In addition, cifs cannot completely replace smbfs atm.
> > > > > Even NAS boxes sold today (often running ancient 
> > > > > samba-2.x.x) work only with smbfs on the client side.
> > > > 
> > > > It would be ideal if someone were to report these problems as
> > > > bugs. I remember some of those in the past, but haven't heard
> > > > of any cases of that sort of thing for some time. When I have,
> > > > Steve has generally been very good about tracking down the
> > > > cause and fixing it.
> > > 
> > > More exactly, one of the main advantages of removing redundant
> > > code like smbfs is that people are finally forced to report their
> > > bugs.
> > > 
> > 
> > Indeed. I'm all for removing it, but I think we should try to have a
> > clear transition path to avoid some of the "WTF happened to smbfs?"
> > emails we're bound to get. Marking it deprecated in mainline and
> > stating that it'll be removed in version 2.6.26 (or whenever) seems
> > like a reasonable thing to do.
> 
> How many "WTF happened to smbfs?" emails did you get at RedHat?
> 

Not too many, though we did have customers who demanded that we fix
smbfs and refused to move to cifs. We did that for some of the older
releases but they're out of luck on the new ones.

I don't feel strongly about this either way, really. In general, I
think offering some warning is the better approach, but if you think
that removing smbfs immediately is the right one then go for it...

-- 
Jeff Layton <[EMAIL PROTECTED]>


Re: cups slow on linux-2.6.24

2008-01-30 Thread Jeff Chua
On Jan 30, 2008 9:47 PM, Patrick McHardy <[EMAIL PROTECTED]> wrote:

> A binary dump would be more useful:
>
> tcpdump -i lo -w 
>
> and I guess Jozsef also wants "-s 0" so the full packets are included.

Attached. Again, both runs with this command to print ...

for((i=1; i<1001;i++)); do echo $i | lpr -Plp; done

In the good file (lo.good), look for timestamp 08:38:11.818587, where
it paused, and then continued at 08:38:22.477261. That's almost
11 seconds of "sleep" ... (maybe a feature of TCP/IP?).

In the bad file (lo.bad), look for 08:47:55.434722, where it paused,
and then continued at 08:48:24.449176. That's about 29 seconds of
sleep. But, after it continued,
lpstat shows it's printing a job every 3 seconds. All 100 jobs take
approx 1400 seconds to complete as compared to under 100 seconds for
the good run.
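
To double-check the pause lengths, the two pairs of timestamps can be subtracted mechanically. A small sketch in C, assuming tcpdump-style HH:MM:SS.uuuuuu stamps with exactly six fractional digits and both stamps on the same day; ts_to_usec and pause_usec are names made up here.

```c
#include <assert.h>
#include <stdio.h>

/* Convert "HH:MM:SS.uuuuuu" to microseconds since midnight, or return -1
 * if the string does not have that shape. */
static long long ts_to_usec(const char *ts)
{
	int h, m, s, us;

	if (sscanf(ts, "%2d:%2d:%2d.%6d", &h, &m, &s, &us) != 4)
		return -1;
	return (((long long)h * 60 + m) * 60 + s) * 1000000LL + us;
}

/* Length of a pause in microseconds between two stamps on the same day. */
static long long pause_usec(const char *start, const char *end)
{
	long long a = ts_to_usec(start);
	long long b = ts_to_usec(end);

	assert(a >= 0 && b >= a);
	return b - a;
}
```

With the stamps quoted above, pause_usec() gives 10658674 microseconds for the good run and 29014454 microseconds for the bad run.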

Again, using latest linux, one with
17311393f969090ab060540bd9dbe7dc885a76d5 reverted, and the other
without.


Thanks,
Jeff.


Re: cups slow on linux-2.6.24

2008-01-30 Thread Jeff Chua
On Jan 31, 2008 10:41 AM, Patrick McHardy <[EMAIL PROTECTED]> wrote:

> Thanks. In the dump we can see that connections reusing ports
> always have their first SYN dropped and retransmitted three
> seconds later. I'm not sure what's causing this yet, do you have
> any firewall rules that affect loopback traffic?

No firewall.

And the "lp" is just to print to a [EMAIL PROTECTED]

# lpadmin -p lp -i /etc/cups/interfaces/lp -v lpd://localhost/file -o printer-error-policy=retry-job
# lpadmin -p file -i /etc/cups/interfaces/file -v file:/dev/null -o printer-error-policy=retry-job

Filter for lp is ..
  cat $6

Filter for file is ...
  cat $6 >/tmp/$$


Thanks,
Jeff.


Re: cups slow on linux-2.6.24

2008-01-30 Thread Jeff Chua
On Jan 31, 2008 11:25 AM, Patrick McHardy <[EMAIL PROTECTED]> wrote:

> Actually it's probably the SYN/ACK that is dropped. Please try whether
>
> modprobe ipt_LOG
> echo 255 >/proc/sys/net/netfilter/nf_conntrack_log_invalid


On the good run, I don't get any message, which is good.

On the bad run, I got the following message ...

boston kernel: nf_ct_tcp: invalid packet ignored IN= OUT=
SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=8162
DF PROTO=TCP SPT=1016 DPT=515 SEQ=3834958843 ACK=0 WINDOW=32792
RES=0x00 SYN URGP=0 OPT (0204400C0402080ACC1901030307)
UID=0 GID=65534


This message is displayed repeatedly after the job got "stuck", once
every 3 seconds, coinciding with the print job sent every 3 seconds.

Hope this helps.

Jeff.


Re: [2.6 patch] ata_piix.c: make piix_merge_scr() static

2008-02-01 Thread Jeff Garzik

Adrian Bunk wrote:

piix_merge_scr() can become static.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>


applied




Re: Are Section mismatches out of control?

2008-02-01 Thread Jeff Garzik

Andrew Morton wrote:

On Fri, 1 Feb 2008 11:47:18 +0100 Sam Ravnborg <[EMAIL PROTECTED]> wrote:


James said in a related posting that the Section mismatch
warnings were getting out of control.


eh.  They're easy - the build system tells you about them!


The list is here:


Question is: why do people keep adding new ones when they are so easy to
detect and fix?

Answer: because neither they nor their patch integrators are doing adequate
compilation testing.


I will look at drivers/isdn as next step.


Thanks.


Another way to look at it...  All of a sudden, different from 2.6.24, 
kernel 2.6.25-git build spews so many warnings that I need to disable 
section mismatch checking completely, because there is so much noise 
that __normal build messages scroll off the screen__.


    Jeff






Re: [PATCH 1/4] [libata] Blackfin pata-bf54x driver: Remove obsolete PM function

2008-02-01 Thread Jeff Garzik

Bryan Wu wrote:

From: Sonic Zhang <[EMAIL PROTECTED]>

Signed-off-by: Sonic Zhang <[EMAIL PROTECTED]>
Signed-off-by: Bryan Wu <[EMAIL PROTECTED]>
---
 drivers/ata/pata_bf54x.c |4 
 1 files changed, 0 insertions(+), 4 deletions(-)


applied 1-4




Re: [PATCH 0/5] isdn: fix section mismatch warnings in isdn

2008-02-01 Thread Jeff Garzik

Sam Ravnborg wrote:

I know Jeff Garzik has some ISDN clean-up patches pending
but rather than waiting forever to have them acked I decided
to fix the warnings in the current kernel.



Please do...

Those patches are not going to be submitted for the current merge 
window, as I had higher priorities.  The PCI hotplug conversion is 
complete, but a few "rough edges" remain to be cleaned up -- then we 
must test, since none of this work is tested at all yet.


For anyone else curious about the ISDN PCI hotplug API conversion, it is 
available on the 'isdn-pci' branch of

git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git


Jeff




Re: [PATCH] sata_nv: fix for completion handling

2008-02-01 Thread Jeff Garzik

Robert Hancock wrote:

This patch is based on an original patch from Kuan Luo of NVIDIA,
posted under subject "fixed a bug of adma in rhel4u5 with HDS7250SASUN500G".
His description follows. I've reworked it a bit to avoid some unnecessary
repeated checks but it should be functionally identical.

"The patch is to solve the error message "ata1: CPB flags CMD err,
flags=0x11" when testing HDS7250SASUN500G in rhel4u5.
I tested this hd in 2.6.24-rc7, where I had to remove the mask in the
blacklist to run NCQ, and the same error also showed up.

I traced the bug and found that the interrupt handler finished a command
(for example, tag=0) when the driver saw that the adma status was
NV_ADMA_STAT_DONE and cpb->resp_flags was NV_CPB_RESP_DONE.
However, for this hd, the drive may not have cleared bit 0 at that moment,
which meant the hardware had not completely finished the command.
If the driver freed the command (tag 0) and sent another command (tag 0)
at the same time, the error happened.

The notifier register is a 32-bit register containing the notifier value.
The value is a bit vector containing one bit per tag number (0-31) in
the corresponding bit positions (bit 0 is for tag 0, etc). When a bit is
set, ADMA indicates that the command with the corresponding tag number
has completed execution.

So I added the notifier check code. Sometimes I saw that the notifier
register had some bits set, but the adma status was
NV_ADMA_STAT_CMD_COMPLETE, not NV_ADMA_STAT_DONE. So I added the
NV_ADMA_STAT_CMD_COMPLETE check code."
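
To illustrate the notifier layout described above, here is a minimal userspace sketch; it is not code from the patch. The demo_* names are invented, and demo_may_complete() encodes one reading of the description (only finish a tag when both the CPB response flag and the notifier bit say it is done), which is an assumption about the intent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_TAGS 32	/* one notifier bit per tag, tags 0-31 */

/* True if the notifier bit for this tag is set. */
static bool demo_tag_notified(uint32_t notifier, unsigned int tag)
{
	assert(tag < DEMO_MAX_TAGS);
	return (notifier >> tag) & 1u;
}

/* Store every tag whose bit is set in tags[]; return how many there are. */
static unsigned int demo_notified_tags(uint32_t notifier,
				       unsigned int tags[DEMO_MAX_TAGS])
{
	unsigned int n = 0;

	for (unsigned int tag = 0; tag < DEMO_MAX_TAGS; tag++)
		if (demo_tag_notified(notifier, tag))
			tags[n++] = tag;
	return n;
}

/* Assumed policy: a tag is only finished (and free for reuse) when the CPB
 * response flag reports done AND its notifier bit is set. */
static bool demo_may_complete(uint32_t notifier, bool cpb_resp_done,
			      unsigned int tag)
{
	return cpb_resp_done && demo_tag_notified(notifier, tag);
}
```

For example, with a notifier value of 0x4 only tag 2 is reported as notified, so under the assumed policy tag 0 would not be completed yet even if its CPB response flag already said done.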

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>


applied, thanks all for investigating this stuff





Re: [PATCH] PCI: modify SB700 SATA MSI quirk

2008-02-01 Thread Jeff Garzik

Tejun Heo wrote:

From: Shane Huang <[EMAIL PROTECTED]>

SB700 SATA MSI bug will be fixed in SB700 revision A21 at hardware
level, but SB700 revisions older than A21 will also be found in the
market.  This patch modifies the original quirk commit
bc38b411fe696fad32b261f492cb4afbf1835256 instead of withdrawing it.
The patch also removes the quirk for 0x4395 because 0x4395 is an SB800
device ID.

Signed-off-by: Shane Huang <[EMAIL PROTECTED]>
Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
Okay, here's reformatted in-line version.  Shane, please invest some
time into setting up email environment.  Sending patches via email is
an important part of the linux kernel development process and if
you're gonna submit patches, you're just gonna have to do it.

 drivers/pci/quirks.c |   29 ++---
 1 file changed, 22 insertions(+), 7 deletions(-)


FWIW, I'm happy with whatever this thread results in...   it sounds like 
Tejun and Shane are iterating towards a satisfactory final result.


Just let me know if I need to merge something, since I'm assuming that 
GregKH will push this through the PCI tree.





Re: [Patch] arch/um/include/init.h: Fix missing macro definitions

2008-02-01 Thread Jeff Dike
On Thu, Jan 31, 2008 at 11:06:34PM +0800, WANG Cong wrote:
> This patch fixed the following build error in current -git tree.
> 
> arch/um/kernel/config.c:10: error: expected declaration specifiers or '...' before '.' token
> ...

This is close to uml-arch-um-include-inith-needs-a-definition-of-__used.patch
that's currently in -mm.

Andrew, could you replace
uml-arch-um-include-inith-needs-a-definition-of-__used.patch with the
version below and push it to Linus?

Jeff

-- 
Work email - jdike at linux dot intel dot com


init.h started breaking now for some reason.  It turns out that there wasn't a
definition of __used.  Fixed this by copying the relevant stuff from
compiler.h in the userspace case, and including compiler.h in the kernel case.

>From WANG Cong <[EMAIL PROTECTED]> - added definition of __section

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
Cc: WANG Cong <[EMAIL PROTECTED]>
---
 arch/um/include/init.h |   25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

Index: linux-2.6-git/arch/um/include/init.h
===
--- linux-2.6-git.orig/arch/um/include/init.h   2008-02-01 10:41:14.0 -0500
+++ linux-2.6-git/arch/um/include/init.h    2008-02-01 10:52:34.0 -0500
@@ -40,6 +40,20 @@
 typedef int (*initcall_t)(void);
 typedef void (*exitcall_t)(void);
 
+#ifndef __KERNEL__
+#ifndef __section
+# define __section(S) __attribute__ ((__section__(#S)))
+#endif
+
+#if __GNUC_MINOR__ >= 3
+# define __used __attribute__((__used__))
+#else
+# define __used __attribute__((__unused__))
+#endif
+
+#else
+#include 
+#endif
 /* These are for everybody (although not all archs will actually
discard it in modules) */
 #define __init __section(.init.text)
@@ -127,14 +141,3 @@ extern struct uml_param __uml_setup_star
 #endif
 
 #endif /* _LINUX_UML_INIT_H */
-
-/*
- * Overrides for Emacs so that we follow Linus's tabbing style.
- * Emacs will notice this stuff at the end of the file and automatically
- * adjust the settings for this buffer only.  This must remain at the end
- * of the file.
- * ---
- * Local variables:
- * c-file-style: "linux"
- * End:
- */


Re: [PATCH #upstream] libata: implement libata.force module parameter

2008-02-01 Thread Jeff Garzik

Tejun Heo wrote:

This patch implements libata.force module parameter which can
selectively override ATA port, link and device configurations
including cable type, SATA PHY SPD limit, transfer mode and NCQ.

For example, you can say "use 1.5Gbps for all fan-out ports attached
to the second port but allow 3.0Gbps for the PMP device itself, oh,
the device attached to the third fan-out port chokes on NCQ and
shouldn't go over UDMA4" by the following.

 libata.force=2:1.5g,2.15:3.0g,2.03:noncq,udma4
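
For illustration, here is how the example string above breaks down into (port, device, value) entries. This is a userspace sketch based only on that example, not the parser from the patch; struct force_ent, parse_id() and parse_force() are invented names, and the rule that a value without an ID reuses the previous ID is inferred from "2.03:noncq,udma4".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FORCE_VAL_MAX 16

struct force_ent {
	int port;			/* -1: no ID seen yet */
	int device;			/* -1: the port as a whole */
	char val[FORCE_VAL_MAX];	/* e.g. "1.5g", "noncq", "udma4" */
};

/* Parse "PORT" or "PORT.DEVICE"; return 0 on success, -1 on bad input. */
static int parse_id(const char *id, int *port, int *device)
{
	char *end;
	long p = strtol(id, &end, 10);

	if (end == id)
		return -1;
	*port = (int)p;
	*device = -1;
	if (*end == '.') {
		const char *d = end + 1;
		long v = strtol(d, &end, 10);

		if (end == d)
			return -1;
		*device = (int)v;
	}
	return *end == '\0' ? 0 : -1;
}

/* Split "[ID:]VAL,[ID:]VAL,..." into out[]; return the entry count or -1. */
static int parse_force(const char *spec, struct force_ent *out, int max)
{
	char buf[256];
	char *p = buf;
	int n = 0, port = -1, device = -1;

	assert(spec != NULL && out != NULL);
	if (strlen(spec) >= sizeof(buf))
		return -1;
	strcpy(buf, spec);

	while (p) {
		char *next = strchr(p, ',');
		char *colon, *val = p;

		if (next)
			*next++ = '\0';
		colon = strchr(p, ':');
		if (colon) {			/* explicit ID: replace the current one */
			*colon = '\0';
			val = colon + 1;
			if (parse_id(p, &port, &device) < 0)
				return -1;
		}				/* no ID: keep the previous one */
		if (n == max || *val == '\0' || strlen(val) >= FORCE_VAL_MAX)
			return -1;
		out[n].port = port;
		out[n].device = device;
		strcpy(out[n].val, val);
		n++;
		p = next;
	}
	return n;
}
```

Fed the string from the example, this yields four entries: port 2 with 1.5g, device 2.15 with 3.0g, and device 2.03 with both noncq and udma4.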

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
I guess it's about time we add something like this.  More than
anything else this should help debugging and can serve as a last
resort to work around problems.

Thanks.

 Documentation/kernel-parameters.txt |   35 +++
 drivers/ata/libata-core.c   |  375 +++-
 drivers/ata/libata-eh.c |8 
 drivers/ata/libata.h|1 
 4 files changed, 415 insertions(+), 4 deletions(-)


ACK, but it breaks the build due to section type conflicts:

drivers/ata/libata-core.c:108: error: ata_force_param_buf causes a section type conflict


Given that the data is marked __initdata and the code is marked __init, 
I cannot see the problem.




Re: [git Patch] UML: a build error fix

2008-02-01 Thread Jeff Dike
On Thu, Jan 31, 2008 at 11:17:41PM +0800, WANG Cong wrote:
> This patch fixed this error:
> 
> arch/um/kernel/skas/syscall.c: In function 'handle_syscall':
> arch/um/kernel/skas/syscall.c:33: error: 'NR_syscalls' undeclared (first use in this function)

That works, but I think doing things the way that i386 does them is cleaner.

Andrew, can you stick the patch below into -mm and push it to Linus?

    Jeff

-- 
Work email - jdike at linux dot intel dot com



Redo the calculation of NR_syscalls since that disappeared from i386
and use a similar mechanism on x86_64.

We now figure out the size of the system call table in arch code and
stick that in syscall_table_size.  arch/um/kernel/skas/syscall.c
defines NR_syscalls in terms of that since it's the only thing that
needs to know how many system calls there are.
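
As a rough userspace illustration of that arrangement (a sketch only, with invented demo_* names and a three-entry table standing in for the real one): the table's size in bytes is recorded next to the table, and the entry count is derived from it by dividing by the pointer size.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*demo_handler_t)(void);

static long demo_sys_a(void) { return 1; }
static long demo_sys_b(void) { return 2; }
static long demo_sys_c(void) { return 3; }

/* Stand-in for the arch-provided system call table. */
static demo_handler_t demo_call_table[] = {
	demo_sys_a, demo_sys_b, demo_sys_c,
};

/* Stand-in for syscall_table_size: the table size in bytes. */
static const int demo_table_size = sizeof(demo_call_table);

/* Same shape as the NR_syscalls definition in the patch. */
#define DEMO_NR_SYSCALLS (demo_table_size / sizeof(void *))

/* Dispatch with a bounds check; -1 stands in for an error return. */
static long demo_dispatch(unsigned int nr)
{
	assert(demo_table_size > 0);
	if (nr >= DEMO_NR_SYSCALLS)
		return -1;
	return demo_call_table[nr]();
}
```

The sketch relies on function pointers and void * having the same size, which holds on the usual Linux targets.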

The old mechanism that was used on x86_64 is gone.

arch/um/include/sysdep-i386/syscalls.h got some formatting since I was
looking at it.

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
Cc: WANG Cong <[EMAIL PROTECTED]>
---
 arch/um/include/sysdep-i386/syscalls.h |5 +++--
 arch/um/include/sysdep-x86_64/kernel-offsets.h |9 -
 arch/um/include/sysdep-x86_64/syscalls.h   |2 --
 arch/um/kernel/skas/syscall.c  |3 +++
 arch/um/sys-i386/sys_call_table.S  |5 +
 arch/um/sys-x86_64/syscall_table.c |   17 ++---
 6 files changed, 25 insertions(+), 16 deletions(-)

Index: linux-2.6-git/arch/um/include/sysdep-x86_64/syscalls.h
===
--- linux-2.6-git.orig/arch/um/include/sysdep-x86_64/syscalls.h 2008-02-01 11:24:32.0 -0500
+++ linux-2.6-git/arch/um/include/sysdep-x86_64/syscalls.h  2008-02-01 11:47:51.0 -0500
@@ -30,6 +30,4 @@ extern long old_mmap(unsigned long addr,
 extern syscall_handler_t sys_modify_ldt;
 extern syscall_handler_t sys_arch_prctl;
 
-#define NR_syscalls (UM_NR_syscall_max + 1)
-
 #endif
Index: linux-2.6-git/arch/um/kernel/skas/syscall.c
===
--- linux-2.6-git.orig/arch/um/kernel/skas/syscall.c    2008-02-01 11:24:32.0 -0500
+++ linux-2.6-git/arch/um/kernel/skas/syscall.c 2008-02-01 11:48:02.0 -0500
@@ -9,6 +9,9 @@
 #include "sysdep/ptrace.h"
 #include "sysdep/syscalls.h"
 
+extern int syscall_table_size;
+#define NR_syscalls (syscall_table_size / sizeof(void *))
+
 void handle_syscall(struct uml_pt_regs *r)
 {
struct pt_regs *regs = container_of(r, struct pt_regs, regs);
Index: linux-2.6-git/arch/um/sys-i386/sys_call_table.S
===
--- linux-2.6-git.orig/arch/um/sys-i386/sys_call_table.S    2008-02-01 11:24:32.0 -0500
+++ linux-2.6-git/arch/um/sys-i386/sys_call_table.S 2008-02-01 12:08:17.0 -0500
@@ -9,4 +9,9 @@
 
 #define old_mmap old_mmap_i386
 
+.section .rodata,"a"
+
 #include "../../x86/kernel/syscall_table_32.S"
+
+ENTRY(syscall_table_size)
+.long .-sys_call_table
Index: linux-2.6-git/arch/um/include/sysdep-i386/syscalls.h
===
--- linux-2.6-git.orig/arch/um/include/sysdep-i386/syscalls.h   2007-11-28 13:01:17.0 -0500
+++ linux-2.6-git/arch/um/include/sysdep-i386/syscalls.h    2008-02-01 11:48:02.0 -0500
@@ -1,5 +1,5 @@
 /* 
- * Copyright (C) 2000 Jeff Dike ([EMAIL PROTECTED])
+ * Copyright (C) 2000 - 2008 Jeff Dike ([EMAIL PROTECTED],linux.intel}.com)
  * Licensed under the GPL
  */
 
@@ -18,7 +18,8 @@ extern syscall_handler_t old_mmap_i386;
 extern syscall_handler_t *sys_call_table[];
 
 #define EXECUTE_SYSCALL(syscall, regs) \
-   ((long (*)(struct syscall_args)) (*sys_call_table[syscall]))(SYSCALL_ARGS(&regs->regs))
+   ((long (*)(struct syscall_args)) \
+(*sys_call_table[syscall]))(SYSCALL_ARGS(&regs->regs))
 
 extern long sys_mmap2(unsigned long addr, unsigned long len,
  unsigned long prot, unsigned long flags,
Index: linux-2.6-git/arch/um/include/sysdep-x86_64/kernel-offsets.h
===
--- linux-2.6-git.orig/arch/um/include/sysdep-x86_64/kernel-offsets.h   2007-12-03 23:56:34.0 -0500
+++ linux-2.6-git/arch/um/include/sysdep-x86_64/kernel-offsets.h    2008-02-01 11:48:01.0 -0500
@@ -17,16 +17,7 @@
 #define OFFSET(sym, str, mem) \
DEFINE(sym, offsetof(struct str, mem));
 
-#define __NO_STUBS 1
-#undef __SYSCALL
-#undef _ASM_X86_64_UNISTD_H_
-#define __SYSCALL(nr, sym) [nr] = 1,
-static char syscalls[] = {
-#include 
-};
-
 void foo(void)
 {
 #include 
-DEFINE(UM_NR_syscall_max, sizeof(syscalls) - 1);
 }
Index: linux-2.6-git/arch/um/sys-x86_64/syscall_table.c
=
