The pull request you sent on Fri, 14 Jun 2019 10:44:42 -0600:
> git://git.kernel.dk/linux-block.git tags/for-linus-20190614
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/7b10315128c697b32c3a920cd79a8301c7466637
Thank you!
--
Deet-doot-dot, I am a bot.
ht
On Fri, Jun 14, 2019 at 09:38:04AM -0400, Josef Bacik wrote:
> On Fri, Jun 14, 2019 at 12:33:43PM +0200, Wouter Verhelst wrote:
> > On Thu, Jun 13, 2019 at 10:55:36AM -0400, Josef Bacik wrote:
> > > Also I mean that there are a bunch of different nbd servers out there.
> > > We have our own
On 14 Jun 2019, at 22:22, Tejun Heo wrote:
>
> Hello,
>
> On Thu, Jun 13, 2019 at 08:10:38AM +0200, Paolo Valente wrote:
>> BFQ does not implement weight_device, but we are not talking about
>> weight_device here. More precisely, *nothing* implements weight_device
>> any longer in cgroups-v1, so the documentation is wrong altogether.
Tejun Heo writes:
> Hello, Toke.
>
> On Fri, Jun 14, 2019 at 02:17:45PM +0200, Toke Høiland-Jørgensen wrote:
>> One question: How are equal-weight cgroups scheduled relative to each
>> other? Or requests from different processes within a single cgroup for
>> that matter? FIFO? Round-robin? Something else?
On Fri, Jun 14, 2019 at 02:14:01PM -0600, Jonathan Corbet wrote:
> On Wed, 12 Jun 2019 14:52:41 -0300
> Mauro Carvalho Chehab wrote:
>
> > Convert the cgroup-v1 files to ReST format, in order to
> > allow a later addition to the admin-guide.
> >
> > The conversion is actually:
> > - add blank lines and indentation in order to identify paragraphs;
Hello,
On Thu, Jun 13, 2019 at 08:10:38AM +0200, Paolo Valente wrote:
> BFQ does not implement weight_device, but we are not talking about
> weight_device here. More precisely, *nothing* implements weight_device
> any longer in cgroups-v1, so the documentation is wrong altogether.
I can't see ho
On Wed, 12 Jun 2019 14:52:41 -0300
Mauro Carvalho Chehab wrote:
> Convert the cgroup-v1 files to ReST format, in order to
> allow a later addition to the admin-guide.
>
> The conversion is actually:
> - add blank lines and indentation in order to identify paragraphs;
> - fix tables markups;
>
On Thu, Jun 13, 2019 at 06:56:10PM -0700, Tejun Heo wrote:
...
> The patchset is also available in the following git branch.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-iow
Updated patchset available in the following branch. Just build fixes
and cosmetic changes for n
Hello,
Added a separate patch to add dummy css_get() and EXPORTs for the
build failures. It's in the following branch. I'll wait for a while
and send out if nothing else is broken.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git
review-btrfs-cgroup-updates-v2
Thanks.
--
tejun
On Wed, Jun 12, 2019 at 08:17:32AM +0200, Andreas Herrmann wrote:
> CFQ is gone. No need anymore to document its "proportional weight time
> based division of disk policy".
BFQ might provide a compat interface. Let's wait a bit.
Thanks.
--
tejun
Hello, Alexei.
On Fri, Jun 14, 2019 at 04:35:35PM +0000, Alexei Starovoitov wrote:
> the example bpf prog looks flexible enough to allow some degree
> of experiments. The question is what kind of new algorithms you envision
> it will do? what other inputs it would need to make a decision?
> I thin
(Greg)
- Extend NOLPM quirk for ST1000LM024 drives (Hans)
- Remove error path warning that can now trigger after the queue
removal/addition fixes (Ming)
Please pull!
git://git.kernel.dk/linux-block.git tags/for-linus-20190614
On 6/14/19 7:52 AM, Tejun Heo wrote:
> Hello, Quentin.
>
> On Fri, Jun 14, 2019 at 12:32:09PM +0100, Quentin Monnet wrote:
>> Please make sure to update the documentation and bash
>> completion when adding the new type to bpftool. You
>> probably want something like the diff below.
>
> Thank you
Hello, David.
On Fri, Jun 14, 2019 at 05:15:24PM +0200, David Sterba wrote:
> fs/btrfs/inode.c: In function ‘cow_file_range_async’:
> fs/btrfs/inode.c:1278:4: error: implicit declaration of function ‘css_get’;
> did you mean ‘css_put’? [-Werror=implicit-function-declaration]
> 1278 |    css_get(
extent_write_locked_range() is used when we're falling back to buffered
IO from inside of compression. It allocates its own wbc and should
associate it with the inode's i_wb to make sure the IO goes down from
the correct cgroup.
Also, export the needed symbols from fs-writeback.c.
Signed-off-by:
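A minimal sketch of the idea, assuming the existing
wbc_attach_fdatawrite_inode()/wbc_detach_inode() helpers from
fs-writeback.c; the wrapper function name is hypothetical and the real
call sites in extent_write_locked_range() may differ:

#include <linux/writeback.h>

/* Sketch: bind a locally allocated wbc to the inode's i_wb so the
 * IO it submits is attributed to the right cgroup. */
static void write_one_range(struct inode *inode, loff_t start, loff_t end)
{
	struct writeback_control wbc = {
		.sync_mode	= WB_SYNC_ALL,
		.range_start	= start,
		.range_end	= end,
	};

	wbc_attach_fdatawrite_inode(&wbc, inode);	/* associate with inode->i_wb */
	/* ... submit the pages in [start, end] under this wbc ... */
	wbc_detach_inode(&wbc);				/* drop the association */
}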
On Thu, Jun 13, 2019 at 05:33:49PM -0700, Tejun Heo wrote:
> @@ -1251,12 +1258,29 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
>	 * to unlock it.
>	 */
>	if (locked_page) {
> +		/*
> +
Hello, Toke.
On Fri, Jun 14, 2019 at 02:17:45PM +0200, Toke Høiland-Jørgensen wrote:
> One question: How are equal-weight cgroups scheduled relative to each
> other? Or requests from different processes within a single cgroup for
> that matter? FIFO? Round-robin? Something else?
Once each cgroup
Hello, Quentin.
On Fri, Jun 14, 2019 at 12:32:09PM +0100, Quentin Monnet wrote:
> Please make sure to update the documentation and bash
> completion when adding the new type to bpftool. You
> probably want something like the diff below.
Thank you so much. Will incorporate them. Just in case, wh
On Thu, Jun 13, 2019 at 05:33:50PM -0700, Tejun Heo wrote:
> From: Chris Mason
>
> extent_write_locked_range() is used when we're falling back to buffered
> IO from inside of compression. It allocates its own wbc and should
> associate it with the inode's i_wb to make sure the IO goes down from
On Thu, Jun 13, 2019 at 05:33:49PM -0700, Tejun Heo wrote:
> From: Chris Mason
>
> Async CRCs and compression submit IO through helper threads, which
> means they have IO priority inversions when cgroup IO controllers are
> in use.
>
> This flags all of the writes submitted by btrfs helper threa
On Thu, Jun 13, 2019 at 05:33:48PM -0700, Tejun Heo wrote:
> From: Chris Mason
>
> The btrfs writepages function collects a large range of pages flagged
> for delayed allocation, and then sends them down through the COW code
> for processing. When compression is on, we allocate one async_cow
> s
On Thu, Jun 13, 2019 at 05:33:47PM -0700, Tejun Heo wrote:
> From: Chris Mason
>
> Now that we're not using btrfs_schedule_bio() anymore, delete all the
> code that supported it.
>
> Signed-off-by: Chris Mason
Reviewed-by: Josef Bacik
Thanks,
Josef
On Thu, Jun 13, 2019 at 05:33:46PM -0700, Tejun Heo wrote:
> From: Chris Mason
>
> btrfs_schedule_bio() hands IO off to a helper thread to do the actual
> submit_bio() call. This has been used to make sure async crc and
> compression helpers don't get stuck on IO submission. To maintain good
>
On Thu, Jun 13, 2019 at 05:33:45PM -0700, Tejun Heo wrote:
> When a shared kthread needs to issue a bio for a cgroup, doing so
> synchronously can lead to priority inversions as the kthread can be
> trapped waiting for that cgroup. This patch implements
> REQ_CGROUP_PUNT flag which makes submit_bi
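A hedged sketch of how a shared helper thread might use the new flag;
the wrapper function is hypothetical and only REQ_CGROUP_PUNT itself
comes from the patch:

#include <linux/bio.h>

/* Sketch: mark the bio so submit_bio() punts the actual issuing to a
 * per-blkcg worker instead of blocking this shared kthread. */
static void submit_bio_for_cgroup(struct bio *bio)
{
	bio->bi_opf |= REQ_CGROUP_PUNT;
	submit_bio(bio);
}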
On Thu, Jun 13, 2019 at 05:33:44PM -0700, Tejun Heo wrote:
> Add a helper to determine the target blkcg from wbc.
>
> Signed-off-by: Tejun Heo
Reviewed-by: Josef Bacik
Thanks,
Josef
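The helper itself is truncated out of this digest; a minimal sketch of
what it could look like, assuming wbc->wb carries the blkcg css under
CONFIG_CGROUP_WRITEBACK (the helper name here is a guess):

#include <linux/writeback.h>
#include <linux/backing-dev.h>

/* Sketch: map a writeback_control to the blkcg it writes for, falling
 * back to the root blkcg when cgroup writeback is not in use. */
static inline struct cgroup_subsys_state *wbc_blkcg_css(struct writeback_control *wbc)
{
#ifdef CONFIG_CGROUP_WRITEBACK
	if (wbc && wbc->wb)
		return wbc->wb->blkcg_css;
#endif
	return blkcg_root_css;
}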
On Thu, Jun 13, 2019 at 05:33:43PM -0700, Tejun Heo wrote:
> When writeback IOs are bounced through async layers, the IOs should
> only be accounted against the wbc from the original bdi writeback to
> avoid confusing cgroup inode ownership arbitration. Add
> wbc->no_wbc_acct to allow disabling wb
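A hedged sketch of how such a flag could gate the accounting, assuming
wbc_account_io() is the accounting entry point; the wrapper function is
illustrative:

#include <linux/writeback.h>

/* Sketch: skip cgroup ownership accounting for wbcs created by async
 * layers, so only the original bdi writeback wbc is counted. */
static void account_page_io(struct writeback_control *wbc,
			    struct page *page, size_t bytes)
{
	if (wbc->no_wbc_acct)	/* flag added by the patch */
		return;
	wbc_account_io(wbc, page, bytes);
}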
On Fri, Jun 14, 2019 at 02:44:11PM +0300, Pavel Begunkov (Silence) wrote:
> From: Pavel Begunkov
>
> There are implicit assumptions about struct blk_rq_stat which make
> it very easy to misuse. The first patch fixes the consequences, and
> the second employs the type system to prevent recurrences.
>
On Fri, Jun 14, 2019 at 12:33:43PM +0200, Wouter Verhelst wrote:
> On Thu, Jun 13, 2019 at 10:55:36AM -0400, Josef Bacik wrote:
> > Also I mean that there are a bunch of different nbd servers out there. We have
> > our own here at Facebook, qemu has one, IIRC there's a ceph one.
>
> I can't
When a cache set starts, bch_btree_check() will check all bkeys on the
cache device by calculating their checksums. This operation will consume
a huge amount of system memory if a lot of data is cached. Since bcache
uses its own mca cache to maintain all its read-in btree nodes, and only
releases the
If a bcache device is in dirty state and its cache set is not
registered, this bcache device will not appear in /dev/bcache,
and there is no way to stop it or remove the bcache kernel module.
This is an as-designed behavior, but sometimes people have to reboot
the whole system to release or stop the pe
When lockdep is enabled and the system is rebooted with a writeback
mode bcache device, the following potential deadlock warning is
reported by the lockdep engine.
[ 101.536569][ T401] kworker/2:2/401 is trying to acquire lock:
[ 101.538575][ T401] bbf6e6c7
((wq_completion)bcache_writeback_wq){+.+.},
This patch adds more error messages in bch_cached_dev_run() to indicate
the exact reason why an error value is returned. Please notice that when
printing the "is running already" message, pr_info() is used, because
although -EBUSY is returned in this case, the bcache device can
continue to attach to
Deadlock in bcache_reboot() is observed quite frequently, hanging the
system reboot process. The reason is that in bcache_reboot(), the mutex
bch_register_lock is held when calling bch_cache_set_stop() and
bcache_device_stop(). But in the process of stopping the cache set and
the bcache device, bch_reg
When the lockdep engine is enabled, a lockdep warning can be observed
when rebooting or shutting down the system,
[ 3142.764557][T1] bcache: bcache_reboot() Stopping all devices:
[ 3142.776265][ T2649]
[ 3142.777159][ T2649] ==
[ 3142.780039][ T2649] WARNING: po
Now that there is the variable bcache_is_reboot to prevent device
registration or unregistration during reboot, it is unnecessary to
still hold the mutex lock bch_register_lock before stopping the
writeback_rate_update kworker and writeback kthread. And if the
stopping kworker or kthread is holding bch_register_lock inside the
The purpose of the following code in bset_search_tree() is to avoid a
branch instruction,
994         if (likely(f->exponent != 127))
995                 n = j * 2 + (((unsigned int)
996                               (f->mantissa -
997                                bfloat_mantissa(search, f))) >>
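A standalone illustration of the trick (not bcache code, names made up
for the demo): shifting the 32-bit difference right by 31 leaves only
its sign bit, so the comparison result selects child 2*j or 2*j + 1
without a branch.

#include <stdio.h>
#include <stdint.h>

/* Pick child 2*j when a >= b, child 2*j + 1 when a < b, branch-free:
 * (uint32_t)(a - b) >> 31 is the sign bit of the difference. */
static unsigned int child(unsigned int j, int32_t a, int32_t b)
{
	return j * 2 + (((uint32_t)(a - b)) >> 31);
}

int main(void)
{
	printf("%u\n", child(3, 10, 5));	/* a >= b -> 6 (left)  */
	printf("%u\n", child(3, 5, 10));	/* a <  b -> 7 (right) */
	return 0;
}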
When everything is OK in bch_journal_read(), the return value is
finally returned by,
return ret;
which assumes ret will be 0 here. This assumption is wrong when all
journal buckets are full and filled with valid journal entries. In
such a case the last location references read_bucket()
When an md raid device (e.g. raid456) is used as the backing device,
read-ahead requests on a degrading and recovering md raid device might
be failed immediately by the md raid code, but this md raid array can
still be read or written for normal I/O requests. Therefore such failed
read-ahead requests ar
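A hedged sketch of the likely shape of such a fix, assuming the error
accounting happens in the backing request completion path; the function
name and surrounding code are illustrative:

/* Sketch: read-ahead is speculative, so a failed read-ahead bio should
 * not count toward the backing device's error threshold. */
static void backing_request_error(struct cached_dev *dc, struct bio *bio)
{
	if (bio->bi_status && !(bio->bi_opf & REQ_RAHEAD))
		bch_count_backing_io_errors(dc, bio);
}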
This reverts commit 6147305c73e4511ca1a975b766b97a779d442567.
Although this patch helps the failed bcache device stop faster when
too many I/O errors are detected on the corresponding cached device,
setting the CACHE_SET_IO_DISABLE bit in cache set c->flags was not a
good idea. This operation will disable
In bset_search_tree(), when p >= t->size, t->tree[0] will be
prefetched by the following piece of code,
974 unsigned int p = n << 4;
975
976 p &= ((int) (p - t->size)) >> 31;
977
978 prefetch(&t->tree[p]);
The purpose of the above code is
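A standalone illustration of what that masking line does (not bcache
code): the arithmetic right shift of (p - size) by 31 yields all ones
when p < size (p is kept) and zero when p >= size (p is clamped to 0),
so a safe index is produced without a branch. Like the kernel, this
relies on arithmetic shift of negative signed values:

#include <stdio.h>

static unsigned int clamp_index(unsigned int p, unsigned int size)
{
	p &= ((int)(p - size)) >> 31;	/* p if p < size, else 0 */
	return p;
}

int main(void)
{
	printf("%u\n", clamp_index(5, 8));	/* 5 < 8  -> 5 */
	printf("%u\n", clamp_index(9, 8));	/* 9 >= 8 -> 0 */
	return 0;
}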
This patch adds more error messages for attaching a cached device; this
is helpful for debugging code failures during bcache device start up.
Signed-off-by: Coly Li
---
drivers/md/bcache/super.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
i
When the backing device super block is written by bch_write_bdev_super(),
the bio completion callback write_bdev_super_endio() simply ignores the
I/O status. Indeed such write requests also contribute to the backing
device health status if the request failed.
This patch checks bio->bi_status in write_bdev_super
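A hedged sketch of what checking the status there could look like; the
error-reporting call is an assumption about where the failure should be
counted:

/* Sketch: count a failed super block write toward backing device
 * health instead of silently dropping the status. */
static void write_bdev_super_endio(struct bio *bio)
{
	struct cached_dev *dc = bio->bi_private;

	if (bio->bi_status)
		bch_count_backing_io_errors(dc, bio);
	closure_put(&dc->sb_write);
}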
If the CACHE_SET_IO_DISABLE flag of a cache set is set because of too
many I/O errors, currently the allocator routines can still continue
to allocate space, which may introduce an inconsistent metadata state.
This patch checks the CACHE_SET_IO_DISABLE bit in the following
allocator routines (a sketch follows the list),
- bch_bucket_alloc()
- __bch_bucke
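A minimal sketch of the shape of such a check, using bch_bucket_alloc()
as the example; the exact placement and return value are assumptions:

/* Sketch: refuse to allocate once the cache set has been disabled by
 * too many I/O errors. */
long bch_bucket_alloc(struct cache *ca, unsigned int reserve, bool wait)
{
	if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &ca->set->flags)))
		return -1;	/* no allocations on a retiring set */
	/* ... normal allocation path ... */
}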
This patch adds a more accurate error message for a specific
sysfs_create_link() call, to help debug failures during
bcache device start up.
Signed-off-by: Coly Li
---
drivers/md/bcache/super.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/md/bcache/supe
In struct cache_set, retry_flush_write was added by commit c4dc2497d50d
("bcache: fix high CPU occupancy during journal"), which is reverted in
the previous patch.
Now it is useless, and this patch removes it from the bcache code.
Signed-off-by: Coly Li
---
drivers/md/bcache/bcache.h | 1 -
driv
When cache_set_flush() is called because too many I/O errors were
detected on the cache device and the cache set is retiring, it doesn't
make sense to flush the cached btree nodes from c->btree_cache, because
CACHE_SET_IO_DISABLE is already set on c->flags and all I/Os
onto the cache device will b
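A hedged sketch of the shape of that change; the closure plumbing is
simplified and only illustrates gating the flush on the flag:

/* Sketch: skip writing back dirty btree nodes when the set is being
 * retired, since the device won't accept the I/O anyway. */
static void cache_set_flush(struct closure *cl)
{
	struct cache_set *c = container_of(cl, struct cache_set, caching);

	if (!test_bit(CACHE_SET_IO_DISABLE, &c->flags)) {
		/* ... flush cached btree nodes from c->btree_cache ... */
	}
	closure_return(cl);
}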
In previous bcache patches for Linux v5.2, the failure code path of
run_cache_set() was tested and fixed. So now the following comment
line can be removed from run_cache_set(),
/* XXX: test this, it's broken */
Signed-off-by: Coly Li
---
drivers/md/bcache/super.c | 2 +-
1 file changed, 1
This patch adds a return value check to bch_cached_dev_run(); now if
an error happens inside bch_cached_dev_run(), it can be caught.
Signed-off-by: Coly Li
---
drivers/md/bcache/bcache.h | 2 +-
drivers/md/bcache/super.c | 33 ++---
drivers/md/bcache/sysfs.c |
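A hedged sketch of the signature change this implies; the specific
error paths shown are illustrative assumptions:

/* Sketch: return an error instead of void so callers can catch
 * failures during device start up. */
int bch_cached_dev_run(struct cached_dev *dc)
{
	if (dc->io_disable)
		return -EIO;
	if (atomic_xchg(&dc->running, 1))
		return -EBUSY;	/* already running */
	/* ... start the device ... */
	return 0;
}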
Function bch_btree_keys_init() initializes b->set[].size and
b->set[].data to zero. As the code comment indicates, this code is
indeed unnecessary, because both struct btree_keys and struct bset_tree
are embedded in struct btree, and struct btree is filled with 0
bits by kzalloc() in mca
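A small reminder of why that holds, assuming bcache's struct btree
definition is in scope: kzalloc() returns zero-filled memory, so every
embedded field already starts at 0.

#include <linux/slab.h>

/* Sketch: the embedded keys/size fields of a kzalloc()ed btree are
 * already zero; no explicit b->set[i].size = 0 loop is needed. */
static struct btree *alloc_zeroed_btree(void)
{
	return kzalloc(sizeof(struct btree), GFP_KERNEL);
}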
Now we have counters for how many times the journal is reclaimed and
how many times cached dirty btree nodes are flushed, but we don't know
how many journal buckets are really reclaimed.
This patch adds reclaimed_journal_buckets to struct cache_set; this
is an increase-only counter, to tell how many
From: Alexandru Ardelean
The arrays (of strings) that are passed to __sysfs_match_string() are
static, so use sysfs_match_string() which does an implicit ARRAY_SIZE()
over these arrays.
Functionally, this doesn't change anything.
The change is more cosmetic.
It only shrinks the static arrays by
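A brief sketch of the pattern, with an illustrative array; in the
kernel, sysfs_match_string(a, s) expands to
__sysfs_match_string(a, ARRAY_SIZE(a), s):

#include <linux/string.h>

static const char * const cache_modes[] = {
	"writethrough", "writeback", "writearound", "none"
};

static int parse_cache_mode(const char *buf)
{
	/* returns the matching index, or -EINVAL if nothing matches */
	return sysfs_match_string(cache_modes, buf);
}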
When too many I/O errors happen on a cache set and the
CACHE_SET_IO_DISABLE bit is set, bch_journal() may continue to work
because the journaling bkey might still be in the write set. The caller
of bch_journal() may believe the journal still works, but the truth is
that the in-memory journal write set won't be writ
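A hedged sketch of a fail-fast check at the top of bch_journal(); the
placement and the NULL return are assumptions about how the caller is
told the journal no longer works:

/* Sketch: refuse new journal entries once the cache set is disabled,
 * so callers don't trust writes that will never hit the device. */
atomic_t *bch_journal(struct cache_set *c, struct keylist *keys,
		      struct closure *parent)
{
	if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &c->flags)))
		return NULL;
	/* ... normal journaling path ... */
}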
There is a race between mca_reap(), btree_node_free() and the journal
code btree_flush_write(), which results in a very rare and strange
deadlock or panic that is very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() the btree
node with the oldest journal pin is selected, then it
This reverts commit 6268dc2c4703aabfb0b35681be709acf4c2826c6.
This patch depends on commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which is reverted in the previous patch. So
revert this one too.
Fixes: 6268dc2c4703 ("bcache: free heap cache_set->flush_btree in
bch_journal_fre
In journal_read_bucket(), when setting ja->seq[bucket_index] there is
a potential case where a later, non-maximum value overwrites a better
sequence number in ja->seq[bucket_index]. This patch adds a check to
make sure that ja->seq[bucket_index] will only be set to a new value
if it is bigger than the current
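The shape of such a check is a one-liner; a hedged sketch, with j
standing for the journal entry just read from the bucket:

/* Sketch: only keep the larger sequence number, so an older entry in
 * the same bucket can't overwrite a better one. */
if (j->seq > ja->seq[bucket_index])
	ja->seq[bucket_index] = j->seq;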
This reverts commit c4dc2497d50d9c6fb16aa0d07b6a14f3b2adb1e0.
That patch enlarges a race between the normal btree flush code path
and btree_flush_write(), which causes a deadlock when journal space is
exhausted. Reverting it shrinks the race window from 128 btree nodes
to only 1 btree node.
Fixes:
This patch adds more code comments in journal_read_bucket(), in an
effort to make the code more understandable.
Signed-off-by: Coly Li
---
drivers/md/bcache/journal.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/drivers/md/bcache/journal.c b/drivers/md
Hi folks,
I am now testing the bcache patches for Linux v5.3; here I collect
all previously posted patches for your information. Any code review
and comments are welcome.
Thanks in advance.
Coly Li
---
Alexandru Ardelean (1):
bcache: use sysfs_match_string() instead of __sysfs_match_string()
C
Tejun Heo writes:
> This patchset implements IO cost model based work-conserving
> proportional controller.
>
> While io.latency provides the capability to comprehensively prioritize
> and protect IOs depending on the cgroups, its protection is binary -
> the lowest latency target cgroup which is
From: Pavel Begunkov
Split struct blk_rq_stat into 2 structs, so that each explicitly
represents one of the mentioned states. That duplicates code, but
1. prevents misuses (compile-time check by type-system)
2. reduces memory needed (inc. per-cpu)
3. makes it easier to extend stats
Signed-off-b
From: Pavel Begunkov
struct blk_rq_stat has two implicit states in which it can be:
(1) per-cpu intermediate stats (i.e. staging, intermediate)
(2) final stats / aggregation of (1) (see blk_rq_stat_collect)
The states use different sets of fields. E.g. (1) uses @batch but not
@mean, and vice versa
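A hedged sketch of what the split could look like; the staging struct
name and exact field layout are assumptions based on the description
above:

#include <linux/types.h>

/* (1) per-cpu intermediate state: accumulates samples */
struct blk_rq_stat_staging {
	u64 batch;		/* running sum of samples */
	u32 nr_batch;		/* number of samples in the batch */
};

/* (2) final state: aggregation of the staging structs */
struct blk_rq_stat {
	u64 mean;		/* batch / nr_samples after collect */
	u64 min;
	u64 max;
	u32 nr_samples;
};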
From: Pavel Begunkov
There are implicit assumptions about struct blk_rq_stat which make
it very easy to misuse. The first patch fixes the consequences, and
the second employs the type system to prevent recurrences.
Pavel Begunkov (2):
blk-iolatency: Fix zero mean in previous stats
blk-stats: In
From: Pavel Begunkov
struct blk_rq_stat::mean is a u64 value, so use %llu
Signed-off-by: Pavel Begunkov
---
block/blk-mq-debugfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 6aea0ebc3a73..f69da381cb1e 100644
--- a/b
2019-06-13 18:56 UTC-0700 ~ Tejun Heo
> Currently, blkcg implements one builtin IO cost model - linear. To
> allow customization and experimentation, allow a bpf program to
> override IO cost model.
>
> Signed-off-by: Tejun Heo
> ---
[...]
> diff --git a/tools/bpf/bpftool/feature.c b/tools/bp
On Thu, Jun 13, 2019 at 10:55:36AM -0400, Josef Bacik wrote:
> Also I mean that there are a bunch of different nbd servers out there. We have
> our own here at Facebook, qemu has one, IIRC there's a ceph one.
I can't claim to know about the Facebook one of course, but the qemu one
uses the sam