hmm. so you trade 265% degradation of creation for 40% improvement of unlink?
thanks, Alex
Coly Li wrote:
                normal ext4          ext4 with dir inode reservation
mount options:  -o data=writeback    -o data=writeback,dir_ireserve=low
On 9/19/07, David Chinner <[EMAIL PROTECTED]> wrote:
> The problem is this: to alter the fundamental block size of the
> filesystem we also need to alter the data block size and that is
> exactly the piece that linux does not support right now. So while
> we have the capability to use large block
ACK, of course.
thanks, Alex
Mingming Cao wrote:
On Thu, 2007-06-14 at 19:29 +0400, Dmitriy Monakhov wrote:
I just couldn't believe my eyes when I saw this for the first time...
simple test: strace dd if=/dev/zero of=/mnt/file
Thanks for reporting it.
open("/dev/zero", O_RDONLY) = 0
Andrew Morton wrote:
I'm still not understanding. The terms you're using are a bit ambiguous.
What does "find some dirty unallocated blocks" mean? Find a page which is
dirty and which does not have a disk mapping?
Normally the above operation would be implemented via
ext4_writeback_writepage(
Andrew Morton wrote:
On Fri, 04 May 2007 10:18:12 +0400 Alex Tomas <[EMAIL PROTECTED]> wrote:
Andrew Morton wrote:
Yes, there can be issues with needing to allocate journal space within the
context of a commit. But
no-no, this isn't required. we only need to mark pages/bl
Andrew Morton wrote:
Yes, there can be issues with needing to allocate journal space within the
context of a commit. But
no-no, this isn't required. we only need to mark pages/blocks within
transaction, otherwise race is possible when we allocate blocks in transaction,
then transaction starts t
Andrew Morton wrote:
We can make great improvements here, and I've (twice) previously described
how: hoist the entire ordered-mode data handling out of ext3, and out of
the buffer_head layer and move it up into the VFS pagecache layer.
Basically, do ordered-data with a commit-time inode walk, cal
Theodore Ts'o wrote:
P.S. One bug which I've noted --- if there is a failure due to disk
filling up, running e2fsck on the filesystem will show that the i_blocks
fields on the inodes where there was a failure to allocate disk blocks
are left incorrect. I'm guessing this is a bug in the delayed
I think one problem with mmap/msync is that they can't maintain
i_size atomically like regular write does. so, one needs to
implement own i_size management in userspace.
thanks, Alex
> Side note: the only reason O_DIRECT exists is because database people are
> too used to it, because other OS's
> Peter Staubach (PS) writes:
PS> Just out of curiosity, what keeps i_nlink from going to 0 immediately
PS> after the new test is executed?
i_mutex in vfs_link() and vfs_unlink()
thanks, Alex
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message
> Eric Sandeen (ES) writes:
ES> Al says "no" and I'm not arguing. :)
ES> Apparently this may be OK with some filesystems, and Al says he doesn't
ES> want to know about i_nlink in the vfs in any case.
well, generic_drop_inode() uses i_nlink ...
ES> But I suppose there may be other files
> Eric Sandeen (ES) writes:
ES> I tend to agree, chatting w/ Al I think he does too. :) I'll test
ES> a patch that kicks out ext3_link() with -ENOENT at the top, and resubmit
ES> that if things go well.
shouldn't VFS do that?
thanks, Alex
> Eric Sandeen (ES) writes:
ES> so I think it's possible that link can sneak in there & find it after
ES> the mutex is dropped...? Is this ok? :) It's certainly -happening-
ES> anyway
yes, but it shouldn't be allowed to re-link such an inode back, IMHO.
a filesystem may start some non-revert
ck i_nlink.
thanks, Alex
>>>>> Alex Tomas (AT) writes:
AT> interesting ..
AT> I thought VFS doesn't allow concurrent operations.
AT> if unlink goes first, then link should wait on the
AT> parent's i_mutex and then find no source name.
AT> thanks, Alex
interesting ..
I thought VFS doesn't allow concurrent operations.
if unlink goes first, then link should wait on the
parent's i_mutex and then find no source name.
thanks, Alex
> Eric Sandeen (ES) writes:
ES> )
ES> I've been looking at a case where many threads are opening, unlinking, a
> David Chinner (DC) writes:
DC> So that means we'll have 2 separate mechanisms for marking
DC> pages as delalloc. XFS uses the BH_delay flag to indicate
DC> that a buffer (block) attached to the page is using delalloc.
>>
>> well, for blocksize=pagesize we can save 56 bytes on every pa
Hi,
> Andrew Morton (AM) writes:
AM> Should be cacheline_aligned_in_smp.
AM> That's assuming it needs to be cacheline aligned at all. It can consume a
AM> lot of space.
the idea is to make block reservation cheap because it's called
for every page.
AM>
AM> oh, this should be
> Christoph Hellwig (CH) writes:
CH> Note that recording delayed alloc state at a page granularity in addition
CH> to just the buffer heads has a lot of advantages as well and would help
CH> xfs, too. But I think it makes a lot more sense to record it as a radix
CH> tree tag to speed up th
Good day,
> David Chinner (DC) writes:
DC> So that means we'll have 2 separate mechanisms for marking
DC> pages as delalloc. XFS uses the BH_delay flag to indicate
DC> that a buffer (block) attached to the page is using delalloc.
well, for blocksize=pagesize we can save 56 bytes on ever
Index: linux-2.6.20-rc1/include/linux/ext4_fs.h
===
--- linux-2.6.20-rc1.orig/include/linux/ext4_fs.h 2006-12-14 04:14:23.0 +0300
+++ linux-2.6.20-rc1/include/linux/ext4_fs.h 2006-12-22 20:21:12.0 +0300
@@
+0300
+++ linux-2.6.20-rc1/fs/ext4/writeback.c 2006-12-22 22:59:33.00000 +0300
@@ -0,0 +1,1167 @@
+/*
+ * Copyright (c) 2003-2006, Cluster File Systems, Inc, [EMAIL PROTECTED]
+ * Written by Alex Tomas <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribut
Index: linux-2.6.20-rc1/include/linux/page-flags.h
===
--- linux-2.6.20-rc1.orig/include/linux/page-flags.h 2006-12-14 04:14:23.0 +0300
+++ linux-2.6.20-rc1/include/linux/page-flags.h 2006-12-22 20:05:31.0 +0300
Good day,
probably the previous set of patches (including mballoc/lg)
is too large. so, I reworked delayed allocation a bit so
that it can be used on top of regular balloc, though it
still can be used with extents-enabled files only.
this time series contains just 3 patches:
- booked-page-flag
> Andrew Morton (AM) writes:
AM> What lock protects the fields in struct ext[234]_reserve_window from being
AM> concurrently modified by two CPUs? None, it seems. Ditto
AM> ext[234]_reserve_window_node. i_mutex will cover it for write(), but not
AM> for pageout over a file hole. If we
> Jan Blunck (JB) writes:
JB> Nope, d_alloc() is setting d_flags to DCACHE_UNHASHED. Therefore it is
JB> not found by __d_lookup() until it is rehashed, which is implicitly
JB> done by ->lookup().
that means we can have two processes allocated dentry for
same name. they'll call ->lookup() each ag
> Jan Blunck (JB) writes:
JB> i_sem does NOT protect the dcache. Also not in real_lookup(). The lock
JB> must be acquired for ->lookup() and because we might sleep on i_sem, we
JB> have to get it early and check for repopulation of the dcache.
dentry is part of dcache, right? i_sem prote
> Jan Blunck (JB) writes:
>> 1) i_sem protects dcache too
JB> Where? i_sem is the per-inode lock, and shouldn't be used else.
read comments in fs/namei.c:real_lookup()
>> 2) tmpfs has no "own" data, so we can use it this way (see 2nd patch)
>> 3) I have pdirops patch for ext3, but it ne
> Jan Blunck (JB) writes:
JB> With luck you have s_pdirops_size (or 1024) different renames altering
JB> concurrently one directory inode. Therefore you need a lock protecting
JB> your filesystem data. This is basically the job done by i_sem. So in
JB> my opinion you only move "The Problem
Index: linux-2.6.10/mm/shmem.c
===
--- linux-2.6.10.orig/mm/shmem.c 2005-01-28 19:32:16.0 +0300
+++ linux-2.6.10/mm/shmem.c 2005-02-19 20:05:32.642599576 +0300
@@ -1849,7 +1849,7 @@
#endif
};
-static int shmem_
 fs/inode.c         |  1
 fs/namei.c         | 66 ++---
 include/linux/fs.h | 11
 3 files changed, 54 insertions(+), 24 deletions(-)
Index: linux-2.6.10/fs/namei.c
===
Good day Al and all
could you review a couple of patches that implement $subj
for vfs and tmpfs? In short, the idea is that we can
protect operations by taking a semaphore related to a set
of names. definitely, protection at the vfs layer isn't
enough and filesystems will need to protect their own
structures by i
> Sonny Rao (SR) writes:
SR> Alex, small buglet: if the FIBMAP ioctl gets called on a file with
SR> delayed allocation, you need to flush it (or at least allocate) before
SR> returning the mappings. This doesn't seem to work properly at
SR> present.
good catch. thanks.
Good day all,
I've updated the patchset against 2.6.10. A bunch of bugs have been
fixed and mballoc now behaves a bit smarter. The extents and mballoc
patches collect some stats that they print upon umount. NOTE: they must
not be used to store important data. A lot of things are to be done.
Please rev
>>>>> Stephen C Tweedie (SCT) writes:
SCT> Hi,
SCT> On Tue, 2005-01-25 at 19:30, Alex Tomas wrote:
>> >> journal_dirty_metadata(handle, bh)
>> >> {
>> >> transaction->t_reserved--;
>> >> handle->h_buffer_
> Stephen C Tweedie (SCT) writes:
>> journal_dirty_metadata(handle, bh)
>> {
>> transaction->t_reserved--;
>> handle->h_buffer_credits--;
>> if (jh->b_tcount > 0) {
>> /* modified, no need to track it any more */
>> transaction->t_outstanding_credits++;
>>
Hi, could you review the following solution?
t_outstanding_credits - number of _modified_ blocks in the transaction
t_reserved - number of blocks reserved by all running handles
transaction size = t_outstanding_credits + t_reserved;
#define TSIZE(t)((t)->t_outstanding_credits + (t)
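
The accounting described above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the patch itself: `struct txn`, `struct handle`, `txn_size()`, `handle_start()`, `handle_dirty()` and `handle_stop()` are hypothetical stand-ins, and the rule that a block moves from "reserved" to "outstanding" the first time it is dirtied is an assumption drawn from the description, not from the jbd code.

```c
#include <assert.h>

/* Hypothetical stand-ins for the jbd structures; only the two
 * counters named in the message are modelled here. */
struct txn {
	int t_outstanding_credits;	/* blocks really modified in the transaction */
	int t_reserved;			/* blocks reserved by running handles, not yet used */
};

struct handle {
	struct txn *h_transaction;
	int h_buffer_credits;		/* credits this handle may still spend */
};

/* transaction size = t_outstanding_credits + t_reserved */
static int txn_size(const struct txn *t)
{
	return t->t_outstanding_credits + t->t_reserved;
}

/* Starting a handle adds its whole reservation to the transaction size. */
static struct handle handle_start(struct txn *t, int nblocks)
{
	struct handle h = { t, nblocks };

	t->t_reserved += nblocks;
	return h;
}

/* Dirtying a block spends one reserved credit. If the block was not yet
 * part of the transaction (first_time != 0, an assumed flag), it now
 * counts as a modified block. */
static void handle_dirty(struct handle *h, int first_time)
{
	assert(h->h_buffer_credits > 0);
	h->h_buffer_credits--;
	h->h_transaction->t_reserved--;
	if (first_time)
		h->h_transaction->t_outstanding_credits++;
}

/* Stopping a handle gives back whatever it reserved but never used. */
static void handle_stop(struct handle *h)
{
	h->h_transaction->t_reserved -= h->h_buffer_credits;
	h->h_buffer_credits = 0;
}
```

With these helpers, a handle that reserves 136 blocks but dirties only 10 leaves the transaction at size 10 once it stops, instead of charging the full reservation.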
> Stephen C Tweedie (SCT) writes:
>> + /* return credit back to the handle if it was really spent */
>> + if (credits) {
>> + handle->h_buffer_credits++;
>> + spin_lock(&handle->h_transaction->t_handle_lock);
>> + handle->h_transaction->t_outstandi
mmit is expensive
and correct reservation allows us to avoid needless commits. here
is the patch. tested on UP.
Signed-off-by: Alex Tomas <[EMAIL PROTECTED]>
Index: linux-2.6.7/fs/jbd/transaction.c
===
--- linux-2.6.7.orig/fs/jbd/t
> Stephen C Tweedie (SCT) writes:
>> + /* return credit back to the handle if it was really spent */
>> + if (credits)
>> + handle->h_buffer_credits++;
>> + jh->b_tcount--;
>> + if (jh->b_tcount == 0) {
>> + /*
>> +* this was last reference to
> Stephen C Tweedie (SCT) writes:
SCT> /*
SCT>  * Be pessimistic here about the number of those free blocks which
SCT>  * might be required for log descriptor control blocks.
SCT>  */
SCT> ...
SCT> left -= (left >> 3);
oops. i overlooked this line. so, the fix becomes minor
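
For reference, the quoted `left -= (left >> 3);` holds back one eighth of the free blocks (rounded down) as headroom. A tiny sketch of the arithmetic, where `pessimistic_free()` is a hypothetical helper name and not a jbd function:

```c
#include <assert.h>

/* Hypothetical helper: free blocks that remain usable after holding
 * back left/8 (integer shift, so rounded down) for descriptor blocks. */
static int pessimistic_free(int left)
{
	assert(left >= 0);
	left -= (left >> 3);
	return left;
}
```

For example, 1024 free blocks are treated as 896, while anything below 8 blocks is left unchanged because the shift yields zero.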
> Stephen C Tweedie (SCT) writes:
SCT> I don't see how that "limit" is relevant here. wbuf is nothing but the
SCT> size of the IO batches we pass to ll_rw_block() during that commit
SCT> phase. j_free affects the total size of space the *entire* commit has
SCT> to run into, and (as akpm h
ion. for example, removal of 500MB file reserves
136 blocks, but only 10 blocks go to the log. a commit is expensive
and correct reservation allows us to avoid needless commits. here
is the patch. tested on UP.
thanks, Alex
Signed-off-by: Alex Tomas <[EMAIL PROTECTED]>
Index: linu
xattr very much.
Signed-off-by: Alex Tomas <[EMAIL PROTECTED]>
Index: linux-2.6.7/include/linux/journal-head.h
===
--- linux-2.6.7.orig/include/linux/journal-head.h 2003-06-24 18:05:26.0 +0400
+++ linux-2.6.7/include
nerates too many descriptor blocks
because static array wbuf can hold 64 blocks only. The fix is to have
persistent array big enough to hold max. possible blocks.
Signed-off-by: Alex Tomas <[EMAIL PROTECTED]>
Index: linux-2.6.7/include/linux/jbd.h