On Wed, Mar 16, 2016 at 10:18:19PM -0700, Gregory Farnum wrote:
> would've been nice if they were upstream. What *is* a big deal for
> FileStore (and would be easy to take advantage of) is the thematically
> similar O_NOMTIME flag, which is also about reducing metadata updates
> and got blocked on
> "Jeff" == Jeff Moyer writes:
Jeff> TRIM/UNMAP isn't just supported on solid state devices, though. I
Jeff> do recall some enterprise thinly provisioned storage that would
Jeff> take ages to discard large regions. I think that caused us to
Jeff> change the defaults for mkfs, right?
I thin
On Wed, Mar 16, 2016 at 5:33 PM, Eric Sandeen wrote:
> I may have lost the thread at this point, with poor Darrick's original
> patch submission devolving into a long thread about a NO_HIDE_STALE patch
> used at Google, but I don't *think* Ceph ever asked for NO_HIDE_STALE.
>
> At least I can't fi
On Thu, Mar 17, 2016 at 02:00:18PM -0700, Chris Mason wrote:
>
> Thinking more, my guess is that google will just keep doing what they
> are already doing ;) But there could be a flag in sysfs dedicated to
> trim-for-fallocate so admins can see what their devices are reporting.
> readonly in main
On Wed, Mar 16 2016, Theodore Ts'o wrote:
> On Wed, Mar 16, 2016 at 09:33:13AM +1100, Dave Chinner wrote:
>>
>> Stale data escaping containment is a security issue. Enabling
>> generic kernel mechanisms to *enable containment escape* is
>> fundamentally wrong, and relying on userspace to Do The R
On Thu, Mar 17, 2016 at 02:49:06PM -0600, Andreas Dilger wrote:
> On Mar 17, 2016, at 12:35 PM, Chris Mason wrote:
> >
> > On Thu, Mar 17, 2016 at 10:47:29AM -0700, Linus Torvalds wrote:
> >> On Wed, Mar 16, 2016 at 10:18 PM, Gregory Farnum wrote:
> >>>
> >>> So we've not asked for NO_HIDE_STAL
On Wed, Mar 16, 2016 at 10:18 PM, Gregory Farnum wrote:
>
> So we've not asked for NO_HIDE_STALE on the mailing lists, but I think
> it was one of the problems Sage had using xfs in his BlueStore
> implementation and was a big part of why it moved to pure userspace.
> FileStore might use NO_HIDE_S
"Theodore Ts'o" writes:
> I do think that using TRIM in various cases where we are doing an
> fallocate does make sense for non-rotational devices. In general TRIM
> should be fast enough that I'd be surprised that people would be
> complaining --- especially since most of the time, falloc
On Wed, Mar 16, 2016 at 03:45:49PM -0600, Andreas Dilger wrote:
> > Clearly, the performance hit of unwritten extent conversion is large
> > enough to tempt people to ask for no-hide-stale. But I'd rather hear
> > that directly from a developer, Ceph or otherwise.
>
> I suspect that this gets sig
On Thu, Mar 17, 2016 at 10:50 AM, Ric Wheeler wrote:
>>
>> That argues against worrying about this all in the kernel unless there
>> are other users.
>
> Just a note, when Greg says "user space solution", Ceph is looking at
> writing directly to raw block devices which is kind of a throwback to
On Tue, Mar 15, 2016 at 05:51:17PM -0700, Chris Mason wrote:
> On Tue, Mar 15, 2016 at 07:30:14PM -0500, Eric Sandeen wrote:
> > On 3/15/16 7:06 PM, Linus Torvalds wrote:
> > > On Tue, Mar 15, 2016 at 4:52 PM, Dave Chinner wrote:
> > >> >
> > >> > It is pretty clear that the onus is on the patch s
On Thu, Mar 17, 2016 at 10:47:29AM -0700, Linus Torvalds wrote:
> On Wed, Mar 16, 2016 at 10:18 PM, Gregory Farnum wrote:
> >
> > So we've not asked for NO_HIDE_STALE on the mailing lists, but I think
> > it was one of the problems Sage had using xfs in his BlueStore
> > implementation and was a b
On Tue, Mar 15, 2016 at 06:51:39PM -0700, Darrick J. Wong wrote:
> On Tue, Mar 15, 2016 at 06:52:24PM -0400, Theodore Ts'o wrote:
> > On Wed, Mar 16, 2016 at 09:33:13AM +1100, Dave Chinner wrote:
> > >
> > > Stale data escaping containment is a security issue. Enabling
> > > generic kernel mechani
On Mar 15, 2016, at 7:51 PM, Darrick J. Wong wrote:
>
> On Tue, Mar 15, 2016 at 06:52:24PM -0400, Theodore Ts'o wrote:
>> On Wed, Mar 16, 2016 at 09:33:13AM +1100, Dave Chinner wrote:
>>>
>>> Stale data escaping containment is a security issue. Enabling
>>> generic kernel mechanisms to *enable c
On Thu, Mar 17, 2016 at 11:52 PM, Gregory Farnum wrote:
>
> I wasn't really involved in this stuff but I gather from looking at
> http://www.spinics.net/lists/xfs/msg36869.html that any durability
> command other than fdatasync is going to write out the mtime updates
> to the inodes on disk. Given
On Thu, Mar 17, 2016 at 12:01:16PM +1100, Dave Chinner wrote:
> On Tue, Mar 15, 2016 at 06:51:39PM -0700, Darrick J. Wong wrote:
> > On Tue, Mar 15, 2016 at 06:52:24PM -0400, Theodore Ts'o wrote:
> > > On Wed, Mar 16, 2016 at 09:33:13AM +1100, Dave Chinner wrote:
> > > >
> > > > Stale data escapin
On Mar 17, 2016, at 12:35 PM, Chris Mason wrote:
>
> On Thu, Mar 17, 2016 at 10:47:29AM -0700, Linus Torvalds wrote:
>> On Wed, Mar 16, 2016 at 10:18 PM, Gregory Farnum wrote:
>>>
>>> So we've not asked for NO_HIDE_STALE on the mailing lists, but I think
>>> it was one of the problems Sage had
On Wed, Mar 16, 2016 at 07:33:55PM -0500, Eric Sandeen wrote:
> I may have lost the thread at this point, with poor Darrick's original
> patch submission devolving into a long thread about a NO_HIDE_STALE patch
> used at Google, but I don't *think* Ceph ever asked for NO_HIDE_STALE.
>
> At least I
On 03/16/2016 06:23 PM, Chris Mason wrote:
On Tue, Mar 15, 2016 at 05:51:17PM -0700, Chris Mason wrote:
On Tue, Mar 15, 2016 at 07:30:14PM -0500, Eric Sandeen wrote:
On 3/15/16 7:06 PM, Linus Torvalds wrote:
On Tue, Mar 15, 2016 at 4:52 PM, Dave Chinner wrote:
It is pretty clear that the onu
On 3/16/16 7:15 PM, Theodore Ts'o wrote:
> On Wed, Mar 16, 2016 at 03:45:49PM -0600, Andreas Dilger wrote:
>>> Clearly, the performance hit of unwritten extent conversion is large
>>> enough to tempt people to ask for no-hide-stale. But I'd rather hear
>>> that directly from a developer, Ceph or o
On 03/17/2016 01:47 PM, Linus Torvalds wrote:
On Wed, Mar 16, 2016 at 10:18 PM, Gregory Farnum wrote:
So we've not asked for NO_HIDE_STALE on the mailing lists, but I think
it was one of the problems Sage had using xfs in his BlueStore
implementation and was a big part of why it moved to pure u
On Thu, Mar 17, 2016 at 10:47 AM, Linus Torvalds
wrote:
> On Wed, Mar 16, 2016 at 10:18 PM, Gregory Farnum wrote:
>>
>> So we've not asked for NO_HIDE_STALE on the mailing lists, but I think
>> it was one of the problems Sage had using xfs in his BlueStore
>> implementation and was a big part of
On Tue, Mar 15, 2016 at 06:52:24PM -0400, Theodore Ts'o wrote:
> On Wed, Mar 16, 2016 at 09:33:13AM +1100, Dave Chinner wrote:
> >
> > Stale data escaping containment is a security issue. Enabling
> > generic kernel mechanisms to *enable containment escape* is
> > fundamentally wrong, and relying
On Tue, Mar 15, 2016 at 07:30:14PM -0500, Eric Sandeen wrote:
> On 3/15/16 7:06 PM, Linus Torvalds wrote:
> > On Tue, Mar 15, 2016 at 4:52 PM, Dave Chinner wrote:
> >> >
> >> > It is pretty clear that the onus is on the patch submitter to
> >> > provide justification for inclusion, not for the rev
On 3/15/16 7:06 PM, Linus Torvalds wrote:
> On Tue, Mar 15, 2016 at 4:52 PM, Dave Chinner wrote:
>> >
>> > It is pretty clear that the onus is on the patch submitter to
>> > provide justification for inclusion, not for the reviewer/Maintainer
>> > to have to prove that the solution is unworkable.
On Tue, Mar 15, 2016 at 04:14:32PM -0700, Linus Torvalds wrote:
> On Tue, Mar 15, 2016 at 4:06 PM, Linus Torvalds
> wrote:
> >
> > And yes, "keep the patch entirely inside google" is obviously one good
> > way to limit the interface. But if there are really other groups that
> > want to explore th
On Tue, Mar 15, 2016 at 4:52 PM, Dave Chinner wrote:
>
> It is pretty clear that the onus is on the patch submitter to
> provide justification for inclusion, not for the reviewer/Maintainer
> to have to prove that the solution is unworkable.
I agree, but quite frankly, performance is a good justi
On Tue, Mar 15, 2016 at 04:06:10PM -0700, Linus Torvalds wrote:
> On Tue, Mar 15, 2016 at 3:33 PM, Dave Chinner wrote:
> >
> >> There's no "group based containment wall" that is some kind of
> >> absolute protection border.
> >
> > Precisely my point - it's being pitched as a generic containment
>
On Tue, Mar 15, 2016 at 4:06 PM, Linus Torvalds
wrote:
>
> And yes, "keep the patch entirely inside google" is obviously one good
> way to limit the interface. But if there are really other groups that
> want to explore this, then that sounds like a pretty horrible model
> too.
Side note: I reall
On Tue, Mar 15, 2016 at 3:33 PM, Dave Chinner wrote:
>
>> There's no "group based containment wall" that is some kind of
>> absolute protection border.
>
> Precisely my point - it's being pitched as a generic containment
> mechanism, but it really isn't.
No it hasn't.
It has been pitched as
"C
On Wed, Mar 16, 2016 at 09:33:13AM +1100, Dave Chinner wrote:
>
> Stale data escaping containment is a security issue. Enabling
> generic kernel mechanisms to *enable containment escape* is
> fundamentally wrong, and relying on userspace to Do The Right Thing
> is even more of a gamble, IMO.
We a
On 3/15/16 3:14 PM, Dave Chinner wrote:
> What we are missing is actual numbers that show that exposing stale
> data is a /significant/ win for these applications that are
> demanding it. And then we need evidence proving that the problem is
> actually systemic and not just a hack around a bad impl
On Tue, Mar 15, 2016 at 01:43:01PM -0700, Linus Torvalds wrote:
> On Tue, Mar 15, 2016 at 1:14 PM, Dave Chinner wrote:
> >
> > Root can still change the group id of a file that has exposed stale
> > data and hence make it visible outside of the group based
> > containment wall.
>
> Ok, Dave, now
On Tue, Mar 15, 2016 at 01:43:01PM -0700, Linus Torvalds wrote:
> Put another way: this is not about theoretical leaks - because those
> are totally irrelevant (in theory, the original discard writer had
> access to all that stale data anyway). This is about making it a
> practical interface that d
On Tue, Mar 15, 2016 at 1:14 PM, Dave Chinner wrote:
>
> Root can still change the group id of a file that has exposed stale
> data and hence make it visible outside of the group based
> containment wall.
Ok, Dave, now you're just being ridiculous.
The issue has never been - and *should* never b
On Mon, Mar 14, 2016 at 10:46:03AM -0400, Theodore Ts'o wrote:
> On Mon, Mar 14, 2016 at 06:34:00AM -0400, Ric Wheeler wrote:
> > I think that once we enter this mode, the local file system has effectively
> > ceded its role to prevent stale data exposure to the upper layer. In effect,
> > this cea
On Mon, Mar 14, 2016 at 06:34:00AM -0400, Ric Wheeler wrote:
> I think that once we enter this mode, the local file system has effectively
> ceded its role to prevent stale data exposure to the upper layer. In effect,
> this ceases to become a normal file system for any enabled process if we
> cont
On 03/13/2016 07:30 PM, Dave Chinner wrote:
On Fri, Mar 11, 2016 at 04:44:16PM -0800, Linus Torvalds wrote:
On Fri, Mar 11, 2016 at 4:35 PM, Theodore Ts'o wrote:
At the end of the day it's about whether you trust the userspace
program or not.
There's a big difference between "give the user ro
On Fri, Mar 11, 2016 at 04:44:16PM -0800, Linus Torvalds wrote:
> On Fri, Mar 11, 2016 at 4:35 PM, Theodore Ts'o wrote:
> >
> > At the end of the day it's about whether you trust the userspace
> > program or not.
>
> There's a big difference between "give the user rope", and "tie the
> rope in a
On 03/12/2016 08:19 AM, Theodore Ts'o wrote:
On Fri, Mar 11, 2016 at 04:44:16PM -0800, Linus Torvalds wrote:
There's a big difference between "give the user rope", and "tie the
rope in a noose and put a banana peel so that the user might stumble
into the rope and hang himself", though.
[...]
On Fri, Mar 11, 2016 at 4:35 PM, Theodore Ts'o wrote:
>
> At the end of the day it's about whether you trust the userspace
> program or not.
There's a big difference between "give the user rope", and "tie the
rope in a noose and put a banana peel so that the user might stumble
into the rope and h
On Sat, Mar 12, 2016 at 09:30:47AM +1100, Dave Chinner wrote:
> It's all well and good to restrict access to the fallocate() call to
> limit who can expose stale data, but it doesn't remove the fact it
> is easy for stale data to unintentionally escape the privileged
> group once it has been expose
On Fri, Mar 11, 2016 at 2:30 PM, Dave Chinner wrote:
> On Fri, Mar 11, 2016 at 10:25:30AM -0800, Linus Torvalds wrote:
>>
>> So you'd have to explicitly say "my setup is ok with hole punching".
>
> Except it's not hole punching that is the problem. [..]
> The problem here is
> preallocation of unw
On Fri, Mar 11, 2016 at 10:25:30AM -0800, Linus Torvalds wrote:
> On Fri, Mar 11, 2016 at 9:30 AM, Andy Lutomirski wrote:
> >
> > What if we had an ioctl to do these data-leaking operations that took,
> > as an extra parameter, an fd to the block device node. They allow
> > access if the fd point
On Fri, Mar 11, 2016 at 9:30 AM, Andy Lutomirski wrote:
>
> What if we had an ioctl to do these data-leaking operations that took,
> as an extra parameter, an fd to the block device node. They allow
> access if the fd points to the right inode and has FMODE_READ (and LSM
> checks say it's okay).
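Andy's idea gates the stale-data operation on a second fd that must refer to the filesystem's own block device node and must carry read access. The thread does not show the kernel side (FMODE_READ, LSM checks), so what follows is only a rough userspace analogue of that test. The helper name `bdev_fd_authorizes` is hypothetical, and comparing `st_rdev` with `st_dev` assumes a filesystem that sits directly on a single block device (it does not hold for, e.g., btrfs or overlayfs).

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper (userspace analogue of the proposed kernel check):
 * returns true only if bdev_fd is a block device node for the device that
 * backs the file open on file_fd, and bdev_fd was opened with read access. */
static bool bdev_fd_authorizes(int file_fd, int bdev_fd)
{
    struct stat file_st, bdev_st;
    int flags, mode;

    if (fstat(file_fd, &file_st) != 0 || fstat(bdev_fd, &bdev_st) != 0)
        return false;

    /* The "credential" fd must be a block device node... */
    if (!S_ISBLK(bdev_st.st_mode))
        return false;

    /* ...for the same device the file lives on (assumes st_dev names a
     * real block device, which is filesystem-dependent). */
    if (bdev_st.st_rdev != file_st.st_dev)
        return false;

    /* ...and it must have been opened for reading (stand-in for FMODE_READ). */
    flags = fcntl(bdev_fd, F_GETFL);
    if (flags < 0)
        return false;
    mode = flags & O_ACCMODE;
    return mode == O_RDONLY || mode == O_RDWR;
}
```

A character device such as `/dev/null`, a regular file, or an invalid fd is always rejected; only a readable fd on the matching block device passes.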
On Fri, Mar 11, 2016 at 9:23 AM, Linus Torvalds
wrote:
> On Fri, Mar 11, 2016 at 5:59 AM, One Thousand Gnomes
> wrote:
>>
>> > > We can do the security check at the filesystem level, because we have
>> > > sb->s_bdev->bd_inode, and if you have read and write permissions to
>> > > that inode, you
On Fri, Mar 11, 2016 at 5:59 AM, One Thousand Gnomes
wrote:
>
> > > We can do the security check at the filesystem level, because we have
> > > sb->s_bdev->bd_inode, and if you have read and write permissions to
> > > that inode, you might as well have permission to create an unsafe hole.
>
> Not i
On Fri, Mar 11, 2016 at 01:59:52PM +, One Thousand Gnomes wrote:
> > > We can do the security check at the filesystem level, because we have
> > > sb->s_bdev->bd_inode, and if you have read and write permissions to
> > > that inode, you might as well have permission to create an unsafe hole.
>
> > We can do the security check at the filesystem level, because we have
> > sb->s_bdev->bd_inode, and if you have read and write permissions to
> > that inode, you might as well have permission to create an unsafe hole.
Not if you don't have access to a block device node to open it, or there
are
On 03/11/2016 12:03 AM, Linus Torvalds wrote:
On Thu, Mar 10, 2016 at 6:58 AM, Ric Wheeler wrote:
What was objectionable at the time this patch was raised years back (not
just to me, but to pretty much every fs developer at LSF/MM that year)
centered on the concern that this would be viewed as
On Thu, Mar 10, 2016 at 10:33:49AM -0800, Linus Torvalds wrote:
> On Thu, Mar 10, 2016 at 6:58 AM, Ric Wheeler wrote:
> >
> > What was objectionable at the time this patch was raised years back (not
> > just to me, but to pretty much every fs developer at LSF/MM that year)
> > centered on the conc
On Thu, Mar 10, 2016 at 6:58 AM, Ric Wheeler wrote:
>
> What was objectionable at the time this patch was raised years back (not
> just to me, but to pretty much every fs developer at LSF/MM that year)
> centered on the concern that this would be viewed as a "performance" mode
> and we get pressur
On 03/10/2016 04:38 AM, Theodore Ts'o wrote:
On Wed, Mar 09, 2016 at 02:20:31PM -0800, Gregory Farnum wrote:
I really am sensitive to the security concerns, just know that if it's
a permanent blocker you're essentially blocking out a growing category
of disk users (who run on an awfully large nu
On Wed, Mar 09, 2016 at 02:20:31PM -0800, Gregory Farnum wrote:
> I really am sensitive to the security concerns, just know that if it's
> a permanent blocker you're essentially blocking out a growing category
> of disk users (who run on an awfully large number of disks!).
Or they just have to use
On Thu, Mar 3, 2016 at 3:10 PM, Dave Chinner wrote:
> On Thu, Mar 03, 2016 at 05:39:52PM -0500, Theodore Ts'o wrote:
>> On Thu, Mar 03, 2016 at 01:54:54PM -0500, Martin K. Petersen wrote:
>> > > "Christoph" == Christoph Hellwig writes:
>> >
>> > Christoph> - FALLOC_FL_PUNCH_HOLE assures zero
On 03/03/2016 11:56 PM, Dave Chinner wrote:
> That "new kind of write command" would enable delayed allocation
> algorithms to continue to work at the filesystem level on block
> devices that freespace management completely is offloaded to...
> Cheers, Dave.
This would advocate a uniform /interna
On Fri, Mar 04, 2016 at 10:10:50AM +1100, Dave Chinner wrote:
> You can tempt all you want, but it does not change the basic fact
> that it is dangerous and compromises system security. As such, it
> does not belong in upstream kernels. Especially in this day and age
> where ensuring the fundamenta
On Thu, Mar 03, 2016 at 05:39:52PM -0500, Theodore Ts'o wrote:
> On Thu, Mar 03, 2016 at 01:54:54PM -0500, Martin K. Petersen wrote:
> > > "Christoph" == Christoph Hellwig writes:
> >
> > Christoph> - FALLOC_FL_PUNCH_HOLE assures zeroes are returned, but
> > Christoph> space is deallocated a
On Thu, Mar 03, 2016 at 01:54:54PM -0500, Martin K. Petersen wrote:
> > "Christoph" == Christoph Hellwig writes:
>
> Christoph> - FALLOC_FL_PUNCH_HOLE assures zeroes are returned, but
> Christoph> space is deallocated as much as possible -
> Christoph> FALLOC_FL_ZERO_RANGE assures zeroes are
> "Christoph" == Christoph Hellwig writes:
Christoph> - FALLOC_FL_PUNCH_HOLE assures zeroes are returned, but
Christoph> space is deallocated as much as possible -
Christoph> FALLOC_FL_ZERO_RANGE assures zeroes are returned, AND blocks
Christoph> are actually allocated
That works for me. I
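Christoph's two semantics correspond to existing fallocate(2) modes. A minimal sketch, assuming a Linux filesystem that implements them: not every filesystem supports both (tmpfs, for example, handles hole punching but rejects FALLOC_FL_ZERO_RANGE), so callers should be prepared for EOPNOTSUPP. The helper names are hypothetical; FALLOC_FL_PUNCH_HOLE must be combined with FALLOC_FL_KEEP_SIZE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: 1 if every byte in [off, off+len) reads back as zero,
 * 0 if some byte is non-zero, -1 on read error or short file. */
static int range_reads_zero(int fd, off_t off, size_t len)
{
    char buf[4096];

    while (len > 0) {
        size_t chunk = len < sizeof(buf) ? len : sizeof(buf);
        ssize_t n = pread(fd, buf, chunk, off);
        if (n <= 0)
            return -1;
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] != 0)
                return 0;
        off += n;
        len -= (size_t)n;
    }
    return 1;
}

/* Zeroes are returned and space is deallocated as much as possible;
 * the file size is left unchanged (KEEP_SIZE is mandatory here). */
static int punch_hole(int fd, off_t off, off_t len)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, off, len);
}

/* Zeroes are returned AND the blocks stay allocated. */
static int zero_range(int fd, off_t off, off_t len)
{
    return fallocate(fd, FALLOC_FL_ZERO_RANGE, off, len);
}
```

How the zeroed range is materialized (unwritten extents, a device-level zeroing command, or plain written zeroes) is left to the filesystem; the interface only promises that reads return zeroes.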
On Thu, Mar 03, 2016 at 09:55:38AM -0800, Linus Torvalds wrote:
> But that essentially says that we shouldn't expose this interface at
> all (unless we trust our white-lists - I'm sure they are getting
> better, but if nobody has ever really _relied_ on the zeroing behavior
> of trim, then I guess
On Thu, Mar 3, 2016 at 10:01 AM, Martin K. Petersen
wrote:
>> "Linus" == Linus Torvalds writes:
>
> Linus> .. but the flag doesn't even set that. Even if you avoid TRIM,
> Linus> there is absolutely zero guarantees that WRITE_SAME would do
> Linus> "real storage blocks full of zeroes backing
> "Linus" == Linus Torvalds writes:
Linus> On Thu, Mar 3, 2016 at 9:02 AM, Theodore Ts'o wrote:
>>
>> There is a massive bug in the SATA specs about trim, which is that it
>> is considered advisory. So the storage device can throw it away
>> whenever it feels like it. (In practice, when i
On Thu, Mar 03, 2016 at 10:09:24AM -0800, Christoph Hellwig wrote:
> On Thu, Mar 03, 2016 at 01:01:11PM -0500, Martin K. Petersen wrote:
> > That's not entirely true. Writing the blocks may cause them to be
> > allocated on the storage device (depending on which flags we feed it in
> > WRITE SAME).
On Thu, Mar 03, 2016 at 01:01:11PM -0500, Martin K. Petersen wrote:
> That's not entirely true. Writing the blocks may cause them to be
> allocated on the storage device (depending on which flags we feed it in
> WRITE SAME).
>
> The filesystem people wanted the following semantics:
>
> - d
> "Linus" == Linus Torvalds writes:
Linus> .. but the flag doesn't even set that. Even if you avoid TRIM,
Linus> there is absolutely zero guarantees that WRITE_SAME would do
Linus> "real storage blocks full of zeroes backing the LBAs they just
Linus> wrote out".
That's not entirely true. Wri
On Thu, Mar 03, 2016 at 09:55:38AM -0800, Linus Torvalds wrote:
> Ugh.
>
> But that essentially says that we shouldn't expose this interface at
> all (unless we trust our white-lists - I'm sure they are getting
> better, but if nobody has ever really _relied_ on the zeroing behavior
> of trim, the
On Thu, Mar 3, 2016 at 9:02 AM, Theodore Ts'o wrote:
>
> There is a massive bug in the SATA specs about trim, which is that it
> is considered advisory. So the storage device can throw it away
> whenever it feels like it. (In practice, when it's too busy doing
> other things).
Ugh.
But that es
On Wed, Mar 02, 2016 at 03:49:53PM -0800, Linus Torvalds wrote:
> > No. This is not about enabling use of "that idiotic discard behavior", for
> > that there's BLKDISCARD. This ioctl does NOT use the handwavy old TRIM
> > advisory request thing that could return "fuzzy wuzzy" without violating th
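For comparison with the proposed BLKZEROOUT2, the existing BLKZEROOUT ioctl (like BLKDISCARD) takes a two-element `uint64_t` array holding {start, length} in bytes and has no flags word. A sketch of calling it on an open block device; the helper name `bdev_zeroout` is hypothetical, and the pre-check only mirrors the 512-byte alignment the kernel enforces, which remains the final authority.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/* Hypothetical helper: zero [start, start + len) on an open block device
 * using the existing BLKZEROOUT ioctl. Returns 0, or -1 with errno set. */
static int bdev_zeroout(int fd, uint64_t start, uint64_t len)
{
    uint64_t range[2] = { start, len };   /* {offset, length} in bytes */

    /* The kernel rejects ranges that are not 512-byte aligned. */
    if ((start | len) & 511) {
        errno = EINVAL;
        return -1;
    }
    return ioctl(fd, BLKZEROOUT, range);
}
```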
On Wed, Mar 2, 2016 at 2:56 PM, Darrick J. Wong wrote:
>
> Oh yes we do. Adding required-zero padding to allow for future increases of
> the expressiveness of an ioctl is very common.
>
> $ egrep -rn '(reserved|padding).*;' include/uapi/ | wc -l
> 564
Most of those should be for alignment reason
On Wed, Mar 02, 2016 at 10:52:01AM -0800, Linus Torvalds wrote:
> On Tue, Mar 1, 2016 at 8:09 PM, Darrick J. Wong
> wrote:
> > Create a new ioctl to expose the block layer's newfound ability to
> > issue either a zeroing discard, a WRITE SAME with a zero page, or a
> > regular write with the zero
On Tue, Mar 1, 2016 at 8:09 PM, Darrick J. Wong wrote:
> Create a new ioctl to expose the block layer's newfound ability to
> issue either a zeroing discard, a WRITE SAME with a zero page, or a
> regular write with the zero page. This BLKZEROOUT2 ioctl takes
> {start, length, flags} as parameters
Looks fine,
Reviewed-by: Christoph Hellwig
Create a new ioctl to expose the block layer's newfound ability to
issue either a zeroing discard, a WRITE SAME with a zero page, or a
regular write with the zero page. This BLKZEROOUT2 ioctl takes
{start, length, flags} as parameters. So far, the only flag available
is to enable the zeroing disc
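The patch description only says that BLKZEROOUT2 takes {start, length, flags}; Darrick's argument elsewhere in the thread is for required-zero padding so the ioctl can grow later. The sketch below illustrates that general pattern. Everything in it (the struct name, field layout, `ZO2_FLAG_DISCARD`, the reserved-array size, and the validation rules) is hypothetical and not taken from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* HYPOTHETICAL argument layout, for illustration only. */
#define ZO2_FLAG_DISCARD  (1ULL << 0)        /* hypothetical: allow zeroing discard */
#define ZO2_KNOWN_FLAGS   (ZO2_FLAG_DISCARD)

struct zeroout2_args {
    uint64_t start;        /* byte offset on the device */
    uint64_t length;       /* number of bytes to zero */
    uint64_t flags;        /* ZO2_FLAG_* bits */
    uint64_t reserved[5];  /* must be zero; room for future fields */
};

/* One plausible validation: reject unknown flags and non-zero reserved
 * words, so a future kernel can give them meaning without old callers
 * accidentally triggering new behavior. lbs is the logical block size
 * and is assumed to be a power of two. Returns 0 or -EINVAL. */
static int zeroout2_validate(const struct zeroout2_args *a,
                             uint64_t dev_size, uint32_t lbs)
{
    unsigned int i;

    if (a->flags & ~ZO2_KNOWN_FLAGS)
        return -EINVAL;
    for (i = 0; i < sizeof(a->reserved) / sizeof(a->reserved[0]); i++)
        if (a->reserved[i] != 0)
            return -EINVAL;
    if ((a->start | a->length) & (uint64_t)(lbs - 1))
        return -EINVAL;                      /* not block aligned */
    if (a->length == 0 || a->start > dev_size ||
        a->length > dev_size - a->start)
        return -EINVAL;                      /* empty or past end of device */
    return 0;
}
```

Checking reserved words on the way in is what makes the padding useful: if old kernels silently ignored them, new userspace could not tell whether a future field was honored.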