On 14.03.2011 15:47, Anthony Liguori wrote:
> On 03/14/2011 09:15 AM, Kevin Wolf wrote:
>>> The file system can keep a lot of these things around pretty easily but
>>> with your proposal, it seems like there can only be one. If you support
>>> many of them, I think you'll degenerate to something as complex as a
>>> reference count table.
>> IIUC, he already uses a refcount table.
>
> Well, he needs a separate mechanism to make trim/discard work, but for
> the snapshot discussion, a reference count table is avoided.
>
> The bitmap only covers whether the guest has accessed a block or not.
> Then there is a separate table that maps guest offsets to offsets within
> the file.
>
> I haven't thought hard about it, but my guess is that there is an
> ordering constraint between these two pieces of metadata which is why
> the journal is necessary. I get worried about the complexity of a
> journal even more than a reference count table.

Honestly I think that a journal is a good idea that we'll want to
implement in the long run. There are people who aren't really happy
about the dirty flag + fsck approach, and there are people who are
concerned about cluster leaks without fsck. Both problems should be
solved with a journal.

Compared to other questions in the discussion, I think it's only a
nice-to-have addition, though.

>> Actually, I think that a
>> refcount table is a requirement to provide the interesting properties
>> that internal snapshots have (see my other mail).
>
> Well the trick here AFAICT is that you're basically storing external
> snapshots internally. So it's sort of like a bunch of FVD formats
> embedded into a single image.

CQ, can you please clarify? From your description, Anthony seems to
understand something completely different than I do. Are its
characteristics more like qcow2's internal snapshots (which is what I
understand) or more like external snapshots (which is what Anthony seems
to understand)?

>> Refcount tables aren't a very complex thing either. In fact, it makes a
>> format much simpler to have one concept like refcount tables instead of
>> adding another different mechanism for each new feature that would be
>> natural with refcount tables.
>
> I think it's a reasonable design goal to minimize any metadata updates
> in the fast path. If we can write 1 piece of metadata versus writing 2,
> then it's worth exploring IMHO.
>
>> The only problem with them is that they are metadata that must be
>> updated. However, I think we have discussed enough how to avoid the
>> greatest part of that cost.
>
> Maybe I missed it, but in the WCE=0 mode, is it really possible to avoid
> the writes for the refcount table?

Protected by a dirty flag (and/or a journal), sure. I mean, wasn't that
the whole point of starting the qcow3 discussion?

Kevin
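
To illustrate the dirty flag + fsck idea referred to above, here is a toy
sketch in Python. It is not qcow2, QED or FVD code; the class, its fields
and its methods are all hypothetical and only model the principle: the
fast path writes the mapping table alone, refcounts are kept in memory,
and after an unclean shutdown they are rebuilt from the mapping table.

```python
class ToyImage:
    """Toy model of an image whose refcounts are protected by a dirty flag.

    All names are hypothetical; "on disk" state is just plain attributes.
    """

    def __init__(self):
        self.dirty = False        # "on disk" header flag
        self.mapping = {}         # "on disk": guest cluster -> file cluster
        self.disk_refcounts = {}  # "on disk": file cluster -> refcount
        self.mem_refcounts = {}   # in memory only, may be ahead of disk
        self.next_free = 0        # next unallocated file cluster

    def open(self):
        # A set dirty flag means the last session did not close cleanly,
        # so the on-disk refcounts cannot be trusted.
        if self.dirty:
            self.fsck()
        self.mem_refcounts = dict(self.disk_refcounts)
        self.dirty = True         # written once, before any allocation

    def allocate(self, guest_cluster):
        # Fast path: only the mapping table is updated "on disk".
        file_cluster = self.next_free
        self.next_free += 1
        self.mapping[guest_cluster] = file_cluster
        self.mem_refcounts[file_cluster] = 1  # deferred, memory only
        return file_cluster

    def close(self):
        # Clean shutdown: flush refcounts, then clear the flag.
        self.disk_refcounts = dict(self.mem_refcounts)
        self.dirty = False

    def crash(self):
        # Unclean shutdown: in-memory state is lost, the flag stays set.
        self.mem_refcounts = {}

    def fsck(self):
        # Rebuild refcounts by scanning the mapping table.
        rebuilt = {}
        for file_cluster in self.mapping.values():
            rebuilt[file_cluster] = rebuilt.get(file_cluster, 0) + 1
        self.disk_refcounts = rebuilt
        self.dirty = False
```

The cost of this scheme is the scan in fsck() after a crash, which is the
part a journal would replace.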