On Thu, Sep 08, 2011 at 08:11:00AM -0400, Stefan Berger wrote:
> On 09/08/2011 06:32 AM, Michael S. Tsirkin wrote:
> >On Wed, Sep 07, 2011 at 08:16:27PM -0400, Stefan Berger wrote:
> >>On 09/07/2011 02:55 PM, Michael S. Tsirkin wrote:
> >>>On Thu, Sep 01, 2011 at 10:23:51PM -0400, Stefan Berger wrote:
> >>>>>>An additional 'layer' for reading and writing the blobs to the
> >>>>>>underlying block storage is added. This layer encrypts the blobs
> >>>>>>for writing if a key is available. Similarly it decrypts the
> >>>>>>blobs after reading.
> >>>So a couple of further thoughts:
> >>>1. Raw storage should work too, and with e.g. NFS migration will be fine,
> >>>right?
> >>>   So I'd say it's worth supporting.
> >>NFS via shared storage, yes, but not migration via Qemu's block
> >>migration mechanism. If snapshotting was supposed to be a feature to
> >>support then that's only possible via block storage (QCoW2 in
> >>particular).
> >As disk has the same limitation, that sounds fine.
> >Let the user decide whether snapshotting is needed,
> >same as disk.
> >
> >>Adding plain file support to the TPM code so it can store its 3
> >>blobs into adds quite a bit of complexity to the code. The command
> >>line parameter that previously pointed to QCoW2 image file would
> >>probably have to point to a directory where files for the 3 blobs
> >>can be written into. Besides that, snapshotting would actually have
> >>to be prevented maybe through registering a (fake) file of other
> >>than QCoW2 type since the plain TPM files won't handle snapshotting
> >>correctly, either, and QEMU pretty much would have to be prevented
> >>from doing snapshotting at all. Maybe there's an API for this, but I
> >>don't know. Though why create this additional complexity?
> >>I don't mind relaxing the requirement of using a QCoW2 image and
> >>allowing for example RAW images (that then automatically prevent the
> >>snapshotting from happening) but the same code I now have would work
> >>for writing the blobs into the single file.
> >Right. Write all blobs into a single file at different
> >offsets, or something.
>
> That's exactly what I am doing already. Just that I am doing this
> with Qemu's BlockStorage (bdrv) writing to sectors rather than
> seek()ing in files. To avoid more complexity I'd rather not
> introduce more code handling plain files but rely on all the image
> formats that qemu already supports and that give features like
> encryption (QCoW2 only), snapshotting (QCoW2 only) and block
> migration (presumably all of them). Plain files offer none of that.
> Devices that need to write their state to persistent storage really
> have to aim for doing this through Qemu's bdrv since they will
> otherwise be the ones killing the snapshot feature. TPM certainly
> doesn't want to be one of them. If the user doesn't want
> snapshotting to be supported since his VM image files are not QCoW2
> type of files, just create a raw image file for the TPM's persistent
> state and bdrv will automatically prevent snapshotting. The point is
> that the TPM code now using the bdrv layer works with any image
> format already.
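
For readers following along, here is a minimal Python sketch of the "all blobs in one image at sector offsets" idea. It is not the TPM patch code: the 512-byte sector size, the 12-byte directory entry layout, the blob names and sizes, and all function names are assumptions made for illustration, and a bytearray stands in for the image that the real code reaches through bdrv. The first sector is reserved for a directory of offsets, sizes and crc32 values, as described later in this thread.

```python
import struct
import zlib

SECTOR = 512  # assumed sector size

def plan_layout(max_sizes):
    """Give each blob a sector-aligned offset; sector 0 holds the directory."""
    layout, offset = {}, SECTOR
    for name, size in max_sizes.items():
        layout[name] = offset
        offset += -(-size // SECTOR) * SECTOR  # round up to whole sectors
    return layout, offset  # final offset is the required image size

def write_blob(image, layout, slot, name, blob):
    """Store blob at its offset and record (offset, size, crc32) in sector 0.

    The caller must keep blob within the worst-case size used for planning.
    """
    off = layout[name]
    image[off:off + len(blob)] = blob
    struct.pack_into("<III", image, slot * 12, off, len(blob), zlib.crc32(blob))

def read_blob(image, slot):
    """Load the blob described by directory slot and verify its checksum."""
    off, size, crc = struct.unpack_from("<III", image, slot * 12)
    blob = bytes(image[off:off + size])
    if zlib.crc32(blob) != crc:
        raise ValueError("checksum mismatch")
    return blob

# Hypothetical worst-case sizes for three blobs.
sizes = {"permanent": 3000, "savestate": 1000, "volatile": 5000}
layout, total = plan_layout(sizes)
image = bytearray(total)
write_blob(image, layout, 0, "permanent", b"example state")
```

Because every blob has a fixed slot, rewriting one blob touches only its own sectors plus the directory sector.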

Ah, that's fine then. I had an impression there was a qcow only
limitation, not sure what in the code gave me that impression.

> >>>2. File backed nvram is interesting outside tpm.
> >>>   For example, vpd and chassis number for pci, eeprom emulation for
> >>>   network cards.
> >>>   Using a file per device might be inconvenient though.
> >>>   So please think of a format and API that will allow sections
> >>>   for use by different devices.
> >>Also here 'snapshotting' is the most 'demanding' feature of QEMU I
> >>would say. Snapshotting isn't easily supported outside of the block
> >>layer from what I understand. Once you are tied to the block layer
> >>you end up having to use images and those don't grow quite well. So
> >>other devices wanting to use those type of devices would need to
> >>know what the worst case sizes are for writing their state into --
> >>unless an image format is created that can grow.
> >>
> >>As for the format: Ideally all devices could write into one file,
> >>right? That would at least prevent too many files besides the VM's
> >>image file from floating around which presumably makes image
> >>management easier. Following the above, you add up all the worst
> >>case sizes the individual devices may need for their blobs and
> >>create an image with that capacity. Then you need some form of a
> >>(primitive?) directory that lets you write blobs into that storage.
> >>Assuming there were well defined names for those devices one could
> >>say for example store this blob under the name
> >>'tpm-permanent-state' and later on load it under that name. The
> >>possible size of the directory would have to be considered as
> >>well... I do something like that for the TPM where I have up to 3
> >>such blobs that I store.
> >>
> >>The bad thing about the above is of course the need to know what the
> >>sum of all the worst case sizes is.
> >A typical use case I know about has prepared vpd/eeprom content.
> >We'll typically need a tool to get binary blobs and put that into the
> >file image. That tool can do the necessary math.
> >We could also integrate this into qemu-img if we like.
> >
> >>So a growable image format would
> >>be quite good to have. I haven't followed the conversations much,
> >>but is that something QCoW3 would support?
> >I don't follow - does TPM need a growable image format? Why?
> >Hardware typically has fixed amount of memory :)
> Ideally the user wouldn't have to worry about creating the single
> file for persistent storage for all the devices at all but Qemu
> could 'somehow' do this.
> Assume the user starts the VM with a device having an EEPROM. Now
> that device has the need for 10k of persistent storage. So somehow
> with the limitations of images that don't grow you have to have
> created an image of at least 10k a priori. Later the user adds
> another device to the same VM that needs 40k of persistent storage.
> What now? Dispose the old image with the EEPROM data and create a
> new image with at least 50k to hold both their data? Or add another
> image with just 40k to hold that device's persistent state? I'd
> rather have the 10k image grow to 50k and accommodate both state
> blobs...

I see, yes, might be useful. But even without that, simple users -
without hotplug - will be able to have a single file with all data,
and advanced users will be able to have a file per device. Not ideal
but I think manageable.

> >>>3. Home-grown file formats give us enough trouble in migration.
> >>>   Could this use one of the variants of ASN.1?
> >>>   There are portable libraries to read/write that, even.
> >>>
> >>I am not sure what 'this' refers to. What I am doing with the TPM is
> >>writing 3 independent blobs at certain offset into the QCoW2 block
> >>file. A directory in the first sector holds the offsets, sizes and
> >>crc32's of these (unencrypted) blobs.

By the way, why do we checksum data? Should that be optional?

> >Right.
> >It's the encoding of the directory that is custom,
> >and that bothers me. I'd prefer a format that is self-describing and
> >self-delimiting, give a way to inspect the data using external tools.
> Nothing would prevent us from defining a data structure for that
> directory as long as that data structure accommodates all use cases
> of today and especially tomorrow :-).

Right, so please give thought to the proposal of using a subset of BER.

> >>I am not that familiar with ASN.1 except that from what I have seen
> >>it looks like a fairly terrible format needing an object language to
> >>create a parser from etc. not to mention the problems I had with
> >>snacc trying to compile the ASN.1 object language of an RFC...
> >>
> >>   Stefan
> >Sorry about the confusion, we don't need the notation, I don't mean that.
> >I mean use a subset of the ASN.1 basic encoding
> >http://homepages.dcc.ufmg.br/~coelho/nm/asn.1.intro.pdf
> >
> >So we could have a set of sequences, with an ascii string (a tag)
> >followed by an octet string (content).
> >
> >
> I think the data layout in the image should be in such format that
> you don't have to re-write the whole content of the image if a blob
> is stored.

With a predefined blob size, we can use octet strings and not have to
rewrite anything: find the right octet and change it in place.

> I think a directory at the beginning could solve this.

It could, but it's not needed for that.

> To make it simple one probably would need to know how big the
> 'directory' could be otherwise one has to get into allocation of
> sectors so that once the directory was to grow beyond 512 bytes that
> one would know where its next data are written into. The same is
> true for the devices' data blobs. If one knows the sizes of all the
> blobs one can lay them out to start and end at specific offsets in
> the image. And knowing the size of all the blobs helps in creating
> the image of correct size.
> Well, all this is a work-around for not having a 'filesystem'.
>
>    Stefan
>

Sounds like overkill. A sequence with tags in DER format is much easier.

-- 
MST
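
To illustrate the "sequence with tags in DER format" suggestion, a rough Python sketch follows. It is only one possible reading of the proposal, not existing QEMU code: the helper names (`der_len`, `der_tlv`, `encode_store`, `read_tlv`, `find_content`, `update_in_place`) and the entry name `nic-eeprom` are made up for illustration. Each entry is a SEQUENCE holding an IA5String name and an OCTET STRING payload. Since blob sizes are fixed, a payload can be overwritten in place without re-encoding anything else.

```python
# Hypothetical sketch, not QEMU code: a DER-encoded store of named blobs.
SEQUENCE, OCTET_STRING, IA5STRING = 0x30, 0x04, 0x16

def der_len(n):
    """DER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def der_tlv(tag, value):
    """Encode one tag-length-value element."""
    return bytes([tag]) + der_len(len(value)) + value

def encode_store(entries):
    """Encode a list of (name, blob) pairs as one outer SEQUENCE."""
    items = b"".join(
        der_tlv(SEQUENCE,
                der_tlv(IA5STRING, name.encode("ascii")) +
                der_tlv(OCTET_STRING, blob))
        for name, blob in entries)
    return der_tlv(SEQUENCE, items)

def read_tlv(buf, pos):
    """Return (tag, value_start, value_end) for the element at pos."""
    tag, first = buf[pos], buf[pos + 1]
    pos += 2
    if first < 0x80:
        length = first
    else:
        n = first & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return tag, pos, pos + length

def find_content(buf, name):
    """Locate the (start, end) byte range of the blob stored under name."""
    _, pos, end = read_tlv(buf, 0)                   # outer SEQUENCE
    while pos < end:
        _, e_start, e_end = read_tlv(buf, pos)       # entry SEQUENCE
        _, n_start, n_end = read_tlv(buf, e_start)   # IA5String name
        _, c_start, c_end = read_tlv(buf, n_end)     # OCTET STRING content
        if bytes(buf[n_start:n_end]).decode("ascii") == name:
            return c_start, c_end
        pos = e_end
    raise KeyError(name)

def update_in_place(buf, name, blob):
    """Overwrite a blob without changing the size of the encoding."""
    start, end = find_content(buf, name)
    if len(blob) != end - start:
        raise ValueError("blob size is fixed")
    buf[start:end] = blob

# Example: two fixed-size blobs, one updated in place.
image = bytearray(encode_store([("tpm-permanent-state", bytes(200)),
                                ("nic-eeprom", bytes(16))]))
update_in_place(image, "nic-eeprom", b"\xaa" * 16)
```

The resulting file stays self-describing, so a generic ASN.1 dump tool should be able to walk it, while writers only touch the bytes of the blob they own.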