On Mon, 23 Dec 2002, Brian S. Julin wrote:
> On Mon, 23 Dec 2002, Rodolphe Ortalo wrote:
> > 0) What should a KGI-accel-oriented developer do next? :-)
>
> I'd like to see the get/put issues worked out personally, and of
> course, whatever can be done to bring the accelerator command queues
> up to max speed (by using optimal data transfer modes) deserves attention.
I get the point on get/put (more on this later). However, do you mean
AGP too here when speaking of "optimal data transfer mode"?

> Also, do remember that we are supposed to be providing a *secure* graphics
> system, so putting some thought into how to prevent an application
> from using the accelerator or graphics registers in a hostile fashion
> is important, rather than reworking the whole design later because
> it wasn't up to that task.  I don't know what your code's status is on
> these two points.

My driver walks the command stream and is probably already resistant to
bugs and possible hardware freezes (see
mga_chipset_accel_{1x64,g200,g400}_check() in chipset/Matrox/Gx00-meta.c),
except for the AGP issues IIRC. Of course, this deserves some more
testing, but I think the infrastructure is in place to prevent an
application from sending a malicious command stream.

Wrt the Gx00 driver, I think two points deserve some (possibly difficult)
testing:
 1- safety/security of the accel DMA buffer checks;
 2- validity of graphic context switches (2 processes sharing the same
    accelerator engine) - this is written but nearly untested.
And then of course, there are the functionality range issues (24/32 bits,
more resolutions, etc.). Fortunately, there are not so many users asking
for them, so I still have time for experiments... ;-))

Concerning security, I think KGI itself still lacks the necessary
verification of access rights wrt resources. This should probably be
controlled/solved first.

> > 1) Are {Get,Put} functions required for an accel sublib? Currently they
> > are not implemented and I am reluctant to implement them the straight
> > way - though I may try it for experiments. (If the userspace buffer is not
> > in DMA-capable memory, or not big enough, I guess I may end up locking the
> > machine.)
>
> They are not required, but IMO we should be testing out ways to
> do this.

I agree.
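To make the "walk the command stream" check mentioned above a bit more
concrete, here is a minimal sketch in C. It is NOT the code from
Gx00-meta.c: the stream layout (pairs of register offset and value), the
register ranges and the names reg_allowed() and check_stream() are all
made up for illustration. The only point is that every word is inspected
before anything reaches the engine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical window of registers an application may write. */
struct reg_range {
    uint32_t first;
    uint32_t last;
};

/* Made-up ranges; a real driver would list the chipset's safe registers. */
static const struct reg_range allowed[] = {
    { 0x1C00, 0x1CFF },   /* pretend drawing registers */
    { 0x2C00, 0x2CFF },   /* pretend texture registers */
};

/* Return 1 if the register offset lies in one of the allowed ranges. */
static int reg_allowed(uint32_t reg)
{
    size_t i;

    for (i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++) {
        if (reg >= allowed[i].first && reg <= allowed[i].last)
            return 1;
    }
    return 0;
}

/*
 * Walk a buffer made of (register, value) pairs.
 * Return 0 if the whole buffer is acceptable, -1 as soon as a
 * forbidden register or a truncated pair is found.
 */
int check_stream(const uint32_t *buf, size_t nwords)
{
    size_t i;

    if (nwords % 2 != 0)
        return -1;                 /* last pair is incomplete */

    for (i = 0; i < nwords; i += 2) {
        if (!reg_allowed(buf[i]))
            return -1;             /* reject the whole buffer */
    }
    return 0;
}
```

A buffer is rejected as a whole in this sketch; the driver would then
simply not submit it to the engine.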
> I'd like to see the put/get functions and the pixel functions all
> use the accelerator, because [...]

We already have a resource that works for the command sets sent to the
accel engine. But apparently, we really need something too for the *data*
sets used (read or written) by the engine.

The primary problem is probably not really in the accel commands. I guess
most chipsets have some way of busmastering transfers from main memory and
simply want the bus addresses to be correct (and the memory area to be
adequate). The big problem is in the data transfer from userspace to
kernelspace.

Given the constraints that should apply to such memory areas, I guess
the kernel should also allocate the areas. It would be too difficult to
try to handle normal userspace-allocated memory as a source for accel
commands.

We probably also need a new mechanism in KGI to do that. The accel
buffers currently available have some of the needed characteristics, but
the way they are exchanged with userspace and the fact that there is a
ring of buffers are not adequate. (Apart from that, they are pretty close
to what is needed - remember you can mmap several accel resources...)

I've already played with such things in KGI, with some 3D commands of
the G200/G400. The issue was to transfer triangle lists to the engine. In
this mode, the chipset gets triangle points and draws them.
I solved that specific issue by trying to use the current accel buffers.
Hence, I've introduced a buffer "tag", put at the top of each buffer,
that says whether it contains regular commands or triangle lists. (Look at
the unused function GX00_WRITE_TRIANGLE() and the associated data
structures in default/kgi/Matrox/Gx00/Gx00_accel.h.) It works for that
specific issue.
But it could be solved much more cleanly by indicating a separate memory
area containing the 3D data (I would still need a fake "command" to start
execution - but that would be a preferable hack to the current way of
multiplexing several buffer types on the same buffer ring).

Surely, such an approach does not really extend to sending data to the
graphics board. In theory you could do it like this (for example, copy
picture data to a command buffer, with an "IMAGE" tag) but that would be
neither very flexible nor convenient.

Note that an application can mmap *several* accel engine command streams,
so maybe we could have several types for them (one for commands, one for
images, one for triangles) - but then synchronizing them would be
difficult. I'd really rather have two separate resource types: commands
(FIFO-like) and data areas (array-like).

> 2) is easy to implement without any modifications to the KGI driver
> (at least in the Radeon's case.)  The overhead of the extra copy is
> absorbed entirely by the GPU and the graphics memory bus.  However, the
> CPU still spins on the slow(er than main memory) graphics bus while
> initially transferring the data.

You still require the CPU to copy to the framebuffer, no? So I don't
really see the improvement wrt a straightforward put like in the current
stubs. And you also have the problem of synchronizing the swatch state
(CPU-driven) and the command stream (GPU-driven).

> 3) requires that KGI know how to allocate and mmap nonswappable RAM buffers.
> I've done this before and though the linux kernel has been in flux in the
> past on this matter, what I've gleaned from changelogs is that the
> developers have finally been convinced that yes, it *is* important to
> some hardware to be able to allocate and MMAP nonswappable RAM, so if anything
> this should be getting easier.  (Note the RAM should also be MTRR'd
> as non-cached if possible.)
I guess it is also very useful for Gigabit Ethernet cards, or for some
databases that want to control the hard disk load directly. Anyway, that's
what is already done in KGI for accel command streams... No?

> I favor 3) for reasons listed below, but 3) and 2) can coexist, at least
> on Radeon, because the only difference to the command stream is the base
> address used for the swatch data.  An argument could be provided to
> display-kgi to change the size of the swatch and whether it is located
> in system RAM or VRAM.

And also the "origin" flag of the src-like register in the command stream
(and such a flag should be controlled by the driver before execution).
Personally, I would not like to see a "host-PCI" source address pass
untouched to the GPU. I'd rather my driver change the parms, and define
itself the addresses to use (plus check the size, etc.).

All in all, I would also favor solution 3. But it seems to me that a
single swatch area would not be very convenient. The application should be
able to allocate some "data" areas and then use virtual addresses within
these areas as a source for some accel commands. Then the in-kernel driver
can check/convert these addresses, and execute normally.

To summarize, concerning these "data" areas, we would need a resource that
would:
 1) allocate some amount of (DMA-capable, PCI/AGP-capable, unswappable,
    contiguous) memory adequate for direct addressing by the graphics
    accel engine.
 2) mmap this area into userspace for use by the application as nearly
    normal memory.
 3) allow the driver to quickly find that a virtual address is within one
    area, and check sizes wrt this address (and the area end);
 4) possibly offer an interface to unmmap the area from userspace while it
    is in use by the engine (I do not know if this is mandatory or if we
    can allow concurrent nondeterministic accesses - after all this is
    only data).
    If so:
 4bis) block the application if it tries to access the area while in use
    and allow the driver to wake the application up (or do it
    automatically via 5).
 5) offer an interface to the driver to keep track of "in-use" areas, lock
    these areas on behalf of the accel engine and some way to release them
    (possibly in conjunction with the accel command buffers execution
    state or wait queues).

Note that 4 and 4bis could also solve many of the "synchronization"
issues. Such a memory area can operate as a signal/message from the engine
to the application. (For example, on the Matrox, you can ask the engine to
write a value somewhere to signal the execution of some pieces of the
command stream.)
In fact 4+4bis and 5 are very similar (4+4bis is the userspace-kernel
side, 5 is the KGI-KGIdriver side).

I have to admit that I do not yet see very well how such a resource
should be implemented (except that I suspect it has a lot in common with
the accel resource currently available in KGI). But I see very well how I
would use it...

Rodolphe

> 1) If the RAM is "special," get/put operation using a swatch may return
> before LibGGI is done with data in the swatch.  So, before altering or
> retrieving the data, one should perform a ggiFlush manually to idle the accel.

So you allow concurrent accesses? Well, if the hardware busses can keep us
out of trouble, I have no objection. But we would need some external means
of synchronisation between the engine and the application (probably
ioctl()-based). I tend to like data-driven synchronisation a lot, so I'd
probably prefer to block/unblock the app and not let it perform the flush
itself.

> 2) There may be some alignment restrictions on what valid start
> addresses are if not referencing the first pixel location in the swatch.

Yes, exactly. IMO such restrictions are driver-dependent, no?
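To illustrate points 3) and 5) of the "data" area summary above (find the
area holding a user virtual address, check the size against the area end,
convert to an engine-side address, and keep track of in-use areas), here
is a minimal sketch in C. Nothing here exists in KGI: struct data_area,
area_find(), area_claim() and area_release() are hypothetical names, the
areas are kept in a plain array, and allocation, mmap and the blocking of
the application (points 1, 2, 4 and 4bis) are left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* One kernel-allocated data area, as seen by the driver (hypothetical). */
struct data_area {
    uintptr_t user_base;   /* where the area is mmap'ed in the application */
    uint32_t  bus_base;    /* address the accel engine must use instead */
    size_t    size;        /* size of the area in bytes */
    int       in_use;      /* number of pending engine transfers (point 5) */
};

/*
 * Point 3: return the area that fully contains [uaddr, uaddr + len),
 * or NULL if the range is empty, outside every area, or runs past
 * the end of the area it starts in.
 */
struct data_area *area_find(struct data_area *areas, size_t n,
                            uintptr_t uaddr, size_t len)
{
    size_t i;

    if (len == 0)
        return NULL;

    for (i = 0; i < n; i++) {
        struct data_area *a = &areas[i];
        size_t off;

        if (uaddr < a->user_base)
            continue;
        off = (size_t)(uaddr - a->user_base);
        /* Written this way so that off + len cannot overflow. */
        if (off < a->size && len <= a->size - off)
            return a;
    }
    return NULL;
}

/*
 * Check a user address coming from a command, convert it to the
 * address the engine should see, and mark the area as in use.
 * Return 0 on success, -1 if the command must be rejected.
 */
int area_claim(struct data_area *areas, size_t n,
               uintptr_t uaddr, size_t len, uint32_t *bus_addr)
{
    struct data_area *a = area_find(areas, n, uaddr, len);

    if (a == NULL)
        return -1;
    *bus_addr = a->bus_base + (uint32_t)(uaddr - a->user_base);
    a->in_use++;
    return 0;
}

/* Point 5: called when the engine is done with a transfer. */
void area_release(struct data_area *a)
{
    if (a->in_use > 0)
        a->in_use--;
}
```

With this shape, the application never hands a bus address to the engine:
the driver derives it from its own bookkeeping, which is the control I
would like the driver to keep.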
> In the case of the Radeon, additionally, using 3D primitives and
> implementing 2) and 3) would allow me to simply skip writing a separate
> rendering sublib for LibBuf/Radeon entirely, as all LibBuf would need to do
> is alter some kgi-Radeon renderer internal data telling the renderer that the
> registers that control Alpha and Z need to be updated before the
> next primitive is dispatched, and a quick test added to the
> top of the kgi-Radeon renderer's primitives.

Except if you want the data buffers to be on-board, no?

> > 4) Do you have a {Mystique,Millennium,G400,G550}?
>
> I have a Mystique, as soon as I fix the poor system it's installed in.

Note that I am not sure the Mystique can do bus-mastering, so you may not
really be able to take much advantage of all this with it.

Currently, the driver does a "puts" of the content of a command buffer to
the engine. Put/Get transfers would go through an "ILOAD" area. And the
driver would probably need to do this via a puts while walking the command
stream.
That would be a different implementation, but the same logic: walk the
command stream and, as soon as you see that some data should be fed, either
modify the src address to point to a KGI-allocated buffer, or copy the data
from this buffer to the ILOAD area. In all cases, the driver can really
control what goes to the engine (especially the size of transfers, or the
AGP vs PCI constraints).

I don't know about the Radeon (does someone have the specs for it?), but
for the Matrox it would be nice.

I guess such data buffers might also be useful for video encoding/decoding
(except if you want all operations to be done in the on-board memory).
Rodolphe