On Mon, 23 Dec 2002, Brian S. Julin wrote:
> On Mon, 23 Dec 2002, Rodolphe Ortalo wrote:
> >  0) What should a KGI-accel-oriented developer do next? :-)
> 
> I'd like to see the get/put issues worked out personally, and of
> course, whatever can be done to bring the accelerator command queues
> up to max speed (by using optimal data transfer modes) deserves attention.

I see your point on get/put (more on this later). However, when you speak
of "optimal data transfer modes", do you also mean AGP?

> Also, do remember that we are supposed to be providing a *secure* graphics
> system, so putting some thought into how to prevent an application
> from using the accelerator or graphics registers in a hostile fashion
> is important, rather than reworking the whole design later because
> it wasn't up to that task.  I don't know what your code's status is on
> these two points.

My driver walks the command stream and is probably already resistant to
bugs and possible hardware freezes (see
mga_chipset_accel_{1x64,g200,g400}_check() in chipset/Matrox/Gx00-meta.c),
except for the AGP issues IIRC. Of course, this deserves more testing, but
I think the infrastructure is in place to prevent an application from
sending a malicious command stream.
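To make the idea of walking and checking a command stream concrete, here is a minimal C sketch. It is not the code in Gx00-meta.c: the command format (a flat array of register/value word pairs) and the register whitelist are assumptions chosen for illustration. A real checker would probably also have to inspect the values written, not only the register indices.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical whitelist: register offsets an application may write. */
static const uint32_t allowed_regs[] = { 0x1C00, 0x1C04, 0x1C08, 0x1C10 };

static bool reg_allowed(uint32_t reg)
{
    for (size_t i = 0; i < sizeof allowed_regs / sizeof allowed_regs[0]; i++)
        if (allowed_regs[i] == reg)
            return true;
    return false;
}

/* Hypothetical format: words[] holds (register, value) pairs.
 * Returns 0 if every pair is acceptable, -1 if the buffer must be rejected. */
int check_command_stream(const uint32_t *words, size_t nwords)
{
    if (nwords % 2 != 0)
        return -1;                  /* truncated pair: reject the whole buffer */
    for (size_t i = 0; i < nwords; i += 2)
        if (!reg_allowed(words[i]))
            return -1;              /* register not on the whitelist */
    return 0;
}
```

Rejecting the whole buffer on the first bad word keeps the engine from ever seeing a partially validated stream.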
 Wrt the Gx00 driver, I think two points deserve some (possibly difficult)
testing:
 1- safety/security of the accel DMA buffer checks;
 2- validity of graphics context switches (2 processes sharing the same
accelerator engine) - this is written but nearly untested.

And then of course, there are the functionality range issues (24/32 bits,
more resolutions, etc.). Fortunately, there are not many users asking
for them, so I still have time for experiments... ;-))

Concerning security, I think KGI itself still lacks the necessary checks
of access rights to resources. This should probably be addressed first.

> >  1) Are {Get,Put} functions required for an accel sublib? Currently they
> > are not implemented and I am reluctant to implement them the straight
> > way - though I may try it for experiments. (If the userspace buffer is not
> > in DMA-capable memory, or not big enough, I guess I may end up locking the
> > machine.)
> 
> They are not required, but IMO we should be testing out ways to
> do this.

I agree.

> I'd like to see the put/get functions and the pixel functions all
> use the accelerator, because [...]

We already have a resource that works for command sets sent to the accel
engine. But apparently, we really need something for the *data* sets used
(read or written) by the engine too.

The primary problem is probably not really in the accel commands. I guess
most chipsets have some way of busmastering transfers from main memory and
simply want the bus addresses to be correct (and the memory area to be
adequate).
 The big problem is in the data transfer from userspace to kernelspace.
Given the constraints that should apply to such memory areas, I guess the
kernel should also allocate the areas. It would be too difficult to handle
normal userspace-allocated memory as a source for accel commands.
 We probably also need a new mechanism in KGI to do that. The accel
buffers currently available have some of the needed characteristics, but
the way they are exchanged with userspace and the fact that they form a
ring of buffers are not adequate. (Apart from that, they are pretty close
to what is needed - remember you can mmap several accel resources...)

I've already played with such things in KGI with some 3D commands of the
G200/G400. The issue was to transfer triangle lists to the engine. In this
mode, the chipset gets triangle vertices and draws them.
 I solved that specific issue by using the current accel buffers. Hence,
I've introduced a buffer "tag", put at the top of each buffer, that says
whether it contains regular commands or triangle lists. (Look at the
unused function GX00_WRITE_TRIANGLE() and the associated data structures
in default/kgi/Matrox/Gx00/Gx00_accel.h.) It works for that specific
issue. But it could be solved much more cleanly by indicating a separate
memory area containing the 3D data (I would still need a fake "command" to
start execution, but that would be a better hack than the current way of
multiplexing several buffer types on the same buffer ring).
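A minimal sketch of the tag idea, assuming a hypothetical header layout (`struct buf_header`, `BUF_TAG_*`); the actual structures in Gx00_accel.h may look quite different. The dispatcher reads the header, checks that the declared payload fits in the buffer, and picks a handler by tag:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag values. */
enum buf_tag { BUF_TAG_COMMANDS = 1, BUF_TAG_TRIANGLES = 2 };

/* Hypothetical header stored at the top of each accel buffer. */
struct buf_header {
    uint32_t tag;     /* what the rest of the buffer contains */
    uint32_t nbytes;  /* payload size following the header */
};

/* Returns the tag that was dispatched, or -1 for an unknown or oversized buffer. */
int dispatch_buffer(const uint8_t *buf, size_t buflen)
{
    struct buf_header h;

    if (buflen < sizeof h)
        return -1;
    memcpy(&h, buf, sizeof h);          /* avoid unaligned access */
    if (h.nbytes > buflen - sizeof h)
        return -1;                      /* payload claims more than we have */

    switch (h.tag) {
    case BUF_TAG_COMMANDS:              /* a real driver would walk and check commands here */
        return BUF_TAG_COMMANDS;
    case BUF_TAG_TRIANGLES:             /* ...or feed vertex data to the 3D setup here */
        return BUF_TAG_TRIANGLES;
    default:
        return -1;
    }
}
```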
 Clearly, such an approach does not really extend to sending data to the
graphics board. In theory you could do it this way (for example, copy
picture data to a command buffer with an "IMAGE" tag), but that would be
neither very flexible nor convenient. Note that an application can mmap
*several* accel engine command streams, so maybe we could have several
types for them (one for commands, one for images, one for triangles), but
then synchronizing them would be difficult. I'd really rather have two
separate resource types: commands (FIFO-like) and data areas (array-like).


> 2) is easy to implement without any modifications to the KGI driver
> (at least in the Radeon's case.)  The overhead of the extra copy is 
> absorbed entirely by the GPU and the graphics memory bus.  However, the 
> CPU still spins on the slow(er than main memory) graphics bus while 
> initially transferring the data.

You still require the CPU to copy to the framebuffer, no? So I don't
really see the improvement over the straightforward put currently in the
stubs. And you also have the problem of synchronizing the swatch state
(CPU-driven) with the command stream (GPU-driven).

> 3) requires that KGI know how to allocate and mmap nonswappable RAM buffers.
> I've done this before and though the linux kernel has been in flux in the
> past on this matter, what I've gleaned from changelogs is that the
> developers have finally been convinced that yes, it *is* important to
> some hardware to be able to allocate and MMAP nonswappable RAM, so if anything
> this should be getting easier.  (Note the RAM should also be MTRR'd
> as non-cached if possible.)

I guess it is also very useful for Gigabit Ethernet cards, or for some
databases that want to control the hard disk load directly.
 Anyway, isn't that what KGI already does for accel command streams?

> I favor 3) for reasons listed below, but 3) and 2) can coexist, at least
> on Radeon, because the only difference to the command stream is the base 
> address used for the swatch data.  An argument could be provided to 
> display-kgi to change the size of the swatch and whether it is located
> in system RAM or VRAM.

And also the "origin" flag of the src-like register in the command stream
(and such a flag should be checked by the driver before execution).
 Personally, I would not like to see a "host-PCI" source address passed
untouched to the GPU. I'd rather my driver change the parameters and choose
the addresses to use itself (plus check the size, etc.).
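A sketch of what "the driver chooses the addresses itself" could mean, assuming a hypothetical command with an origin flag and a source field (`struct src_fields`, `SRC_HOST`, `SRC_VRAM` are illustrative names). For host sources, the application only supplies an offset into a driver-allocated swatch and the driver substitutes the bus address; unknown origins are rejected:

```c
#include <stdint.h>

enum src_origin { SRC_VRAM = 0, SRC_HOST = 1 };   /* hypothetical flag values */

/* Hypothetical view of the source fields of one command. */
struct src_fields {
    uint32_t origin;  /* where the source data lives */
    uint64_t addr;    /* from the app: an offset; after fixup: what the GPU sees */
    uint32_t len;     /* bytes to read */
};

/* Hypothetical: the host swatch the driver allocated itself. */
struct swatch {
    uint64_t bus_base;
    uint32_t size;
};

/* Returns 0 if the command may run (fields possibly rewritten), -1 otherwise. */
int fixup_src(struct src_fields *f, const struct swatch *sw, uint32_t vram_size)
{
    if (f->origin == SRC_HOST) {
        /* Never pass a host address through: treat it as an offset into the swatch. */
        if (f->addr > sw->size || f->len > sw->size - f->addr)
            return -1;
        f->addr = sw->bus_base + f->addr;
        return 0;
    }
    if (f->origin == SRC_VRAM) {
        if (f->addr > vram_size || f->len > vram_size - f->addr)
            return -1;
        return 0;
    }
    return -1;  /* unknown origin flag */
}
```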

All in all, I would also favor solution 3. But it seems to me that a
single swatch area would not be very convenient. The application should be
able to allocate some "data" areas and then use virtual addresses within
these areas as a source for some accel commands. Then the in-kernel driver
can check/convert these addresses and execute normally.
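The check/convert step could look roughly like this. `struct data_area` and `translate()` are hypothetical names; a linear scan is used for clarity, though a sorted table with binary search would make the lookup faster when there are many areas:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one kernel-allocated area mmapped into the app. */
struct data_area {
    uintptr_t user_base;  /* where the area is mapped in the application */
    uint64_t  bus_base;   /* address the engine uses for the same memory */
    size_t    size;
};

/* Translate [uaddr, uaddr + len) to a bus address if it lies entirely
 * inside one area. Returns 0 on success, -1 otherwise. */
int translate(const struct data_area *areas, size_t n,
              uintptr_t uaddr, size_t len, uint64_t *bus)
{
    for (size_t i = 0; i < n; i++) {
        const struct data_area *a = &areas[i];

        if (uaddr < a->user_base || uaddr - a->user_base >= a->size)
            continue;                   /* start is not in this area */
        size_t off = uaddr - a->user_base;
        if (len > a->size - off)
            return -1;                  /* runs past the end of the area */
        *bus = a->bus_base + off;
        return 0;
    }
    return -1;                          /* not in any known area */
}
```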

To summarize, concerning these "data" areas, we would need a resource that
would:
 1) allocate some amount of (DMA-capable, PCI/AGP-capable, unswappable,
contiguous) memory suitable for direct addressing by the graphics accel
engine;
 2) mmap this area into userspace for use by the application as nearly
normal memory;
 3) allow the driver to quickly find that a virtual address is within one
area, and check sizes wrt this address (and the area end);
 4) possibly offer an interface to unmap the area from userspace while it
is in use by the engine (I do not know if this is mandatory or if we can
allow concurrent nondeterministic accesses - after all this is only
data). If so:
 4bis) block the application if it tries to access the area while in use
and allow the driver to wake the application up (or do it automatically
via 5);
 5) offer an interface to the driver to keep track of "in-use" areas, lock
these areas on behalf of the accel engine and some way to release them
(possibly in conjunction with the accel command buffers' execution state
or wait queues).
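For point 5, one possible shape is a per-area count of queued command buffers that still reference the area. This is a single-threaded sketch under assumed names (`struct area_state` and its functions are hypothetical); a real driver would protect the counter with a lock or atomics and sleep on a wait queue (4bis) rather than poll:

```c
#include <stdbool.h>

/* Hypothetical per-area state: number of queued command buffers using it. */
struct area_state {
    unsigned users;
};

/* Called when a command buffer referencing the area is queued. */
void area_lock_for_engine(struct area_state *a)
{
    a->users++;
}

/* Called when such a command buffer has executed.
 * Returns true when the area just became idle (time to wake the app). */
bool area_release_from_engine(struct area_state *a)
{
    if (a->users == 0)
        return false;   /* unbalanced release: ignore */
    return --a->users == 0;
}

/* The application may touch the area only when the engine is done with it. */
bool area_app_may_access(const struct area_state *a)
{
    return a->users == 0;
}
```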

Note that 4 and 4bis could also solve many of the "synchronization"
issues. Such a memory area can operate as a signal/message from the engine
to the application. (For example, on the Matrox, you can ask the engine to
write a value somewhere to signal the execution of some pieces of the
command stream.)
 In fact 4+4bis and 5 are very similar (4+4bis is the userspace-kernel
side, 5 is the KGI-KGIdriver side).
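The Matrox-style signal mentioned above can be modelled as a sequence number that the engine writes to a shared word once the preceding commands have executed. The structure name and the use of a monotonically increasing 32-bit counter are assumptions; the comparison is written to survive counter wraparound:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared word: the driver queues a command asking the engine
 * to store the current sequence number here after earlier commands finish. */
struct fence_page {
    volatile uint32_t completed;   /* written by the engine */
};

/* Wraparound-safe "a is at or after b" for 32-bit sequence numbers. */
static bool seq_after_eq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/* True once the engine has executed everything up to and including seq. */
bool fence_signaled(const struct fence_page *fp, uint32_t seq)
{
    return seq_after_eq(fp->completed, seq);
}
```

With such a word in a shared data area, the application (or the driver, when deciding whether to wake a blocked process) only has to compare numbers instead of asking the engine whether it is idle.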

I have to admit that I do not yet see very well how such a resource should
be implemented (except that I suspect it has a lot in common with the
accel resource currently available in KGI). But I can see very clearly how
I would use it...

Rodolphe

> 1) If the RAM is "special," get/put operation using a swatch may return 
> before LibGGI is done with data in the swatch.  So, before altering or 
> retrieving the data, one should perform a ggiFlush manually to idle the accel.

So you allow concurrent accesses? Well, if the hardware buses protect us
from problems, I have no objection. But we would need some external means
of synchronisation between the engine and the application (probably
ioctl()-based). I like data-driven synchronisation a lot, so I'd probably
favor blocking/unblocking the app rather than letting it perform the flush
itself.

> 2) There may be some alignment restrictions on what valid start
> addresses are if not referencing the first pixel location in the swatch.

Yes, exactly. IMO such restrictions are driver-dependent, no?
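As an illustration of such a driver-dependent restriction, assume (hypothetically) that a chipset requires the address of the first referenced pixel to be a multiple of some power-of-two `align`. The check then depends only on the pitch, the bytes per pixel and the driver's alignment value; `struct swatch_limits` and `start_ok()` are made-up names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-driver limit; the real value comes from the chipset docs. */
struct swatch_limits {
    uint32_t align;   /* required start alignment in bytes (power of two) */
};

/* Byte offset of pixel (x, y) in a swatch with the given pitch and bytes/pixel. */
static uint32_t pixel_offset(uint32_t x, uint32_t y, uint32_t pitch, uint32_t bpp)
{
    return y * pitch + x * bpp;
}

/* True if a transfer may start at pixel (x, y) on this hardware. */
bool start_ok(const struct swatch_limits *lim, uint32_t x, uint32_t y,
              uint32_t pitch, uint32_t bpp)
{
    return (pixel_offset(x, y, pitch, bpp) & (lim->align - 1)) == 0;
}
```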

> In the case of the Radeon, additionally, using 3D primitives and
> implementing 2) and 3) would allow me to simply skip writing a separate
> rendering sublib for LibBuf/Radeon entirely, as all LibBuf would need to do
> is alter some kgi-Radeon renderer internal data telling the renderer that the
> registers that control Alpha and Z need to be updated before the 
> next primitive is dispatched, and a quick test added to the 
> top of the kgi-Radeon renderer's primitives.

Unless you want the data buffers to be on-board, no?

> 
> >  4) Do you have a {Mystique,Millenium,G400,G550}?
> 
> I have a Mystique, as soon as I fix the poor system it's installed in.

Note that I am not sure the Mystique can do bus-mastering, so you may not
be able to take much advantage of all this with it. Currently, the
driver does a "puts" of the contents of a command buffer to the engine.
 Put/Get transfers would go through an "ILOAD" area, and the driver would
probably need to do this via a puts while walking the command stream. That
would be a different implementation, but the same logic: walk the command
stream and, as soon as you see that some data should be fed, either modify
the src address to point to a KGI-allocated buffer, or copy the data from
this buffer to the ILOAD area.
 In all cases, the driver can really control what goes to the engine
(especially the size of transfers, or the AGP vs PCI constraints).
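The two strategies can share one code path in the walker. This sketch uses hypothetical types (`struct load_cmd`, `struct kgi_buffer`) and treats the ILOAD window as plain memory written byte by byte; the real ILOAD semantics (FIFO behaviour, required access width) would have to come from the Matrox documentation:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: what the walker found in a "load image data" command. */
struct load_cmd {
    uint64_t src;   /* source address field of the command */
    size_t   len;   /* number of bytes to feed */
};

/* Hypothetical KGI-allocated data buffer. */
struct kgi_buffer {
    const uint8_t *cpu;   /* kernel mapping */
    uint64_t       bus;   /* engine-visible address */
    size_t         size;
};

/* Bus-mastering chipsets: point the command at the buffer.
 * Otherwise: copy the bytes into the ILOAD window with the CPU.
 * Returns the number of bytes copied by the CPU, or -1 if invalid. */
long feed_data(struct load_cmd *cmd, const struct kgi_buffer *buf,
               int can_busmaster, volatile uint8_t *iload, size_t iload_size)
{
    if (cmd->len > buf->size)
        return -1;
    if (can_busmaster) {
        cmd->src = buf->bus;          /* never trust the app's address */
        return 0;
    }
    if (cmd->len > iload_size)
        return -1;
    for (size_t i = 0; i < cmd->len; i++)
        iload[i] = buf->cpu[i];       /* real hardware may need 32-bit writes */
    return (long)cmd->len;
}
```

Either way the driver, not the application, decides the source address and the transfer size.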

I don't know about the Radeon (does someone have the specs for it?) but
for the Matrox it would be nice.

I guess such data buffers might also be useful for video encoding/decoding
(unless you want all operations to be done in the on-board memory).

Rodolphe

