>On Fri, 2002-05-17 at 04:42, Benjamin Herrenschmidt wrote:
>> >The source buffer doesn't matter after we have filled the DMA buffers,
>> >does it?
>> 
>> Ah, you are right, I forgot about the fact we did an additional copy
>> here. Too bad we can't just DMA from the source buffer; that sucks,
>> since the RAM throughput of most Macs isn't that good... This is
>> probably one reason why DMA doesn't show a significant perf
>> improvement.
>
>One might be able to DMA directly from the source buffer, but one would
>have to walk the pages and set up descriptor tables with the bus
>addresses. Do you think that could still be better? (Assuming one could
>work out alignment etc.)

Provided that those source pages aren't in swap... Though I think some
of the v4l drivers used to do such tricks.
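For illustration, here is a rough sketch of what walking the source pages and building a descriptor table could look like. All names here are made up (`build_desc_table`, `struct dma_desc`); a real driver would look up and pin each page (e.g. via the kernel's PCI mapping calls) rather than take a caller-supplied bus-address array, but the splitting logic would be the same:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical descriptor: one physically contiguous run for the engine. */
struct dma_desc {
    uint32_t bus_addr;
    uint32_t length;
};

/*
 * Walk [vaddr, vaddr + len) page by page and emit one descriptor per
 * page-sized chunk, splitting on page boundaries since the underlying
 * pages need not be physically contiguous. page_bus[i] is the bus
 * address of the i-th page backing the buffer (supplied by the caller
 * here; a real driver would look it up and pin the page so it can't
 * be swapped out). Returns the descriptor count, or -1 if the table
 * overflows.
 */
static int build_desc_table(uintptr_t vaddr, size_t len,
                            const uint32_t *page_bus,
                            struct dma_desc *table, int max_desc)
{
    int n = 0, page = 0;

    while (len > 0) {
        size_t off = vaddr & (PAGE_SIZE - 1);
        size_t chunk = PAGE_SIZE - off;

        if (chunk > len)
            chunk = len;
        if (n == max_desc)
            return -1;
        table[n].bus_addr = page_bus[page] + (uint32_t)off;
        table[n].length = (uint32_t)chunk;
        n++;
        page++;
        vaddr += chunk;
        len -= chunk;
    }
    return n;
}
```

An 8 KB source buffer starting 100 bytes into a page would yield three descriptors (3996 + 4096 + 100 bytes), each possibly pointing at a completely different bus address, which is exactly the alignment/fragmentation headache mentioned above.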

The ideal way, though probably not possible with current APIs, would be
to have control over the Xv (XvMC?) allocation routines, so that when the
client frames are allocated, the client really gets a pair of AGP memory
blocks allocated from the AGP aperture and mapped into its address space.

Ideally, we could then make that memory cacheable and have Xv flush the
cache when feeding the frame to the ring.
>
>> >Anyway, I don't see any explicit synchronisation in the driver, so
>> >probably the problem is the players calling XSync().
>> 
>> Well, do we wait for DMA to finish or not? If we do, then we are
>> doing explicit sync.
>
>Again, I don't see that in the driver, but maybe I'm just blind.

It could be implicit as part of a wait for the engine to be ready or a
2D sync, though I have yet to look at the implementation. We should try
to figure out where X is actually spending those cycles. But I suspect
that since the AGP memory is mapped uncacheable (and guarded), the time
spent blitting from the source buffer to the AGP buffer is almost as
long as blitting directly to the FB... I remember trying your first
r128 implementation on the Pismo: I saw approximately the same CPU
usage doing DMA blits as doing manual blits to the FB using FP
registers (64-bit bursts on the bus).
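For reference, that manual FB blit is essentially a copy loop through 64-bit registers. A portable C sketch of the idea (on 32-bit PowerPC the actual trick was lfd/stfd through the FP registers, since the integer registers are only 32 bits wide; this is just the shape of the loop, not the driver's code):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy via 64-bit loads/stores so each bus transaction can move a
 * doubleword. Assumes dst/src are 8-byte aligned and len is a
 * multiple of 8; a real blitter would also handle the unaligned
 * head and tail of the buffer.
 */
static void blit64(void *dst, const void *src, size_t len)
{
    uint64_t *d = dst;
    const uint64_t *s = src;
    size_t n = len / 8;

    while (n--)
        *d++ = *s++;
}
```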

>> If we don't, then we should at least, on frame N+1, wait for
>> frame N to finish. The point is to have at least one frame queued
>> in advance so we don't busy-loop when there is still work to do.
>> The ideal case would of course be to block on command completion
>> using an IRQ.
>> 
>> XSync() can be faked. We can perfectly well decide to buffer one
>> frame in advance, can't we? Then XSync() on the next frame, so we
>> won't block on double-buffer setup if the second buffer isn't filled.
>> That makes sure we only block (or spin-loop) if the decoder is
>> feeding us faster than the framerate, and only when we have both
>> buffers filled. That should smooth the whole data flow and avoid a
>> lot of useless sleeps/busy-loops.
>
>Smells like a hack, this is stretching the XSync() semantics to say the
>least, in particular I think XSync() is the only feedback the players
>get for the timing, I wonder if such a change wouldn't have bad effects
>of its own there.

It may, but being only one frame off isn't too bad, especially since Xv
isn't good enough to do real frame sync on things like broadcast
interlaced display.
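The one-frame-ahead scheme quoted above can be sketched as follows. Everything here is hypothetical bookkeeping, and `wait_for_dma()` stands in for whatever actually blocks until the oldest queued transfer completes:

```c
/* Hypothetical double-buffer state: at most one frame queued ahead. */
struct frame_queue {
    int inflight;   /* frames submitted but not yet known complete */
    int waits;      /* how many times we actually had to block */
};

/* Stand-in for blocking until the oldest queued DMA finishes. */
static void wait_for_dma(struct frame_queue *q)
{
    q->waits++;
    q->inflight--;
}

/*
 * Submit a frame: with two buffers we may run one frame ahead, so we
 * only block when both buffers are already busy, i.e. frame N+1 waits
 * for frame N rather than for its own completion.
 */
static void submit_frame(struct frame_queue *q)
{
    if (q->inflight >= 2)
        wait_for_dma(q);
    q->inflight++;
}
```

With this policy the first two frames never block; submission only starts waiting once the decoder runs ahead of the framerate, and then only for the older of the two buffers.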

Ben.


