Hi !

> OK. So, you need to unmap (to trigger page faults at least). I agree with
> you. I thought finer kernel control was possible.

If that were the case, you would have a severe performance penalty, as
each and every read cycle would first have to pass through kernel code.

> > > However, it seems to me that, for such memory mapping, some code
> > > may be executed sometimes in the kernel, no ?
> > What do you mean by sometimes ?

> I was thinking of the SMP case - because memory sharing between processes
> may already have given rise to mechanisms we could reuse. But I suppose
> also, as you say, that such mechanisms will also need to re-program the
> MMU and incur the same perf. penalty as unmapping the fb.
> 
> What happens in the SMP case then? If two processes share a memory area?

The MMU maps of both processes then point to the same piece of
logical RAM (i.e. to the same memory page or swap area).

> Are shared memory mechanisms totally different from those of memory mapping?

No. Shared memory just uses the same mapping for multiple processes.
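To make that concrete, here is a minimal sketch, assuming a Unix-like system
where Python's mmap module and os.fork() are available. The function name
share_value_with_child is made up for illustration. The parent creates one
anonymous shared mapping, the child writes into it, and the parent then sees
the value, because both page tables refer to the same page.

```python
import mmap
import os
import struct


def share_value_with_child(value: int) -> int:
    """Let a forked child write into a mapping that the parent reads back."""
    # fileno=-1 requests anonymous memory; MAP_SHARED means the mapping
    # is shared with the child after fork() instead of being copied.
    buf = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_SHARED)

    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the page the parent also has mapped.
        struct.pack_into("i", buf, 0, value)
        os._exit(0)

    # Parent: wait for the child to finish, then read from the same page.
    os.waitpid(pid, 0)
    (seen,) = struct.unpack_from("i", buf, 0)
    buf.close()
    return seen


if __name__ == "__main__":
    print(share_value_with_child(1234))
```

With a private mapping (MAP_PRIVATE) instead, the child would get its own
copy on write and the parent would not see the change.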

> I ask this question because, with multiple CPUs and thus multiple L1 caches,
> the MMU should then _also_ handle cache coherence issues... But I guess it
> is the case.

Yes. This is handled in the hardware. That is another reason why MMU
reprograms are so expensive: I have heard that on some architectures,
reprogramming the MMU requires flushing _all_ processor caches ...

> BTW, is it the unmap-ping process that is expensive, or is it the
> subsequent (potential) page fault handling?

The unmapping. The later page fault itself is relatively cheap, though when
it comes to continuing the program that raised it, you again need to mmap
the area, which again incurs the performance penalty.

The problem is the TLB flush that is required after changing the MMU setup,
which will take the CPU quite some cycles to recover from afterwards.
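A toy model may help to show why the recovery hurts. This is only a sketch
under strong assumptions: the TLB is modelled as an unbounded set of page
numbers, each miss stands for one page-table walk, and ToyTLB and
misses_for_passes are hypothetical names, not any real API. It counts how
many translations have to be redone when the TLB is flushed between passes
over the same pages.

```python
class ToyTLB:
    """Toy translation cache: remembers which pages were already translated."""

    def __init__(self):
        self.entries = set()
        self.misses = 0

    def access(self, page: int) -> None:
        # A miss stands for a page-table walk by the MMU.
        if page not in self.entries:
            self.misses += 1
            self.entries.add(page)

    def flush(self) -> None:
        # Changing the MMU setup invalidates every cached translation.
        self.entries.clear()


def misses_for_passes(pages: int, passes: int, flush_between: bool) -> int:
    """Touch `pages` pages `passes` times, optionally flushing after each pass."""
    tlb = ToyTLB()
    for _ in range(passes):
        for page in range(pages):
            tlb.access(page)
        if flush_between:
            tlb.flush()
    return tlb.misses


if __name__ == "__main__":
    print(misses_for_passes(64, 10, flush_between=False))
    print(misses_for_passes(64, 10, flush_between=True))
```

Without flushes, only the first pass misses. With a flush after every pass,
each pass has to redo all of its translations, which is the "recovery" cost
in this simplified picture.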

> There is a similar issue that _may_ arise between the 2d and 3d
> accelerated engines. But for them, in a KGI controlled design,
> we can do the arbitration inside the KGI driver. 

Yes.

> (BTW, this is
> again a statement that supports the idea of kernel-controlled
> access to the card processing units - as opposed to memory
> mapping the whole accel registers set to a userspace program. ;-)

Only for badly designed hardware. Normally all those issues should not be 
present at all, as they could be resolved by a few hundred gates of 
arbitration logic on the card.

CU, Andy

-- 
Andreas Beck              |  Email :  <[EMAIL PROTECTED]>
