> > No. The first pinned entry (0...256M) is inserted by the asm code in
> > head_44x.S. The code in 44x_mmu.c will later map the rest of lowmem
> > (typically up to 768M but various settings can change that) using more
> > 256M entries.
>
> Thanks Ben, appreciate all your wisdom and insight.
>
> Ok, so my 460ex board has 512MB total, so how does that figure into
> the 768M? Is there some other heuristic that determines how these
> are mapped?
Not really, it all fits in lowmem, so it will be mapped with two pinned
256M entries. Basically, we try to map all memory with those entries in
the linear mapping. But since we only have 1G of address space available
when PAGE_OFFSET is c0000000, and we need some of that for vmalloc,
ioremap, etc., we currently limit that mapping to 768M.

If you have more memory, you will see only 768M unless you use
CONFIG_HIGHMEM, which allows the kernel to exploit more physical memory.
In that case, only the first 768M are permanently mapped (and
accessible), but you can allocate pages in "highmem" which can still be
mapped into user space and which need kmap/kunmap calls to be accessed
by the kernel.

However, in your case you don't need highmem; everything fits in lowmem,
so the kernel will just use 2x256M of bolted TLB entries to map it all
permanently. Note also that kmalloc() always returns lowmem.

> So is it reasonable to assume that everything on my system will come
> from pinned TLB entries?

Yes.

> The DMA is what I use in the "real world case" to get data into and out
> of these buffers. However, I can disable the DMA completely and do only
> the kmalloc. In this case I still see the same poor performance. My
> prefetching is part of my algo using the dcbt instructions. I know the
> instructions are effective b/c without them the algo is much less
> performant. So yes, my prefetches are explicit.

It could be some "effect" of the cache structure: the L2 cache, the
cache geometry (number of ways, etc.). You might be able to alleviate
that by changing the "stride" of your prefetch. Unfortunately, I'm not
familiar enough with the 440 microarchitecture and its caches to be able
to help you much here.

> Ok, I will give that a try ... in addition, is there an easy way to use
> any sort of gprof-like tool to see the system performance? What about
> looking at the 44x performance counters in some meaningful way?
> All the experiments point to the fetching being slower in the full
> program as opposed to the algo in a testbench, so I want to determine
> what it is that could cause that.

Does it have any useful performance counters? I didn't think it did, but
I may be mistaken.

Cheers,
Ben.

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev