> We also want to test it, and will wait until you release it to the mmc
> mailing list.
I think I will be able to send out the code to the mailing list in mid to
late January.

> I saw the mmc performance blueprint, and we are now suffering from poor
> mmc performance at low cpu frequency.
> Even though the input clock is constant at 50 MHz, the performance
> depends on the cpu frequency.
Do you run in PIO or DMA mode?

> Need to investigate it.
Please let me know when you find out.

/Per

On 18 December 2010 16:29, Kyungmin Park <kmp...@infradead.org> wrote:
> Thanks
>
> No problem.
>
> We also want to test it, and will wait until you release it to the mmc
> mailing list.
> I saw the mmc performance blueprint, and we are now suffering from poor
> mmc performance at low cpu frequency.
> Even though the input clock is constant at 50 MHz, the performance
> depends on the cpu frequency.
>
> Need to investigate it.
>
> Thank you,
> Kyungmin Park
>
> On Sat, Dec 18, 2010 at 11:19 PM, Per Forlin <per.for...@linaro.org> wrote:
>> Hi,
>>
>> Thanks for your interest. I am in the middle of rewriting parts due to
>> my findings about dma_unmap. If everything goes well I should have a
>> new prototype ready on Tuesday.
>> My code base is 2.6.37-rc4. Will that work for you?
>>
>> After Tuesday I will be on vacation until the Linaro sprint in Dallas
>> on Jan 10. I will not make any updates to my code during the vacation,
>> but I will try to keep up with my emails.
>> I don't want to send it out for a full review yet because the code is
>> far from ready. I'm afraid it would only cause too much noise, and
>> since I am going on vacation it is not the best timing.
>>
>> Patches: is it ok for you to wait until Tuesday (or a few days later if
>> I run into trouble)? Then you can test my latest version supporting
>> double buffering for unmap. I can send the patches directly to you.
>>
>> BR
>> Per
>>
>> On 18 December 2010 03:50, Kyungmin Park <kmp...@infradead.org> wrote:
>>> Hi,
>>>
>>> It's interesting.
>>>
>>> Can you send us your working code so we can test it in our environment
>>> (Samsung SoC)?
>>>
>>> Thank you,
>>> Kyungmin Park
>>>
>>> On Sat, Dec 18, 2010 at 12:38 AM, Per Forlin <per.for...@linaro.org> wrote:
>>>> Hi again,
>>>>
>>>> I made a mistake in my double buffering implementation.
>>>> I assumed dma_unmap did not do any cache operations. Well, it does.
>>>> Due to L2 read prefetch, the L2 cache needs to be invalidated at
>>>> dma_unmap.
>>>>
>>>> I made a quick test to see how much throughput would improve if
>>>> dma_unmap could be run in parallel.
>>>> In this run dma_unmap is removed.
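>>>>
>>>> To make the cost concrete, here is a simplified sketch of a DMA read
>>>> in an MMC host driver (illustrative only, not the actual mmci code).
>>>> The dma_unmap_sg() call at the end is where the extra cache
>>>> invalidation happens, and today it sits in the critical path between
>>>> two transfers:
>>>>
>>>> #include <linux/dma-mapping.h>
>>>> #include <linux/scatterlist.h>
>>>>
>>>> /* Sketch of one DMA read transfer; error handling trimmed. */
>>>> static int sketch_dma_read(struct device *dev, struct scatterlist *sg,
>>>>                            int sg_len)
>>>> {
>>>>         int count;
>>>>
>>>>         /* Mapping cleans/invalidates the CPU caches so the DMAC and
>>>>          * the CPU agree on memory; this is the bulk of "preparing"
>>>>          * a DMA job. */
>>>>         count = dma_map_sg(dev, sg, sg_len, DMA_FROM_DEVICE);
>>>>         if (!count)
>>>>                 return -ENOMEM;
>>>>
>>>>         /* ... submit the descriptor to the DMAC and wait for the
>>>>          * transfer-complete interrupt ... */
>>>>
>>>>         /* Unmapping is not free either: with a speculative L2 the
>>>>          * lines must be invalidated again here, after the transfer,
>>>>          * before the CPU may touch the data. This is the work that
>>>>          * is left out in the no-unmap run described above. */
>>>>         dma_unmap_sg(dev, sg, sg_len, DMA_FROM_DEVICE);
>>>>
>>>>         return 0;
>>>> }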
>>>>
>>>> Then the figures for read become:
>>>> * 7-16 % gain with double buffering in the ideal case, closing in on
>>>>   the same performance as PIO.
>>>>
>>>> Relative diff: MMC-VANILLA-DMA-LOG -> MMC-MMCI-2-BUF-DMA-LOG-NO-UNMAP
>>>> (the cpu: rows are absolute diffs)
>>>>
>>>>                                              random  random
>>>>     KB  reclen   write rewrite    read  reread    read   write
>>>>  51200       4     +0%     +0%     +7%     +8%     +2%     +0%
>>>>   cpu:            +0.0    +0.0    +0.7    +0.7    -0.0    +0.0
>>>>
>>>>  51200       8     +0%     +0%    +10%    +10%     +6%     +0%
>>>>   cpu:            -0.1    +0.1    +0.6    +0.9    +0.3    +0.0
>>>>
>>>>  51200      16     +0%     +0%    +11%    +11%     +8%     +0%
>>>>   cpu:            -0.0    -0.1    +0.9    +1.0    +0.3    +0.0
>>>>
>>>>  51200      32     +0%     +0%    +13%    +13%    +10%     +0%
>>>>   cpu:            -0.1    +0.0    +1.0    +0.5    +0.8    +0.0
>>>>
>>>>  51200      64     +0%     +0%    +13%    +13%    +12%     +1%
>>>>   cpu:            +0.0    +0.0    +0.4    +1.0    +0.9    +0.1
>>>>
>>>>  51200     128     +0%     +5%    +14%    +14%    +14%     +1%
>>>>   cpu:            +0.0    +0.2    +1.0    +0.9    +1.0    +0.0
>>>>
>>>>  51200     256     +0%     +2%    +13%    +13%    +13%     +1%
>>>>   cpu:            +0.0    +0.1    +0.9    +0.3    +1.6    -0.1
>>>>
>>>>  51200     512     +0%     +1%    +14%    +14%    +14%     +8%
>>>>   cpu:            -0.0    +0.3    +2.5    +1.8    +2.4    +0.3
>>>>
>>>>  51200    1024     +0%     +2%    +14%    +15%    +15%     +0%
>>>>   cpu:            +0.0    +0.3    +1.3    +1.4    +1.3    +0.1
>>>>
>>>>  51200    2048     +2%     +2%    +15%    +15%    +15%     +4%
>>>>   cpu:            +0.3    +0.1    +1.6    +2.1    +0.9    +0.3
>>>>
>>>>  51200    4096     +5%     +3%    +15%    +16%    +16%     +5%
>>>>   cpu:            +0.0    +0.4    +1.1    +1.7    +1.7    +0.5
>>>>
>>>>  51200    8192     +5%     +3%    +16%    +16%    +16%     +2%
>>>>   cpu:            +0.0    +0.4    +2.0    +1.3    +1.8    +0.1
>>>>
>>>>  51200   16384     +1%     +1%    +16%    +16%    +16%     +4%
>>>>   cpu:            +0.1    -0.2    +2.3    +1.7    +2.6    +0.2
>>>>
>>>> I will work on adding unmap to double buffering next week.
>>>>
>>>> /Per
>>>>
>>>> On 16 December 2010 15:15, Per Forlin <per.for...@linaro.org> wrote:
>>>>> Hi,
>>>>>
>>>>> I am working on the blueprint
>>>>> https://blueprints.launchpad.net/linux-linaro/+spec/other-storage-performance-emmc.
>>>>> Currently I am investigating performance for DMA vs PIO on eMMC.
>>>>>
>>>>> Pros and cons for DMA on MMC:
>>>>> + Offloads the CPU
>>>>> + Fewer interrupts; a single interrupt for each transfer compared to
>>>>>   100s or even 1000s
>>>>> + Power save; DMA consumes less power than the CPU
>>>>> - Less bandwidth / throughput compared to PIO-CPU
>>>>>
>>>>> The reason for introducing double buffering in the MMC framework is to
>>>>> address the throughput issue for DMA on MMC.
>>>>> The assumption is that the CPU and DMA have higher throughput than the
>>>>> MMC / SD card.
>>>>> My hypothesis is that the difference in performance between PIO mode
>>>>> and DMA mode for MMC is due to the latency of preparing a DMA job.
>>>>> If the next DMA job could be prepared while the current job is ongoing,
>>>>> this latency would be reduced. The biggest part of preparing a DMA job
>>>>> is cache maintenance.
>>>>> In my case I run on U5500 (mach-ux500), which has both L1 and L2
>>>>> caches. The host mmc driver in use is the mmci driver (PL180).
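>>>>>
>>>>> To illustrate the idea, here is a rough sketch of the intended flow.
>>>>> The helper names below are made up for illustration; this is not the
>>>>> actual framework code or my patch:
>>>>>
>>>>> struct req;                           /* one MMC data transfer (sketch)   */
>>>>> struct req *fetch_next_request(void); /* next queued block request        */
>>>>> void prepare(struct req *r);          /* dma_map_sg() + build descriptor  */
>>>>> void start(struct req *r);            /* kick off the DMAC                */
>>>>> void wait_for_xfer(struct req *r);    /* wait for transfer-done interrupt */
>>>>> void finish(struct req *r);           /* dma_unmap_sg()                   */
>>>>>
>>>>> static void issue_requests_double_buffered(void)
>>>>> {
>>>>>         struct req *cur, *next;
>>>>>
>>>>>         cur = fetch_next_request();
>>>>>         if (!cur)
>>>>>                 return;
>>>>>         prepare(cur);                 /* nothing to overlap with yet */
>>>>>
>>>>>         while (cur) {
>>>>>                 start(cur);           /* DMA is now running */
>>>>>
>>>>>                 next = fetch_next_request();
>>>>>                 if (next)
>>>>>                         prepare(next);  /* cache maintenance for the next
>>>>>                                          * job overlaps with the transfer
>>>>>                                          * already in flight */
>>>>>
>>>>>                 wait_for_xfer(cur);
>>>>>                 finish(cur);          /* unmap; not overlapped in this sketch */
>>>>>                 cur = next;
>>>>>         }
>>>>> }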
>>>>>
>>>>> I have made a hack in both the MMC framework and mmci as a proof of
>>>>> concept, and I have run IOZone to get measurements to support my case.
>>>>> The next step, if the results are promising, will be to clean up my
>>>>> work and send out patches for review.
>>>>>
>>>>> The DMAC in ux500 supports two modes, LOG and PHY:
>>>>> LOG - many logical channels are multiplexed on top of one physical channel
>>>>> PHY - only one channel per physical channel
>>>>>
>>>>> DMA modes LOG and PHY have different latencies, both HW- and SW-wise.
>>>>> One could almost treat them as two different DMACs. To get a wider
>>>>> test scope I have tested using both modes.
>>>>>
>>>>> Summary of the results:
>>>>> * It is optional for the mmc host driver to utilize the 2-buf support
>>>>>   (see the sketch after this list); 2-buf in the framework requires no
>>>>>   change in the host drivers.
>>>>> * IOZone shows no performance hit on existing drivers* if 2-buf is
>>>>>   added to the framework but not to the host driver.
>>>>>   (* So far I have only tested one driver.)
>>>>> * The performance gain for DMA using 2-buf is probably proportional to
>>>>>   the cache maintenance time.
>>>>>   The faster the card is, the more significant the cache maintenance
>>>>>   part becomes, and vice versa.
>>>>> * For U5500 with 2-buf, performance for DMA is:
>>>>>   Throughput: DMA vanilla vs DMA 2-buf
>>>>>    * read  +5-10 %
>>>>>    * write +0-3 %
>>>>>   CPU load: CPU vs DMA 2-buf
>>>>>    * read large data: 10-20 percentage points lower
>>>>>    * read small data: same as PIO
>>>>>    * write: same load as PIO (why?)
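>>>>>
>>>>> As a sketch of what "optional" means here (again, the names are made
>>>>> up for illustration, not the real framework code): the core only takes
>>>>> the overlapped path when the host has registered a prepare hook,
>>>>> otherwise it runs the same synchronous sequence as today.
>>>>>
>>>>> struct req;                               /* one MMC data transfer (sketch) */
>>>>>
>>>>> struct host_sketch {
>>>>>         /* Optional opt-in hook; NULL for unmodified host drivers. */
>>>>>         void (*prepare_req)(struct host_sketch *host, struct req *next);
>>>>> };
>>>>>
>>>>> void start_transfer(struct host_sketch *host, struct req *r);
>>>>> void wait_for_xfer(struct host_sketch *host, struct req *r);
>>>>>
>>>>> static void core_issue(struct host_sketch *host,
>>>>>                        struct req *cur, struct req *next)
>>>>> {
>>>>>         start_transfer(host, cur);        /* DMA or PIO begins */
>>>>>
>>>>>         /* Only hosts that register the hook get the overlap; for
>>>>>          * everyone else this behaves exactly as before. */
>>>>>         if (host->prepare_req && next)
>>>>>                 host->prepare_req(host, next);
>>>>>
>>>>>         wait_for_xfer(host, cur);
>>>>> }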
>>>>>
>>>>> Here follow two of the measurements from IOZone comparing MMC with and
>>>>> without double buffering. The rest you can find in the attached text
>>>>> files.
>>>>>
>>>>> === Performance: CPU compared with DMA, vanilla kernel ===
>>>>> Absolute diff: MMC-VANILLA-CPU -> MMC-VANILLA-DMA-LOG
>>>>>
>>>>>                                              random  random
>>>>>     KB  reclen   write rewrite    read  reread    read   write
>>>>>  51200       4     -14      -8   -1005    -988    -679      -1
>>>>>   cpu:            -0.0    -0.1    -0.8    -0.9    -0.7    +0.0
>>>>>
>>>>>  51200       8     -35     -34   -1763   -1791   -1327      +0
>>>>>   cpu:            +0.0    -0.1    -0.9    -1.2    -0.7    +0.0
>>>>>
>>>>>  51200      16      +6     -38   -2712   -2728   -2225      +0
>>>>>   cpu:            -0.1    -0.0    -1.6    -1.2    -0.7    -0.0
>>>>>
>>>>>  51200      32     -10     -79   -3640   -3710   -3298      -1
>>>>>   cpu:            -0.1    -0.2    -1.2    -1.2    -0.7    -0.0
>>>>>
>>>>>  51200      64     +31     -16   -4401   -4533   -4212      -1
>>>>>   cpu:            -0.2    -0.2    -0.6    -1.2    -1.2    -0.0
>>>>>
>>>>>  51200     128     +58     -58   -4749   -4776   -4532      -4
>>>>>   cpu:            -0.2    -0.0    -1.2    -1.1    -1.2    +0.1
>>>>>
>>>>>  51200     256    +192    +283   -5343   -5347   -5184     +13
>>>>>   cpu:            +0.0    +0.1    -1.2    -0.6    -1.2    +0.0
>>>>>
>>>>>  51200     512    +232    +470   -4663   -4690   -4588    +171
>>>>>   cpu:            +0.1    +0.1    -4.5    -3.9    -3.8    -0.1
>>>>>
>>>>>  51200    1024    +250     +68   -3151   -3318   -3303    +122
>>>>>   cpu:            -0.1    -0.5   -14.0   -13.5   -14.0    -0.1
>>>>>
>>>>>  51200    2048    +224    +401   -2708   -2601   -2612    +161
>>>>>   cpu:            -1.7    -1.3   -18.4   -19.5   -17.8    -0.5
>>>>>
>>>>>  51200    4096    +194    +417   -2380   -2361   -2520    +242
>>>>>   cpu:            -1.3    -1.6   -19.4   -19.9   -19.4    -0.6
>>>>>
>>>>>  51200    8192    +228    +315   -2279   -2327   -2291    +270
>>>>>   cpu:            -1.0    -0.9   -20.8   -20.3   -21.0    -0.6
>>>>>
>>>>>  51200   16384    +254    +289   -2260   -2232   -2269    +308
>>>>>   cpu:            -0.8    -0.8   -20.5   -19.9   -21.5    -0.4
>>>>>
>>>>> === Performance: CPU compared with DMA with MMC double buffering ===
>>>>> Absolute diff: MMC-VANILLA-CPU -> MMC-MMCI-2-BUF-DMA-LOG
>>>>>
>>>>>                                              random  random
>>>>>     KB  reclen   write rewrite    read  reread    read   write
>>>>>  51200       4      -7     -11    -533    -513    -365      +0
>>>>>   cpu:            -0.0    -0.1    -0.5    -0.7    -0.4    +0.0
>>>>>
>>>>>  51200       8     -19     -28    -916    -932    -671      +0
>>>>>   cpu:            -0.0    -0.0    -0.3    -0.6    -0.2    +0.0
>>>>>
>>>>>  51200      16     +14     -13   -1467   -1479   -1203      +1
>>>>>   cpu:            +0.0    -0.1    -0.7    -0.7    -0.2    -0.0
>>>>>
>>>>>  51200      32     +61     +24   -2008   -2088   -1853      +4
>>>>>   cpu:            -0.3    -0.2    -0.7    -0.7    -0.2    -0.0
>>>>>
>>>>>  51200      64    +130     +84   -2571   -2692   -2483      +5
>>>>>   cpu:            +0.0    -0.4    -0.1    -0.7    -0.7    +0.0
>>>>>
>>>>>  51200     128    +275    +279   -2760   -2747   -2607     +19
>>>>>   cpu:            -0.1    +0.1    -0.7    -0.6    -0.7    +0.1
>>>>>
>>>>>  51200     256    +558    +503   -3455   -3429   -3216     +55
>>>>>   cpu:            -0.1    +0.1    -0.8    -0.1    -0.8    +0.0
>>>>>
>>>>>  51200     512    +608    +820   -2476   -2497   -2504    +154
>>>>>   cpu:            +0.2    +0.5    -3.3    -2.1    -2.7    +0.0
>>>>>
>>>>>  51200    1024    +652    +493    -818    -977   -1023    +291
>>>>>   cpu:            +0.0    -0.1   -13.2   -12.8   -13.3    +0.1
>>>>>
>>>>>  51200    2048    +654    +809    -241    -218    -242    +501
>>>>>   cpu:            -1.5    -1.2   -16.9   -18.2   -17.0    -0.2
>>>>>
>>>>>  51200    4096    +482    +908     -80     +82    -154    +633
>>>>>   cpu:            -1.4    -1.2   -19.1   -18.4   -18.6    -0.2
>>>>>
>>>>>  51200    8192    +643    +810    +199    +186    +182    +675
>>>>>   cpu:            -0.8    -0.7   -19.8   -19.2   -19.5    -0.7
>>>>>
>>>>>  51200   16384    +684    +724    +275    +323    +269    +724
>>>>>   cpu:            -0.6    -0.7   -19.2   -18.6   -19.8    -0.2
>>>>>
>>>>

_______________________________________________
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev