Hi,

It's interesting.

Can you send your working code so we can test it in our environment (Samsung SoC)?

Thank you,
Kyungmin Park

On Sat, Dec 18, 2010 at 12:38 AM, Per Forlin <per.for...@linaro.org> wrote:
> Hi again,
>
> I made a mistake in my double buffering implementation.
> I assumed dma_unmap did not do any cache operations. Well, it does.
> Due to L2 read prefetch, the L2 cache needs to be invalidated at dma_unmap.
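>
> Roughly, the map/unmap around a read transfer looks like the sketch
> below (illustrative only, not the actual mmci code; "dev", "sg" and
> "nents" stand for the host device and the request's scatterlist).
> The final dma_unmap_sg() is where that invalidate happens:
>
> #include <linux/dma-mapping.h>
> #include <linux/scatterlist.h>
>
> static int sketch_read_transfer(struct device *dev,
>                                 struct scatterlist *sg, int nents)
> {
>         int count;
>
>         /* Map for DMA: cache maintenance so memory and the device
>          * are consistent before the transfer starts. */
>         count = dma_map_sg(dev, sg, nents, DMA_FROM_DEVICE);
>         if (!count)
>                 return -ENOMEM;
>
>         /* ... hand the job to the DMAC and wait for completion ... */
>
>         /* Unmap: because of speculative L2 prefetch the buffer has to
>          * be invalidated again here, so the CPU reads what the device
>          * wrote. This is the step removed in the test run below. */
>         dma_unmap_sg(dev, sg, nents, DMA_FROM_DEVICE);
>
>         return 0;
> }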
>
> I made a quick test to see how much throughput would improve if
> dma_unmap could be run in parallel with the transfer.
> In this run dma_unmap is simply removed.
>
> The figures for read then become:
> * 7-16 % gain with double buffering in the ideal case, closing in on
> the same performance as for PIO.
>
> Relative diff: MMC-VANILLA-DMA-LOG -> MMC-MMCI-2-BUF-DMA-LOG-NO-UNMAP
> (CPU rows are absolute diff)
>                                                        random  random
>        KB      reclen  write   rewrite read    reread  read    write
>        51200   4       +0%     +0%     +7%     +8%     +2%     +0%
>        cpu:            +0.0    +0.0    +0.7    +0.7    -0.0    +0.0
>
>        51200   8       +0%     +0%     +10%    +10%    +6%     +0%
>        cpu:            -0.1    +0.1    +0.6    +0.9    +0.3    +0.0
>
>        51200   16      +0%     +0%     +11%    +11%    +8%     +0%
>        cpu:            -0.0    -0.1    +0.9    +1.0    +0.3    +0.0
>
>        51200   32      +0%     +0%     +13%    +13%    +10%    +0%
>        cpu:            -0.1    +0.0    +1.0    +0.5    +0.8    +0.0
>
>        51200   64      +0%     +0%     +13%    +13%    +12%    +1%
>        cpu:            +0.0    +0.0    +0.4    +1.0    +0.9    +0.1
>
>        51200   128     +0%     +5%     +14%    +14%    +14%    +1%
>        cpu:            +0.0    +0.2    +1.0    +0.9    +1.0    +0.0
>
>        51200   256     +0%     +2%     +13%    +13%    +13%    +1%
>        cpu:            +0.0    +0.1    +0.9    +0.3    +1.6    -0.1
>
>        51200   512     +0%     +1%     +14%    +14%    +14%    +8%
>        cpu:            -0.0    +0.3    +2.5    +1.8    +2.4    +0.3
>
>        51200   1024    +0%     +2%     +14%    +15%    +15%    +0%
>        cpu:            +0.0    +0.3    +1.3    +1.4    +1.3    +0.1
>
>        51200   2048    +2%     +2%     +15%    +15%    +15%    +4%
>        cpu:            +0.3    +0.1    +1.6    +2.1    +0.9    +0.3
>
>        51200   4096    +5%     +3%     +15%    +16%    +16%    +5%
>        cpu:            +0.0    +0.4    +1.1    +1.7    +1.7    +0.5
>
>        51200   8192    +5%     +3%     +16%    +16%    +16%    +2%
>        cpu:            +0.0    +0.4    +2.0    +1.3    +1.8    +0.1
>
>        51200   16384   +1%     +1%     +16%    +16%    +16%    +4%
>        cpu:            +0.1    -0.2    +2.3    +1.7    +2.6    +0.2
>
> I will work on adding unmap to double buffering next week.
>
> /Per
>
> On 16 December 2010 15:15, Per Forlin <per.for...@linaro.org> wrote:
>> Hi,
>>
>> I am working on the blueprint
>> https://blueprints.launchpad.net/linux-linaro/+spec/other-storage-performance-emmc.
>> Currently I am investigating performance for DMA vs PIO on eMMC.
>>
>> Pros and cons of DMA on MMC:
>> + Offloads the CPU
>> + Fewer interrupts: a single interrupt per transfer instead of
>> hundreds or even thousands
>> + Power savings: DMA consumes less power than the CPU
>> - Lower bandwidth / throughput compared to PIO
>>
>> The reason for introducing double buffering in the MMC framework is to
>> address the throughput issue for DMA on MMC.
>> The assumption is that the CPU and DMA have higher throughput than the
>> MMC / SD-card.
>> My hypothesis is that the difference in performance between PIO mode
>> and DMA mode for MMC is due to the latency of preparing a DMA job.
>> If the next DMA job could be prepared while the current job is ongoing,
>> this latency would be reduced. The biggest part of preparing a DMA job
>> is cache maintenance.
>> In my case I run on U5500 (mach-ux500), which has both L1 and L2
>> caches. The host mmc driver in use is the mmci driver (PL180).
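>>
>> As a rough sketch of the idea (the types and helper names below are
>> made up for illustration, not the real MMC core API): while request N
>> is in flight on the DMAC, the framework already does the cache
>> maintenance and mapping for request N+1, so that cost is hidden behind
>> the ongoing transfer.
>>
>> /* Illustrative only: the types and helpers here are hypothetical
>>  * names for this sketch, not the actual framework code. */
>> static void sketch_issue_requests(struct mmc_queue *mq)
>> {
>>         struct mmc_queue_req *cur = mmc_fetch_request(mq);
>>         struct mmc_queue_req *next;
>>
>>         if (cur)
>>                 mmc_prepare_dma_job(cur);   /* cache maint + dma_map_sg */
>>
>>         while (cur) {
>>                 mmc_start_dma_job(cur);     /* hand it to the DMAC */
>>
>>                 /* While the DMAC moves data, prepare the next job so
>>                  * its cache maintenance is off the critical path. */
>>                 next = mmc_fetch_request(mq);
>>                 if (next)
>>                         mmc_prepare_dma_job(next);
>>
>>                 mmc_wait_for_dma_done(cur); /* transfer complete */
>>                 mmc_finish_dma_job(cur);    /* dma_unmap_sg etc. */
>>
>>                 cur = next;
>>         }
>> }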
>>
>> I have done a hack in both the MMC framework and mmci in order to make
>> a proof of concept, and I have run IOZone to get measurements to see
>> whether my case holds.
>> The next step, if the results are promising, will be to clean up my
>> work and send out patches for review.
>>
>> The DMAC in ux500 supports two modes, LOG and PHY:
>> LOG - many logical channels are multiplexed on top of one physical channel
>> PHY - only one channel per physical channel
>>
>> The LOG and PHY modes have different latency, both HW- and SW-wise; one
>> could almost treat them as two different DMACs. To get wider test
>> coverage I have tested using both modes.
>>
>> Summary of the results:
>> * It is optional for the mmc host driver to utilize the 2-buf
>> support; 2-buf in the framework requires no change in the host drivers
>> (see the sketch after this summary).
>> * IOZone shows no performance hit on existing drivers* when adding 2-buf
>> to the framework but not to the host driver.
>>  (* So far I have only tested one driver.)
>> * The performance gain for DMA using 2-buf is probably proportional to
>> the cache maintenance time: the faster the card, the more significant
>> the cache maintenance part becomes, and vice versa.
>> * For U5500 with 2-buf, the performance for DMA is:
>> Throughput: DMA vanilla vs DMA 2-buf
>>  * read +5-10 %
>>  * write +0-3 %
>> CPU load: CPU vs DMA 2-buf
>>  * read, large data: minus 10-20 units of %
>>  * read, small data: same as PIO
>>  * write: same load as PIO (why?)
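>>
>> (A sketch of what "optional" means here, with made-up member names: the
>> framework only calls the extra hooks if the host driver provides them,
>> so existing host drivers need no change.)
>>
>> /* Illustrative only: "prepare_data" / "unprepare_data" are made-up
>>  * names for the optional 2-buf hooks, not the real mmc_host_ops. */
>> struct sketch_host_ops {
>>         void (*request)(struct mmc_host *host, struct mmc_request *mrq);
>>         /* Optional: prepare the next request while the current one is
>>          * still running. May be NULL. */
>>         void (*prepare_data)(struct mmc_host *host,
>>                              struct mmc_request *mrq);
>>         void (*unprepare_data)(struct mmc_host *host,
>>                                struct mmc_request *mrq);
>> };
>>
>> static void sketch_prepare_next(struct sketch_host_ops *ops,
>>                                 struct mmc_host *host,
>>                                 struct mmc_request *next)
>> {
>>         if (ops->prepare_data)          /* host opted in to 2-buf */
>>                 ops->prepare_data(host, next);
>> }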
>>
>> Here follow two of the IOZone measurements, comparing MMC with and
>> without double buffering. The rest can be found in the attached text
>> files.
>>
>> === Performance CPU compared with DMA vanilla kernel ===
>> Absolute diff: MMC-VANILLA-CPU -> MMC-VANILLA-DMA-LOG
>>                                                        random  random
>>        KB      reclen  write   rewrite read    reread  read    write
>>        51200   4       -14     -8      -1005   -988    -679    -1
>>        cpu:            -0.0    -0.1    -0.8    -0.9    -0.7    +0.0
>>
>>        51200   8       -35     -34     -1763   -1791   -1327   +0
>>        cpu:            +0.0    -0.1    -0.9    -1.2    -0.7    +0.0
>>
>>        51200   16      +6      -38     -2712   -2728   -2225   +0
>>        cpu:            -0.1    -0.0    -1.6    -1.2    -0.7    -0.0
>>
>>        51200   32      -10     -79     -3640   -3710   -3298   -1
>>        cpu:            -0.1    -0.2    -1.2    -1.2    -0.7    -0.0
>>
>>        51200   64      +31     -16     -4401   -4533   -4212   -1
>>        cpu:            -0.2    -0.2    -0.6    -1.2    -1.2    -0.0
>>
>>        51200   128     +58     -58     -4749   -4776   -4532   -4
>>        cpu:            -0.2    -0.0    -1.2    -1.1    -1.2    +0.1
>>
>>        51200   256     +192    +283    -5343   -5347   -5184   +13
>>        cpu:            +0.0    +0.1    -1.2    -0.6    -1.2    +0.0
>>
>>        51200   512     +232    +470    -4663   -4690   -4588   +171
>>        cpu:            +0.1    +0.1    -4.5    -3.9    -3.8    -0.1
>>
>>        51200   1024    +250    +68     -3151   -3318   -3303   +122
>>        cpu:            -0.1    -0.5    -14.0   -13.5   -14.0   -0.1
>>
>>        51200   2048    +224    +401    -2708   -2601   -2612   +161
>>        cpu:            -1.7    -1.3    -18.4   -19.5   -17.8   -0.5
>>
>>        51200   4096    +194    +417    -2380   -2361   -2520   +242
>>        cpu:            -1.3    -1.6    -19.4   -19.9   -19.4   -0.6
>>
>>        51200   8192    +228    +315    -2279   -2327   -2291   +270
>>        cpu:            -1.0    -0.9    -20.8   -20.3   -21.0   -0.6
>>
>>        51200   16384   +254    +289    -2260   -2232   -2269   +308
>>        cpu:            -0.8    -0.8    -20.5   -19.9   -21.5   -0.4
>>
>> === Performance CPU compared with DMA with MMC double buffering ===
>> Absolute diff: MMC-VANILLA-CPU -> MMC-MMCI-2-BUF-DMA-LOG
>>                                                        random  random
>>        KB      reclen  write   rewrite read    reread  read    write
>>        51200   4       -7      -11     -533    -513    -365    +0
>>        cpu:            -0.0    -0.1    -0.5    -0.7    -0.4    +0.0
>>
>>        51200   8       -19     -28     -916    -932    -671    +0
>>        cpu:            -0.0    -0.0    -0.3    -0.6    -0.2    +0.0
>>
>>        51200   16      +14     -13     -1467   -1479   -1203   +1
>>        cpu:            +0.0    -0.1    -0.7    -0.7    -0.2    -0.0
>>
>>        51200   32      +61     +24     -2008   -2088   -1853   +4
>>        cpu:            -0.3    -0.2    -0.7    -0.7    -0.2    -0.0
>>
>>        51200   64      +130    +84     -2571   -2692   -2483   +5
>>        cpu:            +0.0    -0.4    -0.1    -0.7    -0.7    +0.0
>>
>>        51200   128     +275    +279    -2760   -2747   -2607   +19
>>        cpu:            -0.1    +0.1    -0.7    -0.6    -0.7    +0.1
>>
>>        51200   256     +558    +503    -3455   -3429   -3216   +55
>>        cpu:            -0.1    +0.1    -0.8    -0.1    -0.8    +0.0
>>
>>        51200   512     +608    +820    -2476   -2497   -2504   +154
>>        cpu:            +0.2    +0.5    -3.3    -2.1    -2.7    +0.0
>>
>>        51200   1024    +652    +493    -818    -977    -1023   +291
>>        cpu:            +0.0    -0.1    -13.2   -12.8   -13.3   +0.1
>>
>>        51200   2048    +654    +809    -241    -218    -242    +501
>>        cpu:            -1.5    -1.2    -16.9   -18.2   -17.0   -0.2
>>
>>        51200   4096    +482    +908    -80     +82     -154    +633
>>        cpu:            -1.4    -1.2    -19.1   -18.4   -18.6   -0.2
>>
>>        51200   8192    +643    +810    +199    +186    +182    +675
>>        cpu:            -0.8    -0.7    -19.8   -19.2   -19.5   -0.7
>>
>>        51200   16384   +684    +724    +275    +323    +269    +724
>>        cpu:            -0.6    -0.7    -19.2   -18.6   -19.8   -0.2
>>
>

_______________________________________________
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev
