Hi,

I'd like to raise a topic for discussion, as I believe there may be a more
fundamental architectural issue in our use of NuttX for 4-wire eMMC on a
RISC-V-based core, and I would like to get some help from the experts :)
NOTE: some initial discussion of this topic is captured here:
https://github.com/apache/nuttx/issues/9080

*Problem description:*

To provide some background, we are using the littlefs filesystem on
SD/eMMC on an emulated RISC-V-based core and are seeing read/write
speeds of approximately 75KiB/sec.  Our architecture and hardware design
will boot from eMMC and require a close-to-instant boot time for a firmware
image of ~2MiB. At 75KiB/s the image load alone would take around 30
seconds, which is unacceptable for our product's use case. Based on the
hardware architecture, we are expecting/targeting speeds of around
10MiB/sec to/from eMMC.
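
To put numbers on the gap, here is a minimal sketch of the load-time
arithmetic. It assumes the image is exactly 2 MiB (the figure above is
approximate) and counts only the raw transfer, so filesystem overhead
would come on top. The helper name is just for illustration.

```python
# Load-time estimate for the firmware image at a constant read speed.
KIB = 1024
MIB = 1024 * KIB

def load_time_s(image_bytes: int, rate_bytes_per_s: float) -> float:
    """Seconds needed to read image_bytes at a constant rate."""
    return image_bytes / rate_bytes_per_s

IMAGE_BYTES = 2 * MIB  # assumption: image is exactly 2 MiB

observed_s = load_time_s(IMAGE_BYTES, 75 * KIB)  # measured littlefs-on-eMMC speed
target_s = load_time_s(IMAGE_BYTES, 10 * MIB)    # speed we are targeting

print(f"at 75 KiB/s: {observed_s:.1f} s")
print(f"at 10 MiB/s: {target_s:.2f} s")
```

Under these assumptions the transfer alone takes about 27 s at the observed
speed, against about 0.2 s at the target speed.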

The issue does not seem to be related to the hardware (the same
performance is observed when tested on an STM32), nor is it caused by the
use of littlefs alone (although littlefs does contribute).
Interestingly, littlefs performs considerably better than vFAT when testing
transfer speeds to/from SD/eMMC: on the same setup, FAT gives read/write
speeds of an unacceptable ~12KiB/s, compared to littlefs's 75KiB/s.
Between RAM disks the ranking is reversed: littlefs writes at 560KiB/s,
considerably slower than vFAT at ~3.2MiB/s.  Even 3.2MiB/s for RAM-based
transfers is slower than what we require for eMMC transfers, which suggests
there may be an underlying architectural issue within the block driver
and/or OS.

*Configurations Tested:*

For eMMC, I've tried tuning the menuconfig settings to improve
performance, including the options below. However, performance remains
lacking:

   - Turning on CONFIG_MEMCPY_VIK gave a slight improvement.
   - Setting USEC_PER_TICK to 1000 or below gave the most improvement, but
   to the detriment of other aspects of the system. The fastest speeds
   observed by adjusting this were ~370KiB/s write and 700KiB/s read;
   overall this was unacceptable given the effect on the rest of the system.
   - Adjusting littlefs parameters (e.g.
   CONFIG_FS_LITTLEFS_PROGRAM_SIZE_FACTOR,
   CONFIG_FS_LITTLEFS_READ_SIZE_FACTOR, CONFIG_FS_LITTLEFS_BLOCK_SIZE_FACTOR,
   CONFIG_FS_LITTLEFS_CACHE_SIZE_FACTOR, CONFIG_FS_LITTLEFS_LOOKAHEAD_SIZE).
   - Ensuring SD/eMMC DMA reads/writes are enabled.
   - Setting MMCSD_MULTIBLOCK_LIMIT to 0.
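
One possible reading of the USEC_PER_TICK sensitivity is that some wait in
the transfer path is quantised to the system tick. The sketch below is a
purely hypothetical model, not a measurement: it assumes one 512-byte
block completes per tick, and a baseline tick of 10000 us (which I believe
is the NuttX default; the baseline is not stated above). The function and
its parameters are made up for illustration.

```python
# Hypothetical model: throughput if each block transfer costs one system tick.
BLOCK_BYTES = 512  # one SD/eMMC block

def tick_bound_kib_s(usec_per_tick: int, blocks_per_tick: float = 1.0) -> float:
    """Throughput in KiB/s when blocks_per_tick blocks complete per tick."""
    ticks_per_s = 1_000_000 / usec_per_tick
    return BLOCK_BYTES * blocks_per_tick * ticks_per_s / 1024

for usec in (10_000, 1_000):  # assumed baseline tick vs. the tuned value
    print(f"USEC_PER_TICK={usec}: {tick_bound_kib_s(usec):.0f} KiB/s")
```

This model gives 50 KiB/s and 500 KiB/s respectively, the same order of
magnitude as the measured 75KiB/s and 370-700KiB/s, which may hint at a
tick-granular wait somewhere but does not prove it.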

*Details of build:*

   - RISC-V architecture, running the kernel in S-mode, main clock running
   @ 75MHz, SD peripheral @ 25MHz
   - Toolchain:
   riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14
   - NuttX Git version: 555506a5846ed2c46346755095ac5cd31b104509
   - Built with commands:

   $ make distclean -j16
   $ ./tools/configure.sh rv-virt:knsh64
   $ make -j16
   $ make export -j16
   $ make import -j16


   - Executed with command:
   $ qemu-system-riscv64 -semihosting -M virt,aclint=on -cpu rv64 -smp 8
   -bios none -kernel nuttx -nographic -serial mon:stdio
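
As a rough sanity check of the 10MiB/s target against the clocks listed
above, the sketch below computes the raw ceiling of a 4-wire bus at 25MHz.
It assumes one bit per data line per clock (single data rate) and ignores
command, CRC and inter-block overhead, so the real ceiling is lower. The
function name is only illustrative.

```python
# Raw ceiling of a 4-wire SD/eMMC bus, ignoring all protocol overhead.
MIB = 1024 * 1024

def raw_bus_bytes_per_s(data_lines: int, clock_hz: int) -> float:
    """Bytes per second assuming one bit per data line per clock."""
    return data_lines * clock_hz / 8

raw = raw_bus_bytes_per_s(4, 25_000_000)  # 4-wire bus, SD peripheral @ 25MHz
target = 10 * MIB                          # targeted throughput

print(f"raw ceiling: {raw / MIB:.2f} MiB/s")
print(f"target needs {100 * target / raw:.1f}% of the raw bus")
```

Under these assumptions the raw ceiling is about 11.9 MiB/s, so the target
would need roughly 84% of the bus and leaves little room for per-block
software overhead.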

*Other Observations Made:*

In attempting to understand where the delays are coming from, the following
observations were made:


   - The DMA used for our target includes 512-byte buffers (1 block) for
   both read and write.
   - Every transmit/receive block transfer involves multiple memcpy's,
   e.g. each 512 bytes triggers an interrupt, whose handler memcpy's from
   the DMA buffer into the receive buffer provided by the littlefs driver,
   which subsequently restructures/memcpy's it into the filesystem
   structure.
   - Given the frequency of interrupts required by 512-byte transfers, and
   the context switching and copying that each interrupt triggers, the
   speed at which the kernel services the interrupts becomes the bottleneck
   for the transfer speed.
   - Because the kernel runs in S-mode, the memcpy's are required: the
   higher-level drivers cannot directly access the kernel-space memory
   (DMA buffers).
   - Given that read/write transfers between RAM disks on littlefs only
   achieve ~0.5MiB/s, I don't believe a larger DMA buffer would help bring
   transfer speeds up to the minimum target of 10MiB/s.
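
To illustrate the interrupt-rate concern, here is a small sketch of the
per-block budget implied by the numbers above. It assumes one interrupt
per 512-byte block and that the 75MHz main clock is the CPU clock. The
helper names are just for illustration.

```python
# Per-block budget when every 512-byte block raises one interrupt.
KIB = 1024
MIB = 1024 * KIB
BLOCK_BYTES = 512        # DMA buffer size (1 block)
CPU_HZ = 75_000_000      # assumption: main clock is the CPU clock

def blocks_per_s(rate_bytes_per_s: float) -> float:
    """Blocks (and therefore interrupts) per second at a given throughput."""
    return rate_bytes_per_s / BLOCK_BYTES

def cycles_per_block(rate_bytes_per_s: float) -> float:
    """CPU cycles available for each block at a given throughput."""
    return CPU_HZ / blocks_per_s(rate_bytes_per_s)

for label, rate in (("observed 75 KiB/s", 75 * KIB), ("target 10 MiB/s", 10 * MIB)):
    print(f"{label}: {blocks_per_s(rate):,.0f} blocks/s, "
          f"{cycles_per_block(rate):,.0f} cycles per block")
```

Under these assumptions the target needs 20,480 interrupts/s with about
3,662 cycles available per block (interrupt entry, copies and filesystem
work included), whereas the observed speed corresponds to about 500,000
cycles per block.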


Given these observations, I feel more complicated and fundamental changes
are needed within the littlefs driver, the block driver, or NuttX itself.
The complexity of these is beyond my current understanding, so I would
greatly appreciate the thoughts and assistance of the experts on this.
Perhaps this has already been observed by the OS maintainers and is an
item with planned or ongoing work?

Kind Regards,
Radek.
*Radek Pesina*
Senior Electronics Engineer


*MoTeC Pty Ltd*
121 Merrindale Drive
Croydon South 3136
Victoria Australia
*T: *61 3 9761 5050
*W: *www.motec.com
