Hi, I'd like to raise a topic for discussion, as I believe there may be a more fundamental architectural issue in how we use NuttX for 4-wire eMMC on a RISC-V-based core, and I would like to get some help from the experts :) NOTE: some initial discussion of this topic is captured here: https://github.com/apache/nuttx/issues/9080
*Problem description:* To provide some background, we are using the littlefs filesystem on SD/eMMC on an emulated RISC-V core and are seeing read/write speeds of approximately 75KiB/s. Our architecture and hardware design will boot from eMMC and require close-to-instant bootup for a firmware image of ~2MiB. At 75KiB/s the image load alone would take nearly 30 seconds (2MiB / 75KiB/s ≈ 27s), which is unacceptable for our product's use case. Based on the hardware architecture, we are expecting/targeting speeds of around 10MiB/s to/from eMMC.

The problem does not appear to be hardware-related (the same performance is observed when tested on an STM32), nor is it caused by littlefs, although littlefs does contribute. Interestingly, littlefs performs considerably better than vFAT for transfers to/from SD/eMMC: on the same setup vFAT achieves an unacceptable ~12KiB/s read/write, compared with 75KiB/s for littlefs. Conversely, for transfers between RAM disks littlefs manages only 560KiB/s for writes, considerably slower than vFAT at ~3.2MiB/s. Even 3.2MiB/s for RAM-based transfers is slower than what we require for eMMC, which suggests there may be an underlying architectural issue within the block driver and/or the OS.

*Configurations tested:* For eMMC, I've tried tuning the menuconfig settings, including the options below, but the performance remains poor:
- Turning on CONFIG_MEMCPY_VIK gave a slight improvement.
- Setting USEC_PER_TICK to 1000 or below gave the most improvement, but to the detriment of other aspects of the system. The fastest speeds observed with this change were ~370KiB/s write and ~700KiB/s read, though overall the effect on the rest of the system was unacceptable.
- Adjusting littlefs parameters (e.g. CONFIG_FS_LITTLEFS_PROGRAM_SIZE_FACTOR, CONFIG_FS_LITTLEFS_READ_SIZE_FACTOR, CONFIG_FS_LITTLEFS_BLOCK_SIZE_FACTOR, CONFIG_FS_LITTLEFS_CACHE_SIZE_FACTOR, CONFIG_FS_LITTLEFS_LOOKAHEAD_SIZE).
- Ensuring SD/eMMC DMA reads/writes are enabled.
- Setting MMCSD_MULTIBLOCK_LIMIT to 0.

*Details of build:*
- RISC-V architecture with the kernel running in S-mode; main clock at 75MHz, SD peripheral at 25MHz
- Toolchain: riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14
- NuttX git version: 555506a5846ed2c46346755095ac5cd31b104509
- Built with commands:
  $ make distclean -j16
  $ ./tools/configure.sh rv-virt:knsh64
  $ make -j16
  $ make export -j16
  $ make import -j16
- Executed with command:
  $ qemu-system-riscv64 -semihosting -M virt,aclint=on -cpu rv64 -smp 8 -bios none -kernel nuttx -nographic -serial mon:stdio

*Other observations:* In trying to understand where the delays come from, I made the following observations:
- The DMA on our target uses 512-byte buffers (one block) for both read and write.
- Every transmit/receive block transfer involves multiple memcpy()s: e.g. each 512-byte block triggers an interrupt, whose handler copies the data from the DMA buffer into the receive buffer provided by the littlefs driver, which then restructures/copies it again into the filesystem structures.
- Because an interrupt is needed for every 512-byte transfer, and each interrupt brings its own context switching and copying, the rate at which the kernel services these interrupts becomes the limiting factor on transfer speed.
- We are running the kernel in S-mode (supervisor mode), so the memcpy()s are required because the higher-level drivers cannot directly access kernel-space memory (the DMA buffers).
- Given that littlefs transfers between RAM disks only achieve ~0.5MiB/s, I don't believe a larger DMA buffer would be enough to bring transfer speeds up to our minimum target of 10MiB/s.
Given these observations, I feel more complicated and fundamental changes may be needed within the littlefs driver, the block driver or NuttX itself. Their complexity is beyond my current understanding, so I would greatly appreciate the thoughts and assistance of the experts on this. Perhaps the maintainers have already observed this and it is the subject of current or planned work?

Kind regards,
Radek Pesina
Senior Electronics Engineer, MoTeC Pty Ltd