On 11-Nov-18 2:19 AM, 范建明 wrote:
Hi, Burakov

   Thanks very much for your reply.
   I ran testpmd from DPDK 18.08 on my router with 200GB of hugepages configured,
   and found that it still takes 50s to zero the 200GB of hugepages each time the 
program restarts.
   As you mentioned, the app should call rte_eal_cleanup() to do the cleanup.
   However, this API is not designed to accelerate startup time.
   >>
      During rte_eal_init() EAL allocates memory from hugepages to enable its 
core libraries to perform their tasks.
         The rte_eal_cleanup() function releases these resources, ensuring that 
no hugepage memory is leaked.
         It is expected that all DPDK applications call rte_eal_cleanup() 
before exiting.
         Not calling this function could result in leaking hugepages, leading 
to failure during initialization of secondary processes.
   >>
   I guess you are suggesting using a secondary process, which uses shared memory 
for fast startup. However, as you know, existing applications would need a lot of 
changes to use it well.

Hi,

No, what I was suggesting is to call this function on application exit (as well as to free any other used memory, such as mempools, hashtables, minor things allocated through rte_malloc, etc.). This will allow the memory subsystem to release hugepage memory back to the system after DPDK shutdown.

And you mentioned that faster initialization is one of the key reasons why the new memory subsystem was developed.
  However, as I read the following code, I guess the community hasn't really 
considered the power of reusing the existing hugepage fs page cache.
        hugepage_info_init(void)
        {
                /* clear out the hugepages dir from unused pages */
                if (clear_hugedir(hpi->hugedir) == -1)
                        break;
        }


...which wouldn't be necessary if those hugepages weren't used in the first place! If your DPDK process has released all of its memory before shutdown, there would have been no hugepages to reuse. Granted, our sample applications do not follow this convention and routinely leak memory, but it is possible to fix this problem.

For example, if you run the test application from current master without any hardware devices and then exit, there will be no leftover hugepage files afterwards. That's because the test application cleans up after itself and releases all hugepage memory back to the system at shutdown. For example:

anatoly@xxx:~$ ls /mnt/huge*/
/mnt/huge/:

/mnt/huge_1G/:

### as you can see, no hugepages are allocated
### now run test app with non-existent hardware

anatoly@xxx:~$ sudo build/DPDK/app/test -w 01:00.0
EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
APP: HPET is not enabled, using TSC as default timer
RTE>>memzone_autotest

### runs memzone autotest, which allocates and frees a bunch of memory

RTE>>quit

### test app quit, now check hugetlbfs again

anatoly@silpixa00399498:~$ ls /mnt/huge*/
/mnt/huge/:

/mnt/huge_1G/:

### as you can see - no hugepages are leftover from execution
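
The `ls` check above can also be done from a program. The following is only a minimal sketch in plain POSIX C, not DPDK code: `count_dir_entries()` and `leftover_selftest()` are hypothetical helper names, the scratch directory under /tmp stands in for a hugetlbfs mount such as /mnt/huge, and the file name `rtemap_0` is just an illustrative leftover file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Count the entries in a directory, skipping "." and "..".
 * Returns -1 if the directory cannot be opened.
 */
int count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(dir);
    return count;
}

/*
 * Exercise the helper on a scratch directory (a stand-in for a hugetlbfs
 * mount): empty at first, one entry while a file is left behind, and empty
 * again once the file is removed. Returns 0 if all three counts match.
 */
int leftover_selftest(void)
{
    char dir[] = "/tmp/hugedemo_XXXXXX";
    if (mkdtemp(dir) == NULL)
        return -1;

    int before = count_dir_entries(dir);

    char file[256];
    snprintf(file, sizeof(file), "%s/rtemap_0", dir);
    FILE *f = fopen(file, "w");
    if (f == NULL) {
        rmdir(dir);
        return -1;
    }
    fclose(f);
    int leaked = count_dir_entries(dir);

    unlink(file);
    int after = count_dir_entries(dir);
    rmdir(dir);

    return (before == 0 && leaked == 1 && after == 0) ? 0 : -1;
}
```

Calling `count_dir_entries()` on the real mount point before and after an application run would give the same information as the two `ls` invocations: a non-zero count after exit means some memory was not released.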

The "non-existent device" part is because currently, the test application does not clean up any driver-allocated memory (and I don't even think there's a way to do that without hot-unplugging everything right before shutdown), but the key point is - you wouldn't have had gigabytes' worth of hugepages to clean if you cleaned up all your memory before application shutdown. Device memory takes up maybe 10-12 megabytes per device (depending on device, of course).

Also, like Stephen mentioned, leftover hugepages would still be a problem after a crash, but you'd get *huge* problems reusing that memory, which is a great reason to dislike this patch as well. But even putting crash scenarios aside, in normal and correct API usage the problem you are trying to solve is not a problem in the first place, and I don't think reusing page cache is the correct way to do this. The proper way to fix the problem is to fix your application to release all of its DPDK memory on shutdown.



   The key to this patch is that it takes advantage of the page cache of the 
hugepage fs.
   With this patch, when you first start up the program, the following steps 
will be taken:
   1. user space: create files under /dev/hugepages.
   2. user space: do mmap with shared and populate flags set.
   3. kernel space:
       3.1 find a free vma.
       3.2 alloc a huge page from the huge page pool reserved by the hugepage fs.
       3.3 call clear_huge_page to zero the page.
           ****** This step is very time-consuming ******
       3.4 insert the page into the file inode's page cache.
       3.5 insert the page into the page table.
        
        
   Then, if you restart the program, the following steps will be taken:
   1. user space: open files under /dev/hugepages.
   2. user space: do mmap with shared and populate flags set.
   3. kernel space:
       3.1 find a free vma.
       3.2 search the file's inode page cache and find the page already there.
       3.3 insert the page into the page table.
   Note that restarting the program doesn't need to do clear_huge_page any more!!!
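
The two sequences above can be observed from user space with a small experiment. This is only a sketch under stated assumptions: it uses an ordinary temporary file and a 4 KB mapping in place of a hugetlbfs file and a huge page, so it runs without any hugepages reserved, and `map_and_probe()` and `reuse_demo()` are hypothetical helper names. The first mapping of the freshly sized file reads back zeros; the second mapping of the same file gets the old contents back instead of zeros, which is both the time saving and the stale-data risk discussed in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_POPULATE
#define MAP_POPULATE 0   /* Linux-specific flag; ignored where unavailable */
#endif

#define DEMO_LEN 4096    /* one small page; a real huge page is far larger */

/*
 * Map the file MAP_SHARED (as in step 2 above), remember the first byte
 * that was already there, overwrite the mapping with new_value, and unmap.
 * With create != 0 the file is sized first, like a first startup.
 * Returns the old first byte (0..255), or -1 on error.
 */
int map_and_probe(const char *path, int create, unsigned char new_value)
{
    int fd = open(path, O_RDWR | (create ? O_CREAT : 0), 0600);
    if (fd < 0)
        return -1;
    if (create && ftruncate(fd, DEMO_LEN) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, DEMO_LEN, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_POPULATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    int old = p[0];
    memset(p, new_value, DEMO_LEN);
    munmap(p, DEMO_LEN);
    return old;
}

/*
 * Simulate "first startup" followed by "restart" on the same backing file.
 * Returns 0 on success and stores the first byte seen by each run.
 */
int reuse_demo(int *first_run_byte, int *second_run_byte)
{
    char path[] = "/tmp/pagecache_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);

    *first_run_byte = map_and_probe(path, 1, 0xAB);   /* fresh file */
    *second_run_byte = map_and_probe(path, 0, 0xCD);  /* existing file */
    unlink(path);

    return (*first_run_byte >= 0 && *second_run_byte >= 0) ? 0 : -1;
}
```

In this sketch the second run sees the 0xAB pattern left by the first run. Any code that assumes freshly mapped memory is zeroed would therefore read garbage after a restart unless something clears it explicitly.
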
   Btw, I worked for Intel several years ago. It's a great place to work.

Best regards
jianming

-----Original Message-----
From: Burakov, Anatoly [mailto:anatoly.bura...@intel.com]
Sent: November 9, 2018 22:03
To: jianmingfan <jianming...@126.com>; dev@dpdk.org
Cc: 范建明 <fanjianm...@jd.com>
Subject: Re: [dpdk-dev] [PATCH v2] mem: accelerate dpdk program startup by reuse 
page from page cache

On 09-Nov-18 12:20 PM, Burakov, Anatoly wrote:
On 09-Nov-18 9:23 AM, jianmingfan wrote:
--- fix coding style of the previous patch

During process startup, dpdk invokes clear_hugedir() to unlink all
hugepage files under /dev/hugepages. Then, in map_all_hugepages(), it
invokes mmap to allocate and zero all the huge pages as configured in
/sys/kernel/mm/hugepages/xxx/nr_hugepages.
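
As a rough illustration of the unlink pass described above, here is a simplified sketch in plain POSIX C. It is not the DPDK implementation of clear_hugedir(): `clear_dir_sketch()` and `clear_dir_selftest()` are hypothetical names, the `*map_*` filter and the `rtemap_N` file names are assumptions made for the example, and any checks the real function makes before removing a file are left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <fnmatch.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Unlink every entry in dir_path whose name matches the shell-style
 * pattern. Returns the number of files removed, or -1 if the directory
 * cannot be opened.
 */
int clear_dir_sketch(const char *dir_path, const char *pattern)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    int dir_fd = dirfd(dir);
    int removed = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* skip anything that does not look like a hugepage file */
        if (fnmatch(pattern, ent->d_name, 0) != 0)
            continue;
        if (unlinkat(dir_fd, ent->d_name, 0) == 0)
            removed++;
    }
    closedir(dir);
    return removed;
}

/*
 * Create two matching files and one unrelated file in a scratch directory,
 * clear it, and check that only the matching files were removed.
 * Returns 0 if the sketch behaved as expected.
 */
int clear_dir_selftest(void)
{
    char dir[] = "/tmp/cleardemo_XXXXXX";
    if (mkdtemp(dir) == NULL)
        return -1;

    const char *names[] = { "rtemap_0", "rtemap_1", "unrelated.txt" };
    char path[512];
    for (int i = 0; i < 3; i++) {
        snprintf(path, sizeof(path), "%s/%s", dir, names[i]);
        FILE *f = fopen(path, "w");
        if (f == NULL)
            return -1;
        fclose(f);
    }

    int removed = clear_dir_sketch(dir, "*map_*");

    /* the unrelated file must still exist, so this unlink succeeds */
    snprintf(path, sizeof(path), "%s/unrelated.txt", dir);
    int kept = (unlink(path) == 0);
    rmdir(dir);

    return (removed == 2 && kept) ? 0 : -1;
}
```

Once the files are unlinked, the next mmap of a newly created file has to allocate and zero its pages again, which is the cost the patch is trying to avoid.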

This makes process startup extremely slow when a large number of huge
pages is configured.

In our use case, we usually configure as much as 200GB of hugepages in
our router. It takes more than 50s to clear the pages each time the
dpdk process starts up.

To address this issue, the user can turn on the --reuse-map switch. With
it, dpdk will check the validity of the existing page cache under
/dev/hugepages. If valid, the cache will be reused rather than deleted,
so that the OS doesn't need to zero the pages again.

However, a lot of users, e.g. rte_kni_alloc, rely on the OS zero-page
behavior. To keep things working, I add a memset during
malloc_heap_alloc(). This makes sense for the following reasons.
1) Users often configure far more hugepage memory than the program
actually uses.
In our router, 200GB is configured, but less than 2GB is actually used.
2) dpdk users don't call heap allocation in the performance-critical path.
They allocate memory during process bootup.
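
The idea of zeroing at allocation time rather than at map time can be sketched as follows. This is not the patch's code and does not use the DPDK heap: a static buffer filled with 0xAA stands in for reused, non-zeroed hugepage memory, and `arena_reset_dirty()`, `alloc_zeroed()` and `count_dirty_bytes()` are hypothetical names for a toy bump allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 4096

/* Stand-in for reused hugepage memory that the kernel did not zero. */
static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t arena_used;

/* Fill the arena with a non-zero pattern, as if a previous run had used it. */
void arena_reset_dirty(void)
{
    memset(arena, 0xAA, sizeof(arena));
    arena_used = 0;
}

/*
 * Bump-allocate size bytes (rounded up to 16) and zero only that region.
 * Returns NULL if the request does not fit in the remaining space.
 */
void *alloc_zeroed(size_t size)
{
    if (size > ARENA_SIZE)
        return NULL;
    size = (size + 15) & ~(size_t)15;
    if (size > ARENA_SIZE - arena_used)
        return NULL;

    void *p = arena + arena_used;
    arena_used += size;
    memset(p, 0, size);   /* the cost is paid per allocation, not per page */
    return p;
}

/* Count the bytes of the arena that still hold non-zero (stale) data. */
size_t count_dirty_bytes(void)
{
    size_t dirty = 0;
    for (size_t i = 0; i < ARENA_SIZE; i++)
        if (arena[i] != 0)
            dirty++;
    return dirty;
}
```

Only the bytes that are handed out get cleared; the rest of the arena stays stale. By the figures in the message (200GB configured, less than 2GB used), zeroing on allocation would touch roughly 1% of the memory that zeroing at map time does.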

Signed-off-by: Jianming Fan <fanjianm...@jd.com>
---

I believe this issue is better solved by actually fixing all of the
memory that DPDK leaves behind. We already have the rte_eal_cleanup()
call, which will deallocate any EAL-allocated memory that has been
reserved, and an exiting application should free any memory it was
using so that the memory subsystem can release it back to the system,
thereby not needing any cleaning of hugepages at startup.

If your application does not e.g. free its mempools on exit, it should
:) Chances are, the problem will go away. The only circumstance where
this may not work is if you preallocated your memory using
-m/--socket-mem flag.


To clarify - all of the above is only applicable to 18.05 and beyond.
The map_all_hugepages() function only gets called in the legacy mem
init, so this patch solves a problem that does not exist on recent DPDK
versions in the first place - faster initialization is one of the key
reasons why the new memory subsystem was developed.



--
Thanks,
Anatoly
