On 28/06/2022 15:40, Bertrand Marquis wrote:
Hi Julien,
Hi Bertrand,
On 9 Jun 2022, at 09:30, Julien Grall <jul...@xen.org> wrote:
From: Hongyan Xia <hongy...@amazon.com>
The idea is to split the range into multiple aligned power-of-2 regions,
each of which needs only a single call to free_heap_pages(). We check the least
significant set bit of the start address and use its bit index as the
order of this increment. This makes sure that each increment is both
power-of-2 and properly aligned, which can be safely passed to
free_heap_pages(). Of course, the order also needs to be sanity checked
against the upper bound and MAX_ORDER.
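A minimal sketch of the splitting loop described above, assuming the range is expressed as page frame numbers [start, end). This is not the code from the patch: `free_chunk()` is a hypothetical stand-in for free_heap_pages() that only tallies what it is given, and the MAX_ORDER value of 18 is a hypothetical placeholder rather than Xen's real definition.

```c
#include <stdio.h>

/* Hypothetical cap on the order; the real MAX_ORDER is defined by Xen. */
#define MAX_ORDER 18

/* Hypothetical stand-in for free_heap_pages(): records each chunk it gets. */
static unsigned long freed_pages;
static unsigned int calls;

static void free_chunk(unsigned long start, unsigned int order)
{
    freed_pages += 1UL << order;
    calls++;
    printf("free pfn %lu, order %u\n", start, order);
}

/* Bit index of the least significant set bit; MAX_ORDER when x == 0. */
static unsigned int lowest_set_bit(unsigned long x)
{
    unsigned int i = 0;

    if (x == 0)
        return MAX_ORDER;
    while (!(x & 1UL)) {
        x >>= 1;
        i++;
    }
    return i;
}

/* Free [start, end) as aligned power-of-2 chunks, one call per chunk. */
static void free_range(unsigned long start, unsigned long end)
{
    while (start < end) {
        /* start is a multiple of 2^order, so the chunk is aligned. */
        unsigned int order = lowest_set_bit(start);

        if (order > MAX_ORDER)
            order = MAX_ORDER;
        /* Sanity check against the upper bound: shrink until it fits. */
        while (start + (1UL << order) > end)
            order--;

        free_chunk(start, order);
        start += 1UL << order;
    }
}
```

For example, `free_range(3, 37)` makes 6 calls (orders 0, 2, 3, 4, 2, 0 at pfns 3, 4, 8, 16, 32, 36) instead of 34 single-page frees. The inner loop always terminates because order 0 fits whenever start < end.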
Tested in a nested environment on c5.metal with various amounts
of RAM. Time for end_boot_allocator() to complete:
          Before     After
- 90GB:  1426 ms    166 ms
- 8GB:    124 ms     12 ms
- 4GB:     60 ms      6 ms
On an arm64 Neoverse N1 system with 32GB of RAM, I get:
- 1180 ms before
- 63 ms after
and my internal tests are passing on arm64.
Thanks for the testing! The numbers are a lot better than I was actually
expecting on arm64.
Great optimisation :-)
You will have to thank Hongyan. He came up with the idea :).
(I will do a full review of the code in a second step.)
I am planning to send a new version in the next few days. So you may
want to wait before reviewing the series.
Cheers,
--
Julien Grall