On Tue, Feb 19, 2019 at 01:12:07PM +0530, Anshuman Khandual wrote:
> But the location of this temp page matters as well because you would like to
> saturate the inter node interface. It needs to be either of the nodes where
> the source or destination page belongs. Any other node would generate two
> internode copy process which is not what you intend here I guess.

That makes no sense.  The temp page should be allocated on the local node
of the CPU performing the copy.  If the CPU is in node A, the destination
is in node B and the source is in node C, then you're doing 4k worth of
reads from node C, 4k worth of reads from node B, 4k worth of writes to
node C followed by 4k worth of writes to node B.  Eventually the 4k of
dirty cachelines on node A will be written back from cache to the local
memory (... or not, if that page gets reused for some other purpose first).

If you allocate the temp page on node B or node C, that's an extra 4k of
writes to be sent across the inter-node link.
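
To make the accounting concrete, here is a rough sketch in C.  It is a toy
model, not kernel code: cross_node_bytes(), link_bytes() and the node
numbering are made up for illustration.  It assumes 4k pages, that reads of
the temp page hit in the copying CPU's cache, and that dirty lines belonging
to a remote temp page are eventually written back across the link.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096u

/* Bytes crossing the inter-node link for one page-sized access
 * from a CPU on cpu_node to memory on mem_node (toy model). */
size_t link_bytes(int cpu_node, int mem_node)
{
	return cpu_node == mem_node ? 0 : PAGE_BYTES;
}

/* Total link traffic for exchanging two pages through a temp page.
 * Hypothetical helper: assumes temp page reads are cache hits, so only
 * the writeback of the temp page can generate link traffic. */
size_t cross_node_bytes(int cpu_node, int src_node, int dst_node,
			int tmp_node)
{
	size_t total = 0;

	assert(cpu_node >= 0 && src_node >= 0 &&
	       dst_node >= 0 && tmp_node >= 0);

	total += link_bytes(cpu_node, src_node); /* read source page */
	total += link_bytes(cpu_node, dst_node); /* read destination page */
	total += link_bytes(cpu_node, src_node); /* write source page */
	total += link_bytes(cpu_node, dst_node); /* write destination page */
	total += link_bytes(cpu_node, tmp_node); /* temp page writeback */

	return total;
}
```

With A=0, B=1, C=2, cross_node_bytes(0, 2, 1, 0) comes to 16384 bytes (four
4k transfers), while putting the temp page on node 1 or node 2 gives 20480,
which is the extra 4k of writes over the link.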