Ivan, I think that we should use mmap/munmap to allocate huge chunks of memory.
I've experimented with JNA, invoking mmap/munmap through it, and it works fine.
Maybe we can create a module (similar to direct-io) that uses mmap/munmap on
platforms that support them and falls back to Unsafe otherwise?

Thu, Jul 23, 2020 at 13:31, Ivan Bessonov <bessonov...@gmail.com>:

> Hello Igniters,
>
> I'd like to discuss the current issue with "out of memory" failures on
> TeamCity. Particularly suites [1] and [2]: they have quite a lot of
> "Exit code 137" failures.
>
> I investigated the "PDS (Indexing)" suite under [3]. There's another
> similar issue as well: [4].
> I came to the conclusion that the main problem is inside the default memory
> allocator (malloc).
> Let me explain the way I see it right now:
>
> "malloc" is allowed to allocate (for internal usage) up to 8 * (number of
> cores) blocks called ARENA, 64 MB each. This may happen when a program
> creates/stops threads frequently and allocates a lot of memory all the
> time, which is exactly what our tests do. Given that TC agents have
> 32 cores, 8 * 32 * 64 MB gives 16 gigabytes, which is about the whole
> amount of RAM on a single agent.
>
> The total number of arenas can be manually lowered by setting the
> MALLOC_ARENA_MAX environment variable to 4 (or another small value).
> I tried it locally and in the PDS (Indexing) suite settings on TC;
> the results look very promising: [5]
>
> It is said that changing this variable may lead to some performance
> degradation, but it's hard to tell whether we have it or not, because the
> suite usually failed before it was completed.
>
> So, I have two questions right now:
>
> - can those of you who are into hardcore Linux and C confirm that the
> solution can help us? Experiments show that it completely solves the
> problem.
> - can you please point me to a person who usually does TC maintenance?
> I'm not entirely sure that I can propagate this environment variable to
> all suites by myself, which is necessary to avoid occasional error 137
> (resulting from the same problem) in the future. I just don't know all
> the details about the suites' structure.
>
> Thank you!
>
> [1] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&state=failed&branch_IgniteTests24Java8=%3Cdefault%3E
> [2] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_Pds4&tab=buildTypeHistoryList&branch_IgniteTests24Java8=%3Cdefault%3E&state=failed
> [3] https://issues.apache.org/jira/browse/IGNITE-13266
> [4] https://issues.apache.org/jira/browse/IGNITE-13263
> [5] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&branch_IgniteTests24Java8=pull%2F8051%2Fhead
>
> --
> Sincerely yours,
> Ivan Bessonov

--
Sincerely yours,
Ivan Daschinskiy