Hello Igniters,

I'd like to discuss the current issue with "out of memory" failures on TeamCity. Suites [1] and [2] in particular have quite a lot of "Exit code 137" failures.

I investigated the "PDS (Indexing)" suite under [3]. There is a similar issue as well: [4].

I came to the conclusion that the main problem is in the default memory allocator (glibc malloc). Let me explain how I see it right now:

"malloc" is allowed to allocate, for its internal use, up to 8 * (number of cores) blocks called arenas, 64 MB each. This can happen when a program frequently creates and stops threads and allocates a lot of memory all the time, which is exactly what our tests do. Given that TC agents have 32 cores, 8 * 32 * 64 MB gives 16 GB, which is roughly all the RAM available on a single agent.

The total number of arenas can be lowered manually by setting the MALLOC_ARENA_MAX environment variable to 4 (or another small value). I tried it locally and in the PDS (Indexing) suite settings on TC, and the results look very promising: [5]

It is said that changing this variable may lead to some performance degradation, but it's hard to tell whether that is the case here, because the suite usually failed before it completed.

So, I have two questions right now:

- Can those of you who are into hardcore Linux and C confirm that this solution can help us? Experiments show that it completely solves the problem.

- Can you please point me to the person who usually does TC maintenance? I'm not entirely sure that I can propagate this environment variable to all suites by myself, which is necessary to avoid occasional error 137 (resulting from the same problem) in the future. I just don't know all the details of the suite structure.

Thank you!

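P.S. For reference, the arithmetic above can be sketched in a few lines of Python. This is only an illustration of the estimate in this message, not a measurement: the constants (8 arenas per core, 64 MB per arena) are the assumptions stated above, and the helper names max_arena_footprint_mb and env_with_arena_cap are hypothetical.

```python
import os

# Assumptions taken from the estimate in this message (64-bit Linux, glibc malloc):
# up to 8 arenas per core, about 64 MB reserved per arena.
ARENAS_PER_CORE = 8
ARENA_SIZE_MB = 64


def max_arena_footprint_mb(cores, arena_max=None):
    """Estimated memory reserved by malloc arenas, in MB."""
    # Without a cap the arena count scales with the core count;
    # with MALLOC_ARENA_MAX set, the cap replaces that default.
    arenas = ARENAS_PER_CORE * cores if arena_max is None else arena_max
    return arenas * ARENA_SIZE_MB


def env_with_arena_cap(arena_max=4, base_env=None):
    """Return a copy of the environment with MALLOC_ARENA_MAX set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MALLOC_ARENA_MAX"] = str(arena_max)
    return env


default_mb = max_arena_footprint_mb(cores=32)               # 8 * 32 * 64 MB
capped_mb = max_arena_footprint_mb(cores=32, arena_max=4)   # 4 * 64 MB
print(default_mb // 1024, "GB by default;", capped_mb, "MB with MALLOC_ARENA_MAX=4")
```

The dictionary returned by env_with_arena_cap could be passed as the env argument of subprocess.run to start a test JVM with the cap applied locally, which is one way to reproduce the experiment outside TC.
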
[1] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&state=failed&branch_IgniteTests24Java8=%3Cdefault%3E
[2] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_Pds4&tab=buildTypeHistoryList&branch_IgniteTests24Java8=%3Cdefault%3E&state=failed
[3] https://issues.apache.org/jira/browse/IGNITE-13266
[4] https://issues.apache.org/jira/browse/IGNITE-13263
[5] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&branch_IgniteTests24Java8=pull%2F8051%2Fhead

--
Sincerely yours,
Ivan Bessonov