Hello Igniters,

I'd like to discuss the current "out of memory" failures on TeamCity.
Suites [1] and [2] in particular show quite a lot of "Exit code 137"
failures.

I investigated the "PDS (Indexing)" suite under [3]. There's another
similar issue as well: [4].
I came to the conclusion that the main problem is inside the default memory
allocator (malloc).
Let me explain the way I see it right now:

"malloc" is allowed to allocate, for its internal use, up to
8 * (number of cores) blocks called arenas, 64 MB each. This may happen
when a program creates and stops threads frequently and allocates a lot
of memory all the time, which is exactly what our tests do. Given that
TC agents have 32 cores, 8 * 32 * 64 MB gives 16 GB, which is about the
whole amount of RAM on a single agent.
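
As a quick sanity check of the arithmetic above, here is a small Python
sketch. The function name and its parameters are made up for illustration;
the "8 arenas per core" and "64 MB per arena" figures are simply the ones
quoted above, not something the script measures.

```python
# Worst-case memory that malloc arenas could reserve, using the
# figures from this email (8 arenas per core, 64 MB per arena).
# The helper name and defaults are illustrative assumptions.

def max_arena_bytes(cores: int, arenas_per_core: int = 8, arena_mb: int = 64) -> int:
    """Return the upper bound on arena memory in bytes."""
    return cores * arenas_per_core * arena_mb * 1024 * 1024

# A TC agent with 32 cores, as described above.
worst_case = max_arena_bytes(32)
worst_case_gib = worst_case / 1024 ** 3

# The same bound if the number of arenas were capped at 4 in total.
capped_mb = 4 * 64

print(f"default upper bound: {worst_case_gib:.0f} GiB")
print(f"with 4 arenas:       {capped_mb} MB")
```

With the cap in place the bound drops from the whole RAM of the agent to a
few hundred megabytes, which is why the setting looks attractive.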

The total number of arenas can be lowered manually by setting the
MALLOC_ARENA_MAX environment variable to 4 (or another small value).
I tried it locally and in the PDS (Indexing) suite settings on TC, and
the results look very promising: [5]
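
For anyone who wants to try this locally, below is a minimal Python sketch
of launching a process with the variable set. The wrapper function is
hypothetical and the child command is just a stand-in that echoes the
variable back; in practice you would put your own test command there. It
says nothing about how the variable should be configured on TC itself.

```python
import os
import subprocess
import sys

def run_with_arena_cap(cmd, arena_max=4):
    """Run cmd with MALLOC_ARENA_MAX set in the child's environment.

    Hypothetical helper for local experiments; the parent process
    environment is left untouched.
    """
    env = dict(os.environ)
    env["MALLOC_ARENA_MAX"] = str(arena_max)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)

# Stand-in child process: prints the value it sees, to confirm the
# variable really reaches the child. Replace with a real test command.
child = [sys.executable, "-c", "import os; print(os.environ['MALLOC_ARENA_MAX'])"]
result = run_with_arena_cap(child)
seen = result.stdout.strip()
print("child sees MALLOC_ARENA_MAX =", seen)
```

The variable has to be present in the environment when the process starts,
so setting it from inside an already running process would not help.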

It is said that changing this variable may lead to some performance
degradation, but it's hard to tell whether that is the case here, because
the suite usually failed before it completed.

So, I have two questions right now:

- can those of you who are into hardcore Linux and C confirm that this
solution can help us? In my experiments so far it completely solves the
problem.
- can you please point me to the person who usually does TC maintenance?
I'm not entirely sure that I can propagate this environment variable to
all suites by myself, which is necessary to avoid occasional error 137
(caused by the same problem) in the future. I just don't know all the
details of how the suites are structured.

Thank you!

[1] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&state=failed&branch_IgniteTests24Java8=%3Cdefault%3E
[2] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_Pds4&tab=buildTypeHistoryList&branch_IgniteTests24Java8=%3Cdefault%3E&state=failed
[3] https://issues.apache.org/jira/browse/IGNITE-13266
[4] https://issues.apache.org/jira/browse/IGNITE-13263
[5] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&branch_IgniteTests24Java8=pull%2F8051%2Fhead

-- 
Sincerely yours,
Ivan Bessonov
