Hello Ivan,

It feels like the problem is more about starting new threads than about the
allocation of offheap regions. Plus, I'd like to see results soon, and your
proposal is a major change for Ignite that can't be implemented quickly.

Anyway, I think this makes sense, considering that one day Unsafe will be
removed. But I wouldn't think about it right now, maybe as a separate
proposal...



Thu, 23 Jul 2020 at 13:40, Ivan Daschinsky <ivanda...@gmail.com>:

> Ivan, I think that we should use mmap/munmap to allocate huge chunks of
> memory.
>
> I've experimented with JNA, invoking mmap/munmap through it, and it works
> fine.
> Maybe we can create a module (similar to direct-io) that uses mmap/munmap on
> platforms that support them and falls back to Unsafe otherwise?
>
> Thu, 23 Jul 2020 at 13:31, Ivan Bessonov <bessonov...@gmail.com>:
>
> > Hello Igniters,
> >
> > I'd like to discuss the current issue with "out of memory" failures on
> > TeamCity, particularly suites [1] and [2], which have quite a lot of
> > "Exit code 137" failures.
> >
> > I investigated the "PDS (Indexing)" suite under [3]. There's another
> > similar issue as well: [4].
> > I came to the conclusion that the main problem is inside the default
> > memory allocator (malloc).
> > Let me explain the way I see it right now:
> >
> > "malloc" is allowed to allocate (for internal use) up to 8 * (number of
> > cores) blocks called arenas, 64 MB each. This may happen when a program
> > creates/stops threads frequently and allocates a lot of memory all the
> > time, which is exactly what our tests do. Given that TC agents have 32
> > cores, 8 * 32 * 64 MB gives 16 gigabytes, which is about the whole amount
> > of RAM on a single agent.
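
To make the arithmetic above concrete, here is a minimal Java sketch of the worst-case estimate. The class and method names are hypothetical, and the 8-arenas-per-core and 64 MB figures are simply the ones quoted above, not values read from the system.

```java
public class ArenaEstimate {
    // Figures taken from the message above (assumed, not queried from glibc).
    static final int ARENAS_PER_CORE = 8;
    static final long ARENA_SIZE_MB = 64;

    // Upper bound on arena memory when the arena count is left uncapped.
    static long worstCaseArenaMb(int cores) {
        return ARENAS_PER_CORE * (long) cores * ARENA_SIZE_MB;
    }

    // Upper bound on arena memory when the arena count is capped explicitly.
    static long cappedArenaMb(int arenaMax) {
        return arenaMax * ARENA_SIZE_MB;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseArenaMb(32)); // 16384 MB, i.e. 16 GB
        System.out.println(cappedArenaMb(4));     // 256 MB
    }
}
```

These are upper bounds on what the allocator may claim for arenas, so actual usage on an agent can be lower.
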
> >
> > The total number of arenas can be manually lowered by setting the
> > MALLOC_ARENA_MAX environment variable to 4 (or another small value). I
> > tried it locally and in the PDS (Indexing) suite settings on TC, and the
> > results look very promising: [5]
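
A minimal sketch of capping the arena count for a child process from Java, assuming the process is launched through ProcessBuilder (an assumption; the message above only mentions setting the variable locally and in the TC suite settings). The class and helper names are hypothetical, and "java -version" stands in for whatever command actually runs the tests.

```java
public class ArenaCappedLauncher {
    // Adds MALLOC_ARENA_MAX to the environment the child process will start with.
    static ProcessBuilder withArenaCap(ProcessBuilder pb, int arenaMax) {
        pb.environment().put("MALLOC_ARENA_MAX", Integer.toString(arenaMax));
        return pb;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder command; replace with the real test launcher.
        ProcessBuilder pb = withArenaCap(new ProcessBuilder("java", "-version"), 4);
        pb.inheritIO(); // forward the child's output to this console
        int exit = pb.start().waitFor();
        System.out.println("child exit code: " + exit);
    }
}
```

The variable has to be present in the environment before the target process starts; setting it for a child does not change the allocator behaviour of the JVM that is already running.
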
> >
> > It is said that changing this variable may lead to some performance
> > degradation, but it's hard to tell whether we have it or not, because the
> > suite usually failed before it was completed.
> >
> > So, I have two questions right now:
> >
> > - can those of you who are into hardcore Linux and C confirm that the
> > solution can help us? Experiments show that it completely solves the
> > problem.
> > - can you please point me to a person who usually does TC maintenance?
> > I'm not entirely sure that I can propagate this environment variable to
> > all suites by myself, which is necessary to avoid occasional error 137
> > (resulting from the same problem) in the future. I just don't know all
> > the details about the suite structure.
> >
> > Thank you!
> >
> > [1] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&state=failed&branch_IgniteTests24Java8=%3Cdefault%3E
> > [2] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_Pds4&tab=buildTypeHistoryList&branch_IgniteTests24Java8=%3Cdefault%3E&state=failed
> > [3] https://issues.apache.org/jira/browse/IGNITE-13266
> > [4] https://issues.apache.org/jira/browse/IGNITE-13263
> > [5] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PdsIndexing&tab=buildTypeHistoryList&branch_IgniteTests24Java8=pull%2F8051%2Fhead
> >
> > --
> > Sincerely yours,
> > Ivan Bessonov
> >
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>


-- 
Sincerely yours,
Ivan Bessonov
