Just to add to what Ondrej said: the initial cgroup integration implements
two different settings.
One of them allows memory to be over-committed as long as there is no memory
pressure in the kernel, but the actual behavior depends on the Linux kernel.
To debug what Grid Engine has set, you can inspect the cgroup settings in the
filesystem on the exec host.
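
As a starting point for that inspection, here is a minimal sketch in C that
reports which memory cgroup the kernel has placed a process in, by parsing
/proc/<pid>/cgroup. The function names are made up for illustration, and the
directory names Grid Engine actually creates are not assumed here; the sketch
only prints what the kernel reports. The limit files themselves
(memory.limit_in_bytes on cgroup v1, memory.max on cgroup v2) would then be
found under that path below the cgroup mount point, typically /sys/fs/cgroup.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the comma-separated list [start, end) contains "memory". */
static int list_has_memory (const char *start, const char *end)
{
        while (start < end)
        {
                const char *comma = memchr (start, ',', (size_t) (end - start));
                const char *tok_end = comma ? comma : end;

                if ((size_t) (tok_end - start) == 6
                    && memcmp (start, "memory", 6) == 0)
                        return 1;
                start = tok_end + 1;
        }
        return 0;
}

/*
 * Parse one line of /proc/<pid>/cgroup, which has the form
 *     hierarchy-ID:controller-list:cgroup-path
 * The path is copied into `path` if the controller list names "memory"
 * (cgroup v1) or is empty (cgroup v2 unified hierarchy).
 * Returns 1 on a match, 0 otherwise.
 */
int memory_cgroup_from_line (const char *line, char *path, size_t pathlen)
{
        const char *first = strchr (line, ':');
        const char *second = first ? strchr (first + 1, ':') : NULL;
        size_t n;

        if (second == NULL || pathlen == 0)
                return 0;
        if (second != first + 1 && !list_has_memory (first + 1, second))
                return 0;

        n = strcspn (second + 1, "\n");         /* drop the trailing newline */
        if (n >= pathlen)
                n = pathlen - 1;
        memcpy (path, second + 1, n);
        path[n] = '\0';
        return 1;
}

/*
 * Print the memory cgroup of a process; `pid` is a PID as text, or "self".
 * Returns 1 if one was found, 0 if not, -1 if the file could not be opened.
 */
int print_memory_cgroup (const char *pid)
{
        char file[64], line[4096], path[4096];
        int found = 0;
        FILE *fp;

        snprintf (file, sizeof file, "/proc/%s/cgroup", pid);
        fp = fopen (file, "r");
        if (fp == NULL)
                return -1;
        while (fgets (line, sizeof line, fp) != NULL)
                if (memory_cgroup_from_line (line, path, sizeof path))
                {
                        printf ("memory cgroup: %s\n", path);
                        found = 1;
                }
        fclose (fp);
        return found;
}
```

Calling print_memory_cgroup ("self") from a small main() inside a job script,
or passing the PID of a running job's process on the exec host, should show
which cgroup directory to look into.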

Maybe that helps: 
http://www.gridengine.eu/index.php/grid-engine-internals/171-main-memory-limitation-with-grid-engine-a-short-introduction-into-cgroups-20130825
 

Cheers

Daniel

> On 07.08.2020 at 20:10, Ondrej Valousek <ondrej.valou...@diasemi.com> wrote:
> 
> Well, keep in mind that h_vmem controls the job's max virtual memory, whereas
> m_mem_free controls the cgroup virtual memory. These look similar but work
> differently:
> h_vmem: takes allocations into account; once the job wants more, the
> allocation (i.e. malloc) is refused and the job needs to handle the
> situation. SGE is not killing any jobs.
> m_mem_free: controls cgroups, which (depending on the running kernel's memory
> overcommit configuration) might have the same effect. But if overcommit
> is enabled (it is by default), the behaviour is different: malloc
> allocations typically succeed, but the kernel OOM killer can reap the task
> if it is too memory hungry.
> 
> HTH,
> Ondrej
> 
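
To make the first of those two behaviours concrete, here is a minimal sketch
in C of an allocation being refused under an address-space limit. It assumes
the limit behaves like RLIMIT_AS set via setrlimit(); whether a given Grid
Engine version enforces h_vmem exactly this way is an assumption, not
something confirmed in this thread. The function name is made up for
illustration.

```c
#include <stdlib.h>
#include <sys/resource.h>

/*
 * Lower the soft address-space limit to `limit` bytes, try to malloc
 * `request` bytes, then restore the previous soft limit.
 * Returns 1 if malloc succeeded, 0 if it was refused (NULL),
 * and -1 if the limit could not be read or changed.
 */
int malloc_under_as_limit (size_t limit, size_t request)
{
        struct rlimit saved, lowered;
        void *p;
        int ok;

        if (getrlimit (RLIMIT_AS, &saved) != 0)
                return -1;

        lowered = saved;
        lowered.rlim_cur = (rlim_t) limit;      /* only the soft limit changes */
        if (setrlimit (RLIMIT_AS, &lowered) != 0)
                return -1;

        /* No memory is touched here: the request alone is what gets refused. */
        p = malloc (request);
        ok = (p != NULL);
        free (p);

        setrlimit (RLIMIT_AS, &saved);          /* put the old soft limit back */
        return ok;
}
```

With a 512 MiB limit, a 1 GiB request should come back as refused while a
1 MiB request goes through. The program never gets killed; it simply sees
NULL from malloc and has to deal with it.
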
> From: berg...@merctech.com <berg...@merctech.com>
> Sent: Friday, August 7, 2020 7:11:11 PM
> To: Ondrej Valousek <ondrej.valou...@diasemi.com>
> Cc: users@gridengine.org <users@gridengine.org>; Trimboli, David 
> <trimb...@cshl.edu>
> Subject: Re: [gridengine users] m_mem_free and cgroups
>  
> In the message dated: Fri, 07 Aug 2020 16:24:16 -0000,
> The pithy ruminations from Ondrej Valousek on 
> [Re: [gridengine users] m_mem_free and cgroups] were:
> => Short answer: Use a different tool than stress.
> => Long answer: the Linux kernel is too clever for tests like stress, because
> => allocating memory is one thing (which is taken only like "alright, I'll
> => see what I can do, here is the pointer") but actually _using_ that memory
> => is something completely different.
> 
> Yep.  In our environment (SoGE 8.1.9, using "h_vmem") we see the same
> thing -- memory allocations in the multi-TB range 'succeed' but SGE
> promptly (and correctly) kills jobs only when they actually consume more
> than their h_vmem request.
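
The gap between allocating and consuming can be measured directly. The
following is a rough, Linux-only sketch in C that compares the growth of
VmSize (address space) and VmRSS (resident memory) from /proc/self/status
around a malloc, with and without touching the pages. The helper names are
made up for illustration; it is not part of any Grid Engine tooling.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return `field` (e.g. "VmRSS:") from /proc/self/status in kB, or -1. */
long status_kb (const char *field)
{
        char line[256];
        long kb = -1;
        size_t flen = strlen (field);
        FILE *fp = fopen ("/proc/self/status", "r");

        if (fp == NULL)
                return -1;
        while (fgets (line, sizeof line, fp) != NULL)
                if (strncmp (line, field, flen) == 0)
                {
                        sscanf (line + flen, "%ld", &kb);
                        break;
                }
        fclose (fp);
        return kb;
}

/*
 * Allocate `bytes` and, if `touch` is non-zero, write one byte per page.
 * Stores the growth of VmSize and VmRSS (in kB) seen while the block is
 * still allocated. Returns 0 on success, -1 on failure.
 */
int measure_growth (size_t bytes, int touch, long *vsize_kb, long *rss_kb)
{
        long size0 = status_kb ("VmSize:");
        long rss0 = status_kb ("VmRSS:");
        long page = sysconf (_SC_PAGESIZE);
        char *p = malloc (bytes);
        size_t i;

        if (p == NULL || size0 < 0 || rss0 < 0)
        {
                free (p);
                return -1;
        }
        if (page <= 0)
                page = 4096;

        if (touch)
        {
                /* volatile keeps the compiler from dropping the writes */
                volatile char *vp = p;

                for (i = 0; i < bytes; i += (size_t) page)
                        vp[i] = 'a';
        }

        *vsize_kb = status_kb ("VmSize:") - size0;
        *rss_kb = status_kb ("VmRSS:") - rss0;
        free (p);
        return 0;
}
```

Calling measure_growth() for, say, 64 MiB with touch set to 0 and then to 1
should show VmRSS staying roughly flat in the first case and growing by about
the allocation size in the second, which is the effect described above.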
> 
> I don't know if this list allows attachments, so I'm including some bad
> C code (below) that I use for just this kind of testing. Once compiled,
> it'll take a memory specification, allocate & fill that memory, sleep,
> then exit.
> 
> This deliberately makes the process run long enough to capture some
> memory profiling info.
> 
> Thanks,
> 
> Mark
> 
> /******************************************************************/
> /*      gcc -Wall -o use_memory  this_file.c                    */
> /*                                                                */
> /* Usage:                                                        */
> /*      use_memory 200G                                    */
> /******************************************************************/
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> 
> int main (int argc, char *argv[])
> {
>         unsigned long int i;
>         long unsigned int tenthsize;
>         long unsigned int size;
>         long unsigned int maxsize;
>         char *unit = NULL;
>         long int multiplier = 1;
>         char *data_ptr = NULL;
>         int length = 0;
>         const int snooze = 5 * 1;       /* seconds to sleep after each fill */
> 
>         if (argc < 2 || argc > 2)
>         {
>                 fprintf (stderr, "Allocate memory for testing.\n\n");
>                 fprintf (stderr, "Usage: use_memory <memsize>\n\n");
>                 fprintf (stderr, "<memsize>   Size of memory to allocate. "
>                                  "Optionally, a unit\n");
>                 fprintf (stderr, "            specification can be appended "
>                                  "to this number.\n");
>                 fprintf (stderr, "            Valid units are B, K, M, and "
>                                  "G.\n");
> 
>                 if (argc > 2)
>                         fprintf (stderr, "\n\nInvalid arguments (%d)\n",
>                                  argc);
>                 exit (2);
>         }
> 
>         length = strlen (argv[1]);
>         unit = strdup (argv[1]);
>         unit[0] = unit[length - 1];
>         unit[1] = '\0';
> 
>         /* each case also throws away the unit character from argv[1] */
>         switch (*unit)
>         {
>                 case 'B':
>                         multiplier = 1;
>                         argv[1][length - 1] = '\0';
>                         break;
> 
>                 case 'K':
>                         multiplier = 1024;
>                         argv[1][length - 1] = '\0';
>                         break;
> 
>                 case 'M':
>                         multiplier = 1024 * 1024;
>                         argv[1][length - 1] = '\0';
>                         break;
> 
>                 case 'G':
>                         multiplier = 1024 * 1024 * 1024;
>                         argv[1][length - 1] = '\0';
>                         break;
>         }
> 
>         free (unit);
> 
>         maxsize = multiplier * atof (argv[1]);
>         if (maxsize == 0)
>         {
>                 fprintf (stderr, "Invalid memory size.\n");
>                 exit (1);
>         }
> 
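>         tenthsize = maxsize / 10;
>         if (tenthsize == 0)
>                 tenthsize = 1;          /* avoid a zero step for tiny sizes */
>         size = 0;
> 
>         /* grow in steps of one tenth, up to but never beyond maxsize */
>         while ( size < maxsize )
>         {
>                 size = size + tenthsize;
>                 if (size > maxsize)
>                         size = maxsize;
>                 printf ("About to allocate %lu bytes\n", size);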
>                 data_ptr = (char *) malloc (size);
>                 if (data_ptr == NULL)
>                 {
>                         fprintf (stderr, "Could not allocate memory.\n");
>                         fflush (stderr);
>                         exit (1);
>                 }
>                 fprintf (stdout, "Memory allocation succeeded\nFilling:\n");
>                 fflush (stdout);
> 
>                 /* put some values into memory */
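>                 /* print tick marks at each 10th & 100th of the allocation,
>                  * so we know something is happening. */
>                 long unsigned int hundredth = (size - 1) / 100;
>                 long unsigned int tenth = (size - 1) / 10;
> 
>                 /* guard against modulo by zero for very small sizes */
>                 if (hundredth == 0)
>                         hundredth = 1;
>                 if (tenth == 0)
>                         tenth = 1;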
> 
>                 for (i = 0; i < (size - 1); i++)
>                 {
>                         data_ptr[i] = 'a';
>                         if ((i % tenth) == 0)
>                         {
>                                 fprintf (stdout, "+");
>                                 fflush (stdout);
>                         }
>                         else
>                         {
>                                 if ((i % hundredth) == 0)
>                                 {
>                                         fprintf (stdout, ".");
>                                         fflush (stdout);
>                                 }
>                         }
>                 }
> 
>                 fprintf (stdout, "\nSleeping...");
>                 fflush (stdout);
>                 sleep (snooze);
> 
>                 // Cleanup our allocated memory
>                 free (data_ptr);
>                 data_ptr = NULL;
>                 printf ("done\n");
>         }
>         return 0;
> }
> /******************************************************************/
> 
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
