Re: random process crashes on RPI3 and RPI4

Crystal Kolipe Sat, 06 Nov 2021 07:36:39 -0700

> > However, interestingly, the outputs from the md5
> > processes always seem correct, whereas I would expect them to be
> > wrong occasionally if some kind of memory corruption is happening..
> 
> I don't think there is any memory corruption,
> and have no idea why there would be.


Maybe I shouldn't have used the term "memory corruption", as that
suggests that individual memory locations are returning the wrong
values (i.e. not what was previously written).  What I suspect is
actually happening when no swap is configured is that the wrong
pages are being accessed (due to bugs in the kernel VM code).

So memory isn't actually being corrupted per se, but the effect on the
running code would be similar (it reads back values that differ from
those it previously wrote).

Using an arm64 system configured without swap, I tested recursively
compiling kernels, booting into the newly compiled kernel, and then
compiling again, restarting the compile every time it stopped.  After
six or seven recursions the system became very unstable, suggesting
that the compiled code was incorrect, even though the build process
completed with no hard errors.  This to me suggests that bad data
is occasionally being read from RAM, probably due to reading the
wrong page.

> > Also worth noting is that it depends on what the process is doing.
> > I've run invocations of md5 -tt on all cores, loading the CPU 100%
> > for several hours and not seen a crash.  Yet a kernel compile fails
> > within minutes.  Presumably it's because the compiler is manipulating
> > a large number of pointers, and quickly tries to make an invalid
> > memory access.
> 
> I don't know what you mean by that. If a compiler (or any other process
> for that matter) "makes an invalid memory access", it should be killed,
> regardless of whether swap space exists or not.

What I mean is that if the process memory space is being corrupted, or
otherwise returning bad data on reads, then I would expect the md5
processes to continue running, but produce occasional bad hashes.  I
would expect the compiler to fail due to a segmentation fault, because
a lot of the 'data' that it handles will be pointers to memory locations
and if one of those is incorrectly read, the chances are very high that
it will point to a nonsense memory location, and cause a seg fault.
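
To illustrate the difference between the two workloads, here is a toy
simulation.  It is only a sketch under my own assumptions: the "memory" is
a Python list, and the names (bad_read, checksum_workload,
pointer_workload, SegFault) are made up for the example and have nothing
to do with the real kernel or with md5(1) itself.  A single bad read
silently changes the hash, but kills the pointer-chasing workload.

```python
import hashlib
import random

MEM_WORDS = 64            # size of the toy address space, in words
GARBAGE = 0xDEADBEEF00    # value a hypothetical bad read returns (wider than 32 bits)


class SegFault(Exception):
    """Raised when the toy process dereferences an address outside its memory."""


def build_data_memory(seed=1):
    # Plain 32-bit data words, as an md5-style workload would consume.
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(MEM_WORDS)]


def build_list_memory():
    # Word i holds the address of word i + 1; -1 terminates the list.
    return [i + 1 for i in range(MEM_WORDS - 1)] + [-1]


def bad_read(memory, index):
    # Model one read that returns the wrong value (e.g. from the wrong page).
    corrupted = list(memory)
    corrupted[index] = GARBAGE
    return corrupted


def checksum_workload(memory):
    # Every word is treated as data: a bad value changes the digest,
    # but the process keeps running.
    data = b"".join(word.to_bytes(8, "little") for word in memory)
    return hashlib.md5(data).hexdigest()


def pointer_workload(memory, head):
    # Every word is treated as an address: a bad value gets dereferenced.
    visited = 0
    addr = head
    while addr != -1:
        if not 0 <= addr < len(memory):
            raise SegFault(f"invalid address {addr:#x}")
        addr = memory[addr]
        visited += 1
    return visited


if __name__ == "__main__":
    data = build_data_memory()
    same = checksum_workload(data) == checksum_workload(bad_read(data, 10))
    print("hash unchanged after bad read:", same)
    try:
        pointer_workload(bad_read(build_list_memory(), 10), head=0)
    except SegFault as exc:
        print("process killed:", exc)
```

The checksum run finishes and simply reports a different digest, whereas
the list walk stops at the first bad "pointer".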


> > and with certain workloads may give better performance.
> > For example, a large operation on a database
> > generating temporary data in RAM that will not be used immediately
> 
> What would be an example of such an operation?

Imagine a 3D graphics rendering program that calculates vectors and other
data for a number of objects in a large scene.  Some of the objects are
hidden from view by other objects or not rendered because they are not
within the current scene.

The program doesn't necessarily know which of those calculated vectors
and associated data might be used as the viewpoint
changes, or when they might be needed (seconds later, minutes later,
hours later, or never).  The data is worth keeping available, to avoid
the need to re-calculate it later, but keeping it in physical RAM would
mean that another process wanting to allocate and use a large amount of
memory for a short period of time would have to first wait for the
graphics data to be swapped out.

Memory which has been written but not read back for a period of time
is clearly a good candidate to write to swap.  If the
data is then requested immediately after having been written to swap,
there is no need to read it back from disk, as the data in physical RAM
has not been overwritten.  So you have the best of both worlds: the data
is still in physical RAM (fast), but that physical RAM can be immediately
re-used for something else if required, because the data also exists on
disk.
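
A toy model of that idea, purely as a sketch: Page, ToyVM, preswap and
reclaim are hypothetical names invented for this example, and this is not
how any real VM system is implemented.  It just counts disk operations to
show that a page written out in advance can still be read from RAM for
free, and can be reclaimed without waiting for a write.

```python
from dataclasses import dataclass


@dataclass
class Page:
    data: bytes
    in_ram: bool = True
    on_swap: bool = False   # True when an up-to-date copy exists on disk


class ToyVM:
    def __init__(self):
        self.disk_reads = 0
        self.disk_writes = 0

    def preswap(self, page):
        # Write a copy to swap ahead of time, leaving the RAM copy in place.
        if page.in_ram and not page.on_swap:
            self.disk_writes += 1
            page.on_swap = True

    def read(self, page):
        # Free while the RAM copy is intact; only a reclaimed page
        # has to be fetched back from disk.
        if not page.in_ram:
            self.disk_reads += 1
            page.in_ram = True
        return page.data

    def reclaim(self, page):
        # Hand the physical frame to someone else.  Returns the number of
        # disk writes the caller had to wait for before the frame was free.
        waited = 0
        if not page.on_swap:
            self.disk_writes += 1
            page.on_swap = True
            waited = 1
        page.in_ram = False
        return waited


if __name__ == "__main__":
    vm = ToyVM()
    scene = Page(b"vectors for a hidden object")
    vm.preswap(scene)                     # written out while nobody is waiting
    vm.read(scene)                        # still served from RAM
    print("disk reads so far:", vm.disk_reads)
    print("writes waited for (pre-swapped):", vm.reclaim(scene))
    print("writes waited for (not pre-swapped):", vm.reclaim(Page(b"other")))
```

The model ignores writes to a page after it has been pre-swapped, which
would of course invalidate the copy on disk.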

The reverse is also true.  Imagine that you have a lot of physical RAM
in use as buffer cache, then delete the corresponding files from disk.
That cached data is no longer needed.  Instead of just leaving the RAM
untouched, you could wait until the system was idle and then read data
from swap that had most recently been written out due to memory pressure.
It would then be ready in physical RAM if it was needed, and if not,
you've basically not wasted any resources because the system was already
idle.
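
The selection policy for that reverse case could be sketched like this.
Again this is only an illustration under my own assumptions:
pages_to_prefetch and its arguments are hypothetical, and the
"most recently swapped out first" ordering is simply the heuristic
described above, not an existing kernel interface.

```python
def pages_to_prefetch(swapped_out, free_frames, system_idle):
    """Pick pages to read back from swap while the system is idle.

    swapped_out: list of (page_id, eviction_time) tuples.
    free_frames: number of physical frames released, e.g. by dropped
                 buffer cache for deleted files.
    """
    if not system_idle or free_frames <= 0:
        return []
    # Most recently evicted pages first, limited to the frames available.
    newest_first = sorted(swapped_out, key=lambda entry: entry[1], reverse=True)
    return [page_id for page_id, _ in newest_first[:free_frames]]


if __name__ == "__main__":
    swapped = [("a", 1), ("b", 5), ("c", 3)]
    print(pages_to_prefetch(swapped, free_frames=2, system_idle=True))
    print(pages_to_prefetch(swapped, free_frames=2, system_idle=False))
```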

> > might benefit from being swapped out in anticipation, so that
> > a future memory allocation can be made from physical ram immediately
> > without needing to swap out the other data first.
> 
> Does OpenBSD do any such "preemptive swapping"?

On amd64, data is moved from the lower 4 GB into high memory to free up
more pages in the 32-bit DMA capable region.  This is not exactly the
same thing, but some of the same principles apply.
