[Hampshire] Repeated server crash overnight

2023-03-13 Thread rmluglist2--- via Hampshire
Hi all

I have an Ubuntu box which is on 24/7/365.  It has ufw running, allowing
nothing from outside my LAN.

A couple of times recently, I've come in to find the machine locked up with
a lot of disk access (it can be pinged, but I can't ssh into it and it
doesn't respond to mouse or keyboard on the console; only power cycling
brings it back).  As I say, this has now happened twice in the last 3-4
nights.

It may have been hacked, but I doubt it, looking at kern.log and auth.log,
and I'm behind a NAT router with no ports open.  Does anyone know if
Ubuntu (Jammy) does some indexing or some other regular task overnight?
The reason I ask is I'm wondering if it's said indexing that's crashed the
(very old) system.  It's fine as a file server but not really fit for
anything else.  Incidentally, I've checked crontab and there's nothing in
there.

Anything else I should be checking?

Cheers

Rob

-- 
Please post to: Hampshire@mailman.lug.org.uk
Web Interface: https://mailman.lug.org.uk/mailman/listinfo/hampshire
LUG URL: http://www.hantslug.org.uk
--


Re: [Hampshire] Repeated server crash overnight

2023-03-13 Thread Simon Reap via Hampshire
Hi Rob, anything in /etc/cron.daily which would run at or about 
midnight? Or any files in /var/spool/cron/crontabs or /etc/cron.d? Or 
even a self-re-scheduling "at" job? ("sudo atq" will list any pending jobs)
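
As a rough sketch of the check suggested above, the following Python lists
whatever files exist in the usual cron locations. The location list is an
assumption (the paths named above plus the other standard /etc/cron.*
directories), and the function name is made up for illustration. It does not
cover "at" jobs; "sudo atq" is still needed for those.

```python
from pathlib import Path

# Locations named in the message above, plus the other standard /etc/cron.* directories.
CRON_LOCATIONS = [
    "/etc/crontab",
    "/etc/cron.d",
    "/etc/cron.hourly",
    "/etc/cron.daily",
    "/etc/cron.weekly",
    "/etc/cron.monthly",
    "/var/spool/cron/crontabs",
]


def list_cron_entries(locations=CRON_LOCATIONS):
    """Return a sorted list of files found at the given cron locations."""
    found = []
    for location in locations:
        path = Path(location)
        try:
            if path.is_dir():
                found.extend(str(p) for p in path.iterdir() if p.is_file())
            elif path.is_file():
                found.append(str(path))
        except PermissionError:
            # /var/spool/cron/crontabs is normally readable only by root.
            found.append(f"{location} (permission denied; re-run with sudo)")
    return sorted(found)


if __name__ == "__main__":
    for entry in list_cron_entries():
        print(entry)
```

Anything listed under /etc/cron.daily is a candidate for the overnight
activity; locations that do not exist are silently skipped.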




Re: [Hampshire] Repeated server crash overnight

2023-03-13 Thread Gareth Evans via Hampshire
Hi Rob,

You didn't say if you had checked

 /var/log/syslog

Is there anything indicative of the issue there?

The only indexing task I can think of is updatedb for locate, which I think is 
a cron.daily thing - haven't used Ubuntu for a few years so may be wrong.  

Which filesystem(s)?

Do you have anything like recoll installed?

Best wishes,
Gareth




Re: [Hampshire] Repeated server crash overnight

2023-03-13 Thread rmluglist2--- via Hampshire
[snip]

> /var/log/syslog
>
> Is there anything indicative of the issue there?

Nothing that I can see.  All I can find is something called freshclam, which
I’d never even heard of.  Ufw is blocking a lot of requests, but only from
two media clients (the box in question is my media server), so I don’t think
it’s that.

[snip]

> Which filesystem(s)?

I’m assuming it’s / - how do I tell?

> Do you have anything like recoll installed?

No.  Never heard of it.

By the looks of it, it’s something to do with:

[system] Failed to activate service 'org.freedesktop.nm_dispatcher': timed out (service_start_timeout=25000ms)

This (from auth.log) is the only thing I can see which isn’t to do with local
media clients (minidlna etc).

Cheers

Rob



Re: [Hampshire] Repeated server crash overnight

2023-03-13 Thread Gareth Evans via Hampshire


> On 13 Mar 2023, at 13:31, rmluglist2--- via Hampshire 
>  wrote:
> 
> 
> [snip]
> > /var/log/syslog
> > 
> >Is there anything indicative of the issue there?
>  
> Nothing that I can see.   All I can tell is something called freshclam which 
> I’d never even heard of.  

That's the automatic updater for clamav (antivirus) definitions/sigs etc


> Ufw is blocking a lot of requests – but only from two media clients (box in 
> question is my media server) so I don’t think it’s that.
>  
> [snip]
>  

> >Which filesystem(s)?
>  
> I’m assuming it’s / - how do I tell?

Sorry I meant ext4? Btrfs? Other?
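
One way to answer that question, as a minimal sketch: on Linux, /proc/mounts
lists every mount with its filesystem type in the third field. The function
name below is made up for illustration.

```python
import os


def filesystem_types(mounts_text):
    """Map each mount point to its filesystem type.

    Each line of /proc/mounts has the form:
    device mount_point fs_type options dump pass
    """
    types = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Later entries for the same mount point override earlier ones.
            types[fields[1]] = fields[2]
    return types


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            table = filesystem_types(f.read())
        print("/ is", table.get("/", "unknown"))
    else:
        print("/proc/mounts not available on this system")
```

From a shell, "df -T /" reports the same thing.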


>  
> >Do you have anything like recoll installed?
>  
> No.   Never heard of it.
>  
> By the looks of it, it’s something to do with:
> [system] Failed to activate service 'org.freedesktop.nm_dispatcher': timed out (service_start_timeout=25000ms)

I can't find much that's instructive about that error in isolation from a quick 
Google/ddg search.

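Does

sudo journalctl -b -1

show similar issues, and anything near them? (That is the log of the previous
boot, i.e. the one that ended in the lock-up, assuming the journal is kept
across reboots; plain "-b" shows only the current boot.)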

>  
> This (from auth.log) is the only thing I can see which isn’t to do with local 
> media clients (minidlna etc).


Re: [Hampshire] Repeated server crash overnight

2023-03-13 Thread James Dutton via Hampshire
On Mon, 13 Mar 2023 at 08:03, rmluglist2--- via Hampshire <
hampshire@mailman.lug.org.uk> wrote:

> A couple of times recently, I’ve come in to find the machine locked up
> with a lot of disk access (it can be ping’d but I can’t ssh into it and it
> doesn’t respond to mouse or keyboard on the console – only power cycling
> brings it back).   As I say, this has now happened twice in the last 3-4
> nights.

I have seen this behaviour sometimes.
By default, Linux can stall all interactive sessions while the disk is
under heavy load.
High disk access can be caused by a number of things:
1) Some app actually needs the disk.
2) Faults on the disk, causing many retries.
3) Swap file access.

After a reboot, you can look for faults on the disk with "smartctl -a
/dev/sda" and see if there are any log messages there about failed sectors,
or sector reallocation counts increasing etc.
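
A possible way to pull out just the sector counters, as a sketch only. It
assumes an ATA/SATA drive whose "smartctl -A" output uses the usual
ten-column attribute table (NVMe drives report differently), and that
smartmontools is installed and the script is run as root. The attribute
names in WATCHED are the common ones; a given drive may not report all of
them.

```python
import shutil
import subprocess

# Attributes that commonly indicate failing sectors on ATA drives.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def sector_counters(smart_attributes_text):
    """Extract the raw values of the watched attributes from `smartctl -A` output."""
    counters = {}
    for line in smart_attributes_text.splitlines():
        fields = line.split()
        # Attribute rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] in WATCHED:
            try:
                counters[fields[1]] = int(fields[9])
            except ValueError:
                counters[fields[1]] = None  # raw value is not a plain integer
    return counters


if __name__ == "__main__":
    if shutil.which("smartctl"):
        # /dev/sda is the device named in the message above; adjust as needed.
        result = subprocess.run(
            ["smartctl", "-A", "/dev/sda"], capture_output=True, text=True
        )
        for name, raw in sector_counters(result.stdout).items():
            print(f"{name}: {raw}")
    else:
        print("smartctl not found; install smartmontools")
```

Non-zero raw values, or values that grow between runs, would point towards
case 2 above.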

If an app needs the disk, it is probably something kicked off by cron.
You can force these apps to use a lower I/O priority with "ionice".
Google ionice for suitable ways to run it.
But I think a good diagnostic step is to disable cron altogether for,
say, a week and see if the problem disappears.
Then at least you will know whether cron and the apps it runs are the
problem.
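
For illustration, a small sketch of wrapping a job with ionice. The helper
name is hypothetical and "updatedb" is only a stand-in for whichever job
turns out to be the culprit. Whether the kernel honours the idle class
depends on the I/O scheduler in use, so treat this as something to try
rather than a guaranteed fix.

```python
import shlex


def low_priority(command):
    """Prefix a command so it runs in the idle I/O class at the lowest CPU priority."""
    # ionice -c 3 selects the "idle" I/O scheduling class;
    # nice -n 19 is the lowest CPU priority.
    return ["ionice", "-c", "3", "nice", "-n", "19"] + list(command)


if __name__ == "__main__":
    # "updatedb" is only an example of a disk-heavy job.
    print(shlex.join(low_priority(["updatedb"])))
```

The resulting list can be passed to subprocess.run(), or the printed line
pasted into the relevant cron script.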

Another possible cause is an app making the system run low on memory.
Behaviour becomes unpredictable when memory allocation fails, as a lot of
programs don't handle that well. This might also cause excessive swap
file access.

These are all problems that are difficult to diagnose while they are
happening, so the trick is to set up monitoring to watch for each of the
cases.
E.g. record metrics of free RAM, and when the fault happens you can look at
the metrics graph to see if that is the problem.
Also record metrics of disk access on a per-app basis.
Normally the lock-up will not be immediate: the system will get slow first
and then eventually lock up. So at least some metrics are written before
the lock-up.
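
A minimal sketch of the memory side of such monitoring, assuming a Linux
/proc/meminfo. The function names and the log format are made up for
illustration. Each sample is flushed to disk straight away so that the last
few lines survive a power cycle.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of integer values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def sample_line(meminfo, now=None):
    """Format one log line with available RAM and swap in use, both in kB."""
    now = time.time() if now is None else now
    swap_used = meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    available = meminfo.get("MemAvailable", 0)
    return f"{stamp} MemAvailable={available}kB SwapUsed={swap_used}kB"


def log_once(log_path, meminfo_path="/proc/meminfo"):
    """Append one sample to log_path and force it to disk before returning."""
    with open(meminfo_path) as f:
        meminfo = parse_meminfo(f.read())
    with open(log_path, "a") as log:
        log.write(sample_line(meminfo) + "\n")
        log.flush()
        os.fsync(log.fileno())  # so the last samples survive a hard power cycle
```

Calling log_once() once a minute (the interval and log path are arbitrary)
leaves a trail showing whether available memory collapsed and swap use
climbed in the run-up to a lock-up.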

Kind Regards

James


Re: [Hampshire] Repeated server crash overnight

2023-03-13 Thread Brad Macpherson via Hampshire

G'day all,

On 13/03/2023 14:32, James Dutton via Hampshire wrote:

> [snip]
> High disk access can be caused by a number of things:
> 1) some app actually needs the disk
> 2) Faults on the disk, causing many retries.
> 3) Swap file access
> [snip]

I've seen this behaviour with ClamAV; in the end I had to remove it. The 
database gets to a certain point where it won't fit in memory along with 
the rest of the system; swap doesn't help, you'd need to add RAM to 
accommodate it.


https://unix.stackexchange.com/questions/114709/how-to-reduce-clamav-memory-usage/278110
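
To see whether ClamAV (or anything else) is the process eating the RAM, a
rough sketch that ranks processes by resident memory using
/proc/<pid>/status. The function names are made up for illustration, and
the ClamAV daemon normally shows up under the name "clamd".

```python
from pathlib import Path


def parse_status(text):
    """Return (process name, resident set size in kB) from /proc/<pid>/status text."""
    name, rss_kb = None, 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])
    return name, rss_kb


def top_memory_users(proc_root="/proc", count=5):
    """List the processes with the largest resident memory, biggest first."""
    usage = []
    for status in Path(proc_root).glob("[0-9]*/status"):
        try:
            name, rss_kb = parse_status(status.read_text())
        except OSError:
            continue  # the process exited while we were reading
        if name:
            usage.append((rss_kb, name))
    return sorted(usage, reverse=True)[:count]


if __name__ == "__main__":
    for rss_kb, name in top_memory_users():
        print(f"{rss_kb:>10} kB  {name}")
```

If clamd sits at the top with a figure close to the machine's total RAM,
that would be consistent with the behaviour described above.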


HTH

Brad



