First, checked whether the AWS dashboard had any troubleshooting info for me. But "This user does not have permissions to view AWS Health events" - so no help from that. Used my earlier post on investigating another Linux server crash, as well as ChatGPT, to go step by step.
ChatGPT suggested these:
dmesg -T | grep -i oom
journalctl -k | grep -i oom
Neither had any results - no out-of-memory events, then.
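Since the machine had gone down and come back, the previous boot's kernel log is worth checking too - dmesg only covers the current boot. A small extra check, assuming persistent journaling is enabled so entries survive the reboot:

```shell
# Kernel messages from the previous boot (-b -1); dmesg only shows the current one.
# Requires persistent journaling (Storage=persistent, or /var/log/journal existing).
journalctl -k -b -1 | grep -iE 'oom|out of memory|panic'
```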
du -sh /var/log
showed about 4 GB, almost all of it journal logs, and
journalctl --disk-usage
confirmed nearly 4 GB of current and archived journals.
But the system still had enough disk space left:
df -h
showed around 30% free on the OS disk.
Checking the authentication log with
tail -n 4000 /var/log/auth.log | more
found an entry
2026-01-25T14:38:58.186744+05:30 ip-10-0-0-73 systemd-logind[600]: New seat seat0.
But ChatGPT said that's normal; apparently one of these is logged for each login.
Then, copy-pasting the lines just before the crash from syslog - grepping for T13, since the crash was around 13:30:
sudo grep T13 /var/log/syslog > /home/theadminaccount/syslogsnippet.txt
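As an aside, journalctl can slice the same time window directly instead of grepping the raw syslog by hour. A sketch, using the date from the auth.log entry above and a window around the ~13:30 crash:

```shell
# Everything logged between 13:20 and 13:35 on the day of the crash
sudo journalctl --since "2026-01-25 13:20:00" --until "2026-01-25 13:35:00" \
    > /home/theadminaccount/syslogsnippet.txt
```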
ChatGPT immediately found the issue.
ourtestapi.service: Scheduled restart job, restart counter is at 101203
and the reason the service was failing was
The application '/home/thepath/www/ourapi/ourtest.dll' does not exist.
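That restart counter also explains the 4 GB of journal: with Restart=always and no effective start limit, systemd relaunches a failing service forever, logging the failure every time. A hedged sketch of a unit that would have capped this - the actual unit file wasn't shown, and the dotnet invocation is an assumption:

```ini
# /etc/systemd/system/ourtestapi.service - illustrative sketch, not the original unit
[Unit]
Description=Our test API
# Give up if the service fails 5 times within 10 minutes:
StartLimitIntervalSec=600
StartLimitBurst=5

[Service]
# Assumed .NET invocation; the dll path is the one from the error above
ExecStart=/usr/bin/dotnet /home/thepath/www/ourapi/ourtest.dll
Restart=always
RestartSec=5
```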
Stopped the service, disabled it, and deleted the unit file from /etc/systemd/system:
sudo systemctl stop ourtestapi
sudo systemctl disable ourtestapi
sudo rm /etc/systemd/system/ourtestapi.service
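One cleanup step worth adding here - not in the original list, but standard practice after removing a unit file - is telling systemd to reload its unit database and clear the stale failed state:

```shell
sudo systemctl daemon-reload   # forget the deleted unit file
sudo systemctl reset-failed    # clear failed state and restart counters
```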
Additionally, as ChatGPT suggested, set up some limits for journald's logging by editing
/etc/systemd/journald.conf
and, under the [Journal] section, uncommenting these lines and adding values for them:
SystemMaxUse=900M
SystemKeepFree=1G
MaxRetentionSec=30day
and restarted journald:
sudo systemctl restart systemd-journald
Also vacuumed (removed archived logs) with
sudo journalctl --vacuum-size=3000M
(0 bytes removed - the archives were already smaller than that.)
sudo journalctl --vacuum-size=300M
(500M of archives were removed.)
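journalctl can also vacuum by age rather than size; a variant that would line up with the MaxRetentionSec=30day setting above:

```shell
# Remove archived journal files older than 30 days
sudo journalctl --vacuum-time=30d
```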