Saturday, November 09, 2024

OpenSpace - running version 0.20.1 on Linux Mint 22

The space visualization tool OpenSpace - https://www.openspaceproject.com/ - has a .deb file for download, but that leads to a libboost dependency issue: the .deb depends on libboost 1.74-dev, while Ubuntu 24.04 (on which Linux Mint 22 is based) ships 1.83-dev. So, the advice is to compile from source.

Followed the instructions at Ubuntu — OpenSpace documentation - except had to replace
sudo apt install libboost1.74-dev
with
sudo apt install libboost-all-dev
as per https://askubuntu.com/questions/922625/cannot-install-libboost-all-dev-because-dependencies-are-not-getting-installed

On the first try, with the source downloaded from the GitHub releases page, got errors -
include could not find requested file:
    /path/OpenSpace-releases-v0.20.1/ext/ghoul/support/cmake/message_macros.cmake

This was because I had not done a recursive git clone.
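
For reference, the recursive clone is along these lines (the repository URL here is assumed to be the official OpenSpace GitHub repo):

git clone --recursive https://github.com/OpenSpace/OpenSpace
# or, for an already-cloned tree:
git submodule update --init --recursive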

After the recursive git clone, the first few make attempts gave errors.

make -j was going quite fast - 65% in just a couple of minutes - but then the machine would hang. Finally worked out, by running System Monitor, that (a) swap was not enabled, and (b) certain longer jobs would use enough RAM to exhaust all 8 GB and crash the system. After enabling 32 GB of swap, continued with
make -j 2
doing 2 concurrent jobs instead of unlimited concurrent jobs. Now the linker complained that some symbols were not found - I suspected this was left over from the builds interrupted by the crashes, so:
make clean
make -j 2

That built the executable in around an hour on this ~10-year-old Lenovo ThinkCentre desktop. Then set the mrf cache to true in openspace.cfg, copied over the sync, user and mrf_cache directories from an earlier install on Windows, and set the OPENSPACE_SYNC environment variable in .bashrc. On running it, errors appeared just before the screen becomes usable - 
terminate called after throwing an instance of 'sgct::Error'
what():  [Engine] (3010): GLFW error (65547): X11: Standard cursor shape unavailable

Checked if it was running Wayland instead of X11 - using this method,
echo $XDG_SESSION_TYPE
x11

So it was running X11, not Wayland. Checking for cursors in Software Manager and installing the base cursor themes did not solve the issue - 
sudo apt install xcursor-themes

OpenSpace would work so long as we didn't move the cursor inside its window. Controlling it via the HTML interface, with a web browser pointed to localhost:4680, worked.

Then thought of display managers and window managers - tried out Xfce desktop -
 sudo apt install task-xfce-desktop -y

Rebooted, logged in to Xfce desktop using the chooser next to the login name, and now OpenSpace worked fine. So, there's something in Cinnamon desktop which doesn't play well with GLFW.

Saturday, November 02, 2024

Lenovo Thinkcentre desktop - Linux Mint notes

Decided to try out Linux Mint on one of the desktops which had Windows 10 installed. To minimize interference with the borrowed machine, we installed Linux on a separate 256 GB SSD. To connect the SSD, we disconnected the CD/DVD drive. 

To boot from the install USB, F10 is the key to press at power-on, which also gives the option to go into System Settings or the BIOS. 

The machine was rebooting instead of shutting down when the 'Shut down' option was chosen.
https://askubuntu.com/questions/1321208/computer-reboots-instead-of-shutting-down
All Power > "Power on X" settings should be turned off in BIOS. I also found a setting which said, "When lose power" --> "go back to previous state". Changed that to 'shut down', too. 

With these changes, the machine shuts down. But when power is removed and re-applied to the SMPS (by putting on the UPS the next day), the machine still automatically boots even without pressing the power button - so a separate switchable plug would be desirable.

Then, added swap to prevent sudden freezes due to memory issues (though system freezes due to insufficient power are still possible, since the SMPS is underpowered) - 

https://thelinuxcode.com/add-swap-space-linux-mint/

free -h
sudo fallocate -l 32G /swapfile
sudo mkswap /swapfile
sudo chmod 0600 /swapfile
sudo swapon /swapfile
free -h

Edit: The above method of adding a swap file does not persist on rebooting - apparently it will persist only if added to /etc/fstab - so I added a line to root cron - 

sudo crontab -e

#(at the bottom of the file)
# all other cron jobs must be above this
@reboot swapon /swapfile
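
An equivalent approach (assuming the swap file stays at /swapfile) would be to add this line to /etc/fstab instead of the cron entry:

/swapfile none swap sw 0 0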


Tuesday, October 29, 2024

updating a dot net app running on Ubuntu

When replacing the compiled DLL for a dot net app running on Linux - an Ubuntu 24.04 server - we need to reload the daemons.

Just

sudo service ourapi restart

results in
Warning: The unit file, source configuration file or drop-ins of ourapi.service changed on disk. Run 'systemctl daemon-reload' to reload units.

73:~$ sudo systemctl daemon-reload

Then, 

73:~$ sudo service ourapi restart
73:~$ sudo service ourtestapi restart

are done without errors or warnings.

Sunday, October 27, 2024

troubleshooting a cron job - Ubuntu 24.04

Server maintenance - I had put in a cron job to alert me when disk space crosses a threshold - following this -

and another to alert me when load average crosses 0.8 * number of CPUs, following this - 
https://gist.github.com/santhoshvr/bd6db5b1cab4f8d181d2ac2b59dd08aa
(need to install bc to get this to work - sudo apt install bc)

The disk space cron job was working, but the load average one would not - the script worked when run from the command line (tested with a low threshold like 0.01), but not from cron.

Troubleshooting the cron job with the techniques given here - 

/var/log/cron
or
/var/log/syslog
don't exist on this minimal install of Ubuntu 24.04.

Neither does /var/spool/mail/root - /var/spool/mail is empty.

So, trying to redirect errors to a log file with 2>&1

loadalert.sh  > /path/to/loadalertcron.log 2>&1

That log file showed /path/to/loadalert.sh: 22: 1: not found

Turns out bash was not being found, and it was the "if" on line 22 that triggered the "not found": the first line's shebang, #!/bin/bash, had a missing ! - and the path to bash, found with
which bash
was actually /usr/bin/bash and not /bin/bash.

So, just changed the first line to
#!/usr/bin/bash
and the script started working fine in the cron job.
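
For reference, a minimal sketch of this kind of load-average alert script - the threshold, recipient address and the use of the mail command here are assumptions, not the exact gist:

#!/usr/bin/bash
# alert if the 1-minute load average exceeds 0.8 * number of CPUs
threshold=$(echo "$(nproc) * 0.8" | bc -l)
load=$(cut -d ' ' -f 1 /proc/loadavg)
if [ "$(echo "$load > $threshold" | bc -l)" -eq 1 ]; then
    echo "Load average $load exceeds $threshold on $(hostname)" | mail -s "Load alert: $(hostname)" admin@example.com
fi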

upgrade options for Windows 10 PCs

Copy-pasting content from an email I sent out - 

Windows 10 is supported for one more year - after October 2025, Microsoft will stop issuing updates for Windows 10, including security updates. The issue with the PCs under discussion is that the i5-4430 processor is not supported by Win11.

Some options -
 
1. Install Linux instead - this is my preferred solution. Google Docs etc would continue to work, but if you wish to use Microsoft Office, you need to use the (free) online version. Or the free LibreOffice suite or other alternatives. Similarly, Adobe Photoshop would not work, but you can use the free GIMP. And so on. Linux Mint desktop is surprisingly easy to use, some people migrating from older Windows versions find it easier to use than Windows 11.

2. Bypass the hardware check and install Windows 11 on the unsupported PCs anyway - with the caveat that doing so "gives Microsoft the right to refuse updates to such PCs at any time in the future." 

3. Buy new computers.

4. Continue to use the Windows 10 computers, without security updates, with the risk of malware etc.

Friday, October 25, 2024

New reasons prevent pages from being indexed - Google

There was an email from Google search regarding another blog on blogger - which has a url ending with blogspot.com - about some pages not being indexed:
Search Console has identified that some pages on your site are not being indexed due to the following new reason: Alternative page with proper canonical tag

According to this blogger support page, 

https://support.google.com/blogger/thread/207060056/how-can-i-fix-blogger-alternate-page-with-proper-canonical-tag-error?hl=en

"So there is nothing you need to do."

The m=1 page (mobile page) was not being indexed, but the non-mobile page was being indexed. That's fine, so no action needed. 

Thursday, October 24, 2024

moving multiple Moodle instances from Ubuntu 20.04 to 24.04

This is going to be a long post. As planned in an earlier post, the Moodle upgrade from 4.1 to 4.5 was to be done along with server upgrade from Ubuntu 20.04 to 24.04. I thought I would be done in a day or two, but tweaking and troubleshooting continued for a week. Then setting up backups, documentation and so on, went on for another week.

Setting up the server VMs
Instead of a bitnami MySQL server, opted for a minimal Ubuntu 24.04 VM image for both the database server as well as the php application server. 
Server types
Server-php = Standard B2s (2 vcpus, 4 GiB memory) Ubuntu 20.04
30 GB OS disk, 320 GB data disk.
Since B2s is capped at 23 Mbps, doesn't make sense to use SSD.

LMS-Database-VM = Standard B2s (2 vcpus, 4 GiB memory) Debian 10.13
(bitnami mysql)
30 GB OS disk.

Created the new VMs with Ubuntu 24.04 LTS minimal.
Chose standard SSD for the OS disk. No Azure backup.
Azure prompts to save the private key in pem format; saved it as ppk also, using puttygen.

The db vm configuration following
sudo apt install mysql-server
sudo apt install mysqltuner
sudo apt install mysql-client

sudo nano /etc/ssh/sshd_config
to change the port from 22.
Then change the opened port in the Azure portal also.
(Had to reboot for the change to take effect - just restarting the ssh daemon did not work. Ran
sudo ufw status
to verify ufw is not running, then rebooted.)

Verified that updates are enabled - 
sudo apt-get install unattended-upgrades
showed that it is already installed.

The php server configuration:
Changed port 22 as above
Then installed apache and php
sudo apt install apache2
sudo apt install php libapache2-mod-php php-cli php-mysql
and rebooted.
As mentioned in my previous post, we have to attach the newly created data disk using the portal, and then set it up from inside the vm -

parted /dev/sda
mklabel gpt

(parted) mkpart                                                          
Partition name?  []? lmsdata                                              
File system type?  [ext2]? ext4                                          
Start?                                                                    
Start? 0%                                                                
End? 100%                                                                 
(parted) print

To update fstab with the UUID, used blkid - but blkid was showing only a PARTUUID and no UUID, and a test mount also failed. Both were because the partition had not yet been formatted. So:

sudo mkfs.ext4 /dev/sda1

blkid to get UUID

nano /etc/fstab to add the line
UUID=333dd333-ca33-3333-33c7-abcdef   /var/www/our_data_disk   ext4   defaults,nofail   1   2

rebooted, OK.
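
For the mount-and-test step, the commands would be roughly the following (mount point as used in the fstab line above):

sudo mkdir -p /var/www/our_data_disk
sudo mount /dev/sda1 /var/www/our_data_disk
df -h /var/www/our_data_disk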

Tuning Apache on php server for Moodle with

AcceptPathInfo On
in /etc/apache2/apache2.conf

Also, from previous server,
<DirectoryMatch "^/.*/\.git/">
    Require all denied
</DirectoryMatch>


Also, if needed: handling 40x errors, and hiding internal paths.

PHP setup and installing the required extensions:
From
1. All the settings were OK in /etc/php/8.3/apache2/php.ini
except post_max_size and upload_max_filesize, which were raised to 64M (from 2M).

2. extensions - installed one by one with apt:
sudo apt install php8.3-ctype
sudo apt install php8.3-curl
sudo apt install php8.3-dom
sudo apt install php8.3-gd
sudo apt install php8.3-iconv
sudo apt install php8.3-intl
sudo apt install php8.3-json
sudo apt install php8.3-mbstring
sudo apt install php8.3-pcre
sudo apt install php8.3-simplexml
sudo apt install php8.3-spl
sudo apt install php8.3-xml
sudo apt install php8.3-zip
sudo apt install php8.3-mysql
# for CMS, we need postgres also
sudo apt-get install php8.3-pgsql
#and restart apache with
sudo systemctl restart apache2

A good guide for LAMP installation - https://www.digitalocean.com/community/tutorials/how-to-install-lamp-stack-on-ubuntu

We need postgres for CMS. So,
sudo apt install postgresql
then edit the file for any changes needed -
nano /etc/postgresql/*/main/postgresql.conf
Checking the old server for the values, no changes needed to be made; listen_addresses also did not need to be changed, since localhost is enough for us.

For copying files over, initially I had thought of using rclone over ssh. But without specifying the pem file on the command line, we would need to create a config file -

so, just copied the script over with filezilla.

Next, rclone over ssh

Unfortunately,
rclone lsd php2024:
2024/10/08 04:50:01 Failed to create file system for "php2024:": failed to parse private key file: ssh: no key found

which says to check the config file. Found its location with
rclone config file
The config file had both key_file and key_pem set, so deleted the key_file line.
But ... it still doesn't work.
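
For reference, a working sftp remote in the rclone config usually looks something like this - the host, user and port here are placeholders:

[php2024]
type = sftp
host = remote.domain
user = remoteuser
port = 8800
key_file = /home/ouruser/our_key.pem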

So, trying with sftp.

sftp  -P 8800 -i key.pem username@ipaddress
put -r directoryname
works, but all timestamps are set to today's date.

scp can retain timestamps (with -p),
so using that inside a screen session -
scp -rpC sourceDirName username@server:destDirName

scp -P 8800 -i /home/ouruser/our_key.pem -rpC sourcedir remoteuser@remote.domain:/home/remoteuser/destdir

Need to change permissions and owners as per old server.

changed to 
chmod 755  moodledata
chown -R www-data:www-data  moodledata
etc. 

Takes a few minutes for our biggest instance with >100GB of data.

For the moodle code directories, I just copied within the remote machine instead of copying the identical code from the old server. 
to move hidden directories also,
mv -f /path/subfolder/{.,}* /path/

And on trying rsync - 
rsync: command not found
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(235) [sender=3.1.3]

Solution - install rsync on remote machine - 

(from the sites-available directory)
rsync -av -e 'ssh -p 8800 -i /home/ouruser/our_key.pem' . ouruser@52.172.103.81:/home/ouruser/varwww

But for certificates in /etc/letsencrypt, we can't use this, since it has to be done as root.

So,
 zip -r /home/ouruser/le.zip letsencrypt


For other instances, just copying files between directories - 

cp -rT sourcedir destdir

Added the autossh server user as per this post, for database connections over ssh tunnel. Initial setup - on remote server, 
sudo adduser theserveruser
and then, after copying the public key into authorized_keys, had to edit the file as root with nano, removing carriage returns. Then it worked.

But found that autossh wasn't restarting the tunnel when the server was rebooted. We need to do this - https://stackoverflow.com/questions/6758897/how-to-set-up-an-automatic-restart-of-a-background-ssh-tunnel
sudo systemctl enable our-tunnel-service-name
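
For reference, such a tunnel service is a systemd unit roughly along these lines - a sketch with placeholder paths, host and ports, not the exact unit used:

# /etc/systemd/system/our-tunnel-service-name.service
[Unit]
Description=autossh tunnel to the MySQL server
After=network-online.target

[Service]
Environment="AUTOSSH_GATETIME=0"
ExecStart=/usr/bin/autossh -M 0 -N -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -o "ExitOnForwardFailure yes" -i /home/theserveruser/our_key.pem -p 8800 -L 3306:127.0.0.1:3306 theserveruser@db-server-ip
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target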

Copied root crontab.
sudo su -
apt install cron
crontab -e
for pg_dump
sudo su postgres
pg_dump ourdb > /tmp/ourdbdump.sql


Checked the usernames in the pg_service.conf file in root's directory on the server (readable only by root).

sudo su postgres
psql
create database dbname;
create user username;
(To change, alter user username with encrypted password 'MyPassWord';)
grant all privileges on database dbname to username;

But trying
psql -U ourdb_admin -d ourdb < /tmp/ourdbdump.sql -W
gave the error
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL: Peer authentication failed for user "ourdb_admin"
tried adding a line to
sudo nano /etc/postgresql/16/main/pg_hba.conf
local   all             all                                     md5
and then
 sudo systemctl restart postgresql

psql -h 127.0.0.1 -U ourdb_admin -d ourdb < /tmp/ourdbdump.sql -W

But lots of errors.

Did again as postgres user. Still, errors like role read_only_user does not exist etc. 

When trying to open with dbeaver using the admin user,
 permission denied for schema postgres
So, even after the grant all privileges, we need to do:
Step 1
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA name_schema TO name_user;

Step 2
GRANT USAGE ON SCHEMA name_schema TO name_user;


But, when trying that,
ERROR:  schema "postgres" does not exist
As per "How to switch databases in psql?" on Stack Overflow, we should first switch to the right database with \connect ourdb

list users with \du 

list schemas with
\dnS

Maybe should drop database and do again with --no-owner
drop database ourdb;

pg_dump --no-owner --dbname=postgresql://ourdb_admin:ourdbpw@localhost:5432/ourdb > ourdbdumpasadmin.sql

pg_dump: error: query failed: ERROR:  permission denied for sequence batch_job_execution_seq
alter role ourdb_admin superuser;
That worked.

Then, trying that dump in new server:
First 
create database ourdb;
alter role ourdb_admin superuser;
create user admin_readonly;

GRANT CONNECT ON DATABASE mydb TO xxx;
-- This assumes you're actually connected to mydb..
GRANT USAGE ON SCHEMA public TO xxx;
GRANT SELECT ON mytable TO xxx;  -- (we want all tables, not just one)
ALTER DEFAULT PRIVILEGES IN SCHEMA public
   GRANT SELECT ON TABLES TO xxx;
alter default privileges in schema postgres grant select on tables to admin_readonly;

Next, the mysql databases.
First, set up db users.

At first, got Can't connect to MySQL server on '127.0.0.1:3306'
Checking the tunnel,
sudo service tunnel status
× tunnel.service - 
---snip---
 autossh[14025]: Host key verification failed.
The solution was to sudo ssh into the db host once, so that its host key gets stored under /root/.ssh.
Then restarted autossh, and it worked.
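
Concretely, something like this (port, key and host as used earlier in this post; placeholders):

# accept and store the db host's key for root, since the tunnel runs as root
sudo ssh -p 8800 -i /home/ouruser/our_key.pem theserveruser@db-server-ip exit
sudo systemctl restart tunnel.service  # or whatever the tunnel unit is called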

CREATE DATABASE master_db;
CREATE USER 'admin'@'%' IDENTIFIED BY 'thepassword';
(did not do ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';)
GRANT ALL ON master_db.* TO 'admin'@'%';

Used the existing backup script on existing mysql server to take the backup,
$MYSQLDUMP --defaults-extra-file=/home/ouruser/.my.cnf -u $MyUSER -h $MyHOST -p$MyPASS $db | $GZIP -9 > $FILE

gzip -d thefile.gz
date;mysql -h 127.0.0.1 -u db_admin -p master_db < Database-VM.09-10-2024;date
showed that our biggest instance took around 2 hours 15 minutes to get restored.
After copying over the config files and creating the home directories of the various websites, tried enabling one of the sites and got errors.
systemctl reload apache2
Job for apache2.service failed. See "systemctl status apache2.service" and "journalctl -xeu apache2.service" for details.
root# systemctl status apache2.service
● apache2.service - The Apache HTTP Server
---snip---     
 apachectl[8721]: Invalid command 'RewriteEngine', perhaps misspelled or defined by a module not included in the server configuration
Oct 08 15:22:40 SSSVV-php-2024 systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE
Oct 08 15:22:40 SSSVV-php-2024 systemd[1]: Reload failed for apache2.service - The Apache HTTP Server.


a2enmod rewrite

Solved. Then, checking with a browser, testing by editing
/etc/hosts on Linux and
C:\Windows\System32\drivers\etc\hosts
on Windows.
The browser was getting "connection refused".
Trying the upgrade directly from 4.1 to 4.5 using git, ran into some errors. Following this - https://hnsws.blogspot.com/2021/08/detailed-steps-for-moodle-upgrade-using.html
sudo apt install git
git branch --track MOODLE_405_STABLE origin/MOODLE_405_STABLE
git checkout MOODLE_405_STABLE
sudo chown -R azureuser:www-data *
sudo -u www-data /usr/bin/php admin/cli/maintenance.php --enable
sudo chmod -R 775 .
sudo -u www-data /usr/bin/php admin/cli/upgrade.php

First error was:
== Environment ==
!! max_input_vars !!
[System] this test must pass - PHP setting max_input_vars must be at least 5000. 
Corrected that by setting max_input_vars to 9000 in the ini files (the line needs to be uncommented as well) -
sudo nano /etc/php/8.3/apache2/php.ini
and
sudo nano /etc/php/8.3/cli/php.ini
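
In both files, the relevant line ends up uncommented, something like:

; default was commented out, e.g. ;max_input_vars = 1000
max_input_vars = 9000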

Then, 
!! Asynchronous backups disabled !!
[System] if this test fails, it indicates a potential problem - Your site is currently configured to use synchronous backups. Asynchronous
backups provide a better user experience, allowing users to do other
operations while a backup or restore is in progress. Asynchronous backups
are enabled for all sites from Moodle 4.5 LTS. Synchronous backups will be
removed from Moodle LMS the version after 4.5 LTS.

OK - we can enable that after finishing the upgrade.

But then, 
Database transaction aborted automatically in upgrade.php
Default exception handler: Exception - Undefined constant "MESSAGE_DEFAULT_LOGGEDIN" Debug:
Error code: generalexceptionmessage
* line 37 of /mod/hvp/db/messages.php: Error thrown

Tried deleting the mod_hvp plugin directory and then upgrading. It reported a successful upgrade to 4.5, but on disabling maintenance mode, the home page showed a 500 error. So, went back to version 4.1 - then dropped the db and restored it again.

sudo mysql -p
Enter password:
Welcome to the MySQL monitor. 

DROP DATABASE master_db;
Query OK, 563 rows affected (19.94 sec)

mysql> CREATE DATABASE master_db;
Query OK, 1 row affected (0.03 sec)

mysql> GRANT ALL ON master_db.* TO 'db_admin'@'%';
Query OK, 0 rows affected (0.02 sec)

10 minutes for importing this db, of our smallest instance.

mod_hvp is supported only up to Moodle 4.4, so upgrading only to 4.4. Again got an error 500, this time in the Moove theme, so tried the upgrade to 4.4 with the moove plugin directory deleted. Upgrade to 4.4 successful.

Downloaded and installed the latest Moove theme. Then, in the Moove custom theme settings (Administration > Appearance,
server/admin/settings.php?section=themesettingmoove#theme_moove_advanced)
added to Raw SCSS:
footer#page-footer .madeby  {display:none;}
to remove the intrusive ad.

mod_hvp upgraded to 4.4.
It gave a message about a php cache setting being 2M and needing to be increased to more than 5M - but checking php.ini for a 2M value, did not find any.

Then, copied this directory to the other moodle install directories - along with the hidden files like .git directory etc, instead of doing git clone.
cp -r ../firstInstance/* .
cp -r ../firstInstance/.* .

Then edited config.php to point to the appropriate db for each instance, copied over missing files like html files, customsearch, etc - verified by doing an 
ls > textfile.txt
on old server as well as new server, and doing a diff of the two textfiles to check for missing files.

Verified that the mail setup on moodle was working by sending a test email. It went to spam for both the old server and the new server, but the XOAuth-based mail sending was working.
 
Enabled Async backups - According to
https://tracker.moodle.org/browse/MDL-81172
Site administration > Courses > Asynchronous backup/restore
Confirm Asynchronous backups are still disabled
Enable Asynchronous backups
Enabled all the tickboxes in
admin/settings.php?section=asyncgeneralsettings
The very next day, users were unable to log in. Found that the database server had run out of disk space - binlog files were the culprit.
But the bin log settings were already commented out: /etc/mysql/my.cnf includes mysql.conf.d/mysqld.cnf, where the log_bin lines are commented out.

However, with this version of MySQL, binary logging is enabled by default and needs an explicit option at startup to turn it off.

For purging old binlogs (I had just deleted the files directly), the proper way is the PURGE BINARY LOGS statement.
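For example, inside the mysql client (the retention period here is just an illustrative value):
PURGE BINARY LOGS BEFORE NOW() - INTERVAL 3 DAY;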

We need to specify skip-log-bin to prevent binary logs - 
# nano /etc/mysql/mysql.conf.d/mysqld.cnf
and added skip-log-bin near the end,
# disable binary log
skip-log-bin
# server-id             = 1
# log_bin                       = /var/log/mysql/mysql-bin.log
# binlog_expire_logs_seconds    = 2592000

and then restarted with sudo systemctl restart mysql.service
The newly created instance, which I had cloned from the smallest of our existing instances, had around 2000 users that needed to be deleted. Bulk delete from the web GUI was timing out, and deleting one by one took around 10 seconds per user from the web UI. Bulk-deleting 10 at a time would be possible, but doing that 200 times would be a pain. So, decided to delete the users using moosh, with a script that waits a couple of seconds after each delete to avoid overloading the database server. 
 
A script created by chatgpt for deleting, as well as this line 
 while read in; do echo moosh -n user-delete "$in";sleep 2 | bash; done < todelete.txt
suggested by this thread - https://stackoverflow.com/questions/61670610/bash-script-executing-command-with-file-lines-as-parameters
would fail, saying user not found. Even with additional command-line parameters (-n, the moodle directory) it would still report user not found. Finally, just ran a script from the moodle directory which simply had lines like
moosh -n user-delete theusername@wewantto.del ; sleep 2;
moosh -n user-delete theusername2@wewantto.del ; sleep 2;
etc. - and that worked. Most probably the earlier issues were due to some sort of bash variable-quoting problem, I guess. Ran it inside screen for some 15 hours. Progress was interrupted twice by two users who could not be deleted without timeouts - just skipped those users and deleted the rest. 
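In hindsight, a loop along these lines would probably have worked as well - a sketch, assuming todelete.txt holds one username per line:

# delete users one at a time, pausing between deletes to spare the db server
while read -r u; do
    moosh -n user-delete "$u"
    sleep 2
done < todelete.txt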
The cloned Moodle instance was showing a 'modify registration' screen instead of the 'new registration' screen (the registration being attempted just to remove the annoying 'Your site is not registered' message). 
https://docs.moodle.org/405/en/Site_registration#I_can't_register_my_cloned_site

1. Remove the site identifier from the database:
Delete from {config} where name = 'siteidentifier';
delete from {registration_hubs};

2. Clear the cache:
php admin/cli/purge_caches.php

3. Attempt a new manual registration (admin/registration/index.php) 

But even after doing this, perhaps due to cloudflare caching, it kept going back to the 'modify registration' page. Then tried copy-pasting the new token into the submit url - that said invalid token. But when I registered again after that, it showed both the old site and the new site as our cloned site, so I clicked on 'move registration to new site', and registration worked. So maybe the token replacement in the url did work after all.
The web server was allowing directory listings by default. To disable this, first, tried editing apache2.conf

<Directory /var/www/>
        Options -Indexes FollowSymLinks
instead of

<Directory /var/www/>
        Options Indexes FollowSymLinks

Caused apache2 restart to fail, so put it back. It turns out it needs to be
Options -Indexes +FollowSymLinks

Then it works.
The database server was showing a load average of 8 or 9, when a safe value for its 2 CPUs would be around 1.5. This was leading to slow performance. First, checked for slow queries - https://hnsws.blogspot.com/2024/07/moodle-mysql-database-cpu-usage-at-200.html

set global slow_query_log_file ='/var/log/mysql/slow-query.log';
set global slow_query_log = 'ON';
show variables like '%slow%';

tail -f /var/log/mysql/slow-query.log

did not show any. Disabled the binary logs as noted above. The problem still persists with 150 to 180% cpu usage for mysql. Took several instances out of proxying via cloudflare - that made the sites load a bit snappier. But the database is still overloaded. 
 
Checking Tasks in the biggest Moodle instance, I see that there is a scheduled task, scheduled as "ASAP", so it starts running again as soon as it finishes:

Issue certificates task
\mod_customcert\task\issue_certificates_task

This is doing heavy database reads - over 27K records - and taking nearly 5 minutes to complete. And once it is done, it starts again.

We can either
(a) clean up the database so that only a few certificates remain - probably not desirable, since we would need the teachers / students to retain their certificates in their accounts
(b) change the scheduling to only once a day, at midnight

According to the docs, the scheduled task emails the certificate to the users / teachers if they have not yet received it.
https://docs.moodle.org/403/en/Custom_certificate_module

So hopefully it should be fine if I change the scheduling to only once a day in the middle of the night - so I did that. 

Then, disabling superfluous notifications - https://moodle.org/mod/forum/discuss.php?d=440290
server/admin/message.php
(Site Administration > General > Messaging > Notification settings )

That is the page where we can disable all email / mobile / web notifications and also disable the new-login notification. So, disabled new login notifications for every instance. I asked if I could turn off all notifications except email notifications for password resets etc. for the five instances, but apparently they need the notifications to increase user engagement. 

Then, like I did previously, I cleared the task_adhoc table of old entries. The load average has dropped from 8.5 to 2.5, but need it to go below 1.5.

So, went through the task log on each moodle instance to see which tasks take 15 seconds or more - Site Administration > Server > Tasks > Task logs - and later, any task taking more than a second.

Our biggest instance - 
Category enrolment sync task - 3 sec

From https://docs.moodle.org/401/en/Category_enrolments
Warning: The use of the category enrolments plugin may cause performance problems. If so, it is recommended that you use Cohort sync instead, though this will require a cohort sync enrolment method to be added to each course in the category.

Changed this from * to */5 (every 5 minutes.) in Site Administration > Server > Tasks > Scheduled tasks.

Award badge - currently set to */5,  changed to */7
sending badges - */5 changed to */9

Then, an instance with lots of users - 
Update Course Progress Data 18 sec
*/5 -> */11
Global search indexing 10 sec
*/30 -> */43
Analytics cleanup 10 sec
Once an hour, left it as is.
 
Then the newly cloned instance - 
Global search indexing 2 sec
*/30 -> */47
Analytics cleanup 7 sec
Once an hour, left it as is.
 
Then, the development instance - 
Cleanup old sessions 2sec
Changed from * to */5
 
Then the instance we're running for a sister org - 
All tasks report failed, with the reason OAuth token refresh failed.
This is probably due to a password change - would need to put in the correct password and reconnect to that system account, in Site Administration > Server > OAuth 2 services

After doing all this, plus deleting of unnecessary users in the newly created instance, the db server load average came down to 0.6 or so.

Edit: Also did the MySQL optimizations noted in a separate post on tuning mysql for Moodle.
The largest of our Moodle instances was complaining of failed tasks - 'send user notifications' - every day, once the tasks' max fail delay time was exceeded. And in the task logs there were no 'send user notifications' tasks showing success at all. Checking email sending with a test email - works fine. So, it was probably the mobile messaging notifications which were the culprit. To start up our customized airnotifier instance: 
sudo apt install nodejs
sudo apt install npm
sudo npm install forever -g

then tried startscript.sh - seemed to work.

In top, node briefly showed up.
sudo apt install net-tools
sudo netstat -tunlp

Node seems to be listening on 8080.

What I had read seems to indicate that if email is working, and sending notifications to mobile is working, this error should go away.

Just now enabled mobile notifications from
Site admin -> General -> Notification settings

Tested with mobile app - working.

But when I tried killing node and restarting the process using a cron job, node showed high CPU usage after a few hours. That was probably the wrong way to do it - a server reboot would have been better, and the cron-based restart was probably the reason for the node CPU usage. Did not have to go through all the possibilities in this post - https://stackoverflow.com/questions/13375735/node-js-cpu-100 - because the log file itself had the reason.

In the log file, there were lots of entries like
errorInfo: {
    code: 'messaging/registration-token-not-registered',
    message: 'Requested entity was not found.'
  },
"So apparently, this error happens when the FCM token I'm trying to send to is not registered anymore, as evidenced by the "messaging/registration-token-not-registered" error code. In that case I just need to remove this token from the user's token and be done with it."

Then, the err.log showed "Error: listen EADDRINUSE: address already in use :::8080" - maybe that was the culprit. Rebooting the server solved the issue.
I wanted to use cloudflare proxying and cloudflare origin certificates for all the instances, as in https://hnsws.blogspot.com/2022/11/cloudflare-origin-server-certificate.html . But unfortunately, with cloudflare proxying, the server was timing out within two minutes for certain admin tasks like bulk-adding users, uploading, etc. Disabled Cloudflare proxying, put back the LetsEncrypt certificates, and then in php.ini set max_execution_time to 600 (10 minutes) instead of the default 30, as suggested by Moodle, and also bumped the Timeout value in /etc/apache2/apache2.conf from 300 to 600. When proxied via cloudflare it still times out in two minutes, even though the official limit is 400 seconds - Cloudflare connection limits,
https://developers.cloudflare.com/fundamentals/reference/connection-limits/ . So, no proxying. Then the timeout is around 10 minutes, which is manageable.
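The two settings involved, for reference:

; /etc/php/8.3/apache2/php.ini
max_execution_time = 600

# /etc/apache2/apache2.conf
Timeout 600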
sudo apt install certbot python3-certbot-apache

https://www.digitalocean.com/community/tutorials/how-to-secure-apache-with-let-s-encrypt-on-ubuntu-20-04

But errors like
Renewal configuration file /etc/letsencrypt/renewal/name.org.conf is broken.
The error was: expected /etc/letsencrypt/live/name.org/cert.pem to be a symlink
Skipping.
Maybe the way to fix is to delete the letsencrypt directory and try again.
So, tried renaming the /etc/letsencrypt directory and re-running 
sudo certbot --apache
After apache crashed initially with errors that the file /etc/letsencrypt/options-ssl-apache.conf was not found, just disabled all the ssl sites with
sudo a2dissite *ssl*
sudo service apache2 restart
sudo certbot --apache

All OK.
The database backups could be set up without issues by following the method at https://hnsws.blogspot.com/2023/01/backing-up-mysql-databases-to-google.html
The moodledata backups did not go so smoothly. An initial test with a smaller instance - using rclone sync to copy and sync the moodledata to a shared Google Drive - was taking more than an hour to do the sync even after all the files were copied, so this would not be practical for our large instance. Internet searches then led to the idea of trying restic. Restic with rclone to a Google Shared Drive worked for the smaller instance, but failed for the larger one - probably due to too many retries from Google's rate limiting. So, the remaining options were restic over ssh to a local machine, or restic with Azure Blob Storage. The latter worked, with the caveat that I had to use the account-level access key and not a SAS token - with the SAS token, the repo creation would fail.

Instead of trying ssh and local server, first tried using azure blob storage, since that is easy to set up.

Set up a new blob container with private access only, inside our existing storage account.
The documentation indicates that we get the storage access key from
Azure portal -> Storage Account -> Security + networking -> Access keys
key1 & key2 - can show key and connection string

Regenerated key1 and key2 (last used 1200+ days ago.)

But we need SAS - searching for SAS in the account, we find
Shared access signature under Security + networking.
- changed the expiry date, enabled Container access also, and clicked generate SAS and connection string.

Initial trial with restic version 0.16.4, installed by apt on Ubuntu 24.04 - 
restic -r azure:our-container:/our-repo init
gave an invalid backend error -
need to use the official binary.

For using the official binary, just unzipped it, renamed it and overwrote the existing binary - https://superuser.com/questions/480950/how-to-decompress-a-bz2-file
 bzip2 -d restic_0.17.1_linux_amd64.bz2
chmod +x restic_0.17*
which restic
 sudo mv  restic_0.17* /usr/bin/restic
restic version

Workaround - use the account key just to do the init, and the sas token thereafter?

So, SAS token doesn't work. But account key works. And our largest instance initial backup, 100+ GB, completed in just over an hour. So, this would be the way to go. Will add more details soon.
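
A sketch of the invocation - the account name, container, repo and paths are placeholders; restic's Azure backend reads the credentials from environment variables:

export AZURE_ACCOUNT_NAME=ourstorageaccount
export AZURE_ACCOUNT_KEY='the-account-access-key'
export RESTIC_PASSWORD='the-repo-password'
restic -r azure:our-container:/our-repo init
restic -r azure:our-container:/our-repo backup /var/www/our_data_disk/moodledata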
The email-sending credentials were not working with a php script initially, so set up ssmtp on both the php server and the db server, with email notifications for disk space use done via bash scripts.
sudo apt install ssmtp
sudo apt install mailutils 

While testing the mail command on the command line, if Ctrl-D doesn't work (like on Mac systems?), we can use the syntax 
echo "[email body]" | mail -s "[subject]" [recipient]

changed from php disk alert to bash disk alert.
After a day, our largest instance started showing 'Failed task - convert submission' in admin alerts. This seems to be assignfeedback_editpdf\task\convert_submission - 

Simply
sudo apt install ghostscript
and a restart of apache or reboot just to make sure, and the problem went away. 

Upgrading a dot net + Wordpress server from Ubuntu 22.04 to 24.04

Instead of a fresh install on a fresh server, I tried the alternate method of upgrading in-place for our AWS server running a dot net app and a couple of Wordpress instances. A snapshot was taken by PB just before the upgrade, to be able to roll back in case of a failed upgrade. The snapshot took around half an hour to complete. 

Running do-release-upgrade resulted in the error "Please install all available updates for your release before upgrading"
Apparently, just apt upgrade is not enough, we have to do a dist-upgrade to upgrade the packages which have been "held back". https://itsfoss.com/apt-get-upgrade-vs-dist-upgrade/
sudo apt update
sudo apt upgrade
sudo apt dist-upgrade
sudo apt autoremove
sudo do-release-upgrade

During the upgrade, it warned that the upgrade was being done over ssh, and that it was starting an extra ssh daemon on port 1022. But I didn't have to use that port or request PB to open it in the AWS firewall. There were several prompts about existing configuration files, with the default being to keep the local versions - which is what I chose. The upgrade itself completed in ten minutes, and a reboot was uneventful except for an apache error.

service apache2 status
showed that the web server was down, an error in the conf file, php8.1 module not found. Now, I knew that 24.04 had php8.3 and not 8.1, so the error was understandable. I thought I would need to again follow the LAMP stack install - https://www.digitalocean.com/community/tutorials/how-to-install-lamp-stack-on-ubuntu - but no - just had to do
 sudo a2dismod php8.1
 sudo a2enmod php8.3
and 
 sudo systemctl restart apache2

The next issue was that the dot net service was not running. It was stuck in a restart loop. Disabled the service we had created as per this post - https://hnsws.blogspot.com/2023/05/dot-net-core-ubuntu-linux-server.html

Running the command manually on the command line to check -
dotnet --version
Command 'dotnet' not found, but can be installed with:
sudo snap install dotnet-sdk       # version 8.0.403, or
sudo apt  install dotnet-host-8.0  # version 8.0.10-0ubuntu1~24.04.1
sudo apt  install dotnet-host      # version 6.0.135-0ubuntu1~22.04.1
sudo apt  install dotnet-host-7.0  # version 7.0.119-0ubuntu1~22.04.1


Also, we had earlier pinned dot net to install from Microsoft's servers, but with Ubuntu 24, we're supposed to use Ubuntu's repositories. So, deleted the lines mentioned in this post,
sudo nano /etc/apt/preferences
Package: dotnet-* aspnetcore-* netstandard-*
Pin: origin "packages.microsoft.com"
Pin-Priority: 999
(deleted those lines)

sudo apt-get install -y aspnetcore-runtime-8.0
/usr/bin/dotnet /home/path/to/OurWebApi.dll
This gave the error
You must install or update .NET to run this application.
Framework: 'Microsoft.NETCore.App', version '7.0.0' (x64)
.NET location: /usr/lib/dotnet/
The following frameworks were found:
  8.0.10

Since .NET 7 has reached end of life, requested the maintainers to target dot net 8.0 instead of 7.0, which they did in a few hours - after which I restarted the service, and the app was running again.

Tuesday, October 22, 2024

Deleting many GBs of course content from Moodle

Deleting large courses had a tendency to make one of our Moodle instances time out. As I found out by trial and error, and as mentioned at https://moodle.org/mod/forum/discuss.php?d=207712
in such cases we have to delete the course module by module. That worked, but for one of our instances there were still many GB of data in the moodledata filedir - probably data not referenced in the database any more, left over from earlier site cloning. 

cd /the/moodle/directory
moosh file-dbcheck > /path/to/filestodelete.txt

Then I just edited filestodelete.txt to prepend sudo rm -f to each filename in the list, and ran it as a script to get rid of around 8 GB. Completed in less than a minute. 
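
The edit itself can be done with sed - a sketch, assuming the moosh output is one absolute path per line:

sed 's/^/sudo rm -f /' /path/to/filestodelete.txt > /path/to/deletefiles.sh
bash /path/to/deletefiles.sh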

checking ssd size

After purchasing some SSDs at a 50% discount on Amazon, I wanted to check if they are really 256 GB SSDs. 

With the aim of filling up at least 100 GB of space on the disk, I just tried
truncate --size 1G foo
shred --iterations 1 foo

That completed in less than a minute. But the same thing with 100G instead of 1G: truncate completed instantly, but shred was going to take a long time, so after ten minutes or so I cancelled it. Then tried copying the 100 GB file from one disk to another, etc.; that also went through without complaints, so I concluded that the disk was fine.
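
Since truncate only creates a sparse file, a surer way to actually write ~100 GB to the disk would be something like this (the mount point is a placeholder):

dd if=/dev/urandom of=/mnt/testssd/fill.bin bs=1M count=102400 status=progress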