Tuesday, May 16, 2023

How to train a Large Language Model (LLM) with your own data

There's an interesting blog post about how much effort (and money for infrastructure) is required to train a Large Language Model (LLM) with custom data - https://blog.replit.com/llm-training

Hugging Face does have free, rate-limited APIs for some of the smaller pre-trained models, like
https://huggingface.co/google/flan-t5-base
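Here is a minimal Python sketch of a call to one of these free hosted models, using only the standard library. It assumes the Inference API endpoint pattern as it worked in 2023 (https://api-inference.huggingface.co/models/<model-id>), a JSON body of the form {"inputs": ...}, and a Hugging Face access token stored in a hypothetical HF_TOKEN environment variable. For a text-to-text model like this one, the reply is typically a list of {"generated_text": ...} objects.

```python
import json
import os
import urllib.request

# Hosted Inference API endpoint for the model above (URL scheme as used in 2023; an assumption)
API_URL = "https://api-inference.huggingface.co/models/google/flan-t5-base"


def build_request(prompt: str, token: str) -> urllib.request.Request:
    """Prepare a JSON POST for the Inference API, authenticated with an access token."""
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def query(prompt: str, token: str):
    """Send the request and decode the JSON reply (free-tier calls are rate limited)."""
    with urllib.request.urlopen(build_request(prompt, token), timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    token = os.environ.get("HF_TOKEN")  # hypothetical variable name for your token
    if token:
        print(query("Translate English to German: How old are you?", token))
```

The script only sends a request when a token is present. On the free tier, expect occasional rate-limit or "model is loading" errors.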

And they do have free AutoTrain for training on custom data:

https://huggingface.co/autotrain

More links:

how to use an LLM

About FLAN-T5

Hardware and time costs:

https://www.philschmid.de/fine-tune-flan-t5-deepspeed

Edit, Oct 2023: a tutorial on how to use an LLM with Google Colab's free tier:
https://betterprogramming.pub/set-up-an-llm-project-using-a-free-gpu-in-google-colab-e55453bfc760

 

Tuesday, May 09, 2023

Google Groups email delivery issues

There have been delivery issues with a Google Group that has more than 40k subscribers: over the last 3-4 days, automated emails to the group were not getting through. Earlier, we used a Google Apps Script to send the email, taking its content from a web page. But the team switched overnight to sending via SMTP, and the group refused the incoming messages.

My first suspicion was that the From: header, or some other header, was wrong, so that the messages were being rejected as not having permission to post. But on checking the messages that had been sent successfully to another group, I found that the headers were right; the only differences were that the sending method was SMTP instead of Google Apps Script, and that the content template was different.
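This kind of header comparison can be done mechanically. Save the raw source of an accepted message and a rejected one (Gmail's "Show original" gives it), then compare the headers that posting permission plausibly depends on. Below is a minimal Python sketch. The header list is our own assumption, not something Google documents, and the addresses in the demo are hypothetical.

```python
from email import message_from_string

# Headers that group posting permission plausibly depends on (our assumption)
HEADERS_TO_COMPARE = ("From", "Sender", "Reply-To", "Return-Path")


def header_differences(accepted_raw: str, rejected_raw: str) -> dict:
    """Compare selected headers of two raw messages; return {header: (accepted, rejected)}."""
    accepted = message_from_string(accepted_raw)
    rejected = message_from_string(rejected_raw)
    differences = {}
    for name in HEADERS_TO_COMPARE:
        if accepted.get(name) != rejected.get(name):
            differences[name] = (accepted.get(name), rejected.get(name))
    return differences


if __name__ == "__main__":
    # Hypothetical messages; paste real ones from "Show original" instead
    ok = "From: news@example.org\nTo: group@googlegroups.com\nSubject: Update\n\nBody"
    bad = ("From: news@example.org\nSender: mailer@example.org\n"
           "To: group@googlegroups.com\nSubject: Update\n\nBody")
    print(header_differences(ok, bad))
```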

After a couple of days of the team sending manually with the same new template, the automated SMTP sending started working on its own. So this was probably some sort of anti-spam measure by Google Groups.

Sunday, May 07, 2023

dot net core Ubuntu Linux server - software setup

Here are all the tasks that needed to be done:

1. Change the default port of the SSH server
by editing
/etc/ssh/sshd_config
#Port 22 <-- uncomment this line and change 22 to something else.
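If you'd rather script this edit than make it in an editor, here is a minimal Python sketch that sets the port in the text of sshd_config. It assumes the stock Ubuntu file, where the directive appears as the commented-out line #Port 22. Port 2222 is only an example.

```python
import re

# Matches "Port 22" or the commented-out default "#Port 22" on a line of its own
PORT_LINE = re.compile(r"^#?[ \t]*Port[ \t]+\d+[ \t]*$", re.MULTILINE)


def set_ssh_port(config_text: str, new_port: int) -> str:
    """Return sshd_config text with its first Port directive uncommented and set to new_port."""
    if not 1 <= new_port <= 65535:
        raise ValueError(f"invalid port: {new_port}")
    new_line = f"Port {new_port}"
    if PORT_LINE.search(config_text):
        return PORT_LINE.sub(new_line, config_text, count=1)
    # No Port line at all: append one at the end
    return config_text.rstrip("\n") + "\n" + new_line + "\n"


if __name__ == "__main__":
    sample = "Include /etc/ssh/sshd_config.d/*.conf\n\n#Port 22\n#AddressFamily any\n"
    print(set_ssh_port(sample, 2222))  # 2222 is just an example port
```

After writing the result back as root, `sudo sshd -t` checks the syntax and `sudo systemctl restart ssh` applies it. Keep the current session open until a login on the new port works.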

2. Install and set up MySQL

How To Install MySQL on Ubuntu 22.04 | DigitalOcean


3. Set up DNS in Cloudflare.

4. Set up Apache virtual hosts

sudo apt install apache2

I needed to read the documentation in Apache's default page, /var/www/html/index.html, which says that
"By default, Ubuntu does not allow access through the web browser to any file outside of those located in /var/www, public_html directories (when enabled) and /usr/share (for web applications). If your site is using a web document root located elsewhere (such as in /srv) you may need to whitelist your document root directory in /etc/apache2/apache2.conf."

So I added a block like this:
<Directory /our/custom/www/home>
        AllowOverride None
        Require all granted
</Directory>


Apache also needed read permission for the entire directory tree in /our/custom/www/home, plus execute (search) permission on every parent directory so that it can traverse the path:
Why apache throwing forbidden when directory is in home? - Stack Overflow
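The fix from that Stack Overflow answer comes down to Apache's user (www-data on Ubuntu) needing execute (search) permission on every directory from / down to the document root. Below is a minimal Python sketch that lists the directories where "others" lack that permission. It checks only directory traversal, not read permission on the files themselves. It also assumes www-data is neither the owner nor in the group. The demo runs on a throwaway temporary tree rather than the real document root.

```python
import os
import stat
import tempfile
from pathlib import Path


def untraversable_dirs(doc_root: str) -> list:
    """List directories from doc_root up to / that lack the 'other' execute (search) bit.

    Assumes Apache's user is neither the owner nor in the group, so only the
    'other' permission bits apply to it.
    """
    path = Path(doc_root).resolve()
    blocked = []
    for directory in (path, *path.parents):
        if not os.stat(directory).st_mode & stat.S_IXOTH:
            blocked.append(str(directory))
    return blocked


if __name__ == "__main__":
    # Demo on a temporary tree instead of the real document root
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp).resolve() / "home"
        www = home / "www"
        www.mkdir(parents=True)
        os.chmod(home, 0o700)  # like a typical private home directory
        print(untraversable_dirs(str(www)))
        os.chmod(home, 0o755)
```

Each reported directory can then be opened up with `sudo chmod o+x <dir>`, and `namei -l /our/custom/www/home` shows the permissions along the whole path.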


5. Set up SSL with the Cloudflare origin server certificate. This did not work until I first set up SSL using certbot, and then changed the private key and certificate paths to point to the Cloudflare origin cert. Most probably this is because certbot automatically set up the required configuration with
Include /etc/letsencrypt/options-ssl-apache.conf
in the virtual host file.

How To Secure Apache with Let's Encrypt on Ubuntu 22.04 | DigitalOcean

sudo apt install certbot python3-certbot-apache
sudo certbot

Link dump of other stuff I tried:

Origin server · Cloudflare SSL/TLS docs

Ubuntu with Apache2: CSR & SSL Installation (OpenSSL) (digicert.com)

(We saved it in /etc/ssl/certs/.)

How To Troubleshoot Common Apache Errors | DigitalOcean

I needed to enable mod_ssl (the available modules are in /etc/apache2/mods-available), so:

sudo a2enmod ssl
sudo a2enmod headers
sudo systemctl restart apache2


from SSL: How to enable HTTPS with Apache 2 on Ubuntu 20.04 | ArubaCloud.com
Syntax error on line 33 of /etc/apache2/sites-enabled/002-our-site-ssl.conf:
May 06 10:28:11 ip-10-0-0-73 apachectl[2467]: SSLCertificateKeyFile: file '/etc/ssl/private/cloudflare-oursite.org.privatekey.pem'


sudo apachectl configtest
SSLCertificateKeyFile: file '/etc/ssl/private/cloudflare-oursite.org.privatekey.pem' does not exist or is empty

(This was because the private key had been saved elsewhere instead of at this path.)
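The configtest error above is about a missing or empty key file. After swapping in the Cloudflare origin cert, a key that doesn't match the certificate is another common failure. Both can be checked before restarting Apache. Below is a minimal sketch using Python's ssl module. The certificate filename in the demo is hypothetical, and the key path is the one from the error message. The script needs sudo to read /etc/ssl/private.

```python
import os
import ssl
from typing import Optional


def check_cert_pair(cert_path: str, key_path: str) -> Optional[str]:
    """Return None if the PEM certificate and private key load as a matching pair, else a reason."""
    for path in (cert_path, key_path):
        # Same condition as apachectl's "does not exist or is empty" message
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            return f"{path} does not exist or is empty (or is not readable by this user)"
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # ssl.SSLError (a subclass of OSError) if not PEM or the key does not match the cert;
        # PermissionError (also an OSError) if the file cannot be read
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except OSError as exc:
        return f"could not load certificate/key pair: {exc}"
    return None


if __name__ == "__main__":
    # Hypothetical certificate path; the key path is taken from the error message above
    print(check_cert_pair(
        "/etc/ssl/certs/cloudflare-oursite.org.pem",
        "/etc/ssl/private/cloudflare-oursite.org.privatekey.pem",
    ))
```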

sudo apt install certbot
Certbot doesn't know how to automatically configure the web server on this system.

(This was because python3-certbot-apache also needed to be installed.)

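After changing to the Cloudflare origin cert, I wanted to disable the cron job that fetches updated Let's Encrypt certs, but
there was no crontab for root. The cron job for certbot was at
/etc/cron.d/certbot
so I commented out everything in it. Note that on systemd-based Ubuntu this cron entry skips itself, and renewals actually run from the certbot.timer systemd unit, which can be disabled with sudo systemctl disable --now certbot.timer.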

Then I just copied the same configuration for the UAT virtual host.

6. Set up dot net

https://www.syncfusion.com/blogs/post/hosting-multiple-asp-net-core-apps-in-ubuntu-linux-server-using-apache.aspx

find which dot net core version is used by your application in IIS - Google Search

dotnet --version
3.1.410
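Note that `dotnet --version` reports the .NET SDK version in use on the machine where it runs (3.1.410 above). That is not necessarily the framework the application targets, which is declared in the project file; `dotnet --list-runtimes` shows which runtimes are installed. Below is a minimal Python sketch that reads the target framework from a .csproj. It assumes an SDK-style project file without an MSBuild XML namespace, as .NET Core projects normally are.

```python
import xml.etree.ElementTree as ET


def target_frameworks(csproj_text: str) -> list:
    """Return the target framework monikers declared in an SDK-style .csproj file."""
    root = ET.fromstring(csproj_text)
    monikers = []
    for tag in ("TargetFramework", "TargetFrameworks"):
        for element in root.iter(tag):
            if element.text:
                # TargetFrameworks holds a semicolon-separated list, e.g. "netcoreapp3.1;net6.0"
                monikers.extend(m.strip() for m in element.text.split(";") if m.strip())
    return monikers


if __name__ == "__main__":
    sample = """<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>
</Project>"""
    print(target_frameworks(sample))  # prints ['netcoreapp3.1']
```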


So we need to use the "Microsoft feed" (Microsoft's own package repository) rather than Ubuntu's packages:
.NET and Ubuntu overview - .NET | Supported distributions

https://learn.microsoft.com/en-us/dotnet/core/install/linux-package-mixup?pivots=os-linux-ubuntu#i-need-a-version-of-net-that-isnt-provided-by-my-linux-distribution

https://manpages.ubuntu.com/manpages/xenial/man5/sources.list.5.html

https://learn.microsoft.com/en-us/dotnet/core/install/linux-ubuntu#register-the-microsoft-package-repository

https://tecadmin.net/how-to-install-dotnet-core-on-ubuntu-22-04/
This does not support 3.1.

So we would need to migrate the application from .NET Core 3.1 to .NET 6:
https://learn.microsoft.com/en-us/aspnet/core/migration/31-to-60?view=aspnetcore-7.0&tabs=visual-studio





provisioning a dot net core Ubuntu Linux server on AWS

Some notes about the provisioning and setup:

  • The VMs are listed under AWS EC2. The region matters: the dashboard will not show the VMs in your account unless the correct region is selected!

  • AWS offers a very large selection of machine types, so we need to research costs before choosing one. For example, when I checked the pricing calculator for the Mumbai region, a1.2xlarge with 16 GB RAM was $0.217 per hour, c5a.2xlarge was $0.207, c6a.2xlarge was $0.2057, but c5.2xlarge was $0.362. We can't assume that any family will be cheaper based on its name prefix, though some series, like d, are much more expensive due to their higher specs.

  • To provision users who can access the VM via Amazon's web console, we created a user group, created policies that give permissions to the users in that group, and created IAM users in that group.

  • Creating custom policies: since we wanted the users to be able to do all EC2-related tasks, like starting and stopping the VM, we made a copy of the supplied "All EC2 permissions" policy and edited its JSON, filling in the instance's Amazon Resource Name (ARN) instead of * as the Resource to which it applies.
    https://docs.aws.amazon.com/managedservices/latest/userguide/find-arn.html

  • Amazon gives custom login URLs, like
    https://our-name.signin.aws.amazon.com/console
    https://docs.aws.amazon.com/signin/latest/userguide/introduction-to-iam-user-sign-in-tutorial.html
    After logging in, we go to EC2, choose Instances on the left-hand side, make sure the correct region is chosen in the drop-down list at the top of the page, and select our instance. To restart it, we can then choose the Actions menu at the top of the page, then Manage Instance State --> Start / Stop / Reboot. We have turned on termination protection, so the Terminate option, which should be used only to remove the instance completely, is greyed out.
  • If we don't opt for a dedicated IP (called an Elastic IP in AWS parlance), we can use a hostname like ec2-9-999-999-99.ap-south-1.compute.amazonaws.com, in which the digits encode the public IP address (ec2-a-b-c-d stands for a.b.c.d). But this name changes along with the dynamically assigned public IP address, which AWS releases when the instance is stopped, so we need to assign an Elastic IP. One Elastic IP per instance is free as long as it is associated with a running instance.
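The instance price comparison is easier to judge as a monthly figure for an always-on VM. Below is a small Python sketch using the Mumbai hourly prices quoted in the notes, plus the 730 hours per month that the AWS pricing calculator assumes. The prices are the ones noted in May 2023 and may have changed since.

```python
# The AWS pricing calculator assumes 730 hours in a month
HOURS_PER_MONTH = 730

# On-demand hourly prices (USD) noted from the pricing calculator, Mumbai region, May 2023
HOURLY_USD = {
    "a1.2xlarge": 0.217,
    "c5a.2xlarge": 0.207,
    "c6a.2xlarge": 0.2057,
    "c5.2xlarge": 0.362,
}


def monthly_costs(hourly_prices: dict, hours: float = HOURS_PER_MONTH) -> list:
    """Return (instance type, monthly USD) pairs for an always-on VM, cheapest first."""
    costs = [(name, price * hours) for name, price in hourly_prices.items()]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, cost in monthly_costs(HOURLY_USD):
        print(f"{name:12} ${cost:7.2f} per month")
```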
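Below is a sketch of a custom EC2 policy restricted to one instance, generated with Python so that the ARN (arn:aws:ec2:region:account-id:instance/instance-id) is assembled correctly. It deviates slightly from a straight copy of the full-access policy. The ec2:Describe* actions don't support resource-level permissions, so the console can only list instances if those actions stay on "*", while start, stop and reboot are limited to the one instance ARN. The region, account ID and instance ID in the demo are hypothetical, and the console may still show permission warnings for other services.

```python
import json


def instance_control_policy(region: str, account_id: str, instance_id: str) -> str:
    """Build IAM policy JSON that lets users view EC2 and start/stop/reboot one instance."""
    instance_arn = f"arn:aws:ec2:{region}:{account_id}:instance/{instance_id}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Describe* actions do not support resource-level permissions, so they need "*"
                "Sid": "ViewInstances",
                "Effect": "Allow",
                "Action": "ec2:Describe*",
                "Resource": "*",
            },
            {
                "Sid": "ControlOneInstance",
                "Effect": "Allow",
                "Action": ["ec2:StartInstances", "ec2:StopInstances", "ec2:RebootInstances"],
                "Resource": instance_arn,
            },
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Hypothetical account and instance IDs, for illustration only
    print(instance_control_policy("ap-south-1", "123456789012", "i-0123456789abcdef0"))
```

The printed JSON can be pasted into the JSON tab of the IAM policy editor.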

Since this post has already become long, I will put the software installation part in a separate post.

Wednesday, May 03, 2023

mpg123 works well for icecast or shoutcast streams - curl to help it play https

mpg123 http://stream.sssmediacentre.org:8000

works well on a Raspberry Pi, and even retries, so that playout resumes after short network outages, unlike browsers such as Chrome or Firefox. The buffer is also a minute long, so short outages of a few seconds are not noticed as gaps in the audio.

But mpg321, which is the package that now supports the mpg123 command, does not seem to support https. With the help of curl, as mentioned here:

~$ mkfifo pipe
~$ curl -L https://stream.sssmediacentre.org:8443/bhajan -o pipe & mpg123 pipe

Works well.

Tuesday, May 02, 2023

documentation on domain transfer from GoDaddy to Nettigritty

 

GitHub Pages, Cloudflare and SSL

Many people have issues with GitHub Pages not issuing TLS certificates after following the official procedure, which is to first go to Settings -> Pages on the repo and set up the custom domain, and then set the CNAME record for the domain to point to username.github.io.

The reason for this order of actions is to prevent domain hijacking, as GitHub explains.

One of our domains still had the same issue after 24 hours, even though Cloudflare proxying was turned off and the "DNS only" mode was set. The solution was to remove the custom domain and immediately add it back in the GitHub repo settings. Then the TLS certificate request went through, and the certificate was issued within a minute or two.
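When a certificate is stuck like this, it may help to confirm that the domain actually resolves to GitHub and not to a proxy. Below is a minimal Python sketch. The four addresses are the GitHub Pages IPv4 addresses listed in GitHub's custom-domain documentation; a www subdomain CNAMEd to username.github.io should resolve to them too.

```python
import socket

# GitHub's documented IPv4 addresses for GitHub Pages
GITHUB_PAGES_IPV4 = frozenset(
    {"185.199.108.153", "185.199.109.153", "185.199.110.153", "185.199.111.153"}
)


def resolved_ipv4(hostname: str) -> set:
    """Resolve hostname to its set of IPv4 addresses using the system resolver."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def looks_like_github_pages(addresses: set) -> bool:
    """True if every address is a GitHub Pages IP, i.e. no proxy sits in between."""
    return bool(addresses) and set(addresses) <= GITHUB_PAGES_IPV4
```

For example, call `looks_like_github_pages(resolved_ipv4("www.example.org"))`, with www.example.org standing in for your own domain. A False result while proxying is supposedly off may mean the DNS change hasn't propagated yet, or that the Cloudflare proxy is still on.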

Monday, May 01, 2023

new WhatsApp voice call spam

WhatsApp has had a "Block and Report" feature for spam chats. Now there's a new game in town: missed WhatsApp voice calls from international numbers like
+62 857 6605 4116
+55 66 8473 9018
+7 931 703 90 02
etc.

What we can do is block them: tap the number to open the Call Info page, then tap the three-dots menu at the top right (on Android) and choose Block.