I Have a Git Server Now
Yes, you read that right: in the year 2024 C.E.—sixteen years after the advent of GitHub, whose reign spans not just my entire professional career but even my (formal) computer science education—I’ve decided to take a step backwards in time and host my software projects independently.
“But why?” you ask, “Why on Earth would anyone bother to do such a thing, in this day and age?”
A few years ago now, I encountered SourceHut, and I was immediately fascinated. Where GitHub aims to replace “old-school” practices and tools with its own ideas—which, admittedly, are friendlier, if not necessarily better—SourceHut embraces them: patch submissions, code review, and general discussions are all based around email. While this may sound archaic, it does have its advantages.
Sporadically, I’ve contemplated migrating or mirroring some of my projects from GitHub to SourceHut; the main reason I haven’t done so is because it would cost money.1 I consider SourceHut’s financial reliance on regular users rather than enterprises or advertising to be a point in their favor, but I am always reticent to commit myself to yet another subscription. Even so, SourceHut’s approach has continued to intrigue me; and, gradually, GitHub has begun to push me away. Compared to its early simplicity, the GitHub of today has begun to feel a bit bloated; but more than that, I find their recently-declared change of focus especially off-putting.
As it seemed more and more likely that, sooner or later, I was going to end up subscribing to SourceHut, a thought occurred: if I’m willing to spend five-ish dollars per month on Git hosting anyways, why not spend that on a VM and further my new goal to have more ownership of my own web presence?
The following narrative endeavors to be complete and accurate, and to always make clear not just what I did but also why. That said, I did take a few slight liberties in order to present a more coherent guide, be it for my future self or some other interested party. Also, this all went down five or six months ago—because I lost momentum in the middle of writing this post and then my brain refused to engage with it again until now—so there may be a few slight mis-rememberings here and there on that basis.
Deciding on software
SourceHut’s entire platform is free software—another point in their favor—so at first, I intended to try running their Git module. While this would be possible, upon investigation I didn’t think it would be especially easy for a first-timer like myself.
Forgejo is another interesting option, especially with their ongoing work towards federation. However, they explicitly imitate GitHub, and I wanted something simpler.
The canonical Linux and Git repositories are both hosted at <https://git.kernel.org>, which uses cgit. Not exactly cutting-edge, but clearly dependable; and better still, it’s very lightweight and very easy to set up. Plus, it shares the same utilitarian design sensibilities as SourceHut.2
Obtaining a server
I’ve never hosted any kind of public-facing web service before, so I wasn’t really sure what to look for in a provider. Ultimately, I settled on DigitalOcean, partly because the price seemed right—$4/month for the most barebones shared-CPU VM,3 or $6/month for one with a little more breathing room—and partly because I had recently read about how Molly White hosts her newsletter there.
I was fairly sure either the $4 or the $6 option would be sufficient; but, again, I've never done this before, so I decided to click through DigitalOcean’s “Getting Started” wizard. I selected “Host a website or static site,” then “Deploy an Ubuntu server,” and it recommended the following configuration:
- 1 GiB RAM
- 1 CPU
- 25 GiB SSD
- 1000 GiB transfer4
…all for $6 per month. I didn’t proceed from there, though, because it selected Ubuntu 23.10 and I didn’t see a way to change this. Although my laptop ran 23.10 at the time—24.04 now—I wanted to stick to LTS releases for my server.
I backed out to the welcome page, selected “Spin up a Droplet,”5 and selected mostly the same options as the wizard:
- Choose Region: New York
- Choose an Image: Ubuntu 22.04 (LTS) x64
- Choose Size: Shared CPU/Basic
  - CPU Options: Regular/SSD
  - $6 (same configuration as above)
- Choose Authentication Method: SSH Key
- Finalize Details: set `git.rdnlsmith.com` as the hostname
Note, of course, that setting the hostname here only determines what the VM calls itself; the name remains meaningless to the outside world pending the creation of a DNS record.
After a minute or so, it declared the VM was ready and gave me an IP address.
rdnlsmith@zephyr ~ $ ssh root@138.197.81.0
Welcome to Ubuntu 22.04.2 LTS (GNU/Linux 5.15.0-67-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
System information as of Fri Mar 22 17:38:37 UTC 2024

  System load:  0.7216796875     Users logged in:        0
  Usage of /:   6.8% of 24.05GB  IPv4 address for eth0:  138.197.81.0
  Memory usage: 25%              IPv4 address for eth0:  [REDACTED]
  Swap usage:   0%               IPv4 address for eth1:  [REDACTED]
  Processes:    100
Expanded Security Maintenance for Applications is not enabled.
17 updates can be applied immediately.
13 of these updates are standard security updates.
To see these additional updates run: apt list --upgradable
Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status
The list of available updates is more than a week old.
To check for new updates run: sudo apt update
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
root@git:~#
Securing the server
The very first thing I did was install updates:
root@git:~# apt update && apt upgrade
Next, I set about creating a user for myself, and giving that user `sudo` rights. It’s generally considered a bad idea to just be `root` all the time—or, indeed, to allow `root` to log in at all. Partly, this helps protect you from accidentally destroying something important; and partly, it takes away an obvious point of attack for unscrupulous ne’er-do-wells.
I go by `rdnlsmith` on my current laptop and most websites, but I used to just use `daniel` for local accounts. Now that my email address is <daniel@rdnlsmith.com>, I decided to go back to that.
adduser daniel
usermod -aG sudo daniel
Because I uploaded my public SSH key when I created the VM, the `root` account already had the necessary configuration to allow access in its `~/.ssh` directory. Following a DigitalOcean community tutorial, I copied that to my new user:
rsync --archive --chown=daniel:daniel ~/.ssh /home/daniel
The `--archive` flag preserves file permissions (and other attributes), which are important—SSH will refuse to authenticate a key if anyone besides the target user has write access to the `authorized_keys` file. `--chown=daniel:daniel` changes both the owner and group of the copied files from `root` to `daniel`.
Now, I can switch users:
root@git:~# exit
logout
Connection to 138.197.81.0 closed.
rdnlsmith@zephyr ~ $ ssh daniel@138.197.81.0
…and disable `root` login:
daniel@git:~$ sudo vim /etc/ssh/sshd_config
This file contained a commented-out entry `#PermitRootLogin no`, which I un-commented by removing the `#`. Because I provided an SSH key rather than a password when I created the VM, this file also contained the entry `PasswordAuthentication no`, which I would have added if it weren’t there already: you don’t have to worry as much about leaked or insecure passwords if your server doesn’t accept passwords in the first place. Instead, I’ll authenticate exclusively by SSH key. (It’s maybe worth double-checking that your file doesn’t have an entry that says `PubkeyAuthentication no`, or else you might end up locked out of your server. Note, too, that changes here only take effect once the SSH daemon is restarted—on Ubuntu, `sudo systemctl restart ssh`.)
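For reference, the net effect of my edits on `/etc/ssh/sshd_config` is just these two (uncommented) lines:

```
PermitRootLogin no
PasswordAuthentication no
```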
The guide linked above also recommends configuring UFW (“Uncomplicated Firewall”) to block any services you aren’t using. UFW was installed by default; I just had to configure it to allow SSH, and then enable it (in that order; or else, again, you might end up locked out).
daniel@git:~$ sudo ufw app list
[sudo] password for daniel:
Available applications:
OpenSSH
daniel@git:~$ sudo ufw allow OpenSSH
Rules updated
Rules updated (v6)
daniel@git:~$ sudo ufw enable
Command may disrupt existing ssh connections. Proceed with operation (y|n)? y
Firewall is active and enabled on system startup
daniel@git:~$ sudo ufw status
Status: active
To                         Action      From
--                         ------      ----
OpenSSH                    ALLOW       Anywhere
OpenSSH (v6)               ALLOW       Anywhere (v6)
Finally, after reading this blog post by Bryan Brattlof and another DigitalOcean community tutorial, I also set up Fail2Ban.
sudo apt install fail2ban
Fail2Ban watches the authentication logs for various services. If it notices repeated authentication failures originating from the same IP address within a short span of time—by default, five attempts within ten minutes—it automatically configures your firewall to temporarily ban that address (by default, for ten minutes). This helps mitigate the (likely mild) performance impact of any automated attempts to compromise your server, and (if you haven’t disabled password authentication) substantially hinders attempts at password-guessing.
Fail2Ban’s configuration file is `/etc/fail2ban/jail.conf`, but this file can be overwritten by package upgrades, so you shouldn’t edit it directly. Instead, I created a new file `jail.local` in the same directory. The local file only needs to contain the settings that are different from what’s in `jail.conf`.
I figure I don’t adequately understand the ramifications of fiddling with Fail2Ban’s settings, so I decided to only change what was absolutely necessary. The default configuration creates its ban rules via `iptables` directly (which UFW sits on top of); but since I’m using UFW to manage my firewall, I configured Fail2Ban to do the same. These two lines:
banaction = ufw
banaction_allports = ufw
…tell Fail2Ban to read the file `/etc/fail2ban/action.d/ufw.conf` to understand how it should create and remove firewall rules, instead of `/etc/fail2ban/action.d/iptables.conf`.
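In full, then, my `jail.local` amounts to little more than this (placing the overrides in the `[DEFAULT]` section, where `jail.conf` defines these options, is my reading of the documentation):

```
# /etc/fail2ban/jail.local — only the overrides; everything else
# falls back to the defaults in jail.conf.
[DEFAULT]
banaction = ufw
banaction_allports = ufw
```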
With that done, I enabled the service:
sudo systemctl enable fail2ban
sudo systemctl start fail2ban
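To confirm that it’s watching SSH, you can query the `sshd` jail with `fail2ban-client`; the output looks something like this (the counts here are illustrative, not mine):

```
daniel@git:~$ sudo fail2ban-client status sshd
Status for the jail: sshd
|- Filter
|  |- Currently failed: 0
|  |- Total failed:     0
|  `- File list:        /var/log/auth.log
`- Actions
   |- Currently banned: 0
   |- Total banned:     0
   `- Banned IP list:
```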
Configuring DNS
The DNS records for my domain are managed by Netlify, because when I first set up this blog (on Netlify), allowing them to manage DNS meant they would also manage the TLS certificate. Whatever DNS provider you use, though, you’ll almost certainly need to fill out the same four fields:
- Type: A
- Name: git.rdnlsmith.com
- Value: 138.197.81.0
- TTL: 3600
“A” records represent a mapping between a domain—in this case, a new “git” subdomain under rdnlsmith.com—and an IPv4 address. The time-to-live (TTL) value indicates the maximum length of time, in seconds, that any DNS resolver should cache the record; here, I’m telling them to check for updated values before responding to a query at least once per hour.
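Once the record exists, you can also query it directly—and see the remaining TTL your resolver is reporting—with `dig`; the output is shaped roughly like this:

```
rdnlsmith@zephyr ~ $ dig +noall +answer git.rdnlsmith.com
git.rdnlsmith.com.    3600    IN    A    138.197.81.0
```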
To prove that it worked, immediately before configuring the record,6 I ran the following `ping` tests:
rdnlsmith@zephyr ~ $ ping 138.197.81.0
PING 138.197.81.0 (138.197.81.0) 56(84) bytes of data.
64 bytes from 138.197.81.0: icmp_seq=1 ttl=45 time=36.1 ms
64 bytes from 138.197.81.0: icmp_seq=2 ttl=45 time=34.6 ms
64 bytes from 138.197.81.0: icmp_seq=3 ttl=45 time=31.9 ms
^C
--- 138.197.81.0 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 31.911/34.194/36.096/1.729 ms
rdnlsmith@zephyr ~ $ ping git.rdnlsmith.com
ping: git.rdnlsmith.com: Name or service not known
…and then after:
rdnlsmith@zephyr ~ $ ping git.rdnlsmith.com
PING git.rdnlsmith.com (138.197.81.0) 56(84) bytes of data.
64 bytes from git.rdnlsmith.com (138.197.81.0): icmp_seq=1 ttl=45 time=35.5 ms
64 bytes from git.rdnlsmith.com (138.197.81.0): icmp_seq=2 ttl=45 time=36.0 ms
64 bytes from git.rdnlsmith.com (138.197.81.0): icmp_seq=3 ttl=45 time=32.0 ms
^C
--- git.rdnlsmith.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
rtt min/avg/max/mdev = 31.982/34.521/36.041/1.807 ms
Nice.
One more thing: you may have noticed in earlier snippets that, when I’m connected to the server, the hostname part of my prompt reads `@git`, not `@git.rdnlsmith.com`. There are two factors at play here.
The first is the default prompt configuration, which can be found in `~/.bashrc`:
if [ "$color_prompt" = yes ]; then
PS1='${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '
else
PS1='${debian_chroot:+($debian_chroot)}\u@\h:\w\$ '
fi
unset color_prompt force_color_prompt
# If this is an xterm set the title to user@host:dir
case "$TERM" in
xterm*|rxvt*)
PS1="\[\e]0;${debian_chroot:+($debian_chroot)}\u@\h: \w\a\]$PS1"
;;
*)
;;
esac
The placeholder `\h`, which appears in each of the three `PS1=` lines above, represents the hostname. More precisely, from the man page for Bash itself (`man bash`):
PROMPTING
    When executing interactively, bash displays the primary prompt PS1 when it is ready to read a command… Bash allows these prompt strings to be customized by inserting a number of backslash-escaped special characters that are decoded as follows:

        \h    the hostname up to the first '.'
        \H    the hostname
This makes sense if you have a large number of machines on the same domain, each with its own subdomain: everything after the first `.` will be the same, and you only need to see the first part to know which machine you’re connected to. I only have the one VM, and I wanted it to display the full hostname, so I changed each `\h` to `\H`.
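After the edit, the color-prompt line, for example, reads:

```
PS1='${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\H\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '
```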
After re-loading the configuration with `source ~/.bashrc`, however, I still saw `daniel@git:~$`. The second factor was the file `/etc/hostname`:
daniel@git:~$ cat /etc/hostname
git
I changed this to read `git.rdnlsmith.com`. Then, I also checked `/etc/hosts`:
daniel@git:~$ cat /etc/hosts
# Your system has configured 'manage_etc_hosts' as True.
# As a result, if you wish for changes to this file to persist
# then you will need to either
# a.) make changes to the master file in /etc/cloud/templates/hosts.debian.tmpl
# b.) change or remove the value of 'manage_etc_hosts' in
# /etc/cloud/cloud.cfg or cloud-config from user-data
#
127.0.1.1 git.rdnlsmith.com git
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
…but no change was needed here, because the first loopback address (`127.0.1.1`) was already correlated with both `git` and `git.rdnlsmith.com`. Per the comments at the top of that file, I also went into `/etc/cloud/cloud.cfg` and changed `preserve_hostname: false` to `preserve_hostname: true`, to ensure that DigitalOcean won’t overwrite my change to `/etc/hostname`.
After `sudo reboot`, my prompt read `daniel@git.rdnlsmith.com:~$`.
Serving webpages
In order to actually see anything upon visiting <http://git.rdnlsmith.com>, I needed to install three more packages:
sudo apt install nginx fcgiwrap cgit
Nginx is a popular, lightweight web server, designed to handle more traffic under tighter resources than the older-but-more-featureful Apache. cgit is, of course, the application that I actually want to run. So, what’s `fcgiwrap`?
An imprecise history of dynamic webpages, from someone who was not there for most of it
The simplest kind of website is just a collection of HTML files in some directory on a server. When you request a particular URL through your browser, a program running on that server—Apache and Nginx being examples—maps the URL to a file path and sends back the file. This works well if you have a website that people will only read, such as this blog.
If you want a website that people will interact with—submit comments, for instance—then you’re going to need some kind of database to store the information those people submit, and you’re going to need some way to pull content back out of that database and inject it into a webpage. Nowadays, this is usually done client-side with JavaScript: it runs in your browser, fetches content in the background, and rewrites the webpage on the fly to incorporate the content.
But the web has been around longer than JavaScript. In the olden days, you would write another server-side program to serve as a gateway between the web server software and your database. Instead of locating a file, the server software would pass on the request to your gateway; the gateway would then find the appropriate information in the database, generate a webpage containing that information, and pass it back.
Perhaps unsurprisingly, people started writing a whole lot of gateway programs to do specific things with specific databases. Each program might depend on implementation details of a particular web server in order to function: if two people wanted to do something similar, but they used different server software, they might have to write two separate gateways.
Eventually, the web community standardized the interface between web servers and gateways, so that any compliant gateway would be compatible with any compliant web server. This was named the Common Gateway Interface, or CGI. The name “cgit” is a portmanteau of “CGI” and “Git”—it’s a gateway program that uses Git as its database.
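To make that interface concrete, here’s a toy CGI program—my own sketch, not anything from cgit. The web server passes request details to the program through environment variables such as `QUERY_STRING`, and reads the response (headers, a blank line, then the body) from the program’s standard output:

```
#!/bin/sh
# A minimal "gateway" program: read the query string from the
# environment, write an HTTP response to stdout.
echo "Content-Type: text/html"
echo ""
echo "<p>You asked for: ${QUERY_STRING}</p>"
```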
FastCGI came along sometime later to address scaling issues with regular-CGI. With regular-CGI, every incoming web request spawns a new instance of the gateway program, which serves that one request and then terminates. Under high traffic, this approach can lead to latency (as each request waits for a process to spawn) and resource exhaustion (from running so many independent processes at once). With FastCGI, a smaller number of longer-running processes each handle multiple requests, which can be much more efficient.
Okay, back to fcgiwrap
As mentioned above, cgit is a CGI program. For my use case, the performance implications of CGI vs. FastCGI aren’t likely to be an issue. What is an issue is the fact that Nginx doesn’t support CGI—but it does support FastCGI. As you may have guessed by now, `fcgiwrap` is a wrapper for CGI programs: it spawns a persistent process that interacts with a FastCGI-compatible web server on one side and a regular-CGI program on the other.
The `fcgiwrap` service started automatically upon installation (check `sudo systemctl status fcgiwrap`), so I didn’t need to do anything else with this.
Configuring Nginx
Nginx is capable of serving multiple distinct websites from one machine. Each site gets its own `server { }` configuration block; typically, you would put each such block in its own file under `/etc/nginx/sites-available` and symlink each file to `/etc/nginx/sites-enabled`. This allows you to easily take individual websites down and put them back up again by removing and re-creating the symlink.
sudo touch /etc/nginx/sites-available/cgit
sudo ln -s /etc/nginx/sites-available/cgit /etc/nginx/sites-enabled/
Initially, my `cgit` configuration file looked like this:
server {
listen 80;
listen [::]:80;
These first two lines tell Nginx that any requests intended for this website should be expected on port 80—the standard port for HTTP traffic—for IPv4 and IPv6, respectively. I haven’t actually enabled IPv6 for my VM as of this writing; but if I ever do, I won’t need to change this file.
server_name git.rdnlsmith.com;
This line means that Nginx will only consider this file if the hostname portion of the request URL matches `git.rdnlsmith.com`. A request with any other hostname, even one that comes through port 80, will be handled by some other website configured in some other file.
root /var/www/cgit;
# First attempt to serve request as file (logo, css), then fall back to
# calling cgit (all pages).
try_files $uri @cgit;
The first line means that any literal files to be served will be found somewhere under `/var/www/cgit`. By default, the remainder of the path to each file should match the path portion of the request URL.
In my case, the only literal files I have are the cgit logo (`cgit.png`), favicon (`favicon.ico`), and CSS (`cgit.css`), each of which I copied from `/usr/share/cgit`. I intend to customize them eventually.
The second line (not counting the comments) means Nginx will check for a literal file matching the request URL first, and any requests that don’t map to a literal file will be handled by the `location { }` block labeled `@cgit`. Locations are normally identified with a regular expression that matches some part of the URL; the `@` syntax lets you identify a location block by name instead.
Finally:
location @cgit {
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME /usr/lib/cgit/cgit.cgi;
fastcgi_param PATH_INFO $uri;
fastcgi_param QUERY_STRING $args;
fastcgi_param HTTP_HOST $server_name;
fastcgi_pass unix:/run/fcgiwrap.socket;
}
}
The first line reads in the file `/etc/nginx/fastcgi_params`, which maps several Nginx variables to the corresponding FastCGI parameters. The next four override some select parameters, most notably setting the location where the cgit executable is installed. The last line tells Nginx where to find the running `fcgiwrap` process. I’ll admit that I didn’t expend much effort trying to understand these; as much as I usually like to make my own informed decisions rather than blindly copying from others, there’s a lot that’s new to me going into this project and this bit seems pretty innocuous.
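For reference, here’s the whole file at this stage, assembled from the fragments above:

```
server {
    listen 80;
    listen [::]:80;

    server_name git.rdnlsmith.com;

    root /var/www/cgit;

    # First attempt to serve request as file (logo, css), then fall back to
    # calling cgit (all pages).
    try_files $uri @cgit;

    location @cgit {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME /usr/lib/cgit/cgit.cgi;
        fastcgi_param PATH_INFO $uri;
        fastcgi_param QUERY_STRING $args;
        fastcgi_param HTTP_HOST $server_name;
        fastcgi_pass unix:/run/fcgiwrap.socket;
    }
}
```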
With the file created and symlinked, I tested that it was valid:
sudo nginx -t
…and re-loaded Nginx so it would take effect:
sudo nginx -s reload
Next, I needed to allow HTTP traffic through the firewall. Installing Nginx added three new entries to UFW’s app list:
daniel@git.rdnlsmith.com:~$ sudo ufw app list
Available applications:
Nginx Full
Nginx HTTP
Nginx HTTPS
OpenSSH
“Nginx HTTP” and “Nginx HTTPS” are pretty self-explanatory; “Nginx Full” combines both. For starters, I just enabled HTTP:
sudo ufw allow 'Nginx HTTP'
At this point, it was possible to visit <http://git.rdnlsmith.com> in a web browser and see cgit’s homepage, albeit with no actual content.
Adding Repositories
In its chapter on hosting, the book Pro Git depicts an example Git server where the repositories are kept in `/srv/git`. The Filesystem Hierarchy Standard seems to agree with this:
> **Purpose**
>
> `/srv` contains site-specific data which is served by this system.
>
> **Rationale**
>
> Th[e] main purpose of specifying this is so that users may find the location of the data files for a particular service … Data that is only of interest to a specific user should go in that user[’s] home directory. If the directory and file structure of the data is not exposed to consumers, it should go in `/var/lib`.
>
> The methodology used to name subdirectories of `/srv` is unspecified as there is currently no consensus on how this should be done. One method for structuring data under `/srv` is by protocol, eg. `ftp`, `rsync`, `www`, and `cvs`.
…so, I decided to put public repositories in `/srv/git` and private ones in `/home/daniel`.
I expect to be the only person ever to have write access to any repositories on this server, even the public ones, so I could have given ownership of `/srv/git` to `daniel`. Nonetheless, I wanted to do this “right.” I created a `git` group, and made myself a member:
sudo addgroup git
sudo usermod -aG git daniel
I had to `exit` and reconnect in order for my session to pick up the new group membership.
Then, I created the `/srv/git` directory, made `git` the owning group (leaving `root` as the owning user), and toggled the setgid bit so that any contents created therein would inherit the group ownership:
sudo mkdir /srv/git
sudo chgrp git /srv/git
sudo chmod g+s /srv/git
After that, I created an empty repository for each of my projects; for example:
git init --bare --shared iphoto-extractor.git
The `--bare` flag creates only the `.git` folder with no working directory, as is typical for server-side repositories. The `--shared` flag propagates the group ownership from the repository’s parent directory, though I’m not sure this is actually necessary since I already set the setgid bit.
Within each repository, in the `hooks` subdirectory, I saved a copy of the `post-receive` hook example from the cgit repository (with the `.agefile` extension removed). This enables cgit to inspect commit metadata whenever changes are pushed in order to calculate accurate age values—which are shown on the “summary” page for each repository, and a few other places—instead of trying to estimate them based on file modification timestamps.
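I won’t reproduce the hook exactly—see the cgit repository for the canonical version—but it amounts to a short shell script along these lines:

```
#!/bin/sh
# Record the date of the newest commit where cgit's "agefile"
# machinery looks for it by default.
agefile="$(git rev-parse --git-dir)"/info/web/last-modified

mkdir -p "$(dirname "$agefile")" &&
git for-each-ref \
    --sort=-authordate --count=1 \
    --format='%(authordate:iso8601)' \
    >"$agefile"
```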
I also created a symlink called `public` in my home directory that points to `/srv/git`, plus a directory named `private`. This lets me use SSH URLs with the form `git.rdnlsmith.com:public/repo-name` or `git.rdnlsmith.com:private/repo-name` instead of `git.rdnlsmith.com:/srv/git/repo-name`.
ln -s /srv/git ~/public
mkdir ~/private
On my local machine, I renamed the existing remote for each repository and added a new default remote pointing to my server.
cd ~/code/iPhotoExtractor
git remote rename origin github
git remote add origin git.rdnlsmith.com:public/iphoto-extractor.git
git push --all origin
Configuring cgit
cgit’s configuration file is `/etc/cgitrc`. The available options are described by `man cgitrc`. Unfortunately, it doesn’t give much indication as to which options you’re likely to need; but, so far, I haven’t needed much:
cache-size=1000
Any positive value here enables caching, so cgit won’t have to re-generate a recently-served page if someone else visits it (or the same person visits it again). To save disk space, cgit will start deleting the oldest cached pages if the number of entries reaches the configured number (1000).
You can also configure how long different types of pages should be served from a cached copy before the cache is considered stale and the page is re-generated anyways. I’ve kept the defaults: most pages can be cached for about five minutes; repository “about” pages for fifteen; commits indefinitely, since they’re immutable.
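For reference, the options in question and their documented defaults look something like this, as I read `man cgitrc` (I left them all unset):

```
#cache-root=/var/cache/cgit
#cache-dynamic-ttl=5     # most pages: five minutes
#cache-about-ttl=15      # "about" pages: fifteen minutes
#cache-static-ttl=-1     # pages tied to a specific commit: forever
```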
readme=:README.md
mimetype-file=/etc/mime.types
about-filter=/usr/lib/cgit/filters/about-formatting.sh
email-filter=/usr/lib/cgit/filters/email-gravatar.py
The first line above says to look for a root-level file named `README.md` in each repository and use its contents for the repository’s “about” tab. You can list this option more than once with different file names if you use different conventions from one repository to another; or you can configure it separately for each repository.
The second line tells cgit to use the file `/etc/mime.types`—commonly included in Linux distributions—to look up which MIME types to use for which file extensions. This is necessary in order for e.g. embedded pictures in the “about” pages to actually render as pictures.
The last two lines specify scripts to be run when generating “about” pages or when displaying contributor names, respectively. Both of these are included with cgit, but you can use your own custom scripts too.
`about-formatting.sh` checks if the “about” file is one of several common formats—Markdown, reStructuredText, a man page, a plain-text file—and runs it through an appropriate converter program so it will render nicely as HTML. I had to install the `python3-markdown` package for Markdown to work.
`email-gravatar.py` fetches the Gravatar image for each contributor’s email address and displays it beside their name wherever it appears. There’s also a Lua version, which is supposed to be faster, but the Python one worked fine for me and I didn't want to bother figuring out how to get the Lua script working.
You can use filters to enable syntax highlighting, as well. I’ve left this off for now, in keeping with my blog’s aesthetic.
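(For the record, the relevant knob is cgit’s `source-filter` option; the package ships highlighting filters alongside the ones above, so I’d expect enabling it to look something like this—the exact path is an assumption by analogy:)

```
#source-filter=/usr/lib/cgit/filters/syntax-highlighting.py
```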
enable-git-config=1
This lets me store repository-specific settings in each repository’s Git configuration file (`./config` in a bare repository, or `./.git/config` in one that has a working tree) instead of having a separate cgit configuration file in each. The only thing I’m using this for right now is to allow some of my projects to have their displayed names written differently than their URLs; for example, my iPhotoExtractor project has its name displayed in Pascal case (as is typical in the .NET ecosystem) but all of my repository URLs are in kebab case (`iphoto-extractor`).
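Concretely, that means a short stanza in the bare repository’s `config` file; the `cgit.name` key here is my reading of the `enable-git-config` documentation:

```
# /srv/git/iphoto-extractor.git/config
[cgit]
	name = iPhotoExtractor
```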
enable-http-clone=0
clone-url=https://git.rdnlsmith.com/$CGIT_REPO_URL.git
cgit supports cloning over HTTP via Git’s older, “dumb” HTTP protocol. This is on by default, but I chose to disable it in favor of the “smart” HTTP(S) protocol, which I’ll cover later on.
The second line sets the pattern for the clone URL(s) displayed at the bottom of each repository’s “summary” page. You can list multiple patterns separated by spaces if e.g. you support more than one protocol. Note, however, that listing these patterns is for display purposes only; it does not make those URLs actually work.
The variable `$CGIT_REPO_URL` contains the path to the repository relative to a configured root directory. In my case, the root directory is `/srv/git` and the path is just the repository name. This can be overridden with per-repository configuration.
remove-suffix=1
virtual-root=/
As is typical for bare repositories, my directories under `/srv/git` all have a `.git` suffix; e.g. `iphoto-extractor.git`. The `remove-suffix` option excludes that from the URL and the displayed name (if not overridden) for each repository. I set this because it looks pretty. Consequently, I needed to include `.git` at the end of my `clone-url` pattern above.
I’m honestly not totally sure why I need the `virtual-root` setting here. The man page implies I shouldn’t need it anymore if I’ve set my `PATH_INFO` CGI parameter correctly, which I think I have. Without it, though, relative links throughout the website started their paths too far up the hierarchy and, consequently, didn’t work.
scan-path=/srv/git
Finally, this tells cgit where to look for my repositories. This pretty much has to go last, because only the settings above this line will be applied to the repositories discovered here. It’s also possible to explicitly list out paths to individual repositories (with a different setting) if you don’t want cgit to scan for them.
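So, assembled in order, the whole of my `/etc/cgitrc`:

```
cache-size=1000

readme=:README.md
mimetype-file=/etc/mime.types
about-filter=/usr/lib/cgit/filters/about-formatting.sh
email-filter=/usr/lib/cgit/filters/email-gravatar.py

enable-git-config=1

enable-http-clone=0
clone-url=https://git.rdnlsmith.com/$CGIT_REPO_URL.git

remove-suffix=1
virtual-root=/

# scan-path stays last: only the settings above it apply to the
# repositories it discovers.
scan-path=/srv/git
```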
Enabling HTTPS and Read-Only Public Cloning
Enabling HTTPS requires two things. First, I needed to allow HTTPS traffic through the firewall:
sudo ufw allow 'Nginx HTTPS'
(You could skip this step by allowing “Nginx Full” in the first place, instead of starting with just HTTP.)
Secondly, I needed to obtain a TLS certificate from a widely-trusted certificate authority. The simplest way to do this is to get one from Let’s Encrypt using Certbot, which, once configured, will automatically obtain a certificate and renew it whenever necessary.
For Ubuntu, Certbot’s official instructions recommend installing it as a snap package:
sudo snap install core
sudo snap install --classic certbot
sudo ln -s /snap/bin/certbot /usr/bin/certbot
Then, run it and answer the prompts:
daniel@git.rdnlsmith.com:~$ sudo certbot --nginx -d git.rdnlsmith.com
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Enter email address (used for urgent renewal and security notices)
(Enter 'c' to cancel): daniel@rdnlsmith.com
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Please read the Terms of Service at
https://letsencrypt.org/documents/LE-SA-v1.3-September-21-2022.pdf. You must
agree in order to register with the ACME server. Do you agree?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(Y)es/(N)o: y
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Would you be willing, once your first certificate is successfully issued, to
share your email address with the Electronic Frontier Foundation, a founding
partner of the Let's Encrypt project and the non-profit organization that
develops Certbot? We'd like to send you email about our work encrypting the web,
EFF news, campaigns, and ways to support digital freedom.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(Y)es/(N)o: n
Account registered.
Requesting a certificate for git.rdnlsmith.com
Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/git.rdnlsmith.com/fullchain.pem
Key is saved at: /etc/letsencrypt/live/git.rdnlsmith.com/privkey.pem
This certificate expires on 2024-06-26.
These files will be updated when the certificate renews.
Certbot has set up a scheduled task to automatically renew this certificate in the background.
Deploying certificate
Successfully deployed certificate for git.rdnlsmith.com to /etc/nginx/sites-enabled/cgit
Congratulations! You have successfully enabled HTTPS on https://git.rdnlsmith.com
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
If you like Certbot, please consider supporting our work by:
* Donating to ISRG / Let's Encrypt: https://letsencrypt.org/donate
* Donating to EFF: https://eff.org/donate-le
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Certbot found the Nginx configuration file that contains the server block for `git.rdnlsmith.com` and made a few changes. It removed the lines
listen 80;
listen [::]:80;
that I had written originally, and inserted several lines at the end to listen for HTTPS traffic and use the certificate it acquired:
listen [::]:443 ssl ipv6only=on; # managed by Certbot
listen 443 ssl; # managed by Certbot
ssl_certificate /etc/letsencrypt/live/git.rdnlsmith.com/fullchain.pem; # managed by Certbot
ssl_certificate_key /etc/letsencrypt/live/git.rdnlsmith.com/privkey.pem; # managed by Certbot
include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
It also added another server block in the same file to catch HTTP traffic and redirect it to HTTPS:
server {
if ($host = git.rdnlsmith.com) {
return 301 https://$host$request_uri;
} # managed by Certbot
listen 80;
listen [::]:80;
server_name git.rdnlsmith.com;
return 404; # managed by Certbot
}
Git’s smart HTTP backend is another CGI executable, `git-http-backend`, which is included in Ubuntu’s `git` package. All I had to do to get it working was add another location block to my Nginx configuration, very similar to the one for cgit:
# Smart HTTP backend
location ~ \.git {
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME /usr/lib/git-core/git-http-backend;
fastcgi_param GIT_HTTP_EXPORT_ALL "";
fastcgi_param GIT_PROJECT_ROOT /srv/git;
fastcgi_param PATH_INFO $uri;
fastcgi_pass unix:/run/fcgiwrap.socket;
}
The `~` after `location` indicates a regular expression; `\.git` will match any URL containing `.git` (not just ending with it—clone requests use paths like `/dotnet-pgn.git/info/refs`, so the pattern must not be anchored to the end). So, the URL <https://git.rdnlsmith.com/dotnet-pgn> will be handled by cgit, but <https://git.rdnlsmith.com/dotnet-pgn.git> will be handled by `git-http-backend`.
The `GIT_HTTP_EXPORT_ALL` line creates an empty environment variable of the same name, instructing `git-http-backend` to serve any and all repositories it finds in the `GIT_PROJECT_ROOT` directory. Without this, I would have to create a file named `git-daemon-export-ok` within each repository that I wanted to make available.
Examples that I’ve seen also included `client_max_body_size 0;`, disabling the default size limit for incoming requests. However, this is to facilitate `git push`, and I only intend to push via SSH, so I left this out.
Now, I can clone repositories via HTTPS:
rdnlsmith@zephyr ~ $ git clone https://git.rdnlsmith.com/dotnet-pgn.git
Cloning into 'dotnet-pgn'...
remote: Enumerating objects: 25, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 25 (delta 3), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (25/25), 9.42 KiB | 3.14 MiB/s, done.
Resolving deltas: 100% (3/3), done.
…or HTTP, via redirect:
rdnlsmith@zephyr ~ $ rm -rf dotnet-pgn
rdnlsmith@zephyr ~ $ git clone http://git.rdnlsmith.com/dotnet-pgn.git
Cloning into 'dotnet-pgn'...
warning: redirecting to https://git.rdnlsmith.com/dotnet-pgn.git/
remote: Enumerating objects: 25, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 25 (delta 3), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (25/25), 9.42 KiB | 4.71 MiB/s, done.
Resolving deltas: 100% (3/3), done.
…but any attempt to push via HTTP(S) is rejected, as intended:
rdnlsmith@zephyr ~ $ cd dotnet-pgn
rdnlsmith@zephyr ~/dotnet-pgn [master ≡]$ vim README.md
rdnlsmith@zephyr ~/dotnet-pgn [master ≡ +0 ~1 -0 !]$ git commit -am "push test"
[master a5f14e0] push test
1 file changed, 2 insertions(+)
rdnlsmith@zephyr ~/dotnet-pgn [master ↑1]$ git push
fatal: unable to access 'https://git.rdnlsmith.com/dotnet-pgn.git/': The requested URL returned error: 403
Footnotes
SourceHut is still in a public alpha stage, and until that changes, paying for Git hosting is optional. Nonetheless, it wouldn’t feel right to me to use it without paying. ↩︎
The git.sr.ht module’s first commit actually incorporates some CSS from cgit (later customized). ↩︎
I refuse to call them “droplets.” ↩︎
The data transfer allowance actually accrues based on how much time the VM is active; 1000 GiB (and $6) assumes it’s running for the entire month. Usage in excess of your allowance is billed at $0.01 per 1 GiB. ↩︎
For the record: quoting verbatim from DigitalOcean’s website does not count as me “call[ing] them ‘droplets.’” ↩︎
In retrospect, this was risky. DNS resolvers can cache a negative result—as in, cache the fact that a (sub)domain doesn’t exist. In that case, trying to `ping git.rdnlsmith.com` before I created the record could have forced me to wait quite some time—possibly hours—before the follow-up test would have worked. ↩︎