Fixing My Broken Scuttlebutt Pub Server in Kubernetes

Poor planning and neglect

It's not that I wasn't warned: I chose a too-small data volume for my scuttlebutt pub server and let it fill up, which crashed the server (perhaps we can talk about that as a stability issue in ssb later). Since I'm frequently disenchanted with Scuttlebutt (I don't really have any friends there), I've left the broken deployment sitting unaddressed in my kubernetes cluster for far too long, so it's time to either fix it or tear it down. Let's try to fix it.

Goals and intentions

I'm writing this post as I attempt to revive my broken pub. For starters, to halt the endless CrashLoopBackOff, I've opted to delete my server's deployment, leaving the persistent volume claim in place. One of the best things about kubernetes, and to be honest one of the bigger mind-benders for me, is how casual I can be about bringing up and tearing down resources. As long as I'm not dealing with persistence (in this case my persistent volume), it's safe to tear down a deployment knowing I can bring it back up into a prescribed state with a single command. So it's understood upfront: my goal is not to resize this volume - that isn't currently supported by DigitalOcean - but to rescue the files from the existing volume that make up the identity of the pub server. As I understand it, this means the secret file (the pub's private key) and gossip.json.
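For the record, halting the crash loop was just a matter of deleting the pub's deployment while leaving the PVC and everything else in place; roughly this, with the deployment name matching the one that gets recreated later in the post:

$ kubectl delete deployment ssb-pub-deployment -n scuttlebutt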

Step 1. - Deploy the "rescue" image

My plan as of this second is to define a rescue deployment, a simple container where I can mount the volume, exec in and pull the files down to my local machine for redeployment. I've decided to go with a vanilla Debian Buster image and to name it ssb-rescue. This I'm putting in a kubernetes deployment that looks like this:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: ssb-rescue-deployment
  namespace: scuttlebutt
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ssb-rescue
  template:
    metadata:
      labels:
        app: ssb-rescue
    spec:
      containers:
        - name: ssb-rescue
          image: debian:buster
          imagePullPolicy: Always
          livenessProbe:
            exec:
              command:
                - "sh"
                - "-c"
                - "whoami"
            initialDelaySeconds: 180
            periodSeconds: 300

My intention here is to run a single Debian container in the same namespace as my scuttlebutt deployment, with a liveness probe that periodically calls whoami to keep the container from exiting. I've put this in a file called rescue.yaml, and since I just made it up, let's give it a try:

$ kubectl apply -f rescue.yaml
deployment.apps/ssb-rescue-deployment created
$ kubectl get pods -n scuttlebutt
NAME                                     READY   STATUS             RESTARTS   AGE
ssb-rescue-deployment-68775dc5c4-wlslc   0/1     CrashLoopBackOff   1          13s

Well, crap. That didn't work at all. In hindsight it makes sense: a liveness probe doesn't keep a container alive, and with nothing else to run, the image's default bash command exits immediately, so kubernetes keeps restarting it. I reached out to a more experienced friend who suggested I give the container a long-running command instead. So I added this to my container definition:

command: ["/bin/bash", "-c", "--"]
args: ["while true; do sleep 30; done;"]

Now, the container starts and runs just fine...

$ kubectl apply -f rescue.yaml
deployment.apps/ssb-rescue-deployment created
$ kubectl get pods -n scuttlebutt
NAME                                    READY   STATUS    RESTARTS   AGE
ssb-rescue-deployment-dcf487bcf-wkqxj   1/1     Running   0          3m54s

Perfect. Now I want to mount the old pub data volume into this container by adding a volumes section to the pod spec in my deployment, and a volumeMounts section to my container definition.

First I'll define the volume, using the name of the existing PersistentVolumeClaim:

volumes:
  - name: ssb-pub-volume
    persistentVolumeClaim:
      claimName: ssb-pvc

And then I'll add a volumeMount to the container spec:

volumeMounts:
  - name: ssb-pub-volume
    mountPath: "/data/ssb-old"

Now I can exec into the pod and see the contents of the old volume, including the files I want to move to the new volume.

$ kubectl exec -it <your pod name> -n scuttlebutt /bin/bash
root@pod:/# ls /data/ssb-old
blobs       config  ebt    gossip.json   lost+found node_modules  secret
blobs_push  db      flume  gossip.json~  manifest.json  oooo

Perfect! Now I'm ready to create a new volume with a (much) larger capacity onto which I can move the pertinent files.
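Before moving on, while I still have a shell in the rescue pod, it's worth confirming that the old volume really is as full as I think. This check is my own addition rather than something from my original notes:

root@pod:/# df -h /data/ssb-old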

Step 2. - Create and mount a new volume

So far, I'm feeling pretty good about this. Like I said, I'm kinda making this up as I go along, and I'm thus far chuffed that it's working out. Since this is all done with kubernetes manifests, everything is saved in files that can be committed to source control. I already keep my pub server's manifests in source control, so this seems like a good time to commit.

Now that I have a usable Debian container deployed, I have an easy way to copy files from the old volume to the new one. Next I need to create that new volume. My previous volume was only 5GB, so I've decided to multiply that by ten to 50GB, which at current DigitalOcean pricing works out to $5 a month. Not bad, and hopefully I'll learn something about pruning scuttlebutt databases by the time I fill this one up.

I've been down this road once before, so the basics of creating a volume in kubernetes on DigitalOcean are already solved. I just need to copy the old definition, give it a new name, and bump the storage request. I'll do this in a file called volume.yaml.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ssb-pvc-extended
  namespace: scuttlebutt
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: do-block-storage

I can apply this file, and that is all that is needed to create the new storage volume.

$ kubectl apply -f volume.yaml
persistentvolumeclaim/ssb-pvc-extended created
$ kubectl get pvc -n scuttlebutt
NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
ssb-pvc            Bound    pvc-bdc11612-a0ba-11e9-bb99-de33b94a578b   5Gi        RWO            do-block-storage   57d
ssb-pvc-extended   Bound    pvc-94bca086-cde2-11e9-a30b-4671334a8dc3   50Gi       RWO            do-block-storage   4s

Here you can see the older, smaller volume alongside the newer, larger one. I'll add the new volume to my rescue deployment just as I did the old one, mounting it at /data/ssb-new. Just like before, I define the volume:

volumes:
  - name: ssb-pub-volume-new
    persistentVolumeClaim:
      claimName: ssb-pvc-extended

And then I add a volumeMount to the container spec:

volumeMounts:
  - name: ssb-pub-volume-new
    mountPath: "/data/ssb-new"

This should be all it takes to mount the newly created volume. I'll update my cluster by once again applying my manifest, then we should see the new, empty volume in place.

$ kubectl apply -f rescue.yaml
deployment.apps/ssb-rescue-deployment configured
$ kubectl get pods -n scuttlebutt
NAME                                     READY   STATUS    RESTARTS   AGE
ssb-rescue-deployment-7bcfd9c466-vk57x   1/1     Running   0          91s
$ kubectl exec -it ssb-rescue-deployment-7bcfd9c466-vk57x -n scuttlebutt /bin/bash
root@pod:/# ls /data
ssb-new ssb-old

Step 3. - Copy files!

After all of that work, this hardly seems like a step at all! With both volumes mounted on our handy rescue server, it's just a matter of exec-ing back in and copying the relevant files from one volume to the other.

root@pod:/# cp /data/ssb-old/secret /data/ssb-new/
root@pod:/# cp /data/ssb-old/gossip.json /data/ssb-new/
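As a quick sanity check (my own addition, not strictly part of the rescue), I like to confirm the copies landed and that the secret file, which is the pub's private key, kept its restrictive permissions:

root@pod:/# ls -l /data/ssb-new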

Whew! Ok, now...

Step 4. - Bring up the pub server

Ok, so before I redeploy my pub server, I'm going to need to take down my rescue server deployment. That's easy enough: just issue a delete command against the deployment.

$ kubectl delete deployment ssb-rescue-deployment -n scuttlebutt
deployment.extensions "ssb-rescue-deployment" deleted

To be clear, I'm going to leave the old volume in place until I'm confident that the new volume is working and that my pub server is back up and humming along. This way if my assumptions regarding exactly which files need to be moved to the new volume are incorrect, I can try again.

To use the new volume, I only need to update my existing pub server deployment manifest. In my case, that means pointing the volume's persistentVolumeClaim at the new claim. Since the volume keeps its name (ssb-pub-volume) inside the pod, the container's volumeMounts section doesn't change at all; only the claimName does.

volumes:
  - name: ssb-pub-volume
    persistentVolumeClaim:
      claimName: ssb-pvc-extended

And now for the spooky part: let's try bringing the pub server back up.

$ kubectl apply -f deployment.yaml
namespace/scuttlebutt unchanged
service/ssb-pub unchanged
persistentvolumeclaim/ssb-pvc unchanged
configmap/ssb-config unchanged
deployment.apps/ssb-pub-deployment created

As you can see, the bulk of my configuration hasn't changed. At the beginning of this exercise I deleted the deployment, which brought down the server pod, but the namespace, service, old PVC, and configmap all stayed in place. Now I can check that my pod is running.

$ kubectl get pods -n scuttlebutt
NAME                                 READY   STATUS    RESTARTS   AGE
ssb-pub-deployment-f7965fbff-tn572   1/1     Running   0          77s

Right away I see my pod is back up and running, and Patchwork, my scuttlebutt client, reports the same! After a few minutes of waiting, I can see my pub is rebuilding its database and beginning to relay messages. If I understand the protocol correctly, that rebuild will take some time, since the pub has to request messages back from me and its one other follower.
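If I want to keep an eye on the rebuild, tailing the pod's logs is the easiest way, using the pod name from the output above:

$ kubectl logs -f ssb-pub-deployment-f7965fbff-tn572 -n scuttlebutt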

Success!

I look forward to seeing my pub settle into itself over the next few hours, but overall I feel pretty good about this exercise. I intend to leave the old volume in place for a few days just to make sure I didn't miss anything before deleting it. If I learn anything new in the process, I'll be sure to share it here.
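When that day comes, cleanup should be as simple as deleting the old claim, assuming nothing else still references it; something along these lines:

$ kubectl delete pvc ssb-pvc -n scuttlebutt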

Until we meet again, and happy hacking!
