Poor planning and neglect
It wasn't that I wasn't warned: I chose a too-small data volume for my scuttlebutt pub server and let it fill up, thus crashing my server (perhaps we can talk about that as a stability issue in ssb later). As I'm frequently disenchanted with Scuttlebutt (I don't really have any friends there), I've left the broken deployment sitting unaddressed in my kubernetes cluster for too long, so it's time to either fix it or tear it down. Let's try to fix it.
Goals and intentions
I'm writing this post as I attempt to revive my broken pub. For starters, to halt the endless CrashLoopBackOff, I've opted to delete my server's deployment, leaving the persistent volume claim in place. One of the best things about kubernetes, and to be honest one of the bigger mind-benders for me, is how casual I can be about bringing up and tearing down resources. As long as I'm not dealing with persistence (in this case my persistent volume), it's safe to tear down a deployment knowing I can bring it back up into a prescribed state with a single command. To be clear up front, my goal is not to resize this volume - that is not currently supported by DigitalOcean - but to rescue from the existing volume the files that make up the identity of the pub server. As I understand it, this means the private key and the gossip.json files.
Step 1. - Deploy the "rescue" image
My plan as of this second is to define a rescue deployment, a simple container where I can mount the volume, exec in, and pull the files down to my local machine for redeployment. I've decided to go with a vanilla Debian Buster image and to name it ssb-rescue. This I'm putting in a kubernetes deployment that looks like this:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: ssb-rescue-deployment
  namespace: scuttlebutt
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ssb-rescue
  template:
    metadata:
      labels:
        app: ssb-rescue
    spec:
      containers:
      - name: ssb-rescue
        image: debian:buster
        imagePullPolicy: Always
        livenessProbe:
          exec:
            command:
            - "sh"
            - "true"
          initialDelaySeconds: 180
          periodSeconds: 300
My intention here is to deploy a single Debian container in the same namespace as my scuttlebutt deployment, with a liveness probe that periodically runs a trivial shell command to keep the container from exiting. I've put this in a file called rescue.yaml and, as I just made this up, let's give it a try:
$ kubectl apply -f rescue.yaml
deployment.apps/ssb-rescue-deployment created
$ kubectl get pods -n scuttlebutt
NAME READY STATUS RESTARTS AGE
ssb-rescue-deployment-68775dc5c4-wlslc 0/1 CrashLoopBackOff 1 13s
Well, crap. That didn't work at all. I reached out to a more experienced friend, who suggested I use a command to keep the container running. So I added this to my container definition:
command: ["/bin/bash", "-c", "--"]
args: ["while true; do sleep 30; done;"]
Now, the container starts and runs just fine...
$ kubectl apply -f rescue.yaml
deployment.apps/ssb-rescue-deployment created
$ kubectl get pods -n scuttlebutt
NAME READY STATUS RESTARTS AGE
ssb-rescue-deployment-dcf487bcf-wkqxj 1/1 Running 0 3m54s
Perfect. Now I want to mount the old pub data volume to this container by adding a volumes section to my deployment, and a volumeMounts section to my container specification.
First I'll define the volume, using the name of the existing PersistentVolumeClaim:
volumes:
- name: ssb-pub-volume
  persistentVolumeClaim:
    claimName: ssb-pvc
And then I'll add a volumeMount to the container spec:
volumeMounts:
- name: ssb-pub-volume
  mountPath: "/data/ssb-old"
Now I can exec into the pod and see the contents of the old volume, including the files I want to move to the new volume.
$ kubectl exec -it <your pod name> -n scuttlebutt /bin/bash
root@pod:/# ls /data/ssb-old
blobs config ebt gossip.json lost+found node_modules secret
blobs_push db flume gossip.json~ manifest.json oooo
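While I'm in here, it's also easy to confirm that the old volume really is out of space, since a full disk is what caused this mess in the first place. A quick df (just a sketch; the reported filesystem and numbers will differ for you):
root@pod:/# df -h /data/ssb-old
It should show the 5Gi volume at or very near 100% use.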
Perfect! Now I'm ready to create a new volume with a (much) larger capacity onto which I can move the pertinent files.
Step 2. - Create and mount a new volume
So far, I'm feeling pretty good about this. Like I said, I'm kinda making this up as I go along, and I'm thus far chuffed that it's working out. Since this is all done using kubernetes manifests, all of our work is saved in files that can be committed to source control. Since I already had my pub server's manifests in source control, this seems like a good time to commit.
Now that we have a usable Debian container deployed, I have an easy way to copy files from the old volume to the new one. Next, we need to create that new volume. My previous volume was only 5GB, so I've decided to multiply that by 10 to 50GB, which at current DigitalOcean pricing will be $5 a month. Not bad, and hopefully I'll learn something about pruning scuttlebutt databases by the time I fill this one up.
I've already been down this road once before, so creating the volume in kubernetes on DigitalOcean is a solved problem. I just want to update the volume's storage size, so I'll copy the old definition and make that one change. I'll do this in a file called volume.yaml.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ssb-pvc-extended
  namespace: scuttlebutt
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: do-block-storage
I can apply this file, and that is all that is needed to create the new storage volume.
$ kubectl apply -f volume.yaml
persistentvolumeclaim/ssb-pvc-extended created
$ kubectl get pvc -n scuttlebutt
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ssb-pvc Bound pvc-bdc11612-a0ba-11e9-bb99-de33b94a578b 5Gi RWO do-block-storage 57d
ssb-pvc-extended Bound pvc-94bca086-cde2-11e9-a30b-4671334a8dc3 50Gi RWO do-block-storage 4s
Here you can see the older, smaller volume as well as the newer, larger volume. I will add the new volume to my deployment just as I did the older one, mounting it at /data/ssb-new. Just like before, I define the volume:
volumes:
- name: ssb-pub-volume-new
  persistentVolumeClaim:
    claimName: ssb-pvc-extended
And then I add a volumeMount to the container spec:
volumeMounts:
- name: ssb-pub-volume-new
  mountPath: "/data/ssb-new"
This should be all it takes to mount the newly created volume. I'll update my cluster by once again applying my manifest, then we should see the new, empty volume in place.
$ kubectl apply -f rescue.yaml
deployment.apps/ssb-rescue-deployment configured
$ kubectl get pods -n scuttlebutt
NAME READY STATUS RESTARTS AGE
ssb-rescue-deployment-7bcfd9c466-vk57x 1/1 Running 0 91s
$ kubectl exec -it ssb-rescue-deployment-7bcfd9c466-vk57x -n scuttlebutt /bin/bash
root@pub:/# ls /data
ssb-new ssb-old
Step 3. - Copy files!
After all of that work, this hardly seems like a step at all! Now that both volumes are mounted on our handy rescue server, it's just a matter of copying relevant files from one volume to the other.
root@pub:/# cp /data/ssb-old/secret /data/ssb-new/
root@pub:/# cp /data/ssb-old/gossip.json /data/ssb-new/
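In hindsight, cp -p (or cp -a) would also have preserved the files' modes and ownership, which seems prudent since the secret file is normally created with restrictive permissions. Either way, before tearing anything down it's cheap to confirm the copies match and the permissions look sane:
root@pub:/# sha256sum /data/ssb-old/secret /data/ssb-new/secret
root@pub:/# sha256sum /data/ssb-old/gossip.json /data/ssb-new/gossip.json
root@pub:/# ls -l /data/ssb-new
If the checksums agree, the copies are byte-for-byte identical.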
Whew! Ok, now...
Step 4. - Bring up the pub server
Ok, so before I redeploy my pub server, I'm going to need to take down my rescue server deployment. This is easy enough by issuing a delete command against the deployment.
$ kubectl delete deployment ssb-rescue-deployment -n scuttlebutt
deployment.extensions "ssb-rescue-deployment" deleted
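Deleting the deployment shouldn't touch either PersistentVolumeClaim, but it's cheap to double-check that both volumes are still there before going any further:
$ kubectl get pvc -n scuttlebutt
Both ssb-pvc and ssb-pvc-extended should still be listed as Bound, just as before.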
To be clear, I'm going to leave the old volume in place until I'm confident that the new volume is working and that my pub server is back up and humming along. This way if my assumptions regarding exactly which files need to be moved to the new volume are incorrect, I can try again.
To use the new volume, I only need to update my existing pub server deployment manifest. In my case, this just means pointing my persistentVolumeClaim at the new claim.
volumes:
- name: ssb-pub-volume
  persistentVolumeClaim:
    claimName: ssb-pvc-extended
And now the spooky part, let's try bringing the pub server back up.
$ kubectl apply -f deployment.yaml
namespace/scuttlebutt unchanged
service/ssb-pub unchanged
persistentvolumeclaim/ssb-pvc unchanged
configmap/ssb-config unchanged
deployment.apps/ssb-pub-deployment created
As you can see, the bulk of my configuration hasn't changed. At the beginning of this exercise I deleted my deployment, which brought down the server pod, but the other resources stayed in place. I can now look and see that my pod is running.
$ kubectl get pods -n scuttlebutt
NAME READY STATUS RESTARTS AGE
ssb-pub-deployment-f7965fbff-tn572 1/1 Running 0 77s
Right away I see my pod is back up and running, and Patchwork, my scuttlebutt client, is reporting the same! After a few minutes of waiting, I can see my pub is rebuilding its database and beginning to relay messages. If I understand the protocol correctly, the pub will take some time to rebuild its database by requesting messages from me and its one other follower.
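If you'd rather watch that happen than take the client's word for it, tailing the pod's logs is enough (using the pod name from the listing above):
$ kubectl logs -f ssb-pub-deployment-f7965fbff-tn572 -n scuttlebutt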
Success!
I look forward to seeing my pub settle into itself over the next few hours, but overall I feel pretty good about this exercise. I intend to leave the old volume in place for a few days just to make sure I didn't miss anything before deleting it. If I learn anything new in the process, I'll be sure to share it here.
Until we meet again, and happy coding!