These are notes on deploying the Roundup Docker image on an odroid running the lightweight Kubernetes (k8s) environment provided by k3s.
The image is obtained from: https://hub.docker.com/r/rounduptracker/roundup-development under the multi tag. This image is built for i386, amd64 (aka x86_64), armv7, and the critical arch for this example: arm64.
The odroid c4 computer is running Debian bullseye/sid.
k3s was installed using the quick start method.
I used a demo-mode deployment with SQLite as the database. It has 3 replicas, all sharing the same Persistent Volume (PV) and running on one node. (1)
Getting started
I created (updated 2023-12-26):
- Persistent Volume Claim using the local-path storage class.
- Service of type LoadBalancer
- Deployment of the Roundup image with a volume mounted at /usr/src/app/tracker.
The deployment can also export secrets stored under the roundup-secret Secret to /usr/src/app/tracker/secrets/.... These can be used with file://../secrets/secret_name in config.ini (see the example after this list).
- The root filesystem is mounted read-only. Because the root filesystem is read-only, /tmp is an emptyDir mounted read-write to allow file uploads.
- a horizontal pod autoscaler scaling between 2 and 5 replicas.
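As a sketch of the secrets piece: the secret name roundup-secret matches the YAML below, but the key name session_key and the config.ini option are hypothetical placeholders.

# Create the (optional) secret consumed by the deployment below.
# 'session_key' is a hypothetical key name; each key in the secret
# becomes a file under /usr/src/app/tracker/secrets/.
sudo kubectl create secret generic roundup-secret \
    --from-literal=session_key="$(head -c 32 /dev/urandom | base64)"

# A config.ini option can then point at the exported file, e.g.:
#   some_option = file://../secrets/session_key
# (the path is relative to the tracker home directory)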
Here is the single YAML file roundup-demo-deployment.yaml with the four parts:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: roundup-demo-pvc
  namespace: default
  labels:
    app: roundup-demo
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Service
metadata:
  name: roundup-demo
  labels:
    app: roundup-demo
spec:
  ports:
    - name: "8080"
      port: 8917
      targetPort: 8080
  selector:
    app: roundup-demo
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: roundup-demo
  labels:
    app: roundup-demo
  namespace: default
spec:
  # comment out replicas due to using autoscaling group
  # replicas: 3
  minReadySeconds: 30
  selector:
    matchLabels:
      app: roundup-demo
  #strategy:
  #  type: Recreate
  template:
    metadata:
      labels:
        app: roundup-demo
    spec:
      # add to make secrets files readable to roundup group.
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        # I need fsGroup for secrets only. tracker dir can be left alone.
        # Try to prevent the equivalent of a
        #   'find tracker -exec chgrp 1000 \{}'
        # down the tracker subdir that can be deep.
        fsGroupChangePolicy: "OnRootMismatch"
      containers:
        - name: roundup-demo
          image: rounduptracker/roundup-development:multi
          imagePullPolicy: Always
          args: ['demo']
          ports:
            - name: roundup-demo
              containerPort: 8080
          resources:
            # limits:
            #   cpu: 500m
            #   memory: "52428800"
            requests:
              cpu: 500m
              memory: "20971520"
          readinessProbe:
            httpGet:
              path: /demo/
              port: roundup-demo
            failureThreshold: 30
            periodSeconds: 10
            successThreshold: 2
          volumeMounts:
            - name: trackers
              mountPath: /usr/src/app/tracker
            - name: secret-volume
              mountPath: /usr/src/app/tracker/secrets
              readOnly: true
            # required for readOnlyRootFilesystem securityContext
            - name: tmp-scratch
              mountPath: /tmp
          securityContext:
            readOnlyRootFilesystem: true
      volumes:
        - name: trackers
          persistentVolumeClaim:
            claimName: roundup-demo-pvc
        - name: tmp-scratch
          emptyDir: {}
        - name: secret-volume
          secret:
            secretName: roundup-secret
            optional: true
            # octal 0400 -> dec 256; 0440 -> 288
            # for some reason even 256 becomes mode 0440 with
            # the fsGroup security context. Without securityContext
            # it's mode 0400.
            defaultMode: 288
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  labels:
    app: roundup-demo
  name: roundup-demo
spec:
  maxReplicas: 5
  metrics:
    - resource:
        name: cpu
        target:
          averageUtilization: 70
          type: Utilization
      type: Resource
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: roundup-demo
It was deployed using: sudo kubectl create -f roundup-demo-deployment.yaml.
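Optionally, kubectl can wait for the rollout to finish:

# Block until the deployment's pods are up and ready.
sudo kubectl rollout status deployment/roundup-demo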
Once it was deployed and a pod was active, I used sudo kubectl get pods to get the name of one of the pods. Then I ran sudo kubectl exec -it pod/roundup-demo-578d6c65d8-v2wmr -- sh to get a shell. I edited the tracker's config.ini, replacing localhost with the hostname of the odroid. Then I restarted all the pods using sudo kubectl rollout restart deployment/roundup-demo and watched the pods get recycled:
NAME                            READY   STATUS        RESTARTS   AGE
roundup-demo-7fb6bcb5b9-l6w59   1/1     Running       0          36s
roundup-demo-7fb6bcb5b9-w7sg8   1/1     Running       0          33s
roundup-demo-657796ff66-9jm68   1/1     Running       0          3s
roundup-demo-7fb6bcb5b9-4kj8l   1/1     Terminating   0          30s
roundup-demo-657796ff66-gv7bx   0/1     Pending       0          0s
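For reference, the in-pod config edit amounts to something like this (a sketch; the demo tracker's config path /usr/src/app/tracker/demo/config.ini and the hostname odroid_name are assumptions):

# Inside the pod (via kubectl exec ... -- sh): point the tracker at
# the real host instead of localhost.
sed -i 's/localhost/odroid_name/g' /usr/src/app/tracker/demo/config.ini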
Then I could connect to 'http://odroid_name:8917/demo/' (8917 is the service port defined above) and interact with the tracker.
I also set the same app label on all of the k8s objects, so I could run:
% sudo kubectl get pvc,pv,deployments,pods,service,hpa -l app=roundup-demo
NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/roundup-demo-pvc   Bound    pvc-7d5808e0-5ce8-45ee-a794-f5cdc3f1c677   2Gi        RWO            local-path     15h

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                      STORAGECLASS   REASON   AGE
persistentvolume/pvc-7d5808e0-5ce8-45ee-a794-f5cdc3f1c677   2Gi        RWO            Delete           Bound    default/roundup-demo-pvc   local-path              15h

NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/roundup-demo   3/3     3            3           15h

NAME                                READY   STATUS    RESTARTS   AGE
pod/roundup-demo-657796ff66-9jm68   1/1     Running   0          89s
pod/roundup-demo-657796ff66-gv7bx   1/1     Running   0          86s
pod/roundup-demo-657796ff66-m77wn   1/1     Running   0          83s

NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/roundup-demo   LoadBalancer   10.43.175.253   172.23.1.28   8917:32564/TCP   15h

NAME                                               REFERENCE                 TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/roundup-demo   Deployment/roundup-demo   2%/70%    2         5         2          5h
to list all of the resources:
- Persistent Volume Claim
- Persistent Volume
- Deployment
- Pods (three replicas)
- Service (LoadBalancer)
- Horizontal Pod Autoscaler
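The shared label also makes teardown a one-liner; a sketch (the bound PV is cleaned up automatically because its reclaim policy is Delete):

# Delete everything created above by label; pods go away with the
# deployment, and the PV is reclaimed when the PVC is deleted.
sudo kubectl delete deployments,service,hpa,pvc -l app=roundup-demo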
At one point, I added interfaces.py to the tracker. It was missing an import. When I restarted the rollout, all three pods errored. Since there is no way to kubectl run an image with attached volumes, I had to create a Pod declaration:
apiVersion: v1
kind: Pod
metadata:
  name: roundup-demo-edit
  labels:
    app: roundup-demo-edit
  namespace: default
spec:
  containers:
    - name: roundup-demo-edit
      image: rounduptracker/roundup-development:multi
      imagePullPolicy: Always
      args: ['shell']
      stdin: true
      tty: true
      volumeMounts:
        - name: trackers
          mountPath: /usr/src/app/tracker
  volumes:
    - name: trackers
      persistentVolumeClaim:
        claimName: roundup-demo-pvc
Note the use of stdin and tty to do the equivalent of docker run -it or kubectl run -it for the deployed pod. Without these settings, the container exits because shell mode requires an interactive tty. Using sudo kubectl create -f demo-edit.yaml I was able to start a running pod with the PV attached. I then ran sudo kubectl attach -it roundup-demo-edit and edited the broken interfaces.py. One of the broken pods was trying to restart and did come up, but the other two were still in a failed state. However, even the one working pod made the demo tracker accessible.
I then restarted the rollout and all three came up.
This shouldn't have been needed: when Roundup crashed, one of the working pods should have been left running. However, at the time I didn't have minReadySeconds: 30 set up. I think k3s restarted the first pod, and it didn't crash until after all of the rest of the pods had been recycled as well. This left me without any working pods 8-(.
I was also missing the readinessProbe at the time. The latest update above fixes this.
Database Integrity (what's that??)
One thing to note is that I am playing fast and loose with the data. SQLite has locks and other mechanisms to prevent data loss when multiple processes access the same database. These work on a single file on disk, and I believe they also work with a local-path volume provider. If I were using NFS or another network/shared-disk provider, I think there would be a higher chance of data corruption. If your Roundup servers are running on multiple systems, you should use journal (DELETE) mode, not the default WAL mode: the shared mmap'ed memory segment required for WAL mode can't be shared across systems.
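For example, checking and switching the journal mode with the sqlite3 CLI (which, as noted below, is not in the image; run it from a debug container or against a copy of the file) might look like this. The db path is an assumption based on the backup command below:

# Show the current journal mode of the demo tracker's database.
sqlite3 /usr/src/app/tracker/demo/db/db 'PRAGMA journal_mode;'

# Switch to rollback-journal (DELETE) mode for shared-disk setups.
sqlite3 /usr/src/app/tracker/demo/db/db 'PRAGMA journal_mode=DELETE;'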
If you run multiple replicas in an HA config across multiple nodes, you should (must) use MySQL or PostgreSQL.
The way that Roundup stores file data is to get a file id number from the db and use that to name the stored file. Each file is written once and then only read. If the db is working, there should never be two Roundup processes trying to overwrite the same file. This is similar to Courier's maildir or MH's mail folder handling, which were designed to work on NFS.
Backups
For backups I have resorted to running kubectl exec against one of the pods. For example:
% sudo kubectl exec roundup-demo-7bfdf97595-2zsfm -- tar -C /usr/src/app -cf - tracker | tar -xvf - --wildcards 'tracker/demo/db/db*'
can create a copy of the db files. The entire tarfile could also just be captured for backup purposes. It would probably be a good idea to use the sqlite3 command (which is not in the docker image 8-() to make a consistent copy of the db files and back those up. See https://www.sqlite.org/backup.html and https://www.sqlite.org/forum/info/2ea989bbe9a6dfc8 for other ideas, including .dump.
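A consistent-copy sketch using sqlite3 (the db path and output locations are placeholder assumptions):

# Online, consistent copy of the live database file.
sqlite3 /usr/src/app/tracker/demo/db/db ".backup /tmp/db.backup"

# Or a portable SQL text dump.
sqlite3 /usr/src/app/tracker/demo/db/db .dump > /tmp/db.sql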
Creating a CronJob that does the export is a future idea; a rough sketch follows.
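A minimal sketch of what such a CronJob might look like, assuming a separate (hypothetical) roundup-demo-backup-pvc to hold the archives. Note that with a ReadWriteOnce volume the job has to land on the same node as the tracker pods:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: roundup-demo-backup
  labels:
    app: roundup-demo
spec:
  schedule: "15 3 * * *"   # daily at 03:15
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: backup
              image: rounduptracker/roundup-development:multi
              # Crude whole-tree archive; a consistent sqlite3 .backup
              # would be better once the CLI is available in the image.
              command: ["tar", "-C", "/usr/src/app", "-czf",
                        "/backup/tracker-backup.tar.gz", "tracker"]
              volumeMounts:
                - name: trackers
                  mountPath: /usr/src/app/tracker
                  readOnly: true
                - name: backup
                  mountPath: /backup
          volumes:
            - name: trackers
              persistentVolumeClaim:
                claimName: roundup-demo-pvc
            - name: backup
              persistentVolumeClaim:
                claimName: roundup-demo-backup-pvc   # hypothetical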
Debugging - running pod using kubectl debug
At one point I needed to check whether the sqlite db was in WAL mode. However, the image is missing the sqlite3 CLI, there is no way to run as root inside a pod container (docker exec -it -u root is not a thing in the k8s world), and the root filesystem was mounted read-only. So I ran a debugging image using:
sudo kubectl-superdebug roundup-demo-7bfdf97595-czckt -t roundup-demo -I alpine:latest
from https://github.com/JonMerlevede/kubectl-superdebug, described at https://medium.com/datamindedbe/debugging-running-pods-on-kubernetes-2ba160c47ef5. This creates an ephemeral container that you attach to. In the ephemeral container you run as root, so you can apk add sqlite 8-). This container shares the pod's resources, which allows additional investigation if needed.
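Stock kubectl can do something similar with an ephemeral debug container. A sketch: the pod name is from above, --target names the app container, and the target's filesystem is reached through /proc because the PV is not mounted in the debug container:

# Attach an ephemeral alpine container sharing the target container's
# process namespace.
sudo kubectl debug -it roundup-demo-7bfdf97595-czckt \
    --image=alpine:latest --target=roundup-demo -- sh

# Inside the debug container: install sqlite and inspect the db via
# the target process's root filesystem. Pid 1 is an assumption; check
# 'ps' for the actual roundup-server pid.
apk add sqlite
sqlite3 /proc/1/root/usr/src/app/tracker/demo/db/db 'PRAGMA journal_mode;'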
You can also spin up a new pod similar to my edit pod above.
Debugging - k3s failing to start
I had issues with k3s not starting on a new system. It reported:
failed to find cpu cgroup (v2)
To try to fix it, I followed https://rootlesscontaine.rs/getting-started/common/cgroup2/#enabling-cpu-cpuset-and-io-delegation:
The relevant steps from that page: if /sys/fs/cgroup/cgroup.controllers is present, you are already using cgroup v2, otherwise v1. Enabling cgroup v2 requires kernel 4.15 or later (5.2 or later is recommended), and delegating controllers to non-root users requires a recent systemd (244 or later is recommended). To boot the host with cgroup v2, add the following string to the GRUB_CMDLINE_LINUX line in /etc/default/grub and then run sudo update-grub:

systemd.unified_cgroup_hierarchy=1

By default, a non-root user can only get the memory and pids controllers delegated:

$ cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers
memory pids

To allow delegation of other controllers such as cpu, cpuset, and io, run:

sudo mkdir -p /etc/systemd/system/user@.service.d
cat <<EOF | sudo tee /etc/systemd/system/user@.service.d/delegate.conf
[Service]
Delegate=cpu cpuset io memory pids
EOF
sudo systemctl daemon-reload

After changing the systemd configuration, re-login or (better) reboot the host.
but this did nothing, probably because I was still running a 4.9 kernel and the cpu controller needs a 4.15 kernel per https://www.man7.org/linux/man-pages/man7/cgroups.7.html.
(1) The PV is set to ReadWriteOnce. This prevents another node from mounting the volume. However, due to a feature/bug in PVs, multiple pods can use the same PV even if it is ReadWriteOnce, as long as they run on the same node.
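Newer Kubernetes releases add a stricter ReadWriteOncePod access mode that closes this loophole by limiting the volume to a single pod. Using it here would (intentionally) break the multi-replica sharing; a sketch of the PVC change:

spec:
  accessModes:
    - ReadWriteOncePod   # stricter mode: only one pod may mount the volume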