So you are running some application in a pod on Kubernetes. The data is stored in a volume mounted into the pod. So far, so easy.
Now you want to do backups. If your storage solution (and application!) supports ReadWriteMany (“RWX”) volumes, it’s easy: you can start a Job that also mounts the volume and copies the data.
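For the RWX case, such a backup Job could look roughly like this (the names, the image, and the copy command are placeholders for whatever your setup uses):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: backup                  # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: backup
          image: alpine         # any image that has your copy tool
          command: ["sh", "-c", "cp -a /data/. /backup/"]
          volumeMounts:
            - name: data        # the application's RWX volume
              mountPath: /data
              readOnly: true
            - name: backup      # wherever the copy should go
              mountPath: /backup
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: app-data       # hypothetical claim name
        - name: backup
          persistentVolumeClaim:
            claimName: backup-target  # hypothetical claim name
```

This only works because the `data` volume can be mounted by the application Pod and the Job at the same time.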
But what if you are using ReadWriteOnce (“RWO”) volumes?
I have tried two ways of getting to files in a Pod from a Job:

- kubectl cp in the Job
- kubectl exec ... -- tar -C /volume -cp . in the Job
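For reference, the second approach is just a tar stream over exec; the same pipe can be tried locally without a cluster (the directory names here are invented for the demo):

```shell
# Local demonstration of the tar pipe, no cluster needed
mkdir -p /tmp/volume_demo /tmp/backup_demo
echo hello > /tmp/volume_demo/file.txt

# -C changes into the directory, -c creates an archive, -p keeps permissions;
# the second tar unpacks the stream on the receiving side
tar -C /tmp/volume_demo -cp . | tar -C /tmp/backup_demo -xp

# Against a real pod, the sending tar runs inside it:
#   kubectl exec mypod -- tar -C /volume -cp . | tar -C ./backup -xp
```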
Since copying a large number of files (or fewer files from a slow storage solution) takes a long time, both ways run into timeout issues. Neither way can pick up where it stopped; each has to copy all files from the beginning.
First, we will need to get an rsync “server” into our pod; I have created this Dockerfile that just contains rsync.
So we add this as a sidecar to our application Pod like so:
```yaml
containers:
  # ... the existing application container ...
  - name: rsync
    image: toelke158/docker-rsync
    volumeMounts:
      - name: home
        mountPath: /data
```
To reach this rsync, we can use kubectl exec; we just have to tell rsync to use it instead of ssh or the native rsync protocol.
I have developed¹ a flexible shell script that does exactly that.
It boils down to doing kubectl exec -i <pod> -- "$@" (where $@ means “all arguments given to the script”). rsync will call the script like <script> <pod> rsync --server ..., and kubectl will thus start exactly that command in the pod.
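A minimal sketch of the core idea, stripped down to a function (the real script linked above does more; the function name is made up here):

```shell
# rsync_over_kubectl: hypothetical minimal version of the wrapper.
# rsync invokes its "remote shell" as: <shell> <host> rsync --server ...
# so the first argument is the pod name; everything after it is the
# command to run inside the pod.
rsync_over_kubectl() {
    pod="$1"
    shift
    # -i keeps stdin open so the rsync protocol can flow in both directions
    kubectl exec -i "$pod" -- "$@"
}
```

In practice this lives in an executable file that is handed to rsync via its -e option.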
If you read the script, you will see that it supports specifying the rsync “server” in multiple ways:

- pod, if the pod is in the default namespace and the first container is the rsync container
- pod.container, to specify the container to access
- kind#pod.container, to connect e.g. to a Deployment; this is useful when the name of the Pod is not stable
- kind#pod.container@namespace, if you need to specify the namespace
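Taking such a target apart can be done with plain POSIX parameter expansion; this is only a sketch of the idea, and the actual script may do it differently:

```shell
# parse_target: split "[kind#]pod[.container][@namespace]" into its parts,
# falling back to kind "pod" and namespace "default".
parse_target() {
    t="$1"
    ns="${t#*@}"; [ "$ns" = "$t" ] && ns="default"   # part after @, if any
    t="${t%@*}"
    kind="${t%%#*}"; [ "$kind" = "$t" ] && kind="pod" # part before #, if any
    t="${t#*#}"
    container="${t#*.}"; [ "$container" = "$t" ] && container=""
    name="${t%%.*}"
    echo "$kind $name $container $ns"
}
```

For example, parse_target 'deployment#myapp.rsync@prod' yields the kind, pod name, container, and namespace as separate words.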