Using rsync to copy files to and from a Kubernetes pod
So you are running some application in a pod on Kubernetes. The data is stored in a volume mounted into the pod. So far, so easy.
Now you want to do backups. If your storage solution (and application!) support ReadWriteMany volumes, it’s easy: you can start a Job that also mounts the volume and copies the data.
But what if you are using ReadWriteOnce (“RWO”) volumes, which only a single node can mount at a time?
I have tried two ways of getting to files in a Pod from a job:
- running kubectl cp in the job
- running kubectl exec ... -- tar -C /volume -cp . in the job
Since copying a large number of files (or fewer files from a slow storage solution) takes a long time, both ways run into timeouts. Neither can pick up where it stopped; both need to copy all files from the beginning.
rsync was designed to solve this exact problem: copying files with the ability to restart and to pick up changes over time.
There is a reason multiple solutions exist for backing up using rsync…
First, we need to get an rsync “server” into our pod; I have created this Dockerfile that just contains the rsync binary.
So we add this as a sidecar to our application Pod like so:
    containers:
      ...
      - name: rsync
        image: toelke158/docker-rsync
        volumeMounts:
          - name: home
            mountPath: /data
To reach this rsync, we can use kubectl exec; we just have to tell rsync to use it instead of ssh or the native rsync protocol.
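As a rough illustration of what “telling rsync to use it” can look like, here is a minimal sketch. It assumes a wrapper script exists at the placeholder path ./krsync-rsh and that the pod is called my-app-0; both names are made up for this example. The flags --rsh, --partial and --blocking-io are standard rsync options; whether --blocking-io is needed may depend on your setup.

```shell
#!/bin/sh
# Sketch only: pull /data from the rsync sidecar through a remote-shell wrapper.
# KRSYNC_RSH is a placeholder path to a wrapper script, not a real tool.
KRSYNC_RSH=${KRSYNC_RSH:-./krsync-rsh}

pull_from_pod() {
    pod=$1     # the name rsync will hand to the wrapper as "host"
    dest=$2    # local directory that receives the copy

    # --archive keeps permissions and timestamps, --partial keeps
    # half-transferred files so an interrupted run can resume.
    rsync --archive --partial --blocking-io \
        --rsh="$KRSYNC_RSH" \
        "$pod:/data/" "$dest"
}
```

Calling pull_from_pod my-app-0 ./backup/ a second time after an interruption should only transfer what is still missing or has changed.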
I have developed[1] a flexible shell script that does exactly that.
It boils down to doing kubectl exec -i <pod> -- "$@" (where "$@" means “all remaining arguments given to the script”). rsync will call the script like <script> <pod> rsync --server ..., and kubectl will thus start exactly that command in the pod.
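To make the mechanism concrete, here is a minimal sketch of such a wrapper. It is not the actual script: it only handles the simplest case, where the “host” that rsync passes as the first argument is the pod name, and the function name krsh is invented for this example.

```shell
#!/bin/sh
# Minimal remote-shell wrapper sketch for rsync; not the real script.
# rsync calls its remote shell as: <script> <host> rsync --server ...
# Here the "host" is taken to be the pod name.
krsh() {
    pod=$1
    shift    # "$@" now holds the command rsync wants to run remotely

    # -i keeps stdin open so rsync can talk to its counterpart in the pod.
    kubectl exec -i "$pod" -- "$@"
}

# Act as the wrapper only when rsync passed us arguments.
if [ "$#" -gt 0 ]; then
    krsh "$@"
fi
```

Saved as an executable file, this could be handed to rsync via --rsh; namespaces and containers are deliberately left out here.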
If you read the script, you will see that it supports specifying the rsync “server” in multiple ways:
- pod if the pod is in the default namespace and the first container is the rsync container.
- pod.container to specify the container to access.
- kind#pod or kind#pod.container to connect to e.g. a Deployment; this is useful when the name of the Pod is not stable.
- pod@namespace, kind#pod@namespace or kind#pod.container@namespace if you need to specify the namespace.
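The forms above could map onto kubectl arguments roughly as in the following sketch. This is an illustration of the naming scheme only, not the logic of the real script: the function name kubectl_args_for is invented, and rsync itself may split off the part around the @ before a wrapper ever sees it, so the real script likely receives its input differently.

```shell
#!/bin/sh
# Illustrative parser for "kind#pod.container@namespace" style targets.
# Prints the resulting kubectl arguments, one per line. Hypothetical helper.
kubectl_args_for() {
    spec=$1
    namespace=
    kind=
    container=

    # Optional "@namespace" suffix.
    case $spec in
        *@*) namespace=${spec##*@}; spec=${spec%@*} ;;
    esac
    # Optional "kind#" prefix, e.g. "deployment#my-app".
    case $spec in
        *"#"*) kind=${spec%%"#"*}; spec=${spec#*"#"} ;;
    esac
    # Optional ".container" suffix.
    case $spec in
        *.*) container=${spec#*.}; spec=${spec%%.*} ;;
    esac

    if [ -n "$namespace" ]; then printf '%s\n' --namespace "$namespace"; fi
    printf '%s\n' exec -i
    if [ -n "$container" ]; then printf '%s\n' --container "$container"; fi
    if [ -n "$kind" ]; then
        printf '%s\n' "$kind/$spec"    # kubectl accepts TYPE/NAME here
    else
        printf '%s\n' "$spec"
    fi
}
```

Using kind/name relies on kubectl exec accepting a resource such as deployment/my-app and picking one of its pods, which is what makes this form useful when pod names change.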
In the README of this GitHub repository, I show a complete example of getting the data and then using restic to actually store the backup.
[1] And with “developed” I mean, of course, blatantly copied from StackOverflow and adapted…