Docker Swarm with XNAT
Setting up Docker Swarm
A complete explanation of how to set up Docker Swarm is outside the scope of this document, but you can find some useful articles here:
https://scalified.com/2018/10/08/building-jenkins-pipelines-docker-swarm/
https://docs.docker.com/engine/swarm/swarm-tutorial/create-swarm/
https://docs.docker.com/engine/swarm/ingress/
Setting up with AWS:
https://semaphoreci.com/community/tutorials/bootstrapping-a-docker-swarm-mode-cluster
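At its core, though, a Swarm can be brought up with just two commands; a minimal sketch (the IP address is a placeholder for your manager node, and the join token is printed by the init command):
docker swarm init --advertise-addr 10.0.1.10
# On each worker node, using the token printed by the manager:
docker swarm join --token <worker-token> 10.0.1.10:2377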
Pipelines
XNAT uses pipelines to perform various processing tasks, most commonly converting one image format into another (DICOM to NIfTI, for example).
In the past this processing was handled on the instance as part of the XNAT application itself, then by a Docker server running on the instance, and finally by an external Docker server, accessed either directly or through Docker Swarm.
XNAT runs Docker-based pipelines through the Container Service plugin. When XNAT is deployed on Kubernetes, Docker MUST run externally, so Docker Swarm is used as it also provides load balancing.
Whilst the XNAT team work on a Kubernetes-based replacement for the Docker Swarm backend of the Container Service, Docker Swarm is the most appropriate stop-gap option.
Prerequisites
You will require the Docker API endpoint to be open remotely so that XNAT can reach it and send pipeline jobs to it. For security, this should be done via HTTPS (not HTTP).
The standard port is TCP 2376. With Docker Swarm enabled you can send jobs to any of the manager or worker nodes and Swarm will load balance them internally. I chose to use the manager node's IP and pointed DNS to it.
You should lock access to port 2376 down to the Kubernetes subnets that XNAT runs in, using firewalls or Security Group settings. You can also use an external load balancer with certificates, which may be preferred.
If the certificates are not issued by a known CA, you will need to add the certificates (server, CA and client) to your XNAT container build, so choosing a certificate from a known CA will make your life easier.
If you do use self-signed certificates, you will need to create a folder, add the certificates to it, and then specify that folder in the XNAT GUI under Administer > Plugin Settings > Container Server Setup > Edit Host Name. In our example case:
Certificate Path: /usr/local/tomcat/certs
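How the certificates end up in that folder depends on how you build your XNAT image. One possible sketch, assuming you maintain a custom image layered on top of your existing XNAT web image (the image names and certificate file names are placeholders):
mkdir certs && cp ca.pem cert.pem key.pem certs/
cat > Dockerfile <<'EOF'
FROM registry.example.com/xnat-web:latest
COPY certs/ /usr/local/tomcat/certs/
EOF
docker build -t registry.example.com/xnat-web:with-certs .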
Access from the Docker Swarm to the XNAT shared filesystem is required - at a minimum the archive and build directories. The AIS Helm chart doesn't set up /data/xnat/build by default, but without it Docker Swarm can't write the temporary files it needs and pipelines fail.
Setup DNS and external certificates
Whether you create self-signed certificates or ones verified by a public CA, you will need a fully qualified domain name to create them against.
I suggest you set an A record pointing to the manager node's IP address, or to a load balancer that fronts all nodes, then create the certificates against your FQDN - e.g. swarm.example.com.
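If you do go down the self-signed route, the certificates can be generated along the lines of the standard Docker TLS procedure. A sketch, assuming swarm.example.com and a manager IP of 10.0.1.10 (adjust names, IPs and validity to suit):
# CA key and certificate
openssl genrsa -aes256 -out ca-key.pem 4096
openssl req -new -x509 -days 365 -key ca-key.pem -sha256 -out ca.pem
# Server certificate for the Docker daemon
openssl genrsa -out server-key.pem 4096
openssl req -subj "/CN=swarm.example.com" -sha256 -new -key server-key.pem -out server.csr
echo "subjectAltName = DNS:swarm.example.com,IP:10.0.1.10" > extfile.cnf
openssl x509 -req -days 365 -sha256 -in server.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out server-cert.pem -extfile extfile.cnf
# Client certificate and key for XNAT
openssl genrsa -out key.pem 4096
openssl req -subj "/CN=client" -new -key key.pem -out client.csr
echo "extendedKeyUsage = clientAuth" > extfile-client.cnf
openssl x509 -req -days 365 -sha256 -in client.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out cert.pem -extfile extfile-client.cnf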
Allow remote access to Docker API endpoint on TCP 2376
To make Docker listen on port 2376, either edit the service unit file or create /etc/docker/daemon.json.
We will edit the Docker service file. Remember to specify whichever certificates you are using here; they will have been issued for your FQDN - in the example above, swarm.example.com.
systemctl edit docker
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2376 --tlsverify --tlscacert /root/.docker/ca.pem --tlscert /root/.docker/server-cert.pem --tlskey /root/.docker/server-key.pem -H unix:///var/run/docker.sock
systemctl restart docker
Repeat on all nodes. The Docker daemon is now listening remotely on TCP 2376.
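If you would rather use /etc/docker/daemon.json than the unit file, an equivalent sketch is below. Note that the host and TLS options must then be removed from ExecStart, otherwise the daemon will refuse to start because the same directives are set in both places.
cat > /etc/docker/daemon.json <<'EOF'
{
  "hosts": ["tcp://0.0.0.0:2376", "unix:///var/run/docker.sock"],
  "tlsverify": true,
  "tlscacert": "/root/.docker/ca.pem",
  "tlscert": "/root/.docker/server-cert.pem",
  "tlskey": "/root/.docker/server-key.pem"
}
EOF
systemctl restart docker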
Secure access to TCP port 2376
Add a firewall rule to only allow access to TCP port 2376 from the Kubernetes subnets.
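As an example with ufw (the subnet is a placeholder for your Kubernetes node or pod range; on AWS the equivalent would be a Security Group rule):
ufw allow from 10.100.0.0/16 to any port 2376 proto tcp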
Ensure Docker Swarm nodes have access to the XNAT shared filesystem
Without access to the archive shared filesystem, Docker cannot run any pipeline conversions - that much is obvious. Less obvious is that the XNAT Docker Swarm also needs access to the build shared filesystem, where it writes temporary job data before copying the results back to the archive on completion.
This presents a problem as the AIS Helm Chart does not come with a persistent volume for the Build directory, so we need to create one.
Create a volume outside the Helm chart and then present it in your values file. In this example I created a custom storage class. Make sure accessMode is ReadWriteMany so that the Docker Swarm nodes can access the volume.
volumes:
  build:
    accessMode: ReadWriteMany
    mountPath: /data/xnat/build
    storageClassName: "custom-class"
    volumeMode: Filesystem
    persistentVolumeReclaimPolicy: Retain
    persistentVolumeClaim:
      claimName: "build-xnat-xnat-web"
    size: 10Gi
You need to create the custom-class StorageClass and apply it first, or the volume won't be created. In this case, create a file called storageclass.yaml with the following contents:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: custom-class
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
You can then apply it:
kubectl apply -f storageclass.yaml
Of course, you may want to use an existing StorageClass, in which case this step is unnecessary; it is only an example.
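For reference, the Kubernetes volume file applied in the next step could look something like this minimal sketch, which assumes the shared filesystem is an NFS export (server and path are placeholders) and matches the claim name used in the values file above:
cat > build-pv.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: build-xnat-xnat-web
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: custom-class
  volumeMode: Filesystem
  nfs:
    server: nfs.example.com
    path: /export/xnat/build
EOF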
Apply the Kubernetes volume file first and then apply the Helm chart and values file. You should now see something like the following:
kubectl get -nxnat pvc,pv
NAME                                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/archive-xnat-xnat-web      Bound    archive-xnat-xnat-web                      10Gi       RWX            custom-class   5d1h
persistentvolumeclaim/build-xnat-xnat-web        Bound    build-xnat-xnat-web                        10Gi       RWX            custom-class   5d1h
persistentvolumeclaim/cache-xnat-xnat-web-0      Bound    pvc-b5b72b92-d15f-4a22-9b88-850bd726d1e2   10Gi       RWO            gp2            5d1h
persistentvolumeclaim/prearchive-xnat-xnat-web   Bound    prearchive-xnat-xnat-web                   10Gi       RWX            custom-class   5d1h
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                           STORAGECLASS   REASON   AGE
persistentvolume/archive-xnat-xnat-web                      10Gi       RWX            Retain           Bound    xnat/archive-xnat-xnat-web      custom-class            5d1h
persistentvolume/build-xnat-xnat-web                        10Gi       RWX            Retain           Bound    xnat/build-xnat-xnat-web        custom-class            5d1h
persistentvolume/prearchive-xnat-xnat-web                   10Gi       RWX            Retain           Bound    xnat/prearchive-xnat-xnat-web   custom-class            5d1h
persistentvolume/pvc-b5b72b92-d15f-4a22-9b88-850bd726d1e2   10Gi       RWO            Delete           Bound    xnat/cache-xnat-xnat-web-0      gp2                     5d1h
As you can see, the build directory is now a mounted volume. You are now ready to mount the volumes on the Docker swarm nodes.
Depending on how you presented your shared filesystem, simply create the directories on the Docker Swarm worker nodes and the manager (if the manager is also a worker), add the mounts to /etc/fstab and mount the volumes.
To make your life easier, use the same file structure for the mounts - i.e. the build volume mounted at /data/xnat/build and the archive volume mounted at /data/xnat/archive. If you don't do this, you will need to specify the Docker Swarm mounted XNAT directories in the XNAT GUI.
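For example, assuming an NFS server backs the shared filesystem (server name and export paths are placeholders), on each Swarm node:
mkdir -p /data/xnat/archive /data/xnat/build
cat >> /etc/fstab <<'EOF'
nfs.example.com:/export/xnat/archive  /data/xnat/archive  nfs  defaults,_netdev  0 0
nfs.example.com:/export/xnat/build    /data/xnat/build    nfs  defaults,_netdev  0 0
EOF
mount -a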
Add your Docker Swarm to XNAT Plugin Settings
You can read about the various options in the official XNAT documentation on their website here:
https://wiki.xnat.org/container-service/installing-and-enabling-the-container-service-in-xnat-126156821.html
https://wiki.xnat.org/container-service/configuring-a-container-host-126156926.html
In the XNAT GUI, go to Administer > Plugin Settings > Container Server Setup and under Docker Server setup select > New Container host.
In the example above, the host name would be swarm.example.com, the URL would be https://swarm.example.com:2376 and the certificate path would be /usr/local/tomcat/certs. As previously mentioned, it is preferable to use certificates from a public CA so that no certificate path needs to be specified here at all.
Set Swarm Mode to "ON".
You will need to configure Path Translation if you DIDN'T mount the XNAT directories at the same paths on the Docker Swarm nodes.
The other options are optional.
Once applied, make sure that Status shows "Up". The Image Hosts section should also now show a status of Up.
You can now start adding your Images & Commands in the Administer > Plugin Settings > Images & Commands section.
Troubleshooting
If you have configured Docker to listen on port 2376 but the status says Down, check that you can reach the port with telnet or netcat, first locally and then remotely. From one of the nodes:
nc -zv 127.0.0.1 2376
or
telnet 127.0.0.1 2376
If that works, try remotely from a location that has firewall ingress access. Continuing the earlier example:
nc -zv swarm.example.com 2376
telnet swarm.example.com 2376
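If the port is reachable but the status stays Down, you can test the TLS handshake and the Docker API itself with curl. A sketch, assuming the client certificates follow the usual Docker naming convention of ca.pem, cert.pem and key.pem in the certificate path:
curl --cacert /usr/local/tomcat/certs/ca.pem --cert /usr/local/tomcat/certs/cert.pem --key /usr/local/tomcat/certs/key.pem https://swarm.example.com:2376/version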
Make sure the correct ports are open and accessible on the Docker swarm manager:
The network ports required for a Docker Swarm to function correctly are:
TCP port 2376 for secure Docker client communication. This port is required for Docker Machine to work. Docker Machine is used to orchestrate Docker hosts.
TCP port 2377. This port is used for communication between the nodes of a Docker Swarm or cluster. It only needs to be opened on manager nodes.
TCP and UDP port 7946 for communication among nodes (container network discovery).
UDP port 4789 for overlay network traffic (container ingress networking).
Make sure the Docker service is started on all Docker Swarm nodes.
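A few quick checks on the manager node (the first two should report "active", and every node should show a STATUS of Ready in the last):
systemctl is-active docker
docker info --format '{{.Swarm.LocalNodeState}}'
docker node ls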
If Status is Up but container automations are failing, confirm that the archive AND build shared filesystems are properly mounted on all servers - XNAT and the Docker Swarm nodes. A Failed (Rejected) status for a pipeline is most likely due to this.
In this case, because the service can't even be created, it won't live long enough for you to inspect it with the usual:
docker service ls
command followed by examining the service in question. Instead, stop the Docker service on the Swarm node and start it in the foreground, using our service example from above:
dockerd -H tcp://0.0.0.0:2376 --tlsverify --tlscacert /root/.docker/ca.pem --tlscert /root/.docker/server-cert.pem --tlskey /root/.docker/server-key.pem -H unix:///var/run/docker.sock
Then upload some DICOMs and watch the processing run in the foreground.
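If the service does get created but its tasks fail, the following may help pinpoint the error (a mount failure, for example); take the service name from the first command:
docker service ls
docker service ps --no-trunc <service-name>
docker service logs <service-name>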
Docker Swarm admin guide:
https://docs.docker.com/engine/swarm/admin_guide/