In Research-to-Go with Docker we used Docker and Docker Compose to create a Serenity research environment. Unfortunately, all good things must come to an end, and what's described in that previous article currently does not work: Jupyter Lab has moved on to the 2.x series, the Git extension has not kept up, and the Docker Compose setup no longer boots. I decided to take the chance to redo it using a custom Docker image that's fully in my control rather than the stock Jupyter Docker image, and to switch to Kubernetes to manage it at runtime.
Updating the Dockerfile
The first change is to the Dockerfile. The base image changes from the Jupyter Docker stack to the more basic python:3.8-slim-buster image used by the rest of Serenity. Some functions handled in the Jupyter "base" Dockerfile also get pushed down into ours, e.g. creation of a "serenity" user and home directory:
FROM python:3.8-slim-buster
USER root
RUN apt update && apt install --yes git nodejs npm
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt && \
    jupyter labextension install @jupyterlab/git --no-build && \
    jupyter labextension install @jupyterlab/plotly-extension --no-build && \
    jupyter labextension install nbdime-jupyterlab --no-build && \
    jupyter serverextension enable --py jupyterlab_git && \
    jupyter lab build
RUN groupadd -r serenity && useradd --create-home --no-log-init -r -g serenity serenity
USER serenity
RUN mkdir -p /home/serenity/dev/shadows
WORKDIR /home/serenity/dev/shadows
RUN git clone https://github.com/cloudwall/serenity.git
CMD ["jupyter", "lab", "--ip=*", "--port=8888", "--no-browser", "--notebook-dir=/home/serenity/dev/shadows/serenity"]
With this done we can build, tag and push:
cd serenity/serenity-research
sudo docker build -t serenity-jupyterlab:2020.04.12-2 .
sudo docker tag serenity-jupyterlab:2020.04.12-2 \
cloudwallcapital/serenity-jupyterlab:2020.04.12-2
sudo docker push cloudwallcapital/serenity-jupyterlab:2020.04.12-2
creating a new tag in the cloudwallcapital/serenity-jupyterlab repository.
You can now run it with:
$ sudo docker run cloudwallcapital/serenity-jupyterlab:2020.04.12-2
[I 14:58:26.081 LabApp] Writing notebook server cookie secret to /home/serenity/.local/share/jupyter/runtime/notebook_cookie_secret
[W 14:58:26.273 LabApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 14:58:26.827 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.8/site-packages/jupyterlab
[I 14:58:26.827 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
[I 14:58:27.452 LabApp] Serving notebooks from local directory: /home/serenity/dev/shadows/serenity
[I 14:58:27.452 LabApp] The Jupyter Notebook is running at:
[I 14:58:27.452 LabApp] http://f2c0257c98ba:8888/?token=0a03d9e4153a59a77a76d3ef41b2b1b0d8bb2dbda6556793
[I 14:58:27.452 LabApp] or http://127.0.0.1:8888/?token=0a03d9e4153a59a77a76d3ef41b2b1b0d8bb2dbda6556793
[I 14:58:27.452 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 14:58:27.454 LabApp]
To access the notebook, open this file in a browser:
file:///home/serenity/.local/share/jupyter/runtime/nbserver-1-open.html
Or copy and paste one of these URLs:
http://f2c0257c98ba:8888/?token=0a03d9e4153a59a77a76d3ef41b2b1b0d8bb2dbda6556793
or http://127.0.0.1:8888/?token=0a03d9e4153a59a77a76d3ef41b2b1b0d8bb2dbda6556793
This confirms that everything is working before we move on to the Kubernetes step.
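Note that the plain docker run above does not publish the container's port, so the token URLs in the log are only reachable from inside the container. If you want to poke at the UI from the host's browser first, something like this, with the standard port mapping, should work:

sudo docker run -p 8888:8888 cloudwallcapital/serenity-jupyterlab:2020.04.12-2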
Long-running Jupyter with Kubernetes Deployment
The Kubernetes piece requires a few components:
- a storage mapping for read-only access to the Behemoth tick store
- a Deployment for the container itself
- a NodePort Service to expose the container to the outside world
We start with storage, which uses the ReadOnlyMany access mode (unlike the recorder), and create a claim for the serenity-jupyterlab app to use this storage:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: serenity-jupyterlab-pv-volume
  labels:
    type: local
    app: serenity-jupyterlab
spec:
  storageClassName: behemoth-read-sc
  capacity:
    storage: 50Gi
  accessModes:
    - ReadOnlyMany
  hostPath:
    path: "/mnt/raid/data/behemoth"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: serenity-jupyterlab-pv-claim
  labels:
    app: serenity-jupyterlab
spec:
  storageClassName: behemoth-read-sc
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 50Gi
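Once these are applied (the apply step is shown further down), it's worth a quick check that the claim actually binds to the volume; something like:

sudo microk8s.kubectl get pv serenity-jupyterlab-pv-volume
sudo microk8s.kubectl get pvc serenity-jupyterlab-pv-claim

Both should report a STATUS of Bound, otherwise the Pod will sit in Pending.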
The Deployment runs a single instance pinned to a specific version of the Lab software to ensure stability, having learned our lesson with the previous Jupyter stack. We also reference the claim via the behemoth-volume volume and mount it at /behemoth for use:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: serenity-jupyterlab
  labels:
    app: serenity-jupyterlab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: serenity-jupyterlab
  template:
    metadata:
      labels:
        app: serenity-jupyterlab
    spec:
      containers:
        - name: serenity-jupyterlab
          image: cloudwallcapital/serenity-jupyterlab:2020.04.12-2
          ports:
            - containerPort: 8888
          volumeMounts:
            - mountPath: /behemoth
              name: behemoth-volume
      volumes:
        - name: behemoth-volume
          persistentVolumeClaim:
            claimName: serenity-jupyterlab-pv-claim
Finally, we need to expose our Jupyter Lab instance to the world, mapping the Pod's internal port 8888 listener to node port 30888:
apiVersion: v1
kind: Service
metadata:
  name: serenity-jupyterlab-nodeport
  labels:
    app: serenity-jupyterlab
spec:
  type: NodePort
  ports:
    - port: 8888
      nodePort: 30888
  selector:
    app: serenity-jupyterlab
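All three of these YAML files can then be applied with microk8s.kubectl; a minimal sketch, assuming they are saved as storage.yaml, deployment.yaml and service.yaml (filenames are my own choice):

sudo microk8s.kubectl apply -f storage.yaml
sudo microk8s.kubectl apply -f deployment.yaml
sudo microk8s.kubectl apply -f service.yaml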
Once all three are applied, the net result should look something like this:
$ sudo microk8s.kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
binance-recorder-67954bbdf8-ppjxc     1/1     Running   0          15h
coinbase-recorder-5dc6bc495b-p7khr    1/1     Running   0          15h
phemex-recorder-69b84bc9bc-ww56c      1/1     Running   0          15h
scheduler-69fb554d55-bl46l            1/1     Running   0          15h
serenity-jupyterlab-fdb94c444-vpftc   1/1     Running   0          15h
timescaledb-648d59b7f5-gnd5w          1/1     Running   1          15h
We'll need to get the token, but we can grab that quickly with the logs command:
$ sudo microk8s.kubectl logs serenity-jupyterlab-fdb94c444-vpftc | egrep "127.0.0.1:8888"
Paste that token into http://<node IP>:30888 in the browser, and we are back up and running, this time with an always-running Jupyter Lab and a stable deployment mechanism.