With our tickstore filling up every day with all the BTC-USD and BTC-USDT trades from Coinbase and Binance, it's time to create a research environment. For this part of the project I wanted a combination of Jupyter Lab, with its integrated Git plugin, plus a way to share the RAID store of ticks on the local PC with the research environment. Docker and Docker Compose once again figure prominently in the solution.
Docker for data science
The Jupyter project offers a large collection of Docker images via its docker-stacks repository. I chose the datascience-notebook image as a base. It gives you a complete Linux image plus a local Anaconda install with Jupyter enabled.
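If you want to kick the tires on the base image before customizing anything, it runs standalone with a single command; browse to the tokenized URL it prints on startup:

docker run --rm -p 8888:8888 jupyter/datascience-notebook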
The key idea here is that everything you want ready-to-go should be set up in the Dockerfile: all extensions installed, the Git repository cloned, any pip installs done, etc. This is consistent with the Docker philosophy: your Dockerfile is essentially a record of all customization, so rather than documenting lots of setup steps, you encode all that knowledge in the Dockerfile itself. The first run will take a while as it downloads the image and builds all the extensions, but subsequent runs are super-quick.
Here's the complete Dockerfile:
FROM jupyter/datascience-notebook

# git is needed both for cloning the repo below and for the JupyterLab Git extension
USER root
RUN apt update && apt install --yes git
USER $NB_USER

# install extra Python packages and JupyterLab extensions; --no-build defers
# the webpack build so "jupyter lab build" runs only once, at the end
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt && \
    jupyter labextension install @jupyterlab/git --no-build && \
    jupyter labextension install nbdime-jupyterlab --no-build && \
    jupyter labextension install beakerx-jupyterlab --no-build && \
    jupyter serverextension enable --py jupyterlab_git && \
    jupyter lab build

# pre-clone the mono-repo so the notebooks are available on first launch
WORKDIR /home/jovyan/work
RUN git clone https://github.com/cloudwall/serenity.git

CMD ["start.sh", "jupyter", "lab", "--notebook-dir=/home/jovyan/work/serenity"]
plus a requirements.txt containing some useful packages and extensions not included in the baseline:
beakerx
cufflinks
dask
jupyterlab-git
pandas
py4j
quandl
scipy
tables
Again, any time we find we need a new package or extension in the notebook, we don't just go to the terminal and "pip install" it: to make the change permanent, we update requirements.txt and rebuild our research environment.
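As a concrete sketch, assuming the Dockerfile lives at docker/jupyterlab/Dockerfile with the repository root as the build context (the same layout the Compose file below uses), the rebuild is one command:

docker build -f docker/jupyterlab/Dockerfile -t jupyterlab .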
Compose as one-click launcher
To make the Dockerfile usable we'll need to map ports and also map the container's /behemoth directory to our local host's tickstore location. We could write a shell script to do that, but in keeping with our philosophy of encoding all the environment knowledge in Docker, let's use a small Docker Compose file instead. That way, if in the future we want to add a Dask cluster, a MongoDB or some other containerized process, we can just add it to the Compose YAML to knit it all together:
version: '2.1'
services:
  jupyterlab:
    build:
      context: .
      dockerfile: ./docker/jupyterlab/Dockerfile
    container_name: "jupyterlab"
    ports:
      - 8888:8888
    volumes:
      - ${BEHEMOTH_HOME:-/mnt/raid/data/behemoth}:/behemoth
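From a terminal, bringing the whole environment up (and rebuilding the image if anything changed) is now a one-liner; the Jupyter images print a tokenized login URL to the container log:

docker-compose up --build -d
docker logs jupyterlab    # copy the http://127.0.0.1:8888/?token=... URL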
Alternatively, in PyCharm Professional we can right-click and run the Compose file; either way, we will see a new jupyterlab container running alongside our marketdata recorders and job scheduler:
And if we browse the Linux container's filesystem, we can see that the default /home/jovyan/work directory has the Serenity mono-repo checked out from Git, and our sample notebooks under serenity/serenity-research/notebooks are right there:
All that remains is to launch it in the browser, and we have our research notebook!
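A quick sanity check from the Docker host confirms the volume mapping came through; you should see the tickstore's top-level directories under /behemoth:

docker exec jupyterlab ls /behemoth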
Integrating Git
One of the reasons I chose the newer Jupyter Lab over Jupyter is its Git integration. Since we've created our Docker image with a pre-cloned Git repository, we can immediately open the Git interface in the browser and see the files that have changed and that we've staged for commit:
You can also "git pull" and "git push" right in the browser.
Don't like the extra step of staging a change? Enable Git -> Simple staging in the menu, and every change will be staged automatically.
You can even display diffs vs. the last checkpoint (to see what's unsaved) or vs. Git HEAD once you save your update:
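Those notebook-aware diffs come from nbdime, whose lab extension we installed in the Dockerfile. Assuming the nbdime Python package is also present in the image (if it isn't pulled in as a dependency of jupyterlab-git, add it to requirements.txt), the same structural diff works from a terminal too; the filenames here are hypothetical:

nbdiff notebooks/strategy_v1.ipynb notebooks/strategy_v2.ipynb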