Data science with JupyterLab
Docker and JupyterLab are two powerful tools that can enhance your data science workflow. In this guide, you will learn how to use them together to create and run reproducible data science environments. This guide is based on Supercharging AI/ML Development with JupyterLab and Docker.
In this guide, you'll learn how to:
- Run a personal Jupyter Server with JupyterLab on your local machine
- Customize your JupyterLab environment
- Share your JupyterLab notebook and environment with other data scientists
What is JupyterLab?
JupyterLab is an open source application built around the concept of a computational notebook document. It enables sharing and executing code, data processing, visualization, and offers a range of interactive features for creating graphs.
Why use Docker and JupyterLab together?
By combining Docker and JupyterLab, you can benefit from the advantages of both tools, such as:
- Containerization ensures a consistent JupyterLab environment across all deployments, eliminating compatibility issues.
- Containerized JupyterLab simplifies sharing and collaboration by removing the need for manual environment setup.
- Containers offer scalability for JupyterLab, supporting workload distribution and efficient resource management with platforms like Kubernetes.
Prerequisites
To follow along with this guide, you must install the latest version of Docker Desktop.
Run and access a JupyterLab container
In a terminal, run the following command to start your JupyterLab container.
$ docker run --rm -p 8889:8888 quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
The following are the notable parts of the command:

- -p 8889:8888: Maps port 8889 from the host to port 8888 on the container.
- start-notebook.py --NotebookApp.token='my-token': Sets a known access token rather than using a random token.

For more details, see the Jupyter Server Options and the docker run CLI reference.
If this is the first time you are running the image, Docker will download and run it. The amount of time it takes to download the image will vary depending on your network connection.
After the image downloads and runs, you can access the container. To access the container, in a web browser navigate to localhost:8889/lab?token=my-token.
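If you want to confirm from a script that the server is up before opening the browser, you can request the lab URL with Python's standard library. This is an optional check, not part of the original guide; it assumes the container started above is still running.

import urllib.request

# Request the JupyterLab page using the token set with --NotebookApp.token.
url = "http://localhost:8889/lab?token=my-token"
with urllib.request.urlopen(url) as response:
    # A 200 status means the server is reachable.
    print(response.status)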
To stop the container, in the terminal press ctrl+c.
To access an existing notebook on your system, you can use a bind mount. Open a terminal and change directory to where your existing notebook is located. Then, run the following command based on your operating system.

Mac / Linux:

$ docker run --rm -p 8889:8888 -v "$(pwd):/home/jovyan/work" quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'

Windows (Command Prompt):

$ docker run --rm -p 8889:8888 -v "%cd%":/home/jovyan/work quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'

Windows (PowerShell):

$ docker run --rm -p 8889:8888 -v "$(pwd):/home/jovyan/work" quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'

Windows (Git Bash):

$ docker run --rm -p 8889:8888 -v "/$(pwd):/home/jovyan/work" quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
The -v option tells Docker to mount your current working directory to /home/jovyan/work inside the container. By default, the Jupyter image's root directory is /home/jovyan, and you can only access or save notebooks to that directory in the container.
Now you can access localhost:8889/lab?token=my-token and open notebooks contained in the bind mounted directory.
To stop the container, in the terminal press ctrl+c.
Docker also has volumes, which are the preferred mechanism for persisting data generated by and used by Docker containers. While bind mounts are dependent on the directory structure and OS of the host machine, volumes are completely managed by Docker.
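Although the next section creates the volume implicitly the first time you mount it with docker run, you can also manage volumes directly with the Docker CLI. For example, these standard commands create, list, and inspect a volume (shown here as an optional aside):

$ docker volume create jupyter-data
$ docker volume ls
$ docker volume inspect jupyter-data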
Save and access notebooks
When you remove a container, all data in that container is deleted. To save notebooks outside of the container, you can use a volume.
Run a JupyterLab container with a volume
To start the container with a volume, open a terminal and run the following command.
$ docker run --rm -p 8889:8888 -v jupyter-data:/home/jovyan/work quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
The -v option tells Docker to create a volume named jupyter-data and mount it in the container at /home/jovyan/work.
To access the container, in a web browser navigate to localhost:8889/lab?token=my-token. Notebooks can now be saved to the volume and will be accessible even when the container is deleted.
Save a notebook to the volume
For this example, you'll use the Iris Dataset example from scikit-learn.
1. Open a web browser and access your JupyterLab container at localhost:8889/lab?token=my-token.
2. In the Launcher, under Notebook, select Python 3.
3. In the notebook, specify the following to install the necessary packages.

   !pip install matplotlib scikit-learn

4. Select the play button to run the code.
5. In the notebook, specify the following code.

   from sklearn import datasets

   iris = datasets.load_iris()
   import matplotlib.pyplot as plt

   _, ax = plt.subplots()
   scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)
   ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])
   _ = ax.legend(
       scatter.legend_elements()[0], iris.target_names, loc="lower right", title="Classes"
   )

6. Select the play button to run the code. You should see a scatter plot of the Iris dataset.
7. In the top menu, select File and then Save Notebook.
8. Specify a name in the work directory to save the notebook to the volume. For example, work/mynotebook.ipynb.
9. Select Rename to save the notebook.
The notebook is now saved in the volume.
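If you want to double-check that the notebook landed in the volume-backed directory, you can run a quick cell like the following. This is an optional sanity check, not part of the original example; it assumes you saved the notebook under work as described above.

from pathlib import Path

# List all notebooks saved in the volume-backed work directory.
for notebook in sorted(Path("/home/jovyan/work").glob("*.ipynb")):
    print(notebook)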
In the terminal, press ctrl+c to stop the container.
Now, any time you run a Jupyter container with the volume, you'll have access to the saved notebook.
When you run a new container and run the data plot code again, it'll need to run !pip install matplotlib scikit-learn and download the packages again. You can avoid reinstalling the packages every time you run a new container by creating your own image with the packages already installed.
Customize your JupyterLab environment
You can create your own JupyterLab environment and build it into an image using Docker. By building your own image, you can customize your JupyterLab environment with the packages and tools you need, and ensure that it's consistent and reproducible across different deployments. Building your own image also makes it easier to share your JupyterLab environment with others, or to use it as a base for further development.
Define your environment in a Dockerfile
In the previous Iris Dataset example from Save a notebook to the volume, you had to install the dependencies, matplotlib and scikit-learn, every time you ran a new container. While the dependencies in that small example download and install quickly, it may become a problem as your list of dependencies grows. There may also be other tools, packages, or files that you always want in your environment.
In this case, you can install the dependencies as part of the environment in the image. Then, every time you run your container, the dependencies will always be installed.
You can define your environment in a Dockerfile. A Dockerfile is a text file that instructs Docker how to create an image of your JupyterLab environment. An image contains everything you want and need when running JupyterLab, such as files, packages, and tools.
In a directory of your choice, create a new text file named Dockerfile. Open the Dockerfile in an IDE or text editor and then add the following contents.
# syntax=docker/dockerfile:1
FROM quay.io/jupyter/base-notebook
RUN pip install --no-cache-dir matplotlib scikit-learn
This Dockerfile uses the quay.io/jupyter/base-notebook image as the base, and then runs pip to install the dependencies. For more details about the instructions in the Dockerfile, see the Dockerfile reference.
Before you proceed, save your changes to the Dockerfile.
Build your environment into an image
After you have a Dockerfile to define your environment, you can use docker build to build an image using your Dockerfile.
Open a terminal, change directory to the directory where your Dockerfile is located, and then run the following command.
$ docker build -t my-jupyter-image .
The command builds a Docker image from your Dockerfile and a context. The -t option specifies the name and tag of the image, in this case my-jupyter-image. The . indicates that the current directory is the context, which means that the files in that directory can be used in the image creation process.
You can verify that the image was built by viewing the Images view in Docker Desktop, or by running the docker image ls command in a terminal. You should see an image named my-jupyter-image.
Run your image as a container
To run your image as a container, you use the docker run command. In the docker run command, you'll specify your own image name.
$ docker run --rm -p 8889:8888 my-jupyter-image start-notebook.py --NotebookApp.token='my-token'
To access the container, in a web browser navigate to localhost:8889/lab?token=my-token.
You can now use the packages without having to install them in your notebook.
1. In the Launcher, under Notebook, select Python 3.
2. In the notebook, specify the following code.

   from sklearn import datasets

   iris = datasets.load_iris()
   import matplotlib.pyplot as plt

   _, ax = plt.subplots()
   scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)
   ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])
   _ = ax.legend(
       scatter.legend_elements()[0], iris.target_names, loc="lower right", title="Classes"
   )

3. Select the play button to run the code. You should see a scatter plot of the Iris dataset.
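To confirm the packages come preinstalled from the image rather than from a notebook-time install, you can optionally print their versions in a new cell. This is a quick sanity check, not part of the original example:

import matplotlib
import sklearn

# Both imports succeed without running pip install in the notebook,
# because the packages were baked into the image at build time.
print(matplotlib.__version__, sklearn.__version__)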
In the terminal, press ctrl+c to stop the container.
Use Compose to run your container
Docker Compose is a tool for defining and running multi-container applications. In this case, the application isn't a multi-container application, but Docker Compose can make it easier to run by defining all the docker run options in a file.
Create a Compose file
To use Compose, you need a compose.yaml file. In the same directory as your Dockerfile, create a new file named compose.yaml.
Open the compose.yaml file in an IDE or text editor and add the following contents.
services:
jupyter:
build:
context: .
ports:
- 8889:8888
volumes:
- jupyter-data:/home/jovyan/work
command: start-notebook.py --NotebookApp.token='my-token'
volumes:
jupyter-data:
name: jupyter-data
This Compose file specifies all the options you used in the docker run command. For more details about the Compose instructions, see the Compose file reference.
Before you proceed, save your changes to the compose.yaml file.
Run your container using Compose
Open a terminal, change directory to where your compose.yaml file is located, and then run the following command.
$ docker compose up --build
This command builds your image and runs it as a container using the instructions specified in the compose.yaml file. The --build option ensures that your image is rebuilt, which is necessary if you made changes to your Dockerfile.
To access the container, in a web browser navigate to localhost:8889/lab?token=my-token.
In the terminal, press ctrl+c to stop the container.
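With Compose, you can also stop and remove the container in one step from another terminal, instead of pressing ctrl+c:

$ docker compose down

Named volumes such as jupyter-data are kept by default, so your saved notebooks survive this command.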
Share your work
By sharing your image and notebook, you create a portable and replicable research environment that can be easily accessed and used by other data scientists. This process not only facilitates collaboration but also ensures that your work is preserved in an environment where it can be run without compatibility issues.
To share your image and data, you'll use Docker Hub. Docker Hub is a cloud-based registry service that lets you share and distribute container images.
Share your image
1. Sign up or sign in to Docker Hub.
2. Rename your image so that Docker knows which repository to push it to. Open a terminal and run the following docker tag command. Replace YOUR-USER-NAME with your Docker ID.

   $ docker tag my-jupyter-image YOUR-USER-NAME/my-jupyter-image

3. Run the following docker push command to push the image to Docker Hub. Replace YOUR-USER-NAME with your Docker ID.

   $ docker push YOUR-USER-NAME/my-jupyter-image

4. Verify that you pushed the image to Docker Hub.
   - Go to Docker Hub.
   - Select Repositories.
   - View the Last pushed time for your repository.
Other users can now download and run your image using the docker run command. They need to replace YOUR-USER-NAME with your Docker ID.
$ docker run --rm -p 8889:8888 YOUR-USER-NAME/my-jupyter-image start-notebook.py --NotebookApp.token='my-token'
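If they want to download the image ahead of time instead of letting docker run pull it on first use, they can pull it explicitly:

$ docker pull YOUR-USER-NAME/my-jupyter-image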
Share your volume
This example uses the Docker Desktop graphical user interface. Alternatively, in the command line interface you can back up the volume and then push it using the ORAS CLI.
1. Sign in to Docker Desktop.
2. In the Docker Dashboard, select Volumes.
3. Select the jupyter-data volume by selecting the name.
4. Select the Exports tab.
5. Select Quick export.
6. For Location, select Registry.
7. In the text box under Registry, specify your Docker ID, a name for the volume, and a tag. For example, YOUR-USERNAME/jupyter-data:latest.
8. Select Save.
9. Verify that you exported the volume to Docker Hub.
   - Go to Docker Hub.
   - Select Repositories.
   - View the Last pushed time for your repository.
Other users can now download and import your volume. To import the volume and then run it with your image:
1. Sign in to Docker Desktop.
2. In the Docker Dashboard, select Volumes.
3. Select Create to create a new volume.
4. Specify a name for the new volume. For this example, use jupyter-data-2.
5. Select Create.
6. In the list of volumes, select the jupyter-data-2 volume by selecting the name.
7. Select Import.
8. For Location, select Registry.
9. In the text box under Registry, specify the same name as the repository that you exported your volume to. For example, YOUR-USERNAME/jupyter-data:latest.
10. Select Import.
11. In a terminal, run docker run to run your image with the imported volume. Replace YOUR-USER-NAME with your Docker ID.
$ docker run --rm -p 8889:8888 -v jupyter-data-2:/home/jovyan/work YOUR-USER-NAME/my-jupyter-image start-notebook.py --NotebookApp.token='my-token'
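If you want to verify the import without starting JupyterLab, one option is to list the volume's contents from a throwaway container. This is an optional check; the alpine image used here is an assumption, and any small image with ls would do:

$ docker run --rm -v jupyter-data-2:/data alpine ls /data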
Summary
In this guide, you learned how to leverage Docker and JupyterLab to create reproducible data science environments, facilitating the development and sharing of data science projects. This included running a personal JupyterLab server, customizing the environment with necessary tools and packages, and sharing notebooks and environments with other data scientists.