Whether you’re running data analysis or training a simple model, there are many reasons to choose a cloud-hosted Jupyter Notebook service. Google offers three main cloud notebook products: Vertex AI Workbench, Colab and Deep Learning VMs. What are the differences between them? Which service is better for industry use, and which is better suited to educational purposes? Let’s dive into the differences between these products!
Summary table
Google Colab – When you need to share
Back in 2017, Google announced the official public release of its internal tool Colaboratory, also known as Colab. This tool promised to be the next generation of Jupyter Notebooks running in the cloud, with no setup required to get up and running. Five years later, it has delivered on its promise.
You can think of Colab as a combination of a Jupyter Notebook and a Google Doc: a browser-based product that makes it easy to edit, share and collaborate online with authorised users. Just like in Google Docs, your work is saved to Google Drive. When you want to run the code itself, Google lets you select one of three backends: CPU, GPU-accelerated or TPU-accelerated, and by default at no cost at all!
Colab’s interface is very similar to the JupyterLab experience
While Colab is mostly free, the default instance has some limitations. You can upgrade to Colab Pro, which gives you access to faster GPUs and lets you run notebooks in the background. Still, even with a subscription, you are not guaranteed access to resources, and the notebook lifetime only increases from 12 to 24 hours. For users running long tasks this limit is critical: when a session disconnects for any reason, all data stored on the machine is lost (unless it was previously saved to Google Drive). In that case you have to rerun all your code, including installing any non-default Python libraries.
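A minimal sketch of the usual workaround, in Python: write anything you can’t afford to lose to a Drive-backed folder. `google.colab.drive.mount` and the `/content/drive/MyDrive` path are what Colab provides; the `persistent_dir` helper and the `colab_work` folder name are hypothetical, and the local fallback only exists so the same code also runs outside Colab.

```python
from pathlib import Path
from typing import Optional


def persistent_dir(subdir: str = "colab_work", local_root: Optional[Path] = None) -> Path:
    """Return a folder whose contents survive a Colab runtime reset when possible.

    Inside Colab, Google Drive is mounted and the folder lives under MyDrive.
    Anywhere else (a laptop, a VM), a local folder is used instead, so the same
    notebook code runs unchanged. (Hypothetical helper, not a Colab API.)
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        base = local_root if local_root is not None else Path.cwd()
        target = base / subdir
    else:
        # Asks the user to authorise Drive access the first time it runs.
        drive.mount("/content/drive")
        target = Path("/content/drive/MyDrive") / subdir
    target.mkdir(parents=True, exist_ok=True)
    return target


if __name__ == "__main__":
    out = persistent_dir()
    (out / "notes.txt").write_text("placeholder\n")
    print("saved to", out)
```

Files written this way survive a disconnect, but libraries installed with `pip` still have to be reinstalled in every new session.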
Another hiccup is that there is no live co-editing, which is a bummer for a tool called Colaboratory, especially since the other tools in the suite (Docs, Slides and Sheets) all have that feature.
In my humble opinion, Google Colab is ideal for academic purposes, such as a teacher sharing code with their class and letting every student create and run their own copy of the base code. Colab is also a great fit when you want to share a code example with the community, or just run some non-critical data science work without entering a credit card.
Vertex AI – easy to use with full GCP access
Vertex AI is part of Google Cloud and requires a GCP project with billing enabled. One of its components is Vertex AI Workbench (previously Vertex AI Notebooks), a managed Jupyter service.
At its core, a Workbench instance is just a VM that shows up in your Compute Engine list. Under the hood, though, the service adds many useful features that make it easier to use the machine as a notebook server. When creating an instance, you can open the advanced settings to fine-tune it to your machine learning needs.
Vertex AI offers the full JupyterLab experience, which you can launch with one click directly in your browser
One of the coolest features offered by Workbench is the ability to have multiple editors working on the same notebook, by enabling the Realtime Collaboration setting. Since the server is a Compute Engine instance, it can access GCP services using the machine’s identity, which allows smooth work with BigQuery, GCS and other important tools that together make a powerful suite for data scientists. You open the notebook by clicking a button in the GCP console, and while the connection is easy, it’s also secure: it enforces identity-based access control and encrypts the communication between the browser and the server.
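Because a Workbench instance is a Compute Engine VM, code in the notebook runs as the VM’s service account. A minimal sketch, using only the Python standard library, that asks the Compute Engine metadata server which account that is; the endpoint and the `Metadata-Flavor: Google` header are documented GCE behaviour, while the `vm_service_account` helper is hypothetical. Google’s client libraries (for example `bigquery.Client()` from `google-cloud-bigquery`) pick up the same identity automatically through Application Default Credentials, so no key file is needed; what the account may actually do still depends on its IAM roles.

```python
import urllib.request
from typing import Optional

# Compute Engine metadata endpoint that reports the VM's default service account.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/email"
)


def vm_service_account(timeout: float = 2.0) -> Optional[str]:
    """Return the service account the VM runs as, or None when not on GCE."""
    request = urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except OSError:
        # URLError, DNS failures and timeouts all derive from OSError:
        # most likely we are not running on a Compute Engine VM.
        return None


if __name__ == "__main__":
    account = vm_service_account()
    print(account or "not running on Compute Engine")
```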
The downside of Vertex AI is its price. In addition to the compute cost, the service adds a management fee per CPU, and spot instances are not available as part of the service.
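To see how the management fee adds up, here is a small back-of-the-envelope model in Python. The rates are made-up placeholders, not Google’s prices (check the current Compute Engine and Vertex AI pricing pages), and the model assumes the fee is charged per vCPU-hour on top of compute, as described above; GPUs and disks are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HourlyRates:
    """Placeholder prices in USD; NOT real Google Cloud prices."""
    per_vcpu: float        # compute price per vCPU-hour
    per_gb_ram: float      # compute price per GB of RAM per hour
    mgmt_per_vcpu: float   # assumed Workbench management fee per vCPU-hour


def monthly_cost(vcpus: int, ram_gb: float, hours: float,
                 rates: HourlyRates, managed: bool) -> float:
    """Estimate the bill for one instance over `hours` of uptime."""
    compute = vcpus * rates.per_vcpu + ram_gb * rates.per_gb_ram
    fee = vcpus * rates.mgmt_per_vcpu if managed else 0.0
    return round((compute + fee) * hours, 2)


if __name__ == "__main__":
    rates = HourlyRates(per_vcpu=0.03, per_gb_ram=0.004, mgmt_per_vcpu=0.005)
    # 8 vCPUs, 32 GB RAM, 160 hours (roughly one working month)
    plain_vm = monthly_cost(8, 32, 160, rates, managed=False)
    workbench = monthly_cost(8, 32, 160, rates, managed=True)
    print(f"plain VM: ${plain_vm}, Workbench: ${workbench}")
```

With these placeholder rates it prints `plain VM: $58.88, Workbench: $65.28`. The point is the shape, not the numbers: the fee scales linearly with vCPUs and uptime, so it matters most for large, long-running instances.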
I find this service better suited to companies for which time is money. It’s ideal for running complex workloads that leverage the full power of GCP.
Deep Learning VM – powerful yet affordable
Long before Colab and Vertex AI were launched, Google was already providing cloud services to customers through Google Cloud Platform. Compute Engine was one of its earliest services, and it supports custom images.
Deep Learning VM is basically a dedicated AI image for Compute Engine. Launching a GCE instance from a Deep Learning image gives users a powerful environment for data science tasks: ready for machine learning projects and integrated with the most popular AI frameworks. It also readily supports Cloud GPUs and TPUs.
Unlike Workbench, DLVMs are not a managed service; you access them by creating an SSH tunnel from your local machine to the server. It’s not as difficult as it sounds: after you set up your virtual machine, you can connect to it over SSH, either from your own terminal or from a browser-based terminal provided by Google. Even so, of the three services reviewed here, it’s the most complicated to launch and connect to.
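A minimal sketch of the tunnel step in Python. It only builds the `gcloud compute ssh` command; arguments after `--` are passed to `ssh` unchanged. The instance name and zone are placeholders, and port 8080 is an assumption about where JupyterLab listens on the image.

```python
import shlex
from typing import List, Optional


def jupyter_tunnel_command(instance: str, zone: str, port: int = 8080,
                           project: Optional[str] = None) -> List[str]:
    """Build a `gcloud compute ssh` call that forwards a local port to the VM."""
    cmd = ["gcloud", "compute", "ssh", instance, f"--zone={zone}"]
    if project is not None:
        cmd.append(f"--project={project}")
    # Arguments after "--" go straight to ssh:
    #   -N  do not run a remote command, just hold the tunnel open
    #   -L  forward localhost:<port> here to localhost:<port> on the VM
    cmd += ["--", "-N", "-L", f"{port}:localhost:{port}"]
    return cmd


if __name__ == "__main__":
    # Hypothetical instance name and zone.
    print(shlex.join(jupyter_tunnel_command("my-dlvm", "us-central1-a")))
```

Paste the printed command into a terminal (or pass the list to `subprocess.run(cmd, check=True)`), then open `http://localhost:8080` in your browser while the tunnel is up.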
You can easily access the Deep Learning VM through an SSH connection
Overall, this gives you a much better experience than Colab, since you can save your files permanently on the persistent disk. You can also organise your work into multiple notebooks and Python scripts.
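Because the disk outlives any single session, a long job can checkpoint its progress there and pick up where it left off after a restart. A minimal, framework-agnostic sketch of that pattern in Python; the JSON state, the `run_job` loop and the `stop_at` switch that fakes an interruption are hypothetical stand-ins for a real training loop (frameworks such as PyTorch and TensorFlow ship their own checkpoint utilities).

```python
import json
import os
from pathlib import Path
from typing import Optional


def load_state(path: Path) -> dict:
    """Resume from the last checkpoint, or start fresh if there is none."""
    if path.exists():
        return json.loads(path.read_text())
    return {"step": 0, "total": 0}


def save_state(path: Path, state: dict) -> None:
    """Write to a temporary file, then rename it over the old checkpoint.

    os.replace is atomic on POSIX, so an interruption never leaves a
    half-written checkpoint behind.
    """
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def run_job(path: Path, n_steps: int, stop_at: Optional[int] = None) -> dict:
    state = load_state(path)
    for step in range(state["step"], n_steps):
        if stop_at is not None and step == stop_at:
            break  # simulates the session or VM going away mid-job
        state["total"] += step   # stand-in for one unit of real work
        state["step"] = step + 1
        save_state(path, state)  # checkpoint after every step
    return state


if __name__ == "__main__":
    ckpt = Path("checkpoint.json")  # hypothetical location on the persistent disk
    print(run_job(ckpt, 10, stop_at=4))  # first run is "interrupted"
    print(run_job(ckpt, 10))             # second run resumes from the checkpoint
```

On a fresh directory, the first call stops after four steps and the second call finishes only the remaining six instead of starting over.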
One major advantage over Workbench is that with DLVMs you can request a spot instance, reducing the cost by up to 90%. The catch is that when demand rises, Google can reclaim the instance allocated to you with only a very short notice. You can read more about spot instances in our previous post.
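A minimal sketch of requesting a Spot DLVM, building the `gcloud` command in Python rather than running it. `--provisioning-model=SPOT`, `--instance-termination-action` and the public `deeplearning-platform-release` image project are real Compute Engine options; the instance name, zone, machine type and image family are placeholders (`gcloud compute images list --project deeplearning-platform-release` shows the current families). GPU flags are left out for brevity.

```python
import shlex
from typing import List


def spot_dlvm_command(name: str, zone: str, image_family: str,
                      machine_type: str = "n1-standard-8") -> List[str]:
    """Build a `gcloud compute instances create` call for a Spot DLVM."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        # Deep Learning VM images are published in this public project.
        "--image-project=deeplearning-platform-release",
        f"--image-family={image_family}",
        # Request Spot capacity instead of standard on-demand capacity.
        "--provisioning-model=SPOT",
        # On preemption, stop the VM (keeping its disk) rather than delete it.
        "--instance-termination-action=STOP",
    ]


if __name__ == "__main__":
    # IMAGE_FAMILY is a placeholder: pick a current DLVM family before use.
    print(shlex.join(spot_dlvm_command("my-spot-dlvm", "us-central1-a", "IMAGE_FAMILY")))
```

Choosing `STOP` over `DELETE` is the safer default here: a preempted VM keeps its persistent disk, so you can restart it later and find your files where you left them.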
This service is the perfect option for researchers who might need a little extra power for their projects at a lower cost, and for whom data resilience is not that important (if you’re using a spot instance). It’s also a great fit for companies running on a tight budget.
What’s your favourite Notebook Service?
In this blog post I only reviewed three of Google’s notebook services, but there are plenty more, which shows how important and useful a tool Jupyter is. If you think I missed any crucial points, or you know a better service, let me know in the comments below! 🙂