How to Install Libraries Permanently in Google Colab
Many developers use Google Colab to run prototypes quickly because it gives access to GPUs such as the T4 and P100. But sometimes a program needs a library, or a specific version of one, that is not preinstalled in Colab. Installing those dependencies on every run becomes tedious and time-consuming, because anything you install is lost once the runtime disconnects.
Recently I got stuck on exactly this problem. I had to install a few large libraries to run a program in Colab, and the installation took too long. I wanted to run the program many times, so I needed to avoid reinstalling the libraries on each run, which is not possible unless the dependencies are installed permanently. So, in this blog, I will describe how I successfully installed all my dependencies permanently in Google Colab.
First, mount Google Drive using the following two lines of code.
from google.colab import drive
drive.mount("/content/drive")
After successfully mounting Google Drive, let's create a virtual environment using the virtualenv library. As of 2024, virtualenv doesn't come preinstalled in Colab, so install it first with !pip install virtualenv. One thing to keep in mind: create the virtual environment inside the Google Drive folder that you mounted above, so that it survives after the runtime disconnects.
!virtualenv /content/drive/MyDrive/colab_env
Here you can see that a virtual environment named colab_env has been created in Google Drive.
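Since the environment lives on Drive, it only needs to be created once. A minimal sketch of a check for that, assuming the environment was created by virtualenv on Linux (which places an activate script under bin); the helper name env_exists is hypothetical and not part of any library:

```python
import os

# Same location as the virtualenv command above.
ENV_DIR = "/content/drive/MyDrive/colab_env"

def env_exists(env_dir):
    """Return True if env_dir looks like an already-created virtual environment."""
    # Assumption: virtualenv on Linux writes an activate script to <env>/bin.
    return os.path.isfile(os.path.join(env_dir, "bin", "activate"))

# In a Colab cell, the result can decide whether the creation step is needed:
# if not env_exists(ENV_DIR):
#     !virtualenv /content/drive/MyDrive/colab_env
```

With a check like this at the top of the notebook, rerunning all cells does not recreate the environment on every session.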
Now let's install a library named pypdf in the virtual environment colab_env. To install a library into the virtual environment, we must activate the environment and run the install in the same shell command, because each ! line in Colab runs in its own shell and the activation does not carry over to the next line.
!source /content/drive/MyDrive/colab_env/bin/activate; pip install Pypdf
In the above line of code, source /content/drive/MyDrive/colab_env/bin/activate activates our environment colab_env, and pip install Pypdf installs the pypdf library inside it.
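An alternative to activating the environment is to call the environment's own interpreter directly. This is only a sketch, not the method used in this blog: the helper name pip_install_command is hypothetical, and it assumes the interpreter stored on Drive is executable from the Colab runtime.

```python
import os

def pip_install_command(env_dir, package):
    """Build a command that runs pip with the environment's own interpreter."""
    # Running <env>/bin/python -m pip installs into that environment
    # without needing to source the activate script first.
    env_python = os.path.join(env_dir, "bin", "python")
    return [env_python, "-m", "pip", "install", package]

cmd = pip_install_command("/content/drive/MyDrive/colab_env", "pypdf")
print(" ".join(cmd))

# To actually run it in Colab (assumption: the interpreter on Drive is executable):
# import subprocess
# subprocess.run(cmd, check=True)
```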
from pypdf import PdfReader
reader = PdfReader("/content/Data science journey 3.pdf")
number_of_pages = len(reader.pages)
print(number_of_pages)
Here, I tried out the newly installed pypdf library (a free and open-source pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files). But the above lines of code throw an error: ModuleNotFoundError: No module named 'pypdf'.
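One way to see what the runtime can and cannot find, without triggering the error, is to ask the import system directly. A small sketch using the standard library; the helper name is_importable is my own:

```python
import importlib.util
import sys

def is_importable(module_name):
    """Return True if the running interpreter can locate a top-level module."""
    # find_spec returns None when no entry in sys.path provides the module.
    return importlib.util.find_spec(module_name) is not None

print("pypdf importable:", is_importable("pypdf"))

# The directories Python searches when resolving an import:
for entry in sys.path:
    print(entry)
```

If the check reports False and no directory from colab_env appears in the printed list, the runtime simply does not know where the package was installed.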
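So, why couldn't we import our newly installed library? We installed pypdf in our virtual environment colab_env, but the import statement only searches the locations known to the Colab runtime. For Python to find libraries installed in the virtual environment, we should add the path of the environment's site-packages directory to the Colab system path (sys.path). The python3.8 part of the path must match the Python version the environment was created with; if your runtime uses a different version, check the lib folder inside colab_env for the actual directory name.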
import sys
sys.path.append("/content/drive/MyDrive/colab_env/lib/python3.8/site-packages")
The above lines of code add the path of the virtual environment's packages to the system path. Now, let's run the following script again to test whether our newly installed dependency works.
from pypdf import PdfReader
reader = PdfReader("/content/Data science journey 3.pdf")
number_of_pages = len(reader.pages)
print(number_of_pages)
Wow, we fixed it. Now it works as expected. PdfReader reads the PDF, and the length of its .pages attribute gives the number of pages. The PDF I tested has 41 pages.
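The path added to sys.path earlier hard-codes python3.8, which only works while the environment and the runtime use that version. A hedged sketch that derives the directory name from the running interpreter instead, assuming the environment was created with the same Python version as the current runtime (the helper name site_packages_dir is hypothetical):

```python
import sys

def site_packages_dir(env_dir, version_info=sys.version_info):
    """Build the site-packages path of a Linux virtual environment."""
    # Assumption: the standard layout <env>/lib/pythonX.Y/site-packages.
    major, minor = version_info[0], version_info[1]
    return f"{env_dir}/lib/python{major}.{minor}/site-packages"

# For the environment used in this blog:
print(site_packages_dir("/content/drive/MyDrive/colab_env"))
```

If Colab later upgrades its Python version, the derived path will point to a directory that does not exist in the old environment, which is a signal to recreate the environment rather than reuse it.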
So far I have tested the installation only in the first run, but my requirement is to avoid installing the dependencies in future runs. To use the packages previously installed in the virtual environment colab_env, you must mount your Drive and add the path of the colab_env site-packages to the Colab system path using sys.path.append("/content/drive/MyDrive/colab_env/lib/python3.8/site-packages").
# step 1: Mount the drive first
from google.colab import drive
drive.mount("/content/drive/")
# step 2: Add the path of virtual environment (colab_env) site-packages
# to colaboratory system path
import sys
sys.path.append("/content/drive/MyDrive/colab_env/lib/python3.8/site-packages")
After adding the path, just import and use the packages that were installed in the virtual environment colab_env.
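The two steps can be made a little more defensive. The sketch below is not part of the original setup: the helper name add_site_packages is hypothetical, and it assumes Drive is already mounted when it is called. It fails early if the directory is missing and avoids adding the same path twice when a cell is rerun.

```python
import os
import sys

def add_site_packages(site_packages, path_list=None):
    """Append site_packages to the import path once; return True if it was added."""
    if path_list is None:
        path_list = sys.path
    # A missing directory usually means Drive is not mounted or the
    # Python version in the path does not match the environment.
    if not os.path.isdir(site_packages):
        raise FileNotFoundError(f"site-packages not found: {site_packages}")
    if site_packages in path_list:
        return False
    path_list.append(site_packages)
    return True

# Usage in Colab, after drive.mount("/content/drive"):
# add_site_packages("/content/drive/MyDrive/colab_env/lib/python3.8/site-packages")
```

Appending rather than inserting at the front keeps Colab's preinstalled packages first in the search order, which matches the sys.path.append call used above.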
In conclusion, it is possible to install dependencies permanently in Google Colab using a virtual environment stored on Google Drive. One thing to take care of: don't forget to add the path of the virtual environment's site-packages to the Colab system path. I tested this method in 2024 and it works fine for me. Find all the above code snippets here.
I hope this tutorial helps you install dependencies permanently in Google Colab. If you are a regular Colaboratory user, it will save you from tedious, time-consuming reinstallation. Now you can use a single virtual environment from any notebook just by adding its path to the Colab system path. Finally, if you like my work, please don't forget to clap and share it with your friends. See you in the next blog…