Python

Overview

In addition to the standard Python versions you'd normally expect, we've preinstalled hundreds of extra packages for you to use. These are available via the python-cbrg module.

Basic usage

If you just want to get up and running with our curated set of commonly used bioinformatics packages, you can do so with a single command:

module load python-cbrg

To see the full list of installed packages, load the module as above and then run pip list | less. This command can be slow to run, so you may prefer to visit our online snapshot of packages.
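If you are already inside a Python session, the installed packages can also be listed from there. The following is a minimal sketch using only the standard library; the function name installed_packages is illustrative, and we make no claim about how its speed compares with pip list.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return sorted (name, version) pairs for distributions found on sys.path."""
    found = {}
    for dist in distributions():
        try:
            name = dist.metadata.get("Name")
            version = dist.version
        except Exception:
            continue  # skip distributions whose metadata cannot be read
        if name:
            # Keep the first match: earlier sys.path entries take precedence.
            found.setdefault(name.lower(), (name, version))
    return sorted(found.values(), key=lambda item: item[0].lower())


if __name__ == "__main__":
    for name, version in installed_packages():
        print(f"{name}=={version}")
```

Run it after module load python-cbrg so that the central library is on the search path.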

Requesting additional packages

If you need to use a package which isn't installed, please contact us at help@imm.ox.ac.uk before attempting to install a local copy. In many cases we can easily add it to the central installation.

Version control

The python-cbrg module is set up in two layers:

python-base is the base language install
python-cbrg is our packages library

python-base contains fixed, unchanging installations of the base language. This is for safety: they cannot be accidentally overwritten, which would cause unexpected changes in behaviour.

python-cbrg contains separate package and library repositories for each version of Python. Because package and library versions also change over time, we take a snapshot of their state every 3 months and lock it, so that later changes cannot cause unexpected behaviour. A single 'current' version for each provides a continual rolling 'head' where changes are applied.

Loading the python-cbrg module will automatically pull in the latest stable base and all packages or libraries:

$ module load python-cbrg
Loading python-cbrg/current
  Loading requirement: python-base/3.11.14
$ module list
Currently Loaded Modulefiles:
 1) python-base/3.11.14   2) python-cbrg/current

To use an older set of Python packages, load a specific snapshot version:

$ module load python-cbrg/202510
  Loading requirement: python-base/3.11.14
$ module list
Currently Loaded Modulefiles:
 1) python-base/3.11.14   2) python-cbrg/202510

If you also want an older version of the Python base language, then load python-base before python-cbrg:

$ module load python-base/3.11.3
$ module load python-cbrg/202510
$ module list
Currently Loaded Modulefiles:
 1) python-base/3.11.3   2) python-cbrg/202510
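After loading modules, you may want to confirm from within Python which interpreter version is running and which copy of a package would be imported. This is a minimal sketch using only the standard library; the function names python_version and package_location are illustrative.

```python
import importlib.util
import sys


def python_version():
    """Return the running interpreter version as 'major.minor.micro'."""
    info = sys.version_info
    return f"{info.major}.{info.minor}.{info.micro}"


def package_location(name):
    """Return the file a top-level 'import name' would load, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print("Python", python_version())
    # 'numpy' is only an example; substitute any top-level package name.
    print("numpy ->", package_location("numpy"))
```

The reported path shows whether a package comes from the central library or from somewhere else on your search path.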

Installing locally

There is a range of package management tools for Python: pip, venv, Poetry and Conda. The simplest is pip, which has the advantage of being compatible with our Jupyter Notebook server. The others are aimed at reproducible environments.

If you just want to quickly try a package not in our library, then install into your default ~/.local folder. Load python-cbrg first to reuse our packages where possible. Example:

module load python-cbrg  #  or python-base
pip install --user "biopython==1.87"
ls ~/.local/lib/python3.11/site-packages
> Bio  biopython-1.87.dist-info  BioSQL

To protect against supply chain attacks, where a package maintainer is hacked and their credentials are used to publish a malicious package, set a cutoff date for new uploads. This works because such packages are usually found and removed very quickly, typically within an hour or two. The cutoff applies recursively to all dependencies, not just the named package. To allow for weekends, let's set a cutoff of 3 days:

pip install --uploaded-prior-to P3D ...

Alternatively, store it in your pip config so that it applies automatically:

# Store in config
pip config set global.uploaded-prior-to P3D
pip install ...
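P3D is the ISO 8601 way of writing a duration of three days. The arithmetic behind such a cutoff can be sketched as below. This only illustrates the idea and is not pip's implementation; the function names, and the assumption that "prior to" means strictly earlier than the cutoff, are ours.

```python
from datetime import datetime, timedelta, timezone


def upload_cutoff(days=3, now=None):
    """Return the moment 'days' days before 'now' (default: the current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days)


def is_allowed(upload_time, cutoff):
    """Assumed rule: a release is eligible only if uploaded strictly before the cutoff."""
    return upload_time < cutoff


if __name__ == "__main__":
    print("Releases uploaded before", upload_cutoff().isoformat(), "would be eligible")
```

A release uploaded yesterday falls after the cutoff and is skipped, so the resolver would fall back to an older release that has had time to be vetted.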

Avoid installing many packages into your ~/.local if you regularly use our library, as you will introduce incompatibilities that eventually break packages in our library. Simply installing biopython as above produces this warning:

colabfold 1.6.1 requires biopython<1.86, but you have biopython 1.87 which is incompatible.

You don't always get a clear message either. Sometimes the only sign is an error during package import:

import scanpy as sc
...
AttributeError: 'NoneType' object has no attribute 'get'
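When you hit an unexplained import error like this, it can help to check which packages in your user folder are shadowing copies installed elsewhere. The following is a minimal sketch using only the standard library; shadowed_packages is an illustrative name, and it compares distribution metadata only, so it will not catch every kind of conflict.

```python
import site
import sys
from importlib.metadata import distributions
from pathlib import Path


def shadowed_packages(user_site=None, search_path=None):
    """Return {name: (user_version, other_version)} for packages installed
    both in the user site directory and elsewhere on the search path."""
    user_site = Path(user_site or site.getusersitepackages()).resolve()
    search_path = sys.path if search_path is None else search_path
    user, other = {}, {}
    for entry in search_path:
        if not entry:
            continue  # an empty entry means the current directory; ignore it
        bucket = user if Path(entry).resolve() == user_site else other
        for dist in distributions(path=[entry]):
            try:
                name = dist.metadata.get("Name")
                version = dist.version
            except Exception:
                continue  # skip distributions whose metadata cannot be read
            if name:
                bucket.setdefault(name.lower(), version)
    return {name: (ver, other[name]) for name, ver in user.items() if name in other}


if __name__ == "__main__":
    for name, (mine, central) in sorted(shadowed_packages().items()):
        print(f"{name}: user copy {mine} shadows {central}")
```

Any package listed is a candidate for removal from ~/.local with pip uninstall.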

To redirect where pip installs, set PYTHONUSERBASE:

export PYTHONUSERBASE=/project/PROJECT/user/py-libs/biopython
pip install --uploaded-prior-to P3D --user "biopython==1.87"

To make the packages available to your Python program:

export PYTHONPATH="/project/PROJECT/user/py-libs/biopython/lib/python$PYTHONMAJMIN/site-packages":"$PYTHONPATH"
python ...

To import in Jupyter, add this to the top of your notebook:

import sys
sys.path.insert(0, '/project/PROJECT/user/py-libs/biopython/lib/python3.11/site-packages')
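To avoid hardcoding the Python version in the notebook, the path can be built from the running interpreter. This is a minimal sketch assuming the usual Linux layout of lib/pythonX.Y/site-packages under the install folder; the function names are illustrative and the path in the usage line is the placeholder from the example above.

```python
import sys
from pathlib import Path


def site_packages_dir(user_base):
    """Build <user_base>/lib/pythonX.Y/site-packages for the running interpreter."""
    major, minor = sys.version_info[:2]
    return Path(user_base) / "lib" / f"python{major}.{minor}" / "site-packages"


def add_to_path(user_base):
    """Put the folder at the front of sys.path, without adding it twice."""
    path = str(site_packages_dir(user_base))
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


if __name__ == "__main__":
    # Placeholder path: replace PROJECT and user with your own values.
    print(add_to_path("/project/PROJECT/user/py-libs/biopython"))
```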

This is obviously cumbersome, so consider upgrading to a proper tool for managing environments ...

Environments

venv

venv allows you to create isolated, reproducible environments for Python packages. It does not manage the Python language itself, so load python-base first. Note: your pip config also applies inside a venv.

module load python-base
python -m venv $HOME/venvs/biopython        # Create folder for your venv
source $HOME/venvs/biopython/bin/activate   # Activate
module load python-base                     # Reload Python (venv broke it)
# Install into it:
pip install --uploaded-prior-to P3D pandas

Because venv does not interfere with Python itself, it is compatible with our Jupyter server.
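If you are unsure whether a venv is active, for example inside a notebook or a batch job, you can check from Python. This is a minimal sketch; in_venv is an illustrative name, and it relies on the standard behaviour that sys.prefix differs from sys.base_prefix inside a virtual environment.

```python
import sys


def in_venv():
    """Return True when the interpreter is running inside a virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_venv())
    print("interpreter:", sys.executable)
```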

Module

If you want to reuse the existing packages in python-cbrg and install missing or newer packages "on top", then you want a GNU environment module file. You could achieve the same without one, but the module system provides an easy way to activate and deactivate the setup.

Save the following module file anywhere:

#%Module1.0

# Set to where to install your packages:
set  INSTALL_DIR  /project/sysadmin/[getenv USER]/software/my-py

# Python + packages base, specific versions for reproducibility
prereq  python-base/3.11
prereq  python-cbrg/202510

# Maybe you need CUDA also
prereq  cuda/12.9

# This prepends your folder to PYTHONPATH, so Python looks here before in python-cbrg/202510
prepend-path    PYTHONPATH  $INSTALL_DIR/lib/python3.11/site-packages

# This configures "pip install --user ..." to install here
setenv  PYTHONUSERBASE  $INSTALL_DIR

# This makes command-line tools installed by pip available on your PATH
prepend-path  PATH  $INSTALL_DIR/bin

Activate it:

module load path/to/your/module/file

Install into it:

pip install --uploaded-prior-to P3D --user pandas
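To check that the module file took effect, you can inspect the two environment variables it sets. This is a minimal sketch; overlay_status is an illustrative name, the default version "3.11" mirrors the module file above, and the check may report False for PYTHONPATH if another module loaded afterwards prepends its own entry.

```python
import os


def overlay_status(install_dir, environ=None, python_version="3.11"):
    """Report whether PYTHONUSERBASE and the first PYTHONPATH entry point at install_dir."""
    environ = os.environ if environ is None else environ
    expected = os.path.join(install_dir, "lib", f"python{python_version}", "site-packages")
    entries = [p for p in environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    return {
        "user_base_ok": environ.get("PYTHONUSERBASE") == install_dir,
        "pythonpath_first": bool(entries) and entries[0] == expected,
    }


if __name__ == "__main__":
    # Uses whatever PYTHONUSERBASE is currently set to as the install folder.
    print(overlay_status(os.environ.get("PYTHONUSERBASE", "")))
```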

Conda

While we do not officially support Conda, we can offer some basic good practice for using it on our system. The main thing to be aware of is that Conda, when installed on our network-based filesystem, can be very slow depending on how busy the cluster is.

During the Conda install you are asked whether to update your shell profile to automatically initialize conda. Decline this, so that Conda does not delay your SSH logins. You can manually activate Conda with:

. path/to/your/Conda/install/bin/activate

Also, do not run conda init, as this will add automatic initialization to your .bashrc. If you have already added automatic initialization to your .bashrc file, disable it by deleting or relocating the clearly-marked Conda block from your .bashrc.
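If you would rather script the removal, the following minimal sketch strips the block from the text of a .bashrc. It assumes the block is delimited by the comment lines "# >>> conda initialize >>>" and "# <<< conda initialize <<<"; strip_conda_block is an illustrative name. It works on a string only and never writes to your file, so you decide what to do with the result.

```python
START = "# >>> conda initialize >>>"
END = "# <<< conda initialize <<<"


def strip_conda_block(text):
    """Return text with the marked Conda block removed.

    If the closing marker is missing, the text is returned unchanged
    rather than risk dropping unrelated lines.
    """
    kept, inside = [], False
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if not inside and stripped == START:
            inside = True
            continue
        if inside:
            if stripped == END:
                inside = False
            continue
        kept.append(line)
    return text if inside else "".join(kept)


if __name__ == "__main__":
    sample = "export A=1\n" + START + "\n# conda setup lines\n" + END + "\nexport B=2\n"
    print(strip_conda_block(sample), end="")
```

Keep a backup copy of your .bashrc before replacing it with the stripped version.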

Cleaning old packages or regenerating the index cache can speed up activation:

conda clean --packages
conda clean --index-cache

Disabling automatic updates can speed up environment creation and new package installs. To disable them, put the following in your ~/.condarc file:

auto_update_conda: false

You can manually update with:

conda update --all && conda update conda