Python
Overview
As well as the standard Python versions that you'd normally expect,
we've preinstalled hundreds of additional packages for you to use.
These are available via the python-cbrg module.
Basic usage
If you just want to get up and running with our curated set of commonly used bioinformatics packages,
you can do so with a single command: module load python-cbrg
If you want to see the full list of installed packages,
load the module as above and then run the command pip list | less.
Please note that this command can be slow to run, so
you may prefer to visit our online snapshot of packages.
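If you only need to know whether one or two packages are present, a short Python check can be quicker than paging through the full list. The sketch below uses the standard library's importlib.metadata; the package names in the loop are examples only, and the helper name installed_version is our own.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Example package names only; substitute the ones you care about.
for name in ("numpy", "scanpy", "biopython"):
    print(name, installed_version(name) or "not installed")
```

Run it with python after loading the module so that it inspects the same package library you will be using.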
Requesting additional packages
If you need to use a package which isn't installed, please contact us at help@imm.ox.ac.uk before attempting to install a local copy. In many cases we can easily add it to the central installation.
Version control
The setup of the python-cbrg module uses the following system.
- python-base is the base language install
- python-cbrg is our packages library
python-base contains fixed, unchanging installations of the base languages.
This is for safety; they cannot be accidentally overwritten causing unexpected changes of behaviour.
python-cbrg contains separate package and library repositories for each version of Python.
Because packages and library versions also change over time,
we take a snapshot of the state every 3 months and then lock this to prevent changes causing unexpected behaviour.
A single current version for each provides a continual rolling 'head' where changes are applied.
Loading the python-cbrg module will automatically pull in the latest stable base and all packages or libraries:
$ module load python-cbrg
Loading python-cbrg/current
Loading requirement: python-base/3.11.14
$ module list
Currently Loaded Modulefiles:
1) python-base/3.11.14 2) python-cbrg/current
To use an older set of Python packages, load a specific snapshot version:
$ module load python-cbrg/202510
Loading requirement: python-base/3.11.14
$ module list
Currently Loaded Modulefiles:
1) python-base/3.11.14 2) python-cbrg/202510
If you also want an older version of the Python base language, then load python-base before python-cbrg:
$ module load python-base/3.11.3
$ module load python-cbrg/202510
$ module list
Currently Loaded Modulefiles:
1) python-base/3.11.3 2) python-cbrg/202510
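After loading a particular combination of modules, it can be worth confirming from inside Python which interpreter is actually active. This is a minimal sketch using only the standard library; the function name active_python is our own.

```python
import sys


def active_python():
    """Return (major.minor version string, interpreter path) for this process."""
    major_minor = f"{sys.version_info.major}.{sys.version_info.minor}"
    return major_minor, sys.executable


version, executable = active_python()
print(f"Python {version} at {executable}")
```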
Installing locally
There is a range of package management tools for Python: pip, venv, Poetry and Conda. The simplest is pip, which has the advantage of being compatible with our Jupyter Notebook server. The others are aimed at reproducible environments.
If you just want to quickly try a package not in our library, then install into your default ~/.local folder.
Load python-cbrg first to reuse our packages where possible.
Example:
module load python-cbrg # or python-base
pip install --user "biopython==1.87"
ls ~/.local/lib/python3.11/site-packages
> Bio biopython-1.87.dist-info BioSQL
To protect against supply chain attacks, where a package maintainer's credentials are stolen and used to publish a malicious package, set a cutoff date for new uploads. This works because malicious packages are usually found and removed very quickly, typically within an hour or two. The cutoff applies recursively to all dependencies, not just the named package. To allow for weekends, let's set a cutoff of 3 days:
pip install --uploaded-prior-to P3D ...
Alternatively, store in config to apply automatically:
# Store in config
pip config set global.uploaded-prior-to P3D
pip install ...
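To make the idea of a cutoff concrete, here is a small illustration of the filtering rule, assuming P3D is read as an ISO 8601 duration meaning "three days before now". This is not pip's implementation, only a sketch of the logic; the function names and the example timestamps are our own.

```python
from datetime import datetime, timedelta, timezone


def cutoff_for_days(days, now=None):
    """Return the latest upload time that is still accepted."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days)


def is_allowed(upload_time, days=3, now=None):
    """True if a release was uploaded strictly before the cutoff."""
    return upload_time < cutoff_for_days(days, now)


# Illustrative timestamps only
now = datetime(2025, 10, 10, 12, 0, tzinfo=timezone.utc)
old_release = datetime(2025, 10, 1, 9, 0, tzinfo=timezone.utc)
fresh_release = datetime(2025, 10, 9, 9, 0, tzinfo=timezone.utc)

print(is_allowed(old_release, now=now))    # uploaded 9 days earlier
print(is_allowed(fresh_release, now=now))  # uploaded about 1 day earlier
```

A release uploaded inside the window is simply invisible to the installer, so an older release is chosen instead.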
Avoid installing many packages into your ~/.local if you regularly use our library -
you will introduce incompatibilities that eventually break packages in our library.
Just installing biopython as above produces this warning:
colabfold 1.6.1 requires biopython<1.86, but you have biopython 1.87 which is incompatible.
And you don't always get a clear message - sometimes you get an error during package import:
import scanpy as sc
...
AttributeError: 'NoneType' object has no attribute 'get'
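When you hit an unexplained import error like this, a useful first step is to see what you have installed in your own user site directory, since those packages take precedence over ours. The following is a minimal sketch using the standard library; the function name user_site_packages is our own.

```python
import site
from importlib.metadata import distributions


def user_site_packages():
    """Return sorted (name, version) pairs for packages in the user site directory."""
    user_site = site.getusersitepackages()  # honours PYTHONUSERBASE if it is set
    found = set()
    for dist in distributions(path=[user_site]):
        name = dist.metadata.get("Name")
        if name:
            found.add((name, dist.version))
    return sorted(found)


for name, ver in user_site_packages():
    print(f"{name}=={ver}")
```

Anything listed here is a candidate for removal with pip uninstall if it conflicts with a package in our library.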
To keep such installs out of ~/.local, redirect where pip installs to with PYTHONUSERBASE:
export PYTHONUSERBASE=/project/PROJECT/user/py-libs/biopython
pip install --uploaded-prior-to P3D --user "biopython==1.87"
To make the package available to your Python program:
export PYTHONPATH="/project/PROJECT/user/py-libs/biopython/lib/python$PYTHONMAJMIN/site-packages":"$PYTHONPATH"
python ...
To import it in Jupyter, add this to the top of your notebook:
import sys ; sys.path.insert(0, '/project/PROJECT/user/py-libs/biopython/lib/python3.11/site-packages')
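If you would rather not hard-code the Python version in a notebook, the path can be built from the running interpreter. This is a sketch that assumes the usual lib/pythonX.Y/site-packages layout under your PYTHONUSERBASE folder; the helper names are our own, and the folder shown is the placeholder install location used in the PYTHONUSERBASE example.

```python
import sys
from pathlib import Path


def site_packages_dir(user_base):
    """Build the site-packages path under user_base for the running Python."""
    major_minor = f"{sys.version_info.major}.{sys.version_info.minor}"
    return Path(user_base) / "lib" / f"python{major_minor}" / "site-packages"


def prepend_to_sys_path(user_base):
    """Put the folder first on sys.path, without adding duplicates."""
    target = str(site_packages_dir(user_base))
    if target not in sys.path:
        sys.path.insert(0, target)
    return target


# Placeholder folder; replace with your own install location
prepend_to_sys_path("/project/PROJECT/user/py-libs/biopython")
```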
This is obviously cumbersome, so consider upgrading to a proper tool for managing environments ...
Environments
venv
venv allows you to create isolated reproducible environments for Python packages.
It does not manage the Python language itself so load python-base first.
Note: your pip config also applies inside a venv.
module load python-base
python -m venv $HOME/venvs/biopython # Create folder for your venv
source $HOME/venvs/biopython/bin/activate # Activate
module load python-base # Reload Python (venv broke it)
# Install into it:
pip install --uploaded-prior-to P3D pandas
Because venv does not interfere with Python itself, it is compatible with our Jupyter server.
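To confirm that a script or notebook is really running inside your venv, you can compare the interpreter's prefix with its base prefix. This is a standard-library check; the function name in_virtualenv is our own.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


print("Inside a venv:", in_virtualenv())
print("Interpreter prefix:", sys.prefix)
```

Inside an activated venv, the prefix should point at the folder you created, for example $HOME/venvs/biopython.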
Module
If you want to reuse existing packages in python-cbrg, and install missing or newer packages "on top",
then you want a GNU environment module file.
You could do this without one, but the module system provides an easy way to activate and deactivate your setup.
Example module file, which can be saved anywhere:
#%Module1.0
# Set to where to install your packages:
set INSTALL_DIR /project/sysadmin/[getenv USER]/software/my-py
# Python + packages base, specific versions for reproducibility
prereq python-base/3.11
prereq python-cbrg/202510
# Maybe you need CUDA also
prereq cuda/12.9
# This prepends your folder to PYTHONPATH, so Python looks here before python-cbrg/202510
prepend-path PYTHONPATH $INSTALL_DIR/lib/python3.11/site-packages
# This configures "pip install --user ..." to install here
setenv PYTHONUSERBASE $INSTALL_DIR
prepend-path PATH $INSTALL_DIR/bin
Activate it:
module load path/to/your/module/file
Install into it:
pip install --uploaded-prior-to P3D --user pandas
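Because your folder is placed ahead of python-cbrg on PYTHONPATH, it is worth checking which copy of a package Python will actually pick up. The sketch below uses importlib to report where a module would be loaded from without importing it; the function name import_location is our own and pandas is only an example.

```python
from importlib.util import find_spec


def import_location(module_name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


# Example module name only
print("pandas would be imported from:", import_location("pandas"))
```

If the printed path sits under your install folder, your own copy is taking precedence over the one in our library.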
Conda
While we do not officially support Conda, we can offer some basic good practices for using it on our system. The main thing to be aware of is that, when installed on our network-based filesystem, Conda can be very slow depending on how busy the cluster is.
During the Conda install you are asked whether to update your shell profile to automatically initialize Conda.
You should decline this, so that your SSH logins are not delayed by Conda.
You can manually activate Conda with:
. path/to/your/Conda/install/bin/activate
Also, do not run conda init,
as this will add automatic initialization to your .bashrc.
If you have already added automatic initialization to your .bashrc file,
you can disable it by deleting or relocating the clearly marked Conda block in your .bashrc.
Cleaning old packages or clearing the index cache can speed up activation:
conda clean --packages
conda clean --index-cache
Disabling automatic updates can speed up environment creation and new package installs.
To disable them, put the following in the file ~/.condarc:
auto_update_conda: false
You can manually update with:
conda update --all && conda update conda