SSH tunnel
Overview
Sometimes you need to use software on JADE that is accessed via a locally-hosted web link (e.g. localhost),
the most common example being Jupyter Notebook.
While we provide a service running Jupyter Notebook,
you may wish to launch your own customised version or other software.
The solution is to create an SSH tunnel to your software and forward the necessary port to your local computer.
Before you start
Creating an SSH tunnel is only recommended if you're comfortable with Linux and SSH. If you're a beginner, we recommend using our pre-configured services until you gain confidence. We are unable to offer significant help with SSH port forwarding, and you are expected to perform your own troubleshooting before asking for help.
If you need SSH tunnelling then you should consider setting up key-based SSH access and an SSH agent with agent forwarding. Without them you will be repeatedly asked for your CCB password. Guides are easily found on the Internet.
How tunnelling works
An SSH tunnel creates a secure connection between two computers. By adding port forwarding to a tunnel, you can redirect network traffic to go through the tunnel. The key trick for our cluster is that you can chain multiple SSH tunnels together in several hops, and send traffic through them to systems that aren't accessible on the Internet. This allows you to connect your local web browser to a web service running on one of our compute nodes.
The key flag for setting up an SSH tunnel is -L.
There are many variations on this flag, however the one we use is in the form:
-L LOCAL_PORT:localhost:REMOTE_PORT
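This tells SSH to listen on LOCAL_PORT on the computer where you run the command and to pass any traffic arriving there through the tunnel to REMOTE_PORT on the computer you are connecting to. Note that localhost in this form is resolved at the remote end of the tunnel, not on your local computer.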
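As a rough illustration of the form above, the following shell sketch builds the argument for -L from two port numbers. The function name forward_spec, the port 8888 and the USER@HOST placeholder are assumptions made for this example only; they are not part of SSH or of our cluster setup.

```shell
# forward_spec LOCAL_PORT REMOTE_PORT
# Prints the argument for ssh's -L flag in the form LOCAL_PORT:localhost:REMOTE_PORT.
# (forward_spec is a hypothetical helper name, not an ssh feature.)
forward_spec() {
    for p in "$1" "$2"; do
        # Reject empty values and anything that is not a plain decimal number.
        case "$p" in
            ''|*[!0-9]*) echo "ports must be numeric" >&2; return 1 ;;
        esac
    done
    printf '%s:localhost:%s\n' "$1" "$2"
}

# Print (not run) a single-hop command that would listen on local port 8888
# and forward to port 8888 on the remote host. USER@HOST is a placeholder.
echo "ssh -L $(forward_spec 8888 8888) USER@HOST"
```

The sketch only prints the command, so it is safe to try anywhere before running a real tunnel.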
Choosing a port number
A network port can only be used by one program on each of the computers in question,
and there are relatively few rules on which ports to use.
Your program may not allow choosing which port it will run on,
but for our SSH tunnels we can choose.
For simplicity we recommend that you only use a single tunnel at a time,
and that you choose a port number equal to your numeric CCB login ID, which you can find with the command id -u.
For this example, we'll use port 20001 corresponding to user dtooke.
Please note that all login IDs on the cluster are in the range 20000--40000,
and that forwarding the port associated with another user is considered to be against our acceptable-use policy.
If you need to forward additional ports then use a number in the range 40001--60000.
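The port rules above can be sketched as a small shell check. The helper names is_login_id_port and is_extra_port are hypothetical and exist only for this illustration; the ranges are the ones stated above.

```shell
# Hypothetical helpers: succeed if the given port lies in the stated range.
is_login_id_port() { [ "$1" -ge 20000 ] && [ "$1" -le 40000 ]; }
is_extra_port()    { [ "$1" -ge 40001 ] && [ "$1" -le 60000 ]; }

# Your numeric login ID doubles as your recommended tunnel port.
port="$(id -u)"
if is_login_id_port "$port"; then
    echo "Use port $port for your tunnel"
else
    # Expected if you run this somewhere other than the cluster.
    echo "ID $port is outside 20000-40000; run this on the cluster" >&2
fi
```

Run on the login node as user dtooke, id -u returns 20001, so the sketch would suggest port 20001.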
Example
This is an example of running the command-line program cellxgene over an SSH tunnel with port forwarding.
To do this we require two separate SSH sessions: one tunnel to our login node,
then another tunnel from login to the compute node.
SSH session 1
Here, I log into JADE, start an interactive session and run my program:
ssh dtooke@login1.molbiol.ox.ac.uk
dtooke@imm-login1:~$ srun --partition=short --cpus-per-task=4 --mem=32G --pty bash -i
srun: job 17804 queued and waiting for resources
srun: job 17804 has been allocated resources
dtooke@imm-wn2:~$ module load cellxgene
Loading cellxgene/20230703
Loading requirement: python-base/3.11.3
dtooke@imm-wn2:~$ cellxgene launch https://github.com/chanzuckerberg/cellxgene/raw/main/example-dataset/pbmc3k.h5ad
[cellxgene] Starting the CLI...
[cellxgene] Loading data from pbmc3k.h5ad.
[cellxgene] Warning: Moving element from .uns['neighbors']['distances'] to .obsp['distances'].
This is where adjacency matrices should go now.
[cellxgene] Warning: Moving element from .uns['neighbors']['connectivities'] to .obsp['connectivities'].
This is where adjacency matrices should go now.
[cellxgene] Launching! Please go to http://localhost:5005 in your browser.
[cellxgene] Type CTRL-C at any time to exit.
This gives me two key pieces of information: the program is running on imm-wn2, and it is using port 5005.
SSH session 2
Now I create the first SSH tunnel:
$ ssh -A -L 5005:localhost:20001 dtooke@login1.molbiol.ox.ac.uk
Then create the second SSH tunnel:
dtooke@imm-login1:~$ ssh -N -L 20001:localhost:5005 dtooke@imm-wn2
[This command produces no output]
My first SSH command forwards my agent, so that the second one doesn't need a password, and sets up a port forward between port 5005 on my local computer and port 20001 on the login node. The second SSH command is a non-login session (logins are not permitted on worker nodes) that sets up a port forward between port 20001 on the login node and port 5005 on the worker node. The end result is that port 5005 on the worker node is forwarded to port 5005 on my local computer, and I can open the link directly in my browser. All of the traffic passes through port 20001 on the login node.
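Because the worker node and program port can differ on every run, it may help to generate the two commands rather than retype them. The sketch below only prints them; print_tunnel_commands is a hypothetical helper name, and the values passed to it are the ones from this example.

```shell
# print_tunnel_commands USER NODE APP_PORT TUNNEL_PORT
# Prints the two tunnel commands from this example without running them.
# (print_tunnel_commands is a hypothetical helper for illustration.)
print_tunnel_commands() {
    user="$1"; node="$2"; app_port="$3"; tunnel_port="$4"
    # Hop 1: run on your local computer.
    echo "ssh -A -L ${app_port}:localhost:${tunnel_port} ${user}@login1.molbiol.ox.ac.uk"
    # Hop 2: run on the login node, inside the session opened by hop 1.
    echo "ssh -N -L ${tunnel_port}:localhost:${app_port} ${user}@${node}"
}

print_tunnel_commands dtooke imm-wn2 5005 20001
```

Note how the two port numbers swap places between the first and second line.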
Closing the session
When I'm finished I need to do the following in order:
- Close my browser window
- Control-C in session 2 to end the second tunnel
- exit in session 2 to end the first tunnel
- Control-C in session 1 to end the program
- exit in session 1 to exit my interactive session
Troubleshooting
- Be certain that you have the correct worker node and port for the program that you're running; they may both change and you can't rely on past commands to work the same way twice.
- The second session with the SSH tunnel must go from your local computer to the login node, then from there to the worker node. Accidentally starting the second session directly from the login node will not work.
- Double check that you have the port numbers in the two tunnel hops the correct way around, and note that they are opposite and that the order matters.
- If you get a message in the form:
  bind [127.0.0.1]:20001: Address already in use
  channel_setup_fwd_listener_tcpip: cannot listen to port: 20001
  Could not request local forwarding.
  then there is another SSH session using that port. The most likely explanation is that you did not cleanly shut down your session, leaving an SSH tunnel running somewhere. If you are unable to find the process in question, the easiest solution is to pick a different port in the range 40001 to 60000 and try again.
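For the "Address already in use" case, one possible way to look for the leftover tunnel is to list listening TCP sockets on the machine that reported the error and filter for the port. This is a sketch under assumptions: it relies on the ss utility (part of iproute2 on most Linux systems) being installed, and listeners_on_port is a hypothetical helper name.

```shell
# listeners_on_port PORT
# Reads `ss -ltnp` style output on stdin and prints only the lines whose
# local address (4th column) ends in :PORT.
# (listeners_on_port is a hypothetical helper for illustration.)
listeners_on_port() {
    awk -v port="$1" '$4 ~ (":" port "$")'
}

# Only query the system if ss is available here.
if command -v ss >/dev/null 2>&1; then
    # -l listening sockets, -t TCP, -n numeric ports, -p owning process
    ss -ltnp 2>/dev/null | listeners_on_port 20001
fi
```

If a line is printed and it shows an ssh process that you own, ending that process with kill followed by its process ID should free the port.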
Getting help
You can email the CCB team at help@imm.ox.ac.uk. However, please note that, as described above, SSH tunnels and port forwarding are an advanced topic and that you are expected to have performed your own troubleshooting before asking for help.