Webpage for the University of Chicago Data Science Clinic
This document contains a basic introduction to using Slurm on the cluster. It should not be considered a complete reference. For additional information on UChicago's Slurm setup, see the UChicago CS Slurm How To Page. Note that that page describes a slightly different cluster than the DSI cluster, but nearly all the information should be the same.
There are two modes of using Slurm: (1) interactively (what this document details) and (2) non-interactively (sometimes called batch).
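Although this document focuses on interactive use, a minimal sketch of the batch mode may help make the distinction concrete. The script below is hypothetical: the partition name and resource values are simply copied from the interactive srun example later in this document, and the helper name describe_job is an illustration, not part of Slurm. It would typically be saved to a file (for example job.sh, an assumed name) and submitted with sbatch job.sh.

```shell
#!/bin/bash
# Partition name borrowed from the interactive example (assumption for batch use).
#SBATCH --partition=general
# Request one GPU.
#SBATCH --gres=gpu:1
# Memory in megabytes.
#SBATCH --mem=1000
# Time limit of 90 minutes (minutes:seconds format).
#SBATCH --time=90:00
# Write output to slurm-<jobid>.out; %j is replaced by the job ID.
#SBATCH --output=slurm-%j.out

# Print the job ID and node name. SLURM_JOB_ID is set by Slurm inside a job;
# outside of Slurm the function prints "none" instead.
describe_job() {
    echo "Job ${SLURM_JOB_ID:-none} on $(uname -n)"
}

describe_job
```

Because the #SBATCH lines are ordinary comments to bash, the script can also be run directly to check it for syntax errors before submitting.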
Before starting this, make sure that you are:
The login node is fe.
Compute node names begin with g, h, i, j, k, l and m. Nodes with matching prefixes have similar hardware.
Connect to the login node with ssh fe.ds.
If ssh reports an error, this probably means that you did not connect.
Check your location with the pwd command. Running this command should return /home/USERNAME, where USERNAME is your CNET ID.
/net/projects and /net/projects2 hold shared project directories. Access is limited and managed through Unix user groups; contact techstaff if you do not have access to a directory required for a project. This is the primary location where data should be placed.
/net/scratch and /net/scratch2 are open areas where anyone can put anything. Note that this storage is ephemeral: any data put here may be deleted at any time.
The /net directories are network storage drives and are available on any node in the cluster.
A common use of the cluster is running code from a GitHub repository. If you have followed the instructions on how to set up ssh for the cluster, you should be able to quickly clone any repo you have access to on GitHub.
Connect to the cluster with ssh if you have not already.
Run ssh -T git@github.com, which should return your username. If it does not, ssh is not set up properly. Please use the ssh docs above to identify which system is not set up correctly.
Confirm that you are in your home directory by running pwd and checking that it prints /home/USERNAME, where USERNAME is your CNET ID. If this is not your current working directory, type cd to return to your home directory.
Run git clone COPIED_VALUE to clone the repo to your home directory. Verify that no errors were printed and that the repo was properly cloned.
Connect to the cluster with ssh, then download and run the Miniconda installer:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh
You can accept the defaults. Make sure you select yes when it asks to run conda init. This will ensure conda is activated by default. Close and re-open your terminal so the change takes effect.
conda create --name PROJECT_NAME python=3.11
conda activate PROJECT_NAME
pip install -r requirements.txt
where PROJECT_NAME is the name of the project you are working on. From now on, whenever you log into the cluster, make sure you run conda activate PROJECT_NAME.
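Forgetting to activate the environment is an easy mistake, so a small guard can be useful. The sketch below assumes only that conda sets the CONDA_DEFAULT_ENV variable when an environment is activated; the function name check_env is hypothetical.

```shell
# check_env NAME: succeed only if the named conda environment is active.
# Relies on CONDA_DEFAULT_ENV, which conda sets on activation.
check_env() {
    local expected="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
        echo "ok: $expected is active"
    else
        # Report the mismatch on stderr and signal failure to the caller.
        echo "warning: expected $expected, found ${CONDA_DEFAULT_ENV:-none}" >&2
        return 1
    fi
}
```

For example, running check_env PROJECT_NAME before launching a script would warn you if you are still in the base environment.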
Ensure VS Code uses the correct Python environment. When a Python file is open and selected, click the Python version number on the bottom right and select the interpreter for PROJECT_NAME. If it is not listed, the path is /home/USERNAME/miniconda3/envs/PROJECT_NAME/bin/python, where USERNAME is your CNET ID.
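If the interpreter is not listed, it can help to confirm that the path actually exists before pasting it into VS Code. This is a small sketch, assuming Miniconda was installed to the default ~/miniconda3 location; the function name env_python is hypothetical and PROJECT_NAME is a placeholder for your environment name.

```shell
# env_python NAME: print the interpreter path for a conda environment,
# assuming the default Miniconda install location under $HOME.
env_python() {
    echo "$HOME/miniconda3/envs/$1/bin/python"
}

# Check whether the interpreter exists and is executable.
path="$(env_python PROJECT_NAME)"
if [ -x "$path" ]; then
    echo "found: $path"
else
    echo "missing: $path"
fi
```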
Install ipykernel in the PROJECT_NAME environment:
conda install -n PROJECT_NAME ipykernel --update-deps --force-reinstall
With a Jupyter notebook open, click the Python version number in the upper right and select the kernel for PROJECT_NAME. You may need to refresh the list of available kernels using the icon in the upper right of the menu.
sbatch.
The steps below describe how to set up VS Code. Before proceeding, please make sure that it is installed.
Traditionally, one would ssh in a terminal and be restricted to command-line text editors like Vim. The Remote - SSH extension lets us work, for the most part, as if we were developing on our local machine, and it has less of a learning curve. More information is available on the extension's page.
Install Remote - SSH. Click 'Extensions' on the menu at the left side of VS Code (its icon is four squares with the top right one pulled away). Search for and install Remote - SSH.
From the Command Palette, run Open User Settings (JSON). If the file is empty, paste the following:
{
    "remote.SSH.defaultExtensions": [
        "ms-toolsai.jupyter",
        "ms-toolsai.jupyter-renderers",
        "ms-python.python",
        "ms-python.vscode-pylance"
    ]
}
Otherwise, add a comma to the end of the current last item and add the following before the closing }:
    "remote.SSH.defaultExtensions": [
        "ms-toolsai.jupyter",
        "ms-toolsai.jupyter-renderers",
        "ms-python.python",
        "ms-python.vscode-pylance"
    ]
From the Command Palette, run Remote-SSH: Connect to Host... and you should see fe.ds as an option. Select it. Otherwise, you can try typing in fe.ds.
Once connected, the bottom-left corner of the window should show SSH: fe.ds, signifying that you are using the SSH extension and are connected to the host fe.ds.
Click File, then Open Folder, and select your repository folder.
Never run any large code jobs when connected to the login node. All Python code should be run only after connecting to a compute node!
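One way to avoid running code on the login node by accident is to check the hostname first. The sketch below assumes that login node hostnames begin with fe (as in fe.ds) and that compute nodes do not; the function name is_login_node is hypothetical.

```shell
# is_login_node [HOSTNAME]: succeed if the hostname looks like a login node.
# Assumption: login node names start with "fe". Defaults to the current host.
is_login_node() {
    case "${1:-$(uname -n)}" in
        fe*) return 0 ;;
        *)   return 1 ;;
    esac
}

if is_login_node; then
    echo "On the login node: request a compute node before running code."
else
    echo "Not on the login node."
fi
```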
Connect to the login node with ssh fe.ds.
Request an interactive session on a compute node with srun -p general --gres=gpu:1 --pty --mem 1000 -t 90:00 /bin/bash. Once your request has been granted, your command prompt will change to something like USERNAME@hostname, where hostname is probably something like g004.
In VS Code, run Remote-SSH: Connect to Host.... Select it and type in HOSTNAME.ds as your host, replacing HOSTNAME with the hostname from above.
Common errors and troubleshooting have moved to Troubleshooting.