On Wednesday, February 1, 2017, from 8:00 a.m. to 4:00 p.m. CST, MSI staff will perform scheduled maintenance and upgrades to the network and various MSI systems.
During this month's scheduled maintenance period, MSI will apply updates that prevent jobs from using more memory or CPU resources than they were allocated. In the past, jobs that exceeded their allocation would sometimes consume all of the memory on a node, causing other jobs, and even entire nodes, to crash.
Most MSI users will not be impacted by this change because most jobs submitted to MSI HPC queues do not exceed their resource allocation. However, after the update, jobs will be terminated if they attempt to exceed the amount of memory or CPU resources that they were allocated. More details on this update and the others that will be performed can be found below.
February maintenance will include:
- Implement Control Groups (also known as cgroups) in MOAB and Torque (see above)
- B40 air test: The air handlers will be turned off during this test; as a result, the clusters must remain idle throughout maintenance day
- MAM update: Self-service account management functions on the website (e.g., new user registrations and group membership management) will be disabled during this update
- Nokomis switch firmware update: Impacts Nokomis blades
- Firewall firmware update
- CentOS and Debian updates for lab machines
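Because jobs that exceed their memory allocation will be terminated once cgroups are enforced, it may help to check how much memory a program actually uses before choosing a memory request. The following is a minimal Python sketch, not an MSI-provided tool: the function names, the 10% headroom, and the 2048 MB request are hypothetical values chosen for illustration. It reads the peak resident memory of the current process only (not child processes), so treat the result as a rough estimate; cgroup accounting may count memory differently.

```python
import resource
import sys


def peak_memory_mb(maxrss, platform=sys.platform):
    """Convert an ru_maxrss value to megabytes.

    Linux reports ru_maxrss in kilobytes; macOS reports it in bytes.
    """
    if platform == "darwin":
        return maxrss / (1024 * 1024)
    return maxrss / 1024


def fits_allocation(peak_mb, requested_mb, headroom=0.10):
    """Return True if the peak usage plus a safety margin fits the request.

    The 10% default headroom is an assumption, not an MSI recommendation.
    """
    return peak_mb * (1 + headroom) <= requested_mb


if __name__ == "__main__":
    # Stand-in for the real workload: allocate a list so there is
    # something to measure.
    data = [0] * 1_000_000

    usage = resource.getrusage(resource.RUSAGE_SELF)
    peak = peak_memory_mb(usage.ru_maxrss)

    requested_mb = 2048  # hypothetical memory request for the job
    print(f"Peak resident memory: {peak:.1f} MB")
    print(f"Fits a {requested_mb} MB request: {fits_allocation(peak, requested_mb)}")
```

Running a check like this at the end of a small test run gives a starting point for the memory value to request; if the result is close to the limit, requesting more memory is the safer choice.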
Systems status is always available on our Status page.
If you have any questions, please email help@msi.umn.edu.