5. News & Events

General information about maintenance applies to all scheduled downtime periods.
Information specific to each maintenance is given in the entries below.
All current users are notified via the mailing list hpc-users@uci.edu about:

    • the schedule of each specific maintenance

    • the completion of the maintenance

5.1. June 16, 2026

HPC3 Downtime June 16, 2026
2026-06-01 by Nick Santucci
The next planned downtime will be June 16, 2026, beginning at 8am.
The cluster will be unavailable all day.
Maintenance Items:
  1. Apply the latest Rocky 9.7 OS updates on user-facing cluster resources (compute, GPU, login, and JupyterHub portal servers)

  2. Update some of the Dell top-of-rack switches from OS9 to OS10

  3. Update all dfsX shared keys stored in HashiCorp Vault

  4. Update the MUNGE keys (used to create and validate access credentials between hosts running Slurm) stored in HashiCorp Vault

Impacts:

This is a full outage.

  1. All existing logins will be terminated.

  2. You will NOT have access to HPC3 during the planned downtime.

  3. You will still have access to CRSP using the web-based file browser or the CRSP Desktop App.

  4. No Slurm jobs can run.

User Action required:

Attention

Slurm jobs submitted too close to the maintenance window can remain pending with the reason (ReqNodeNotAvail, Reserved for maintenance). This affects jobs that are not guaranteed (via TimeLimit) to complete before 8am on the day of maintenance. These jobs will need to be canceled and resubmitted after the maintenance. Please see requesting time limits on queues.
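As a rough illustration of the TimeLimit constraint, the sketch below computes the longest time limit that still ends before the maintenance window opens. The function name `max_time_limit` and the 30-minute safety margin are assumptions made for this example, not site policy. The result is only meaningful if the job starts right away, because Slurm counts the time limit from the job's start, not from its submission.

```python
from datetime import datetime, timedelta
from typing import Optional


def max_time_limit(now: datetime,
                   maintenance_start: datetime,
                   margin: timedelta = timedelta(minutes=30)) -> Optional[str]:
    """Return the longest limit (D-HH:MM:SS) that ends before maintenance.

    Returns None when no usable time is left. The margin is a hypothetical
    safety buffer chosen for this sketch.
    """
    remaining = maintenance_start - now - margin
    if remaining <= timedelta(0):
        return None
    total = int(remaining.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


# Maintenance begins June 16, 2026 at 8am; the "now" value is an example.
maintenance = datetime(2026, 6, 16, 8, 0)
print(max_time_limit(datetime(2026, 6, 14, 20, 0), maintenance))  # 1-11:30:00
```

The returned string uses the days-hours:minutes:seconds form that Slurm accepts for a job's time limit.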

  1. Save your work:

    • cancel all Slurm jobs

    • stop any containers running via the JupyterHub portal

    • stop any VS Code instances

  2. Log out
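Before canceling, it can help to see which of your jobs are held back by the maintenance reservation. The sketch below is one possible way to do that: it parses text assumed to come from `squeue -h -u $USER -o "%i %T %r"` (job ID, state, reason) and returns the IDs of pending jobs whose reason starts with ReqNodeNotAvail. The function name and the sample text are made up for illustration, and the exact reason wording may differ between Slurm versions, so only the prefix is matched.

```python
from typing import List


def jobs_blocked_by_maintenance(squeue_output: str) -> List[str]:
    """Return IDs of pending jobs whose reason starts with ReqNodeNotAvail."""
    blocked = []
    for line in squeue_output.splitlines():
        # Split into at most three fields so a reason containing spaces
        # stays in one piece.
        parts = line.split(maxsplit=2)
        if len(parts) < 3:
            continue
        job_id, state, reason = parts
        if state == "PENDING" and reason.startswith("ReqNodeNotAvail"):
            blocked.append(job_id)
    return blocked


# Hypothetical squeue output used only to exercise the parser.
sample = """\
1001 RUNNING None
1002 PENDING ReqNodeNotAvail, Reserved for maintenance
1003 PENDING Priority
"""
print(jobs_blocked_by_maintenance(sample))  # ['1002']
```

Each returned ID can then be passed to `scancel` and the job resubmitted after the maintenance.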

5.2. March 24, 2026

HPC3 Downtime March 24, 2026
2026-02-26 by Nadya Williams
The next outage will be March 24, 2026, beginning at 8am.
The cluster will be unavailable all day.
Maintenance Items:
  1. Replace the core Ethernet switching infrastructure for HPC3 (Arista switches purchased in 2019) with two SONiC-based switches

  2. Upgrade BeeGFS to 8.3.0 on all DFS servers and cluster-wide

  3. Apply the latest Rocky 9.7 OS updates on all cluster nodes

  4. Update NAS servers to Rocky 9.7

  5. Update the applications stack to Rocky 9.7

Impacts:

This is a full outage.

  1. All existing logins will be terminated.

  2. You will NOT have access to HPC3 during the planned downtime.

  3. You will still have access to CRSP using the web-based file browser or the CRSP Desktop App.

  4. No Slurm jobs can run.

User Action required:

Attention

Slurm jobs submitted too close to the maintenance window can remain pending with the reason (ReqNodeNotAvail, Reserved for maintenance). This affects jobs that are not guaranteed (via TimeLimit) to complete before 8am on the day of maintenance. These jobs will need to be canceled and resubmitted after the maintenance. Please see requesting time limits on queues.
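To check a specific request, the sketch below parses a Slurm-style time limit and tests whether a job starting at a given moment would end by 8am on the maintenance day. The helper names `parse_slurm_time` and `finishes_before` are hypothetical. The parser covers the commonly documented forms (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, days-hours:minutes:seconds) and is not a complete validator.

```python
from datetime import datetime, timedelta


def parse_slurm_time(spec: str) -> timedelta:
    """Convert a Slurm-style time limit string into a timedelta."""
    days = 0
    if "-" in spec:
        # Forms with a day count: D-HH, D-HH:MM, D-HH:MM:SS
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        fields = [int(x) for x in rest.split(":")]
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        fields = [int(x) for x in spec.split(":")]
        if len(fields) == 1:      # MM
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:    # MM:SS
            hours, minutes, seconds = 0, fields[0], fields[1]
        else:                     # HH:MM:SS
            hours, minutes, seconds = fields
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def finishes_before(start: datetime, time_limit: str,
                    maintenance_start: datetime) -> bool:
    """True if a job starting at `start` ends by `maintenance_start`."""
    return start + parse_slurm_time(time_limit) <= maintenance_start


# Maintenance begins March 24, 2026 at 8am; the start time is an example.
maintenance = datetime(2026, 3, 24, 8, 0)
start = datetime(2026, 3, 22, 9, 0)
print(finishes_before(start, "2-00:00:00", maintenance))  # False
print(finishes_before(start, "1-12", maintenance))        # True
```

A job for which this check is false is one that the scheduler cannot guarantee will finish in time, so it would wait until after the maintenance.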

  1. Save your work:

    • cancel all Slurm jobs

    • stop any containers running via the JupyterHub portal

    • stop any VS Code instances

  2. Log out