How do I update my workflow after the software library migration?

Updated 10/18/23 with specific issues and solutions.

 

Description of the migration

What is happening?

MSI has moved its software library to a new location to migrate away from hardware that is being decommissioned. The library has been copied to the new location and patched as thoroughly as possible to accommodate the move, but workflows that reference absolute paths in the old location will need to be updated. The decommissioned software install paths and their equivalents in the new location are:
 
Original Path          New Path
/soft                  /common/software/install/migrated
/panfs/roc/msisoft     /common/software/install/migrated
/panfs/roc/soft/el6    /common/software/install/migrated.softel6
/panfs/roc/intel       /common/software/install/migrated.intel
 
If you reference any of the old paths above, make sure to update those references to point to the new locations. For example, if you reference the file:
 
/panfs/roc/msisoft/gcc/8.2.0/bin/gcc
 
in a script, you would want to update this reference to:
 
/common/software/install/migrated/gcc/8.2.0/bin/gcc
 
The modulefile directories have also moved. The old paths and their new equivalents are:
 
Original Path                          New Path
/panfs/roc/soft/modulefiles.common     /common/software/modulefiles/migrated/common
/panfs/roc/soft/modulefiles.hpc        /common/software/modulefiles/migrated/hpc
/panfs/roc/soft/modulefiles.centos7    /common/software/modulefiles/migrated/centos7
/panfs/roc/soft/modulefiles.mesabi     /common/software/modulefiles/migrated/mesabi
/panfs/roc/soft/modulefiles.mangi      /common/software/modulefiles/migrated/mangi
/panfs/roc/soft/modulefiles.k40        /common/software/modulefiles/migrated/k40
/panfs/roc/soft/modulefiles.v100       /common/software/modulefiles/migrated/v100
/panfs/roc/soft/modulefiles.legacy     /common/software/modulefiles/migrated/legacy
/panfs/roc/intel/modulefiles           /common/software/modulefiles/migrated/intel
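If you have many scripts to update, the substitutions in the two tables above can be applied mechanically. The sketch below is one way to do that with GNU sed; the function name migrate_paths is hypothetical, and it assumes bash and GNU sed (for sed -E). It reads text on standard input and writes the rewritten text to standard output, so nothing is changed in place. The more specific prefixes are rewritten first, and a bare '/soft/' is only rewritten when it starts a path (so a directory like '/home/me/soft/' is left alone).

```shell
# Hypothetical helper: rewrite the old MSI install and modulefile prefixes
# to their new locations. Reads stdin, writes stdout. Assumes GNU sed (-E).
migrate_paths() {
    sed -E \
        -e 's#/panfs/roc/soft/modulefiles\.(common|hpc|centos7|mesabi|mangi|k40|v100|legacy)#/common/software/modulefiles/migrated/\1#g' \
        -e 's#/panfs/roc/intel/modulefiles#/common/software/modulefiles/migrated/intel#g' \
        -e 's#/panfs/roc/soft/el6#/common/software/install/migrated.softel6#g' \
        -e 's#/panfs/roc/intel#/common/software/install/migrated.intel#g' \
        -e 's#/panfs/roc/msisoft#/common/software/install/migrated#g' \
        -e 's#(^|[^[:alnum:]_./-])/soft/#\1/common/software/install/migrated/#g'
    # The last rule only matches "/soft/" at the start of a line or after a
    # character that cannot be part of a path (e.g. "=", space, quote).
}
```

For example, migrate_paths < myjob.sh > myjob.sh.new followed by diff myjob.sh myjob.sh.new lets you review every change before replacing the original script. Old paths not listed in the tables are left untouched and still need to be checked by hand.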

 

In migrating the more than 5,000 software installations in the current library to the new location, MSI made a significant effort to patch and update the installations so they work the same at the new location as they did at the old one. The main changes fall into the following categories:
  • Symlinks that refer to a '/panfs' directory have been updated to point to the equivalent location in '/common/software'
  • Configuration files containing references to '/panfs' directories have been updated to reference the equivalent locations
  • Executable files containing '/panfs' directories in their RPATH or RUNPATH have been patched to refer to the equivalent directories
  • Modulefiles have been made more specific to ensure that dependencies are found correctly (e.g. updating PERL5LIB for perl modules to the new location)
  • Modules that could not be fixed by any of the above methods have been reinstalled
These changes should address most of the issues that would arise from moving the software library to a new location. However, because the operation involved parsing and patching millions of files, there are likely issues that we didn't anticipate or couldn't test for. This page lists some common issues and strategies for resolving them.
 

Why make this change?

This migration is motivated by the warranty expiring on the Panasas hardware that currently stores the library, which means that hardware will need to be decommissioned soon. Unfortunately, our previous software installation procedure makes moving the library to new hardware challenging. The installation paths we have used for more than 6 years all start with the '/panfs/roc' prefix, which directly references the hardware being decommissioned. The software library is being moved to our new VAST storage solution (the appliance that currently hosts /scratch.global), which cannot be mounted at the same path as the older Panasas hardware. The new home for the software library is under '/common/software', a hardware-agnostic path that we can remap to new storage solutions in the future as needed. This will also make it easier to introduce new package management technologies like spack and singularity-hpc to support a broader range of software packages in a more durable manner.
 

When did this happen?

This migration happened in two phases. In the first phase, we updated your environment to reference the new library location and set the old library location to read-only. This change was relatively transparent for users because the files still existed in their old locations, and most of the errors we saw came from hidden references to the old locations. The second phase turned off the old location entirely, and this is the phase responsible for most of the issues we are now seeing. The schedule for these phases was:
 
Phase 1 - September 6, during maintenance
Phase 2 - October 4, during maintenance
 
If a workflow that worked before either of these phases has broken since, and you have already updated the referenced absolute paths as described above, please contact help@msi.umn.edu with the text of any error messages you are encountering, and we will help you update your workflows as necessary.
 

Changes you might need to make

Check your bashrc for references to old paths

Many software packages and workflow customizations modify your bashrc file, which is used to initialize settings for new shell sessions. You might define or modify environment variables, define functions and aliases, or load modules, among other customizations. Some software packages like conda automatically modify your bashrc file to enable special features, so your bashrc may have changed even if you've never opened it yourself.

 

The file is located at ~/.bashrc and is a plain-text file that you can open with your favorite text editor. Check this file for references to any of the old paths to software installs or modulefiles, and update them to the new paths or remove them if the modifications to your environment are no longer necessary.
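To find those references without reading the whole file, you can search it for the old prefixes. The sketch below wraps grep in a hypothetical function, find_old_paths, that defaults to ~/.bashrc but accepts any list of files (for example ~/.bash_profile or your job scripts). It assumes bash and GNU grep, and it prints each match as file:line:text.

```shell
# Hypothetical helper: list lines that mention one of the old software or
# modulefile locations. Defaults to ~/.bashrc when no files are given.
find_old_paths() {
    if [ "$#" -eq 0 ]; then
        set -- "$HOME/.bashrc"
    fi
    # -H: always print the file name, -n: print line numbers, -E: extended regex.
    # The second alternative matches "/soft/" only at the start of a path.
    grep -HnE '/panfs/roc/(msisoft|soft|intel)|(^|[^[:alnum:]_./-])/soft/' "$@"
}
```

Running find_old_paths with no arguments checks your bashrc; grep exits with a non-zero status when nothing is found, which is what you want to see after cleaning up.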

 

As a common example of this, if you ever ran 'conda init' or an equivalent command, you will have a block in your ~/.bashrc file that looks like the following:

 
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/panfs/roc/msisoft/mamba/0.11.3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/panfs/roc/msisoft/mamba/0.11.3/etc/profile.d/conda.sh" ]; then
        . "/panfs/roc/msisoft/mamba/0.11.3/etc/profile.d/conda.sh"
    else
        export PATH="/panfs/roc/msisoft/mamba/0.11.3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
The specific paths referenced will differ depending on which conda installation you used to run 'conda init', but any references here to a location starting with '/panfs/roc' will need to be updated. Alternatively, you could delete this section of your ~/.bashrc file, load a new conda module, and run 'conda init' again to regenerate this section with updated path references.
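If you would rather edit the existing block than regenerate it, the substitution can be limited to the lines between the conda initialize markers. The sketch below uses a hypothetical function, fix_conda_block, that assumes GNU sed (for sed -i) and only rewrites the '/panfs/roc/msisoft' prefix used in the example above; conda installs under the other old prefixes would need the corresponding row from the path table. It saves a backup copy first.

```shell
# Hypothetical helper: rewrite /panfs/roc/msisoft inside the
# "conda initialize" block only. Assumes GNU sed (-i).
fix_conda_block() {
    local rc="${1:-$HOME/.bashrc}"
    # Keep the original so the change is easy to undo.
    cp -- "$rc" "$rc.bak" || return 1
    # The address range limits the substitution to the managed block.
    sed -i '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</ s#/panfs/roc/msisoft#/common/software/install/migrated#g' "$rc"
}
```

After running it, start a new shell (or run source ~/.bashrc) and check that conda still initializes; ~/.bashrc.bak holds the previous version.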

 

Load additional modules during your jobs

Some compiled software hardcodes hints about the location of its dependencies when you build it. Later, when you run the software, it uses these hints to find library files and other dependencies that are not otherwise visible in your current environment. Unfortunately, these hints no longer work for software that was built before the migration. As a result, you may start seeing 'missing library' errors in workflows that previously worked without issue.

Often you can resolve this by loading the module corresponding to the missing dependency. If you are unsure which module you should load, it will usually be one or more of the modules you loaded when you originally compiled the software. Common modules that might need to be loaded like this include gcc, cuda, and mkl.
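Before guessing which modules to load, it can help to see exactly which libraries the dynamic loader cannot find. The sketch below uses the standard ldd tool, which marks unresolved libraries as "=> not found"; the function name missing_libs is hypothetical. Run it once before and once after loading a candidate module to see whether the list shrinks.

```shell
# Hypothetical helper: print the names of shared libraries that the
# dynamic loader cannot resolve for the given executable or library.
missing_libs() {
    # ldd prints lines like "libfoo.so.16 => not found" for missing entries;
    # awk keeps only the library name from those lines.
    ldd "$1" | awk '/=> not found/ { print $1 }'
}
```

For example, missing_libs ./mytool might list a libgfortran or libcudart file, which would point toward the gcc or cuda module respectively.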

 

Patch or rebuild software that hard-codes old paths

If you have built your own software that dynamically references libraries in the old locations and you are unable to address the issue by loading additional modules, you can either rebuild the software from scratch or else try patching the executable with a tool like patchelf or chrpath. Using a cmake executable as an example, patching an executable with patchelf might look something like:
 
$ module load patchelf

# examine the executable to see the current RPATH or RUNPATH
$ patchelf --print-rpath ./cmake
/panfs/roc/msisoft/gcc/8.1.0/lib64:/panfs/roc/msisoft/isl/0.19_gcc8.1.0/lib:/panfs/roc/msisoft/mpc/1.1.0_gcc8.1.0/lib:/panfs/roc/msisoft/mpfr/4.0.1_gcc8.1.0/lib:/panfs/roc/msisoft/gmp/6.1.2_gcc8.1.0/lib

# update the executable with a new RPATH or RUNPATH
$ patchelf --set-rpath /common/software/install/migrated/gcc/8.1.0/lib64:/common/software/install/migrated/isl/0.19_gcc8.1.0/lib:/common/software/install/migrated/mpc/1.1.0_gcc8.1.0/lib:/common/software/install/migrated/mpfr/4.0.1_gcc8.1.0/lib:/common/software/install/migrated/gmp/6.1.2_gcc8.1.0/lib ./cmake
 
Not all software can be patched this way, however. If patchelf doesn't work in your case, you will likely need to rebuild the software from scratch.
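Typing the new RPATH by hand is error-prone when it has many entries. The sketch below derives it from the current one instead; new_rpath and repath_elf are hypothetical names, the script assumes bash, and it only handles the '/panfs/roc/msisoft' prefix from the example above. repath_elf uses only the --print-rpath and --set-rpath options shown above and keeps a copy of the original file as FILE.orig.

```shell
# Hypothetical helper: map the old msisoft prefix to the new one in a
# colon-separated RPATH/RUNPATH string.
new_rpath() {
    local from=/panfs/roc/msisoft to=/common/software/install/migrated
    # ${var//pattern/replacement} replaces every occurrence in the string.
    printf '%s\n' "${1//"$from"/$to}"
}

# Hypothetical helper: rewrite the RPATH/RUNPATH of one ELF file.
# Requires patchelf on PATH (e.g. after "module load patchelf").
repath_elf() {
    local file="$1" old new
    old=$(patchelf --print-rpath "$file") || return 1
    new=$(new_rpath "$old")
    if [ "$old" = "$new" ]; then
        echo "no old paths in RPATH/RUNPATH of $file" >&2
        return 0
    fi
    # Back up the original before modifying it in place.
    cp -p -- "$file" "$file.orig" || return 1
    patchelf --set-rpath "$new" "$file"
}
```

With this, the cmake example reduces to repath_elf ./cmake. If the patched program still fails, restore it from ./cmake.orig and rebuild instead.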
 

Re-link files that point to old paths

Another potential issue shows up as 'Host is down:' errors when you try to run an executable. This happens when the executable in your environment is actually a symbolic link to a file in a '/panfs/roc' location. You can find broken links of this type by running:

find ~/.conda/envs -xtype l
This example will print out a list of the broken links in your conda environments, but you can target any directory where you suspect broken links by modifying this command as needed. Depending on how many broken links you have, you may be able to manually relink them to the same executable in a new location. For instance, you might see the following output for a broken python executable in a conda environment named 'myenv':
 
/home/users/3/dunn0404/.conda/envs/myenv/bin/python
 
You can find out where this link is pointing via:
ls -lha /users/3/dunn0404/.conda/envs/myenv/bin/python
lrwxrwxrwx. 1 dunn0404 msistaff 42 Oct 12 12:08 /users/3/dunn0404/.conda/envs/myenv/bin/python -> /panfs/roc/msisoft/mamba/0.11.3/bin/python
Then, to relink this to the equivalent file in the new software library location, you could run:
ln -nsf /common/software/install/migrated/mamba/0.11.3/bin/python /users/3/dunn0404/.conda/envs/myenv/bin/python
If you have many broken links like this, it will likely be easier to re-create the environment from scratch. If this isn't feasible and you need help from MSI to preserve the original environment, please reach out to help@msi.umn.edu.
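If there are more broken links than you want to fix by hand but you would still like to keep the environment, the find and ln steps above can be combined in a loop. The sketch below is one way to do it, assuming bash and GNU find; relink_broken is a hypothetical name. It only touches broken links whose target starts with the old prefix, and it only relinks when the corresponding file actually exists in the new location, reporting the rest.

```shell
# Hypothetical helper: repoint broken symlinks under DIR from the old
# prefix to the new one. Usage: relink_broken DIR [OLD_PREFIX] [NEW_PREFIX]
relink_broken() {
    local dir="$1"
    local from="${2:-/panfs/roc/msisoft}" to="${3:-/common/software/install/migrated}"
    local link target newtarget
    # -xtype l selects symlinks whose target cannot be resolved.
    while IFS= read -r -d '' link; do
        target=$(readlink -- "$link")
        case "$target" in
            "$from"/*) newtarget="$to${target#"$from"}" ;;
            *) continue ;;  # not an old-library link; leave it alone
        esac
        if [ -e "$newtarget" ]; then
            ln -nsf -- "$newtarget" "$link"
            echo "relinked $link -> $newtarget"
        else
            echo "no replacement for $link (target $target)" >&2
        fi
    done < <(find "$dir" -xtype l -print0)
}
```

For the example above, relink_broken ~/.conda/envs/myenv would repoint bin/python to the mamba install under /common/software/install/migrated.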

 

Common issues you might see

Host is down errors

Since the old software library was located on a network storage appliance that has now been partially turned off, you might see errors of the type:

 

Host is down:

 

when trying to run software, even when you wouldn't expect the particular command you are using to need to access another host. This error is showing up because some part of the command, usually the location of an executable file, references one of the old software install locations. So far we've seen this most commonly with python, R, Rscript, and ruby commands that use a conda environment.

The resolution for this issue is usually to update broken links to the old software paths and remove references to old paths in your bashrc.

 

Missing libraries

One of the more common issues you might see is a missing library file. These errors will look something like the following:

error while loading shared libraries: libsomething.so.16: cannot open shared object file: No such file or directory

This error indicates that the library 'libsomething.so.16' isn't available in your environment. The resolution is usually to load the module that provides this dependency or to patch the impacted executables to reference the updated paths.
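If you are not sure which module provides the missing library, searching the migrated install tree for the file name can narrow it down, since the package and version directories in the result usually match a module name and version. The sketch below is a hypothetical helper, locate_lib; the default search root and the depth limit of 4 (package/version/libdir/file) are assumptions about the directory layout, and the search may take a while on a large tree.

```shell
# Hypothetical helper: search for a library file by name.
# Usage: locate_lib NAME [ROOT]
locate_lib() {
    local name="$1"
    local root="${2:-/common/software/install/migrated}"
    # -maxdepth 4 covers <package>/<version>/<libdir>/<file>; deeper layouts
    # need a larger value. Errors from unreadable directories are discarded.
    find "$root" -maxdepth 4 -name "$name" 2>/dev/null
}
```

For example, a result such as /common/software/install/migrated/gcc/8.2.0/lib64/... would suggest loading the gcc/8.2.0 module.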

 

Conda environments not working

Due to the specifics of how they are installed, conda environments are especially prone to issues from the software migration. A conda environment can fail in a variety of ways after the migration, but you can address most of them by updating broken links to the old software paths and removing references to old paths in your bashrc.

One additional issue you may see is an error about an SSL CA certificate that prevents you from creating new environments. You can fix this by manually specifying the location of the certificate file for the conda module you are using. For instance, if you are using the 'mamba' module, you might do the following:

Find the root of the module install:

$ module show mamba
-------------------------------------------------------------------
/common/software/modulefiles/migrated/common/mamba/0.11.3:

prepend-path    PATH /common/software/install/migrated/mamba/0.11.3/bin
-------------------------------------------------------------------

The root directory for this module is the directory that contains 'bin', so in this case it would be:

/common/software/install/migrated/mamba/0.11.3

The SSL certificate for conda modules is located under '$root/ssl/cert.pem', which in this case would be:

/common/software/install/migrated/mamba/0.11.3/ssl/cert.pem

You can then indicate the location of this certificate to your conda config by running:

conda config --set ssl_verify /common/software/install/migrated/mamba/0.11.3/ssl/cert.pem

At this point you should be able to create new conda environments again without SSL errors.
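The steps above can also be scripted so that you don't need to read the root directory out of the module show output. The sketch below is a hypothetical helper, conda_cert_path, that assumes bash and that the loaded module puts a conda executable at $root/bin/conda; it uses type -P so that the PATH entry is found even when 'conda init' has defined a shell function named conda.

```shell
# Hypothetical helper: print $root/ssl/cert.pem for the conda executable
# found on PATH (or the one given as the first argument).
conda_cert_path() {
    local conda_bin="${1:-$(type -P conda)}" root
    [ -n "$conda_bin" ] || { echo "conda not found on PATH" >&2; return 1; }
    # $root/bin/conda -> $root
    root=$(dirname -- "$(dirname -- "$conda_bin")")
    printf '%s/ssl/cert.pem\n' "$root"
}
```

After module load mamba, something like cert=$(conda_cert_path) && [ -f "$cert" ] && conda config --set ssl_verify "$cert" applies the same setting, and does nothing if the certificate file is not where this layout assumes.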

 

R libraries not working

Because R libraries are installed using a variety of approaches, some of the libraries you have installed in your home directory may no longer work after the migration. The most common errors we've seen with R environments are due to missing libraries. These issues can usually be resolved by loading the module that provides the missing dependency or by patching the impacted executables to reference the updated paths.
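To find which of your installed R packages are affected, you can check the compiled shared objects in your personal library with ldd. The sketch below is a hypothetical helper, scan_so_files, that assumes bash and R's default per-user library location under ~/R, where packages with compiled code keep a .so file in their libs directory. If your library lives elsewhere (check .libPaths() in R), pass that directory instead.

```shell
# Hypothetical helper: list package shared objects under DIR (default ~/R)
# that have at least one dependency the dynamic loader cannot find.
scan_so_files() {
    local dir="${1:-$HOME/R}" so
    while IFS= read -r -d '' so; do
        # ldd marks unresolved dependencies with "=> not found".
        if ldd "$so" 2>/dev/null | grep -q '=> not found'; then
            printf '%s\n' "$so"
        fi
    done < <(find "$dir" -name '*.so' -path '*/libs/*' -print0)
}
```

Each printed path names the package directory it belongs to, so you can load the module it needs or reinstall just that package.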
 

Modules that simply stop working

Some of MSI's modules unexpectedly broke during the migration. While we did our best to patch all of the software installations to avoid this outcome, the wide variety of software distribution designs means that this isn't possible in every case. If you find a module that no longer works after the migration and doesn't match the descriptions of other common errors on this page, please report it to help@msi.umn.edu so we can flag it for reinstallation.
