Minnesota Supercomputing Institute
Tuesday, August 29, 2023
ALLPATHS-LG is a whole-genome shotgun assembler that can generate high-quality genome assemblies using short reads (~100bp) such as those produced by the new generation of sequencers. The significant difference between ALLPATHS and traditional assemblers such as Arachne is that ALLPATHS assemblies are not necessarily linear, but instead are presented in the form of a graph. This graph representation retains ambiguities, such as those arising from polymorphism, uncorrected read errors, and unresolved repeats, thereby providing information that has been absent from previous genome assemblies.
To run this software interactively in a Linux environment, run the following commands:
module load allpathslg
PrepareAllPathsInputs.pl DATA_DIR=/path/to/data
RunAllPathsLG PRE=<pre> DATA_SUBDIR=<data> RUN=<run> REFERENCE_NAME=<ref>
Note:
The PrepareAllPathsInputs.pl script requires one parameter, the path to the directory containing the input data.
<pre> is the root directory ALLPATHS-LG will use.
<data> is the subdirectory containing the input data.
<run> is the directory used for assembly pre-processing.
<ref> is the organism or reference genome name.
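The sketch below shows how these four values might combine into a single working directory. It assumes the nesting PRE/REFERENCE_NAME/DATA_SUBDIR/RUN; the first three levels match the DATA_DIR used in the example job script on this page, while placing RUN beneath the data directory is an assumption to confirm against the user manual. The function name allpaths_run_dir and the path /scratch/project are hypothetical and not part of ALLPATHS-LG.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with ALLPATHS-LG): prints the directory
# implied by a set of RunAllPathsLG arguments, ASSUMING the layout
# PRE/REFERENCE_NAME/DATA_SUBDIR/RUN.
allpaths_run_dir() {
    local pre=$1 ref=$2 data=$3 run=$4
    printf '%s/%s/%s/%s\n' "$pre" "$ref" "$data" "$run"
}

# /scratch/project is a made-up root directory used only for illustration.
allpaths_run_dir /scratch/project test.genome data run
```

Printing the path before submitting a job is a cheap way to check that DATA_DIR passed to PrepareAllPathsInputs.pl and the arguments passed to RunAllPathsLG point at the same place.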
ALLPATHS-LG is composed of a number of modules, each of which performs a step in the assembly process. While each module can be run individually, ALLPATHS-LG provides a module that controls the entire assembly pipeline, called RunAllPathsLG. In addition, before ALLPATHS-LG can be used, data must be converted using the Perl script PrepareAllPathsInputs.pl.
The ALLPATHS-LG assembler has specific requirements for paired-end read libraries. It requires the two reads of a pair to actually be intertwined, that is, to overlap each other.
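If this requirement is read as meaning that the two reads of a pair must overlap, a quick arithmetic check can show whether a planned library satisfies it. The sketch below is only an illustration: the function name pair_overlap is hypothetical, and the read length and fragment size are example values, not figures taken from this page.

```shell
#!/bin/bash
# Hypothetical helper: number of bases by which the two reads of a pair
# overlap, given the read length and the fragment size (both in bases).
# A positive result means the reads overlap; zero or negative means they do not.
pair_overlap() {
    local read_len=$1 frag_size=$2
    echo $(( 2 * read_len - frag_size ))
}

# Example values (assumed): 100-base reads from a 180-base fragment.
pair_overlap 100 180   # prints 20
```

With 100-base reads, any fragment size below 200 bases gives a positive overlap under this simple model, which ignores variation in fragment size within a library.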
A more detailed discussion of each of these directories, as well as a list of other command-line arguments, is available in the user manual. Other ALLPATHS-LG utilities may be found in the directory
/soft/allpathslg/VER/bin
where VER is the version of ALLPATHS-LG you are using.
An example PBS script for submitting ALLPATHS-LG jobs to the queue is shown below.
#PBS -l nodes=1:ppn=8,mem=1gb,walltime=4:00:00
#PBS -m abe

module load allpathslg

# Prepare input data
mkdir -p test.genome/data
PrepareAllPathsInputs.pl \
    DATA_DIR=$PWD/test.genome/data

# Assemble data
RunAllPathsLG \
    PRE=$PWD \
    DATA_SUBDIR=data \
    RUN=run \
    REFERENCE_NAME=test.genome
Additional Information
User Manual
Example Data