Minnesota Supercomputing Institute
0.0.14
0.0.13, 0.0.14
Tuesday, August 29, 2023
The FASTX-Toolkit is a collection of command-line tools for preprocessing short nucleotide reads in FASTA and FASTQ formats, usually produced by next-generation sequencing machines. The main processing of such FASTA/FASTQ files is mapping (aligning) the sequences to reference genomes or other databases using specialized programs such as BWA and Bowtie. However, it is sometimes more productive to preprocess the FASTA/FASTQ files before mapping, manipulating the sequences so that they produce better mapping results. The FASTX-Toolkit tools perform some of these preprocessing tasks.
To run this software interactively in a Linux environment, run the following commands:
module load fastx_toolkit
fastq_to_fasta [options] -i INFILE -o OUTFILE
To display usage instructions for each tool, you can use the -h argument, e.g.,
fastq_to_fasta -h
All of the FASTX-Toolkit utilities may be found in the directory
/soft/fastx_toolkit/VER/bin
where VER is the module version you are using.
Please note: if you are working on FASTQ files that use the standard Sanger (Phred+33) encoding for base-calling quality scores, you need to add -Q33 to the command.
The FASTX-Toolkit was developed a long time ago and assumes the older Illumina (1.7-) format, which uses a Phred+64 offset, as the default encoding.
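If you are unsure which encoding a file uses, a rough check is to look at the lowest ASCII code that appears on the quality lines. The sketch below is a heuristic only. The function name guess_phred_offset is hypothetical and not part of the FASTX-Toolkit, and it assumes the usual four-lines-per-record FASTQ layout. Characters with codes below 59 can only occur in Phred+33 data, so the function prints 33 in that case and 64 otherwise.

```shell
# Hypothetical helper (not part of the FASTX-Toolkit): guess the quality offset
# of a FASTQ file from the lowest ASCII code found on its quality lines.
# Assumes the standard 4-lines-per-record FASTQ layout.
guess_phred_offset() {
    LC_ALL=C awk '
        BEGIN {
            # Build a character -> ASCII code lookup for printable characters.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {
            # Every 4th line holds the quality string.
            n = length($0)
            for (i = 1; i <= n; i++) {
                ch = substr($0, i, 1)
                if ((ch in ord) && ord[ch] < min) min = ord[ch]
            }
        }
        END {
            # Codes below 59 cannot occur in the 64-offset encodings.
            if (min < 59) print 33; else print 64
        }
    ' "$1"
}
```

If guess_phred_offset INFILE prints 33, add -Q33 to the command, for example fastq_to_fasta -Q33 -i INFILE -o OUTFILE. A result of 64 is less conclusive, because a small Phred+33 file made up entirely of high-quality bases can also lack low-code characters.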
Additional Information
Command Line Usage