Master the Command Line: From FASTQ to VCF for NGS Analysis

Learn CLI Bioinformatics: Analyze Next-Generation Sequencing data from FASTQ to VCF – Linux, WSL, Windows – DIY

This course is a complete hands-on guide to processing real next-generation sequencing (NGS) data from raw FASTQ files to final VCF variant calls – all using command-line tools in a Linux environment.

What you’ll learn

  • Download and extract raw sequencing data from the NCBI Sequence Read Archive (SRA) using command-line tools.
  • Assess and improve the quality of FASTQ files using FastQC and fastp.
  • Align sequencing reads to a reference genome with BWA, and process SAM/BAM files using samtools.
  • Call and filter genomic variants with bcftools, and learn how to interpret the resulting VCF files.
  • Organize NGS analysis projects in a clean directory structure for reproducibility and clarity.
  • Understand the structure of FASTQ, SAM, and VCF files and extract meaningful information from each format.
  • Use standard Linux command-line tools to manipulate large genomic files efficiently.
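To make the file formats named above more concrete, here is a minimal Python sketch of how a FASTQ record and a VCF data line can be read. It is not part of the course material. The function names (parse_fastq, phred_scores, parse_vcf_line) are made up for illustration, the quality offset of 33 assumes the common Phred+33 encoding, and the sample records are invented.

```python
from statistics import mean


def parse_fastq(text):
    """Yield (read_id, sequence, quality) tuples from FASTQ text.

    Assumes the simple four-lines-per-record layout:
    @header, sequence, + separator, quality string.
    """
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ records must have exactly four lines each")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Keep only the read name, dropping any description after a space.
        yield header[1:].split()[0], seq, qual


def phred_scores(qual, offset=33):
    """Convert a quality string to per-base Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def parse_vcf_line(line):
    """Split one tab-separated VCF data line into its eight fixed columns."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),                      # 1-based position
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),                # several ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info,
    }


if __name__ == "__main__":
    # Invented example data, for illustration only.
    fastq_text = "@read1 sample\nACGT\n+\nII5!\n"
    for read_id, seq, qual in parse_fastq(fastq_text):
        print(read_id, seq, mean(phred_scores(qual)))  # read1 ACGT 25.0

    record = parse_vcf_line("chr1\t12345\t.\tA\tG\t50.0\tPASS\tDP=30")
    print(record["chrom"], record["pos"], record["ref"], record["alt"])
```

For real datasets, the dedicated tools covered in the course (FastQC, samtools, bcftools) are the better choice. The sketch only shows what the columns and lines contain.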

Course Content

  • Setup and Installation: 2 lectures, 5 min
  • Downloading and Preparing FASTQ Files: 3 lectures, 30 min
  • Read Alignment and SAM/BAM Processing: 4 lectures, 20 min
  • Variant Calling and VCF Interpretation: 2 lectures, 10 min

Description

You will learn to install and use essential bioinformatics tools such as FastQC, fastp, BWA, samtools, and bcftools. These tools are widely used building blocks of NGS pipelines in genomics research. If you’re a Windows user, no problem – we’ll show you how to set up WSL (Windows Subsystem for Linux), so you can follow every step directly from your own machine.
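After installation, a quick way to confirm that the tools are reachable is to look for their executables on the PATH. The following Python sketch is one possible check, not the course's own method. It assumes the executables are named fastqc, fastp, bwa, samtools, and bcftools, and the helper name missing_tools is hypothetical.

```python
import shutil

# Executable names assumed to match the tools listed above.
TOOLS = ["fastqc", "fastp", "bwa", "samtools", "bcftools"]


def missing_tools(tools=TOOLS, which=shutil.which):
    """Return the tools whose executables are not found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found.")
```

The which parameter exists only so the lookup can be swapped out when testing. In normal use the default shutil.which is enough.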

The course is structured around short, focused lessons. Each one walks you through a specific task in the sequencing data pipeline: downloading data from NCBI’s SRA, performing quality control checks, trimming adapters and low-quality bases, aligning reads to a reference genome, processing alignment files, and calling SNPs and indels to generate filtered VCF files.
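The steps above can be sketched as an ordered list of shell commands. The Python snippet below only builds and prints the command lines and does not run anything. It is a rough outline under several assumptions, not the exact commands taught in the course: paired-end reads, a hypothetical accession SRRXXXXXXX, an illustrative directory layout (raw, qc, trimmed, aligned, variants), fasterq-dump from the SRA Toolkit for the download, and an arbitrary QUAL<20 filter threshold.

```python
import shlex


def pipeline_commands(accession, reference):
    """Return shell command strings for a FASTQ-to-VCF run (paired-end assumed).

    Directory names and the QUAL<20 threshold are illustrative assumptions.
    The output directories must already exist before the commands are run.
    """
    r1, r2 = f"raw/{accession}_1.fastq", f"raw/{accession}_2.fastq"
    t1, t2 = f"trimmed/{accession}_1.fastq", f"trimmed/{accession}_2.fastq"
    bam = f"aligned/{accession}.sorted.bam"
    vcf = f"variants/{accession}.vcf.gz"
    filtered = f"variants/{accession}.filtered.vcf.gz"

    def pipe(*cmds):
        # Join several argument lists into one shell pipeline.
        return " | ".join(shlex.join(c) for c in cmds)

    return [
        # 1. Download reads from the SRA as FASTQ files.
        pipe(["fasterq-dump", "--split-files", accession, "-O", "raw"]),
        # 2. Quality control report on the raw reads.
        pipe(["fastqc", r1, r2, "-o", "qc"]),
        # 3. Adapter and quality trimming.
        pipe(["fastp", "-i", r1, "-I", r2, "-o", t1, "-O", t2]),
        # 4. Index the reference genome for BWA.
        pipe(["bwa", "index", reference]),
        # 5. Align reads and sort the alignments into a BAM file.
        pipe(["bwa", "mem", reference, t1, t2],
             ["samtools", "sort", "-o", bam, "-"]),
        # 6. Index the sorted BAM file.
        pipe(["samtools", "index", bam]),
        # 7. Call SNPs and indels.
        pipe(["bcftools", "mpileup", "-f", reference, bam],
             ["bcftools", "call", "-mv", "-Oz", "-o", vcf]),
        # 8. Drop low-quality calls (threshold is an assumption).
        pipe(["bcftools", "filter", "-e", "QUAL<20", "-Oz", "-o", filtered, vcf]),
    ]


if __name__ == "__main__":
    # SRRXXXXXXX is a placeholder, not a real accession.
    for cmd in pipeline_commands("SRRXXXXXXX", "ref/genome.fa"):
        print(cmd)
```

Printing the commands first makes it easy to review each step before running it in a terminal. Single-end data would need a different download and trimming invocation.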

This course is ideal for beginners and intermediate users alike – whether you’re a student, researcher, or bioinformatics enthusiast. You don’t need any prior experience with Linux or the command line. By the end of the course, you’ll have a complete working pipeline and the confidence to analyze real NGS datasets on your own.
