
Glow

Glow is an open-source toolkit for working with genomic data at biobank scale and beyond. It is built natively on Apache Spark, the leading unified engine for big data processing and machine learning, which lets genomics workflows scale to population-level datasets.

  • Introduction to Glow
  • Getting Started
    • Running Locally
    • Running in the cloud
    • Notebooks embedded in the docs
    • Demo notebook
  • Variant Data Manipulation
    • Read and Write VCF, Plink, and BGEN with Spark
    • Read Genome Annotations (GFF3) as a Spark DataFrame
    • Create a Genomics Delta Lake
    • Variant Quality Control
    • Sample Quality Control
    • Liftover
    • Variant Normalization
    • Split Multiallelic Variants
    • Merging Variant Datasets
    • Utility Functions
  • Tertiary Analysis
    • Parallelizing Command-Line Bioinformatics Tools With the Pipe Transformer
    • Using Python Statistics Libraries
    • Genome-wide Association Study Regression Tests
    • GloWGR: Whole Genome Regression
  • Troubleshooting
  • Blog Posts
    • [Jul. 2020] Introducing GloWGR: An industrial-scale, ultra-fast and sensitive method for genetic association studies
    • [Jun. 2020] Glow 0.4 Enables Integration of Genomic Variant and Annotation Data
    • [Mar. 2020] Glow 0.3.0 Introduces Several New Large-Scale Genomic Analysis Features
    • [Nov. 2019] Streamlining Variant Normalization on Large Genomic Datasets
  • Additional Resources
    • Databricks notebooks
    • External blog posts
  • Python API
    • Glow Top-Level Functions
    • Glow PySpark Functions
    • GloWGR

© Copyright 2019, Glow Authors Revision 796c98e6.
