EpiBench: Technical Requirements and Setup

This note outlines the technical requirements and setup process for EpiBench. The project is under active development, so requirements may change; check back for updates. Last updated: 2024-09-20

Hardware Requirements

EpiBench processes large genomic datasets and trains neural network models, so it performs best on hardware that meets the following recommendations:

Minimum Requirements

CPU: 4+ cores (8+ threads)
RAM: 16GB
Storage: 50GB free space (SSD recommended)
GPU: Not strictly required, but CPU-only training will be significantly slower
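
As a quick pre-install check, the CPU and storage minimums above can be tested with the Python standard library. This is a minimal sketch, not part of EpiBench: the function names are hypothetical, `os.cpu_count()` reports logical CPUs (so it is compared against the 8-thread figure), and RAM is left out because the standard library has no portable call for it.

```python
import os
import shutil

# Minimums taken from the list above (threads = logical CPUs).
MIN_THREADS = 8
MIN_FREE_GB = 50


def check_minimums(threads, free_gb):
    """Return a list of warnings for values below the stated minimums."""
    warnings = []
    if threads < MIN_THREADS:
        warnings.append(f"CPU: {threads} threads found, {MIN_THREADS}+ required")
    if free_gb < MIN_FREE_GB:
        warnings.append(f"Storage: {free_gb:.0f}GB free, {MIN_FREE_GB}GB required")
    return warnings


def inspect_machine(path="."):
    """Read the logical CPU count and free space on the volume holding `path`."""
    threads = os.cpu_count() or 0  # cpu_count() may return None
    free_gb = shutil.disk_usage(path).free / 1e9
    return threads, free_gb


if __name__ == "__main__":
    threads, free_gb = inspect_machine()
    problems = check_minimums(threads, free_gb)
    for line in problems or ["CPU and storage meet the minimums"]:
        print(line)
```

Pass the directory where data will be stored to `inspect_machine` if it sits on a different volume from the working directory.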

Recommended Specifications

CPU: 8+ cores (16+ threads), modern AMD Ryzen or Intel i7/i9
RAM: 32GB+
Storage: 100GB+ SSD
GPU: NVIDIA GPU with 8GB+ VRAM (RTX 2070 or better)
- CUDA-compatible GPU required for GPU acceleration
- RTX 3080/3090 or A100 recommended for large datasets or complex models

High-Performance Computing

For institutional users analyzing multiple datasets or training custom models, we recommend:

Access to an HPC cluster with GPU nodes
Job submission through SLURM, LSF, or similar scheduler
Shared storage for large reference genomes and datasets

Software Dependencies

EpiBench relies on several key software components and libraries:

Core Requirements

Python: Version 3.7+ (3.8 or 3.9 recommended)
CUDA: Version 11.0+ (for GPU acceleration)
cuDNN: Compatible with installed CUDA version

Key Python Libraries

Library	Purpose	Version
PyTorch	Deep learning framework	1.8.0+
Biopython	Genomic sequence processing	1.78+
pyBigWig	BigWig signal track access (histone modification data)	0.3.18+
Pandas	Data manipulation	1.3.0+
NumPy	Numerical operations	1.20.0+
Scikit-learn	Traditional ML algorithms and metrics	0.24.0+
Matplotlib	Basic visualization	3.4.0+
Seaborn	Statistical visualization	0.11.0+
Plotly	Interactive visualization	5.1.0+
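
The minimum versions in the table can be compared against an existing environment with `importlib.metadata` (Python 3.8+). The sketch below is illustrative only: the helper names are hypothetical, the keys are the PyPI distribution names (PyTorch is published as `torch`), and the version comparison looks only at leading numeric components, so pre-release tags such as `rc1` are ignored.

```python
import re
from importlib import metadata

# Minimum versions copied from the table above; keys are PyPI distribution names.
MINIMUMS = {
    "torch": "1.8.0",
    "biopython": "1.78",
    "pyBigWig": "0.3.18",
    "pandas": "1.3.0",
    "numpy": "1.20.0",
    "scikit-learn": "0.24.0",
    "matplotlib": "3.4.0",
    "seaborn": "0.11.0",
    "plotly": "5.1.0",
}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_libraries():
    """Map each library to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_libraries().items():
        print(f"{name}: {status}")
```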

Environment Setup

We strongly recommend using a dedicated conda environment for EpiBench:

# Create a new conda environment
conda create -n epibench python=3.9

# Activate the environment
conda activate epibench

# Install PyTorch with CUDA support
conda install pytorch cudatoolkit=11.3 -c pytorch

# Install bioinformatics libraries
# (Bioconda packages depend on conda-forge, so list that channel first)
conda install -c conda-forge -c bioconda biopython pybigwig pyfaidx

# Install data science tools
conda install pandas numpy scikit-learn matplotlib seaborn plotly

# Install additional dependencies
pip install captum  # For model interpretability
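
After the environment is built, it is worth confirming that PyTorch can actually see the GPU. The following is a minimal sketch using standard PyTorch calls (`torch.cuda.is_available`, `torch.cuda.get_device_name`, `torch.version.cuda`); the function name is hypothetical and not part of EpiBench.

```python
def describe_accelerator():
    """Summarize whether PyTorch is importable and whether it sees a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__}: no CUDA device visible, training will run on CPU"
    name = torch.cuda.get_device_name(0)
    return f"PyTorch {torch.__version__}: CUDA {torch.version.cuda}, device 0 is {name}"


if __name__ == "__main__":
    print(describe_accelerator())
```

If the CPU-only message appears on a machine with an NVIDIA GPU, the usual causes are a CPU-only PyTorch build or a driver that is older than the installed CUDA toolkit requires.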

Data Requirements

EpiBench works with several data types, each with specific format requirements:

Reference Genome

Complete reference genome in FASTA format
Common choices include hg38 or hg19 for human data
Must match the genome build used for your other data types

Epigenetic Data

DNA Methylation: BED or CSV files with coordinates and methylation scores
Histone Modifications: BigWig format signal tracks
- Each histone mark should be in a separate file
- Files should use the same genomic coordinate system
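
A common symptom of mismatched genome builds or naming schemes is a chromosome name in the data that does not exist in the reference (for example `1` versus `chr1`). The sketch below, using hypothetical helper names, lists chromosome names found in a BED file but absent from an uncompressed FASTA file. Matching names are necessary but not sufficient: hg19 and hg38 share chromosome names, so this check cannot confirm the build itself.

```python
def fasta_sequence_names(fasta_path):
    """Collect sequence names (first token of each '>' header) from a FASTA file."""
    names = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def bed_chromosomes(bed_path):
    """Collect the first column of a BED file, skipping header and comment lines."""
    names = set()
    with open(bed_path) as handle:
        for line in handle:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            names.add(line.split()[0])
    return names


def unknown_chromosomes(fasta_path, bed_path):
    """Return BED chromosome names that do not appear in the reference FASTA."""
    return sorted(bed_chromosomes(bed_path) - fasta_sequence_names(fasta_path))
```

An empty list from `unknown_chromosomes` means every BED record refers to a sequence present in the reference. Scanning a full human FASTA this way takes a while; if a `.fai` index exists, its first column holds the same names and is much faster to read.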

Sample Metadata

CSV files with sample information
Must include sample IDs that match filenames
Additional clinical or experimental annotations as needed
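
The requirement that sample IDs match filenames can be checked before any analysis starts. This sketch rests on two assumptions that are not stated in this note: the ID column is named `sample_id`, and a data file belongs to a sample when its name starts with the ID followed by `.` or `_` (for example `S1_H3K27ac.bw`). Adjust both to the actual naming scheme.

```python
import csv
from pathlib import Path


def samples_without_files(metadata_csv, data_dir, id_column="sample_id"):
    """Return sample IDs from the metadata CSV that match no file in data_dir."""
    with open(metadata_csv, newline="") as handle:
        sample_ids = [row[id_column] for row in csv.DictReader(handle)]

    file_names = [p.name for p in Path(data_dir).iterdir() if p.is_file()]

    missing = []
    for sample_id in sample_ids:
        # Require a separator after the ID so that "S1" does not match "S10.bed".
        prefixes = (sample_id + ".", sample_id + "_")
        if not any(name.startswith(prefixes) for name in file_names):
            missing.append(sample_id)
    return missing
```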

Disk Space Considerations

Be aware that genomic analyses can require significant storage:

Reference Genome: ~3GB uncompressed for a human genome such as hg38 (roughly 1GB gzip-compressed)
BigWig Files: ~1-5GB per histone mark per sample
Methylation Data: ~0.5-2GB per sample
Model Checkpoints: ~0.5-1GB per model
Visualization Outputs: ~1-5GB depending on detail level
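
The figures above can be combined into a rough project-level estimate. The sketch below simply sums the listed ranges; the function name is hypothetical, and it assumes a single reference genome and treats the visualization figure as a one-off total rather than a per-sample cost.

```python
# Per-item size ranges in GB, copied from the list above as (low, high).
REFERENCE_GENOME = (3.0, 3.0)
BIGWIG_PER_MARK_PER_SAMPLE = (1.0, 5.0)
METHYLATION_PER_SAMPLE = (0.5, 2.0)
CHECKPOINT_PER_MODEL = (0.5, 1.0)
VISUALIZATION = (1.0, 5.0)


def estimate_storage_gb(n_samples, n_marks, n_models):
    """Return a (low, high) storage estimate in GB for a project."""
    totals = []
    for i in (0, 1):  # 0 = low end of each range, 1 = high end
        totals.append(
            REFERENCE_GENOME[i]
            + n_samples * n_marks * BIGWIG_PER_MARK_PER_SAMPLE[i]
            + n_samples * METHYLATION_PER_SAMPLE[i]
            + n_models * CHECKPOINT_PER_MODEL[i]
            + VISUALIZATION[i]
        )
    return tuple(totals)


if __name__ == "__main__":
    low, high = estimate_storage_gb(n_samples=10, n_marks=3, n_models=2)
    print(f"Estimated storage: {low:.0f}-{high:.0f} GB")
```

For 10 samples with 3 histone marks each and 2 models, the estimate is 40 to 180GB, so even a modest cohort can exceed the 100GB recommended above.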

Installation Process

Option 1: Direct Installation

Clone the EpiBench repository:

git clone https://github.com/username/epibench.git
cd epibench
