TEV Protease - Pilot ML Libraries

GROQ-seq function measurements for TEV protease ML libraries

About This Dataset

The project aims to collect data on thousands of protease sequence variants, with applications in understanding protease specificity and developing therapeutic interventions.

This dataset contains GROQ-seq measurements of a TEV protease library designed by seven generative protein-design models (PSSM, EVCouplings, EVE, ESM2, Tranception, ESM-IF1, ProteinMPNN) alongside an error-prone PCR (epPCR) baseline, so that the generative methods can be benchmarked against experimentally measured function. The library was assayed via split-DHFR at the Living Measurement Systems Foundry (LMSF) at NIST.

Dataset Information

Owning Organization
Marks Lab
Gene
TEV
Released
Apr 13, 2026

Experiment Details

Total Records
39,708
Host Organism
Escherichia coli
Antibiotic
Trimethoprim
Experiment Date
Dec 15, 2025
Collection Site
LMSF

Downloads

TEV_Pilot_ML_output_v1.0.zip

File containing final function values for variants

41.0 MB

TEV_Pilot_ML_Supplemental_v1.0.zip

Supplemental data including QC analysis notebooks

67.6 MB
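
Because this page does not describe the internal layout of either archive, a reasonable first step after downloading is to list what an archive contains before loading anything. The following is a minimal sketch using only the Python standard library; `list_archive` is a hypothetical helper name, and the path assumes the archive was saved to the current working directory.

```python
import zipfile
from pathlib import Path


def list_archive(path):
    """Return (member name, uncompressed size in bytes) for each archive entry."""
    with zipfile.ZipFile(path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


if __name__ == "__main__":
    # Assumption: the archive sits in the current working directory.
    archive = Path("TEV_Pilot_ML_output_v1.0.zip")
    if archive.exists():
        for name, size in list_archive(archive):
            print(f"{size:>12,d}  {name}")
    else:
        print(f"{archive} not found; download it first.")
```

The same helper can be pointed at TEV_Pilot_ML_Supplemental_v1.0.zip to locate the QC analysis notebooks.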