Tools
The lab uses a range of computational approaches, many of which we are actively developing. A subset of these tools is listed below. The development of novel tools and approaches is central to our ability to ask new types of questions.
Simulation tools
We use and develop several tools for performing molecular simulations. Of note, PIMMS is our in-house lattice-based simulation engine, and SOURSOP (formerly CAMPARITraj) is our suite of tools for analyzing simulations of disordered proteins.
PIMMS
PIMMS (Polymer Interactions in Multicomponent MixtureS) is a high-performance, coarse-grained, lattice-based simulation engine developed specifically for studying phase transitions of complex heteropolymers such as intrinsically disordered proteins. PIMMS was developed by Alex while in the Pappu Lab and is a MOLSSI-sponsored project.
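To give a flavor of what a lattice-based engine does, here is a minimal Metropolis Monte Carlo sketch. It is a generic illustration, not PIMMS code: the beads are independent (no chain connectivity), and the periodic cubic lattice, the nearest-neighbor contact energy, and all function names are assumptions made for this example.

```python
import math
import random

# Hypothetical sketch of a generic lattice model; this is NOT PIMMS's implementation.
# Independent beads (no chain connectivity) sit on a periodic cubic lattice of side
# `box` (use box >= 3), with at most one bead per lattice site.

NEIGHBOR_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def wrap(site, box):
    """Apply periodic boundary conditions to a lattice site."""
    return tuple(c % box for c in site)


def contact_energy(positions, types, epsilon, box):
    """Sum epsilon[(type_i, type_j)] over every pair of beads on adjacent sites."""
    occupied = {pos: i for i, pos in enumerate(positions)}
    total = 0.0
    for i, (x, y, z) in enumerate(positions):
        for dx, dy, dz in NEIGHBOR_OFFSETS:
            j = occupied.get(wrap((x + dx, y + dy, z + dz), box))
            if j is not None and j > i:  # count each pair only once
                total += epsilon[(types[i], types[j])]
    return total


def metropolis_accept(delta_e, kT, rng=random):
    """Accept downhill moves; accept uphill moves with probability exp(-delta_e / kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def mc_step(positions, types, epsilon, box, kT, rng=random):
    """Try to move one random bead to an adjacent empty site; return True if accepted."""
    i = rng.randrange(len(positions))
    x, y, z = positions[i]
    dx, dy, dz = rng.choice(NEIGHBOR_OFFSETS)
    new_site = wrap((x + dx, y + dy, z + dz), box)
    if new_site in positions:  # excluded volume: one bead per lattice site
        return False
    trial = list(positions)
    trial[i] = new_site
    delta_e = (contact_energy(trial, types, epsilon, box)
               - contact_energy(positions, types, epsilon, box))
    if metropolis_accept(delta_e, kT, rng):
        positions[i] = new_site
        return True
    return False
```

Recomputing the full energy at every step keeps the sketch short; a production engine would typically evaluate only the local energy change around the moved bead.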
SOURSOP
SOURSOP (Simulation analysis Of Unstructured and disordered RegionS Orchestrated in Python) is an integrative suite for analyzing all-atom simulation trajectories of intrinsically disordered proteins. Although SOURSOP was developed with CAMPARI Monte Carlo simulations in mind, it can be used with almost any simulation engine and trajectory type.
The initial prototype for SOURSOP was developed by Alex while in the Pappu lab (as a MOLSSI-sponsored project) and completed by Pappu lab member Jared Lalmansingh.
Paper: Lalmansingh et al. JCTC (2023)
Documentation: https://soursop.readthedocs.io/
Code: https://github.com/holehouse-lab/soursop
PyPI entry: https://pypi.org/project/soursop/
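As an example of the per-frame quantities a trajectory analysis suite reports, the sketch below computes the radius of gyration and end-to-end distance from raw coordinates and averages them over a trajectory. It is plain Python and is not SOURSOP's API; representing each frame as a list of (x, y, z) tuples, and the function names, are assumptions for illustration.

```python
import math

# Generic per-frame analyses; a hypothetical stdlib sketch, NOT SOURSOP's API.
# Each frame is a list of (x, y, z) coordinates, one per atom or bead.


def radius_of_gyration(coords):
    """Unweighted radius of gyration: sqrt(mean squared distance from the centroid)."""
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    msd = sum(sum((c[k] - centroid[k]) ** 2 for k in range(3)) for c in coords) / n
    return math.sqrt(msd)


def end_to_end_distance(coords):
    """Distance between the first and last atom (or bead) in the frame."""
    return math.dist(coords[0], coords[-1])


def ensemble_averages(frames):
    """Mean radius of gyration and mean end-to-end distance over all frames."""
    rgs = [radius_of_gyration(f) for f in frames]
    rees = [end_to_end_distance(f) for f in frames]
    return sum(rgs) / len(rgs), sum(rees) / len(rees)
```

This uses an unweighted (geometric) radius of gyration; many analyses weight each atom by its mass instead.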
SolutionSpaceScanner
SolutionSpaceScanner is a Python toolkit that includes a command-line tool for re-wiring the solvation behaviour of polypeptides by creating customizable parameter files for the ABSINTH implicit solvent model. SolutionSpaceScanner was developed with the Sukenik lab at UC Merced and is a MOLSSI-sponsored project.
Paper: Holehouse & Sukenik JCTC (2020)
Documentation: https://solutionspacescanner.readthedocs.io/
Code: https://github.com/holehouse-lab/solutionspacescanner
PyPI entry: https://pypi.org/project/solutionspacescanner/
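Conceptually, re-wiring solvation amounts to rescaling the reference free energies of solvation that ABSINTH assigns to model-compound groups and writing the result out as parameters. The sketch below shows only that idea; the group names, the multiplicative scaling scheme, and the one-line-per-group output format are hypothetical and do not reproduce SolutionSpaceScanner's or ABSINTH's actual file formats.

```python
# Hypothetical sketch of rescaling solvation free energies per group.
# The "group value" output format is invented for illustration and is NOT
# the ABSINTH or SolutionSpaceScanner parameter-file format.


def scale_solvation(reference, scale_factors):
    """Return a new table with each group's value multiplied by its factor (default 1.0)."""
    unknown = set(scale_factors) - set(reference)
    if unknown:
        raise KeyError(f"unknown groups: {sorted(unknown)}")
    return {group: value * scale_factors.get(group, 1.0) for group, value in reference.items()}


def format_parameter_lines(table):
    """Render one 'group value' line per entry, sorted by group name."""
    return "\n".join(f"{group} {value:.3f}" for group, value in sorted(table.items()))
```

Returning a new table rather than editing the reference in place makes it easy to generate many variants from one baseline.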
Sequence design tools
The lab also develops tools for the rational design of disordered protein regions. Our first such tool (GOOSE) is online as of October 2023!
GOOSE
GOOSE is a package for rationally designing disordered regions with specific sequence properties. The associated preprint, code, and documentation are all available. GOOSE was used to design the sequence libraries for the ALBATROSS paper, as well as in several unpublished projects!
Preprint: Emenecker & Guadalupe et al. bioRxiv (2023)
Code: GitHub repository
Documentation: ReadTheDocs
Google Colab Notebook: Colab notebooks (in alpha - please report any issues!)
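A simple way to see what designing to a sequence property means is to target two common IDR descriptors: the fraction of charged residues (FCR) and the net charge per residue (NCPR). The sketch below solves for the number of positive and negative residues and fills the rest from a pool of uncharged residues. It is not GOOSE's algorithm; the residue pools, the random placement, and the function names are assumptions.

```python
import random

# Hypothetical sketch of property-targeted design; NOT GOOSE's algorithm.
POSITIVE = "KR"
NEGATIVE = "DE"
NEUTRAL = "GSPQNTA"  # arbitrary pool of uncharged residues chosen for this sketch


def fcr(seq):
    """Fraction of charged residues."""
    return sum(aa in POSITIVE or aa in NEGATIVE for aa in seq) / len(seq)


def ncpr(seq):
    """Net charge per residue."""
    n_pos = sum(aa in POSITIVE for aa in seq)
    n_neg = sum(aa in NEGATIVE for aa in seq)
    return (n_pos - n_neg) / len(seq)


def design_sequence(length, target_fcr, target_ncpr, seed=None):
    """Random sequence whose FCR and NCPR match the targets up to integer rounding."""
    if not 0 <= target_fcr <= 1 or abs(target_ncpr) > target_fcr:
        raise ValueError("need 0 <= FCR <= 1 and |NCPR| <= FCR")
    rng = random.Random(seed)
    # FCR = (n_pos + n_neg) / L and NCPR = (n_pos - n_neg) / L, solved for the counts.
    n_pos = round(length * (target_fcr + target_ncpr) / 2)
    n_neg = round(length * (target_fcr - target_ncpr) / 2)
    n_neg = min(n_neg, length - n_pos)  # guard against both counts rounding up
    residues = (
        [rng.choice(POSITIVE) for _ in range(n_pos)]
        + [rng.choice(NEGATIVE) for _ in range(n_neg)]
        + [rng.choice(NEUTRAL) for _ in range(length - n_pos - n_neg)]
    )
    rng.shuffle(residues)
    return "".join(residues)
```

Because the charged residues are placed at random, this controls composition only; properties that depend on residue patterning would need additional constraints.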
Sequence analysis tools
In addition to tools for molecular simulations and sequence design, a significant focus of the Holehouse lab is developing tools to analyze protein sequence information. Below are several of our lab-developed methods, all written in Python.
FINCHES
FINCHES is a Python package for predicting chemical specificity between two disordered regions, or between a disordered region and a folded domain. FINCHES uses the CALVADOS2 or Mpipi-GG force fields to define an analytical energy function; if you use FINCHES, you must also cite the associated CALVADOS and Mpipi papers (Tesei et al. 2022 and Joseph et al. 2021).
Paper: Ginell et al. bioRxiv (2024)
Code: GitHub repository for FINCHES
Google Colab notebooks: Colab notebooks
Webserver: http://finches-online.com/
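The core idea of scoring two sequences with a pairwise residue-residue energy function can be sketched as an averaged interaction matrix. This toy version is not FINCHES's energy function: the parameter table is a hypothetical, user-supplied placeholder (FINCHES derives its parameters from CALVADOS2 or Mpipi-GG), and the function names are invented for illustration.

```python
# Toy mean-field pairwise score between two sequences; NOT FINCHES's energy function.
# `params` maps (residue_a, residue_b) -> interaction value and must cover every
# residue pair that occurs; here negative values mean attraction.


def pairwise_matrix(seq1, seq2, params):
    """Interaction value for every residue pair (rows follow seq1, columns follow seq2)."""
    return [[params[(a, b)] for b in seq2] for a in seq1]


def mean_interaction(seq1, seq2, params):
    """Average pairwise value; with this sign convention, negative means net attraction."""
    matrix = pairwise_matrix(seq1, seq2, params)
    return sum(map(sum, matrix)) / (len(seq1) * len(seq2))
```

For example, with a made-up table in which opposite charges attract and like charges repel, a poly-K and a poly-E sequence give a negative mean score, while two poly-K sequences give a positive one. Keeping the full matrix, not just its mean, shows which regions drive the interaction.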
ALBATROSS
ALBATROSS is a collection of deep learning models that directly predict disordered protein dimensions from sequence. ALBATROSS is implemented inside sparrow, our general sequence analysis framework, and its predictions are also available through several Google Colab notebooks and the metapredict.net webserver.
Paper: Lotthammer, Ginell, and Griffith et al. Nature Methods (2024)
Code: GitHub repository for sparrow
Google Colab notebooks: Colab notebooks
Webserver: https://metapredict.net/
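Disordered protein dimensions are often summarized with the polymer scaling law R = R0 * N^ν, where N is the number of residues, R0 is a prefactor, and ν is the scaling exponent (ν = 0.5 for an ideal chain and roughly 0.59 for a self-avoiding chain in good solvent). The sketch below evaluates that law and recovers R0 and ν from (length, dimension) pairs with a log-log least-squares fit. It is generic polymer physics, not ALBATROSS code or its API, and the function names are assumptions.

```python
import math

# Generic polymer scaling law R = R0 * N**nu; NOT ALBATROSS code.


def scaled_dimension(n_residues, prefactor, nu):
    """Dimension (in the same length unit as the prefactor) for a chain of n_residues."""
    return prefactor * n_residues ** nu


def fit_scaling(lengths, dims):
    """Least-squares fit of log(R) = log(R0) + nu * log(N); returns (R0, nu)."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(r) for r in dims]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope
```

Fitting in log-log space turns the power law into a straight line, so ν is simply the slope.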
metapredict
metapredict is our high-performance, deep-learning-based disorder predictor. Metapredict provides both a Python API and a command-line tool for working with FASTA files or downloading sequences directly from the UniProt database. We also provide a web server for individual sequences at http://metapredict.net/ and a Google Colab notebook for multiple sequences.
Based on rankings in CAID1 and CAID2, metapredict V3 is among the most accurate disorder predictors.
NOTE: As of November 2024 the default metapredict implementation has been updated to metapredict V3. V3 brings enhanced performance and accuracy over prior versions.
Metapredict was developed by graduate student Jeff Lotthammer and postdoctoral fellow Ryan Emenecker with help from former graduate student Dan Griffith.
Paper (V3): TO BE ADDED SHORTLY!
Paper (V1): Emenecker et al. Biophysical Journal (2021)
Supporting data: GitHub supporting data repository
Documentation: metapredict documentation
Code: GitHub repository and PyPI project
Web server (single sequences): https://metapredict.net/
Google Colab notebook (multiple sequences): Colab notebook
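metapredict reports a disorder score for each residue. A common downstream step is turning those scores into contiguous disordered regions; the sketch below does this with a simple threshold and a minimum region length. metapredict has its own region-calling logic, which this does not reproduce, and the 0.5 threshold, the min_length default, and the function name are assumptions.

```python
# Hypothetical post-processing of per-residue disorder scores; NOT metapredict's own
# region-calling logic. Regions are 0-based, half-open (start, end) index pairs.


def disordered_regions(scores, threshold=0.5, min_length=5):
    """Return runs of consecutive residues with score >= threshold, at least min_length long."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new candidate region begins
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    # close a region that runs to the end of the sequence
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions
```

The minimum length filters out short, isolated runs of high scores that are unlikely to be meaningful disordered regions.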
SHEPHARD
SHEPHARD is our general framework for organizing and annotating large-scale protein-based datasets. SHEPHARD was developed by Garrett Ginell.
Paper: Ginell et al. Bioinformatics (2023)
Supporting data: GitHub supporting data repository
Documentation: SHEPHARD documentation
Code: GitHub repository and PyPI project
Colab notebooks: Annotated human proteome, general examples
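To illustrate the kind of organization such a framework provides, here is a small, hypothetical data model in which a proteome holds proteins keyed by a unique ID and each protein carries position-based domain annotations. The classes and methods are illustrative assumptions, not SHEPHARD's API; the SHEPHARD documentation describes the real interface.

```python
from dataclasses import dataclass, field

# Hypothetical data model for proteome-scale annotation. Class and method
# names are illustrative, NOT SHEPHARD's API.


@dataclass
class Domain:
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    domain_type: str


@dataclass
class Protein:
    unique_id: str
    sequence: str
    domains: list = field(default_factory=list)

    def add_domain(self, start, end, domain_type):
        """Attach a domain after checking it lies within the sequence."""
        if not 1 <= start <= end <= len(self.sequence):
            raise ValueError(f"domain {start}-{end} lies outside {self.unique_id}")
        self.domains.append(Domain(start, end, domain_type))

    def domain_sequence(self, domain):
        """Amino acid sequence covered by a domain."""
        return self.sequence[domain.start - 1 : domain.end]


class Proteome:
    """Container mapping unique IDs to Protein objects."""

    def __init__(self):
        self._proteins = {}

    def add_protein(self, unique_id, sequence):
        if unique_id in self._proteins:
            raise KeyError(f"duplicate ID: {unique_id}")
        self._proteins[unique_id] = Protein(unique_id, sequence)
        return self._proteins[unique_id]

    def __getitem__(self, unique_id):
        return self._proteins[unique_id]

    def proteins_with_domain_type(self, domain_type):
        """All proteins carrying at least one domain of the given type."""
        return [p for p in self._proteins.values()
                if any(d.domain_type == domain_type for d in p.domains)]
```

Validating annotation positions when they are added, rather than when they are used, catches mismatches between annotation files and sequences early.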
PARROT
PARROT is a general-purpose deep learning platform for mapping between amino acid sequences and arbitrary sequence annotations. PARROT was developed by graduate student Dan Griffith, and the logo was created by undergraduate Shub Minhas.
Paper: Griffith & Holehouse eLife (2021)
Supporting data: GitHub supporting data repository
Documentation: parrot documentation (includes an introduction to deep learning for sequence prediction)
Code: GitHub repository and PyPI project
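Before a deep learning model can map sequence to annotation, each sequence has to be converted into numbers. One-hot encoding is one standard choice: each residue becomes a 20-element vector with a single 1. The sketch below is a plain-Python illustration of that idea, not PARROT's implementation; the alphabet ordering and the function name are assumptions.

```python
# Illustrative one-hot encoder; NOT PARROT's implementation.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed order: alphabetical one-letter codes
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode a protein sequence as a list of 20-element one-hot vectors."""
    encoded = []
    for aa in sequence.upper():
        if aa not in AA_INDEX:
            raise ValueError(f"unsupported residue: {aa!r}")
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[aa]] = 1  # a single 1 marks the residue identity
        encoded.append(row)
    return encoded
```

Non-standard residues raise an error here; a real pipeline might instead map them to an extra symbol or skip the sequence.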
protfasta
protfasta is a Python API and command-line tool for reading, parsing, and sanitizing protein FASTA files. protfasta was developed by Alex, and has been effectively used on datasets numbering millions of protein sequences without issue.
Documentation: protfasta documentation
Code: GitHub repository and PyPI project
Zenodo: Zenodo record
DOI: 10.5281/zenodo.4482762
How to cite: Please cite the DOI above as well as the version you used.
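For readers curious what reading, parsing, and sanitizing a FASTA file involve, the stdlib-only sketch below parses multi-line FASTA records, rejects duplicate headers, strips trailing stop symbols, and drops (or rejects) sequences containing non-standard residues. It is not protfasta's API (see the protfasta documentation for that); the function names and the specific sanitization rules are assumptions.

```python
# Hypothetical FASTA reader and sanitizer; NOT protfasta's API.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Parse FASTA text into {header: sequence}; sequences may span multiple lines."""
    records = {}
    header = None
    chunks = []

    def store():
        if header in records:
            raise ValueError(f"duplicate header: {header!r}")
        records[header] = "".join(chunks).upper()

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                store()
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first header")
            chunks.append(line)
    if header is not None:
        store()
    return records


def sanitize(records, remove_invalid=True):
    """Strip trailing '*' stop symbols; drop or reject sequences with non-standard residues."""
    clean = {}
    for header, seq in records.items():
        seq = seq.rstrip("*")
        invalid = set(seq) - VALID_RESIDUES
        if invalid:
            if remove_invalid:
                continue
            raise ValueError(f"{header!r} contains invalid residues: {sorted(invalid)}")
        clean[header] = seq
    return clean
```

Keeping parsing and sanitization as separate steps makes it possible to inspect exactly which records a cleaning rule would remove.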