Thursday, 21 November 2024

SHARPCLAW: Synteny, Homology And Repeat Pre-curation for Chromosome-Level Assembly Workflows

I’ve been a recovering acronymiser for some time, with my last few bioinformatics tools given more punny names, like Diploidocus, Taxolotl, Telociraptor and SynBad. But I was pushed for time preparing a conference poster and needed a name for a new workflow, and so SHARPCLAW - Synteny, Homology And Repeat Pre-curation for Chromosome-Level Assembly Workflows - was born. I’ve not yet crowbarred in additional meaning, so for now this has definite ad hoc status.

Saturday, 30 May 2015

IRENE - Image, Reconstruct, Erase Noise, Etc.

This month’s Nature Audio file features a device, IRENE, designed to:

acquire digital maps of the surface of the media, without contact, and then apply image analysis methods to recover the audio data and reduce noise.

IRENE stands for Image, Reconstruct, Erase Noise, Etc. and was named after one of the first reconstructed audio recordings: “Goodnight Irene”, written by H. Ledbetter and J. Lomax, performed by the Weavers (1950). This earns IRENE the much prized pre-hoc classification.

Hear more about IRENE here.

Wednesday, 15 April 2015

FUBAR - Fast Unconstrained Bayesian AppRoximation

From the authors that brought you BUSTED (Branch-site Unrestricted Statistical Test for Episodic Diversification), behold FUBAR: Fast Unconstrained Bayesian AppRoximation. (Doubly from the authors in this case, as the first author gave the tip-off.)

Despite its intranym status, FUBAR gets an extra geek hat-tip for being a homonym of “foo bar”. (Although in looking that up, I came across the original FUBAR acronym, which is hopefully not reflective of their method!)

Abstract

Model-based analyses of natural selection often categorize sites into a relatively small number of site classes. Forcing each site to belong to one of these classes places unrealistic constraints on the distribution of selection parameters, which can result in misleading inference due to model misspecification. We present an approximate hierarchical Bayesian method using a Markov chain Monte Carlo (MCMC) routine that ensures robustness against model misspecification by averaging over a large number of predefined site classes. This leaves the distribution of selection parameters essentially unconstrained, and also allows sites experiencing positive and purifying selection to be identified orders of magnitude faster than by existing methods. We demonstrate that popular random effects likelihood methods can produce misleading results when sites assigned to the same site class experience different levels of positive or purifying selection–an unavoidable scenario when using a small number of site classes. Our Fast Unconstrained Bayesian AppRoximation (FUBAR) is unaffected by this problem, while achieving higher power than existing unconstrained (fixed effects likelihood) methods. The speed advantage of FUBAR allows us to analyze larger data sets than other methods: We illustrate this on a large influenza hemagglutinin data set (3,142 sequences). FUBAR is available as a batch file within the latest HyPhy distribution (http://www.hyphy.org), as well as on the Datamonkey web server (http://www.datamonkey.org/).

  • Murrell B, Moola S, Mabona A, Weighill T, Sheward D, Kosakovsky Pond SL & Scheffler K (2013). FUBAR: a fast, unconstrained bayesian approximation for inferring selection. Mol Biol Evol. 30(5):1196-205. PMID: 23420840

Thursday, 26 February 2015

BUSTED - Branch-site Unrestricted Statistical Test for Episodic Diversification

This paper popped up in my PubCrawler feed today:

Murrell B et al. (2015). Gene-wide identification of episodic selection. Mol Biol Evol. 2015 Feb 19. pii: msv035.

We present BUSTED, a new approach to identifying gene-wide evidence of episodic positive selection, where the non-synonymous substitution rate is transiently greater than the synonymous rate. BUSTED can be used either on an entire phylogeny (without requiring an a priori hypothesis regarding which branches are under positive selection) or on a pre-specified subset of foreground lineages (if a suitable a priori hypothesis is available). Selection is modeled as varying stochastically over branches and sites, and we propose a computationally inexpensive evidence metric for identifying sites under episodic positive selection on any foreground branches. We compare BUSTED to existing models on simulated and empirical data. An implementation is available on www.datamonkey.org/busted, with a widget allowing the interactive specification of foreground branches.

From the Introduction, we find that BUSTED is indeed an ORCA-worthy contrived acronym: BUSTED - Branch-site Unrestricted Statistical Test for Episodic Diversification.

Saturday, 20 December 2014

SANTA - Spatial Analysis of NeTwork Associations

A festive bioinformatics acronym today: SANTA - Spatial Analysis of NeTwork Associations. The authors don't make a big deal of the acronym in the paper but it seemed contrived enough for a Christmas ORCA entry.

Abstract

Linking networks of molecular interactions to cellular functions and phenotypes is a key goal in systems biology. Here, we adapt concepts of spatial statistics to assess the functional content of molecular networks. Based on the guilt-by-association principle, our approach (called SANTA) quantifies the strength of association between a gene set and a network, and functionally annotates molecular networks like other enrichment methods annotate lists of genes. As a general association measure, SANTA can (i) functionally annotate experimentally derived networks using a collection of curated gene sets and (ii) annotate experimentally derived gene sets using a collection of curated networks, as well as (iii) prioritize genes for follow-up analyses. We exemplify the efficacy of SANTA in several case studies using the S. cerevisiae genetic interaction network and genome-wide RNAi screens in cancer cell lines. Our theory, simulations, and applications show that SANTA provides a principled statistical way to quantify the association between molecular networks and cellular functions and phenotypes. SANTA is available from http://bioconductor.org/packages/release/bioc/html/SANTA.html.

Ref: Cornish AJ & Markowetz F (2014) SANTA: Quantifying the Functional Content of Molecular Networks. PLoS Comput Biol 10(9): e1003808.

Wednesday, 5 November 2014

REACH - Registration, Evaluation, Authorisation and Restriction of Chemicals

REACH - Registration, Evaluation, Authorisation and Restriction of Chemicals is a fine example of acronym contrivance (although lacking the panache of a good pre-hoc concoction). Not got the right words for something catchy? Just ignore the inconvenient word!

“[REACH] streamlines and improves the former legislative framework on chemicals of the European Union (EU). The main aims of REACH are to ensure a high level of protection of human health and the environment from the risks that can be posed by chemicals, the promotion of alternative test methods, the free circulation of substances on the internal market and enhancing competitiveness and innovation.”

I guess you can’t blame them for trying to make it more interesting.

Tuesday, 14 October 2014

MUSIC - MUltiScale enrIchment Calling

Over on the ACGT blog, Keith Bradnam has another JABBA Award (and a nice new JABBA logo):

MUSIC - Multiscale Enrichment Calling.

I'm not sure what the connection between MUSIC and ChIP-Seq is, but the authors seemed pretty determined. His post is actually a twofer, as it also draws attention to an equally contrived and unfathomable intranym:

MuSiC - Multiple Sequence Alignment with Constraints

Read more at Keith’s blog!