Below are several useful software packages developed in the lab. Feel free to use them and please cite accordingly.
PMGP is a promoter modeling platform based on genetic programming. Given a set of coregulated sequences, PMGP uses a generalized log-likelihood ratio scheme to search for common patterns that make up an arbitrary, flexible motif incorporating DNA structural features and position-specific weight matrices.
CGB is a complete software platform for the comparative analysis of bacterial transcriptional regulatory networks. Using a robust Bayesian model for assessing gene regulation, the pipeline supports both draft and complete genomes in comparative analyses, automatically combines known motifs for the transcription factor under analysis using phylogenetic weighting, and predicts operons and orthologs. It also lets the user perform ancestral state reconstruction to formally interrogate the gain and loss of regulation for any gene in the network.
Cite: Kılıç S, Sánchez-Osuna M, Collado-Padilla A, Barbé J, Erill I. Flexible comparative genomics of prokaryotic transcriptional regulatory networks. BMC Genomics. 2020;21(Suppl 5):466. doi: 10.1186/s12864-020-06838-x. PMID: 33327941; PMCID: PMC7739468.
ViPhy is a Python package to perform whole-genome phylogeny of large numbers of bacteriophage genomes. It accepts both annotated and non-annotated genome files and performs bootstrapped phylogenetic reconstruction using distances inferred from pair-wise BLAST.
FiToM is a computer program written in C++ for the detection of binding sites in DNA or RNA sequences. It implements several methods described in the literature to compute an approximation of binding affinity for a particular site, based on a collection of binding sequences provided by the user. Using this approach, FiToM scans a sequence file for putative binding sites on both strands of the DNA/RNA sequence and filters the results according to a user-specified threshold. If the sequence file includes annotation, FiToM also links the identified sites to annotated genes and infers their role from their location relative to nearby genes.
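The scan-and-filter idea can be illustrated with a short Python sketch. This is not FiToM's code: it assumes a plain log-odds position-specific scoring matrix with a uniform background and a pseudocount, which is only one of the scoring schemes described in the literature. All function names and parameter values here are hypothetical.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def build_pssm(sites, pseudocount=1.0):
    """Log-odds matrix from equal-length aligned sites (uniform background assumed)."""
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in "ACGT"
        })
    return pssm

def score(pssm, window):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(column[base] for column, base in zip(pssm, window))

def scan(pssm, sequence, threshold):
    """Score every window on both strands; keep hits at or above the threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            site_score = score(pssm, site)
            if site_score >= threshold:
                hits.append((start, strand, site, site_score))
    return hits

# Toy collection of binding sites and a short target sequence.
pssm = build_pssm(["TACTGT", "TACTGT", "TACTGA", "TACTGT"])
hits = scan(pssm, "GGGTACTGTGGG", threshold=5.0)
```

Each hit records the window start, the strand, the site as read on that strand, and its score, which is the information needed to link a site to nearby annotated genes afterwards.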
xFiToM is a revised and fully featured GUI version of FiToM for MS Windows operating systems, with added functionality. xFiToM uses a set of aligned sites and/or IUB consensus sequences to construct a position-specific frequency matrix and search a set of sequences for putative binding sites. In addition to the FiToM features, xFiToM can integrate multiple GenBank sequences for analysis, giving immediate access to annotated sequence assemblies from unfinished genomes. It also integrates local complexity options to detect local motif enrichment and allows full use of IUPAC degenerate characters in the definition of the binding motif.
Cite: Bhargava N, Erill I. xFITOM: a generic GUI tool to search for transcription factor binding sites. Bioinformation. 2010;5(2):49-51. doi: 10.6026/97320630005049. PMID: 21346861; PMCID: PMC3039987.
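As an illustration of how aligned sites and degenerate consensus sequences can feed the same position-specific frequency matrix, here is a minimal Python sketch. It assumes that each IUPAC symbol spreads its weight evenly over the bases it stands for; that is an assumption made for this example, not a description of how xFiToM weights degenerate characters. The function name is hypothetical.

```python
# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def frequency_matrix(sites):
    """Build a position-specific frequency matrix from equal-length sites.

    Sites may mix plain bases and IUPAC degenerate symbols. A degenerate
    symbol contributes 1/k to each of the k bases it stands for (assumption).
    """
    matrix = []
    for i in range(len(sites[0])):
        counts = dict.fromkeys("ACGT", 0.0)
        for site in sites:
            bases = IUPAC[site[i].upper()]
            for base in bases:
                counts[base] += 1.0 / len(bases)
        matrix.append({base: count / len(sites) for base, count in counts.items()})
    return matrix

# One experimentally aligned site plus one degenerate consensus sequence.
matrix = frequency_matrix(["TACTGT", "TRCTGN"])
```

Every column of the resulting matrix sums to 1, so it can be used directly as input to a log-odds or information-based scoring scheme.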
jFiToM is a Java version of FiToM. Like xFiToM, jFiToM uses a set of aligned sites and/or IUB consensus sequences to construct a position-specific frequency matrix and search a set of sequences for putative binding sites. It can be downloaded or run as an applet and provides fast searches for annotated genomes. Among other improvements, jFiToM returns relevant annotation on both genomic strands and operon information.
Relative codon adaptation (RCA) is a novel index for measuring codon adaptation in genomic sequences. Like CAI, RCA can use a reference gene set to estimate gene expression, but RCA directly takes into account the background nucleotide distribution.
Cite: Fox JM, Erill I. Relative codon adaptation: a generic codon bias index for prediction of gene expression. DNA Res. 2010; 17(3):185-96. doi: 10.1093/dnares/dsq012. PMID: 20453079; PMCID: PMC2885275.
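A rough Python sketch of this kind of index follows. It assumes that each codon's weight is the ratio between its observed frequency in the reference set and the frequency expected from the base composition at each of the three codon positions, and that a gene's score is the geometric mean of its codon weights. The pseudocounts and function names are choices made for this example; the cited paper gives the exact definition.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"

def codons(seq):
    """Split a coding sequence into consecutive in-frame codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def rca_weights(reference_genes, pseudocount=1.0):
    """Per-codon weights: observed frequency over positional-base expectation."""
    codon_counts = Counter()
    base_counts = [Counter(), Counter(), Counter()]
    for gene in reference_genes:
        for codon in codons(gene):
            if set(codon) <= set(BASES):
                codon_counts[codon] += 1
                for pos, base in enumerate(codon):
                    base_counts[pos][base] += 1
    total = sum(codon_counts.values())
    weights = {}
    for codon in map("".join, product(BASES, repeat=3)):
        # Pseudocounts (an assumption of this sketch) avoid division by zero.
        observed = (codon_counts[codon] + pseudocount) / (total + 64 * pseudocount)
        expected = 1.0
        for pos, base in enumerate(codon):
            expected *= (base_counts[pos][base] + pseudocount) / (total + 4 * pseudocount)
        weights[codon] = observed / expected
    return weights

def rca(gene, weights):
    """Geometric mean of the codon weights along one gene."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    if not logs:
        raise ValueError("gene has no scorable codons")
    return math.exp(sum(logs) / len(logs))

# Toy reference set dominated by the GCT codon.
weights = rca_weights(["GCTGCTGCTGCTAAA", "GCTGCTAAAGCTGCT"])
preferred = rca("GCTGCTGCT", weights)
avoided = rca("GCAGCAGCA", weights)
```

In this toy case the gene built from the codon that dominates the reference set scores higher than the gene built from a synonymous codon that the reference set never uses.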
scnRCA is a Java program to obtain the self-consistent reference set for a given genome using the nRCA (or CAI) codon bias index. The self-consistent reference set is defined as the set of genes within the genome that possess a dominant codon bias, in the sense that ranking all genes in the genome with a codon usage index based on that set picks out the same set as the top-scoring group of genes in the genome. When translational bias is present, the self-consistent reference set is likely to be populated by genes with heavy translational bias, although other biases, such as GC content, can confound the algorithm. We have shown that nRCA outperforms CAI at identifying the self-consistent reference set in biased genomes.
Cite: O’Neill PK, Or M, Erill I. scnRCA: a novel method to detect consistent patterns of translational selection in mutationally-biased genomes. PLoS One. 2013;8(10):e76177. doi: 10.1371/journal.pone.0076177. PMID: 24116094; PMCID: PMC3792112.
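The definition above suggests a simple fixed-point iteration, sketched here in Python. This is not scnRCA's implementation: the scoring index is a deliberately crude stand-in (smoothed codon frequencies in the current reference set), where a real analysis would use nRCA or CAI, and the starting set, `top_fraction`, and `max_iterations` are hypothetical choices.

```python
import math
from collections import Counter

def codons(seq):
    """Split a coding sequence into consecutive in-frame codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def codon_weights(reference, pseudocount=1.0):
    """Stand-in index: smoothed codon frequencies in the reference set."""
    counts = Counter(c for gene in reference for c in codons(gene))
    total = sum(counts.values())
    return lambda codon: (counts[codon] + pseudocount) / (total + 64 * pseudocount)

def gene_score(gene, weight):
    """Geometric mean of codon weights (gene must contain at least one codon)."""
    logs = [math.log(weight(c)) for c in codons(gene)]
    return math.exp(sum(logs) / len(logs))

def self_consistent_set(genes, top_fraction=0.1, max_iterations=50):
    """Iterate until the top-scoring genes equal the set used to score them.

    `genes` maps gene names to coding sequences. The search starts from the
    whole genome as the reference set (an assumption of this sketch).
    """
    size = max(1, int(len(genes) * top_fraction))
    reference = set(genes)
    for _ in range(max_iterations):
        weight = codon_weights([genes[name] for name in sorted(reference)])
        ranked = sorted(genes, key=lambda name: gene_score(genes[name], weight),
                        reverse=True)
        top = set(ranked[:size])
        if top == reference:
            return top  # ranking with this set reproduces the set itself
        reference = top
    return reference  # not converged; last candidate set

# Toy genome: two genes share a strong preference for the GCT codon.
genome = {
    "a": "GCTGCTGCTGCT",
    "b": "GCTGCTGCTGCA",
    "c": "AAACCCGGGTTT",
    "d": "ACGTCAGGATCC",
}
result = self_consistent_set(genome, top_fraction=0.5)
```

The loop stops when scoring with the candidate set returns that same set as the top group, which is the self-consistency condition described in the paragraph above.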
BioWord is a powerful biological sequence editor operating in the most convenient of places: inside your usual word-processor (Microsoft Word 2007 and 2010). Easy to install and embedded into a Microsoft Office Ribbon, BioWord allows instant access to most sequence manipulation and editing needs, such as reverse-complementing, DNA to protein translation or FASTA formatting, and features a full suite of sequence search methods, pair-wise alignment and motif discovery, as well as the ability to generate consensus logos for both DNA and protein multiple sequence alignments.
Cite: Anzaldi LJ, Muñoz-Fernández D, Erill I. BioWord: a sequence manipulation suite for Microsoft Word. BMC Bioinformatics. 2012 Jun 7;13:124. doi: 10.1186/1471-2105-13-124. PMID: 22676326; PMCID: PMC3546851.
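For readers who want the basic manipulations outside a word processor, the following Python sketch shows reverse-complementing, DNA to protein translation and FASTA formatting. It is independent of BioWord's code; it assumes the standard genetic code and uppercase, unambiguous DNA, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate in-frame codons; '*' marks a stop, trailing bases are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))

def to_fasta(name, seq, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines)

record = to_fasta("example", "ACGT" * 20)
```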