Pathogen Genome Cluster Computing 


Statistics and Computer Architecture 



Maximum likelihood 

Maximum likelihood searches a probability landscape for the solution with the highest (maximum) probability under a predefined model. The approach is widely used in phylogenetics to search among alternative tree structures, and this in turn is the bricks and mortar of phylogenomic analysis. Phylogenetics usually models DNA or amino acid point mutations, although we have also applied it to modeling the presence or absence of genes from DNA-DNA microarray data. Phylogenetic nucleotide models center on recovering information lost due to reversion mutations, particularly at "neutral" sites. Elaborations on this model incorporate skews in purine and pyrimidine mutation rates, base homogeneity and site rate heterogeneity. Maximum likelihood is notoriously processor hungry, and parameter-rich models make it more so. Applying maximum likelihood models across entire genomes therefore results in highly computationally demanding calculations. Bayesian approaches use the same model-based framework to reconstruct a phylogenetic tree but obtain the solution via simulation, notably using Markov chain Monte Carlo.
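As a rough illustration of what searching a likelihood landscape means, the sketch below estimates the evolutionary distance between two aligned sequences under the Jukes-Cantor (JC69) model, the simplest nucleotide model that corrects for substitutions hidden by later changes at the same site. The choice of JC69, the function names and the example counts (1000 sites, 150 differences) are assumptions made for illustration only; they are not taken from the analyses described here. The log-likelihood is written up to an additive constant.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood (up to a constant) of distance d under JC69.

    d is the expected number of substitutions per site (must be > 0).
    """
    # Probability that a site differs between the two sequences at distance d.
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    n_same = n_sites - n_diff
    return n_diff * math.log(p_diff) + n_same * math.log(1.0 - p_diff)


def ml_distance_grid(n_sites, n_diff, step=0.0001, d_max=3.0):
    """Scan the likelihood landscape and return the highest-likelihood distance."""
    best_d, best_ll = None, -math.inf
    for k in range(1, int(d_max / step) + 1):
        d = k * step
        ll = jc69_log_likelihood(d, n_sites, n_diff)
        if ll > best_ll:
            best_d, best_ll = d, ll
    return best_d


def jc69_closed_form(n_sites, n_diff):
    """Analytical maximum likelihood distance under JC69."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    n_sites, n_diff = 1000, 150  # hypothetical alignment summary
    print("observed proportion of differences:", n_diff / n_sites)
    print("grid-search ML distance:", ml_distance_grid(n_sites, n_diff))
    print("closed-form ML distance:", jc69_closed_form(n_sites, n_diff))
```

The estimated distance is larger than the observed proportion of differing sites, because the model accounts for changes that have been overwritten. Even this one-parameter search evaluates the likelihood tens of thousands of times. A real phylogenetic analysis repeats a comparable optimisation for every branch of every candidate tree, with more parameters per model, which is where the processor demand comes from.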


Stochastic Approaches 

Stochastic simulations, or microsimulations, which model the behaviour of individuals, are very RAM intensive, and RAM is often the limiting factor of this approach. When the memory needed to hold the simulated population exceeds the RAM of the machine, the calculation freezes (as we know from bitter experience). A computer designed around simulation therefore usually has a smaller number of processors, each with a very large RAM capacity.
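To make the memory constraint concrete, here is a minimal sketch of an individual-based simulation together with a check of whether the population would fit in RAM. The SIR (susceptible, infected, recovered) model, the parameter values, the figure of 64 bytes per individual and all function names are assumptions chosen for illustration; they do not describe any particular simulation run on the cluster.

```python
import random
from array import array


def estimated_bytes(population_size, bytes_per_individual):
    """Rough memory needed to hold the state of every individual."""
    return population_size * bytes_per_individual


def fits_in_ram(population_size, bytes_per_individual, ram_bytes):
    """True if the whole population fits in the given amount of RAM."""
    return estimated_bytes(population_size, bytes_per_individual) <= ram_bytes


def simulate_sir(population_size, n_initial_infected, beta, gamma,
                 n_steps, seed=0):
    """Individual-based SIR model; one byte of state per individual.

    State codes: 0 = susceptible, 1 = infected, 2 = recovered.
    Returns the final counts (susceptible, infected, recovered).
    """
    rng = random.Random(seed)
    state = array("B", [0]) * population_size
    for i in range(n_initial_infected):
        state[i] = 1
    for _ in range(n_steps):
        n_infected = state.count(1)
        # Per-step infection probability for each susceptible individual.
        p_infect = 1.0 - (1.0 - beta / population_size) ** n_infected
        new_state = array("B", state)
        for i, s in enumerate(state):
            if s == 0 and rng.random() < p_infect:
                new_state[i] = 1
            elif s == 1 and rng.random() < gamma:
                new_state[i] = 2
        state = new_state
    return state.count(0), state.count(1), state.count(2)


if __name__ == "__main__":
    ram = 16 * 2**30  # hypothetical machine with 16 GiB of RAM
    for n in (10**6, 10**9):
        print(n, "individuals fit in RAM:", fits_in_ram(n, 64, ram))
    print(simulate_sir(2000, 5, beta=0.3, gamma=0.1, n_steps=50))
```

Memory grows linearly with the number of individuals and with the amount of state stored for each one, so a check of this kind before launching a run is a cheap way to avoid a simulation that stalls once the machine starts swapping.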



London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK  Tel: +44 (0) 20 7636 8636  Comments and enquiries Last updated 28th July, 2005 MWG. 