mapgd  0.4
A program for the Maximum-likelihood analysis of population genomic data.
mapgd Documentation

Contents

Introduction

Basic Design

Style Guidelines

Tutorials

Task List

Introduction

MAPGD is a series of related programs that estimate allele frequency, heterozygosity, Hardy-Weinberg disequilibrium, and identity by descent (IBD) coefficients from population genomic data using a statistically rigorous maximum-likelihood approach.

Basic Design

Style Guidelines

MAPGD is written to conform to the GNU style guidelines, at least to the extent that I have had time to read and implement the guidelines.

Models

In statistics, a model is an idealized description of how data are generated. If a statistical model of the data exists, it can be used to estimate the parameters that generated the data through the process of likelihood maximization.
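As a minimal sketch of likelihood maximization, assume a deliberately simple model in which each of `n` sampled alleles is a copy of a given allele with probability `p`. The function names, the example counts, and the golden-section search are illustrative choices, not part of MAPGD.

```python
import math

def log_likelihood(p, k, n):
    """Binomial log-likelihood of seeing k copies of an allele in n draws.
    The constant binomial coefficient is dropped; it does not affect the maximum."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def maximize(f, lo, hi, tol=1e-9):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0

# Hypothetical data: 30 copies of the allele among 100 sampled alleles.
k, n = 30, 100
p_hat = maximize(lambda p: log_likelihood(p, k, n), 1e-6, 1.0 - 1e-6)
print(p_hat)
```

For this toy model the numerical maximum agrees with the closed-form estimate `k / n`. A numerical search is only needed for models that have no closed-form solution.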

Likelihoods

Maximum Likelihood

Priors

Posteriors

Maximizing Posteriors

Detecting violations of the model

Useful representations of data

Allele frequencies

Genotypic likelihoods

Genotypic Correlation

Ultimately, the genetic structure of a population is fully specified by the genotypes of the individuals that compose it. This means that if we can accurately calculate all genotypic probabilities, the calculation of any other population statistic becomes trivial. However, to calculate genotypic probabilities we must account for the errors made in the genotyping process. These errors include inferring the presence of alleles that are not there, which can arise from sequencing error or from mistakes made aligning to a reference, and failing to detect alleles that are there, because of low coverage or biased sequencing of a single parental chromosome. We maximize a likelihood equation to account for sequencing error and the failure to sample genotypes, then test how well the data fit the estimated parameters, and reject estimates where the fit is poor.
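The effect of sequencing error and low coverage on genotypic probabilities can be illustrated with a simplified sketch. It assumes a biallelic site, a symmetric per-read error rate, and Hardy-Weinberg genotype priors. These assumptions, the function names, and the default error rate are hypothetical and are not taken from the MAPGD source.

```python
import math

def genotype_log_likelihoods(n_major, n_minor, error=0.01):
    """Log-likelihood of the read counts under each diploid genotype.
    Assumption: a read shows the major allele with probability
    1 - error (MM), 0.5 (Mm), or error (mm)."""
    p_major_read = {"MM": 1.0 - error, "Mm": 0.5, "mm": error}
    return {g: n_major * math.log(q) + n_minor * math.log(1.0 - q)
            for g, q in p_major_read.items()}

def genotype_posteriors(log_liks, p):
    """Combine the likelihoods with Hardy-Weinberg priors for major-allele frequency p."""
    priors = {"MM": p * p, "Mm": 2.0 * p * (1.0 - p), "mm": (1.0 - p) ** 2}
    top = max(log_liks.values())  # subtract the largest value for numerical stability
    weights = {g: priors[g] * math.exp(log_liks[g] - top) for g in log_liks}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

# Only major-allele reads are observed, at low and at higher coverage.
low = genotype_posteriors(genotype_log_likelihoods(2, 0), p=0.5)
high = genotype_posteriors(genotype_log_likelihoods(20, 0), p=0.5)
print(low["Mm"], high["Mm"])
```

With two reads the heterozygote keeps a substantial posterior probability, because both reads may have come from the same parental chromosome. With twenty reads that probability becomes negligible.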

Jacquard's condensed coefficients of identity

Zygosity Correlations

The genotypic correlation between two loci within a population is generally described in terms of linkage disequilibrium, but it can be seen more generally as a second-order zygosity correlation between loci. These correlations exist within individuals, and ultimately linkage disequilibrium is simply the average second-order zygosity correlation across individuals within the population.
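As a rough illustration of a correlation in zygosity between two loci, the sketch below computes the Pearson correlation of heterozygosity indicators across individuals. It assumes that genotypes are known without error, and the function name and data are made up for the example. It is not the estimator MAPGD uses.

```python
import math

def zygosity_correlation(het_a, het_b):
    """Pearson correlation between heterozygosity indicators
    (1 = heterozygous, 0 = homozygous) at two loci, one entry per individual."""
    n = len(het_a)
    mean_a = sum(het_a) / n
    mean_b = sum(het_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(het_a, het_b)) / n
    var_a = sum((a - mean_a) ** 2 for a in het_a) / n
    var_b = sum((b - mean_b) ** 2 for b in het_b) / n
    return cov / math.sqrt(var_a * var_b)

# Hypothetical indicators for eight individuals at two loci.
locus_a = [1, 1, 0, 0, 1, 0, 1, 0]
locus_b = [1, 1, 0, 0, 1, 0, 0, 1]
r = zygosity_correlation(locus_a, locus_b)
print(r)  # 0.5
```

A value near zero would indicate that heterozygosity at one locus carries no information about heterozygosity at the other.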

Tutorials

[An introduction to quartets]() [Reading and writing to files]() [Making models]()

Task List

[x] Claim MAPGD.org.

The website should probably just redirect to the github repository.

[ ] Implement gettext for internationalization.

[ ] Get SQL back end working again.

[ ] Get MPI working again.

[ ] Write scripting tutorials for the main user page.

Write a tutorial to show

[ ] Write SQL tutorials.

SQL tutorials should show users how to quickly calculate various summary statistics from the SQL database, such as piN and piS.

[ ] Automate tutorial testing.

Both coding tutorials and example workflows in the README should be automatically tested before each push to the main repository.

[ ] Automatic statistical and computational performance testing.

Statistical tests should report the bias and MSE of MAPGD and several other programs (ANGSD/GATK/PLINK). Add scripts to keep the other programs up to date. Performance tests should look at how computation times scale with input size, number of threads, and number of nodes. Results of the statistical and performance tests should automatically be displayed in the README.

[x] Automatically generate figures from recent papers.

These figures and scripts are placed in the directories Ackerman 2016a and Ackerman 2016b.