The initial Monte Carlo-based algorithm is described in:
Keightley, P. D. (1994). The distribution of mutation effects on viability in Drosophila melanogaster. Genetics 138: 1315-1322.
The numerical integration algorithm used by this program is a development of the above, described in:
Keightley, P. D. and Ohnishi, O. (1998). EMS-induced polygenic mutation rates for nine quantitative characters in Drosophila melanogaster. Genetics 148: 753-766. [Article]
Keightley, P. D. (1998). Inference of genome wide mutation rates and distributions of mutation effects for fitness traits: a simulation study. Genetics 150: 1283-1293. [Article]
The following instructions are written with a Unix system in mind; running the program on other systems may differ somewhat from what follows.
Obtaining the program
The program contains copyright routines, used under license, and unfortunately it has not been possible to offer download of the program from this website. The program must instead be requested by contacting the author by email: . Files associated with the program can be downloaded from this website (see below). One of these files, gengamtab.out, is a required file,
Compilation of the Program
The program is written in C, and consists of only the one module:
To compile the program using, for example, the gcc compiler on a Unix system:
gcc -O2 -o mlgenomeu mlgenomeu.c -lm
The -O2 option specifies that the object code should be optimised for increased speed; compilers other than gcc may use a different option to specify this. The -o option specifies the name of the executable (object ) file, mlgenomeu in this case.
Input File Formats
The program expects an input file, data.dat. The first entry in the file is the generation number of the mutation accumulation (MA) lines. In this program, only one MA generation (+ control = generation 0) can be handled. The input file contains a list of phenotypic observations for the MA line replicates and the control line replicates. Each observation is the phenotypic value of one replicate for an MA or control line. The control line replicate values are assumed to follow a normal distributed. No additional random effects are estimated by this version of the program, but fixed effects may be included by inserting an appropriate code into the input data file. All replicates are assumed to be measured with the same error. MA lines have at least one replicate, and the number of replicates per MA line can vary. The individual MA lines are given a unique integer identifier (>0), but replicate numbers within a line are not given codes. The phenotypic value for the trait is a real number. Control lines are given the identifier `0'. An example of part of a simulated data.dat file is as follows. The MA lines (50 of them, each with 4 replicates) come first. This is followed by a block of control line replicaes. Note that the generation number of the MA lines is 1 in this case.
1 9.735818 1 9.079754 1 10.793222 1 10.062783
2 11.842031 2 10.241879 2 10.206810 2 9.931010
3 8.810917 3 8.723238 3 10.741316 3 10.449603
4 7.858996 4 10.354959 4 9.068551 4 9.349407
49 10.538285 49 9.587561 49 10.778127 49 8.931881
50 11.384082 50 9.507858 50 8.633099 50 10.235791
0 8.934888 0 10.686638 0 9.251242 0 11.910901
0 9.242428 0 8.849134 0 9.418149 0 10.984947
0 9.951213 0 8.866559 0 9.243635 0 8.940020
0 12.175585 0 9.282169 0 9.776285 0 8.819433
0 12.093757 0 9.640237 0 11.307031 0 9.133931
0 9.807303 0 10.327360 0 9.346597 0 11.193762
0 10.825930 0 8.457343 0 10.121449 0 9.166584
Download complete example file data.dat.
A downloadable file (zipped) gengamtab.out.gz, containing tables of gamma distributions used by the numerical integration procedure, is required. This file should not be changed in any way. In order to start downloading these files, it may be necessary to hold the shift key down while clicking on the file.
Running the Program and its Input Parameter Values
The program is set up to generate profile likelihoods, in which one of the parameters is kept at a fixed value. Convergence over the full likelihood surface is often a problem, so maximization with all of the key parameters of the model as variables is discouraged.
IMPORTANT. Runs should be checked with different starting values, in order to check convergence, and to check for the presence of multiple peaks.
There is a `line search' option to search over fixed values of a specific parameter, and this may be used to increase the robustness of the search, but it takes more CPU time. The way in which line search is used is explained below.
The recommended way to run the program is to estimate U and E(a) for a series of different values of beta ranging from ->infinity (equal effects - the -99 beta option) to, say, 0.1. Within each model (i.e. value for beta), confidence intervals for U and E(a) can be computed from profile likelihoods. The fit of different models can be compared via the likelihood.
The summary output is written to the file sum.out. Full details of the run are stored in mlgenomeu.out.
Below, input values are shown in red for a typical run in which beta is fixed at 1.0. The starting value for alpha is 10.0 giving a mean mutant effect of beta/alpha = 0.1 (this obviously has to be chosen to give a reasonable fit to the data). This run inputs a value for the proportion of mutants positive = zero, i.e. all mutations are deleterious. If P is set to >0, the program slows down dramatically. Here is an example of the start of a run with suitable input values. This was run using the example file data.dat that can be downloaded above. Comments on the input values are given in green.
These are coded after the line number in data.dat. For a fixed effect
with n levels, they are coded from zero to n-1.
Line search option
This option allows one of the parameters to have a series of fixed values,
and likelihood is maximized with respect to the other variable parameters
in a series of separate runs. At the end of these runs, the line search parameter
is changed to a variable parameter, with a starting value equal to that which gave
the highest likelihood in the series of fixed value runs. Using the line search option can
reduce the dimensionality of the search, so can help in finding the
global likelihood maximum if their are multiple peaks in the likelihood
Please email any bug report to