*all* statistical methods of gene localization. It is hoped that, by keeping this bias in mind, we will more realistically interpret and extrapolate from the results of genomewide scans.

## Introduction


*all* other methods of gene localization and estimates of *any* underlying parameter relating marker-locus genotypes and trait phenotypes, if these parameters are estimated at peaks of the test statistic. The main goal of this study is to make investigators aware of the existence and severity of this problem in general.

### Brief Overview of VC Linkage Analysis

The total phenotypic variance ($\sigma^{2}$) is modeled as the sum of the phenotypic variances attributable to the additive effects of a QTL at the given chromosomal position ($\sigma^{2}_{q}$), the aggregate additive effects of genes elsewhere in the genome ($\sigma^{2}_{p}$), and individual-specific environmental effects and/or measurement errors ($\sigma^{2}_{e}$). Under the null hypothesis of no linkage (i.e., the absence of a QTL at or near the chromosomal location being tested), $\sigma^{2}_{q}$ is set to 0. The statistical evidence for linkage is evaluated by a likelihood-ratio test, which is typically presented as a LOD score:

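$$Z = \log_{10}\frac{\max L(\sigma^{2}_{q}, \sigma^{2}_{p}, \sigma^{2}_{e})}{\max L(\sigma^{2}_{q}=0, \sigma^{2}_{p}, \sigma^{2}_{e})},$$

where $L(\,)$ denotes the likelihood. Asymptotically, the likelihood-ratio statistic, $\Lambda = 2\ln(10)Z$, is assumed to be distributed as an equal mixture of a $\chi^{2}$ random variable with 1 df and a point mass at 0 (Self and Liang

and the (additive) heritability attributable to the QTL can be written as

$$h^{2}_{q} = \frac{\sigma^{2}_{q}}{\sigma^{2}_{q} + \sigma^{2}_{p} + \sigma^{2}_{e}}.$$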

For brevity, we refer to $\sigma^{2}_{q}$ as “QTL effect size” and to $h^{2}_{q}$ as “QTL heritability.”
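The relationship between the LOD score, the likelihood-ratio statistic, and the pointwise P value can be sketched in a few lines of Python. This is a minimal illustration that assumes the asymptotic 50:50 mixture described above holds exactly; the function names `lod_to_lrt` and `pointwise_p_value` are hypothetical and do not come from any linkage package.

```python
import math


def lod_to_lrt(lod):
    """Likelihood-ratio statistic for a LOD score: Lambda = 2 ln(10) Z."""
    return 2.0 * math.log(10.0) * lod


def pointwise_p_value(lod):
    """Asymptotic pointwise P value under the assumed 50:50 mixture null."""
    if lod <= 0.0:
        return 1.0  # the point mass at 0 makes every outcome at least this extreme
    lrt = lod_to_lrt(lod)
    # For 1 df, P(chi-square >= x) = erfc(sqrt(x / 2)); the mixture halves it.
    return 0.5 * math.erfc(math.sqrt(lrt / 2.0))


for lod in (1.0, 2.0, 3.0):
    print(f"LOD {lod:.0f}: Lambda = {lod_to_lrt(lod):.2f}, "
          f"P = {pointwise_p_value(lod):.2g}")
```

Under these assumptions, a LOD score of 3 corresponds to a pointwise P value of about .0001.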

### Sources of Bias

*q*, in the genome, and the QTL heritability is estimated independently of the magnitude of the LOD score, that is,

By “genomewide sources of bias,” we mean the additional bias resulting from joint estimation of locus position and effect size in scans of the whole genome or parts thereof—that is,

Of course, a genome scan may provide statistically significant evidence not just for one locus but for multiple loci or for none at all. Equation (2) is meant to illustrate the multiple-testing problem that results in genomewide bias, independent of whether the genomewide maximum LOD score is significant. This genomewide bias may also be viewed as a type of pointwise bias, resulting when the QTL heritability is estimated only when the LOD score exceeds some threshold.

#### Pointwise sources of bias

$h^{2}_{q}$ is defined on the interval [0,1], as it is a proportion. For $h^{2}_{q}\neq 0.5$, the boundaries of its domain (0 and 1) are asymmetrical, which necessarily leads to an asymmetrical distribution of $\hat{h}^{2}_{q}$, and is expected to result in bias. In most cases, the closer to a boundary the true value of the parameter, the larger the bias from this source. As QTL heritabilities for complex traits are expected to be small in general, on the basis of both empirical evidence and theoretical considerations (e.g., Blangero et al.

#### Genomewide sources of bias

$Z$ and $\hat{h}^{2}_{q}$ are therefore not independent. In fact, for a given data set, assuming constant information on meiotic transmissions throughout the genome, there is essentially a one-to-one correspondence between LOD scores and QTL-heritability estimates, which then provide redundant information (see fig. 2). Typically, however, the available information on chromosomal segregation varies from point to point throughout the genome, because of differences in marker density, marker informativeness, and which individuals are genotyped at a given marker, among other reasons, including genotyping, map, and other errors. Although the one-to-one correspondence between LOD scores and QTL-heritability estimates then no longer holds, $Z$ and $\hat{h}^{2}_{q}$ remain positively correlated. By maximization of the LOD score over the genome or over parts thereof, the estimate of QTL heritability is thus effectively maximized as well, resulting in an upward bias. The bias from genomewide testing may also be viewed as a type of pointwise bias that results when the QTL heritability is estimated only when the LOD score is significant. Even if pointwise estimates of locus-specific effect size were unbiased when estimated irrespective of the LOD score, they cannot be so conditional on the linkage test being significant, given the correlation of the LOD score and the QTL-heritability estimate.
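The effect of maximizing over the genome can be illustrated with a small Monte Carlo sketch. It is not the simulation design used in this study. It assumes a genome with no QTL at all, `n_positions` independent test positions (a hypothetical stand-in for the effective number of independent tests; real test statistics are autocorrelated along chromosomes), a standard normal signed root of the test statistic truncated at 0, and a one-to-one map $\hat{h}^{2}_{q} = \sqrt{\Lambda / c}$ with a hypothetical constant `c`.

```python
import math
import random


def scan_once(n_positions, c, rng):
    """Simulate one scan with no QTL; return the estimate at a fixed
    position and the estimate at the genomewide LOD peak."""
    estimates = []
    for _ in range(n_positions):
        t = rng.gauss(0.0, 1.0)      # signed root of the statistic under the null
        lrt = max(t, 0.0) ** 2       # variance estimates are bounded below by 0
        estimates.append(math.sqrt(lrt / c))  # assumed one-to-one map
    # With a one-to-one map, the largest estimate sits at the largest LOD score.
    return estimates[0], max(estimates)


def mean_estimates(n_replicates=2000, n_positions=200, c=150.0, seed=1):
    """Average both kinds of estimate over many simulated scans."""
    rng = random.Random(seed)
    fixed_total = 0.0
    peak_total = 0.0
    for _ in range(n_replicates):
        fixed, peak = scan_once(n_positions, c, rng)
        fixed_total += fixed
        peak_total += peak
    return fixed_total / n_replicates, peak_total / n_replicates


mean_fixed, mean_peak = mean_estimates()
print(f"mean estimate at a fixed position: {mean_fixed:.3f}")
print(f"mean estimate at the LOD peak:     {mean_peak:.3f}")
```

Although the true QTL heritability is 0 at every position in this sketch, the estimate reported at the peak is several times larger than the estimate at a position chosen in advance.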

*P* value of .0001 (Morton

*P* value of .05 commonly used as a standard in pointwise statistical analysis. The autocorrelation of the test statistic and, equivalently, of the locus effect-size estimate along the chromosomes depends on many factors, including the nature of the data and analysis method, and the appropriate genomewide significance threshold could be chosen accordingly (see also Lander and Kruglyak

## Results

### Simulation Results

$h^{2}$, was set to 0.5, attributable to 0–5 unlinked QTLs with $h^{2}_{q}=0.1$ each (or 0–10 QTLs with $h^{2}_{q}=0.05$ each) and nonlocalized polygenic effects. Each QTL was diallelic, with equally frequent alleles, and was located in the middle of a chromosome. The remaining phenotypic variance was due to individual-specific effects. Phenotypic effects of dominance, covariates, shared environment, and any other complicating factors were assumed to be absent. Marker maps, marker genotypes, and phenotypic data were assumed to be accurate. Two-point VC-based linkage analysis was conducted on each marker using SOLAR (Almasy and Blangero

The mean $\hat{h}^{2}$ across replicates was 0.498, essentially identical to the generating value. Figure 1 shows the pointwise distribution of $\hat{h}^{2}_{q}$ at the true position of a QTL across replicates, for generating value $h^{2}_{q}=0.05$. The distribution is clearly skewed, with a long upper tail and a large point mass at the lower boundary of 0, and the expected value of $\hat{h}^{2}_{q}$, 0.062, is somewhat biased upwards. For $h^{2}_{q}=0.1$, the mean estimate was 0.104 (data not shown). The decrease in bias is simply due to the fact that the larger generating value is farther from the lower boundary, thus reducing the skewness resulting from the point mass at 0. Note that the observed pointwise biases are not large under the ideal circumstances simulated.
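The way a lower boundary produces this kind of pointwise bias can be sketched with a closed-form calculation. The sketch assumes, purely for illustration, that the unconstrained estimate is normally distributed around the true value with standard error `se` and that negative values are set to 0; the standard error of 0.08 is hypothetical and is not taken from the simulations, so the numbers are not expected to reproduce those reported above.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def mean_bounded_estimate(h2q, se):
    """Mean of max(0, X) for X ~ N(h2q, se^2): an estimate that cannot go below 0."""
    mu = h2q / se
    return h2q * normal_cdf(mu) + se * normal_pdf(mu)


def prob_estimate_is_zero(h2q, se):
    """Size of the point mass at the lower boundary."""
    return normal_cdf(-h2q / se)


SE = 0.08  # hypothetical standard error, not taken from the simulations in the text
for h2q in (0.05, 0.10, 0.20):
    mean = mean_bounded_estimate(h2q, SE)
    print(f"true h2q = {h2q:.2f}: mean estimate = {mean:.4f}, "
          f"bias = {mean - h2q:.4f}, "
          f"P(estimate = 0) = {prob_estimate_is_zero(h2q, SE):.3f}")
```

In this simplified model the bias is always positive and shrinks as the true value moves away from 0, which is the qualitative pattern described above.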

$(h^{2}_{q})^{2}$ (Williams and Blangero

$Z_{\max}$) and at the chromosomal positions of LOD score peaks of at least 3 (columns labeled $Z_{\geq 3}$), as a function of the number of QTLs with $h^{2}_{q}=0.1$ in the genome. Let us first focus on the estimates for both true and false peaks. In the baseline case, when there is no mappable QTL in the genome (i.e., $h^{2}_{q}=0.0$), the mean estimate of $\hat{h}^{2}_{q}$ associated with $Z_{\max}$ is ∼0.24, demonstrating the magnitude of the bias due to maximization of the LOD score over the genome. When there are mappable QTLs in the genome, essentially identical estimates are obtained. The same also holds for QTLs with $h^{2}_{q}=0.05$ (data not shown). The QTL-heritability estimates thus are of similar magnitude, no matter what the true QTL heritabilities are or whether any mappable QTLs exist at all. Under the simulation settings, the estimates are therefore essentially independent of the true state of nature. Table 1 also gives the QTL-heritability estimates associated with only those LOD score peaks meeting or exceeding the customary LOD score threshold of 3 (Morton

Table 1: E[$\hat{h}^{2}_{q}$] for true and false peaks and for true peaks only

| No. of QTLs in Genome | True and False Peaks, at $Z_{\max}$ | True and False Peaks, at $Z_{\geq 3}$ | True Peaks, at $Z_{\max}$ | True Peaks, at $Z_{\geq 3}$ |
|---|---|---|---|---|
| 0 | .236 | .298 | NA | NA |
| 1 | .242 | .300 | .251 | NO |
| 2 | .246 | .301 | .256 | .301 |
| 3 | .249 | .302 | .258 | .303 |
| 4 | .253 | .301 | .259 | .301 |
| 5 | .254 | .301 | .260 | .301 |

E[$\hat{h}^{2}_{q}$] = mean QTL-heritability estimate; $Z_{\max}$ = genomewide maximum LOD score; $Z_{\geq 3}$ = LOD score peaks ⩾3. The true generating value for the additive trait heritability attributable to each QTL was 0.1. See text for details of simulations.

### Analytical Results

$\chi^{2}$ random variable with 1 df and noncentrality parameter equal to the expected value of the statistic on the data, that is, $\xi = E[\Lambda]$ (Stuart and Ord

$\chi'^{2}[\Lambda,1,\xi]$. The expected QTL-heritability estimate at the true QTL location, given that the test statistic is significant, that is, $\Lambda \geq \lambda$, is given by

The denominator represents the pointwise power to detect linkage in a data set. The division by this quantity is required to ensure that the integration is done over a proper density function, integrating to 1, because the expectation is computed conditional on the test statistic being significant. Williams and Blangero (

$(h^{2}_{q})^{2}c$, where $c$ is a constant for a given data set and total heritability. Furthermore, when full and accurate information on chromosomal transmissions and phenotypes is available,

because of the one-to-one correspondence of LOD score and QTL-heritability estimate for a given data set in that situation (see fig. 2). By substituting these two expressions into equation (3), one obtains

As an example, for a data set comprising $n$ nuclear pedigrees consisting of two parents with two offspring, the constant is given by

(Williams and Blangero

$\hat{h}^{2}_{q} = h^{2}_{q} + \text{bias} \approx \text{constant}$, and thus are virtually independent of the true QTL heritability, just as observed by simulation.
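The conditional expectation described in this section can be evaluated numerically. The following is a sketch under stated assumptions, not the authors' computation: the noncentrality parameter is taken to be $\xi = c\,(h^{2}_{q})^{2}$, the estimate is taken to be $\hat{h}^{2}_{q} = \sqrt{\Lambda / c}$, and the constant `c = 150` is hypothetical. A noncentral $\chi^{2}$ variable with 1 df is written as the square of a normal variable with mean $\sqrt{\xi}$ and unit variance, which gives closed forms that need only the standard library.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def power(xi, lam):
    """P(Lambda >= lam) for Lambda ~ noncentral chi-square (1 df, noncentrality xi)."""
    delta, t0 = math.sqrt(xi), math.sqrt(lam)
    return normal_cdf(delta - t0) + normal_cdf(-delta - t0)


def expected_given_significant(h2q, c, lam):
    """E[h2q_hat | Lambda >= lam], assuming xi = c * h2q^2 and h2q_hat = sqrt(Lambda / c)."""
    delta, t0 = h2q * math.sqrt(c), math.sqrt(lam)
    # E[|T|; |T| >= t0] for T ~ N(delta, 1), where Lambda = T^2
    numer = (delta * (normal_cdf(delta - t0) - normal_cdf(-delta - t0))
             + normal_pdf(t0 - delta) + normal_pdf(t0 + delta))
    # Dividing by the power renormalizes the truncated density to integrate to 1.
    return numer / power(delta * delta, lam) / math.sqrt(c)


C = 150.0                         # hypothetical constant for the data set
LAM = 2.0 * math.log(10.0) * 3.0  # critical value corresponding to a LOD score of 3
for h2q in (0.0, 0.05, 0.10, 0.20, 0.40):
    xi = C * h2q ** 2
    print(f"true h2q = {h2q:.2f}: power = {power(xi, LAM):.4f}, "
          f"E[estimate | significant] = {expected_given_significant(h2q, C, LAM):.3f}")
```

With these inputs, the conditional mean changes only slightly while the true value is small and power is low, and it approaches the true value only once power is close to 1.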

$\hat{h}^{2}_{q}$ for the larger sample. Secondly, everything else being equal, the larger the sample size, the greater the power to map a locus of a given effect size. The bias, which results because the locus-specific effect size is estimated only when the test statistic is significant, is thus reduced, because a larger sample need not be as extreme as a smaller sample, with respect to its locus-specific effect-size estimate, to yield a statistically significant finding. In the figure, the shaded area under each curve corresponds to the power of a sample of that size and, equivalently, to the proportion of samples of that size from which the locus-specific effect size is estimated and reported. A corollary of this is that when a LOD score of, say, 10 is reported, the associated genotype-phenotype parameter estimates are more believable (i.e., are expected to be less biased) than those at a less convincing LOD score of, say, 4. However, studies of such high power to detect genes influencing truly complex traits appear unrealistic, at present, for most complex-trait loci or, at least, may be achievable only by nonrandom ascertainment schemes, in which case the ascertainment bias is expected to be large and potentially uncorrectable.

### Replication

where $\xi_{\text{rep}}$ is the expected likelihood-ratio statistic for the replication study and $\lambda_{\text{rep}}$ is the critical value for declaring replication significant, here assumed to correspond to a LOD score of 3, as before. In figure 5, “probability-of-replication-failure” curves (i.e., “1−power” curves) are superimposed on the bias curves of figure 3, as a function of the true underlying QTL heritability and the sample size, using the same data structures (two-offspring nuclear pedigrees) and conditions (overall heritability of 0.5 and complete and accurate information on phenotypes and chromosomal transmissions) as before. Note that the bias does not disappear until the sample size is large and/or the true QTL heritability is sizeable; in either of these cases, power would be high. Most current genetic studies of complex traits are probably underpowered and are subject to a significant upward bias in locus-specific effect-size estimates.
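The probability of replication failure can be sketched as 1 minus the power of the replication study. The example assumes a noncentral $\chi^{2}$ test statistic with 1 df and an expected statistic of the form $\xi_{\text{rep}} = c_{\text{rep}}\,(h^{2}_{q})^{2}$; the constant `C_REP = 150` and the two heritability values are hypothetical and chosen only to contrast a replication planned on an inflated estimate with one governed by a much smaller true value.

```python
import math


def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def replication_failure_probability(h2q, c_rep, lod_threshold=3.0):
    """1 - power of a replication study with a noncentral chi-square (1 df) statistic."""
    lam_rep = 2.0 * math.log(10.0) * lod_threshold  # critical value for Lambda
    xi_rep = c_rep * h2q ** 2                       # assumed form of the expected statistic
    delta, t0 = math.sqrt(xi_rep), math.sqrt(lam_rep)
    power = normal_cdf(delta - t0) + normal_cdf(-delta - t0)
    return 1.0 - power


C_REP = 150.0  # hypothetical design constant of the replication sample
for label, h2q in (("reported (inflated) estimate", 0.30), ("true value", 0.05)):
    p_fail = replication_failure_probability(h2q, C_REP)
    print(f"h2q = {h2q:.2f} ({label}): P(replication failure) = {p_fail:.3f}")
```

If the reported estimate is taken at face value, the replication appears to have a reasonable chance of success; if the much smaller value is the truth, failure is almost certain.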

## Discussion

### Bias Elimination?

$\hat{h}^{2}_{q}$, with its expected value given an assumed true QTL heritability, $h^{2}_{q}$, and then solving the following equation, based on equation (4) above, for $h^{2}_{q}$:

(If the equation has no solution, the estimate of the underlying QTL heritability in the population would be 0.) In principle, this approach would also allow computation of confidence intervals for the unknown true QTL heritability. The fundamental problem with such an approach is that the corrected QTL-heritability estimate would be very crude, with an extremely wide confidence interval. The reason is that for low-power investigations, such as most complex-trait–mapping studies, the expected value of the likelihood-ratio statistic is quite small; as a consequence, only the upper tails of different noncentral $\chi^{2}$ distributions would be compared with one another, and these tails overlap substantially. This is demonstrated in figure 6, which shows LOD score density functions for a data set of 1,000 two-offspring nuclear families (as before) for different underlying values of $h^{2}_{q}$, conditional on the LOD score being significant (⩾3). Note the wide overlap of the various distributions.
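The correction idea discussed above, and its crudeness, can be sketched by solving for the true value numerically. As a simplification, the sketch assumes a noncentral $\chi^{2}$ statistic with 1 df, $\xi = c\,(h^{2}_{q})^{2}$, and $\hat{h}^{2}_{q} = \sqrt{\Lambda / c}$; the constant `C = 150`, the function names, and the use of bisection are illustrative choices, not the authors' procedure.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def expected_given_significant(h2q, c, lam):
    """E[h2q_hat | Lambda >= lam], assuming xi = c * h2q^2 and h2q_hat = sqrt(Lambda / c)."""
    delta, t0 = h2q * math.sqrt(c), math.sqrt(lam)
    pw = normal_cdf(delta - t0) + normal_cdf(-delta - t0)
    numer = (delta * (normal_cdf(delta - t0) - normal_cdf(-delta - t0))
             + normal_pdf(t0 - delta) + normal_pdf(t0 + delta))
    return numer / pw / math.sqrt(c)


def corrected_h2q(observed, c, lam, tol=1e-10):
    """Solve E[h2q_hat | significant; h2q] = observed for h2q by bisection."""
    if observed <= expected_given_significant(0.0, c, lam):
        return 0.0  # no solution: the estimate is set to 0, as described in the text
    lo, hi = 0.0, 1.0
    if observed >= expected_given_significant(hi, c, lam):
        return hi   # clip at the upper boundary of the parameter space
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_given_significant(mid, c, lam) < observed:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


C = 150.0                         # hypothetical constant for the data set
LAM = 2.0 * math.log(10.0) * 3.0  # critical value corresponding to a LOD score of 3
for observed in (0.32, 0.33, 0.35, 0.45):
    print(f"observed estimate {observed:.2f} -> "
          f"corrected h2q = {corrected_h2q(observed, C, LAM):.3f}")
```

Because the conditional expectation is nearly flat in the low-power range, small differences in the observed estimate map to very different corrected values, which illustrates why such a correction would be crude.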

$h^{2}_{q}=0$). Of course, the LOD score may also be a true positive, giving evidence of a true locus with $h^{2}_{q}>0$. If there were a statistical technique by which the bias could be accurately corrected, this would mean, in essence, that there is information allowing us to distinguish true and false LOD score peaks from each other, on the basis of their magnitude alone. Of course, this is not possible.

### Pointwise Replication

### Generality of Results

### Differential Information Content throughout the Genome

## Conclusions

## Acknowledgments

## References

- Testing the robustness of the likelihood-ratio test in a variance-component quantitative-trait loci-mapping procedure. *Am J Hum Genet.* 1999; 65: 531-544
- Multipoint quantitative-trait linkage analysis in general pedigrees. *Am J Hum Genet.* 1998; 62: 1198-1211
- Robust variance-components approach for assessing genetic linkage in pedigrees. *Am J Hum Genet.* 1994; 54: 535-543
- Robust inference for variance components models in families ascertained through probands. I. Conditioning on the proband's phenotype. *Genet Epidemiol.* 1987; 4: 203-210
- The power and deceit of QTL experiments: lessons from comparative QTL studies. In: 49th annual corn and sorghum industry research conference. American Seed Trade Association, Washington, DC, 1994: 250-266
- QTL analysis: power, precision, and accuracy. In: Paterson AH (ed) Molecular dissection of complex traits. CRC Press, Boca Raton, FL, 1998: 145-162
- Quantitative trait locus mapping using human pedigrees. *Hum Biol.* 2000; 72: 35-62
- Variance component methods for detecting complex trait loci. *Adv Genet.* 2001; 42: 151-181
- The effects of conditioning on probands to correct for multiple ascertainment. *Am J Hum Genet.* 1984; 36: 1298-1308
- Ascertainment and goodness of fit of variance component models for pedigree data. *Prog Clin Biol Res.* 1984; 147: 173-192
- Correcting for ascertainment bias in the COGA data set. *Genet Epidemiol.* 1999; 17: S109-S114
- An introduction to population genetics theory. Harper and Row, New York, 1970
- The interpretation of lod scores in linkage analysis. *Cytogenet Cell Genet.* 1976; 3: 289-293
- Introduction to quantitative genetics. 4th ed. Prentice Hall, Harlow, United Kingdom, 1996
- On the cost of data analysis. *J Comput Graph Stat.* 1992; 1: 213-229
- The effect of methods of ascertainment upon the estimation of frequencies. *Ann Eugenics.* 1934; 6: 13-25
- Mapping quantitative trait loci controlling milk production in dairy cattle by exploiting progeny testing. *Genetics.* 1995; 139: 907-920
- Göring HHH (2000) Statistical aspects of human gene mapping in the presence of errors. Ph.D. thesis, Columbia University, New York
- Linkage analysis in the presence of errors. I: Complex-valued recombination fractions and complex phenotypes. *Am J Hum Genet.* 2000a; 66: 1095-1106
- Linkage analysis in the presence of errors. II: Marker-locus genotyping errors modeled with hypercomplex recombination fractions. *Am J Hum Genet.* 2000b; 66: 1107-1118
- Linkage analysis in the presence of errors. IV: Joint pseudomarker analysis of linkage and/or linkage disequilibrium on a mixture of pedigrees and singletons when the mode of inheritance cannot be accurately specified. *Am J Hum Genet.* 2000c; 66: 1310-1327
- Multipoint analysis of human quantitative genetic variation. *Am J Hum Genet.* 1990; 47: 957-967
- Extensions to multivariate normal models for pedigree analysis. *Ann Hum Genet.* 1982; 46: 373-383
- Simultaneous inference in epidemiological studies. *Int J Epidemiol.* 1982; 11: 276-282
- QTL analysis in plants: where are we now? *Heredity.* 1998; 80: 137-142
- Efficiency of marker-assisted selection in the improvement of quantitative traits. *Genetics.* 1990; 124: 743-756
- Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. *Nat Genet.* 1995; 11: 241-247
- Robust statistical modeling using the *t* distribution. *J Am Stat Assoc.* 1989; 84: 881-896
- Extensions to pedigree analysis. III. Variance components by the scoring method. *Ann Hum Genet.* 1976; 39: 485-491
- The design of replicated studies. *Am Statistician.* 1993; 47: 217-228
- Quantitative trait locus (QTL) mapping using different testers and independent population samples in maize reveals low power of QTL detection and large bias in estimates of QTL effects. *Genetics.* 1998; 149: 383-403
- Subset selection in regression. Chapman and Hall, London, 1990
- Sequential tests for the detection of linkage. *Am J Hum Genet.* 1955; 7: 277-318
- Heritability of height and assortative mating in the Framingham Study. *Am J Hum Genet Suppl.* 2000; 67: A235
- Multifactorial analysis of family data ascertained through truncation: a comparative evaluation of two methods of statistical inference. *Am J Hum Genet.* 1988; 42: 506-515
- The future of genetic studies of complex human diseases. *Science.* 1996; 273: 1516-1517
- Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. *J Am Stat Assoc.* 1987; 82: 605-610
- Kendall's advanced theory of statistics. Vol. 2: Classical inference and relationship. 5th ed. Oxford University Press, New York, 1991
- Problems of replicating linkage claims in psychiatry. In: Gershon ES, Cloninger CR (eds) Genetic approaches to mental disorders. American Psychiatric Press, Washington, DC, 1994: 23-46
- Gene mapping in the 20th and 21st centuries: statistical methods, data analysis, and experimental design. *Hum Biol.* 2000; 72: 63-132
- Terwilliger JD, Göring HHH, Magnusson PKE, Lee JH. Study design for genetic epidemiology and gene mapping: the Korean Diaspora Project. Shengming Kexue Yanjiu (in press).
- True and false positive peaks in genomewide scans: application of length-biased sampling to linkage mapping. *Am J Hum Genet.* 1997; 61: 430-438
- The problem of multiple inference in studies designed to generate hypotheses. *Am J Epidemiol.* 1985; 122: 1080-1095
- Comparison of different approaches to interval mapping of quantitative trait loci. In: Van Ooijen JW, Jansen J (eds) Biometrics in plant breeding: applications of molecular markers, proceedings of the ninth meeting of the EUCARPIA section biometrics in plant breeding. CPRO-DLO, Wageningen, The Netherlands, 1994: 195-204
- Bias and sampling error of the estimated proportion of genotypic variance explained by quantitative trait loci determined from experimental data in maize using cross validation and validation with independent samples. *Genetics.* 2000; 154: 1839-1849
- How many diseases do you have to study to map one gene with SNPs? *Nat Genet.* 2000; 26: 151-158
- Power of variance component linkage analysis to detect quantitative trait loci. *Ann Hum Genet.* 1999; 63: 545-563
- Statistical properties of a variance components method for quantitative trait linkage analysis in nuclear families and extended pedigrees. *Genet Epidemiol.* 1997; 14: 1065-1070
