This approach to penetrance calculation was initially described in Spargo et al. (2022). The disease model on which it is based is presented in Al-Chalabi and Lewis (2011). The method is also available as an R function, hosted on GitHub.

On this tab, we outline: (1) the assumptions of this method, (2) the main operations of the tool, and (3) information regarding the integrated data repository used to approximate sibship size.

Assumptions

Disease model assumptions

All individuals harbouring the variant are ascertained.
All variants are inherited from exactly one parent; homozygosity is not included within the model and there are no de novo variants.
A nuclear family containing a parent who harbours the variant can be classed into one of three disease states:

familial - more than one family member is affected.
sporadic - one family member is affected.
unaffected - no family members are affected.

Penetrance is not complete.

Additional assumptions for penetrance calculation

Each family is represented only once in the sample.
Weighting factors, average sibship size, and disease risk for people not harbouring a variant (which defaults to 0) are taken as exact values; no uncertainty is propagated for them.
The value specified for sibship size is representative of sibship size in every disease state group.
People can also be described by an 'affected' disease state, in which one or more family members are affected (i.e. the affected population is not stratified into the familial and sporadic states).
Disease state classifications are assigned according to the status of the sampled person and their first-degree relatives only.
When sampling only families in which the variant occurs, the disease state classifications of sampled families do not change over time.
When estimating variant frequencies within disease states across cohorts of people with and without the variant, family disease states change comparably over time for people with and without the variant.
Unless an estimate of disease risk for people not harbouring the variant is provided, people not harbouring the variant are assumed to be unaffected.
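The disease state definitions above, including the pooled 'affected' state, can be expressed as a small classification function. This is a minimal sketch for illustration only: the names `family_state` and `is_affected` are our own and are not part of the ADPenetrance tool or its R function, and the input is assumed to be the number of affected members of a nuclear family (the sampled person and first-degree relatives).

```python
def family_state(n_affected: int) -> str:
    """Classify a nuclear family carrying the variant by its number of affected members."""
    if n_affected < 0:
        raise ValueError("n_affected must be non-negative")
    if n_affected == 0:
        return "unaffected"   # no family members affected
    if n_affected == 1:
        return "sporadic"     # exactly one family member affected
    return "familial"         # more than one family member affected


def is_affected(n_affected: int) -> bool:
    """Pooled 'affected' state: one or more family members affected."""
    return family_state(n_affected) != "unaffected"


for k in (0, 1, 3):
    print(k, family_state(k), is_affected(k))
```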

Workflow

Step 1: Estimating disease state rate (the rate of state X)

To calculate penetrance with this method, it is first necessary to calculate the rate at which one of the disease states represented in the available data is observed among people harbouring the assessed variant, across all of the represented disease states. The user can provide the data for this estimate in one of three formats. In formats (1) and (2), the rate is calculated within the tool from data given by the user; in format (3), the user specifies the estimate directly for the state requested by the calculator. In formats (1) and (2), the user should give the frequency at which the assessed variant occurs either in two or three of the 'familial', 'sporadic' and 'unaffected' states, or in the 'affected' and 'unaffected' states (see Assumptions for state definitions). In format (1) these are given as variant counts and sample sizes, and in format (2) as variant frequencies. The user must also specify the requested disease characteristics, such as the rates at which familial and sporadic disease occur within the affected population and the overall population risk of being affected. Which characteristics are required depends on the disease states for which variant frequencies were given; the tool indicates which should be specified.

If data are provided in format (1) or (2), they are used to calculate, as a weighted proportion, the rate at which one of the disease states occurs among the states for which variant frequency data are provided. This is the observed rate of 'state X', where state X denotes one of the states for which variant frequency data were given. If data are provided in format (3), the user should specify the rate of state X among people harbouring the assessed variant across the represented disease states, as requested by the calculator. If the familial state is represented in the input data, then state X is familial. If only the sporadic and unaffected states are represented, then state X is sporadic. If the affected and unaffected states are represented, then state X is affected. Note that the affected and unaffected states correspond to cases and controls, respectively.

The result of this calculation represents the 'observed' rate of state X shown in the output Table.
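The Step 1 calculation for formats (1) and (2) can be sketched as a weighted proportion. In this sketch we assume that the weighting factor for each state is the proportion of the population in that state (for example, population risk multiplied by the familial proportion of affected people for the familial state, or 1 minus population risk for the unaffected state), so that the rate of state X is w_X f_X divided by the sum of w_s f_s over the represented states. The function names and example numbers are hypothetical, and the published tool may organise this calculation differently.

```python
from typing import Dict, Iterable


def freqs_from_counts(counts: Dict[str, int], sizes: Dict[str, int]) -> Dict[str, float]:
    """Format (1): convert variant counts and sample sizes into frequencies."""
    return {s: counts[s] / sizes[s] for s in counts}


def choose_state_x(states: Iterable[str]) -> str:
    """Pick state X from the represented states, following the rules described above."""
    states = set(states)
    if states == {"affected", "unaffected"}:
        return "affected"
    if ("familial" in states and len(states) >= 2
            and states <= {"familial", "sporadic", "unaffected"}):
        return "familial"
    if states == {"sporadic", "unaffected"}:
        return "sporadic"
    raise ValueError(f"unsupported combination of states: {sorted(states)}")


def observed_rate_of_x(freqs: Dict[str, float], weights: Dict[str, float], state_x: str) -> float:
    """Weighted proportion of variant carriers in state X across the represented states.

    Assumption: weights[s] is the population proportion of state s.
    """
    if set(freqs) != set(weights) or state_x not in freqs:
        raise ValueError("freqs and weights must cover the same states, including state_x")
    weighted = {s: weights[s] * freqs[s] for s in freqs}
    return weighted[state_x] / sum(weighted.values())


# Hypothetical example with the familial and sporadic states only.
freqs = freqs_from_counts({"familial": 10, "sporadic": 9}, {"familial": 100, "sporadic": 900})
population_risk, familial_share = 0.0025, 0.1   # hypothetical disease characteristics
weights = {"familial": population_risk * familial_share,
           "sporadic": population_risk * (1 - familial_share)}
x = choose_state_x(freqs)
print(x, observed_rate_of_x(freqs, weights, x))
```

When only the familial and sporadic states are represented, the population risk cancels out of the ratio, so only the familial proportion of affected people influences the result in this sketch.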

Step 2: Generating a lookup table

The user must specify the average sibship size within the population from which the variant frequency estimates were drawn. We also recommend specifying the disease risk for people who do not harbour the variant, which is important for accurate penetrance estimation in common traits (e.g. where this risk is >0.01); by default, this risk is assumed to be 0. The tool then produces a sequence of potential penetrance values from 0 to 1, in increments of 0.0001.

Using the extended Al-Chalabi and Lewis (2011) disease model, the tool calculates, for each penetrance value and for the specified sibship size and disease risk in people not inheriting the variant, the probabilities that a family harbouring the variant presents as unaffected, familial or sporadic. From these disease state probabilities, taking only the states for which data were provided in Step 1, the tool calculates the 'expected' rate of state X at each penetrance value. A lookup table is then generated, storing each expected rate of state X (familial, sporadic or affected) alongside the penetrance value to which it corresponds. When X is the affected state, the expected affected rate is the sum of the expected familial and sporadic rates at a given penetrance value.
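Step 2 can be sketched in Python as follows. The sketch rests on a simplified reading of the disease model, which is our assumption and may differ in detail from the extended Al-Chalabi and Lewis (2011) model used by the tool: a nuclear family consists of one carrier parent, affected with probability equal to penetrance p, plus n offspring; each offspring inherits the variant with probability 1/2 and is then affected with probability p, or does not inherit it and is affected with probability r (the disease risk for people without the variant), independently of the others. Non-integer sibship sizes are handled by treating n as a continuous exponent. The names `state_probabilities` and `build_lookup` are hypothetical.

```python
import numpy as np

STEP = 0.0001  # penetrance grid increment used in Step 2


def state_probabilities(p, n, r=0.0):
    """Probabilities that a carrier family is unaffected, sporadic or familial.

    Simplified model (assumption): one carrier parent, affected with
    probability p, plus n offspring, each affected independently with
    probability (p + r) / 2.
    """
    if n < 1:
        raise ValueError("this sketch requires a sibship size n >= 1")
    p = np.asarray(p, dtype=float)
    q = (p + r) / 2.0                             # per-offspring probability of being affected
    none_affected = (1.0 - q) ** n                # no offspring affected
    one_affected = n * q * (1.0 - q) ** (n - 1)   # exactly one offspring affected
    unaffected = (1.0 - p) * none_affected
    sporadic = p * none_affected + (1.0 - p) * one_affected
    familial = np.clip(1.0 - unaffected - sporadic, 0.0, 1.0)
    return {"unaffected": unaffected, "sporadic": sporadic, "familial": familial}


def build_lookup(n, represented, state_x, r=0.0):
    """Return (penetrance grid, expected rate of state X) for Step 3 queries."""
    grid = np.linspace(0.0, 1.0, int(round(1.0 / STEP)) + 1)
    probs = state_probabilities(grid, n, r)
    probs["affected"] = probs["familial"] + probs["sporadic"]
    numer = probs[state_x]
    denom = sum(probs[s] for s in represented)   # only the states given in Step 1
    expected = np.full_like(grid, np.nan)        # NaN where the rate is undefined
    np.divide(numer, denom, out=expected, where=denom > 0)
    return grid, expected


grid, expected = build_lookup(2.0, ("familial", "sporadic", "unaffected"), "familial")
print(len(grid), expected[-1])
```

Under this simplified model with r = 0 and n = 2, a fully penetrant variant (p = 1) gives an expected familial rate of 0.75, because the family is sporadic only when neither offspring is affected (probability 0.5 squared).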

Step 3: Querying the lookup table

The lookup table is queried to identify the expected rate of state X closest to the observed rate of state X obtained in Step 1, and the corresponding penetrance value is taken as the estimate. Note that this estimate is subject to a systematic bias inherent to the approach, and must be adjusted in Step 4 to obtain the final penetrance estimate.
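A nearest-value query over a lookup table can be sketched as below. The table here is a toy monotone relationship chosen only to demonstrate the query; it is not produced by the disease model. `query_lookup` is a hypothetical name. NaN entries (rates that are undefined because no represented state has non-zero probability) are skipped, and if several entries are equally close, `numpy.nanargmin` returns the first one, which is our choice rather than documented tool behaviour.

```python
import numpy as np


def query_lookup(grid, expected, observed_rate):
    """Step 3: penetrance whose expected rate of state X is closest to the observed rate.

    NaN entries (undefined expected rates) are ignored; ties resolve to the
    first (lowest-penetrance) entry.
    """
    grid = np.asarray(grid, dtype=float)
    expected = np.asarray(expected, dtype=float)
    distance = np.abs(expected - observed_rate)
    return float(grid[int(np.nanargmin(distance))])


# Toy table for illustration only (not produced by the disease model).
toy_grid = np.linspace(0.0, 1.0, 10001)
toy_expected = toy_grid ** 2
toy_expected[0] = np.nan   # mimic an undefined entry
print(query_lookup(toy_grid, toy_expected, 0.25))  # approximately 0.5
```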

Step 4: Adjust estimate for systematic bias

The penetrance estimate obtained in Step 3 is adjusted in this step to account for systematic bias. The degree of correction is determined by the estimated penetrance value and the error in that estimate predicted by a polynomial regression model. This regression model is fitted to the errors in Step 3 penetrance estimates observed in a simulated dataset, generated according to the states used to model the rate of state X, the mean sibship size and the disease risk for people without the variant, as defined in the input data. The simulated dataset represents the distribution of sibship sizes across sampled families as a Poisson distribution, where lambda is the defined mean sibship size. Penetrance is then estimated for this population as in Steps 1-3 across a series of true penetrance values between 0 and 1, and the error in each estimate is determined: error = true penetrance - estimated penetrance. Polynomial regression models of degree 1 to 5 are fitted to these data, and the best-fitting model is selected according to the Akaike Information Criterion (AIC). The penetrance estimate obtained for the real sample data is then adjusted by the error predicted for this estimate under the fitted model: adjusted penetrance estimate = unadjusted estimate + predicted error.
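The Step 4 correction can be sketched as follows, for the case where the familial, sporadic and unaffected states are all represented and state X is familial. Several simplifications are our assumptions: the same simplified carrier-parent-plus-offspring model as in the Step 2 sketch; a simulated population represented by renormalised Poisson weights over sibship sizes 0 to 50, rather than by sampling individual families; 101 true penetrance values; and AIC computed from the residual sum of squares of an ordinary least-squares polynomial fit. All function names are hypothetical.

```python
import math
import numpy as np


def state_familial(p, n, r=0.0):
    """Probability that a carrier family is familial (simplified model, assumption):
    carrier parent affected with probability p; each of n offspring affected
    independently with probability (p + r) / 2."""
    p = np.asarray(p, dtype=float)
    q = (p + r) / 2.0
    none_affected = (1.0 - q) ** n
    one_affected = n * q * (1.0 - q) ** (n - 1) if n >= 1 else np.zeros_like(p)
    unaffected = (1.0 - p) * none_affected
    sporadic = p * none_affected + (1.0 - p) * one_affected
    return np.clip(1.0 - unaffected - sporadic, 0.0, 1.0)


def poisson_weights(lam, k_max=50):
    """Poisson(lam) probabilities for sibship sizes 0..k_max, renormalised."""
    k = np.arange(k_max + 1)
    log_w = -lam + k * math.log(lam) - np.array([math.lgamma(i + 1) for i in k])
    w = np.exp(log_w)
    return k, w / w.sum()


def fit_error_model(lam, r=0.0, max_degree=5):
    """Fit error = true - estimated penetrance as a polynomial in the estimate,
    choosing the degree (1..max_degree) with the lowest AIC."""
    if lam < 1:
        raise ValueError("this sketch requires a mean sibship size >= 1")
    grid = np.linspace(0.0, 1.0, 10001)            # Step 2 penetrance grid
    table = state_familial(grid, lam, r)            # lookup built at the mean sibship size
    ks, ws = poisson_weights(lam)
    true_p = np.linspace(0.0, 1.0, 101)
    # Familial rate in a population whose sibship sizes follow Poisson(lam).
    observed = sum(w * state_familial(true_p, int(k), r) for k, w in zip(ks, ws))
    # Step 3 applied to each simulated observed rate (nearest table entry).
    estimated = grid[np.abs(table[None, :] - observed[:, None]).argmin(axis=1)]
    error = true_p - estimated
    best, best_aic = None, math.inf
    for deg in range(1, max_degree + 1):
        coeffs = np.polyfit(estimated, error, deg)
        rss = float(np.sum((error - np.polyval(coeffs, estimated)) ** 2))
        aic = len(error) * math.log(max(rss, 1e-300) / len(error)) + 2 * (deg + 1)
        if aic < best_aic:
            best, best_aic = coeffs, aic
    return best


def adjust(estimate, coeffs):
    """Adjusted penetrance = unadjusted estimate + predicted error."""
    return float(estimate + np.polyval(coeffs, estimate))


coeffs = fit_error_model(2.0)
print(adjust(0.4, coeffs))
```

The error is regressed on the estimated (not the true) penetrance because, for real data, only the estimate is available when the correction is applied.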

OPTIONAL: Error propagation

Confidence intervals can be calculated for the penetrance estimate based on error terms for the variant data provided by the user. Data format options are described in the 'Tool operation guide' provided in the 'Penetrance Calculator' tab. If data for Step 1 are given in format (1), error propagation is performed by default. If data are provided in format (2) or (3), the user can opt to provide error terms for these estimates, either as standard errors or as confidence intervals, and then indicate the desired confidence level for the resulting penetrance estimate. When input data are provided in format (1), or in format (2) with error terms, the calculus-based approximation of error propagation (Hughes & Hase, 2010) is applied to obtain confidence intervals for the observed disease state rate. When input data are provided in format (3), the standard error of the disease state rate is taken as specified, or derived from the given confidence interval. All conversions between confidence intervals and standard errors use z-score conversion at the confidence level indicated by the user. The lookup table constructed in Step 2 is then queried, as in Step 3, for the upper and lower bounds of the observed disease state rate, and the corresponding penetrance estimates are adjusted as in Step 4 to give the confidence interval bounds for the adjusted penetrance estimate.
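First-order (calculus-based) propagation for a weighted-proportion rate can be sketched with the Python standard library. We assume independent errors in the per-state variant frequencies and, for format (1), binomial standard errors sqrt(f(1 - f)/N); neither assumption is stated in the text above. Clipping the interval for the rate to [0, 1] is also our choice. The bounds would then be passed through the Step 3 query and Step 4 adjustment. All names and example values are hypothetical.

```python
import math
from statistics import NormalDist


def z_for(level):
    """Two-sided z-score for a confidence level (0.95 gives about 1.96)."""
    return NormalDist().inv_cdf(0.5 + level / 2.0)


def se_from_ci(lower, upper, level=0.95):
    """Convert a symmetric confidence interval into a standard error."""
    return (upper - lower) / (2.0 * z_for(level))


def binomial_se(count, size):
    """Standard error of a frequency estimated from counts (binomial assumption)."""
    f = count / size
    return math.sqrt(f * (1.0 - f) / size)


def rate_with_se(freqs, ses, weights, state_x):
    """Rate of state X as a weighted proportion, with a first-order propagated SE.

    Assumes independent errors in the per-state variant frequencies.
    """
    a = weights[state_x] * freqs[state_x]
    total = sum(weights[s] * freqs[s] for s in freqs)
    var = 0.0
    for s in freqs:
        if s == state_x:
            grad = weights[s] * (total - a) / total ** 2   # dR/df_X
        else:
            grad = -a * weights[s] / total ** 2            # dR/df_s for s != X
        var += (grad * ses[s]) ** 2
    return a / total, math.sqrt(var)


def rate_interval(rate, se, level=0.95):
    """Confidence bounds for the rate, clipped to [0, 1] (our assumption)."""
    z = z_for(level)
    return max(0.0, rate - z * se), min(1.0, rate + z * se)


freqs = {"familial": 10 / 100, "sporadic": 9 / 900}
ses = {"familial": binomial_se(10, 100), "sporadic": binomial_se(9, 900)}
weights = {"familial": 0.1, "sporadic": 0.9}   # hypothetical weighting factors
rate, se = rate_with_se(freqs, ses, weights, "familial")
print(rate, se, rate_interval(rate, se))
```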

Sibship data repository

Sibship size is a key parameter in the Al-Chalabi & Lewis (2011) model, determining disease state probabilities at each potential penetrance value.

If known, the user can manually define the average sibship size for the sample. We also provide a dataset of Total Fertility Rate estimates for many world regions, at a country level or aggregated across multiple countries, obtained from the latest data available within the World Bank database. Total Fertility Rate represents the number of children that would be born to a woman if she were to live to the end of her childbearing years and bear children in accordance with age-specific fertility rates of the specified year. It can be applied as a proxy for average sibship size in the population of interest.

An example of ADPenetrance usage is provided on this tab, illustrating penetrance estimation for SOD1 risk variants in amyotrophic lateral sclerosis (ALS). Penetrance is estimated from variant frequencies reported in the familial and sporadic disease states for a European sample of people with ALS, collected in a 2017 meta-analysis (Zou et al., 2017).

Each element of tool operation and the returned output has been annotated, marked by box-colour, to briefly describe tool operation as applied to this example. A more comprehensive description of tool operation is given within the tool operation guide under the 'Penetrance Calculator' tab.

Thomas Spargo

PhD Student

Dr Sarah Opie-Martin

Computer Scientist

Professor Cathryn Lewis

Professor of Genetic Epidemiology and Statistics

Harry Bowles

PhD Student

Dr Alfredo Iacoangeli

Senior Research Fellow in Bioinformatics

Professor Ammar Al-Chalabi

Professor of Neurology and Complex Disease

Reference list:

Al-Chalabi, A., & Lewis, C. M. (2011). Modelling the Effects of Penetrance and Family Size on Rates of Sporadic and Familial Disease. Human Heredity, 71(4), 281-288. doi: 10.1159/000330167

Hughes, I., & Hase, T. (2010). Measurements and their uncertainties: a practical guide to modern error analysis. Oxford University Press.

Spargo, T. P., Opie-Martin, S., Bowles, H., Lewis, C. M., Iacoangeli, A., & Al-Chalabi, A. (2022). Calculating variant penetrance from family history of disease and average family size in population-scale data. Genome Medicine, 14, 141. doi: 10.1186/s13073-022-01142-7

World Bank, World Development Indicators. Fertility rate, total (births per woman). Retrieved from http://api.worldbank.org/v2/indicator/SP.DYN.TFRT.IN?downloadformat=csv

Zou, Z-Y., Zhou, Z-R., Che, C-H., Liu, C-Y., He, R-L., & Huang, H-P. (2017). Genetic epidemiology of amyotrophic lateral sclerosis: a systematic review and meta-analysis. J Neurol Neurosurg Psychiatry, 88, 540-549. doi: 10.1136/jnnp-2016-315018

ADPenetrance: penetrance estimation for autosomal dominant traits

The tool is operated within the 'Penetrance Calculator' tab.
