Title: | Insertion/Deletion Dynamics for Transposable Elements |
---|---|
Description: | Provides functions to estimate the insertion and deletion rates of transposable element (TE) families. The estimation of insertion rate consists of an improved estimate of the age distribution that takes into account random mutations, and an adjustment by the deletion rate. A hypothesis test for a uniform insertion rate is also implemented. This package implements the methods proposed in Dai et al (2018). |
Authors: | Xiongtao Dai [aut, cre, cph], Hao Wang [aut], Jan Dvorak [ctb], Jeffrey Bennetzen [ctb], Hans-Georg Mueller [ctb] |
Maintainer: | Xiongtao Dai <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3-0 |
Built: | 2025-03-10 05:20:08 UTC |
Source: | https://github.com/cran/TE |
This data file contains the LTR retrotransposons in Ae. tauschii.
A data frame with 18024 rows and 12 columns. Each row corresponds to a unique LTR retrotransposon, and each column corresponds to a feature of the LTR-RT. The columns are:
LTR retrotransposon sequence ID
Length of each LTR
Number of mismatches
Divergence, as defined by (# of mismatches) / (LTR length)
Chromosome number
Start location in bp
Ending location in bp
LTR retrotransposon Family ID
Super family membership
Recombination rate
Whether the LTR-RT is near a gene that is colinear with wild emmer (TRUE) or not (FALSE)
Whether the LTR-RT is near the start codon (1) or not (-1)
Log distance to the nearest gene in bp
Distance to the nearest gene in bp
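As a small illustration of the divergence column defined above, the following sketch recomputes it from the number of mismatches and the LTR length. The data frame `ltr` and its column names are hypothetical toy values chosen for this example; they are not rows or column names taken from the data set.

```r
# Toy values (hypothetical, not taken from the data set)
ltr <- data.frame(
  len      = c(1500, 1200, 980),  # LTR length in bp
  mismatch = c(12, 0, 25)         # number of mismatches
)

# Divergence = (# of mismatches) / (LTR length)
ltr$divergence <- ltr$mismatch / ltr$len
print(ltr)
```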
Luo, Ming-Cheng, et al. (2017) "Genome sequence of the progenitor of the wheat D genome Aegilops tauschii." Nature 551.7681.
Dvorak, J., L. Wang, T. Zhu, C. M. Jorgensen, K. R. Deal et al., (2018) "Structural variation and rates of genome evolution in the grass family seen through comparison of sequences of genomes greatly differing in size". The Plant Journal 95: 487-503.
Dai, X., Wang, H., Dvorak, J., Bennetzen, J., Mueller, H.-G. (2018). "Birth and Death of LTR Retrotransposons in Aegilops tauschii". Genetics
This data file contains the LTR retrotransposons in Arabidopsis lyrata.
A data frame with 397 rows and 7 columns. Each row corresponds to a unique LTR retrotransposon, and each column corresponds to a feature of the LTR-RT. The columns are:
LTR retrotransposon sequence ID
Length of each LTR
Number of mismatches
Divergence, as defined by (# of mismatches) / (LTR length)
Super family membership
LTR retrotransposon Family ID
Family name matched in the LTR-RT families of A. thaliana
Lamesch, Philippe, Tanya Z. Berardini, Donghui Li, David Swarbreck, Christopher Wilks, Rajkumar Sasidharan, Robert Muller et al. "The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools." Nucleic acids research 40, no. D1 (2011): D1202-D1210.
Dai, X., Wang, H., Dvorak, J., Bennetzen, J., Mueller, H.-G. (2018). "Birth and Death of LTR Retrotransposons in Aegilops tauschii". Genetics
Given the number of mismatches and element lengths for an LTR retrotransposon family, estimate the age distribution, insertion rate, and deletion rates.
EstDynamics(mismatch, len, r = 0.013, perturb = 2, rateRange = NULL, plotFit = FALSE, plotSensitivity = FALSE, pause = plotFit && plotSensitivity, main = sprintf("n = %d", n))
EstDynamics2(mismatch, len, r = 0.013, nTrial = 10L, perturb = 2, rateRange = NULL, plotFit = FALSE, plotSensitivity = FALSE, pause = plotFit && plotSensitivity, ...)
mismatch |
A vector containing the number of mismatches. |
len |
A vector containing the length of each element. |
r |
Mutation rate (substitutions/(million year * site)) used in the calculation. |
perturb |
A scalar multiple to perturb the estimated death rate from the null hypothesis estimate. Used to generate the sensitivity analysis. |
rateRange |
A vector of death rates, an alternative to |
plotFit |
Whether to plot the distribution fits. |
plotSensitivity |
Whether to plot the sensitivity analysis. |
pause |
Whether to pause after each plot. |
main |
The title for the plot. |
nTrial |
The number of starting points for searching for the MLE. |
... |
Passed to EstDynamics.
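To show how the unit of `r` ties mismatch counts to time, here is a minimal sketch of a naive point estimate of age. It assumes the commonly used molecular-clock relation age = divergence / (2 * r), in which both LTRs accumulate substitutions independently. That relation is an assumption of this example and is not stated in this manual. The package itself estimates a full age distribution that accounts for random mutations, so this is only a rough per-element conversion. The input values are hypothetical.

```r
r <- 0.013  # substitutions / (million year * site), the documented default

# Hypothetical inputs: mismatches and element lengths for three elements
mismatch <- c(12, 0, 25)
len      <- c(1500, 1200, 980)

divergence <- mismatch / len

# ASSUMPTION (not from the manual): age = divergence / (2 * r),
# giving a naive age in million years for each element
age_mya <- divergence / (2 * r)
print(round(age_mya, 3))
```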
EstDynamics estimates the TE dynamics by fitting a negative binomial distribution to the mismatch data, while EstDynamics2 uses a mixture model. For implementation details, see References.
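The idea of fitting a negative binomial distribution to mismatch counts can be sketched with base R alone. This is not the package's implementation: it ignores element length, uses simulated counts, and maximizes the likelihood with `optim`. The names `x`, `nll`, `fit` and `est` are hypothetical.

```r
set.seed(1)

# Simulated mismatch counts (hypothetical data, not from AetLTR)
x <- rnbinom(500, size = 3, mu = 20)

# Negative log-likelihood of a negative binomial model.
# Parameters are optimized on the log scale so they stay positive.
nll <- function(par) {
  -sum(dnbinom(x, size = exp(par[1]), mu = exp(par[2]), log = TRUE))
}

# Start from size = 1 and mu = sample mean
fit <- optim(c(0, log(mean(x))), nll)
est <- c(size = exp(fit$par[1]), mu = exp(fit$par[2]))
print(est)
```

Searching from several starting points, as nTrial does for the mixture model, is a common guard against local optima in likelihood fits of this kind.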
EstDynamics returns a TEfit object, containing the following fields, where the unit for time is million years ago (Mya):
pvalue |
The p-value for testing H_0: The insertion rate is uniform over time. |
ageDist |
A list containing the estimated age distributions. |
insRt |
A list containing the estimated insertion rates. |
agePeakLoc |
The maximum point (in age) of the age distribution. |
insPeakLoc |
The maximum point (in time) of the insertion rate. |
estimates |
The parameter estimates from fitting the distributions; see References |
sensitivity |
A list containing the results for the sensitivity analysis, with fields |
n |
The sample size. |
meanLen |
The mean of element length. |
meanDiv |
The mean of divergence. |
KDE |
A list containing the kernel density estimate for the mismatch data. |
logLik |
The log-likelihoods of the parametric fits. |
EstDynamics2 returns a TEfit2 object, containing all the above fields for TEfit and the following:
estimates2 |
The parameter estimates from fitting the mixture distribution. |
ageDist2 |
The estimated age distribution from fitting the mixture distribution. |
insRt2 |
The estimated insertion rate from fitting the mixture distribution. |
agePeakLoc2 |
Maximum point(s) for the age distribution. |
insPeakLoc2 |
Maximum point(s) for the insertion rate. |
Dai, X., Wang, H., Dvorak, J., Bennetzen, J., Mueller, H.-G. (2018). "Birth and Death of LTR Retrotransposons in Aegilops tauschii". Genetics
# Analyze Gypsy family 24 (Nusif)
data(AetLTR)
dat <- subset(AetLTR, GroupID == 24 & !is.na(Chr))
set.seed(1)
res1 <- EstDynamics(dat$Mismatch, dat$UngapedLen, plotFit=TRUE, plotSensitivity=FALSE, pause=FALSE)

# p-value for testing a uniform insertion rate
res1$pvalue

# Use a mixture distribution to improve fit
res2 <- EstDynamics2(dat$Mismatch, dat$UngapedLen, plotFit=TRUE)

# A larger number of trials is recommended to achieve the global MLE
## Not run:
res3 <- EstDynamics2(dat$Mismatch, dat$UngapedLen, plotFit=TRUE, nTrial=1000L)
## End(Not run)
Implements the master gene model in Marchani et al (2009)
MasterGene(mismatch, len, r = 0.013, plotFit = FALSE, main = sprintf("n = %d", n))
mismatch |
A vector containing the number of mismatches. |
len |
A vector containing the length of each element. |
r |
Mutation rate (substitutions/(million year * site)) used in the calculation. |
plotFit |
Whether to plot the distribution fits. |
main |
The title for the plot. |
For the method implemented see References.
This function returns various parameter estimates described in Marchani et al. (2009), containing the following fields. The unit for time is million years ago (Mya):
B |
The constant insertion rate |
q |
The constant excision rate |
lam |
The population growth rate |
R |
The ratio of the number of elements in class j over class j+1, which is constant by assumption |
age1 |
The age of the system under model 1 (lambda > 1) |
age2 |
The age of the system under model 2 (an initial burst followed by stasis lambda = 1) |
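The constant-ratio field R can be illustrated with a short base R sketch. It assumes, for the purpose of this example only, that class j means the elements carrying exactly j mismatches; the manual does not define the classes, so consult Marchani et al. (2009) for the actual definition. The counts are simulated from a geometric distribution, for which the ratio of consecutive class sizes is constant at 1 / (1 - p).

```r
set.seed(1)

p <- 0.3
# Hypothetical mismatch counts; geometric counts have a constant class ratio
mismatch <- rgeom(5000, prob = p)

# counts[j + 1] is the number of elements with exactly j mismatches
counts <- tabulate(mismatch + 1L)

# Ratio of the size of class j over class j + 1
ratios <- counts[-length(counts)] / counts[-1]

# The first few ratios should be close to 1 / (1 - p)
print(round(head(ratios, 4), 2))
print(1 / (1 - p))
```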
Marchani, Elizabeth E., Jinchuan Xing, David J. Witherspoon, Lynn B. Jorde, and Alan R. Rogers. "Estimating the age of retrotransposon subfamilies using maximum likelihood." Genomics 94, no. 1 (2009): 78-82.
# Analyze Gypsy family 24 (Nusif)
data(AetLTR)
dat <- subset(AetLTR, GroupID == 24 & !is.na(Chr))
res2 <- MasterGene(dat$Mismatch, dat$UngapedLen, plotFit=TRUE)
Implements the matrix model in Promislow et al (1999)
MatrixModel(mismatch, len, nsolo, r = 0.013, plotFit = FALSE, main = sprintf("n = %d", n))
mismatch |
A vector containing the number of mismatches. |
len |
A vector containing the length of each element. |
nsolo |
An integer giving the number of solo elements. |
r |
Mutation rate (substitutions/(million year * site)) used in the calculation. |
plotFit |
Whether to plot the distribution fits. |
main |
The title for the plot. |
For the method implemented see References.
This function returns various parameter estimates described in Promislow et al. (1999), containing the following fields. The unit for time is million years ago (Mya):
B |
The constant insertion rate |
q |
The constant excision rate |
lam |
The population growth rate |
R |
The ratio of the number of elements in class j over class j+1, which is constant by assumption |
age1 |
The age of the system under model 1 (lambda > 1) |
age2 |
The age of the system under model 2 (an initial burst followed by stasis lambda = 1) |
Promislow, D., Jordan, K. and McDonald, J. "Genomic demography: a life-history analysis of transposable element evolution." Proceedings of the Royal Society of London B: Biological Sciences 266, no. 1428 (1999): 1555-1560.
# Analyze Gypsy family 24 (Nusif)
data(AetLTR)
dat <- subset(AetLTR, GroupID == 24 & !is.na(Chr))
res1 <- MatrixModel(dat$Mismatch, dat$UngapedLen, nsolo=450, plotFit=TRUE)
Calculate the KL divergence of a negative binomial fit to the mismatch data.
nbLackOfFitKL(res)
res |
A TEfit object. |
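For readers unfamiliar with KL divergence as a lack-of-fit measure, the sketch below computes it between an empirical distribution of counts and a fitted negative binomial. It is a generic base R illustration, not the computation performed by nbLackOfFitKL: the direction of the divergence, the method-of-moments fit, and the use of raw empirical frequencies are all assumptions of this example, and the helper `kl` is hypothetical.

```r
set.seed(1)

# Simulated mismatch counts (hypothetical data)
x <- rnbinom(500, size = 3, mu = 20)
support <- 0:max(x)

# Empirical probability of each count on the support
p_hat <- tabulate(x + 1L, nbins = length(support)) / length(x)

# Method-of-moments negative binomial fit (valid when variance > mean)
mu_hat   <- mean(x)
size_hat <- mu_hat^2 / (var(x) - mu_hat)
q_fit    <- dnbinom(support, size = size_hat, mu = mu_hat)

# KL(p || q) = sum of p * log(p / q) over points where p > 0
kl <- function(p, q) {
  keep <- p > 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

kl_nb <- kl(p_hat, q_fit)
print(kl_nb)
```

A value near zero indicates that the fitted distribution is close to the observed one; larger values indicate a poorer fit.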
# Analyze Gypsy family 24 (Nusif)
data(AetLTR)
dat <- subset(AetLTR, GroupID == 24 & !is.na(Chr))
set.seed(1)
res1 <- EstDynamics(dat$Mismatch, dat$UngapedLen, plotFit=TRUE, plotSensitivity=FALSE, pause=FALSE)
nbLackOfFitKL(res1)
Plot the age distributions or insertion rates for multiple families.
PlotFamilies(resList, type = c("insRt", "ageDist"), ...)
resList |
A list of TEfit/TEfit2 objects, which can be mixed |
type |
Whether to plot the insertion rates ('insRt') or the age distributions ('ageDist'). |
... |
Passed into plotting functions. |
A list of line data (plotDat) and peak locations (peakDat).
data(AetLTR)
copia3 <- subset(AetLTR, GroupID == 3 & !is.na(Chr))
gypsy24 <- subset(AetLTR, GroupID == 24 & !is.na(Chr))
res3 <- EstDynamics(copia3$Mismatch, copia3$UngapedLen)
res24 <- EstDynamics2(gypsy24$Mismatch, gypsy24$UngapedLen)

# Plot insertion rates
PlotFamilies(list(`Copia 3`=res3, `Gypsy 24`=res24))

# Plot age distributions
PlotFamilies(list(`Copia 3`=res3, `Gypsy 24`=res24), type='ageDist')
Print a TEfit or TEfit2 object
## S3 method for class 'TEfit'
print(x, ...)
## S3 method for class 'TEfit2'
print(x, ...)
x |
A TEfit or TEfit2 object |
... |
Not used |
Create sensitivity plots of a few families to investigate different death rate scenarios
SensitivityPlot(resList, col, xMax, markHalfPeak = FALSE, famLegend = TRUE, rLegend = names(resList), ...)
resList |
A list of families returned by |
col |
A vector of colors |
xMax |
The maximum of the x-axis |
markHalfPeak |
Whether to mark the time points with half-intensity |
famLegend |
Whether to create legend for families |
rLegend |
Text for the legend for families |
... |
Passed into |
data(AetLTR)
copia3 <- subset(AetLTR, GroupID == 3 & !is.na(Chr))
copia9 <- subset(AetLTR, GroupID == 9 & !is.na(Chr))
res3 <- EstDynamics(copia3$Mismatch, copia3$UngapedLen)
res9 <- EstDynamics(copia9$Mismatch, copia9$UngapedLen)
SensitivityPlot(list(`Copia 3`=res3, `Copia 9`=res9))
TE package for analyzing insertion/deletion dynamics for transposable elements
Provides functions to estimate the insertion and deletion rates of transposable element (TE) families. The estimation of insertion rate consists of an improved estimate of the age distribution that takes into account random mutations, and an adjustment by the deletion rate. This package includes functions EstDynamics and EstDynamics2 for analyzing the TE divergence, and visualization functions such as PlotFamilies and SensitivityPlot.
This package implements the methods proposed in Dai et al. (2018).
Xiongtao Dai <[email protected]>, Hao Wang, Jan Dvorak, Jeffrey Bennetzen, Hans-Georg Mueller
Maintainer: Xiongtao Dai <[email protected]>
Luo, Ming-Cheng, et al. (2017) "Genome sequence of the progenitor of the wheat D genome Aegilops tauschii." Nature 551.7681.
Dai, X., Wang, H., Dvorak, J., Bennetzen, J., Mueller, H.-G. (2018). "Birth and Death of LTR Retrotransposons in Aegilops tauschii". Genetics