# Importing filtered genotypic data
matrix <- read.table("data/FilteredBarley.txt", sep = "\t", header = TRUE, row.names = 1, check.names = FALSE)
# The functions used below expect a SNP matrix with individuals in rows and markers in columns
matrix <- t(matrix)
# Importing metadata
metadata <- read_excel("data/BarleyMetadata.xlsx")
14 Module 3.4: Genetic Diversity
SNP data give us a genome-wide view of variation within individuals and populations. Calculating diversity parameters from these data helps us better understand the genetic diversity held within a population and between its subpopulations. These parameters can later inform diversity-based plant breeding.
14.1 Diversity parameters
We can easily calculate the most relevant diversity parameters using the genDivSNPReady() function from our package.
genDivSNPReady(geno, plots = FALSE): returns diversity parameters calculated with the snpReady package.
geno: our genotype matrix
plots: defaults to FALSE; if TRUE, a graphical output of the results is produced
The function returns a list object with two data frames, one with the diversity parameters for each marker, and one with the diversity parameters for each accession. If plots = TRUE, a third object is generated with the different plots.
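To make the marker-level parameters more concrete, here is a minimal sketch in base R of how a few common ones (allele frequency, minor allele frequency, expected and observed heterozygosity) can be computed by hand. It assumes genotypes coded as 0/1/2 copies of one allele, with individuals in rows. The toy matrix and all object names are hypothetical, and this is not the code that genDivSNPReady() or snpReady runs internally.

```r
# Toy genotype matrix (hypothetical data): 4 individuals x 3 markers,
# coded as 0/1/2 copies of one allele, NA = missing call
toyGeno <- matrix(c(0, 1, 2, 2,
                    0, 0, 1, NA,
                    2, 2, 2, 1),
                  nrow = 4,
                  dimnames = list(paste0("ind", 1:4), c("M1", "M2", "M3")))

# Frequency of the counted allele per marker (missing calls ignored)
p <- colMeans(toyGeno, na.rm = TRUE) / 2

# Minor allele frequency
maf <- pmin(p, 1 - p)

# Expected heterozygosity under Hardy-Weinberg proportions: 2pq
He <- 2 * p * (1 - p)

# Observed heterozygosity: proportion of heterozygous calls (coded 1)
Ho <- colMeans(toyGeno == 1, na.rm = TRUE)

toyMarkers <- data.frame(p = p, MAF = maf, He = He, Ho = Ho)
print(toyMarkers)
```

Comparing He with Ho marker by marker is a quick way to spot markers with fewer heterozygotes than expected, which is common in inbred material such as barley.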
# Obtaining genetic diversity parameters
SNPReadyParams <- genDivSNPReady(matrix, plots = TRUE)
# Printing marker diversity parameters
SNPReadyParams$markers
# Printing individual's diversity parameters
SNPReadyParams$accessions
# Diversity plots
SNPReadyParams$plots
14.2 By population
These parameters can also be calculated by population in order to compare the diversity between them. We will use the HeBySubgroups() function from our package.
HeBySubgroups(geno, subgroups, plot = FALSE): returns expected heterozygosity (He) by group, with an optional plot.
geno: our genotype matrix
subgroups: a vector with our factor information
plot: defaults to FALSE; if TRUE, a graphical output of the results is produced
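As an illustration of what a per-group He amounts to, the following base R sketch averages 2pq over markers within each group. It assumes 0/1/2 genotype coding; the toy data, the metadata column names and all object names are hypothetical, and HeBySubgroups() may compute or summarise He differently. The sketch also shows one way to line up group labels with the rows of the genotype matrix using match(), which matters when the metadata is not in the same order as the matrix.

```r
# Toy genotype matrix (hypothetical): 6 individuals x 2 markers, coded 0/1/2
toyGeno <- matrix(c(0, 1, 2, 2, 2, 2,
                    0, 0, 0, 1, 1, 2),
                  nrow = 6,
                  dimnames = list(paste0("i", 1:6), c("M1", "M2")))

# Toy metadata (hypothetical), deliberately in a different order than toyGeno
toyMeta <- data.frame(Individual = c("i4", "i1", "i2", "i3", "i5", "i6"),
                      pop = c("B", "A", "A", "A", "B", "B"))

# Align group labels to the row order of the genotype matrix
toyGroups <- factor(toyMeta$pop[match(rownames(toyGeno), toyMeta$Individual)])

# Mean expected heterozygosity (2pq) across markers, within each group
heByGroup <- sapply(split(seq_len(nrow(toyGeno)), toyGroups), function(idx) {
  p <- colMeans(toyGeno[idx, , drop = FALSE], na.rm = TRUE) / 2
  mean(2 * p * (1 - p))
})
print(heByGroup)
```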
# Defining our populations from country information
# match() returns the labels in the same order as the rows of the genotype matrix
popSet <- as.factor(metadata$countryOfOriginCode[match(rownames(matrix), metadata$Individual)])
# Calculating parameters by population
He <- HeBySubgroups(matrix, popSet, plot = TRUE)
# Plotting results
He$plot
# Printing results
He$df
14.3 AMOVA
AMOVA, or Analysis of Molecular Variance, can be run from a genetic distance matrix to partition genetic variation into the share found within populations and the share found among populations. It helps us understand the structure of variation in our sample. We will be using the genDistPop() and AMOVA() functions from our package for this. They use frameworks from adegenet and poppr to carry out the AMOVA.
genDistPop(geno, subgroups, method = 1, PCoA = FALSE): returns a genetic distance matrix and, optionally, a Principal Coordinates Analysis (PCoA) of that distance matrix.
geno: our genotype matrix
subgroups: a vector with our factor information
method: defaults to 1 (Nei’s distance); accepts values 1-5 (Nei, Edwards, Reynolds, Rogers, Prevosti)
PCoA: defaults to FALSE; if TRUE, performs a principal coordinates analysis on the distance matrix
AMOVA(geno, subgroups): runs the AMOVA on the genotype matrix using the given grouping.
geno: our genotype matrix
subgroups: a vector with our factor information
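For intuition about the default method, the sketch below computes Nei's standard genetic distance between two populations from their allele frequencies, D = -ln(Jxy / sqrt(Jx * Jy)). It is an illustration under stated assumptions only: the frequencies are made-up values, SNPs are assumed biallelic (so both alleles, p and 1 - p, enter the sums), and the helper name neiDistance is hypothetical. The distances genDistPop() returns come from the package's own implementation and may differ in detail.

```r
# Toy allele frequencies (hypothetical): populations in rows, SNPs in columns
toyFreq <- rbind(popA = c(0.9, 0.2, 0.5),
                 popB = c(0.5, 0.6, 0.5))

# Nei's standard genetic distance between two frequency vectors
neiDistance <- function(px, py) {
  # Biallelic SNPs: use the frequencies of both alleles at every locus
  x <- c(px, 1 - px)
  y <- c(py, 1 - py)
  -log(sum(x * y) / sqrt(sum(x^2) * sum(y^2)))
}

dAB <- neiDistance(toyFreq["popA", ], toyFreq["popB", ])
dBA <- neiDistance(toyFreq["popB", ], toyFreq["popA", ])
dAA <- neiDistance(toyFreq["popA", ], toyFreq["popA", ])

print(c(A_vs_B = dAB, B_vs_A = dBA, A_vs_A = dAA))
```

The distance is symmetric and is zero when the two populations have identical allele frequencies.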
# Calculating our genetic distance matrix and PCoA
genDist <- genDistPop(matrix, popSet, PCoA = TRUE)
Converting data from a genind to a genpop object...
...done.
# Printing results
genDist
$genDist
          CHN       ETH       TUR
CHN 0.0000000
ETH 0.1785492 0.0000000
TUR 0.1075913 0.1388646 0.0000000
$PCoA
Duality diagramm
class: pco dudi
$call: dudi.pco(d = genDist, scannf = FALSE, nf = 3)
$nf: 2 axis-components saved
$rank: 2
eigen values: 0.005458 0.001513
  vector length mode    content
1 $cw    2      numeric column weights
2 $lw    3      numeric row weights
3 $eig   2      numeric eigen values
  data.frame nrow ncol content
1 $tab       3    2    modified array
2 $li        3    2    row coordinates
3 $l1        3    2    row normed scores
4 $co        2    2    column coordinates
5 $c1        2    2    column normed scores
other elements: NULL
$PCoAPlot
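As a cross-check of the PCoA output above, here is a minimal sketch that re-enters the three printed distances and runs base R's cmdscale() on them. This is not what our package runs (the printed call shows it uses dudi.pco()); it is only an independent illustration. The assumption here is that dudi.pco() weights each of the n populations by 1/n, so the cmdscale() eigenvalues are divided by n to make them comparable with the eigenvalues printed above. Object names are hypothetical.

```r
# Distances copied from the printed $genDist output
pops <- c("CHN", "ETH", "TUR")
distMat <- matrix(0, nrow = 3, ncol = 3, dimnames = list(pops, pops))
distMat["ETH", "CHN"] <- 0.1785492
distMat["TUR", "CHN"] <- 0.1075913
distMat["TUR", "ETH"] <- 0.1388646
distMat <- distMat + t(distMat)  # fill the upper triangle symmetrically

# Classical multidimensional scaling (principal coordinates analysis)
pcoaToy <- cmdscale(as.dist(distMat), k = 2, eig = TRUE)

# Rescale eigenvalues by the number of populations (uniform 1/n weights)
eigScaled <- pcoaToy$eig / nrow(distMat)
print(round(eigScaled[1:2], 6))

# Coordinates of the three populations on the two axes
print(pcoaToy$points)
```

With only three populations, two axes are enough to represent the distances, which agrees with the two axis-components reported in the output.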
# Running AMOVA
amovaResult <- AMOVA(matrix, popSet)
Replaced 248144 missing values.
Warning in validityMethod(object): @tab does not contain integers; as of
adegenet_2.0-0, numeric values are no longer used
Warning in validityMethod(object): @tab does not contain integers; as of
adegenet_2.0-0, numeric values are no longer used
No missing values detected.
# Printing results
amovaResult
$call
ade4::amova(samples = xtab, distances = xdist, structures = xstruct)
$results
                 Df    Sum Sq    Mean Sq
Between samples   2  92405.09 46202.5455
Within samples  485 346712.00   714.8701
Total           487 439117.09   901.6778
$componentsofcovariance
                                Sigma         %
Variations  Between samples  291.6988  28.97952
Variations  Within samples   714.8701  71.02048
Total variations            1006.5689 100.00000
$statphi
                        Phi
Phi-samples-total 0.2897952
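To see how the pieces of this output fit together, the sketch below recomputes the mean squares, the percentages and the Phi statistic from the printed sums of squares and variance components. The numbers are copied from the output above and the object names are hypothetical. It is a worked arithmetic check, not a re-run of the analysis.

```r
# Values copied from $results
ssBetween <- 92405.09;  dfBetween <- 2
ssWithin  <- 346712.00; dfWithin  <- 485

# Mean squares are sums of squares divided by degrees of freedom
msBetween <- ssBetween / dfBetween
msWithin  <- ssWithin / dfWithin

# Variance components copied from $componentsofcovariance
sigmaBetween <- 291.6988
sigmaWithin  <- 714.8701  # equals the within-samples mean square in this output
sigmaTotal   <- sigmaBetween + sigmaWithin

# Share of the total variance attributed to each level
pctBetween <- 100 * sigmaBetween / sigmaTotal
pctWithin  <- 100 * sigmaWithin / sigmaTotal

# Phi-samples-total: between-population share of the total variance
phi <- sigmaBetween / sigmaTotal

print(round(c(msBetween = msBetween, msWithin = msWithin), 4))
print(round(c(pctBetween = pctBetween, pctWithin = pctWithin, phi = phi), 5))
```

In this data set roughly 29% of the variance lies between the three populations and about 71% within them, which is what the Phi value of about 0.29 summarises.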