Imputes missing genotypes based on the minor allele frequency (MAF) at each locus. The imputation is repeated nsim times, and the imputed data set whose MAF correlates most strongly with the MAF of the raw data is retained.

impute_missing_genotypes(snpdata, genotype = "Phased", nsim = 100L)

Arguments

snpdata

a SNPdata object

genotype

the genotype table from which the missing data will be imputed. This can be either the raw genotype matrix ("GT") or the phased genotype matrix ("Phased").

nsim

an integer giving the number of simulations, i.e. the number of times the imputation is repeated

Value

a SNPdata object with an additional table named "Phased_Imputed" if the phased data was used for imputation, or "Imputed" if the imputation was done on the raw genotypes.

Details

When neither allele is supported by any read, or the total number of reads supporting both alleles at a given site is < 5, the genotype will be phased based on a Bernoulli distribution using the MAF as a parameter. Similarly, when the total number of reads is > 5 and the number of reads supporting one of the alleles is not 2 times the number of reads supporting the other allele, the genotype is phased using a Bernoulli distribution.
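
The general idea of drawing missing calls from a Bernoulli distribution and keeping the best of nsim attempts can be illustrated with a small base R sketch. This is not the package's implementation: the toy matrix gt, the helper names allele_freq and impute_once, and the 0/1 allele coding are all assumptions made for illustration, the frequency of allele 1 is used as a stand-in for the MAF, and the read-count thresholds described above are not modelled.

```r
# Illustrative sketch only; not the code used by impute_missing_genotypes().
set.seed(42)

# Hypothetical toy genotype matrix: rows are loci, columns are samples.
# 0 = major allele, 1 = minor allele, NA = missing call.
gt <- matrix(
  c(0, 1, NA, 0, 0,
    1, NA, 0, 0, NA,
    0, 0, 0, NA, 1),
  nrow = 3, byrow = TRUE
)

# Per-locus frequency of allele 1 among the observed calls
# (used here as a stand-in for the MAF).
allele_freq <- function(m) rowMeans(m, na.rm = TRUE)

# Fill each missing call with one Bernoulli draw whose success
# probability is the frequency observed at that locus.
impute_once <- function(m, freq) {
  for (i in seq_len(nrow(m))) {
    missing <- is.na(m[i, ])
    m[i, missing] <- rbinom(sum(missing), size = 1, prob = freq[i])
  }
  m
}

raw_freq <- allele_freq(gt)
nsim <- 100L

# Repeat the imputation nsim times.
candidates <- lapply(seq_len(nsim), function(k) impute_once(gt, raw_freq))

# Score each candidate by the correlation between the raw and the
# imputed per-locus frequencies. cor() returns NA (with a warning)
# when the imputed frequencies are constant, hence suppressWarnings().
scores <- vapply(
  candidates,
  function(m) suppressWarnings(cor(raw_freq, allele_freq(m))),
  numeric(1)
)

# Keep the candidate with the highest correlation; which.max() skips NA.
best <- candidates[[which.max(scores)]]
```

In this sketch, observed calls are never altered; only the NA entries are replaced, and best is the single simulated matrix that would be retained.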

Examples

if (FALSE) { # \dontrun{
  snpdata <- impute_missing_genotypes(snpdata)
 } # }