This function generates the input data needed for population genetics analyses from whole-genome SNP data genotyped in malaria parasites.

get_snpdata(
  vcf_file = NULL,
  meta_file = NULL,
  output_dir = NULL,
  gof = NULL,
  gff = NULL,
  num_threads = 4L
)

Arguments

vcf_file

The path to the input VCF file (required)

meta_file

The path to the sample metadata file (required)

output_dir

The path to the folder where the output and temporary files will be stored (optional)

gof

The gene ontology annotation file (optional). If not provided, the default file obtained from PlasmoDB will be used.

gff

The gene annotation file (optional). If not provided, the default file obtained from PlasmoDB will be used.

num_threads

The number of threads to use when reading the data from the VCF file. The default is 4.
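
As a minimal sketch of preparing these arguments before a call, the code below checks that the two required paths exist and derives a thread count from the machine. The helper name `check_inputs` is hypothetical and not part of the package, and leaving one core free is only an assumed convention.

```r
# Hypothetical helper (not part of the package): confirm that the two
# required input paths are single strings pointing at existing files.
check_inputs <- function(vcf_file, meta_file) {
  stopifnot(
    is.character(vcf_file), length(vcf_file) == 1L,
    is.character(meta_file), length(meta_file) == 1L
  )
  paths <- c(vcf_file, meta_file)
  missing_files <- paths[!file.exists(paths)]
  if (length(missing_files) > 0L) {
    stop("File(s) not found: ", paste(missing_files, collapse = ", "))
  }
  invisible(TRUE)
}

# parallel::detectCores() may return NA on some platforms, so fall back to 1.
# Assumption: leave one core free for the rest of the session.
cores <- parallel::detectCores()
num_threads <- if (is.na(cores)) 1L else max(1L, as.integer(cores) - 1L)
```

The resulting `num_threads` value could then be passed to `get_snpdata()` in place of the default.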

Value

An object of class SNPdata with the following 4 elements:

  1. meta: A data.frame that contains the sample metadata.

  2. details: A data.frame with the genomic coordinates of the SNPs, the fraction of missing data per SNP, and the names and descriptions of the genes in which they are located.

  3. GT: An integer matrix with the genotype data. 0='reference allele', 1='alternate allele', 2='mixed allele', NA='missing allele'

  4. vcf: The full path to the VCF file from which the data was generated.
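
The genotype coding described above can be illustrated with a toy matrix. This is only a sketch: the orientation (SNPs in rows, samples in columns) and the row and column names are assumptions made for illustration, not something this page specifies.

```r
# Toy genotype matrix using the documented coding:
# 0 = reference allele, 1 = alternate allele, 2 = mixed allele, NA = missing.
# Assumption: rows are SNPs and columns are samples.
gt <- matrix(
  c(0L, 1L, NA,
    2L, 1L, 1L,
    NA, NA, 0L),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("snp", 1:3), paste0("sample", 1:3))
)

# Fraction of missing genotypes per SNP (one value per row).
snp_missing <- rowMeans(is.na(gt))

# Number of mixed-allele calls (code 2) per sample, ignoring missing values.
mixed_per_sample <- colSums(gt == 2L, na.rm = TRUE)
```

With a real object, the same summaries would be computed on the `GT` element, assuming the `SNPdata` object allows list-style access to its elements.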

Examples

if (FALSE) { # \dontrun{
  snpdata <- get_snpdata(
    vcf_file   = system.file("extdata", "Input_Data.vcf.gz",
                             package = "mpbr"),
    meta_file  = system.file("extdata", "SampleMetadata.RDS",
                             package = "mpbr"),
    output_dir = tempdir()
  )
} # }