using CSV # for reading in delimited files
using DataFrames # for storing data as type DataFrame
using Haversine # for calculating Great Circle (haversine) distances between sites
using Statistics # for mean() function
using MultivariateStats # for Principal Coordinates Analysis (multidimensional scaling)
using DelimitedFiles # for reading delimited files (the genotypic data)
using Impute # for imputing missing genotypes
using JLD2 # for saving data
using CairoMakie # for plots
using PrettyTables # for printing nice tables to REPL
activate!() # this makes CairoMakie the main package for figures (in case another already loaded) CairoMakie.
Greenish Warbler Genomic Analysis
This page covers initial setup, loading of the genotypic information, saving data for other pages, and some figure production.
Citation
The scripts, data, and figures shown in this website were used as the basis for the paper listed below, which should be cited as the source of information from this website:
Irwin, D., S. Bensch, C. Charlebois, G. David, A. Geraldes, S.K. Gupta, B. Harr, P. Holt, J.H. Irwin, V.V. Ivanitskii, I.M. Marova, Y. Niu, S. Seneviratne, A. Singh, Y. Wu, S. Zhang, T.D. Price. 2025. The distribution and dispersal of large haploblocks in a superspecies. Molecular Ecology, in press.
A note about plots in this document
The plots shown below may different somewhat in appearance between the version produced by Quarto (i.e., in this published document) and the version you would get if you run this code without using Quarto. In particular, the dimensions and font sizes of labels and titles may differ. So if you want the versions identical to those used in the paper, run the code directly in the Julia REPL (or using an environment such as VS Code) without using Quarto.
In the rendered (.html
) version of this Quarto notebook, each figure may be accompanied by a warning caused by an interaction between Quarto and the Makie plotting package. Ignore these warnings as they do not affect the calculations or plots.
Load packages
If running this for the first time, you will need to load packages used in the script, so run what is in this section below. It will take some time to install and precompile the packages:
import Pkg; Pkg.add("CSV")
Pkg.add("DataFrames")
Pkg.add("Haversine")
Pkg.add("Statistics")
Pkg.add("MultivariateStats")
Pkg.add("DelimitedFiles")
Pkg.add("Impute")
Pkg.add("JLD2")
Pkg.add("CairoMakie")
Pkg.add("PrettyTables")
Now actually load those packages into the Julia session:
Load my custom package GenomicDiversity
(v.0.2.2; higher will likely also work)
This is used for calculating Fst, ViSHet, etc., and making graphs such as PCA and genome-by-individual plots.
import Pkg; Pkg.add("GenomicDiversity") # gets the package from Github, via the Julia General Registry
Pkg.update("GenomicDiversity") # make sure the latest package is loaded
using GenomicDiversity
Resolving package versions...
No Changes to `~/.julia/environments/v1.11/Project.toml`
No Changes to `~/.julia/environments/v1.11/Manifest.toml`
Updating registry at `~/.julia/registries/General.toml`
No Changes to `~/.julia/environments/v1.11/Project.toml`
No Changes to `~/.julia/environments/v1.11/Manifest.toml`
Test Julia:
= 1; y = 2; z = x+y
x println("z = ", z)
z = 3
(If Quarto is calling Julia properly, you will see z = 3
as the output of the code block above.)
Choose working directory (adjust as appropriate):
= pwd() # this gets the starting working directory, for later use
repoDirectory = "/Users/darrenirwin/Dropbox/Darren's current work/"
dataDirectory cd(dataDirectory)
Load the chromosome (scaffold) lengths
The code below will load a fasta index file for the reference genome, and then make a dictionary (a look-up table) where the key is the scaffold name and value is the chromosome length. This will be used in the genotype-by-individual figures.
cd(repoDirectory)
= "metadata/GW2022ref.fa.fai" # a fasta index file for the reference genome
scaffold_info_filepath = DataFrame(CSV.File(scaffold_info_filepath; header=["name", "length", "offset", "linebases", "linewidth"], types=[String, Int, Int, Int, Int]))
scaffold_info = Dict(scaffold_info.name .=> scaffold_info.length)
scaffold_lengths cd(dataDirectory)
Set up file directories and names
# choose path and filename for the 012 files
= "GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome"
baseName = ".max2allele_noindel.vcf.maxmiss"
filenameTextMiddle # indicate percent threshold for missing genotypes for each SNP--
# this was set by earlier filtering, and is just a record-keeper for the filenames:
= 60
missingGenotypeThreshold = ".MQ20.lowHet.tab"
filenameTextEnd = ".Jan2025." # choose a tag name for this analysis
tagName # indicate name of metadata file, a text file with these column headings:
# ID location group Fst_group plot_order
= "GW_genomics_2022_with_new_genome/GW_all4plates.Fst_groups.txt"
metadataFile # assemble paths and filenames for data files:
= string(baseName, filenameTextMiddle, missingGenotypeThreshold, filenameTextEnd, ".012.indv")
individualsFile = string(baseName, filenameTextMiddle, missingGenotypeThreshold, filenameTextEnd, ".012.pos")
positionsFile = string(baseName, filenameTextMiddle, missingGenotypeThreshold, filenameTextEnd, ".012minus1") genotypesFile
"GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.max2allele_noindel.vcf.maxmiss60.MQ20.lowHet.tab.012minus1"
List the scaffolds to be imputed (for possible later inclusion in PCA)
= vec(["gw2",
chromosomes_to_process "gw1",
"gw3",
"gwZ",
"gw1A",
"gw4",
"gw5",
"gw7",
"gw6",
"gw8",
"gw9",
"gw11",
"gw12",
"gw10",
"gw13",
"gw14",
"gw18",
"gw20",
"gw15",
"gw1B",
"gws100",
"gw17",
"gw19",
"gws101",
"gw4A",
"gw21",
"gw26",
"gws102",
"gw23",
"gw25",
"gws103",
"gw22",
"gws104",
"gw28",
"gw27",
"gw24",
"gws105",
"gws106",
"gws107",
"gws108",
"gws109",
"gws110",
"gws112"]);
Set up function to correct the names of GBS runs
(This will be used later.)
function correctNames(metadataColumn)
= replace(metadataColumn, "GW_Armando_plate1_TTGW05_rep2" => "GW_Armando_plate1_TTGW05r2",
metadataColumn_corrected "GW_Lane5_NA3-3ul" => "GW_Lane5_NA3",
"GW_Armando_plate1_TTGW_15_05" => "GW_Armando_plate1_TTGW-15-05",
"GW_Armando_plate1_TTGW_15_07" => "GW_Armando_plate1_TTGW-15-07",
"GW_Armando_plate1_TTGW_15_08" => "GW_Armando_plate1_TTGW-15-08",
"GW_Armando_plate1_TTGW_15_09" => "GW_Armando_plate1_TTGW-15-09",
"GW_Armando_plate1_TTGW_15_01" => "GW_Armando_plate1_TTGW-15-01",
"GW_Armando_plate1_TTGW_15_02" => "GW_Armando_plate1_TTGW-15-02",
"GW_Armando_plate1_TTGW_15_03" => "GW_Armando_plate1_TTGW-15-03",
"GW_Armando_plate1_TTGW_15_04" => "GW_Armando_plate1_TTGW-15-04",
"GW_Armando_plate1_TTGW_15_06" => "GW_Armando_plate1_TTGW-15-06",
"GW_Armando_plate1_TTGW_15_10" => "GW_Armando_plate1_TTGW-15-10",
"GW_Armando_plate2_TTGW_15_01" => "GW_Armando_plate2_TTGW-15-01",
"GW_Armando_plate2_TTGW_15_02" => "GW_Armando_plate2_TTGW-15-02",
"GW_Armando_plate2_TTGW_15_03" => "GW_Armando_plate2_TTGW-15-03",
"GW_Armando_plate2_TTGW_15_04" => "GW_Armando_plate2_TTGW-15-04",
"GW_Armando_plate2_TTGW_15_06" => "GW_Armando_plate2_TTGW-15-06",
"GW_Armando_plate2_TTGW_15_10" => "GW_Armando_plate2_TTGW-15-10")
end
correctNames (generic function with 1 method)
Load the genomic data and metadata into a GenoData
object:
= GenomicDiversity.loadGenoData(dataDirectory * metadataFile, dataDirectory * individualsFile, dataDirectory * genotypesFile, dataDirectory * positionsFile)
GW_GenoData # look at the genotype matrix:
GW_GenoData.genotypes
Have read in genotypic data at 2431709 loci for 310 individuals.
310×2431709 Matrix{Int16}:
-1 -1 -1 0 0 0 0 0 0 … 0 0 0 2 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 -1 -1 -1 -1 -1 -1
0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 -1 -1
0 0 0 0 0 0 0 0 0 0 0 0 -1 -1 0 0 -1 -1
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 -1 0
0 1 0 0 0 0 0 0 0 … -1 0 -1 2 2 0 0 -1 -1
0 0 0 0 0 0 0 0 0 -1 -1 -1 0 -1 0 0 -1 -1
0 0 0 0 0 0 0 0 0 -1 -1 -1 0 -1 0 0 -1 -1
0 0 0 0 0 0 0 0 0 -1 -1 -1 0 -1 0 0 -1 -1
0 0 0 0 0 0 0 0 0 -1 -1 -1 0 -1 0 0 -1 -1
0 0 0 0 0 0 0 0 0 … 0 0 0 2 0 0 0 -1 -1
0 0 0 0 0 0 0 0 0 0 0 -1 0 -1 0 0 -1 -1
0 0 0 0 0 0 0 0 0 0 0 0 -1 0 -1 -1 -1 -1
⋮ ⋮ ⋱ ⋮ ⋮
0 0 0 -1 -1 -1 -1 -1 -1 0 0 0 -1 0 0 0 -1 -1
0 0 0 -1 -1 -1 -1 -1 -1 0 0 0 2 0 0 0 0 0
-1 -1 -1 -1 -1 -1 -1 -1 -1 … 0 0 0 0 -1 -1 -1 -1 -1
0 0 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 -1 0 0 -1 0
0 0 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 -1 0 0 0 0
0 0 0 -1 -1 -1 -1 -1 -1 0 0 0 2 0 0 0 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 … -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 0 0 -1 -1
The GenoData
object (named GW_GenoData
) produced by the above contains three sub-objects: * The genotype matrix .genotypes
(shown above) * The metadata .indInfo
(corresponding to rows of the matrix) * The positions .positions
of SNPs in the reference genome (corresponding to columns of the matrix)
The function above checks that the rownames in the ind
column of the individualsFile
matches those in the ID
column of the metadataFile
, and presents a warning if they don’t.
Look at metadata:
GW_GenoData.indInfo
Row | ind | ID | location | group | Fst_group | plot_order |
---|---|---|---|---|---|---|
String | String31 | String7 | String15 | String15 | Float64 | |
1 | GW_Armando_plate1_AB1 | GW_Armando_plate1_AB1 | AB | vir | vir | 20.01 |
2 | GW_Armando_plate1_JF07G02 | GW_Armando_plate1_JF07G02 | ST | plumb | plumb | 108.0 |
3 | GW_Armando_plate1_JF07G03 | GW_Armando_plate1_JF07G03 | ST | plumb | plumb | 109.0 |
4 | GW_Armando_plate1_JF07G04 | GW_Armando_plate1_JF07G04 | ST | plumb | plumb | 110.0 |
5 | GW_Armando_plate1_JF08G02 | GW_Armando_plate1_JF08G02 | ST | plumb | plumb | 111.0 |
6 | GW_Armando_plate1_JF09G01 | GW_Armando_plate1_JF09G01 | ST | plumb | plumb | 112.0 |
7 | GW_Armando_plate1_JF09G02 | GW_Armando_plate1_JF09G02 | ST | plumb | plumb | 113.0 |
8 | GW_Armando_plate1_JF10G03 | GW_Armando_plate1_JF10G03 | ST | plumb_vir | plumb_vir | 170.0 |
9 | GW_Armando_plate1_JF11G01 | GW_Armando_plate1_JF11G01 | ST | plumb | plumb | 114.0 |
10 | GW_Armando_plate1_JF12G01 | GW_Armando_plate1_JF12G01 | ST | plumb | plumb | 115.0 |
11 | GW_Armando_plate1_JF12G02 | GW_Armando_plate1_JF12G02 | ST | plumb | plumb | 116.0 |
12 | GW_Armando_plate1_JF12G04 | GW_Armando_plate1_JF12G04 | ST_vi | vir | vir | 24.001 |
13 | GW_Armando_plate1_JF13G01 | GW_Armando_plate1_JF13G01 | ST | plumb | plumb | 117.0 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
299 | GW_Liz_GBS_Liz6204 | GW_Liz_GBS_Liz6204 | ML | lud_chick | lud_ML | 51.59 |
300 | GW_Liz_GBS_Liz6461 | GW_Liz_GBS_Liz6461 | ML | lud | lud_ML | 51.6 |
301 | GW_Liz_GBS_Liz6472 | GW_Liz_GBS_Liz6472 | ML | lud | lud_ML | 51.61 |
302 | GW_Liz_GBS_Liz6478 | GW_Liz_GBS_Liz6478 | ML | lud | lud_ML | 51.62 |
303 | GW_Liz_GBS_Liz6766 | GW_Liz_GBS_Liz6766 | ML | lud | lud_ML | 51.63 |
304 | GW_Liz_GBS_Liz6776 | GW_Liz_GBS_Liz6776 | ML | lud | lud_ML | 51.64 |
305 | GW_Liz_GBS_Liz6794 | GW_Liz_GBS_Liz6794 | ML | lud | lud_ML | 51.65 |
306 | GW_Liz_GBS_P_fusc | GW_Liz_GBS_P_fusc | fusc | fusc | fusc | 201.0 |
307 | GW_Liz_GBS_P_h_man | GW_Liz_GBS_P_h_man | hmand | hmand | hmand | 202.0 |
308 | GW_Liz_GBS_P_humei | GW_Liz_GBS_P_humei | hume | hume | hume | 203.0 |
309 | GW_Liz_GBS_P_inor | GW_Liz_GBS_P_inor | inor | inor | inor | 204.0 |
310 | GW_Liz_GBS_S_burk | GW_Liz_GBS_S_burk | burk | burk | burk | 205.0 |
Look at SNP positions:
GW_GenoData.positions
Row | chrom | position |
---|---|---|
String | Int64 | |
1 | gws103 | 809376 |
2 | gws103 | 809378 |
3 | gws103 | 809384 |
4 | gws103 | 893285 |
5 | gws103 | 893292 |
6 | gws103 | 893294 |
7 | gws103 | 893296 |
8 | gws103 | 893304 |
9 | gws103 | 893306 |
10 | gws103 | 893315 |
11 | gws103 | 893335 |
12 | gws103 | 893339 |
13 | gws103 | 893340 |
⋮ | ⋮ | ⋮ |
2431698 | gws913 | 2201 |
2431699 | gws913 | 2206 |
2431700 | gws913 | 2229 |
2431701 | gws913 | 2256 |
2431702 | gws913 | 2268 |
2431703 | gws913 | 2271 |
2431704 | gws913 | 2315 |
2431705 | gws913 | 2402 |
2431706 | gws913 | 6596 |
2431707 | gws913 | 6600 |
2431708 | gws915 | 7705 |
2431709 | gws915 | 7787 |
Filtering
Filter out duplicate runs
Some samples were run multiple times, in which case at least one is indicated with _rep
in Fst_group
column. For each individual we removed all except one of the runs from the dataset. In general we kept the one with the most reads.
= occursin.("_rep", GW_GenoData.indInfo.Fst_group)
selection = GenomicDiversity.filterInds(GW_GenoData, selection); GW_GenoData_indFiltered_1
Specific individuals filtered out:
11-element Vector{String}:
"GW_Armando_plate1_TTGW05_rep1"
"GW_Armando_plate2_IL2"
"GW_Armando_plate2_LN11"
"GW_Armando_plate2_TTGW05_rep3"
"GW_Armando_plate2_TTGW05_rep4"
"GW_Lane5_AB1"
"GW_Lane5_IL2"
"GW_Lane5_LN2"
"GW_Lane5_TL3"
"GW_Lane5_UY1"
"GW_Liz_GBS_Liz5101_R"
Filter specific individuals
If there are certain individuals that we want to filter out prior to any additional analysis, we can do so here by setting filter to true
and specifying the individual run names in filter_out_inds
. Here we will filter out outgroup species (which mostly had low read depth anyway):
= ["GW_Liz_GBS_P_fusc", "GW_Liz_GBS_P_h_man", "GW_Liz_GBS_P_humei", "GW_Liz_GBS_P_inor", "GW_Liz_GBS_S_burk"]
filter_out_inds
= GenomicDiversity.filterNamedInds(GW_GenoData_indFiltered_1,
GW_GenoData_indFiltered_2 filter_out_inds);
Specific individuals filtered out:
5-element Vector{String}:
"GW_Liz_GBS_P_fusc"
"GW_Liz_GBS_P_h_man"
"GW_Liz_GBS_P_humei"
"GW_Liz_GBS_P_inor"
"GW_Liz_GBS_S_burk"
Filter individuals based on missing genotypes
Here we determine number of missing SNPs per individual, and filter out those individual datasets with more than a certain percent of missing SNPs (40% for this round):
= 40 # this is the percentage threshold
SNPmissing_percent_allowed_per_ind = size(GW_GenoData_indFiltered_2.genotypes, 2)
loci_count = loci_count * SNPmissing_percent_allowed_per_ind/100
threshold_missing = sum(GW_GenoData_indFiltered_2.genotypes .== -1, dims=2)
numMissings .= numMissings
GW_GenoData_indFiltered_2.indInfo.numMissings = vec(numMissings .> threshold_missing) # the vec command converts to BitVector rather than BitMatrix--important below
selection println("Filtering out these individuals based on too many missing genotypes: ")
= GW_GenoData_indFiltered_2.indInfo[selection, :]
filtered_inds = GenomicDiversity.filterInds(GW_GenoData_indFiltered_2,
GW_GenoData_indFiltered_3
selection)println("Here are the remaining individuals: ")
println(GW_GenoData_indFiltered_3.indInfo)
Filtering out these individuals based on too many missing genotypes:
Specific individuals filtered out:
33-element Vector{String}:
"GW_Armando_plate1_JG08G02"
"GW_Armando_plate1_JG10G01"
"GW_Armando_plate1_NO_BC_TTGW05"
"GW_Armando_plate1_NO_DNA"
"GW_Armando_plate1_TTGW21"
"GW_Armando_plate1_TTGW71"
"GW_Armando_plate2_NO_BC_TTGW05"
"GW_Armando_plate2_NO_DNA"
"GW_Armando_plate2_TTGW15"
"GW_Lane5_AA10"
"GW_Lane5_DA6"
"GW_Lane5_LN11"
"GW_Liz_GBS_Liz5101"
⋮
"GW_Liz_GBS_Liz5172"
"GW_Liz_GBS_Liz5174"
"GW_Liz_GBS_Liz5176"
"GW_Liz_GBS_Liz5177"
"GW_Liz_GBS_Liz5180"
"GW_Liz_GBS_Liz5186"
"GW_Liz_GBS_Liz5187"
"GW_Liz_GBS_Liz5192"
"GW_Liz_GBS_Liz5195"
"GW_Liz_GBS_Liz6012"
"GW_Liz_GBS_Liz6203"
"GW_Liz_GBS_Liz6766"
Here are the remaining individuals:
261×7 DataFrame
Row │ ind ID location group Fst_group plot_order numMissings
│ String String31 String7 String15 String15 Float64 Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ GW_Armando_plate1_AB1 GW_Armando_plate1_AB1 AB vir vir 20.01 99270
2 │ GW_Armando_plate1_JF07G02 GW_Armando_plate1_JF07G02 ST plumb plumb 108.0 181208
3 │ GW_Armando_plate1_JF07G03 GW_Armando_plate1_JF07G03 ST plumb plumb 109.0 116334
4 │ GW_Armando_plate1_JF07G04 GW_Armando_plate1_JF07G04 ST plumb plumb 110.0 116065
5 │ GW_Armando_plate1_JF08G02 GW_Armando_plate1_JF08G02 ST plumb plumb 111.0 87194
6 │ GW_Armando_plate1_JF09G01 GW_Armando_plate1_JF09G01 ST plumb plumb 112.0 237912
7 │ GW_Armando_plate1_JF09G02 GW_Armando_plate1_JF09G02 ST plumb plumb 113.0 110439
8 │ GW_Armando_plate1_JF10G03 GW_Armando_plate1_JF10G03 ST plumb_vir plumb_vir 170.0 202891
9 │ GW_Armando_plate1_JF11G01 GW_Armando_plate1_JF11G01 ST plumb plumb 114.0 112391
10 │ GW_Armando_plate1_JF12G01 GW_Armando_plate1_JF12G01 ST plumb plumb 115.0 129490
11 │ GW_Armando_plate1_JF12G02 GW_Armando_plate1_JF12G02 ST plumb plumb 116.0 130718
12 │ GW_Armando_plate1_JF12G04 GW_Armando_plate1_JF12G04 ST_vi vir vir 24.001 133969
13 │ GW_Armando_plate1_JF13G01 GW_Armando_plate1_JF13G01 ST plumb plumb 117.0 107508
14 │ GW_Armando_plate1_JF15G03 GW_Armando_plate1_JF15G03 DV plumb plumb 103.0 110933
15 │ GW_Armando_plate1_JF16G01 GW_Armando_plate1_JF16G01 DV_vi plumb_vir vir 24.041 122599
16 │ GW_Armando_plate1_JF20G01 GW_Armando_plate1_JF20G01 MB plumb plumb 94.0 102740
17 │ GW_Armando_plate1_JF22G01 GW_Armando_plate1_JF22G01 MB plumb plumb 95.0 99501
18 │ GW_Armando_plate1_JF23G01 GW_Armando_plate1_JF23G01 VB plumb plumb 98.0 113052
19 │ GW_Armando_plate1_JF23G02 GW_Armando_plate1_JF23G02 VB plumb plumb 99.0 87860
20 │ GW_Armando_plate1_JF24G02 GW_Armando_plate1_JF24G02 VB plumb plumb 100.0 88105
21 │ GW_Armando_plate1_JF26G01 GW_Armando_plate1_JF26G01 ST plumb plumb 118.0 104432
22 │ GW_Armando_plate1_JF27G01 GW_Armando_plate1_JF27G01 ST plumb plumb 119.0 96001
23 │ GW_Armando_plate1_JF29G01 GW_Armando_plate1_JF29G01 ST plumb plumb 120.0 86530
24 │ GW_Armando_plate1_JF29G02 GW_Armando_plate1_JF29G02 ST plumb plumb 121.0 101762
25 │ GW_Armando_plate1_JF29G03 GW_Armando_plate1_JF29G03 ST plumb plumb 122.0 107333
26 │ GW_Armando_plate1_JG02G02 GW_Armando_plate1_JG02G02 PR plumb plumb 145.0 81977
27 │ GW_Armando_plate1_JG02G04 GW_Armando_plate1_JG02G04 PR plumb plumb 146.0 140445
28 │ GW_Armando_plate1_JG08G01 GW_Armando_plate1_JG08G01 ST plumb plumb 123.0 131448
29 │ GW_Armando_plate1_JG12G01 GW_Armando_plate1_JG12G01 ST plumb plumb 126.0 132475
30 │ GW_Armando_plate1_JG17G01 GW_Armando_plate1_JG17G01 ST plumb_vir plumb 127.0 106045
31 │ GW_Armando_plate1_RF20G01 GW_Armando_plate1_RF20G01 BJ obs_plumb plumb_BJ 77.501 102115
32 │ GW_Armando_plate1_RF29G02 GW_Armando_plate1_RF29G02 BJ obs_plumb plumb_BJ 77.502 99695
33 │ GW_Armando_plate1_TL3 GW_Armando_plate1_TL3 TL vir vir 11.01 119043
34 │ GW_Armando_plate1_TTGW01 GW_Armando_plate1_TTGW01 MN troch_MN troch_west 53.0 80329
35 │ GW_Armando_plate1_TTGW05_rep2 GW_Armando_plate1_TTGW05_rep2 MN troch_MN troch_west 53.0 95518
36 │ GW_Armando_plate1_TTGW06 GW_Armando_plate1_TTGW06 SU lud_Sukhto lud_central 47.0 80282
37 │ GW_Armando_plate1_TTGW07 GW_Armando_plate1_TTGW07 SU lud_Sukhto lud_central 47.0 153267
38 │ GW_Armando_plate1_TTGW10 GW_Armando_plate1_TTGW10 SU lud_Sukhto lud_central 47.0 97805
39 │ GW_Armando_plate1_TTGW11 GW_Armando_plate1_TTGW11 SU lud_Sukhto lud_central 47.0 93712
40 │ GW_Armando_plate1_TTGW13 GW_Armando_plate1_TTGW13 TH lud_Thallighar lud_central 43.0 87677
41 │ GW_Armando_plate1_TTGW17 GW_Armando_plate1_TTGW17 TH lud_Thallighar lud_central 43.0 100310
42 │ GW_Armando_plate1_TTGW19 GW_Armando_plate1_TTGW19 TH lud_Thallighar lud_central 43.0 100019
43 │ GW_Armando_plate1_TTGW22 GW_Armando_plate1_TTGW22 SR lud_Sural lud_central 45.0 95151
44 │ GW_Armando_plate1_TTGW23 GW_Armando_plate1_TTGW23 SR lud_Sural lud_central 45.0 218168
45 │ GW_Armando_plate1_TTGW29 GW_Armando_plate1_TTGW29 SR lud_Sural lud_central 45.0 101068
46 │ GW_Armando_plate1_TTGW52 GW_Armando_plate1_TTGW52 NG lud_Nainaghar lud_central 49.0 89055
47 │ GW_Armando_plate1_TTGW53 GW_Armando_plate1_TTGW53 NG lud_Nainaghar lud_central 49.0 76154
48 │ GW_Armando_plate1_TTGW55 GW_Armando_plate1_TTGW55 NG lud_Nainaghar lud_central 49.0 84162
49 │ GW_Armando_plate1_TTGW57 GW_Armando_plate1_TTGW57 NG lud_Nainaghar lud_central 49.0 93413
50 │ GW_Armando_plate1_TTGW58 GW_Armando_plate1_TTGW58 NG lud_Nainaghar lud_central 49.0 102386
51 │ GW_Armando_plate1_TTGW59 GW_Armando_plate1_TTGW59 NG lud_Nainaghar lud_central 49.0 98471
52 │ GW_Armando_plate1_TTGW63 GW_Armando_plate1_TTGW63 SP lud_Spiti troch_west 55.0 98425
53 │ GW_Armando_plate1_TTGW64 GW_Armando_plate1_TTGW64 SP lud_Spiti troch_west 55.0 111234
54 │ GW_Armando_plate1_TTGW65 GW_Armando_plate1_TTGW65 SP lud_Spiti troch_west 55.0 119729
55 │ GW_Armando_plate1_TTGW66 GW_Armando_plate1_TTGW66 SP lud_Spiti troch_west 55.0 105443
56 │ GW_Armando_plate1_TTGW68 GW_Armando_plate1_TTGW68 SP lud_Spiti troch_west 55.0 98099
57 │ GW_Armando_plate1_TTGW70 GW_Armando_plate1_TTGW70 SA lud_Sathrundi lud_Sath 41.0 84930
58 │ GW_Armando_plate1_TTGW72 GW_Armando_plate1_TTGW72 SA lud_Sathrundi lud_Sath 41.0 92370
59 │ GW_Armando_plate1_TTGW74 GW_Armando_plate1_TTGW74 SA lud_Sathrundi lud_Sath 41.0 325539
60 │ GW_Armando_plate1_TTGW78 GW_Armando_plate1_TTGW78 SA lud_Sathrundi lud_Sath 41.0 96540
61 │ GW_Armando_plate1_TTGW_15_05 GW_Armando_plate1_TTGW_15_05 SR lud_Sural lud_central 45.0 173819
62 │ GW_Armando_plate1_TTGW_15_07 GW_Armando_plate1_TTGW_15_07 SR lud_Sural lud_central 45.0 84861
63 │ GW_Armando_plate1_TTGW_15_08 GW_Armando_plate1_TTGW_15_08 SR lud_Sural lud_central 45.0 88322
64 │ GW_Armando_plate1_TTGW_15_09 GW_Armando_plate1_TTGW_15_09 SR lud_Sural lud_central 45.0 87883
65 │ GW_Armando_plate1_UY1 GW_Armando_plate1_UY1 UY plumb plumb 87.0 97598
66 │ GW_Armando_plate2_JE31G01 GW_Armando_plate2_JE31G01 VB_vi vir_misID vir 24.002 101749
67 │ GW_Armando_plate2_JF03G01 GW_Armando_plate2_JF03G01 ST_vi vir_misID vir 24.003 214510
68 │ GW_Armando_plate2_JF03G02 GW_Armando_plate2_JF03G02 VB_vi vir_misID vir 24.004 296518
69 │ GW_Armando_plate2_JF07G01 GW_Armando_plate2_JF07G01 ST plumb plumb 128.0 104578
70 │ GW_Armando_plate2_JF08G04 GW_Armando_plate2_JF08G04 ST plumb plumb 129.0 101010
71 │ GW_Armando_plate2_JF10G02 GW_Armando_plate2_JF10G02 ST plumb plumb 130.0 95393
72 │ GW_Armando_plate2_JF11G02 GW_Armando_plate2_JF11G02 ST plumb plumb 131.0 89277
73 │ GW_Armando_plate2_JF12G03 GW_Armando_plate2_JF12G03 ST plumb plumb 132.0 118134
74 │ GW_Armando_plate2_JF12G05 GW_Armando_plate2_JF12G05 ST plumb plumb 133.0 85768
75 │ GW_Armando_plate2_JF13G02 GW_Armando_plate2_JF13G02 ST plumb plumb 134.0 113146
76 │ GW_Armando_plate2_JF14G01 GW_Armando_plate2_JF14G01 DV plumb plumb 104.0 97185
77 │ GW_Armando_plate2_JF14G02 GW_Armando_plate2_JF14G02 DV plumb plumb 105.0 94078
78 │ GW_Armando_plate2_JF15G01 GW_Armando_plate2_JF15G01 DV plumb plumb 106.0 89126
79 │ GW_Armando_plate2_JF15G02 GW_Armando_plate2_JF15G02 DV plumb plumb 107.0 101787
80 │ GW_Armando_plate2_JF16G02 GW_Armando_plate2_JF16G02 DV_vi plumb_vir vir 24.042 125147
81 │ GW_Armando_plate2_JF19G01 GW_Armando_plate2_JF19G01 MB plumb plumb 96.0 94354
82 │ GW_Armando_plate2_JF20G02 GW_Armando_plate2_JF20G02 MB plumb plumb 97.0 78032
83 │ GW_Armando_plate2_JF24G01 GW_Armando_plate2_JF24G01 VB plumb plumb 101.0 113526
84 │ GW_Armando_plate2_JF24G03 GW_Armando_plate2_JF24G03 ST plumb plumb 135.0 77983
85 │ GW_Armando_plate2_JF25G01 GW_Armando_plate2_JF25G01 VB plumb plumb 102.0 86955
86 │ GW_Armando_plate2_JF26G02 GW_Armando_plate2_JF26G02 ST plumb plumb 136.0 79995
87 │ GW_Armando_plate2_JF27G02 GW_Armando_plate2_JF27G02 ST plumb plumb 137.0 97159
88 │ GW_Armando_plate2_JF30G01 GW_Armando_plate2_JF30G01 ST_vi vir_misID vir 24.005 117010
89 │ GW_Armando_plate2_JG01G01 GW_Armando_plate2_JG01G01 PR plumb plumb 147.0 97961
90 │ GW_Armando_plate2_JG02G01 GW_Armando_plate2_JG02G01 PR plumb plumb 148.0 81494
91 │ GW_Armando_plate2_JG02G03 GW_Armando_plate2_JG02G03 PR plumb plumb 149.0 96990
92 │ GW_Armando_plate2_JG10G02 GW_Armando_plate2_JG10G02 ST plumb plumb 138.0 78807
93 │ GW_Armando_plate2_JG10G03 GW_Armando_plate2_JG10G03 ST plumb plumb 139.0 91258
94 │ GW_Armando_plate2_JG12G02 GW_Armando_plate2_JG12G02 ST plumb plumb 140.0 102563
95 │ GW_Armando_plate2_JG12G03 GW_Armando_plate2_JG12G03 ST plumb plumb 141.0 91523
96 │ GW_Armando_plate2_LN2 GW_Armando_plate2_LN2 LN troch_LN troch_LN 58.01 77555
97 │ GW_Armando_plate2_RF29G01 GW_Armando_plate2_RF29G01 BJ obs_plumb plumb_BJ 77.503 94504
98 │ GW_Armando_plate2_TTGW02 GW_Armando_plate2_TTGW02 MN troch_MN troch_west 53.0 82355
99 │ GW_Armando_plate2_TTGW03 GW_Armando_plate2_TTGW03 MN troch_MN troch_west 53.0 93915
100 │ GW_Armando_plate2_TTGW08 GW_Armando_plate2_TTGW08 SU lud_Sukhto lud_central 47.0 79030
101 │ GW_Armando_plate2_TTGW09 GW_Armando_plate2_TTGW09 SU lud_Sukhto lud_central 47.0 124074
102 │ GW_Armando_plate2_TTGW12 GW_Armando_plate2_TTGW12 TH lud_Thallighar lud_central 43.0 80774
103 │ GW_Armando_plate2_TTGW14 GW_Armando_plate2_TTGW14 TH lud_Thallighar lud_central 43.0 91726
104 │ GW_Armando_plate2_TTGW16 GW_Armando_plate2_TTGW16 TH lud_Thallighar lud_central 43.0 78610
105 │ GW_Armando_plate2_TTGW18 GW_Armando_plate2_TTGW18 TH lud_Thallighar lud_central 43.0 90831
106 │ GW_Armando_plate2_TTGW20 GW_Armando_plate2_TTGW20 SR lud_Sural lud_central 45.0 70473
107 │ GW_Armando_plate2_TTGW24 GW_Armando_plate2_TTGW24 SR lud_Sural lud_central 45.0 88081
108 │ GW_Armando_plate2_TTGW25 GW_Armando_plate2_TTGW25 SR lud_Sural lud_central 45.0 113545
109 │ GW_Armando_plate2_TTGW27 GW_Armando_plate2_TTGW27 SR lud_Sural lud_central 45.0 92690
110 │ GW_Armando_plate2_TTGW28 GW_Armando_plate2_TTGW28 SR lud_Sural lud_central 45.0 90248
111 │ GW_Armando_plate2_TTGW50 GW_Armando_plate2_TTGW50 NG lud_Nainaghar lud_central 49.0 73271
112 │ GW_Armando_plate2_TTGW51 GW_Armando_plate2_TTGW51 NG lud_Nainaghar lud_central 49.0 82100
113 │ GW_Armando_plate2_TTGW54 GW_Armando_plate2_TTGW54 NG lud_Nainaghar lud_central 49.0 680557
114 │ GW_Armando_plate2_TTGW56 GW_Armando_plate2_TTGW56 NG lud_Nainaghar lud_central 49.0 77725
115 │ GW_Armando_plate2_TTGW60 GW_Armando_plate2_TTGW60 SP lud_Spiti troch_west 55.0 80730
116 │ GW_Armando_plate2_TTGW61 GW_Armando_plate2_TTGW61 SP lud_Spiti troch_west 55.0 96107
117 │ GW_Armando_plate2_TTGW62 GW_Armando_plate2_TTGW62 SP lud_Spiti troch_west 55.0 101894
118 │ GW_Armando_plate2_TTGW67 GW_Armando_plate2_TTGW67 SP lud_Spiti troch_west 55.0 145905
119 │ GW_Armando_plate2_TTGW69 GW_Armando_plate2_TTGW69 SP lud_Spiti troch_west 55.0 91534
120 │ GW_Armando_plate2_TTGW73 GW_Armando_plate2_TTGW73 SA lud_Sathrundi lud_Sath 41.0 72217
121 │ GW_Armando_plate2_TTGW75 GW_Armando_plate2_TTGW75 SA lud_Sathrundi lud_Sath 41.0 92128
122 │ GW_Armando_plate2_TTGW77 GW_Armando_plate2_TTGW77 SA lud_Sathrundi lud_Sath 41.0 75475
123 │ GW_Armando_plate2_TTGW79 GW_Armando_plate2_TTGW79 SA lud_Sathrundi lud_Sath 41.0 96324
124 │ GW_Armando_plate2_TTGW80 GW_Armando_plate2_TTGW80 SA lud_Sathrundi lud_Sath 41.0 75357
125 │ GW_Armando_plate2_TTGW_15_01 GW_Armando_plate2_TTGW_15_01 SR lud_Sural lud_central 45.0 91072
126 │ GW_Armando_plate2_TTGW_15_02 GW_Armando_plate2_TTGW_15_02 SR lud_Sural lud_central 45.0 72691
127 │ GW_Armando_plate2_TTGW_15_03 GW_Armando_plate2_TTGW_15_03 SR lud_Sural lud_central 45.0 74750
128 │ GW_Armando_plate2_TTGW_15_04 GW_Armando_plate2_TTGW_15_04 SR lud_Sural lud_central 45.0 193531
129 │ GW_Armando_plate2_TTGW_15_06 GW_Armando_plate2_TTGW_15_06 SR lud_Sural lud_central 45.0 76538
130 │ GW_Armando_plate2_TTGW_15_10 GW_Armando_plate2_TTGW_15_10 SR lud_Sural lud_central 45.0 219999
131 │ GW_Lane5_AA1 GW_Lane5_AA1 AA vir_S vir_S 25.0 717676
132 │ GW_Lane5_AA11 GW_Lane5_AA11 AA vir_S vir_S 34.0 736085
133 │ GW_Lane5_AA3 GW_Lane5_AA3 AA vir_S vir_S 26.0 662552
134 │ GW_Lane5_AA4 GW_Lane5_AA4 AA vir_S vir_S 27.0 665969
135 │ GW_Lane5_AA5 GW_Lane5_AA5 AA vir_S vir_S 28.0 718610
136 │ GW_Lane5_AA6 GW_Lane5_AA6 AA vir_S vir_S 29.0 778821
137 │ GW_Lane5_AA7 GW_Lane5_AA7 AA vir_S vir_S 30.0 724717
138 │ GW_Lane5_AA8 GW_Lane5_AA8 AA vir_S vir_S 31.0 922313
139 │ GW_Lane5_AA9 GW_Lane5_AA9 AA vir_S vir_S 32.0 791514
140 │ GW_Lane5_AB2 GW_Lane5_AB2 AB vir vir 21.0 915622
141 │ GW_Lane5_AN1 GW_Lane5_AN1 AN plumb plumb 80.0 719041
142 │ GW_Lane5_AN2 GW_Lane5_AN2 AN plumb plumb 81.0 684348
143 │ GW_Lane5_BK2 GW_Lane5_BK2 BK plumb plumb 78.0 666018
144 │ GW_Lane5_BK3 GW_Lane5_BK3 BK plumb plumb 79.0 665090
145 │ GW_Lane5_DA2 GW_Lane5_DA2 XN obs obs 73.0 655980
146 │ GW_Lane5_DA3 GW_Lane5_DA3 XN obs obs 74.0 696722
147 │ GW_Lane5_DA4 GW_Lane5_DA4 XN obs obs 75.0 634940
148 │ GW_Lane5_DA7 GW_Lane5_DA7 XN obs obs 77.0 733664
149 │ GW_Lane5_EM1 GW_Lane5_EM1 EM troch_EM troch_EM 72.0 711636
150 │ GW_Lane5_IL1 GW_Lane5_IL1 IL plumb plumb 82.0 698039
151 │ GW_Lane5_IL4 GW_Lane5_IL4 IL plumb plumb 83.0 726050
152 │ GW_Lane5_KS1 GW_Lane5_KS1 OV lud_KS lud_KS 40.0 751642
153 │ GW_Lane5_KS2 GW_Lane5_KS2 OV lud_KS lud_KS 40.0 820241
154 │ GW_Lane5_LN1 GW_Lane5_LN1 LN troch_LN troch_LN 57.0 707607
155 │ GW_Lane5_LN10 GW_Lane5_LN10 LN troch_LN troch_LN 64.0 783282
156 │ GW_Lane5_LN12 GW_Lane5_LN12 LN troch_LN troch_LN 66.0 884653
157 │ GW_Lane5_LN14 GW_Lane5_LN14 LN troch_LN troch_LN 67.0 753482
158 │ GW_Lane5_LN16 GW_Lane5_LN16 LN troch_LN troch_LN 68.0 738272
159 │ GW_Lane5_LN18 GW_Lane5_LN18 LN troch_LN troch_LN 69.0 711612
160 │ GW_Lane5_LN19 GW_Lane5_LN19 LN troch_LN troch_LN 70.0 731751
161 │ GW_Lane5_LN20 GW_Lane5_LN20 LN troch_LN troch_LN 71.0 738170
162 │ GW_Lane5_LN3 GW_Lane5_LN3 LN troch_LN troch_LN 59.0 702316
163 │ GW_Lane5_LN4 GW_Lane5_LN4 LN troch_LN troch_LN 60.0 683211
164 │ GW_Lane5_LN6 GW_Lane5_LN6 LN troch_LN troch_LN 61.0 690808
165 │ GW_Lane5_LN7 GW_Lane5_LN7 LN troch_LN troch_LN 62.0 758021
166 │ GW_Lane5_LN8 GW_Lane5_LN8 LN troch_LN troch_LN 63.0 662407
167 │ GW_Lane5_MN1 GW_Lane5_MN1 MN troch_MN troch_west 51.0 943574
168 │ GW_Lane5_MN12 GW_Lane5_MN12 MN troch_MN troch_west 56.0 671074
169 │ GW_Lane5_MN3 GW_Lane5_MN3 MN troch_MN troch_west 52.0 946478
170 │ GW_Lane5_MN5 GW_Lane5_MN5 MN troch_MN troch_west 53.0 752979
171 │ GW_Lane5_MN8 GW_Lane5_MN8 MN troch_MN troch_west 54.0 771625
172 │ GW_Lane5_MN9 GW_Lane5_MN9 MN troch_MN troch_west 55.0 897944
173 │ GW_Lane5_NA1 GW_Lane5_NA1 NR lud_PK lud_PK 39.2 919218
174 │ GW_Lane5_NA3-3ul GW_Lane5_NA3-3ul NR lud_PK lud_PK 39.2 798370
175 │ GW_Lane5_PT11 GW_Lane5_PT11 KL lud_KL lud_central 42.0 771797
176 │ GW_Lane5_PT12 GW_Lane5_PT12 KL lud_KL lud_central 42.0 792683
177 │ GW_Lane5_PT2 GW_Lane5_PT2 ML lud_ML lud_ML 51.0 760347
178 │ GW_Lane5_PT3 GW_Lane5_PT3 PA lud_PA lud_central 46.0 722177
179 │ GW_Lane5_PT4 GW_Lane5_PT4 PA lud_PA lud_central 46.0 705413
180 │ GW_Lane5_PT6 GW_Lane5_PT6 KL lud_KL lud_central 42.0 763723
181 │ GW_Lane5_SH1 GW_Lane5_SH1 SH lud_PK lud_PK 39.1 966761
182 │ GW_Lane5_SH2 GW_Lane5_SH2 SH lud_PK lud_PK 39.1 768646
183 │ GW_Lane5_SH4 GW_Lane5_SH4 SH lud_PK lud_PK 39.1 849641
184 │ GW_Lane5_SH5 GW_Lane5_SH5 SH lud_PK lud_PK 39.1 929394
185 │ GW_Lane5_SL1 GW_Lane5_SL1 SL plumb plumb 150.0 648883
186 │ GW_Lane5_SL2 GW_Lane5_SL2 SL plumb plumb 151.0 654733
187 │ GW_Lane5_ST1 GW_Lane5_ST1 ST plumb plumb 142.0 606242
188 │ GW_Lane5_ST12 GW_Lane5_ST12 ST plumb plumb 144.0 691266
189 │ GW_Lane5_ST3 GW_Lane5_ST3 ST plumb plumb 143.0 696992
190 │ GW_Lane5_STvi1 GW_Lane5_STvi1 ST_vi vir vir 22.0 690090
191 │ GW_Lane5_STvi2 GW_Lane5_STvi2 ST_vi vir vir 23.0 897337
192 │ GW_Lane5_STvi3 GW_Lane5_STvi3 ST_vi vir vir 24.0 768162
193 │ GW_Lane5_TA1 GW_Lane5_TA1 TA plumb plumb 86.0 711909
194 │ GW_Lane5_TL1 GW_Lane5_TL1 TL vir vir 9.0 743509
195 │ GW_Lane5_TL10 GW_Lane5_TL10 TL vir vir 17.0 669934
196 │ GW_Lane5_TL11 GW_Lane5_TL11 TL vir vir 18.0 638402
197 │ GW_Lane5_TL12 GW_Lane5_TL12 TL vir vir 19.0 585697
198 │ GW_Lane5_TL2 GW_Lane5_TL2 TL vir vir 10.0 770857
199 │ GW_Lane5_TL4 GW_Lane5_TL4 TL vir vir 12.0 758037
200 │ GW_Lane5_TL5 GW_Lane5_TL5 TL vir vir 13.0 867165
201 │ GW_Lane5_TL7 GW_Lane5_TL7 TL vir vir 14.0 803407
202 │ GW_Lane5_TL8 GW_Lane5_TL8 TL vir vir 15.0 698745
203 │ GW_Lane5_TL9 GW_Lane5_TL9 TL vir vir 16.0 606969
204 │ GW_Lane5_TU1 GW_Lane5_TU1 TU nit nit 35.0 793640
205 │ GW_Lane5_TU2 GW_Lane5_TU2 TU nit nit 36.0 736785
206 │ GW_Lane5_UY2 GW_Lane5_UY2 UY plumb plumb 88.0 729002
207 │ GW_Lane5_UY3 GW_Lane5_UY3 UY plumb plumb 89.0 677527
208 │ GW_Lane5_UY4 GW_Lane5_UY4 UY plumb plumb 90.0 749848
209 │ GW_Lane5_UY5 GW_Lane5_UY5 UY plumb plumb 91.0 718373
210 │ GW_Lane5_UY6 GW_Lane5_UY6 UY plumb plumb 92.0 713675
211 │ GW_Lane5_YK1 GW_Lane5_YK1 YK vir vir 1.0 831245
212 │ GW_Lane5_YK11 GW_Lane5_YK11 YK vir vir 8.0 730798
213 │ GW_Lane5_YK3 GW_Lane5_YK3 YK vir vir 2.0 731944
214 │ GW_Lane5_YK4 GW_Lane5_YK4 YK vir vir 3.0 740051
215 │ GW_Lane5_YK5 GW_Lane5_YK5 YK vir vir 4.0 738740
216 │ GW_Lane5_YK6 GW_Lane5_YK6 YK vir vir 5.0 697420
217 │ GW_Lane5_YK7 GW_Lane5_YK7 YK vir vir 6.0 692052
218 │ GW_Lane5_YK9 GW_Lane5_YK9 YK vir vir 7.0 768722
219 │ GW_Liz_GBS_Liz10045 GW_Liz_GBS_Liz10045 ML lud lud_ML 51.01 818407
220 │ GW_Liz_GBS_Liz10094 GW_Liz_GBS_Liz10094 ML lud lud_ML 51.02 905297
221 │ GW_Liz_GBS_Liz5144 GW_Liz_GBS_Liz5144 ML lud lud_ML 51.08 942026
222 │ GW_Liz_GBS_Liz5163 GW_Liz_GBS_Liz5163 ML lud_chick lud_ML 51.12 851723
223 │ GW_Liz_GBS_Liz5164 GW_Liz_GBS_Liz5164 ML lud_chick lud_ML 51.13 866612
224 │ GW_Liz_GBS_Liz5165 GW_Liz_GBS_Liz5165 ML lud lud_ML 51.14 939782
225 │ GW_Liz_GBS_Liz5167 GW_Liz_GBS_Liz5167 ML lud_chick lud_ML 51.15 932442
226 │ GW_Liz_GBS_Liz5168 GW_Liz_GBS_Liz5168 ML lud_chick lud_ML 51.16 944373
227 │ GW_Liz_GBS_Liz5173 GW_Liz_GBS_Liz5173 ML lud_chick lud_ML 51.2 961268
228 │ GW_Liz_GBS_Liz5175 GW_Liz_GBS_Liz5175 ML lud lud_ML 51.22 947557
229 │ GW_Liz_GBS_Liz5178 GW_Liz_GBS_Liz5178 ML lud_chick lud_ML 51.25 942983
230 │ GW_Liz_GBS_Liz5179 GW_Liz_GBS_Liz5179 ML lud_chick lud_ML 51.26 933796
231 │ GW_Liz_GBS_Liz5182 GW_Liz_GBS_Liz5182 ML lud_chick lud_ML 51.28 950069
232 │ GW_Liz_GBS_Liz5184 GW_Liz_GBS_Liz5184 ML lud_chick lud_ML 51.29 875068
233 │ GW_Liz_GBS_Liz5185 GW_Liz_GBS_Liz5185 ML lud lud_ML 51.3 948876
234 │ GW_Liz_GBS_Liz5188 GW_Liz_GBS_Liz5188 ML lud lud_ML 51.33 853346
235 │ GW_Liz_GBS_Liz5189 GW_Liz_GBS_Liz5189 ML lud_chick lud_ML 51.34 972180
236 │ GW_Liz_GBS_Liz5190 GW_Liz_GBS_Liz5190 ML lud_chick lud_ML 51.35 922566
237 │ GW_Liz_GBS_Liz5191 GW_Liz_GBS_Liz5191 ML lud_chick lud_ML 51.36 836413
238 │ GW_Liz_GBS_Liz5193 GW_Liz_GBS_Liz5193 ML lud_chick lud_ML 51.38 895586
239 │ GW_Liz_GBS_Liz5194 GW_Liz_GBS_Liz5194 ML lud_chick lud_ML 51.39 900099
240 │ GW_Liz_GBS_Liz5197 GW_Liz_GBS_Liz5197 ML lud lud_ML 51.41 808093
241 │ GW_Liz_GBS_Liz5199 GW_Liz_GBS_Liz5199 ML lud_chick lud_ML 51.42 953161
242 │ GW_Liz_GBS_Liz6002 GW_Liz_GBS_Liz6002 ML lud lud_ML 51.43 903717
243 │ GW_Liz_GBS_Liz6006 GW_Liz_GBS_Liz6006 ML lud lud_ML 51.44 872123
244 │ GW_Liz_GBS_Liz6008 GW_Liz_GBS_Liz6008 ML lud lud_ML 51.45 932032
245 │ GW_Liz_GBS_Liz6009 GW_Liz_GBS_Liz6009 ML lud lud_ML 51.46 946736
246 │ GW_Liz_GBS_Liz6010 GW_Liz_GBS_Liz6010 ML lud lud_ML 51.47 844804
247 │ GW_Liz_GBS_Liz6014 GW_Liz_GBS_Liz6014 ML lud lud_ML 51.49 828632
248 │ GW_Liz_GBS_Liz6055 GW_Liz_GBS_Liz6055 ML lud lud_ML 51.5 878978
249 │ GW_Liz_GBS_Liz6057 GW_Liz_GBS_Liz6057 ML lud lud_ML 51.51 850572
250 │ GW_Liz_GBS_Liz6060 GW_Liz_GBS_Liz6060 ML lud lud_ML 51.52 865228
251 │ GW_Liz_GBS_Liz6062 GW_Liz_GBS_Liz6062 ML lud lud_ML 51.53 834482
252 │ GW_Liz_GBS_Liz6063 GW_Liz_GBS_Liz6063 ML lud lud_ML 51.54 879233
253 │ GW_Liz_GBS_Liz6066 GW_Liz_GBS_Liz6066 ML lud lud_ML 51.55 876902
254 │ GW_Liz_GBS_Liz6072 GW_Liz_GBS_Liz6072 ML lud lud_ML 51.56 845266
255 │ GW_Liz_GBS_Liz6079 GW_Liz_GBS_Liz6079 ML lud lud_ML 51.57 820880
256 │ GW_Liz_GBS_Liz6204 GW_Liz_GBS_Liz6204 ML lud_chick lud_ML 51.59 840138
257 │ GW_Liz_GBS_Liz6461 GW_Liz_GBS_Liz6461 ML lud lud_ML 51.6 815684
258 │ GW_Liz_GBS_Liz6472 GW_Liz_GBS_Liz6472 ML lud lud_ML 51.61 907391
259 │ GW_Liz_GBS_Liz6478 GW_Liz_GBS_Liz6478 ML lud lud_ML 51.62 779125
260 │ GW_Liz_GBS_Liz6776 GW_Liz_GBS_Liz6776 ML lud lud_ML 51.64 962187
261 │ GW_Liz_GBS_Liz6794 GW_Liz_GBS_Liz6794 ML lud lud_ML 51.65 841253
Filter SNPs with too many missing genotypes:
= sum(GW_GenoData_indFiltered_3.genotypes .== -1, dims=1)
missing_genotypes_per_SNP = 5 # this is the percentage threshold
missing_genotypes_percent_allowed_per_site = size(GW_GenoData_indFiltered_3.genotypes)[1] * missing_genotypes_percent_allowed_per_site/100
threshold_genotypes_missing = vec(missing_genotypes_per_SNP .> threshold_genotypes_missing)
lociSelection = GenomicDiversity.filterLoci(GW_GenoData_indFiltered_3, lociSelection)
GW_GenoData_ind_SNP_filtered println("Started with ", size(GW_GenoData_indFiltered_3.genotypes, 2), " SNPs.
for no more than ", missing_genotypes_percent_allowed_per_site, "% missing genotypes among all individuals, ", size(GW_GenoData_ind_SNP_filtered.genotypes, 2), " SNPs remain." ) After filtering SNPs
Started with 2431709 SNPs.
After filtering SNPs for no more than 5% missing genotypes among all individuals, 1017581 SNPs remain.
2nd round of filtering individuals
I added this in August 2023, to improve accuracy of imputation-based PCA, because I noticed outliers tended to have more missing data. Now I only allow up to 10% missing SNPs per individual.
= 10 # this is the percentage threshold
SNPmissing_percent_allowed_per_ind_round2 = size(GW_GenoData_ind_SNP_filtered.genotypes, 2)
loci_count = loci_count * SNPmissing_percent_allowed_per_ind_round2/100
threshold_missing = sum(GW_GenoData_ind_SNP_filtered.genotypes .== -1, dims=2)
numMissings .= numMissings
GW_GenoData_ind_SNP_filtered.indInfo.numMissings = vec(numMissings .> threshold_missing) # the vec command converts to BitVector rather than BitMatrix--important below
selection println("Filtering out these individuals based on too many missing genotypes: ")
= GW_GenoData_ind_SNP_filtered.indInfo[selection, :]
filtered_inds = GenomicDiversity.filterInds(GW_GenoData_ind_SNP_filtered,
GW_GenoData_indFiltered
selection)println("This leaves ", size(GW_GenoData_indFiltered.genotypes, 1), " individuals and ", size(GW_GenoData_indFiltered.genotypes, 2), " loci,
missing more than ", SNPmissing_percent_allowed_per_ind_round2, "% of genotypes
with no individuals missing in more than ", missing_genotypes_percent_allowed_per_site, "% of individual genotypes.") and no loci
Filtering out these individuals based on too many missing genotypes:
Specific individuals filtered out:
4-element Vector{String}:
"GW_Armando_plate1_TTGW74"
"GW_Armando_plate2_TTGW54"
"GW_Lane5_AA8"
"GW_Lane5_YK1"
This leaves 257 individuals and 1017581 loci,
with no individuals missing more than 10% of genotypes
and no loci missing in more than 5% of individual genotypes.
Polish a few individual names (for more readable graphs)
This uses the correctNames()
function, defined further above.
= correctNames(GW_GenoData_indFiltered.indInfo.ind)
GW_GenoData_indFiltered.indInfo.ind = correctNames(GW_GenoData_indFiltered.indInfo.ID) GW_GenoData_indFiltered.indInfo.ID
257-element Vector{String}:
"GW_Armando_plate1_AB1"
"GW_Armando_plate1_JF07G02"
"GW_Armando_plate1_JF07G03"
"GW_Armando_plate1_JF07G04"
"GW_Armando_plate1_JF08G02"
"GW_Armando_plate1_JF09G01"
"GW_Armando_plate1_JF09G02"
"GW_Armando_plate1_JF10G03"
"GW_Armando_plate1_JF11G01"
"GW_Armando_plate1_JF12G01"
"GW_Armando_plate1_JF12G02"
"GW_Armando_plate1_JF12G04"
"GW_Armando_plate1_JF13G01"
⋮
"GW_Liz_GBS_Liz6060"
"GW_Liz_GBS_Liz6062"
"GW_Liz_GBS_Liz6063"
"GW_Liz_GBS_Liz6066"
"GW_Liz_GBS_Liz6072"
"GW_Liz_GBS_Liz6079"
"GW_Liz_GBS_Liz6204"
"GW_Liz_GBS_Liz6461"
"GW_Liz_GBS_Liz6472"
"GW_Liz_GBS_Liz6478"
"GW_Liz_GBS_Liz6776"
"GW_Liz_GBS_Liz6794"
Calculate distances around ring
The locations around the ring (assuming barrier in North) can be graphed against genomic PC1 (or other variables).
Load lat/long data
cd(repoDirectory)
= "metadata/GW_locations_LatLong_2023.txt"
latlong_filepath = DataFrame(CSV.File(latlong_filepath))
latlongs print(latlongs)
37×5 DataFrame
Row │ Location_name location_short lat_N long_E subspecies
│ String31 String7 Float64 Float64 String15
─────┼──────────────────────────────────────────────────────────────────────────────
1 │ Yekaterinburg YK 56.8 60.6 viridanus
2 │ Abakan AB 52.0 89.5 viridanus
3 │ Teletsk TL 51.7 87.6 viridanus
4 │ Verkh_Biryusa_vir VB_vi 55.9 92.0 viridanus
5 │ Divnogorsk_vir DV_vi 56.0 92.4 viridanus
6 │ Stolbi_vir ST_vi 55.9 92.6 viridanus
7 │ Turkey TU 41.0 42.0 nitidus
8 │ Ala_Archa AA 42.54 74.5 viridanus
9 │ Naran_Pakistan NR 34.884 73.691 ludlowi
10 │ Shogran_Pakistan SH 34.594 73.466 ludlowi
11 │ Overa_Kashmir OV 33.991 75.243 ludlowi
12 │ Satharundhi_ChambaDistrict SA 32.974 76.222 ludlowi
13 │ KL_Killar_HP KL 33.106 76.409 ludlowi
14 │ Thalighar TH 32.828 76.45 ludlowi
15 │ Sural SR 33.134 76.455 ludlowi
16 │ PA_Tindi_HP PA 32.771 76.472 ludlowi
17 │ Sukhto SU 32.868 76.855 ludlowi
18 │ Nainaghar NG 32.728 76.8594 ludlowi
19 │ Mooling_and_Keylong ML 32.508 76.981 ludlowi
20 │ Manali MN 32.237 77.13 ludlowi
21 │ Spiti SP 32.377 77.281 ludlowi
22 │ Langtang LN 28.25 85.5 trochiloides
23 │ Gongga GG 29.5 102.0 obscuratus
24 │ Emeishan EM 29.5 103.3 obscuratus
25 │ Xining XN 37.0 102.0 obscuratus
26 │ Beijing BJ 40.0 115.5 plumbeitarsus
27 │ Baikal BK 51.9 104.9 plumbeitarsus
28 │ Arshan AN 51.9 102.5 plumbeitarsus
29 │ Ilinka IL 51.1 95.5 plumbeitarsus
30 │ Tuva TA 51.3 92.0 plumbeitarsus
31 │ Uyukski UY 51.9 94.1 plumbeitarsus
32 │ Manskoe_Belogorie MB 54.7 94.0 plumbeitarsus
33 │ Verkh_Biryusa VB 55.9 92.0 plumbeitarsus
34 │ Divnogorsk DV 56.0 92.4 plumbeitarsus
35 │ Stolbi ST 55.9 92.6 plumbeitarsus
36 │ Predivinsk PR 57.1 93.5 plumbeitarsus
37 │ Solgonski SL 55.7 91.0 plumbeitarsus
Make a quick plot to inspect latlong data:
= CairoMakie.Figure()
f = Axis(f[1, 1],
ax = "Research locations",
title = "longitude (E)",
xlabel = "latitude (N)"
ylabel
)scatter!(latlongs.long_E, latlongs.lat_N)
text!(latlongs.long_E, latlongs.lat_N; text = latlongs.location_short)
f
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
(Ignore the warning that likely appears above. It is caused by an interaction between Quarto and the Makie graphing package, but has no consequence.)
remove “Green warbler” nitidus
Phylloscopus [t.] nitidus is outside of the main ring, so remove these samples from this analysis:
= latlongs[Not(latlongs.subspecies .== "nitidus"), :];
latlongs2 print(latlongs2)
36×5 DataFrame
Row │ Location_name location_short lat_N long_E subspecies
│ String31 String7 Float64 Float64 String15
─────┼──────────────────────────────────────────────────────────────────────────────
1 │ Yekaterinburg YK 56.8 60.6 viridanus
2 │ Abakan AB 52.0 89.5 viridanus
3 │ Teletsk TL 51.7 87.6 viridanus
4 │ Verkh_Biryusa_vir VB_vi 55.9 92.0 viridanus
5 │ Divnogorsk_vir DV_vi 56.0 92.4 viridanus
6 │ Stolbi_vir ST_vi 55.9 92.6 viridanus
7 │ Ala_Archa AA 42.54 74.5 viridanus
8 │ Naran_Pakistan NR 34.884 73.691 ludlowi
9 │ Shogran_Pakistan SH 34.594 73.466 ludlowi
10 │ Overa_Kashmir OV 33.991 75.243 ludlowi
11 │ Satharundhi_ChambaDistrict SA 32.974 76.222 ludlowi
12 │ KL_Killar_HP KL 33.106 76.409 ludlowi
13 │ Thalighar TH 32.828 76.45 ludlowi
14 │ Sural SR 33.134 76.455 ludlowi
15 │ PA_Tindi_HP PA 32.771 76.472 ludlowi
16 │ Sukhto SU 32.868 76.855 ludlowi
17 │ Nainaghar NG 32.728 76.8594 ludlowi
18 │ Mooling_and_Keylong ML 32.508 76.981 ludlowi
19 │ Manali MN 32.237 77.13 ludlowi
20 │ Spiti SP 32.377 77.281 ludlowi
21 │ Langtang LN 28.25 85.5 trochiloides
22 │ Gongga GG 29.5 102.0 obscuratus
23 │ Emeishan EM 29.5 103.3 obscuratus
24 │ Xining XN 37.0 102.0 obscuratus
25 │ Beijing BJ 40.0 115.5 plumbeitarsus
26 │ Baikal BK 51.9 104.9 plumbeitarsus
27 │ Arshan AN 51.9 102.5 plumbeitarsus
28 │ Ilinka IL 51.1 95.5 plumbeitarsus
29 │ Tuva TA 51.3 92.0 plumbeitarsus
30 │ Uyukski UY 51.9 94.1 plumbeitarsus
31 │ Manskoe_Belogorie MB 54.7 94.0 plumbeitarsus
32 │ Verkh_Biryusa VB 55.9 92.0 plumbeitarsus
33 │ Divnogorsk DV 56.0 92.4 plumbeitarsus
34 │ Stolbi ST 55.9 92.6 plumbeitarsus
35 │ Predivinsk PR 57.1 93.5 plumbeitarsus
36 │ Solgonski SL 55.7 91.0 plumbeitarsus
Make a matrix of great circle distances
These are Haversine distances, assuming spherical Earth which is really close:
= GeoLocation.(latlongs2.long_E, latlongs2.lat_N)
geoPoints # this next line is so neat--uses list comprehension to make a matrix of pairwise calculations
= [(HaversineDistance(geoPoints[i], geoPoints[j])/1000) for i in eachindex(geoPoints), j in eachindex(geoPoints)] distances
36×36 Matrix{Float64}:
0.0 1929.03 1829.5 1920.28 … 1956.21 1976.01 1866.47
1929.03 0.0 134.698 463.414 478.643 622.753 422.996
1829.5 134.698 0.0 548.935 570.583 711.032 497.776
1920.28 463.414 548.935 0.0 37.404 162.101 66.3387
1941.17 483.376 572.19 27.2735 16.6941 139.661 93.5371
1956.21 478.643 570.583 37.404 … 0.0 144.411 102.442
1866.88 1539.57 1417.12 1943.75 1971.61 2101.68 1883.0
2629.03 2280.66 2174.21 2720.63 2744.17 2882.26 2664.81
2653.46 2318.81 2212.24 2758.56 2782.15 2920.16 2702.67
2768.57 2304.58 2204.89 2753.66 2775.3 2915.75 2700.28
2905.38 2370.69 2276.33 2825.06 … 2845.24 2987.08 2773.58
2897.44 2350.4 2256.5 2805.15 2825.21 2967.15 2753.85
2927.82 2377.48 2284.21 2832.75 2852.64 2994.71 2781.67
⋮ ⋱ ⋮
4317.27 2390.71 2499.26 2464.31 2434.21 2474.86 2502.82
2869.29 1053.52 1187.02 953.003 … 918.603 934.047 1003.57
2724.34 889.835 1023.03 817.872 785.302 818.533 863.881
2341.58 426.63 551.809 581.592 567.027 679.716 591.859
2117.8 189.217 307.752 511.497 513.021 652.226 493.694
2214.04 315.404 447.369 465.516 455.478 579.504 468.914
2082.09 423.338 540.944 183.922 … 160.175 268.68 220.448
1920.28 463.414 548.935 0.0 37.404 162.101 66.3387
1941.17 483.376 572.19 27.2735 16.6941 139.661 93.5371
1956.21 478.643 570.583 37.404 0.0 144.411 102.442
1976.01 622.753 711.032 162.101 144.411 0.0 218.833
1866.47 422.996 497.776 66.3387 … 102.442 218.833 0.0
Now adjust distances to assume no gene flow through centre of ring.
# get some key distances
function getIndex(name; nameVector = latlongs2.Location_name)
findfirst(isequal(name), nameVector)
end
= getIndex("Ala_Archa")
index_AA = getIndex("Naran_Pakistan")
index_NR = getIndex("Langtang")
index_LN = getIndex("Gongga")
index_GG = getIndex("Xining")
index_XN = getIndex("Beijing")
index_BJ = nrow(latlongs2)
index_last
= distances[index_NR, index_LN]
dist_NR_to_LN = distances[index_LN, index_GG]
dist_LN_to_GG = distances[index_GG, index_BJ]
dist_GG_to_BJ
# This next part will assume locations in the input file are arranged in order around ring:
= Matrix{Float32}(undef, size(distances)[1], size(distances)[2])
distsAroundRing
# function for accepting straight-line great circle dists as distances between sets of sites
= function(straightGreatCircleDists, start, finish, distsAroundRing)
acceptDists :finish, start:finish] = straightGreatCircleDists[start:finish, start:finish]
distsAroundRing[startreturn(distsAroundRing)
end
# accept all distances within viridanus:
= acceptDists(distances, 1, index_AA, distsAroundRing)
distsAroundRing
# accept dist from AA to NR:
= acceptDists(distances, index_AA, index_NR, distsAroundRing)
distsAroundRing
# accept all distances from NR to LN:
= acceptDists(distances, index_NR, index_LN, distsAroundRing)
distsAroundRing
# accept dist from LN to GG:
= acceptDists(distances, index_LN, index_GG, distsAroundRing)
distsAroundRing
# accept dists between GG, EM, XN, BJ:
= acceptDists(distances, index_GG, index_BJ, distsAroundRing)
distsAroundRing
# accept all distances within plumbeitarsus:
= acceptDists(distances, index_BJ, index_last, distsAroundRing)
distsAroundRing
# function for adding up distances measured through certain sites:
= function(set1start, set1end, set2start, set2end, distsAroundRing)
addDists = repeat(distsAroundRing[set1start:(set1end-1), set1end], 1, set2end-set2start+1)
firstDists = repeat(transpose(distsAroundRing[set1end, set2start:set2end]), set1end-set1start, 1)
secondDists = firstDists + secondDists
totalDists :(set1end-1), set2start:set2end] = totalDists
distsAroundRing[set1start:set2end, set1start:(set1end-1)] = transpose(totalDists)
distsAroundRing[set2startreturn(distsAroundRing)
end
# dists from viridanus to NR are sum of dists to AA plus AA to NR:
= addDists(1, index_AA, index_NR, index_NR, distsAroundRing)
distsAroundRing
# dists from "northwest of PK" to Himalayas are sum of ringdists to NR plus NR to locations up to LN:
= addDists(1, index_NR, index_NR+1, index_LN, distsAroundRing)
distsAroundRing
# dists from "west / northwest of LN" to GG are sum of dists to LN plus LN to EM:
= addDists(1, index_LN, index_GG, index_GG, distsAroundRing)
distsAroundRing
# dists from "west / northwest of EM" to China are sum of dists to GG plus GG to (EM, XN, BJ):
= addDists(1, index_GG, index_GG, index_BJ, distsAroundRing)
distsAroundRing
# dists from "west of BJ" to east Siberia are sum of dists to BJ plus BJ to other plumbeitarsus:
= addDists(1, index_BJ, index_BJ+1, index_last, distsAroundRing); distsAroundRing
Do Principal Coordinates Analysis on the distances around the ring
This produces a single location axis around ring, going from west Siberia south, then east, then north to east Siberia.
= fit(MDS, distsAroundRing; distances=true, maxoutdim=1)
PCO_around_ring # add this as a column to the data frame:
= vec(-predict(PCO_around_ring))
latlongs2.LocationAroundRing :, [:location_short, :LocationAroundRing]]
latlongs2[println(latlongs2[:, [:location_short, :LocationAroundRing]])
36×2 DataFrame
Row │ location_short LocationAroundRing
│ String7 Float32
─────┼────────────────────────────────────
1 │ YK -4799.38
2 │ AB -4548.43
3 │ TL -4428.78
4 │ VB_vi -4953.25
5 │ DV_vi -4978.83
6 │ ST_vi -4980.5
7 │ AA -3027.02
8 │ NR -2171.62
9 │ SH -2166.63
10 │ OV -1998.24
11 │ SA -1861.95
12 │ KL -1854.67
13 │ TH -1835.41
14 │ SR -1852.63
15 │ PA -1830.4
16 │ SU -1805.38
17 │ NG -1796.96
18 │ ML -1774.64
19 │ MN -1747.34
20 │ SP -1743.06
21 │ LN -830.651
22 │ GG 783.671
23 │ EM 890.416
24 │ XN 1481.8
25 │ BJ 2480.12
26 │ BK 4027.02
27 │ AN 4134.32
28 │ IL 4451.78
29 │ TA 4673.03
30 │ UY 4581.05
31 │ MB 4762.52
32 │ VB 4943.39
33 │ DV 4929.77
34 │ ST 4913.15
35 │ PR 4951.66
36 │ SL 4982.05
Add these ring locations to the metadata table:
.= NaN # pre-allocate the column
GW_GenoData_indFiltered.indInfo.ring_km for i in axes(latlongs2, 1)
= findall(GW_GenoData_indFiltered.indInfo.location .== latlongs2.location_short[i])
match_indices .= latlongs2.LocationAroundRing[i];
GW_GenoData_indFiltered.indInfo.ring_km[match_indices] end
Make a table of locations and groups of individuals included after filtering
= groupby(GW_GenoData_indFiltered.indInfo, [:location, :Fst_group])
gdf = combine(gdf, nrow)
sample_origin_summary println(sample_origin_summary)
# to save as comma-delimited text:
# CSV.write("GW2025_sample_origin_summary.txt", sample_origin_summary; delim='\t')
37×3 DataFrame
Row │ location Fst_group nrow
│ String7 String15 Int64
─────┼──────────────────────────────
1 │ AB vir 2
2 │ ST plumb 35
3 │ ST plumb_vir 1
4 │ ST_vi vir 6
5 │ DV plumb 5
6 │ DV_vi vir 2
7 │ MB plumb 4
8 │ VB plumb 5
9 │ PR plumb 5
10 │ BJ plumb_BJ 3
11 │ TL vir 11
12 │ MN troch_west 10
13 │ SU lud_central 6
14 │ TH lud_central 7
15 │ SR lud_central 18
16 │ NG lud_central 9
17 │ SP troch_west 10
18 │ SA lud_Sath 8
19 │ UY plumb 6
20 │ VB_vi vir 2
21 │ LN troch_LN 14
22 │ AA vir_S 8
23 │ AN plumb 2
24 │ BK plumb 2
25 │ XN obs 4
26 │ EM troch_EM 1
27 │ IL plumb 2
28 │ OV lud_KS 2
29 │ NR lud_PK 2
30 │ KL lud_central 3
31 │ ML lud_ML 44
32 │ PA lud_central 2
33 │ SH lud_PK 4
34 │ SL plumb 2
35 │ TA plumb 1
36 │ TU nit 2
37 │ YK vir 7
Save the filtered dataset and other info for other Quarto pages
cd(dataDirectory)
= string(baseName, tagName, "ind_SNP_ind_filtered.jld2")
filename jldsave(filename; GW_GenoData_indFiltered,
repoDirectory,
dataDirectory,
scaffold_info,
scaffold_lengths,
baseName,
filenameTextMiddle,
missingGenotypeThreshold,
filenameTextEnd,
tagName,
chromosomes_to_process,
metadataFile)println("Saved the filtered data.")
Saved the filtered data.
The above saves the important data for other Quarto pages, which will load data as below:
Load the filtered dataset
(This block is not active here)
= string(baseName, tagName, "ind_SNP_ind_filtered.jld2")
filename # load info into a dictionary:
= load(filename)
d = d["GW_GenoData_indFiltered"]
GW_GenoData_indFiltered = d["repoDirectory"]
repoDirectory = d["dataDirectory"]
dataDirectory = d["scaffold_info"]
scaffold_info = d["scaffold_lengths"]
scaffold_lengths = d["baseName"]
baseName = d["filenameTextMiddle"]
filenameTextMiddle = d["missingGenotypeThreshold"]
missingGenotypeThreshold = d["filenameTextEnd"]
filenameTextEnd = d["tagName"]
tagName = d["chromosomes_to_process"]
chromosomes_to_process = ["metadataFile"] metadataFile
Genotype-by-individual plots
Now, show individual genotypes for subsets of the dataset. Can choose individuals and genomic regions to plot, along with an Fst cutoff (only show SNPs with greater Fst than the cutoff).
= ["vir","plumb"]
groups = ["vir","plumb_vir","plumb"]
plotGroups = ["blue","purple","red"]
plotGroupColors = [100,100,100] # maximum number of individuals to plot from each group
numIndsToPlot = "vir" # these groups will determine the color used in the graph
group1 = "plumb"
group2 = "vir_plumb"
groupsToCompare = 0.8
Fst_cutoff = 0.2; # only show SNPs with less than this fraction of missing data among individuals missingFractionAllowed
Calculate allele freqs and sample sizes
This uses groups indicated column Fst_group
in the metadata file.
= GenomicDiversity.getFreqsAndSampleSizes(GW_GenoData_indFiltered, groups)
freqs, sampleSizes println("Calculated population allele frequencies and sample sizes")
Calculated population allele frequencies and sample sizes
Calculate Fst
= GenomicDiversity.getFst(freqs,
Fst, FstNumerator, FstDenominator, pairwiseNamesFst
sampleSizes, groups)println("Calculated Fst values")
Calculated Fst values
Limit the individuals to include in plot
# Limit each group to specified numbers
= limitIndsToPlot(plotGroups, numIndsToPlot, GW_GenoData_indFiltered.genotypes, GW_GenoData_indFiltered.indInfo)
genotypes_included, indInfo_included # put the info for limited individuals into new GenoData object
= GenoData(indInfo_included, genotypes_included, GW_GenoData_indFiltered.positions);
GW_GenoData_indFiltered_included # Note to self: make this function more streamlined in next version of package,
# so that limitIndsToPlot() can work directly on a GenoData object.
Choose the scaffold and region to show
= "gw23"
chr # get the maximum position for the chromosome, for use later:
= GenomicDiversity.chooseChrRegion(GW_GenoData_indFiltered_included.positions, chr;
regionInfo =1, positionMax=scaffold_lengths[chr]) positionMin
("gw23", 1, 8435933, "chr gw23 1 to 8435933")
Now actually make the plot
= GenomicDiversity.plotGenotypeByIndividualWithFst(GW_GenoData_indFiltered_included,
plotInfo
groupsToCompare, Fst_cutoff, missingFractionAllowed,
regionInfo, Fst, pairwiseNamesFst,
freqs, plotGroups, plotGroupColors);# plotInfo contains a tuple with: (f, plottedGenotype, locations, plottedMetadata)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Choose another chromosome, and plot similarly to above
= "gw26"
chr = chooseChrRegion(GW_GenoData_indFiltered_included.positions, chr; positionMin=1, positionMax=scaffold_lengths[chr])
regionInfo = GenomicDiversity.plotGenotypeByIndividualWithFst(GW_GenoData_indFiltered_included,
plotInfo
groupsToCompare, Fst_cutoff, missingFractionAllowed,
regionInfo, Fst, pairwiseNamesFst, freqs, plotGroups, plotGroupColors);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Make a GBI plot to illustrate variation along west side of ring
= ["vir","troch_LN"] # for purpose of calculating pairwise Fst and Fst_group (to determine SNPs)
groups = ["vir","vir_S","nit", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN"]
plotGroups = ["blue","turquoise1","grey","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow"]
plotGroupColors = [10, 5, 2, 3, 5, 15, 3, 5, 10, 10] # maximum number of individuals to plot from each group
numIndsToPlot = "vir" # these groups will determine the color used in the graph
group1 = "troch_LN"
group2 = "vir_troch_LN" # "Fst_among"
groupsToCompare = 0.9
Fst_cutoff = 0.2 # only show SNPs with less than this fraction of missing data among individuals
missingFractionAllowed
# Calculate allele freqs and sample sizes (use column Fst_group)
= GenomicDiversity.getFreqsAndSampleSizes(GW_GenoData_indFiltered, groups)
freqs, sampleSizes println("Calculated population allele frequencies and sample sizes")
# calculate Fst
= GenomicDiversity.getFst(freqs,
Fst, FstNumerator, FstDenominator, pairwiseNamesFst
sampleSizes, groups)println("Calculated Fst values")
# Limit each group to specified numbers
= limitIndsToPlot(plotGroups, numIndsToPlot, GW_GenoData_indFiltered.genotypes, GW_GenoData_indFiltered.indInfo)
genotypes_included, indInfo_included # put the info for limited individuals into new GenoData object
= GenoData(indInfo_included, genotypes_included, GW_GenoData_indFiltered.positions);
GW_GenoData_indFiltered_included
# choose the scaffold and region to show
= "gw26"
chr = chooseChrRegion(GW_GenoData_indFiltered_included.positions, chr; positionMin=1, positionMax=scaffold_lengths[chr])
regionInfo
#### Now actually make the plot
= GenomicDiversity.plotGenotypeByIndividualWithFst(GW_GenoData_indFiltered_included,
plotInfo
groupsToCompare, Fst_cutoff, missingFractionAllowed,
regionInfo, Fst, pairwiseNamesFst, freqs, plotGroups, plotGroupColors);
Calculated population allele frequencies and sample sizes
Calculated Fst values
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show another chromosome along west side of ring:
# choose the scaffold and region to show
= "gw28"
chr = chooseChrRegion(GW_GenoData_indFiltered_included.positions, chr; positionMin=1, positionMax=scaffold_lengths[chr])
regionInfo
#### Now actually make the plot
= GenomicDiversity.plotGenotypeByIndividualWithFst(GW_GenoData_indFiltered_included,
plotInfo
groupsToCompare, Fst_cutoff, missingFractionAllowed,
regionInfo, Fst, pairwiseNamesFst, freqs, plotGroups, plotGroupColors);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Make a GBI plot to illustrate variation along east side of ring
= ["troch_LN","obs","plumb"] # for purpose of calculating pairwise Fst and Fst_group (to determine SNPs)
groups = ["troch_LN","troch_EM","obs","plumb_BJ","plumb"]
plotGroups = ["yellow","gold","orange","pink","red"]
plotGroupColors = [15, 15, 15, 15, 17] # maximum number of individuals to plot from each group
numIndsToPlot = "troch_LN" # these groups will determine the color used in the graph
group1 = "plumb"
group2 = "troch_LN_plumb" #"troch_LN_plumb" #"Fst_among"
groupsToCompare = 0.9
Fst_cutoff = 0.2 # only show SNPs with less than this fraction of missing data among individuals
missingFractionAllowed
# Calculate allele freqs and sample sizes (use column Fst_group)
= GenomicDiversity.getFreqsAndSampleSizes(GW_GenoData_indFiltered, groups)
freqs, sampleSizes println("Calculated population allele frequencies and sample sizes")
# calculate Fst
= GenomicDiversity.getFst(freqs,
Fst, FstNumerator, FstDenominator, pairwiseNamesFst
sampleSizes, groups)println("Calculated Fst values")
# Limit each group to specified numbers
= limitIndsToPlot(plotGroups, numIndsToPlot, GW_GenoData_indFiltered.genotypes, GW_GenoData_indFiltered.indInfo)
genotypes_included, indInfo_included # put the info for limited individuals into new GenoData object
= GenoData(indInfo_included, genotypes_included, GW_GenoData_indFiltered.positions);
GW_GenoData_indFiltered_included
# choose the scaffold and region to show
= "gw28"
chr = chooseChrRegion(GW_GenoData_indFiltered_included.positions, chr; positionMin=1, positionMax=scaffold_lengths[chr])
regionInfo
# need to remove two plumbeitarsus individuals that have introgression from viridanus, as that would otherwise be misleading in this figure since viridanus are not shown
= GenomicDiversity.filterNamedInds(GW_GenoData_indFiltered_included,
GW_GenoData_indFiltered_included_minus2 "GW_Armando_plate1_JF09G01", "GW_Armando_plate1_JF24G02"]);
[
#### Now actually make the plot
= GenomicDiversity.plotGenotypeByIndividualWithFst(GW_GenoData_indFiltered_included_minus2,
plotInfo
groupsToCompare, Fst_cutoff, missingFractionAllowed,
regionInfo, Fst, pairwiseNamesFst, freqs, plotGroups, plotGroupColors);
Calculated population allele frequencies and sample sizes
Calculated Fst values
Specific individuals filtered out:
2-element Vector{String}:
"GW_Armando_plate1_JF09G01"
"GW_Armando_plate1_JF24G02"
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Make GBI plots showing all individuals in the study
= ["vir","vir_S","nit", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN","troch_EM","obs","plumb_BJ","plumb","plumb_vir"]
groups_to_plot_all = ["blue","turquoise1","grey","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow","gold","orange","pink","red","purple"]; group_colors_all
This will use Fst among multiple groups to determine SNPs to plot. For chr 28:
= ["vir","lud_PK","troch_LN","obs","plumb"] # for purpose of calculating pairwise Fst and Fst_group (to determine SNPs)
groups = groups_to_plot_all
plotGroups = group_colors_all
plotGroupColors = "vir" # these groups will determine the color used in the graph
group1 = "plumb"
group2 = ["vir_plumb", "vir_troch_LN", "troch_LN_plumb"] # "Fst_among" #"vir_plumb"
groupsToCompare = 0.8
Fst_cutoff = 0.2 # only show SNPs with less than this fraction of missing data among individuals
missingFractionAllowed
# Calculate allele freqs and sample sizes (use column Fst_group)
= GenomicDiversity.getFreqsAndSampleSizes(GW_GenoData_indFiltered, groups)
freqs, sampleSizes println("Calculated population allele frequencies and sample sizes")
= GenomicDiversity.getFst(freqs,
Fst, FstNumerator, FstDenominator, pairwiseNamesFst
sampleSizes, groups)println("Calculated Fst values")
= "gw28"
chr = chooseChrRegion(GW_GenoData_indFiltered.positions, chr; positionMin=1, positionMax=scaffold_lengths[chr]) # this gets the maximum position for the chromosome
regionInfo
= GenomicDiversity.plotGenotypeByIndividualWithFst(GW_GenoData_indFiltered,
plotInfo
groupsToCompare, Fst_cutoff, missingFractionAllowed,
regionInfo, Fst, pairwiseNamesFst,
freqs, groups_to_plot_all, group_colors_all;=4, figureSize=(1200,1600),
indFontSize= ""); plotTitle
Calculated population allele frequencies and sample sizes
Calculated Fst values
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Make list of scaffolds to plot according to above:
= "gw" .* string.(vcat(28:-1:17, 15:-1:1))
scaffolds_to_plot push!(scaffolds_to_plot, "gw1A", "gw4A") # add two other scaffolds
29-element Vector{String}:
"gw28"
"gw27"
"gw26"
"gw25"
"gw24"
"gw23"
"gw22"
"gw21"
"gw20"
"gw19"
"gw18"
"gw17"
"gw15"
⋮
"gw10"
"gw9"
"gw8"
"gw7"
"gw6"
"gw5"
"gw4"
"gw3"
"gw2"
"gw1"
"gw1A"
"gw4A"
Loop through each scaffold and produce GBI plot. (Making inactive because already saved figs and takes long.)
for i in 1:length(scaffolds_to_plot)
= scaffolds_to_plot[i]
chr = chooseChrRegion(GW_GenoData_indFiltered, chr; positionMin=1, positionMax=scaffold_lengths[chr]) # this gets the maximum position for the chromosome
regionInfo = GenomicDiversity.plotGenotypeByIndividualWithFst(GW_GenoData_indFiltered,
plotInfo
groupsToCompare, Fst_cutoff, missingFractionAllowed,
regionInfo, Fst, pairwiseNamesFst,
freqs, plotGroups, plotGroupColors;=4, figureSize=(1200,1600), plotTitle = "");
indFontSizeprintln("Completed the figure for ", chr, ".")
if false # set to true to save plot
= string("Figure_", chr, "_Fst3groups_GBI_allInds_from_Julia.png")
filename save(filename, plotInfo[1], px_per_unit = 2.0)
println("Saved ", filename)
end
end
Impute missing genotypes, for PCA
Our goal is to produce plots showing individuals in genotype space, using Principal Components Analysis.
PCA requires imputation of missing genotypes. I did imputation for each scaffold above a certain size threshold. These are listed in chromosomes_to_process
.
Imputation can take several minutes per scaffold, so I ran this imputation step separately from this Quarto notebook (otherwise render would take long) and saved the genotype data for each scaffold for loading in the next step. There are two alternative algorithms that I’ve used for imputing: SVD (Singular Value Decomposition) and KNN (K Nearest Neighbours), both of which are implemented in the Impute.jl package. After much testing and thought, I decided to use KNN imputation (with K=1) on each chromosome, and then combine chromosomes for the genome-wide PCA. For PCAs of small genomic regions, I use SVD.
The code for SVD imputing is below (not actually used here):
for i in eachindex(chromosomes_to_process)
= chromosomes_to_process[i]
chrom = string("chr", chrom)
regionText = GW_GenoData_indFiltered.indInfo
ind_with_metadata_indFiltered = (GW_GenoData_indFiltered.positions.chrom .== chrom)
loci_selection = GW_GenoData_indFiltered.positions[loci_selection,:]
pos_SNP_filtered_region = Matrix{Union{Missing, Float32}}(GW_GenoData_indFiltered.genotypes[:,loci_selection])
genosOnly_region_for_imputing @time imputed_genos = Impute.svd(genosOnly_region_for_imputing)
= string(baseName, tagName, regionText, ".SVDimputedMissing.jld2")
filename jldsave(filename; imputed_genos, ind_with_metadata_indFiltered, pos_SNP_filtered_region)
println(string("Chromosome ", chrom, ": Saved real and imputed genotypes for ", size(pos_SNP_filtered_region, 1)," SNPs and ", size(genosOnly_region_for_imputing, 1)," individuals."))
end
Imputation using KNN
Troyanskaya et al. (2001) recommend imputation using K-nearest neighbors approach as being better than SVD, both of which are better than other methods for DNA genotyping. They also recommend using Euclidian distance. So I use KNN with Euclidian distance for imputation along whole chromosomes. I am going with setting dims
to :rows
, as that seems to run much faster and produces PCAs that make a lot of sense. I’ve already run this next code cell, which does the imputing and saves the imputed data matrix for each scaffold. This actually runs quite quickly–at most a couple minutes per scaffold. :)
for i in eachindex(chromosomes_to_process)
= chromosomes_to_process[i]
chrom = string("chr", chrom)
regionText = GW_GenoData_indFiltered.indInfo
ind_with_metadata_indFiltered = (GW_GenoData_indFiltered.positions.chrom .== chrom)
loci_selection = GW_GenoData_indFiltered.positions[loci_selection,:]
pos_SNP_filtered_region = Matrix{Union{Missing, Float32}}(GW_GenoData_indFiltered.genotypes[:,loci_selection])
genosOnly_region_for_imputing @time imputed_genos = Impute.knn(genosOnly_region_for_imputing; dims = :rows) # much faster with the "dims = :rows" there
= string(baseName, tagName, regionText, ".KNNimputedMissing.jld2")
filename jldsave(filename; imputed_genos, ind_with_metadata_indFiltered, pos_SNP_filtered_region)
println(string("Chromosome ", chrom, ": Saved real and imputed genotypes for ", size(pos_SNP_filtered_region, 1)," SNPs and ", size(genosOnly_region_for_imputing, 1)," individuals."))
end
Now do the KNN PCAs:
Now we can cycle through a set of chromosomes and plot a PCA for each. We need to first specify some groups to include in the plot, and their colors:
= ["vir","vir_S","nit", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN","troch_EM","obs","plumb_BJ","plumb","plumb_vir"]
groups_to_plot_PCA = ["blue","turquoise1","grey","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow","gold","orange","pink","red","purple"]; group_colors_PCA
Now we can actually do the PCA and make the plot for each scaffold. (Making this inactive for now, because it makes a lot of plots.)
for i in eachindex(chromosomes_to_process)
= chromosomes_to_process[i]
scaffold = string("chr", scaffold)
regionText = string(baseName, tagName, regionText, ".SVDimputedMissing.jld2")
filename = load(filename, "imputed_genos")
imputed_genos = load(filename, "ind_with_metadata_indFiltered")
ind_with_metadata_indFiltered = load(filename, "pos_SNP_filtered_region")
pos_SNP_filtered_region println(string("Loaded ",filename))
println(string(regionText, ": ", size(imputed_genos,2), " SNPs from ", size(imputed_genos,1), " individuals"))
= true
flipPC1 = true
flipPC2 = plotPCA(imputed_genos, ind_with_metadata_indFiltered,
PCAmodel
groups_to_plot_PCA, group_colors_PCA; = "greenish warblers", regionText=regionText,
sampleSet = flipPC1, flip2 = flipPC2,
flip1 = 0.7, fillOpacity = 0.6,
lineOpacity = 14, showTitle = true,
symbolSize = string("Chromosome ", scaffold," PC1"), yLabelText = string("Chromosome ", scaffold," PC2"),
xLabelText = false)
showPlot # add position of reference genome
= predict(PCAmodel.model, zeros(size(imputed_genos, 2)))
refGenomePCAposition && (refGenomePCAposition[1] *= -1) # this flips PC1 if flipPC1 = true
flipPC1 && (refGenomePCAposition[2] *= -1) # same for PC2
flipPC2 scatter!(refGenomePCAposition[1], refGenomePCAposition[2], marker = :diamond, color="black", markersize=15, strokewidth=0.5)
CairoMakie.try
display(PCAmodel.PCAfig)
catch
println("NOTICE: Figure for ", regionText, " could not be shown due to an unknown error.")
end
end
More genotype-by-individual plots (some examples)
GBI plot showing just west individuals:
= ["vir","lud_PK","troch_LN"] # for purpose of calculating pairwise Fst and Fst_group (to determine SNPs)
groups = ["vir", "vir_misID", "vir_S", "nit", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML", "troch_west", "troch_LN"] # west GW individuals
plotGroups = ["blue", "blue", "turquoise1", "grey", "seagreen4", "seagreen3", "seagreen2", "olivedrab3", "olivedrab2", "olivedrab1", "yellow"]
plotGroupColors = "vir" # these groups will determine the color used in the graph
group1 = "troch_LN"
group2 = "Fst_among" #"vir_plumb"
groupsToCompare = 0.8
Fst_cutoff = 0.2 # only show SNPs with less than this fraction of missing data among individuals
missingFractionAllowed
# Calculate allele freqs and sample sizes (use column Fst_group)
= GenomicDiversity.getFreqsAndSampleSizes(GW_GenoData_indFiltered, groups)
freqs, sampleSizes println("Calculated population allele frequencies and sample sizes")
= GenomicDiversity.getFst(freqs,
Fst, FstNumerator, FstDenominator, pairwiseNamesFst
sampleSizes, groups; =true) # set among to FALSE if no among Fst wanted (some things won't work without it)
amongprintln("Calculated Fst values")
= ["gw1"]
scaffolds_to_plot
for i in 1:length(scaffolds_to_plot)
= scaffolds_to_plot[i]
chr = chooseChrRegion(GW_GenoData_indFiltered.positions, chr; positionMin=1, positionMax=scaffold_lengths[chr]) # this gets the maximum position for the chromosome
regionInfo = GenomicDiversity.plotGenotypeByIndividualWithFst(GW_GenoData_indFiltered,
plotInfo
groupsToCompare, Fst_cutoff, missingFractionAllowed,
regionInfo, Fst, pairwiseNamesFst,
freqs, plotGroups, plotGroupColors;=4, figureSize=(1200,1600));
indFontSizeprintln("Completed the figure for ", chr, ".")
end
Calculated population allele frequencies and sample sizes
Calculated Fst values
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Completed the figure for gw1.
Make GBI plots showing just east individuals:
= ["troch_LN","obs","plumb"] # for purpose of calculating pairwise Fst and Fst_group (to determine SNPs)
groups = ["troch_LN", "troch_EM", "obs", "plumb_BJ", "plumb"] # east GW individuals
plotGroups = ["yellow", "gold", "orange", "pink", "red"]
plotGroupColors = "troch_LN" # these groups will determine the color used in the graph
group1 = "plumb"
group2 = "Fst_among" #"vir_plumb"
groupsToCompare = 0.8
Fst_cutoff = 0.2 # only show SNPs with less than this fraction of missing data among individuals
missingFractionAllowed
# Calculate allele freqs and sample sizes (use column Fst_group)
= GenomicDiversity.getFreqsAndSampleSizes(GW_GenoData_indFiltered, groups)
freqs, sampleSizes println("Calculated population allele frequencies and sample sizes")
= GenomicDiversity.getFst(freqs,
Fst, FstNumerator, FstDenominator, pairwiseNamesFst
sampleSizes, groups; =true) # set among to FALSE if no among Fst wanted (some things won't work without it)
amongprintln("Calculated Fst values")
for i in 1:length(scaffolds_to_plot)
= scaffolds_to_plot[i]
chr = chooseChrRegion(GW_GenoData_indFiltered.positions, chr; positionMin=1, positionMax=scaffold_lengths[chr])
regionInfo = GenomicDiversity.plotGenotypeByIndividualWithFst(GW_GenoData_indFiltered,
plotInfo
groupsToCompare, Fst_cutoff, missingFractionAllowed,
regionInfo, Fst, pairwiseNamesFst,
freqs, plotGroups, plotGroupColors;=4, figureSize=(1200,1600));
indFontSizeprintln("Completed the figure for ", chr, ".")
end
Calculated population allele frequencies and sample sizes
Calculated Fst values
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Completed the figure for gw1.