Greenish Warbler heterozygosity variance analysis

Author

Darren Irwin

Published

March 10, 2025

This page shows the code used to conduct the analysis of haploblocks in the Greenish Warbler ring species, using the ViSHet (Variance in Standardized Heterozygosity) statistic.

Prior to examining the code on this page, readers should look at GreenishWarblerGenomics2025.qmd (or .html) and GW_Zchromosome_analysis.qmd (or .html), as this current page depends on the code on those pages being run first.

Citation

The scripts, data, and figures shown in this website were used as the basis for the paper listed below, which should be cited as the source of information from this website:

Irwin, D., S. Bensch, C. Charlebois, G. David, A. Geraldes, S.K. Gupta, B. Harr, P. Holt, J.H. Irwin, V.V. Ivanitskii, I.M. Marova, Y. Niu, S. Seneviratne, A. Singh, Y. Wu, S. Zhang, T.D. Price. 2025. The distribution and dispersal of large haploblocks in a superspecies. Molecular Ecology, in press.

A note about plots in this document

The plots shown below may differ somewhat in appearance between the version produced by Quarto (i.e., in this published document) and the version you would get if you run this code without using Quarto. In particular, the dimensions and font sizes of labels and titles may differ. So if you want the versions identical to those used in the paper, run the code directly in the Julia REPL (or using an environment such as VS Code) without using Quarto.

In the rendered (.html) version of this Quarto notebook, each figure may be accompanied by a warning caused by an interaction between Quarto and the Makie plotting package. Ignore these warnings as they do not affect the calculations or plots.

Load packages

using JLD2 # for loading saved data
using DataFrames # for storing data as type DataFrame
using CairoMakie # for plots
using Impute # for imputing missing genotypes
using Statistics # for var() function
using MultivariateStats # for getting variances from PCA model
using CSV # for reading in delimited files
using DelimitedFiles # for reading delimited files (the genotypic data)

Load my custom package GenomicDiversity:

using GenomicDiversity

Choose working directory

Adjust as appropriate for your computer:

dataDirectory = "/Users/darrenirwin/Dropbox/Darren's current work/"
cd(dataDirectory)

Load the filtered dataset

This dataset was produced through filtering in GreenishWarblerGenomics2025.qmd:

baseName = "GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome"
tagName = ".Jan2025."
filename = string(baseName, tagName, "ind_SNP_ind_filtered.jld2")
# load info into a dictionary:
d = load(filename)
if baseName != d["baseName"]
    println("WARNING: baseNames don't match between that defined above and in the saved file")
end
if tagName != d["tagName"]
    println("WARNING: tagNames don't match between that defined above and in the saved file")
end
GW_GenoData_indFiltered = d["GW_GenoData_indFiltered"]
repoDirectory = d["repoDirectory"]
dataDirectory = d["dataDirectory"]
scaffold_info = d["scaffold_info"]
scaffold_lengths = d["scaffold_lengths"]
filenameTextMiddle = d["filenameTextMiddle"]
missingGenotypeThreshold = d["missingGenotypeThreshold"]
filenameTextEnd = d["filenameTextEnd"]
chromosomes_to_process = d["chromosomes_to_process"]
metadataFile = d["metadataFile"]
println("Loaded the filtered data.")
Loaded the filtered data.

Also define correctNames() function as in main script, to correct some names:

function correctNames(metadataColumn)
        metadataColumn_corrected = replace(metadataColumn, "GW_Armando_plate1_TTGW05_rep2" => "GW_Armando_plate1_TTGW05r2",
        "GW_Lane5_NA3-3ul" => "GW_Lane5_NA3",
        "GW_Armando_plate1_TTGW_15_05" => "GW_Armando_plate1_TTGW-15-05",
        "GW_Armando_plate1_TTGW_15_07" => "GW_Armando_plate1_TTGW-15-07",
        "GW_Armando_plate1_TTGW_15_08" => "GW_Armando_plate1_TTGW-15-08",
        "GW_Armando_plate1_TTGW_15_09" => "GW_Armando_plate1_TTGW-15-09",
        "GW_Armando_plate1_TTGW_15_01" => "GW_Armando_plate1_TTGW-15-01",
        "GW_Armando_plate1_TTGW_15_02" => "GW_Armando_plate1_TTGW-15-02",   
        "GW_Armando_plate1_TTGW_15_03" => "GW_Armando_plate1_TTGW-15-03",
        "GW_Armando_plate1_TTGW_15_04" => "GW_Armando_plate1_TTGW-15-04",
        "GW_Armando_plate1_TTGW_15_06" => "GW_Armando_plate1_TTGW-15-06",
        "GW_Armando_plate1_TTGW_15_10" => "GW_Armando_plate1_TTGW-15-10",
        "GW_Armando_plate2_TTGW_15_01" => "GW_Armando_plate2_TTGW-15-01",
        "GW_Armando_plate2_TTGW_15_02" => "GW_Armando_plate2_TTGW-15-02",
        "GW_Armando_plate2_TTGW_15_03" => "GW_Armando_plate2_TTGW-15-03",
        "GW_Armando_plate2_TTGW_15_04" => "GW_Armando_plate2_TTGW-15-04",
        "GW_Armando_plate2_TTGW_15_06" => "GW_Armando_plate2_TTGW-15-06",
        "GW_Armando_plate2_TTGW_15_10" => "GW_Armando_plate2_TTGW-15-10") 
end
correctNames (generic function with 1 method)

Replace the Z chromosome SNPs with the filtered Z chromosome SNPs

# remove the Z SNPs from the big dataset loaded above:
selection = (GW_GenoData_indFiltered.positions.chrom .!= "gwZ")
GW_GenoData_indFiltered.positions = GW_GenoData_indFiltered.positions[selection, :]
GW_GenoData_indFiltered.genotypes = GW_GenoData_indFiltered.genotypes[: , selection]

# load and add the Z filtered SNPs:
filename = string(baseName, tagName, "chrgwZ_cleaned.notImputed.jld2")
genosOnly_chrgwZ_cleaned = load(filename, "genotypes_gwZ_SNPfiltered")
ind_with_metadata_indFiltered_chrgwZ_cleaned = load(filename, "ind_with_metadata_indFiltered")
pos_SNP_filtered_chrgwZ_cleaned = load(filename, "pos_SNP_filtered_region")

if GW_GenoData_indFiltered.indInfo.ind != ind_with_metadata_indFiltered_chrgwZ_cleaned.ind
    println("Warning: the list of individuals in the big file and Z file are not completely identical.")
end

GW_GenoData_indFiltered.positions = vcat(GW_GenoData_indFiltered.positions, pos_SNP_filtered_chrgwZ_cleaned)
GW_GenoData_indFiltered.genotypes = hcat(GW_GenoData_indFiltered.genotypes, genosOnly_chrgwZ_cleaned)
println("Replaced the Z chromosome data with the filtered Z data.")

# copy the sex column so we can use later:
GW_GenoData_indFiltered.indInfo.sex = ind_with_metadata_indFiltered_chrgwZ_cleaned.sex;
Replaced the Z chromosome data with the filtered Z data.

Adjust sample plotting order

This sets the genotype-by-individual plots to arrange sample sites according to ring_km, and sets the nitidus samples to -2500 km and the Siberian hybrid to 5000 km (just for plotting):

GW_GenoData_indFiltered.indInfo.original_plot_order = GW_GenoData_indFiltered.indInfo.plot_order
GW_GenoData_indFiltered.indInfo.plot_order = copy(GW_GenoData_indFiltered.indInfo.ring_km) # copy, so the edits below do not also overwrite ring_km
GW_GenoData_indFiltered.indInfo.plot_order[GW_GenoData_indFiltered.indInfo.Fst_group .== "nit"] .= -2500
GW_GenoData_indFiltered.indInfo.plot_order[GW_GenoData_indFiltered.indInfo.Fst_group .== "plumb_vir"] .= 5000;

Prepare data for Genotype-by-individual plots and PCA

For missing genotypes, change our code of -1 to missing (a special data type meaning missing data, for the later imputation step):

GW_GenoData_indFiltered.genotypes = Matrix{Union{Missing, Int16}}(GW_GenoData_indFiltered.genotypes)
GW_GenoData_indFiltered.genotypes = replace(GW_GenoData_indFiltered.genotypes, -1 => missing)
257×1015750 Matrix{Union{Missing, Int16}}:
 0  0  0  0  0  1  0  0         0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  1         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 ⋮              ⋮                     ⋱        ⋮              ⋮           
 0  0  0  0  0  0  0  0         0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0   missing  0  0  …  0  0  0  2  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0  …  0  0  0  1  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0         0  0     0  0  0  0  0  0  0  0  0  0  0  0

Make list of scaffolds to plot:

scaffolds_to_plot = "gw" .* string.(vcat(1, "1A", 2:4, "4A", 5:15, 17:28, "Z"))
30-element Vector{String}:
 "gw1"
 "gw1A"
 "gw2"
 "gw3"
 "gw4"
 "gw4A"
 "gw5"
 "gw6"
 "gw7"
 "gw8"
 "gw9"
 "gw10"
 "gw11"
 ⋮
 "gw18"
 "gw19"
 "gw20"
 "gw21"
 "gw22"
 "gw23"
 "gw24"
 "gw25"
 "gw26"
 "gw27"
 "gw28"
 "gwZ"

Determine number of SNPs in the chromosomes above

sum(map(in(scaffolds_to_plot), GW_GenoData_indFiltered.positions.chrom))
1003924

This reports that 1003924 SNPs are within the listed chromosomes. (Good because that matches the number as determined in GW_PCAplots.qmd)

Choose groups and colors

groups_to_plot_PCA = ["vir","vir_S","nit", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN","troch_EM","obs","plumb_BJ","plumb","plumb_vir"]
group_colors_PCA = ["blue","turquoise1","grey","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow","gold","orange","pink","red","purple"];

Show windowed heterozygosity for individuals (for one example scaffold)

# option to select a subset of individuals
filterGroups = false  # false means include all individuals

groupsToInclude = ["lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west"]
numIndsToPlot = [1000, 1000, 1000, 1000, 1000, 1000]

if filterGroups
    genosOnly_included, ind_with_metadata_included = limitIndsToPlot(groupsToInclude, numIndsToPlot, GW_GenoData_indFiltered.genotypes, GW_GenoData_indFiltered.indInfo)
else
    genosOnly_included = GW_GenoData_indFiltered.genotypes
    ind_with_metadata_included = GW_GenoData_indFiltered.indInfo
end

chr = "gw15"
windowSize = 500
loci_selection = (GW_GenoData_indFiltered.positions.chrom .== chr)
pos_region = GW_GenoData_indFiltered.positions[loci_selection, :]
genotypes_region = genosOnly_included[:, loci_selection]

windowedPos, windowedIndHet = getWindowedIndHet(genotypes_region, pos_region, windowSize)

plotTitle = string("Windowed heterozygosity of ", size(windowedIndHet, 1), " individuals")
titleSize = 24
xLabelText = string("Location on scaffold ", chr)
yLabelText = "Heterozygosity"
labelSize = 24
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title=plotTitle, titlesize=titleSize,
    xlabel=xLabelText, xlabelsize=labelSize,
    ylabel=yLabelText, ylabelsize=labelSize)
lines!(windowedPos, windowedIndHet[1, :])
for i in 2:size(windowedIndHet, 1)
    lines!(windowedPos, windowedIndHet[i, :])
end
meanPerWindow_windowedIndHet = sum(windowedIndHet, dims=1) ./ size(windowedIndHet, 1)
lines!(windowedPos, vec(meanPerWindow_windowedIndHet), linewidth=10, color=:red)
display(f)

windowedIndHet_standardized = standardizeIndHet(windowedIndHet)

plotTitle = string("Standardized heterozygosity of ", size(windowedIndHet, 1), " individuals")
titleSize = 24
xLabelText = string("Location on scaffold ", chr)
yLabelText = "Standardized heterozygosity"
labelSize = 24
g = CairoMakie.Figure()
ax = Axis(g[1, 1],
    title=plotTitle, titlesize=titleSize,
    xlabel=xLabelText, xlabelsize=labelSize,
    ylabel=yLabelText, ylabelsize=labelSize)
for i in 1:size(windowedIndHet_standardized, 1)
    lines!(windowedPos, windowedIndHet_standardized[i, :])
end
display(g)

# Now graph the variance in standardized heterozygosity

windowedViSHet = getWindowedViSHet(windowedIndHet_standardized)

plotTitle = string("Var. in Stand. Het. (ViSHet) among ", size(windowedIndHet, 1), " individuals")
titleSize = 18
xLabelText = string("Location on scaffold ", chr)
yLabelText = "ViSHet"
labelSize = 18
h = CairoMakie.Figure()
ax = Axis(h[1, 1],
    title=plotTitle, titlesize=titleSize,
    xlabel=xLabelText, xlabelsize=labelSize,
    ylabel=yLabelText, ylabelsize=labelSize)
lines!(windowedPos, windowedViSHet)
display(h)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}
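
As a rough illustration of the three steps plotted above (windowed heterozygosity, standardization, and variance among individuals), here is a minimal sketch on a toy genotype matrix. It does not reproduce the GenomicDiversity functions: the helper names `toy_windowed_het` and `toy_vishet` are hypothetical, and the standardization used here (dividing each individual's windowed heterozygosity by the mean across individuals in that window) is an assumption about what `standardizeIndHet` does, not a confirmed description of it.

```julia
using Statistics

# Hypothetical helper: fraction of heterozygous genotypes (coded 1) per individual
# in consecutive non-overlapping windows of `windowSize` SNPs, ignoring missing values.
# A final partial window is dropped.
function toy_windowed_het(genos::AbstractMatrix, windowSize::Int)
    nInd, nSNP = size(genos)
    nWin = nSNP ÷ windowSize
    het = Matrix{Float64}(undef, nInd, nWin)
    for w in 1:nWin
        cols = ((w - 1) * windowSize + 1):(w * windowSize)
        for i in 1:nInd
            called = collect(skipmissing(genos[i, cols]))
            het[i, w] = isempty(called) ? NaN : count(==(1), called) / length(called)
        end
    end
    return het
end

# Hypothetical helper: variance among individuals of heterozygosity divided by
# the window mean (ASSUMED form of standardization).
function toy_vishet(het::AbstractMatrix)
    standardized = het ./ mean(het, dims = 1)
    return vec(var(standardized, dims = 1))
end

# Toy data: 4 individuals (rows), 8 SNPs (columns), two windows of 4 SNPs.
toy_genos = Union{Missing, Int}[
    0 1 0 1 0 0 0 0;
    1 0 1 0 1 1 1 1;
    0 1 1 0 0 0 0 0;
    1 0 0 1 1 1 1 missing]
toy_het = toy_windowed_het(toy_genos, 4)
toy_v = toy_vishet(toy_het)
```

In the first toy window every individual has the same heterozygosity, so the variance is zero; in the second window individuals are either fully homozygous or fully heterozygous, which is the pattern that produces a high value.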

Make ViSHet plot for whole genome:

(ViSHet is Variance in Standardized Heterozygosity.)

# option to select a subset of individuals
filterGroups = false  # false means include all individuals

groupsToInclude = ["vir"]
numIndsToPlot = [1000]

if filterGroups
    genosOnly_included, ind_with_metadata_included = limitIndsToPlot(groupsToInclude, numIndsToPlot, GW_GenoData_indFiltered.genotypes, GW_GenoData_indFiltered.indInfo)
else
    genosOnly_included = GW_GenoData_indFiltered.genotypes
    ind_with_metadata_included = GW_GenoData_indFiltered.indInfo
end

scaffolds_for_ViSHet = scaffolds_to_plot
#initialize data structures
windowed_pos_all = DataFrame(chrom = String[], mean_position = Float64[], first_position = Int[], last_position = Int[])
windowed_ViSHet_all = Vector{Float32}(undef, 0)
for chrom in scaffolds_for_ViSHet
    regionText = string("chr", chrom)
    loci_selection = (GW_GenoData_indFiltered.positions.chrom .== chrom)
    pos_region = GW_GenoData_indFiltered.positions[loci_selection, :]
    genotypes_region = genosOnly_included[:, loci_selection] # respects the filterGroups choice above
    if chrom == "gwZ" #include only males (because females have only one Z)
        genotypes_region_males = genotypes_region[ind_with_metadata_included.sex .== "M", :]
        windowedPos, windowedIndHet = getWindowedIndHet(genotypes_region_males, pos_region, windowSize)
    else # for all other chromosomes, include all individuals
        windowedPos, windowedIndHet = getWindowedIndHet(genotypes_region, pos_region, windowSize)
    end
    windowBoundaries = getWindowBoundaries(pos_region.position, windowSize)
    windowedIndHet_standardized = standardizeIndHet(windowedIndHet)
    windowed_ViSHet_scaffold = getWindowedViSHet(windowedIndHet_standardized)
    windowed_pos_chrom = DataFrame(chrom = repeat([chrom], length(windowedPos)), mean_position = windowedPos, first_position = windowBoundaries[:,1], last_position = windowBoundaries[:,2])
    windowed_pos_all = vcat(windowed_pos_all, windowed_pos_chrom)
    windowed_ViSHet_all = [windowed_ViSHet_all; windowed_ViSHet_scaffold]
end

Identify “haploblock regions” as those that have high ViSHet

threshold_ViSHet = 0.4
selection = windowed_ViSHet_all .>= threshold_ViSHet
windowed_pos_all.high_ViSHet = selection # adds true/false column to dataframe indicating high ViSHet windows

# Make list of contiguous high ViSHet regions:

highViSHetRegions = DataFrame(regionChrom = String[], regionStart = Int[], regionEnd = Int[])
i = 1
lastWindow = nrow(windowed_pos_all)
while i <= lastWindow # eachindex(windowed_pos_all[:,1])
    if windowed_pos_all.high_ViSHet[i] == true
        regionChrom = windowed_pos_all.chrom[i]
        regionStart = windowed_pos_all.first_position[i]
        regionEnd = windowed_pos_all.last_position[i]
        # check whether contiguous with next
        next = 1
        while i + next <= lastWindow && windowed_pos_all.chrom[i + next] == regionChrom
            if windowed_pos_all.high_ViSHet[i + next] == true
                regionEnd = windowed_pos_all.last_position[i + next]
                next += 1
            else
                break
            end
        end
        highViSHetRegions = push!(highViSHetRegions, [regionChrom, regionStart, regionEnd])
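        # resume at the window that ended the region; it is re-checked in case it is the first window of the next chromosome
        i = i + next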
    else
        i = i + 1
    end 
    i
end
highViSHetRegions
39×3 DataFrame
14 rows omitted
Row regionChrom regionStart regionEnd
String Int64 Int64
1 gw1 15689747 23478124
2 gw1A 4674 3771263
3 gw1A 23592559 30616953
4 gw2 54537375 59262130
5 gw2 60234161 61533451
6 gw3 101192949 103495514
7 gw3 104554714 108279595
8 gw4 5295912 5438270
9 gw4 14837641 16117455
10 gw4 20930552 23610800
11 gw4A 379058 730094
12 gw5 10095304 10956815
13 gw6 34584054 35259663
28 gw19 43362 1006242
29 gw20 27354 721651
30 gw20 5852254 6671670
31 gw21 3275121 3731689
32 gw22 5214430 5775824
33 gw23 4135459 4774426
34 gw24 3468239 4001782
35 gw25 5185626 5473966
36 gw26 4153299 5549635
37 gw27 41621 541081
38 gw28 1822776 2522648
39 gwZ 68372986 73749599
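
The merging of adjacent high-ViSHet windows into regions can be sketched on toy data as follows. This is a simplified illustration rather than the code used above: the function name `toy_merge_regions` and all toy values are hypothetical. The toy input includes a run that reaches the end of one chromosome and is immediately followed by a high window on the next chromosome, a boundary case worth checking in any loop of this kind.

```julia
# Hypothetical helper: merge runs of consecutive `true` windows on the same
# chromosome into (chrom, regionStart, regionEnd) regions.
function toy_merge_regions(chrom::Vector{String}, first_pos::Vector{Int},
                           last_pos::Vector{Int}, high::AbstractVector{Bool})
    regions = NamedTuple{(:chrom, :regionStart, :regionEnd), Tuple{String, Int, Int}}[]
    i = 1
    n = length(high)
    while i <= n
        if high[i]
            j = i
            # extend while the next window is on the same chromosome and also high
            while j < n && chrom[j + 1] == chrom[i] && high[j + 1]
                j += 1
            end
            push!(regions, (chrom = chrom[i], regionStart = first_pos[i], regionEnd = last_pos[j]))
            i = j + 1  # continue with the first window after the region
        else
            i += 1
        end
    end
    return regions
end

# Toy windows: four on chromosome "a", two on chromosome "b"
toy_chrom = ["a", "a", "a", "a", "b", "b"]
toy_first = [1, 101, 201, 301, 1, 101]
toy_last = [100, 200, 300, 400, 100, 200]
toy_high = [false, true, true, true, true, false]
toy_merged = toy_merge_regions(toy_chrom, toy_first, toy_last, toy_high)
```

Here the run on "a" and the single high window on "b" are reported as two separate regions, because a region never spans a chromosome boundary.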

Determine fraction of genome that has high ViSHet

# get total length of high ViSHet regions 
sum_highViSHetRegions = sum(highViSHetRegions.regionEnd .- highViSHetRegions.regionStart)

# get total lengths of all scaffolds
sum_scaffold_lengths = 0 
for scaffold_name in scaffolds_to_plot
    println(scaffold_name)
    sum_scaffold_lengths += scaffold_lengths[scaffold_name]
end
sum_scaffold_lengths

# calculate percent of genome in high ViSHet regions
percentGenomeHighViSHet = 100 * sum_highViSHetRegions / sum_scaffold_lengths
println("The percent of the genome in high ViSHet regions is $percentGenomeHighViSHet")
gw1
gw1A
gw2
gw3
gw4
gw4A
gw5
gw6
gw7
gw8
gw9
gw10
gw11
gw12
gw13
gw14
gw15
gw17
gw18
gw19
gw20
gw21
gw22
gw23
gw24
gw25
gw26
gw27
gw28
gwZ
The percent of the genome in high ViSHet regions is 5.799030328440733
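
The arithmetic behind this percentage, shown on made-up numbers (all values below are hypothetical and unrelated to the warbler data): region length is taken as end minus start, summed over regions, and divided by the summed scaffold lengths.

```julia
# Hypothetical regions and scaffold lengths
toy_region_start = [101, 1]
toy_region_end = [400, 100]
toy_scaffold_lengths = Dict("a" => 1000, "b" => 500)

toy_sum_regions = sum(toy_region_end .- toy_region_start)            # 299 + 99
toy_sum_scaffolds = sum(toy_scaffold_lengths[s] for s in ["a", "b"])  # 1000 + 500
toy_percent = 100 * toy_sum_regions / toy_sum_scaffolds
```

Note that end minus start measures the span between the first and last SNP of a region, which is one base pair less than the inclusive count; at the scale of whole chromosomes this difference is negligible.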

Plot LHBRs (Large Haploblock Regions) for whole genome

fig2 = plotGenomeViSHet(scaffolds_to_plot, 
                        windowed_ViSHet_all,
                        windowed_pos_all;
                        fillColor = "purple",
                        lineTransparency = 0.8,
                        fillTransparency = 0.2,
                        figureSize=(1200, 1200),
                        plotRegions = true,
                        regionsToPlot = highViSHetRegions,
                        regionColor = "magenta")
if false  # set to true to save plot
    filename = "Figure2_ViSHet_allGenome_fromJulia.png"
    save(filename, fig2, px_per_unit = 2.0)
    println("Saved ", filename)
end 
[["gw1", "gw4A", "gw6"], ["gw1A", "gw4", "gw9"], ["gw2", "gw8"], ["gw3", "gw5"], ["gw7", "gw10", "gw11", "gw12", "gw13", "gw14"], ["gw15", "gw17", "gw18", "gw19", "gw20", "gw21", "gw22", "gw23", "gw24", "gw25"], ["gw26", "gw27", "gw28", "gwZ"]]
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Note: from this point on will not use GenoData objects (although would be more concise code)

The paper was accepted and I need to get this final version posted. The code below uses the metadata, genotype matrix, and loci positions as separate data objects. This works fine, as the functions are built to use either form of data input.

pos_SNP_filtered = GW_GenoData_indFiltered.positions
genosOnly_included = GW_GenoData_indFiltered.genotypes
ind_with_metadata_indFiltered = GW_GenoData_indFiltered.indInfo;

Make a PCA based on all regions not in LHBRs

First remove the LHBR regions

# cycle through the LHBRs, determining the SNPs within each to remove from the dataset:
lociToRemove = fill(false, nrow(pos_SNP_filtered))
for i in eachrow(highViSHetRegions) 
    lociWithinThisLHBR = (pos_SNP_filtered.chrom .== i.regionChrom) .&& 
                    (pos_SNP_filtered.position .>= i.regionStart) .&&
                    (pos_SNP_filtered.position .<= i.regionEnd)
    lociToRemove = lociToRemove .|| lociWithinThisLHBR
end
# now actually remove them:
genosOnly_included_nonLHBR = genosOnly_included[:, .!lociToRemove]
pos_SNP_filtered_nonLHBR = pos_SNP_filtered[.!lociToRemove, :]
num_removed = size(pos_SNP_filtered, 1) - size(pos_SNP_filtered_nonLHBR, 1)
println("Removed $num_removed loci for the non-LHBR PCA.")
Removed 46500 loci for the non-LHBR PCA.
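
The locus-removal step can be checked on a small example. The sketch below uses hypothetical toy positions and regions (plain vectors and named tuples rather than DataFrames) to build the same kind of Boolean mask and apply it to the columns of a genotype matrix.

```julia
# Hypothetical SNP positions (chromosome and coordinate) and regions to exclude
toy_snp_chrom = ["a", "a", "a", "b", "b"]
toy_snp_pos = [50, 150, 450, 20, 300]
toy_excluded = [(chrom = "a", regionStart = 101, regionEnd = 400),
                (chrom = "b", regionStart = 1, regionEnd = 100)]

# Mark every SNP that falls inside any excluded region (boundaries inclusive)
toy_remove = falses(length(toy_snp_pos))
for r in toy_excluded
    toy_remove .|= (toy_snp_chrom .== r.chrom) .&
                   (toy_snp_pos .>= r.regionStart) .&
                   (toy_snp_pos .<= r.regionEnd)
end

# Rows are individuals, columns are SNPs; keep only the unmarked columns
toy_matrix = [0 1 2 0 1; 1 1 0 2 0]
toy_matrix_kept = toy_matrix[:, .!toy_remove]
```

Matching on chromosome as well as position matters: the coordinate 150 on chromosome "a" is removed, while a SNP at the same coordinate on another chromosome would not be.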

Make list of scaffolds to include in the whole-genome non-LHBR PCA:

chromosomes_to_include = "gw" .* string.(vcat(28:-1:17, 15:-1:1))
push!(chromosomes_to_include, "gw1A", "gw4A", "gwZ")  # add three other scaffolds
30-element Vector{String}:
 "gw28"
 "gw27"
 "gw26"
 "gw25"
 "gw24"
 "gw23"
 "gw22"
 "gw21"
 "gw20"
 "gw19"
 "gw18"
 "gw17"
 "gw15"
 ⋮
 "gw9"
 "gw8"
 "gw7"
 "gw6"
 "gw5"
 "gw4"
 "gw3"
 "gw2"
 "gw1"
 "gw1A"
 "gw4A"
 "gwZ"

Imputation using KNN

I ran this on 11 Jan 2025 and saved the files, so this cell is inactivated for now (imputation can take up to several minutes for each big scaffold):

for i in eachindex(chromosomes_to_include)
    chrom = chromosomes_to_include[i]
    regionText = string("chr", chrom, "nonLHBR") # this is where "nonLHBR" is incorporated into the file name
    loci_selection = (pos_SNP_filtered_nonLHBR.chrom .== chrom)
    pos_SNP_filtered_nonLHBR_region = pos_SNP_filtered_nonLHBR[loci_selection,:]
    genosOnly_nonLHBR_region_for_imputing = Matrix{Union{Missing, Float32}}(genosOnly_included_nonLHBR[:,loci_selection])
    @time imputed_genos = Impute.knn(genosOnly_nonLHBR_region_for_imputing; dims = :rows)
    filename = string(baseName, tagName, regionText, ".KNNimputedMissing.jld2")
    jldsave(filename; imputed_genos, ind_with_metadata_indFiltered, pos_SNP_filtered_nonLHBR_region)
    println(string(regionText, ": Saved real and imputed genotypes for non_LHBR parts of genome, for ", size(pos_SNP_filtered_nonLHBR_region, 1)," SNPs and ", size(genosOnly_nonLHBR_region_for_imputing, 1)," filtered individuals, in file $filename"))
end

Load saved imputed data for each chromosome (the non-LHBR part):

# initialize data structures
genos_imputed_loaded = Matrix{Union{Missing, Float32}}(undef, nrow(ind_with_metadata_indFiltered), 0)
pos_SNP_loaded = DataFrame(chrom = String[], position = Int64[])
for i in eachindex(chromosomes_to_include)
    chrom = chromosomes_to_include[i]
    regionText = string("chr", chrom, "nonLHBR")
    filename = string(baseName, tagName, regionText, ".KNNimputedMissing.jld2")
    imputed_genos_one_chr = load(filename, "imputed_genos")
    genos_imputed_loaded = hcat(genos_imputed_loaded, imputed_genos_one_chr)
    if ind_with_metadata_indFiltered.ind != load(filename, "ind_with_metadata_indFiltered")[:, :ind]
        println("""Warning: "ind" columns in loaded data and memory data don't match.""")
    end
    pos_SNP_filtered_region = load(filename, "pos_SNP_filtered_nonLHBR_region")
    pos_SNP_loaded = vcat(pos_SNP_loaded, pos_SNP_filtered_region)
    println(string("Loaded ",filename))
    println(string(regionText, ": ", size(imputed_genos_one_chr,2), " SNPs from ", size(imputed_genos_one_chr,1), " individuals"))
end
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw28nonLHBR.KNNimputedMissing.jld2
chrgw28nonLHBR: 10180 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw27nonLHBR.KNNimputedMissing.jld2
chrgw27nonLHBR: 9184 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw26nonLHBR.KNNimputedMissing.jld2
chrgw26nonLHBR: 11803 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw25nonLHBR.KNNimputedMissing.jld2
chrgw25nonLHBR: 3294 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw24nonLHBR.KNNimputedMissing.jld2
chrgw24nonLHBR: 13321 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw23nonLHBR.KNNimputedMissing.jld2
chrgw23nonLHBR: 12949 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw22nonLHBR.KNNimputedMissing.jld2
chrgw22nonLHBR: 4973 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw21nonLHBR.KNNimputedMissing.jld2
chrgw21nonLHBR: 12821 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw20nonLHBR.KNNimputedMissing.jld2
chrgw20nonLHBR: 30239 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw19nonLHBR.KNNimputedMissing.jld2
chrgw19nonLHBR: 23914 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw18nonLHBR.KNNimputedMissing.jld2
chrgw18nonLHBR: 17359 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw17nonLHBR.KNNimputedMissing.jld2
chrgw17nonLHBR: 24313 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw15nonLHBR.KNNimputedMissing.jld2
chrgw15nonLHBR: 25517 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw14nonLHBR.KNNimputedMissing.jld2
chrgw14nonLHBR: 28469 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw13nonLHBR.KNNimputedMissing.jld2
chrgw13nonLHBR: 30543 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw12nonLHBR.KNNimputedMissing.jld2
chrgw12nonLHBR: 31794 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw11nonLHBR.KNNimputedMissing.jld2
chrgw11nonLHBR: 27183 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw10nonLHBR.KNNimputedMissing.jld2
chrgw10nonLHBR: 26462 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw9nonLHBR.KNNimputedMissing.jld2
chrgw9nonLHBR: 37680 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw8nonLHBR.KNNimputedMissing.jld2
chrgw8nonLHBR: 37318 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw7nonLHBR.KNNimputedMissing.jld2
chrgw7nonLHBR: 35575 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw6nonLHBR.KNNimputedMissing.jld2
chrgw6nonLHBR: 39675 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw5nonLHBR.KNNimputedMissing.jld2
chrgw5nonLHBR: 54829 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw4nonLHBR.KNNimputedMissing.jld2
chrgw4nonLHBR: 47980 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw3nonLHBR.KNNimputedMissing.jld2
chrgw3nonLHBR: 79372 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw2nonLHBR.KNNimputedMissing.jld2
chrgw2nonLHBR: 91292 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw1nonLHBR.KNNimputedMissing.jld2
chrgw1nonLHBR: 77362 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw1AnonLHBR.KNNimputedMissing.jld2
chrgw1AnonLHBR: 44551 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw4AnonLHBR.KNNimputedMissing.jld2
chrgw4AnonLHBR: 17467 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgwZnonLHBR.KNNimputedMissing.jld2
chrgwZnonLHBR: 50005 SNPs from 257 individuals

Now do the PCA for non-LHBR parts of genome:

flipPC1 = true
flipPC2 = true
PCA_wholeGenome = plotPCA(genos_imputed_loaded, ind_with_metadata_indFiltered, 
        groups_to_plot_PCA, group_colors_PCA; 
        sampleSet = "greenish warblers", regionText = "non-LHBR wholeGenome",
        flip1 = flipPC1, flip2 = flipPC2,
        lineOpacity = 0.7, fillOpacity = 0.6,
        symbolSize = 14, showTitle = false)
totalObservationVariance = var(PCA_wholeGenome.model) 
PC1_variance, PC2_variance = principalvars(PCA_wholeGenome.model)[1:2]
PC1_prop_variance = PC1_variance / totalObservationVariance
PC2_prop_variance = PC2_variance / totalObservationVariance
println("PC1 explains ", 100*PC1_prop_variance, "% of the total variance.
PC2 explains ", 100*PC2_prop_variance, "%.")
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

PC1 explains 10.945476% of the total variance.
PC2 explains 5.529751%.

The above looks quite similar to the whole-genome PCA including the LHBRs, but the two axes explain a little less of the overall variation. The overall conclusion: There is a lot of signal for geographic structure, both outside and inside of the LHBRs.
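As a rough illustration of what the "proportion of variance explained" values above mean, here is a minimal sketch using only the Julia standard library. It is not the `plotPCA` function used on this page; the toy genotype matrix and the function name `proportionVarianceExplained` are made up for the example. It assumes each principal component's variance is an eigenvalue of the SNP covariance matrix, and that the total variance is the sum of those eigenvalues.

```julia
using LinearAlgebra, Statistics

# Hypothetical toy genotype matrix: rows are individuals, columns are SNPs (0, 1, 2).
toyGenos = Float64[0 0 2 1;
                   0 1 2 0;
                   2 2 0 1;
                   2 1 0 2;
                   1 1 1 1]

# Proportion of total variance captured by each principal component,
# from the eigenvalues of the SNP covariance matrix (largest first).
function proportionVarianceExplained(genos)
    eigenvalues = sort(eigvals(Symmetric(cov(genos))), rev = true)
    return eigenvalues ./ sum(eigenvalues)
end

propVar = proportionVarianceExplained(toyGenos)
println("PC1 explains ", round(100 * propVar[1], digits = 2), "% of the total variance.")
println("PC2 explains ", round(100 * propVar[2], digits = 2), "%.")
```

The proportions sum to 1 across all components, so comparing PC1 and PC2 percentages between analyses (for example with and without LHBRs) is a comparison of how concentrated the variation is on the first two axes.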

Do non-LHBR PCA for a specific chromosome:

selection = pos_SNP_loaded.chrom .== "gw4A"
pos_SNP_loaded_oneChr = pos_SNP_loaded[selection, :]
genos_imputed_loaded_oneChr = genos_imputed_loaded[:, selection]
flipPC1 = true
flipPC2 = true
PCA_oneChr = plotPCA(genos_imputed_loaded_oneChr, ind_with_metadata_indFiltered, 
        groups_to_plot_PCA, group_colors_PCA; 
        sampleSet = "greenish warblers", regionText = "non-LHBR gw4A",
        flip1 = flipPC1, flip2 = flipPC2,
        lineOpacity = 0.7, fillOpacity = 0.6,
        symbolSize = 14, showTitle = false)
totalObservationVariance = var(PCA_oneChr.model) 
PC1_variance, PC2_variance = principalvars(PCA_oneChr.model)[1:2]
PC1_prop_variance = PC1_variance / totalObservationVariance
PC2_prop_variance = PC2_variance / totalObservationVariance
println("PC1 explains ", 100*PC1_prop_variance, "% of the total variance.
PC2 explains ", 100*PC2_prop_variance, "%.")
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

PC1 explains 7.5557256% of the total variance.
PC2 explains 4.751407%.

There is a lot of signal of population structure in the non-LHBR regions, which is perhaps not surprising given that there are high-Fst regions with low ViSHet.

Choose low-het individuals in a high ViSHet region

Now that we have high ViSHet regions indicated, we can automate the choosing of essentially homozygous individuals in those regions. Below I will do this for chrZ (and later other chromosomes).

First, here are two functions: the first gets the boundaries of the longest high ViSHet region on a scaffold, and the second gathers a range of information about that region:

function getOneHighViSHetRegion(highViSHetRegions, chr)
    selection = (highViSHetRegions.regionChrom .== chr)
    if sum(selection) == 1
        println("Good news: 1 region on that scaffold")
        positionMin = highViSHetRegions.regionStart[selection][1]
        positionMax = highViSHetRegions.regionEnd[selection][1]
        regionText = string("chr ", chr, " ",positionMin," to ",positionMax)
    elseif sum(selection) > 1
        println("More than 1 region on that scaffold. Using just the longest one.")
        highViSHetRegions_chr = highViSHetRegions[selection, :]
        display(highViSHetRegions_chr)
        # get biggest region (first one if tied):
        regionSizes = highViSHetRegions_chr.regionEnd .- highViSHetRegions_chr.regionStart
        indexOfLongest = findfirst(regionSizes .== maximum(regionSizes))
        positionMin = highViSHetRegions_chr.regionStart[indexOfLongest]
        positionMax = highViSHetRegions_chr.regionEnd[indexOfLongest]
        regionText = string("chr ", chr, " ",positionMin," to ",positionMax)
    elseif sum(selection) == 0
        println("No high ViSHet regions on that scaffold")
        return
    end
    return positionMin, positionMax, regionText
end


function getWindowedIndHetStanRegion(genos, pos, 
                                    highViSHetRegions, chr;
                                    windowSize = 500)
    # remake the windowedIndHet_standardized (done above in a different cell)
    loci_selection = (pos.chrom .== chr)
    pos_region = pos[loci_selection, :]
    genotypes_region = genos[:, loci_selection]
    windowedPos, windowedIndHet = getWindowedIndHet(genotypes_region, pos_region, windowSize)
    windowedIndHet_standardized = standardizeIndHet(windowedIndHet)
    # look up the boundaries of the high ViSHet region:
    positionMin, positionMax, regionText = getOneHighViSHetRegion(highViSHetRegions, chr)
    # choose just the windows that are in the high ViSHet region:
    window_selection = (positionMin .< windowedPos .< positionMax)
    windowedIndHetStanRegion = windowedIndHet_standardized[:,window_selection]
    meanAcrossRegionIndHetStan = mean.(eachrow(windowedIndHetStanRegion)) 
    # choose loci in region
    lociSelection = (positionMin .< pos_region.position .< positionMax)
    pos_highViSHetRegion = pos_region[lociSelection, :]
    genos_highViSHetRegion = genotypes_region[:, lociSelection]
    # convert `-1` genotypes (which indicates missing) to `missing`:
    replace!(genos_highViSHetRegion, -1 => missing)
    regionInfo = chooseChrRegion(pos_highViSHetRegion, chr; positionMin=positionMin, positionMax=positionMax) # this makes appropriate text describing the region
    return positionMin, positionMax, regionText, 
            windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
            genos_highViSHetRegion, pos_highViSHetRegion, regionInfo
end
getWindowedIndHetStanRegion (generic function with 1 method)
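The "longest region, first one if tied" rule in `getOneHighViSHetRegion` can be tried out on its own with a small sketch. The region table is hypothetical and stored as plain vectors rather than a DataFrame, and `longestRegion` is an illustrative name, not a function from this page.

```julia
# Hypothetical table of high ViSHet regions, stored as parallel vectors.
regionChrom = ["gwZ", "gw1", "gw1", "gw1"]
regionStart = [1_000, 5_000, 40_000, 100_000]
regionEnd   = [9_000, 15_000, 90_000, 150_000]

# Return the boundaries of the longest region on `chr`, or `nothing` if there is none.
# `argmax` returns the first maximal element, so ties go to the earliest region.
function longestRegion(regionChrom, regionStart, regionEnd, chr)
    rows = findall(regionChrom .== chr)
    isempty(rows) && return nothing
    best = rows[argmax(regionEnd[rows] .- regionStart[rows])]
    return (positionMin = regionStart[best], positionMax = regionEnd[best])
end

println(longestRegion(regionChrom, regionStart, regionEnd, "gw1"))
println(longestRegion(regionChrom, regionStart, regionEnd, "gw9"))
```

In this sketch a scaffold with no regions gives `nothing`, so a caller should check for that before unpacking the result into separate variables.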

Now do a PCA for just one LHBR (this time on the Z chromosome)

The code below shows a PCA for all individuals, and another for just the low-heterozygosity individuals (for this LHBR).

# choose scaffold
chr = "gwZ"

positionMin, positionMax, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
    getWindowedIndHetStanRegion(genosOnly_included, 
                            pos_SNP_filtered, 
                            highViSHetRegions, chr;
                            windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add column to metadata containing the regionIndHetStan for this highHet region
# (the column name is built from the chromosome name):
ind_with_metadata_included[!, chr * "_regionIndHetStan"] = meanAcrossRegionIndHetStan
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

#names(ind_with_metadata_included)

# check whether missing data is related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:

genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))

flipPC1 = true
flipPC2 = false

PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else 
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
# also flip PC3:
PCAmodelAll.metadata.PC3 = -1 .* PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 2) 

#Plot only the lowIndHetStan individuals, PC1 to PC2:

fig_3A = CairoMakie.Figure()
ax = Axis(fig_3A[1, 1],
    title = "gwZ LHBR PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(fig_3A)
if false  # set to true to save plot
    save("Figure3A_from_Julia.png", fig_3A, px_per_unit = 2.0)
end 
Good news: 1 region on that scaffold
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
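The selection of low-heterozygosity individuals used above comes down to two steps: keep the windows that fall inside the region, then threshold each individual's mean across those windows. Here is a minimal sketch of those two steps on made-up numbers. The matrix, window positions, and boundaries are hypothetical; only the threshold of 2 is taken from the code above.

```julia
using Statistics

# Hypothetical standardized heterozygosity: rows are individuals, columns are windows.
windowedIndHetStan = [0.1 0.3 0.2 0.0;
                      3.5 4.0 2.9 0.4;
                      0.0 0.2 0.1 5.0;
                      2.5 2.2 2.4 0.1]
# Hypothetical window midpoints and region boundaries.
windowedPos = [1_000, 2_000, 3_000, 9_000]
positionMin, positionMax = 500, 5_000

# Keep windows strictly inside the region, then average across them per individual.
windowSelection = positionMin .< windowedPos .< positionMax
meanAcrossRegion = mean.(eachrow(windowedIndHetStan[:, windowSelection]))

# Individuals below the threshold (2, as in the code above) are treated as low heterozygosity.
lowHet = meanAcrossRegion .< 2
println(lowHet)
```

The third individual has one very high window, but it lies outside the region, so that individual still counts as low heterozygosity for this region.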

Now plot PC1 vs. PC3 for low-heterozygosity individuals

#Plot only the lowIndHetStan individuals, PC1 to PC3:

fig_3B = CairoMakie.Figure()
ax = Axis(fig_3B[1, 1],
    title = "gwZ LHBR PC1 vs. PC3, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(fig_3B)
if false  # set to true to save plot
    save("Figure3B_from_Julia.png", fig_3B, px_per_unit = 2.0)
end 
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Save the individual colors in the metadata

indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;

Plot PC1 vs. PC2:

fig_3C = CairoMakie.Figure()
ax = Axis(fig_3C[1, 1],
    title = "gwZ LHBR PC1 vs. PC2, all individuals",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(fig_3C)
if false  # set to true to save plot
    save("Figure3C_from_Julia.png", fig_3C, px_per_unit = 2.0)
end 
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Plot PC1 vs. PC3:

fig_3D = CairoMakie.Figure()
ax = Axis(fig_3D[1, 1],
    title = "gwZ LHBR PC1 vs. PC3, all individuals",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(fig_3D)
if false  # set to true to save plot
    save("Figure3D_from_Julia.png", fig_3D, px_per_unit = 2.0)
end
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Assign genotype groups based on PCA and heterozygosity

There are clearly six haplogroups in the Z chromosome high ViSHet region. Divide samples into those groups, based on PCA scores, and then calculate pi and Dxy. Make a genotype-by-individual plot of homozygous LHBR individuals, with colors on the left side indicating the genotype groups.

# Inspect Z chromosome PCA of low IndHet (< 2) individuals,
# and specify group boundaries:

clusterNames = ["vir",
                "nit",
                "lud",
                "troch",
                "obs",
                "plumb"]

clusterColors = ["blue",
                "grey",
                "seagreen4",
                "yellow",
                "orange",
                "red"]

vir = (PCAmodelAll.metadata.PC1 .< -4.5) .& 
        (3 .< PCAmodelAll.metadata.PC2) .&
        (PCAmodelAll.metadata.PC3 .> 2) .& 
        indSelection_lowIndHetStan
nit = (-4 .< PCAmodelAll.metadata.PC1 .< -2) .& 
        (1 .< PCAmodelAll.metadata.PC2 .< 3) .&
        indSelection_lowIndHetStan
lud = (PCAmodelAll.metadata.PC1 .< -4.5) .& 
        (3 .< PCAmodelAll.metadata.PC2) .& 
        (PCAmodelAll.metadata.PC3 .< -2) .& 
        indSelection_lowIndHetStan
troch = (PCAmodelAll.metadata.PC2 .< -5) .& 
        indSelection_lowIndHetStan
obs = (-2 .< PCAmodelAll.metadata.PC1 .< 2) .& 
        (-3 .< PCAmodelAll.metadata.PC2 .< -1) .& 
        indSelection_lowIndHetStan
plumb = (5 .< PCAmodelAll.metadata.PC1) .& 
        indSelection_lowIndHetStan

# check the individuals in each group
PCAmodelAll.metadata.Fst_group[vir]
PCAmodelAll.metadata.Fst_group[nit] # note there are two nitidus with nearly identical values
PCAmodelAll.metadata.Fst_group[lud]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]

clusterArray = [vir nit lud troch obs plumb]

# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")

# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end

"""
    getFreqsAndSampleSizesBySexForZ(genoData, sex, indGroup, groupsToCalc)

Calculate allele frequencies and sample sizes for each group and SNP, taking into account sex for analysis of Z chromosome.

# Arguments
- `genoData`: The genotype matrix, where rows are individuals and columns are loci, with genotype codes 0,1,2 meaning homozygous reference, heterozygote, homozygous alternate, and missing genotypes can be either -1 or `missing`.
- `sex`: Vector of sexes (`F` or `M`)
- `indGroup`: A vector providing the group name each individual belongs to.
- `groupsToCalc`: A list of group names to include in calculations.

# Notes
Returns a tuple containing 1) a matrix of frequencies, and 2) a matrix of sample sizes (in both, rows are groups and columns are loci). 
"""
function getFreqsAndSampleSizesBySexForZ(genoData, sex, indGroup, groupsToCalc)
    if any(.!map(x -> x in ["F", "M"], sex))
        println("Warning: not all entries in sex vector are `F` or `M`")
    end
    genoData[ismissing.(genoData)] .= -1 # if the `missing` datatype is used, convert to -1
    groupCount = length(groupsToCalc)
    freqs = Array{Float32,2}(undef, groupCount, size(genoData, 2))
    sampleSizes = Array{Number,2}(undef, groupCount, size(genoData, 2))
    for i in 1:groupCount
        # females:
        selection = (indGroup .== groupsToCalc[i]) .& (sex .== "F") # gets the correct rows for individuals in the group 
        geno0countsF = sum(genoData[selection, :] .== 0, dims=1) # count by column the number of 0 genotypes (homozygous ref)
        geno1countsF = sum(genoData[selection, :] .== 1, dims=1) # same for 1 genotypes (heterozygous)
        geno2countsF = sum(genoData[selection, :] .== 2, dims=1) # same for 2 genotypes (homozygous alternate) 

        # males: 
        selection = (indGroup .== groupsToCalc[i]) .& (sex .== "M")
        geno0countsM = sum(genoData[selection, :] .== 0, dims=1)
        geno1countsM = sum(genoData[selection, :] .== 1, dims=1)
        geno2countsM = sum(genoData[selection, :] .== 2, dims=1)
        allele0counts = (2 .* geno0countsM) .+ geno1countsM .+ geno0countsF .+ (0.5 .* geno1countsF)
        allele2counts = (2 .* geno2countsM) .+ geno1countsM .+ geno2countsF .+ (0.5 .* geno1countsF)
        sumAlleleCounts = allele0counts .+ allele2counts
        sampleSizes[i, :] = 0.5 .* sumAlleleCounts # sample size in number of individuals
        freqs[i, :] = allele2counts ./ sumAlleleCounts
    end
    return freqs, sampleSizes
end

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizesBySexForZ(genos_highViSHetRegion, ind_with_metadata_included.sex, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")

# Calculate per-site pi (within-group nucleotide distance)

sitePi = getSitePi(freqs, sampleSizes)

# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"

Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false)  # set among to FALSE if no among Fst wanted (some things won't work without it) 

# Now get averages of pi and Dxy for whole region:

regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 6×2 DataFrame
 Row │ cluster  pi         
     │ String   Float64    
─────┼─────────────────────
   1 │ vir      0.00706945
   2 │ nit      0.0035094
   3 │ lud      0.00794828
   4 │ troch    0.00968743
   5 │ obs      0.0112686
   6 │ plumb    0.0104236 =#

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#=
15×2 DataFrame
 Row │ cluster_pair  Dxy       
     │ String        Float64   
─────┼─────────────────────────
   1 │ vir_nit       0.0388124
   2 │ vir_lud       0.0173121
   3 │ vir_troch     0.0410345
   4 │ vir_obs       0.037509
   5 │ vir_plumb     0.0466843
   6 │ nit_lud       0.0365198
   7 │ nit_troch     0.0464045
   8 │ nit_obs       0.0427296
   9 │ nit_plumb     0.0524538
  10 │ lud_troch     0.038968
  11 │ lud_obs       0.0354195
  12 │ lud_plumb     0.0445131
  13 │ troch_obs     0.0265615
  14 │ troch_plumb   0.0404278
  15 │ obs_plumb     0.0365139 =#

# It seems the distances are not very consistent with a bifurcating tree,
# nor 1-D isolation by distance, but something more complex.
# Obscuratus is closer to viridanus than troch is.
# Nitidus quite distant but gets put in centre of PCA because off on its own axis.

# Make a genotype-by-individual plot using all variable loci in the region:
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5 
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]

# limit the number of individuals per group to plot
numIndsToPlot = [15, 15, 15, 15, 15, 15]

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
                missingFractionAllowed = missingFractionAllowed,
                titleFontSize = 20,
                indColorRightProvided = true);
The numbers in each group are [37 2 38 80 5 73] and the sum of those is 235
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
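For readers unfamiliar with pi and Dxy, the per-site quantities can be sketched from allele frequencies alone. The functions below are textbook forms under stated assumptions (biallelic SNPs, and a small-sample correction based on the number of sampled alleles). They are illustrative stand-ins with made-up names and data, and may differ in detail from the `getSitePi` and `getDxy` functions used above.

```julia
using Statistics

# Hypothetical alternate-allele frequencies: rows are groups, columns are SNPs.
toyFreqs = [0.0 0.5 1.0;
            1.0 0.5 1.0]
# Hypothetical sample sizes in individuals (two alleles assumed per individual).
toySampleSizes = [10.0 10.0 10.0;
                  5.0  5.0  5.0]

# Per-site pi with the usual small-sample correction, n = number of sampled alleles.
function sitePiSketch(freqs, sampleSizes)
    n = 2 .* sampleSizes
    return 2 .* freqs .* (1 .- freqs) .* n ./ (n .- 1)
end

# Per-site Dxy between two groups: chance that one allele drawn from each group differs.
dxySketch(p1, p2) = p1 .* (1 .- p2) .+ p2 .* (1 .- p1)

pi_sites = sitePiSketch(toyFreqs, toySampleSizes)
dxy_sites = dxySketch(toyFreqs[1, :], toyFreqs[2, :])
println("mean pi per group: ", vec(mean(pi_sites, dims = 2)))
println("mean Dxy: ", mean(dxy_sites))
```

A fixed difference between groups (first SNP) contributes 1 to Dxy and 0 to pi in both groups, which is the pattern behind Dxy values in the tables above being several times larger than the within-cluster pi values.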

Now show a GBI plot like above, but with heterozygotes included:

clusterNamesWithHets = ["vir",
                        "vir_lud",
                        "nit",
                        "lud",
                        "lud_troch",
                        "troch",
                        "obs",
                        "plumb"]

clusterColorsWithHets = ["blue",
                        "lightseagreen",
                        "grey",
                        "green",
                        "yellowgreen",
                        "yellow",
                        "orange",
                        "red"]

vir_lud = (PCAmodelAll.metadata.PC1 .< -4.5) .& 
        (3 .< PCAmodelAll.metadata.PC2) .&
        (-1 .< PCAmodelAll.metadata.PC3 .< 0) # Note this one has lowIndHetStan but is mix of vir and lud
lud_troch = (-5 .< PCAmodelAll.metadata.PC1 .< -1) .&
                (-3 .< PCAmodelAll.metadata.PC2 .< 1) .&
                 .!indSelection_lowIndHetStan

clusterArray = [vir vir_lud nit lud lud_troch troch obs plumb]

sum(clusterArray, dims=1)

if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else 
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end

# check which individuals left out:

sum(clusterArray, dims=2)

PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]

# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end

# Add column to main metadata object containing the cluster membership for this highHet region
# (the column name is built from the chromosome name):
ind_with_metadata_included[!, chr * "_cluster"] = clusterMembershipWithHets

# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets

# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

fig_4 = plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indFontSize=8, figureSize=(1200, 1200),
                indColorLeftProvided = false,
                indColorRightProvided = true);

if false  # set to true to save plot
    save("Figure4_from_Julia.png", fig_4[1], px_per_unit = 2.0)
end
Good news: Individuals included in a group matches total number of individuals
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
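The cluster bookkeeping above (boolean masks combined into a matrix, then converted to labels) can be sketched on a toy example. All values, thresholds, and names here are hypothetical. The sketch also checks assignments per individual, which is a slightly stricter test than comparing the grand total to the number of individuals, because an unassigned individual and a doubly assigned one would cancel out in the total.

```julia
# Hypothetical PC scores and heterozygosity flags for six individuals.
PC1 = [-5.0, -5.2, 6.0, 6.3, 0.1, -5.1]
PC3 = [ 3.0, -3.0, 0.0, 0.2, 0.0, -0.5]
lowHet = [true, true, true, true, false, true]

# Boolean masks defining clusters (thresholds are illustrative only).
groupA = (PC1 .< -4.5) .& (PC3 .> 2) .& lowHet
groupB = (PC1 .< -4.5) .& (PC3 .< -2) .& lowHet
groupC = (5 .< PC1) .& lowHet
clusterLabels = ["A", "B", "C"]
clusterArray = [groupA groupB groupC]   # individuals x clusters

# Each individual should belong to exactly one cluster.
perIndividual = vec(sum(clusterArray, dims = 2))
unassigned = findall(perIndividual .== 0)
overlapping = findall(perIndividual .> 1)

# Convert the mask matrix into one label per individual.
membership = fill("none", length(PC1))
for (j, label) in enumerate(clusterLabels)
    membership[clusterArray[:, j]] .= label
end
println(membership, " unassigned: ", unassigned, " overlapping: ", overlapping)
```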

Show GBI plot according to original groups and plot order

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Show same but with all individuals

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotInfo = plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indFontSize=6, figureSize=(1200, 1600),
                indColorLeftProvided = false,
                indColorRightProvided = true);

if false  # set to true to save plot
    save("FigureS1_from_Julia.png", plotInfo[1], px_per_unit = 2.0)
end
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Show just the west area (without nitidus)

clusterNamesWithHetsWest = ["vir",
                        "lud",
                        "lud_troch",
                        "troch",]

clusterColorsWithHetsWest = ["blue",
                        "green",
                        "yellowgreen",
                        "yellow"]

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizesBySexForZ(genos_selectedSNPs, ind_with_metadata_included.sex, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5 
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = [100, 100, 100, 100]

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
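
The SNP filter used in the cell above (maximum frequency above 0.5 in at least one cluster, minimum below 0.5 in at least one cluster) can be illustrated with a small sketch. The matrix `freqs_toy` and its values are invented for illustration only; rows stand for clusters and columns for SNPs, matching the layout of `freqs`.

```julia
# Toy allele-frequency matrix (made-up values): rows = clusters, columns = SNPs
freqs_toy = [0.9 0.2 0.6 0.5;
             0.1 0.3 0.8 0.5;
             0.4 0.1 0.7 0.5]

# Keep a SNP only if some cluster has frequency > 0.5 and some cluster has frequency < 0.5
keep_toy = (vec(maximum(freqs_toy, dims=1)) .> 0.5) .& (vec(minimum(freqs_toy, dims=1)) .< 0.5)

# Subset the columns in the same way the notebook subsets `freqs`
freqs_kept_toy = freqs_toy[:, keep_toy]
println("SNPs kept: ", findall(keep_toy))
```

In this toy case only the first SNP passes: the second never exceeds 0.5, the third never drops below 0.5, and the fourth sits exactly at 0.5 in every cluster, which the strict inequalities exclude.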

Show just the east area

clusterNamesWithHetsEast = ["troch",
                            "obs",
                            "plumb"]

clusterColorsWithHetsEast = ["yellow",
                            "orange",
                            "red"]

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5 
# and the minimum should be < 0.5)
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizesBySexForZ(genos_selectedSNPs, ind_with_metadata_included.sex, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Show just the northern area

clusterNamesWithHetsNorth = ["vir",
                            "plumb"]

clusterColorsWithHetsNorth = ["blue",
                            "red"]

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5 
# and the minimum should be < 0.5)
# Calculate allele freqs and sample sizes
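# Note: the west and east cells above use getFreqsAndSampleSizesBySexForZ for this Z-chromosome region; this call uses getFreqsAndSampleSizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)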
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsNorth)) # one entry per cluster

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Do a PCA based on above info, on just the west side of the ring

# with nitidus (these two definitions are overwritten just below):
groups_to_plot_PCA_westside = ["vir","vir_S","nit", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN"]
group_colors_PCA_westside = ["blue","turquoise1","grey","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow"]

# without nitidus (the version used in the PCA below):
groups_to_plot_PCA_westside = ["vir","vir_S", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN"]
group_colors_PCA_westside = ["blue","turquoise1","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow"]

PCAmodel = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA_westside, group_colors_PCA_westside; 
            sampleSet = "greenish warblers (west side)", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodel.PCAfig)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

Do a PCA based on above info, on just the east side of the ring

groups_to_plot_PCA_eastside = ["troch_LN","troch_EM","obs","plumb_BJ","plumb","plumb_vir"]
group_colors_PCA_eastside = ["yellow","gold","orange","pink","red","purple"];

PCAmodel = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA_eastside, group_colors_PCA_eastside; 
            sampleSet = "greenish warblers (east side)", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodel.PCAfig)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

Show PCA for just the western haploblocks of the ring, for just chr Z

ind_with_metadata_included_temp = copy(ind_with_metadata_included)

# Leave out the individuals that don't have western haplogroup genotypes
hapGroups_to_plot_PCA_westside = ["vir","vir_lud", "nit", "lud"]
selection = map(in(hapGroups_to_plot_PCA_westside), clusterMembershipWithHets)
ind_with_metadata_included_temp.Fst_group[.!selection] .= "ignore" # write over the group name so function below won't plot that individual

groups_to_plot_PCA_westside = ["vir","vir_S","nit", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN"]
group_colors_PCA_westside = ["blue","turquoise1","grey","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow"]

flipPC1 = true
flipPC2 = true

PCAmodelHapWest = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included_temp, 
            groups_to_plot_PCA_westside, group_colors_PCA_westside; 
            sampleSet = "greenish warblers west haps", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelHapWest.PCAfig)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}
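
The masking step in the cell above (overwriting the group label of every individual whose haplogroup cluster is not in the chosen list) can be sketched with plain vectors. All names ending in `_toy` and their values are hypothetical; the notebook applies the same idea to the `Fst_group` column of a copied DataFrame so that the original metadata is left untouched.

```julia
# Hypothetical haplogroup clusters to keep, and per-individual cluster assignments
hapGroups_toy = ["vir", "lud"]
clusterMembership_toy = ["vir", "troch", "lud", "plumb", "vir_lud"]

# Hypothetical plotting-group labels for the same five individuals
Fst_group_toy = ["vir", "troch_LN", "lud_PK", "plumb", "vir_S"]

# `in(collection)` returns a function that tests membership in `collection`
selection_toy = map(in(hapGroups_toy), clusterMembership_toy)

# Overwrite the label of every individual that was not selected
Fst_group_toy[.!selection_toy] .= "ignore"
println(Fst_group_toy)
```

Because "ignore" is not one of the group names passed to the plotting function, the relabelled individuals drop out of the plot without any rows being deleted.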

Show PCA for just the eastern haploblocks of the ring, for just chr Z

ind_with_metadata_included_temp = copy(ind_with_metadata_included)

# Leave out the individuals that don't have eastern haplogroup genotypes
hapGroups_to_plot_PCA_eastside = ["troch", "obs", "plumb"]
selection = map(in(hapGroups_to_plot_PCA_eastside), clusterMembershipWithHets)
ind_with_metadata_included_temp.Fst_group[.!selection] .= "ignore" # write over the group name so function below won't plot that individual

groups_to_plot_PCA_eastside = ["troch_LN","troch_EM","obs","plumb","plumb_vir","plumb_BJ"]
group_colors_PCA_eastside = ["yellow","gold","orange","red","purple","pink"];

flipPC1 = true
flipPC2 = true

PCAmodelHapEast = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included_temp, 
            groups_to_plot_PCA_eastside, group_colors_PCA_eastside; 
            sampleSet = "greenish warblers east haps", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelHapEast.PCAfig)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

Do a PCA based on a same-size region elsewhere on the Z (with low ViSHet):

# get length of region
lengthHighViSHetRegion = positionMax - positionMin

leftLocus = 1_000_000 # start at 1 Mb from left side
rightLocus = leftLocus + lengthHighViSHetRegion
regionText_lowViSHetRegion = string("chr ", chr, " ",leftLocus," to ",rightLocus)

lociSelection = (leftLocus .<= pos_region.position .<= rightLocus)
genotypes_lowViSHetRegion = genotypes_region[:, lociSelection]

# impute missing genotypes:

genotypes_lowViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genotypes_lowViSHetRegion))

flipPC1 = true
flipPC2 = true

PCAmodel = plotPCA(genotypes_lowViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText_lowViSHetRegion,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodel.PCAfig)
if false  # set to true to save plot
    save("FigureS2A_gwZ_nonHLBRarbitrary_from_Julia.png", PCAmodel.PCAfig, px_per_unit = 2.0)
end 
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
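
The window selection in the cell above (a region of the same length as the high ViSHet region, starting 1 Mb from the left end) reduces to a broadcast range comparison on SNP positions. The sketch below uses invented positions, genotypes, and a hypothetical region length of 500 kb.

```julia
# Made-up SNP positions (bp) and a 2-individual × 5-SNP genotype matrix
positions_toy = [500_000, 1_000_000, 1_200_000, 1_500_000, 2_000_000]
genotypes_toy = [0 1 2 1 0;
                 2 1 0 0 1]

# Hypothetical region length; the notebook uses positionMax - positionMin
regionLength_toy = 500_000
leftLocus_toy = 1_000_000
rightLocus_toy = leftLocus_toy + regionLength_toy

# Chained, broadcast comparison: true where left <= position <= right
lociSelection_toy = (leftLocus_toy .<= positions_toy .<= rightLocus_toy)

# Keep only the columns (SNPs) that fall inside the window
genotypes_window_toy = genotypes_toy[:, lociSelection_toy]
println(size(genotypes_window_toy))
```

Both ends of the window are inclusive, so SNPs located exactly at `leftLocus_toy` or `rightLocus_toy` are retained.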

Do a similar analysis for chr 15:

# choose scaffold
chr = "gw15"

positionMin, positionMax, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
    getWindowedIndHetStanRegion(genosOnly_included, 
                            pos_SNP_filtered, 
                            highViSHetRegions, chr;
                            windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add column to metadata containing the regionIndHetStan for this highHet region
# (the column name is built from `chr`, e.g. "gw15_regionIndHetStan"):
ind_with_metadata_included[!, chr * "_regionIndHetStan"] = meanAcrossRegionIndHetStan
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

#names(ind_with_metadata_included)

# check whether missing data is related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:

genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))

flipPC1 = true
flipPC2 = true

PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else 
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.75) 

#Plot only the lowIndHetStan individuals:

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Good news: 1 region on that scaffold
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

Save the individual colors in the metadata

indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
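
The color lookup in the cell above pairs each individual's group with the color at the same index in the color vector. A minimal sketch with invented group names and colors (all `_toy` names are hypothetical):

```julia
# Hypothetical groups and their plotting colors, in matching order
groups_toy = ["vir", "troch", "plumb"]
colors_toy = ["blue", "yellow", "red"]

# Hypothetical group label for each of four individuals
ind_groups_toy = ["plumb", "vir", "vir", "troch"]

# For each individual, find the index of its group and take the color at that index
indColors_toy = [colors_toy[findfirst(==(g), groups_toy)] for g in ind_groups_toy]
println(indColors_toy)
```

`findfirst` returns `nothing` when a label is absent from the group list, and indexing with `nothing` throws an error, so this lookup relies on every individual's group being present in the list of groups to plot.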

Plot PC1 vs. PC2:

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

Plot PC1 vs. PC3:

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

At the chr 15 high ViSHet region, there are only 5 clear haplogroups (PC3 does not distinguish vir and lud). Divide samples into those groups based on PCA scores, and then calculate pi and Dxy.

clusterNames = ["virLud",
                "nit",
                "troch",
                "obs",
                "plumb"]

clusterColors = ["green",
                "grey",
                "yellowgreen",
                "orange",
                "red"]

virLud = (PCAmodelAll.metadata.PC1 .< -5) .& 
            indSelection_lowIndHetStan
nit = (-5 .< PCAmodelAll.metadata.PC1 .< -2.5) .&
            indSelection_lowIndHetStan
troch = (-1 .< PCAmodelAll.metadata.PC1 .< 2.5) .&
            (PCAmodelAll.metadata.PC3 .< 1) .&
            indSelection_lowIndHetStan
obs = (0 .< PCAmodelAll.metadata.PC1 .< 3) .&
            (-5.5 .< PCAmodelAll.metadata.PC2 .< -3) .& 
            indSelection_lowIndHetStan
plumb = (7 .< PCAmodelAll.metadata.PC1) .& 
            indSelection_lowIndHetStan

# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]

clusterArray = [virLud nit troch obs plumb]

# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")

# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in axes(clusterArray, 2) # loop over clusters (columns)
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")

# Calculate per-site pi (within-group nucleotide distance)

sitePi = getSitePi(freqs, sampleSizes)

# calculate pairwise Dxy per site, using data in "freqs" and groups in "clusterNames"

Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false)  # set among to false if no among-group Fst is wanted (some things won't work without it)

# Now get averages of pi and Dxy for whole region:

regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 5×2 DataFrame
 Row │ cluster  pi         
     │ String   Float64    
─────┼─────────────────────
   1 │ virLud   0.00892738
   2 │ nit      0.00677711
   3 │ troch    0.00725483
   4 │ obs      0.0083953
   5 │ plumb    0.00673292 =#

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 10×2 DataFrame
 Row │ cluster_pair  Dxy       
     │ String        Float64   
─────┼─────────────────────────
   1 │ virLud_nit    0.032793
   2 │ virLud_troch  0.0326016
   3 │ virLud_obs    0.0334515
   4 │ virLud_plumb  0.041869
   5 │ nit_troch     0.0389012
   6 │ nit_obs       0.0393449
   7 │ nit_plumb     0.0476769
   8 │ troch_obs     0.0150895
   9 │ troch_plumb   0.0294242
  10 │ obs_plumb     0.0297807 =#

# Make a genotype-by-individual plot using all variable loci in the region:
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5 
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]

# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
                missingFractionAllowed = missingFractionAllowed,
                indColorRightProvided = true);
The numbers in each group are [78 2 71 5 70] and the sum of those is 226
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
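
For readers unfamiliar with the two statistics computed in the cell above, the sketch below shows commonly used per-site formulas for a biallelic SNP. These are textbook expressions written as hypothetical helper functions (`site_pi`, `site_dxy`); they are not the notebook's `getSitePi` or `getDxy`, whose internals are not shown here and may differ, for example in how sample sizes or missing data are handled.

```julia
# Hypothetical helpers for illustration only (not the functions used in this notebook)

# Per-site nucleotide diversity within one group:
# p = frequency of one allele, n = number of sampled allele copies (assumed meaning of n),
# with the n / (n - 1) small-sample correction
site_pi(p, n) = 2 * p * (1 - p) * n / (n - 1)

# Per-site absolute divergence between two groups with allele frequencies p1 and p2:
# the probability that one allele copy drawn from each group differs
site_dxy(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

println(site_pi(0.5, 10))
println(site_dxy(1.0, 0.0))
println(site_dxy(0.2, 0.2))
```

Under these formulas, a site fixed for different alleles in two groups has Dxy of 1 while contributing 0 to pi in each group, which is the pattern expected between distinct haplogroups.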

Now show a GBI plot like above, but with heterozygotes:

clusterNamesWithHets = ["virLud",
                        "nit",
                        "virLud_troch",
                        "troch",
                        "obs",
                        "plumb",
                        "vir_plumb"]

clusterColorsWithHets = ["blue",
                        "grey",
                        "yellowgreen",
                        "yellow",
                        "orange",
                        "red",
                        "purple"]

virLud_troch = (-5 .< PCAmodelAll.metadata.PC1 .< 0) .&
                (-5.5 .< PCAmodelAll.metadata.PC2 .< 0) .&
                 .!indSelection_lowIndHetStan
vir_plumb = (-2 .< PCAmodelAll.metadata.PC1 .< 2) .&
                (3 .< PCAmodelAll.metadata.PC2 .< 5.5) .&
                 .!indSelection_lowIndHetStan

clusterArray = [virLud nit virLud_troch troch obs plumb vir_plumb]

sum(clusterArray, dims=1)

if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else 
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end

# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in axes(clusterArray, 2) # loop over clusters (columns)
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end

# Add column to main metadata object containing the cluster membership for this highHet region
# (the column name is built from `chr`, e.g. "gw15_cluster"):
ind_with_metadata_included[!, chr * "_cluster"] = clusterMembershipWithHets

# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets

# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
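
The cluster assignment and the "Good news" check in the cell above can be sketched on a toy data set. All values and `_toy` names below are invented. The sketch also adds a per-individual check that is not part of the original code: comparing only the grand total with the number of individuals would still pass if one individual fell into two clusters while another fell into none, whereas the row sums catch that case.

```julia
# Made-up PC1 scores and a made-up low-heterozygosity mask for six individuals
PC1_toy = [-6.0, -3.0, 0.5, 8.0, -6.5, 0.0]
lowHet_toy = [true, true, true, true, true, false]

# Threshold-based clusters (thresholds chosen only for this toy example)
a_toy = (PC1_toy .< -5) .& lowHet_toy
b_toy = (-5 .< PC1_toy .< -2.5) .& lowHet_toy
c_toy = (-1 .< PC1_toy .< 2.5) .& lowHet_toy
d_toy = (7 .< PC1_toy) .& lowHet_toy

clusterNames_toy = ["A", "B", "C", "D"]
clusterArray_toy = [a_toy b_toy c_toy d_toy]  # individuals × clusters Boolean matrix

# Number of individuals in each cluster
counts_toy = vec(sum(clusterArray_toy, dims=1))

# Number of clusters each individual belongs to (ideally exactly 1)
perInd_toy = vec(sum(clusterArray_toy, dims=2))

# Translate the Boolean matrix into a vector of cluster labels
membership_toy = fill("none", length(PC1_toy))
for i in axes(clusterArray_toy, 2)
    membership_toy[clusterArray_toy[:, i]] .= clusterNames_toy[i]
end

println(counts_toy, " ", perInd_toy, " ", membership_toy)
```

Here the sixth individual fails the low-heterozygosity mask and stays labelled "none", which the per-individual sums flag as a 0.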

Show just the west area (without nitidus)

clusterNamesWithHetsWest = ["virLud",
                        "virLud_troch",
                        "troch"]

clusterColorsWithHetsWest = ["blue",
                        "yellowgreen",
                        "yellow"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = [100, 100, 100]

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Show just the east area

clusterNamesWithHetsEast = ["troch",
                            "obs",
                            "plumb"]

clusterColorsWithHetsEast = ["yellow",
                            "orange",
                            "red"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Show just the northern area

clusterNamesWithHetsNorth = ["virLud",
                            "vir_plumb",
                            "plumb"]

clusterColorsWithHetsNorth = ["blue",
                            "purple",
                            "red"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = [100, 100, 100]

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes

Now do the same for chr 28

# choose scaffold
chr = "gw28"

positionMin, positionMax, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
    getWindowedIndHetStanRegion(genosOnly_included, 
                            pos_SNP_filtered, 
                            highViSHetRegions, chr;
                            windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

#names(ind_with_metadata_included)

# check whether missing data is related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:

genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))

flipPC1 = true
flipPC2 = true

PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else 
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 2) 

#Plot only the lowIndHetStan individuals:

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Good news: 1 region on that scaffold

CairoMakie.Screen{IMAGE}
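
The cell above adds a metadata column whose name is built from `chr` by constructing a command string and running it with `eval(Meta.parse(...))`. As a minimal sketch of an alternative, assuming a DataFrames.jl version that accepts string column names in `df[!, name]` indexing, the same kind of column can be added without `eval`. The table `metaToy` and the values in it are made up for illustration and are not part of the analysis.

```julia
using DataFrames

# Toy stand-ins for the real objects (hypothetical values)
metaToy = DataFrame(ind = ["a", "b", "c"])
chrToy = "gw28"
regionMeansToy = [0.4, 2.7, 1.1]

# Build the column name as a string and assign the column directly
colName = chrToy * "_regionIndHetStan"
metaToy[!, colName] = regionMeansToy

println(names(metaToy))
```

This avoids parsing code from a string, so a scaffold name containing unexpected characters cannot change what is executed.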

Save the individual colors in the metadata

indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
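
The loop above looks up each individual's color by finding the position of its group in `groups_to_plot_PCA`. A minimal sketch of the same lookup using a `Dict` is shown here; the group names, colors, and per-individual groups are hypothetical toy values.

```julia
# Hypothetical group names and one color per group
groupsToy = ["vir", "troch", "plumb"]
colorsToy = ["blue", "yellow", "red"]

# Map each group name to its color
colorOf = Dict(zip(groupsToy, colorsToy))

# Hypothetical group of each individual, in metadata row order
indGroupsToy = ["plumb", "vir", "vir", "troch"]

# One color per individual
indColorsToy = [colorOf[g] for g in indGroupsToy]
println(indColorsToy)
```

With either approach, an individual whose group is not in the list causes an error, which is why all individuals in the PCA need to belong to one of the plotted groups.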

Plot PC1 vs. PC2:

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)

CairoMakie.Screen{IMAGE}

Plot PC1 vs. PC3:

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)

CairoMakie.Screen{IMAGE}

At the chr 28 high ViSHet region, there are only 5 clear haplogroups (yellow and orange are close). PC3 does not separate vir from lud, as nitidus varies along that axis. Divide samples into those 5 groups, based on PCA scores, and then calculate pi and Dxy.
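
For readers unfamiliar with these statistics, here is a minimal sketch of common textbook per-site forms of pi and Dxy for a biallelic SNP, computed from allele frequencies. The function names `sitePiSketch` and `siteDxySketch` are hypothetical, and the functions `getSitePi` and `getDxy` used below may differ in their details (for example in sample-size correction or handling of missing data).

```julia
# p: frequency of one allele at a biallelic site within a group
# n: number of sampled allele copies in that group (2 per diploid individual)
# Within-group diversity with a small-sample correction (assumed form)
sitePiSketch(p, n) = n > 1 ? 2p * (1 - p) * n / (n - 1) : NaN

# Between-group distance: probability that one allele copy drawn from
# each group differs at this site
siteDxySketch(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

# Toy data: rows are groups, columns are SNPs (made-up values)
freqsToy = [0.0 0.5;
            1.0 0.25]
copiesToy = [10 10;
             8  8]

piToy = sitePiSketch.(freqsToy, copiesToy)
dxyToy = siteDxySketch.(freqsToy[1, :], freqsToy[2, :])
println(piToy)
println(dxyToy)
```

In this toy example the first SNP is fixed for different alleles in the two groups, so pi is 0 in both groups and Dxy is 1.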

clusterNames = ["virLud",
                "nit",
                "troch",
                "obs",
                "plumb"]

clusterColors = ["blue",
                "grey",
                "yellowgreen",
                "orange",
                "red"]

virLud = (PCAmodelAll.metadata.PC1 .< -4) .& indSelection_lowIndHetStan
nit = (-1 .< PCAmodelAll.metadata.PC1 .< 1) .& indSelection_lowIndHetStan
troch = (1 .< PCAmodelAll.metadata.PC1 .< 3) .& 
        (PCAmodelAll.metadata.PC2 .< -3.2) .& 
        indSelection_lowIndHetStan
obs = (1 .< PCAmodelAll.metadata.PC1 .< 2.5) .& 
        (-3.2 .< PCAmodelAll.metadata.PC2 .< -1) .&
        indSelection_lowIndHetStan
plumb = (3 .< PCAmodelAll.metadata.PC1) .& 
        (2.5 .< PCAmodelAll.metadata.PC2) .&
        indSelection_lowIndHetStan

# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]

clusterArray = [virLud nit troch obs plumb]

# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")

# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")

# Calculate per-site pi (within-group nucleotide distance)

sitePi = getSitePi(freqs, sampleSizes)

# calculate pairwise Dxy per site, using data in "freqs" and groups in "clusterNames"

Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false)  # set among to FALSE if no among Fst wanted (some things won't work without it) 

# Now get averages of pi and Dxy for whole region:

regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 5×2 DataFrame
 Row │ cluster  pi         
     │ String   Float64    
─────┼─────────────────────
   1 │ virLud   0.00792304
   2 │ nit      0.00320189
   3 │ troch    0.00734994
   4 │ obs      0.0101536
   5 │ plumb    0.00270239 =#

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 10×2 DataFrame
 Row │ cluster_pair  Dxy       
     │ String        Float64   
─────┼─────────────────────────
   1 │ virLud_nit    0.0334156
   2 │ virLud_troch  0.0318841
   3 │ virLud_obs    0.0351279
   4 │ virLud_plumb  0.0330054
   5 │ nit_troch     0.0314387
   6 │ nit_obs       0.0344624
   7 │ nit_plumb     0.0307517
   8 │ troch_obs     0.0188902
   9 │ troch_plumb   0.0234771
  10 │ obs_plumb     0.0265753 =#

# Make a genotype-by-individual plot using all variable loci in the region,
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5 
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]

# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
                missingFractionAllowed = missingFractionAllowed,
                indColorRightProvided = true);
The numbers in each group are [74 2 65 3 67] and the sum of those is 211
Calculated population allele frequencies and sample sizes
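
The SNP filter used above keeps a SNP only if its allele frequency is above 0.5 in at least one cluster and below 0.5 in at least one cluster. A minimal sketch on a made-up frequency matrix (rows are clusters, columns are SNPs, matching the orientation of `freqs`) shows which SNPs pass:

```julia
# Toy allele frequencies: 2 clusters (rows) by 4 SNPs (columns)
freqsToy = [0.9 0.2 0.6 0.1;
            0.1 0.3 0.8 0.7]

# Same expression as in the analysis, applied to the toy matrix
keepToy = (vec(maximum(freqsToy, dims=1)) .> 0.5) .& (vec(minimum(freqsToy, dims=1)) .< 0.5)

# Keep only the selected SNP columns
freqsKeptToy = freqsToy[:, keepToy]
println(keepToy)
```

SNPs 1 and 4 pass because the two clusters fall on opposite sides of 0.5. SNP 2 is below 0.5 in both clusters and SNP 3 is above 0.5 in both, so neither is informative for telling these clusters apart.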

Now show a GBI plot like above, but with heterozygotes

clusterNamesWithHets = ["virLud",
                        "virLud_nit",
                        "nit",
                        "virLud_troch",
                        "troch",
                        "obs",
                        "obsHet",
                        "obs_plumb",
                        "plumb",
                        "vir_plumb"]

clusterColorsWithHets = ["blue",
                        "slateblue1",
                        "grey",
                        "yellowgreen",
                        "yellow",
                        "orange",
                        "darkgoldenrod1",
                        "darkorange1",
                        "red",
                        "purple"]

virLud_nit = (-4 .< PCAmodelAll.metadata.PC1 .< -2) .&
                (0 .< PCAmodelAll.metadata.PC2 .< 2) .&
                 .!indSelection_lowIndHetStan
virLud_troch = (-2.5 .< PCAmodelAll.metadata.PC1 .< 0) .&
                (-3.5 .< PCAmodelAll.metadata.PC2 .< 0) .&
                 .!indSelection_lowIndHetStan
obsHet = (1.5 .< PCAmodelAll.metadata.PC1 .< 3) .&
                (-3.5 .< PCAmodelAll.metadata.PC2 .< -1) .&
                 .!indSelection_lowIndHetStan
obs_plumb = (2.5 .< PCAmodelAll.metadata.PC1 .< 4) .&
                (-1 .< PCAmodelAll.metadata.PC2 .< 2) .&
                 .!indSelection_lowIndHetStan
vir_plumb = (-2 .< PCAmodelAll.metadata.PC1 .< 2) .&
                (2 .< PCAmodelAll.metadata.PC2 .< 4) .&
                 .!indSelection_lowIndHetStan

# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[virLud_nit]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[virLud_troch]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[obsHet]
PCAmodelAll.metadata.Fst_group[obs_plumb]
PCAmodelAll.metadata.Fst_group[plumb]
PCAmodelAll.metadata.Fst_group[vir_plumb]

clusterArray = [virLud virLud_nit nit virLud_troch troch obs obsHet obs_plumb plumb vir_plumb]

sum(clusterArray, dims=1)

if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
end

# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end

# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above

# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets

# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
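
The check above compares the total number of cluster assignments with the number of individuals. Because the PC1 and PC2 ranges of some clusters overlap, a stricter check is that each individual falls in exactly one cluster. The following is a minimal sketch of such a check on made-up boolean masks (the variable names are hypothetical):

```julia
# Toy cluster masks for 4 individuals (made-up values)
aToy = [true, false, false, true]
bToy = [false, true, false, true]    # individual 4 is in both a and b
cToy = [false, false, false, false]

masksToy = [aToy bToy cToy]          # individuals (rows) by clusters (columns)

# Number of clusters each individual was assigned to
perIndToy = vec(sum(masksToy, dims=2))

totalMatchesToy = sum(masksToy) == size(masksToy, 1)   # the total-count check
allExactlyOneToy = all(==(1), perIndToy)               # the stricter check
unassignedToy = findall(==(0), perIndToy)              # individuals in no cluster
multipleToy = findall(>(1), perIndToy)                 # individuals in several clusters
println((totalMatchesToy, allExactlyOneToy, unassignedToy, multipleToy))
```

In this toy case the total-count check passes even though one individual is unassigned and another is assigned twice, which the per-individual counts reveal.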

Show GBI plot according to original groups and plot order

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);

Show same but with all individuals

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);

Show same but with only vir and plumb pops

includeTheseClusters = ["virLud", "plumb"] # these are the haplotype clusters to include in the choice below of SNPs to show

freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)

selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]

plotGroups = ["vir", "plumb", "plumb_vir"] # these are the original Fst_groups
plotGroupColors = ["blue", "red", "purple"]

metadataForGBI = copy(PCAmodelAll.metadata)

metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups

plotGenotypeByIndividual(regionInfo, posForGBI,
                genosForGBI, metadataForGBI, freqsForGBI, plotGroups, plotGroupColors;
                missingFractionAllowed = missingFractionAllowed);

Show same but with only vir, lud, and troch pops

includeTheseClusters = ["virLud", "troch"] # these are the haplotype clusters to include in the choice below of SNPs to show

freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)

selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]

metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups

# these are the original Fst_groups to plot
plotGroups = ["vir", "vir_S", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML", "troch_west", "troch_LN"]
plotGroupColors = ["blue","turquoise1", "seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow"]

# Limit the number of individuals per group to plot (10 per group)
numIndsToPlotWithHets = fill(10, length(plotGroups))

genosForGBI_limited, indMetadataforGBI_limited = limitIndsToPlot(plotGroups,
                                            numIndsToPlotWithHets, 
                                            genosForGBI, metadataForGBI;
                                            sortByMissing = false)

plotGenotypeByIndividual(regionInfo, posForGBI,
                genosForGBI_limited, indMetadataforGBI_limited, freqsForGBI, plotGroups, plotGroupColors;
                missingFractionAllowed = missingFractionAllowed);

Show same but with only troch, obs, and plumb pops

includeTheseClusters = ["troch", "obs", "plumb"] # these are the haplotype clusters to include in the choice below of SNPs to show

freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)

selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]

metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups

# remove individuals that have vir haplotypes, as this could otherwise be mistaken for introgression from obscuratus:

removeTheseInds = ["GW_Armando_plate1_JF24G02", # gw19 hetero from plumb 
                    "GW_Armando_plate1_JF07G03", # gw19 hetero from plumb
                    "GW_Armando_plate1_JF12G02", # gw19 hetero from plumb
                    "GW_Armando_plate1_JF09G01"] # gw28 is hetero from plumb 
selection = map(in(removeTheseInds), metadataForGBI.ind)
metadataForGBI = metadataForGBI[.!selection, :]
genosForGBI = genosForGBI[.!selection, :]

plotGroups = ["troch_LN","troch_EM","obs","plumb_BJ","plumb"]
plotGroupColors = ["yellow","gold","orange","pink","red"]

# Set limit on the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(plotGroups))

# metadataForGBI[metadataForGBI.Fst_group .== "plumb", :]

genosForGBI_limited, indMetadataforGBI_limited = limitIndsToPlot(plotGroups,
                                            numIndsToPlotWithHets, 
                                            genosForGBI, metadataForGBI;
                                            sortByMissing = false)

# indMetadataforGBI_limited[indMetadataforGBI_limited.Fst_group .== "plumb", :]

plotGenotypeByIndividual(regionInfo, posForGBI,
                genosForGBI_limited, indMetadataforGBI_limited, freqsForGBI, plotGroups, plotGroupColors;
                missingFractionAllowed = missingFractionAllowed);
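
The removal step above relies on the rows of the metadata and the rows of the genotype matrix being in the same order, so that one boolean mask can be applied to both. A minimal sketch with made-up individual names and genotypes:

```julia
# Toy individual names and genotype matrix (rows are individuals, columns are SNPs)
indsToy = ["ind1", "ind2", "ind3", "ind4"]
genosToy = [0 1 2;
            2 2 0;
            1 0 0;
            0 0 1]

removeTheseToy = ["ind2", "ind4"]

# true for each individual that should be removed
dropToy = map(in(removeTheseToy), indsToy)

# Apply the same mask to names and to genotype rows so they stay aligned
indsKeptToy = indsToy[.!dropToy]
genosKeptToy = genosToy[.!dropToy, :]
println(indsKeptToy)
```

If the two objects were filtered with different masks, or one were re-sorted in between, genotypes would be attributed to the wrong individuals without any error being raised.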

Show just the west area (without nitidus)

clusterNamesWithHetsWest = ["virLud",
                        "virLud_troch",
                        "troch"]

clusterColorsWithHetsWest = ["blue",
                        "yellowgreen",
                        "yellow"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = [100, 100, 100]

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes

Show just the east area

clusterNamesWithHetsEast = ["troch",
                            "obs",
                            "obsHet",
                            "obs_plumb",
                            "plumb"]

clusterColorsWithHetsEast = ["yellow",
                            "orange",
                            "darkgoldenrod1",
                            "darkorange1",
                            "red"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = [100, 100, 100, 100, 100]

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHets,
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes

Show just the northern area

clusterNamesWithHetsNorth = ["virLud",
                            "vir_plumb",
                            "plumb"]

clusterColorsWithHetsNorth = ["blue",
                            "purple",
                            "red"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = [100, 100, 100]

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets,
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Now do the same for chr 26

# choose scaffold
chr = "gw26"

positionMin, positionMax, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
    getWindowedIndHetStanRegion(genosOnly_included, 
                            pos_SNP_filtered, 
                            highViSHetRegions, chr;
                            windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add column to metadata containing the regionIndHetStan for this highHet region
# (the column name is built from `chr`):
ind_with_metadata_included[!, chr * "_regionIndHetStan"] = meanAcrossRegionIndHetStan
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

# check whether missing data is related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:

genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))

flipPC1 = false
flipPC2 = false

PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else 
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.5) 

# Plot only the lowIndHetStan individuals:

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Good news: 1 region on that scaffold
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}
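
A note on the `flipPC1` / `flipPC2` switches used above: the sign of a principal component is arbitrary, so flipping an axis changes only the orientation of the plot. The following is a minimal sketch of that point using a toy matrix and a plain SVD from Julia's standard library (it is not the `plotPCA` function used on this page). The matrix values and variable names are made up for illustration, and here individuals are in the rows of `scores`, whereas `PCAmodelAll.values` above has individuals in columns.

```julia
using LinearAlgebra, Statistics

# Toy genotype-like matrix: 6 individuals (rows) x 4 loci (columns); values are hypothetical
X = Float64[0 0 2 2;
            0 1 2 2;
            0 0 2 1;
            2 2 0 0;
            2 1 0 0;
            2 2 1 0]

Xc = X .- mean(X, dims=1)        # center each locus
F = svd(Xc)                      # Xc == F.U * Diagonal(F.S) * F.Vt
scores = F.U * Diagonal(F.S)     # individuals x PCs

pc1 = scores[:, 1]
pc1_flipped = -1 .* pc1          # same operation as the flipPC1 branch above

# Flipping changes orientation only: variance and pairwise distances are unchanged
@assert var(pc1) ≈ var(pc1_flipped)
@assert abs.(pc1 .- pc1') ≈ abs.(pc1_flipped .- pc1_flipped')
```

Because of this, any group boundaries specified on PC scores are only meaningful for the particular flip settings used when the PCA was plotted.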

Save the individual colors in the metadata

indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;

Plot PC1 vs. PC2

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

Plot PC1 vs. PC3

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

In the high ViSHet region of chr 26 there are only 5 clear haplogroups (one green point lies somewhat apart). Vir and lud vary along PC3 but cannot be clearly separated, so they are treated as a single group (virLud). Divide the low-heterozygosity samples into these groups based on their PCA scores, then calculate pi and Dxy.

# Inspect chromosome 26 PCA of low IndHet (< 1.5) individuals,
# and specify group boundaries:

clusterNames = ["virLud",
                "nit",
                "troch",
                "obs",
                "plumb"]

clusterColors = ["blue",
                "grey",
                "yellow",
                "orange",
                "red"]

virLud = (PCAmodelAll.metadata.PC1 .< -5.5) .& indSelection_lowIndHetStan
nit = (-5.5 .< PCAmodelAll.metadata.PC1 .< -4) .& indSelection_lowIndHetStan
troch = (PCAmodelAll.metadata.PC2 .< -6) .& indSelection_lowIndHetStan
obs = (-6 .< PCAmodelAll.metadata.PC2 .< -4) .& indSelection_lowIndHetStan
plumb = (6 .< PCAmodelAll.metadata.PC1) .& (2 .< PCAmodelAll.metadata.PC2) .& indSelection_lowIndHetStan

# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]

clusterArray = [virLud nit troch obs plumb]

# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")

# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")

# Calculate per-site pi (within-group nucleotide distance)

sitePi = getSitePi(freqs, sampleSizes)

# calculate pairwise Dxy per site, using data in `freqs` and groups in `clusterNames`

Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false)  # set among to false if no among-group Fst is wanted (some things won't work without it)

# Now get averages of pi and Dxy for whole region:

regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 5×2 DataFrame
 Row │ cluster  pi         
     │ String   Float64    
─────┼─────────────────────
   1 │ virLud   0.0135205
   2 │ nit      0.00548557
   3 │ troch    0.00975861
   4 │ obs      0.00902527
   5 │ plumb    0.00510553 =#

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 10×2 DataFrame
 Row │ cluster_pair  Dxy       
     │ String        Float64   
─────┼─────────────────────────
   1 │ virLud_nit    0.0243846
   2 │ virLud_troch  0.0324256
   3 │ virLud_obs    0.0332749
   4 │ virLud_plumb  0.0390193
   5 │ nit_troch     0.0341654
   6 │ nit_obs       0.0344734
   7 │ nit_plumb     0.0403857
   8 │ troch_obs     0.0176458
   9 │ troch_plumb   0.0296574
  10 │ obs_plumb     0.0300157 =#

# Make a genotype-by-individual plot using variable loci in the region.
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5 
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]

# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
                missingFractionAllowed = missingFractionAllowed,
                indColorRightProvided = true);
The numbers in each group are [71 2 62 3 67] and the sum of those is 205
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
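
For readers unfamiliar with the two statistics calculated above, here is a minimal sketch of the standard per-site definitions for a biallelic SNP. This is an illustration under stated assumptions, not the code inside `getSitePi` or `getDxy`, which may differ in detail (for example in how sample sizes are counted or how sites are averaged across a region). The names `freqsToy` and `nChromToy` and all values are hypothetical.

```julia
# Hypothetical allele frequencies: rows = clusters, columns = SNPs
freqsToy = [0.0 0.5 1.0;
            1.0 0.5 1.0]

# Hypothetical number of sampled chromosomes (2 x individuals) per cluster and SNP
nChromToy = [10 10 10;
             20 20 20]

# Per-site within-group diversity, 2p(1-p), with the usual n/(n-1) small-sample correction
sitePiToy = 2 .* freqsToy .* (1 .- freqsToy) .* nChromToy ./ (nChromToy .- 1)

# Per-site between-group distance for clusters 1 and 2:
# probability that one allele drawn from each cluster differs
p1 = freqsToy[1, :]
p2 = freqsToy[2, :]
siteDxyToy = p1 .* (1 .- p2) .+ p2 .* (1 .- p1)
```

In this toy example the first SNP is fixed for different alleles in the two clusters (Dxy of 1, pi of 0 in both), the second is at intermediate frequency in both, and the third is fixed for the same allele in both (Dxy and pi of 0).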

Now show a GBI plot like above, but with heterozygotes

clusterNamesWithHets = ["virLud",
                        "nit",
                        "virLud_troch",
                        "troch",
                        "obs",
                        "obs_plumb",
                        "plumb",
                        "vir_plumb"]

clusterColorsWithHets = ["blue",
                        "grey",
                        "yellowgreen",
                        "yellow",
                        "orange",
                        "darkorange1",
                        "red",
                        "purple"]

virLud_troch = (-5.5 .< PCAmodelAll.metadata.PC1 .< -2.2) .&
                (-4 .< PCAmodelAll.metadata.PC2 .< 2) .&
                 .!indSelection_lowIndHetStan
obs_plumb = (2.5 .< PCAmodelAll.metadata.PC1 .< 5) .&
                (-3.5 .< PCAmodelAll.metadata.PC2 .< -1.5) .&
                 .!indSelection_lowIndHetStan
vir_plumb = (-2 .< PCAmodelAll.metadata.PC1 .< 3) .&
                (2.5 .< PCAmodelAll.metadata.PC2 .< 5) .&
                 .!indSelection_lowIndHetStan

# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[virLud_troch]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[obs_plumb]
PCAmodelAll.metadata.Fst_group[plumb]
PCAmodelAll.metadata.Fst_group[vir_plumb]

clusterArray = [virLud nit virLud_troch troch obs obs_plumb plumb vir_plumb]

sum(clusterArray, dims=1)

if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
end

# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end

# Add column to main metadata object containing the cluster membership for this highHet region
# (the column name is built from `chr`):
ind_with_metadata_included[!, chr * "_cluster"] = clusterMembershipWithHets

# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets

# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
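
The cluster assignment above combines ranges of PC scores with the heterozygosity threshold. Below is a small self-contained sketch of the same pattern on made-up data. All names ending in `Toy` and all values are hypothetical. It also shows a per-individual check, which is somewhat stricter than comparing the total count to the number of individuals: the total could match even if one individual were in two clusters and another in none.

```julia
# Hypothetical PC1 scores and standardized heterozygosity for 6 individuals
PC1toy = [-7.0, -6.2, -4.8, 7.1, 6.5, 0.4]
hetToy = [ 0.9,  1.1,  1.0, 0.8, 1.2, 2.3]

lowHet = hetToy .< 1.5                        # putative haplogroup homozygotes

# Chained broadcast comparisons give one Boolean vector per cluster
groupA = (PC1toy .< -5.5) .& lowHet
groupB = (-5.5 .< PC1toy .< -4) .& lowHet
groupC = (6 .< PC1toy) .& lowHet
hetsAC = (-2 .< PC1toy .< 3) .& .!lowHet      # intermediate PC1 and high heterozygosity

namesToy = ["A", "B", "C", "A_C"]
clusterArrayToy = [groupA groupB groupC hetsAC]   # individuals x clusters

# Each individual should fall in exactly one cluster
@assert all(sum(clusterArrayToy, dims=2) .== 1)

# Convert the Boolean columns into a vector of cluster names
membershipToy = fill("none", length(PC1toy))
for i in axes(clusterArrayToy, 2)
    membershipToy[clusterArrayToy[:, i]] .= namesToy[i]
end
```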

Show GBI plot according to original groups and plot order

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Show the same, but with all individuals

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
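
Raising the per-group limit to 1000 above effectively turns the cap off. As a rough illustration of what a per-group cap that prefers individuals with little missing data could look like, here is a sketch on toy data. This is only one plausible reading of the `sortByMissing = true` option; the actual `limitIndsToPlot` function may work differently. All names and values below are hypothetical.

```julia
# Hypothetical group labels and per-individual counts of missing genotypes
groupToy   = ["A", "A", "A", "B", "B"]
missingToy = [  5,   0,   2,   7,   1]
maxPerGroup = 2

keepIdx = Int[]
for g in unique(groupToy)
    idx = findall(groupToy .== g)                 # individuals in this group
    idx = idx[sortperm(missingToy[idx])]          # fewest missing genotypes first
    append!(keepIdx, idx[1:min(maxPerGroup, length(idx))])
end
```

With `maxPerGroup = 2`, individual 1 (the group A member with the most missing data) is left out; with a cap larger than every group, all individuals are kept.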

Show the same, but with only the vir and plumb populations

includeTheseClusters = ["virLud", "plumb"] # these are the haplotype clusters to include in the choice below of SNPs to show

# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)

selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]

plotGroups = ["vir", "plumb", "plumb_vir"] # these are the original Fst_groups
plotGroupColors = ["blue", "red", "purple"]

metadataForGBI = copy(PCAmodelAll.metadata)

metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups

plotGenotypeByIndividual(regionInfo, posForGBI,
                genosForGBI, metadataForGBI, freqsForGBI, plotGroups, plotGroupColors;
                missingFractionAllowed = missingFractionAllowed);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Show just the west clusters (without nitidus)

clusterNamesWithHetsWest = ["virLud",
                        "virLud_troch",
                        "troch"]

clusterColorsWithHetsWest = ["blue",
                        "yellowgreen",
                        "yellow"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = [100, 100, 100]

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
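
The SNP filter used for these plots keeps a SNP only if its allele frequency is above 0.5 in at least one of the listed clusters and below 0.5 in at least one other. A small worked example with made-up frequencies (the name `freqsToy` and its values are hypothetical) shows which kinds of SNPs pass:

```julia
# Hypothetical allele frequencies: rows = clusters, columns = SNPs
freqsToy = [0.9 0.1 0.6 0.5;
            0.2 0.3 0.8 0.5;
            0.4 0.0 1.0 0.5]

keep = (vec(maximum(freqsToy, dims=1)) .> 0.5) .& (vec(minimum(freqsToy, dims=1)) .< 0.5)

# SNP 1: max 0.9 and min 0.2, so it is kept
# SNP 2: max 0.3, never above 0.5, so it is dropped
# SNP 3: min 0.6, never below 0.5, so it is dropped
# SNP 4: exactly 0.5 everywhere, dropped because both inequalities are strict
```

Because the frequencies are recalculated for only the clusters being plotted, the set of SNPs that pass can differ between the west, east, and northern plots.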

Show just the east area

clusterNamesWithHetsWest = ["troch",
                            "obs",
                            "obs_plumb",
                            "plumb",]

clusterColorsWithHetsWest = ["yellow",
                            "orange",
                            "darkorange1",
                            "red"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = [100, 100, 100, 100]

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Show just the northern area

clusterNamesWithHetsWest = ["virLud",
                            "vir_plumb",
                            "plumb"]

clusterColorsWithHetsWest = ["blue",
                            "purple",
                            "red"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = [100, 100, 100]

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Do a PCA based on a same-size region elsewhere on gw26 (with low ViSHet):

# get length of region
lengthHighViSHetRegion = positionMax - positionMin

leftLocus = 1_000_000 # start at 1 Mb from left side
rightLocus = leftLocus + lengthHighViSHetRegion
regionText_lowViSHetRegion = string("chr ", chr, " ",leftLocus," to ",rightLocus)

lociSelection = (leftLocus .<= pos_region.position .<= rightLocus)
genotypes_lowViSHetRegion = genotypes_region[:, lociSelection]

# impute missing genotypes:
genotypes_lowViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genotypes_lowViSHetRegion))

flipPC1 = true
flipPC2 = true

PCAmodel = plotPCA(genotypes_lowViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText_lowViSHetRegion,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodel.PCAfig)
if false  # set to true to save plot
    save("FigureS2C_gw26_nonHLBRarbitrary_from_Julia.png", PCAmodel.PCAfig, px_per_unit = 2.0)
end 
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
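
The comparison region above is chosen to have the same physical length as the high ViSHet region. The selection step can be sketched on toy data as follows; the positions, genotypes, and names ending in `Toy` are hypothetical and stand in for `pos_region.position`, `genotypes_region`, and `positionMax - positionMin`.

```julia
# Hypothetical SNP positions (bp) and genotype matrix (individuals x SNPs)
posToy = [200_000, 950_000, 1_000_000, 1_400_000, 2_500_000, 3_000_000]
genosToy = [0 1 2 0 1 2;
            2 1 0 2 1 0]

regionLengthToy = 1_500_000                  # stands in for positionMax - positionMin
leftToy = 1_000_000
rightToy = leftToy + regionLengthToy

inWindow = leftToy .<= posToy .<= rightToy   # inclusive at both ends
genosWindowToy = genosToy[:, inWindow]       # keep only SNPs inside the window
```

Note that matching the physical length does not guarantee the same number of SNPs in the two regions, so it may be worth checking `sum(inWindow)` against the number of SNPs in the high ViSHet region.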

Do a similar analysis for chr 1A

(Chr 1 was tried first, but there seems to be some recombination there that makes assignment to groups less clear.)

# choose scaffold
chr = "gw1A"

positionMin, positionMax, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
    getWindowedIndHetStanRegion(genosOnly_included, 
                            pos_SNP_filtered, 
                            highViSHetRegions, chr;
                            windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add column to metadata containing the regionIndHetStan for this highHet region
# (the column name is built from `chr`):
ind_with_metadata_included[!, chr * "_regionIndHetStan"] = meanAcrossRegionIndHetStan
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

# check whether missing data is related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:

genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))

flipPC1 = true
flipPC2 = true

PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else 
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.5) 

#Plot only the lowIndHetStan individuals:

f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
More than 1 region on that scaffold. Using just the longest one.
2×3 DataFrame
 Row │ regionChrom  regionStart  regionEnd
     │ String       Int64        Int64
─────┼─────────────────────────────────────
   1 │ gw1A                4674    3771263
   2 │ gw1A            23592559   30616953
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}
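
The message "More than 1 region on that scaffold. Using just the longest one." in the output above reflects a choice made inside `getWindowedIndHetStanRegion`. A minimal sketch of that kind of selection is shown here, using the two gw1A regions printed above. The helper name `longestRegion` and the named-tuple layout are hypothetical and are not taken from the author's function:

```julia
# Hypothetical helper: pick the longest of several (start, stop) regions.
# The coordinates are the two gw1A regions reported in the output above.
regions = [(start = 4674, stop = 3771263),
           (start = 23592559, stop = 30616953)]

longestRegion(regs) = regs[argmax([r.stop - r.start for r in regs])]

chosen = longestRegion(regions)
println("Longest region: ", chosen.start, " to ", chosen.stop)
```

With these coordinates the second region is the longer one, so the rest of the chr 1A analysis would refer to that region.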

Save the individual colors in the metadata

indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;

Plot PC1 vs. PC2

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

Plot PC1 vs. PC3

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

At the chr 1A high-ViSHet region, there are only 5 clear haplogroups (vir and lud were checked and do not form distinct groups on PC3). Divide samples into those groups, based on PCA scores, and then calculate pi and Dxy.

clusterNames = ["virLud",
                "nit",
                "troch",
                "obs",
                "plumb"]

clusterColors = ["green",
                "grey",
                "yellowgreen",
                "orange",
                "red"]

virLud = (PCAmodelAll.metadata.PC1 .< -8) .& 
            indSelection_lowIndHetStan
nit = (-8 .< PCAmodelAll.metadata.PC1 .< -4) .&
            indSelection_lowIndHetStan
troch = (4 .< PCAmodelAll.metadata.PC1 .< 9) .&
            (PCAmodelAll.metadata.PC2 .< -5) .&
            indSelection_lowIndHetStan
obs = (2 .< PCAmodelAll.metadata.PC1 .< 6) .&
            (-5 .< PCAmodelAll.metadata.PC2 .< -1) .& 
            indSelection_lowIndHetStan
plumb = (3 .< PCAmodelAll.metadata.PC1) .& 
            (7.5 .< PCAmodelAll.metadata.PC2) .&
            indSelection_lowIndHetStan

# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]

clusterArray = [virLud nit troch obs plumb]

# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")

# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")

# Calculate per-site pi (within-group nucleotide distance)

sitePi = getSitePi(freqs, sampleSizes)

# calculate pairwise Dxy per site, using data in `freqs` and groups in `clusterNames`

Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false)  # set among to FALSE if no among Fst wanted (some things won't work without it) 

# Now get averages of pi and Dxy for whole region:

regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 5×2 DataFrame
 Row │ cluster  pi         
     │ String   Float64    
─────┼─────────────────────
   1 │ virLud   0.00559696
   2 │ nit      0.00458482
   3 │ troch    0.00470781
   4 │ obs      0.00524545
   5 │ plumb    0.00659452 =#

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 10×2 DataFrame
 Row │ cluster_pair  Dxy       
     │ String        Float64   
─────┼─────────────────────────
   1 │ virLud_nit    0.0234051
   2 │ virLud_troch  0.0303858
   3 │ virLud_obs    0.0285612
   4 │ virLud_plumb  0.0298279
   5 │ nit_troch     0.036893
   6 │ nit_obs       0.0346282
   7 │ nit_plumb     0.0363109
   8 │ troch_obs     0.0169886
   9 │ troch_plumb   0.0274903
  10 │ obs_plumb     0.0256253 =#

# Make a genotype-by-individual (GBI) plot using variable loci in the region
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5 
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]

# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
                missingFractionAllowed = missingFractionAllowed,
                indColorRightProvided = true);
The numbers in each group are [75 2 65 5 68] and the sum of those is 215
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
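
As a rough illustration of what the pi and Dxy values above measure, here is a small sketch using standard per-site estimators for biallelic SNPs. These formulas are an assumption made for illustration. The author's `getSitePi` and `getDxy` functions may differ in details such as missing-data handling or sample-size correction. The names `sitePiSketch`, `siteDxySketch`, `toyFreqs` and `toyAlleles` are hypothetical.

```julia
# Standard per-site estimators (assumed here; the author's `getSitePi` and
# `getDxy` functions may differ in details such as missing-data handling).

# Within-group diversity at one biallelic site: p = alternate allele frequency,
# n = number of sampled allele copies (2 × number of diploid individuals).
sitePiSketch(p, n) = 2 * p * (1 - p) * n / (n - 1)

# Between-group divergence at one biallelic site.
siteDxySketch(p1, p2) = p1 * (1 - p2) + (1 - p1) * p2

# Toy frequencies for two groups at three sites (rows = groups, columns = sites).
toyFreqs = [0.0  0.5  1.0;
            1.0  0.5  1.0]
toyAlleles = [20, 20]   # 10 diploid individuals in each group

piGroup1 = [sitePiSketch(toyFreqs[1, s], toyAlleles[1]) for s in 1:3]
dxy = [siteDxySketch(toyFreqs[1, s], toyFreqs[2, s]) for s in 1:3]

println("pi (group 1): ", piGroup1)
println("Dxy:          ", dxy)
```

In this toy case a site fixed for different alleles in the two groups contributes 1 to Dxy and 0 to pi, which is the pattern expected for well-differentiated haplogroups: low pi within clusters and higher Dxy between them.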

Now show a GBI plot like the one above, but including heterozygotes

clusterNamesWithHets = ["virLud",
                        "nit",
                        "virLud_troch",
                        "troch",
                        "obs",
                        "plumb",
                        "vir_plumb"]

clusterColorsWithHets = ["blue",
                        "grey",
                        "yellowgreen",
                        "yellow",
                        "orange",
                        "red",
                        "purple"]

virLud_troch = (-5 .< PCAmodelAll.metadata.PC1 .< 2) .&
                (-8 .< PCAmodelAll.metadata.PC2 .< -2) .&
                 .!indSelection_lowIndHetStan
vir_plumb = (-5 .< PCAmodelAll.metadata.PC1 .< 0) .&
                (3 .< PCAmodelAll.metadata.PC2 .< 7) .&
                 .!indSelection_lowIndHetStan

clusterArray = [virLud nit virLud_troch troch obs plumb vir_plumb]

sum(clusterArray, dims=1)

if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else 
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end

# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end

# Add column to main metadata object containing the cluster membership for this highHet region:
ind_with_metadata_included[!, chr * "_cluster"] = clusterMembershipWithHets # column name is built from `chr`

# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets

# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
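
The check above compares the total number of cluster memberships with the number of individuals. That total would also match if one individual fell into two clusters while another fell into none, so a per-individual check can be a useful complement. The following is a minimal sketch on toy data. The names `toyClusterArray`, `membershipCounts`, `unassigned` and `multiplyAssigned` are hypothetical.

```julia
# Toy Boolean masks: rows are individuals, columns are clusters
# (the same layout as `clusterArray` in the code above).
# Individual 3 falls in two clusters and individual 4 falls in none.
toyClusterArray = Bool[1 0 0;
                       0 1 0;
                       1 1 0;
                       0 0 0]

# Number of clusters each individual belongs to
membershipCounts = vec(sum(toyClusterArray, dims = 2))

unassigned = findall(==(0), membershipCounts)
multiplyAssigned = findall(>(1), membershipCounts)

println("Total memberships: ", sum(toyClusterArray), " for ", size(toyClusterArray, 1), " individuals")
println("Unassigned individuals: ", unassigned)
println("Individuals in more than one cluster: ", multiplyAssigned)
```

Here the totals agree (4 memberships for 4 individuals) even though the assignment is not one cluster per individual, which is the case the per-individual counts are meant to catch.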

Show just the west area (without nitidus)

clusterNamesWithHetsWest = ["virLud",
                        "virLud_troch",
                        "troch"]

clusterColorsWithHetsWest = ["blue",
                        "yellowgreen",
                        "yellow"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = [100, 100, 100]

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
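
The SNP filter that is reapplied above for the western groups keeps only SNPs where the allele frequency is above 0.5 in at least one group and below 0.5 in at least one group. A small sketch on toy frequencies shows which kinds of SNPs pass. The names `toyFreqs` and `keep` are hypothetical.

```julia
# Toy allele frequencies: rows are groups, columns are SNPs
# (the same layout as `freqs` in the code above).
toyFreqs = [0.9  0.1  0.6  0.5;
            0.2  0.3  0.8  0.5;
            0.4  0.0  1.0  0.5]

# Keep SNPs whose highest group frequency is above 0.5 and
# whose lowest group frequency is below 0.5.
keep = (vec(maximum(toyFreqs, dims = 1)) .> 0.5) .& (vec(minimum(toyFreqs, dims = 1)) .< 0.5)

println(keep)
println(toyFreqs[:, keep])
```

Only the first toy SNP passes: the second is rare in every group, the third is common in every group, and the fourth sits exactly at 0.5 in every group, so none of those three helps distinguish the groups being plotted.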

Show just the east area

clusterNamesWithHetsEast = ["troch",
                            "obs",
                            "plumb"]

clusterColorsWithHetsEast = ["yellow",
                            "orange",
                            "red"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Show just the northern area

clusterNamesWithHetsNorth = ["virLud",
                            "vir_plumb",
                            "plumb"]

clusterColorsWithHetsNorth = ["blue",
                            "purple",
                            "red"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = [100, 100, 100]

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Repeat the analysis above, but for chr 2:

This region does not show a very clear pattern (in terms of assigning homozygous and heterozygous haploblock genotypes), but we’ll see what it shows:

# choose scaffold
chr = "gw2"

positionMin, positionMax, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
    getWindowedIndHetStanRegion(genosOnly_included, 
                            pos_SNP_filtered, 
                            highViSHetRegions, chr;
                            windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add column to metadata containing the regionIndHetStan for this highHet region:
ind_with_metadata_included[!, chr * "_regionIndHetStan"] = meanAcrossRegionIndHetStan # column name is built from `chr`
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

# check whether missing data is related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:

genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))

flipPC1 = false
flipPC2 = true

PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else 
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.25) 

#Plot only the lowIndHetStan individuals:

f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
More than 1 region on that scaffold. Using just the longest one.
2×3 DataFrame
 Row │ regionChrom  regionStart  regionEnd
     │ String       Int64        Int64
─────┼─────────────────────────────────────
   1 │ gw2             54537375   59262130
   2 │ gw2             60234161   61533451
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}
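
The `flipPC1` and `flipPC2` settings differ between chromosomes because the sign of each principal component is arbitrary. Flipping an axis changes only the orientation of the plot, not the relationships among individuals. A minimal sketch on toy scores illustrates this. The helper names `flipPCs` and `pairDist` are hypothetical and are not part of the author's `plotPCA` function.

```julia
using LinearAlgebra

# Toy PC scores: rows are components, columns are individuals
# (the same orientation as `PCAmodelAll.values` in the code above).
scores = [ 1.0  -2.0   0.5;
           0.3   0.4  -0.7]

# Hypothetical helper: multiply selected component rows by -1.
function flipPCs(scores; flip1 = false, flip2 = false)
    flipped = copy(scores)
    flip1 && (flipped[1, :] .*= -1)
    flip2 && (flipped[2, :] .*= -1)
    return flipped
end

flipped = flipPCs(scores; flip1 = false, flip2 = true)

# Pairwise Euclidean distances between individuals are unchanged by the flip.
pairDist(m) = [norm(m[:, i] - m[:, j]) for i in axes(m, 2), j in axes(m, 2)]
println(pairDist(scores) ≈ pairDist(flipped))
```

One practical consequence is that the PC thresholds used further down to define clusters only make sense for the particular flip settings chosen here, so changing `flipPC1` or `flipPC2` would require mirroring those thresholds as well.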

Save the individual colors in the metadata

indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;

Plot PC1 vs. PC2

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

Plot PC1 vs. PC3

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

At the chr 2 high-ViSHet region, there are 5 clear haplogroups. Divide samples into those groups, based on PCA scores, and then calculate pi and Dxy.

clusterNames = ["virLud",
                "nit",
                "troch",
                "obs",
                "plumb"]

clusterColors = ["green",
                "grey",
                "yellowgreen",
                "orange",
                "red"]

virLud = (PCAmodelAll.metadata.PC1 .< -4) .& 
            indSelection_lowIndHetStan
nit = (-3 .< PCAmodelAll.metadata.PC1 .< -1) .&
        (0 .< PCAmodelAll.metadata.PC2 .< 2) .&
            indSelection_lowIndHetStan
troch = (-2.5 .< PCAmodelAll.metadata.PC1 .< 0) .&
            (PCAmodelAll.metadata.PC2 .< -5) .&
            indSelection_lowIndHetStan
obs = (-2 .< PCAmodelAll.metadata.PC1 .< 1) .&
            (-5 .< PCAmodelAll.metadata.PC2 .< -3) .& 
            indSelection_lowIndHetStan
plumb = (6 .< PCAmodelAll.metadata.PC1) .& 
            (1 .< PCAmodelAll.metadata.PC2) .&
            indSelection_lowIndHetStan

# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]

clusterArray = [virLud nit troch obs plumb]

# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")

# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")

# Calculate per-site pi (within-group nucleotide distance)

sitePi = getSitePi(freqs, sampleSizes)

# calculate pairwise Dxy per site, using data in `freqs` and groups in `clusterNames`

Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false)  # set among to FALSE if no among Fst wanted (some things won't work without it) 

# Now get averages of pi and Dxy for whole region:

regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 5×2 DataFrame
 Row │ cluster  pi         
     │ String   Float64    
─────┼─────────────────────
   1 │ virLud   0.0123364
   2 │ nit      0.00557103
   3 │ troch    0.00911341
   4 │ obs      0.00891506
   5 │ plumb    0.0086287 =#

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 10×2 DataFrame
 Row │ cluster_pair  Dxy       
     │ String        Float64   
─────┼─────────────────────────
   1 │ virLud_nit    0.0328534
   2 │ virLud_troch  0.0337586
   3 │ virLud_obs    0.0328064
   4 │ virLud_plumb  0.0416095
   5 │ nit_troch     0.0376123
   6 │ nit_obs       0.0363568
   7 │ nit_plumb     0.0456889
   8 │ troch_obs     0.0144702
   9 │ troch_plumb   0.0331178
  10 │ obs_plumb     0.0318128 =#

# Make a genotype-by-individual (GBI) plot using variable loci in the region
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5 
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]

# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
                missingFractionAllowed = missingFractionAllowed,
                indColorRightProvided = true);
The numbers in each group are [59 1 72 4 69] and the sum of those is 205
Calculated population allele frequencies and sample sizes

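The SNP filter used above keeps loci where the allele frequency is above 0.5 in at least one cluster and below 0.5 in at least one other, i.e. loci at which clusters differ in their majority allele. A minimal sketch of that mask on a toy matrix follows. The matrix `toyFreqs` and its values are made up for illustration; rows are clusters and columns are SNPs, matching how `freqs` is indexed above.

```julia
# Toy allele-frequency matrix (hypothetical values): 3 clusters × 4 SNPs
toyFreqs = [0.9 0.2 0.6 0.1;
            0.1 0.3 0.8 0.7;
            0.4 0.1 0.7 0.6]

# Column-wise maximum and minimum across clusters
maxFreq = vec(maximum(toyFreqs, dims=1))
minFreq = vec(minimum(toyFreqs, dims=1))

# Keep SNPs where at least one cluster is above 0.5 and at least one is below 0.5
toySelected = (maxFreq .> 0.5) .& (minFreq .< 0.5)

println(toySelected)                 # mask over SNPs
println(toyFreqs[:, toySelected])    # frequencies at the retained SNPs
```

In this toy case the first and fourth SNPs are retained; the second never exceeds 0.5 in any cluster and the third never drops below it.
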
Now show a GBI plot like above, but with heterozygotes

clusterNamesWithHets = ["virLud",
                        "virLudHet",
                        "nit",
                        "nitHet",
                        "virLud_troch",
                        "troch",
                        "trochHet",
                        "obs",
                        "obs_plumb",
                        "plumb",
                        "plumbHet",
                        "vir_plumb"]

clusterColorsWithHets = ["blue",
                        "blue",
                        "grey",
                        "grey",
                        "yellowgreen",
                        "yellow",
                        "yellow",
                        "orange",
                        "darkorange1",
                        "red",
                        "red",
                        "purple"]

virLudHet = (PCAmodelAll.metadata.PC1 .< -4) .& 
            (2.5 .< PCAmodelAll.metadata.PC2 .< 7) .&
            .!indSelection_lowIndHetStan
nitHet = (-3 .< PCAmodelAll.metadata.PC1 .< -2) .&
        (0.5 .< PCAmodelAll.metadata.PC2 .< 1.5) .&
            .!indSelection_lowIndHetStan
virLud_troch = (-5 .< PCAmodelAll.metadata.PC1 .< -2) .&
                (-3 .< PCAmodelAll.metadata.PC2 .< 1) .&
                 .!indSelection_lowIndHetStan
trochHet = (-2.5 .< PCAmodelAll.metadata.PC1 .< 0) .&
            (PCAmodelAll.metadata.PC2 .< -5) .&
            .!indSelection_lowIndHetStan
obs_plumb = (2 .< PCAmodelAll.metadata.PC1 .< 3) .&
                (-3 .< PCAmodelAll.metadata.PC2 .< 2) .&
                 .!indSelection_lowIndHetStan
plumbHet = (6 .< PCAmodelAll.metadata.PC1) .& 
            (1 .< PCAmodelAll.metadata.PC2) .&
            .!indSelection_lowIndHetStan
vir_plumb = (-3 .< PCAmodelAll.metadata.PC1 .< 3) .&
                (2 .< PCAmodelAll.metadata.PC2 .< 5) .&
                 .!indSelection_lowIndHetStan

clusterArray = [virLud virLudHet nit nitHet virLud_troch troch trochHet obs obs_plumb plumb plumbHet vir_plumb]

sum(clusterArray, dims=1)

if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else 
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end

# check which individuals left out:

sum(clusterArray, dims=2)

PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]

# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end

# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above

# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets

# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals

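The check above compares the total number of cluster assignments with the number of individuals. That total can still match if one individual falls into two clusters while another falls into none. A stricter check, sketched here on made-up PC1 values (all names in this block are illustrative), counts assignments per individual:

```julia
# Hypothetical PC1 scores for four individuals
toyPC1 = [-5.0, -3.0, 0.0, 6.0]

# Deliberately overlapping thresholds, to show the failure mode
groupA = toyPC1 .< -2
groupB = -4 .< toyPC1 .< 1
groupC = toyPC1 .> 10

toyClusterArray = [groupA groupB groupC]   # individuals × clusters

# Number of clusters each individual was assigned to
assignmentCounts = vec(sum(toyClusterArray, dims=2))

totalMatches = sum(assignmentCounts) == length(toyPC1)   # the total-based check
unassigned = findall(==(0), assignmentCounts)            # individuals in no cluster
multiplyAssigned = findall(>(1), assignmentCounts)       # individuals in more than one cluster

println("Total matches: ", totalMatches)
println("Unassigned: ", unassigned, "; assigned more than once: ", multiplyAssigned)
```

Here the total-based check passes even though the second individual is assigned twice and the fourth is not assigned at all. In the membership loop used on this page, an individual assigned to more than one cluster would end up labelled with the last matching cluster.
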
Show GBI plot according to original groups and plot order

#PCAmodelAll.metadata.Fst_group = PCAmodelAll.metadata.original_Fst_groups
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);

Show the same plot, but with all individuals

#PCAmodelAll.metadata.Fst_group = PCAmodelAll.metadata.original_Fst_groups
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);

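`limitIndsToPlot` is defined outside this page, so its internals are not shown here. As a rough illustration of what capping the number of individuals per group can look like, the sketch below keeps up to a fixed number of individuals per group, preferring those with the fewest missing genotypes. The helper `pickIndsPerGroup` and the toy data are hypothetical, and the real function may select or order individuals differently.

```julia
# Hypothetical helper: row indices of up to `maxPerGroup` individuals per group,
# taking those with the fewest missing genotypes first.
function pickIndsPerGroup(groups, groupNames, numMissing, maxPerGroup)
    keep = Int[]
    for name in groupNames
        idx = findall(==(name), groups)          # individuals in this group
        idx = idx[sortperm(numMissing[idx])]     # fewest missing genotypes first
        append!(keep, idx[1:min(maxPerGroup, length(idx))])
    end
    return keep
end

# Made-up group labels and missing-genotype counts for six individuals
toyGroupLabels = ["virLud", "plumb", "virLud", "plumb", "virLud", "none"]
toyNumMissing = [5, 2, 1, 9, 3, 0]

limited = pickIndsPerGroup(toyGroupLabels, ["virLud", "plumb"], toyNumMissing, 2)
unlimited = pickIndsPerGroup(toyGroupLabels, ["virLud", "plumb"], toyNumMissing, 1000)
println(limited)
println(unlimited)
```

Setting the cap far above any group size, as with `fill(1000, ...)` above, simply returns every individual that belongs to one of the named groups.
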
Chr 2 is complicated: it shows recombination and perhaps some haploblock sharing between east Siberia and the southwestern area. This is hard to show in a summary figure but should perhaps be mentioned.

Same for chr 3

# choose scaffold
chr = "gw3"

positionMin, positionMax, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
    getWindowedIndHetStanRegion(genosOnly_included, 
                            pos_SNP_filtered, 
                            highViSHetRegions, chr;
                            windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

#names(ind_with_metadata_included)

# check whether missing data is related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:

genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))

flipPC1 = false
flipPC2 = true

PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else 
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.25) 

#Plot only the lowIndHetStan individuals:

f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
More than 1 region on that scaffold. Using just the longest one.
2×3 DataFrame
 Row │ regionChrom  regionStart  regionEnd
     │ String       Int64        Int64
─────┼─────────────────────────────────────
   1 │ gw3            101192949  103495514
   2 │ gw3            104554714  108279595

CairoMakie.Screen{IMAGE}
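
The cell above creates a column whose name depends on `chr` by building a command string and evaluating it. A possible alternative, assuming a DataFrames.jl version that accepts strings as column names, is to index the data frame directly. The data frame `toyMetadata` and its values below are made up for illustration:

```julia
using DataFrames

toyMetadata = DataFrame(ind = ["ind1", "ind2", "ind3"])   # stand-in for ind_with_metadata_included
toyChr = "gw3"
toyValues = [0.9, 1.4, 1.1]                               # stand-in for meanAcrossRegionIndHetStan

# Build the column name as a string and assign the column directly, without eval
toyMetadata[!, toyChr * "_regionIndHetStan"] = toyValues

println(names(toyMetadata))
```

This avoids parsing code from a string, which would otherwise fail if `chr` ever contained characters that are not valid in an identifier.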

Save the individual colors in the metadata

indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
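
The loop above looks up each individual's color with `findfirst`, which returns `nothing` (and so leads to an indexing error) if a group name is absent from `groups_to_plot_PCA`. A sketch of the same lookup with a `Dict` follows, using made-up group names and colors and a hypothetical fallback color for unknown groups:

```julia
toyGroups = ["vir", "plumb", "troch"]          # stand-in for groups_to_plot_PCA
toyGroupColors = ["blue", "red", "yellow"]     # stand-in for group_colors_PCA
toyFstGroup = ["plumb", "vir", "vir", "troch", "unknown"]

# Map each group name to its color
colorOfGroup = Dict(zip(toyGroups, toyGroupColors))

# Look up the color per individual; "black" is an arbitrary fallback
toyIndColors = [get(colorOfGroup, g, "black") for g in toyFstGroup]
println(toyIndColors)
```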

Plot PC1 vs. PC2

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)

CairoMakie.Screen{IMAGE}

Plot PC1 vs. PC3

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)

CairoMakie.Screen{IMAGE}
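
The `flipPC1` and `flipPC2` switches used above only change the orientation of the plots: the sign of a principal component is arbitrary, so multiplying the scores by -1 leaves the distances between individuals unchanged. A small sketch of this with a plain SVD-based PCA on a made-up genotype matrix follows. This is not the `plotPCA` function used on this page, and here individuals are rows, whereas `PCAmodelAll.values` above has individuals as columns.

```julia
using LinearAlgebra, Statistics

# Hypothetical genotypes: 5 individuals (rows) × 4 SNPs (columns), coded 0/1/2
toyGenos = Float64[0 0 2 2;
                   0 1 2 2;
                   2 2 0 0;
                   2 1 0 0;
                   1 1 1 1]

# Center each SNP, then obtain PC scores from the singular value decomposition
centered = toyGenos .- mean(toyGenos, dims=1)
F = svd(centered)
scores = F.U .* F.S'          # individuals × PCs

toyPC1 = scores[:, 1]
toyPC1flipped = -1 .* toyPC1

# Pairwise distances along PC1 are identical before and after the flip
pairDist(v) = [abs(a - b) for a in v, b in v]
println(pairDist(toyPC1) ≈ pairDist(toyPC1flipped))
```

Because the sign is arbitrary, the PC thresholds used to define clusters on this page are only meaningful together with the specific flip settings chosen for each region.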

In the chr 3 high-ViSHet region, there are only 4 clear haplogroups (vir and lud are somewhat separated on PC3, but not cleanly enough for me to distinguish them). Divide the samples into those groups based on PCA scores, and then calculate pi and Dxy.

clusterNames = ["virLud",
                "nit",
                "trochObs",
                "plumb"]

clusterColors = ["blue",
                "grey",
                "yellow",
                "red"]

virLud = (PCAmodelAll.metadata.PC1 .< -4) .&
            indSelection_lowIndHetStan
nit = (-4 .< PCAmodelAll.metadata.PC1 .< -2) .&
            indSelection_lowIndHetStan
trochObs = (-2 .< PCAmodelAll.metadata.PC1 .< 2.5) .&
            (PCAmodelAll.metadata.PC2 .< -3) .&
            indSelection_lowIndHetStan
plumb = (5 .< PCAmodelAll.metadata.PC1) .& 
            (2 .< PCAmodelAll.metadata.PC2) .&
            indSelection_lowIndHetStan

# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[trochObs]
PCAmodelAll.metadata.Fst_group[plumb]

clusterArray = [virLud nit trochObs plumb]

# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")

# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")

# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)

# calculate pairwise Dxy per site, using data in `freqs` and group names in `clusterNames`
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false)  # set `among` to `false` if no among-group Fst is wanted (some things won't work without it)

# Now get averages of pi and Dxy for whole region:

regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 4×2 DataFrame
 Row │ cluster   pi         
     │ String    Float64    
─────┼──────────────────────
   1 │ virLud    0.00950795
   2 │ nit       0.00509165
   3 │ trochObs  0.00992915
   4 │ plumb     0.00992294 =#

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 6×2 DataFrame
 Row │ cluster_pair     Dxy       
     │ String           Float64   
─────┼────────────────────────────
   1 │ virLud_nit       0.0234761
   2 │ virLud_trochObs  0.0309999
   3 │ virLud_plumb     0.0345515
   4 │ nit_trochObs     0.0320461
   5 │ nit_plumb        0.0351086
   6 │ trochObs_plumb   0.0305924 =#

# Make a genotype-by-individual (GBI) plot using variable loci in the region (filtered below):
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder

# limit the SNPs to those where the variant allele frequency is greater than 50%
# in at least one group and less than 50% in at least one other group.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]

# limit the number of individuals per group to plot
numIndsToPlot = fill(150, length(clusterNames))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = false)

# sort by original_plot_order; together with the plotting function below, this arranges individuals in population order within clusters:
sortOrder = sortperm(indMetadataforGBI.original_plot_order, rev=false)
indMetadataforGBI = indMetadataforGBI[sortOrder, :]
genosForGBI = genosForGBI[sortOrder, :]

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
                indFontSize=6, figureSize=(800, 1800),
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = true,
                indColorRightProvided = true);
The numbers in each group are [64 2 72 63] and the sum of those is 201
Calculated population allele frequencies and sample sizes

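`getSitePi`, `getDxy`, `getRegionPi` and `getRegionDxy` are defined outside this page. For orientation, the sketch below uses textbook per-site estimators for a biallelic SNP and takes the regional value as the mean across sites. The function names, the toy data, and the assumption that sample sizes count sampled chromosomes are all illustrative, and the functions used above may differ in detail (for example in how missing data or invariant sites are handled).

```julia
using Statistics

# Per-site within-group diversity from allele frequency p and
# n sampled chromosomes (assumed here; includes a small-sample correction)
sitePiSketch(p, n) = 2 * p * (1 - p) * n / (n - 1)

# Per-site between-group distance from the two groups' allele frequencies
siteDxySketch(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

# Hypothetical data: 2 groups (rows) × 3 SNPs (columns)
toyGroupFreqs = [0.0 0.5 1.0;
                 1.0 0.5 1.0]
toyChromCounts = [10 10 10;
                  8  8  8]

toySitePi = sitePiSketch.(toyGroupFreqs, toyChromCounts)
toySiteDxy = siteDxySketch.(toyGroupFreqs[1, :], toyGroupFreqs[2, :])

# Regional values as simple means across sites
toyRegionPi = vec(mean(toySitePi, dims=2))
toyRegionDxy = mean(toySiteDxy)

println(toyRegionPi)
println(toyRegionDxy)
```

In the toy data, the first SNP is fixed for different alleles in the two groups (per-site Dxy of 1 and pi of 0 in both), which is the pattern expected between distinct haploblocks: Dxy well above pi, as in the tables above.
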
Now show a GBI plot like above, but with heterozygotes

clusterNamesWithHets = ["virLud",
                        "virLudHet",
                        "nit",
                        "virLud_trochObs",
                        "trochObs",
                        "trochObsHet",
                        "plumb",
                        "plumbHet",
                        "vir_plumb"]

clusterColorsWithHets = ["blue",
                        "blue",
                        "grey",
                        "green",
                        "yellow",
                        "orange",
                        "red",
                        "red",
                        "purple"]

virLudHet = (PCAmodelAll.metadata.PC1 .< -4) .& 
            (2.5 .< PCAmodelAll.metadata.PC2) .&
            .!indSelection_lowIndHetStan
virLud_trochObs = (-5 .< PCAmodelAll.metadata.PC1 .< -1.5) .&
                (-3 .< PCAmodelAll.metadata.PC2 .< 1) .&
                 .!indSelection_lowIndHetStan
trochObsHet = (-2 .< PCAmodelAll.metadata.PC1 .< 2.5) .&
            (PCAmodelAll.metadata.PC2 .< -3) .&
            .!indSelection_lowIndHetStan
plumbHet = (5 .< PCAmodelAll.metadata.PC1) .& 
            (2 .< PCAmodelAll.metadata.PC2) .&
            .!indSelection_lowIndHetStan
vir_plumb = (-1 .< PCAmodelAll.metadata.PC1 .< 2) .&
                (3 .< PCAmodelAll.metadata.PC2 .< 5) .&
                 .!indSelection_lowIndHetStan

clusterArray = [virLud virLudHet nit virLud_trochObs trochObs trochObsHet plumb plumbHet vir_plumb]

sum(clusterArray, dims=1)

if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else 
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end

# check which individuals left out:
sum(clusterArray, dims=2)

PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]

# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end

# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above

# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets

# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals

Show just the west area (without nitidus)

clusterNamesWithHetsWest = ["virLud",
                        "virLudHet",
                        "virLud_trochObs",
                        "trochObs",
                        "trochObsHet"]

clusterColorsWithHetsWest = ["blue",
                        "blue",
                        "yellowgreen",
                        "yellow",
                        "yellow"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsWest))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes

Show just the east area

clusterNamesWithHetsEast = ["trochObs",
                            "trochObsHet",
                            "plumb",
                            "plumbHet"]

clusterColorsWithHetsEast = ["yellow",
                            "yellow",
                            "red",
                            "red"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes

Show just the northern area

clusterNamesWithHetsNorth = ["virLud",
                            "virLudHet",
                            "vir_plumb",
                            "plumb",
                            "plumbHet"]

clusterColorsWithHetsNorth = ["blue",
                            "blue",
                            "purple",
                            "red",
                            "red"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsNorth))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes

Same for chr 18

# choose scaffold
chr = "gw18"

positionMin, positionMax, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
    getWindowedIndHetStanRegion(genosOnly_included, 
                            pos_SNP_filtered, 
                            highViSHetRegions, chr;
                            windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add column to metadata containing the regionIndHetStan for this highHet region
# (the column name is built from `chr`, e.g. "gw18_regionIndHetStan"):
ind_with_metadata_included[!, chr * "_regionIndHetStan"] = meanAcrossRegionIndHetStan
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

# check whether the amount of missing data is related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:

genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))

flipPC1 = true
flipPC2 = true

PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else 
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.55) 

#Plot only the lowIndHetStan individuals:

f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Good news: 1 region on that scaffold

CairoMakie.Screen{IMAGE}
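
The `flipPC1` and `flipPC2` settings above only reverse the sign of a PC axis. The sign of a principal component is arbitrary, so flipping changes the orientation of the plot but not the distances between individuals. Note, though, that any numeric PC thresholds chosen later depend on the flip settings in use. The following is a minimal sketch of that idea on toy data, using an SVD from the standard library rather than the `plotPCA` function used on this page; all `toy...` names are made up for illustration, and scores here are stored as individuals × PCs (whereas `PCAmodelAll.values` above is indexed as PCs × individuals).

```julia
using LinearAlgebra  # for svd() and Diagonal()
using Statistics     # for mean()

# Toy genotype matrix: 4 individuals (rows) x 3 SNPs (columns), no missing data
toyGenos = Float64[0 0 2;
                   0 1 2;
                   2 2 0;
                   2 1 0]

# Centre each SNP, then obtain PC scores from the SVD (scores = U * Diagonal(S))
centred = toyGenos .- mean(toyGenos, dims=1)
F = svd(centred)
scores = F.U * Diagonal(F.S)   # individuals x PCs

# Flip PC1, as done above when flipPC1 = true
flippedPC1 = -1 .* scores[:, 1]

# Pairwise distances along PC1 are the same before and after the flip
distOriginal = [abs(a - b) for a in scores[:, 1], b in scores[:, 1]]
distFlipped = [abs(a - b) for a in flippedPC1, b in flippedPC1]
println(distOriginal ≈ distFlipped)
```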

Save the individual colors in the metadata

indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
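
In the loop above, `findfirst` returns `nothing` when an individual's `Fst_group` is not listed in `groups_to_plot_PCA`, and the lookup then fails with an indexing error. A possible alternative, sketched here on toy data, is to build a `Dict` from group to color and report unmatched groups explicitly. The `toy...` names and `colorOfGroup` are made up for illustration and are not part of the analysis code.

```julia
# Toy stand-ins for groups_to_plot_PCA, group_colors_PCA and the Fst_group column
toyGroups = ["vir", "troch", "plumb"]
toyColors = ["blue", "yellow", "red"]
toyFstGroup = ["plumb", "vir", "vir", "troch"]

# Map each group name to its color
colorOfGroup = Dict(zip(toyGroups, toyColors))

# Report any group labels that have no color, rather than failing inside the lookup
unmatched = setdiff(unique(toyFstGroup), toyGroups)
isempty(unmatched) || error("No color defined for groups: $(unmatched)")

# One color per individual
toyIndColors = [colorOfGroup[g] for g in toyFstGroup]
println(toyIndColors)
```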

Plot PC1 vs. PC2

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)

CairoMakie.Screen{IMAGE}

Plot PC1 vs. PC3

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)

CairoMakie.Screen{IMAGE}

In the chr 18 high-ViSHet region there are 5 clear haplogroups. (vir and lud also separate on PC3, but not clearly enough to treat them as distinct in the summary plot, so they are combined here as virLud.) Divide the samples into those groups based on their PCA scores, then calculate pi and Dxy.
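
As a rough illustration of the two statistics, the sketch below computes per-site pi and Dxy from allele frequencies using the standard textbook formulas. The functions `sketchSitePi` and `sketchSiteDxy` are hypothetical helpers written for this example. They are not the `getSitePi` and `getDxy` functions called below, which may treat missing data, sample sizes, and invariant sites differently.

```julia
using Statistics  # for mean()

# Hypothetical helper: unbiased per-site nucleotide diversity from an allele
# frequency p and the number of diploid individuals sampled (2 chromosomes each).
function sketchSitePi(p::Real, nInds::Integer)
    nChrom = 2 * nInds
    nChrom < 2 && return NaN   # pi is undefined with fewer than 2 chromosomes
    return 2 * p * (1 - p) * nChrom / (nChrom - 1)
end

# Hypothetical helper: per-site between-group distance, i.e. the probability that
# one allele drawn from each group differ
sketchSiteDxy(p1::Real, p2::Real) = p1 * (1 - p2) + p2 * (1 - p1)

# Toy data: rows = 2 clusters, columns = 3 SNPs
toyFreqs = [0.0 0.5 1.0;
            1.0 0.5 1.0]
toySampleSizes = [10 10 10;
                  10 10 10]

toyPi = sketchSitePi.(toyFreqs, toySampleSizes)
toyDxy = sketchSiteDxy.(toyFreqs[1, :], toyFreqs[2, :])

println("per-site pi: ", toyPi)
println("per-site Dxy: ", toyDxy)
println("mean Dxy across the toy SNPs: ", mean(toyDxy))
```

In this toy example the first SNP is fixed for different alleles in the two clusters (Dxy of 1 and pi of 0 in both), while the third SNP is fixed for the same allele in both (Dxy of 0).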

clusterNames = ["virLud",
                "nit",
                "troch",
                "obs",
                "plumb"]

clusterColors = ["blue",
                "grey",
                "yellow",
                "orange",
                "red"]

virLud = (PCAmodelAll.metadata.PC1 .< -7) .& 
            indSelection_lowIndHetStan
nit = (-6 .< PCAmodelAll.metadata.PC1 .< -4) .&
            indSelection_lowIndHetStan
troch = (2 .< PCAmodelAll.metadata.PC1 .< 5) .&
            (PCAmodelAll.metadata.PC2 .< -5) .&
            indSelection_lowIndHetStan
obs = (2 .< PCAmodelAll.metadata.PC1 .< 5) .&
            (-5 .< PCAmodelAll.metadata.PC2 .< -2) .&
            indSelection_lowIndHetStan
plumb = (4 .< PCAmodelAll.metadata.PC1) .& 
            (3 .< PCAmodelAll.metadata.PC2) .&
            indSelection_lowIndHetStan

# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]

clusterArray = [virLud nit troch obs plumb]

# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")

# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")

# Calculate per-site pi (within-group nucleotide distance)

sitePi = getSitePi(freqs, sampleSizes)

# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"

Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false)  # set among to FALSE if no among Fst wanted (some things won't work without it) 

# Now get averages of pi and Dxy for whole region:

regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 5×2 DataFrame
 Row │ cluster  pi         
     │ String   Float64    
─────┼─────────────────────
   1 │ virLud   0.0110074
   2 │ nit      0.00453689
   3 │ troch    0.00973106
   4 │ obs      0.0123218
   5 │ plumb    0.00925472 =#

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 10×2 DataFrame
 Row │ cluster_pair  Dxy       
     │ String        Float64   
─────┼─────────────────────────
   1 │ virLud_nit    0.0263493
   2 │ virLud_troch  0.0361335
   3 │ virLud_obs    0.0359267
   4 │ virLud_plumb  0.0395363
   5 │ nit_troch     0.0371472
   6 │ nit_obs       0.0377076
   7 │ nit_plumb     0.0400618
   8 │ troch_obs     0.0169656
   9 │ troch_plumb   0.0287838
  10 │ obs_plumb     0.0290661 =#

# Make a genotype-by-individual plot using all variable loci in the region:
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5 
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]

# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
                missingFractionAllowed = missingFractionAllowed,
                indColorRightProvided = true);
The numbers in each group are [72 2 76 4 70] and the sum of those is 224
Calculated population allele frequencies and sample sizes
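
The SNP filter used for the genotype-by-individual plot above keeps a SNP only if its allele frequency is above 0.5 in at least one cluster and below 0.5 in at least one other. A small worked example on made-up frequencies (the `toy...` names are for illustration only):

```julia
# Toy allele frequencies: rows = 3 clusters, columns = 4 SNPs
toyFreqs = [0.9 0.6 0.1 0.5;
            0.1 0.7 0.2 0.5;
            0.8 0.9 0.4 0.5]

# Same expression as used for `selectedSNPs` above:
# column maximum must exceed 0.5 and column minimum must be below 0.5
toySelected = (vec(maximum(toyFreqs, dims=1)) .> 0.5) .& (vec(minimum(toyFreqs, dims=1)) .< 0.5)
println(toySelected)
```

Only the first toy SNP passes. The second is above 0.5 in every cluster, the third is below 0.5 in every cluster, and the fourth sits at exactly 0.5, which the strict inequalities exclude.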

Now show a GBI plot like above, but with heterozygotes

clusterNamesWithHets = ["virLud",
                        "nit",
                        "virLud_troch",
                        "troch",
                        "obs",
                        "obs_plumb",
                        "plumb",
                        "vir_plumb"]

clusterColorsWithHets = ["blue",
                        "grey",
                        "green",
                        "yellow",
                        "orange",
                        "darkorange1",
                        "red",
                        "purple"]

virLud_troch = (-5.5 .< PCAmodelAll.metadata.PC1 .< -0.5) .&
                (-4 .< PCAmodelAll.metadata.PC2 .< -0.25) .&
                 .!indSelection_lowIndHetStan
obs_plumb = (4 .< PCAmodelAll.metadata.PC1 .< 5) .& 
            (0 .< PCAmodelAll.metadata.PC2 .< 2) .&
            .!indSelection_lowIndHetStan
vir_plumb = (-3 .< PCAmodelAll.metadata.PC1 .< -1) .&
                (2.5 .< PCAmodelAll.metadata.PC2 .< 5) .&
                 .!indSelection_lowIndHetStan

clusterArray = [virLud nit virLud_troch troch obs obs_plumb plumb vir_plumb]

sum(clusterArray, dims=1)

if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else 
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end

# check which individuals left out:

sum(clusterArray, dims=2)

PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]

# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end

# Add column to main metadata object containing the cluster membership for this highHet region
# (the column name is built from `chr`, e.g. "gw18_cluster"):
ind_with_metadata_included[!, chr * "_cluster"] = clusterMembershipWithHets

# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets

# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
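
The check above compares totals, so in principle it could pass if one individual fell in two clusters while another fell in none. In that case the loop that fills `clusterMembershipWithHets` would assign the later cluster to the doubly matched individual. A stricter check, sketched here on a toy membership matrix (all names are made up for illustration), counts cluster memberships per individual:

```julia
# Toy membership matrix: rows = individuals, columns = clusters.
# Individual 2 falls in two clusters and individual 3 falls in none.
toyClusterArray = Bool[1 0 0;
                       1 1 0;
                       0 0 0;
                       0 0 1]

# Number of clusters each individual belongs to
membershipCounts = vec(sum(toyClusterArray, dims=2))

# The totals-only check passes here even though the assignment is flawed
println(sum(toyClusterArray) == size(toyClusterArray, 1))

# The per-individual check finds both problems
inNoCluster = findall(membershipCounts .== 0)
inSeveralClusters = findall(membershipCounts .> 1)
println("unassigned: ", inNoCluster, "; in more than one cluster: ", inSeveralClusters)
```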

Show GBI plot according to original groups and plot order

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);

Show same but with all individuals

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);

Show same but with only vir and plumb pops

includeTheseClusters = ["virLud", "plumb"] # these are the haplotype clusters to include in the choice below of SNPs to show

# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)

selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]

plotGroups = ["vir", "plumb", "plumb_vir"] # these are the original Fst_groups
plotGroupColors = ["blue", "red", "purple"]

metadataForGBI = copy(PCAmodelAll.metadata)

metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups

plotGenotypeByIndividual(regionInfo, posForGBI,
                genosForGBI, metadataForGBI, freqsForGBI, plotGroups, plotGroupColors;
                missingFractionAllowed = missingFractionAllowed)

(Scene (768px, 960px):
  0 Plots
  2 Child Scenes:
    ├ Scene (768px, 960px)
    └ Scene (768px, 960px), Union{Missing, Int16}[0 0 … 0 0; 0 0 … 0 0; … ; 2 2 … 0 2; 2 2 … 1 2], [6108876, 6293642, 6414325, 6456723, 6456819, 6477300, 6631300, 6812016, 6812064, 6838561  …  8668147, 8681264, 8750618, 8750639, 8750642, 8772183, 8773241, 8773281, 8784596, 8833826], 100×32 DataFrame
 Row  ind                        ID                         location  group   ⋯
     │ String                     String                     String7   String1 ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ GW_Armando_plate1_JF12G04  GW_Armando_plate1_JF12G04  ST_vi     vir     ⋯
   2 │ GW_Armando_plate2_JF03G01  GW_Armando_plate2_JF03G01  ST_vi     vir_mis
   3 │ GW_Armando_plate2_JF30G01  GW_Armando_plate2_JF30G01  ST_vi     vir_mis
   4 │ GW_Lane5_STvi1             GW_Lane5_STvi1             ST_vi     vir
   5 │ GW_Lane5_STvi2             GW_Lane5_STvi2             ST_vi     vir     ⋯
   6 │ GW_Lane5_STvi3             GW_Lane5_STvi3             ST_vi     vir
   7 │ GW_Armando_plate1_JF16G01  GW_Armando_plate1_JF16G01  DV_vi     plumb_v
   8 │ GW_Armando_plate2_JF16G02  GW_Armando_plate2_JF16G02  DV_vi     plumb_v
   9 │ GW_Armando_plate2_JE31G01  GW_Armando_plate2_JE31G01  VB_vi     vir_mis ⋯
  10 │ GW_Armando_plate2_JF03G02  GW_Armando_plate2_JF03G02  VB_vi     vir_mis
  11 │ GW_Lane5_YK11              GW_Lane5_YK11              YK        vir
  ⋮  │             ⋮                          ⋮                 ⋮          ⋮   ⋱
  91 │ GW_Armando_plate2_JF24G01  GW_Armando_plate2_JF24G01  VB        plumb
  92 │ GW_Armando_plate2_JF25G01  GW_Armando_plate2_JF25G01  VB        plumb   ⋯
  93 │ GW_Armando_plate1_JG02G02  GW_Armando_plate1_JG02G02  PR        plumb
  94 │ GW_Armando_plate1_JG02G04  GW_Armando_plate1_JG02G04  PR        plumb
  95 │ GW_Armando_plate2_JG01G01  GW_Armando_plate2_JG01G01  PR        plumb
  96 │ GW_Armando_plate2_JG02G01  GW_Armando_plate2_JG02G01  PR        plumb   ⋯
  97 │ GW_Armando_plate2_JG02G03  GW_Armando_plate2_JG02G03  PR        plumb
  98 │ GW_Lane5_SL1               GW_Lane5_SL1               SL        plumb
  99 │ GW_Lane5_SL2               GW_Lane5_SL2               SL        plumb
 100 │ GW_Armando_plate1_JF10G03  GW_Armando_plate1_JF10G03  ST        plumb_v ⋯
                                                  29 columns and 79 rows omitted)

Show just the west area (without nitidus)

clusterNamesWithHetsWest = ["virLud",
                        "virLud_troch",
                        "troch"]

clusterColorsWithHetsWest = ["blue",
                        "green",
                        "yellow"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsWest))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
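
The `limitIndsToPlot` calls in this section cap the number of individuals shown per cluster, with `sortByMissing = true`. The source of that function is not shown on this page, so the following is only a guess at the general idea: for each cluster, keep up to a maximum number of individuals, preferring those with the fewest missing genotypes. `sketchLimitInds` and the toy data are hypothetical, and the real function may behave differently.

```julia
# Hypothetical sketch: return row indices of at most `maxPerGroup` individuals per
# group, choosing those with the fewest missing genotypes first.
function sketchLimitInds(groupLabels::Vector{String}, genos::AbstractMatrix,
                         groupsToKeep::Vector{String}, maxPerGroup::Int)
    numMissing = vec(sum(ismissing.(genos), dims=2))   # missing genotypes per individual
    keptRows = Int[]
    for g in groupsToKeep
        rows = findall(groupLabels .== g)
        rowsSorted = rows[sortperm(numMissing[rows])]  # fewest missing first
        append!(keptRows, first(rowsSorted, maxPerGroup))
    end
    return keptRows
end

# Toy data: 4 individuals (rows) x 3 SNPs (columns)
toyLabels = ["virLud", "virLud", "virLud", "troch"]
toyGenos = Union{Missing, Int16}[0 missing 2;
                                 0 0 2;
                                 missing missing 2;
                                 2 2 0]

toyKept = sketchLimitInds(toyLabels, toyGenos, ["virLud", "troch"], 2)
println(toyKept)
```

With a limit of 2 per group, the toy virLud individual with two missing genotypes (row 3) is dropped.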

Show just the east area

clusterNamesWithHetsEast = ["obs",
                            "obs_plumb",
                            "plumb"]

clusterColorsWithHetsEast = ["orange", # obs is orange in clusterColors and clusterColorsWithHets above
                            "darkorange1",
                            "red"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes

Show just the northern area

clusterNamesWithHetsNorth = ["virLud",
                            "vir_plumb",
                            "plumb"]

clusterColorsWithHetsNorth = ["blue",
                            "purple",
                            "red"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsNorth))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes

Tried chr 12, but the result is something of a mess: the ludlowi samples fall in all clusters, even plumb! It would be good to look at this chromosome more closely in the future.

Same for chr 13

# choose scaffold
chr = "gw13"

positionMin, positionMax, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
    getWindowedIndHetStanRegion(genosOnly_included, 
                            pos_SNP_filtered, 
                            highViSHetRegions, chr;
                            windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add column to metadata containing the regionIndHetStan for this highHet region
# (the column name is built from `chr`, e.g. "gw13_regionIndHetStan"):
ind_with_metadata_included[!, chr * "_regionIndHetStan"] = meanAcrossRegionIndHetStan
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

# check whether the amount of missing data is related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:

genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))

flipPC1 = true
flipPC2 = true

PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else 
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.75) 

#Plot only the lowIndHetStan individuals:

f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
More than 1 region on that scaffold. Using just the longest one.
3×3 DataFrame
 Row │ regionChrom  regionStart  regionEnd
     │ String       Int64        Int64
─────┼─────────────────────────────────────
   1 │ gw13            13574177   13722280
   2 │ gw13            14099239   15243036
   3 │ gw13            15413381   15607553

CairoMakie.Screen{IMAGE}

Save the individual colors in the metadata
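
As a minimal sketch of the same group-to-color lookup, the toy example below builds a `Dict` once and indexes it for each individual. The vectors `groups`, `colors`, and `fstGroup` are invented stand-ins for `groups_to_plot_PCA`, `group_colors_PCA`, and the `Fst_group` column; they are assumptions for illustration, not values from the data set.

```julia
# Toy stand-ins (assumptions) for `groups_to_plot_PCA`, `group_colors_PCA`,
# and the `Fst_group` metadata column.
groups = ["vir", "lud", "plumb"]
colors = ["blue", "green", "red"]
fstGroup = ["plumb", "vir", "vir", "lud"]

# Build the group => color lookup once, then index it per individual.
colorOfGroup = Dict(zip(groups, colors))
indColorsToy = [colorOfGroup[g] for g in fstGroup]
println(indColorsToy)
```

Like the `findfirst` version in the cell that follows, this throws an error if an individual belongs to a group that is not in the list of plotted groups, so the requirement that all individuals are in `groups_to_plot_PCA` still applies.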

indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;

Plot PC1 vs. PC2

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)

CairoMakie.Screen{IMAGE}

Plot PC1 vs. PC3

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)

CairoMakie.Screen{IMAGE}

In the chr 13 high-ViSHet region there are 6 clear haplogroups (vir and lud separate cleanly on PC3, with one heterozygote between them). Divide the samples into those groups based on their PCA scores, then calculate pi and Dxy.
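
The threshold-based grouping can be sketched on toy data as follows. The PC scores, the `lowHet` mask, and the names ending in `Toy` are invented for illustration (assumptions, not values from the analysis); only the pattern of combining broadcast comparisons into Bool masks mirrors the cell below.

```julia
# Toy PC scores and low-heterozygosity mask (assumption: invented values).
PC1 = [-6.0, -5.5, 0.5, 7.2, -6.1]
PC3 = [ 3.0, -2.5, 0.1, 0.0,  0.2]
lowHet = [true, true, true, true, false]

# Each cluster is a Bool mask built from broadcast comparisons.
virToy = (PC1 .< -4) .& (2 .< PC3) .& lowHet
ludToy = (PC1 .< -5) .& (PC3 .< -2) .& lowHet

toyNames = ["vir", "lud"]
masks = [virToy ludToy]   # individuals × clusters Bool matrix

# Individuals matching no mask keep the label "none".
membership = fill("none", length(PC1))
for j in axes(masks, 2)
    membership[masks[:, j]] .= toyNames[j]
end
println(membership)
```

Because the masks are applied in column order, an individual matching two masks would end up with the label of the later column, so thresholds are best chosen so that the masks do not overlap.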

clusterNames = ["vir",
                "nit",
                "lud",
                "troch",
                "obs",
                "plumb"]

clusterColors = ["blue",
                "grey",
                "green",
                "yellow",
                "orange",
                "red"]

vir = (PCAmodelAll.metadata.PC1 .< -4) .&
        (2 .< PCAmodelAll.metadata.PC3) .& 
            indSelection_lowIndHetStan
nit = (-4 .< PCAmodelAll.metadata.PC1 .< -1) .&
        (2.5 .< PCAmodelAll.metadata.PC2 .< 3.5) .&
            indSelection_lowIndHetStan
lud = (PCAmodelAll.metadata.PC1 .< -5) .&
        (PCAmodelAll.metadata.PC3 .< -2) .& 
            indSelection_lowIndHetStan
troch = (-2 .< PCAmodelAll.metadata.PC1 .< 0) .&
            (PCAmodelAll.metadata.PC2 .< -5.5) .&
            indSelection_lowIndHetStan
obs = (-1 .< PCAmodelAll.metadata.PC1 .< 3) .&
            (-5.5 .< PCAmodelAll.metadata.PC2 .< -2.5) .&
            indSelection_lowIndHetStan
plumb = (6 .< PCAmodelAll.metadata.PC1) .& 
            (1 .< PCAmodelAll.metadata.PC2) .&
            indSelection_lowIndHetStan

# check the individuals in each group
PCAmodelAll.metadata.Fst_group[vir]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[lud]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]

clusterArray = [vir nit lud troch obs plumb]

# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")

# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")

# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)

# calculate pairwise Dxy per site, using data in `freqs` and group names in `clusterNames`
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false)  # `among=false` skips the among-group Fst (some things won't work without it)

# Now get averages of pi and Dxy for whole region:

regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 6×2 DataFrame
 Row │ cluster  pi         
     │ String   Float64    
─────┼─────────────────────
   1 │ vir      0.00875059
   2 │ nit      0.00517962
   3 │ lud      0.00819617
   4 │ troch    0.00565913
   5 │ obs      0.0090813
   6 │ plumb    0.00929977 =#

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 15×2 DataFrame
 Row │ cluster_pair  Dxy       
     │ String        Float64   
─────┼─────────────────────────
   1 │ vir_nit       0.035675
   2 │ vir_lud       0.0188542
   3 │ vir_troch     0.0297034
   4 │ vir_obs       0.028434
   5 │ vir_plumb     0.0382774
   6 │ nit_lud       0.0377189
   7 │ nit_troch     0.0437711
   8 │ nit_obs       0.0424561
   9 │ nit_plumb     0.0482994
  10 │ lud_troch     0.0303352
  11 │ lud_obs       0.0294719
  12 │ lud_plumb     0.0394332
  13 │ troch_obs     0.0124742
  14 │ troch_plumb   0.0313941
  15 │ obs_plumb     0.0300717 =#

# Make a genotype-by-individual plot using all variable loci in the region:
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5 
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]

# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
                missingFractionAllowed = missingFractionAllowed,
                indColorRightProvided = true);
The numbers in each group are [38 2 41 67 5 68] and the sum of those is 221
Calculated population allele frequencies and sample sizes
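
For readers unfamiliar with these statistics, here is a rough sketch of how per-site pi and Dxy are commonly computed from allele frequencies at a biallelic site. The formulas are the standard textbook estimators; whether `getSitePi` and `getDxy` use exactly these forms is an assumption, and the helper names `sitePiToy` and `siteDxyToy` and all input values are hypothetical.

```julia
# Per-site nucleotide diversity for a biallelic site, with the usual
# small-sample correction; `p` is the alternate-allele frequency and
# `n` the number of sampled allele copies (2 × individuals for diploids).
# (Assumption: textbook estimator, not necessarily what `getSitePi` does.)
sitePiToy(p, n) = 2p * (1 - p) * n / (n - 1)

# Per-site Dxy between two groups with allele frequencies p1 and p2.
# (Assumption: textbook estimator, not necessarily what `getDxy` does.)
siteDxyToy(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

# Toy frequencies at three sites for two groups (invented values).
freqA = [0.0, 0.5, 1.0]
freqB = [1.0, 0.5, 1.0]
nA = 10   # allele copies sampled in group A

piA = sitePiToy.(freqA, nA)
dxyAB = siteDxyToy.(freqA, freqB)
println(piA)
println(dxyAB)
```

In this toy case a fixed difference between the groups gives a per-site Dxy of 1, a site fixed for the same allele in both groups gives 0, and pi is nonzero only at the site that is polymorphic within group A.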

Show a GBI plot like above, but with heterozygotes

clusterNamesWithHets = ["vir",
                "vir_lud",
                "nit",
                "lud",
                "lud_troch",
                "troch",
                "obs",
                "plumb",
                "plumbHet",
                "vir_plumb"]

clusterColorsWithHets = ["blue",
                "seagreen",
                "grey",
                "green",
                "green2",
                "yellow",
                "orange",
                "red",
                "red",
                "purple"]

vir_lud = (PCAmodelAll.metadata.PC1 .< -5) .&
            (-1 .< PCAmodelAll.metadata.PC3 .< 1)
lud_troch = (-5 .< PCAmodelAll.metadata.PC1 .< -2) .&
                (-3.5 .< PCAmodelAll.metadata.PC2 .< 0) .&
                 .!indSelection_lowIndHetStan
plumbHet = (7 .< PCAmodelAll.metadata.PC1) .& 
            (1 .< PCAmodelAll.metadata.PC2) .&
            .!indSelection_lowIndHetStan
vir_plumb = (1 .< PCAmodelAll.metadata.PC1 .< 4) .&
                (2 .< PCAmodelAll.metadata.PC2 .< 5) .&
                 .!indSelection_lowIndHetStan

clusterArray = [vir vir_lud nit lud lud_troch troch obs plumb plumbHet vir_plumb]

sum(clusterArray, dims=1)

if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else 
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end

# check which individuals left out:
sum(clusterArray, dims=2)

PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]

# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end

# Add column to main metadata object containing the cluster membership for this highHet region:
ind_with_metadata_included[!, chr * "_cluster"] = clusterMembershipWithHets # column name is built from `chr`

# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets

# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
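
The check above compares totals only, so an individual assigned to two clusters can mask another individual assigned to none. A stricter per-individual check can be sketched like this; `toyClusters` is an invented individuals × clusters matrix (an assumption for illustration), laid out like `clusterArray`.

```julia
# Toy individuals × clusters Bool matrix (assumption: invented values).
# Row 3 is assigned to two clusters and row 4 to none.
toyClusters = Bool[1 0 0;
                   0 1 0;
                   1 1 0;
                   0 0 0]

# Number of clusters each individual was assigned to.
perInd = vec(sum(toyClusters, dims=2))

println("total assignments: ", sum(toyClusters), " for ", size(toyClusters, 1), " individuals")
println("unassigned rows: ", findall(==(0), perInd))
println("multiply assigned rows: ", findall(>(1), perInd))
```

Here the total number of assignments equals the number of individuals even though one row is unassigned and another is assigned twice, which is why inspecting the row sums (as done with `sum(clusterArray, dims=2)` above) is a useful complement to the total.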

Show GBI plot according to original groups and plot order

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);

Show same but with all individuals

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);

Show same but with only vir and plumb pops
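
The SNP filter used here (frequency above 0.5 in at least one cluster and below 0.5 in at least one other) can be illustrated on a toy frequency matrix. The matrix `toyFreqs` is invented for illustration; the row and column layout (clusters × SNPs) is taken from how `freqs` is indexed in the code on this page.

```julia
# Toy allele frequencies: rows are clusters, columns are SNPs
# (assumption: invented values).
toyFreqs = [0.9 0.1 0.6 0.5;
            0.2 0.3 0.8 0.5]

# Keep SNPs where at least one cluster is above 0.5 and at least one is below.
keep = (vec(maximum(toyFreqs, dims=1)) .> 0.5) .& (vec(minimum(toyFreqs, dims=1)) .< 0.5)
println(keep)
println(toyFreqs[:, keep])
```

Only the first toy SNP passes: the second never exceeds 0.5, the third never drops below 0.5, and the fourth sits exactly at 0.5 in both clusters, which the strict inequalities exclude.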

includeTheseClusters = ["vir", "plumb"] # these are the haplotype clusters to include in the choice below of SNPs to show

# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)

selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]

plotGroups = ["vir", "plumb", "plumb_vir"] # these are the original Fst_groups
plotGroupColors = ["blue", "red", "purple"]

metadataForGBI = copy(PCAmodelAll.metadata)

metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups

plotGenotypeByIndividual(regionInfo, posForGBI,
                genosForGBI, metadataForGBI, freqsForGBI, plotGroups, plotGroupColors;
                missingFractionAllowed = missingFractionAllowed)


Show just the west area (without nitidus)

clusterNamesWithHetsWest = ["vir",
                        "vir_lud",
                        "lud",
                        "lud_troch",
                        "troch"]

clusterColorsWithHetsWest = ["blue",
                        "seagreen",
                        "green",
                        "green2",
                        "yellow"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsWest))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes

Show just the east area

clusterNamesWithHetsEast = ["obs",
                            "plumb",
                            "plumbHet"]

clusterColorsWithHetsEast = ["orange",
                            "red",
                            "red"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes

Show just the northern area

clusterNamesWithHetsNorth = ["vir",
                            "vir_plumb",
                            "plumb",
                            "plumbHet"]

clusterColorsWithHetsNorth = ["blue",
                            "purple",
                            "red",
                            "red"]

freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsNorth))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes

Do a PCA based on a same-size region elsewhere on gw13 (with low ViSHet):
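
The window selection can be sketched with toy positions as below. The SNP positions are invented, and the region bounds are borrowed from the longest gw13 region in the table earlier on this page; whether `positionMin` and `positionMax` equal those bounds exactly is an assumption.

```julia
# Toy SNP positions (assumption: invented values).
toyPositions = [500_000, 1_000_000, 1_400_000, 2_143_797, 2_500_000]

# Bounds of the longest gw13 high-ViSHet region listed above
# (assumption: used here as stand-ins for positionMin and positionMax).
toyRegionStart, toyRegionEnd = 14_099_239, 15_243_036

regionLength = toyRegionEnd - toyRegionStart
left = 1_000_000                 # start 1 Mb from the left end
right = left + regionLength      # same length as the high-ViSHet region

# Chained broadcast comparison: true for positions inside [left, right].
inWindow = left .<= toyPositions .<= right
println(regionLength)
println(toyPositions[inWindow])
```

Both window edges are inclusive, so a SNP sitting exactly at `left` or `right` is retained.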

# get length of region
lengthHighViSHetRegion = positionMax - positionMin

leftLocus = 1_000_000 # start at 1 Mb from left side
rightLocus = leftLocus + lengthHighViSHetRegion
regionText_lowViSHetRegion = string("chr ", chr, " ",leftLocus," to ",rightLocus)

lociSelection = (leftLocus .<= pos_region.position .<= rightLocus)
genotypes_lowViSHetRegion = genotypes_region[:, lociSelection]

# impute missing genotypes:
genotypes_lowViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genotypes_lowViSHetRegion))

flipPC1 = true
flipPC2 = true

PCAmodel = plotPCA(genotypes_lowViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText_lowViSHetRegion,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodel.PCAfig)
if false  # set to true to save plot
    save("FigureS2B_gw13_nonHLBRarbitrary_from_Julia.png", PCAmodel.PCAfig, px_per_unit = 2.0)
end 

Chr 14 was also tried, but the pattern there is not very clear. It looks like there are recombinant backcrosses in ludlowi.

Same for chr 17

This region shows a pattern also seen on chr 2 (and others), in which ludlowi samples carry some plumb haplotypes.

# choose scaffold
chr = "gw17"

positionMin, positionMax, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
    getWindowedIndHetStanRegion(genosOnly_included, 
                            pos_SNP_filtered, 
                            highViSHetRegions, chr;
                            windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add column to metadata containing the regionIndHetStan for this highHet region:
ind_with_metadata_included[!, chr * "_regionIndHetStan"] = meanAcrossRegionIndHetStan # column name is built from `chr`
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

# check whether missing data related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:

genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))

flipPC1 = false
flipPC2 = false

PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else 
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.4) 

#Plot only the lowIndHetStan individuals:

f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Good news: 1 region on that scaffold

CairoMakie.Screen{IMAGE}

Save the individual colors in the metadata

indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;

Plot PC1 vs. PC2

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)

CairoMakie.Screen{IMAGE}

Plot PC1 vs. PC3

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)

CairoMakie.Screen{IMAGE}

clusterNames = ["virLud",
                "nit",
                "troch",
                "obs",
                "plumb"]

clusterColors = ["blue",
                "grey",
                "yellow",
                "orange",
                "red"]

virLud = (PCAmodelAll.metadata.PC1 .< -6.5) .&
            (0.5 .< PCAmodelAll.metadata.PC2 .< 4) .&
            indSelection_lowIndHetStan
nit = (-6 .< PCAmodelAll.metadata.PC1 .< -4.5) .&
        (0.5 .< PCAmodelAll.metadata.PC2 .< 1.5) .&
            indSelection_lowIndHetStan
troch = (2 .< PCAmodelAll.metadata.PC1 .< 4.5) .&
            (PCAmodelAll.metadata.PC2 .< -4) .&
            indSelection_lowIndHetStan
obs = (1.5 .< PCAmodelAll.metadata.PC1 .< 5) .&
            (-3.5 .< PCAmodelAll.metadata.PC2 .< 2.5) .&
            indSelection_lowIndHetStan
plumb = (3.5 .< PCAmodelAll.metadata.PC1) .& 
            (3 .< PCAmodelAll.metadata.PC2) .&
            indSelection_lowIndHetStan

# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]

clusterArray = [virLud nit troch obs plumb]

# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")

# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")

# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)

# calculate pairwise Dxy per site, using data in `freqs` and groups in `clusterNames`
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false)  # set `among` to false if no among-group Fst is wanted (some things won't work without it)

# Now get averages of pi and Dxy for whole region:

regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 5×2 DataFrame
 Row │ cluster  pi         
     │ String   Float64    
─────┼─────────────────────
   1 │ virLud   0.0116354
   2 │ nit      0.0010142
   3 │ troch    0.00706002
   4 │ obs      0.0162496
   5 │ plumb    0.00402182 =#

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 10×2 DataFrame
 Row │ cluster_pair  Dxy       
     │ String        Float64   
─────┼─────────────────────────
   1 │ virLud_nit    0.021495
   2 │ virLud_troch  0.0329751
   3 │ virLud_obs    0.0328026
   4 │ virLud_plumb  0.0354124
   5 │ nit_troch     0.0339931
   6 │ nit_obs       0.0337221
   7 │ nit_plumb     0.036556
   8 │ troch_obs     0.0201926
   9 │ troch_plumb   0.0240156
  10 │ obs_plumb     0.0195389 =#

# Make a genotype-by-individual plot using all variable loci in the region,
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5 
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]

# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
                missingFractionAllowed = missingFractionAllowed,
                indColorRightProvided = true);
The numbers in each group are [73 2 62 3 64] and the sum of those is 204
Calculated population allele frequencies and sample sizes

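The cluster assignment in the cell above can be illustrated on toy data. The sketch below is a minimal, self-contained example with made-up PC scores and thresholds (all names and values are hypothetical, not taken from the analysis). It shows how chained broadcast comparisons produce one Boolean mask per cluster, and how the masks are turned into a label vector in which unassigned individuals keep the label "none".

```julia
# Toy PC scores for six hypothetical individuals (not real data)
pc1 = [-7.0, -5.0, 3.0, 4.0, 0.0, -6.8]
pc2 = [ 2.0,  1.0, -5.0, 4.0, 0.0,  3.0]
lowHet = [true, true, true, true, false, true]  # stand-in for a low-heterozygosity filter

toyNames = ["A", "B", "C"]

# Each mask is a BitVector; `a .< x .< b` is an elementwise chained comparison
maskA = (pc1 .< -6.5) .& (0.5 .< pc2 .< 4) .& lowHet
maskB = (-6 .< pc1 .< -4.5) .& (0.5 .< pc2 .< 1.5) .& lowHet
maskC = (2 .< pc1 .< 4.5) .& (pc2 .< -4) .& lowHet

toyArray = [maskA maskB maskC]   # individuals in rows, clusters in columns

# Label each individual by the cluster whose mask is true for it
toyMembership = fill("none", length(pc1))
for j in axes(toyArray, 2)
    toyMembership[toyArray[:, j]] .= toyNames[j]
end
println(toyMembership)
```

Individuals that fail the heterozygosity filter, or that fall outside every threshold box, stay labelled "none", which is how heterozygotes are excluded from the allele-frequency calculations at this step.
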
Now show a GBI plot like above, but with heterozygotes:

clusterNamesWithHets = ["virLud",
                        "nit",
                        "virLud_troch",
                        "troch",
                        "virLud_obs",
                        "obs",
                        "troch_plumb",
                        "plumb",
                        "vir_plumb"]

clusterColorsWithHets = ["blue",
                        "grey",
                        "green",
                        "yellow",
                        "olive",
                        "orange",
                        "coral",
                        "red",
                        "purple"]

virLud_troch = (-5 .< PCAmodelAll.metadata.PC1 .< 0) .&
                (-4 .< PCAmodelAll.metadata.PC2 .< -0.7) .&
                 .!indSelection_lowIndHetStan
virLud_obs = (-3 .< PCAmodelAll.metadata.PC1 .< -1) .&
                (-0.5 .< PCAmodelAll.metadata.PC2 .< 0) .&
                 .!indSelection_lowIndHetStan
troch_plumb = (1.5 .< PCAmodelAll.metadata.PC1 .< 5.5) .&
                (-3 .< PCAmodelAll.metadata.PC2 .< 2) .&
                 .!indSelection_lowIndHetStan
vir_plumb = (-2.5 .< PCAmodelAll.metadata.PC1 .< 1) .&
                (2 .< PCAmodelAll.metadata.PC2 .< 5) .&
                 .!indSelection_lowIndHetStan

clusterArray = [virLud nit virLud_troch troch virLud_obs obs troch_plumb plumb vir_plumb]

sum(clusterArray, dims=1)

if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else 
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end

# check which individuals left out:
sum(clusterArray, dims=2)

PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]

# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end

# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above

# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets

# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(100, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals

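The check above compares the total number of cluster assignments with the number of individuals. That total can also match when one individual falls in two clusters and another in none, so a per-individual count is a slightly stricter check. A minimal sketch on a made-up mask matrix (the matrix and variable names are hypothetical):

```julia
# Hypothetical mask matrix: 4 individuals (rows) x 3 clusters (columns).
# Row 2 is assigned to two clusters and row 3 to none.
toyClusterArray = Bool[1 0 0;
                       0 1 1;
                       0 0 0;
                       0 0 1]

perInd = vec(sum(toyClusterArray, dims=2))   # number of clusters per individual

unassigned = findall(perInd .== 0)
multiple   = findall(perInd .> 1)

# The overall total equals the number of individuals here,
# even though the assignment is not one-to-one
println(sum(toyClusterArray) == size(toyClusterArray, 1))
println("unassigned rows: ", unassigned, "; rows in several clusters: ", multiple)
```
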
Chr 17 above shows an interesting pattern: the plumb haplotype is more widespread (it is found in ludlowi), and the obscuratus pattern is complex, with recombination. I checked chr 17 carefully in the summary plot, and it looks good.

Repeat the same analysis for chr 19

# choose scaffold
chr = "gw19"

positionMin, positionMax, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
    getWindowedIndHetStanRegion(genosOnly_included, 
                            pos_SNP_filtered, 
                            highViSHetRegions, chr;
                            windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

# check whether missing data is related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:

genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))

flipPC1 = true
flipPC2 = true

PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else 
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.5) 

# Plot only the lowIndHetStan individuals:

f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Good news: 1 region on that scaffold

CairoMakie.Screen{IMAGE}

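The `flip1` and `flip2` options and the `-1 .*` multiplications above rely on the fact that the sign of a principal component is arbitrary: negating an axis mirrors the plot without changing the variance along it or the distances between individuals. A minimal sketch using only the standard library (the random matrix and the SVD-based PCA are illustrative assumptions, not the `plotPCA` implementation; here PCs are stored in columns, whereas `PCAmodelAll.values` stores them in rows):

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(1)
X = randn(10, 5)                    # 10 hypothetical individuals x 5 loci
Xc = X .- mean(X, dims=1)           # center each locus
F = svd(Xc)
scores = F.U * Diagonal(F.S)        # individuals x PCs

pc1 = scores[:, 1]
pc1_flipped = -1 .* pc1

# Variance along the axis is unchanged by the flip
println(var(pc1) ≈ var(pc1_flipped))
# Pairwise distances between individuals along the axis are unchanged too
println(abs.(pc1 .- pc1') ≈ abs.(pc1_flipped .- pc1_flipped'))
```
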
Save the individual colors in the metadata

indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;

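The loop above looks up each individual's color with `findfirst`, which returns `nothing` (and then fails on indexing) if a group is missing from `groups_to_plot_PCA`. A minimal sketch of the same lookup with a `Dict` and an explicit fallback color, on made-up group names (the names, colors and the "black" fallback are assumptions for illustration):

```julia
toyGroups = ["vir", "troch", "plumb"]
toyColors = ["blue", "yellow", "red"]
colorOf = Dict(zip(toyGroups, toyColors))

indGroups = ["plumb", "vir", "unknown_group", "troch"]

# `get` returns the fallback when the group has no assigned color
toyIndColors = [get(colorOf, g, "black") for g in indGroups]
println(toyIndColors)
```
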
Plot PC1 vs. PC2

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)

CairoMakie.Screen{IMAGE}

Plot PC1 vs. PC3

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)

CairoMakie.Screen{IMAGE}

At the chr 19 high ViSHet region, there are 4 clear homozygous haplogroups (vir and lud are separated on PC3, though not clearly enough to show in the summary plot). Divide samples into those groups, based on PCA scores, and calculate pi and Dxy.

clusterNames = ["virLud",
                "nit",
                "trochObs",
                "plumb"]

clusterColors = ["blue",
                "grey",
                "yellow",
                "red"]

virLud = (PCAmodelAll.metadata.PC1 .< -4) .&
            (2 .< PCAmodelAll.metadata.PC2) .&
            indSelection_lowIndHetStan
nit = (-4 .< PCAmodelAll.metadata.PC1 .< -2) .&
        (1 .< PCAmodelAll.metadata.PC2 .< 2.5) .&
            indSelection_lowIndHetStan
trochObs = (-1.5 .< PCAmodelAll.metadata.PC1 .< 1.5) .&
            (PCAmodelAll.metadata.PC2 .< -3.5) .&
            indSelection_lowIndHetStan
plumb = (5 .< PCAmodelAll.metadata.PC1) .& 
            (1.5 .< PCAmodelAll.metadata.PC2) .&
            indSelection_lowIndHetStan

# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[trochObs]
PCAmodelAll.metadata.Fst_group[plumb]

clusterArray = [virLud nit trochObs plumb]

# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")

# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")

# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)

# calculate pairwise Dxy per site, using data in `freqs` and groups in `clusterNames`
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false)  # set `among` to false if no among-group Fst is wanted (some things won't work without it)

# Now get averages of pi and Dxy for whole region:

regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 4×2 DataFrame
 Row │ cluster   pi         
     │ String    Float64    
─────┼──────────────────────
   1 │ virLud    0.0144925
   2 │ nit       0.0052608
   3 │ trochObs  0.0150341
   4 │ plumb     0.00320386 =#

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 6×2 DataFrame
 Row │ cluster_pair     Dxy       
     │ String           Float64   
─────┼────────────────────────────
   1 │ virLud_nit       0.0291485
   2 │ virLud_trochObs  0.0330435
   3 │ virLud_plumb     0.0347335
   4 │ nit_trochObs     0.0359384
   5 │ nit_plumb        0.0373399
   6 │ trochObs_plumb   0.0289202 =#

# Make a genotype-by-individual plot using all variable loci in the region,
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5 
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]

# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
                missingFractionAllowed = missingFractionAllowed,
                indColorRightProvided = true);
The numbers in each group are [70 2 67 66] and the sum of those is 205
Calculated population allele frequencies and sample sizes

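For readers unfamiliar with the two summary statistics tabulated above, the following is a minimal sketch of common textbook definitions for a biallelic site: within-group diversity (pi) with a finite-sample correction, and between-group distance (Dxy). The function names are hypothetical, and this is not necessarily how `getSitePi` and `getDxy` are implemented; in particular, whether and how they apply the sample-size correction is an assumption here.

```julia
# p: frequency of one allele in a group; n: number of allele copies sampled in that group
toy_site_pi(p, n) = 2 * p * (1 - p) * n / (n - 1)

# Probability that two alleles, one drawn from each group, differ
toy_site_dxy(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

println(toy_site_pi(0.5, 4))      # 2 * 0.5 * 0.5 * 4/3
println(toy_site_dxy(0.0, 1.0))   # fixed difference between the two groups
println(toy_site_dxy(0.5, 0.5))   # both groups at intermediate frequency
```

Under these definitions, a haplogroup with low pi but high Dxy to the other haplogroups is internally uniform yet strongly differentiated, which is the pattern to look for when comparing the two tables above.
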
Now show a GBI plot like above, but with heterozygotes

clusterNamesWithHets = ["virLud",
                "virLudHet",
                "nit",
                "virLud_trochObs",
                "trochObs",
                "trochObsHet",
                "trochObs_plumb",
                "plumb",
                "vir_plumb"]

clusterColorsWithHets = ["blue",
                "blue",
                "grey",
                "green",
                "yellow",
                "yellow",
                "orange",
                "red",
                "purple"]

virLudHet = (-8 .< PCAmodelAll.metadata.PC1 .< -4) .&
                (1.5 .< PCAmodelAll.metadata.PC2 .< 5) .&
                 .!indSelection_lowIndHetStan
virLud_trochObs = (-4 .< PCAmodelAll.metadata.PC1 .< -1) .&
                (-3.5 .< PCAmodelAll.metadata.PC2 .< 0.5) .&
                 .!indSelection_lowIndHetStan
trochObsHet = (-1.5 .< PCAmodelAll.metadata.PC1 .< 1.5) .&
                (PCAmodelAll.metadata.PC2 .< -3.5) .&
                 .!indSelection_lowIndHetStan
trochObs_plumb = (1.5 .< PCAmodelAll.metadata.PC1 .< 5) .&
                (-3.5 .< PCAmodelAll.metadata.PC2 .< 0) .&
                 .!indSelection_lowIndHetStan
vir_plumb = (-1 .< PCAmodelAll.metadata.PC1 .< 1.5) .&
                (2 .< PCAmodelAll.metadata.PC2 .< 5) .&
                 .!indSelection_lowIndHetStan

clusterArray = [virLud virLudHet nit virLud_trochObs trochObs trochObsHet trochObs_plumb plumb vir_plumb]

sum(clusterArray, dims=1)

if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else 
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end

# check which individuals left out:
sum(clusterArray, dims=2)
PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]

# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end

# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above

# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets

# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals

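The cells above add a column whose name is built from `chr` by constructing a command string and running it with `eval(Meta.parse(...))`. A minimal sketch of an alternative that avoids `eval`, assuming the metadata object is a `DataFrame` (the toy table and values below are hypothetical): DataFrames.jl accepts a string column name in `df[!, name]`.

```julia
using DataFrames

toyMetadata = DataFrame(ind = ["ind1", "ind2", "ind3"])
toyChr = "gw19"
toyClusters = ["virLud", "plumb", "none"]

# Build the column name as a string and assign the whole column
toyMetadata[!, toyChr * "_cluster"] = toyClusters

println(names(toyMetadata))
```

Because no code is parsed from a string, this form also works inside functions, where `eval` would act on global rather than local variables.
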
Show GBI plot according to original groups and plot order

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);

Show the same, but with all individuals

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = false)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);

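The SNP filter used in the cells above keeps a locus only if its allele frequency is above 0.5 in at least one cluster and below 0.5 in at least one other, so the plotted loci are those where clusters differ in which allele is the common one. A minimal sketch on a made-up frequency matrix (clusters in rows and loci in columns, as the indexing `freqs[:, selectedSNPs]` suggests; the values are hypothetical):

```julia
toyFreqs = [0.9 0.1 0.6 0.5;
            0.2 0.3 0.8 0.5;
            0.7 0.0 0.9 0.4]

# One Boolean per locus: true if some cluster is above 0.5 and some cluster is below 0.5
keep = (vec(maximum(toyFreqs, dims=1)) .> 0.5) .& (vec(minimum(toyFreqs, dims=1)) .< 0.5)
println(keep)
println(toyFreqs[:, keep])
```

Only the first toy locus passes: the second never exceeds 0.5, the third never drops below 0.5, and the fourth reaches exactly 0.5 at most, which the strict inequality excludes.
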
Show the same, but with only the vir and plumb populations

includeTheseClusters = ["virLud", "plumb"] # these are the haplotype clusters to include in the choice below of SNPs to show

# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]

plotGroups = ["vir", "plumb", "plumb_vir"] # these are the original Fst_groups
plotGroupColors = ["blue", "red", "purple"]

metadataForGBI = copy(PCAmodelAll.metadata)

metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups

plotGenotypeByIndividual(regionInfo, posForGBI,
                genosForGBI, metadataForGBI, freqsForGBI, plotGroups, plotGroupColors;
                missingFractionAllowed = missingFractionAllowed)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

(Scene (768px, 960px):
  0 Plots
  2 Child Scenes:
    ├ Scene (768px, 960px)
    └ Scene (768px, 960px), Union{Missing, Int16}[0 0 … 0 0; 0 0 … 0 0; … ; 2 2 … 2 2; 0 1 … 1 2], [63088, 70570, 71423, 78271, 89983, 153219, 199863, 252463, 296231, 423565  …  882012, 900073, 925752, 925928, 939829, 945566, 952095, 975932, 976062, 984894], 100×38 DataFrame
 Row  ind                        ID                         location  group   ⋯
     │ String                     String                     String7   String1 ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ GW_Armando_plate1_JF12G04  GW_Armando_plate1_JF12G04  ST_vi     vir     ⋯
   2 │ GW_Armando_plate2_JF03G01  GW_Armando_plate2_JF03G01  ST_vi     vir_mis
   3 │ GW_Armando_plate2_JF30G01  GW_Armando_plate2_JF30G01  ST_vi     vir_mis
   4 │ GW_Lane5_STvi1             GW_Lane5_STvi1             ST_vi     vir
   5 │ GW_Lane5_STvi2             GW_Lane5_STvi2             ST_vi     vir     ⋯
   6 │ GW_Lane5_STvi3             GW_Lane5_STvi3             ST_vi     vir
   7 │ GW_Armando_plate1_JF16G01  GW_Armando_plate1_JF16G01  DV_vi     plumb_v
   8 │ GW_Armando_plate2_JF16G02  GW_Armando_plate2_JF16G02  DV_vi     plumb_v
   9 │ GW_Armando_plate2_JE31G01  GW_Armando_plate2_JE31G01  VB_vi     vir_mis ⋯
  10 │ GW_Armando_plate2_JF03G02  GW_Armando_plate2_JF03G02  VB_vi     vir_mis
  11 │ GW_Lane5_YK11              GW_Lane5_YK11              YK        vir
  ⋮  │             ⋮                          ⋮                 ⋮          ⋮   ⋱
  91 │ GW_Armando_plate2_JF24G01  GW_Armando_plate2_JF24G01  VB        plumb
  92 │ GW_Armando_plate2_JF25G01  GW_Armando_plate2_JF25G01  VB        plumb   ⋯
  93 │ GW_Armando_plate1_JG02G02  GW_Armando_plate1_JG02G02  PR        plumb
  94 │ GW_Armando_plate1_JG02G04  GW_Armando_plate1_JG02G04  PR        plumb
  95 │ GW_Armando_plate2_JG01G01  GW_Armando_plate2_JG01G01  PR        plumb
  96 │ GW_Armando_plate2_JG02G01  GW_Armando_plate2_JG02G01  PR        plumb   ⋯
  97 │ GW_Armando_plate2_JG02G03  GW_Armando_plate2_JG02G03  PR        plumb
  98 │ GW_Lane5_SL1               GW_Lane5_SL1               SL        plumb
  99 │ GW_Lane5_SL2               GW_Lane5_SL2               SL        plumb
 100 │ GW_Armando_plate1_JF10G03  GW_Armando_plate1_JF10G03  ST        plumb_v ⋯
                                                  35 columns and 79 rows omitted)
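
The SNP filter used in the cell above keeps a SNP only if its allele frequency is above 0.5 in at least one cluster and below 0.5 in at least one cluster. A minimal sketch with a made-up frequency matrix (rows are clusters, columns are SNPs; the names `freqsToy` and `keepSNP` are illustrative only):

```julia
# Toy allele-frequency matrix: 2 clusters (rows) by 5 SNPs (columns); values are made up
freqsToy = [0.9 0.6 0.1 0.5 0.4;
            0.1 0.8 0.2 0.3 0.7]

maxFreq = vec(maximum(freqsToy, dims=1))  # highest frequency of each SNP across clusters
minFreq = vec(minimum(freqsToy, dims=1))  # lowest frequency of each SNP across clusters

# Keep SNPs whose frequency crosses 0.5 between clusters
keepSNP = (maxFreq .> 0.5) .& (minFreq .< 0.5)
println(keepSNP)
```

In this toy example only the first and last SNPs pass: the second is above 0.5 in both clusters, the third is below 0.5 in both, and the fourth never exceeds 0.5 because the comparison is strict.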

Show same but with only vir lud troch pops

includeTheseClusters = ["virLud", "trochObs"] # these are the haplotype clusters to include in the choice below of SNPs to show

# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]

metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups

plotGroups = ["vir", "vir_S", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML", "troch_west", "troch_LN"]
plotGroupColors = ["blue","turquoise1", "seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow"]

# Limit the number of individuals per group to plot (10 per group)
numIndsToPlotWithHets = fill(10, length(plotGroups))

genosForGBI_limited, indMetadataforGBI_limited = limitIndsToPlot(plotGroups,
                                            numIndsToPlotWithHets, 
                                            genosForGBI, metadataForGBI;
                                            sortByMissing = false)

plotGenotypeByIndividual(regionInfo, posForGBI,
                genosForGBI_limited, indMetadataforGBI_limited, freqsForGBI, plotGroups, plotGroupColors;
                missingFractionAllowed = missingFractionAllowed)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

(Scene (768px, 960px):
  0 Plots
  2 Child Scenes:
    ├ Scene (768px, 960px)
    └ Scene (768px, 960px), Union{Missing, Int16}[0 0 … 0 0; 0 0 … 0 0; … ; 2 2 … 2 2; 2 2 … 2 2], [63088, 153219, 167981, 204946, 252487, 296231, 367641, 423565, 431630, 443933  …  678802, 684399, 741883, 767533, 773736, 792896, 900073, 925762, 939829, 952095], 74×38 DataFrame
 Row  ind                        ID                         location  group   ⋯
     │ String                     String                     String7   String1 ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ GW_Armando_plate1_JF12G04  GW_Armando_plate1_JF12G04  ST_vi     vir     ⋯
   2 │ GW_Armando_plate2_JF03G01  GW_Armando_plate2_JF03G01  ST_vi     vir_mis
   3 │ GW_Armando_plate2_JF30G01  GW_Armando_plate2_JF30G01  ST_vi     vir_mis
   4 │ GW_Armando_plate1_JF16G01  GW_Armando_plate1_JF16G01  DV_vi     plumb_v
   5 │ GW_Armando_plate2_JF16G02  GW_Armando_plate2_JF16G02  DV_vi     plumb_v ⋯
   6 │ GW_Armando_plate2_JE31G01  GW_Armando_plate2_JE31G01  VB_vi     vir_mis
   7 │ GW_Armando_plate2_JF03G02  GW_Armando_plate2_JF03G02  VB_vi     vir_mis
   8 │ GW_Armando_plate1_AB1      GW_Armando_plate1_AB1      AB        vir
   9 │ GW_Lane5_AB2               GW_Lane5_AB2               AB        vir     ⋯
  10 │ GW_Armando_plate1_TL3      GW_Armando_plate1_TL3      TL        vir
  11 │ GW_Lane5_AA1               GW_Lane5_AA1               AA        vir_S
  ⋮  │             ⋮                          ⋮                 ⋮          ⋮   ⋱
  65 │ GW_Armando_plate2_LN2      GW_Armando_plate2_LN2      LN        troch_L
  66 │ GW_Lane5_LN1               GW_Lane5_LN1               LN        troch_L ⋯
  67 │ GW_Lane5_LN10              GW_Lane5_LN10              LN        troch_L
  68 │ GW_Lane5_LN12              GW_Lane5_LN12              LN        troch_L
  69 │ GW_Lane5_LN14              GW_Lane5_LN14              LN        troch_L
  70 │ GW_Lane5_LN16              GW_Lane5_LN16              LN        troch_L ⋯
  71 │ GW_Lane5_LN18              GW_Lane5_LN18              LN        troch_L
  72 │ GW_Lane5_LN19              GW_Lane5_LN19              LN        troch_L
  73 │ GW_Lane5_LN20              GW_Lane5_LN20              LN        troch_L
  74 │ GW_Lane5_LN3               GW_Lane5_LN3               LN        troch_L ⋯
                                                  35 columns and 53 rows omitted)

Show same but with only troch obs plumb pops

includeTheseClusters = ["trochObs", "plumb"] # these are the haplotype clusters to include in the choice below of SNPs to show

# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]

metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups

# remove individuals that have vir haplotypes, as this could otherwise be mistaken for introgression from obscuratus:

removeTheseInds = ["GW_Armando_plate1_JF24G02", # gw19 hetero from plumb 
                    "GW_Armando_plate1_JF07G03", # gw19 hetero from plumb
                    "GW_Armando_plate1_JF12G02", # gw19 hetero from plumb
                    "GW_Armando_plate1_JF09G01"] # gw28 is hetero from plumb  
selection = map(in(removeTheseInds), metadataForGBI.ind)
metadataForGBI = metadataForGBI[.!selection, :]
genosForGBI = genosForGBI[.!selection, :]

plotGroups = ["troch_LN","troch_EM","obs","plumb_BJ","plumb"]
plotGroupColors = ["yellow","gold","orange","pink","red"]

# Set a limit on the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(plotGroups))

# metadataForGBI[metadataForGBI.Fst_group .== "plumb", :]

genosForGBI_limited, indMetadataforGBI_limited = limitIndsToPlot(plotGroups,
                                            numIndsToPlotWithHets, 
                                            genosForGBI, metadataForGBI;
                                            sortByMissing = false)

# indMetadataforGBI_limited[indMetadataforGBI_limited.Fst_group .== "plumb", :]

plotGenotypeByIndividual(regionInfo, posForGBI,
                genosForGBI_limited, indMetadataforGBI_limited, freqsForGBI, plotGroups, plotGroupColors;
                missingFractionAllowed = missingFractionAllowed)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

(Scene (768px, 960px):
  0 Plots
  2 Child Scenes:
    ├ Scene (768px, 960px)
    └ Scene (768px, 960px), Union{Missing, Int16}[0 0 … 0 0; 0 0 … 0 0; … ; 2 2 … 2 2; 2 2 … 2 2], [70570, 71423, 78271, 89983, 167981, 199863, 204946, 252463, 252487, 367641  …  792896, 867410, 881923, 882012, 925752, 925762, 925928, 945566, 976062, 984894], 37×38 DataFrame
 Row  ind                        ID                         location  group   ⋯
     │ String                     String                     String7   String1 ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ GW_Armando_plate2_LN2      GW_Armando_plate2_LN2      LN        troch_L ⋯
   2 │ GW_Lane5_LN1               GW_Lane5_LN1               LN        troch_L
   3 │ GW_Lane5_LN10              GW_Lane5_LN10              LN        troch_L
   4 │ GW_Lane5_LN12              GW_Lane5_LN12              LN        troch_L
   5 │ GW_Lane5_LN14              GW_Lane5_LN14              LN        troch_L ⋯
   6 │ GW_Lane5_LN16              GW_Lane5_LN16              LN        troch_L
   7 │ GW_Lane5_LN18              GW_Lane5_LN18              LN        troch_L
   8 │ GW_Lane5_LN19              GW_Lane5_LN19              LN        troch_L
   9 │ GW_Lane5_LN20              GW_Lane5_LN20              LN        troch_L ⋯
  10 │ GW_Lane5_LN3               GW_Lane5_LN3               LN        troch_L
  11 │ GW_Lane5_LN4               GW_Lane5_LN4               LN        troch_L
  ⋮  │             ⋮                          ⋮                 ⋮         ⋮    ⋱
  28 │ GW_Armando_plate1_JF09G02  GW_Armando_plate1_JF09G02  ST        plumb
  29 │ GW_Armando_plate1_JF11G01  GW_Armando_plate1_JF11G01  ST        plumb   ⋯
  30 │ GW_Armando_plate1_JF12G01  GW_Armando_plate1_JF12G01  ST        plumb
  31 │ GW_Armando_plate1_JF13G01  GW_Armando_plate1_JF13G01  ST        plumb
  32 │ GW_Armando_plate1_JF26G01  GW_Armando_plate1_JF26G01  ST        plumb
  33 │ GW_Armando_plate1_JF27G01  GW_Armando_plate1_JF27G01  ST        plumb   ⋯
  34 │ GW_Armando_plate1_JF29G01  GW_Armando_plate1_JF29G01  ST        plumb
  35 │ GW_Armando_plate1_JF15G03  GW_Armando_plate1_JF15G03  DV        plumb
  36 │ GW_Armando_plate1_JF23G01  GW_Armando_plate1_JF23G01  VB        plumb
  37 │ GW_Armando_plate1_JF23G02  GW_Armando_plate1_JF23G02  VB        plumb   ⋯
                                                  35 columns and 16 rows omitted)
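
The removal step in the cell above builds a Boolean mask from a list of names and applies its negation to both the metadata and the genotype matrix, so that rows stay aligned. A minimal sketch with made-up names and genotypes (all variable names here are illustrative):

```julia
# Toy data (made up): 4 individuals, 2 SNPs
indNames = ["ind1", "ind2", "ind3", "ind4"]
genosToy = [0 1;
            2 2;
            1 0;
            0 0]
removeToy = ["ind2", "ind4"]

# true where the individual is in the removal list
isRemoved = map(in(removeToy), indNames)

# Apply the same negated mask to names and genotype rows so they stay aligned
indNamesKept = indNames[.!isRemoved]
genosKept = genosToy[.!isRemoved, :]

# Optional sanity check: every name in the removal list was actually present
allFound = all(in(indNames), removeToy)
println(indNamesKept, " ", size(genosKept), " ", allFound)
```

The final check is an addition for illustration; it guards against a misspelled sample name silently removing nothing.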

Show just the west area (without nitidus)

clusterNamesWithHetsWest = ["virLud",
                        "virLudHet",
                        "virLud_trochObs",
                        "trochObs",
                        "trochObsHet"]

clusterColorsWithHetsWest = ["blue",
                        "blue",
                        "green",
                        "yellow",
                        "yellow"]

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsWest))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Show just the east area

clusterNamesWithHetsEast = ["trochObs",
                            "trochObsHet",
                            "trochObs_plumb",
                            "plumb"]

clusterColorsWithHetsEast = ["yellow",
                            "yellow",
                            "orange",
                            "red"]

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Show just the northern area

clusterNamesWithHetsNorth = ["virLud",
                            "virLudHet",
                            "vir_plumb",
                            "plumb"]

clusterColorsWithHetsNorth = ["blue",
                            "blue",
                            "purple",
                            "red"]

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsNorth))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Same for chr 4A

# choose scaffold
chr = "gw4A"

positionMin, positionMax, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
    getWindowedIndHetStanRegion(genosOnly_included, 
                            pos_SNP_filtered, 
                            highViSHetRegions, chr;
                            windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add a column to the metadata containing the regionIndHetStan for this high ViSHet region,
# with a column name built from the scaffold name (e.g. `gw4A_regionIndHetStan`):
ind_with_metadata_included[!, chr * "_regionIndHetStan"] = meanAcrossRegionIndHetStan
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

# check whether missing data is related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:

genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))

flipPC1 = true
flipPC2 = true

PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else 
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.5) 

# Plot only the lowIndHetStan individuals:

f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Good news: 1 region on that scaffold
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}
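
The sign of a principal component is arbitrary, so flipping an axis only mirrors the plot. The if/else blocks above that store flipped or unflipped PC scores can be written more compactly, as in this sketch with made-up scores (rows are PCs, columns are individuals; `scoresToy`, `flip1` and `flip2` are illustrative names):

```julia
# Toy PCA scores (made up): row 1 = PC1, row 2 = PC2, one column per individual
scoresToy = [1.5 -0.5 2.0;
             0.2 0.4 -0.6]

flip1, flip2 = true, false

# Multiply by -1 when the axis is flipped, by 1 otherwise
pc1 = (flip1 ? -1 : 1) .* scoresToy[1, :]
pc2 = (flip2 ? -1 : 1) .* scoresToy[2, :]
println(pc1, " ", pc2)
```

Any thresholds later applied to the stored PC values must match the flipped orientation, which is why the flipped scores are the ones saved in the metadata.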

Save the individual colors in the metadata

indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
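
The loop above looks up each individual's group in the list of plotted groups and takes the matching color. An equivalent lookup can be sketched with a `Dict`; the group names and colors below are made up for illustration:

```julia
# Toy groups and colors (made up)
groupsToy = ["vir", "troch", "plumb"]
colorsToy = ["blue", "yellow", "red"]

# Map each group name to its color
colorOf = Dict(zip(groupsToy, colorsToy))

# Group of each individual, in metadata order
indGroups = ["plumb", "vir", "vir", "troch"]
indColorsToy = [colorOf[g] for g in indGroups]
println(indColorsToy)

# A fallback color can be supplied for groups that are not in the list
fallbackColor = get(colorOf, "unknownGroup", "black")
```

Both versions fail if an individual belongs to a group that is not in the list (here with a `KeyError`), which is why every individual in the PCA needs to be in one of the plotted groups unless a fallback such as `get` is used.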

Plot PC1 vs. PC2

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

Plot PC1 vs. PC3

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

In the chr 4A high ViSHet region, there are 4 clear haplogroups (lud and obs are not discernible as separate groups on PC3, which is driven by nitidus). Divide the samples into those groups based on PCA scores, and calculate pi and Dxy

clusterNames = ["virLud",
                "nit",
                "troch",
                "obsPlumb"]

clusterColors = ["blue",
                "grey",
                "yellow",
                "red"]

virLud = (PCAmodelAll.metadata.PC1 .< -2) .&
            (PCAmodelAll.metadata.PC2 .< -2.5) .&
            indSelection_lowIndHetStan
nit = (PCAmodelAll.metadata.PC3 .< -7) .&
            indSelection_lowIndHetStan
troch = (3 .< PCAmodelAll.metadata.PC1) .&
            indSelection_lowIndHetStan
obsPlumb = (-4 .< PCAmodelAll.metadata.PC1 .< -1) .& 
            (2.5 .< PCAmodelAll.metadata.PC2) .&
            indSelection_lowIndHetStan

# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obsPlumb]

clusterArray = [virLud nit troch obsPlumb]

# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")

# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")

# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)

# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false)  # set among to false if no among-group Fst is wanted (some things won't work without it) 

# Now get averages of pi and Dxy for whole region:

regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 4×2 DataFrame
 Row │ cluster   pi         
     │ String    Float64    
─────┼──────────────────────
   1 │ virLud    0.00664772
   2 │ nit       0.00609756
   3 │ troch     0.00614846
   4 │ obsPlumb  0.00206023 =#

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 6×2 DataFrame
 Row │ cluster_pair     Dxy       
     │ String           Float64   
─────┼────────────────────────────
   1 │ virLud_nit       0.0447451
   2 │ virLud_troch     0.0343779
   3 │ virLud_obsPlumb  0.0217185
   4 │ nit_troch        0.0373178
   5 │ nit_obsPlumb     0.0317873
   6 │ troch_obsPlumb   0.0237405 =#

# Make a genotype-by-individual plot using all variable loci in the region:
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]

# limit the number of individuals per group to plot
numIndsToPlot = fill(150, length(clusterNames))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = false)

# sort based on original_plot_order, and then together with function below will arrange individuals in population order within clusters:
sortOrder = sortperm(indMetadataforGBI.original_plot_order, rev=false)
indMetadataforGBI = indMetadataforGBI[sortOrder, :]
genosForGBI = genosForGBI[sortOrder, :]

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
                indFontSize=6, figureSize=(800, 1800),
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = true,
                indColorRightProvided = true);
The numbers in each group are [40 2 62 91] and the sum of those is 195
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
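
For readers unfamiliar with pi and Dxy, the sketch below shows commonly used per-site formulas for a biallelic SNP. These are standard textbook estimators and are an assumption here: the functions `getSitePi` and `getDxy` used above are defined elsewhere and may differ in detail (for example in how sample sizes are counted or how missing data are handled). The names `sitePiToy` and `siteDxyToy` are hypothetical.

```julia
# Per-site nucleotide diversity within a group, for a biallelic site with
# alternate-allele frequency p estimated from n sampled allele copies
# (assumption: n counts chromosomes, i.e. 2 x the number of diploid individuals).
sitePiToy(p, n) = 2 * p * (1 - p) * n / (n - 1)

# Per-site absolute divergence between two groups with allele frequencies p1 and p2:
# the probability that one allele drawn from each group differ.
siteDxyToy(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

piExample = sitePiToy(0.5, 10)        # intermediate frequency, 10 allele copies
dxyFixed = siteDxyToy(1.0, 0.0)       # fixed difference between groups
dxyShared = siteDxyToy(0.5, 0.5)      # identical intermediate frequencies
println((piExample, dxyFixed, dxyShared))
```

Region-wide values such as those in the tables above are averages of per-site values over the sites considered; how invariant sites enter that average depends on the implementation and is not shown here.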

Now show a GBI plot like above, but with heterozygotes

clusterNamesWithHets = ["virLud",
                "virLudHet",
                "nit",
                "virLud_troch",
                "troch",
                "trochHet",
                "troch_obsPlumb",
                "obsPlumb",
                "obsPlumbHet",
                "virLud_obsPlumb"]

clusterColorsWithHets = ["blue",
                "blue",
                "grey",
                "green",
                "yellow",
                "yellow",
                "orange",
                "red",
                "red",
                "purple"]

virLud = (PCAmodelAll.metadata.PC1 .< -2) .&
            (PCAmodelAll.metadata.PC2 .< -2.5) .&
            indSelection_lowIndHetStan
nit = (PCAmodelAll.metadata.PC3 .< -7) .&
            indSelection_lowIndHetStan
troch = (3 .< PCAmodelAll.metadata.PC1) .&
            indSelection_lowIndHetStan
obsPlumb = (-4 .< PCAmodelAll.metadata.PC1 .< -1) .& 
            (2.5 .< PCAmodelAll.metadata.PC2) .&
            indSelection_lowIndHetStan
virLudHet = (PCAmodelAll.metadata.PC1 .< -2) .&
            (PCAmodelAll.metadata.PC2 .< -2.5) .&
            .!indSelection_lowIndHetStan
virLud_troch = (0 .< PCAmodelAll.metadata.PC1 .< 3) .&
                (-4 .< PCAmodelAll.metadata.PC2 .< -0.5) .&
                 .!indSelection_lowIndHetStan
trochHet = (3 .< PCAmodelAll.metadata.PC1) .&
            .!indSelection_lowIndHetStan
troch_obsPlumb = (0 .< PCAmodelAll.metadata.PC1 .< 3) .& 
            (0 .< PCAmodelAll.metadata.PC2 .< 2.5) .&
            (-2.5 .< PCAmodelAll.metadata.PC3) .&
            .!indSelection_lowIndHetStan
obsPlumbHet = (-4 .< PCAmodelAll.metadata.PC1 .< -1) .& 
            (2.5 .< PCAmodelAll.metadata.PC2) .&
            .!indSelection_lowIndHetStan
virLud_obsPlumb = (-4 .< PCAmodelAll.metadata.PC1 .< -1.5) .&
                (-2.5 .< PCAmodelAll.metadata.PC2 .< 1.5) .&
                 .!indSelection_lowIndHetStan

clusterArray = [virLud virLudHet nit virLud_troch troch trochHet troch_obsPlumb obsPlumb obsPlumbHet virLud_obsPlumb]

sum(clusterArray, dims=1)

if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else 
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end

# check which individuals left out:
sum(clusterArray, dims=2)

PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]

# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end

# Add column to main metadata object containing the cluster membership for this highHet region
# (the column name is built from `chr`, so index by string rather than building and evaluating a command):
ind_with_metadata_included[!, chr * "_cluster"] = clusterMembershipWithHets

# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets

# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
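
The cell above defines each cluster as a Boolean mask, binds the masks into a matrix, and checks that the column totals add up to the number of individuals. A total that matches could still hide one unassigned individual and one assigned twice, which is why the row sums (`dims=2`) are also inspected. The sketch below shows the per-individual check on made-up values; the thresholds and the names `west`, `east`, and `toy...` are hypothetical.

```julia
# Toy PC scores and low-heterozygosity flags for 6 individuals (made-up values)
toyPC1 = [-3.0, -2.5, 4.0, 3.5, 0.5, -3.2]
toyLowHet = [true, true, true, false, false, false]

# hypothetical cluster definitions, in the same style as the masks above
west    = (toyPC1 .< -2) .& toyLowHet
westHet = (toyPC1 .< -2) .& .!toyLowHet
east    = (3 .< toyPC1) .& toyLowHet
eastHet = (3 .< toyPC1) .& .!toyLowHet

toyNames = ["west", "westHet", "east", "eastHet"]
toyArray = [west westHet east eastHet]   # individuals x clusters

# number of clusters each individual falls into; ideally exactly 1 for everyone
perInd = vec(sum(toyArray, dims=2))
unassigned = findall(perInd .== 0)
multiple = findall(perInd .> 1)

# label each individual, leaving "none" for anyone not captured by any mask
toyMembership = fill("none", length(toyPC1))
for i in axes(toyArray, 2)
    toyMembership[toyArray[:, i]] .= toyNames[i]
end
println(toyMembership)
```

Here individual 5 falls outside every mask and keeps the label "none", which is the situation the check is designed to catch.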

Show GBI plot according to original groups and plot order

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Show same but with all individuals

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
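
Setting the per-group limit to 1000 works as "no limit" only because no group has that many individuals. The sketch below illustrates the general idea of capping the number of individuals per group. It is a hypothetical helper written for this illustration, not the `limitIndsToPlot` function used above, whose internals (for example the `sortByMissing` option) are not reproduced here.

```julia
# Hypothetical helper, NOT the limitIndsToPlot function used above: keep at most
# maxPerGroup[i] individuals from groupNames[i], in their existing order
function firstNPerGroup(groupNames, maxPerGroup, groupOfEachInd)
    keep = Int[]
    for (name, maxN) in zip(groupNames, maxPerGroup)
        members = findall(groupOfEachInd .== name)
        append!(keep, members[1:min(maxN, length(members))])
    end
    return keep
end

toyGroups = ["a", "b", "a", "a", "b", "c"]

# cap group "a" at 2 individuals; the cap of 1000 leaves group "b" untouched;
# group "c" is not requested and is therefore dropped
keptRows = firstNPerGroup(["a", "b"], [2, 1000], toyGroups)
println(keptRows)
```

The returned row indices could then be used to subset both the genotype matrix and the metadata so that the two stay aligned.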

Show same but with only vir and plumb pops

includeTheseClusters = ["virLud", "obsPlumb"] # these are the haplotype clusters to include in the choice below of SNPs to show

# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]

plotGroups = ["vir", "plumb", "plumb_vir"] # these are the original Fst_groups
plotGroupColors = ["blue", "red", "purple"]

metadataForGBI = copy(PCAmodelAll.metadata)

metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups

plotGenotypeByIndividual(regionInfo, posForGBI,
                genosForGBI, metadataForGBI, freqsForGBI, plotGroups, plotGroupColors;
                missingFractionAllowed = missingFractionAllowed)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

(Scene (768px, 960px):
  0 Plots
  2 Child Scenes:
    ├ Scene (768px, 960px)
    └ Scene (768px, 960px), Union{Missing, Int16}[0 0 … 0 0; 0 0 … 0 0; … ; 2 2 … 2 2; 1 2 … 1 0], [397824, 447665, 454290, 505522, 510299, 520268, 527334, 531726, 578230, 582505, 587143, 614605, 617468, 617555, 621740, 633718, 633803, 690700], 100×40 DataFrame
 Row  ind                        ID                         location  group   ⋯
     │ String                     String                     String7   String1 ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ GW_Armando_plate1_JF12G04  GW_Armando_plate1_JF12G04  ST_vi     vir     ⋯
   2 │ GW_Armando_plate2_JF03G01  GW_Armando_plate2_JF03G01  ST_vi     vir_mis
   3 │ GW_Armando_plate2_JF30G01  GW_Armando_plate2_JF30G01  ST_vi     vir_mis
   4 │ GW_Lane5_STvi1             GW_Lane5_STvi1             ST_vi     vir
   5 │ GW_Lane5_STvi2             GW_Lane5_STvi2             ST_vi     vir     ⋯
   6 │ GW_Lane5_STvi3             GW_Lane5_STvi3             ST_vi     vir
   7 │ GW_Armando_plate1_JF16G01  GW_Armando_plate1_JF16G01  DV_vi     plumb_v
   8 │ GW_Armando_plate2_JF16G02  GW_Armando_plate2_JF16G02  DV_vi     plumb_v
   9 │ GW_Armando_plate2_JE31G01  GW_Armando_plate2_JE31G01  VB_vi     vir_mis ⋯
  10 │ GW_Armando_plate2_JF03G02  GW_Armando_plate2_JF03G02  VB_vi     vir_mis
  11 │ GW_Lane5_YK11              GW_Lane5_YK11              YK        vir
  ⋮  │             ⋮                          ⋮                 ⋮          ⋮   ⋱
  91 │ GW_Armando_plate2_JF24G01  GW_Armando_plate2_JF24G01  VB        plumb
  92 │ GW_Armando_plate2_JF25G01  GW_Armando_plate2_JF25G01  VB        plumb   ⋯
  93 │ GW_Armando_plate1_JG02G02  GW_Armando_plate1_JG02G02  PR        plumb
  94 │ GW_Armando_plate1_JG02G04  GW_Armando_plate1_JG02G04  PR        plumb
  95 │ GW_Armando_plate2_JG01G01  GW_Armando_plate2_JG01G01  PR        plumb
  96 │ GW_Armando_plate2_JG02G01  GW_Armando_plate2_JG02G01  PR        plumb   ⋯
  97 │ GW_Armando_plate2_JG02G03  GW_Armando_plate2_JG02G03  PR        plumb
  98 │ GW_Lane5_SL1               GW_Lane5_SL1               SL        plumb
  99 │ GW_Lane5_SL2               GW_Lane5_SL2               SL        plumb
 100 │ GW_Armando_plate1_JF10G03  GW_Armando_plate1_JF10G03  ST        plumb_v ⋯
                                                  37 columns and 79 rows omitted)

Show same but whole ring (but not nit)

includeTheseClusters = ["virLud",
                        "troch",
                        "obsPlumb"] # these are the haplotype clusters to include in the choice below of SNPs to show

# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]

metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups

plotGroups = ["vir","vir_S","lud_PK","lud_KS","lud_central","troch_LN","troch_EM","obs","plumb_BJ","plumb","plumb_vir"]
plotGroupColors = ["blue","turquoise1","seagreen4","seagreen3","seagreen2","yellow","gold","orange", "pink","red","purple"]

# Set limit on the number of individuals per group to plot
numIndsToPlotWithHets = [10, 5, 5, 2, 10, 10, 1, 4, 3, 10, 1] # maximum number of individuals to plot from each group

genosForGBI_limited, indMetadataforGBI_limited = limitIndsToPlot(plotGroups,
                                            numIndsToPlotWithHets, 
                                            genosForGBI, metadataForGBI;
                                            sortByMissing = false)

plotGenotypeByIndividual(regionInfo, posForGBI,
                genosForGBI_limited, indMetadataforGBI_limited, freqsForGBI, plotGroups, plotGroupColors;
                missingFractionAllowed = missingFractionAllowed)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

(Scene (768px, 960px):
  0 Plots
  2 Child Scenes:
    ├ Scene (768px, 960px)
    └ Scene (768px, 960px), Union{Missing, Int16}[0 0 … 0 0; 0 0 … 0 0; … ; 0 0 … 0 0; 0 1 … 0 0], [397781, 397824, 405798, 419198, 425296, 447665, 454290, 505522, 510299, 513889  …  617468, 617539, 617555, 621740, 633718, 633787, 633803, 676526, 690700, 711862], 61×40 DataFrame
 Row  ind                        ID                         location  group   ⋯
     │ String                     String                     String7   String1 ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ GW_Armando_plate1_JF12G04  GW_Armando_plate1_JF12G04  ST_vi     vir     ⋯
   2 │ GW_Armando_plate2_JF03G01  GW_Armando_plate2_JF03G01  ST_vi     vir_mis
   3 │ GW_Armando_plate2_JF30G01  GW_Armando_plate2_JF30G01  ST_vi     vir_mis
   4 │ GW_Armando_plate1_JF16G01  GW_Armando_plate1_JF16G01  DV_vi     plumb_v
   5 │ GW_Armando_plate2_JF16G02  GW_Armando_plate2_JF16G02  DV_vi     plumb_v ⋯
   6 │ GW_Armando_plate2_JE31G01  GW_Armando_plate2_JE31G01  VB_vi     vir_mis
   7 │ GW_Armando_plate2_JF03G02  GW_Armando_plate2_JF03G02  VB_vi     vir_mis
   8 │ GW_Armando_plate1_AB1      GW_Armando_plate1_AB1      AB        vir
   9 │ GW_Lane5_AB2               GW_Lane5_AB2               AB        vir     ⋯
  10 │ GW_Armando_plate1_TL3      GW_Armando_plate1_TL3      TL        vir
  11 │ GW_Lane5_AA1               GW_Lane5_AA1               AA        vir_S
  ⋮  │             ⋮                          ⋮                 ⋮          ⋮   ⋱
  52 │ GW_Armando_plate1_JF07G03  GW_Armando_plate1_JF07G03  ST        plumb
  53 │ GW_Armando_plate1_JF07G04  GW_Armando_plate1_JF07G04  ST        plumb   ⋯
  54 │ GW_Armando_plate1_JF08G02  GW_Armando_plate1_JF08G02  ST        plumb
  55 │ GW_Armando_plate1_JF09G01  GW_Armando_plate1_JF09G01  ST        plumb
  56 │ GW_Armando_plate1_JF09G02  GW_Armando_plate1_JF09G02  ST        plumb
  57 │ GW_Armando_plate1_JF11G01  GW_Armando_plate1_JF11G01  ST        plumb   ⋯
  58 │ GW_Armando_plate1_JF12G01  GW_Armando_plate1_JF12G01  ST        plumb
  59 │ GW_Armando_plate1_JF12G02  GW_Armando_plate1_JF12G02  ST        plumb
  60 │ GW_Armando_plate1_JF13G01  GW_Armando_plate1_JF13G01  ST        plumb
  61 │ GW_Armando_plate1_JF10G03  GW_Armando_plate1_JF10G03  ST        plumb_v ⋯
                                                  37 columns and 40 rows omitted)

Show just the west area (without nitidus)

clusterNamesWithHetsWest = ["virLud",
                            "virLudHet",
                            "virLud_troch",
                            "troch",
                            "trochHet"]

clusterColorsWithHetsWest = ["blue",
                            "blue",
                            "green",
                            "yellow",
                            "yellow"]

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsWest))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Show just the east area

clusterNamesWithHetsEast = ["troch",
                            "trochHet",
                            "troch_obsPlumb",
                            "obsPlumb",
                            "obsPlumbHet"]

clusterColorsWithHetsEast = ["yellow",
                            "yellow",
                            "orange",
                            "red",
                            "red"]

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Show just the northern area

clusterNamesWithHetsNorth = ["virLud",
                            "virLudHet",
                            "virLud_obsPlumb",
                            "obsPlumb",
                            "obsPlumbHet"]

clusterColorsWithHetsNorth = ["blue",
                            "blue",
                            "purple",
                            "red",
                            "red"]

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsNorth))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Do a PCA based on a same-size region elsewhere on gw4A (with low ViSHet):

# get length of region
lengthHighViSHetRegion = positionMax - positionMin

# because the high ViSHet region is on the left side of chr 4A, place the comparison region on the right side, ending 1 Mb before the end of the scaffold:
rightLocus = scaffold_lengths["gw4A"] - 1_000_000
leftLocus = rightLocus - lengthHighViSHetRegion

regionText_lowViSHetRegion = string("chr ", chr, " ",leftLocus," to ",rightLocus)

lociSelection = (leftLocus .<= pos_region.position .<= rightLocus)
genotypes_lowViSHetRegion = genotypes_region[:, lociSelection]

# impute missing genotypes:
genotypes_lowViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genotypes_lowViSHetRegion))

flipPC1 = true
flipPC2 = true

PCAmodel = plotPCA(genotypes_lowViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText_lowViSHetRegion,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodel.PCAfig)
if true  # set to true to save plot
    save("FigureS2D_gw4A_nonHLBRarbitrary_from_Julia.png", PCAmodel.PCAfig, px_per_unit = 2.0)
end 
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

Chr 4A shows remarkable patterns, which must be the result of selective sweeps.
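
The comparison region for the low-ViSHet PCA above was chosen by arithmetic on scaffold coordinates: take the length of the high ViSHet region and place a window of that length ending 1 Mb before the end of the scaffold. A small sketch with made-up coordinates (all `toy...` names and numbers are illustrative, not values from gw4A):

```julia
# Made-up coordinates for illustration only
toyScaffoldLength = 10_000_000
toyRegionMin, toyRegionMax = 200_000, 900_000
toyLength = toyRegionMax - toyRegionMin

toyRight = toyScaffoldLength - 1_000_000   # stop 1 Mb short of the scaffold end
toyLeft = toyRight - toyLength             # same length as the original region

# select SNPs whose positions fall inside the comparison window (inclusive bounds)
toySNPpositions = [150_000, 500_000, 8_250_000, 8_300_000, 8_999_999, 9_000_000, 9_500_000]
toySelection = toyLeft .<= toySNPpositions .<= toyRight

println("comparison window: ", toyLeft, " to ", toyRight)
println("SNPs selected: ", sum(toySelection))
```

Matching the window length in base pairs does not guarantee the same number of SNPs, so the two PCAs may rest on different numbers of markers.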

Same for chr 20

# choose scaffold
chr = "gw20"

positionMin, positionMax, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
    getWindowedIndHetStanRegion(genosOnly_included, 
                            pos_SNP_filtered, 
                            highViSHetRegions, chr;
                            windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add column to metadata containing the regionIndHetStan for this highHet region
# (the column name is built from `chr`, so index by string rather than building and evaluating a command):
ind_with_metadata_included[!, chr * "_regionIndHetStan"] = meanAcrossRegionIndHetStan
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

# check whether missing data is related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:

genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))

flipPC1 = false
flipPC2 = true

PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else 
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.5) 

# Plot only the lowIndHetStan individuals:

f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
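if false  # set to true to save plot
    # NOTE: the filename (gw28) and `plotInfo` below appear to be carried over from another section;
    # to save the figure made in this cell, pass `f` and choose an appropriate filename
    save("Figure6_top_gw28GBIplotEast_from_Julia.png", plotInfo[1], px_per_unit = 2.0)
end 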
More than 1 region on that scaffold. Using just the longest one.
2×3 DataFrame
 Row │ regionChrom  regionStart  regionEnd
     │ String       Int64        Int64
─────┼─────────────────────────────────────
   1 │ gw20               27354     721651
   2 │ gw20             5852254    6671670
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
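
The `flipPC1` and `flipPC2` settings above only reverse the direction of an axis. The sign of a principal component is arbitrary, so multiplying the scores by -1 changes the orientation of the plot but not the spread of individuals along that axis. The sketch below illustrates this with a PCA computed by singular value decomposition of a made-up genotype matrix. It uses the standard library rather than the `plotPCA` function, whose internals are not shown on this page.

```julia
using LinearAlgebra, Statistics

# Toy genotype matrix: 6 individuals (rows) x 4 SNPs (columns); values are made up
toyGenos = Float64[0 0 2 2;
                   0 1 2 2;
                   1 0 2 1;
                   2 2 0 0;
                   2 1 0 0;
                   2 2 1 0]

centered = toyGenos .- mean(toyGenos, dims=1)   # center each SNP
F = svd(centered)
scores = F.U .* F.S'                            # individuals x PCs

toyFlipPC1 = true
toyPC1 = toyFlipPC1 ? -1 .* scores[:, 1] : scores[:, 1]

# the variance along PC1 is unchanged by the flip
println(var(toyPC1) ≈ var(scores[:, 1]))
```

Any thresholds later applied to PC scores (such as the cluster definitions based on PC1 and PC2) depend on the chosen orientation, so the flip settings and the thresholds need to be kept in step.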

Save the individual colors in the metadata

indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;

Plot PC1 vs. PC2

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

Plot PC1 vs. PC3

f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

CairoMakie.Screen{IMAGE}

At the chr 20 high ViSHet region, there are six clear haplogroups (with vir and lud clearly separated on PC3). Divide the samples into those groups based on their PCA scores, then calculate pi and Dxy.
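
For orientation, the sketch below gives textbook per-site formulas for a biallelic SNP: within-group nucleotide diversity as 2p(1-p) with a finite-sample correction, and between-group Dxy as the probability that one allele drawn from each group differs. This is an assumption about what these statistics usually mean. The `getSitePi` and `getDxy` functions used in this analysis may differ in details, such as how sample sizes are counted. The function names `toySitePi` and `toySiteDxy` and all values are hypothetical.

```julia
# Textbook per-site formulas for a biallelic SNP (illustrative only)
# p        : frequency of one allele in the group
# nAlleles : number of allele copies sampled in the group (assumed > 1)
toySitePi(p, nAlleles) = 2 * p * (1 - p) * nAlleles / (nAlleles - 1)
toySiteDxy(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

# made-up allele frequencies for two groups at three SNPs
toyP1 = [0.0, 0.5, 1.0]
toyP2 = [1.0, 0.5, 1.0]
toyN1 = 20   # allele copies sampled in group 1 (assumed)

toyPi1 = toySitePi.(toyP1, toyN1)     # zero at fixed sites, positive at the polymorphic site
toyDxy12 = toySiteDxy.(toyP1, toyP2)  # 1 at a fixed difference, 0 where both groups share one fixed allele
println(toyPi1)
println(toyDxy12)
```

A site fixed for different alleles in the two groups contributes Dxy = 1 but pi = 0 in both, which is the signature expected where groups carry distinct haplotypes.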

clusterNames = ["vir",
                "nit",
                "lud",
                "troch",
                "obs",
                "plumb"]

clusterColors = ["blue",
                "grey",
                "green",
                "yellow",
                "orange",
                "red"]

vir = (PCAmodelAll.metadata.PC1 .< -5) .&
            (-5 .< PCAmodelAll.metadata.PC3 .< -0.5) .&
            indSelection_lowIndHetStan
nit = (-5 .< PCAmodelAll.metadata.PC1 .< -3) .&
        (PCAmodelAll.metadata.PC3 .< -5) .&
            indSelection_lowIndHetStan
lud = (PCAmodelAll.metadata.PC1 .< -4) .&
            (0.5 .< PCAmodelAll.metadata.PC3 .< 7) .&
            indSelection_lowIndHetStan
troch = (2 .< PCAmodelAll.metadata.PC1 .< 5) .&
            (PCAmodelAll.metadata.PC2 .< -4.2) .&
            indSelection_lowIndHetStan
obs = (3 .< PCAmodelAll.metadata.PC1 .< 4) .&
            (-4.2 .< PCAmodelAll.metadata.PC2 .< -2.5) .&
            indSelection_lowIndHetStan
plumb = (2.5 .< PCAmodelAll.metadata.PC1 .< 6) .& 
            (3 .< PCAmodelAll.metadata.PC2) .&
            indSelection_lowIndHetStan

# check the individuals in each group
PCAmodelAll.metadata.Fst_group[vir]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[lud]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]

clusterArray = [vir nit lud troch obs plumb]

# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")

# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")

# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)

# Calculate pairwise Dxy per site, using the allele frequencies in `freqs` and the group names in `clusterNames`
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false)  # set `among` to false if no among-group Fst is wanted (some things won't work without it)

# Now get averages of pi and Dxy for whole region:

regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 6×2 DataFrame
 Row │ cluster  pi         
     │ String   Float64    
─────┼─────────────────────
   1 │ vir      0.0132903
   2 │ nit      0.00761773
   3 │ lud      0.014873
   4 │ troch    0.0101873
   5 │ obs      0.00904222
   6 │ plumb    0.00593251 =#

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 15×2 DataFrame
 Row │ cluster_pair  Dxy       
     │ String        Float64   
─────┼─────────────────────────
   1 │ vir_nit       0.0280243
   2 │ vir_lud       0.0204941
   3 │ vir_troch     0.0394257
   4 │ vir_obs       0.0403572
   5 │ vir_plumb     0.0376188
   6 │ nit_lud       0.0288021
   7 │ nit_troch     0.0377964
   8 │ nit_obs       0.0389254
   9 │ nit_plumb     0.0359742
  10 │ lud_troch     0.0390498
  11 │ lud_obs       0.0398045
  12 │ lud_plumb     0.0371989
  13 │ troch_obs     0.015702
  14 │ troch_plumb   0.0285113
  15 │ obs_plumb     0.0286543 =#

# Make a genotype-by-individual plot using all variable loci in the region:
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5 
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]

# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
                missingFractionAllowed = missingFractionAllowed,
                indColorRightProvided = true);
The numbers in each group are [38 2 29 68 4 66] and the sum of those is 207
Calculated population allele frequencies and sample sizes
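For readers unfamiliar with the two statistics calculated above, here is a minimal sketch of how per-site pi and Dxy can be obtained from allele frequencies at biallelic sites. It uses common textbook estimators and is only an assumption about the general approach, not the code inside `getSitePi` or `getDxy`; the names `site_pi`, `site_dxy`, `freqsDemo`, and `nDemo` are hypothetical.

```julia
# Hypothetical helpers for illustration only (not the functions used on this page).
# p = frequency of one allele at a biallelic site
# n = number of sampled allele copies in the group (2 per diploid individual)

# Within-group diversity at one site, with the usual n/(n-1) small-sample correction
site_pi(p, n) = 2 * p * (1 - p) * n / (n - 1)

# Between-group divergence at one site: probability that one allele drawn
# from each group differs
site_dxy(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

# Toy data: rows are groups, columns are sites
freqsDemo = [0.0 0.5 1.0;
             1.0 0.5 1.0]
nDemo = [20, 10]   # allele copies sampled in group 1 and group 2

# 2×3 matrix of per-site pi (groups in rows, sites in columns)
piDemo = [site_pi(freqsDemo[g, s], nDemo[g]) for g in 1:2, s in 1:3]

# Per-site Dxy between group 1 and group 2
dxyDemo = [site_dxy(freqsDemo[1, s], freqsDemo[2, s]) for s in 1:3]
```

A site fixed for different alleles in the two groups gives Dxy = 1 and pi = 0 in both, which is the pattern expected between distinct haploblocks.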

Now show a GBI plot like above, but with heterozygotes

clusterNamesWithHets = ["vir",
                "nit",
                "lud",
                "ludHet",
                "lud_troch",
                "troch",
                "obs",
                "obs_plumb",
                "plumb",
                "vir_plumb"]

clusterColorsWithHets = ["blue",
                "grey",
                "green",
                "green",
                "seagreen",
                "yellow",
                "orange",
                "darkorange1",
                "red",
                "purple"]

ludHet = (PCAmodelAll.metadata.PC1 .< -4) .&
            (0.5 .< PCAmodelAll.metadata.PC3 .< 7) .&
            .!indSelection_lowIndHetStan
lud_troch = (-4 .< PCAmodelAll.metadata.PC1 .< 0) .&
            (-4 .< PCAmodelAll.metadata.PC2 .< -1) .&
            .!indSelection_lowIndHetStan
obs_plumb = (3 .< PCAmodelAll.metadata.PC1 .< 5) .&
            (0 .< PCAmodelAll.metadata.PC2 .< 1) .&
            .!indSelection_lowIndHetStan
vir_plumb = (-3 .< PCAmodelAll.metadata.PC1 .< 0) .&
            (2.5 .< PCAmodelAll.metadata.PC2 .< 5) .&
            .!indSelection_lowIndHetStan

clusterArray = [vir nit lud ludHet lud_troch troch obs obs_plumb plumb vir_plumb]

sum(clusterArray, dims=1)

if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else 
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end

# check which individuals left out:
sum(clusterArray, dims=2)

PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]

# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end

# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above

# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets

# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
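The cluster assignment and the completeness check above can be illustrated on toy data. This is a sketch under stated assumptions: the PC scores, thresholds, and names (`pc1Demo`, `pc2Demo`, `lowHetDemo`, `demoNames`) are invented for illustration and are not values from the analysis.

```julia
# Toy PC scores and a low-heterozygosity flag for four individuals (invented values)
pc1Demo = [-6.0, -4.5, 3.0, 0.0]
pc2Demo = [ 1.0, -2.0, 4.0, 0.0]
lowHetDemo = [true, true, true, false]
demoNames = ["A", "B", "C"]

# One Boolean mask per cluster, built from broadcast (dotted) comparisons
A = (pc1Demo .< -5) .& lowHetDemo
B = (-5 .< pc1Demo .< -3) .& (pc2Demo .< 0) .& lowHetDemo
C = (2 .< pc1Demo .< 5) .& (3 .< pc2Demo) .& lowHetDemo

# Horizontal concatenation: individuals in rows, clusters in columns
masks = [A B C]

# Translate the masks into one label per individual
membership = fill("none", length(pc1Demo))
for j in axes(masks, 2)
    membership[masks[:, j]] .= demoNames[j]
end

# Each individual should fall into exactly one cluster;
# a row sum of 0 means unassigned, and more than 1 means overlapping thresholds
perInd = vec(sum(masks, dims=2))
unassigned = findall(perInd .== 0)
overlapping = findall(perInd .> 1)
```

Checking the row sums as well as the grand total guards against the case where one individual is counted twice and another is missed, which would still give a matching total.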

Show GBI plot according to original groups and plot order

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);

Show same but with all individuals

PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets, 
                                            genos_selectedSNPs, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);

Show just the west area (without nitidus)

clusterNamesWithHetsWest = ["vir",
                "lud",
                "ludHet",
                "lud_troch",
                "troch"]

clusterColorsWithHetsWest = ["blue",
                "green",
                "green",
                "seagreen",
                "yellow"]

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsWest))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
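The 50% filter used above keeps only SNPs whose allele frequency is above 0.5 in at least one group and below 0.5 in at least one other. A small sketch on an invented frequency matrix (`freqsToy` is hypothetical; groups in rows, SNPs in columns) shows which columns pass:

```julia
# Invented allele frequencies: 3 groups (rows) × 4 SNPs (columns)
freqsToy = [0.9 0.2 0.8 0.5;
            0.1 0.4 0.7 0.5;
            0.6 0.3 1.0 0.5]

# Same expression as in the analysis code, applied to the toy matrix
keep = (vec(maximum(freqsToy, dims=1)) .> 0.5) .& (vec(minimum(freqsToy, dims=1)) .< 0.5)

# Subset the columns that pass the filter
freqsKept = freqsToy[:, keep]
```

Only the first SNP passes: the second never exceeds 0.5, the third never drops below 0.5, and the fourth sits exactly at 0.5 in every group, which the strict inequalities exclude.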

Show just the east area

clusterNamesWithHetsEast = ["troch",
                            "obs",
                            "obs_plumb",
                            "plumb"]

clusterColorsWithHetsEast = ["yellow",
                            "orange",
                            "darkorange1",
                            "red"]

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes

Show just the northern area

clusterNamesWithHetsNorth = ["vir",
                            "vir_plumb",
                            "plumb"]

clusterColorsWithHetsNorth = ["blue",
                            "purple",
                            "red"]

# limit the SNPs to those with variants greater than 50% in 
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsNorth))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets, 
                                            genos_selectedSNPs2, PCAmodelAll.metadata;
                                            sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
                genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
                missingFractionAllowed = missingFractionAllowed,
                indColorLeftProvided = false,
                indColorRightProvided = true);
Calculated population allele frequencies and sample sizes

Try chr 27

# choose scaffold
chr = "gw27"

positionMin, positionMax, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
    getWindowedIndHetStanRegion(genosOnly_included, 
                            pos_SNP_filtered, 
                            highViSHetRegions, chr;
                            windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

# check whether missing data is related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:

genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))

flipPC1 = true
flipPC2 = true

PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included, 
            groups_to_plot_PCA, group_colors_PCA; 
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)

display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else 
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.4) 

# Plot only the lowIndHetStan individuals:

f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA) 
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Good news: 1 region on that scaffold

CairoMakie.Screen{IMAGE}

Summary of chromosome LHBR patterns

Tried chr 1 but some unclean distinguishing of ludlowi clusters (no wide sharing of plumb)
Included chr 1A
Tried chr 2 but some unclean distinguishing of ludlowi clusters (with some sharing of plumb)
Included chr 3
Tried chr 4 but not clean separation of high vs. low IndHet
Included chr 4A
Tried chr 5 but not clean separation of high vs. low IndHet
Tried chr 6 but not clean separation of high vs. low IndHet
Tried chr 7 but not clean separation of high vs. low IndHet, recomb in ludlowi
Tried chr 8 but not clean separation of high vs. low IndHet, recomb in ludlowi
Tried chr 9 but not clean separation of high vs. low IndHet
Tried chr 10 but not clean separation of high vs. low IndHet
Tried chr 11 and has potential, and shows an obs with two plumb types, but ludlowi not cleanly distinguished into types
Tried chr 12 but not a very clear separation of high vs. low IndHet (shows a lot of sharing of plumb haps in ludlowi)
Tried chr 14 but not clean separation of high vs. low IndHet, recomb in ludlowi
Included chr 15
No chr 16
Tried chr 23 but not a very clear separation of high vs. low IndHet
Included chr 17, chr 18, chr 19, chr 20
Tried chr 21 but not a very clear separation of high vs. low IndHet
Tried chr 22 but not a very clear separation of high vs. low IndHet. lud is all over the place.
Tried chr 23 and almost included, but a few inds would be tough to categorize. Similar pattern as some others. Not a very clear separation of high vs. low IndHet.
Tried chr 24 but not a very clear separation of high vs. low IndHet.
Tried chr 25 but not a very clear separation of high vs. low IndHet, and hard to categorize a lot of inds.
Included chr 26
Tried chr 27 but not a very clear separation of high vs. low IndHet.
Included chr 28
Included chr Z

Make a summary plot for the cluster types at different chromosome haploblocks (west without nitidus)

This uses a modified version of the plotGenotypeByIndividual() function, but first requires constructing a genotype data structure based on the groups (determined above) for each haploblock.

For west side (without nitidus):

#= # For debugging function:

indMetadata = ind_with_metadata_included
plotGroups = plotGroupsForSummary
plotGroupColors = groupColorsForSummary
regionNames = haploblockRegions
indFontSize = 10
figureSize = (1200, 1200)
plotTitle = nothing
indColorLeftProvided = false
indColorRightProvided = false =#

"""
    plotHaploblockSummary(genosSummary, indMetadata,
                            plotGroups, plotGroupColors;
                            regionNames,
                            indFontSize=10, figureSize=(1200, 1200),
                            plotTitle = nothing,
                            indColorLeftProvided = false,
                            indColorRightProvided = false)

Construct a summary genotype-by-individual plot, with one column per haploblock region and one row per individual.

Genotypes are coded as integers: `0` (homozygous for one haplotype) is drawn as a dark purple box, `2` (homozygous for the alternative haplotype) as a light purple box, and `1` (heterozygous) as a pair of dark and light triangles. Missing genotypes (`missing` or `-1`) are left blank.

# Arguments

- `genosSummary`: Matrix containing summary genotype data (individuals in rows, haploblock regions in columns).
- `indMetadata`: DataFrame of metadata for individuals; must contain `ID`, `Fst_group`, and `plot_order` columns.
- `plotGroups`: Vector of group names to include in plot.
- `plotGroupColors`: Vector of plotting colors corresponding to the groups.
- `regionNames`: Optional; Names of the genotyped regions.
- `indFontSize`: Optional; the font size of the individual ID labels.
- `figureSize`: Optional; the size of the figure; default is `(1200, 1200)`.  
- `plotTitle`: Optional; default will make a title. For no title, set to `""`.
- `indColorLeftProvided`: Optional; Default is `false`. Set to `true` if there is a column labeled `indColorLeft` in the metadata providing color of each individual for plotting on left side.
- `indColorRightProvided`: Optional; same as above but for right side (requires `indColorRight` column in metadata).

# Notes
Returns a tuple containing:
- the figure
- the plotted genotypes
- the sorted metadata matrix for the plotted individuals
"""
function plotHaploblockSummary(genosSummary, indMetadata,
                                plotGroups, plotGroupColors;
                                regionNames = nothing,        
                                indFontSize=10, figureSize=(1200, 1200),
                                plotTitle = nothing,
                                indColorLeftProvided = false,
                                indColorRightProvided = false)
    
    # if the genoData has missing values, then convert to -1:
    genosSummary[ismissing.(genosSummary)] .= -1

    numRegions = size(genosSummary, 2)

    genosSummary_subset = genosSummary[indMetadata.Fst_group .∈ Ref(plotGroups), :]
    indMetadata_subset = indMetadata[indMetadata.Fst_group .∈ Ref(plotGroups), :]

    # Choose sorting order by plot_order column in input metadata file

    sorted_genosSummary_subset = genosSummary_subset[sortperm(indMetadata_subset.plot_order, rev=false), :]
    numInds = size(sorted_genosSummary_subset, 1)
    sorted_indMetadata_subset = indMetadata_subset[sortperm(indMetadata_subset.plot_order, rev=false), :]

    # Set up the plot window:
    f = CairoMakie.Figure(size=figureSize)

    if isnothing(plotTitle)
        plotTitle = "Summary of $numRegions haploblock genotypes for $numInds individuals"
    end 

    # Set up the main Axis: 
    ax = Axis(f[1, 1],
        title = plotTitle,
        titlesize=30,
        limits=(0.5 - 0.09 * (numRegions), 0.5 + 1.09 * (numRegions),
            0.5 - 0.3 * numInds, 0.5 + numInds)
    )
    hidedecorations!(ax) # hide background lattice and axis labels
    hidespines!(ax) # hide box around plot

    genotypeColors = ["#3f007d", "#807dba", "#dadaeb", "grey50"]  # purple shades from colorbrewer

    # plot evenly spaced by SNP order along chromosome:
    # make top part of fig (genotypes for individuals)
    labelCushion = numRegions / 100
    label_x_left = 0.5 - labelCushion
    label_x_right = 0.5 + numRegions + labelCushion
    colorBoxCushion = 0.07 * numRegions
    groupColorBox_x_left = 0.5 - colorBoxCushion
    groupColorBox_x_right = 0.5 + numRegions + colorBoxCushion
    boxWidth = 0.005 * numRegions * 2
    groupColorBox_x_left = [-boxWidth, -boxWidth, boxWidth, boxWidth, -boxWidth] .+ groupColorBox_x_left
    groupColorBox_x_right = [-boxWidth, -boxWidth, boxWidth, boxWidth, -boxWidth] .+ groupColorBox_x_right
    groupColorBox_y = [0.4, -0.4, -0.4, 0.4, 0.4]

    for i in 1:numInds
        y = numInds + 1 - i  # y is location for plotting; this reverses order of plot top-bottom
        labelText = last(split(sorted_indMetadata_subset.ID[i], "_"))  # this gets the last part of the sample ID (usually the main ID part)
        # put sample label on left side:
        CairoMakie.text!(label_x_left, y; text=labelText, align=(:right, :center), fontsize=indFontSize)
        # put sample label on right side:
        CairoMakie.text!(label_x_right, y; text=labelText, align=(:left, :center), fontsize=indFontSize)
        if indColorLeftProvided
            boxColorLeft = sorted_indMetadata_subset.indColorLeft[i]
        else
            boxColorLeft = plotGroupColors[findfirst(plotGroups .== sorted_indMetadata_subset.Fst_group[i])]
        end
        if indColorRightProvided
            boxColorRight = sorted_indMetadata_subset.indColorRight[i]
        else
            boxColorRight = plotGroupColors[findfirst(plotGroups .== sorted_indMetadata_subset.Fst_group[i])]
        end
        CairoMakie.poly!(Point2f.(groupColorBox_x_left, (y .+ groupColorBox_y)), color=boxColorLeft)
        CairoMakie.poly!(Point2f.(groupColorBox_x_right, (y .+ groupColorBox_y)), color=boxColorRight)
    end

    # generate my own plotting symbol (a rectangle)
    box_x = [-0.45, -0.45, 0.45, 0.45, -0.45]
    #box_x = [-0.5, -0.5, 0.5, 0.5, -0.5]
    box_y = [0.4, -0.4, -0.4, 0.4, 0.4]
    # generate triangles for plotting heterozygotes
    triangle1_x = [-0.45, -0.45, 0.45, -0.45]
    #triangle1_x = [-0.5, -0.5, 0.5, -0.5]
    triangle1_y = [0.4, -0.4, 0.4, 0.4]
    triangle2_x = [-0.45, 0.45, 0.45, -0.45]
    #triangle2_x = [-0.5, 0.5, 0.5, -0.5]
    triangle2_y = [-0.4, -0.4, 0.4, -0.4]
    # cycle through individuals, graphing each type of genotype:
    for i in 1:numInds
        y = numInds + 1 - i  # y is location for plotting; this reverses order of plot top-bottom
        #CairoMakie.lines!([0.5, numRegions + 0.5], [y, y], color="grey40") # for lines across the individual rows
        genotypes = sorted_genosSummary_subset[i, :]
        hom_ref_locs = findall(genotypes .== 0)
        if length(hom_ref_locs) > 0
            for j in eachindex(hom_ref_locs)
                CairoMakie.poly!(Point2f.((hom_ref_locs[j] .+ box_x), (y .+ box_y)), color=genotypeColors[1])
            end
        end
        het_locs = findall(genotypes .== 1)
        if length(het_locs) > 0
            for j in eachindex(het_locs)
                CairoMakie.poly!(Point2f.((het_locs[j] .+ triangle1_x), (y .+ triangle1_y)), color=genotypeColors[1])
                CairoMakie.poly!(Point2f.((het_locs[j] .+ triangle2_x), (y .+ triangle2_y)), color=genotypeColors[3])
            end
        end
        hom_alt_locs = findall(genotypes .== 2)
        if length(hom_alt_locs) > 0
            for j in eachindex(hom_alt_locs)
                CairoMakie.poly!(Point2f.((hom_alt_locs[j] .+ box_x), (y .+ box_y)), color=genotypeColors[3])
            end
        end
    end

    if isnothing(regionNames)
        regionNames = string.(1:numRegions)
    end

    # make labels on lower part
    y_label = 0.5 - 0.025numInds
    for i in 1:numRegions
        CairoMakie.text!(i, y_label; text = regionNames[i], align=(:center, :center), fontsize=30)
    end

    display(f)

    return f, sorted_genosSummary_subset, sorted_indMetadata_subset
end

# Set up a data structure to store, for each haploblock region, the key for converting
# cluster names to genotype integers. This is a dictionary of dictionaries:
regionHaplotypeCode_west = Dict{String, Dict{String, Int}}(
    "gw1A" => Dict("virLud"=>0, "virLud_troch"=>1, "troch"=>2),
    "gw3" => Dict("virLud"=>0, "virLudHet"=>0, "virLud_trochObs"=>1, "trochObs"=>2, "trochObsHet"=>2),
    "gw13" => Dict("vir"=>0, "lud"=>0, "lud_troch"=>1, "troch"=>2),
    "gw15" => Dict("virLud"=>0, "virLud_troch"=>1, "troch"=>2),
    "gw18" => Dict("virLud"=>0, "virLud_troch"=>1, "troch"=>2),
    "gw19" => Dict("virLud"=>0, "virLudHet"=>0, "virLud_trochObs"=>1, "trochObs"=>2, "trochObsHet"=>2),
    "gw26" => Dict("virLud"=>0, "virLud_troch"=>1, "troch"=>2),
    "gw28" => Dict("virLud"=>0, "virLud_troch"=>1, "troch"=>2),
    "gwZ" => Dict("vir"=>0, "lud"=>0, "vir_lud"=>0, "lud_troch"=>1, "troch"=>2)
)

haploblockRegions = ["gw1A", "gw3", "gw13", "gw15", "gw18", "gw19", "gw26", "gw28", "gwZ"]
numHaploblockRegions = length(haploblockRegions)
numInds = size(ind_with_metadata_included, 1)
# create genotype object and fill with missing (-1) genotypes
genosSummary = fill(-1, (numInds, numHaploblockRegions)) 
# fill object with appropriate genotypes
for i in 1:numHaploblockRegions
    region = haploblockRegions[i]
    for (key, value) in regionHaplotypeCode_west[region]
        command = """genosSummary[ind_with_metadata_included.$(region)_cluster .== "$(key)", $i] .= """ * string(value)
        eval(Meta.parse(command)) # this executes the command constructed above
    end
end

# Must say I am pleased with the cleverness of above. Concise datastructure and code that does a lot. :) 

plotGroupsForSummaryWest = ["vir","vir_S","lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN"]
groupColorsForSummaryWest = ["blue","turquoise1","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow"] 

plotHaploblockSummary(genosSummary, ind_with_metadata_included,
                            plotGroupsForSummaryWest, groupColorsForSummaryWest;
                            regionNames = haploblockRegions,
                            indFontSize = 8, figureSize = (1200, 1600),
                            plotTitle = nothing,
                            indColorLeftProvided = false,
                            indColorRightProvided = false);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

The one missing element (the white cell in the gw28 column) is a heterozygote with the nitidus haplotype.
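The fill loop above builds each assignment as a string and runs it with `eval(Meta.parse(...))`. As a hedged alternative, the same lookup can be written by indexing the cluster column by its string name, which DataFrames supports. This is only a sketch: `toyMetadata`, `toyCode`, `toyRegions` and `toyGenos` are hypothetical stand-ins for `ind_with_metadata_included`, `regionHaplotypeCode_west`, `haploblockRegions` and `genosSummary`, and `isequal.` is used instead of `.==` on the assumption that a cluster column might contain `missing`.

```julia
using DataFrames

# Toy stand-ins (hypothetical) for the metadata table and the conversion key
toyMetadata = DataFrame(gw1A_cluster = ["virLud", "troch", "virLud_troch", "nit"])
toyCode = Dict("gw1A" => Dict("virLud" => 0, "virLud_troch" => 1, "troch" => 2))
toyRegions = ["gw1A"]

# start with every genotype coded as missing (-1)
toyGenos = fill(-1, (nrow(toyMetadata), length(toyRegions)))
for (i, region) in enumerate(toyRegions)
    clusters = toyMetadata[!, region * "_cluster"]   # column looked up by its string name
    for (key, value) in toyCode[region]
        # isequal treats `missing` as an ordinary non-matching value
        toyGenos[isequal.(clusters, key), i] .= value
    end
end
```

In this toy example the fourth individual has a cluster name ("nit") that is absent from the key, so its genotype stays at -1. That is the same mechanism that leaves a cell white in the summary plot.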

Make a summary plot for the east side

# Set up a data structure to store, for each haploblock region, the key for converting
# cluster names to genotype integers. This is a dictionary of dictionaries:
regionHaplotypeCode_east = Dict{String, Dict{String, Int}}(
    "gw1A" => Dict("obs"=>0, "plumb"=>2),
    "gw3" => Dict("trochObs"=>0, "trochObsHet"=>0, "plumb"=>2, "plumbHet"=>2),
    "gw13" => Dict("obs"=>0, "plumb"=>2, "plumbHet"=>2),
    "gw15" => Dict("obs"=>0, "plumb"=>2),
    "gw18" => Dict("obs"=>0, "obs_plumb"=>1, "plumb"=>2),
    "gw19" => Dict("trochObs"=>0, "trochObs_plumb"=>1, "plumb"=>2),
    "gw26" => Dict("obs"=>0, "obs_plumb"=>1, "plumb"=>2),
    "gw28" => Dict("obs"=>0, "obsHet"=>0, "obs_plumb"=>1, "plumb"=>2),
    "gwZ" => Dict("obs"=>0, "plumb"=>2)
)

haploblockRegions = ["gw1A", "gw3", "gw13", "gw15", "gw18", "gw19", "gw26", "gw28", "gwZ"]
numHaploblockRegions = length(haploblockRegions)
numInds = size(ind_with_metadata_included, 1)
# create genotype object and fill with missing (-1) genotypes
genosSummary = fill(-1, (numInds, numHaploblockRegions)) 
# fill object with appropriate genotypes
for i in 1:numHaploblockRegions
    region = haploblockRegions[i]
    for (key, value) in regionHaplotypeCode_east[region]
        command = """genosSummary[ind_with_metadata_included.$(region)_cluster .== "$(key)", $i] .= """ * string(value)
        eval(Meta.parse(command)) # this executes the command constructed above
    end
end

plotGroupsForSummaryEast = ["obs","plumb_BJ","plumb"]
groupColorsForSummaryEast = ["orange","pink","red"] 

plotHaploblockSummary(genosSummary, ind_with_metadata_included,
                            plotGroupsForSummaryEast, groupColorsForSummaryEast;
                            regionNames = haploblockRegions,
                            indFontSize = 8, figureSize = (1200, 1600),
                            plotTitle = nothing,
                            indColorLeftProvided = false,
                            indColorRightProvided = false);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

The white cells in the figure are heterozygotes between the plumbeitarsus and viridanus haplotypes.
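Cells come out white because the plotting loop in `plotHaploblockSummary` only draws genotypes coded 0, 1 or 2; a genotype still coded -1 matches none of the three `findall` calls, so nothing is drawn. The sketch below restates that logic without any plotting package. `cellPolygons` is a hypothetical helper (not part of the original code) that returns the polygon vertices and the `genotypeColors` index that would be used for one cell; the offsets are copied from the function above.

```julia
# Hypothetical helper: polygons (vertex list, index into genotypeColors) for one cell
function cellPolygons(genotype::Integer, x::Real, y::Real)
    # offsets copied from plotHaploblockSummary
    box_x = [-0.45, -0.45, 0.45, 0.45, -0.45]
    box_y = [0.4, -0.4, -0.4, 0.4, 0.4]
    triangle1_x = [-0.45, -0.45, 0.45, -0.45]
    triangle1_y = [0.4, -0.4, 0.4, 0.4]
    triangle2_x = [-0.45, 0.45, 0.45, -0.45]
    triangle2_y = [-0.4, -0.4, 0.4, -0.4]
    vertices(xs, ys) = collect(zip(x .+ xs, y .+ ys))
    if genotype == 0        # homozygous for first haplotype: one rectangle
        return [(vertices(box_x, box_y), 1)]
    elseif genotype == 1    # heterozygote: two triangles, one per haplotype color
        return [(vertices(triangle1_x, triangle1_y), 1),
                (vertices(triangle2_x, triangle2_y), 3)]
    elseif genotype == 2    # homozygous for second haplotype: one rectangle
        return [(vertices(box_x, box_y), 3)]
    else                    # e.g. -1 (no matching cluster name): nothing is drawn
        return Tuple{Vector{Tuple{Float64, Float64}}, Int}[]
    end
end

hetCell = cellPolygons(1, 3, 2)       # region 3, plotting row 2
missingCell = cellPolygons(-1, 3, 2)  # empty: the cell stays white
```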

Make a summary plot for the north side

# Set up a data structure to store, for each haploblock region, the key for converting
# cluster names to genotype integers. This is a dictionary of dictionaries:
regionHaplotypeCode_north = Dict{String, Dict{String, Int}}(
    "gw1A" => Dict("virLud"=>0, "vir_plumb"=>1, "plumb"=>2),
    "gw3" => Dict("virLud"=>0, "virLudHet"=>0, "vir_plumb"=>1, "plumb"=>2, "plumbHet"=>2),
    "gw13" => Dict("vir"=>0, "vir_plumb"=>1, "plumb"=>2, "plumbHet"=>2),
    "gw15" => Dict("virLud"=>0, "vir_plumb"=>1, "plumb"=>2),
    "gw18" => Dict("virLud"=>0, "vir_plumb"=>1, "plumb"=>2),
    "gw19" => Dict("virLud"=>0, "virLudHet"=>0, "vir_plumb"=>1, "plumb"=>2),
    "gw26" => Dict("virLud"=>0, "vir_plumb"=>1, "plumb"=>2),
    "gw28" => Dict("virLud"=>0, "vir_plumb"=>1, "plumb"=>2),
    "gwZ" => Dict("vir"=>0, "plumb"=>2)
)

haploblockRegions = ["gw1A", "gw3", "gw13", "gw15", "gw18", "gw19", "gw26", "gw28", "gwZ"]
numHaploblockRegions = length(haploblockRegions)
numInds = size(ind_with_metadata_included, 1)
# create genotype object and fill with missing (-1) genotypes
genosSummary = fill(-1, (numInds, numHaploblockRegions)) 
# fill object with appropriate genotypes
for i in 1:numHaploblockRegions
    region = haploblockRegions[i]
    for (key, value) in regionHaplotypeCode_north[region]
        command = """genosSummary[ind_with_metadata_included.$(region)_cluster .== "$(key)", $i] .= """ * string(value)
        eval(Meta.parse(command)) # this executes the command constructed above
    end
end

plotGroupsForSummaryNorth = ["vir","plumb_vir","plumb"]
groupColorsForSummaryNorth = ["blue","purple","red"] 

plotHaploblockSummary(genosSummary, ind_with_metadata_included,
                            plotGroupsForSummaryNorth, groupColorsForSummaryNorth;
                            regionNames = haploblockRegions,
                            indFontSize = 8, figureSize = (1200, 1600),
                            plotTitle = nothing,
                            indColorLeftProvided = false,
                            indColorRightProvided = false)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

(Scene (768px, 960px):
  0 Plots
  1 Child Scene:
    └ Scene (768px, 960px), [0 0 … 0 0; 0 0 … 0 0; … ; 2 2 … 2 2; 1 2 … 1 2], 100×38 DataFrame
 Row  ind                        ID                         location  group   ⋯
     │ String                     String                     String7   String1 ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ GW_Armando_plate1_JF12G04  GW_Armando_plate1_JF12G04  ST_vi     vir     ⋯
   2 │ GW_Armando_plate2_JF03G01  GW_Armando_plate2_JF03G01  ST_vi     vir_mis
   3 │ GW_Armando_plate2_JF30G01  GW_Armando_plate2_JF30G01  ST_vi     vir_mis
   4 │ GW_Lane5_STvi1             GW_Lane5_STvi1             ST_vi     vir
   5 │ GW_Lane5_STvi2             GW_Lane5_STvi2             ST_vi     vir     ⋯
   6 │ GW_Lane5_STvi3             GW_Lane5_STvi3             ST_vi     vir
   7 │ GW_Armando_plate1_JF16G01  GW_Armando_plate1_JF16G01  DV_vi     plumb_v
   8 │ GW_Armando_plate2_JF16G02  GW_Armando_plate2_JF16G02  DV_vi     plumb_v
   9 │ GW_Armando_plate2_JE31G01  GW_Armando_plate2_JE31G01  VB_vi     vir_mis ⋯
  10 │ GW_Armando_plate2_JF03G02  GW_Armando_plate2_JF03G02  VB_vi     vir_mis
  11 │ GW_Lane5_YK11              GW_Lane5_YK11              YK        vir
  ⋮  │             ⋮                          ⋮                 ⋮          ⋮   ⋱
  91 │ GW_Armando_plate2_JF24G01  GW_Armando_plate2_JF24G01  VB        plumb
  92 │ GW_Armando_plate2_JF25G01  GW_Armando_plate2_JF25G01  VB        plumb   ⋯
  93 │ GW_Armando_plate1_JG02G02  GW_Armando_plate1_JG02G02  PR        plumb
  94 │ GW_Armando_plate1_JG02G04  GW_Armando_plate1_JG02G04  PR        plumb
  95 │ GW_Armando_plate2_JG01G01  GW_Armando_plate2_JG01G01  PR        plumb
  96 │ GW_Armando_plate2_JG02G01  GW_Armando_plate2_JG02G01  PR        plumb   ⋯
  97 │ GW_Armando_plate2_JG02G03  GW_Armando_plate2_JG02G03  PR        plumb
  98 │ GW_Lane5_SL1               GW_Lane5_SL1               SL        plumb
  99 │ GW_Lane5_SL2               GW_Lane5_SL2               SL        plumb
 100 │ GW_Armando_plate1_JF10G03  GW_Armando_plate1_JF10G03  ST        plumb_v ⋯
                                                  35 columns and 79 rows omitted)

Make a summary plot for the whole ring

# Set up a code converting integers to colors. 
# These will be used for all chromosome regions below.

integerToColorCodes = Dict{Int, String}(
    1 => "blue", # vir
    2 => "turquoise1", # vir south
    3 => "grey", # nit
    4 => "green", # lud
    5 => "yellow", # troch
    6 => "orange", # obs
    7 => "red", # plumb
)

# Set up a data structure to store, for each haploblock region, the key for converting
# cluster names to genotype integers corresponding to the colors above. This is a dictionary of dictionaries.
# Each genotype will be encoded with a tuple representing the alleles.
regionHaplotypeCode_all = Dict{String, Dict{String, Tuple{Int, Int}}}(
    "gw1A" => Dict("virLud"=>(1,1), 
                    "nit"=>(3,3),
                    "virLud_troch"=>(1,5), 
                    "troch"=>(5,5),
                    "obs"=>(6,6),
                    "plumb"=>(7,7),
                    "vir_plumb"=>(1,7)),
    "gw3" => Dict("virLud"=>(1,1), 
                    "virLudHet"=>(1,1),
                    "nit"=>(3,3),
                    "virLud_trochObs"=>(1,5),
                    "trochObs"=>(5,5),
                    "trochObsHet"=>(5,5),
                    "plumb"=>(7,7),
                    "plumbHet"=>(7,7),
                    "vir_plumb"=>(1,7)),
    "gw4A" => Dict("virLud"=>(1,1),
                    "virLudHet"=>(1,1),
                    "nit"=>(3,3),
                    "virLud_troch"=>(1,5),
                    "troch"=>(5,5),
                    "trochHet"=>(5,5),
                    "troch_obsPlumb"=>(5,7),
                    "obsPlumb"=>(7,7),
                    "obsPlumbHet"=>(7,7),
                    "virLud_obsPlumb"=>(1,7)),
    "gw13" => Dict("vir"=>(1,1),
                    "vir_lud"=>(1,4),
                    "nit"=>(3,3),
                    "lud"=>(4,4),
                    "lud_troch"=>(4,5),
                    "troch"=>(5,5),
                    "obs"=>(6,6),
                    "plumb"=>(7,7),
                    "plumbHet"=>(7,7),
                    "vir_plumb"=>(1,7)),      
    "gw15" => Dict("virLud"=>(1,1),
                    "nit"=>(3,3),
                    "virLud_troch"=>(1,5),
                    "troch"=>(5,5),
                    "obs"=>(6,6),
                    "plumb"=>(7,7),
                    "vir_plumb"=>(1,7)),
    "gw17" => Dict("virLud"=>(1,1),
                    "nit"=>(3,3),
                    "virLud_troch"=>(1,5),
                    "troch"=>(5,5),
                    "virLud_obs"=>(1,6),
                    "obs"=>(6,6),
                    "troch_plumb"=>(5,7),
                    "plumb"=>(7,7),
                    "vir_plumb"=>(1,7)),
    "gw18" => Dict("virLud"=>(1,1),
                    "nit"=>(3,3),
                    "virLud_troch"=>(1,5),
                    "troch"=>(5,5),
                    "obs"=>(6,6),
                    "obs_plumb"=>(6,7),
                    "plumb"=>(7,7),
                    "vir_plumb"=>(1,7)),
    "gw19" => Dict("virLud"=>(1,1),
                    "virLudHet"=>(1,1),
                    "nit"=>(3,3),
                    "virLud_trochObs"=>(1,5),
                    "trochObs"=>(5,5),
                    "trochObsHet"=>(5,5),
                    "trochObs_plumb"=>(5,7),
                    "plumb"=>(7,7),
                    "vir_plumb"=>(1,7)),
    "gw20" => Dict("vir"=>(1,1),
                "nit"=>(3,3),
                "lud"=>(4,4),
                "ludHet"=>(4,4),
                "lud_troch"=>(4,5),
                "troch"=>(5,5),
                "obs"=>(6,6),
                "obs_plumb"=>(6,7),
                "plumb"=>(7,7),
                "vir_plumb"=>(1,7)),
    "gw26" => Dict("virLud"=>(1,1),
                    "nit"=>(3,3),
                    "virLud_troch"=>(1,5),
                    "troch"=>(5,5),
                    "obs"=>(6,6),
                    "obs_plumb"=>(6,7),
                    "plumb"=>(7,7),
                    "vir_plumb"=>(1,7)),
    "gw28" => Dict("virLud"=>(1,1),
                    "virLud_nit"=>(1,3),
                    "nit"=>(3,3),
                    "virLud_troch"=>(1,5),
                    "troch"=>(5,5),
                    "obs"=>(6,6),
                    "obsHet"=>(6,6),
                    "obs_plumb"=>(6,7),
                    "plumb"=>(7,7),
                    "vir_plumb"=>(1,7)),
    "gwZ" => Dict("vir"=>(1,1),
                    "vir_lud"=>(1,4),
                    "nit"=>(3,3),
                    "lud"=>(4,4),
                    "lud_troch"=>(4,5),
                    "troch"=>(5,5),
                    "obs"=>(6,6),
                    "plumb"=>(7,7))
)

haploblockRegions = ["gwZ", "gw1A", "gw3", "gw4A", "gw13", "gw15", "gw17", "gw18", "gw19", "gw20","gw26", "gw28"]

numHaploblockRegions = length(haploblockRegions)
numInds = size(ind_with_metadata_included, 1)
# create genotype object and fill with missing ((-9, -9)) genotypes
genosSummary = fill((-9, -9), numInds, numHaploblockRegions)
# fill object with appropriate genotypes
for i in 1:numHaploblockRegions
    region = haploblockRegions[i]
    for (key, value) in regionHaplotypeCode_all[region]
        command = """genosSummary[ind_with_metadata_included.$(region)_cluster .== "$(key)", $i] .= ($(string(value)),)"""  # the construction at the end "protects" the tuple within a tuple, so it broadcasts correctly to each element on the left
        eval(Meta.parse(command)) # this executes the command constructed above
    end
end

plotGroupsForSummary_all = ["vir","vir_S","nit", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN","troch_EM","obs","plumb_BJ","plumb","plumb_vir"]
groupColorsForSummary_all = ["blue","turquoise1","grey","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow","gold","orange","pink","red","purple"] 


"""
    plotHaploblockSummaryWithColors(integerToColorCodes::Dict{Int, String},
                            genosSummary::Matrix{Tuple{Int64, Int64}},
                            indMetadata,
                            plotGroups, plotGroupColors;
                            regionNames,
                            indFontSize=10, figureSize=(1200, 1200),
                            plotTitle = nothing,
                            indColorLeftProvided = false,
                            indColorRightProvided = false)

Construct a haploblock-genotype-by-individual summary plot.

In this version, more than two haplotype alleles can be plotted, using colors provided according to the first argument.

# Arguments

- `integerToColorCodes`: The code matching integer haploblock types to colors.
- `genosSummary`: Matrix containing summary genotype data (individuals in rows, loci in columns), with each genotype represented by a tuple of 2 integers.
- `indMetadata`: DataFrame of metadata for individuals; must contain `ID`, `Fst_group` and `plot_order` columns.
- `plotGroups`: Vector of group names to include in plot.
- `plotGroupColors`: Vector of plotting colors corresponding to the groups.
- `regionNames`: Optional; Names of the genotyped regions.
- `indFontSize`: Optional; the font size of the individual ID labels.
- `figureSize`: Optional; the size of the figure; default is `(1200, 1200)`.  
- `plotTitle`: Optional; default will make a title. For no title, set to `""`.
- `indColorLeftProvided`: Optional; Default is `false`. Set to `true` if there is a column labeled `indColorLeft` in the metadata providing color of each individual for plotting on left side.
- `indColorRightProvided`: Optional; same as above but for right side (requires `indColorRight` column in metadata).

# Notes
Returns a tuple containing:
- the figure
- the plotted genotypes
- the sorted metadata matrix for the plotted individuals
"""
function plotHaploblockSummaryWithColors(integerToColorCodes::Dict{Int, String},
                                genosSummary::Matrix{Tuple{Int64, Int64}}, 
                                indMetadata,
                                plotGroups, plotGroupColors;
                                regionNames = nothing,        
                                indFontSize=10, figureSize=(1200, 1200),
                                plotTitle = nothing,
                                indColorLeftProvided = false,
                                indColorRightProvided = false)

    numRegions = size(genosSummary, 2)

    genosSummary_subset = genosSummary[indMetadata.Fst_group .∈ Ref(plotGroups), :]
    indMetadata_subset = indMetadata[indMetadata.Fst_group .∈ Ref(plotGroups), :]

    # Choose sorting order by plot_order column in input metadata file

    sorted_genosSummary_subset = genosSummary_subset[sortperm(indMetadata_subset.plot_order, rev=false), :]
    numInds = size(sorted_genosSummary_subset, 1)
    sorted_indMetadata_subset = indMetadata_subset[sortperm(indMetadata_subset.plot_order, rev=false), :]

    # Set up the plot window:
    f = CairoMakie.Figure(size=figureSize)

    if isnothing(plotTitle)
        plotTitle = "Summary of $numRegions haploblock genotypes for $numInds individuals"
    end 

    # Set up the main Axis: 
    ax = Axis(f[1, 1],
        title = plotTitle,
        titlesize=30,
        limits=(0.5 - 0.09 * (numRegions), 0.5 + 1.09 * (numRegions),
            0.5 - 0.3 * numInds, 0.5 + numInds)
    )
    hidedecorations!(ax) # hide background lattice and axis labels
    hidespines!(ax) # hide box around plot

    genotypeColors = ["#3f007d", "#807dba", "#dadaeb", "grey50"]  # purple shades from colorbrewer

    # plot evenly spaced by SNP order along chromosome:
    # make top part of fig (genotypes for individuals)
    labelCushion = numRegions / 100
    label_x_left = 0.5 - labelCushion
    label_x_right = 0.5 + numRegions + labelCushion
    colorBoxCushion = 0.07 * numRegions
    groupColorBox_x_left = 0.5 - colorBoxCushion
    groupColorBox_x_right = 0.5 + numRegions + colorBoxCushion
    boxWidth = 0.005 * numRegions * 2
    groupColorBox_x_left = [-boxWidth, -boxWidth, boxWidth, boxWidth, -boxWidth] .+ groupColorBox_x_left
    groupColorBox_x_right = [-boxWidth, -boxWidth, boxWidth, boxWidth, -boxWidth] .+ groupColorBox_x_right
    groupColorBox_y = [0.4, -0.4, -0.4, 0.4, 0.4]

    for i in 1:numInds
        y = numInds + 1 - i  # y is location for plotting; this reverses order of plot top-bottom
        labelText = last(split(sorted_indMetadata_subset.ID[i], "_"))  # this gets the last part of the sample ID (usually the main ID part)
        # put sample label on left side:
        CairoMakie.text!(label_x_left, y; text=labelText, align=(:right, :center), fontsize=indFontSize)
        # put sample label on right side:
        CairoMakie.text!(label_x_right, y; text=labelText, align=(:left, :center), fontsize=indFontSize)
        if indColorLeftProvided
            boxColorLeft = sorted_indMetadata_subset.indColorLeft[i]
        else
            boxColorLeft = plotGroupColors[findfirst(plotGroups .== sorted_indMetadata_subset.Fst_group[i])]
        end
        if indColorRightProvided
            boxColorRight = sorted_indMetadata_subset.indColorRight[i]
        else
            boxColorRight = plotGroupColors[findfirst(plotGroups .== sorted_indMetadata_subset.Fst_group[i])]
        end
        CairoMakie.poly!(Point2f.(groupColorBox_x_left, (y .+ groupColorBox_y)), color=boxColorLeft)
        CairoMakie.poly!(Point2f.(groupColorBox_x_right, (y .+ groupColorBox_y)), color=boxColorRight)
    end

    # generate my own plotting symbol (a rectangle)
    box_x = [-0.45, -0.45, 0.45, 0.45, -0.45]
    #box_x = [-0.5, -0.5, 0.5, 0.5, -0.5]
    box_y = [0.4, -0.4, -0.4, 0.4, 0.4]
    # generate triangles for plotting heterozygotes
    triangle1_x = [-0.45, -0.45, 0.45, -0.45]
    #triangle1_x = [-0.5, -0.5, 0.5, -0.5]
    triangle1_y = [0.4, -0.4, 0.4, 0.4]
    triangle2_x = [-0.45, 0.45, 0.45, -0.45]
    #triangle2_x = [-0.5, 0.5, 0.5, -0.5]
    triangle2_y = [-0.4, -0.4, 0.4, -0.4]
    # cycle through individuals, graphing each type of genotype:
    for i in 1:numInds
        y = numInds + 1 - i  # y is location for plotting; this reverses order of plot top-bottom
        #CairoMakie.lines!([0.5, numRegions + 0.5], [y, y], color="grey40") # for lines across the individual rows
        # cycle through regions for this individual
        for j in 1:numRegions
            genotype = sorted_genosSummary_subset[i, j]
            if genotype[1] == genotype[2]  # homozygous
                CairoMakie.poly!(Point2f.((j .+ box_x), (y .+ box_y)), color=integerToColorCodes[genotype[1]])
            else # heterozygous
                CairoMakie.poly!(Point2f.((j .+ triangle1_x), (y .+ triangle1_y)), color=integerToColorCodes[genotype[1]])
                CairoMakie.poly!(Point2f.((j .+ triangle2_x), (y .+ triangle2_y)), color=integerToColorCodes[genotype[2]])
            end
        end
    end

    if isnothing(regionNames)
        regionNames = string.(1:numRegions)
    end

    # make labels on lower part
    y_label = 0.5 - 0.025numInds
    for i in 1:numRegions
        CairoMakie.text!(i, y_label; text = regionNames[i], align=(:center, :center), fontsize=24)
    end

    display(f)

    return f, sorted_genosSummary_subset, sorted_indMetadata_subset
end


fig_5 = plotHaploblockSummaryWithColors(integerToColorCodes,
                            genosSummary, 
                            ind_with_metadata_included,
                            plotGroupsForSummary_all, groupColorsForSummary_all;
                            regionNames = haploblockRegions,
                            indFontSize = 7, figureSize = (1000, 2000),
                            plotTitle = nothing,
                            indColorLeftProvided = false,
                            indColorRightProvided = false)

if false  # set to true to save plot
    save("Figure5_from_Julia.png", fig_5[1], px_per_unit = 3.0)
end
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
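In the whole-ring code above, each genotype is a tuple of two haplotype integers, and the fill loop wraps the value in a one-element tuple before the `.=` assignment. The minimal sketch below, on a toy matrix with a hypothetical mask `isHet`, shows why: without the wrapper, broadcasting would try to spread the two integers of the tuple across the selected cells instead of assigning the whole tuple to each cell.

```julia
# Toy genotype matrix: 4 individuals x 2 regions, all missing to start
toyGenos = fill((-9, -9), 4, 2)
isHet = [true, false, true, false]   # hypothetical mask of individuals in one cluster

# Wrapping the tuple in a 1-tuple makes broadcasting treat (1, 5) as a single value
toyGenos[isHet, 1] .= ((1, 5),)

# Ref() is another standard way to protect a value from being broadcast over
toyGenos[.!isHet, 1] .= Ref((5, 5))
```

Note that `integerToColorCodes` has no entry for -9, so `plotHaploblockSummaryWithColors` appears to assume that every plotted individual received a genotype in every region; a cell left at `(-9, -9)` would raise a `KeyError` at the color lookup.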

Produce Fst plot across genome

Calculate allele freqs and sample sizes (use column Fst_group)

groups = ["vir","troch_LN","plumb","plumb_vir"]
freqs, sampleSizes = getFreqsAndSampleSizes(genosOnly_included, ind_with_metadata_included.Fst_group, groups)
println("Calculated population allele frequencies and sample sizes")
Calculated population allele frequencies and sample sizes

Calculate Fst for each SNP

Fst, FstNumerator, FstDenominator, pairwiseNamesFst = getFst(freqs, sampleSizes, groups; among=true)  # set among to false if no among-group Fst is wanted (some later steps won't work without it) 
println("Calculated Fst values")
Calculated Fst values

Make list of main scaffolds to include in Fst plot across genome:

scaffolds_for_Fst = "gw" .* string.(vcat(1, "1A", 2:4, "4A", 5:15, 17:28, "Z"))
30-element Vector{String}:
 "gw1"
 "gw1A"
 "gw2"
 "gw3"
 "gw4"
 "gw4A"
 "gw5"
 "gw6"
 "gw7"
 "gw8"
 "gw9"
 "gw10"
 "gw11"
 ⋮
 "gw18"
 "gw19"
 "gw20"
 "gw21"
 "gw22"
 "gw23"
 "gw24"
 "gw25"
 "gw26"
 "gw27"
 "gw28"
 "gwZ"

Calculate windowed Fst

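Windowed Fst is calculated following Weir & Cockerham (1984), with corrections for sample size and number of populations. For each window, the windowed numerator is divided by the windowed denominator (a ratio of totals rather than an average of per-SNP Fst values). Only whole windows are used, starting from the left end of each chromosome.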
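As a rough illustration of the windowing described here, the sketch below computes a ratio of summed numerators to summed denominators for a single population comparison. It is not the `getWindowedFst` function used in this analysis, whose source is not shown on this page; `windowedRatioSketch` is a hypothetical name, and the sketch assumes that the window size is a count of SNPs, that leftover SNPs at the right end are dropped, and that there are no missing (NaN) values.

```julia
using Statistics  # for mean()

# Hypothetical helper (not the getWindowedFst used in the analysis)
function windowedRatioSketch(numer::AbstractVector, denom::AbstractVector,
                             positions::AbstractVector, windowSize::Integer)
    numWindows = length(numer) ÷ windowSize   # whole windows only
    windowedPos = Vector{Float64}(undef, numWindows)
    windowedFst = Vector{Float64}(undef, numWindows)
    for w in 1:numWindows
        idx = ((w - 1) * windowSize + 1):(w * windowSize)   # SNPs in this window
        windowedPos[w] = mean(positions[idx])               # mean position of the window
        windowedFst[w] = sum(numer[idx]) / sum(denom[idx])  # ratio of totals
    end
    return windowedPos, windowedFst
end

# Toy data: 5 SNPs, windows of 2 SNPs, so the fifth SNP is dropped
numer = [0.1, 0.3, 0.2, 0.2, 0.5]
denom = [0.5, 0.5, 1.0, 1.0, 1.0]
positions = [100, 200, 300, 400, 500]
windowedPos, windowedFst = windowedRatioSketch(numer, denom, positions, 2)
```

Taking the ratio of totals means SNPs with larger denominators carry more weight in a window than they would in a simple mean of per-SNP Fst.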

windowSize = 500

# calculate windowed Fst across all scaffolds:

windowed_pos_all = DataFrame(chrom = String[], mean_position = Float64[])
windowed_Fst_all = Array{Float32, 2}(undef, size(FstNumerator, 1), 0)
for chrom in scaffolds_for_Fst
    regionText = string("chr", chrom)
    loci_selection = (pos_SNP_filtered.chrom .== chrom)
    pos_region = pos_SNP_filtered[loci_selection, :]
    FstNumerator_region = FstNumerator[:, loci_selection]
    FstDenominator_region = FstDenominator[:, loci_selection]
    windowedPos, windowedFst = getWindowedFst(FstNumerator_region, FstDenominator_region, pos_region, windowSize)
    windowed_pos_chrom = DataFrame(chrom = repeat([chrom], length(windowedPos)), mean_position = windowedPos)
    windowed_pos_all = vcat(windowed_pos_all, windowed_pos_chrom)
    windowed_Fst_all = hcat(windowed_Fst_all, windowedFst)
end

# The below is just a test plot, showing nothing useful really (as it overlaps all chromosomes onto one x axis):
#plot(windowed_pos_all.mean_position, windowed_Fst_all[1,:])

The above has produced windowed Fst values across the whole genome, for each population comparison. These are stored in windowed_Fst_all and the location info is stored in windowed_pos_all.

Now make a plot of windowed Fst across all scaffolds:

scaffolds_to_plot = scaffolds_for_Fst

groupsToPlotFst = ["vir_troch_LN", "troch_LN_plumb", "vir_plumb"]
groupColorsFst = ["green3", "orange", "purple"]
 
figHandle_GenomeFst3 = plotGenomeFst(scaffolds_to_plot, 
                                    windowed_Fst_all,
                                    pairwiseNamesFst,
                                    windowed_pos_all,
                                    groupsToPlotFst,
                                    groupColorsFst;
                                    lineTransparency = 0.8,
                                    fillTransparency = 0.2,
                                    figureSize=(1200, 1200));
[["gw1", "gw4A", "gw6"], ["gw1A", "gw4", "gw9"], ["gw2", "gw8"], ["gw3", "gw5"], ["gw7", "gw10", "gw11", "gw12", "gw13", "gw14"], ["gw15", "gw17", "gw18", "gw19", "gw20", "gw21", "gw22", "gw23", "gw24", "gw25"], ["gw26", "gw27", "gw28", "gwZ"]]
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Now do one with just the vir_plumb connection

groupsToPlotFst = ["vir_plumb"]
groupColorsFst = ["purple"]

figHandle_GenomeFst1 = plotGenomeFst(scaffolds_to_plot, 
                                    windowed_Fst_all,
                                    pairwiseNamesFst,
                                    windowed_pos_all,
                                    groupsToPlotFst,
                                    groupColorsFst;
                                    lineTransparency = 0.8,
                                    fillTransparency = 0.2,
                                    figureSize=(1200, 800))

if true  # set to true to save plot
    filename = string("FigureS33_GenomeFst_fromJulia.png")
    save(filename, figHandle_GenomeFst1, px_per_unit = 2.0)
    println("Saved ", filename)
end 
[["gw1", "gw4A", "gw6"], ["gw1A", "gw4", "gw9"], ["gw2", "gw8"], ["gw3", "gw5"], ["gw7", "gw10", "gw11", "gw12", "gw13", "gw14"], ["gw15", "gw17", "gw18", "gw19", "gw20", "gw21", "gw22", "gw23", "gw24", "gw25"], ["gw26", "gw27", "gw28", "gwZ"]]
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Saved FigureS33_GenomeFst_fromJulia.png

Plot ViSHet vs. Fst

groupsToCompareUsingFst = "vir_plumb"
FstRow = findfirst(pairwiseNamesFst .== groupsToCompareUsingFst)
windowedFstValues = windowed_Fst_all[FstRow, :]
#plot(windowedFstValues, windowed_ViSHet_all)

fillOpacity = 0.3
lineOpacity = 0.8

f = Figure()
ax = Axis(f[1, 1],
    xlabel = "windowed Fst", xlabelsize = 24,
    ylabel = "windowed ViSHet", ylabelsize = 24)
# hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
plot!(ax, windowedFstValues, windowed_ViSHet_all,
    marker = :circle, color = ("black", fillOpacity), markersize = 8, strokewidth=0.5, strokecolor = ("black", lineOpacity))

display(f)

if true  # set to true to save plot
    filename = string("FigureS34_windowedFstvViSHet_fromJulia.png")
    save(filename, f, px_per_unit = 2.0)
    println("Saved ", filename)
end 

# to see histograms of each distribution:
# hist(windowedFstValues)
# hist(windowed_ViSHet_all)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Saved FigureS34_windowedFstvViSHet_fromJulia.png
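The row lookup used for this plot relies on `findfirst`, which returns `nothing` when the comparison name is absent. The sketch below shows one defensive way to do the lookup and to keep only windows where both statistics are finite. All names here (`toyNames`, `toyFst`, `toyViSHet`, `fstRowSketch`) are hypothetical toy stand-ins, and the filtering step is an assumption about what might be wanted, not something done in the analysis above.

```julia
# Toy stand-ins (hypothetical) for pairwiseNamesFst, windowed_Fst_all, windowed_ViSHet_all
toyNames = ["vir_troch_LN", "vir_plumb"]
toyFst = Float32[0.1 NaN 0.3; 0.2 0.4 NaN]
toyViSHet = [1.0, 2.0, 3.0]

# Hypothetical helper: row index for a comparison, with a clear error if it is absent
function fstRowSketch(names, comparison)
    row = findfirst(names .== comparison)
    isnothing(row) && error("comparison $comparison not found")
    return row
end

row = fstRowSketch(toyNames, "vir_plumb")
fstValues = toyFst[row, :]
# keep only windows where both statistics are finite
keep = isfinite.(fstValues) .& isfinite.(toyViSHet)
fstToPlot = fstValues[keep]
vishetToPlot = toyViSHet[keep]
```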

Examine chromosome 4A Large HaploBlock Region (LHBR) with invariant sites included

Before running the code below, I converted the 012NA file back into a 012minus1 file (replacing each NA with -1) so that the genotypes can be read as integers, using a command like this:

cat /Users/darrenirwin/GW_data_from_cedar_Feb2024/GW2022_cedar/infoSites_vcfs/GW2022_all4plates.genotypes.allSites.chrgw4A.infoSites.max2allele_noindel.maxmiss60.MQ20.lowHet.tab.012NA | sed 's/NA/-1/g' > /Users/darrenirwin/GW_data_from_cedar_Feb2024/GW2022_cedar/infoSites_vcfs/GW2022_all4plates.genotypes.allSites.chrgw4A.infoSites.max2allele_noindel.maxmiss60.MQ20.lowHet.tab.012minus1
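The shell command above replaces every `NA` with `-1`. For readers without `sed`, a minimal Julia sketch of the same conversion follows. `convertNAtoMinus1` is a hypothetical helper, and the toy file written to a temporary directory only imitates the tab-delimited layout; it is not real data.

```julia
using DelimitedFiles  # for readdlm()

# Hypothetical helper: copy a file line by line, replacing every "NA" with "-1"
function convertNAtoMinus1(inPath::AbstractString, outPath::AbstractString)
    open(outPath, "w") do out
        for line in eachline(inPath)
            println(out, replace(line, "NA" => "-1"))
        end
    end
    return outPath
end

# Toy demonstration in a temporary directory
tmpDir = mktempdir()
inPath = joinpath(tmpDir, "toy.012NA")
write(inPath, "0\t0\tNA\t2\n1\t1\tNA\tNA\n")   # first column is the individual index
outPath = convertNAtoMinus1(inPath, joinpath(tmpDir, "toy.012minus1"))
toyGeno = readdlm(outPath, '\t', Int8, '\n')   # every field now parses as an integer
```

Like the `sed` command, this replaces `NA` wherever it occurs on a line, so it is only safe for files in which `NA` never appears inside another field.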
baseName = "/Users/darrenirwin/GW_data_from_cedar_Feb2024/GW2022_cedar/infoSites_vcfs/GW2022_all4plates.genotypes.allSites.chrgw4A.infoSites.max2allele_noindel.maxmiss60.MQ20.lowHet.tab"
# load metadata
cd(dataDirectory)
metadata_chr4A = DataFrame(CSV.File(metadataFile)) # CSV.File detects the delimiter automatically
num_metadata_cols_chr4A = ncol(metadata_chr4A)
num_individuals_chr4A = nrow(metadata_chr4A) 
# read in individual names for this dataset
individuals_file_name_chr4A = string(baseName, ".012.indv")
ind_chr4A = DataFrame(CSV.File(individuals_file_name_chr4A; header=["ind"], types=[String])) 
indNum_chr4A = size(ind_chr4A, 1) # number of individuals
if num_individuals_chr4A != indNum_chr4A
    println("WARNING: number of rows in metadata file different than number of individuals in .indv file")
end
# read in position data for this dataset
position_file_name_chr4A = string(baseName, ".012.pos")
pos_chr4A = DataFrame(CSV.File(position_file_name_chr4A; header=["chrom", "position"], types=[String, Int]))
# read in genotype data
genotype_file_name_chr4A = string(baseName, ".012minus1") 
@time if 1 <= indNum_chr4A <= 127   
    geno_chr4A = readdlm(genotype_file_name_chr4A, '\t', Int8, '\n'); # much faster now that "NA" has been converted to -1
elseif 128 <= indNum_chr4A <= 32767
    geno_chr4A = readdlm(genotype_file_name_chr4A, '\t', Int16, '\n'); # Int16 is needed for the first column, which holds the individual's index; Int16 is not much slower to import than Int8
else
    error("Number of individuals in .indv appears outside of range from 1 to 32767") # stop here, since geno_chr4A would otherwise be undefined below
end
loci_count_chr4A = size(geno_chr4A, 2) - 1   # because the first column is not a SNP (just a count from zero)
print(string("Read in genotypic data at ", loci_count_chr4A," loci for ", indNum_chr4A, " individuals. \n"))
 25.019594 seconds (340.65 M allocations: 9.788 GiB, 44.52% gc time, 2.54% compilation time)
Read in genotypic data at 364640 loci for 310 individuals. 
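
The layout assumed by the import step above (a first column holding the individual's index, followed by one column per locus, with -1 for missing genotypes) can be checked on a tiny example. This is a sketch under stated assumptions: the file contents are invented, and the helper `indexType` and the per-individual missing-genotype tally are illustrative additions, not part of the original analysis.

```julia
using DelimitedFiles  # provides readdlm

# Made-up 012minus1 content: 3 individuals, 4 loci; the first column is the
# individual's index (counting from zero) and -1 marks a missing genotype
demoFile = tempname()
write(demoFile, "0\t0\t1\t-1\t2\n1\t2\t-1\t0\t0\n2\t1\t1\t0\t-1\n")

# Hypothetical helper: choose the integer type using the same ranges as above
function indexType(n::Integer)
    if 1 <= n <= 127
        return Int8
    elseif 128 <= n <= 32767
        return Int16
    else
        error("Number of individuals outside of range from 1 to 32767")
    end
end

numInd = 3
geno = readdlm(demoFile, '\t', indexType(numInd), '\n')
genosOnly = geno[:, 2:end]            # drop the index column
lociCount = size(genosOnly, 2)
# count the missing genotypes (-1) in each row, i.e. for each individual
missingPerInd = vec(sum(genosOnly .== -1, dims = 2))
println("Read ", lociCount, " loci for ", size(geno, 1), " individuals")
println("Missing genotypes per individual: ", missingPerInd)
```

Storing genotypes as `Int8` where possible keeps the in-memory matrix small, since each genotype then takes a single byte.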

Check that individuals are same in genotype data and metadata

ind_with_metadata_chr4A = hcat(ind_chr4A, metadata_chr4A)
println(ind_with_metadata_chr4A)
println()  # prints a line break 
if isequal(ind_with_metadata_chr4A.ind, ind_with_metadata_chr4A.ID)
    println("GOOD NEWS: names of individuals in metadata file and genotype ind file match perfectly.")
else
    println("WARNING: names of individuals in metadata file and genotype ind file do not completely match.")
end
310×6 DataFrame
 Row │ ind                             ID                              location  group           Fst_group       plot_order 
     │ String                          String31                        String7   String15        String15        Float64    
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ GW_Armando_plate1_AB1           GW_Armando_plate1_AB1           AB        vir             vir                 20.01
   2 │ GW_Armando_plate1_JF07G02       GW_Armando_plate1_JF07G02       ST        plumb           plumb              108.0
   3 │ GW_Armando_plate1_JF07G03       GW_Armando_plate1_JF07G03       ST        plumb           plumb              109.0
   4 │ GW_Armando_plate1_JF07G04       GW_Armando_plate1_JF07G04       ST        plumb           plumb              110.0
   5 │ GW_Armando_plate1_JF08G02       GW_Armando_plate1_JF08G02       ST        plumb           plumb              111.0
   6 │ GW_Armando_plate1_JF09G01       GW_Armando_plate1_JF09G01       ST        plumb           plumb              112.0
   7 │ GW_Armando_plate1_JF09G02       GW_Armando_plate1_JF09G02       ST        plumb           plumb              113.0
   8 │ GW_Armando_plate1_JF10G03       GW_Armando_plate1_JF10G03       ST        plumb_vir       plumb_vir          170.0
   9 │ GW_Armando_plate1_JF11G01       GW_Armando_plate1_JF11G01       ST        plumb           plumb              114.0
  10 │ GW_Armando_plate1_JF12G01       GW_Armando_plate1_JF12G01       ST        plumb           plumb              115.0
  11 │ GW_Armando_plate1_JF12G02       GW_Armando_plate1_JF12G02       ST        plumb           plumb              116.0
  12 │ GW_Armando_plate1_JF12G04       GW_Armando_plate1_JF12G04       ST_vi     vir             vir                 24.001
  13 │ GW_Armando_plate1_JF13G01       GW_Armando_plate1_JF13G01       ST        plumb           plumb              117.0
  14 │ GW_Armando_plate1_JF15G03       GW_Armando_plate1_JF15G03       DV        plumb           plumb              103.0
  15 │ GW_Armando_plate1_JF16G01       GW_Armando_plate1_JF16G01       DV_vi     plumb_vir       vir                 24.041
  16 │ GW_Armando_plate1_JF20G01       GW_Armando_plate1_JF20G01       MB        plumb           plumb               94.0
  17 │ GW_Armando_plate1_JF22G01       GW_Armando_plate1_JF22G01       MB        plumb           plumb               95.0
  18 │ GW_Armando_plate1_JF23G01       GW_Armando_plate1_JF23G01       VB        plumb           plumb               98.0
  19 │ GW_Armando_plate1_JF23G02       GW_Armando_plate1_JF23G02       VB        plumb           plumb               99.0
  20 │ GW_Armando_plate1_JF24G02       GW_Armando_plate1_JF24G02       VB        plumb           plumb              100.0
  21 │ GW_Armando_plate1_JF26G01       GW_Armando_plate1_JF26G01       ST        plumb           plumb              118.0
  22 │ GW_Armando_plate1_JF27G01       GW_Armando_plate1_JF27G01       ST        plumb           plumb              119.0
  23 │ GW_Armando_plate1_JF29G01       GW_Armando_plate1_JF29G01       ST        plumb           plumb              120.0
  24 │ GW_Armando_plate1_JF29G02       GW_Armando_plate1_JF29G02       ST        plumb           plumb              121.0
  25 │ GW_Armando_plate1_JF29G03       GW_Armando_plate1_JF29G03       ST        plumb           plumb              122.0
  26 │ GW_Armando_plate1_JG02G02       GW_Armando_plate1_JG02G02       PR        plumb           plumb              145.0
  27 │ GW_Armando_plate1_JG02G04       GW_Armando_plate1_JG02G04       PR        plumb           plumb              146.0
  28 │ GW_Armando_plate1_JG08G01       GW_Armando_plate1_JG08G01       ST        plumb           plumb              123.0
  29 │ GW_Armando_plate1_JG08G02       GW_Armando_plate1_JG08G02       ST        plumb           plumb              124.0
  30 │ GW_Armando_plate1_JG10G01       GW_Armando_plate1_JG10G01       ST        plumb           plumb              125.0
  31 │ GW_Armando_plate1_JG12G01       GW_Armando_plate1_JG12G01       ST        plumb           plumb              126.0
  32 │ GW_Armando_plate1_JG17G01       GW_Armando_plate1_JG17G01       ST        plumb_vir       plumb              127.0
  33 │ GW_Armando_plate1_NO_BC_TTGW05  GW_Armando_plate1_NO_BC_TTGW05  blank     blank           blank              -99.0
  34 │ GW_Armando_plate1_NO_DNA        GW_Armando_plate1_NO_DNA        blank     blank           blank              -99.0
  35 │ GW_Armando_plate1_RF20G01       GW_Armando_plate1_RF20G01       BJ        obs_plumb       plumb_BJ            77.501
  36 │ GW_Armando_plate1_RF29G02       GW_Armando_plate1_RF29G02       BJ        obs_plumb       plumb_BJ            77.502
  37 │ GW_Armando_plate1_TL3           GW_Armando_plate1_TL3           TL        vir             vir                 11.01
  38 │ GW_Armando_plate1_TTGW01        GW_Armando_plate1_TTGW01        MN        troch_MN        troch_west          53.0
  39 │ GW_Armando_plate1_TTGW05_rep1   GW_Armando_plate1_TTGW05_rep1   MN_rep    troch_MN_rep    troch_west_rep      53.0
  40 │ GW_Armando_plate1_TTGW05_rep2   GW_Armando_plate1_TTGW05_rep2   MN        troch_MN        troch_west          53.0
  41 │ GW_Armando_plate1_TTGW06        GW_Armando_plate1_TTGW06        SU        lud_Sukhto      lud_central         47.0
  42 │ GW_Armando_plate1_TTGW07        GW_Armando_plate1_TTGW07        SU        lud_Sukhto      lud_central         47.0
  43 │ GW_Armando_plate1_TTGW10        GW_Armando_plate1_TTGW10        SU        lud_Sukhto      lud_central         47.0
  44 │ GW_Armando_plate1_TTGW11        GW_Armando_plate1_TTGW11        SU        lud_Sukhto      lud_central         47.0
  45 │ GW_Armando_plate1_TTGW13        GW_Armando_plate1_TTGW13        TH        lud_Thallighar  lud_central         43.0
  46 │ GW_Armando_plate1_TTGW17        GW_Armando_plate1_TTGW17        TH        lud_Thallighar  lud_central         43.0
  47 │ GW_Armando_plate1_TTGW19        GW_Armando_plate1_TTGW19        TH        lud_Thallighar  lud_central         43.0
  48 │ GW_Armando_plate1_TTGW21        GW_Armando_plate1_TTGW21        SR        lud_Sural       lud_central         45.0
  49 │ GW_Armando_plate1_TTGW22        GW_Armando_plate1_TTGW22        SR        lud_Sural       lud_central         45.0
  50 │ GW_Armando_plate1_TTGW23        GW_Armando_plate1_TTGW23        SR        lud_Sural       lud_central         45.0
  51 │ GW_Armando_plate1_TTGW29        GW_Armando_plate1_TTGW29        SR        lud_Sural       lud_central         45.0
  52 │ GW_Armando_plate1_TTGW52        GW_Armando_plate1_TTGW52        NG        lud_Nainaghar   lud_central         49.0
  53 │ GW_Armando_plate1_TTGW53        GW_Armando_plate1_TTGW53        NG        lud_Nainaghar   lud_central         49.0
  54 │ GW_Armando_plate1_TTGW55        GW_Armando_plate1_TTGW55        NG        lud_Nainaghar   lud_central         49.0
  55 │ GW_Armando_plate1_TTGW57        GW_Armando_plate1_TTGW57        NG        lud_Nainaghar   lud_central         49.0
  56 │ GW_Armando_plate1_TTGW58        GW_Armando_plate1_TTGW58        NG        lud_Nainaghar   lud_central         49.0
  57 │ GW_Armando_plate1_TTGW59        GW_Armando_plate1_TTGW59        NG        lud_Nainaghar   lud_central         49.0
  58 │ GW_Armando_plate1_TTGW63        GW_Armando_plate1_TTGW63        SP        lud_Spiti       troch_west          55.0
  59 │ GW_Armando_plate1_TTGW64        GW_Armando_plate1_TTGW64        SP        lud_Spiti       troch_west          55.0
  60 │ GW_Armando_plate1_TTGW65        GW_Armando_plate1_TTGW65        SP        lud_Spiti       troch_west          55.0
  61 │ GW_Armando_plate1_TTGW66        GW_Armando_plate1_TTGW66        SP        lud_Spiti       troch_west          55.0
  62 │ GW_Armando_plate1_TTGW68        GW_Armando_plate1_TTGW68        SP        lud_Spiti       troch_west          55.0
  63 │ GW_Armando_plate1_TTGW70        GW_Armando_plate1_TTGW70        SA        lud_Sathrundi   lud_Sath            41.0
  64 │ GW_Armando_plate1_TTGW71        GW_Armando_plate1_TTGW71        SA        lud_Sathrundi   lud_Sath            41.0
  65 │ GW_Armando_plate1_TTGW72        GW_Armando_plate1_TTGW72        SA        lud_Sathrundi   lud_Sath            41.0
  66 │ GW_Armando_plate1_TTGW74        GW_Armando_plate1_TTGW74        SA        lud_Sathrundi   lud_Sath            41.0
  67 │ GW_Armando_plate1_TTGW78        GW_Armando_plate1_TTGW78        SA        lud_Sathrundi   lud_Sath            41.0
  68 │ GW_Armando_plate1_TTGW_15_05    GW_Armando_plate1_TTGW_15_05    SR        lud_Sural       lud_central         45.0
  69 │ GW_Armando_plate1_TTGW_15_07    GW_Armando_plate1_TTGW_15_07    SR        lud_Sural       lud_central         45.0
  70 │ GW_Armando_plate1_TTGW_15_08    GW_Armando_plate1_TTGW_15_08    SR        lud_Sural       lud_central         45.0
  71 │ GW_Armando_plate1_TTGW_15_09    GW_Armando_plate1_TTGW_15_09    SR        lud_Sural       lud_central         45.0
  72 │ GW_Armando_plate1_UY1           GW_Armando_plate1_UY1           UY        plumb           plumb               87.0
  73 │ GW_Armando_plate2_IL2           GW_Armando_plate2_IL2           IL_rep    plumb_rep       plumb_rep           84.0
  74 │ GW_Armando_plate2_JE31G01       GW_Armando_plate2_JE31G01       VB_vi     vir_misID       vir                 24.002
  75 │ GW_Armando_plate2_JF03G01       GW_Armando_plate2_JF03G01       ST_vi     vir_misID       vir                 24.003
  76 │ GW_Armando_plate2_JF03G02       GW_Armando_plate2_JF03G02       VB_vi     vir_misID       vir                 24.004
  77 │ GW_Armando_plate2_JF07G01       GW_Armando_plate2_JF07G01       ST        plumb           plumb              128.0
  78 │ GW_Armando_plate2_JF08G04       GW_Armando_plate2_JF08G04       ST        plumb           plumb              129.0
  79 │ GW_Armando_plate2_JF10G02       GW_Armando_plate2_JF10G02       ST        plumb           plumb              130.0
  80 │ GW_Armando_plate2_JF11G02       GW_Armando_plate2_JF11G02       ST        plumb           plumb              131.0
  81 │ GW_Armando_plate2_JF12G03       GW_Armando_plate2_JF12G03       ST        plumb           plumb              132.0
  82 │ GW_Armando_plate2_JF12G05       GW_Armando_plate2_JF12G05       ST        plumb           plumb              133.0
  83 │ GW_Armando_plate2_JF13G02       GW_Armando_plate2_JF13G02       ST        plumb           plumb              134.0
  84 │ GW_Armando_plate2_JF14G01       GW_Armando_plate2_JF14G01       DV        plumb           plumb              104.0
  85 │ GW_Armando_plate2_JF14G02       GW_Armando_plate2_JF14G02       DV        plumb           plumb              105.0
  86 │ GW_Armando_plate2_JF15G01       GW_Armando_plate2_JF15G01       DV        plumb           plumb              106.0
  87 │ GW_Armando_plate2_JF15G02       GW_Armando_plate2_JF15G02       DV        plumb           plumb              107.0
  88 │ GW_Armando_plate2_JF16G02       GW_Armando_plate2_JF16G02       DV_vi     plumb_vir       vir                 24.042
  89 │ GW_Armando_plate2_JF19G01       GW_Armando_plate2_JF19G01       MB        plumb           plumb               96.0
  90 │ GW_Armando_plate2_JF20G02       GW_Armando_plate2_JF20G02       MB        plumb           plumb               97.0
  91 │ GW_Armando_plate2_JF24G01       GW_Armando_plate2_JF24G01       VB        plumb           plumb              101.0
  92 │ GW_Armando_plate2_JF24G03       GW_Armando_plate2_JF24G03       ST        plumb           plumb              135.0
  93 │ GW_Armando_plate2_JF25G01       GW_Armando_plate2_JF25G01       VB        plumb           plumb              102.0
  94 │ GW_Armando_plate2_JF26G02       GW_Armando_plate2_JF26G02       ST        plumb           plumb              136.0
  95 │ GW_Armando_plate2_JF27G02       GW_Armando_plate2_JF27G02       ST        plumb           plumb              137.0
  96 │ GW_Armando_plate2_JF30G01       GW_Armando_plate2_JF30G01       ST_vi     vir_misID       vir                 24.005
  97 │ GW_Armando_plate2_JG01G01       GW_Armando_plate2_JG01G01       PR        plumb           plumb              147.0
  98 │ GW_Armando_plate2_JG02G01       GW_Armando_plate2_JG02G01       PR        plumb           plumb              148.0
  99 │ GW_Armando_plate2_JG02G03       GW_Armando_plate2_JG02G03       PR        plumb           plumb              149.0
 100 │ GW_Armando_plate2_JG10G02       GW_Armando_plate2_JG10G02       ST        plumb           plumb              138.0
 101 │ GW_Armando_plate2_JG10G03       GW_Armando_plate2_JG10G03       ST        plumb           plumb              139.0
 102 │ GW_Armando_plate2_JG12G02       GW_Armando_plate2_JG12G02       ST        plumb           plumb              140.0
 103 │ GW_Armando_plate2_JG12G03       GW_Armando_plate2_JG12G03       ST        plumb           plumb              141.0
 104 │ GW_Armando_plate2_LN11          GW_Armando_plate2_LN11          LN_rep    troch_LN_rep    troch_LN_rep        65.01
 105 │ GW_Armando_plate2_LN2           GW_Armando_plate2_LN2           LN        troch_LN        troch_LN            58.01
 106 │ GW_Armando_plate2_NO_BC_TTGW05  GW_Armando_plate2_NO_BC_TTGW05  blank     blank           blank              -99.0
 107 │ GW_Armando_plate2_NO_DNA        GW_Armando_plate2_NO_DNA        blank     blank           blank              -99.0
 108 │ GW_Armando_plate2_RF29G01       GW_Armando_plate2_RF29G01       BJ        obs_plumb       plumb_BJ            77.503
 109 │ GW_Armando_plate2_TTGW02        GW_Armando_plate2_TTGW02        MN        troch_MN        troch_west          53.0
 110 │ GW_Armando_plate2_TTGW03        GW_Armando_plate2_TTGW03        MN        troch_MN        troch_west          53.0
 111 │ GW_Armando_plate2_TTGW05_rep3   GW_Armando_plate2_TTGW05_rep3   MN_rep    troch_MN_rep    troch_west_rep      53.0
 112 │ GW_Armando_plate2_TTGW05_rep4   GW_Armando_plate2_TTGW05_rep4   MN_rep    troch_MN_rep    troch_west_rep      53.0
 113 │ GW_Armando_plate2_TTGW08        GW_Armando_plate2_TTGW08        SU        lud_Sukhto      lud_central         47.0
 114 │ GW_Armando_plate2_TTGW09        GW_Armando_plate2_TTGW09        SU        lud_Sukhto      lud_central         47.0
 115 │ GW_Armando_plate2_TTGW12        GW_Armando_plate2_TTGW12        TH        lud_Thallighar  lud_central         43.0
 116 │ GW_Armando_plate2_TTGW14        GW_Armando_plate2_TTGW14        TH        lud_Thallighar  lud_central         43.0
 117 │ GW_Armando_plate2_TTGW15        GW_Armando_plate2_TTGW15        TH        lud_Thallighar  lud_central         43.0
 118 │ GW_Armando_plate2_TTGW16        GW_Armando_plate2_TTGW16        TH        lud_Thallighar  lud_central         43.0
 119 │ GW_Armando_plate2_TTGW18        GW_Armando_plate2_TTGW18        TH        lud_Thallighar  lud_central         43.0
 120 │ GW_Armando_plate2_TTGW20        GW_Armando_plate2_TTGW20        SR        lud_Sural       lud_central         45.0
 121 │ GW_Armando_plate2_TTGW24        GW_Armando_plate2_TTGW24        SR        lud_Sural       lud_central         45.0
 122 │ GW_Armando_plate2_TTGW25        GW_Armando_plate2_TTGW25        SR        lud_Sural       lud_central         45.0
 123 │ GW_Armando_plate2_TTGW27        GW_Armando_plate2_TTGW27        SR        lud_Sural       lud_central         45.0
 124 │ GW_Armando_plate2_TTGW28        GW_Armando_plate2_TTGW28        SR        lud_Sural       lud_central         45.0
 125 │ GW_Armando_plate2_TTGW50        GW_Armando_plate2_TTGW50        NG        lud_Nainaghar   lud_central         49.0
 126 │ GW_Armando_plate2_TTGW51        GW_Armando_plate2_TTGW51        NG        lud_Nainaghar   lud_central         49.0
 127 │ GW_Armando_plate2_TTGW54        GW_Armando_plate2_TTGW54        NG        lud_Nainaghar   lud_central         49.0
 128 │ GW_Armando_plate2_TTGW56        GW_Armando_plate2_TTGW56        NG        lud_Nainaghar   lud_central         49.0
 129 │ GW_Armando_plate2_TTGW60        GW_Armando_plate2_TTGW60        SP        lud_Spiti       troch_west          55.0
 130 │ GW_Armando_plate2_TTGW61        GW_Armando_plate2_TTGW61        SP        lud_Spiti       troch_west          55.0
 131 │ GW_Armando_plate2_TTGW62        GW_Armando_plate2_TTGW62        SP        lud_Spiti       troch_west          55.0
 132 │ GW_Armando_plate2_TTGW67        GW_Armando_plate2_TTGW67        SP        lud_Spiti       troch_west          55.0
 133 │ GW_Armando_plate2_TTGW69        GW_Armando_plate2_TTGW69        SP        lud_Spiti       troch_west          55.0
 134 │ GW_Armando_plate2_TTGW73        GW_Armando_plate2_TTGW73        SA        lud_Sathrundi   lud_Sath            41.0
 135 │ GW_Armando_plate2_TTGW75        GW_Armando_plate2_TTGW75        SA        lud_Sathrundi   lud_Sath            41.0
 136 │ GW_Armando_plate2_TTGW77        GW_Armando_plate2_TTGW77        SA        lud_Sathrundi   lud_Sath            41.0
 137 │ GW_Armando_plate2_TTGW79        GW_Armando_plate2_TTGW79        SA        lud_Sathrundi   lud_Sath            41.0
 138 │ GW_Armando_plate2_TTGW80        GW_Armando_plate2_TTGW80        SA        lud_Sathrundi   lud_Sath            41.0
 139 │ GW_Armando_plate2_TTGW_15_01    GW_Armando_plate2_TTGW_15_01    SR        lud_Sural       lud_central         45.0
 140 │ GW_Armando_plate2_TTGW_15_02    GW_Armando_plate2_TTGW_15_02    SR        lud_Sural       lud_central         45.0
 141 │ GW_Armando_plate2_TTGW_15_03    GW_Armando_plate2_TTGW_15_03    SR        lud_Sural       lud_central         45.0
 142 │ GW_Armando_plate2_TTGW_15_04    GW_Armando_plate2_TTGW_15_04    SR        lud_Sural       lud_central         45.0
 143 │ GW_Armando_plate2_TTGW_15_06    GW_Armando_plate2_TTGW_15_06    SR        lud_Sural       lud_central         45.0
 144 │ GW_Armando_plate2_TTGW_15_10    GW_Armando_plate2_TTGW_15_10    SR        lud_Sural       lud_central         45.0
 145 │ GW_Lane5_AA1                    GW_Lane5_AA1                    AA        vir_S           vir_S               25.0
 146 │ GW_Lane5_AA10                   GW_Lane5_AA10                   AA        vir_S           vir_S               33.0
 147 │ GW_Lane5_AA11                   GW_Lane5_AA11                   AA        vir_S           vir_S               34.0
 148 │ GW_Lane5_AA3                    GW_Lane5_AA3                    AA        vir_S           vir_S               26.0
 149 │ GW_Lane5_AA4                    GW_Lane5_AA4                    AA        vir_S           vir_S               27.0
 150 │ GW_Lane5_AA5                    GW_Lane5_AA5                    AA        vir_S           vir_S               28.0
 151 │ GW_Lane5_AA6                    GW_Lane5_AA6                    AA        vir_S           vir_S               29.0
 152 │ GW_Lane5_AA7                    GW_Lane5_AA7                    AA        vir_S           vir_S               30.0
 153 │ GW_Lane5_AA8                    GW_Lane5_AA8                    AA        vir_S           vir_S               31.0
 154 │ GW_Lane5_AA9                    GW_Lane5_AA9                    AA        vir_S           vir_S               32.0
 155 │ GW_Lane5_AB1                    GW_Lane5_AB1                    AB_rep    vir_rep         vir_rep             20.0
 156 │ GW_Lane5_AB2                    GW_Lane5_AB2                    AB        vir             vir                 21.0
 157 │ GW_Lane5_AN1                    GW_Lane5_AN1                    AN        plumb           plumb               80.0
 158 │ GW_Lane5_AN2                    GW_Lane5_AN2                    AN        plumb           plumb               81.0
 159 │ GW_Lane5_BK2                    GW_Lane5_BK2                    BK        plumb           plumb               78.0
 160 │ GW_Lane5_BK3                    GW_Lane5_BK3                    BK        plumb           plumb               79.0
 161 │ GW_Lane5_DA2                    GW_Lane5_DA2                    XN        obs             obs                 73.0
 162 │ GW_Lane5_DA3                    GW_Lane5_DA3                    XN        obs             obs                 74.0
 163 │ GW_Lane5_DA4                    GW_Lane5_DA4                    XN        obs             obs                 75.0
 164 │ GW_Lane5_DA6                    GW_Lane5_DA6                    XN        obs             low_reads           76.0
 165 │ GW_Lane5_DA7                    GW_Lane5_DA7                    XN        obs             obs                 77.0
 166 │ GW_Lane5_EM1                    GW_Lane5_EM1                    EM        troch_EM        troch_EM            72.0
 167 │ GW_Lane5_IL1                    GW_Lane5_IL1                    IL        plumb           plumb               82.0
 168 │ GW_Lane5_IL2                    GW_Lane5_IL2                    IL_rep    plumb_rep       plumb_rep           85.0
 169 │ GW_Lane5_IL4                    GW_Lane5_IL4                    IL        plumb           plumb               83.0
 170 │ GW_Lane5_KS1                    GW_Lane5_KS1                    OV        lud_KS          lud_KS              40.0
 171 │ GW_Lane5_KS2                    GW_Lane5_KS2                    OV        lud_KS          lud_KS              40.0
 172 │ GW_Lane5_LN1                    GW_Lane5_LN1                    LN        troch_LN        troch_LN            57.0
 173 │ GW_Lane5_LN10                   GW_Lane5_LN10                   LN        troch_LN        troch_LN            64.0
 174 │ GW_Lane5_LN11                   GW_Lane5_LN11                   LN        troch_LN        troch_LN            65.0
 175 │ GW_Lane5_LN12                   GW_Lane5_LN12                   LN        troch_LN        troch_LN            66.0
 176 │ GW_Lane5_LN14                   GW_Lane5_LN14                   LN        troch_LN        troch_LN            67.0
 177 │ GW_Lane5_LN16                   GW_Lane5_LN16                   LN        troch_LN        troch_LN            68.0
 178 │ GW_Lane5_LN18                   GW_Lane5_LN18                   LN        troch_LN        troch_LN            69.0
 179 │ GW_Lane5_LN19                   GW_Lane5_LN19                   LN        troch_LN        troch_LN            70.0
 180 │ GW_Lane5_LN2                    GW_Lane5_LN2                    LN_rep    troch_LN_rep    troch_LN_rep        58.0
 181 │ GW_Lane5_LN20                   GW_Lane5_LN20                   LN        troch_LN        troch_LN            71.0
 182 │ GW_Lane5_LN3                    GW_Lane5_LN3                    LN        troch_LN        troch_LN            59.0
 183 │ GW_Lane5_LN4                    GW_Lane5_LN4                    LN        troch_LN        troch_LN            60.0
 184 │ GW_Lane5_LN6                    GW_Lane5_LN6                    LN        troch_LN        troch_LN            61.0
 185 │ GW_Lane5_LN7                    GW_Lane5_LN7                    LN        troch_LN        troch_LN            62.0
 186 │ GW_Lane5_LN8                    GW_Lane5_LN8                    LN        troch_LN        troch_LN            63.0
 187 │ GW_Lane5_MN1                    GW_Lane5_MN1                    MN        troch_MN        troch_west          51.0
 188 │ GW_Lane5_MN12                   GW_Lane5_MN12                   MN        troch_MN        troch_west          56.0
 189 │ GW_Lane5_MN3                    GW_Lane5_MN3                    MN        troch_MN        troch_west          52.0
 190 │ GW_Lane5_MN5                    GW_Lane5_MN5                    MN        troch_MN        troch_west          53.0
 191 │ GW_Lane5_MN8                    GW_Lane5_MN8                    MN        troch_MN        troch_west          54.0
 192 │ GW_Lane5_MN9                    GW_Lane5_MN9                    MN        troch_MN        troch_west          55.0
 193 │ GW_Lane5_NA1                    GW_Lane5_NA1                    NR        lud_PK          lud_PK              39.2
 194 │ GW_Lane5_NA3-3ul                GW_Lane5_NA3-3ul                NR        lud_PK          lud_PK              39.2
 195 │ GW_Lane5_PT11                   GW_Lane5_PT11                   KL        lud_KL          lud_central         42.0
 196 │ GW_Lane5_PT12                   GW_Lane5_PT12                   KL        lud_KL          lud_central         42.0
 197 │ GW_Lane5_PT2                    GW_Lane5_PT2                    ML        lud_ML          lud_ML              51.0
 198 │ GW_Lane5_PT3                    GW_Lane5_PT3                    PA        lud_PA          lud_central         46.0
 199 │ GW_Lane5_PT4                    GW_Lane5_PT4                    PA        lud_PA          lud_central         46.0
 200 │ GW_Lane5_PT6                    GW_Lane5_PT6                    KL        lud_KL          lud_central         42.0
 201 │ GW_Lane5_SH1                    GW_Lane5_SH1                    SH        lud_PK          lud_PK              39.1
 202 │ GW_Lane5_SH2                    GW_Lane5_SH2                    SH        lud_PK          lud_PK              39.1
 203 │ GW_Lane5_SH4                    GW_Lane5_SH4                    SH        lud_PK          lud_PK              39.1
 204 │ GW_Lane5_SH5                    GW_Lane5_SH5                    SH        lud_PK          lud_PK              39.1
 205 │ GW_Lane5_SL1                    GW_Lane5_SL1                    SL        plumb           plumb              150.0
 206 │ GW_Lane5_SL2                    GW_Lane5_SL2                    SL        plumb           plumb              151.0
 207 │ GW_Lane5_ST1                    GW_Lane5_ST1                    ST        plumb           plumb              142.0
 208 │ GW_Lane5_ST12                   GW_Lane5_ST12                   ST        plumb           plumb              144.0
 209 │ GW_Lane5_ST3                    GW_Lane5_ST3                    ST        plumb           plumb              143.0
 210 │ GW_Lane5_STvi1                  GW_Lane5_STvi1                  ST_vi     vir             vir                 22.0
 211 │ GW_Lane5_STvi2                  GW_Lane5_STvi2                  ST_vi     vir             vir                 23.0
 212 │ GW_Lane5_STvi3                  GW_Lane5_STvi3                  ST_vi     vir             vir                 24.0
 213 │ GW_Lane5_TA1                    GW_Lane5_TA1                    TA        plumb           plumb               86.0
 214 │ GW_Lane5_TL1                    GW_Lane5_TL1                    TL        vir             vir                  9.0
 215 │ GW_Lane5_TL10                   GW_Lane5_TL10                   TL        vir             vir                 17.0
 216 │ GW_Lane5_TL11                   GW_Lane5_TL11                   TL        vir             vir                 18.0
 217 │ GW_Lane5_TL12                   GW_Lane5_TL12                   TL        vir             vir                 19.0
 218 │ GW_Lane5_TL2                    GW_Lane5_TL2                    TL        vir             vir                 10.0
 219 │ GW_Lane5_TL3                    GW_Lane5_TL3                    TL_rep    vir_rep         vir_rep             11.0
 220 │ GW_Lane5_TL4                    GW_Lane5_TL4                    TL        vir             vir                 12.0
 221 │ GW_Lane5_TL5                    GW_Lane5_TL5                    TL        vir             vir                 13.0
 222 │ GW_Lane5_TL7                    GW_Lane5_TL7                    TL        vir             vir                 14.0
 223 │ GW_Lane5_TL8                    GW_Lane5_TL8                    TL        vir             vir                 15.0
 224 │ GW_Lane5_TL9                    GW_Lane5_TL9                    TL        vir             vir                 16.0
 225 │ GW_Lane5_TU1                    GW_Lane5_TU1                    TU        nit             nit                 35.0
 226 │ GW_Lane5_TU2                    GW_Lane5_TU2                    TU        nit             nit                 36.0
 227 │ GW_Lane5_UY1                    GW_Lane5_UY1                    UY_rep    plumb_rep       plumb_rep           93.0
 228 │ GW_Lane5_UY2                    GW_Lane5_UY2                    UY        plumb           plumb               88.0
 229 │ GW_Lane5_UY3                    GW_Lane5_UY3                    UY        plumb           plumb               89.0
 230 │ GW_Lane5_UY4                    GW_Lane5_UY4                    UY        plumb           plumb               90.0
 231 │ GW_Lane5_UY5                    GW_Lane5_UY5                    UY        plumb           plumb               91.0
 232 │ GW_Lane5_UY6                    GW_Lane5_UY6                    UY        plumb           plumb               92.0
 233 │ GW_Lane5_YK1                    GW_Lane5_YK1                    YK        vir             vir                  1.0
 234 │ GW_Lane5_YK11                   GW_Lane5_YK11                   YK        vir             vir                  8.0
 235 │ GW_Lane5_YK3                    GW_Lane5_YK3                    YK        vir             vir                  2.0
 236 │ GW_Lane5_YK4                    GW_Lane5_YK4                    YK        vir             vir                  3.0
 237 │ GW_Lane5_YK5                    GW_Lane5_YK5                    YK        vir             vir                  4.0
 238 │ GW_Lane5_YK6                    GW_Lane5_YK6                    YK        vir             vir                  5.0
 239 │ GW_Lane5_YK7                    GW_Lane5_YK7                    YK        vir             vir                  6.0
 240 │ GW_Lane5_YK9                    GW_Lane5_YK9                    YK        vir             vir                  7.0
 241 │ GW_Liz_GBS_Liz10045             GW_Liz_GBS_Liz10045             ML        lud             lud_ML              51.01
 242 │ GW_Liz_GBS_Liz10094             GW_Liz_GBS_Liz10094             ML        lud             lud_ML              51.02
 243 │ GW_Liz_GBS_Liz5101              GW_Liz_GBS_Liz5101              ML        lud             lud_ML              51.03
 244 │ GW_Liz_GBS_Liz5101_R            GW_Liz_GBS_Liz5101_R            ML_rep    lud_rep         lud_ML_rep          51.04
 245 │ GW_Liz_GBS_Liz5118              GW_Liz_GBS_Liz5118              ML        lud             lud_ML              51.05
 246 │ GW_Liz_GBS_Liz5139              GW_Liz_GBS_Liz5139              ML        lud             lud_ML              51.06
 247 │ GW_Liz_GBS_Liz5142              GW_Liz_GBS_Liz5142              ML        lud             lud_ML              51.07
 248 │ GW_Liz_GBS_Liz5144              GW_Liz_GBS_Liz5144              ML        lud             lud_ML              51.08
 249 │ GW_Liz_GBS_Liz5150              GW_Liz_GBS_Liz5150              ML        lud             lud_ML              51.09
 250 │ GW_Liz_GBS_Liz5159              GW_Liz_GBS_Liz5159              ML        lud_chick       lud_ML              51.1
 251 │ GW_Liz_GBS_Liz5162              GW_Liz_GBS_Liz5162              ML        lud_chick       lud_ML              51.11
 252 │ GW_Liz_GBS_Liz5163              GW_Liz_GBS_Liz5163              ML        lud_chick       lud_ML              51.12
 253 │ GW_Liz_GBS_Liz5164              GW_Liz_GBS_Liz5164              ML        lud_chick       lud_ML              51.13
 254 │ GW_Liz_GBS_Liz5165              GW_Liz_GBS_Liz5165              ML        lud             lud_ML              51.14
 255 │ GW_Liz_GBS_Liz5167              GW_Liz_GBS_Liz5167              ML        lud_chick       lud_ML              51.15
 256 │ GW_Liz_GBS_Liz5168              GW_Liz_GBS_Liz5168              ML        lud_chick       lud_ML              51.16
 257 │ GW_Liz_GBS_Liz5169              GW_Liz_GBS_Liz5169              ML        lud_chick       lud_ML              51.17
 258 │ GW_Liz_GBS_Liz5171              GW_Liz_GBS_Liz5171              ML        lud             lud_ML              51.18
 259 │ GW_Liz_GBS_Liz5172              GW_Liz_GBS_Liz5172              ML        lud_chick       lud_ML              51.19
 260 │ GW_Liz_GBS_Liz5173              GW_Liz_GBS_Liz5173              ML        lud_chick       lud_ML              51.2
 261 │ GW_Liz_GBS_Liz5174              GW_Liz_GBS_Liz5174              ML        lud             lud_ML              51.21
 262 │ GW_Liz_GBS_Liz5175              GW_Liz_GBS_Liz5175              ML        lud             lud_ML              51.22
 263 │ GW_Liz_GBS_Liz5176              GW_Liz_GBS_Liz5176              ML        lud             lud_ML              51.23
 264 │ GW_Liz_GBS_Liz5177              GW_Liz_GBS_Liz5177              ML        lud_chick       lud_ML              51.24
 265 │ GW_Liz_GBS_Liz5178              GW_Liz_GBS_Liz5178              ML        lud_chick       lud_ML              51.25
 266 │ GW_Liz_GBS_Liz5179              GW_Liz_GBS_Liz5179              ML        lud_chick       lud_ML              51.26
 267 │ GW_Liz_GBS_Liz5180              GW_Liz_GBS_Liz5180              ML        lud             lud_ML              51.27
 268 │ GW_Liz_GBS_Liz5182              GW_Liz_GBS_Liz5182              ML        lud_chick       lud_ML              51.28
 269 │ GW_Liz_GBS_Liz5184              GW_Liz_GBS_Liz5184              ML        lud_chick       lud_ML              51.29
 270 │ GW_Liz_GBS_Liz5185              GW_Liz_GBS_Liz5185              ML        lud             lud_ML              51.3
 271 │ GW_Liz_GBS_Liz5186              GW_Liz_GBS_Liz5186              ML        lud_chick       lud_ML              51.31
 272 │ GW_Liz_GBS_Liz5187              GW_Liz_GBS_Liz5187              ML        lud_chick       lud_ML              51.32
 273 │ GW_Liz_GBS_Liz5188              GW_Liz_GBS_Liz5188              ML        lud             lud_ML              51.33
 274 │ GW_Liz_GBS_Liz5189              GW_Liz_GBS_Liz5189              ML        lud_chick       lud_ML              51.34
 275 │ GW_Liz_GBS_Liz5190              GW_Liz_GBS_Liz5190              ML        lud_chick       lud_ML              51.35
 276 │ GW_Liz_GBS_Liz5191              GW_Liz_GBS_Liz5191              ML        lud_chick       lud_ML              51.36
 277 │ GW_Liz_GBS_Liz5192              GW_Liz_GBS_Liz5192              ML        lud_chick       lud_ML              51.37
 278 │ GW_Liz_GBS_Liz5193              GW_Liz_GBS_Liz5193              ML        lud_chick       lud_ML              51.38
 279 │ GW_Liz_GBS_Liz5194              GW_Liz_GBS_Liz5194              ML        lud_chick       lud_ML              51.39
 280 │ GW_Liz_GBS_Liz5195              GW_Liz_GBS_Liz5195              ML        lud             lud_ML              51.4
 281 │ GW_Liz_GBS_Liz5197              GW_Liz_GBS_Liz5197              ML        lud             lud_ML              51.41
 282 │ GW_Liz_GBS_Liz5199              GW_Liz_GBS_Liz5199              ML        lud_chick       lud_ML              51.42
 283 │ GW_Liz_GBS_Liz6002              GW_Liz_GBS_Liz6002              ML        lud             lud_ML              51.43
 284 │ GW_Liz_GBS_Liz6006              GW_Liz_GBS_Liz6006              ML        lud             lud_ML              51.44
 285 │ GW_Liz_GBS_Liz6008              GW_Liz_GBS_Liz6008              ML        lud             lud_ML              51.45
 286 │ GW_Liz_GBS_Liz6009              GW_Liz_GBS_Liz6009              ML        lud             lud_ML              51.46
 287 │ GW_Liz_GBS_Liz6010              GW_Liz_GBS_Liz6010              ML        lud             lud_ML              51.47
 288 │ GW_Liz_GBS_Liz6012              GW_Liz_GBS_Liz6012              ML        lud             lud_ML              51.48
 289 │ GW_Liz_GBS_Liz6014              GW_Liz_GBS_Liz6014              ML        lud             lud_ML              51.49
 290 │ GW_Liz_GBS_Liz6055              GW_Liz_GBS_Liz6055              ML        lud             lud_ML              51.5
 291 │ GW_Liz_GBS_Liz6057              GW_Liz_GBS_Liz6057              ML        lud             lud_ML              51.51
 292 │ GW_Liz_GBS_Liz6060              GW_Liz_GBS_Liz6060              ML        lud             lud_ML              51.52
 293 │ GW_Liz_GBS_Liz6062              GW_Liz_GBS_Liz6062              ML        lud             lud_ML              51.53
 294 │ GW_Liz_GBS_Liz6063              GW_Liz_GBS_Liz6063              ML        lud             lud_ML              51.54
 295 │ GW_Liz_GBS_Liz6066              GW_Liz_GBS_Liz6066              ML        lud             lud_ML              51.55
 296 │ GW_Liz_GBS_Liz6072              GW_Liz_GBS_Liz6072              ML        lud             lud_ML              51.56
 297 │ GW_Liz_GBS_Liz6079              GW_Liz_GBS_Liz6079              ML        lud             lud_ML              51.57
 298 │ GW_Liz_GBS_Liz6203              GW_Liz_GBS_Liz6203              ML        lud_chick       lud_ML              51.58
 299 │ GW_Liz_GBS_Liz6204              GW_Liz_GBS_Liz6204              ML        lud_chick       lud_ML              51.59
 300 │ GW_Liz_GBS_Liz6461              GW_Liz_GBS_Liz6461              ML        lud             lud_ML              51.6
 301 │ GW_Liz_GBS_Liz6472              GW_Liz_GBS_Liz6472              ML        lud             lud_ML              51.61
 302 │ GW_Liz_GBS_Liz6478              GW_Liz_GBS_Liz6478              ML        lud             lud_ML              51.62
 303 │ GW_Liz_GBS_Liz6766              GW_Liz_GBS_Liz6766              ML        lud             lud_ML              51.63
 304 │ GW_Liz_GBS_Liz6776              GW_Liz_GBS_Liz6776              ML        lud             lud_ML              51.64
 305 │ GW_Liz_GBS_Liz6794              GW_Liz_GBS_Liz6794              ML        lud             lud_ML              51.65
 306 │ GW_Liz_GBS_P_fusc               GW_Liz_GBS_P_fusc               fusc      fusc            fusc               201.0
 307 │ GW_Liz_GBS_P_h_man              GW_Liz_GBS_P_h_man              hmand     hmand           hmand              202.0
 308 │ GW_Liz_GBS_P_humei              GW_Liz_GBS_P_humei              hume      hume            hume               203.0
 309 │ GW_Liz_GBS_P_inor               GW_Liz_GBS_P_inor               inor      inor            inor               204.0
 310 │ GW_Liz_GBS_S_burk               GW_Liz_GBS_S_burk               burk      burk            burk               205.0

GOOD NEWS: names of individuals in metadata file and genotype ind file match perfectly.

Polish a few individual names (to match those in other metadata object above, and make more readable graphs):

ind_with_metadata_chr4A.ind = correctNames(ind_with_metadata_chr4A.ind)
ind_with_metadata_chr4A.ID = correctNames(ind_with_metadata_chr4A.ID)
310-element Vector{String}:
 "GW_Armando_plate1_AB1"
 "GW_Armando_plate1_JF07G02"
 "GW_Armando_plate1_JF07G03"
 "GW_Armando_plate1_JF07G04"
 "GW_Armando_plate1_JF08G02"
 "GW_Armando_plate1_JF09G01"
 "GW_Armando_plate1_JF09G02"
 "GW_Armando_plate1_JF10G03"
 "GW_Armando_plate1_JF11G01"
 "GW_Armando_plate1_JF12G01"
 "GW_Armando_plate1_JF12G02"
 "GW_Armando_plate1_JF12G04"
 "GW_Armando_plate1_JF13G01"
 ⋮
 "GW_Liz_GBS_Liz6204"
 "GW_Liz_GBS_Liz6461"
 "GW_Liz_GBS_Liz6472"
 "GW_Liz_GBS_Liz6478"
 "GW_Liz_GBS_Liz6766"
 "GW_Liz_GBS_Liz6776"
 "GW_Liz_GBS_Liz6794"
 "GW_Liz_GBS_P_fusc"
 "GW_Liz_GBS_P_h_man"
 "GW_Liz_GBS_P_humei"
 "GW_Liz_GBS_P_inor"
 "GW_Liz_GBS_S_burk"

Filter to just the individuals that were also included in the analysis of LHBRs above:

selection = map(in(ind_with_metadata_included.ind), ind_with_metadata_chr4A.ind)

ind_with_metadata_chr4A_included = ind_with_metadata_chr4A[selection, :]

# select genotypes of just the included individuals, and ignore first column
geno_chr4A_included = geno_chr4A[selection, 2:end]

# show the gw4A cluster assignment of each included individual
println(ind_with_metadata_included.gw4A_cluster)
["virLud", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "virLud_obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "virLud", "virLud", "obsPlumb", "virLud", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "virLud_obsPlumb", "virLud_obsPlumb", "obsPlumb", "virLud_obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "virLudHet", "troch", "virLud_troch", "obsPlumb", "virLud_troch", "troch_obsPlumb", "troch_obsPlumb", "troch_obsPlumb", "virLudHet", "virLud_obsPlumb", "virLudHet", "obsPlumb", "virLud_obsPlumb", "virLud_troch", "virLud_obsPlumb", "obsPlumb", "troch_obsPlumb", "troch_obsPlumb", "troch_obsPlumb", "troch", "troch_obsPlumb", "troch", "troch", "troch", "virLud_troch", "troch", "trochHet", "virLud_obsPlumb", "obsPlumb", "obsPlumb", "virLud_obsPlumb", "obsPlumb", "virLud", "virLud", "virLud", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "virLud", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "virLud", "virLud_obsPlumb", "obsPlumb", "obsPlumbHet", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "troch", "obsPlumb", "troch", "troch", "troch_obsPlumb", "virLud_troch", "obsPlumb", "obsPlumb", "obsPlumb", "virLud_obsPlumb", "virLud_obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "troch_obsPlumb", "obsPlumb", "virLud_obsPlumb", "trochHet", "troch", "troch", "troch_obsPlumb", "troch", "troch", "troch", "troch", "troch", "troch", "virLud_obsPlumb", "virLud_troch", "virLud_obsPlumb", "obsPlumb", "virLud_obsPlumb", "virLud_obsPlumb", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "troch_obsPlumb", "obsPlumb", "obsPlumb", "virLud_obsPlumb", "virLud_obsPlumb", "troch", "troch", "troch", "trochHet", "troch", "troch", 
"troch", "troch", "troch", "troch", "troch", "troch", "troch", "troch", "troch", "troch", "troch", "troch", "troch", "virLud", "obsPlumb", "virLud", "obsPlumb", "troch", "obsPlumb", "obsPlumb", "virLud_obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "virLud", "virLud", "virLud", "obsPlumb", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "nit", "nit", "obsPlumb", "obsPlumb", "virLud_obsPlumb", "obsPlumb", "obsPlumb", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "troch_obsPlumb", "troch", "troch", "troch", "troch_obsPlumb", "troch", "troch", "troch", "troch", "troch", "troch", "troch_obsPlumb", "troch", "troch", "troch_obsPlumb", "troch_obsPlumb", "troch", "troch_obsPlumb", "troch", "virLud_obsPlumb", "troch_obsPlumb", "troch_obsPlumb", "troch", "virLud_troch", "troch", "troch_obsPlumb", "troch", "virLud_troch", "virLud_troch", "troch_obsPlumb", "troch", "troch", "virLud_troch", "troch", "troch_obsPlumb", "troch", "troch", "troch", "trochHet", "troch", "troch", "troch", "troch"]

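The genotype rows above are filtered in the order of `ind_with_metadata_chr4A`, while the cluster labels used next come from `ind_with_metadata_included.gw4A_cluster`. That pairing is only valid if both tables list the retained individuals in the same order. A minimal sketch of the selection step and of such an order check is below; the names and values are toy stand-ins (hypothetical), not the real data.

```julia
# Toy stand-ins (hypothetical names, not the real data) for the two individual lists
ind_all      = ["ind_A", "ind_B", "ind_C", "ind_D", "ind_E"]   # all genotyped individuals
ind_included = ["ind_A", "ind_C", "ind_D"]                      # individuals kept in the earlier analysis

# in(collection) returns a function, so map gives one Bool per element of ind_all
selection = map(in(ind_included), ind_all)

# Toy genotype matrix: one row per individual, first column is a row index
geno_all = [0 0 1 2;
            1 2 2 0;
            2 1 0 0;
            3 0 0 1;
            4 2 1 1]

geno_included = geno_all[selection, 2:end]   # keep selected rows, drop the index column

# The filtered rows follow the order of ind_all, so labels taken from another
# table are only valid if that table lists the individuals in the same order
same_order = ind_all[selection] == ind_included

println("selection = ", selection)
println("rows kept = ", size(geno_included, 1), ", same order = ", same_order)
```

With the real objects, the equivalent check would compare `ind_with_metadata_chr4A_included.ind` with `ind_with_metadata_included.ind`.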
Look up the chr4A individual membership in homozygous clusters, and calculate pi and Dxy

indClusterMembership_gw4A = ind_with_metadata_included.gw4A_cluster

clusterNames_gw4A = ["virLud",
                "nit",
                "troch",
                "obsPlumb"]

# get boundaries of gw4A LHBR:

chr = "gw4A"
positionMin_chr4A_LHBR, positionMax_chr4A_LHBR, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
                        getWindowedIndHetStanRegion(genosOnly_included, 
                                                    pos_SNP_filtered, 
                                                    highViSHetRegions, chr;
                                                    windowSize = 500)

# select the loci within the gw4A LHBR:
selection = (positionMin_chr4A_LHBR .<= pos_chr4A.position .<= positionMax_chr4A_LHBR) 

geno_chr4A_included_LHBR = geno_chr4A_included[:, selection]

pos_chr4A_LHBR = pos_chr4A[selection, :]

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(geno_chr4A_included_LHBR, indClusterMembership_gw4A, clusterNames_gw4A)

# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)

# calculate pairwise Dxy per site, using data in "freqs" and group names in "clusterNames_gw4A"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames_gw4A)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames_gw4A; among=false)

# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames_gw4A, pi = getRegionPi(sitePi))
#= 4×2 DataFrame
 Row │ cluster   pi          
     │ String    Float64     
─────┼───────────────────────
   1 │ virLud    0.000956575
   2 │ nit       0.000332204
   3 │ troch     0.000613901
   4 │ obsPlumb  0.000261819 =#

# average pi (for chr 4A LHBR) among three major groups:
(0.000956575 + 0.000613901 + 0.000261819) / 3
# 0.000610765

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 6×2 DataFrame
 Row │ cluster_pair     Dxy        
     │ String           Float64    
─────┼─────────────────────────────
   1 │ virLud_nit       0.00325609
   2 │ virLud_troch     0.0031813
   3 │ virLud_obsPlumb  0.00241666
   4 │ nit_troch        0.00286634
   5 │ nit_obsPlumb     0.00249507
   6 │ troch_obsPlumb   0.00305931 =#

# average Dxy (for chr 4A LHBR) among three major groups:
(0.0031813 + 0.00241666 + 0.00305931) / 3
# 0.0028857566666666674

# Drawing phylogeny (in Illustrator) based on above, between three major groups.
# Ignoring nit, the most recent connection is between virLud and obsPlumb (0.00241666).
# For deeper branch length, am using: 
# Calculation for average Dxy between troch and (virLud, obsPlumb):
(0.0031813 + 0.00305931) / 2
# 0.003120305
Good news: 1 region on that scaffold
0.0031203050000000003
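
The functions `getSitePi` and `getDxy` used above are defined elsewhere and are not shown on this page. As a rough guide to what the two statistics measure, the sketch below uses the standard per-site formulas for a biallelic site and averages them over all sites, invariant ones included. The function names, frequencies, and sample size here are hypothetical, and the real functions may differ in details such as the handling of missing data.

```julia
# Per-site within-group nucleotide diversity for a biallelic site.
# p = alternate-allele frequency, n = number of sampled alleles (2 x diploid individuals).
# The n / (n - 1) factor is the usual small-sample correction.
site_pi(p, n) = 2 * p * (1 - p) * n / (n - 1)

# Per-site between-group distance: probability that one allele drawn
# from each of two groups differs
site_dxy(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

# Toy allele frequencies at 4 sites (hypothetical values); the first two sites are invariant
p_groupA = [0.0, 0.0, 0.1, 1.0]
p_groupB = [0.0, 0.0, 0.0, 0.0]
n_alleles = 20   # e.g. 10 diploid individuals per group

# Region-wide values are averages over all sites, including invariant ones
pi_A   = sum(site_pi.(p_groupA, n_alleles)) / length(p_groupA)
pi_B   = sum(site_pi.(p_groupB, n_alleles)) / length(p_groupB)
dxy_AB = sum(site_dxy.(p_groupA, p_groupB)) / length(p_groupA)

println("pi_A = ", pi_A, ", pi_B = ", pi_B, ", Dxy_AB = ", dxy_AB)
```

In this toy case a single fixed difference (the last site) contributes to Dxy but not to pi within either group, which is the kind of pattern that produces low pi alongside high Dxy.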

Wow, that is an amazing difference between pi within the obsPlumb haplotype and Dxy between that and others (roughly 10x).

Calculate pi and Dxy outside of the chr 4A LHBR (grouped by the LHBR homozygous groups)

# select the loci outside of the gw4A LHBR:
selection = .!(positionMin_chr4A_LHBR .<= pos_chr4A.position .<= positionMax_chr4A_LHBR) 

geno_chr4A_included_nonLHBR = geno_chr4A_included[:, selection]

pos_chr4A_nonLHBR = pos_chr4A[selection, :]

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(geno_chr4A_included_nonLHBR, indClusterMembership_gw4A, clusterNames_gw4A)

# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)

# calculate pairwise Dxy per site, using data in "freqs" and group names in "clusterNames_gw4A"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames_gw4A)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames_gw4A; among=false)

# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames_gw4A, pi = getRegionPi(sitePi))
#= 4×2 DataFrame
 Row │ cluster   pi         
     │ String    Float64    
─────┼──────────────────────
   1 │ virLud    0.0041321
   2 │ nit       0.00196343
   3 │ troch     0.00551821
   4 │ obsPlumb  0.0055897 =#

# average pi (for chr 4A NOT in LHBR) among three major groups:
(0.0041321 + 0.00551821 + 0.0055897) / 3
# 0.005080003333333333

#ratio of average pi outside to average pi within chr 4A LHBR:
0.005080003333333333 / 0.000610765
# 8.317443424776032

# percent by which average pi is lower within the LHBR than outside it:
100 * (8.317443424776032 - 1) / 8.317443424776032
# 87.97707481819238

# for obsPlumb haplotype, ratio of pi outside to pi within chr 4A LHBR:
0.0055897 / 0.000261819
# 21.349481893980194

# percent by which pi of the obsPlumb haplotype is lower within vs. outside of the LHBR:
100 * (21.349481893980194 - 1) / 21.349481893980194
# 95.31604558384171

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 6×2 DataFrame
 Row │ cluster_pair     Dxy        
     │ String           Float64    
─────┼─────────────────────────────
   1 │ virLud_nit       0.00440681
   2 │ virLud_troch     0.0055599
   3 │ virLud_obsPlumb  0.00556595
   4 │ nit_troch        0.00520058
   5 │ nit_obsPlumb     0.00548082
   6 │ troch_obsPlumb   0.00601295 =#

# average Dxy (for OUTSIDE of chr 4A LHBR) among three major groups:
(0.0055599 + 0.00556595 + 0.00601295) / 3
# 0.005712933333333333

#ratio of average Dxy outside to average Dxy within chr 4A LHBR among 3 major groups:
0.005712933333333333 / 0.0028857566666666674
# 1.9797002981309344

# percent by which average Dxy is lower within the LHBR than outside it:
100 * (1.9797002981309344 - 1) / 1.9797002981309344
# 49.487303661866626

# Drawing phylogeny (in Illustrator) based on above, between three major groups.
# In this case, the virLud_troch is the lower Dxy so am connecting those more recently.
# For deeper branch length, using this:
# Calculation of average distance between obsPlumb and (virLud, troch)
(0.00556595 + 0.00601295) / 2
# 0.00578945
0.00578945
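
The phylogenies themselves were drawn by hand, but the branch-depth arithmetic described in the comments (join the closest pair, then average the distances from the remaining group to that pair) can be sketched as a small function. This is only an illustration of that arithmetic for three groups, with `nit` left out as in the comments; the function name `three_group_tree` is hypothetical.

```julia
# Pairwise region-wide Dxy among the three major groups, copied from the tables above
dxy_LHBR = Dict(("virLud", "troch")    => 0.0031813,
                ("virLud", "obsPlumb") => 0.00241666,
                ("troch", "obsPlumb")  => 0.00305931)

dxy_nonLHBR = Dict(("virLud", "troch")    => 0.0055599,
                   ("virLud", "obsPlumb") => 0.00556595,
                   ("troch", "obsPlumb")  => 0.00601295)

# For three groups: join the closest pair, then place the deeper node at the
# mean distance between the remaining group and the two joined groups
function three_group_tree(dxy)
    # find the pair with the smallest Dxy
    closest_pair = first(keys(dxy))
    for pair in keys(dxy)
        if dxy[pair] < dxy[closest_pair]
            closest_pair = pair
        end
    end
    shallow = dxy[closest_pair]
    # the other two pairs both involve the remaining group
    deep_values = [d for (pair, d) in dxy if pair != closest_pair]
    deep = sum(deep_values) / length(deep_values)
    return closest_pair, shallow, deep
end

pair_in, shallow_in, deep_in = three_group_tree(dxy_LHBR)
pair_out, shallow_out, deep_out = three_group_tree(dxy_nonLHBR)

println("LHBR:     closest pair ", pair_in, ", shallow ", shallow_in, ", deep ", deep_in)
println("non-LHBR: closest pair ", pair_out, ", shallow ", shallow_out, ", deep ", deep_out)
```

The closest pair differs between the two regions (virLud with obsPlumb inside the LHBR, virLud with troch outside it), which is why the two trees have different topologies.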

Remarkable differences between pi and Dxy in the gw4A LHBR, and between LHBR and non-LHBR part of that chromosome!
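
The inside-versus-outside comparisons were typed out by hand in the comments of the code above. A minimal sketch that recomputes them from the table values is below; the helper names `fold_difference`, `percent_lower`, and `mean3` are hypothetical and only serve to make the arithmetic explicit.

```julia
# Fold difference and percent reduction of a statistic inside a region relative to outside it
fold_difference(outside, inside) = outside / inside
# equivalent to 100 * (r - 1) / r with r = outside / inside
percent_lower(outside, inside) = 100 * (1 - inside / outside)

mean3(a, b, c) = (a + b + c) / 3

# Region-wide values for virLud, troch, and obsPlumb, copied from the tables above
pi_inside   = mean3(0.000956575, 0.000613901, 0.000261819)
pi_outside  = mean3(0.0041321, 0.00551821, 0.0055897)
dxy_inside  = mean3(0.0031813, 0.00241666, 0.00305931)
dxy_outside = mean3(0.0055599, 0.00556595, 0.00601295)

println("pi:  ", fold_difference(pi_outside, pi_inside), "-fold, ",
        percent_lower(pi_outside, pi_inside), "% lower inside the LHBR")
println("Dxy: ", fold_difference(dxy_outside, dxy_inside), "-fold, ",
        percent_lower(dxy_outside, dxy_inside), "% lower inside the LHBR")
```

Computing the ratios from the stored values rather than from pasted numbers avoids copying errors if the tables are regenerated.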

Do the same with chr 3, which also shows 3 clear haplotype groups, but a very different biogeographic pattern from that of 4A:

Examine chromosome 3 Large HaploBlock Region (LHBR) with invariant sites included

Before running the code below, I changed the 012NA file back into a 012minus1 file, using a command like the one below, so that the genotypes can be read as integers:

cat /Users/darrenirwin/GW_data_from_cedar_Feb2024/GW2022_cedar/infoSites_vcfs/GW2022_all4plates.genotypes.allSites.chrgw3.infoSites.max2allele_noindel.maxmiss60.MQ20.lowHet.tab.012NA | sed 's/NA/-1/g' > /Users/darrenirwin/GW_data_from_cedar_Feb2024/GW2022_cedar/infoSites_vcfs/GW2022_all4plates.genotypes.allSites.chrgw3.infoSites.max2allele_noindel.maxmiss60.MQ20.lowHet.tab.012minus1
baseName = "/Users/darrenirwin/GW_data_from_cedar_Feb2024/GW2022_cedar/infoSites_vcfs/GW2022_all4plates.genotypes.allSites.chrgw3.infoSites.max2allele_noindel.maxmiss60.MQ20.lowHet.tab"
# load metadata
cd(dataDirectory)
metadata_chr3 = DataFrame(CSV.File(metadataFile)) # the CSV.File function interprets the correct delimiter
num_metadata_cols_chr3 = ncol(metadata_chr3)
num_individuals_chr3 = nrow(metadata_chr3) 
# read in individual names for this dataset
individuals_file_name_chr3 = string(baseName, ".012.indv")
ind_chr3 = DataFrame(CSV.File(individuals_file_name_chr3; header=["ind"], types=[String])) 
indNum_chr3 = size(ind_chr3, 1) # number of individuals
if num_individuals_chr3 != indNum_chr3
    println("WARNING: number of rows in metadata file different than number of individuals in .indv file")
end
# read in position data for this dataset
position_file_name_chr3 = string(baseName, ".012.pos")
pos_chr3 = DataFrame(CSV.File(position_file_name_chr3; header=["chrom", "position"], types=[String, Int]))
# read in genotype data
genotype_file_name_chr3 = string(baseName, ".012minus1") 
@time if 1 <= indNum_chr3 <= 127   
    geno_chr3 = readdlm(genotype_file_name_chr3, '\t', Int8, '\n'); # this has been sped up dramatically, by first converting "NA" to -1
elseif 128 <= indNum_chr3 <= 32767
    geno_chr3 = readdlm(genotype_file_name_chr3, '\t', Int16, '\n'); # this needed for first column, which is number of individual; Int16 not much slower on import than Int8
else
    print("Error: Number of individuals in .indv appears outside of range from 1 to 32767")
end
loci_count_chr3 = size(geno_chr3, 2) - 1   # because the first column is not a SNP (just a count from zero)
print(string("Read in genotypic data at ", loci_count_chr3," loci for ", indNum_chr3, " individuals. \n"))
 53.001126 seconds (2.97 M allocations: 15.221 GiB, 19.54% gc time, 0.15% compilation time)
Read in genotypic data at 1855532 loci for 310 individuals. 
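
The reason for recoding `NA` as `-1` is that `readdlm` can then parse every field directly into a compact integer type. The sketch below illustrates the idea on a tiny made-up file; the file contents and variable names are hypothetical, and the substitution is done in Julia here rather than with `sed`.

```julia
using DelimitedFiles   # standard library; provides readdlm

# Toy 012NA-style content (hypothetical): first column is the individual index,
# remaining columns are genotypes coded 0/1/2, with NA for missing
toy_012NA = "0\t0\t1\tNA\n1\t2\tNA\t0\n2\t1\t1\t2\n"

# Same substitution as the sed command, done in Julia
toy_012minus1 = replace(toy_012NA, "NA" => "-1")

path = tempname()
write(path, toy_012minus1)

# With every field an integer, readdlm can parse straight into an integer matrix
geno_toy = readdlm(path, '\t', Int16, '\n')
rm(path)

genosOnly_toy = geno_toy[:, 2:end]               # drop the index column
missing_count = count(==(-1), genosOnly_toy)     # -1 marks missing genotypes

println(size(geno_toy), " matrix of ", eltype(geno_toy), " with ", missing_count, " missing genotypes")
```

The first column holds the individual index, which is why the code above switches from `Int8` to `Int16` once there are more than 127 individuals.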

Check that individuals are same in genotype data and metadata

ind_with_metadata_chr3 = hcat(ind_chr3, metadata_chr3)
println(ind_with_metadata_chr3)
println()  # prints a line break 
if isequal(ind_with_metadata_chr3.ind, ind_with_metadata_chr3.ID)
    println("GOOD NEWS: names of individuals in metadata file and genotype ind file match perfectly.")
else
    println("WARNING: names of individuals in metadata file and genotype ind file do not completely match.")
end
310×6 DataFrame
 Row │ ind                             ID                              location  group           Fst_group       plot_order 
     │ String                          String31                        String7   String15        String15        Float64    
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ GW_Armando_plate1_AB1           GW_Armando_plate1_AB1           AB        vir             vir                 20.01
   2 │ GW_Armando_plate1_JF07G02       GW_Armando_plate1_JF07G02       ST        plumb           plumb              108.0
   3 │ GW_Armando_plate1_JF07G03       GW_Armando_plate1_JF07G03       ST        plumb           plumb              109.0
   4 │ GW_Armando_plate1_JF07G04       GW_Armando_plate1_JF07G04       ST        plumb           plumb              110.0
   5 │ GW_Armando_plate1_JF08G02       GW_Armando_plate1_JF08G02       ST        plumb           plumb              111.0
   6 │ GW_Armando_plate1_JF09G01       GW_Armando_plate1_JF09G01       ST        plumb           plumb              112.0
   7 │ GW_Armando_plate1_JF09G02       GW_Armando_plate1_JF09G02       ST        plumb           plumb              113.0
   8 │ GW_Armando_plate1_JF10G03       GW_Armando_plate1_JF10G03       ST        plumb_vir       plumb_vir          170.0
   9 │ GW_Armando_plate1_JF11G01       GW_Armando_plate1_JF11G01       ST        plumb           plumb              114.0
  10 │ GW_Armando_plate1_JF12G01       GW_Armando_plate1_JF12G01       ST        plumb           plumb              115.0
  11 │ GW_Armando_plate1_JF12G02       GW_Armando_plate1_JF12G02       ST        plumb           plumb              116.0
  12 │ GW_Armando_plate1_JF12G04       GW_Armando_plate1_JF12G04       ST_vi     vir             vir                 24.001
  13 │ GW_Armando_plate1_JF13G01       GW_Armando_plate1_JF13G01       ST        plumb           plumb              117.0
  14 │ GW_Armando_plate1_JF15G03       GW_Armando_plate1_JF15G03       DV        plumb           plumb              103.0
  15 │ GW_Armando_plate1_JF16G01       GW_Armando_plate1_JF16G01       DV_vi     plumb_vir       vir                 24.041
  16 │ GW_Armando_plate1_JF20G01       GW_Armando_plate1_JF20G01       MB        plumb           plumb               94.0
  17 │ GW_Armando_plate1_JF22G01       GW_Armando_plate1_JF22G01       MB        plumb           plumb               95.0
  18 │ GW_Armando_plate1_JF23G01       GW_Armando_plate1_JF23G01       VB        plumb           plumb               98.0
  19 │ GW_Armando_plate1_JF23G02       GW_Armando_plate1_JF23G02       VB        plumb           plumb               99.0
  20 │ GW_Armando_plate1_JF24G02       GW_Armando_plate1_JF24G02       VB        plumb           plumb              100.0
  21 │ GW_Armando_plate1_JF26G01       GW_Armando_plate1_JF26G01       ST        plumb           plumb              118.0
  22 │ GW_Armando_plate1_JF27G01       GW_Armando_plate1_JF27G01       ST        plumb           plumb              119.0
  23 │ GW_Armando_plate1_JF29G01       GW_Armando_plate1_JF29G01       ST        plumb           plumb              120.0
  24 │ GW_Armando_plate1_JF29G02       GW_Armando_plate1_JF29G02       ST        plumb           plumb              121.0
  25 │ GW_Armando_plate1_JF29G03       GW_Armando_plate1_JF29G03       ST        plumb           plumb              122.0
  26 │ GW_Armando_plate1_JG02G02       GW_Armando_plate1_JG02G02       PR        plumb           plumb              145.0
  27 │ GW_Armando_plate1_JG02G04       GW_Armando_plate1_JG02G04       PR        plumb           plumb              146.0
  28 │ GW_Armando_plate1_JG08G01       GW_Armando_plate1_JG08G01       ST        plumb           plumb              123.0
  29 │ GW_Armando_plate1_JG08G02       GW_Armando_plate1_JG08G02       ST        plumb           plumb              124.0
  30 │ GW_Armando_plate1_JG10G01       GW_Armando_plate1_JG10G01       ST        plumb           plumb              125.0
  31 │ GW_Armando_plate1_JG12G01       GW_Armando_plate1_JG12G01       ST        plumb           plumb              126.0
  32 │ GW_Armando_plate1_JG17G01       GW_Armando_plate1_JG17G01       ST        plumb_vir       plumb              127.0
  33 │ GW_Armando_plate1_NO_BC_TTGW05  GW_Armando_plate1_NO_BC_TTGW05  blank     blank           blank              -99.0
  34 │ GW_Armando_plate1_NO_DNA        GW_Armando_plate1_NO_DNA        blank     blank           blank              -99.0
  35 │ GW_Armando_plate1_RF20G01       GW_Armando_plate1_RF20G01       BJ        obs_plumb       plumb_BJ            77.501
  36 │ GW_Armando_plate1_RF29G02       GW_Armando_plate1_RF29G02       BJ        obs_plumb       plumb_BJ            77.502
  37 │ GW_Armando_plate1_TL3           GW_Armando_plate1_TL3           TL        vir             vir                 11.01
  38 │ GW_Armando_plate1_TTGW01        GW_Armando_plate1_TTGW01        MN        troch_MN        troch_west          53.0
  39 │ GW_Armando_plate1_TTGW05_rep1   GW_Armando_plate1_TTGW05_rep1   MN_rep    troch_MN_rep    troch_west_rep      53.0
  40 │ GW_Armando_plate1_TTGW05_rep2   GW_Armando_plate1_TTGW05_rep2   MN        troch_MN        troch_west          53.0
  41 │ GW_Armando_plate1_TTGW06        GW_Armando_plate1_TTGW06        SU        lud_Sukhto      lud_central         47.0
  42 │ GW_Armando_plate1_TTGW07        GW_Armando_plate1_TTGW07        SU        lud_Sukhto      lud_central         47.0
  43 │ GW_Armando_plate1_TTGW10        GW_Armando_plate1_TTGW10        SU        lud_Sukhto      lud_central         47.0
  44 │ GW_Armando_plate1_TTGW11        GW_Armando_plate1_TTGW11        SU        lud_Sukhto      lud_central         47.0
  45 │ GW_Armando_plate1_TTGW13        GW_Armando_plate1_TTGW13        TH        lud_Thallighar  lud_central         43.0
  46 │ GW_Armando_plate1_TTGW17        GW_Armando_plate1_TTGW17        TH        lud_Thallighar  lud_central         43.0
  47 │ GW_Armando_plate1_TTGW19        GW_Armando_plate1_TTGW19        TH        lud_Thallighar  lud_central         43.0
  48 │ GW_Armando_plate1_TTGW21        GW_Armando_plate1_TTGW21        SR        lud_Sural       lud_central         45.0
  49 │ GW_Armando_plate1_TTGW22        GW_Armando_plate1_TTGW22        SR        lud_Sural       lud_central         45.0
  50 │ GW_Armando_plate1_TTGW23        GW_Armando_plate1_TTGW23        SR        lud_Sural       lud_central         45.0
  51 │ GW_Armando_plate1_TTGW29        GW_Armando_plate1_TTGW29        SR        lud_Sural       lud_central         45.0
  52 │ GW_Armando_plate1_TTGW52        GW_Armando_plate1_TTGW52        NG        lud_Nainaghar   lud_central         49.0
  53 │ GW_Armando_plate1_TTGW53        GW_Armando_plate1_TTGW53        NG        lud_Nainaghar   lud_central         49.0
  54 │ GW_Armando_plate1_TTGW55        GW_Armando_plate1_TTGW55        NG        lud_Nainaghar   lud_central         49.0
  55 │ GW_Armando_plate1_TTGW57        GW_Armando_plate1_TTGW57        NG        lud_Nainaghar   lud_central         49.0
  56 │ GW_Armando_plate1_TTGW58        GW_Armando_plate1_TTGW58        NG        lud_Nainaghar   lud_central         49.0
  57 │ GW_Armando_plate1_TTGW59        GW_Armando_plate1_TTGW59        NG        lud_Nainaghar   lud_central         49.0
  58 │ GW_Armando_plate1_TTGW63        GW_Armando_plate1_TTGW63        SP        lud_Spiti       troch_west          55.0
  59 │ GW_Armando_plate1_TTGW64        GW_Armando_plate1_TTGW64        SP        lud_Spiti       troch_west          55.0
  60 │ GW_Armando_plate1_TTGW65        GW_Armando_plate1_TTGW65        SP        lud_Spiti       troch_west          55.0
  61 │ GW_Armando_plate1_TTGW66        GW_Armando_plate1_TTGW66        SP        lud_Spiti       troch_west          55.0
  62 │ GW_Armando_plate1_TTGW68        GW_Armando_plate1_TTGW68        SP        lud_Spiti       troch_west          55.0
  63 │ GW_Armando_plate1_TTGW70        GW_Armando_plate1_TTGW70        SA        lud_Sathrundi   lud_Sath            41.0
  64 │ GW_Armando_plate1_TTGW71        GW_Armando_plate1_TTGW71        SA        lud_Sathrundi   lud_Sath            41.0
  65 │ GW_Armando_plate1_TTGW72        GW_Armando_plate1_TTGW72        SA        lud_Sathrundi   lud_Sath            41.0
  66 │ GW_Armando_plate1_TTGW74        GW_Armando_plate1_TTGW74        SA        lud_Sathrundi   lud_Sath            41.0
  67 │ GW_Armando_plate1_TTGW78        GW_Armando_plate1_TTGW78        SA        lud_Sathrundi   lud_Sath            41.0
  68 │ GW_Armando_plate1_TTGW_15_05    GW_Armando_plate1_TTGW_15_05    SR        lud_Sural       lud_central         45.0
  69 │ GW_Armando_plate1_TTGW_15_07    GW_Armando_plate1_TTGW_15_07    SR        lud_Sural       lud_central         45.0
  70 │ GW_Armando_plate1_TTGW_15_08    GW_Armando_plate1_TTGW_15_08    SR        lud_Sural       lud_central         45.0
  71 │ GW_Armando_plate1_TTGW_15_09    GW_Armando_plate1_TTGW_15_09    SR        lud_Sural       lud_central         45.0
  72 │ GW_Armando_plate1_UY1           GW_Armando_plate1_UY1           UY        plumb           plumb               87.0
  73 │ GW_Armando_plate2_IL2           GW_Armando_plate2_IL2           IL_rep    plumb_rep       plumb_rep           84.0
  74 │ GW_Armando_plate2_JE31G01       GW_Armando_plate2_JE31G01       VB_vi     vir_misID       vir                 24.002
  75 │ GW_Armando_plate2_JF03G01       GW_Armando_plate2_JF03G01       ST_vi     vir_misID       vir                 24.003
  76 │ GW_Armando_plate2_JF03G02       GW_Armando_plate2_JF03G02       VB_vi     vir_misID       vir                 24.004
  77 │ GW_Armando_plate2_JF07G01       GW_Armando_plate2_JF07G01       ST        plumb           plumb              128.0
  78 │ GW_Armando_plate2_JF08G04       GW_Armando_plate2_JF08G04       ST        plumb           plumb              129.0
  79 │ GW_Armando_plate2_JF10G02       GW_Armando_plate2_JF10G02       ST        plumb           plumb              130.0
  80 │ GW_Armando_plate2_JF11G02       GW_Armando_plate2_JF11G02       ST        plumb           plumb              131.0
  81 │ GW_Armando_plate2_JF12G03       GW_Armando_plate2_JF12G03       ST        plumb           plumb              132.0
  82 │ GW_Armando_plate2_JF12G05       GW_Armando_plate2_JF12G05       ST        plumb           plumb              133.0
  83 │ GW_Armando_plate2_JF13G02       GW_Armando_plate2_JF13G02       ST        plumb           plumb              134.0
  84 │ GW_Armando_plate2_JF14G01       GW_Armando_plate2_JF14G01       DV        plumb           plumb              104.0
  85 │ GW_Armando_plate2_JF14G02       GW_Armando_plate2_JF14G02       DV        plumb           plumb              105.0
  86 │ GW_Armando_plate2_JF15G01       GW_Armando_plate2_JF15G01       DV        plumb           plumb              106.0
  87 │ GW_Armando_plate2_JF15G02       GW_Armando_plate2_JF15G02       DV        plumb           plumb              107.0
  88 │ GW_Armando_plate2_JF16G02       GW_Armando_plate2_JF16G02       DV_vi     plumb_vir       vir                 24.042
  89 │ GW_Armando_plate2_JF19G01       GW_Armando_plate2_JF19G01       MB        plumb           plumb               96.0
  90 │ GW_Armando_plate2_JF20G02       GW_Armando_plate2_JF20G02       MB        plumb           plumb               97.0
  91 │ GW_Armando_plate2_JF24G01       GW_Armando_plate2_JF24G01       VB        plumb           plumb              101.0
  92 │ GW_Armando_plate2_JF24G03       GW_Armando_plate2_JF24G03       ST        plumb           plumb              135.0
  93 │ GW_Armando_plate2_JF25G01       GW_Armando_plate2_JF25G01       VB        plumb           plumb              102.0
  94 │ GW_Armando_plate2_JF26G02       GW_Armando_plate2_JF26G02       ST        plumb           plumb              136.0
  95 │ GW_Armando_plate2_JF27G02       GW_Armando_plate2_JF27G02       ST        plumb           plumb              137.0
  96 │ GW_Armando_plate2_JF30G01       GW_Armando_plate2_JF30G01       ST_vi     vir_misID       vir                 24.005
  97 │ GW_Armando_plate2_JG01G01       GW_Armando_plate2_JG01G01       PR        plumb           plumb              147.0
  98 │ GW_Armando_plate2_JG02G01       GW_Armando_plate2_JG02G01       PR        plumb           plumb              148.0
  99 │ GW_Armando_plate2_JG02G03       GW_Armando_plate2_JG02G03       PR        plumb           plumb              149.0
 100 │ GW_Armando_plate2_JG10G02       GW_Armando_plate2_JG10G02       ST        plumb           plumb              138.0
 101 │ GW_Armando_plate2_JG10G03       GW_Armando_plate2_JG10G03       ST        plumb           plumb              139.0
 102 │ GW_Armando_plate2_JG12G02       GW_Armando_plate2_JG12G02       ST        plumb           plumb              140.0
 103 │ GW_Armando_plate2_JG12G03       GW_Armando_plate2_JG12G03       ST        plumb           plumb              141.0
 104 │ GW_Armando_plate2_LN11          GW_Armando_plate2_LN11          LN_rep    troch_LN_rep    troch_LN_rep        65.01
 105 │ GW_Armando_plate2_LN2           GW_Armando_plate2_LN2           LN        troch_LN        troch_LN            58.01
 106 │ GW_Armando_plate2_NO_BC_TTGW05  GW_Armando_plate2_NO_BC_TTGW05  blank     blank           blank              -99.0
 107 │ GW_Armando_plate2_NO_DNA        GW_Armando_plate2_NO_DNA        blank     blank           blank              -99.0
 108 │ GW_Armando_plate2_RF29G01       GW_Armando_plate2_RF29G01       BJ        obs_plumb       plumb_BJ            77.503
 109 │ GW_Armando_plate2_TTGW02        GW_Armando_plate2_TTGW02        MN        troch_MN        troch_west          53.0
 110 │ GW_Armando_plate2_TTGW03        GW_Armando_plate2_TTGW03        MN        troch_MN        troch_west          53.0
 111 │ GW_Armando_plate2_TTGW05_rep3   GW_Armando_plate2_TTGW05_rep3   MN_rep    troch_MN_rep    troch_west_rep      53.0
 112 │ GW_Armando_plate2_TTGW05_rep4   GW_Armando_plate2_TTGW05_rep4   MN_rep    troch_MN_rep    troch_west_rep      53.0
 113 │ GW_Armando_plate2_TTGW08        GW_Armando_plate2_TTGW08        SU        lud_Sukhto      lud_central         47.0
 114 │ GW_Armando_plate2_TTGW09        GW_Armando_plate2_TTGW09        SU        lud_Sukhto      lud_central         47.0
 115 │ GW_Armando_plate2_TTGW12        GW_Armando_plate2_TTGW12        TH        lud_Thallighar  lud_central         43.0
 116 │ GW_Armando_plate2_TTGW14        GW_Armando_plate2_TTGW14        TH        lud_Thallighar  lud_central         43.0
 117 │ GW_Armando_plate2_TTGW15        GW_Armando_plate2_TTGW15        TH        lud_Thallighar  lud_central         43.0
 118 │ GW_Armando_plate2_TTGW16        GW_Armando_plate2_TTGW16        TH        lud_Thallighar  lud_central         43.0
 119 │ GW_Armando_plate2_TTGW18        GW_Armando_plate2_TTGW18        TH        lud_Thallighar  lud_central         43.0
 120 │ GW_Armando_plate2_TTGW20        GW_Armando_plate2_TTGW20        SR        lud_Sural       lud_central         45.0
 121 │ GW_Armando_plate2_TTGW24        GW_Armando_plate2_TTGW24        SR        lud_Sural       lud_central         45.0
 122 │ GW_Armando_plate2_TTGW25        GW_Armando_plate2_TTGW25        SR        lud_Sural       lud_central         45.0
 123 │ GW_Armando_plate2_TTGW27        GW_Armando_plate2_TTGW27        SR        lud_Sural       lud_central         45.0
 124 │ GW_Armando_plate2_TTGW28        GW_Armando_plate2_TTGW28        SR        lud_Sural       lud_central         45.0
 125 │ GW_Armando_plate2_TTGW50        GW_Armando_plate2_TTGW50        NG        lud_Nainaghar   lud_central         49.0
 126 │ GW_Armando_plate2_TTGW51        GW_Armando_plate2_TTGW51        NG        lud_Nainaghar   lud_central         49.0
 127 │ GW_Armando_plate2_TTGW54        GW_Armando_plate2_TTGW54        NG        lud_Nainaghar   lud_central         49.0
 128 │ GW_Armando_plate2_TTGW56        GW_Armando_plate2_TTGW56        NG        lud_Nainaghar   lud_central         49.0
 129 │ GW_Armando_plate2_TTGW60        GW_Armando_plate2_TTGW60        SP        lud_Spiti       troch_west          55.0
 130 │ GW_Armando_plate2_TTGW61        GW_Armando_plate2_TTGW61        SP        lud_Spiti       troch_west          55.0
 131 │ GW_Armando_plate2_TTGW62        GW_Armando_plate2_TTGW62        SP        lud_Spiti       troch_west          55.0
 132 │ GW_Armando_plate2_TTGW67        GW_Armando_plate2_TTGW67        SP        lud_Spiti       troch_west          55.0
 133 │ GW_Armando_plate2_TTGW69        GW_Armando_plate2_TTGW69        SP        lud_Spiti       troch_west          55.0
 134 │ GW_Armando_plate2_TTGW73        GW_Armando_plate2_TTGW73        SA        lud_Sathrundi   lud_Sath            41.0
 135 │ GW_Armando_plate2_TTGW75        GW_Armando_plate2_TTGW75        SA        lud_Sathrundi   lud_Sath            41.0
 136 │ GW_Armando_plate2_TTGW77        GW_Armando_plate2_TTGW77        SA        lud_Sathrundi   lud_Sath            41.0
 137 │ GW_Armando_plate2_TTGW79        GW_Armando_plate2_TTGW79        SA        lud_Sathrundi   lud_Sath            41.0
 138 │ GW_Armando_plate2_TTGW80        GW_Armando_plate2_TTGW80        SA        lud_Sathrundi   lud_Sath            41.0
 139 │ GW_Armando_plate2_TTGW_15_01    GW_Armando_plate2_TTGW_15_01    SR        lud_Sural       lud_central         45.0
 140 │ GW_Armando_plate2_TTGW_15_02    GW_Armando_plate2_TTGW_15_02    SR        lud_Sural       lud_central         45.0
 141 │ GW_Armando_plate2_TTGW_15_03    GW_Armando_plate2_TTGW_15_03    SR        lud_Sural       lud_central         45.0
 142 │ GW_Armando_plate2_TTGW_15_04    GW_Armando_plate2_TTGW_15_04    SR        lud_Sural       lud_central         45.0
 143 │ GW_Armando_plate2_TTGW_15_06    GW_Armando_plate2_TTGW_15_06    SR        lud_Sural       lud_central         45.0
 144 │ GW_Armando_plate2_TTGW_15_10    GW_Armando_plate2_TTGW_15_10    SR        lud_Sural       lud_central         45.0
 145 │ GW_Lane5_AA1                    GW_Lane5_AA1                    AA        vir_S           vir_S               25.0
 146 │ GW_Lane5_AA10                   GW_Lane5_AA10                   AA        vir_S           vir_S               33.0
 147 │ GW_Lane5_AA11                   GW_Lane5_AA11                   AA        vir_S           vir_S               34.0
 148 │ GW_Lane5_AA3                    GW_Lane5_AA3                    AA        vir_S           vir_S               26.0
 149 │ GW_Lane5_AA4                    GW_Lane5_AA4                    AA        vir_S           vir_S               27.0
 150 │ GW_Lane5_AA5                    GW_Lane5_AA5                    AA        vir_S           vir_S               28.0
 151 │ GW_Lane5_AA6                    GW_Lane5_AA6                    AA        vir_S           vir_S               29.0
 152 │ GW_Lane5_AA7                    GW_Lane5_AA7                    AA        vir_S           vir_S               30.0
 153 │ GW_Lane5_AA8                    GW_Lane5_AA8                    AA        vir_S           vir_S               31.0
 154 │ GW_Lane5_AA9                    GW_Lane5_AA9                    AA        vir_S           vir_S               32.0
 155 │ GW_Lane5_AB1                    GW_Lane5_AB1                    AB_rep    vir_rep         vir_rep             20.0
 156 │ GW_Lane5_AB2                    GW_Lane5_AB2                    AB        vir             vir                 21.0
 157 │ GW_Lane5_AN1                    GW_Lane5_AN1                    AN        plumb           plumb               80.0
 158 │ GW_Lane5_AN2                    GW_Lane5_AN2                    AN        plumb           plumb               81.0
 159 │ GW_Lane5_BK2                    GW_Lane5_BK2                    BK        plumb           plumb               78.0
 160 │ GW_Lane5_BK3                    GW_Lane5_BK3                    BK        plumb           plumb               79.0
 161 │ GW_Lane5_DA2                    GW_Lane5_DA2                    XN        obs             obs                 73.0
 162 │ GW_Lane5_DA3                    GW_Lane5_DA3                    XN        obs             obs                 74.0
 163 │ GW_Lane5_DA4                    GW_Lane5_DA4                    XN        obs             obs                 75.0
 164 │ GW_Lane5_DA6                    GW_Lane5_DA6                    XN        obs             low_reads           76.0
 165 │ GW_Lane5_DA7                    GW_Lane5_DA7                    XN        obs             obs                 77.0
 166 │ GW_Lane5_EM1                    GW_Lane5_EM1                    EM        troch_EM        troch_EM            72.0
 167 │ GW_Lane5_IL1                    GW_Lane5_IL1                    IL        plumb           plumb               82.0
 168 │ GW_Lane5_IL2                    GW_Lane5_IL2                    IL_rep    plumb_rep       plumb_rep           85.0
 169 │ GW_Lane5_IL4                    GW_Lane5_IL4                    IL        plumb           plumb               83.0
 170 │ GW_Lane5_KS1                    GW_Lane5_KS1                    OV        lud_KS          lud_KS              40.0
 171 │ GW_Lane5_KS2                    GW_Lane5_KS2                    OV        lud_KS          lud_KS              40.0
 172 │ GW_Lane5_LN1                    GW_Lane5_LN1                    LN        troch_LN        troch_LN            57.0
 173 │ GW_Lane5_LN10                   GW_Lane5_LN10                   LN        troch_LN        troch_LN            64.0
 174 │ GW_Lane5_LN11                   GW_Lane5_LN11                   LN        troch_LN        troch_LN            65.0
 175 │ GW_Lane5_LN12                   GW_Lane5_LN12                   LN        troch_LN        troch_LN            66.0
 176 │ GW_Lane5_LN14                   GW_Lane5_LN14                   LN        troch_LN        troch_LN            67.0
 177 │ GW_Lane5_LN16                   GW_Lane5_LN16                   LN        troch_LN        troch_LN            68.0
 178 │ GW_Lane5_LN18                   GW_Lane5_LN18                   LN        troch_LN        troch_LN            69.0
 179 │ GW_Lane5_LN19                   GW_Lane5_LN19                   LN        troch_LN        troch_LN            70.0
 180 │ GW_Lane5_LN2                    GW_Lane5_LN2                    LN_rep    troch_LN_rep    troch_LN_rep        58.0
 181 │ GW_Lane5_LN20                   GW_Lane5_LN20                   LN        troch_LN        troch_LN            71.0
 182 │ GW_Lane5_LN3                    GW_Lane5_LN3                    LN        troch_LN        troch_LN            59.0
 183 │ GW_Lane5_LN4                    GW_Lane5_LN4                    LN        troch_LN        troch_LN            60.0
 184 │ GW_Lane5_LN6                    GW_Lane5_LN6                    LN        troch_LN        troch_LN            61.0
 185 │ GW_Lane5_LN7                    GW_Lane5_LN7                    LN        troch_LN        troch_LN            62.0
 186 │ GW_Lane5_LN8                    GW_Lane5_LN8                    LN        troch_LN        troch_LN            63.0
 187 │ GW_Lane5_MN1                    GW_Lane5_MN1                    MN        troch_MN        troch_west          51.0
 188 │ GW_Lane5_MN12                   GW_Lane5_MN12                   MN        troch_MN        troch_west          56.0
 189 │ GW_Lane5_MN3                    GW_Lane5_MN3                    MN        troch_MN        troch_west          52.0
 190 │ GW_Lane5_MN5                    GW_Lane5_MN5                    MN        troch_MN        troch_west          53.0
 191 │ GW_Lane5_MN8                    GW_Lane5_MN8                    MN        troch_MN        troch_west          54.0
 192 │ GW_Lane5_MN9                    GW_Lane5_MN9                    MN        troch_MN        troch_west          55.0
 193 │ GW_Lane5_NA1                    GW_Lane5_NA1                    NR        lud_PK          lud_PK              39.2
 194 │ GW_Lane5_NA3-3ul                GW_Lane5_NA3-3ul                NR        lud_PK          lud_PK              39.2
 195 │ GW_Lane5_PT11                   GW_Lane5_PT11                   KL        lud_KL          lud_central         42.0
 196 │ GW_Lane5_PT12                   GW_Lane5_PT12                   KL        lud_KL          lud_central         42.0
 197 │ GW_Lane5_PT2                    GW_Lane5_PT2                    ML        lud_ML          lud_ML              51.0
 198 │ GW_Lane5_PT3                    GW_Lane5_PT3                    PA        lud_PA          lud_central         46.0
 199 │ GW_Lane5_PT4                    GW_Lane5_PT4                    PA        lud_PA          lud_central         46.0
 200 │ GW_Lane5_PT6                    GW_Lane5_PT6                    KL        lud_KL          lud_central         42.0
 201 │ GW_Lane5_SH1                    GW_Lane5_SH1                    SH        lud_PK          lud_PK              39.1
 202 │ GW_Lane5_SH2                    GW_Lane5_SH2                    SH        lud_PK          lud_PK              39.1
 203 │ GW_Lane5_SH4                    GW_Lane5_SH4                    SH        lud_PK          lud_PK              39.1
 204 │ GW_Lane5_SH5                    GW_Lane5_SH5                    SH        lud_PK          lud_PK              39.1
 205 │ GW_Lane5_SL1                    GW_Lane5_SL1                    SL        plumb           plumb              150.0
 206 │ GW_Lane5_SL2                    GW_Lane5_SL2                    SL        plumb           plumb              151.0
 207 │ GW_Lane5_ST1                    GW_Lane5_ST1                    ST        plumb           plumb              142.0
 208 │ GW_Lane5_ST12                   GW_Lane5_ST12                   ST        plumb           plumb              144.0
 209 │ GW_Lane5_ST3                    GW_Lane5_ST3                    ST        plumb           plumb              143.0
 210 │ GW_Lane5_STvi1                  GW_Lane5_STvi1                  ST_vi     vir             vir                 22.0
 211 │ GW_Lane5_STvi2                  GW_Lane5_STvi2                  ST_vi     vir             vir                 23.0
 212 │ GW_Lane5_STvi3                  GW_Lane5_STvi3                  ST_vi     vir             vir                 24.0
 213 │ GW_Lane5_TA1                    GW_Lane5_TA1                    TA        plumb           plumb               86.0
 214 │ GW_Lane5_TL1                    GW_Lane5_TL1                    TL        vir             vir                  9.0
 215 │ GW_Lane5_TL10                   GW_Lane5_TL10                   TL        vir             vir                 17.0
 216 │ GW_Lane5_TL11                   GW_Lane5_TL11                   TL        vir             vir                 18.0
 217 │ GW_Lane5_TL12                   GW_Lane5_TL12                   TL        vir             vir                 19.0
 218 │ GW_Lane5_TL2                    GW_Lane5_TL2                    TL        vir             vir                 10.0
 219 │ GW_Lane5_TL3                    GW_Lane5_TL3                    TL_rep    vir_rep         vir_rep             11.0
 220 │ GW_Lane5_TL4                    GW_Lane5_TL4                    TL        vir             vir                 12.0
 221 │ GW_Lane5_TL5                    GW_Lane5_TL5                    TL        vir             vir                 13.0
 222 │ GW_Lane5_TL7                    GW_Lane5_TL7                    TL        vir             vir                 14.0
 223 │ GW_Lane5_TL8                    GW_Lane5_TL8                    TL        vir             vir                 15.0
 224 │ GW_Lane5_TL9                    GW_Lane5_TL9                    TL        vir             vir                 16.0
 225 │ GW_Lane5_TU1                    GW_Lane5_TU1                    TU        nit             nit                 35.0
 226 │ GW_Lane5_TU2                    GW_Lane5_TU2                    TU        nit             nit                 36.0
 227 │ GW_Lane5_UY1                    GW_Lane5_UY1                    UY_rep    plumb_rep       plumb_rep           93.0
 228 │ GW_Lane5_UY2                    GW_Lane5_UY2                    UY        plumb           plumb               88.0
 229 │ GW_Lane5_UY3                    GW_Lane5_UY3                    UY        plumb           plumb               89.0
 230 │ GW_Lane5_UY4                    GW_Lane5_UY4                    UY        plumb           plumb               90.0
 231 │ GW_Lane5_UY5                    GW_Lane5_UY5                    UY        plumb           plumb               91.0
 232 │ GW_Lane5_UY6                    GW_Lane5_UY6                    UY        plumb           plumb               92.0
 233 │ GW_Lane5_YK1                    GW_Lane5_YK1                    YK        vir             vir                  1.0
 234 │ GW_Lane5_YK11                   GW_Lane5_YK11                   YK        vir             vir                  8.0
 235 │ GW_Lane5_YK3                    GW_Lane5_YK3                    YK        vir             vir                  2.0
 236 │ GW_Lane5_YK4                    GW_Lane5_YK4                    YK        vir             vir                  3.0
 237 │ GW_Lane5_YK5                    GW_Lane5_YK5                    YK        vir             vir                  4.0
 238 │ GW_Lane5_YK6                    GW_Lane5_YK6                    YK        vir             vir                  5.0
 239 │ GW_Lane5_YK7                    GW_Lane5_YK7                    YK        vir             vir                  6.0
 240 │ GW_Lane5_YK9                    GW_Lane5_YK9                    YK        vir             vir                  7.0
 241 │ GW_Liz_GBS_Liz10045             GW_Liz_GBS_Liz10045             ML        lud             lud_ML              51.01
 242 │ GW_Liz_GBS_Liz10094             GW_Liz_GBS_Liz10094             ML        lud             lud_ML              51.02
 243 │ GW_Liz_GBS_Liz5101              GW_Liz_GBS_Liz5101              ML        lud             lud_ML              51.03
 244 │ GW_Liz_GBS_Liz5101_R            GW_Liz_GBS_Liz5101_R            ML_rep    lud_rep         lud_ML_rep          51.04
 245 │ GW_Liz_GBS_Liz5118              GW_Liz_GBS_Liz5118              ML        lud             lud_ML              51.05
 246 │ GW_Liz_GBS_Liz5139              GW_Liz_GBS_Liz5139              ML        lud             lud_ML              51.06
 247 │ GW_Liz_GBS_Liz5142              GW_Liz_GBS_Liz5142              ML        lud             lud_ML              51.07
 248 │ GW_Liz_GBS_Liz5144              GW_Liz_GBS_Liz5144              ML        lud             lud_ML              51.08
 249 │ GW_Liz_GBS_Liz5150              GW_Liz_GBS_Liz5150              ML        lud             lud_ML              51.09
 250 │ GW_Liz_GBS_Liz5159              GW_Liz_GBS_Liz5159              ML        lud_chick       lud_ML              51.1
 251 │ GW_Liz_GBS_Liz5162              GW_Liz_GBS_Liz5162              ML        lud_chick       lud_ML              51.11
 252 │ GW_Liz_GBS_Liz5163              GW_Liz_GBS_Liz5163              ML        lud_chick       lud_ML              51.12
 253 │ GW_Liz_GBS_Liz5164              GW_Liz_GBS_Liz5164              ML        lud_chick       lud_ML              51.13
 254 │ GW_Liz_GBS_Liz5165              GW_Liz_GBS_Liz5165              ML        lud             lud_ML              51.14
 255 │ GW_Liz_GBS_Liz5167              GW_Liz_GBS_Liz5167              ML        lud_chick       lud_ML              51.15
 256 │ GW_Liz_GBS_Liz5168              GW_Liz_GBS_Liz5168              ML        lud_chick       lud_ML              51.16
 257 │ GW_Liz_GBS_Liz5169              GW_Liz_GBS_Liz5169              ML        lud_chick       lud_ML              51.17
 258 │ GW_Liz_GBS_Liz5171              GW_Liz_GBS_Liz5171              ML        lud             lud_ML              51.18
 259 │ GW_Liz_GBS_Liz5172              GW_Liz_GBS_Liz5172              ML        lud_chick       lud_ML              51.19
 260 │ GW_Liz_GBS_Liz5173              GW_Liz_GBS_Liz5173              ML        lud_chick       lud_ML              51.2
 261 │ GW_Liz_GBS_Liz5174              GW_Liz_GBS_Liz5174              ML        lud             lud_ML              51.21
 262 │ GW_Liz_GBS_Liz5175              GW_Liz_GBS_Liz5175              ML        lud             lud_ML              51.22
 263 │ GW_Liz_GBS_Liz5176              GW_Liz_GBS_Liz5176              ML        lud             lud_ML              51.23
 264 │ GW_Liz_GBS_Liz5177              GW_Liz_GBS_Liz5177              ML        lud_chick       lud_ML              51.24
 265 │ GW_Liz_GBS_Liz5178              GW_Liz_GBS_Liz5178              ML        lud_chick       lud_ML              51.25
 266 │ GW_Liz_GBS_Liz5179              GW_Liz_GBS_Liz5179              ML        lud_chick       lud_ML              51.26
 267 │ GW_Liz_GBS_Liz5180              GW_Liz_GBS_Liz5180              ML        lud             lud_ML              51.27
 268 │ GW_Liz_GBS_Liz5182              GW_Liz_GBS_Liz5182              ML        lud_chick       lud_ML              51.28
 269 │ GW_Liz_GBS_Liz5184              GW_Liz_GBS_Liz5184              ML        lud_chick       lud_ML              51.29
 270 │ GW_Liz_GBS_Liz5185              GW_Liz_GBS_Liz5185              ML        lud             lud_ML              51.3
 271 │ GW_Liz_GBS_Liz5186              GW_Liz_GBS_Liz5186              ML        lud_chick       lud_ML              51.31
 272 │ GW_Liz_GBS_Liz5187              GW_Liz_GBS_Liz5187              ML        lud_chick       lud_ML              51.32
 273 │ GW_Liz_GBS_Liz5188              GW_Liz_GBS_Liz5188              ML        lud             lud_ML              51.33
 274 │ GW_Liz_GBS_Liz5189              GW_Liz_GBS_Liz5189              ML        lud_chick       lud_ML              51.34
 275 │ GW_Liz_GBS_Liz5190              GW_Liz_GBS_Liz5190              ML        lud_chick       lud_ML              51.35
 276 │ GW_Liz_GBS_Liz5191              GW_Liz_GBS_Liz5191              ML        lud_chick       lud_ML              51.36
 277 │ GW_Liz_GBS_Liz5192              GW_Liz_GBS_Liz5192              ML        lud_chick       lud_ML              51.37
 278 │ GW_Liz_GBS_Liz5193              GW_Liz_GBS_Liz5193              ML        lud_chick       lud_ML              51.38
 279 │ GW_Liz_GBS_Liz5194              GW_Liz_GBS_Liz5194              ML        lud_chick       lud_ML              51.39
 280 │ GW_Liz_GBS_Liz5195              GW_Liz_GBS_Liz5195              ML        lud             lud_ML              51.4
 281 │ GW_Liz_GBS_Liz5197              GW_Liz_GBS_Liz5197              ML        lud             lud_ML              51.41
 282 │ GW_Liz_GBS_Liz5199              GW_Liz_GBS_Liz5199              ML        lud_chick       lud_ML              51.42
 283 │ GW_Liz_GBS_Liz6002              GW_Liz_GBS_Liz6002              ML        lud             lud_ML              51.43
 284 │ GW_Liz_GBS_Liz6006              GW_Liz_GBS_Liz6006              ML        lud             lud_ML              51.44
 285 │ GW_Liz_GBS_Liz6008              GW_Liz_GBS_Liz6008              ML        lud             lud_ML              51.45
 286 │ GW_Liz_GBS_Liz6009              GW_Liz_GBS_Liz6009              ML        lud             lud_ML              51.46
 287 │ GW_Liz_GBS_Liz6010              GW_Liz_GBS_Liz6010              ML        lud             lud_ML              51.47
 288 │ GW_Liz_GBS_Liz6012              GW_Liz_GBS_Liz6012              ML        lud             lud_ML              51.48
 289 │ GW_Liz_GBS_Liz6014              GW_Liz_GBS_Liz6014              ML        lud             lud_ML              51.49
 290 │ GW_Liz_GBS_Liz6055              GW_Liz_GBS_Liz6055              ML        lud             lud_ML              51.5
 291 │ GW_Liz_GBS_Liz6057              GW_Liz_GBS_Liz6057              ML        lud             lud_ML              51.51
 292 │ GW_Liz_GBS_Liz6060              GW_Liz_GBS_Liz6060              ML        lud             lud_ML              51.52
 293 │ GW_Liz_GBS_Liz6062              GW_Liz_GBS_Liz6062              ML        lud             lud_ML              51.53
 294 │ GW_Liz_GBS_Liz6063              GW_Liz_GBS_Liz6063              ML        lud             lud_ML              51.54
 295 │ GW_Liz_GBS_Liz6066              GW_Liz_GBS_Liz6066              ML        lud             lud_ML              51.55
 296 │ GW_Liz_GBS_Liz6072              GW_Liz_GBS_Liz6072              ML        lud             lud_ML              51.56
 297 │ GW_Liz_GBS_Liz6079              GW_Liz_GBS_Liz6079              ML        lud             lud_ML              51.57
 298 │ GW_Liz_GBS_Liz6203              GW_Liz_GBS_Liz6203              ML        lud_chick       lud_ML              51.58
 299 │ GW_Liz_GBS_Liz6204              GW_Liz_GBS_Liz6204              ML        lud_chick       lud_ML              51.59
 300 │ GW_Liz_GBS_Liz6461              GW_Liz_GBS_Liz6461              ML        lud             lud_ML              51.6
 301 │ GW_Liz_GBS_Liz6472              GW_Liz_GBS_Liz6472              ML        lud             lud_ML              51.61
 302 │ GW_Liz_GBS_Liz6478              GW_Liz_GBS_Liz6478              ML        lud             lud_ML              51.62
 303 │ GW_Liz_GBS_Liz6766              GW_Liz_GBS_Liz6766              ML        lud             lud_ML              51.63
 304 │ GW_Liz_GBS_Liz6776              GW_Liz_GBS_Liz6776              ML        lud             lud_ML              51.64
 305 │ GW_Liz_GBS_Liz6794              GW_Liz_GBS_Liz6794              ML        lud             lud_ML              51.65
 306 │ GW_Liz_GBS_P_fusc               GW_Liz_GBS_P_fusc               fusc      fusc            fusc               201.0
 307 │ GW_Liz_GBS_P_h_man              GW_Liz_GBS_P_h_man              hmand     hmand           hmand              202.0
 308 │ GW_Liz_GBS_P_humei              GW_Liz_GBS_P_humei              hume      hume            hume               203.0
 309 │ GW_Liz_GBS_P_inor               GW_Liz_GBS_P_inor               inor      inor            inor               204.0
 310 │ GW_Liz_GBS_S_burk               GW_Liz_GBS_S_burk               burk      burk            burk               205.0

GOOD NEWS: names of individuals in metadata file and genotype ind file match perfectly.

Correct a few individual names (to match those in the other metadata object above, and to make the graphs more readable):

ind_with_metadata_chr3.ind = correctNames(ind_with_metadata_chr3.ind)
ind_with_metadata_chr3.ID = correctNames(ind_with_metadata_chr3.ID)
310-element Vector{String}:
 "GW_Armando_plate1_AB1"
 "GW_Armando_plate1_JF07G02"
 "GW_Armando_plate1_JF07G03"
 "GW_Armando_plate1_JF07G04"
 "GW_Armando_plate1_JF08G02"
 "GW_Armando_plate1_JF09G01"
 "GW_Armando_plate1_JF09G02"
 "GW_Armando_plate1_JF10G03"
 "GW_Armando_plate1_JF11G01"
 "GW_Armando_plate1_JF12G01"
 "GW_Armando_plate1_JF12G02"
 "GW_Armando_plate1_JF12G04"
 "GW_Armando_plate1_JF13G01"
 ⋮
 "GW_Liz_GBS_Liz6204"
 "GW_Liz_GBS_Liz6461"
 "GW_Liz_GBS_Liz6472"
 "GW_Liz_GBS_Liz6478"
 "GW_Liz_GBS_Liz6766"
 "GW_Liz_GBS_Liz6776"
 "GW_Liz_GBS_Liz6794"
 "GW_Liz_GBS_P_fusc"
 "GW_Liz_GBS_P_h_man"
 "GW_Liz_GBS_P_humei"
 "GW_Liz_GBS_P_inor"
 "GW_Liz_GBS_S_burk"

Filter to just the individuals that were also included in the analysis of LHBRs above:

selection = map(in(ind_with_metadata_included.ind), ind_with_metadata_chr3.ind)

ind_with_metadata_chr3_included = ind_with_metadata_chr3[selection, :]

# select genotypes of just the included individuals, and ignore first column
geno_chr3_included = geno_chr3[selection, 2:end]

println(ind_with_metadata_included.gw3_cluster)
["virLud", "plumb", "plumb", "plumb", "plumb", "plumb", "plumbHet", "plumb", "plumb", "plumbHet", "plumb", "virLud", "plumb", "plumb", "virLud", "plumb", "plumb", "plumb", "plumb", "plumb", "plumb", "plumb", "plumb", "plumb", "plumb", "vir_plumb", "plumb", "plumb", "plumb", "plumb", "plumb", "plumb", "virLud", "trochObs", "trochObs", "virLud", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "virLud", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "trochObs", "virLud", "virLudHet", "virLud_trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "virLud_trochObs", "virLud_trochObs", "virLud", "virLud_trochObs", "plumb", "virLud", "virLudHet", "virLud", "plumb", "plumb", "plumb", "plumbHet", "plumb", "plumb", "plumb", "plumb", "vir_plumb", "plumb", "plumb", "virLud", "plumb", "plumb", "plumbHet", "plumb", "plumbHet", "plumbHet", "plumb", "virLud", "plumbHet", "plumb", "plumb", "plumb", "vir_plumb", "plumb", "plumb", "trochObs", "plumb", "trochObs", "trochObs", "virLudHet", "virLud_trochObs", "virLudHet", "virLud_trochObs", "virLud", "virLud", "virLud", "virLudHet", "virLud", "virLudHet", "virLudHet", "virLud_trochObs", "virLud", "virLud", "trochObsHet", "trochObs", "trochObs", "trochObs", "trochObs", "trochObsHet", "trochObs", "trochObs", "trochObs", "trochObs", "virLud", "virLud", "virLud", "virLud", "virLud_trochObs", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "plumb", "plumb", "plumb", "plumb", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "plumb", "plumb", "virLud", "virLud", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "virLud_trochObs", "trochObs", "trochObs", "virLud", "virLud", "virLud_trochObs", 
"virLud", "trochObs", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "plumb", "plumb", "plumb", "plumb", "plumb", "virLud", "virLud", "virLud", "plumb", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "nit", "nit", "plumb", "plumb", "plumb", "plumb", "plumb", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "trochObs", "trochObs", "trochObs", "trochObs", "virLud_trochObs", "virLud_trochObs", "trochObs", "virLud_trochObs", "trochObs", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "trochObs", "virLud_trochObs", "trochObs", "virLud_trochObs", "trochObs", "trochObs", "virLud_trochObs", "trochObsHet", "trochObs", "virLud_trochObs", "virLud_trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "trochObs", "virLud_trochObs", "trochObs", "trochObs", "trochObs", "virLud_trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs"]

Look up the chr 3 individual membership in homozygous clusters, and calculate pi and Dxy

indClusterMembership_gw3 = ind_with_metadata_included.gw3_cluster

clusterNames_gw3 = ["virLud",
                    "nit",
                    "trochObs",
                    "plumb"]

# get boundaries of gw3 LHBR:

chr = "gw3"
positionMin_chr3_LHBR, positionMax_chr3_LHBR, regionText, 
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = 
                        getWindowedIndHetStanRegion(genosOnly_included, 
                                                    pos_SNP_filtered, 
                                                    highViSHetRegions, chr;
                                                    windowSize = 500)

# select the loci within the gw3 LHBR:
selection = (positionMin_chr3_LHBR .<= pos_chr3.position .<= positionMax_chr3_LHBR) 

geno_chr3_included_LHBR = geno_chr3_included[:, selection]

pos_chr3_LHBR = pos_chr3[selection, :]

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(geno_chr3_included_LHBR, indClusterMembership_gw3, clusterNames_gw3)

# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)

# Calculate pairwise Dxy per site, using allele frequencies in "freqs" and group names in "clusterNames_gw3"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames_gw3)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames_gw3; among=false)

# Now get averages of pi and Dxy for chr 3 LHBR:
regionPiTable = DataFrame(cluster = clusterNames_gw3, pi = getRegionPi(sitePi))
#= 4×2 DataFrame
 Row │ cluster   pi          
     │ String    Float64     
─────┼───────────────────────
   1 │ virLud    0.0012486
   2 │ nit       0.000697639
   3 │ trochObs  0.00136251
   4 │ plumb     0.00111764 =#

# average pi (for chr 3 LHBR) among three major groups:
(0.0012486 + 0.00136251 + 0.00111764) / 3
# 0.0012429166

regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 6×2 DataFrame
 Row │ cluster_pair     Dxy        
     │ String           Float64    
─────┼─────────────────────────────
   1 │ virLud_nit       0.0026694
   2 │ virLud_trochObs  0.00354486
   3 │ virLud_plumb     0.00398499
   4 │ nit_trochObs     0.00335481
   5 │ nit_plumb        0.00400019
   6 │ trochObs_plumb   0.00328016 =#

# average Dxy (for chr 3 LHBR) among three major groups:
(0.00354486 + 0.00398499 + 0.00328016) / 3
# 0.0036033366666666663

# Drawing phylogeny (in Illustrator) based on above, between three major groups.
# Lowest Dxy is between trochObs and plumb (0.00328016).
# Calculation of deepest split, an average of virLud diff with (trochObs, plumb):
(0.00354486 + 0.00398499) / 2
# 0.003764925
More than 1 region on that scaffold. Using just the longest one.
2×3 DataFrame
 Row │ regionChrom  regionStart  regionEnd
     │ String       Int64        Int64
─────┼─────────────────────────────────────
   1 │ gw3            101192949  103495514
   2 │ gw3            104554714  108279595
0.0037649249999999997
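
For readers unfamiliar with the two statistics, the following is a minimal sketch of how per-site pi and Dxy can be computed from allele frequencies at a biallelic SNP. The function names `site_pi_sketch` and `site_dxy_sketch` are hypothetical, and the formulas are the standard textbook ones; this is an assumption for illustration, not necessarily what `getSitePi` and `getDxy` do internally.

```julia
# Hypothetical helper names; these are NOT the functions used in the analysis above.

# Per-site nucleotide diversity within one group, given the frequency p of one
# allele and the number of sampled alleles n (2 per diploid individual).
# Standard unbiased estimator for a biallelic site (assumed here).
site_pi_sketch(p, n) = 2 * p * (1 - p) * n / (n - 1)

# Per-site distance between two groups: the probability that one allele drawn
# from each group differs.
site_dxy_sketch(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

# Toy example: allele frequency 0.25 in group 1 and 0.75 in group 2,
# each estimated from 8 sampled alleles.
pi1 = site_pi_sketch(0.25, 8)
dxy12 = site_dxy_sketch(0.25, 0.75)
println("pi (group 1) = ", pi1, ", Dxy = ", dxy12)
```

Under these assumptions, a region-level value would be an average of such per-site values across the region, which is the role played by `getRegionPi` and `getRegionDxy` above.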

Calculate pi and Dxy outside of the chr 3 LHBR (grouped by the LHBR homozygous groups)

# select the loci outside of the gw3 LHBR:
selection = .!(positionMin_chr3_LHBR .<= pos_chr3.position .<= positionMax_chr3_LHBR) 

geno_chr3_included_nonLHBR = geno_chr3_included[:, selection]

pos_chr3_nonLHBR = pos_chr3[selection, :]

# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(geno_chr3_included_nonLHBR, indClusterMembership_gw3, clusterNames_gw3)

# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)

# Calculate pairwise Dxy per site, using allele frequencies in "freqs" and group names in "clusterNames_gw3"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames_gw3)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames_gw3; among=false)

# Now get averages of pi and Dxy for chr 3 outside of the LHBR:
regionPiTable = DataFrame(cluster = clusterNames_gw3, pi = getRegionPi(sitePi))
#= 4×2 DataFrame
 Row │ cluster   pi         
     │ String    Float64    
─────┼──────────────────────
   1 │ virLud    0.00456714
   2 │ nit       0.00161186
   3 │ trochObs  0.00568622
   4 │ plumb     0.00554501 =#

# average pi (for chr 3 NOT in LHBR) among three major groups:
(0.00456714 + 0.00568622 + 0.00554501) / 3
# 0.005266123

# ratio of average pi outside to average pi within chr 3 LHBR:
0.005266123 / 0.0012429166
# 4.236907769998406

# percent by which average pi is lower within the LHBR than outside it:
100 * (4.2369078 - 1) / 4.2369078
# 76.39788149272448

# ratio of pi outside to within LHBR, for each major group in turn (virLud, trochObs, plumb):
0.00456714 / 0.0012486
# 3.6578087457952906

0.00568622 / 0.00136251
# 4.173341847032315

0.00554501 / 0.00111764
# 4.961356071722558

# average of the three per-group ratios above:
((0.00456714 / 0.0012486) + (0.00568622 / 0.00136251) + (0.00554501 / 0.00111764)) / 3
# 4.264168888183388


regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 6×2 DataFrame
 Row │ cluster_pair     Dxy        
     │ String           Float64    
─────┼─────────────────────────────
   1 │ virLud_nit       0.00439336
   2 │ virLud_trochObs  0.0059857
   3 │ virLud_plumb     0.00660034
   4 │ nit_trochObs     0.00548855
   5 │ nit_plumb        0.00608182
   6 │ trochObs_plumb   0.00652593 =#

# average Dxy (for OUTSIDE of chr 3 LHBR) among three major groups:
(0.0059857 + 0.00660034 + 0.00652593) / 3
# 0.006370656666666666

# ratio of average Dxy outside to average Dxy within LHBR among 3 major groups:
0.006370656666666666 / 0.0036033366666666663
# 1.76798818872508

# percent by which average Dxy is lower within the LHBR than outside it:
100 * (1.76798818872508 - 1) / 1.76798818872508
# 43.43853616346196

# Drawing phylogeny (in Illustrator) based on above, between three major groups.
# Lowest Dxy is between virLud and trochObs (0.0059857).
# Calculation of deepest split, an average of plumb diff with (virLud, trochObs):
(0.00660034 + 0.00652593) / 2
# 0.006563135

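# average, across the three pairs of major groups, of the ratio of Dxy within the LHBR to Dxy outside it:
((0.00354486 / 0.0059857) + (0.00328016 / 0.00652593) + (0.00398499 / 0.00660034)) / 3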
# 0.5662038652462132
0.5662038652462132
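
The averages, ratios, and percent reductions typed out by hand above can also be computed from vectors of the tabulated values, which avoids retyping rounded intermediate results. The following is a minimal sketch; the variable names and the helper `percent_lower` are hypothetical, and the numbers are copied from the tables above.

```julia
using Statistics  # for mean()

# Region-level values for the three major groups, copied from the tables above.
# Order for pi: virLud, trochObs, plumb.
# Order for Dxy: virLud_trochObs, virLud_plumb, trochObs_plumb.
pi_within   = [0.0012486, 0.00136251, 0.00111764]
pi_outside  = [0.00456714, 0.00568622, 0.00554501]
dxy_within  = [0.00354486, 0.00398499, 0.00328016]
dxy_outside = [0.0059857, 0.00660034, 0.00652593]

# Hypothetical helper: percent by which `within` is lower than `outside`
percent_lower(within, outside) = 100 * (1 - within / outside)

# Ratio of means (outside relative to within the LHBR)
ratio_pi  = mean(pi_outside) / mean(pi_within)
ratio_dxy = mean(dxy_outside) / mean(dxy_within)

pct_pi  = percent_lower(mean(pi_within), mean(pi_outside))
pct_dxy = percent_lower(mean(dxy_within), mean(dxy_outside))

# Mean of per-group (or per-pair) ratios, as opposed to the ratio of means
mean_ratio_pi  = mean(pi_outside ./ pi_within)
mean_ratio_dxy = mean(dxy_within ./ dxy_outside)

println("pi:  ratio = ", ratio_pi, ", percent lower within = ", pct_pi)
println("Dxy: ratio = ", ratio_dxy, ", percent lower within = ", pct_dxy)
```

Note that `100 * (1 - within / outside)` is algebraically the same as the `100 * (r - 1) / r` form used above, with `r` being the outside-to-within ratio.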

Really neat results above. More diversity at chr 3 LHBR than at 4A.

Make Supplemental GBI plots

Make list of scaffolds to plot:

scaffolds_to_plot = "gw" .* string.(vcat(28:-1:17, 15:-1:1))
push!(scaffolds_to_plot, "gw1A", "gw4A")  # add two other scaffolds
29-element Vector{String}:
 "gw28"
 "gw27"
 "gw26"
 "gw25"
 "gw24"
 "gw23"
 "gw22"
 "gw21"
 "gw20"
 "gw19"
 "gw18"
 "gw17"
 "gw15"
 ⋮
 "gw10"
 "gw9"
 "gw8"
 "gw7"
 "gw6"
 "gw5"
 "gw4"
 "gw3"
 "gw2"
 "gw1"
 "gw1A"
 "gw4A"

Do other setup:

groups_to_plot_all = ["vir","vir_S","nit", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN","troch_EM","obs","plumb_BJ","plumb","plumb_vir"]
group_colors_all = ["blue","turquoise1","grey","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow","gold","orange","pink","red","purple"];

groups = ["vir","troch_LN","plumb"] # for purpose of calculating pairwise Fst and Fst_group (to determine SNPs)   
plotGroups = groups_to_plot_all 
plotGroupColors = group_colors_all
group1 = "vir"   # these groups will determine the color used in the graph
group2 = "plumb"
groupsToCompare = ["vir_plumb", "vir_troch_LN", "troch_LN_plumb"]     # "Fst_among" #"vir_plumb" 
missingFractionAllowed = 0.2  # only show SNPs with less than this fraction of missing data among individuals

# Calculate allele freqs and sample sizes (use column Fst_group)
freqs, sampleSizes = getFreqsAndSampleSizes(genosOnly_included, ind_with_metadata_indFiltered.Fst_group, groups)
println("Calculated population allele frequencies and sample sizes")

Fst, FstNumerator, FstDenominator, pairwiseNamesFst = getFst(freqs, sampleSizes, groups; among=true)  # set among to false if no among-group Fst is wanted (some things won't work without it)
println("Calculated Fst values")
Calculated population allele frequencies and sample sizes
Calculated Fst values

Loop through scaffolds and make plots, adding a line for the LHBRs:

(This cell is set to inactive because the plots were already made.)

# for autosomes, Fst > 0.8
Fst_cutoff = 0.8
for i in 1:length(scaffolds_to_plot)
    chr = scaffolds_to_plot[i]
    regionInfo = chooseChrRegion(pos_SNP_filtered, chr; positionMin=1, positionMax=scaffold_lengths[chr]) # this gets the maximum position for the chromosome
    # get info for lines for LHBRs
    highViSHetRegions_thisScaffold = highViSHetRegions[highViSHetRegions.regionChrom .== chr, :]

    plotInfo = plotGenotypeByIndividualWithFst(groupsToCompare, Fst_cutoff, 
        missingFractionAllowed, regionInfo,
        pos_SNP_filtered, Fst, pairwiseNamesFst, 
        genosOnly_included, ind_with_metadata_indFiltered, freqs, 
        plotGroups, plotGroupColors;
        indFontSize=4, figureSize=(1200,1600), plotTitle = "",
        highlightRegionStarts = highViSHetRegions_thisScaffold.regionStart,
        highlightRegionEnds = highViSHetRegions_thisScaffold.regionEnd,
        highlightRegionColor = "magenta")

    println("Completed the figure for ", chr, ".")
    if false  # set to true to save plot
        filename = string("Figure_", chr, "_Fst3groupsGBIallInds_Fst",Fst_cutoff,"_fromJulia.png")
        save(filename, plotInfo[1], px_per_unit = 2.0)
        println("Saved ", filename)
    end 
end

Now for Z chromosome, with Fst > 0.9

scaffolds_to_plot = ["gwZ"]; Fst_cutoff = 0.9
for i in 1:length(scaffolds_to_plot)
    chr = scaffolds_to_plot[i]
    regionInfo = chooseChrRegion(pos_SNP_filtered, chr; positionMin=1, positionMax=scaffold_lengths[chr]) # this gets the maximum position for the chromosome
    # get info for lines for LHBRs
    highViSHetRegions_thisScaffold = highViSHetRegions[highViSHetRegions.regionChrom .== chr, :]

    plotInfo = plotGenotypeByIndividualWithFst(groupsToCompare, Fst_cutoff, 
        missingFractionAllowed, regionInfo,
        pos_SNP_filtered, Fst, pairwiseNamesFst, 
        genosOnly_included, ind_with_metadata_indFiltered, freqs, 
        plotGroups, plotGroupColors;
        indFontSize=4, figureSize=(1200,1600), plotTitle = "",
        highlightRegionStarts = highViSHetRegions_thisScaffold.regionStart,
        highlightRegionEnds = highViSHetRegions_thisScaffold.regionEnd,
        highlightRegionColor = "magenta")

    println("Completed the figure for ", chr, ".")
    if false  # set to true to save plot
        filename = string("Figure_", chr, "_Fst3groupsGBIallInds_fromJulia.png")
        save(filename, plotInfo[1], px_per_unit = 2.0)
        println("Saved ", filename)
    end 
end
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238

Completed the figure for gwZ.
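
The region handling used on this page (looking up the high-ViSHet regions on one scaffold, keeping only the longest when there is more than one, and selecting loci inside or outside that region) can be illustrated with plain vectors standing in for the DataFrame columns. This is a minimal sketch under stated assumptions: the two gw3 rows are copied from the output shown earlier, while the `gwX` row and the toy `positions` vector are invented purely for illustration, and the "longest region" rule is implemented here with `argmax`, which may differ from how `getWindowedIndHetStanRegion` does it.

```julia
# Toy region table as parallel vectors (gw3 rows copied from the output above;
# the "gwX" row is invented for illustration only).
regionChrom = ["gw3", "gw3", "gwX"]
regionStart = [101192949, 104554714, 1000]
regionEnd   = [103495514, 108279595, 2000]

chr = "gw3"
onChr = regionChrom .== chr              # rows belonging to this scaffold
starts = regionStart[onChr]
ends = regionEnd[onChr]

# Keep only the longest region on the scaffold
longest = argmax(ends .- starts)
posMin, posMax = starts[longest], ends[longest]

# Complementary masks for loci inside and outside the region (toy positions)
positions = [100000000, 104554714, 106000000, 108279595, 109000000]
inside  = posMin .<= positions .<= posMax
outside = .!inside

println("Longest region on ", chr, ": ", posMin, " to ", posMax)
println("Loci inside: ", count(inside), ", outside: ", count(outside))
```

Because `outside` is defined as the element-wise negation of `inside`, every locus falls into exactly one of the two sets, which is what makes the within-versus-outside comparisons of pi and Dxy partition the chromosome cleanly.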