using JLD2 # for loading saved data
using DataFrames # for storing data as type DataFrame
using CairoMakie # for plots
using Impute # for imputing missing genotypes
using Statistics # for var() function
using MultivariateStats # for getting variances from PCA model
using CSV # for reading in delimited files
using DelimitedFiles # for reading delimited files (the genotypic data)
Greenish Warbler heterozygosity variance analysis
This page shows the code used to conduct the analysis of haploblocks in the Greenish Warbler ring species, using the ViSHet (Variance in Standardized Heterozygosity) statistic.
Prior to examining the code on this page, readers should look at GreenishWarblerGenomics2025.qmd (or .html) and GW_Zchromosome_analysis.qmd (or .html), as this page depends on the code on those pages being run first.
Citation
The scripts, data, and figures shown in this website were used as the basis for the paper listed below, which should be cited as the source of information from this website:
Irwin, D., S. Bensch, C. Charlebois, G. David, A. Geraldes, S.K. Gupta, B. Harr, P. Holt, J.H. Irwin, V.V. Ivanitskii, I.M. Marova, Y. Niu, S. Seneviratne, A. Singh, Y. Wu, S. Zhang, T.D. Price. 2025. The distribution and dispersal of large haploblocks in a superspecies. Molecular Ecology, in press.
A note about plots in this document
The plots shown below may differ somewhat in appearance between the version produced by Quarto (i.e., in this published document) and the version you would get if you run this code without using Quarto. In particular, the dimensions and font sizes of labels and titles may differ. If you want versions identical to those used in the paper, run the code directly in the Julia REPL (or in an environment such as VS Code) without using Quarto.
In the rendered (.html) version of this Quarto notebook, each figure may be accompanied by a warning caused by an interaction between Quarto and the Makie plotting package. Ignore these warnings as they do not affect the calculations or plots.
Load packages
Load my custom package GenomicDiversity:
using GenomicDiversity
Choose working directory
Adjust as appropriate for your computer:
dataDirectory = "/Users/darrenirwin/Dropbox/Darren's current work/"
cd(dataDirectory)
Load the filtered dataset
This dataset was produced through filtering in GreenishWarblerGenomics2025.qmd:
baseName = "GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome"
tagName = ".Jan2025."
filename = string(baseName, tagName, "ind_SNP_ind_filtered.jld2")
# load info into a dictionary:
d = load(filename)
if baseName != d["baseName"]
    println("WARNING: baseNames don't match between that defined above and in the saved file")
end
if tagName != d["tagName"]
    println("WARNING: tagNames don't match between that defined above and in the saved file")
end
GW_GenoData_indFiltered = d["GW_GenoData_indFiltered"]
repoDirectory = d["repoDirectory"]
dataDirectory = d["dataDirectory"]
scaffold_info = d["scaffold_info"]
scaffold_lengths = d["scaffold_lengths"]
filenameTextMiddle = d["filenameTextMiddle"]
missingGenotypeThreshold = d["missingGenotypeThreshold"]
filenameTextEnd = d["filenameTextEnd"]
chromosomes_to_process = d["chromosomes_to_process"]
metadataFile = d["metadataFile"]
println("Loaded the filtered data.")
Loaded the filtered data.
Also define the correctNames() function, as in the main script, to correct some names:
function correctNames(metadataColumn)
    metadataColumn_corrected = replace(metadataColumn, "GW_Armando_plate1_TTGW05_rep2" => "GW_Armando_plate1_TTGW05r2",
"GW_Lane5_NA3-3ul" => "GW_Lane5_NA3",
"GW_Armando_plate1_TTGW_15_05" => "GW_Armando_plate1_TTGW-15-05",
"GW_Armando_plate1_TTGW_15_07" => "GW_Armando_plate1_TTGW-15-07",
"GW_Armando_plate1_TTGW_15_08" => "GW_Armando_plate1_TTGW-15-08",
"GW_Armando_plate1_TTGW_15_09" => "GW_Armando_plate1_TTGW-15-09",
"GW_Armando_plate1_TTGW_15_01" => "GW_Armando_plate1_TTGW-15-01",
"GW_Armando_plate1_TTGW_15_02" => "GW_Armando_plate1_TTGW-15-02",
"GW_Armando_plate1_TTGW_15_03" => "GW_Armando_plate1_TTGW-15-03",
"GW_Armando_plate1_TTGW_15_04" => "GW_Armando_plate1_TTGW-15-04",
"GW_Armando_plate1_TTGW_15_06" => "GW_Armando_plate1_TTGW-15-06",
"GW_Armando_plate1_TTGW_15_10" => "GW_Armando_plate1_TTGW-15-10",
"GW_Armando_plate2_TTGW_15_01" => "GW_Armando_plate2_TTGW-15-01",
"GW_Armando_plate2_TTGW_15_02" => "GW_Armando_plate2_TTGW-15-02",
"GW_Armando_plate2_TTGW_15_03" => "GW_Armando_plate2_TTGW-15-03",
"GW_Armando_plate2_TTGW_15_04" => "GW_Armando_plate2_TTGW-15-04",
"GW_Armando_plate2_TTGW_15_06" => "GW_Armando_plate2_TTGW-15-06",
"GW_Armando_plate2_TTGW_15_10" => "GW_Armando_plate2_TTGW-15-10")
end
correctNames (generic function with 1 method)
Replace the Z chromosome SNPs with the filtered Z chromosome SNPs
# remove the Z SNPs from the big dataset loaded above:
selection = (GW_GenoData_indFiltered.positions.chrom .!= "gwZ")
GW_GenoData_indFiltered.positions = GW_GenoData_indFiltered.positions[selection, :]
GW_GenoData_indFiltered.genotypes = GW_GenoData_indFiltered.genotypes[:, selection]

# load and add the Z filtered SNPs:
filename = string(baseName, tagName, "chrgwZ_cleaned.notImputed.jld2")
genosOnly_chrgwZ_cleaned = load(filename, "genotypes_gwZ_SNPfiltered")
ind_with_metadata_indFiltered_chrgwZ_cleaned = load(filename, "ind_with_metadata_indFiltered")
pos_SNP_filtered_chrgwZ_cleaned = load(filename, "pos_SNP_filtered_region")

if GW_GenoData_indFiltered.indInfo.ind != ind_with_metadata_indFiltered_chrgwZ_cleaned.ind
    println("Warning: the list of individuals in the big file and Z file are not completely identical.")
end
GW_GenoData_indFiltered.positions = vcat(GW_GenoData_indFiltered.positions, pos_SNP_filtered_chrgwZ_cleaned)
GW_GenoData_indFiltered.genotypes = hcat(GW_GenoData_indFiltered.genotypes, genosOnly_chrgwZ_cleaned)
println("Replaced the Z chromosome data with the filtered Z data.")
# copy the sex column so we can use later:
GW_GenoData_indFiltered.indInfo.sex = ind_with_metadata_indFiltered_chrgwZ_cleaned.sex;
Replaced the Z chromosome data with the filtered Z data.
Adjust sample plotting order
This sets the genotype-by-individual plots to arrange sample sites according to ring_km, and sets the nitidus samples to -2500 km and the Siberian hybrid to 5000 km (just for plotting):
GW_GenoData_indFiltered.indInfo.original_plot_order = GW_GenoData_indFiltered.indInfo.plot_order
GW_GenoData_indFiltered.indInfo.plot_order = GW_GenoData_indFiltered.indInfo.ring_km
GW_GenoData_indFiltered.indInfo.plot_order[GW_GenoData_indFiltered.indInfo.Fst_group .== "nit"] .= -2500
GW_GenoData_indFiltered.indInfo.plot_order[GW_GenoData_indFiltered.indInfo.Fst_group .== "plumb_vir"] .= 5000;
Prepare data for Genotype-by-individual plots and PCA
For missing genotypes, change our code of -1 to missing (a special data type meaning missing data, for the later imputation step):
GW_GenoData_indFiltered.genotypes = Matrix{Union{Missing, Int16}}(GW_GenoData_indFiltered.genotypes)
GW_GenoData_indFiltered.genotypes = replace(GW_GenoData_indFiltered.genotypes, -1 => missing)
257×1015750 Matrix{Union{Missing, Int16}}:
0 0 0 0 0 1 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
⋮ ⋮ ⋱ ⋮ ⋮
0 0 0 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 missing 0 0 … 0 0 0 2 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 … 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
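The recoding step above can be tried on a small matrix. This is a minimal sketch using made-up genotypes (the names `genotypes` and `genotypes_with_missing` are illustrative only, not objects from the analysis); it assumes the same coding as above, with -1 marking a missing call.

```julia
# Toy genotype matrix: rows are individuals, columns are SNPs.
# Codes 0, 1, 2 are genotype calls; -1 marks a missing call.
genotypes = Int16[0 1 -1; 2 -1 0]

# Widen the element type so the matrix can hold `missing`, then recode -1.
genotypes_with_missing = Matrix{Union{Missing, Int16}}(genotypes)
genotypes_with_missing = replace(genotypes_with_missing, -1 => missing)

# Count missing calls per individual (one count per row).
missing_per_individual = vec(sum(ismissing.(genotypes_with_missing), dims = 2))
println(missing_per_individual)
```

Widening the element type first keeps the genotype codes as Int16 while allowing `missing` entries, which is the form the later imputation step expects.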
Make list of scaffolds to plot:
scaffolds_to_plot = "gw" .* string.(vcat(1, "1A", 2:4, "4A", 5:15, 17:28, "Z"))
30-element Vector{String}:
"gw1"
"gw1A"
"gw2"
"gw3"
"gw4"
"gw4A"
"gw5"
"gw6"
"gw7"
"gw8"
"gw9"
"gw10"
"gw11"
⋮
"gw18"
"gw19"
"gw20"
"gw21"
"gw22"
"gw23"
"gw24"
"gw25"
"gw26"
"gw27"
"gw28"
"gwZ"
Determine number of SNPs in the chromosomes above
sum(map(in(scaffolds_to_plot), GW_GenoData_indFiltered.positions.chrom))
1003924
This reports that 1003924 SNPs are within the listed chromosomes. (Good because that matches the number as determined in GW_PCAplots.qmd)
Choose groups and colors
groups_to_plot_PCA = ["vir","vir_S","nit", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN","troch_EM","obs","plumb_BJ","plumb","plumb_vir"]
group_colors_PCA = ["blue","turquoise1","grey","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow","gold","orange","pink","red","purple"];
Show windowed heterozygosity for individuals (for one example scaffold)
# option to select a subset of individuals
filterGroups = false # false means include all individuals

groupsToInclude = ["lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west"]
numIndsToPlot = [1000, 1000, 1000, 1000, 1000, 1000]

if filterGroups
    genosOnly_included, ind_with_metadata_included = limitIndsToPlot(groupsToInclude, numIndsToPlot, GW_GenoData_indFiltered.genotypes, GW_GenoData_indFiltered.indInfo)
else
    genosOnly_included = GW_GenoData_indFiltered.genotypes
    ind_with_metadata_included = GW_GenoData_indFiltered.indInfo
end
chr = "gw15"
windowSize = 500
loci_selection = (GW_GenoData_indFiltered.positions.chrom .== chr)
pos_region = GW_GenoData_indFiltered.positions[loci_selection, :]
genotypes_region = genosOnly_included[:, loci_selection]

windowedPos, windowedIndHet = getWindowedIndHet(genotypes_region, pos_region, windowSize)

plotTitle = string("Windowed heterozygosity of ", size(windowedIndHet, 1), " individuals")
titleSize = 24
xLabelText = string("Location on scaffold ", chr)
yLabelText = "Heterozygosity"
labelSize = 24
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title=plotTitle, titlesize=titleSize,
    xlabel=xLabelText, xlabelsize=labelSize,
    ylabel=yLabelText, ylabelsize=labelSize)
lines!(windowedPos, windowedIndHet[1, :])
for i in 2:size(windowedIndHet, 1)
    lines!(windowedPos, windowedIndHet[i, :])
end
meanPerWindow_windowedIndHet = sum(windowedIndHet, dims=1) ./ size(windowedIndHet, 1)
lines!(windowedPos, vec(meanPerWindow_windowedIndHet), linewidth=10, color=:red)
display(f)
windowedIndHet_standardized = standardizeIndHet(windowedIndHet)

plotTitle = string("Standardized heterozygosity of ", size(windowedIndHet, 1), " individuals")
titleSize = 24
xLabelText = string("Location on scaffold ", chr)
yLabelText = "Standardized heterozygosity"
labelSize = 24
g = CairoMakie.Figure()
ax = Axis(g[1, 1],
    title=plotTitle, titlesize=titleSize,
    xlabel=xLabelText, xlabelsize=labelSize,
    ylabel=yLabelText, ylabelsize=labelSize)
for i in 1:size(windowedIndHet_standardized, 1)
    lines!(windowedPos, windowedIndHet_standardized[i, :])
end
display(g)
# Now graph the variance in standardized heterozygosity
windowedViSHet = getWindowedViSHet(windowedIndHet_standardized)

plotTitle = string("Var. in Stand. Het. (ViSHet) among ", size(windowedIndHet, 1), " individuals")
titleSize = 18
xLabelText = string("Location on scaffold ", chr)
yLabelText = "ViSHet"
labelSize = 18
h = CairoMakie.Figure()
ax = Axis(h[1, 1],
    title=plotTitle, titlesize=titleSize,
    xlabel=xLabelText, xlabelsize=labelSize,
    ylabel=yLabelText, ylabelsize=labelSize)
lines!(windowedPos, windowedViSHet)
display(h)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
Make ViSHet plot for whole genome:
(ViSHet is Variance in Standardized Heterozygosity.)
# option to select a subset of individuals
filterGroups = false # false means include all individuals

groupsToInclude = ["vir"]
numIndsToPlot = [1000]

if filterGroups
    genosOnly_included, ind_with_metadata_included = limitIndsToPlot(groupsToInclude, numIndsToPlot, GW_GenoData_indFiltered.genotypes, GW_GenoData_indFiltered.indInfo)
else
    genosOnly_included = GW_GenoData_indFiltered.genotypes
    ind_with_metadata_included = GW_GenoData_indFiltered.indInfo
end
scaffolds_for_ViSHet = scaffolds_to_plot
#initialize data structures
windowed_pos_all = DataFrame(chrom = String[], mean_position = Float64[], first_position = Int[], last_position = Int[])
windowed_ViSHet_all = Vector{Float32}(undef, 0)
for chrom in scaffolds_for_ViSHet
    regionText = string("chr", chrom)
    loci_selection = (GW_GenoData_indFiltered.positions.chrom .== chrom)
    pos_region = GW_GenoData_indFiltered.positions[loci_selection, :]
    genotypes_region = GW_GenoData_indFiltered.genotypes[:, loci_selection]
    if chrom == "gwZ" #include only males (because females have only one Z)
        genotypes_region_males = genotypes_region[GW_GenoData_indFiltered.indInfo.sex .== "M", :]
        windowedPos, windowedIndHet = getWindowedIndHet(genotypes_region_males, pos_region, windowSize)
    else # for all other chromosomes, include all individuals
        windowedPos, windowedIndHet = getWindowedIndHet(genotypes_region, pos_region, windowSize)
    end
    windowBoundaries = getWindowBoundaries(pos_region.position, windowSize)
    windowedIndHet_standardized = standardizeIndHet(windowedIndHet)
    windowed_ViSHet_scaffold = getWindowedViSHet(windowedIndHet_standardized)
    windowed_pos_chrom = DataFrame(chrom = repeat([chrom], length(windowedPos)), mean_position = windowedPos, first_position = windowBoundaries[:,1], last_position = windowBoundaries[:,2])
    windowed_pos_all = vcat(windowed_pos_all, windowed_pos_chrom)
    windowed_ViSHet_all = [windowed_ViSHet_all; windowed_ViSHet_scaffold]
end
Identify “haploblock regions” as those that have high ViSHet
threshold_ViSHet = 0.4
selection = windowed_ViSHet_all .>= threshold_ViSHet
windowed_pos_all.high_ViSHet = selection # adds true/false column to dataframe indicating high ViSHet windows

# Make list of contiguous high ViSHet region:
highViSHetRegions = DataFrame(regionChrom = String[], regionStart = Int[], regionEnd = Int[])
i = 1
lastWindow = nrow(windowed_pos_all)
while i <= lastWindow # eachindex(windowed_pos_all[:,1])
    if windowed_pos_all.high_ViSHet[i] == true
        regionChrom = windowed_pos_all.chrom[i]
        regionStart = windowed_pos_all.first_position[i]
        regionEnd = windowed_pos_all.last_position[i]
        # check whether contiguous with next
        next = 1
        while i + next <= lastWindow && windowed_pos_all.chrom[i + next] == regionChrom
            if windowed_pos_all.high_ViSHet[i + next] == true
                regionEnd = windowed_pos_all.last_position[i + next]
                next += 1
            else
                break
            end
        end
        highViSHetRegions = push!(highViSHetRegions, [regionChrom, regionStart, regionEnd])
        i = i + next + 1
    else
        i = i + 1
    end
end
highViSHetRegions
Row | regionChrom | regionStart | regionEnd |
---|---|---|---|
 | String | Int64 | Int64 |
1 | gw1 | 15689747 | 23478124 |
2 | gw1A | 4674 | 3771263 |
3 | gw1A | 23592559 | 30616953 |
4 | gw2 | 54537375 | 59262130 |
5 | gw2 | 60234161 | 61533451 |
6 | gw3 | 101192949 | 103495514 |
7 | gw3 | 104554714 | 108279595 |
8 | gw4 | 5295912 | 5438270 |
9 | gw4 | 14837641 | 16117455 |
10 | gw4 | 20930552 | 23610800 |
11 | gw4A | 379058 | 730094 |
12 | gw5 | 10095304 | 10956815 |
13 | gw6 | 34584054 | 35259663 |
⋮ | ⋮ | ⋮ | ⋮ |
28 | gw19 | 43362 | 1006242 |
29 | gw20 | 27354 | 721651 |
30 | gw20 | 5852254 | 6671670 |
31 | gw21 | 3275121 | 3731689 |
32 | gw22 | 5214430 | 5775824 |
33 | gw23 | 4135459 | 4774426 |
34 | gw24 | 3468239 | 4001782 |
35 | gw25 | 5185626 | 5473966 |
36 | gw26 | 4153299 | 5549635 |
37 | gw27 | 41621 | 541081 |
38 | gw28 | 1822776 | 2522648 |
39 | gwZ | 68372986 | 73749599 |
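The merging of adjacent high-ViSHet windows into regions can be illustrated on a toy example. The sketch below is not the notebook's code: it uses plain vectors instead of a DataFrame, the function name `mergeHighWindows` and all values are hypothetical, and it resumes scanning at the window that ended a run rather than one past it.

```julia
# Toy input: one entry per window, in genome order (hypothetical values).
chroms     = ["c1", "c1", "c1", "c1", "c2", "c2"]
firstPos   = [1, 101, 201, 301, 1, 101]
lastPos    = [100, 200, 300, 400, 100, 200]
highViSHet = [false, true, true, false, true, false]

# Merge runs of adjacent high windows that lie on the same chromosome.
function mergeHighWindows(chroms, firstPos, lastPos, high)
    regions = NamedTuple{(:chrom, :start, :stop), Tuple{String, Int, Int}}[]
    i = 1
    n = length(high)
    while i <= n
        if high[i]
            regionEnd = lastPos[i]
            j = i + 1
            # extend the region while the next window is high and on the same chromosome
            while j <= n && high[j] && chroms[j] == chroms[i]
                regionEnd = lastPos[j]
                j += 1
            end
            push!(regions, (chrom = chroms[i], start = firstPos[i], stop = regionEnd))
            i = j  # resume at the window that ended the run
        else
            i += 1
        end
    end
    return regions
end

regions = mergeHighWindows(chroms, firstPos, lastPos, highViSHet)
println(regions)
```

Here windows 2 and 3 on "c1" merge into one region, and the single high window on "c2" forms its own region because regions never span chromosomes.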
Determine fraction of genome that has high ViSHet
# get total length of high ViSHet regions
sum_highViSHetRegions = sum(highViSHetRegions.regionEnd .- highViSHetRegions.regionStart)

# get total lengths of all scaffolds
sum_scaffold_lengths = 0
for scaffold_name in scaffolds_to_plot
    println(scaffold_name)
    sum_scaffold_lengths += scaffold_lengths[scaffold_name]
end
sum_scaffold_lengths
# calculate percent of genome in high ViSHet regions
percentGenomeHighViSHet = 100 * sum_highViSHetRegions / sum_scaffold_lengths
println("The percent of the genome in high ViSHet regions is $percentGenomeHighViSHet")
gw1
gw1A
gw2
gw3
gw4
gw4A
gw5
gw6
gw7
gw8
gw9
gw10
gw11
gw12
gw13
gw14
gw15
gw17
gw18
gw19
gw20
gw21
gw22
gw23
gw24
gw25
gw26
gw27
gw28
gwZ
The percent of the genome in high ViSHet regions is 5.799030328440733
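The arithmetic behind this percentage can be checked on a toy example. The numbers and names below (`toyScaffoldLengths`, `percentHigh`) are hypothetical and chosen only to make the calculation easy to follow; as in the code above, region length is taken as end minus start.

```julia
# Hypothetical region boundaries and scaffold lengths (not the real data).
regionStart = [1_000, 50_000]
regionEnd   = [11_000, 90_000]
toyScaffoldLengths = Dict("s1" => 400_000, "s2" => 600_000)

totalRegionLength = sum(regionEnd .- regionStart)    # 10_000 + 40_000
totalGenomeLength = sum(values(toyScaffoldLengths))  # 400_000 + 600_000
percentHigh = 100 * totalRegionLength / totalGenomeLength
println("Percent of toy genome in high regions: ", percentHigh)
```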
Plot LHBRs (Large Haploblock Regions) for whole genome
fig2 = plotGenomeViSHet(scaffolds_to_plot,
    windowed_ViSHet_all,
    windowed_pos_all;
    fillColor = "purple",
    lineTransparency = 0.8,
    fillTransparency = 0.2,
    figureSize=(1200, 1200),
    plotRegions = true,
    regionsToPlot = highViSHetRegions,
    regionColor = "magenta")
if false # set to true to save plot
    filename = "Figure2_ViSHet_allGenome_fromJulia.png"
    save(filename, fig2, px_per_unit = 2.0)
    println("Saved ", filename)
end
[["gw1", "gw4A", "gw6"], ["gw1A", "gw4", "gw9"], ["gw2", "gw8"], ["gw3", "gw5"], ["gw7", "gw10", "gw11", "gw12", "gw13", "gw14"], ["gw15", "gw17", "gw18", "gw19", "gw20", "gw21", "gw22", "gw23", "gw24", "gw25"], ["gw26", "gw27", "gw28", "gwZ"]]
Note: from this point on will not use GenoData objects (although that would be more concise code)
The paper was accepted and I need to get this final version posted. The code below uses the metadata, genotype matrix, and loci positions as separate data objects. This works fine, as the functions are built to use either form of data input.
pos_SNP_filtered = GW_GenoData_indFiltered.positions
genosOnly_included = GW_GenoData_indFiltered.genotypes
ind_with_metadata_indFiltered = GW_GenoData_indFiltered.indInfo;
Make a PCA based on all regions not in LHBRs
First remove the LHBR regions
# cycle through the LHBRs, determining the SNPs within each to remove from the dataset:
lociToRemove = fill(false, nrow(pos_SNP_filtered))
for i in eachrow(highViSHetRegions)
    lociWithinThisLHBR = (pos_SNP_filtered.chrom .== i.regionChrom) .&&
        (pos_SNP_filtered.position .>= i.regionStart) .&&
        (pos_SNP_filtered.position .<= i.regionEnd)
    lociToRemove = lociToRemove .|| lociWithinThisLHBR
end
# now actually remove them:
genosOnly_included_nonLHBR = genosOnly_included[:, .!lociToRemove]
pos_SNP_filtered_nonLHBR = pos_SNP_filtered[.!lociToRemove, :]
num_removed = size(pos_SNP_filtered, 1) - size(pos_SNP_filtered_nonLHBR, 1)
println("Removed $num_removed loci for the non-LHBR PCA.")
Removed 46500 loci for the non-LHBR PCA.
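The masking logic can be seen more easily on a handful of loci. The following is a minimal sketch with made-up positions and regions (all names and values are hypothetical); it uses plain vectors and tuples rather than the DataFrames used in the analysis, and treats region boundaries as inclusive, as the code above does.

```julia
# Toy SNP positions (hypothetical values), one entry per locus.
snpChrom    = ["c1", "c1", "c1", "c2", "c2"]
snpPosition = [50, 150, 350, 20, 500]

# Toy genotype matrix: 2 individuals (rows) by 5 loci (columns).
toyGenotypes = [0 1 2 0 1; 1 1 0 2 0]

# Toy regions to exclude: (chromosome, start, end), boundaries inclusive.
regionsToExclude = [("c1", 100, 300), ("c2", 1, 100)]

# Build one Boolean mask by OR-ing together the loci inside each region.
lociToRemove = falses(length(snpPosition))
for (chrom, regionStart, regionEnd) in regionsToExclude
    withinRegion = (snpChrom .== chrom) .& (snpPosition .>= regionStart) .& (snpPosition .<= regionEnd)
    lociToRemove .|= withinRegion
end

# Apply the inverted mask to positions and to genotype columns.
keptPositions = snpPosition[.!lociToRemove]
keptGenotypes = toyGenotypes[:, .!lociToRemove]
println("Removed ", count(lociToRemove), " of ", length(lociToRemove), " loci.")
```

Applying the same mask to both the positions and the genotype columns keeps the two objects aligned, which the later per-chromosome selection relies on.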
Make list of scaffolds to include in the whole-genome non-LHBR PCA:
chromosomes_to_include = "gw" .* string.(vcat(28:-1:17, 15:-1:1))
push!(chromosomes_to_include, "gw1A", "gw4A", "gwZ") # add three other scaffolds
30-element Vector{String}:
"gw28"
"gw27"
"gw26"
"gw25"
"gw24"
"gw23"
"gw22"
"gw21"
"gw20"
"gw19"
"gw18"
"gw17"
"gw15"
⋮
"gw9"
"gw8"
"gw7"
"gw6"
"gw5"
"gw4"
"gw3"
"gw2"
"gw1"
"gw1A"
"gw4A"
"gwZ"
Imputation using KNN
Did this on 11Jan2025 and saved files, so inactivated this cell for now (can take up to several minutes for each big scaffold):
for i in eachindex(chromosomes_to_include)
    chrom = chromosomes_to_include[i]
    regionText = string("chr", chrom, "nonLHBR") # this is where "nonLHBR" will be incorporated in file name
    loci_selection = (pos_SNP_filtered_nonLHBR.chrom .== chrom)
    pos_SNP_filtered_nonLHBR_region = pos_SNP_filtered_nonLHBR[loci_selection,:]
    genosOnly_nonLHBR_region_for_imputing = Matrix{Union{Missing, Float32}}(genosOnly_included_nonLHBR[:,loci_selection])
    @time imputed_genos = Impute.knn(genosOnly_nonLHBR_region_for_imputing; dims = :rows)
    filename = string(baseName, tagName, regionText, ".KNNimputedMissing.jld2")
    jldsave(filename; imputed_genos, ind_with_metadata_indFiltered, pos_SNP_filtered_nonLHBR_region)
    println(string(regionText, ": Saved real and imputed genotypes for non_LHBR parts of genome, for ", size(pos_SNP_filtered_nonLHBR_region, 1)," SNPs and ", size(genosOnly_nonLHBR_region_for_imputing, 1)," filtered individuals, in file $filename"))
end
Load saved imputed data for each chromosome (the non-LHBR part):
# initialize data structures
genos_imputed_loaded = Matrix{Union{Missing, Float32}}(undef, nrow(ind_with_metadata_indFiltered), 0)
pos_SNP_loaded = DataFrame(chrom = String[], position = Int64[])
for i in eachindex(chromosomes_to_include)
    chrom = chromosomes_to_include[i]
    regionText = string("chr", chrom, "nonLHBR")
    filename = string(baseName, tagName, regionText, ".KNNimputedMissing.jld2")
    imputed_genos_one_chr = load(filename, "imputed_genos")
    genos_imputed_loaded = hcat(genos_imputed_loaded, imputed_genos_one_chr)
    if ind_with_metadata_indFiltered.ind != load(filename, "ind_with_metadata_indFiltered")[:, :ind]
        println("""Warning: "ind" columns in loaded data and memory data don't match.""")
    end
    pos_SNP_filtered_region = load(filename, "pos_SNP_filtered_nonLHBR_region")
    pos_SNP_loaded = vcat(pos_SNP_loaded, pos_SNP_filtered_region)
    println(string("Loaded ",filename))
    println(string(regionText, ": ", size(imputed_genos_one_chr,2), " SNPs from ", size(imputed_genos_one_chr,1), " individuals"))
end
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw28nonLHBR.KNNimputedMissing.jld2
chrgw28nonLHBR: 10180 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw27nonLHBR.KNNimputedMissing.jld2
chrgw27nonLHBR: 9184 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw26nonLHBR.KNNimputedMissing.jld2
chrgw26nonLHBR: 11803 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw25nonLHBR.KNNimputedMissing.jld2
chrgw25nonLHBR: 3294 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw24nonLHBR.KNNimputedMissing.jld2
chrgw24nonLHBR: 13321 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw23nonLHBR.KNNimputedMissing.jld2
chrgw23nonLHBR: 12949 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw22nonLHBR.KNNimputedMissing.jld2
chrgw22nonLHBR: 4973 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw21nonLHBR.KNNimputedMissing.jld2
chrgw21nonLHBR: 12821 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw20nonLHBR.KNNimputedMissing.jld2
chrgw20nonLHBR: 30239 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw19nonLHBR.KNNimputedMissing.jld2
chrgw19nonLHBR: 23914 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw18nonLHBR.KNNimputedMissing.jld2
chrgw18nonLHBR: 17359 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw17nonLHBR.KNNimputedMissing.jld2
chrgw17nonLHBR: 24313 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw15nonLHBR.KNNimputedMissing.jld2
chrgw15nonLHBR: 25517 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw14nonLHBR.KNNimputedMissing.jld2
chrgw14nonLHBR: 28469 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw13nonLHBR.KNNimputedMissing.jld2
chrgw13nonLHBR: 30543 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw12nonLHBR.KNNimputedMissing.jld2
chrgw12nonLHBR: 31794 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw11nonLHBR.KNNimputedMissing.jld2
chrgw11nonLHBR: 27183 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw10nonLHBR.KNNimputedMissing.jld2
chrgw10nonLHBR: 26462 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw9nonLHBR.KNNimputedMissing.jld2
chrgw9nonLHBR: 37680 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw8nonLHBR.KNNimputedMissing.jld2
chrgw8nonLHBR: 37318 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw7nonLHBR.KNNimputedMissing.jld2
chrgw7nonLHBR: 35575 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw6nonLHBR.KNNimputedMissing.jld2
chrgw6nonLHBR: 39675 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw5nonLHBR.KNNimputedMissing.jld2
chrgw5nonLHBR: 54829 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw4nonLHBR.KNNimputedMissing.jld2
chrgw4nonLHBR: 47980 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw3nonLHBR.KNNimputedMissing.jld2
chrgw3nonLHBR: 79372 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw2nonLHBR.KNNimputedMissing.jld2
chrgw2nonLHBR: 91292 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw1nonLHBR.KNNimputedMissing.jld2
chrgw1nonLHBR: 77362 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw1AnonLHBR.KNNimputedMissing.jld2
chrgw1AnonLHBR: 44551 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgw4AnonLHBR.KNNimputedMissing.jld2
chrgw4AnonLHBR: 17467 SNPs from 257 individuals
Loaded GW_genomics_2022_with_new_genome/GW2022_GBS_012NA_files/GW2022_all4plates.genotypes.SNPs_only.whole_genome.Jan2025.chrgwZnonLHBR.KNNimputedMissing.jld2
chrgwZnonLHBR: 50005 SNPs from 257 individuals
Now do the PCA for non-LHBR parts of genome:
flipPC1 = true
flipPC2 = true
PCA_wholeGenome = plotPCA(genos_imputed_loaded, ind_with_metadata_indFiltered,
    groups_to_plot_PCA, group_colors_PCA;
    sampleSet = "greenish warblers", regionText = "non-LHBR wholeGenome",
    flip1 = flipPC1, flip2 = flipPC2,
    lineOpacity = 0.7, fillOpacity = 0.6,
    symbolSize = 14, showTitle = false)
totalObservationVariance = var(PCA_wholeGenome.model)
PC1_variance, PC2_variance = principalvars(PCA_wholeGenome.model)[1:2]
PC1_prop_variance = PC1_variance / totalObservationVariance
PC2_prop_variance = PC2_variance / totalObservationVariance
println("PC1 explains ", 100*PC1_prop_variance, "% of the total variance.
PC2 explains ", 100*PC2_prop_variance, "%.")
PC1 explains 10.945476% of the total variance.
PC2 explains 5.529751%.
The above looks quite similar to the whole-genome PCA including the LHBRs, but the two axes explain a little less of the overall variation. The overall conclusion: There is a lot of signal for geographic structure, both outside and inside of the LHBRs.
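The proportion-of-variance calculation above divides each principal variance by the total variance of the observations. A minimal sketch of the same quantity, computed directly from a toy data matrix with Julia's standard libraries rather than with MultivariateStats, is shown below. The matrix `X` and all variable names are made up for illustration, and the sketch assumes that `var(model)` and `principalvars(model)` correspond to the total variance and the per-component variances, which is how the code above uses them.

```julia
using LinearAlgebra   # eigvals, Symmetric
using Statistics      # cov, var

# Toy data: 4 observations (rows) by 3 variables (columns).
X = [2.0 0.0 1.0;
     0.0 1.0 3.0;
     1.0 2.0 0.0;
     3.0 1.0 2.0]

# Principal variances are the eigenvalues of the covariance matrix, largest first.
principalVariances = sort(eigvals(Symmetric(cov(X))), rev = true)

# Total variance is the sum of the per-variable variances.
totalVariance = sum(var(X, dims = 1))

# Proportion of total variance explained by each principal component.
propVariance = principalVariances ./ totalVariance
println("PC1 explains ", 100 * propVariance[1], "% of the total variance.")
println("PC2 explains ", 100 * propVariance[2], "%.")
```

Because the eigenvalues of a covariance matrix sum to its trace, the proportions over all components sum to 1, so the percentages for PC1 and PC2 can be read as shares of the total.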
Do non-LHBR PCA for a specific chromosome:
selection = pos_SNP_loaded.chrom .== "gw4A"
pos_SNP_loaded_oneChr = pos_SNP_loaded[selection, :]
genos_imputed_loaded_oneChr = genos_imputed_loaded[:, selection]
flipPC1 = true
flipPC2 = true
PCA_oneChr = plotPCA(genos_imputed_loaded_oneChr, ind_with_metadata_indFiltered,
    groups_to_plot_PCA, group_colors_PCA;
    sampleSet = "greenish warblers", regionText = "non-LHBR gw4A",
    flip1 = flipPC1, flip2 = flipPC2,
    lineOpacity = 0.7, fillOpacity = 0.6,
    symbolSize = 14, showTitle = false)
totalObservationVariance = var(PCA_oneChr.model)
PC1_variance, PC2_variance = principalvars(PCA_oneChr.model)[1:2]
PC1_prop_variance = PC1_variance / totalObservationVariance
PC2_prop_variance = PC2_variance / totalObservationVariance
println("PC1 explains ", 100*PC1_prop_variance, "% of the total variance.
PC2 explains ", 100*PC2_prop_variance, "%.")
PC1 explains 7.5557256% of the total variance.
PC2 explains 4.751407%.
There is substantial signal of population structure in the non-LHBR regions, which is perhaps not surprising given that there are high-Fst regions with low ViSHet.
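The "PC1 explains ...% of the total variance" figures above are ratios of each principal component's variance to the total variance. Here is a minimal sketch of that arithmetic using only the Julia standard library and a made-up toy matrix. It assumes a plain PCA on column-centered data and is not the `plotPCA` or MultivariateStats code itself.

```julia
using LinearAlgebra, Statistics

# Toy genotype-like matrix (made-up values): rows are individuals, columns are loci.
X = Float64[0 1 2 0;
            1 1 2 0;
            2 0 0 1;
            2 1 0 2;
            0 2 1 1]

# Center each locus (column) so the SVD describes variation around the mean.
Xc = X .- mean(X, dims = 1)

# Squared singular values, scaled by (n - 1), are the variances along each PC.
singularValues = svdvals(Xc)
pcVariances = singularValues .^ 2 ./ (size(X, 1) - 1)

# Total variance is the sum of the per-locus variances.
totalVariance = sum(var(X, dims = 1))

# Proportion of the total variance explained by each PC.
propVariance = pcVariances ./ totalVariance
println("PC1 explains ", 100 * propVariance[1], "% of the total variance.")
println("PC2 explains ", 100 * propVariance[2], "%.")
```

Flipping the sign of a PC (as `flipPC1` and `flipPC2` do for plotting) changes only the orientation of the axis, so these proportions are unaffected.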
Choose low-het individuals in a high ViSHet region
Now that the high ViSHet regions have been identified, we can automate the choice of essentially homozygous individuals in those regions. Below I will do this for chrZ (and later for other chromosomes).
First, here are two functions: the first gets the boundaries of the longest high ViSHet region on a scaffold, and the second gathers a set of information about that region:
function getOneHighViSHetRegion(highViSHetRegions, chr)
    selection = (highViSHetRegions.regionChrom .== chr)
    if sum(selection) == 1
        println("Good news: 1 region on that scaffold")
        positionMin = highViSHetRegions.regionStart[selection][1]
        positionMax = highViSHetRegions.regionEnd[selection][1]
        regionText = string("chr ", chr, " ",positionMin," to ",positionMax)
    elseif sum(selection) > 1
        println("More than 1 region on that scaffold. Using just the longest one.")
        highViSHetRegions_chr = highViSHetRegions[selection, :]
        display(highViSHetRegions_chr)
        # get biggest region (first one if tied):
        regionSizes = highViSHetRegions_chr.regionEnd .- highViSHetRegions_chr.regionStart
        indexOfLongest = findfirst(regionSizes .== maximum(regionSizes))
        positionMin = highViSHetRegions_chr.regionStart[indexOfLongest]
        positionMax = highViSHetRegions_chr.regionEnd[indexOfLongest]
        regionText = string("chr ", chr, " ",positionMin," to ",positionMax)
    elseif sum(selection) == 0
        println("No high ViSHet regions on that scaffold")
        return
    end
    return positionMin, positionMax, regionText
end
function getWindowedIndHetStanRegion(genos, pos,
                        highViSHetRegions, chr;
                        windowSize = 500)
    # remake the windowedIndHet_standardized (done above in a different cell)
    loci_selection = (pos.chrom .== chr)
    pos_region = pos[loci_selection, :]
    genotypes_region = genos[:, loci_selection]
    windowedPos, windowedIndHet = getWindowedIndHet(genotypes_region, pos_region, windowSize)
    windowedIndHet_standardized = standardizeIndHet(windowedIndHet)
    # look up the boundaries of the high ViSHet region:
    positionMin, positionMax, regionText = getOneHighViSHetRegion(highViSHetRegions, chr)
    # choose just the windows that are in the high ViSHet region:
    window_selection = (positionMin .< windowedPos .< positionMax)
    windowedIndHetStanRegion = windowedIndHet_standardized[:,window_selection]
    meanAcrossRegionIndHetStan = mean.(eachrow(windowedIndHetStanRegion))
    # choose loci in region
    lociSelection = (positionMin .< pos_region.position .< positionMax)
    pos_highViSHetRegion = pos_region[lociSelection, :]
    genos_highViSHetRegion = genotypes_region[:, lociSelection]
    # convert `-1` genotypes (which indicates missing) to `missing`:
    replace!(genos_highViSHetRegion, -1 => missing)
    regionInfo = chooseChrRegion(pos_highViSHetRegion, chr; positionMin=positionMin, positionMax=positionMax) # this makes appropriate text describing the region
    return positionMin, positionMax, regionText,
            windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
            genos_highViSHetRegion, pos_highViSHetRegion, regionInfo
end
getWindowedIndHetStanRegion (generic function with 1 method)
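To make the logic of the two functions above concrete, here is a minimal sketch on made-up numbers, using only the standard library. The region positions, window positions, and heterozygosity values are all hypothetical, and the standardization step (`standardizeIndHet`) is assumed to have been applied already. The threshold of 2 matches the one used further down this page.

```julia
using Statistics

# Hypothetical high ViSHet regions on one scaffold (positions are made up).
regionStart = [1_000, 50_000, 90_000]
regionEnd   = [9_000, 80_000, 120_000]

# `argmax` returns the first index of the maximum, so ties go to the earliest region.
regionSizes = regionEnd .- regionStart
indexOfLongest = argmax(regionSizes)
positionMin = regionStart[indexOfLongest]
positionMax = regionEnd[indexOfLongest]

# Hypothetical window positions and standardized heterozygosity
# (rows are individuals, columns are windows).
windowedPos = [5_000, 55_000, 65_000, 75_000, 100_000]
windowedIndHetStan = [0.1 0.2 0.4 0.3 1.5;
                      0.9 3.0 2.5 3.5 0.2;
                      1.2 0.0 0.1 0.2 0.8]

# Keep only the windows strictly inside the chosen region, then average per individual.
windowSelection = positionMin .< windowedPos .< positionMax
meanPerIndividual = vec(mean(windowedIndHetStan[:, windowSelection], dims = 2))

# Individuals whose regional mean is below the threshold count as low heterozygosity.
lowHet = meanPerIndividual .< 2
```

In this toy example the second and third regions are tied in length, so the second one is used, and only the second individual is excluded as heterozygous in that region.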
Now do a PCA for just one LHBR (this time on the Z chromosome)
The below will show a PCA for all individuals, and another for just the low-heterozygosity individuals (for this LHBR)
# choose scaffold
chr = "gwZ"

positionMin, positionMax, regionText,
windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
genos_highViSHetRegion, pos_highViSHetRegion, regionInfo =
        getWindowedIndHetStanRegion(genosOnly_included,
                        pos_SNP_filtered,
                        highViSHetRegions, chr;
                        windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)
# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan
#names(ind_with_metadata_included)
# check whether missing data related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)
# PCA of all individuals:
genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))

flipPC1 = true
flipPC2 = false

PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
    groups_to_plot_PCA, group_colors_PCA;
    sampleSet = "greenish warblers", regionText = regionText,
    flip1 = flipPC1, flip2 = flipPC2,
    lineOpacity = 0.7, fillOpacity = 0.6,
    symbolSize = 14, showTitle = true,
    xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
    showPlot = false)

display(PCAmodelAll.PCAfig)
# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
# also flip PC3:
PCAmodelAll.metadata.PC3 = -1 .* PCAmodelAll.values[3,:]
# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`
# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 2)
# Plot only the lowIndHetStan individuals, PC1 to PC2:
fig_3A = CairoMakie.Figure()
ax = Axis(fig_3A[1, 1],
    title = "gwZ LHBR PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection],
        marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14,
        strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(fig_3A)
if false # set to true to save plot
    save("Figure3A_from_Julia.png", fig_3A, px_per_unit = 2.0)
end
Good news: 1 region on that scaffold
Now plot PC1 vs. PC3 for the low-heterozygosity individuals
# Plot only the lowIndHetStan individuals, PC1 to PC3:
fig_3B = CairoMakie.Figure()
ax = Axis(fig_3B[1, 1],
    title = "gwZ LHBR PC1 vs. PC3, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection],
        marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14,
        strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(fig_3B)
if false # set to true to save plot
    save("Figure3B_from_Julia.png", fig_3B, px_per_unit = 2.0)
end
Save the individual colors in the metadata
indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
Plot PC1 vs. PC2:
fig_3C = CairoMakie.Figure()
ax = Axis(fig_3C[1, 1],
    title = "gwZ LHBR PC1 vs. PC2, all individuals",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection],
        marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14,
        strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(fig_3C)
if false # set to true to save plot
    save("Figure3C_from_Julia.png", fig_3C, px_per_unit = 2.0)
end
Plot PC1 vs. PC3:
fig_3D = CairoMakie.Figure()
ax = Axis(fig_3D[1, 1],
    title = "gwZ LHBR PC1 vs. PC3, all individuals",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection],
        marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14,
        strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(fig_3D)
if false # set to true to save plot
    save("Figure3D_from_Julia.png", fig_3D, px_per_unit = 2.0)
end
Assign genotype groups based on PCA and heterozygosity
The PCA plots above show six distinct haplogroups in the Z-chromosome high ViSHet region. Divide the samples into those groups based on PCA scores, then calculate pi and Dxy. Then make a genotype-by-individual plot of the homozygous LHBR individuals, with colors on the left side indicating the genotype groups.
# Inspect Z chromosome PCA of low IndHet (< 2) individuals,
# and specify group boundaries:
clusterNames = ["vir",
                "nit",
                "lud",
                "troch",
                "obs",
                "plumb"]
clusterColors = ["blue",
                "grey",
                "seagreen4",
                "yellow",
                "orange",
                "red"]
vir = (PCAmodelAll.metadata.PC1 .< -4.5) .&
        (3 .< PCAmodelAll.metadata.PC2) .&
        (PCAmodelAll.metadata.PC3 .> 2) .&
        indSelection_lowIndHetStan
nit = (-4 .< PCAmodelAll.metadata.PC1 .< -2) .&
        (1 .< PCAmodelAll.metadata.PC2 .< 3) .&
        indSelection_lowIndHetStan
lud = (PCAmodelAll.metadata.PC1 .< -4.5) .&
        (3 .< PCAmodelAll.metadata.PC2) .&
        (PCAmodelAll.metadata.PC3 .< -2) .&
        indSelection_lowIndHetStan
troch = (PCAmodelAll.metadata.PC2 .< -5) .&
        indSelection_lowIndHetStan
obs = (-2 .< PCAmodelAll.metadata.PC1 .< 2) .&
        (-3 .< PCAmodelAll.metadata.PC2 .< -1) .&
        indSelection_lowIndHetStan
plumb = (5 .< PCAmodelAll.metadata.PC1) .&
        indSelection_lowIndHetStan
# check the individuals in each group
PCAmodelAll.metadata.Fst_group[vir]
# note there are two nitidus with nearly identical values
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[lud]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]
clusterArray = [vir nit lud troch obs plumb]

# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")
# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end
"""
getFreqsAndSampleSizesBySexForZ(genoData, sex, indGroup, groupsToCalc)
Calculate allele frequencies and sample sizes for each group and SNP, taking into account sex for analysis of Z chromosome.
# Arguments
- `genoData`: The genotype matrix, where rows are individuals and columns are loci, with genotype codes 0,1,2 meaning homozygous reference, heterozygote, homozygous alternate, and missing genotypes can be either -1 or `missing`.
- `sex`: Vector of sexes (`F` or `M`)
- `indGroup`: A vector providing the group name each individual belongs to.
- `groupsToCalc`: A list of group names to include in calculations.
# Notes
Returns a tuple containing 1) a matrix of frequencies, and 2) a matrix of sample sizes (in both, rows are groups and columns are loci).
"""
function getFreqsAndSampleSizesBySexForZ(genoData, sex, indGroup, groupsToCalc)
    if any(.!map(x -> x in ["F", "M"], sex))
        println("Warning: not all entries in sex vector are `F` or `M`")
    end
    genoData[ismissing.(genoData)] .= -1 # if "missing" datatype is used, convert to -1
    groupCount = length(groupsToCalc)
    freqs = Array{Float32,2}(undef, groupCount, size(genoData, 2))
    sampleSizes = Array{Number,2}(undef, groupCount, size(genoData, 2))
    for i in 1:groupCount
        # females:
        selection = (indGroup .== groupsToCalc[i]) .& (sex .== "F") # gets the correct rows for individuals in the group
        geno0countsF = sum(genoData[selection, :] .== 0, dims=1) # count by column the number of 0 genotypes (homozygous ref)
        geno1countsF = sum(genoData[selection, :] .== 1, dims=1) # same for 1 genotypes (heterozygous)
        geno2countsF = sum(genoData[selection, :] .== 2, dims=1) # same for 2 genotypes (homozygous alternate)

        # males:
        selection = (indGroup .== groupsToCalc[i]) .& (sex .== "M")
        geno0countsM = sum(genoData[selection, :] .== 0, dims=1)
        geno1countsM = sum(genoData[selection, :] .== 1, dims=1)
        geno2countsM = sum(genoData[selection, :] .== 2, dims=1)
        # males contribute two Z alleles each; females contribute one
        allele0counts = (2 .* geno0countsM) .+ geno1countsM .+ geno0countsF .+ (0.5 .* geno1countsF)
        allele2counts = (2 .* geno2countsM) .+ geno1countsM .+ geno2countsF .+ (0.5 .* geno1countsF)
        sumAlleleCounts = allele0counts .+ allele2counts
        sampleSizes[i, :] = 0.5 .* sumAlleleCounts # sample size in number of individuals
        freqs[i, :] = allele2counts ./ sumAlleleCounts
    end
    return freqs, sampleSizes
end
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizesBySexForZ(genos_highViSHetRegion, ind_with_metadata_included.sex, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)

# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)

Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false) # set among to FALSE if no among Fst wanted (some things won't work without it)

# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 6×2 DataFrame
Row │ cluster pi
│ String Float64
─────┼─────────────────────
1 │ vir 0.00706945
2 │ nit 0.0035094
3 │ lud 0.00794828
4 │ troch 0.00968743
5 │ obs 0.0112686
6 │ plumb 0.0104236 =#
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#=
15×2 DataFrame
Row │ cluster_pair Dxy
│ String Float64
─────┼─────────────────────────
1 │ vir_nit 0.0388124
2 │ vir_lud 0.0173121
3 │ vir_troch 0.0410345
4 │ vir_obs 0.037509
5 │ vir_plumb 0.0466843
6 │ nit_lud 0.0365198
7 │ nit_troch 0.0464045
8 │ nit_obs 0.0427296
9 │ nit_plumb 0.0524538
10 │ lud_troch 0.038968
11 │ lud_obs 0.0354195
12 │ lud_plumb 0.0445131
13 │ troch_obs 0.0265615
14 │ troch_plumb 0.0404278
15 │ obs_plumb 0.0365139 =#
# It seems the distances are not very consistent with a bifurcating tree,
# nor 1-D isolation by distance, but something more complex.
# Obscuratus is closer to viridanus than troch is.
# Nitidus quite distant but gets put in centre of PCA because off on its own axis.
# Make a genotype-by-individual plot using all variable loci in the region,
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder

# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]

# limit the number of individuals per group to plot
numIndsToPlot = [15, 15, 15, 15, 15, 15]

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot,
                                genos_selectedSNPs, PCAmodelAll.metadata;
                                sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
        genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
        missingFractionAllowed = missingFractionAllowed,
        titleFontSize = 20,
        indColorRightProvided = true);
The numbers in each group are [37 2 38 80 5 73] and the sum of those is 235
Calculated population allele frequencies and sample sizes
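The sex-aware allele counting in `getFreqsAndSampleSizesBySexForZ` can be checked by hand on a tiny example. Here is a minimal sketch with made-up genotypes, followed by standard textbook per-site estimators of pi and Dxy. The helper names `zAlleleFreqsSketch`, `sitePiSketch`, and `siteDxySketch` are hypothetical, and the `getSitePi` and `getDxy` functions used above may differ in detail (for example, in how they correct for sample size).

```julia
# Toy data: 4 individuals (rows) x 3 loci (columns); -1 marks a missing genotype.
genosToy = [0 2  1;
            2 2 -1;
            0 2  1;
            2 0  1]
sexToy = ["M", "M", "F", "F"]

# Alternate-allele frequency and number of sampled Z chromosomes per locus.
# Males carry two Z copies and females one, so a genotype code g (0, 1, or 2)
# contributes g alternate alleles for a male and g/2 for a female.
function zAlleleFreqsSketch(genos, sex)
    nLoci = size(genos, 2)
    altCounts = zeros(Float64, nLoci)
    totalCounts = zeros(Float64, nLoci)
    for ind in axes(genos, 1), locus in 1:nLoci
        g = genos[ind, locus]
        g == -1 && continue                 # skip missing genotypes
        copies = sex[ind] == "M" ? 2.0 : 1.0
        altCounts[locus] += copies * g / 2
        totalCounts[locus] += copies
    end
    return altCounts ./ totalCounts, totalCounts
end

# Textbook estimators (an assumption, not necessarily the package's formulas):
# pi with the n/(n-1) correction, where n is the number of sampled chromosomes,
# and Dxy as the chance that one allele drawn from each group differs.
sitePiSketch(p, n) = 2 .* p .* (1 .- p) .* n ./ (n .- 1)
siteDxySketch(p1, p2) = p1 .* (1 .- p2) .+ p2 .* (1 .- p1)

freqsAll, nAll = zAlleleFreqsSketch(genosToy, sexToy)
freqsA, nA = zAlleleFreqsSketch(genosToy[1:2, :], sexToy[1:2])   # group A: the two males
freqsB, nB = zAlleleFreqsSketch(genosToy[3:4, :], sexToy[3:4])   # group B: the two females

piA = sitePiSketch(freqsA, nA)
dxyAB = siteDxySketch(freqsA, freqsB)
```

At the first locus the two males contribute 0 and 2 alternate alleles and the two females 0 and 1, giving 3 alternate alleles out of 6 sampled Z chromosomes, so the frequency is 0.5.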
Now show a GBI plot like above, but with heterozygotes included:
clusterNamesWithHets = ["vir",
                "vir_lud",
                "nit",
                "lud",
                "lud_troch",
                "troch",
                "obs",
                "plumb"]
clusterColorsWithHets = ["blue",
                "lightseagreen",
                "grey",
                "green",
                "yellowgreen",
                "yellow",
                "orange",
                "red"]
vir_lud = (PCAmodelAll.metadata.PC1 .< -4.5) .&
        (3 .< PCAmodelAll.metadata.PC2) .&
        (-1 .< PCAmodelAll.metadata.PC3 .< 0) # Note this one has lowIndHetStan but is mix of vir and lud
lud_troch = (-5 .< PCAmodelAll.metadata.PC1 .< -1) .&
        (-3 .< PCAmodelAll.metadata.PC2 .< 1) .&
        .!indSelection_lowIndHetStan

clusterArray = [vir vir_lud nit lud lud_troch troch obs plumb]
sum(clusterArray, dims=1)
if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
println("Good news: Individuals included in a group matches total number of individuals")
else
println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end
# check which individuals left out:
sum(clusterArray, dims=2)
PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]
# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end
# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets
# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
                                genos_selectedSNPs, PCAmodelAll.metadata;
                                sortByMissing = true)

fig_4 = plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
        genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
        missingFractionAllowed = missingFractionAllowed,
        indFontSize=8, figureSize=(1200, 1200),
        indColorLeftProvided = false,
        indColorRightProvided = true);

if false # set to true to save plot
    save("Figure4_from_Julia.png", fig_4[1], px_per_unit = 2.0)
end
Good news: Individuals included in a group matches total number of individuals
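The check above compares the total number of cluster assignments with the number of individuals. That total would also match if one individual fell into two clusters while another fell into none, so a per-individual check is a useful complement. Here is a minimal sketch with made-up boolean masks (the masks and `clusterNamesSketch` are hypothetical):

```julia
# Hypothetical boolean masks over five individuals (one mask per cluster).
virMask   = [true,  false, false, false, false]
ludMask   = [false, true,  true,  false, false]
trochMask = [false, false, false, true,  false]
clusterNamesSketch = ["vir", "lud", "troch"]

# Individuals in rows, clusters in columns.
clusterArraySketch = [virMask ludMask trochMask]

# Number of clusters each individual was assigned to.
perIndividual = vec(sum(clusterArraySketch, dims = 2))
unassigned = findall(perIndividual .== 0)   # individuals in no cluster
ambiguous  = findall(perIndividual .> 1)    # individuals in more than one cluster

# Label each individual by cluster; anyone left unassigned keeps the label "none".
membership = fill("none", size(clusterArraySketch, 1))
for i in axes(clusterArraySketch, 2)
    membership[clusterArraySketch[:, i]] .= clusterNamesSketch[i]
end
```

Here the fifth individual is unassigned and nobody is ambiguous. If the PCA boundaries overlapped, `ambiguous` would list the affected individuals, and the labeling loop would silently give them the name of the last matching cluster.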
Show GBI plot according to original groups and plot order
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
                                genos_selectedSNPs, PCAmodelAll.metadata;
                                sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
        genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
        missingFractionAllowed = missingFractionAllowed,
        indColorLeftProvided = false,
        indColorRightProvided = true);
Show same but with all individuals
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order

# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
                                genos_selectedSNPs, PCAmodelAll.metadata;
                                sortByMissing = true)

plotInfo = plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
        genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
        missingFractionAllowed = missingFractionAllowed,
        indFontSize=6, figureSize=(1200, 1600),
        indColorLeftProvided = false,
        indColorRightProvided = true);

if false # set to true to save plot
    save("FigureS1_from_Julia.png", plotInfo[1], px_per_unit = 2.0)
end
Show just the west area (without nitidus)
clusterNamesWithHetsWest = ["vir",
                "lud",
                "lud_troch",
                "troch",]
clusterColorsWithHetsWest = ["blue",
                "green",
                "yellowgreen",
                "yellow"]
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizesBySexForZ(genos_selectedSNPs, ind_with_metadata_included.sex, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = [100, 100, 100, 100]

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets,
                                genos_selectedSNPs2, PCAmodelAll.metadata;
                                sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
        genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
        missingFractionAllowed = missingFractionAllowed,
        indColorLeftProvided = false,
        indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
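The SNP filter used in these cells keeps a locus only when the alternate allele is the majority allele in at least one group and the minority allele in at least one other. Here is a minimal sketch on a made-up frequency matrix (`freqsSketch` is hypothetical):

```julia
# Hypothetical allele frequencies: rows are groups, columns are SNPs.
freqsSketch = [0.9 0.1 0.6 0.4 0.5;
               0.2 0.3 0.7 0.5 0.5;
               0.4 0.0 0.9 1.0 1.0]

# Keep a SNP only if some group is above 0.5 and some group is below 0.5.
keep = (vec(maximum(freqsSketch, dims = 1)) .> 0.5) .&
       (vec(minimum(freqsSketch, dims = 1)) .< 0.5)

# Subset the columns, as done above for the genotype and frequency matrices.
freqsKept = freqsSketch[:, keep]
```

Only the first and fourth SNPs pass. The second never exceeds 0.5, the third never drops below 0.5, and the fifth fails because both comparisons are strict, so a minimum of exactly 0.5 does not count as below 0.5.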
Show just the east area
clusterNamesWithHetsEast = ["troch",
                "obs",
                "plumb"]
clusterColorsWithHetsEast = ["yellow",
                "orange",
                "red"]
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizesBySexForZ(genos_selectedSNPs, ind_with_metadata_included.sex, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast,
                                genos_selectedSNPs2, PCAmodelAll.metadata;
                                sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
        genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
        missingFractionAllowed = missingFractionAllowed,
        indColorLeftProvided = false,
        indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
Show just the northern area
clusterNamesWithHetsNorth = ["vir",
                "plumb"]
clusterColorsWithHetsNorth = ["blue",
                "red"]
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]

numIndsToPlotWithHets = [100, 100, 100]

genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets,
                                genos_selectedSNPs2, PCAmodelAll.metadata;
                                sortByMissing = true)

plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
        genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
        missingFractionAllowed = missingFractionAllowed,
        indColorLeftProvided = false,
        indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
Do a PCA based on above info, on just the west side of the ring
groups_to_plot_PCA_westside = ["vir","vir_S","nit", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN"]
group_colors_PCA_westside = ["blue","turquoise1","grey","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow"]

# without nitidus:
groups_to_plot_PCA_westside = ["vir","vir_S", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN"]
group_colors_PCA_westside = ["blue","turquoise1","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow"]

PCAmodel = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
    groups_to_plot_PCA_westside, group_colors_PCA_westside;
    sampleSet = "greenish warblers (west side)", regionText = regionText,
    flip1 = flipPC1, flip2 = flipPC2,
    lineOpacity = 0.7, fillOpacity = 0.6,
    symbolSize = 14, showTitle = true,
    xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
    showPlot = false)

display(PCAmodel.PCAfig)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
Do a PCA based on above info, on just the east side of the ring
groups_to_plot_PCA_eastside = ["troch_LN","troch_EM","obs","plumb_BJ","plumb","plumb_vir"]
group_colors_PCA_eastside = ["yellow","gold","orange","pink","red","purple"];
PCAmodel = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
            groups_to_plot_PCA_eastside, group_colors_PCA_eastside;
            sampleSet = "greenish warblers (east side)", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)
display(PCAmodel.PCAfig)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
Show PCA for just the western haploblocks of the ring, for just chr Z
ind_with_metadata_included_temp = copy(ind_with_metadata_included)
# Leave out the individuals that don't have western haplogroup genotypes
hapGroups_to_plot_PCA_westside = ["vir","vir_lud", "nit", "lud"]
selection = map(in(hapGroups_to_plot_PCA_westside), clusterMembershipWithHets)
ind_with_metadata_included_temp.Fst_group[.!selection] .= "ignore" # write over the group name so function below won't plot that individual
groups_to_plot_PCA_westside = ["vir","vir_S","nit", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN"]
group_colors_PCA_westside = ["blue","turquoise1","grey","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow"]
flipPC1 = true
flipPC2 = true
PCAmodelHapWest = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included_temp,
            groups_to_plot_PCA_westside, group_colors_PCA_westside;
            sampleSet = "greenish warblers west haps", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)
display(PCAmodelHapWest.PCAfig)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
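The masking step above appears to rely on `plotPCA` skipping any individual whose `Fst_group` label is not in the list of groups to plot (an assumption taken from the comment in the code, not from the function's source). A minimal sketch of the same idea on plain vectors, using made-up labels and hypothetical variable names:

```julia
# Hypothetical toy data: one haplogroup label and one population label per individual
hapGroup = ["vir", "troch", "lud", "plumb", "vir_lud"]
popGroup = ["vir", "troch_LN", "lud_KS", "plumb", "lud_PK"]

keepHapGroups = ["vir", "vir_lud", "nit", "lud"]

# `in(collection)` returns a predicate, so `map` yields one Bool per individual
selection = map(in(keepHapGroups), hapGroup)

# Work on a copy so the original labels stay intact
popGroupMasked = copy(popGroup)
popGroupMasked[.!selection] .= "ignore"

println(popGroupMasked)
```

Working on a copy, as the notebook does with `ind_with_metadata_included_temp`, keeps the original group labels available for later chunks.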
Show PCA for just the eastern haploblocks of the ring, for just chr Z
ind_with_metadata_included_temp = copy(ind_with_metadata_included)
# Leave out the individuals that don't have eastern haplogroup genotypes
hapGroups_to_plot_PCA_eastside = ["troch", "obs", "plumb"]
selection = map(in(hapGroups_to_plot_PCA_eastside), clusterMembershipWithHets)
ind_with_metadata_included_temp.Fst_group[.!selection] .= "ignore" # write over the group name so function below won't plot that individual
groups_to_plot_PCA_eastside = ["troch_LN","troch_EM","obs","plumb","plumb_vir","plumb_BJ"]
group_colors_PCA_eastside = ["yellow","gold","orange","red","purple","pink"];
flipPC1 = true
flipPC2 = true
PCAmodelHapEast = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included_temp,
            groups_to_plot_PCA_eastside, group_colors_PCA_eastside;
            sampleSet = "greenish warblers east haps", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)
display(PCAmodelHapEast.PCAfig)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
Do a PCA based on a same-size region elsewhere on the Z (with low ViSHet):
# get length of region
lengthHighViSHetRegion = positionMax - positionMin
leftLocus = 1_000_000 # start at 1 Mb from left side
rightLocus = leftLocus + lengthHighViSHetRegion
regionText_lowViSHetRegion = string("chr ", chr, " ",leftLocus," to ",rightLocus)
lociSelection = (leftLocus .<= pos_region.position .<= rightLocus)
genotypes_lowViSHetRegion = genotypes_region[:, lociSelection]
# impute missing genotypes:
genotypes_lowViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genotypes_lowViSHetRegion))
flipPC1 = true
flipPC2 = true
PCAmodel = plotPCA(genotypes_lowViSHetRegion_imputed, ind_with_metadata_included,
            groups_to_plot_PCA, group_colors_PCA;
            sampleSet = "greenish warblers", regionText = regionText_lowViSHetRegion,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)
display(PCAmodel.PCAfig)
if false # set to true to save plot
save("FigureS2A_gwZ_nonHLBRarbitrary_from_Julia.png", PCAmodel.PCAfig, px_per_unit = 2.0)
end
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
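The comparison region above is defined purely by physical length: a window as long as the high-ViSHet region, placed at an arbitrary start. A small sketch of that selection on made-up positions (all values and the name `regionLength` are hypothetical):

```julia
# Hypothetical SNP positions (bp) along one chromosome
positions = [200_000, 900_000, 1_000_000, 1_400_000, 2_100_000, 2_500_000, 3_200_000]

# Hypothetical bounds of the high-ViSHet region
positionMin = 5_000_000
positionMax = 6_500_000
regionLength = positionMax - positionMin

leftLocus = 1_000_000                  # arbitrary start, 1 Mb from the left end
rightLocus = leftLocus + regionLength  # same physical length as the high-ViSHet region

# The chained comparison broadcasts element-wise and gives one Bool per locus
lociSelection = leftLocus .<= positions .<= rightLocus
println(positions[lociSelection])
```

Matching on length does not guarantee a matching number of SNPs, so it may be worth comparing `sum(lociSelection)` with the SNP count of the high-ViSHet region, and confirming that the two windows do not overlap.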
Do the same as above, but for chr 15:
# choose scaffold
chr = "gw15"
positionMin, positionMax, regionText,
windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = getWindowedIndHetStanRegion(genosOnly_included,
            pos_SNP_filtered,
            highViSHetRegions, chr;
            windowSize = 500)
# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)
# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan
#names(ind_with_metadata_included)
# check whether missing data related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)
# PCA of all individuals:
genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))
flipPC1 = true
flipPC2 = true
PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
            groups_to_plot_PCA, group_colors_PCA;
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)
display(PCAmodelAll.PCAfig)
# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]
# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`
# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.75)
#Plot only the lowIndHetStan individuals:
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Good news: 1 region on that scaffold
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
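The `flipPC1` and `flipPC2` switches above only change the sign of a PC axis. The sign of a principal component is arbitrary, so flipping it mirrors the plot without changing distances between individuals. A minimal sketch on a made-up genotype matrix, computing the PCA by SVD with the standard library rather than through `plotPCA` (note that rows are individuals here, whereas `PCAmodelAll.values` above is indexed as PC by individual):

```julia
using LinearAlgebra, Statistics

# Hypothetical genotype matrix: 4 individuals (rows) × 3 SNPs (columns)
genos = Float64[0 0 2; 0 1 2; 2 1 0; 2 2 0]
centered = genos .- mean(genos, dims=1)

# PCA via SVD: scores = U * Diagonal(S); rows = individuals, columns = PCs
F = svd(centered)
scores = F.U * Diagonal(F.S)

flipPC1 = true
PC1 = flipPC1 ? -1 .* scores[:, 1] : scores[:, 1]
PC2 = scores[:, 2]

# Distance between individuals 1 and 3 in the PC1-PC2 plane, before and after the flip
distBefore = hypot(scores[1, 1] - scores[3, 1], scores[1, 2] - scores[3, 2])
distAfter = hypot(PC1[1] - PC1[3], PC2[1] - PC2[3])
println(distBefore ≈ distAfter)
```

Because the thresholds used later to define clusters are written in terms of the flipped scores, the flip settings need to stay the same between the PCA call and the cluster definitions.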
Save the individual colors in the metadata
indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
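The loop above looks up each individual's colour by finding its group in `groups_to_plot_PCA`. If a group is missing from that list, `findfirst` returns `nothing` and the indexing fails, which is why the earlier comment asks that all individuals belong to a listed group. The sketch below, on made-up labels, shows the loop next to a dictionary-based alternative; the `"black"` fallback and the variable names are assumptions for illustration only:

```julia
# Hypothetical group names, colours, and per-individual group labels
groups = ["vir", "troch", "plumb"]
colors = ["blue", "yellow", "red"]
indGroup = ["plumb", "vir", "vir", "troch"]

# Loop version, mirroring the code above
indColorsLoop = fill("", length(indGroup))
for i in eachindex(indGroup)
    indColorsLoop[i] = colors[findfirst(groups .== indGroup[i])]
end

# Dictionary version: one lookup table, with an explicit fallback for unlisted groups
colorOf = Dict(zip(groups, colors))
indColorsDict = [get(colorOf, g, "black") for g in indGroup]

println(indColorsDict)
```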
Plot PC1 vs. PC2:
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
Plot PC1 vs. PC3:
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
At the chr 15 high ViSHet region, there are only 5 clear haplogroups (PC3 does not distinguish vir and lud). Divide samples into those groups, based on PCA scores, and then calculate pi and Dxy.
clusterNames = ["virLud",
                "nit",
                "troch",
                "obs",
                "plumb"]
clusterColors = ["green",
                "grey",
                "yellowgreen",
                "orange",
                "red"]
virLud = (PCAmodelAll.metadata.PC1 .< -5) .&
            indSelection_lowIndHetStan
nit = (-5 .< PCAmodelAll.metadata.PC1 .< -2.5) .&
            indSelection_lowIndHetStan
troch = (-1 .< PCAmodelAll.metadata.PC1 .< 2.5) .&
            (PCAmodelAll.metadata.PC3 .< 1) .&
            indSelection_lowIndHetStan
obs = (0 .< PCAmodelAll.metadata.PC1 .< 3) .&
            (-5.5 .< PCAmodelAll.metadata.PC2 .< -3) .&
            indSelection_lowIndHetStan
plumb = (7 .< PCAmodelAll.metadata.PC1) .&
            indSelection_lowIndHetStan
# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]
clusterArray = [virLud nit troch obs plumb]
# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")
# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false) # set among to false if no among Fst wanted (some things won't work without it)
# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 5×2 DataFrame
 Row │ cluster  pi
     │ String   Float64
─────┼─────────────────────
   1 │ virLud   0.00892738
   2 │ nit      0.00677711
   3 │ troch    0.00725483
   4 │ obs      0.0083953
   5 │ plumb    0.00673292 =#
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 10×2 DataFrame
 Row │ cluster_pair  Dxy
     │ String        Float64
─────┼─────────────────────────
   1 │ virLud_nit    0.032793
   2 │ virLud_troch  0.0326016
   3 │ virLud_obs    0.0334515
   4 │ virLud_plumb  0.041869
   5 │ nit_troch     0.0389012
   6 │ nit_obs       0.0393449
   7 │ nit_plumb     0.0476769
   8 │ troch_obs     0.0150895
   9 │ troch_plumb   0.0294242
  10 │ obs_plumb     0.0297807 =#
# Make a genotype-by-individual plot using all variable loci in the region
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]
# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot,
            genos_selectedSNPs, PCAmodelAll.metadata;
            sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
            genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
            missingFractionAllowed = missingFractionAllowed,
            indColorRightProvided = true);
The numbers in each group are [78 2 71 5 70] and the sum of those is 226
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
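For readers unfamiliar with the two statistics tabulated in the chunk above, the sketch below computes per-site pi and Dxy from allele frequencies using common textbook definitions for biallelic SNPs. It is an illustration under stated assumptions, not the implementation of `getSitePi`, `getDxy`, `getRegionPi`, or `getRegionDxy`, whose internals (for example the sample-size correction or how invariant sites are counted) may differ. All data values are made up.

```julia
using Statistics

# Hypothetical alternate-allele frequencies: rows = groups, columns = SNPs
freqs = [0.0 0.5 1.0;
         1.0 0.5 1.0]
# Hypothetical numbers of genotyped diploid individuals per group and SNP
sampleSizes = [10 10 10;
               8 8 8]

# Per-site pi within a group: expected heterozygosity 2p(1-p),
# with a small-sample correction 2n / (2n - 1) for 2n sampled chromosomes (an assumption)
sitePi = @. 2 * freqs * (1 - freqs) * (2 * sampleSizes) / (2 * sampleSizes - 1)

# Per-site Dxy between groups 1 and 2: chance that one allele drawn from each group differs
p1 = freqs[1, :]
p2 = freqs[2, :]
siteDxy = @. p1 * (1 - p2) + p2 * (1 - p1)

# Region-level values as simple means across the sites supplied
regionPi = vec(mean(sitePi, dims=2))
regionDxy = mean(siteDxy)
println((regionPi, regionDxy))
```

Under these definitions, a fixed difference between two groups contributes 1 to Dxy at that site and 0 to pi in both groups, which is the pattern expected between distinct haplogroups.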
Now show a GBI plot like above, but with heterozygotes:
clusterNamesWithHets = ["virLud",
                "nit",
                "virLud_troch",
                "troch",
                "obs",
                "plumb",
                "vir_plumb"]
clusterColorsWithHets = ["blue",
                "grey",
                "yellowgreen",
                "yellow",
                "orange",
                "red",
                "purple"]
virLud_troch = (-5 .< PCAmodelAll.metadata.PC1 .< 0) .&
            (-5.5 .< PCAmodelAll.metadata.PC2 .< 0) .&
            .!indSelection_lowIndHetStan
vir_plumb = (-2 .< PCAmodelAll.metadata.PC1 .< 2) .&
            (3 .< PCAmodelAll.metadata.PC2 .< 5.5) .&
            .!indSelection_lowIndHetStan
clusterArray = [virLud nit virLud_troch troch obs plumb vir_plumb]
sum(clusterArray, dims=1)
if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
println("Good news: Individuals included in a group matches total number of individuals")
else
println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end
# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end
# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets
# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
            genos_selectedSNPs, PCAmodelAll.metadata;
            sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
            genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
            missingFractionAllowed = missingFractionAllowed,
            indColorLeftProvided = false,
            indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
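The check in the chunk above compares the summed cluster sizes with the number of individuals. A matching total does not by itself rule out one individual falling into two clusters while another falls into none, so a per-individual count is a slightly stricter test. A sketch on made-up PC scores, with hypothetical cluster names and thresholds:

```julia
# Hypothetical PC scores for six individuals
PC1 = [-6.0, -5.5, 0.5, 1.0, 8.0, -3.0]
PC2 = [0.0, 0.2, -4.0, 1.0, 0.5, -2.0]

clusterNames = ["west", "south", "east"]
west = PC1 .< -5
south = (0 .< PC1 .< 3) .& (PC2 .< -3)
east = PC1 .> 7

# Columns = clusters, rows = individuals
clusterArray = [west south east]

# Number of clusters each individual falls into; 1 means a clean assignment
clustersPerInd = vec(sum(clusterArray, dims=2))
unassigned = findall(==(0), clustersPerInd)
ambiguous = findall(>(1), clustersPerInd)

# Membership vector, built the same way as in the code above
clusterMembership = fill("none", length(PC1))
for (i, name) in enumerate(clusterNames)
    clusterMembership[clusterArray[:, i]] .= name
end

println(clusterMembership)
```

Listing the indices in `unassigned` and `ambiguous` also makes it easier to see which thresholds need adjusting.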
Show just the west area (without nitidus)
clusterNamesWithHetsWest = ["virLud",
                "virLud_troch",
                "troch"]
clusterColorsWithHetsWest = ["blue",
                "yellowgreen",
                "yellow"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = [100, 100, 100]
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets,
            genos_selectedSNPs2, PCAmodelAll.metadata;
            sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
            genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
            missingFractionAllowed = missingFractionAllowed,
            indColorLeftProvided = false,
            indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
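Each regional plot above recomputes allele frequencies for the groups being shown and then keeps only SNPs where at least one group is above 0.5 and at least one is below 0.5. A sketch of that filter on a made-up frequency matrix (values are hypothetical):

```julia
# Hypothetical allele frequencies: rows = groups, columns = SNPs
freqs = [0.9 0.1 0.6 0.5 NaN;
         0.2 0.3 0.8 0.5 0.9;
         0.1 0.0 0.7 0.4 0.1]

# Keep a SNP when some group is above 0.5 and some group is below 0.5
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
freqsSelected = freqs[:, selectedSNPs]

println(selectedSNPs)
```

In this sketch only the first SNP passes: the second never exceeds 0.5, the third never drops below it, the fourth reaches exactly 0.5 but does not exceed it, and the fifth is dropped because `maximum` and `minimum` propagate `NaN`, which makes both comparisons false. Whether `NaN` frequencies can occur in the real `freqs` matrix depends on `getFreqsAndSampleSizes`, which is not shown here.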
Show just the east area
clusterNamesWithHetsEast = ["troch",
                "obs",
                "plumb"]
clusterColorsWithHetsEast = ["yellow",
                "orange",
                "red"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast,
            genos_selectedSNPs2, PCAmodelAll.metadata;
            sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
            genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
            missingFractionAllowed = missingFractionAllowed,
            indColorLeftProvided = false,
            indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show just the northern area
clusterNamesWithHetsNorth = ["virLud",
                "vir_plumb",
                "plumb"]
clusterColorsWithHetsNorth = ["blue",
                "purple",
                "red"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = [100, 100, 100]
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets,
            genos_selectedSNPs2, PCAmodelAll.metadata;
            sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
            genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
            missingFractionAllowed = missingFractionAllowed,
            indColorLeftProvided = false,
            indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Now do the same for chr 28
# choose scaffold
chr = "gw28"
positionMin, positionMax, regionText,
windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = getWindowedIndHetStanRegion(genosOnly_included,
            pos_SNP_filtered,
            highViSHetRegions, chr;
            windowSize = 500)
# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)
# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan
#names(ind_with_metadata_included)
# check whether missing data related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)
# PCA of all individuals:
genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))
flipPC1 = true
flipPC2 = true
PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
            groups_to_plot_PCA, group_colors_PCA;
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)
display(PCAmodelAll.PCAfig)
# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]
# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`
# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 2)
#Plot only the lowIndHetStan individuals:
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Good news: 1 region on that scaffold
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
Save the individual colors in the metadata
indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
Plot PC1 vs. PC2:
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
Plot PC1 vs. PC3:
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
In the chr 28 high-ViSHet region, there are only 5 clear haplogroups (yellow and orange are close). Vir and lud are not distinguished along PC3, as nitidus varies there. Divide the samples into those groups based on PCA scores, and then calculate pi and Dxy.
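Dividing samples by PCA scores comes down to building one Boolean mask per haplogroup from broadcast comparisons. The sketch below illustrates that pattern on a few made-up scores. The vectors `PC1`, `PC2` and `lowHet`, the cluster names and the thresholds are hypothetical stand-ins, not values from the study.

```julia
# Hypothetical PC scores for six individuals (illustration only)
PC1 = [-5.0, -4.5, 0.2, 1.8, 3.5, 2.0]
PC2 = [0.1, 0.3, 0.0, -3.5, 3.0, -2.0]
# Stand-in for a low-heterozygosity selection such as indSelection_lowIndHetStan
lowHet = [true, true, true, true, true, false]

# Each cluster is a Bool vector; chained dotted comparisons define a band on one axis
clusterA = (PC1 .< -4) .& lowHet
clusterB = (-1 .< PC1 .< 1) .& lowHet
clusterC = (1 .< PC1 .< 3) .& (PC2 .< -3.2) .& lowHet

# One column per cluster, one row per individual
clusterArray = [clusterA clusterB clusterC]
countsPerCluster = vec(sum(clusterArray, dims=1))
println("Individuals per cluster: ", countsPerCluster)
```

Individuals that fall outside every band, or that fail the low-heterozygosity filter, stay unassigned. This is why the group counts printed by the notebook can sum to fewer than the total number of individuals.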
clusterNames = ["virLud",
    "nit",
    "troch",
    "obs",
    "plumb"]
clusterColors = ["blue",
    "grey",
    "yellowgreen",
    "orange",
    "red"]
virLud = (PCAmodelAll.metadata.PC1 .< -4) .& indSelection_lowIndHetStan
nit = (-1 .< PCAmodelAll.metadata.PC1 .< 1) .& indSelection_lowIndHetStan
troch = (1 .< PCAmodelAll.metadata.PC1 .< 3) .&
    (PCAmodelAll.metadata.PC2 .< -3.2) .&
    indSelection_lowIndHetStan
obs = (1 .< PCAmodelAll.metadata.PC1 .< 2.5) .&
    (-3.2 .< PCAmodelAll.metadata.PC2 .< -1) .&
    indSelection_lowIndHetStan
plumb = (3 .< PCAmodelAll.metadata.PC1) .&
    (2.5 .< PCAmodelAll.PC2) .&
    indSelection_lowIndHetStan
# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]
clusterArray = [virLud nit troch obs plumb]
# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")
# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false) # set among to FALSE if no among Fst wanted (some things won't work without it)
# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 5×2 DataFrame
 Row │ cluster  pi
     │ String   Float64
─────┼─────────────────────
   1 │ virLud   0.00792304
   2 │ nit      0.00320189
   3 │ troch    0.00734994
   4 │ obs      0.0101536
   5 │ plumb    0.00270239 =#
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 10×2 DataFrame
 Row │ cluster_pair  Dxy
     │ String        Float64
─────┼─────────────────────────
   1 │ virLud_nit    0.0334156
   2 │ virLud_troch  0.0318841
   3 │ virLud_obs    0.0351279
   4 │ virLud_plumb  0.0330054
   5 │ nit_troch     0.0314387
   6 │ nit_obs       0.0344624
   7 │ nit_plumb     0.0307517
   8 │ troch_obs     0.0188902
   9 │ troch_plumb   0.0234771
  10 │ obs_plumb     0.0265753 =#
# Make a genotype-by-individual plot using all variable loci in the region,
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]
# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
    missingFractionAllowed = missingFractionAllowed,
    indColorRightProvided = true);
The numbers in each group are [74 2 65 3 67] and the sum of those is 211
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
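The SNP filter in the cell above keeps only sites where the allele frequency is above 0.5 in at least one cluster and below 0.5 in at least one other, that is, sites where clusters differ in their majority allele. The small example below applies the same expression to a made-up frequency matrix. It assumes rows are clusters and columns are SNPs, as the `dims=1` reductions imply. The matrix `freqsExample` and its values are hypothetical.

```julia
# Hypothetical allele frequencies: 3 clusters (rows) × 4 SNPs (columns)
freqsExample = [0.9 0.1 0.6 0.0;
                0.2 0.3 0.7 1.0;
                0.4 0.2 0.8 0.5]

# Keep a SNP only if some cluster is above 0.5 and some cluster is below 0.5
keep = (vec(maximum(freqsExample, dims=1)) .> 0.5) .&
       (vec(minimum(freqsExample, dims=1)) .< 0.5)

keptFreqs = freqsExample[:, keep]
println("SNPs kept: ", findall(keep))
```

SNP 2 is dropped because no cluster exceeds 0.5, and SNP 3 is dropped because every cluster exceeds 0.5. Neither would show contrasting haplotypes in a genotype-by-individual plot.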
Now show a GBI plot like above, but with heterozygotes
clusterNamesWithHets = ["virLud",
    "virLud_nit",
    "nit",
    "virLud_troch",
    "troch",
    "obs",
    "obsHet",
    "obs_plumb",
    "plumb",
    "vir_plumb"]
clusterColorsWithHets = ["blue",
    "slateblue1",
    "grey",
    "yellowgreen",
    "yellow",
    "orange",
    "darkgoldenrod1",
    "darkorange1",
    "red",
    "purple"]
virLud_nit = (-4 .< PCAmodelAll.metadata.PC1 .< -2) .&
    (0 .< PCAmodelAll.metadata.PC2 .< 2) .&
    .!indSelection_lowIndHetStan
virLud_troch = (-2.5 .< PCAmodelAll.metadata.PC1 .< 0) .&
    (-3.5 .< PCAmodelAll.metadata.PC2 .< 0) .&
    .!indSelection_lowIndHetStan
obsHet = (1.5 .< PCAmodelAll.metadata.PC1 .< 3) .&
    (-3.5 .< PCAmodelAll.metadata.PC2 .< -1) .&
    .!indSelection_lowIndHetStan
obs_plumb = (2.5 .< PCAmodelAll.metadata.PC1 .< 4) .&
    (-1 .< PCAmodelAll.metadata.PC2 .< 2) .&
    .!indSelection_lowIndHetStan
vir_plumb = (-2 .< PCAmodelAll.metadata.PC1 .< 2) .&
    (2 .< PCAmodelAll.metadata.PC2 .< 4) .&
    .!indSelection_lowIndHetStan
# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[virLud_nit]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[virLud_troch]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[obsHet]
PCAmodelAll.metadata.Fst_group[obs_plumb]
PCAmodelAll.metadata.Fst_group[plumb]
PCAmodelAll.metadata.Fst_group[vir_plumb]
clusterArray = [virLud virLud_nit nit virLud_troch troch obs obsHet obs_plumb plumb vir_plumb]
sum(clusterArray, dims=1)
if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
println("Good news: Individuals included in a group matches total number of individuals")
end
# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end
# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets
# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
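The "Good news" check above compares the total number of cluster assignments with the number of individuals. That total can also match when one individual sits in two overlapping PC windows while another sits in none. A stricter per-individual check is sketched below on a made-up membership matrix. `clusterArrayExample` is hypothetical, and this is a suggested complement rather than part of the original analysis.

```julia
# Hypothetical membership matrix: 5 individuals (rows) × 3 clusters (columns)
clusterArrayExample = Bool[1 0 0;
                           0 1 0;
                           0 0 1;
                           0 1 0;
                           0 0 0]   # the last individual falls in no cluster

# Number of clusters each individual belongs to
membershipsPerInd = vec(sum(clusterArrayExample, dims=2))

unassigned = findall(membershipsPerInd .== 0)       # individuals in no cluster
multiplyAssigned = findall(membershipsPerInd .> 1)  # individuals in more than one cluster

isPartition = isempty(unassigned) && isempty(multiplyAssigned)
println("Unassigned: ", unassigned, "; multiply assigned: ", multiplyAssigned)
```

Listing the offending row indices makes it easier to see which PC1/PC2 boundary needs adjusting.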
Show GBI plot according to original groups and plot order
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show same but with all individuals
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show same but with only vir and plumb pops
includeTheseClusters = ["virLud", "plumb"] # these are the haplotype clusters to include in the choice below of SNPs to show
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]
plotGroups = ["vir", "plumb", "plumb_vir"] # these are the original Fst_groups
plotGroupColors = ["blue", "red", "purple"]
metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups
plotGenotypeByIndividual(regionInfo, posForGBI,
    genosForGBI, metadataForGBI, freqsForGBI, plotGroups, plotGroupColors;
    missingFractionAllowed = missingFractionAllowed);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show same but with only vir lud troch pops
includeTheseClusters = ["virLud", "troch"] # these are the haplotype clusters to include in the choice below of SNPs to show
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]
plotGroups = ["vir", "plumb", "plumb_vir"] # these are the original Fst_groups
plotGroupColors = ["blue", "red", "purple"]
metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups
plotGroups = ["vir", "vir_S", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML", "troch_west", "troch_LN"]
plotGroupColors = ["blue","turquoise1", "seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow"]
# Set limit on the number of individuals per group to plot
numIndsToPlotWithHets = fill(10, length(plotGroups))
genosForGBI_limited, indMetadataforGBI_limited = limitIndsToPlot(plotGroups,
    numIndsToPlotWithHets,
    genosForGBI, metadataForGBI;
    sortByMissing = false)
plotGenotypeByIndividual(regionInfo, posForGBI,
    genosForGBI_limited, indMetadataforGBI_limited, freqsForGBI, plotGroups, plotGroupColors;
    missingFractionAllowed = missingFractionAllowed);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show same but with only troch plumb pops
includeTheseClusters = ["troch", "obs", "plumb"] # these are the haplotype clusters to include in the choice below of SNPs to show
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]
metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups
# remove individuals that have vir haplotypes, as this could otherwise be mistaken for introgression from obscuratus:
removeTheseInds = ["GW_Armando_plate1_JF24G02", # gw19 hetero from plumb
    "GW_Armando_plate1_JF07G03", # gw19 hetero from plumb
    "GW_Armando_plate1_JF12G02", # gw19 hetero from plumb
    "GW_Armando_plate1_JF09G01"] # gw28 is hetero from plumb
selection = map(in(removeTheseInds), metadataForGBI.ind)
metadataForGBI = metadataForGBI[.!selection, :]
genosForGBI = genosForGBI[.!selection, :]
plotGroups = ["troch_LN","troch_EM","obs","plumb_BJ","plumb"]
plotGroupColors = ["yellow","gold","orange","pink","red"]
# Set limit on the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(plotGroups))
# metadataForGBI[metadataForGBI.Fst_group .== "plumb", :]
genosForGBI_limited, indMetadataforGBI_limited = limitIndsToPlot(plotGroups,
    numIndsToPlotWithHets,
    genosForGBI, metadataForGBI;
    sortByMissing = false)
# indMetadataforGBI_limited[indMetadataforGBI_limited.Fst_group .== "plumb", :]
plotGenotypeByIndividual(regionInfo, posForGBI,
    genosForGBI_limited, indMetadataforGBI_limited, freqsForGBI, plotGroups, plotGroupColors;
    missingFractionAllowed = missingFractionAllowed);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show just the west area (without nitidus)
clusterNamesWithHetsWest = ["virLud",
    "virLud_troch",
    "troch"]
clusterColorsWithHetsWest = ["blue",
    "yellowgreen",
    "yellow"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = [100, 100, 100]
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show just the east area
clusterNamesWithHetsWest = ["troch",
    "obs",
    "obsHet",
    "obs_plumb",
    "plumb"]
clusterColorsWithHetsWest = ["yellow",
    "orange",
    "darkgoldenrod1",
    "darkorange1",
    "red"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = [100, 100, 100, 100, 100]
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show just the northern area
clusterNamesWithHetsWest = ["virLud",
    "vir_plumb",
    "plumb"]
clusterColorsWithHetsWest = ["blue",
    "purple",
    "red"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = [100, 100, 100]
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Now do the same for chr 26
# choose scaffold
chr = "gw26"
positionMin, positionMax, regionText,
windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
genos_highViSHetRegion, pos_highViSHetRegion, regionInfo =
    getWindowedIndHetStanRegion(genosOnly_included,
        pos_SNP_filtered,
        highViSHetRegions, chr;
        windowSize = 500)
# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)
# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan
# check whether missing data related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)
# PCA of all individuals:
genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))
flipPC1 = false
flipPC2 = false
PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
    groups_to_plot_PCA, group_colors_PCA;
    sampleSet = "greenish warblers", regionText = regionText,
    flip1 = flipPC1, flip2 = flipPC2,
    lineOpacity = 0.7, fillOpacity = 0.6,
    symbolSize = 14, showTitle = true,
    xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
    showPlot = false)
display(PCAmodelAll.PCAfig)
# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]
# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`
# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.5)
# Plot only the lowIndHetStan individuals:
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Good news: 1 region on that scaffold
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
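The cells above add a chromosome-specific column by assembling a command as a string and running it with `eval(Meta.parse(...))`. Because `eval` runs in global scope, that approach needs the variables to be globals. A possible alternative, sketched below, is to index the data frame with the constructed column name directly. The data frame `metadataExample`, the variable `chrExample` and the values are hypothetical. The code only assumes the DataFrames package that the notebook already loads.

```julia
using DataFrames

# Hypothetical stand-ins for ind_with_metadata_included, chr and meanAcrossRegionIndHetStan
metadataExample = DataFrame(ind = ["a", "b", "c"])
chrExample = "gw26"
regionValues = [0.5, 1.2, 2.0]

# Build the column name as a string and assign the column without eval
newColumnName = chrExample * "_regionIndHetStan"
metadataExample[!, newColumnName] = regionValues

println(names(metadataExample))
```

This form also works inside functions, where a string-built command evaluated in global scope would not see local variables.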
Save the individual colors in the metadata
indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
Plot PC1 vs. PC2
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
Plot PC1 vs. PC3
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
In the chr 26 high-ViSHet region, there are only 5 clear haplogroups (one green point lies somewhat apart from the rest). Vir and lud vary along PC3 but cannot be clearly grouped. Divide the samples into those groups based on PCA scores, and then calculate pi and Dxy.
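For orientation, pi measures nucleotide diversity within a group and Dxy measures divergence between two groups. The sketch below uses common textbook per-site formulas for biallelic SNPs. It is not necessarily what `getSitePi`, `getDxy`, `getRegionPi` or `getRegionDxy` compute internally, and the function names `sitePiSketch` and `siteDxySketch` and all frequencies are hypothetical.

```julia
using Statistics

# Textbook per-site nucleotide diversity with a finite-sample correction;
# p is the allele frequency and n the number of sampled allele copies
sitePiSketch(p, n) = 2 * p * (1 - p) * n / (n - 1)

# Textbook per-site Dxy: probability that one allele drawn from each group differs
siteDxySketch(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

# Hypothetical allele frequencies at three sites in two groups
pA = [0.0, 0.5, 1.0]
pB = [1.0, 0.5, 1.0]
nA = [10, 10, 10]   # allele copies sampled in group A at each site

piA = sitePiSketch.(pA, nA)
dxyAB = siteDxySketch.(pA, pB)

# Averaging over sites is one way to get a regional value
regionPiA = mean(piA)
regionDxyAB = mean(dxyAB)
println("pi (A) = ", regionPiA, ", Dxy (A vs B) = ", regionDxyAB)
```

A site fixed for different alleles in the two groups contributes 1 to Dxy and 0 to pi, which is the pattern that makes Dxy between haplogroups much larger than pi within them.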
# Inspect chromosome 26 PCA of low IndHet (< 1.5) individuals,
# and specify group boundaries:
clusterNames = ["virLud",
    "nit",
    "troch",
    "obs",
    "plumb"]
clusterColors = ["blue",
    "grey",
    "yellow",
    "orange",
    "red"]
virLud = (PCAmodelAll.metadata.PC1 .< -5.5) .& indSelection_lowIndHetStan
nit = (-5.5 .< PCAmodelAll.metadata.PC1 .< -4) .& indSelection_lowIndHetStan
troch = (PCAmodelAll.metadata.PC2 .< -6) .& indSelection_lowIndHetStan
obs = (-6 .< PCAmodelAll.metadata.PC2 .< -4) .& indSelection_lowIndHetStan
plumb = (6 .< PCAmodelAll.metadata.PC1) .& (2 .< PCAmodelAll.PC2) .& indSelection_lowIndHetStan
# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]
clusterArray = [virLud nit troch obs plumb]
# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")
# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false) # set among to FALSE if no among Fst wanted (some things won't work without it)
# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 5×2 DataFrame
 Row │ cluster  pi
     │ String   Float64
─────┼─────────────────────
   1 │ virLud   0.0135205
   2 │ nit      0.00548557
   3 │ troch    0.00975861
   4 │ obs      0.00902527
   5 │ plumb    0.00510553 =#
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 10×2 DataFrame
 Row │ cluster_pair  Dxy
     │ String        Float64
─────┼─────────────────────────
   1 │ virLud_nit    0.0243846
   2 │ virLud_troch  0.0324256
   3 │ virLud_obs    0.0332749
   4 │ virLud_plumb  0.0390193
   5 │ nit_troch     0.0341654
   6 │ nit_obs       0.0344734
   7 │ nit_plumb     0.0403857
   8 │ troch_obs     0.0176458
   9 │ troch_plumb   0.0296574
  10 │ obs_plumb     0.0300157 =#
# Make a genotype-by-individual plot using all variable loci in the region,
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
= (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
genos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
pos_selectedSNPs = Fst[:, selectedSNPs]
Fst_selectedSNPs = freqs[:, selectedSNPs]
freqs_selectedSNPs
# limit the number of individuals per group to plot
= fill(15, length(clusterNames))
numIndsToPlot
= limitIndsToPlot(clusterNames, numIndsToPlot,
genosForGBI, indMetadataforGBI
genos_selectedSNPs, PCAmodelAll.metadata;= true)
sortByMissing
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;= missingFractionAllowed,
missingFractionAllowed = true); indColorRightProvided
The numbers in each group are [71 2 62 3 67] and the sum of those is 205
Calculated population allele frequencies and sample sizes
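For readers unfamiliar with the statistics computed above, here is a minimal sketch of per-site pi and Dxy calculated from allele frequencies. The helpers `site_pi` and `site_dxy` are hypothetical and use standard textbook formulas; they are not the `getSitePi` and `getDxy` functions used on this page, which may differ in detail (for example, in how missing data or sample-size corrections are handled).

```julia
# Hypothetical helpers for illustration; NOT the getSitePi / getDxy used on this page.

# Within-group nucleotide diversity at one biallelic site.
# p = frequency of one allele, n = number of sampled chromosomes
# (assumed here to be 2 × the number of genotyped diploid individuals).
site_pi(p, n) = n > 1 ? 2 * p * (1 - p) * n / (n - 1) : NaN

# Between-group distance at one site: the probability that two alleles,
# one drawn from each group, differ.
site_dxy(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

# Toy data: rows = groups, columns = SNPs
freqs_toy = [0.0 0.5 1.0;
             1.0 0.5 1.0]
n_toy = [10 10 10;
          8  8  8]

pi_toy = site_pi.(freqs_toy, n_toy)                      # 2×3 matrix of per-site pi
dxy_toy = site_dxy.(freqs_toy[1, :], freqs_toy[2, :])    # per-site Dxy between the two groups
println(dxy_toy)
```

A fixed difference between groups (first toy SNP) gives Dxy of 1 and pi of 0 in both groups, whereas a site fixed for the same allele in both groups (third toy SNP) contributes 0 to both statistics.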
Now show a GBI plot like above, but with heterozygotes
clusterNamesWithHets = ["virLud",
    "nit",
    "virLud_troch",
    "troch",
    "obs",
    "obs_plumb",
    "plumb",
    "vir_plumb"]
clusterColorsWithHets = ["blue",
    "grey",
    "yellowgreen",
    "yellow",
    "orange",
    "darkorange1",
    "red",
    "purple"]
virLud_troch = (-5.5 .< PCAmodelAll.metadata.PC1 .< -2.2) .&
    (-4 .< PCAmodelAll.metadata.PC2 .< 2) .&
    .!indSelection_lowIndHetStan
obs_plumb = (2.5 .< PCAmodelAll.metadata.PC1 .< 5) .&
    (-3.5 .< PCAmodelAll.metadata.PC2 .< -1.5) .&
    .!indSelection_lowIndHetStan
vir_plumb = (-2 .< PCAmodelAll.metadata.PC1 .< 3) .&
    (2.5 .< PCAmodelAll.metadata.PC2 .< 5) .&
    .!indSelection_lowIndHetStan
# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[virLud_troch]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[obs_plumb]
PCAmodelAll.metadata.Fst_group[plumb]
PCAmodelAll.metadata.Fst_group[vir_plumb]
clusterArray = [virLud nit virLud_troch troch obs obs_plumb plumb vir_plumb]
sum(clusterArray, dims=1)
if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
end
# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end
# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets
# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
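The check above compares the total number of cluster assignments with the number of individuals. That total could in principle also match if one individual fell into two PCA windows while another fell into none. A stricter per-individual check can be sketched as follows, using toy PC scores and hypothetical window names (`groupA`, `groupB`, `groupC`) rather than the real data:

```julia
# Toy PC scores for six individuals (illustrative values only)
PC1 = [-4.0, -3.0, 0.5, 3.0, 4.0, 0.0]
PC2 = [0.0, 1.0, 3.0, -2.0, -3.0, 9.0]

# Rectangular windows in PC space, written in the same style as the code above
groupA = (-5.5 .< PC1 .< -2.2) .& (-4 .< PC2 .< 2)
groupB = (2.5 .< PC1 .< 5) .& (-3.5 .< PC2 .< -1.5)
groupC = (-2 .< PC1 .< 3) .& (2.5 .< PC2 .< 5)

memberships = [groupA groupB groupC]            # individuals × clusters (Boolean)
perIndividual = vec(sum(memberships, dims=2))   # number of clusters each individual falls in

unassigned = findall(perIndividual .== 0)       # individuals in no window
overlapping = findall(perIndividual .> 1)       # individuals in more than one window
println("Unassigned: ", unassigned, "; in more than one cluster: ", overlapping)
```

In this toy example the sixth individual lies outside every window, so it would keep the "none" label in a membership vector.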
Show GBI plot according to original groups and plot order
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Show same but with all individuals
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
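To illustrate what a per-group limit on plotted individuals does (15 per group earlier, effectively unlimited here), the sketch below caps the number of individuals kept per group. The function `cap_per_group` is hypothetical and much simpler than `limitIndsToPlot`: it assumes individuals are kept in their existing order and does no sorting by missing data.

```julia
# Hypothetical illustration; NOT the limitIndsToPlot function used on this page.
# Returns indices of the individuals to keep, group by group, with at most
# maxPerGroup[k] individuals from groupNames[k].
function cap_per_group(groupOfInd, groupNames, maxPerGroup)
    keep = Int[]
    for (g, cap) in zip(groupNames, maxPerGroup)
        idx = findall(==(g), groupOfInd)              # individuals belonging to group g
        append!(keep, idx[1:min(cap, length(idx))])   # keep the first `cap` of them
    end
    return keep
end

groupOfInd = ["a", "b", "a", "a", "b", "c"]
kept = cap_per_group(groupOfInd, ["a", "b"], [2, 5])
println(kept)
```

Individuals whose group is not listed (group "c" in the toy data) are dropped, and a cap larger than the group size simply keeps the whole group.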
Show same but with only vir and plumb pops
includeTheseClusters = ["virLud", "plumb"] # these are the haplotype clusters to include in the choice below of SNPs to show
# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]
plotGroups = ["vir", "plumb", "plumb_vir"] # these are the original Fst_groups
plotGroupColors = ["blue", "red", "purple"]
metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups
plotGenotypeByIndividual(regionInfo, posForGBI,
    genosForGBI, metadataForGBI, freqsForGBI, plotGroups, plotGroupColors;
    missingFractionAllowed = missingFractionAllowed)
Output (abridged): a tuple containing the figure (Scene, 768px × 960px), the plotted genotype matrix (element type Union{Missing, Int16}), the positions of the plotted SNPs (from 4175655 to 5549602), and the 100×24 metadata DataFrame of the plotted individuals.
Show just the west clusters (without nitidus)
clusterNamesWithHetsWest = ["virLud",
    "virLud_troch",
    "troch"]
clusterColorsWithHetsWest = ["blue",
    "yellowgreen",
    "yellow"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = [100, 100, 100]
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
Show just the east area
clusterNamesWithHetsWest = ["troch",
    "obs",
    "obs_plumb",
    "plumb",]
clusterColorsWithHetsWest = ["yellow",
    "orange",
    "darkorange1",
    "red"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = [100, 100, 100, 100]
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
Show just the northern area
clusterNamesWithHetsWest = ["virLud",
    "vir_plumb",
    "plumb"]
clusterColorsWithHetsWest = ["blue",
    "purple",
    "red"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = [100, 100, 100]
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
Do a PCA based on a same-size region elsewhere on gw26 (with low ViSHet):
# get length of region
lengthHighViSHetRegion = positionMax - positionMin
leftLocus = 1_000_000 # start at 1 Mb from left side
rightLocus = leftLocus + lengthHighViSHetRegion
regionText_lowViSHetRegion = string("chr ", chr, " ",leftLocus," to ",rightLocus)
lociSelection = (leftLocus .<= pos_region.position .<= rightLocus)
genotypes_lowViSHetRegion = genotypes_region[:, lociSelection]
# impute missing genotypes:
genotypes_lowViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genotypes_lowViSHetRegion))
flipPC1 = true
flipPC2 = true
PCAmodel = plotPCA(genotypes_lowViSHetRegion_imputed, ind_with_metadata_included,
    groups_to_plot_PCA, group_colors_PCA;
    sampleSet = "greenish warblers", regionText = regionText_lowViSHetRegion,
    flip1 = flipPC1, flip2 = flipPC2,
    lineOpacity = 0.7, fillOpacity = 0.6,
    symbolSize = 14, showTitle = true,
    xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
    showPlot = false)
display(PCAmodel.PCAfig)
if false # set to true to save plot
    save("FigureS2C_gw26_nonHLBRarbitrary_from_Julia.png", PCAmodel.PCAfig, px_per_unit = 2.0)
end
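The window arithmetic used above to pick a comparison region of the same length can be sketched with toy values. The bounds assigned to `positionMin` and `positionMax` here are hypothetical, as are the SNP positions; the extra overlap test is an added sanity check and not part of the original code.

```julia
# Toy SNP positions on one scaffold (hypothetical values)
positions = [500_000, 1_000_000, 1_200_000, 1_700_000, 2_000_001, 3_000_000]

# Hypothetical bounds of the high-ViSHet region
positionMin, positionMax = 4_000_000, 5_000_000
regionLength = positionMax - positionMin

leftLocus = 1_000_000                  # start 1 Mb from the left end, as above
rightLocus = leftLocus + regionLength  # same physical length as the high-ViSHet region

# Boolean selection of SNPs inside the comparison window (inclusive bounds)
lociSelection = leftLocus .<= positions .<= rightLocus

# Added sanity check: the comparison window should not overlap the high-ViSHet region
overlaps = !(rightLocus < positionMin || leftLocus > positionMax)
println(count(lociSelection), " SNPs selected; overlaps high-ViSHet region: ", overlaps)
```

Note that the two regions are matched in physical length, not in SNP count, so the number of loci entering each PCA can differ.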
Do a similar analysis for chr 1A
(Chr 1 was tried as well, but there seems to be some recombination there that makes assignment to groups less clear.)
# choose scaffold
chr = "gw1A"
positionMin, positionMax, regionText,
windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
genos_highViSHetRegion, pos_highViSHetRegion, regionInfo =
    getWindowedIndHetStanRegion(genosOnly_included,
        pos_SNP_filtered,
        highViSHetRegions, chr;
        windowSize = 500)
# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)
# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan
# check whether missing data related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)
# PCA of all individuals:
genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))
flipPC1 = true
flipPC2 = true
PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
    groups_to_plot_PCA, group_colors_PCA;
    sampleSet = "greenish warblers", regionText = regionText,
    flip1 = flipPC1, flip2 = flipPC2,
    lineOpacity = 0.7, fillOpacity = 0.6,
    symbolSize = 14, showTitle = true,
    xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
    showPlot = false)
display(PCAmodelAll.PCAfig)
# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]
# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`
# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.5)
# Plot only the lowIndHetStan individuals:
f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
More than 1 region on that scaffold. Using just the longest one.
| Row | regionChrom | regionStart | regionEnd |
|---|---|---|---|
| | String | Int64 | Int64 |
| 1 | gw1A | 4674 | 3771263 |
| 2 | gw1A | 23592559 | 30616953 |
CairoMakie.Screen{IMAGE}
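The `flipPC1` and `flipPC2` options above multiply a principal component by -1. The sign of each PC is arbitrary, so flipping changes only the orientation of the plot, not the information in it. A minimal sketch with a toy matrix, using a plain SVD from the `LinearAlgebra` standard library rather than the `plotPCA` function used on this page:

```julia
using LinearAlgebra, Statistics

# Toy genotype-like matrix: 4 individuals × 2 loci, columns already centred
X = [ 2.0  0.0;
      0.0  1.0;
     -2.0  0.0;
      0.0 -1.0]

F = svd(X)
scores = F.U * Diagonal(F.S)     # PC scores, individuals × PCs

# Flip the sign of PC1
flipped = copy(scores)
flipped[:, 1] .*= -1

# The variance along PC1 is unchanged by the flip
println(var(scores[:, 1]) ≈ var(flipped[:, 1]))

# The data can still be reconstructed if the matching loading vector is flipped too
Xrec = flipped * Diagonal([-1.0, 1.0]) * F.Vt
println(Xrec ≈ X)
```

In this sketch the scores are stored as individuals × PCs, whereas `PCAmodelAll.values` above is indexed as PCs × individuals (`values[1,:]` is PC1).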
Save the individual colors in the metadata
indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
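The group-to-color lookup above can also be written with a `Dict`, which some readers may find easier to follow. This is only an alternative sketch with toy group and color vectors, not the code used for the figures:

```julia
# Toy stand-ins for groups_to_plot_PCA, group_colors_PCA, and the Fst_group column
groups = ["vir", "plumb", "troch"]
colors = ["blue", "red", "yellow"]
indGroups = ["plumb", "vir", "vir", "troch"]

# Map each group name to its color, then look up each individual's group
colorOf = Dict(zip(groups, colors))
indColors = [colorOf[g] for g in indGroups]
println(indColors)
```

As with the `findfirst` version, an individual whose group is absent from the group list causes an error (here a `KeyError`), which is a useful signal that the group list is incomplete.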
Plot PC1 vs. PC2
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
CairoMakie.Screen{IMAGE}
Plot PC1 vs. PC3
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
CairoMakie.Screen{IMAGE}
In the chr 1A high ViSHet region, there are only 5 clear haplogroups (vir and lud were checked and do not form distinct groups on PC3). Divide samples into those groups based on PCA scores, then calculate pi and Dxy.
clusterNames = ["virLud",
    "nit",
    "troch",
    "obs",
    "plumb"]
clusterColors = ["green",
    "grey",
    "yellowgreen",
    "orange",
    "red"]
virLud = (PCAmodelAll.metadata.PC1 .< -8) .&
    indSelection_lowIndHetStan
nit = (-8 .< PCAmodelAll.metadata.PC1 .< -4) .&
    indSelection_lowIndHetStan
troch = (4 .< PCAmodelAll.metadata.PC1 .< 9) .&
    (PCAmodelAll.metadata.PC2 .< -5) .&
    indSelection_lowIndHetStan
obs = (2 .< PCAmodelAll.metadata.PC1 .< 6) .&
    (-5 .< PCAmodelAll.metadata.PC2 .< -1) .&
    indSelection_lowIndHetStan
plumb = (3 .< PCAmodelAll.metadata.PC1) .&
    (7.5 .< PCAmodelAll.metadata.PC2) .&
    indSelection_lowIndHetStan
# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]
clusterArray = [virLud nit troch obs plumb]
# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")
# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false) # set among to FALSE if no among Fst wanted (some things won't work without it)
# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 5×2 DataFrame
 Row │ cluster  pi
     │ String   Float64
─────┼─────────────────────
   1 │ virLud   0.00559696
   2 │ nit      0.00458482
   3 │ troch    0.00470781
   4 │ obs      0.00524545
   5 │ plumb    0.00659452 =#
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 10×2 DataFrame
 Row │ cluster_pair  Dxy
     │ String        Float64
─────┼─────────────────────────
   1 │ virLud_nit    0.0234051
   2 │ virLud_troch  0.0303858
   3 │ virLud_obs    0.0285612
   4 │ virLud_plumb  0.0298279
   5 │ nit_troch     0.036893
   6 │ nit_obs       0.0346282
   7 │ nit_plumb     0.0363109
   8 │ troch_obs     0.0169886
   9 │ troch_plumb   0.0274903
  10 │ obs_plumb     0.0256253 =#
# Make a genotype-by-individual plot using all variable loci in the region,
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]
# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
    missingFractionAllowed = missingFractionAllowed,
    indColorRightProvided = true);
The numbers in each group are [75 2 65 5 68] and the sum of those is 215
Calculated population allele frequencies and sample sizes
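The SNP filter used above keeps a locus only when its allele frequency is above 0.5 in at least one cluster and below 0.5 in at least one other, so that the retained SNPs distinguish clusters. A small worked example with a toy frequency matrix (values are illustrative only):

```julia
# Toy allele-frequency matrix: rows = clusters, columns = SNPs
freqs_toy = [0.9 0.1 0.6 0.5;
             0.2 0.3 0.8 0.5;
             0.1 0.4 0.7 0.5]

# Same expression as used above: column maximum > 0.5 and column minimum < 0.5
keep = (vec(maximum(freqs_toy, dims=1)) .> 0.5) .& (vec(minimum(freqs_toy, dims=1)) .< 0.5)

freqs_kept = freqs_toy[:, keep]   # only the informative SNPs remain
println(keep)
```

Only the first toy SNP passes: the second never exceeds 0.5, the third never drops below 0.5, and the fourth sits exactly at 0.5 in every cluster, which the strict inequalities exclude.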
Now show a GBI plot like above, but with heterozygotes
clusterNamesWithHets = ["virLud",
    "nit",
    "virLud_troch",
    "troch",
    "obs",
    "plumb",
    "vir_plumb"]
clusterColorsWithHets = ["blue",
    "grey",
    "yellowgreen",
    "yellow",
    "orange",
    "red",
    "purple"]
virLud_troch = (-5 .< PCAmodelAll.metadata.PC1 .< 2) .&
    (-8 .< PCAmodelAll.metadata.PC2 .< -2) .&
    .!indSelection_lowIndHetStan
vir_plumb = (-5 .< PCAmodelAll.metadata.PC1 .< 0) .&
    (3 .< PCAmodelAll.metadata.PC2 .< 7) .&
    .!indSelection_lowIndHetStan
clusterArray = [virLud nit virLud_troch troch obs plumb vir_plumb]
sum(clusterArray, dims=1)
if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end
# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end
# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets
# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
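The loop that builds the membership and plot-order vectors relies on logical indexing with the columns of the Boolean cluster matrix. A toy version (with made-up cluster names and a hand-written matrix) shows what the resulting vectors look like, including the placeholder values left for any unassigned individual:

```julia
# Toy Boolean matrix: 4 individuals × 2 clusters (hypothetical data)
masks = [true  false;
         false true;
         false false;
         true  false]
names_toy = ["A", "B"]

membership = fill("none", size(masks, 1))   # "none" marks unassigned individuals
order = fill(-9, size(masks, 1))            # -9 marks unassigned individuals
for i in axes(masks, 2)
    membership[masks[:, i]] .= names_toy[i]   # logical indexing by column i
    order[masks[:, i]] .= i
end
println(membership, " ", order)
```

If an individual were true in more than one column, the later column would overwrite the earlier one, so non-overlapping cluster definitions matter here.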
Show just the west area (without nitidus)
clusterNamesWithHetsWest = ["virLud",
    "virLud_troch",
    "troch"]
clusterColorsWithHetsWest = ["blue",
    "yellowgreen",
    "yellow"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = [100, 100, 100]
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
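The SNP filter used in these cells keeps a locus only when the allele frequency is above 0.5 in at least one group and below 0.5 in at least one group. A minimal sketch of that filter on a toy frequency matrix (rows are groups, columns are SNPs; the values are hypothetical) looks like this:

```julia
# Toy allele-frequency matrix: 3 groups (rows) x 4 SNPs (columns), hypothetical values.
freqs_toy = [0.9  0.1  0.6  0.5;
             0.2  0.3  0.8  0.5;
             0.1  0.4  0.9  0.5]

# Keep a SNP when its maximum group frequency is > 0.5 and its minimum is < 0.5.
keep = (vec(maximum(freqs_toy, dims=1)) .> 0.5) .& (vec(minimum(freqs_toy, dims=1)) .< 0.5)

println(keep)
freqs_kept = freqs_toy[:, keep]   # same column-indexing pattern as in the cells above
```

Only the first toy SNP passes: the second never exceeds 0.5, the third never drops below 0.5, and the fourth sits exactly at 0.5, which the strict inequalities exclude.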
Show just the east area
clusterNamesWithHetsEast = ["troch",
            "obs",
            "plumb"]
clusterColorsWithHetsEast = ["yellow",
            "orange",
            "red"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast,
            genos_selectedSNPs2, PCAmodelAll.metadata;
            sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
            genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
            missingFractionAllowed = missingFractionAllowed,
            indColorLeftProvided = false,
            indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
Show just the northern area
clusterNamesWithHetsNorth = ["virLud",
            "vir_plumb",
            "plumb"]
clusterColorsWithHetsNorth = ["blue",
            "purple",
            "red"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = [100, 100, 100]
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets,
            genos_selectedSNPs2, PCAmodelAll.metadata;
            sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
            genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
            missingFractionAllowed = missingFractionAllowed,
            indColorLeftProvided = false,
            indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
Do the same as above, but for chr 2:
This region does not show a very clear pattern (in terms of assigning homozygous and heterozygous haploblock genotypes), but we’ll see what it shows:
# choose scaffold
chr = "gw2"
positionMin, positionMax, regionText,
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo =
            getWindowedIndHetStanRegion(genosOnly_included,
                                pos_SNP_filtered,
                                highViSHetRegions, chr;
                                windowSize = 500)
# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)
# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan
# check whether missing data related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)
# PCA of all individuals:
genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))
flipPC1 = false
flipPC2 = true
PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
            groups_to_plot_PCA, group_colors_PCA;
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)
display(PCAmodelAll.PCAfig)
# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]
# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`
# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.25)
#Plot only the lowIndHetStan individuals:
f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
More than 1 region on that scaffold. Using just the longest one.
Row | regionChrom | regionStart | regionEnd |
---|---|---|---|
 | String | Int64 | Int64 |
1 | gw2 | 54537375 | 59262130 |
2 | gw2 | 60234161 | 61533451 |
CairoMakie.Screen{IMAGE}
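The `flipPC1` and `flipPC2` switches only change the orientation of the plot: the sign of a principal component is arbitrary, so multiplying one axis by -1 leaves the distances between individuals unchanged. A small self-contained sketch of this, using a made-up genotype matrix and the standard-library SVD rather than `plotPCA` (note that scores here are stored as individuals x PCs, whereas `PCAmodelAll.values` is indexed as PCs x individuals):

```julia
using LinearAlgebra, Statistics

# Hypothetical genotype matrix: 5 individuals (rows) x 4 SNPs (columns), coded 0/1/2.
G = Float64[0 0 2 1; 0 1 2 2; 1 0 1 1; 2 2 0 0; 2 1 0 1]

# Center each SNP, then obtain PC scores from the SVD (scores = U * Diagonal(S)).
Gc = G .- mean(G, dims=1)
F = svd(Gc)
scores = F.U * Diagonal(F.S)

# Flip the second axis, analogous to setting `flipPC2 = true`.
flipped = copy(scores)
flipped[:, 2] .*= -1

# Euclidean distance between individuals i and j in the PC1-PC2 plane.
dist(S, i, j) = norm(S[i, 1:2] .- S[j, 1:2])

sameDistances = all(isapprox(dist(scores, i, j), dist(flipped, i, j)) for i in 1:5, j in 1:5)
println(sameDistances)
```

One practical consequence is that thresholds on PC1 and PC2 used later to define clusters are only meaningful for the particular flip settings chosen here.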
Save the individual colors in the metadata
indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
Plot PC1 vs. PC2
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
CairoMakie.Screen{IMAGE}
Plot PC1 vs. PC3
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
CairoMakie.Screen{IMAGE}
At the chr 2 high ViSHet region, there are 5 clear haplogroups. Divide samples into those groups, based on PCA scores, and then calculate pi and Dxy.
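For readers unfamiliar with these statistics, the textbook per-site formulas for a biallelic SNP can be sketched as follows. This is only an illustration under stated assumptions: the functions `site_pi` and `site_dxy` are hypothetical names, `n` is assumed to count sampled allele copies (two per diploid individual), and the package functions `getSitePi` and `getDxy` used on this page may differ in detail (for example, in how sample sizes are defined or how missing data are handled).

```julia
# Per-site nucleotide diversity for one biallelic SNP, with the usual small-sample correction.
# p = allele frequency in the group, n = number of sampled allele copies (assumption).
site_pi(p, n) = 2 * p * (1 - p) * n / (n - 1)

# Per-site Dxy: probability that one allele copy drawn from each group differs.
site_dxy(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

p_a = [0.0, 0.5, 1.0]   # hypothetical frequencies in group A at three SNPs
p_b = [1.0, 0.5, 1.0]   # hypothetical frequencies in group B at the same SNPs
n_a = 20                # hypothetical number of allele copies sampled in group A

pi_a = site_pi.(p_a, n_a)
dxy_ab = site_dxy.(p_a, p_b)
println(pi_a)
println(dxy_ab)
```

A fixed difference between groups (first toy SNP) contributes nothing to pi but the maximum to Dxy, which is why well-separated haplogroups show Dxy values several times larger than their within-group pi.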
clusterNames = ["virLud",
            "nit",
            "troch",
            "obs",
            "plumb"]
clusterColors = ["green",
            "grey",
            "yellowgreen",
            "orange",
            "red"]
virLud = (PCAmodelAll.metadata.PC1 .< -4) .&
            indSelection_lowIndHetStan
nit = (-3 .< PCAmodelAll.metadata.PC1 .< -1) .&
            (0 .< PCAmodelAll.metadata.PC2 .< 2) .&
            indSelection_lowIndHetStan
troch = (-2.5 .< PCAmodelAll.metadata.PC1 .< 0) .&
            (PCAmodelAll.metadata.PC2 .< -5) .&
            indSelection_lowIndHetStan
obs = (-2 .< PCAmodelAll.metadata.PC1 .< 1) .&
            (-5 .< PCAmodelAll.metadata.PC2 .< -3) .&
            indSelection_lowIndHetStan
plumb = (6 .< PCAmodelAll.metadata.PC1) .&
            (1 .< PCAmodelAll.metadata.PC2) .&
            indSelection_lowIndHetStan
# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]
clusterArray = [virLud nit troch obs plumb]
# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")
# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false) # set among to FALSE if no among Fst wanted (some things won't work without it)
# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 5×2 DataFrame
 Row │ cluster  pi
     │ String   Float64
─────┼─────────────────────
   1 │ virLud   0.0123364
   2 │ nit      0.00557103
   3 │ troch    0.00911341
   4 │ obs      0.00891506
   5 │ plumb    0.0086287 =#
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 10×2 DataFrame
 Row │ cluster_pair  Dxy
     │ String        Float64
─────┼─────────────────────────
   1 │ virLud_nit    0.0328534
   2 │ virLud_troch  0.0337586
   3 │ virLud_obs    0.0328064
   4 │ virLud_plumb  0.0416095
   5 │ nit_troch     0.0376123
   6 │ nit_obs       0.0363568
   7 │ nit_plumb     0.0456889
   8 │ troch_obs     0.0144702
   9 │ troch_plumb   0.0331178
  10 │ obs_plumb     0.0318128 =#
# Make a genotype-by-individual plot using all variable loci in the region,
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]
# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot,
            genos_selectedSNPs, PCAmodelAll.metadata;
            sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
            genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
            missingFractionAllowed = missingFractionAllowed,
            indColorRightProvided = true);
The numbers in each group are [59 1 72 4 69] and the sum of those is 205
Calculated population allele frequencies and sample sizes
Now show a GBI plot like above, but with heterozygotes
clusterNamesWithHets = ["virLud",
            "virLudHet",
            "nit",
            "nitHet",
            "virLud_troch",
            "troch",
            "trochHet",
            "obs",
            "obs_plumb",
            "plumb",
            "plumbHet",
            "vir_plumb"]
clusterColorsWithHets = ["blue",
            "blue",
            "grey",
            "grey",
            "yellowgreen",
            "yellow",
            "yellow",
            "orange",
            "darkorange1",
            "red",
            "red",
            "purple"]
virLudHet = (PCAmodelAll.metadata.PC1 .< -4) .&
            (2.5 .< PCAmodelAll.metadata.PC2 .< 7) .&
            .!indSelection_lowIndHetStan
nitHet = (-3 .< PCAmodelAll.metadata.PC1 .< -2) .&
            (0.5 .< PCAmodelAll.metadata.PC2 .< 1.5) .&
            .!indSelection_lowIndHetStan
virLud_troch = (-5 .< PCAmodelAll.metadata.PC1 .< -2) .&
            (-3 .< PCAmodelAll.metadata.PC2 .< 1) .&
            .!indSelection_lowIndHetStan
trochHet = (-2.5 .< PCAmodelAll.metadata.PC1 .< 0) .&
            (PCAmodelAll.metadata.PC2 .< -5) .&
            .!indSelection_lowIndHetStan
obs_plumb = (2 .< PCAmodelAll.metadata.PC1 .< 3) .&
            (-3 .< PCAmodelAll.metadata.PC2 .< 2) .&
            .!indSelection_lowIndHetStan
plumbHet = (6 .< PCAmodelAll.metadata.PC1) .&
            (1 .< PCAmodelAll.metadata.PC2) .&
            .!indSelection_lowIndHetStan
vir_plumb = (-3 .< PCAmodelAll.metadata.PC1 .< 3) .&
            (2 .< PCAmodelAll.metadata.PC2 .< 5) .&
            .!indSelection_lowIndHetStan
clusterArray = [virLud virLudHet nit nitHet virLud_troch troch trochHet obs obs_plumb plumb plumbHet vir_plumb]
sum(clusterArray, dims=1)
if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end
# check which individuals left out:
sum(clusterArray, dims=2)
PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]
# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end
# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets
# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
            genos_selectedSNPs, PCAmodelAll.metadata;
            sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
            genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
            missingFractionAllowed = missingFractionAllowed,
            indColorLeftProvided = false,
            indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
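The check above compares the total number of cluster assignments with the number of individuals. That total can also match when one individual falls in two clusters and another falls in none, so a per-individual count is a useful complement. A minimal sketch with made-up PC scores and cluster names (none of these values come from the data):

```julia
# Hypothetical PC1 scores and low-heterozygosity flags for six individuals.
pc1 = [-5.0, -4.5, 0.5, 7.0, 6.5, 2.0]
lowHet = [true, false, true, true, false, true]

# Boolean masks defined by thresholds, in the same style as the cells above.
west    = (pc1 .< -4) .& lowHet
westHet = (pc1 .< -4) .& .!lowHet
mid     = (-1 .< pc1 .< 1) .& lowHet
east    = (6 .< pc1) .& lowHet
eastHet = (6 .< pc1) .& .!lowHet

masks = [west westHet mid east eastHet]   # individuals x clusters
perInd = vec(sum(masks, dims=2))          # number of clusters claiming each individual

unassigned = findall(perInd .== 0)        # individuals in no cluster
multiple   = findall(perInd .> 1)         # individuals in more than one cluster
println("Unassigned: ", unassigned, "; assigned more than once: ", multiple)
```

In this toy case the sixth individual falls outside every threshold window, which is the situation the "check which individuals left out" lines above are designed to reveal.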
Show GBI plot according to original groups and plot order
#PCAmodelAll.metadata.Fst_group = PCAmodelAll.metadata.original_Fst_groups
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
            genos_selectedSNPs, PCAmodelAll.metadata;
            sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
            genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
            missingFractionAllowed = missingFractionAllowed,
            indColorLeftProvided = false,
            indColorRightProvided = true);
Show the same, but with all individuals
#PCAmodelAll.metadata.Fst_group = PCAmodelAll.metadata.original_Fst_groups
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
            genos_selectedSNPs, PCAmodelAll.metadata;
            sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
            genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
            missingFractionAllowed = missingFractionAllowed,
            indColorLeftProvided = false,
            indColorRightProvided = true);
Chr 2 is complicated: it shows recombination and perhaps some haploblock sharing between east Siberia and the southwestern area. This is hard to show in a summary figure but is perhaps worth mentioning.
Same for chr 3
# choose scaffold
chr = "gw3"
positionMin, positionMax, regionText,
    windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
    genos_highViSHetRegion, pos_highViSHetRegion, regionInfo =
            getWindowedIndHetStanRegion(genosOnly_included,
                                pos_SNP_filtered,
                                highViSHetRegions, chr;
                                windowSize = 500)
# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)
# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan
#names(ind_with_metadata_included)
# check whether missing data related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)
# PCA of all individuals:
genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))
flipPC1 = false
flipPC2 = true
PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
            groups_to_plot_PCA, group_colors_PCA;
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)
display(PCAmodelAll.PCAfig)
# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]
# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`
# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.25)
#Plot only the lowIndHetStan individuals:
f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
More than 1 region on that scaffold. Using just the longest one.
Row | regionChrom | regionStart | regionEnd |
---|---|---|---|
 | String | Int64 | Int64 |
1 | gw3 | 101192949 | 103495514 |
2 | gw3 | 104554714 | 108279595 |
CairoMakie.Screen{IMAGE}
Save the individual colors in the metadata
indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
Plot PC1 vs. PC2
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
CairoMakie.Screen{IMAGE}
Plot PC1 vs. PC3
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
CairoMakie.Screen{IMAGE}
At the chr 3 high ViSHet region, there are only 4 clear haplogroups (vir and lud are separated on PC3, but not cleanly enough for me to distinguish them reliably). Divide samples into those groups, based on PCA scores, and then calculate pi and Dxy.
= ["virLud",
clusterNames "nit",
"trochObs",
"plumb"]
= ["blue",
clusterColors "grey",
"yellow",
"red"]
= (PCAmodelAll.metadata.PC1 .< -4) .&
virLud
indSelection_lowIndHetStan= (-4 .< PCAmodelAll.metadata.PC1 .< -2) .&
nit
indSelection_lowIndHetStan= (-2 .< PCAmodelAll.metadata.PC1 .< 2.5) .&
trochObs .< -3) .&
(PCAmodelAll.metadata.PC2
indSelection_lowIndHetStan= (5 .< PCAmodelAll.metadata.PC1) .&
plumb 2 .< PCAmodelAll.metadata.PC2) .&
(
indSelection_lowIndHetStan
# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[trochObs]
PCAmodelAll.metadata.Fst_group[plumb]
= [virLud nit trochObs plumb]
clusterArray
# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")
# create vectors that indicate the groups and plot order for this analysis:
= fill("none", nrow(PCAmodelAll.metadata))
clusterMembership = fill(-9, nrow(PCAmodelAll.metadata))
plotOrder for i in eachindex(clusterArray[1,:])
:,i]] .= clusterNames[i]
clusterMembership[clusterArray[:,i]] .= i
plotOrder[clusterArray[end
# Calculate allele freqs and sample sizes
= getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
freqs, sampleSizes println("Calculated population allele frequencies and sample sizes")
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false) # set among to FALSE if no among Fst wanted (some things won't work without it)
# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 4×2 DataFrame
 Row │ cluster   pi
     │ String    Float64
─────┼──────────────────────
   1 │ virLud    0.00950795
   2 │ nit       0.00509165
   3 │ trochObs  0.00992915
   4 │ plumb     0.00992294 =#
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 6×2 DataFrame
 Row │ cluster_pair     Dxy
     │ String           Float64
─────┼────────────────────────────
   1 │ virLud_nit       0.0234761
   2 │ virLud_trochObs  0.0309999
   3 │ virLud_plumb     0.0345515
   4 │ nit_trochObs     0.0320461
   5 │ nit_plumb        0.0351086
   6 │ trochObs_plumb   0.0305924 =#
# Make a genotype-by-individual plot using all variable loci in the region,
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]
# limit the number of individuals per group to plot
numIndsToPlot = fill(150, length(clusterNames))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot,
                                    genos_selectedSNPs, PCAmodelAll.metadata;
                                    sortByMissing = false)
# sort based on original_plot_order, and then together with function below will arrange individuals in population order within clusters:
sortOrder = sortperm(indMetadataforGBI.original_plot_order, rev=false)
indMetadataforGBI = indMetadataforGBI[sortOrder, :]
genosForGBI = genosForGBI[sortOrder, :]
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
    indFontSize=6, figureSize=(800, 1800),
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = true,
    indColorRightProvided = true);
The numbers in each group are [64 2 72 63] and the sum of those is 201
Calculated population allele frequencies and sample sizes
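The grouping step above can be sketched on toy data. This is a minimal illustration only: the PC scores, thresholds, and names (`pc1`, `pc2`, `lowHet`, `groupA`, and so on) are hypothetical and not values from the analysis. It shows how Boolean masks are combined into a cluster matrix, and how that matrix is turned into membership and plot-order vectors.

```julia
# Toy PC scores and a low-heterozygosity flag for 6 individuals (hypothetical values)
pc1 = [-5.0, -4.5, 0.0, 0.5, 6.0, 0.2]
pc2 = [1.0, 0.5, -4.0, -3.5, 3.0, 4.0]
lowHet = [true, true, true, true, true, false]

# Each group is a Boolean mask: a window in PC space AND low heterozygosity
groupA = (pc1 .< -4) .& lowHet
groupB = (-2 .< pc1 .< 2.5) .& (pc2 .< -3) .& lowHet
groupC = (5 .< pc1) .& (2 .< pc2) .& lowHet

toyNames = ["A", "B", "C"]
toyArray = [groupA groupB groupC]   # individuals in rows, groups in columns

# Column sums give the number of individuals per group
groupCounts = vec(sum(toyArray, dims=1))

# Convert the masks to a group name and an integer plot order per individual
toyMembership = fill("none", length(pc1))
toyPlotOrder = fill(-9, length(pc1))
for i in axes(toyArray, 2)
    toyMembership[toyArray[:, i]] .= toyNames[i]
    toyPlotOrder[toyArray[:, i]] .= i
end

println(groupCounts)
println(toyMembership)
```

The sixth toy individual falls outside every mask because its heterozygosity flag is false, so it keeps the placeholder values "none" and -9.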
Now show a GBI plot like above, but with heterozygotes
clusterNamesWithHets = ["virLud",
                        "virLudHet",
                        "nit",
                        "virLud_trochObs",
                        "trochObs",
                        "trochObsHet",
                        "plumb",
                        "plumbHet",
                        "vir_plumb"]
clusterColorsWithHets = ["blue",
                        "blue",
                        "grey",
                        "green",
                        "yellow",
                        "orange",
                        "red",
                        "red",
                        "purple"]
virLudHet = (PCAmodelAll.metadata.PC1 .< -4) .&
            (2.5 .< PCAmodelAll.metadata.PC2) .&
            .!indSelection_lowIndHetStan
virLud_trochObs = (-5 .< PCAmodelAll.metadata.PC1 .< -1.5) .&
            (-3 .< PCAmodelAll.metadata.PC2 .< 1) .&
            .!indSelection_lowIndHetStan
trochObsHet = (-2 .< PCAmodelAll.metadata.PC1 .< 2.5) .&
            (PCAmodelAll.metadata.PC2 .< -3) .&
            .!indSelection_lowIndHetStan
plumbHet = (5 .< PCAmodelAll.metadata.PC1) .&
            (2 .< PCAmodelAll.metadata.PC2) .&
            .!indSelection_lowIndHetStan
vir_plumb = (-1 .< PCAmodelAll.metadata.PC1 .< 2) .&
            (3 .< PCAmodelAll.metadata.PC2 .< 5) .&
            .!indSelection_lowIndHetStan
clusterArray = [virLud virLudHet nit virLud_trochObs trochObs trochObsHet plumb plumbHet vir_plumb]
sum(clusterArray, dims=1)
if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end
# check which individuals left out:
sum(clusterArray, dims=2)
PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]
# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end
# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets
# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
                                    genos_selectedSNPs, PCAmodelAll.metadata;
                                    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
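The check above compares the total number of group assignments with the number of individuals. A slightly stricter variant, sketched here on a hypothetical toy matrix (`toyClusterArray` is made up for illustration), counts memberships per individual, so that an individual assigned to two groups cannot mask one that was left out.

```julia
# Hypothetical cluster matrix: 5 individuals (rows) by 3 groups (columns).
# Row 3 is assigned to two groups and row 4 to none.
toyClusterArray = Bool[1 0 0;
                       0 1 0;
                       0 1 1;
                       0 0 0;
                       0 0 1]

# Number of groups each individual belongs to
membershipsPerInd = vec(sum(toyClusterArray, dims=2))

unassigned = findall(membershipsPerInd .== 0)
multiplyAssigned = findall(membershipsPerInd .> 1)

# The grand total still equals the number of individuals here,
# even though the assignment is not one group per individual
totalMatches = sum(toyClusterArray) == size(toyClusterArray, 1)

println("Total matches: ", totalMatches)
println("Unassigned rows: ", unassigned)
println("Rows in more than one group: ", multiplyAssigned)
```

In this toy case the total-count check passes, while the per-individual counts reveal both problems.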
Show just the west area (without nitidus)
clusterNamesWithHetsWest = ["virLud",
                            "virLudHet",
                            "virLud_trochObs",
                            "trochObs",
                            "trochObsHet"]
clusterColorsWithHetsWest = ["blue",
                            "blue",
                            "yellowgreen",
                            "yellow",
                            "yellow"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = fill(100, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets,
                                    genos_selectedSNPs2, PCAmodelAll.metadata;
                                    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
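The SNP filter used above keeps a site only if the allele frequency is above 0.5 in at least one group and below 0.5 in at least one other. A minimal sketch on a hypothetical frequency matrix (`toyFreqs` is made up, with groups in rows and SNPs in columns to match the indexing of `freqs` above):

```julia
# Hypothetical allele frequencies: 3 groups (rows) by 4 SNPs (columns)
toyFreqs = [0.9 0.1 0.6 0.5;
            0.2 0.3 0.7 0.5;
            0.1 0.0 0.9 0.5]

# Keep SNPs where the highest group frequency is > 0.5 and the lowest is < 0.5
keep = (vec(maximum(toyFreqs, dims=1)) .> 0.5) .& (vec(minimum(toyFreqs, dims=1)) .< 0.5)

toyFreqsKept = toyFreqs[:, keep]

println(keep)
println(size(toyFreqsKept))
```

Only the first toy SNP passes: the second never exceeds 0.5, the third never drops below 0.5, and the fourth sits exactly at 0.5 in every group, which fails both strict inequalities.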
Show just the east area
clusterNamesWithHetsEast = ["trochObs",
                            "trochObsHet",
                            "plumb",
                            "plumbHet"]
clusterColorsWithHetsEast = ["yellow",
                            "yellow",
                            "red",
                            "red"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast,
                                    genos_selectedSNPs2, PCAmodelAll.metadata;
                                    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
Show just the northern area
clusterNamesWithHetsNorth = ["virLud",
                            "virLudHet",
                            "vir_plumb",
                            "plumb",
                            "plumbHet"]
clusterColorsWithHetsNorth = ["blue",
                            "blue",
                            "purple",
                            "red",
                            "red"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsNorth))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets,
                                    genos_selectedSNPs2, PCAmodelAll.metadata;
                                    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
Same for chr 18
# choose scaffold
chr = "gw18"
positionMin, positionMax, regionText,
windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
genos_highViSHetRegion, pos_highViSHetRegion, regionInfo =
    getWindowedIndHetStanRegion(genosOnly_included,
                                pos_SNP_filtered,
                                highViSHetRegions, chr;
                                windowSize = 500)
# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)
# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan
# check whether missing data related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)
# PCA of all individuals:
genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))
flipPC1 = true
flipPC2 = true
PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
            groups_to_plot_PCA, group_colors_PCA;
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)
display(PCAmodelAll.PCAfig)
# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]
# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`
# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.55)
# Plot only the lowIndHetStan individuals:
f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Good news: 1 region on that scaffold
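The `flipPC1` and `flipPC2` flags above reverse the sign of a principal component so that the plot has a convenient orientation. The sign of a principal component is arbitrary, so flipping changes the orientation only, not the spacing between individuals. A minimal sketch with hypothetical scores (`toyValues` is made up; the components-in-rows layout is an assumption based on how `PCAmodelAll.values[1,:]` is indexed above):

```julia
# Hypothetical PC scores: components in rows, individuals in columns
toyValues = [1.5 -0.5 -1.0;
             0.2 0.8 -1.0]

flipFirst = true
toyPC1 = flipFirst ? -1 .* toyValues[1, :] : toyValues[1, :]

# Pairwise distances between individuals along PC1, before and after the flip
distBefore = [abs(a - b) for a in toyValues[1, :], b in toyValues[1, :]]
distAfter = [abs(a - b) for a in toyPC1, b in toyPC1]

println(toyPC1)
println(distBefore == distAfter)
```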
Save the individual colors in the metadata
indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
Plot PC1 vs. PC2
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Plot PC1 vs. PC3
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
In the chr 18 high ViSHet region, there are 5 clear haplogroups (vir and lud are also separated on PC3, but not clearly enough to show them as different in the summary plot). Divide samples into those groups, based on PCA scores, and calculate pi and Dxy.
clusterNames = ["virLud",
                "nit",
                "troch",
                "obs",
                "plumb"]
clusterColors = ["blue",
                "grey",
                "yellow",
                "orange",
                "red"]
virLud = (PCAmodelAll.metadata.PC1 .< -7) .&
            indSelection_lowIndHetStan
nit = (-6 .< PCAmodelAll.metadata.PC1 .< -4) .&
            indSelection_lowIndHetStan
troch = (2 .< PCAmodelAll.metadata.PC1 .< 5) .&
            (PCAmodelAll.metadata.PC2 .< -5) .&
            indSelection_lowIndHetStan
obs = (2 .< PCAmodelAll.metadata.PC1 .< 5) .&
            (-5 .< PCAmodelAll.metadata.PC2 .< -2) .&
            indSelection_lowIndHetStan
plumb = (4 .< PCAmodelAll.metadata.PC1) .&
            (3 .< PCAmodelAll.metadata.PC2) .&
            indSelection_lowIndHetStan
# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]
clusterArray = [virLud nit troch obs plumb]
# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")
# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false) # set among to FALSE if no among Fst wanted (some things won't work without it)
# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 5×2 DataFrame
 Row │ cluster  pi
     │ String   Float64
─────┼─────────────────────
   1 │ virLud   0.0110074
   2 │ nit      0.00453689
   3 │ troch    0.00973106
   4 │ obs      0.0123218
   5 │ plumb    0.00925472 =#
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 10×2 DataFrame
 Row │ cluster_pair  Dxy
     │ String        Float64
─────┼─────────────────────────
   1 │ virLud_nit    0.0263493
   2 │ virLud_troch  0.0361335
   3 │ virLud_obs    0.0359267
   4 │ virLud_plumb  0.0395363
   5 │ nit_troch     0.0371472
   6 │ nit_obs       0.0377076
   7 │ nit_plumb     0.0400618
   8 │ troch_obs     0.0169656
   9 │ troch_plumb   0.0287838
  10 │ obs_plumb     0.0290661 =#
# Make a genotype-by-individual plot using all variable loci in the region,
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]
# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot,
                                    genos_selectedSNPs, PCAmodelAll.metadata;
                                    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
    missingFractionAllowed = missingFractionAllowed,
    indColorRightProvided = true);
The numbers in each group are [72 2 76 4 70] and the sum of those is 224
Calculated population allele frequencies and sample sizes
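The helper functions `getSitePi` and `getDxy` are defined elsewhere and are not shown on this page. As a rough guide to what per-site pi and Dxy measure, here is a minimal sketch using common textbook estimators for biallelic SNPs. The formulas, the toy frequencies, and the assumption that the sample size counts chromosomes are illustrative assumptions, and they may differ from what the actual functions do. How `getRegionPi` and `getRegionDxy` average over the region is likewise not shown, so it is left out.

```julia
# Hypothetical alternate-allele frequencies at 3 SNPs in two groups
p1 = [0.0, 0.5, 1.0]
p2 = [0.0, 0.5, 0.0]
n1 = 10   # number of sampled chromosomes in group 1 (assumed)

# Per-site pi: probability that two distinct sampled chromosomes
# from the same group carry different alleles
toySitePi1 = 2 .* p1 .* (1 .- p1) .* (n1 / (n1 - 1))

# Per-site Dxy: probability that one chromosome drawn from each
# group carries different alleles
toySiteDxy = p1 .* (1 .- p2) .+ p2 .* (1 .- p1)

println(toySitePi1)
println(toySiteDxy)
```

A site fixed for different alleles in the two groups (the third toy SNP) contributes nothing to pi within group 1 but contributes 1 to Dxy, which is why Dxy between haplogroups can greatly exceed pi within them.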
Now show a GBI plot like above, but with heterozygotes
clusterNamesWithHets = ["virLud",
                        "nit",
                        "virLud_troch",
                        "troch",
                        "obs",
                        "obs_plumb",
                        "plumb",
                        "vir_plumb"]
clusterColorsWithHets = ["blue",
                        "grey",
                        "green",
                        "yellow",
                        "orange",
                        "darkorange1",
                        "red",
                        "purple"]
virLud_troch = (-5.5 .< PCAmodelAll.metadata.PC1 .< -0.5) .&
            (-4 .< PCAmodelAll.metadata.PC2 .< -0.25) .&
            .!indSelection_lowIndHetStan
obs_plumb = (4 .< PCAmodelAll.metadata.PC1 .< 5) .&
            (0 .< PCAmodelAll.metadata.PC2 .< 2) .&
            .!indSelection_lowIndHetStan
vir_plumb = (-3 .< PCAmodelAll.metadata.PC1 .< -1) .&
            (2.5 .< PCAmodelAll.metadata.PC2 .< 5) .&
            .!indSelection_lowIndHetStan
clusterArray = [virLud nit virLud_troch troch obs obs_plumb plumb vir_plumb]
sum(clusterArray, dims=1)
if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end
# check which individuals left out:
sum(clusterArray, dims=2)
PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]
# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end
# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets
# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
                                    genos_selectedSNPs, PCAmodelAll.metadata;
                                    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
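The cluster column added above takes its name from `chr`, which is why the code builds a command string and evaluates it. A possible alternative, sketched here on a hypothetical toy table (`toyMetadata`, `toyChr`, and `toyClusters` are made up), is to index the DataFrame with a constructed column name, which avoids `eval`. This is offered as an assumption about an equivalent approach, not as the method used in the paper's scripts.

```julia
using DataFrames

# Hypothetical stand-ins for the metadata table and the cluster assignments
toyMetadata = DataFrame(ind = ["ind1", "ind2", "ind3"])
toyChr = "gw18"
toyClusters = ["virLud", "troch", "plumb"]

# Build the column name as a string and assign the new column directly
toyMetadata[!, toyChr * "_cluster"] = toyClusters

println(names(toyMetadata))
```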
Show GBI plot according to original groups and plot order
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
                                    genos_selectedSNPs, PCAmodelAll.metadata;
                                    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Show same but with all individuals
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
                                    genos_selectedSNPs, PCAmodelAll.metadata;
                                    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Show same but with only vir and plumb pops
includeTheseClusters = ["virLud", "plumb"] # these are the haplotype clusters to include in the choice below of SNPs to show
# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]
plotGroups = ["vir", "plumb", "plumb_vir"] # these are the original Fst_groups
plotGroupColors = ["blue", "red", "purple"]
metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups
plotGenotypeByIndividual(regionInfo, posForGBI,
    genosForGBI, metadataForGBI, freqsForGBI, plotGroups, plotGroupColors;
    missingFractionAllowed = missingFractionAllowed)
Show just the west area (without nitidus)
clusterNamesWithHetsWest = ["virLud",
                            "virLud_troch",
                            "troch"]
clusterColorsWithHetsWest = ["blue",
                            "green",
                            "yellow"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsWest))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets,
                                    genos_selectedSNPs2, PCAmodelAll.metadata;
                                    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
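The SNP filter in the cell above keeps only sites where the allele frequency is above 0.5 in at least one of the listed clusters and below 0.5 in at least one other. Here is a minimal sketch of that filter on a made-up frequency matrix (rows are clusters, columns are SNPs). The names `freqs_toy` and `keep` are illustrative and are not objects from this analysis.

```julia
# Toy allele-frequency matrix: 3 clusters (rows) × 4 SNPs (columns); values are made up.
freqs_toy = [0.9 0.1 0.8 0.6;
             0.2 0.3 0.7 0.4;
             0.1 0.0 0.9 0.5]

# Keep a SNP if its highest cluster frequency is > 0.5 and its lowest is < 0.5.
keep = (vec(maximum(freqs_toy, dims=1)) .> 0.5) .& (vec(minimum(freqs_toy, dims=1)) .< 0.5)

println(keep)                  # which SNP columns pass the filter
println(freqs_toy[:, keep])    # frequencies at the retained SNPs only
```

In this toy matrix, the second SNP is dropped because it is rare in every cluster and the third because it is common in every cluster. The SNPs that remain are the ones that distinguish the clusters being compared.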
Show just the east area
clusterNamesWithHetsEast = ["obs",
                            "obs_plumb",
                            "plumb"]
clusterColorsWithHetsEast = ["yellow",
                            "darkorange1",
                            "red"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show just the northern area
clusterNamesWithHetsNorth = ["virLud",
                            "vir_plumb",
                            "plumb"]
clusterColorsWithHetsNorth = ["blue",
                            "purple",
                            "red"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsNorth))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Chr 12 was also tried, but the result is messy: the ludlowi samples fall in all clusters, even plumb. This region would be worth a closer look in the future.
Same for chr 13
# choose scaffold
chr = "gw13"

positionMin, positionMax, regionText,
windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = getWindowedIndHetStanRegion(genosOnly_included,
    pos_SNP_filtered,
    highViSHetRegions, chr;
    windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

# check whether missing data related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:
genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))
flipPC1 = true
flipPC2 = true
PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
    groups_to_plot_PCA, group_colors_PCA;
    sampleSet = "greenish warblers", regionText = regionText,
    flip1 = flipPC1, flip2 = flipPC2,
    lineOpacity = 0.7, fillOpacity = 0.6,
    symbolSize = 14, showTitle = true,
    xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
    showPlot = false)
display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.75)

#Plot only the lowIndHetStan individuals:
f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
More than 1 region on that scaffold. Using just the longest one.
| Row | regionChrom | regionStart | regionEnd |
|---|---|---|---|
| | String | Int64 | Int64 |
| 1 | gw13 | 13574177 | 13722280 |
| 2 | gw13 | 14099239 | 15243036 |
| 3 | gw13 | 15413381 | 15607553 |
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
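The cell above names the new metadata column after `chr` by assembling a line of code as a string and running it with `eval(Meta.parse(...))`. As a hedged aside, DataFrames also accepts a string as a column index, so a dynamically named column of the same kind can be added without `eval`. The sketch below uses made-up values; `df_toy` and `vals_toy` are hypothetical stand-ins for `ind_with_metadata_included` and `meanAcrossRegionIndHetStan`.

```julia
using DataFrames

chr = "gw13"
df_toy = DataFrame(ind = ["a", "b", "c"])   # hypothetical stand-in for the metadata table
vals_toy = [0.8, 2.1, 1.2]                  # hypothetical per-individual values

# Build the column name as a string and assign the whole column in one step.
df_toy[!, chr * "_regionIndHetStan"] = vals_toy

println(names(df_toy))
```

Either approach should give a column called `gw13_regionIndHetStan`. The string-index form avoids evaluating constructed code at global scope.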
Save the individual colors in the metadata
indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
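The loop above finds each individual's group in `groups_to_plot_PCA` and takes the color at the same position in `group_colors_PCA`. One possible alternative, sketched here on made-up data, is to build a `Dict` from group to color once and index it for each individual. The names `groups_toy`, `colors_toy`, `indGroups_toy`, and `colorOf` are hypothetical.

```julia
groups_toy = ["vir", "troch", "plumb"]             # made-up group list
colors_toy = ["blue", "yellow", "red"]             # one color per group, same order
indGroups_toy = ["plumb", "vir", "vir", "troch"]   # made-up group of each individual

# Map group name to color, then look up each individual's color.
colorOf = Dict(zip(groups_toy, colors_toy))
indColors_toy = [colorOf[g] for g in indGroups_toy]
println(indColors_toy)
```

Both versions stop with an error if an individual's group is missing from the group list, which is consistent with the earlier comment asking that all individuals in the `plotPCA` call be included in `groups_to_plot_PCA`.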
Plot PC1 vs. PC2
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
Plot PC1 vs. PC3
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
In the chr 13 high ViSHet region there are 6 clear haplogroups (vir and lud are separated cleanly on PC3, with one heterozygote between them). Divide the samples into those groups based on their PCA scores, then calculate pi and Dxy.
clusterNames = ["vir",
                "nit",
                "lud",
                "troch",
                "obs",
                "plumb"]
clusterColors = ["blue",
                "grey",
                "green",
                "yellow",
                "orange",
                "red"]
vir = (PCAmodelAll.metadata.PC1 .< -4) .&
        (2 .< PCAmodelAll.metadata.PC3) .&
        indSelection_lowIndHetStan
nit = (-4 .< PCAmodelAll.metadata.PC1 .< -1) .&
        (2.5 .< PCAmodelAll.metadata.PC2 .< 3.5) .&
        indSelection_lowIndHetStan
lud = (PCAmodelAll.metadata.PC1 .< -5) .&
        (PCAmodelAll.metadata.PC3 .< -2) .&
        indSelection_lowIndHetStan
troch = (-2 .< PCAmodelAll.metadata.PC1 .< 0) .&
        (PCAmodelAll.metadata.PC2 .< -5.5) .&
        indSelection_lowIndHetStan
obs = (-1 .< PCAmodelAll.metadata.PC1 .< 3) .&
        (-5.5 .< PCAmodelAll.metadata.PC2 .< -2.5) .&
        indSelection_lowIndHetStan
plumb = (6 .< PCAmodelAll.metadata.PC1) .&
        (1 .< PCAmodelAll.metadata.PC2) .&
        indSelection_lowIndHetStan
# check the individuals in each group
PCAmodelAll.metadata.Fst_group[vir]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[lud]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]
clusterArray = [vir nit lud troch obs plumb]
# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")
# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false) # set among to FALSE if no among Fst wanted (some things won't work without it)
# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 6×2 DataFrame
 Row │ cluster  pi
     │ String   Float64
─────┼─────────────────────
   1 │ vir      0.00875059
   2 │ nit      0.00517962
   3 │ lud      0.00819617
   4 │ troch    0.00565913
   5 │ obs      0.0090813
   6 │ plumb    0.00929977 =#
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 15×2 DataFrame
 Row │ cluster_pair  Dxy
     │ String        Float64
─────┼─────────────────────────
   1 │ vir_nit       0.035675
   2 │ vir_lud       0.0188542
   3 │ vir_troch     0.0297034
   4 │ vir_obs       0.028434
   5 │ vir_plumb     0.0382774
   6 │ nit_lud       0.0377189
   7 │ nit_troch     0.0437711
   8 │ nit_obs       0.0424561
   9 │ nit_plumb     0.0482994
  10 │ lud_troch     0.0303352
  11 │ lud_obs       0.0294719
  12 │ lud_plumb     0.0394332
  13 │ troch_obs     0.0124742
  14 │ troch_plumb   0.0313941
  15 │ obs_plumb     0.0300717 =#
# Make a genotype-by-individual plot using all variable loci in the region,
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]
# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
    missingFractionAllowed = missingFractionAllowed,
    indColorRightProvided = true);
The numbers in each group are [38 2 41 67 5 68] and the sum of those is 221
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
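For orientation, per-site within-group diversity (pi) and between-group distance (Dxy) can be written in terms of allele frequencies. The sketch below uses common textbook forms for a biallelic site. It is an assumption that these correspond to what `getSitePi` and `getDxy` compute, since their source is not shown on this page. The names `site_pi` and `site_dxy` are hypothetical.

```julia
# Hypothetical helpers; NOT the functions called in this notebook.
# p: frequency of one allele at a site within a group
# n: number of sampled allele copies in that group (2 × diploid individuals)
site_pi(p, n) = 2 * p * (1 - p) * n / (n - 1)

# p1, p2: frequency of the same allele in two different groups
site_dxy(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

println(site_pi(0.5, 10))      # within-group diversity at an intermediate-frequency site
println(site_dxy(1.0, 0.0))    # groups fixed for different alleles
println(site_dxy(0.5, 0.5))    # groups with identical intermediate frequencies
```

Under these forms, Dxy at a site is highest when two groups are fixed for different alleles, and pi is highest when an allele is at intermediate frequency within a group.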
Show a GBI plot like above, but with heterozygotes
clusterNamesWithHets = ["vir",
                        "vir_lud",
                        "nit",
                        "lud",
                        "lud_troch",
                        "troch",
                        "obs",
                        "plumb",
                        "plumbHet",
                        "vir_plumb"]
clusterColorsWithHets = ["blue",
                        "seagreen",
                        "grey",
                        "green",
                        "green2",
                        "yellow",
                        "orange",
                        "red",
                        "red",
                        "purple"]
vir_lud = (PCAmodelAll.metadata.PC1 .< -5) .&
        (-1 .< PCAmodelAll.metadata.PC3 .< 1)
lud_troch = (-5 .< PCAmodelAll.metadata.PC1 .< -2) .&
        (-3.5 .< PCAmodelAll.metadata.PC2 .< 0) .&
        .!indSelection_lowIndHetStan
plumbHet = (7 .< PCAmodelAll.metadata.PC1) .&
        (1 .< PCAmodelAll.metadata.PC2) .&
        .!indSelection_lowIndHetStan
vir_plumb = (1 .< PCAmodelAll.metadata.PC1 .< 4) .&
        (2 .< PCAmodelAll.metadata.PC2 .< 5) .&
        .!indSelection_lowIndHetStan

clusterArray = [vir vir_lud nit lud lud_troch troch obs plumb plumbHet vir_plumb]
sum(clusterArray, dims=1)
if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end
# check which individuals left out:
sum(clusterArray, dims=2)
PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]
# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end
# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets
# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
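The cell above turns a Boolean matrix (one column per haplotype cluster) into a membership label and a plot order for each individual, then checks that the total number of assignments equals the number of individuals. Here is a small sketch of the same idea on made-up data; all `_toy` names and `perInd` are hypothetical. It also counts assignments per individual, because a matching grand total can hide one individual placed in two clusters while another is placed in none.

```julia
# Toy example: 4 individuals and two clusters (made-up assignments).
names_toy = ["A", "B"]
inA = [true, false, false, true]
inB = [false, true, false, true]   # individual 4 deliberately falls in both clusters
clusterArray_toy = [inA inB]       # 4×2 Bool matrix, one column per cluster

# Later columns overwrite earlier ones, so individual 4 ends up labelled "B".
membership_toy = fill("none", 4)
plotOrder_toy = fill(-9, 4)
for i in axes(clusterArray_toy, 2)
    membership_toy[clusterArray_toy[:, i]] .= names_toy[i]
    plotOrder_toy[clusterArray_toy[:, i]] .= i
end

perInd = vec(sum(clusterArray_toy, dims=2))   # number of clusters claiming each individual
println(membership_toy)
println("unassigned: ", findall(perInd .== 0))
println("in more than one cluster: ", findall(perInd .> 1))
println("grand total matches number of individuals: ", sum(perInd) == 4)
```

In this toy case the grand total matches even though one individual is unassigned and another is assigned twice. Checking that every entry of `perInd` equals 1 is a slightly stricter test than comparing totals.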
Show GBI plot according to original groups and plot order
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show same but with all individuals
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show same but with only vir and plumb pops
includeTheseClusters = ["vir", "plumb"] # these are the haplotype clusters to include in the choice below of SNPs to show
# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]
plotGroups = ["vir", "plumb", "plumb_vir"] # these are the original Fst_groups
plotGroupColors = ["blue", "red", "purple"]
metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups
plotGenotypeByIndividual(regionInfo, posForGBI,
    genosForGBI, metadataForGBI, freqsForGBI, plotGroups, plotGroupColors;
    missingFractionAllowed = missingFractionAllowed)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
(Scene (768px, 960px):
  0 Plots
  2 Child Scenes:
    ├ Scene (768px, 960px)
    └ Scene (768px, 960px),
 Union{Missing, Int16}[0 0 … 0 0; 0 0 … 0 0; … ; 2 2 … 2 2; 2 2 … 2 2],
 [14109196, 14121642, 14127880, 14152299, 14191715, 14212308, 14234576, 14246931, 14261718, 14282920 … 15059893, 15061446, 15061449, 15084470, 15089863, 15108049, 15127266, 15134320, 15177202, 15177253],
 100×34 DataFrame
 Row │ ind                        ID                         location  group   ⋯
     │ String                     String                     String7   String1 ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ GW_Armando_plate1_JF12G04  GW_Armando_plate1_JF12G04  ST_vi     vir     ⋯
   2 │ GW_Armando_plate2_JF03G01  GW_Armando_plate2_JF03G01  ST_vi     vir_mis
   3 │ GW_Armando_plate2_JF30G01  GW_Armando_plate2_JF30G01  ST_vi     vir_mis
   4 │ GW_Lane5_STvi1             GW_Lane5_STvi1             ST_vi     vir
   5 │ GW_Lane5_STvi2             GW_Lane5_STvi2             ST_vi     vir     ⋯
   6 │ GW_Lane5_STvi3             GW_Lane5_STvi3             ST_vi     vir
   7 │ GW_Armando_plate1_JF16G01  GW_Armando_plate1_JF16G01  DV_vi     plumb_v
   8 │ GW_Armando_plate2_JF16G02  GW_Armando_plate2_JF16G02  DV_vi     plumb_v
   9 │ GW_Armando_plate2_JE31G01  GW_Armando_plate2_JE31G01  VB_vi     vir_mis ⋯
  10 │ GW_Armando_plate2_JF03G02  GW_Armando_plate2_JF03G02  VB_vi     vir_mis
  11 │ GW_Lane5_YK11              GW_Lane5_YK11              YK        vir
  ⋮  │             ⋮                          ⋮                 ⋮         ⋮    ⋱
  91 │ GW_Armando_plate2_JF24G01  GW_Armando_plate2_JF24G01  VB        plumb
  92 │ GW_Armando_plate2_JF25G01  GW_Armando_plate2_JF25G01  VB        plumb   ⋯
  93 │ GW_Armando_plate1_JG02G02  GW_Armando_plate1_JG02G02  PR        plumb
  94 │ GW_Armando_plate1_JG02G04  GW_Armando_plate1_JG02G04  PR        plumb
  95 │ GW_Armando_plate2_JG01G01  GW_Armando_plate2_JG01G01  PR        plumb
  96 │ GW_Armando_plate2_JG02G01  GW_Armando_plate2_JG02G01  PR        plumb   ⋯
  97 │ GW_Armando_plate2_JG02G03  GW_Armando_plate2_JG02G03  PR        plumb
  98 │ GW_Lane5_SL1               GW_Lane5_SL1               SL        plumb
  99 │ GW_Lane5_SL2               GW_Lane5_SL2               SL        plumb
 100 │ GW_Armando_plate1_JF10G03  GW_Armando_plate1_JF10G03  ST        plumb_v ⋯
                                                  31 columns and 79 rows omitted)
Show just the west area (without nitidus)
clusterNamesWithHetsWest = ["vir",
                            "vir_lud",
                            "lud",
                            "lud_troch",
                            "troch"]
clusterColorsWithHetsWest = ["blue",
                            "seagreen",
                            "green",
                            "green2",
                            "yellow"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsWest))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show just the east area
clusterNamesWithHetsEast = ["obs",
                            "plumb",
                            "plumbHet"]
clusterColorsWithHetsEast = ["orange",
                            "red",
                            "red"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show just the northern area
clusterNamesWithHetsNorth = ["vir",
                            "vir_plumb",
                            "plumb",
                            "plumbHet"]
clusterColorsWithHetsNorth = ["blue",
                            "purple",
                            "red",
                            "red"]
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsNorth))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Do a PCA based on a same-size region elsewhere on gw13 (with low ViSHet):
# get length of region
lengthHighViSHetRegion = positionMax - positionMin
leftLocus = 1_000_000 # start at 1 Mb from left side
rightLocus = leftLocus + lengthHighViSHetRegion
regionText_lowViSHetRegion = string("chr ", chr, " ",leftLocus," to ",rightLocus)
lociSelection = (leftLocus .<= pos_region.position .<= rightLocus)
genotypes_lowViSHetRegion = genotypes_region[:, lociSelection]
# impute missing genotypes:
genotypes_lowViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genotypes_lowViSHetRegion))
flipPC1 = true
flipPC2 = true
PCAmodel = plotPCA(genotypes_lowViSHetRegion_imputed, ind_with_metadata_included,
    groups_to_plot_PCA, group_colors_PCA;
    sampleSet = "greenish warblers", regionText = regionText_lowViSHetRegion,
    flip1 = flipPC1, flip2 = flipPC2,
    lineOpacity = 0.7, fillOpacity = 0.6,
    symbolSize = 14, showTitle = true,
    xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
    showPlot = false)
display(PCAmodel.PCAfig)
if false # set to true to save plot
    save("FigureS2B_gw13_nonHLBRarbitrary_from_Julia.png", PCAmodel.PCAfig, px_per_unit = 2.0)
end
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
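The comparison region in the cell above is defined by taking a window of the same length as the high ViSHet region, starting 1 Mb from the left end of the scaffold, and selecting the SNPs whose positions fall inside it. Here is a minimal sketch of that selection on made-up positions. All `_toy` names, `regionLength`, and `inWindow` are hypothetical, and the window length is taken from the `regionStart` and `regionEnd` of the longest gw13 region in the table printed earlier, which may differ slightly from `positionMax - positionMin`.

```julia
# Made-up SNP positions on one scaffold (bp).
positions_toy = [500_000, 1_000_000, 1_600_000, 2_143_797, 2_500_000]

# Length of the longest gw13 region listed earlier (regionEnd - regionStart).
regionLength = 15_243_036 - 14_099_239

leftLocus_toy = 1_000_000                        # start 1 Mb from the left end
rightLocus_toy = leftLocus_toy + regionLength    # same-length window

# Chained broadcast comparison: true where a SNP lies inside the window (inclusive).
inWindow = leftLocus_toy .<= positions_toy .<= rightLocus_toy
println(regionLength)
println(inWindow)
```

Matching the window length keeps the physical size of the two regions the same, although the number of SNPs in each window can still differ.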
Chr 14 was also tried, but the pattern is not very clear; it looks like recombinant backcrosses in ludlowi.
Same for chr 17
This region shows a pattern also seen on chr 2 (and others), in which ludlowi samples carry some plumb haplotypes.
# choose scaffold
chr = "gw17"

positionMin, positionMax, regionText,
windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = getWindowedIndHetStanRegion(genosOnly_included,
    pos_SNP_filtered,
    highViSHetRegions, chr;
    windowSize = 500)

# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)

# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan

# check whether missing data related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)

# PCA of all individuals:
genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))
flipPC1 = false
flipPC2 = false
PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
    groups_to_plot_PCA, group_colors_PCA;
    sampleSet = "greenish warblers", regionText = regionText,
    flip1 = flipPC1, flip2 = flipPC2,
    lineOpacity = 0.7, fillOpacity = 0.6,
    symbolSize = 14, showTitle = true,
    xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
    showPlot = false)
display(PCAmodelAll.PCAfig)

# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]

# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`

# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.4)
#Plot only the lowIndHetStan individuals:
= CairoMakie.Figure();
f = Axis(f[1, 1],
ax = "PC1 vs. PC2, only low heterozygosity",
title = "Region PC1", xlabelsize = 24,
xlabel = "Region PC2", ylabelsize = 24,
ylabel = 1)
autolimitaspect hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
= (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
selection scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
CairoMakie.end
display(f)
Good news: 1 region on that scaffold
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
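The `flipPC1` and `flipPC2` switches only change the sign of a principal component, which is arbitrary in any PCA. The sketch below uses a made-up matrix and a plain SVD from `LinearAlgebra` (an assumption for illustration; it is not the `plotPCA` function used on this page) to show that flipping mirrors the plot without changing the spread of scores or the distances between individuals. Rows are individuals here, whereas `PCAmodelAll.values` above stores PCs as rows.

```julia
using LinearAlgebra, Statistics

# Toy genotype-like matrix: 6 individuals (rows) x 4 loci (columns); made-up values.
X = Float64[0 0 2 2;
            0 1 2 2;
            1 0 2 1;
            2 2 0 0;
            2 1 0 0;
            1 2 0 1]

# Center each locus, then take the SVD; scores are individual coordinates on each PC.
Xc = X .- mean(X, dims=1)
F = svd(Xc)
scores = F.U * Diagonal(F.S)   # individuals x PCs

# Flipping a PC multiplies its scores by -1.
flipPC1 = true
pc1 = flipPC1 ? -1 .* scores[:, 1] : scores[:, 1]

# The variance and all pairwise distances along PC1 are unchanged by the flip.
sameVariance = var(pc1) ≈ var(scores[:, 1])
sameDistances = abs.(pc1 .- pc1') ≈ abs.(scores[:, 1] .- scores[:, 1]')
println("same variance: ", sameVariance, "; same distances: ", sameDistances)
```

Because the sign carries no information, any thresholds on PC scores used to define clusters have to be chosen after the flip settings are fixed.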
Save the individual colors in the metadata
indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
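The color assignment above relies on `findfirst`, which returns `nothing` when a group is not in `groups_to_plot_PCA`, so indexing then fails. A possible alternative, sketched here with made-up group names and colors (the variable names `groups`, `colors`, `indGroups`, and `colorOf` are hypothetical), is to build a lookup once and supply a fallback color.

```julia
# Hypothetical group names and colors, standing in for groups_to_plot_PCA and group_colors_PCA.
groups = ["vir", "troch", "plumb"]
colors = ["blue", "yellow", "red"]

# Build the lookup once instead of searching the group list for every individual.
colorOf = Dict(zip(groups, colors))

# Hypothetical group label for each individual.
indGroups = ["plumb", "vir", "vir", "troch"]

# get(...) with a default avoids an error when a label is missing from the lookup.
indColors = [get(colorOf, g, "black") for g in indGroups]

println(indColors)
```

Whether a fallback color or an error is preferable depends on the analysis: an error makes a missing group obvious, while a fallback keeps the plot running.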
Plot PC1 vs. PC2
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
Plot PC1 vs. PC3
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
clusterNames = ["virLud",
    "nit",
    "troch",
    "obs",
    "plumb"]
clusterColors = ["blue",
    "grey",
    "yellow",
    "orange",
    "red"]
virLud = (PCAmodelAll.metadata.PC1 .< -6.5) .&
    (0.5 .< PCAmodelAll.metadata.PC2 .< 4) .&
    indSelection_lowIndHetStan
nit = (-6 .< PCAmodelAll.metadata.PC1 .< -4.5) .&
    (0.5 .< PCAmodelAll.metadata.PC2 .< 1.5) .&
    indSelection_lowIndHetStan
troch = (2 .< PCAmodelAll.metadata.PC1 .< 4.5) .&
    (PCAmodelAll.metadata.PC2 .< -4) .&
    indSelection_lowIndHetStan
obs = (1.5 .< PCAmodelAll.metadata.PC1 .< 5) .&
    (-3.5 .< PCAmodelAll.metadata.PC2 .< 2.5) .&
    indSelection_lowIndHetStan
plumb = (3.5 .< PCAmodelAll.metadata.PC1) .&
    (3 .< PCAmodelAll.metadata.PC2) .&
    indSelection_lowIndHetStan
# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]
clusterArray = [virLud nit troch obs plumb]
# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")
# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false) # set among to FALSE if no among Fst wanted (some things won't work without it)
# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 5×2 DataFrame
 Row │ cluster  pi
     │ String   Float64
─────┼─────────────────────
   1 │ virLud   0.0116354
   2 │ nit      0.0010142
   3 │ troch    0.00706002
   4 │ obs      0.0162496
   5 │ plumb    0.00402182 =#
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 10×2 DataFrame
 Row │ cluster_pair  Dxy
     │ String        Float64
─────┼─────────────────────────
   1 │ virLud_nit    0.021495
   2 │ virLud_troch  0.0329751
   3 │ virLud_obs    0.0328026
   4 │ virLud_plumb  0.0354124
   5 │ nit_troch     0.0339931
   6 │ nit_obs       0.0337221
   7 │ nit_plumb     0.036556
   8 │ troch_obs     0.0201926
   9 │ troch_plumb   0.0240156
  10 │ obs_plumb     0.0195389 =#
# Make a genotype-by-individual plot using all variable loci in the region,
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]
# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
    missingFractionAllowed = missingFractionAllowed,
    indColorRightProvided = true);
The numbers in each group are [73 2 62 3 64] and the sum of those is 204
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
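The pi and Dxy values above come from `getSitePi`, `getDxy`, `getRegionPi`, and `getRegionDxy`, whose source is not shown on this page. As a rough guide to what such statistics measure, the sketch below applies the textbook per-site estimators to made-up allele frequencies. The function names `sitePiSketch` and `dxySketch` are hypothetical, and the project's functions may differ, for example in how they treat missing data or which sites enter the regional average.

```julia
# Allele frequencies at 3 loci (columns) in 2 clusters (rows); made-up values.
freqs = [0.0 0.5 1.0;
         1.0 0.5 1.0]
# Number of diploid individuals genotyped per cluster and locus (so 2N sampled alleles).
sampleSizes = [10 10 10;
               10 10 10]

# Per-site pi with the usual small-sample correction: 2p(1-p) * n/(n-1), n = number of alleles.
function sitePiSketch(p, N)
    n = 2 .* N
    return 2 .* p .* (1 .- p) .* n ./ (n .- 1)
end

# Per-site Dxy between two clusters: probability that one allele drawn from each differs.
dxySketch(p1, p2) = p1 .* (1 .- p2) .+ p2 .* (1 .- p1)

pi_sites = sitePiSketch(freqs, sampleSizes)
dxy_sites = dxySketch(freqs[1, :], freqs[2, :])

# In this sketch, region-level values are simple means across the sites supplied.
region_pi = vec(sum(pi_sites, dims=2)) ./ size(pi_sites, 2)
region_dxy = sum(dxy_sites) / length(dxy_sites)
println("pi per cluster: ", region_pi, "; Dxy: ", region_dxy)
```

Under these definitions, Dxy between two haplotype clusters exceeds pi within either cluster whenever the clusters carry different alleles at many sites, which is the pattern seen in the tables above.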
Now show a GBI plot like above, but with heterozygotes:
clusterNamesWithHets = ["virLud",
    "nit",
    "virLud_troch",
    "troch",
    "virLud_obs",
    "obs",
    "troch_plumb",
    "plumb",
    "vir_plumb"]
clusterColorsWithHets = ["blue",
    "grey",
    "green",
    "yellow",
    "olive",
    "orange",
    "coral",
    "red",
    "purple"]
virLud_troch = (-5 .< PCAmodelAll.metadata.PC1 .< 0) .&
    (-4 .< PCAmodelAll.metadata.PC2 .< -0.7) .&
    .!indSelection_lowIndHetStan
virLud_obs = (-3 .< PCAmodelAll.metadata.PC1 .< -1) .&
    (-0.5 .< PCAmodelAll.metadata.PC2 .< 0) .&
    .!indSelection_lowIndHetStan
troch_plumb = (1.5 .< PCAmodelAll.metadata.PC1 .< 5.5) .&
    (-3 .< PCAmodelAll.metadata.PC2 .< 2) .&
    .!indSelection_lowIndHetStan
vir_plumb = (-2.5 .< PCAmodelAll.metadata.PC1 .< 1) .&
    (2 .< PCAmodelAll.metadata.PC2 .< 5) .&
    .!indSelection_lowIndHetStan
clusterArray = [virLud nit virLud_troch troch virLud_obs obs troch_plumb plumb vir_plumb]
sum(clusterArray, dims=1)
if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end
# check which individuals left out:
sum(clusterArray, dims=2)
PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]
# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end
# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets
# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(100, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
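The check above compares the total number of cluster assignments with the number of individuals. A matching total can in principle hide one unassigned individual offset by another that falls in two boxes, so a per-individual count is a slightly stricter test. The sketch below uses made-up PC scores and hypothetical cluster names (`homA`, `hetAC`, and so on) to show the idea.

```julia
# Hypothetical PC scores and a low-heterozygosity flag for 6 individuals.
PC1 = [-7.0, -5.0, 3.0, 4.0, -2.0, 0.5]
PC2 = [ 2.0,  1.0, -5.0, 4.0, -2.0, 0.1]
lowHet = [true, true, true, true, false, false]

# Each cluster is a rectangle in PC space, restricted by heterozygosity class.
homA = (PC1 .< -6.5) .& (0.5 .< PC2 .< 4) .& lowHet
homB = (-6 .< PC1 .< -4.5) .& (0.5 .< PC2 .< 1.5) .& lowHet
homC = (2 .< PC1 .< 4.5) .& (PC2 .< -4) .& lowHet
homD = (3.5 .< PC1) .& (3 .< PC2) .& lowHet
hetAC = (-5 .< PC1 .< 0) .& (-4 .< PC2 .< -0.7) .& .!lowHet

clusterArray = [homA homB homC homD hetAC]   # individuals x clusters

# Row sums give the number of clusters claiming each individual: 0 = unassigned, >1 = overlap.
perInd = vec(sum(clusterArray, dims=2))
unassigned = findall(==(0), perInd)
overlapping = findall(>(1), perInd)
println("unassigned: ", unassigned, "; overlapping: ", overlapping)
```

Listing the unassigned and overlapping indices separately also shows which thresholds need adjusting.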
Chr 17 above shows an interesting pattern in which the plumb haplotype is more widespread (it is found in ludlowi), and the obscuratus pattern is complex, with recombination. I checked chr 17 carefully in the summary plot, and it looks good.
Same for chr 19
# choose scaffold
chr = "gw19"
positionMin, positionMax, regionText,
windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
genos_highViSHetRegion, pos_highViSHetRegion, regionInfo =
    getWindowedIndHetStanRegion(genosOnly_included,
        pos_SNP_filtered,
        highViSHetRegions, chr;
        windowSize = 500)
# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)
# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan
# check whether missing data related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)
# PCA of all individuals:
genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))
flipPC1 = true
flipPC2 = true
PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
    groups_to_plot_PCA, group_colors_PCA;
    sampleSet = "greenish warblers", regionText = regionText,
    flip1 = flipPC1, flip2 = flipPC2,
    lineOpacity = 0.7, fillOpacity = 0.6,
    symbolSize = 14, showTitle = true,
    xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
    showPlot = false)
display(PCAmodelAll.PCAfig)
# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]
# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`
# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.5)
# Plot only the lowIndHetStan individuals:
f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Good news: 1 region on that scaffold
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
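The scatterplot of `numMissings` against `meanAcrossRegionIndHetStan` in the cell above is judged by eye. A numerical companion to that check, sketched here with made-up values and the `cor` function from the `Statistics` standard library, is the Pearson correlation between the two vectors; the variable name `meanIndHetStan` is a stand-in.

```julia
using Statistics

# Made-up per-individual values: number of missing genotypes and mean standardized heterozygosity.
numMissings = [120, 340, 90, 410, 205, 150]
meanIndHetStan = [0.9, 1.1, 2.4, 1.0, 2.2, 0.8]

# Pearson correlation; a value near zero suggests missingness is not driving heterozygosity.
r = cor(numMissings, meanIndHetStan)
println("Pearson r = ", round(r, digits=3))
```

A correlation coefficient only captures a linear trend, so it complements the plot rather than replacing it.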
Save the individual colors in the metadata
indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
Plot PC1 vs. PC2
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
Plot PC1 vs. PC3
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
In the chr 19 high ViSHet region, there are 4 clear homozygous haplogroups (vir and lud are separated on PC3, but not clearly enough to show in the summary plot). Divide the samples into those groups based on PCA scores, and calculate pi and Dxy.
clusterNames = ["virLud",
    "nit",
    "trochObs",
    "plumb"]
clusterColors = ["blue",
    "grey",
    "yellow",
    "red"]
virLud = (PCAmodelAll.metadata.PC1 .< -4) .&
    (2 .< PCAmodelAll.metadata.PC2) .&
    indSelection_lowIndHetStan
nit = (-4 .< PCAmodelAll.metadata.PC1 .< -2) .&
    (1 .< PCAmodelAll.metadata.PC2 .< 2.5) .&
    indSelection_lowIndHetStan
trochObs = (-1.5 .< PCAmodelAll.metadata.PC1 .< 1.5) .&
    (PCAmodelAll.metadata.PC2 .< -3.5) .&
    indSelection_lowIndHetStan
plumb = (5 .< PCAmodelAll.metadata.PC1) .&
    (1.5 .< PCAmodelAll.metadata.PC2) .&
    indSelection_lowIndHetStan
# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[trochObs]
PCAmodelAll.metadata.Fst_group[plumb]
clusterArray = [virLud nit trochObs plumb]
# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")
# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false) # set among to FALSE if no among Fst wanted (some things won't work without it)
# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 4×2 DataFrame
 Row │ cluster   pi
     │ String    Float64
─────┼──────────────────────
   1 │ virLud    0.0144925
   2 │ nit       0.0052608
   3 │ trochObs  0.0150341
   4 │ plumb     0.00320386 =#
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 6×2 DataFrame
 Row │ cluster_pair     Dxy
     │ String           Float64
─────┼────────────────────────────
   1 │ virLud_nit       0.0291485
   2 │ virLud_trochObs  0.0330435
   3 │ virLud_plumb     0.0347335
   4 │ nit_trochObs     0.0359384
   5 │ nit_plumb        0.0373399
   6 │ trochObs_plumb   0.0289202 =#
# Make a genotype-by-individual plot using all variable loci in the region,
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]
# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
    missingFractionAllowed = missingFractionAllowed,
    indColorRightProvided = true);
The numbers in each group are [70 2 67 66] and the sum of those is 205
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
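The SNP filter in the cell above keeps only loci where at least one cluster has the variant at a frequency above 0.5 and at least one cluster has it below 0.5, which are the sites where the clusters differ in their majority allele. The sketch below applies the same expression to a small made-up frequency matrix so the effect on each column can be followed by hand.

```julia
# Allele frequencies: rows are clusters, columns are SNPs (made-up values).
freqs = [0.9 0.6 0.1 0.5;
         0.1 0.7 0.2 0.5;
         0.8 0.9 0.0 0.5]

# Keep SNPs where some cluster is above 0.5 and some cluster is below 0.5.
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)

# Subset the frequency matrix the same way the genotype and position tables are subset.
freqs_selected = freqs[:, selectedSNPs]
println(selectedSNPs)
```

Only the first column passes: the second is above 0.5 in every cluster, the third is below 0.5 in every cluster, and the fourth sits exactly at 0.5, which the strict inequalities exclude.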
Now show a GBI plot like above, but with heterozygotes
clusterNamesWithHets = ["virLud",
    "virLudHet",
    "nit",
    "virLud_trochObs",
    "trochObs",
    "trochObsHet",
    "trochObs_plumb",
    "plumb",
    "vir_plumb"]
clusterColorsWithHets = ["blue",
    "blue",
    "grey",
    "green",
    "yellow",
    "yellow",
    "orange",
    "red",
    "purple"]
virLudHet = (-8 .< PCAmodelAll.metadata.PC1 .< -4) .&
    (1.5 .< PCAmodelAll.metadata.PC2 .< 5) .&
    .!indSelection_lowIndHetStan
virLud_trochObs = (-4 .< PCAmodelAll.metadata.PC1 .< -1) .&
    (-3.5 .< PCAmodelAll.metadata.PC2 .< 0.5) .&
    .!indSelection_lowIndHetStan
trochObsHet = (-1.5 .< PCAmodelAll.metadata.PC1 .< 1.5) .&
    (PCAmodelAll.metadata.PC2 .< -3.5) .&
    .!indSelection_lowIndHetStan
trochObs_plumb = (1.5 .< PCAmodelAll.metadata.PC1 .< 5) .&
    (-3.5 .< PCAmodelAll.metadata.PC2 .< 0) .&
    .!indSelection_lowIndHetStan
vir_plumb = (-1 .< PCAmodelAll.metadata.PC1 .< 1.5) .&
    (2 .< PCAmodelAll.metadata.PC2 .< 5) .&
    .!indSelection_lowIndHetStan
clusterArray = [virLud virLudHet nit virLud_trochObs trochObs trochObsHet trochObs_plumb plumb vir_plumb]
sum(clusterArray, dims=1)
if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end
# check which individuals left out:
sum(clusterArray, dims=2)
PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]
# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end
# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets
# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
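The source of `limitIndsToPlot` is not shown on this page. Judging only from its arguments (a per-group limit and a `sortByMissing` switch), one plausible reading is that it keeps up to a given number of individuals per group, preferring those with the least missing data. The sketch below implements that reading as a hypothetical helper, `limitIndsSketch`, on made-up data; it is an assumption for illustration and may not match the project's function.

```julia
# Made-up group labels and missing-genotype counts for 7 individuals.
groups = ["virLud", "virLud", "virLud", "plumb", "plumb", "nit", "plumb"]
numMissing = [30, 5, 12, 40, 8, 3, 20]

# Hypothetical helper (not the project's limitIndsToPlot): keep at most maxPerGroup[k]
# individuals of groupNames[k], preferring those with the fewest missing genotypes.
function limitIndsSketch(groupNames, maxPerGroup, groups, numMissing)
    keep = Int[]
    for (name, maxN) in zip(groupNames, maxPerGroup)
        idx = findall(==(name), groups)
        idx = idx[sortperm(numMissing[idx])]           # fewest missing first
        append!(keep, idx[1:min(maxN, length(idx))])   # cap at the group limit
    end
    return keep   # row indices, ordered by group and then by missingness
end

kept = limitIndsSketch(["virLud", "nit", "plumb"], [2, 2, 2], groups, numMissing)
println(kept)
```

The returned indices could then be used to subset both a genotype matrix and a metadata table so the two stay in the same row order.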
Show GBI plot according to original groups and plot order
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show same but with all individuals
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = false)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Show same but with only vir and plumb pops
includeTheseClusters = ["virLud", "plumb"] # these are the haplotype clusters to include in the choice below of SNPs to show
# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]
plotGroups = ["vir", "plumb", "plumb_vir"] # these are the original Fst_groups
plotGroupColors = ["blue", "red", "purple"]
metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups
plotGenotypeByIndividual(regionInfo, posForGBI,
    genosForGBI, metadataForGBI, freqsForGBI, plotGroups, plotGroupColors;
    missingFractionAllowed = missingFractionAllowed)
(Output: a tuple containing the plot Scene (768px, 960px), the genotype matrix (Union{Missing, Int16}), the vector of SNP positions, and a 100×38 metadata DataFrame; display truncated.)
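The SNP filter used in the code above keeps loci whose allele frequency is above 0.5 in at least one group and below 0.5 in at least one group. Here is a minimal sketch of that filter on a made-up frequency matrix (rows are groups, columns are SNPs). The `_toy` names and all values are invented for illustration and are not part of the analysis:

```julia
# Toy allele-frequency matrix: 2 groups (rows) × 4 SNPs (columns); values are hypothetical
freqs_toy = [0.9 0.6 0.1 0.5;
             0.2 0.8 0.3 0.7]

# Keep SNPs with frequency > 0.5 in at least one group AND < 0.5 in at least one group
keep_toy = (vec(maximum(freqs_toy, dims=1)) .> 0.5) .& (vec(minimum(freqs_toy, dims=1)) .< 0.5)

# Only the first SNP passes: the others are high in both groups, low in both,
# or have a minimum that is not strictly below 0.5
freqs_kept_toy = freqs_toy[:, keep_toy]
println(keep_toy)
```

Because both comparisons are strict, a SNP with a frequency of exactly 0.5 in one group passes only if another group lies strictly on the other side of 0.5.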
Show same but with only vir, lud, and troch pops
includeTheseClusters = ["virLud", "trochObs"] # these are the haplotype clusters to include in the choice below of SNPs to show
# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]
metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups
plotGroups = ["vir", "vir_S", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML", "troch_west", "troch_LN"]
plotGroupColors = ["blue","turquoise1", "seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow"]
# Limit the number of individuals per group to plot (10 per group here)
numIndsToPlotWithHets = fill(10, length(plotGroups))
genosForGBI_limited, indMetadataforGBI_limited = limitIndsToPlot(plotGroups,
                            numIndsToPlotWithHets,
                            genosForGBI, metadataForGBI;
                            sortByMissing = false)
plotGenotypeByIndividual(regionInfo, posForGBI,
    genosForGBI_limited, indMetadataforGBI_limited, freqsForGBI, plotGroups, plotGroupColors;
    missingFractionAllowed = missingFractionAllowed)
(Output: a tuple containing the plot Scene (768px, 960px), the genotype matrix (Union{Missing, Int16}), the vector of SNP positions, and a 74×38 metadata DataFrame; display truncated.)
Show same but with only troch, obs, and plumb pops
includeTheseClusters = ["trochObs", "plumb"] # these are the haplotype clusters to include in the choice below of SNPs to show
# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]
metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups
# remove individuals that have vir haplotypes, as this could otherwise be mistaken for introgression from obscuratus:
removeTheseInds = ["GW_Armando_plate1_JF24G02", # gw19 hetero from plumb
                "GW_Armando_plate1_JF07G03", # gw19 hetero from plumb
                "GW_Armando_plate1_JF12G02", # gw19 hetero from plumb
                "GW_Armando_plate1_JF09G01"] # gw28 is hetero from plumb
selection = map(in(removeTheseInds), metadataForGBI.ind)
metadataForGBI = metadataForGBI[.!selection, :]
genosForGBI = genosForGBI[.!selection, :]
plotGroups = ["troch_LN","troch_EM","obs","plumb_BJ","plumb"]
plotGroupColors = ["yellow","gold","orange","pink","red"]
# Set limit on the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(plotGroups))
# metadataForGBI[metadataForGBI.Fst_group .== "plumb", :]
genosForGBI_limited, indMetadataforGBI_limited = limitIndsToPlot(plotGroups,
                            numIndsToPlotWithHets,
                            genosForGBI, metadataForGBI;
                            sortByMissing = false)
# indMetadataforGBI_limited[indMetadataforGBI_limited.Fst_group .== "plumb", :]
plotGenotypeByIndividual(regionInfo, posForGBI,
    genosForGBI_limited, indMetadataforGBI_limited, freqsForGBI, plotGroups, plotGroupColors;
    missingFractionAllowed = missingFractionAllowed)
(Output: a tuple containing the plot Scene (768px, 960px), the genotype matrix (Union{Missing, Int16}), the vector of SNP positions, and a 37×38 metadata DataFrame; display truncated.)
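The code above drops named individuals from both the metadata and the genotype matrix with a single Boolean mask, so the rows stay aligned. Here is a small sketch of that pattern on invented data. The IDs, genotypes, and `_toy` names are hypothetical:

```julia
# Hypothetical individual IDs and a genotype matrix with one row per individual
inds_toy = ["ind_A", "ind_B", "ind_C", "ind_D"]
genos_toy = [0 1; 2 2; 1 0; 0 0]
remove_toy = ["ind_B", "ind_D"]

# `in(remove_toy)` returns a function that tests membership in remove_toy;
# mapping it over the IDs gives one Bool per individual (true = remove)
drop_toy = map(in(remove_toy), inds_toy)

# Apply the same negated mask to both objects so row i still refers to the same individual
inds_kept_toy = inds_toy[.!drop_toy]
genos_kept_toy = genos_toy[.!drop_toy, :]
```

Reusing one mask for both objects is what keeps the metadata rows and genotype rows in register; filtering them separately would risk a mismatch.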
Show just the west area (without nitidus)
clusterNamesWithHetsWest = ["virLud",
                        "virLudHet",
                        "virLud_trochObs",
                        "trochObs",
                        "trochObsHet"]
clusterColorsWithHetsWest = ["blue",
                        "blue",
                        "green",
                        "yellow",
                        "yellow"]
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsWest))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets,
                            genos_selectedSNPs2, PCAmodelAll.metadata;
                            sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
Show just the east area
clusterNamesWithHetsEast = ["trochObs",
                        "trochObsHet",
                        "trochObs_plumb",
                        "plumb"]
clusterColorsWithHetsEast = ["yellow",
                        "yellow",
                        "orange",
                        "red"]
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast,
                            genos_selectedSNPs2, PCAmodelAll.metadata;
                            sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
Show just the northern area
clusterNamesWithHetsNorth = ["virLud",
                        "virLudHet",
                        "vir_plumb",
                        "plumb"]
clusterColorsWithHetsNorth = ["blue",
                        "blue",
                        "purple",
                        "red"]
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsNorth))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets,
                            genos_selectedSNPs2, PCAmodelAll.metadata;
                            sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
Same for chr 4A
# choose scaffold
chr = "gw4A"
positionMin, positionMax, regionText,
windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
genos_highViSHetRegion, pos_highViSHetRegion, regionInfo =
        getWindowedIndHetStanRegion(genosOnly_included,
                            pos_SNP_filtered,
                            highViSHetRegions, chr;
                            windowSize = 500)
# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)
# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan
# check whether missing data related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)
# PCA of all individuals:
genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))
flipPC1 = true
flipPC2 = true
PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
            groups_to_plot_PCA, group_colors_PCA;
            sampleSet = "greenish warblers", regionText = regionText,
            flip1 = flipPC1, flip2 = flipPC2,
            lineOpacity = 0.7, fillOpacity = 0.6,
            symbolSize = 14, showTitle = true,
            xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
            showPlot = false)
display(PCAmodelAll.PCAfig)
# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]
# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`
# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.5)
# Plot only the lowIndHetStan individuals:
f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Good news: 1 region on that scaffold
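The code above adds a metadata column whose name depends on `chr` by building a command string and evaluating it. As a hedged alternative sketch, DataFrames also accepts a string column name directly, which avoids `eval`. The data frame and values below are invented for illustration, and this is not the method used in the analysis:

```julia
using DataFrames

# Hypothetical stand-ins for the metadata table and the per-individual values
meta_toy = DataFrame(ind = ["ind_A", "ind_B", "ind_C"])
chr_toy = "gw4A"
values_toy = [0.8, 2.1, 1.2]

# Build the column name from the scaffold name and assign the column directly
meta_toy[!, chr_toy * "_regionIndHetStan"] = values_toy
println(names(meta_toy))
```

The resulting column can then be read back either as `meta_toy.gw4A_regionIndHetStan` or with the same string index.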
Save the individual colors in the metadata
indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
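The loop above looks up each individual's group in `groups_to_plot_PCA` and takes the color at the same position. The same mapping can be sketched with a dictionary, shown here on invented group names and colors. The `_toy` names are hypothetical and this is only an illustration of the lookup:

```julia
# Hypothetical groups, their colors, and the group of each individual
groups_toy = ["vir", "troch", "plumb"]
colors_toy = ["blue", "yellow", "red"]
ind_groups_toy = ["plumb", "vir", "vir", "troch"]

# Map group name => color, then look up one color per individual
color_of_toy = Dict(zip(groups_toy, colors_toy))
ind_colors_toy = [color_of_toy[g] for g in ind_groups_toy]
```

Like the loop, this throws an error if an individual belongs to a group that is not in the list, which is a useful signal that the group list is incomplete.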
Plot PC1 vs. PC2
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Plot PC1 vs. PC3
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
At the chr 4A high ViSHet region, there are 4 clear haplogroups (lud and obs are not distinguished on PC3, which is driven by nitidus). Divide the samples into those groups based on PCA scores, and calculate pi and Dxy.
clusterNames = ["virLud",
                "nit",
                "troch",
                "obsPlumb"]
clusterColors = ["blue",
                "grey",
                "yellow",
                "red"]
virLud = (PCAmodelAll.metadata.PC1 .< -2) .&
        (PCAmodelAll.metadata.PC2 .< -2.5) .&
        indSelection_lowIndHetStan
nit = (PCAmodelAll.metadata.PC3 .< -7) .&
        indSelection_lowIndHetStan
troch = (3 .< PCAmodelAll.metadata.PC1) .&
        indSelection_lowIndHetStan
obsPlumb = (-4 .< PCAmodelAll.metadata.PC1 .< -1) .&
        (2.5 .< PCAmodelAll.metadata.PC2) .&
        indSelection_lowIndHetStan
# check the individuals in each group
PCAmodelAll.metadata.Fst_group[virLud]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obsPlumb]
clusterArray = [virLud nit troch obsPlumb]
# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")
# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "clusterNames"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false) # set among to FALSE if no among Fst wanted (some things won't work without it)
# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 4×2 DataFrame
 Row │ cluster   pi
     │ String    Float64
─────┼──────────────────────
   1 │ virLud    0.00664772
   2 │ nit       0.00609756
   3 │ troch     0.00614846
   4 │ obsPlumb  0.00206023 =#
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 6×2 DataFrame
 Row │ cluster_pair     Dxy
     │ String           Float64
─────┼────────────────────────────
   1 │ virLud_nit       0.0447451
   2 │ virLud_troch     0.0343779
   3 │ virLud_obsPlumb  0.0217185
   4 │ nit_troch        0.0373178
   5 │ nit_obsPlumb     0.0317873
   6 │ troch_obsPlumb   0.0237405 =#
# Make a genotype-by-individual plot using all variable loci in the region,
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]
# limit the number of individuals per group to plot
numIndsToPlot = fill(150, length(clusterNames))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot,
                            genos_selectedSNPs, PCAmodelAll.metadata;
                            sortByMissing = false)
# sort based on original_plot_order, and then together with function below will arrange individuals in population order within clusters:
sortOrder = sortperm(indMetadataforGBI.original_plot_order, rev=false)
indMetadataforGBI = indMetadataforGBI[sortOrder, :]
genosForGBI = genosForGBI[sortOrder, :]
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
    indFontSize=6, figureSize=(800, 1800),
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = true,
    indColorRightProvided = true);
The numbers in each group are [40 2 62 91] and the sum of those is 195
Calculated population allele frequencies and sample sizes
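The per-site pi and Dxy values above come from the functions `getSitePi` and `getDxy`, whose internals are not shown on this page. As a rough sketch of what such statistics measure, the snippet below uses the textbook biallelic formulas without any sample-size correction. The function names `site_pi` and `site_dxy` and the frequencies are hypothetical, and the actual functions may differ in their details:

```julia
# Hypothetical allele frequencies at 3 SNPs for two groups
p1 = [0.0, 0.5, 1.0]
p2 = [1.0, 0.5, 1.0]

# Within-group diversity per site: chance that two randomly drawn alleles differ
# (assumption: no correction for sample size)
site_pi(p) = 2 .* p .* (1 .- p)

# Between-group distance per site: chance that one allele drawn from each group differs
site_dxy(pa, pb) = pa .* (1 .- pb) .+ pb .* (1 .- pa)

pi1 = site_pi(p1)          # zero at fixed sites, 0.5 at the site with p = 0.5
dxy12 = site_dxy(p1, p2)   # 1 where the groups are fixed for different alleles
```

How per-site values are averaged over the region (for example, whether invariant sites are counted) is handled by `getRegionPi` and `getRegionDxy` and is not reproduced here.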
Now show a GBI plot like above, but with heterozygotes
clusterNamesWithHets = ["virLud",
                        "virLudHet",
                        "nit",
                        "virLud_troch",
                        "troch",
                        "trochHet",
                        "troch_obsPlumb",
                        "obsPlumb",
                        "obsPlumbHet",
                        "virLud_obsPlumb"]
clusterColorsWithHets = ["blue",
                        "blue",
                        "grey",
                        "green",
                        "yellow",
                        "yellow",
                        "orange",
                        "red",
                        "red",
                        "purple"]
virLud = (PCAmodelAll.metadata.PC1 .< -2) .&
        (PCAmodelAll.metadata.PC2 .< -2.5) .&
        indSelection_lowIndHetStan
nit = (PCAmodelAll.metadata.PC3 .< -7) .&
        indSelection_lowIndHetStan
troch = (3 .< PCAmodelAll.metadata.PC1) .&
        indSelection_lowIndHetStan
obsPlumb = (-4 .< PCAmodelAll.metadata.PC1 .< -1) .&
        (2.5 .< PCAmodelAll.metadata.PC2) .&
        indSelection_lowIndHetStan
virLudHet = (PCAmodelAll.metadata.PC1 .< -2) .&
        (PCAmodelAll.metadata.PC2 .< -2.5) .&
        .!indSelection_lowIndHetStan
virLud_troch = (0 .< PCAmodelAll.metadata.PC1 .< 3) .&
        (-4 .< PCAmodelAll.metadata.PC2 .< -0.5) .&
        .!indSelection_lowIndHetStan
trochHet = (3 .< PCAmodelAll.metadata.PC1) .&
        .!indSelection_lowIndHetStan
troch_obsPlumb = (0 .< PCAmodelAll.metadata.PC1 .< 3) .&
        (0 .< PCAmodelAll.metadata.PC2 .< 2.5) .&
        (-2.5 .< PCAmodelAll.metadata.PC3) .&
        .!indSelection_lowIndHetStan
obsPlumbHet = (-4 .< PCAmodelAll.metadata.PC1 .< -1) .&
        (2.5 .< PCAmodelAll.metadata.PC2) .&
        .!indSelection_lowIndHetStan
virLud_obsPlumb = (-4 .< PCAmodelAll.metadata.PC1 .< -1.5) .&
        (-2.5 .< PCAmodelAll.metadata.PC2 .< 1.5) .&
        .!indSelection_lowIndHetStan
clusterArray = [virLud virLudHet nit virLud_troch troch trochHet troch_obsPlumb obsPlumb obsPlumbHet virLud_obsPlumb]
sum(clusterArray, dims=1)
if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end
# check which individuals left out:
sum(clusterArray, dims=2)
PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]
# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end
# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets
# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
                            genos_selectedSNPs, PCAmodelAll.metadata;
                            sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
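The cluster assignment above builds one Boolean mask per cluster from PCA thresholds, stacks the masks into `clusterArray`, and then checks that the group totals match the number of individuals. The following is a minimal sketch of that pattern on invented data; the scores, thresholds, and cluster names (`west`, `mid`, `east`) are hypothetical and are not taken from the analysis. It counts memberships per individual, which also reveals an individual assigned to two clusters, a case that a comparison of grand totals alone could miss if another individual were unassigned.

```julia
# Hypothetical PC scores for six individuals (invented, not real data)
PC1 = [-3.0, -2.5, 0.1, 2.0, 2.4, 0.0]
PC2 = [ 1.0,  0.8, 0.0, -1.0, -1.2, 5.0]

# One Boolean mask per cluster; thresholds are invented for illustration
west = PC1 .< -1
mid  = (-1 .<= PC1 .<= 1) .& (PC2 .< 1)
east = PC1 .> 1

clusterNames = ["west", "mid", "east"]
clusterArray = [west mid east]            # individuals x clusters Boolean matrix

# Number of clusters that claim each individual
perInd = vec(sum(clusterArray, dims=2))
unassigned = findall(perInd .== 0)        # individuals in no cluster
multiple   = findall(perInd .> 1)         # individuals in more than one cluster

# Fill a membership vector, leaving "none" for unassigned individuals
membership = fill("none", length(PC1))
for i in axes(clusterArray, 2)
    membership[clusterArray[:, i]] .= clusterNames[i]
end

println("unassigned: ", unassigned, "; multiply assigned: ", multiple)
println(membership)
```

In this toy example the sixth individual falls outside every threshold box, so it keeps the `"none"` label and is reported as unassigned.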
Show GBI plot according to original groups and plot order
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show same but with all individuals
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show same but with only vir and plumb pops
includeTheseClusters = ["virLud", "obsPlumb"] # these are the haplotype clusters to include in the choice below of SNPs to show
# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]
plotGroups = ["vir", "plumb", "plumb_vir"] # these are the original Fst_groups
plotGroupColors = ["blue", "red", "purple"]
metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups
plotGenotypeByIndividual(regionInfo, posForGBI,
    genosForGBI, metadataForGBI, freqsForGBI, plotGroups, plotGroupColors;
    missingFractionAllowed = missingFractionAllowed)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show same but whole ring (but not nit)
includeTheseClusters = ["virLud",
                        "troch",
                        "obsPlumb"] # these are the haplotype clusters to include in the choice below of SNPs to show
# Calculate allele freqs and sample sizes
freqs_local, sampleSizes_local = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembership, includeTheseClusters)
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
selectedSNPs = (vec(maximum(freqs_local, dims=1)) .> 0.5) .& (vec(minimum(freqs_local, dims=1)) .< 0.5)
genosForGBI = genos_selectedSNPs[:, selectedSNPs]
posForGBI = pos_selectedSNPs[selectedSNPs, :]
freqsForGBI = freqs_local[:, selectedSNPs]
metadataForGBI = copy(PCAmodelAll.metadata)
metadataForGBI.Fst_group = metadataForGBI.original_Fst_groups
plotGroups = ["vir","vir_S","lud_PK","lud_KS","lud_central","troch_LN","troch_EM","obs","plumb_BJ","plumb","plumb_vir"]
plotGroupColors = ["blue","turquoise1","seagreen4","seagreen3","seagreen2","yellow","gold","orange", "pink","red","purple"]
# Set limit on the number of individuals per group to plot
numIndsToPlotWithHets = [10, 5, 5, 2, 10, 10, 1, 4, 3, 10, 1] # maximum number of individuals to plot from each group
genosForGBI_limited, indMetadataforGBI_limited = limitIndsToPlot(plotGroups,
    numIndsToPlotWithHets,
    genosForGBI, metadataForGBI;
    sortByMissing = false)
plotGenotypeByIndividual(regionInfo, posForGBI,
    genosForGBI_limited, indMetadataforGBI_limited, freqsForGBI, plotGroups, plotGroupColors;
    missingFractionAllowed = missingFractionAllowed)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show just the west area (without nitidus)
clusterNamesWithHetsWest = ["virLud",
                            "virLudHet",
                            "virLud_troch",
                            "troch",
                            "trochHet"]
clusterColorsWithHetsWest = ["blue",
                             "blue",
                             "green",
                             "yellow",
                             "yellow"]
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsWest))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
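The SNP filter used in these cells keeps a site only if its allele frequency is above 0.5 in at least one cluster and below 0.5 in at least one other, so the retained sites are those where clusters differ in their majority allele. A minimal sketch on an invented frequency matrix (rows are clusters, columns are SNPs, matching the layout that the `dims=1` reductions imply) is:

```julia
# Hypothetical allele frequencies: rows = clusters, columns = SNPs (invented values)
freqs = [0.9 0.6 0.1 0.2;
         0.1 0.7 0.3 0.8]

# Keep a SNP only if some cluster is above 0.5 and some cluster is below 0.5
keep = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)

freqsKept = freqs[:, keep]   # columns 1 and 4 survive in this toy example
println(keep)
println(freqsKept)
```

The second SNP is dropped because both clusters are above 0.5, and the third because both are below 0.5; neither would show a contrast between clusters in a genotype-by-individual plot.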
Show just the east area
clusterNamesWithHetsEast = ["troch",
                            "trochHet",
                            "troch_obsPlumb",
                            "obsPlumb",
                            "obsPlumbHet"]
clusterColorsWithHetsEast = ["yellow",
                             "yellow",
                             "orange",
                             "red",
                             "red"]
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show just the northern area
clusterNamesWithHetsNorth = ["virLud",
                             "virLudHet",
                             "virLud_obsPlumb",
                             "obsPlumb",
                             "obsPlumbHet"]
clusterColorsWithHetsNorth = ["blue",
                              "blue",
                              "purple",
                              "red",
                              "red"]
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsNorth))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Do a PCA based on a same-size region elsewhere on gw4A (with low ViSHet):
# get length of region
lengthHighViSHetRegion = positionMax - positionMin
# because the region is on the left side of chr 4A, will put the non-interesting region on the right side:
rightLocus = scaffold_lengths["gw4A"] - 1_000_000
leftLocus = rightLocus - lengthHighViSHetRegion
regionText_lowViSHetRegion = string("chr ", chr, " ",leftLocus," to ",rightLocus)
lociSelection = (leftLocus .<= pos_region.position .<= rightLocus)
genotypes_lowViSHetRegion = genotypes_region[:, lociSelection]
# impute missing genotypes:
genotypes_lowViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genotypes_lowViSHetRegion))
flipPC1 = true
flipPC2 = true
PCAmodel = plotPCA(genotypes_lowViSHetRegion_imputed, ind_with_metadata_included,
    groups_to_plot_PCA, group_colors_PCA;
    sampleSet = "greenish warblers", regionText = regionText_lowViSHetRegion,
    flip1 = flipPC1, flip2 = flipPC2,
    lineOpacity = 0.7, fillOpacity = 0.6,
    symbolSize = 14, showTitle = true,
    xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
    showPlot = false)
display(PCAmodel.PCAfig)
if true # set to true to save plot
    save("FigureS2D_gw4A_nonHLBRarbitrary_from_Julia.png", PCAmodel.PCAfig, px_per_unit = 2.0)
end
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
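The comparison region above is chosen to have the same length as the high-ViSHet region and to end 1 Mb before the right end of the scaffold. A minimal sketch of that arithmetic follows; the scaffold length, region bounds, and SNP positions are invented stand-ins for `scaffold_lengths["gw4A"]`, `positionMin`, `positionMax`, and `pos_region.position`. It also checks that the comparison window does not overlap the focal region, which is assumed here to be desirable.

```julia
# Hypothetical values standing in for the real scaffold length and region bounds
scaffoldLength = 20_000_000
positionMin, positionMax = 400_000, 700_000
offsetFromEnd = 1_000_000

regionLength = positionMax - positionMin
rightLocus = scaffoldLength - offsetFromEnd   # window ends 1 Mb before the scaffold end
leftLocus = rightLocus - regionLength         # same length as the focal region

# The comparison window should lie entirely to the right of the focal region
noOverlap = leftLocus > positionMax

# Invented SNP positions; select those inside the comparison window
positions = [350_000, 18_650_000, 18_800_000, 19_000_000, 19_200_000]
lociSelection = leftLocus .<= positions .<= rightLocus

println("window: ", leftLocus, " to ", rightLocus, "; no overlap: ", noOverlap)
println(lociSelection)
```

Note that equal physical length does not guarantee an equal number of SNPs, so the two PCAs may rest on different numbers of loci.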
Chr 4A shows truly remarkable patterns, which must be the result of selective sweeps.
Same for chr 20
# choose scaffold
chr = "gw20"
positionMin, positionMax, regionText,
windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
genos_highViSHetRegion, pos_highViSHetRegion, regionInfo =
    getWindowedIndHetStanRegion(genosOnly_included,
        pos_SNP_filtered,
        highViSHetRegions, chr;
        windowSize = 500)
# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)
# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan
# check whether missing data related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)
# PCA of all individuals:
genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))
flipPC1 = false
flipPC2 = true
PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
    groups_to_plot_PCA, group_colors_PCA;
    sampleSet = "greenish warblers", regionText = regionText,
    flip1 = flipPC1, flip2 = flipPC2,
    lineOpacity = 0.7, fillOpacity = 0.6,
    symbolSize = 14, showTitle = true,
    xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
    showPlot = false)
display(PCAmodelAll.PCAfig)
# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]
# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`
# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.5)
# Plot only the lowIndHetStan individuals:
f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
if false # set to true to save plot
    save("Figure6_top_gw28GBIplotEast_from_Julia.png", plotInfo[1], px_per_unit = 2.0)
end
More than 1 region on that scaffold. Using just the longest one.
| Row | regionChrom | regionStart | regionEnd |
|---|---|---|---|
| | String | Int64 | Int64 |
| 1 | gw20 | 27354 | 721651 |
| 2 | gw20 | 5852254 | 6671670 |
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
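The cell above adds a chromosome-specific column by assembling a line of code as a string and running it with `eval(Meta.parse(command))`. An alternative sketch, offered only as an illustration and not as the method used for the paper, is to build the column name as a value and index with it. To keep the example free of package dependencies, a plain `Dict` stands in for the metadata table, and the membership values are invented.

```julia
chr = "gw20"
clusterMembership = ["vir", "lud", "plumb"]   # hypothetical values for three individuals

# A Dict keyed by column name stands in for the metadata table in this sketch
metadata = Dict{String, Vector{String}}()

# Build the column name as a string value instead of building code to evaluate
colname = chr * "_cluster"
metadata[colname] = clusterMembership

println(colname, " => ", metadata[colname])
```

With DataFrames.jl, the analogous indexing form is `df[!, colname] = values`, which accepts a string column name; this avoids parsing code at run time and keeps the assignment visible to the compiler and to readers.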
Save the individual colors in the metadata
indColors = fill("", size(PCAmodelAll.metadata, 1))
for i in axes(PCAmodelAll.metadata, 1)
    indColors[i] = group_colors_PCA[findfirst(groups_to_plot_PCA .== PCAmodelAll.metadata.Fst_group[i])]
end
PCAmodelAll.metadata.indColorLeft = indColors
PCAmodelAll.metadata.indColorRight = indColors;
Plot PC1 vs. PC2
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
Plot PC1 vs. PC3
f = CairoMakie.Figure()
ax = Axis(f[1, 1],
    title = "PC1 vs. PC3",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC3", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC3[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
In the chr 20 high-ViSHet region there are six clear haplogroups (vir and lud are clearly separated on PC3). Divide the samples into those groups based on their PCA scores, then calculate pi and Dxy.
clusterNames = ["vir",
                "nit",
                "lud",
                "troch",
                "obs",
                "plumb"]
clusterColors = ["blue",
                 "grey",
                 "green",
                 "yellow",
                 "orange",
                 "red"]
vir = (PCAmodelAll.metadata.PC1 .< -5) .&
      (-5 .< PCAmodelAll.metadata.PC3 .< -0.5) .&
      indSelection_lowIndHetStan
nit = (-5 .< PCAmodelAll.metadata.PC1 .< -3) .&
      (PCAmodelAll.metadata.PC3 .< -5) .&
      indSelection_lowIndHetStan
lud = (PCAmodelAll.metadata.PC1 .< -4) .&
      (0.5 .< PCAmodelAll.metadata.PC3 .< 7) .&
      indSelection_lowIndHetStan
troch = (2 .< PCAmodelAll.metadata.PC1 .< 5) .&
        (PCAmodelAll.metadata.PC2 .< -4.2) .&
        indSelection_lowIndHetStan
obs = (3 .< PCAmodelAll.metadata.PC1 .< 4) .&
      (-4.2 .< PCAmodelAll.metadata.PC2 .< -2.5) .&
      indSelection_lowIndHetStan
plumb = (2.5 .< PCAmodelAll.metadata.PC1 .< 6) .&
        (3 .< PCAmodelAll.metadata.PC2) .&
        indSelection_lowIndHetStan
# check the individuals in each group
PCAmodelAll.metadata.Fst_group[vir]
PCAmodelAll.metadata.Fst_group[nit]
PCAmodelAll.metadata.Fst_group[lud]
PCAmodelAll.metadata.Fst_group[troch]
PCAmodelAll.metadata.Fst_group[obs]
PCAmodelAll.metadata.Fst_group[plumb]
clusterArray = [vir nit lud troch obs plumb]
# show numbers in each group
println("The numbers in each group are $(sum(clusterArray, dims=1)) and the sum of those is $(sum(sum(clusterArray, dims=1)))")
# create vectors that indicate the groups and plot order for this analysis:
clusterMembership = fill("none", nrow(PCAmodelAll.metadata))
plotOrder = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembership[clusterArray[:,i]] .= clusterNames[i]
    plotOrder[clusterArray[:,i]] .= i
end
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(genos_highViSHetRegion, clusterMembership, clusterNames)
println("Calculated population allele frequencies and sample sizes")
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames; among=false) # set among to FALSE if no among Fst wanted (some things won't work without it)
# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames, pi = getRegionPi(sitePi))
#= 6×2 DataFrame
 Row │ cluster  pi
     │ String   Float64
─────┼─────────────────────
   1 │ vir      0.0132903
   2 │ nit      0.00761773
   3 │ lud      0.014873
   4 │ troch    0.0101873
   5 │ obs      0.00904222
   6 │ plumb    0.00593251 =#
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 15×2 DataFrame
 Row │ cluster_pair  Dxy
     │ String        Float64
─────┼─────────────────────────
   1 │ vir_nit       0.0280243
   2 │ vir_lud       0.0204941
   3 │ vir_troch     0.0394257
   4 │ vir_obs       0.0403572
   5 │ vir_plumb     0.0376188
   6 │ nit_lud       0.0288021
   7 │ nit_troch     0.0377964
   8 │ nit_obs       0.0389254
   9 │ nit_plumb     0.0359742
  10 │ lud_troch     0.0390498
  11 │ lud_obs       0.0398045
  12 │ lud_plumb     0.0371989
  13 │ troch_obs     0.015702
  14 │ troch_plumb   0.0285113
  15 │ obs_plumb     0.0286543 =#
# Make a genotype-by-individual plot using all variable loci in the region,
missingFractionAllowed = 0.1
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.original_Fst_groups = PCAmodelAll.metadata.Fst_group # store the Fst_groups in this
PCAmodelAll.metadata.Fst_group = clusterMembership
PCAmodelAll.metadata.original_plot_order = PCAmodelAll.metadata.plot_order # store the original plot_order in this
PCAmodelAll.metadata.plot_order = plotOrder
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
# (So for each column in `freqs`, the maximum should be > 0.5
# and the minimum should be < 0.5)
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs = genos_highViSHetRegion[:, selectedSNPs]
pos_selectedSNPs = pos_highViSHetRegion[selectedSNPs, :]
Fst_selectedSNPs = Fst[:, selectedSNPs]
freqs_selectedSNPs = freqs[:, selectedSNPs]
# limit the number of individuals per group to plot
numIndsToPlot = fill(15, length(clusterNames))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNames, numIndsToPlot,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNames, clusterColors;
    missingFractionAllowed = missingFractionAllowed,
    indColorRightProvided = true);
The numbers in each group are [38 2 29 68 4 66] and the sum of those is 207
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
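The cell above computes per-site pi and pairwise Dxy from cluster allele frequencies using `getSitePi` and `getDxy`. As a rough guide to what such quantities measure, the sketch below uses the textbook per-site formulas for a biallelic SNP: pi as 2p(1 - p) with a small-sample correction, and Dxy as the probability that one allele drawn from each cluster differs. The function names `sitePiSketch` and `siteDxySketch` are hypothetical, the data are invented, and the package functions may differ in detail (for example in how sample size or missing data are handled).

```julia
# Per-site nucleotide diversity for a biallelic SNP (textbook form with
# small-sample correction); p = allele frequency, nInd = diploid individuals genotyped
function sitePiSketch(p, nInd)
    nChrom = 2 * nInd
    return 2 * p * (1 - p) * nChrom / (nChrom - 1)
end

# Per-site Dxy: chance that an allele from cluster 1 differs from one from cluster 2
siteDxySketch(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

# Invented frequencies: rows = clusters A and B, columns = three SNPs
freqs = [0.0 0.5 1.0;
         1.0 0.5 1.0]
nInds = [10 10 10;
         10 10 10]

piA   = sitePiSketch.(freqs[1, :], nInds[1, :])
dxyAB = siteDxySketch.(freqs[1, :], freqs[2, :])

println("pi (cluster A) per site: ", piA)
println("Dxy (A vs B) per site:   ", dxyAB)
```

A fixed difference (first SNP) gives Dxy of 1 and pi of 0, while a site that is fixed for the same allele in both clusters (third SNP) contributes 0 to both. Region-wide values such as those in the tables above are averages over sites.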
Now show a GBI plot like above, but with heterozygotes
clusterNamesWithHets = ["vir",
                        "nit",
                        "lud",
                        "ludHet",
                        "lud_troch",
                        "troch",
                        "obs",
                        "obs_plumb",
                        "plumb",
                        "vir_plumb"]
clusterColorsWithHets = ["blue",
                         "grey",
                         "green",
                         "green",
                         "seagreen",
                         "yellow",
                         "orange",
                         "darkorange1",
                         "red",
                         "purple"]
ludHet = (PCAmodelAll.metadata.PC1 .< -4) .&
         (0.5 .< PCAmodelAll.metadata.PC3 .< 7) .&
         .!indSelection_lowIndHetStan
lud_troch = (-4 .< PCAmodelAll.metadata.PC1 .< 0) .&
            (-4 .< PCAmodelAll.metadata.PC2 .< -1) .&
            .!indSelection_lowIndHetStan
obs_plumb = (3 .< PCAmodelAll.metadata.PC1 .< 5) .&
            (0 .< PCAmodelAll.metadata.PC2 .< 1) .&
            .!indSelection_lowIndHetStan
vir_plumb = (-3 .< PCAmodelAll.metadata.PC1 .< 0) .&
            (2.5 .< PCAmodelAll.metadata.PC2 .< 5) .&
            .!indSelection_lowIndHetStan
clusterArray = [vir nit lud ludHet lud_troch troch obs obs_plumb plumb vir_plumb]
sum(clusterArray, dims=1)
if sum(sum(clusterArray, dims=1)) == size(PCAmodelAll.metadata, 1)
    println("Good news: Individuals included in a group matches total number of individuals")
else
    println("Warning: Individuals included in a group ($(sum(sum(clusterArray, dims=1)))) do NOT match total number of individuals ($(size(PCAmodelAll.metadata, 1)))")
end
# check which individuals left out:
sum(clusterArray, dims=2)
PCAmodelAll.metadata.ind[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC1[vec(sum(clusterArray, dims=2) .== 0)]
PCAmodelAll.metadata.PC2[vec(sum(clusterArray, dims=2) .== 0)]
indSelection_lowIndHetStan[vec(sum(clusterArray, dims=2) .== 0)]
# create vectors that indicate the groups and plot order for this analysis:
clusterMembershipWithHets = fill("none", nrow(PCAmodelAll.metadata))
plotOrderWithHets = fill(-9, nrow(PCAmodelAll.metadata))
for i in eachindex(clusterArray[1,:])
    clusterMembershipWithHets[clusterArray[:,i]] .= clusterNamesWithHets[i]
    plotOrderWithHets[clusterArray[:,i]] .= i
end
# Add column to main metadata object containing the cluster membership for this highHet region:
command = "ind_with_metadata_included." * chr * "_cluster = clusterMembershipWithHets"
eval(Meta.parse(command)) # this executes the command constructed above
# in metadata, replace `Fst_group` column with cluster info (needed for the function below):
PCAmodelAll.metadata.Fst_group = clusterMembershipWithHets
PCAmodelAll.metadata.plot_order = plotOrderWithHets
# limit the number of individuals per group to plot
numIndsToPlotWithHets = fill(15, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Good news: Individuals included in a group matches total number of individuals
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
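As a small illustration of the group check above, here is a minimal sketch on made-up PC scores (the values, `groupA`, `groupB`, and `clusterNames` are hypothetical and not taken from the dataset). It checks row sums rather than only the grand total, which is a slightly stricter test: a matching total could in principle hide one individual assigned twice and another not assigned at all.

```julia
# Toy data: hypothetical PC scores for six individuals (not from the real dataset).
PC1 = [-3.0, -2.5, 0.5, 4.0, 4.2, 1.0]
PC2 = [-2.0, -3.0, 0.2, 0.5, 0.8, 3.0]

# Boolean masks defining two rectangular regions of PC space.
groupA = (-4 .< PC1 .< 0) .& (-4 .< PC2 .< -1)
groupB = (3 .< PC1 .< 5) .& (0 .< PC2 .< 1)

# Rows are individuals, columns are groups.
clusterArray = [groupA groupB]
clusterNames = ["A", "B"]

# Each individual should fall in exactly one group; row sums reveal problems.
groupsPerInd = vec(sum(clusterArray, dims=2))
unassigned = findall(groupsPerInd .== 0)
multiplyAssigned = findall(groupsPerInd .> 1)

# Build a membership vector, leaving "none" for unassigned individuals.
membership = fill("none", length(PC1))
for i in axes(clusterArray, 2)
    membership[clusterArray[:, i]] .= clusterNames[i]
end
```

Individuals listed in `unassigned` are the ones to inspect by hand, as done above with the `PC1` and `PC2` values of the left-out individuals.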
Show GBI plot according to original groups and plot order
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show same but with all individuals
PCAmodelAll.metadata.plot_order = PCAmodelAll.metadata.original_plot_order
# Set no limit (or high limit anyway) on the number of individuals per group to plot
numIndsToPlotWithHets = fill(1000, length(clusterNamesWithHets))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHets, numIndsToPlotWithHets,
    genos_selectedSNPs, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs, clusterNamesWithHets, clusterColorsWithHets;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show just the west area (without nitidus)
clusterNamesWithHetsWest = ["vir",
    "lud",
    "ludHet",
    "lud_troch",
    "troch"]
clusterColorsWithHetsWest = ["blue",
    "green",
    "green",
    "seagreen",
    "yellow"]
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsWest)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsWest))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsWest, numIndsToPlotWithHets,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsWest, clusterColorsWithHetsWest;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
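The SNP filter used for this regional plot keeps only SNPs whose variant frequency is above 50% in at least one group and below 50% in at least one group. Here is a minimal sketch of that mask on a made-up frequency matrix (the numbers and the name `freqsKept` are hypothetical; rows are groups and columns are SNPs, matching how `freqs` is indexed above).

```julia
# Hypothetical allele frequencies: rows are groups, columns are SNPs.
freqs = [0.9 0.8 0.1 0.6;
         0.2 0.7 0.3 0.5;
         0.1 0.9 0.2 0.4]

# Keep SNPs where some group is above 0.5 and some group is below 0.5.
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)

# Apply the same mask to every object that is indexed by SNP.
freqsKept = freqs[:, selectedSNPs]
```

In this toy case the second SNP is dropped because it is common in every group, and the third because it is rare in every group, so neither distinguishes the groups being plotted.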
Show just the east area
clusterNamesWithHetsEast = ["troch",
    "obs",
    "obs_plumb",
    "plumb"]
clusterColorsWithHetsEast = ["yellow",
    "orange",
    "darkorange1",
    "red"]
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsEast)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHetsEast = fill(100, length(clusterNamesWithHetsEast))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsEast, numIndsToPlotWithHetsEast,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsEast, clusterColorsWithHetsEast;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Show just the northern area
clusterNamesWithHetsNorth = ["vir",
    "vir_plumb",
    "plumb"]
clusterColorsWithHetsNorth = ["blue",
    "purple",
    "red"]
# limit the SNPs to those with variants greater than 50% in
# at least one pop, and less than 50% in at least one pop.
freqs, sampleSizes = getFreqsAndSampleSizes(genos_selectedSNPs, clusterMembershipWithHets, clusterNamesWithHetsNorth)
println("Calculated population allele frequencies and sample sizes")
selectedSNPs = (vec(maximum(freqs, dims=1)) .> 0.5) .& (vec(minimum(freqs, dims=1)) .< 0.5)
genos_selectedSNPs2 = genos_selectedSNPs[:, selectedSNPs]
pos_selectedSNPs2 = pos_selectedSNPs[selectedSNPs, :]
freqs_selectedSNPs2 = freqs[:, selectedSNPs]
numIndsToPlotWithHets = fill(100, length(clusterNamesWithHetsNorth))
genosForGBI, indMetadataforGBI = limitIndsToPlot(clusterNamesWithHetsNorth, numIndsToPlotWithHets,
    genos_selectedSNPs2, PCAmodelAll.metadata;
    sortByMissing = true)
plotGenotypeByIndividual(regionInfo, pos_selectedSNPs2,
    genosForGBI, indMetadataforGBI, freqs_selectedSNPs2, clusterNamesWithHetsNorth, clusterColorsWithHetsNorth;
    missingFractionAllowed = missingFractionAllowed,
    indColorLeftProvided = false,
    indColorRightProvided = true);
Calculated population allele frequencies and sample sizes
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Try chr 27
# choose scaffold
chr = "gw27"
positionMin, positionMax, regionText,
windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
genos_highViSHetRegion, pos_highViSHetRegion, regionInfo =
    getWindowedIndHetStanRegion(genosOnly_included,
        pos_SNP_filtered,
        highViSHetRegions, chr;
        windowSize = 500)
# inspect values for mean IndHetStan per individual for that high ViSHet region
plot(meanAcrossRegionIndHetStan)
# Add column to metadata containing the regionIndHetStan for this highHet region:
command = "ind_with_metadata_included." * chr * "_regionIndHetStan = meanAcrossRegionIndHetStan"
eval(Meta.parse(command)) # this executes the command constructed above
ind_with_metadata_included.regionIndHetStan = meanAcrossRegionIndHetStan
# check whether missing data related to heterozygosity (good news: not really)
plot(ind_with_metadata_included.numMissings, meanAcrossRegionIndHetStan)
# PCA of all individuals:
genos_highViSHetRegion_imputed = Impute.svd(Matrix{Union{Missing, Float32}}(genos_highViSHetRegion))
flipPC1 = true
flipPC2 = true
PCAmodelAll = plotPCA(genos_highViSHetRegion_imputed, ind_with_metadata_included,
    groups_to_plot_PCA, group_colors_PCA;
    sampleSet = "greenish warblers", regionText = regionText,
    flip1 = flipPC1, flip2 = flipPC2,
    lineOpacity = 0.7, fillOpacity = 0.6,
    symbolSize = 14, showTitle = true,
    xLabelText = string("Region PC1"), yLabelText = string("Region PC2"),
    showPlot = false)
display(PCAmodelAll.PCAfig)
# Add PC values to metadata for individuals included in PCA above:
if flipPC1
    PCAmodelAll.metadata.PC1 = -1 .* PCAmodelAll.values[1,:]
else
    PCAmodelAll.metadata.PC1 = PCAmodelAll.values[1,:]
end
if flipPC2
    PCAmodelAll.metadata.PC2 = -1 .* PCAmodelAll.values[2,:]
else
    PCAmodelAll.metadata.PC2 = PCAmodelAll.values[2,:]
end
PCAmodelAll.metadata.PC3 = PCAmodelAll.values[3,:]
# For the next bit to work with above, make sure that all individuals in the above `plotPCA` command
# are included in the `groups_to_plot_PCA`
# choose inds with low IndHet in high ViSHet region:
indSelection_lowIndHetStan = (meanAcrossRegionIndHetStan .< 1.4)
#Plot only the lowIndHetStan individuals:
f = CairoMakie.Figure();
ax = Axis(f[1, 1],
    title = "PC1 vs. PC2, only low heterozygosity",
    xlabel = "Region PC1", xlabelsize = 24,
    ylabel = "Region PC2", ylabelsize = 24,
    autolimitaspect = 1)
hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
for i in eachindex(groups_to_plot_PCA)
    selection = (PCAmodelAll.metadata.Fst_group .== groups_to_plot_PCA[i]) .& indSelection_lowIndHetStan
    CairoMakie.scatter!(ax, PCAmodelAll.metadata.PC1[selection], PCAmodelAll.metadata.PC2[selection], marker = :diamond, color = (group_colors_PCA[i], 0.6), markersize = 14, strokewidth=0.5, strokecolor = ("black", 0.7))
end
display(f)
Good news: 1 region on that scaffold
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
CairoMakie.Screen{IMAGE}
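The code above adds a chromosome-specific metadata column by building a command string and running it with `eval(Meta.parse(command))`. As an alternative (not the approach used in this analysis), a column whose name is built at run time can be assigned directly by indexing the DataFrame with a string. A minimal sketch, assuming a recent DataFrames.jl version and using a made-up three-row table (`metadata` and `lowHet` are hypothetical names):

```julia
using DataFrames

# Hypothetical stand-ins for the metadata table and the per-individual values.
metadata = DataFrame(ID = ["ind1", "ind2", "ind3"])
chr = "gw27"
meanAcrossRegionIndHetStan = [0.8, 1.9, 1.1]

# Build the column name as a string and assign the column directly; no eval needed.
metadata[!, chr * "_regionIndHetStan"] = meanAcrossRegionIndHetStan

# The column can be read back the same way, e.g. to select low-heterozygosity individuals.
lowHet = metadata[!, chr * "_regionIndHetStan"] .< 1.4
```

Both approaches should produce the same column; the indexing form just avoids parsing code from a string.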
Summary of chromosome LHBR patterns
Tried chr 1 but some unclean distinguishing of ludlowi clusters (no wide sharing of plumb)
Included chr 1A
Tried chr 2 but some unclean distinguishing of ludlowi clusters (with some sharing of plumb)
Included chr 3
Tried chr 4 but not clean separation of high vs. low IndHet
Included chr 4A
Tried chr 5 but not clean separation of high vs. low IndHet
Tried chr 6 but not clean separation of high vs. low IndHet
Tried chr 7 but not clean separation of high vs. low IndHet, recomb in ludlowi
Tried chr 8 but not clean separation of high vs. low IndHet, recomb in ludlowi
Tried chr 9 but not clean separation of high vs. low IndHet
Tried chr 10 but not clean separation of high vs. low IndHet
Tried chr 11 and has potential, and shows an obs with two plumb types, but ludlowi not cleanly distinguished into types
Tried chr 12 but not a very clear separation of high vs. low IndHet (shows a lot of sharing of plumb haps in ludlowi)
Tried chr 14 but not clean separation of high vs. low IndHet, recomb in ludlowi
Included chr 15
No chr 16
Tried chr 23 but not a very clear separation of high vs. low IndHet
Included chr 17, chr 18, chr 19, chr 20
Tried chr 21 but not a very clear separation of high vs. low IndHet
Tried chr 22 but not a very clear separation of high vs. low IndHet. lud is all over the place.
Tried chr 23 and almost included, but a few inds would be tough to categorize. Similar pattern as some others. Not a very clear separation of high vs. low IndHet.
Tried chr 24 but not a very clear separation of high vs. low IndHet.
Tried chr 25 but not a very clear separation of high vs. low IndHet, and hard to categorize a lot of inds.
Included chr 26
Tried chr 27 but not a very clear separation of high vs. low IndHet.
Included chr 28
Included chr Z
Make a summary plot for the cluster types at different chromosome haploblocks (west without nitidus)
Will modify the plotGenotypeByIndividual() function, but need to construct a genotype data structure based on the groups (determined above) for each haploblock.
For west side (without nitidus):
#= # For debugging function:
indMetadata = ind_with_metadata_included
plotGroups = plotGroupsForSummary
plotGroupColors = groupColorsForSummary
regionNames = HaploblockRegions
indFontSize = 10
figureSize = (1200, 1200)
plotTitle = nothing
indColorLeftProvided = false
indColorRightProvided = false =#
"""
    plotHaploblockSummary(genosSummary, indMetadata,
                            plotGroups, plotGroupColors;
                            regionNames,
                            indFontSize=10, figureSize=(1200, 1200),
                            plotTitle = nothing,
                            indColorLeftProvided = false,
                            indColorRightProvided = false)

Construct a genotype-by-individual plot, with option to filter out SNPs with too much missing data.

Under the default setting, alleles are colored (dark purple vs. light purple) according to whichever allele is designated as `group1`.

# Arguments
- `genosSummary`: Matrix containing summary genotype data (individuals in rows, loci in columns).
- `indMetadata`: Matrix of metadata for individuals; must contain `Fst_group` and `plot_order` columns.
- `plotGroups`: Vector of group names to include in plot.
- `plotGroupColors`: Vector of plotting colors corresponding to the groups.
- `regionNames`: Optional; Names of the genotyped regions.
- `indFontSize`: Optional; the font size of the individual ID labels.
- `figureSize`: Optional; the size of the figure; default is `(1200, 1200)`.
- `plotTitle`: Optional; default will make a title. For no title, set to `""`.
- `indColorLeftProvided`: Optional; Default is `false`. Set to `true` if there is a column labeled `indColorLeft` in the metadata providing color of each individual for plotting on left side.
- `indColorRightProvided`: Optional; same as above but for right side (requires `indColorRight` column in metadata).

# Notes
Returns a tuple containing:
- the figure
- the plotted genotypes
- the sorted metadata matrix for the plotted individuals
"""
function plotHaploblockSummary(genosSummary, indMetadata,
    plotGroups, plotGroupColors;
    regionNames = nothing,
    indFontSize=10, figureSize=(1200, 1200),
    plotTitle = nothing,
    indColorLeftProvided = false,
    indColorRightProvided = false)

    # if the genoData has missing values, then convert to -1:
    genosSummary[ismissing.(genosSummary)] .= -1

    numRegions = size(genosSummary, 2)

    genosSummary_subset = genosSummary[indMetadata.Fst_group .∈ Ref(plotGroups), :]
    indMetadata_subset = indMetadata[indMetadata.Fst_group .∈ Ref(plotGroups), :]

    # Choose sorting order by plot_order column in input metadata file
    sorted_genosSummary_subset = genosSummary_subset[sortperm(indMetadata_subset.plot_order, rev=false), :]
    numInds = size(sorted_genosSummary_subset, 1)
    sorted_indMetadata_subset = indMetadata_subset[sortperm(indMetadata_subset.plot_order, rev=false), :]

    # Set up the plot window:
    f = CairoMakie.Figure(size=figureSize)

    if isnothing(plotTitle)
        plotTitle = "Summary of $numRegions haploblock genotypes for $numInds individuals"
    end
    # Set up the main Axis:
    ax = Axis(f[1, 1],
        title = plotTitle,
        titlesize=30,
        limits=(0.5 - 0.09 * (numRegions), 0.5 + 1.09 * (numRegions),
            0.5 - 0.3 * numInds, 0.5 + numInds)
    )
    hidedecorations!(ax) # hide background lattice and axis labels
    hidespines!(ax) # hide box around plot
    genotypeColors = ["#3f007d", "#807dba", "#dadaeb", "grey50"] # purple shades from colorbrewer

    # plot evenly spaced by SNP order along chromosome:
    # make top part of fig (genotypes for individuals)
    labelCushion = numRegions / 100
    label_x_left = 0.5 - labelCushion
    label_x_right = 0.5 + numRegions + labelCushion
    colorBoxCushion = 0.07 * numRegions
    groupColorBox_x_left = 0.5 - colorBoxCushion
    groupColorBox_x_right = 0.5 + numRegions + colorBoxCushion
    boxWidth = 0.005 * numRegions * 2
    groupColorBox_x_left = [-boxWidth, -boxWidth, boxWidth, boxWidth, -boxWidth] .+ groupColorBox_x_left
    groupColorBox_x_right = [-boxWidth, -boxWidth, boxWidth, boxWidth, -boxWidth] .+ groupColorBox_x_right
    groupColorBox_y = [0.4, -0.4, -0.4, 0.4, 0.4]

    for i in 1:numInds
        y = numInds + 1 - i # y is location for plotting; this reverses order of plot top-bottom
        labelText = last(split(sorted_indMetadata_subset.ID[i], "_")) # this gets the last part of the sample ID (usually the main ID part)
        # put sample label on left side:
        CairoMakie.text!(label_x_left, y; text=labelText, align=(:right, :center), fontsize=indFontSize)
        # put sample label on right side:
        CairoMakie.text!(label_x_right, y; text=labelText, align=(:left, :center), fontsize=indFontSize)
        if indColorLeftProvided
            boxColorLeft = sorted_indMetadata_subset.indColorLeft[i]
        else
            boxColorLeft = plotGroupColors[findfirst(plotGroups .== sorted_indMetadata_subset.Fst_group[i])]
        end
        if indColorRightProvided
            boxColorRight = sorted_indMetadata_subset.indColorRight[i]
        else
            boxColorRight = plotGroupColors[findfirst(plotGroups .== sorted_indMetadata_subset.Fst_group[i])]
        end
        CairoMakie.poly!(Point2f.(groupColorBox_x_left, (y .+ groupColorBox_y)), color=boxColorLeft)
        CairoMakie.poly!(Point2f.(groupColorBox_x_right, (y .+ groupColorBox_y)), color=boxColorRight)
    end

    # generate my own plotting symbol (a rectangle)
    box_x = [-0.45, -0.45, 0.45, 0.45, -0.45]
    #box_x = [-0.5, -0.5, 0.5, 0.5, -0.5]
    box_y = [0.4, -0.4, -0.4, 0.4, 0.4]
    # generate triangles for plotting heterozygotes
    triangle1_x = [-0.45, -0.45, 0.45, -0.45]
    #triangle1_x = [-0.5, -0.5, 0.5, -0.5]
    triangle1_y = [0.4, -0.4, 0.4, 0.4]
    triangle2_x = [-0.45, 0.45, 0.45, -0.45]
    #triangle2_x = [-0.5, 0.5, 0.5, -0.5]
    triangle2_y = [-0.4, -0.4, 0.4, -0.4]
    # cycle through individuals, graphing each type of genotype:
    for i in 1:numInds
        y = numInds + 1 - i # y is location for plotting; this reverses order of plot top-bottom
        #CairoMakie.lines!([0.5, numRegions + 0.5], [y, y], color="grey40") # for lines across the individual rows
        genotypes = sorted_genosSummary_subset[i, :]
        hom_ref_locs = findall(genotypes .== 0)
        if length(hom_ref_locs) > 0
            for j in eachindex(hom_ref_locs)
                CairoMakie.poly!(Point2f.((hom_ref_locs[j] .+ box_x), (y .+ box_y)), color=genotypeColors[1])
            end
        end
        het_locs = findall(genotypes .== 1)
        if length(het_locs) > 0
            for j in eachindex(het_locs)
                CairoMakie.poly!(Point2f.((het_locs[j] .+ triangle1_x), (y .+ triangle1_y)), color=genotypeColors[1])
                CairoMakie.poly!(Point2f.((het_locs[j] .+ triangle2_x), (y .+ triangle2_y)), color=genotypeColors[3])
            end
        end
        hom_alt_locs = findall(genotypes .== 2)
        if length(hom_alt_locs) > 0
            for j in eachindex(hom_alt_locs)
                CairoMakie.poly!(Point2f.((hom_alt_locs[j] .+ box_x), (y .+ box_y)), color=genotypeColors[3])
            end
        end
    end
    if isnothing(regionNames)
        regionNames = string.(1:numRegions)
    end
    # make labels on lower part
    y_label = 0.5 - 0.025numInds
    for i in 1:numRegions
        CairoMakie.text!(i, y_label; text = regionNames[i], align=(:center, :center), fontsize=30)
    end
    display(f)
    return f, sorted_genosSummary_subset, sorted_indMetadata_subset
end
# Set up a data structure to store the key for converting, for each haploblock region,
# the cluster names to genotype integers. This is a dictionary of dictionaries:
regionHaplotypeCode_west = Dict{String, Dict{String, Int}}(
    "gw1A" => Dict("virLud"=>0, "virLud_troch"=>1, "troch"=>2),
    "gw3" => Dict("virLud"=>0, "virLudHet"=>0, "virLud_trochObs"=>1, "trochObs"=>2, "trochObsHet"=>2),
    "gw13" => Dict("vir"=>0, "lud"=>0, "lud_troch"=>1, "troch"=>2),
    "gw15" => Dict("virLud"=>0, "virLud_troch"=>1, "troch"=>2),
    "gw18" => Dict("virLud"=>0, "virLud_troch"=>1, "troch"=>2),
    "gw19" => Dict("virLud"=>0, "virLudHet"=>0, "virLud_trochObs"=>1, "trochObs"=>2, "trochObsHet"=>2),
    "gw26" => Dict("virLud"=>0, "virLud_troch"=>1, "troch"=>2),
    "gw28" => Dict("virLud"=>0, "virLud_troch"=>1, "troch"=>2),
    "gwZ" => Dict("vir"=>0, "lud"=>0, "vir_lud"=>0, "lud_troch"=>1, "troch"=>2)
)
haploblockRegions = ["gw1A", "gw3", "gw13", "gw15", "gw18", "gw19", "gw26", "gw28", "gwZ"]
numHaploblockRegions = length(haploblockRegions)
numInds = size(ind_with_metadata_included, 1)
# create genotype object and fill with missing (-1) genotypes
genosSummary = fill(-1, (numInds, numHaploblockRegions))
# fill object with appropriate genotypes
for i in 1:numHaploblockRegions
    region = haploblockRegions[i]
    for (key, value) in regionHaplotypeCode_west[region]
        command = """genosSummary[ind_with_metadata_included.$(region)_cluster .== "$(key)", $i] .= """ * string(value)
        eval(Meta.parse(command)) # this executes the command constructed above
    end
end
# Must say I am pleased with the cleverness of above. Concise datastructure and code that does a lot. :)
plotGroupsForSummaryWest = ["vir","vir_S","lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN"]
groupColorsForSummaryWest = ["blue","turquoise1","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow"]
plotHaploblockSummary(genosSummary, ind_with_metadata_included,
    plotGroupsForSummaryWest, groupColorsForSummaryWest;
    regionNames = haploblockRegions,
    indFontSize = 8, figureSize = (1200, 1600),
    plotTitle = nothing,
    indColorLeftProvided = false,
    indColorRightProvided = false);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
The one missing element (the white cell in the gw28 column) is a heterozygote with the nitidus haplotype.
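To make the dictionary-of-dictionaries encoding concrete, here is a minimal sketch on made-up data. The cluster calls are held in a plain dictionary of vectors (`clusterCalls`, a hypothetical stand-in for the `*_cluster` metadata columns), and the lookup is done with `get()` rather than `eval`; this is an illustration of the idea, not the code used to produce the figure.

```julia
# Hypothetical cluster calls for four individuals at two regions.
clusterCalls = Dict(
    "gw1A" => ["virLud", "virLud_troch", "troch", "nit"],
    "gw15" => ["troch", "troch", "virLud", "virLud_troch"],
)

# Per-region key mapping cluster names to genotype integers (0, 1, 2).
regionCode = Dict{String, Dict{String, Int}}(
    "gw1A" => Dict("virLud" => 0, "virLud_troch" => 1, "troch" => 2),
    "gw15" => Dict("virLud" => 0, "virLud_troch" => 1, "troch" => 2),
)

regions = ["gw1A", "gw15"]
numInds = 4

# Start with every genotype missing (-1), then fill in the recognized clusters.
genosSummary = fill(-1, numInds, length(regions))
for (j, region) in enumerate(regions)
    for (i, call) in enumerate(clusterCalls[region])
        # get() returns -1 for cluster names that are absent from the key
        genosSummary[i, j] = get(regionCode[region], call, -1)
    end
end
```

Any cluster name not listed for a region (here `"nit"` at gw1A) stays at -1, which is what leaves a cell blank in the summary plot.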
Make a summary plot for the east side
# Set up a data structure to store the key for converting, for each haploblock region,
# the cluster names to genotype integers. This is a dictionary of dictionaries:
regionHaplotypeCode_east = Dict{String, Dict{String, Int}}(
    "gw1A" => Dict("obs"=>0, "plumb"=>2),
    "gw3" => Dict("trochObs"=>0, "trochObsHet"=>0, "plumb"=>2, "plumbHet"=>2),
    "gw13" => Dict("obs"=>0, "plumb"=>2, "plumbHet"=>2),
    "gw15" => Dict("obs"=>0, "plumb"=>2),
    "gw18" => Dict("obs"=>0, "obs_plumb"=>1, "plumb"=>2),
    "gw19" => Dict("trochObs"=>0, "trochObs_plumb"=>1, "plumb"=>2),
    "gw26" => Dict("obs"=>0, "obs_plumb"=>1, "plumb"=>2),
    "gw28" => Dict("obs"=>0, "obsHet"=>0, "obs_plumb"=>1, "plumb"=>2),
    "gwZ" => Dict("obs"=>0, "plumb"=>2)
)
haploblockRegions = ["gw1A", "gw3", "gw13", "gw15", "gw18", "gw19", "gw26", "gw28", "gwZ"]
numHaploblockRegions = length(haploblockRegions)
numInds = size(ind_with_metadata_included, 1)
# create genotype object and fill with missing (-1) genotypes
genosSummary = fill(-1, (numInds, numHaploblockRegions))
# fill object with appropriate genotypes
for i in 1:numHaploblockRegions
    region = haploblockRegions[i]
    for (key, value) in regionHaplotypeCode_east[region]
        command = """genosSummary[ind_with_metadata_included.$(region)_cluster .== "$(key)", $i] .= """ * string(value)
        eval(Meta.parse(command)) # this executes the command constructed above
    end
end
plotGroupsForSummaryEast = ["obs","plumb_BJ","plumb"]
groupColorsForSummaryEast = ["orange","pink","red"]
plotHaploblockSummary(genosSummary, ind_with_metadata_included,
    plotGroupsForSummaryEast, groupColorsForSummaryEast;
    regionNames = haploblockRegions,
    indFontSize = 8, figureSize = (1200, 1600),
    plotTitle = nothing,
    indColorLeftProvided = false,
    indColorRightProvided = false);
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
The white cells in the figure are heterozygotes between the plumbeitarsus and viridanus haplotypes.
Make a summary plot for the north side
# Set up a data structure to store the code for converting, for each haploblock region,
# the cluster names to genotype integers. This is a dictionary of dictionaries:
regionHaplotypeCode_north = Dict{String, Dict{String, Int}}(
    "gw1A" => Dict("virLud"=>0, "vir_plumb"=>1, "plumb"=>2),
    "gw3" => Dict("virLud"=>0, "virLudHet"=>0, "vir_plumb"=>1, "plumb"=>2, "plumbHet"=>2),
    "gw13" => Dict("vir"=>0, "vir_plumb"=>1, "plumb"=>2, "plumbHet"=>2),
    "gw15" => Dict("virLud"=>0, "vir_plumb"=>1, "plumb"=>2),
    "gw18" => Dict("virLud"=>0, "vir_plumb"=>1, "plumb"=>2),
    "gw19" => Dict("virLud"=>0, "virLudHet"=>0, "vir_plumb"=>1, "plumb"=>2),
    "gw26" => Dict("virLud"=>0, "vir_plumb"=>1, "plumb"=>2),
    "gw28" => Dict("virLud"=>0, "vir_plumb"=>1, "plumb"=>2),
    "gwZ" => Dict("vir"=>0, "plumb"=>2)
)
haploblockRegions = ["gw1A", "gw3", "gw13", "gw15", "gw18", "gw19", "gw26", "gw28", "gwZ"]
numHaploblockRegions = length(haploblockRegions)
numInds = size(ind_with_metadata_included, 1)
# create genotype object and fill with missing (-1) genotypes
genosSummary = fill(-1, (numInds, numHaploblockRegions))
# fill object with appropriate genotypes
for i in 1:numHaploblockRegions
    region = haploblockRegions[i]
    for (key, value) in regionHaplotypeCode_north[region]
        command = """genosSummary[ind_with_metadata_included.$(region)_cluster .== "$(key)", $i] .= """ * string(value)
        eval(Meta.parse(command)) # this executes the command constructed above
    end
end
plotGroupsForSummaryNorth = ["vir","plumb_vir","plumb"]
groupColorsForSummaryNorth = ["blue","purple","red"]
plotHaploblockSummary(genosSummary, ind_with_metadata_included,
    plotGroupsForSummaryNorth, groupColorsForSummaryNorth;
    regionNames = haploblockRegions,
    indFontSize = 8, figureSize = (1200, 1600),
    plotTitle = nothing,
    indColorLeftProvided = false,
    indColorRightProvided = false)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
(Scene (768px, 960px):
  0 Plots
  1 Child Scene:
    └ Scene (768px, 960px), [0 0 … 0 0; 0 0 … 0 0; … ; 2 2 … 2 2; 1 2 … 1 2], 100×38 DataFrame
 Row │ ind                        ID                         location  group    ⋯
     │ String                     String                     String7   String1  ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ GW_Armando_plate1_JF12G04  GW_Armando_plate1_JF12G04  ST_vi     vir      ⋯
   2 │ GW_Armando_plate2_JF03G01  GW_Armando_plate2_JF03G01  ST_vi     vir_mis
   3 │ GW_Armando_plate2_JF30G01  GW_Armando_plate2_JF30G01  ST_vi     vir_mis
   4 │ GW_Lane5_STvi1             GW_Lane5_STvi1             ST_vi     vir
   5 │ GW_Lane5_STvi2             GW_Lane5_STvi2             ST_vi     vir      ⋯
   6 │ GW_Lane5_STvi3             GW_Lane5_STvi3             ST_vi     vir
   7 │ GW_Armando_plate1_JF16G01  GW_Armando_plate1_JF16G01  DV_vi     plumb_v
   8 │ GW_Armando_plate2_JF16G02  GW_Armando_plate2_JF16G02  DV_vi     plumb_v
   9 │ GW_Armando_plate2_JE31G01  GW_Armando_plate2_JE31G01  VB_vi     vir_mis  ⋯
  10 │ GW_Armando_plate2_JF03G02  GW_Armando_plate2_JF03G02  VB_vi     vir_mis
  11 │ GW_Lane5_YK11              GW_Lane5_YK11              YK        vir
  ⋮  │             ⋮                          ⋮                 ⋮         ⋮     ⋱
  91 │ GW_Armando_plate2_JF24G01  GW_Armando_plate2_JF24G01  VB        plumb
  92 │ GW_Armando_plate2_JF25G01  GW_Armando_plate2_JF25G01  VB        plumb    ⋯
  93 │ GW_Armando_plate1_JG02G02  GW_Armando_plate1_JG02G02  PR        plumb
  94 │ GW_Armando_plate1_JG02G04  GW_Armando_plate1_JG02G04  PR        plumb
  95 │ GW_Armando_plate2_JG01G01  GW_Armando_plate2_JG01G01  PR        plumb
  96 │ GW_Armando_plate2_JG02G01  GW_Armando_plate2_JG02G01  PR        plumb    ⋯
  97 │ GW_Armando_plate2_JG02G03  GW_Armando_plate2_JG02G03  PR        plumb
  98 │ GW_Lane5_SL1               GW_Lane5_SL1               SL        plumb
  99 │ GW_Lane5_SL2               GW_Lane5_SL2               SL        plumb
 100 │ GW_Armando_plate1_JF10G03  GW_Armando_plate1_JF10G03  ST        plumb_v  ⋯
                                                 35 columns and 79 rows omitted)
Make a summary plot for the whole ring
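The whole-ring summary in this section stores each genotype as a tuple of two allele codes and assigns it to many individuals at once by broadcasting. A minimal sketch of why the tuple is wrapped in an extra one-element tuple before the `.=` assignment (the matrix `genos` and the vector `mask` are made-up examples; `Ref()` is shown only as an equivalent alternative, it is not used in the analysis code):

```julia
# A matrix whose elements are tuples of two allele codes; (-9, -9) marks missing.
genos = fill((-9, -9), 3, 2)
mask = [true, false, true]

# Wrapping the pair in a one-element tuple makes broadcasting treat it as a
# single value, so every selected cell receives the whole pair.
genos[mask, 1] .= ((1, 5),)

# Ref() is an equivalent way to protect a value from being broadcast over.
genos[mask, 2] .= Ref((7, 7))
```

Without the wrapper, broadcasting would iterate over the two integers inside the tuple instead of assigning the pair as a unit.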
# Set up a code converting integers to colors.
# These will be used for all chromosome regions below.
integerToColorCodes = Dict{Int, String}(
    1 => "blue", # vir
    2 => "turquoise1", # vir south
    3 => "grey", # nit
    4 => "green", # lud
    5 => "yellow", # troch
    6 => "orange", # obs
    7 => "red", # plumb
)
# Set up a data structure to store the code for converting, for each haploblock region,
# the cluster names to genotype integers corresponding to colors above. This is a dictionary of dictionaries.
# Each genotype will be encoded with a tuple representing the alleles.
regionHaplotypeCode_all = Dict{String, Dict{String, Tuple{Int, Int}}}(
"gw1A" => Dict("virLud"=>(1,1),
"nit"=>(3,3),
"virLud_troch"=>(1,5),
"troch"=>(5,5),
"obs"=>(6,6),
"plumb"=>(7,7),
"vir_plumb"=>(1,7)),
"gw3" => Dict("virLud"=>(1,1),
"virLudHet"=>(1,1),
"nit"=>(3,3),
"virLud_trochObs"=>(1,5),
"trochObs"=>(5,5),
"trochObsHet"=>(5,5),
"plumb"=>(7,7),
"plumbHet"=>(7,7),
"vir_plumb"=>(1,7)),
"gw4A" => Dict("virLud"=>(1,1),
"virLudHet"=>(1,1),
"nit"=>(3,3),
"virLud_troch"=>(1,5),
"troch"=>(5,5),
"trochHet"=>(5,5),
"troch_obsPlumb"=>(5,7),
"obsPlumb"=>(7,7),
"obsPlumbHet"=>(7,7),
"virLud_obsPlumb"=>(1,7)),
"gw13" => Dict("vir"=>(1,1),
"vir_lud"=>(1,4),
"nit"=>(3,3),
"lud"=>(4,4),
"lud_troch"=>(4,5),
"troch"=>(5,5),
"obs"=>(6,6),
"plumb"=>(7,7),
"plumbHet"=>(7,7),
"vir_plumb"=>(1,7)),
"gw15" => Dict("virLud"=>(1,1),
"nit"=>(3,3),
"virLud_troch"=>(1,5),
"troch"=>(5,5),
"obs"=>(6,6),
"plumb"=>(7,7),
"vir_plumb"=>(1,7)),
"gw17" => Dict("virLud"=>(1,1),
"nit"=>(3,3),
"virLud_troch"=>(1,5),
"troch"=>(5,5),
"virLud_obs"=>(1,6),
"obs"=>(6,6),
"troch_plumb"=>(5,7),
"plumb"=>(7,7),
"vir_plumb"=>(1,7)),
"gw18" => Dict("virLud"=>(1,1),
"nit"=>(3,3),
"virLud_troch"=>(1,5),
"troch"=>(5,5),
"obs"=>(6,6),
"obs_plumb"=>(6,7),
"plumb"=>(7,7),
"vir_plumb"=>(1,7)),
"gw19" => Dict("virLud"=>(1,1),
"virLudHet"=>(1,1),
"nit"=>(3,3),
"virLud_trochObs"=>(1,5),
"trochObs"=>(5,5),
"trochObsHet"=>(5,5),
"trochObs_plumb"=>(5,7),
"plumb"=>(7,7),
"vir_plumb"=>(1,7)),
"gw20" => Dict("vir"=>(1,1),
"nit"=>(3,3),
"lud"=>(4,4),
"ludHet"=>(4,4),
"lud_troch"=>(4,5),
"troch"=>(5,5),
"obs"=>(6,6),
"obs_plumb"=>(6,7),
"plumb"=>(7,7),
"vir_plumb"=>(1,7)),
"gw26" => Dict("virLud"=>(1,1),
"nit"=>(3,3),
"virLud_troch"=>(1,5),
"troch"=>(5,5),
"obs"=>(6,6),
"obs_plumb"=>(6,7),
"plumb"=>(7,7),
"vir_plumb"=>(1,7)),
"gw28" => Dict("virLud"=>(1,1),
"virLud_nit"=>(1,3),
"nit"=>(3,3),
"virLud_troch"=>(1,5),
"troch"=>(5,5),
"obs"=>(6,6),
"obsHet"=>(6,6),
"obs_plumb"=>(6,7),
"plumb"=>(7,7),
"vir_plumb"=>(1,7)),
"gwZ" => Dict("vir"=>(1,1),
"vir_lud"=>(1,4),
"nit"=>(3,3),
"lud"=>(4,4),
"lud_troch"=>(4,5),
"troch"=>(5,5),
"obs"=>(6,6),
"plumb"=>(7,7))
)
haploblockRegions = ["gwZ", "gw1A", "gw3", "gw4A", "gw13", "gw15", "gw17", "gw18", "gw19", "gw20","gw26", "gw28"]

numHaploblockRegions = length(haploblockRegions)
numInds = size(ind_with_metadata_included, 1)
# create genotype object and fill with missing (-9, -9) genotypes
genosSummary = fill((-9, -9), numInds, numHaploblockRegions)
# fill object with appropriate genotypes
for i in 1:numHaploblockRegions
    region = haploblockRegions[i]
    for (key, value) in regionHaplotypeCode_all[region]
        command = """genosSummary[ind_with_metadata_included.$(region)_cluster .== "$(key)", $i] .= ($(string(value)),)""" # the construction at the end "protects" the tuple within a tuple, so it broadcasts correctly to each element on the left
        eval(Meta.parse(command)) # this executes the command constructed above
    end
end
plotGroupsForSummary_all = ["vir","vir_S","nit", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN","troch_EM","obs","plumb_BJ","plumb","plumb_vir"]
groupColorsForSummary_all = ["blue","turquoise1","grey","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow","gold","orange","pink","red","purple"]
"""
plotHaploblockSummaryWithColors(integerToColorCodes::Dict{Int, String},
genosSummary::Matrix{Tuple{Int64, Int64}},
indMetadata,
plotGroups, plotGroupColors;
regionNames,
indFontSize=10, figureSize=(1200, 1200),
plotTitle = nothing,
indColorLeftProvided = false,
indColorRightProvided = false)
Construct a genotype-by-individual plot, with option to filter out SNPs with too much missing data.
In this version, more than two haplotype alleles can be plotted, using colors provided according to the first argument.
# Arguments
- `integerToColorCodes`: The code matching integer haploblock types to colors.
- `genosSummary`: Matrix containing summary genotype data (individuals in rows, loci in columns), with each genotype represented by a tuple of 2 integers.
- `indMetadata`: Matrix of metadata for individuals; must contain `Fst_group` and `plot_order` columns.
- `plotGroups`: Vector of group names to include in plot.
- `plotGroupColors`: Vector of plotting colors corresponding to the groups.
- `regionNames`: Optional; Names of the genotyped regions.
- `indFontSize`: Optional; the font size of the individual ID labels.
- `figureSize`: Optional; the size of the figure; default is `(1200, 1200)`.
- `plotTitle`: Optional; default will make a title. For no title, set to `""`.
- `indColorLeftProvided`: Optional; Default is `false`. Set to `true` if there is a column labeled `indColorLeft` in the metadata providing color of each individual for plotting on left side.
- `indColorRightProvided`: Optional; same as above but for right side (requires `indColorRight` column in metadata).
# Notes
Returns a tuple containing:
- the figure
- the plotted genotypes
- the sorted metadata matrix for the plotted individuals
"""
function plotHaploblockSummaryWithColors(integerToColorCodes::Dict{Int, String},
                            genosSummary::Matrix{Tuple{Int64, Int64}},
                            indMetadata,
                            plotGroups, plotGroupColors;
                            regionNames = nothing,
                            indFontSize=10, figureSize=(1200, 1200),
                            plotTitle = nothing,
                            indColorLeftProvided = false,
                            indColorRightProvided = false)

    numRegions = size(genosSummary, 2)

    genosSummary_subset = genosSummary[indMetadata.Fst_group .∈ Ref(plotGroups), :]
    indMetadata_subset = indMetadata[indMetadata.Fst_group .∈ Ref(plotGroups), :]

    # Choose sorting order by plot_order column in input metadata file
    sorted_genosSummary_subset = genosSummary_subset[sortperm(indMetadata_subset.plot_order, rev=false), :]
    numInds = size(sorted_genosSummary_subset, 1)
    sorted_indMetadata_subset = indMetadata_subset[sortperm(indMetadata_subset.plot_order, rev=false), :]

    # Set up the plot window:
    f = CairoMakie.Figure(size=figureSize)

    if isnothing(plotTitle)
        plotTitle = "Summary of $numRegions haploblock genotypes for $numInds individuals"
    end
    # Set up the main Axis:
    ax = Axis(f[1, 1],
        title = plotTitle,
        titlesize=30,
        limits=(0.5 - 0.09 * (numRegions), 0.5 + 1.09 * (numRegions),
            0.5 - 0.3 * numInds, 0.5 + numInds)
    )
    hidedecorations!(ax) # hide background lattice and axis labels
    hidespines!(ax) # hide box around plot
    genotypeColors = ["#3f007d", "#807dba", "#dadaeb", "grey50"] # purple shades from colorbrewer

    # plot evenly spaced by region order:
    # make top part of fig (genotypes for individuals)
    labelCushion = numRegions / 100
    label_x_left = 0.5 - labelCushion
    label_x_right = 0.5 + numRegions + labelCushion
    colorBoxCushion = 0.07 * numRegions
    groupColorBox_x_left = 0.5 - colorBoxCushion
    groupColorBox_x_right = 0.5 + numRegions + colorBoxCushion
    boxWidth = 0.005 * numRegions * 2
    groupColorBox_x_left = [-boxWidth, -boxWidth, boxWidth, boxWidth, -boxWidth] .+ groupColorBox_x_left
    groupColorBox_x_right = [-boxWidth, -boxWidth, boxWidth, boxWidth, -boxWidth] .+ groupColorBox_x_right
    groupColorBox_y = [0.4, -0.4, -0.4, 0.4, 0.4]

    for i in 1:numInds
        y = numInds + 1 - i # y is location for plotting; this reverses order of plot top-bottom
        labelText = last(split(sorted_indMetadata_subset.ID[i], "_")) # this gets the last part of the sample ID (usually the main ID part)
        # put sample label on left side:
        CairoMakie.text!(label_x_left, y; text=labelText, align=(:right, :center), fontsize=indFontSize)
        # put sample label on right side:
        CairoMakie.text!(label_x_right, y; text=labelText, align=(:left, :center), fontsize=indFontSize)
        if indColorLeftProvided
            boxColorLeft = sorted_indMetadata_subset.indColorLeft[i]
        else
            boxColorLeft = plotGroupColors[findfirst(plotGroups .== sorted_indMetadata_subset.Fst_group[i])]
        end
        if indColorRightProvided
            boxColorRight = sorted_indMetadata_subset.indColorRight[i]
        else
            boxColorRight = plotGroupColors[findfirst(plotGroups .== sorted_indMetadata_subset.Fst_group[i])]
        end
        CairoMakie.poly!(Point2f.(groupColorBox_x_left, (y .+ groupColorBox_y)), color=boxColorLeft)
        CairoMakie.poly!(Point2f.(groupColorBox_x_right, (y .+ groupColorBox_y)), color=boxColorRight)
    end
    # generate my own plotting symbol (a rectangle)
    box_x = [-0.45, -0.45, 0.45, 0.45, -0.45]
    #box_x = [-0.5, -0.5, 0.5, 0.5, -0.5]
    box_y = [0.4, -0.4, -0.4, 0.4, 0.4]
    # generate triangles for plotting heterozygotes
    triangle1_x = [-0.45, -0.45, 0.45, -0.45]
    #triangle1_x = [-0.5, -0.5, 0.5, -0.5]
    triangle1_y = [0.4, -0.4, 0.4, 0.4]
    triangle2_x = [-0.45, 0.45, 0.45, -0.45]
    #triangle2_x = [-0.5, 0.5, 0.5, -0.5]
    triangle2_y = [-0.4, -0.4, 0.4, -0.4]
    # cycle through individuals, graphing each type of genotype:
    for i in 1:numInds
        y = numInds + 1 - i # y is location for plotting; this reverses order of plot top-bottom
        #CairoMakie.lines!([0.5, numRegions + 0.5], [y, y], color="grey40") # for lines across the individual rows
        # cycle through regions for this individual
        for j in 1:numRegions
            genotype = sorted_genosSummary_subset[i, j]
            if genotype[1] == genotype[2] # homozygous
                CairoMakie.poly!(Point2f.((j .+ box_x), (y .+ box_y)), color=integerToColorCodes[genotype[1]])
            else # heterozygous
                CairoMakie.poly!(Point2f.((j .+ triangle1_x), (y .+ triangle1_y)), color=integerToColorCodes[genotype[1]])
                CairoMakie.poly!(Point2f.((j .+ triangle2_x), (y .+ triangle2_y)), color=integerToColorCodes[genotype[2]])
            end
        end
    end
    if isnothing(regionNames)
        regionNames = string.(1:numRegions)
    end
    # make labels on lower part
    y_label = 0.5 - 0.025numInds
    for i in 1:numRegions
        CairoMakie.text!(i, y_label; text = regionNames[i], align=(:center, :center), fontsize=24)
    end
    display(f)
    return f, sorted_genosSummary_subset, sorted_indMetadata_subset
end

fig_5 = plotHaploblockSummaryWithColors(integerToColorCodes,
    genosSummary,
    ind_with_metadata_included,
    plotGroupsForSummary_all, groupColorsForSummary_all;
    regionNames = haploblockRegions,
    indFontSize = 7, figureSize = (1000, 2000),
    plotTitle = nothing,
    indColorLeftProvided = false,
    indColorRightProvided = false)

if false # set to true to save plot
    save("Figure5_from_Julia.png", fig_5[1], px_per_unit = 3.0)
end
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
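The genotype-filling loop in the summary-plot code builds each assignment as a string and runs it with eval(Meta.parse(...)). The following is a minimal sketch of an eval-free way to do the same kind of fill, assuming the cluster labels for a region can be looked up by column name. The toy `clustersToy` dictionary and the function name `fillGenotypesSketch!` are hypothetical stand-ins, not part of the original analysis.

```julia
# Hypothetical stand-in for the metadata: one vector of cluster labels per region.
clustersToy = Dict("gwZ_cluster" => ["vir", "vir_lud", "plumb", "unknown"])
regionsToy = ["gwZ"]
codesToy = Dict("gwZ" => Dict("vir" => (1, 1), "vir_lud" => (1, 4), "plumb" => (7, 7)))

# Fill a genotype matrix in place; individuals with no matching label keep the missing code.
function fillGenotypesSketch!(genos, clusters, regions, codes)
    for (i, region) in enumerate(regions)
        labels = clusters[string(region, "_cluster")]
        for (key, value) in codes[region]
            mask = labels .== key
            # Ref() makes the tuple behave as a single value during broadcasting
            genos[mask, i] .= Ref(value)
        end
    end
    return genos
end

genosToy = fill((-9, -9), length(clustersToy["gwZ_cluster"]), length(regionsToy))
fillGenotypesSketch!(genosToy, clustersToy, regionsToy, codesToy)
```

With a DataFrame, the labels could be taken with string column indexing, for example `ind_with_metadata_included[!, string(region, "_cluster")]`. Note that any individual left at (-9, -9) has no entry in integerToColorCodes, so a color lookup for it in the plotting function would fail.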
Produce Fst plot across genome
Calculate allele freqs and sample sizes (use column Fst_group)
groups = ["vir","troch_LN","plumb","plumb_vir"]
freqs, sampleSizes = getFreqsAndSampleSizes(genosOnly_included, ind_with_metadata_included.Fst_group, groups)
println("Calculated population allele frequencies and sample sizes")
Calculated population allele frequencies and sample sizes
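getFreqsAndSampleSizes is defined elsewhere in the project, so its internals are not shown on this page. As a rough sketch of what such a step involves, assuming genotypes are coded 0/1/2 (counts of the alternate allele) with -1 for missing data, and assuming sample size means the number of individuals with a called genotype, per-group frequencies and sample sizes could be computed as follows. The name `groupFreqsSketch` and the toy data are hypothetical.

```julia
# Toy genotype matrix: individuals in rows, SNPs in columns; -1 marks missing data.
genoToy = [0 2 -1;
           1 2  0;
           2 0  0;
           2 1 -1]
groupOfIndToy = ["vir", "vir", "plumb", "plumb"]

# Returns (freqs, sampleSizes), each with one row per group and one column per SNP.
function groupFreqsSketch(genos, groupOfInd, groups)
    nSNPs = size(genos, 2)
    freqs = fill(NaN, length(groups), nSNPs)       # NaN where no genotype was called
    sampleSizes = zeros(Int, length(groups), nSNPs)
    for (g, group) in enumerate(groups)
        rows = findall(groupOfInd .== group)
        for j in 1:nSNPs
            called = [genos[r, j] for r in rows if genos[r, j] >= 0]
            sampleSizes[g, j] = length(called)
            if !isempty(called)
                # each diploid individual carries two allele copies
                freqs[g, j] = sum(called) / (2 * length(called))
            end
        end
    end
    return freqs, sampleSizes
end

freqsToy, sizesToy = groupFreqsSketch(genoToy, groupOfIndToy, ["vir", "plumb"])
```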
Calculate Fst for each SNP
Fst, FstNumerator, FstDenominator, pairwiseNamesFst = getFst(freqs, sampleSizes, groups; among=true) # set among to false if no among Fst wanted (some things won't work without it)
println("Calculated Fst values")
Calculated Fst values
Make list of main scaffolds to include in Fst plot across genome:
scaffolds_for_Fst = "gw" .* string.(vcat(1, "1A", 2:4, "4A", 5:15, 17:28, "Z"))
30-element Vector{String}:
"gw1"
"gw1A"
"gw2"
"gw3"
"gw4"
"gw4A"
"gw5"
"gw6"
"gw7"
"gw8"
"gw9"
"gw10"
"gw11"
⋮
"gw18"
"gw19"
"gw20"
"gw21"
"gw22"
"gw23"
"gw24"
"gw25"
"gw26"
"gw27"
"gw28"
"gwZ"
Calculate windowed Fst
This is calculated according to Weir & Cockerham (1984), with sample size and population number correction. For each window, Fst is the windowed numerator divided by the windowed denominator, using whole windows starting on the left side of each chromosome.
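The windowing itself is done by getWindowedFst, which is defined elsewhere in the project. As an illustration of the ratio-of-sums idea only, the sketch below assumes that windows contain a fixed number of SNPs, that trailing SNPs not filling a whole window are dropped, and that there are no missing values. The function name `windowedRatioSketch` and the toy inputs are hypothetical.

```julia
# num and den: one row per population comparison, one column per SNP.
# positions: SNP positions along one scaffold, in the same column order.
function windowedRatioSketch(num, den, positions, windowSize)
    nWindows = length(positions) ÷ windowSize   # whole windows only
    windowedPos = Vector{Float64}(undef, nWindows)
    windowedVals = Matrix{Float64}(undef, size(num, 1), nWindows)
    for w in 1:nWindows
        cols = ((w - 1) * windowSize + 1):(w * windowSize)
        windowedPos[w] = sum(positions[cols]) / windowSize   # mean position of the window
        for r in 1:size(num, 1)
            # ratio of sums, not the mean of per-SNP ratios
            windowedVals[r, w] = sum(num[r, cols]) / sum(den[r, cols])
        end
    end
    return windowedPos, windowedVals
end

numToy = [1.0 1.0 2.0 0.0 5.0]
denToy = [2.0 2.0 4.0 4.0 5.0]
snpPosToy = [100, 200, 300, 500, 900]
wPosToy, wFstToy = windowedRatioSketch(numToy, denToy, snpPosToy, 2)
```

Summing numerators and denominators before dividing gives SNPs with larger denominators more weight in a window, which is why it differs from averaging per-SNP Fst values.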
windowSize = 500

# calculate windowed Fst across all scaffolds:
windowed_pos_all = DataFrame(chrom = String[], mean_position = Float64[])
windowed_Fst_all = Array{Float32, 2}(undef, size(FstNumerator, 1), 0)
for chrom in scaffolds_for_Fst
    regionText = string("chr", chrom)
    loci_selection = (pos_SNP_filtered.chrom .== chrom)
    pos_region = pos_SNP_filtered[loci_selection, :]
    FstNumerator_region = FstNumerator[:, loci_selection]
    FstDenominator_region = FstDenominator[:, loci_selection]
    windowedPos, windowedFst = getWindowedFst(FstNumerator_region, FstDenominator_region, pos_region, windowSize)
    windowed_pos_chrom = DataFrame(chrom = repeat([chrom], length(windowedPos)), mean_position = windowedPos)
    windowed_pos_all = vcat(windowed_pos_all, windowed_pos_chrom)
    windowed_Fst_all = hcat(windowed_Fst_all, windowedFst)
end
# The below is just a test plot, showing nothing useful really (as it overlaps all chromosomes onto one x axis):
#plot(windowed_pos_all.mean_position, windowed_Fst_all[1,:])
The above has produced windowed Fst values across the whole genome, for each population comparison. These are stored in windowed_Fst_all, and the location info is stored in windowed_pos_all.
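Because each scaffold's windows are appended in the same order to both objects, column k of windowed_Fst_all corresponds to row k of windowed_pos_all. A small sketch of using that alignment to pull out one scaffold, with toy stand-ins (a NamedTuple of vectors rather than a DataFrame; all names ending in Toy are hypothetical):

```julia
# Toy window positions: one entry per window, in the same order as the Fst columns.
winPosToy = (chrom = ["gw1", "gw1", "gwZ"], mean_position = [1.5e5, 4.0e5, 2.2e5])
# Toy windowed Fst: one row per pairwise comparison, one column per window.
winFstToy = Float32[0.10 0.20 0.90;
                    0.05 0.15 0.80]

onZ = winPosToy.chrom .== "gwZ"          # logical mask of windows on the Z chromosome
fstOnZ = winFstToy[2, onZ]               # second comparison, Z windows only
posOnZ = winPosToy.mean_position[onZ]    # matching mean positions
```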
Now make a plot of windowed Fst across all scaffolds:
scaffolds_to_plot = scaffolds_for_Fst

groupsToPlotFst = ["vir_troch_LN", "troch_LN_plumb", "vir_plumb"]
groupColorsFst = ["green3", "orange", "purple"]

figHandle_GenomeFst3 = plotGenomeFst(scaffolds_to_plot,
    windowed_Fst_all,
    pairwiseNamesFst,
    windowed_pos_all,
    groupsToPlotFst,
    groupColorsFst;
    lineTransparency = 0.8,
    fillTransparency = 0.2,
    figureSize=(1200, 1200));
[["gw1", "gw4A", "gw6"], ["gw1A", "gw4", "gw9"], ["gw2", "gw8"], ["gw3", "gw5"], ["gw7", "gw10", "gw11", "gw12", "gw13", "gw14"], ["gw15", "gw17", "gw18", "gw19", "gw20", "gw21", "gw22", "gw23", "gw24", "gw25"], ["gw26", "gw27", "gw28", "gwZ"]]
Now make one with just the vir_plumb comparison
groupsToPlotFst = ["vir_plumb"]
groupColorsFst = ["purple"]

figHandle_GenomeFst1 = plotGenomeFst(scaffolds_to_plot,
    windowed_Fst_all,
    pairwiseNamesFst,
    windowed_pos_all,
    groupsToPlotFst,
    groupColorsFst;
    lineTransparency = 0.8,
    fillTransparency = 0.2,
    figureSize=(1200, 800))

if true # set to false to skip saving the plot
    filename = string("FigureS33_GenomeFst_fromJulia.png")
    save(filename, figHandle_GenomeFst1, px_per_unit = 2.0)
    println("Saved ", filename)
end
[["gw1", "gw4A", "gw6"], ["gw1A", "gw4", "gw9"], ["gw2", "gw8"], ["gw3", "gw5"], ["gw7", "gw10", "gw11", "gw12", "gw13", "gw14"], ["gw15", "gw17", "gw18", "gw19", "gw20", "gw21", "gw22", "gw23", "gw24", "gw25"], ["gw26", "gw27", "gw28", "gwZ"]]
Saved FigureS33_GenomeFst_fromJulia.png
Plot ViSHet vs. Fst
groupsToCompareUsingFst = "vir_plumb"
FstRow = findfirst(pairwiseNamesFst .== groupsToCompareUsingFst)
windowedFstValues = windowed_Fst_all[FstRow, :]
#plot(windowedFstValues, windowed_ViSHet_all)
fillOpacity = 0.3
lineOpacity = 0.8

f = Figure()
ax = Axis(f[1, 1],
    xlabel = "windowed Fst", xlabelsize = 24,
    ylabel = "windowed VisHet", ylabelsize = 24)
# hidedecorations!(ax, label = false, ticklabels = false, ticks = false) # hide background lattice
plot!(ax, windowedFstValues, windowed_ViSHet_all,
    marker = :circle, color = ("black", fillOpacity), markersize = 8, strokewidth=0.5, strokecolor = ("black", lineOpacity))

display(f)
if true # set to false to skip saving the plot
    filename = string("FigureS34_windowedFstvViSHet_fromJulia.png")
    save(filename, f, px_per_unit = 2.0)
    println("Saved ", filename)
end
# to see histograms of each distribution:
# hist(windowedFstValues)
# hist(windowed_ViSHet_all)
Saved FigureS34_windowedFstvViSHet_fromJulia.png
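The scatterplot code relies on two things: findfirst locating the named comparison (it returns `nothing` if the name is absent), and the Fst and ViSHet vectors having one value per window in the same order. The sketch below shows one way such checks might be made explicit before plotting. Both function names are hypothetical, and whether non-finite values occur in the real windowed data is an assumption.

```julia
# Look up the row of a named comparison, with an explicit error if it is absent.
function comparisonRowSketch(pairwiseNames, name)
    row = findfirst(pairwiseNames .== name)
    isnothing(row) && error("comparison \"$name\" not found in pairwise names")
    return row
end

# Keep only windows where both statistics are finite, so points pair up one-to-one.
function pairedFiniteSketch(x, y)
    length(x) == length(y) || error("x and y differ in length")
    keep = isfinite.(x) .& isfinite.(y)
    return x[keep], y[keep]
end

rowToy = comparisonRowSketch(["vir_troch_LN", "vir_plumb"], "vir_plumb")
xToy, yToy = pairedFiniteSketch([0.1, NaN, 0.9], [1.0, 2.0, 3.0])
```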
Examine chromosome 4A Large HaploBlock Region (LHBR) with invariant sites included
Before running the code below, the 012NA file was converted back into a 012minus1 file, using a command like the one below, so that the genotypes can be read as integers:
cat /Users/darrenirwin/GW_data_from_cedar_Feb2024/GW2022_cedar/infoSites_vcfs/GW2022_all4plates.genotypes.allSites.chrgw4A.infoSites.max2allele_noindel.maxmiss60.MQ20.lowHet.tab.012NA | sed 's/NA/-1/g' > /Users/darrenirwin/GW_data_from_cedar_Feb2024/GW2022_cedar/infoSites_vcfs/GW2022_all4plates.genotypes.allSites.chrgw4A.infoSites.max2allele_noindel.maxmiss60.MQ20.lowHet.tab.012minus1
baseName = "/Users/darrenirwin/GW_data_from_cedar_Feb2024/GW2022_cedar/infoSites_vcfs/GW2022_all4plates.genotypes.allSites.chrgw4A.infoSites.max2allele_noindel.maxmiss60.MQ20.lowHet.tab"
# load metadata
cd(dataDirectory)
metadata_chr4A = DataFrame(CSV.File(metadataFile)) # the CSV.File function interprets the correct delimiter
num_metadata_cols_chr4A = ncol(metadata_chr4A)
num_individuals_chr4A = nrow(metadata_chr4A)
# read in individual names for this dataset
individuals_file_name_chr4A = string(baseName, ".012.indv")
ind_chr4A = DataFrame(CSV.File(individuals_file_name_chr4A; header=["ind"], types=[String]))
indNum_chr4A = size(ind_chr4A, 1) # number of individuals
if num_individuals_chr4A != indNum_chr4A
    println("WARNING: number of rows in metadata file different than number of individuals in .indv file")
end
# read in position data for this dataset
position_file_name_chr4A = string(baseName, ".012.pos")
pos_chr4A = DataFrame(CSV.File(position_file_name_chr4A; header=["chrom", "position"], types=[String, Int]))
# read in genotype data
genotype_file_name_chr4A = string(baseName, ".012minus1")
@time if 1 <= indNum_chr4A <= 127
    geno_chr4A = readdlm(genotype_file_name_chr4A, '\t', Int8, '\n'); # this has been sped up dramatically, by first converting "NA" to -1
elseif 128 <= indNum_chr4A <= 32767
    geno_chr4A = readdlm(genotype_file_name_chr4A, '\t', Int16, '\n'); # this needed for first column, which is number of individual; Int16 not much slower on import than Int8
else
    print("Error: Number of individuals in .indv appears outside of range from 1 to 32767")
end
loci_count_chr4A = size(geno_chr4A, 2) - 1 # because the first column is not a SNP (just a count from zero)
print(string("Read in genotypic data at ", loci_count_chr4A," loci for ", indNum_chr4A, " individuals. \n"))
25.019594 seconds (340.65 M allocations: 9.788 GiB, 44.52% gc time, 2.54% compilation time)
Read in genotypic data at 364640 loci for 310 individuals.
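The NA-to-minus-one conversion and the choice of integer type in the import step can also be illustrated in plain Julia. The sketch below works on a small in-memory string rather than the real file and is not the code used for the analysis. The function names and toy text are hypothetical, and it assumes tab-delimited rows whose first field is the individual index.

```julia
# Pick the narrowest integer type that can hold the individual index in column 1.
function genotypeIntTypeSketch(numInds)
    1 <= numInds <= 127 && return Int8
    128 <= numInds <= 32767 && return Int16
    error("number of individuals outside of range from 1 to 32767")
end

# Parse tab-delimited 012 text, mapping "NA" to -1 (the job done by the sed command).
function parse012Sketch(text, numInds)
    T = genotypeIntTypeSketch(numInds)
    rows = [split(line, '\t') for line in split(strip(text), '\n')]
    geno = Matrix{T}(undef, length(rows), length(rows[1]))
    for (i, fields) in enumerate(rows), (j, field) in enumerate(fields)
        geno[i, j] = field == "NA" ? T(-1) : parse(T, field)
    end
    return geno
end

textToy = "0\t0\tNA\t2\n1\t1\t1\tNA\n"
geno012Toy = parse012Sketch(textToy, 2)
lociCountToy = size(geno012Toy, 2) - 1   # first column is the individual index, not a locus
```

For a file the size of the one read above, a field-by-field parser like this is unlikely to match the speed of readdlm on a pre-converted file, so it is meant only to make the logic explicit.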
Check that individuals are the same in genotype data and metadata
ind_with_metadata_chr4A = hcat(ind_chr4A, metadata_chr4A)
println(ind_with_metadata_chr4A)
println() # prints a line break
if isequal(ind_with_metadata_chr4A.ind, ind_with_metadata_chr4A.ID)
println("GOOD NEWS: names of individuals in metadata file and genotype ind file match perfectly.")
else
println("WARNING: names of individuals in metadata file and genotype ind file do not completely match.")
end
310×6 DataFrame
Row │ ind ID location group Fst_group plot_order
│ String String31 String7 String15 String15 Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ GW_Armando_plate1_AB1 GW_Armando_plate1_AB1 AB vir vir 20.01
2 │ GW_Armando_plate1_JF07G02 GW_Armando_plate1_JF07G02 ST plumb plumb 108.0
3 │ GW_Armando_plate1_JF07G03 GW_Armando_plate1_JF07G03 ST plumb plumb 109.0
4 │ GW_Armando_plate1_JF07G04 GW_Armando_plate1_JF07G04 ST plumb plumb 110.0
5 │ GW_Armando_plate1_JF08G02 GW_Armando_plate1_JF08G02 ST plumb plumb 111.0
6 │ GW_Armando_plate1_JF09G01 GW_Armando_plate1_JF09G01 ST plumb plumb 112.0
7 │ GW_Armando_plate1_JF09G02 GW_Armando_plate1_JF09G02 ST plumb plumb 113.0
8 │ GW_Armando_plate1_JF10G03 GW_Armando_plate1_JF10G03 ST plumb_vir plumb_vir 170.0
9 │ GW_Armando_plate1_JF11G01 GW_Armando_plate1_JF11G01 ST plumb plumb 114.0
10 │ GW_Armando_plate1_JF12G01 GW_Armando_plate1_JF12G01 ST plumb plumb 115.0
11 │ GW_Armando_plate1_JF12G02 GW_Armando_plate1_JF12G02 ST plumb plumb 116.0
12 │ GW_Armando_plate1_JF12G04 GW_Armando_plate1_JF12G04 ST_vi vir vir 24.001
13 │ GW_Armando_plate1_JF13G01 GW_Armando_plate1_JF13G01 ST plumb plumb 117.0
14 │ GW_Armando_plate1_JF15G03 GW_Armando_plate1_JF15G03 DV plumb plumb 103.0
15 │ GW_Armando_plate1_JF16G01 GW_Armando_plate1_JF16G01 DV_vi plumb_vir vir 24.041
16 │ GW_Armando_plate1_JF20G01 GW_Armando_plate1_JF20G01 MB plumb plumb 94.0
17 │ GW_Armando_plate1_JF22G01 GW_Armando_plate1_JF22G01 MB plumb plumb 95.0
18 │ GW_Armando_plate1_JF23G01 GW_Armando_plate1_JF23G01 VB plumb plumb 98.0
19 │ GW_Armando_plate1_JF23G02 GW_Armando_plate1_JF23G02 VB plumb plumb 99.0
20 │ GW_Armando_plate1_JF24G02 GW_Armando_plate1_JF24G02 VB plumb plumb 100.0
21 │ GW_Armando_plate1_JF26G01 GW_Armando_plate1_JF26G01 ST plumb plumb 118.0
22 │ GW_Armando_plate1_JF27G01 GW_Armando_plate1_JF27G01 ST plumb plumb 119.0
23 │ GW_Armando_plate1_JF29G01 GW_Armando_plate1_JF29G01 ST plumb plumb 120.0
24 │ GW_Armando_plate1_JF29G02 GW_Armando_plate1_JF29G02 ST plumb plumb 121.0
25 │ GW_Armando_plate1_JF29G03 GW_Armando_plate1_JF29G03 ST plumb plumb 122.0
26 │ GW_Armando_plate1_JG02G02 GW_Armando_plate1_JG02G02 PR plumb plumb 145.0
27 │ GW_Armando_plate1_JG02G04 GW_Armando_plate1_JG02G04 PR plumb plumb 146.0
28 │ GW_Armando_plate1_JG08G01 GW_Armando_plate1_JG08G01 ST plumb plumb 123.0
29 │ GW_Armando_plate1_JG08G02 GW_Armando_plate1_JG08G02 ST plumb plumb 124.0
30 │ GW_Armando_plate1_JG10G01 GW_Armando_plate1_JG10G01 ST plumb plumb 125.0
31 │ GW_Armando_plate1_JG12G01 GW_Armando_plate1_JG12G01 ST plumb plumb 126.0
32 │ GW_Armando_plate1_JG17G01 GW_Armando_plate1_JG17G01 ST plumb_vir plumb 127.0
33 │ GW_Armando_plate1_NO_BC_TTGW05 GW_Armando_plate1_NO_BC_TTGW05 blank blank blank -99.0
34 │ GW_Armando_plate1_NO_DNA GW_Armando_plate1_NO_DNA blank blank blank -99.0
35 │ GW_Armando_plate1_RF20G01 GW_Armando_plate1_RF20G01 BJ obs_plumb plumb_BJ 77.501
36 │ GW_Armando_plate1_RF29G02 GW_Armando_plate1_RF29G02 BJ obs_plumb plumb_BJ 77.502
37 │ GW_Armando_plate1_TL3 GW_Armando_plate1_TL3 TL vir vir 11.01
38 │ GW_Armando_plate1_TTGW01 GW_Armando_plate1_TTGW01 MN troch_MN troch_west 53.0
39 │ GW_Armando_plate1_TTGW05_rep1 GW_Armando_plate1_TTGW05_rep1 MN_rep troch_MN_rep troch_west_rep 53.0
40 │ GW_Armando_plate1_TTGW05_rep2 GW_Armando_plate1_TTGW05_rep2 MN troch_MN troch_west 53.0
41 │ GW_Armando_plate1_TTGW06 GW_Armando_plate1_TTGW06 SU lud_Sukhto lud_central 47.0
42 │ GW_Armando_plate1_TTGW07 GW_Armando_plate1_TTGW07 SU lud_Sukhto lud_central 47.0
43 │ GW_Armando_plate1_TTGW10 GW_Armando_plate1_TTGW10 SU lud_Sukhto lud_central 47.0
44 │ GW_Armando_plate1_TTGW11 GW_Armando_plate1_TTGW11 SU lud_Sukhto lud_central 47.0
45 │ GW_Armando_plate1_TTGW13 GW_Armando_plate1_TTGW13 TH lud_Thallighar lud_central 43.0
46 │ GW_Armando_plate1_TTGW17 GW_Armando_plate1_TTGW17 TH lud_Thallighar lud_central 43.0
47 │ GW_Armando_plate1_TTGW19 GW_Armando_plate1_TTGW19 TH lud_Thallighar lud_central 43.0
48 │ GW_Armando_plate1_TTGW21 GW_Armando_plate1_TTGW21 SR lud_Sural lud_central 45.0
49 │ GW_Armando_plate1_TTGW22 GW_Armando_plate1_TTGW22 SR lud_Sural lud_central 45.0
50 │ GW_Armando_plate1_TTGW23 GW_Armando_plate1_TTGW23 SR lud_Sural lud_central 45.0
51 │ GW_Armando_plate1_TTGW29 GW_Armando_plate1_TTGW29 SR lud_Sural lud_central 45.0
52 │ GW_Armando_plate1_TTGW52 GW_Armando_plate1_TTGW52 NG lud_Nainaghar lud_central 49.0
53 │ GW_Armando_plate1_TTGW53 GW_Armando_plate1_TTGW53 NG lud_Nainaghar lud_central 49.0
54 │ GW_Armando_plate1_TTGW55 GW_Armando_plate1_TTGW55 NG lud_Nainaghar lud_central 49.0
55 │ GW_Armando_plate1_TTGW57 GW_Armando_plate1_TTGW57 NG lud_Nainaghar lud_central 49.0
56 │ GW_Armando_plate1_TTGW58 GW_Armando_plate1_TTGW58 NG lud_Nainaghar lud_central 49.0
57 │ GW_Armando_plate1_TTGW59 GW_Armando_plate1_TTGW59 NG lud_Nainaghar lud_central 49.0
58 │ GW_Armando_plate1_TTGW63 GW_Armando_plate1_TTGW63 SP lud_Spiti troch_west 55.0
59 │ GW_Armando_plate1_TTGW64 GW_Armando_plate1_TTGW64 SP lud_Spiti troch_west 55.0
60 │ GW_Armando_plate1_TTGW65 GW_Armando_plate1_TTGW65 SP lud_Spiti troch_west 55.0
61 │ GW_Armando_plate1_TTGW66 GW_Armando_plate1_TTGW66 SP lud_Spiti troch_west 55.0
62 │ GW_Armando_plate1_TTGW68 GW_Armando_plate1_TTGW68 SP lud_Spiti troch_west 55.0
63 │ GW_Armando_plate1_TTGW70 GW_Armando_plate1_TTGW70 SA lud_Sathrundi lud_Sath 41.0
64 │ GW_Armando_plate1_TTGW71 GW_Armando_plate1_TTGW71 SA lud_Sathrundi lud_Sath 41.0
65 │ GW_Armando_plate1_TTGW72 GW_Armando_plate1_TTGW72 SA lud_Sathrundi lud_Sath 41.0
66 │ GW_Armando_plate1_TTGW74 GW_Armando_plate1_TTGW74 SA lud_Sathrundi lud_Sath 41.0
67 │ GW_Armando_plate1_TTGW78 GW_Armando_plate1_TTGW78 SA lud_Sathrundi lud_Sath 41.0
68 │ GW_Armando_plate1_TTGW_15_05 GW_Armando_plate1_TTGW_15_05 SR lud_Sural lud_central 45.0
69 │ GW_Armando_plate1_TTGW_15_07 GW_Armando_plate1_TTGW_15_07 SR lud_Sural lud_central 45.0
70 │ GW_Armando_plate1_TTGW_15_08 GW_Armando_plate1_TTGW_15_08 SR lud_Sural lud_central 45.0
71 │ GW_Armando_plate1_TTGW_15_09 GW_Armando_plate1_TTGW_15_09 SR lud_Sural lud_central 45.0
72 │ GW_Armando_plate1_UY1 GW_Armando_plate1_UY1 UY plumb plumb 87.0
73 │ GW_Armando_plate2_IL2 GW_Armando_plate2_IL2 IL_rep plumb_rep plumb_rep 84.0
74 │ GW_Armando_plate2_JE31G01 GW_Armando_plate2_JE31G01 VB_vi vir_misID vir 24.002
75 │ GW_Armando_plate2_JF03G01 GW_Armando_plate2_JF03G01 ST_vi vir_misID vir 24.003
76 │ GW_Armando_plate2_JF03G02 GW_Armando_plate2_JF03G02 VB_vi vir_misID vir 24.004
77 │ GW_Armando_plate2_JF07G01 GW_Armando_plate2_JF07G01 ST plumb plumb 128.0
78 │ GW_Armando_plate2_JF08G04 GW_Armando_plate2_JF08G04 ST plumb plumb 129.0
79 │ GW_Armando_plate2_JF10G02 GW_Armando_plate2_JF10G02 ST plumb plumb 130.0
80 │ GW_Armando_plate2_JF11G02 GW_Armando_plate2_JF11G02 ST plumb plumb 131.0
81 │ GW_Armando_plate2_JF12G03 GW_Armando_plate2_JF12G03 ST plumb plumb 132.0
82 │ GW_Armando_plate2_JF12G05 GW_Armando_plate2_JF12G05 ST plumb plumb 133.0
83 │ GW_Armando_plate2_JF13G02 GW_Armando_plate2_JF13G02 ST plumb plumb 134.0
84 │ GW_Armando_plate2_JF14G01 GW_Armando_plate2_JF14G01 DV plumb plumb 104.0
85 │ GW_Armando_plate2_JF14G02 GW_Armando_plate2_JF14G02 DV plumb plumb 105.0
86 │ GW_Armando_plate2_JF15G01 GW_Armando_plate2_JF15G01 DV plumb plumb 106.0
87 │ GW_Armando_plate2_JF15G02 GW_Armando_plate2_JF15G02 DV plumb plumb 107.0
88 │ GW_Armando_plate2_JF16G02 GW_Armando_plate2_JF16G02 DV_vi plumb_vir vir 24.042
89 │ GW_Armando_plate2_JF19G01 GW_Armando_plate2_JF19G01 MB plumb plumb 96.0
90 │ GW_Armando_plate2_JF20G02 GW_Armando_plate2_JF20G02 MB plumb plumb 97.0
91 │ GW_Armando_plate2_JF24G01 GW_Armando_plate2_JF24G01 VB plumb plumb 101.0
92 │ GW_Armando_plate2_JF24G03 GW_Armando_plate2_JF24G03 ST plumb plumb 135.0
93 │ GW_Armando_plate2_JF25G01 GW_Armando_plate2_JF25G01 VB plumb plumb 102.0
94 │ GW_Armando_plate2_JF26G02 GW_Armando_plate2_JF26G02 ST plumb plumb 136.0
95 │ GW_Armando_plate2_JF27G02 GW_Armando_plate2_JF27G02 ST plumb plumb 137.0
96 │ GW_Armando_plate2_JF30G01 GW_Armando_plate2_JF30G01 ST_vi vir_misID vir 24.005
97 │ GW_Armando_plate2_JG01G01 GW_Armando_plate2_JG01G01 PR plumb plumb 147.0
98 │ GW_Armando_plate2_JG02G01 GW_Armando_plate2_JG02G01 PR plumb plumb 148.0
99 │ GW_Armando_plate2_JG02G03 GW_Armando_plate2_JG02G03 PR plumb plumb 149.0
100 │ GW_Armando_plate2_JG10G02 GW_Armando_plate2_JG10G02 ST plumb plumb 138.0
101 │ GW_Armando_plate2_JG10G03 GW_Armando_plate2_JG10G03 ST plumb plumb 139.0
102 │ GW_Armando_plate2_JG12G02 GW_Armando_plate2_JG12G02 ST plumb plumb 140.0
103 │ GW_Armando_plate2_JG12G03 GW_Armando_plate2_JG12G03 ST plumb plumb 141.0
104 │ GW_Armando_plate2_LN11 GW_Armando_plate2_LN11 LN_rep troch_LN_rep troch_LN_rep 65.01
105 │ GW_Armando_plate2_LN2 GW_Armando_plate2_LN2 LN troch_LN troch_LN 58.01
106 │ GW_Armando_plate2_NO_BC_TTGW05 GW_Armando_plate2_NO_BC_TTGW05 blank blank blank -99.0
107 │ GW_Armando_plate2_NO_DNA GW_Armando_plate2_NO_DNA blank blank blank -99.0
108 │ GW_Armando_plate2_RF29G01 GW_Armando_plate2_RF29G01 BJ obs_plumb plumb_BJ 77.503
109 │ GW_Armando_plate2_TTGW02 GW_Armando_plate2_TTGW02 MN troch_MN troch_west 53.0
110 │ GW_Armando_plate2_TTGW03 GW_Armando_plate2_TTGW03 MN troch_MN troch_west 53.0
111 │ GW_Armando_plate2_TTGW05_rep3 GW_Armando_plate2_TTGW05_rep3 MN_rep troch_MN_rep troch_west_rep 53.0
112 │ GW_Armando_plate2_TTGW05_rep4 GW_Armando_plate2_TTGW05_rep4 MN_rep troch_MN_rep troch_west_rep 53.0
113 │ GW_Armando_plate2_TTGW08 GW_Armando_plate2_TTGW08 SU lud_Sukhto lud_central 47.0
114 │ GW_Armando_plate2_TTGW09 GW_Armando_plate2_TTGW09 SU lud_Sukhto lud_central 47.0
115 │ GW_Armando_plate2_TTGW12 GW_Armando_plate2_TTGW12 TH lud_Thallighar lud_central 43.0
116 │ GW_Armando_plate2_TTGW14 GW_Armando_plate2_TTGW14 TH lud_Thallighar lud_central 43.0
117 │ GW_Armando_plate2_TTGW15 GW_Armando_plate2_TTGW15 TH lud_Thallighar lud_central 43.0
118 │ GW_Armando_plate2_TTGW16 GW_Armando_plate2_TTGW16 TH lud_Thallighar lud_central 43.0
119 │ GW_Armando_plate2_TTGW18 GW_Armando_plate2_TTGW18 TH lud_Thallighar lud_central 43.0
120 │ GW_Armando_plate2_TTGW20 GW_Armando_plate2_TTGW20 SR lud_Sural lud_central 45.0
121 │ GW_Armando_plate2_TTGW24 GW_Armando_plate2_TTGW24 SR lud_Sural lud_central 45.0
122 │ GW_Armando_plate2_TTGW25 GW_Armando_plate2_TTGW25 SR lud_Sural lud_central 45.0
123 │ GW_Armando_plate2_TTGW27 GW_Armando_plate2_TTGW27 SR lud_Sural lud_central 45.0
124 │ GW_Armando_plate2_TTGW28 GW_Armando_plate2_TTGW28 SR lud_Sural lud_central 45.0
125 │ GW_Armando_plate2_TTGW50 GW_Armando_plate2_TTGW50 NG lud_Nainaghar lud_central 49.0
126 │ GW_Armando_plate2_TTGW51 GW_Armando_plate2_TTGW51 NG lud_Nainaghar lud_central 49.0
127 │ GW_Armando_plate2_TTGW54 GW_Armando_plate2_TTGW54 NG lud_Nainaghar lud_central 49.0
128 │ GW_Armando_plate2_TTGW56 GW_Armando_plate2_TTGW56 NG lud_Nainaghar lud_central 49.0
129 │ GW_Armando_plate2_TTGW60 GW_Armando_plate2_TTGW60 SP lud_Spiti troch_west 55.0
130 │ GW_Armando_plate2_TTGW61 GW_Armando_plate2_TTGW61 SP lud_Spiti troch_west 55.0
131 │ GW_Armando_plate2_TTGW62 GW_Armando_plate2_TTGW62 SP lud_Spiti troch_west 55.0
132 │ GW_Armando_plate2_TTGW67 GW_Armando_plate2_TTGW67 SP lud_Spiti troch_west 55.0
133 │ GW_Armando_plate2_TTGW69 GW_Armando_plate2_TTGW69 SP lud_Spiti troch_west 55.0
134 │ GW_Armando_plate2_TTGW73 GW_Armando_plate2_TTGW73 SA lud_Sathrundi lud_Sath 41.0
135 │ GW_Armando_plate2_TTGW75 GW_Armando_plate2_TTGW75 SA lud_Sathrundi lud_Sath 41.0
136 │ GW_Armando_plate2_TTGW77 GW_Armando_plate2_TTGW77 SA lud_Sathrundi lud_Sath 41.0
137 │ GW_Armando_plate2_TTGW79 GW_Armando_plate2_TTGW79 SA lud_Sathrundi lud_Sath 41.0
138 │ GW_Armando_plate2_TTGW80 GW_Armando_plate2_TTGW80 SA lud_Sathrundi lud_Sath 41.0
139 │ GW_Armando_plate2_TTGW_15_01 GW_Armando_plate2_TTGW_15_01 SR lud_Sural lud_central 45.0
140 │ GW_Armando_plate2_TTGW_15_02 GW_Armando_plate2_TTGW_15_02 SR lud_Sural lud_central 45.0
141 │ GW_Armando_plate2_TTGW_15_03 GW_Armando_plate2_TTGW_15_03 SR lud_Sural lud_central 45.0
142 │ GW_Armando_plate2_TTGW_15_04 GW_Armando_plate2_TTGW_15_04 SR lud_Sural lud_central 45.0
143 │ GW_Armando_plate2_TTGW_15_06 GW_Armando_plate2_TTGW_15_06 SR lud_Sural lud_central 45.0
144 │ GW_Armando_plate2_TTGW_15_10 GW_Armando_plate2_TTGW_15_10 SR lud_Sural lud_central 45.0
145 │ GW_Lane5_AA1 GW_Lane5_AA1 AA vir_S vir_S 25.0
146 │ GW_Lane5_AA10 GW_Lane5_AA10 AA vir_S vir_S 33.0
147 │ GW_Lane5_AA11 GW_Lane5_AA11 AA vir_S vir_S 34.0
148 │ GW_Lane5_AA3 GW_Lane5_AA3 AA vir_S vir_S 26.0
149 │ GW_Lane5_AA4 GW_Lane5_AA4 AA vir_S vir_S 27.0
150 │ GW_Lane5_AA5 GW_Lane5_AA5 AA vir_S vir_S 28.0
151 │ GW_Lane5_AA6 GW_Lane5_AA6 AA vir_S vir_S 29.0
152 │ GW_Lane5_AA7 GW_Lane5_AA7 AA vir_S vir_S 30.0
153 │ GW_Lane5_AA8 GW_Lane5_AA8 AA vir_S vir_S 31.0
154 │ GW_Lane5_AA9 GW_Lane5_AA9 AA vir_S vir_S 32.0
155 │ GW_Lane5_AB1 GW_Lane5_AB1 AB_rep vir_rep vir_rep 20.0
156 │ GW_Lane5_AB2 GW_Lane5_AB2 AB vir vir 21.0
157 │ GW_Lane5_AN1 GW_Lane5_AN1 AN plumb plumb 80.0
158 │ GW_Lane5_AN2 GW_Lane5_AN2 AN plumb plumb 81.0
159 │ GW_Lane5_BK2 GW_Lane5_BK2 BK plumb plumb 78.0
160 │ GW_Lane5_BK3 GW_Lane5_BK3 BK plumb plumb 79.0
161 │ GW_Lane5_DA2 GW_Lane5_DA2 XN obs obs 73.0
162 │ GW_Lane5_DA3 GW_Lane5_DA3 XN obs obs 74.0
163 │ GW_Lane5_DA4 GW_Lane5_DA4 XN obs obs 75.0
164 │ GW_Lane5_DA6 GW_Lane5_DA6 XN obs low_reads 76.0
165 │ GW_Lane5_DA7 GW_Lane5_DA7 XN obs obs 77.0
166 │ GW_Lane5_EM1 GW_Lane5_EM1 EM troch_EM troch_EM 72.0
167 │ GW_Lane5_IL1 GW_Lane5_IL1 IL plumb plumb 82.0
168 │ GW_Lane5_IL2 GW_Lane5_IL2 IL_rep plumb_rep plumb_rep 85.0
169 │ GW_Lane5_IL4 GW_Lane5_IL4 IL plumb plumb 83.0
170 │ GW_Lane5_KS1 GW_Lane5_KS1 OV lud_KS lud_KS 40.0
171 │ GW_Lane5_KS2 GW_Lane5_KS2 OV lud_KS lud_KS 40.0
172 │ GW_Lane5_LN1 GW_Lane5_LN1 LN troch_LN troch_LN 57.0
173 │ GW_Lane5_LN10 GW_Lane5_LN10 LN troch_LN troch_LN 64.0
174 │ GW_Lane5_LN11 GW_Lane5_LN11 LN troch_LN troch_LN 65.0
175 │ GW_Lane5_LN12 GW_Lane5_LN12 LN troch_LN troch_LN 66.0
176 │ GW_Lane5_LN14 GW_Lane5_LN14 LN troch_LN troch_LN 67.0
177 │ GW_Lane5_LN16 GW_Lane5_LN16 LN troch_LN troch_LN 68.0
178 │ GW_Lane5_LN18 GW_Lane5_LN18 LN troch_LN troch_LN 69.0
179 │ GW_Lane5_LN19 GW_Lane5_LN19 LN troch_LN troch_LN 70.0
180 │ GW_Lane5_LN2 GW_Lane5_LN2 LN_rep troch_LN_rep troch_LN_rep 58.0
181 │ GW_Lane5_LN20 GW_Lane5_LN20 LN troch_LN troch_LN 71.0
182 │ GW_Lane5_LN3 GW_Lane5_LN3 LN troch_LN troch_LN 59.0
183 │ GW_Lane5_LN4 GW_Lane5_LN4 LN troch_LN troch_LN 60.0
184 │ GW_Lane5_LN6 GW_Lane5_LN6 LN troch_LN troch_LN 61.0
185 │ GW_Lane5_LN7 GW_Lane5_LN7 LN troch_LN troch_LN 62.0
186 │ GW_Lane5_LN8 GW_Lane5_LN8 LN troch_LN troch_LN 63.0
187 │ GW_Lane5_MN1 GW_Lane5_MN1 MN troch_MN troch_west 51.0
188 │ GW_Lane5_MN12 GW_Lane5_MN12 MN troch_MN troch_west 56.0
189 │ GW_Lane5_MN3 GW_Lane5_MN3 MN troch_MN troch_west 52.0
190 │ GW_Lane5_MN5 GW_Lane5_MN5 MN troch_MN troch_west 53.0
191 │ GW_Lane5_MN8 GW_Lane5_MN8 MN troch_MN troch_west 54.0
192 │ GW_Lane5_MN9 GW_Lane5_MN9 MN troch_MN troch_west 55.0
193 │ GW_Lane5_NA1 GW_Lane5_NA1 NR lud_PK lud_PK 39.2
194 │ GW_Lane5_NA3-3ul GW_Lane5_NA3-3ul NR lud_PK lud_PK 39.2
195 │ GW_Lane5_PT11 GW_Lane5_PT11 KL lud_KL lud_central 42.0
196 │ GW_Lane5_PT12 GW_Lane5_PT12 KL lud_KL lud_central 42.0
197 │ GW_Lane5_PT2 GW_Lane5_PT2 ML lud_ML lud_ML 51.0
198 │ GW_Lane5_PT3 GW_Lane5_PT3 PA lud_PA lud_central 46.0
199 │ GW_Lane5_PT4 GW_Lane5_PT4 PA lud_PA lud_central 46.0
200 │ GW_Lane5_PT6 GW_Lane5_PT6 KL lud_KL lud_central 42.0
201 │ GW_Lane5_SH1 GW_Lane5_SH1 SH lud_PK lud_PK 39.1
202 │ GW_Lane5_SH2 GW_Lane5_SH2 SH lud_PK lud_PK 39.1
203 │ GW_Lane5_SH4 GW_Lane5_SH4 SH lud_PK lud_PK 39.1
204 │ GW_Lane5_SH5 GW_Lane5_SH5 SH lud_PK lud_PK 39.1
205 │ GW_Lane5_SL1 GW_Lane5_SL1 SL plumb plumb 150.0
206 │ GW_Lane5_SL2 GW_Lane5_SL2 SL plumb plumb 151.0
207 │ GW_Lane5_ST1 GW_Lane5_ST1 ST plumb plumb 142.0
208 │ GW_Lane5_ST12 GW_Lane5_ST12 ST plumb plumb 144.0
209 │ GW_Lane5_ST3 GW_Lane5_ST3 ST plumb plumb 143.0
210 │ GW_Lane5_STvi1 GW_Lane5_STvi1 ST_vi vir vir 22.0
211 │ GW_Lane5_STvi2 GW_Lane5_STvi2 ST_vi vir vir 23.0
212 │ GW_Lane5_STvi3 GW_Lane5_STvi3 ST_vi vir vir 24.0
213 │ GW_Lane5_TA1 GW_Lane5_TA1 TA plumb plumb 86.0
214 │ GW_Lane5_TL1 GW_Lane5_TL1 TL vir vir 9.0
215 │ GW_Lane5_TL10 GW_Lane5_TL10 TL vir vir 17.0
216 │ GW_Lane5_TL11 GW_Lane5_TL11 TL vir vir 18.0
217 │ GW_Lane5_TL12 GW_Lane5_TL12 TL vir vir 19.0
218 │ GW_Lane5_TL2 GW_Lane5_TL2 TL vir vir 10.0
219 │ GW_Lane5_TL3 GW_Lane5_TL3 TL_rep vir_rep vir_rep 11.0
220 │ GW_Lane5_TL4 GW_Lane5_TL4 TL vir vir 12.0
221 │ GW_Lane5_TL5 GW_Lane5_TL5 TL vir vir 13.0
222 │ GW_Lane5_TL7 GW_Lane5_TL7 TL vir vir 14.0
223 │ GW_Lane5_TL8 GW_Lane5_TL8 TL vir vir 15.0
224 │ GW_Lane5_TL9 GW_Lane5_TL9 TL vir vir 16.0
225 │ GW_Lane5_TU1 GW_Lane5_TU1 TU nit nit 35.0
226 │ GW_Lane5_TU2 GW_Lane5_TU2 TU nit nit 36.0
227 │ GW_Lane5_UY1 GW_Lane5_UY1 UY_rep plumb_rep plumb_rep 93.0
228 │ GW_Lane5_UY2 GW_Lane5_UY2 UY plumb plumb 88.0
229 │ GW_Lane5_UY3 GW_Lane5_UY3 UY plumb plumb 89.0
230 │ GW_Lane5_UY4 GW_Lane5_UY4 UY plumb plumb 90.0
231 │ GW_Lane5_UY5 GW_Lane5_UY5 UY plumb plumb 91.0
232 │ GW_Lane5_UY6 GW_Lane5_UY6 UY plumb plumb 92.0
233 │ GW_Lane5_YK1 GW_Lane5_YK1 YK vir vir 1.0
234 │ GW_Lane5_YK11 GW_Lane5_YK11 YK vir vir 8.0
235 │ GW_Lane5_YK3 GW_Lane5_YK3 YK vir vir 2.0
236 │ GW_Lane5_YK4 GW_Lane5_YK4 YK vir vir 3.0
237 │ GW_Lane5_YK5 GW_Lane5_YK5 YK vir vir 4.0
238 │ GW_Lane5_YK6 GW_Lane5_YK6 YK vir vir 5.0
239 │ GW_Lane5_YK7 GW_Lane5_YK7 YK vir vir 6.0
240 │ GW_Lane5_YK9 GW_Lane5_YK9 YK vir vir 7.0
241 │ GW_Liz_GBS_Liz10045 GW_Liz_GBS_Liz10045 ML lud lud_ML 51.01
242 │ GW_Liz_GBS_Liz10094 GW_Liz_GBS_Liz10094 ML lud lud_ML 51.02
243 │ GW_Liz_GBS_Liz5101 GW_Liz_GBS_Liz5101 ML lud lud_ML 51.03
244 │ GW_Liz_GBS_Liz5101_R GW_Liz_GBS_Liz5101_R ML_rep lud_rep lud_ML_rep 51.04
245 │ GW_Liz_GBS_Liz5118 GW_Liz_GBS_Liz5118 ML lud lud_ML 51.05
246 │ GW_Liz_GBS_Liz5139 GW_Liz_GBS_Liz5139 ML lud lud_ML 51.06
247 │ GW_Liz_GBS_Liz5142 GW_Liz_GBS_Liz5142 ML lud lud_ML 51.07
248 │ GW_Liz_GBS_Liz5144 GW_Liz_GBS_Liz5144 ML lud lud_ML 51.08
249 │ GW_Liz_GBS_Liz5150 GW_Liz_GBS_Liz5150 ML lud lud_ML 51.09
250 │ GW_Liz_GBS_Liz5159 GW_Liz_GBS_Liz5159 ML lud_chick lud_ML 51.1
251 │ GW_Liz_GBS_Liz5162 GW_Liz_GBS_Liz5162 ML lud_chick lud_ML 51.11
252 │ GW_Liz_GBS_Liz5163 GW_Liz_GBS_Liz5163 ML lud_chick lud_ML 51.12
253 │ GW_Liz_GBS_Liz5164 GW_Liz_GBS_Liz5164 ML lud_chick lud_ML 51.13
254 │ GW_Liz_GBS_Liz5165 GW_Liz_GBS_Liz5165 ML lud lud_ML 51.14
255 │ GW_Liz_GBS_Liz5167 GW_Liz_GBS_Liz5167 ML lud_chick lud_ML 51.15
256 │ GW_Liz_GBS_Liz5168 GW_Liz_GBS_Liz5168 ML lud_chick lud_ML 51.16
257 │ GW_Liz_GBS_Liz5169 GW_Liz_GBS_Liz5169 ML lud_chick lud_ML 51.17
258 │ GW_Liz_GBS_Liz5171 GW_Liz_GBS_Liz5171 ML lud lud_ML 51.18
259 │ GW_Liz_GBS_Liz5172 GW_Liz_GBS_Liz5172 ML lud_chick lud_ML 51.19
260 │ GW_Liz_GBS_Liz5173 GW_Liz_GBS_Liz5173 ML lud_chick lud_ML 51.2
261 │ GW_Liz_GBS_Liz5174 GW_Liz_GBS_Liz5174 ML lud lud_ML 51.21
262 │ GW_Liz_GBS_Liz5175 GW_Liz_GBS_Liz5175 ML lud lud_ML 51.22
263 │ GW_Liz_GBS_Liz5176 GW_Liz_GBS_Liz5176 ML lud lud_ML 51.23
264 │ GW_Liz_GBS_Liz5177 GW_Liz_GBS_Liz5177 ML lud_chick lud_ML 51.24
265 │ GW_Liz_GBS_Liz5178 GW_Liz_GBS_Liz5178 ML lud_chick lud_ML 51.25
266 │ GW_Liz_GBS_Liz5179 GW_Liz_GBS_Liz5179 ML lud_chick lud_ML 51.26
267 │ GW_Liz_GBS_Liz5180 GW_Liz_GBS_Liz5180 ML lud lud_ML 51.27
268 │ GW_Liz_GBS_Liz5182 GW_Liz_GBS_Liz5182 ML lud_chick lud_ML 51.28
269 │ GW_Liz_GBS_Liz5184 GW_Liz_GBS_Liz5184 ML lud_chick lud_ML 51.29
270 │ GW_Liz_GBS_Liz5185 GW_Liz_GBS_Liz5185 ML lud lud_ML 51.3
271 │ GW_Liz_GBS_Liz5186 GW_Liz_GBS_Liz5186 ML lud_chick lud_ML 51.31
272 │ GW_Liz_GBS_Liz5187 GW_Liz_GBS_Liz5187 ML lud_chick lud_ML 51.32
273 │ GW_Liz_GBS_Liz5188 GW_Liz_GBS_Liz5188 ML lud lud_ML 51.33
274 │ GW_Liz_GBS_Liz5189 GW_Liz_GBS_Liz5189 ML lud_chick lud_ML 51.34
275 │ GW_Liz_GBS_Liz5190 GW_Liz_GBS_Liz5190 ML lud_chick lud_ML 51.35
276 │ GW_Liz_GBS_Liz5191 GW_Liz_GBS_Liz5191 ML lud_chick lud_ML 51.36
277 │ GW_Liz_GBS_Liz5192 GW_Liz_GBS_Liz5192 ML lud_chick lud_ML 51.37
278 │ GW_Liz_GBS_Liz5193 GW_Liz_GBS_Liz5193 ML lud_chick lud_ML 51.38
279 │ GW_Liz_GBS_Liz5194 GW_Liz_GBS_Liz5194 ML lud_chick lud_ML 51.39
280 │ GW_Liz_GBS_Liz5195 GW_Liz_GBS_Liz5195 ML lud lud_ML 51.4
281 │ GW_Liz_GBS_Liz5197 GW_Liz_GBS_Liz5197 ML lud lud_ML 51.41
282 │ GW_Liz_GBS_Liz5199 GW_Liz_GBS_Liz5199 ML lud_chick lud_ML 51.42
283 │ GW_Liz_GBS_Liz6002 GW_Liz_GBS_Liz6002 ML lud lud_ML 51.43
284 │ GW_Liz_GBS_Liz6006 GW_Liz_GBS_Liz6006 ML lud lud_ML 51.44
285 │ GW_Liz_GBS_Liz6008 GW_Liz_GBS_Liz6008 ML lud lud_ML 51.45
286 │ GW_Liz_GBS_Liz6009 GW_Liz_GBS_Liz6009 ML lud lud_ML 51.46
287 │ GW_Liz_GBS_Liz6010 GW_Liz_GBS_Liz6010 ML lud lud_ML 51.47
288 │ GW_Liz_GBS_Liz6012 GW_Liz_GBS_Liz6012 ML lud lud_ML 51.48
289 │ GW_Liz_GBS_Liz6014 GW_Liz_GBS_Liz6014 ML lud lud_ML 51.49
290 │ GW_Liz_GBS_Liz6055 GW_Liz_GBS_Liz6055 ML lud lud_ML 51.5
291 │ GW_Liz_GBS_Liz6057 GW_Liz_GBS_Liz6057 ML lud lud_ML 51.51
292 │ GW_Liz_GBS_Liz6060 GW_Liz_GBS_Liz6060 ML lud lud_ML 51.52
293 │ GW_Liz_GBS_Liz6062 GW_Liz_GBS_Liz6062 ML lud lud_ML 51.53
294 │ GW_Liz_GBS_Liz6063 GW_Liz_GBS_Liz6063 ML lud lud_ML 51.54
295 │ GW_Liz_GBS_Liz6066 GW_Liz_GBS_Liz6066 ML lud lud_ML 51.55
296 │ GW_Liz_GBS_Liz6072 GW_Liz_GBS_Liz6072 ML lud lud_ML 51.56
297 │ GW_Liz_GBS_Liz6079 GW_Liz_GBS_Liz6079 ML lud lud_ML 51.57
298 │ GW_Liz_GBS_Liz6203 GW_Liz_GBS_Liz6203 ML lud_chick lud_ML 51.58
299 │ GW_Liz_GBS_Liz6204 GW_Liz_GBS_Liz6204 ML lud_chick lud_ML 51.59
300 │ GW_Liz_GBS_Liz6461 GW_Liz_GBS_Liz6461 ML lud lud_ML 51.6
301 │ GW_Liz_GBS_Liz6472 GW_Liz_GBS_Liz6472 ML lud lud_ML 51.61
302 │ GW_Liz_GBS_Liz6478 GW_Liz_GBS_Liz6478 ML lud lud_ML 51.62
303 │ GW_Liz_GBS_Liz6766 GW_Liz_GBS_Liz6766 ML lud lud_ML 51.63
304 │ GW_Liz_GBS_Liz6776 GW_Liz_GBS_Liz6776 ML lud lud_ML 51.64
305 │ GW_Liz_GBS_Liz6794 GW_Liz_GBS_Liz6794 ML lud lud_ML 51.65
306 │ GW_Liz_GBS_P_fusc GW_Liz_GBS_P_fusc fusc fusc fusc 201.0
307 │ GW_Liz_GBS_P_h_man GW_Liz_GBS_P_h_man hmand hmand hmand 202.0
308 │ GW_Liz_GBS_P_humei GW_Liz_GBS_P_humei hume hume hume 203.0
309 │ GW_Liz_GBS_P_inor GW_Liz_GBS_P_inor inor inor inor 204.0
310 │ GW_Liz_GBS_S_burk GW_Liz_GBS_S_burk burk burk burk 205.0
GOOD NEWS: names of individuals in metadata file and genotype ind file match perfectly.
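The check above only reports whether the two name columns are identical. If they ever differ, it helps to see which rows disagree. The following is a minimal sketch, assuming two plain string vectors of equal length; `indNames` and `idNames` are hypothetical stand-ins for the `ind` and `ID` columns, and the values are made up for illustration.

```julia
# Toy vectors standing in for the `ind` and `ID` columns (hypothetical values)
indNames = ["GW_Lane5_AA1", "GW_Lane5_AA3", "GW_Lane5_AB1"]
idNames  = ["GW_Lane5_AA1", "GW_Lane5_AA03", "GW_Lane5_AB1"]

# Row indices at which the two name vectors disagree
# (assumes both vectors have the same length)
mismatchRows = findall(indNames .!= idNames)

# Report each mismatching row so it can be fixed in the metadata file
for i in mismatchRows
    println("Row ", i, ": ind = ", indNames[i], ", ID = ", idNames[i])
end
```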
Polish a few individual names (to match those in the other metadata object above, and to make graphs more readable):
ind_with_metadata_chr4A.ind = correctNames(ind_with_metadata_chr4A.ind)
ind_with_metadata_chr4A.ID = correctNames(ind_with_metadata_chr4A.ID)
310-element Vector{String}:
"GW_Armando_plate1_AB1"
"GW_Armando_plate1_JF07G02"
"GW_Armando_plate1_JF07G03"
"GW_Armando_plate1_JF07G04"
"GW_Armando_plate1_JF08G02"
"GW_Armando_plate1_JF09G01"
"GW_Armando_plate1_JF09G02"
"GW_Armando_plate1_JF10G03"
"GW_Armando_plate1_JF11G01"
"GW_Armando_plate1_JF12G01"
"GW_Armando_plate1_JF12G02"
"GW_Armando_plate1_JF12G04"
"GW_Armando_plate1_JF13G01"
⋮
"GW_Liz_GBS_Liz6204"
"GW_Liz_GBS_Liz6461"
"GW_Liz_GBS_Liz6472"
"GW_Liz_GBS_Liz6478"
"GW_Liz_GBS_Liz6766"
"GW_Liz_GBS_Liz6776"
"GW_Liz_GBS_Liz6794"
"GW_Liz_GBS_P_fusc"
"GW_Liz_GBS_P_h_man"
"GW_Liz_GBS_P_humei"
"GW_Liz_GBS_P_inor"
"GW_Liz_GBS_S_burk"
Filter to just the individuals also included in the analysis of LHBRs above
selection = map(in(ind_with_metadata_included.ind), ind_with_metadata_chr4A.ind)
ind_with_metadata_chr4A_included = ind_with_metadata_chr4A[selection, :]
# select genotypes of just the included individuals, and ignore first column
geno_chr4A_included = geno_chr4A[selection, 2:end]
#
println(ind_with_metadata_included.gw4A_cluster)
["virLud", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "virLud_obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "virLud", "virLud", "obsPlumb", "virLud", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "virLud_obsPlumb", "virLud_obsPlumb", "obsPlumb", "virLud_obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "virLudHet", "troch", "virLud_troch", "obsPlumb", "virLud_troch", "troch_obsPlumb", "troch_obsPlumb", "troch_obsPlumb", "virLudHet", "virLud_obsPlumb", "virLudHet", "obsPlumb", "virLud_obsPlumb", "virLud_troch", "virLud_obsPlumb", "obsPlumb", "troch_obsPlumb", "troch_obsPlumb", "troch_obsPlumb", "troch", "troch_obsPlumb", "troch", "troch", "troch", "virLud_troch", "troch", "trochHet", "virLud_obsPlumb", "obsPlumb", "obsPlumb", "virLud_obsPlumb", "obsPlumb", "virLud", "virLud", "virLud", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "virLud", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "virLud", "virLud_obsPlumb", "obsPlumb", "obsPlumbHet", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "troch", "obsPlumb", "troch", "troch", "troch_obsPlumb", "virLud_troch", "obsPlumb", "obsPlumb", "obsPlumb", "virLud_obsPlumb", "virLud_obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "troch_obsPlumb", "obsPlumb", "virLud_obsPlumb", "trochHet", "troch", "troch", "troch_obsPlumb", "troch", "troch", "troch", "troch", "troch", "troch", "virLud_obsPlumb", "virLud_troch", "virLud_obsPlumb", "obsPlumb", "virLud_obsPlumb", "virLud_obsPlumb", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "troch_obsPlumb", "obsPlumb", "obsPlumb", "virLud_obsPlumb", "virLud_obsPlumb", "troch", "troch", "troch", "trochHet", "troch", "troch", 
"troch", "troch", "troch", "troch", "troch", "troch", "troch", "troch", "troch", "troch", "troch", "troch", "troch", "virLud", "obsPlumb", "virLud", "obsPlumb", "troch", "obsPlumb", "obsPlumb", "virLud_obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "obsPlumb", "virLud", "virLud", "virLud", "obsPlumb", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "nit", "nit", "obsPlumb", "obsPlumb", "virLud_obsPlumb", "obsPlumb", "obsPlumb", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "troch_obsPlumb", "troch", "troch", "troch", "troch_obsPlumb", "troch", "troch", "troch", "troch", "troch", "troch", "troch_obsPlumb", "troch", "troch", "troch_obsPlumb", "troch_obsPlumb", "troch", "troch_obsPlumb", "troch", "virLud_obsPlumb", "troch_obsPlumb", "troch_obsPlumb", "troch", "virLud_troch", "troch", "troch_obsPlumb", "troch", "virLud_troch", "virLud_troch", "troch_obsPlumb", "troch", "troch", "virLud_troch", "troch", "troch_obsPlumb", "troch", "troch", "troch", "trochHet", "troch", "troch", "troch", "troch"]
Look up the chr4A individual membership in homozygous clusters, and calculate pi and Dxy
indClusterMembership_gw4A = ind_with_metadata_included.gw4A_cluster
clusterNames_gw4A = ["virLud",
                     "nit",
                     "troch",
                     "obsPlumb"]
# get boundaries of gw4A LHBR:
chr = "gw4A"
positionMin_chr4A_LHBR, positionMax_chr4A_LHBR, regionText,
windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
genos_highViSHetRegion, pos_highViSHetRegion, regionInfo = getWindowedIndHetStanRegion(genosOnly_included,
                                pos_SNP_filtered,
                                highViSHetRegions, chr;
                                windowSize = 500)
# select the loci within the gw4A LHBR:
selection = (positionMin_chr4A_LHBR .<= pos_chr4A.position .<= positionMax_chr4A_LHBR)
geno_chr4A_included_LHBR = geno_chr4A_included[:, selection]
pos_chr4A_LHBR = pos_chr4A[selection, :]
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(geno_chr4A_included_LHBR, indClusterMembership_gw4A, clusterNames_gw4A)
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames_gw4A)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames_gw4A; among=false)
# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames_gw4A, pi = getRegionPi(sitePi))
#= 4×2 DataFrame
Row │ cluster pi
│ String Float64
─────┼───────────────────────
1 │ virLud 0.000956575
2 │ nit 0.000332204
3 │ troch 0.000613901
4 │ obsPlumb 0.000261819 =#
# average pi (for chr 4A LHBR) among three major groups:
(0.000956575 + 0.000613901 + 0.000261819) / 3
# 0.000610765
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 6×2 DataFrame
Row │ cluster_pair Dxy
│ String Float64
─────┼─────────────────────────────
1 │ virLud_nit 0.00325609
2 │ virLud_troch 0.0031813
3 │ virLud_obsPlumb 0.00241666
4 │ nit_troch 0.00286634
5 │ nit_obsPlumb 0.00249507
6 │ troch_obsPlumb 0.00305931 =#
# average Dxy (for chr 4A LHBR) among three major groups:
(0.0031813 + 0.00241666 + 0.00305931) / 3
# 0.0028857566666666674
# Drawing phylogeny (in Illustrator) based on above, between three major groups.
# Ignoring nit, the most recent connection is between virLud and obsPlumb (0.00241666).
# For deeper branch length, am using:
# Calculation for average Dxy between troch and (virLud, obsPlumb):
(0.0031813 + 0.00305931) / 2
# 0.003120305
Good news: 1 region on that scaffold
0.0031203050000000003
Wow, that is an amazing difference between pi within the obsPlumb haplotype and Dxy between that and others (roughly 10x).
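To make the contrast between pi and Dxy concrete, here is a minimal sketch of how per-site values can be computed from allele frequencies at biallelic sites and then averaged over a region that includes invariant sites. It uses standard textbook estimators and is an assumption for illustration only; it is not necessarily how `getSitePi`, `getDxy`, `getRegionPi` or `getRegionDxy` are implemented. The names `sketchSitePi` and `sketchSiteDxy` and the toy frequencies are hypothetical.

```julia
# Per-site nucleotide diversity within one group at a biallelic site:
# p = frequency of one allele, n = number of sampled allele copies.
# The n / (n - 1) factor is the usual small-sample correction.
sketchSitePi(p, n) = n < 2 ? NaN : 2 * p * (1 - p) * n / (n - 1)

# Per-site Dxy between two groups: the probability that one allele drawn
# from each group differs
sketchSiteDxy(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

# Toy allele frequencies at 5 sites (hypothetical). Sites 1, 2 and 5 are
# invariant, site 3 is a fixed difference, site 4 varies only within group A.
pA = [0.0, 0.0, 1.0, 0.1, 0.0]
pB = [0.0, 0.0, 0.0, 0.0, 0.0]
nA = 20   # allele copies sampled in group A (10 diploid individuals)
nB = 20   # allele copies sampled in group B

# Regional values are averages over all sites, invariant sites included
regionPiA = sum(sketchSitePi.(pA, nA)) / length(pA)
regionPiB = sum(sketchSitePi.(pB, nB)) / length(pB)
regionDxyAB = sum(sketchSiteDxy.(pA, pB)) / length(pA)

println("pi A = ", regionPiA, ", pi B = ", regionPiB, ", Dxy = ", regionDxyAB)
```

In this toy case a single fixed difference drives Dxy well above pi within either group, which is the kind of pattern seen for the obsPlumb haplotype in the gw4A LHBR.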
Calculate pi and Dxy outside of the chr 4A LHBR (grouped by the LHBR homozygous groups)
# select the loci outside of the gw4A LHBR:
selection = .!(positionMin_chr4A_LHBR .<= pos_chr4A.position .<= positionMax_chr4A_LHBR)
geno_chr4A_included_nonLHBR = geno_chr4A_included[:, selection]
pos_chr4A_nonLHBR = pos_chr4A[selection, :]
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(geno_chr4A_included_nonLHBR, indClusterMembership_gw4A, clusterNames_gw4A)
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames_gw4A)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames_gw4A; among=false)
# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames_gw4A, pi = getRegionPi(sitePi))
#= 4×2 DataFrame
Row │ cluster pi
│ String Float64
─────┼──────────────────────
1 │ virLud 0.0041321
2 │ nit 0.00196343
3 │ troch 0.00551821
4 │ obsPlumb 0.0055897 =#
# average pi (for chr 4A NOT in LHBR) among three major groups:
(0.0041321 + 0.00551821 + 0.0055897) / 3
# 0.005080003333333333
#ratio of average pi outside to average pi within chr 4A LHBR:
0.005080003333333333 / 0.000610765
# 8.317443424776032
# percent lower that average pi is within compared to outside LHBR:
100 * (8.317443424776032 - 1) / 8.317443424776032
# 87.97707481819238
# for obsPlumb haplotype, ratio of pi outside to pi within chr 4A LHBR:
0.0055897 / 0.000261819
# 21.349481893980194
# percent lower that pi of obsPlumb haplotype is within vs. outside of LHBR:
100 * (21.349481893980194 - 1) / 21.349481893980194
# 95.31604558384171
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 6×2 DataFrame
Row │ cluster_pair Dxy
│ String Float64
─────┼─────────────────────────────
1 │ virLud_nit 0.00440681
2 │ virLud_troch 0.0055599
3 │ virLud_obsPlumb 0.00556595
4 │ nit_troch 0.00520058
5 │ nit_obsPlumb 0.00548082
6 │ troch_obsPlumb 0.00601295 =#
# average Dxy (for OUTSIDE of chr 4A LHBR) among three major groups:
(0.0055599 + 0.00556595 + 0.00601295) / 3
# 0.005712933333333333
#ratio of average Dxy outside to average Dxy within chr 4A LHBR among 3 major groups:
0.005712933333333333 / 0.0028857566666666674
# 1.9797002981309344
# percent lower that average Dxy is within compared to outside LHBR:
100 * (1.9797002981309344 - 1) / 1.9797002981309344
# 49.487303661866626
# Drawing phylogeny (in Illustrator) based on above, between three major groups.
# In this case, the virLud_troch is the lower Dxy so am connecting those more recently.
# For deeper branch length, using this:
# Calculation of average distance between obsPlumb and (virLud, troch)
(0.00556595 + 0.00601295) / 2
# 0.00578945
0.00578945
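The branch-depth reasoning used for the phylogeny sketches (join the pair of groups with the lowest Dxy first, then place the third group at the mean of its Dxy to the two joined groups) can be written as a short calculation. This is a minimal sketch of that averaging step only, using the three regional Dxy values from the non-LHBR table above; the variable names are hypothetical and it is not a general tree-building routine.

```julia
# Regional Dxy among the three major groups, outside the chr 4A LHBR
# (values copied from the regionDxyTable output above)
dxy = Dict(("virLud", "troch")    => 0.0055599,
           ("virLud", "obsPlumb") => 0.00556595,
           ("troch", "obsPlumb")  => 0.00601295)

# Sort the three pairs from lowest to highest Dxy
pairsSorted = sort(collect(dxy); by = last)

# The lowest-Dxy pair is joined most recently
closestPair, closestDxy = pairsSorted[1]

# With three groups, the two remaining pairs both involve the third group,
# so their mean is the distance used for the deeper branch
deeperDxy = (pairsSorted[2][2] + pairsSorted[3][2]) / 2

println("Closest pair: ", closestPair, " (Dxy = ", closestDxy, ")")
println("Deeper distance: ", deeperDxy)
```

Replacing the three values with those from the LHBR table gives virLud and obsPlumb as the closest pair and a deeper distance of about 0.00312, matching the calculation in the previous section.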
Remarkable differences between pi and Dxy in the gw4A LHBR, and between LHBR and non-LHBR part of that chromosome!
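The ratio and percent-reduction figures above were computed by hand from the printed table values. A small sketch of the same arithmetic as reusable functions follows; the function names `ratioOutsideToWithin` and `percentLower` are hypothetical, and the inputs are the regional pi values reported in the two tables. Note that `100 * (1 - within / outside)` is algebraically the same as `100 * (ratio - 1) / ratio`.

```julia
# Ratio of a statistic outside the LHBR to the same statistic within it
ratioOutsideToWithin(outside, within) = outside / within

# Percent by which the within-LHBR value is lower than the outside value
percentLower(outside, within) = 100 * (1 - within / outside)

# Mean pi among the three major groups (virLud, troch, obsPlumb),
# using the values printed in the regionPiTable outputs
meanPiWithin  = (0.000956575 + 0.000613901 + 0.000261819) / 3
meanPiOutside = (0.0041321 + 0.00551821 + 0.0055897) / 3

piRatio = ratioOutsideToWithin(meanPiOutside, meanPiWithin)
piPercentLower = percentLower(meanPiOutside, meanPiWithin)

println("ratio = ", piRatio, ", percent lower = ", piPercentLower)
```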
Do the same with chr 3, which also shows three clear haplotype groups but a very different biogeographic pattern from that of 4A:
Examine chromosome 3 Large HaploBlock Region (LHBR) with invariant sites included
Before running the code below, I converted the 012NA file into a 012minus1 file (replacing each NA with -1), using a command like the one below, so that the genotypes can be read as integers:
cat /Users/darrenirwin/GW_data_from_cedar_Feb2024/GW2022_cedar/infoSites_vcfs/GW2022_all4plates.genotypes.allSites.chrgw3.infoSites.max2allele_noindel.maxmiss60.MQ20.lowHet.tab.012NA | sed 's/NA/-1/g' > /Users/darrenirwin/GW_data_from_cedar_Feb2024/GW2022_cedar/infoSites_vcfs/GW2022_all4plates.genotypes.allSites.chrgw3.infoSites.max2allele_noindel.maxmiss60.MQ20.lowHet.tab.012minus1
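The same substitution could also be done from within Julia rather than the shell. The following is a minimal sketch under that assumption; the function name `naToMinus1` is hypothetical, and the example runs on a small temporary file rather than the real genotype file. Like the `sed` command, it replaces every occurrence of the text `NA` on each line.

```julia
# Copy a file line by line, replacing every "NA" with "-1"
function naToMinus1(inPath, outPath)
    open(outPath, "w") do out
        for line in eachline(inPath)
            println(out, replace(line, "NA" => "-1"))
        end
    end
end

# Toy demonstration on temporary files (hypothetical contents)
inPath = tempname()
outPath = tempname()
write(inPath, "0\t0\tNA\t2\n1\tNA\t1\t0\n")
naToMinus1(inPath, outPath)
converted = read(outPath, String)
print(converted)
```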
baseName = "/Users/darrenirwin/GW_data_from_cedar_Feb2024/GW2022_cedar/infoSites_vcfs/GW2022_all4plates.genotypes.allSites.chrgw3.infoSites.max2allele_noindel.maxmiss60.MQ20.lowHet.tab"
# load metadata
cd(dataDirectory)
metadata_chr3 = DataFrame(CSV.File(metadataFile)) # the CSV.File function interprets the correct delimiter
num_metadata_cols_chr3 = ncol(metadata_chr3)
num_individuals_chr3 = nrow(metadata_chr3)
# read in individual names for this dataset
individuals_file_name_chr3 = string(baseName, ".012.indv")
ind_chr3 = DataFrame(CSV.File(individuals_file_name_chr3; header=["ind"], types=[String]))
indNum_chr3 = size(ind_chr3, 1) # number of individuals
if num_individuals_chr3 != indNum_chr3
    println("WARNING: number of rows in metadata file different than number of individuals in .indv file")
end
# read in position data for this dataset
position_file_name_chr3 = string(baseName, ".012.pos")
pos_chr3 = DataFrame(CSV.File(position_file_name_chr3; header=["chrom", "position"], types=[String, Int]))
# read in genotype data
genotype_file_name_chr3 = string(baseName, ".012minus1")
@time if 1 <= indNum_chr3 <= 127
    geno_chr3 = readdlm(genotype_file_name_chr3, '\t', Int8, '\n'); # this has been sped up dramatically, by first converting "NA" to -1
elseif 128 <= indNum_chr3 <= 32767
    geno_chr3 = readdlm(genotype_file_name_chr3, '\t', Int16, '\n'); # this needed for first column, which is number of individual; Int16 not much slower on import than Int8
else
    print("Error: Number of individuals in .indv appears outside of range from 1 to 32767")
end
loci_count_chr3 = size(geno_chr3, 2) - 1 # because the first column is not a SNP (just a count from zero)
print(string("Read in genotypic data at ", loci_count_chr3," loci for ", indNum_chr3, " individuals. \n"))
53.001126 seconds (2.97 M allocations: 15.221 GiB, 19.54% gc time, 0.15% compilation time)
Read in genotypic data at 1855532 loci for 310 individuals.
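The choice between `Int8` and `Int16` above depends on the first column of the genotype file, which holds the individual's row index and so can exceed 127 when there are many individuals. The following is a minimal sketch of that logic on a toy in-memory table; the helper name `genoEltype` and the toy data are hypothetical, while `readdlm` is called with the same positional arguments as in the code above.

```julia
using DelimitedFiles

# Pick the smallest integer type able to hold the row index in column 1
function genoEltype(nInd)
    if 1 <= nInd <= typemax(Int8)            # up to 127 individuals
        return Int8
    elseif typemax(Int8) < nInd <= typemax(Int16)   # 128 to 32767
        return Int16
    else
        error("number of individuals outside of range 1 to 32767")
    end
end

# Toy genotype table: column 1 is the row index, missing genotypes are -1
toyText = "0\t0\t1\t-1\n1\t2\t0\t1\n"
toyGeno = readdlm(IOBuffer(toyText), '\t', genoEltype(2), '\n')

# Drop the index column to keep only the genotypes
toyGenosOnly = toyGeno[:, 2:end]
println(size(toyGenosOnly, 2), " loci for ", size(toyGenosOnly, 1), " individuals")
```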
Check that individuals are same in genotype data and metadata
ind_with_metadata_chr3 = hcat(ind_chr3, metadata_chr3)
println(ind_with_metadata_chr3)
println() # prints a line break
if isequal(ind_with_metadata_chr3.ind, ind_with_metadata_chr3.ID)
    println("GOOD NEWS: names of individuals in metadata file and genotype ind file match perfectly.")
else
    println("WARNING: names of individuals in metadata file and genotype ind file do not completely match.")
end
310×6 DataFrame
Row │ ind ID location group Fst_group plot_order
│ String String31 String7 String15 String15 Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ GW_Armando_plate1_AB1 GW_Armando_plate1_AB1 AB vir vir 20.01
2 │ GW_Armando_plate1_JF07G02 GW_Armando_plate1_JF07G02 ST plumb plumb 108.0
3 │ GW_Armando_plate1_JF07G03 GW_Armando_plate1_JF07G03 ST plumb plumb 109.0
4 │ GW_Armando_plate1_JF07G04 GW_Armando_plate1_JF07G04 ST plumb plumb 110.0
5 │ GW_Armando_plate1_JF08G02 GW_Armando_plate1_JF08G02 ST plumb plumb 111.0
6 │ GW_Armando_plate1_JF09G01 GW_Armando_plate1_JF09G01 ST plumb plumb 112.0
7 │ GW_Armando_plate1_JF09G02 GW_Armando_plate1_JF09G02 ST plumb plumb 113.0
8 │ GW_Armando_plate1_JF10G03 GW_Armando_plate1_JF10G03 ST plumb_vir plumb_vir 170.0
9 │ GW_Armando_plate1_JF11G01 GW_Armando_plate1_JF11G01 ST plumb plumb 114.0
10 │ GW_Armando_plate1_JF12G01 GW_Armando_plate1_JF12G01 ST plumb plumb 115.0
11 │ GW_Armando_plate1_JF12G02 GW_Armando_plate1_JF12G02 ST plumb plumb 116.0
12 │ GW_Armando_plate1_JF12G04 GW_Armando_plate1_JF12G04 ST_vi vir vir 24.001
13 │ GW_Armando_plate1_JF13G01 GW_Armando_plate1_JF13G01 ST plumb plumb 117.0
14 │ GW_Armando_plate1_JF15G03 GW_Armando_plate1_JF15G03 DV plumb plumb 103.0
15 │ GW_Armando_plate1_JF16G01 GW_Armando_plate1_JF16G01 DV_vi plumb_vir vir 24.041
16 │ GW_Armando_plate1_JF20G01 GW_Armando_plate1_JF20G01 MB plumb plumb 94.0
17 │ GW_Armando_plate1_JF22G01 GW_Armando_plate1_JF22G01 MB plumb plumb 95.0
18 │ GW_Armando_plate1_JF23G01 GW_Armando_plate1_JF23G01 VB plumb plumb 98.0
19 │ GW_Armando_plate1_JF23G02 GW_Armando_plate1_JF23G02 VB plumb plumb 99.0
20 │ GW_Armando_plate1_JF24G02 GW_Armando_plate1_JF24G02 VB plumb plumb 100.0
21 │ GW_Armando_plate1_JF26G01 GW_Armando_plate1_JF26G01 ST plumb plumb 118.0
22 │ GW_Armando_plate1_JF27G01 GW_Armando_plate1_JF27G01 ST plumb plumb 119.0
23 │ GW_Armando_plate1_JF29G01 GW_Armando_plate1_JF29G01 ST plumb plumb 120.0
24 │ GW_Armando_plate1_JF29G02 GW_Armando_plate1_JF29G02 ST plumb plumb 121.0
25 │ GW_Armando_plate1_JF29G03 GW_Armando_plate1_JF29G03 ST plumb plumb 122.0
26 │ GW_Armando_plate1_JG02G02 GW_Armando_plate1_JG02G02 PR plumb plumb 145.0
27 │ GW_Armando_plate1_JG02G04 GW_Armando_plate1_JG02G04 PR plumb plumb 146.0
28 │ GW_Armando_plate1_JG08G01 GW_Armando_plate1_JG08G01 ST plumb plumb 123.0
29 │ GW_Armando_plate1_JG08G02 GW_Armando_plate1_JG08G02 ST plumb plumb 124.0
30 │ GW_Armando_plate1_JG10G01 GW_Armando_plate1_JG10G01 ST plumb plumb 125.0
31 │ GW_Armando_plate1_JG12G01 GW_Armando_plate1_JG12G01 ST plumb plumb 126.0
32 │ GW_Armando_plate1_JG17G01 GW_Armando_plate1_JG17G01 ST plumb_vir plumb 127.0
33 │ GW_Armando_plate1_NO_BC_TTGW05 GW_Armando_plate1_NO_BC_TTGW05 blank blank blank -99.0
34 │ GW_Armando_plate1_NO_DNA GW_Armando_plate1_NO_DNA blank blank blank -99.0
35 │ GW_Armando_plate1_RF20G01 GW_Armando_plate1_RF20G01 BJ obs_plumb plumb_BJ 77.501
36 │ GW_Armando_plate1_RF29G02 GW_Armando_plate1_RF29G02 BJ obs_plumb plumb_BJ 77.502
37 │ GW_Armando_plate1_TL3 GW_Armando_plate1_TL3 TL vir vir 11.01
38 │ GW_Armando_plate1_TTGW01 GW_Armando_plate1_TTGW01 MN troch_MN troch_west 53.0
39 │ GW_Armando_plate1_TTGW05_rep1 GW_Armando_plate1_TTGW05_rep1 MN_rep troch_MN_rep troch_west_rep 53.0
40 │ GW_Armando_plate1_TTGW05_rep2 GW_Armando_plate1_TTGW05_rep2 MN troch_MN troch_west 53.0
41 │ GW_Armando_plate1_TTGW06 GW_Armando_plate1_TTGW06 SU lud_Sukhto lud_central 47.0
42 │ GW_Armando_plate1_TTGW07 GW_Armando_plate1_TTGW07 SU lud_Sukhto lud_central 47.0
43 │ GW_Armando_plate1_TTGW10 GW_Armando_plate1_TTGW10 SU lud_Sukhto lud_central 47.0
44 │ GW_Armando_plate1_TTGW11 GW_Armando_plate1_TTGW11 SU lud_Sukhto lud_central 47.0
45 │ GW_Armando_plate1_TTGW13 GW_Armando_plate1_TTGW13 TH lud_Thallighar lud_central 43.0
46 │ GW_Armando_plate1_TTGW17 GW_Armando_plate1_TTGW17 TH lud_Thallighar lud_central 43.0
47 │ GW_Armando_plate1_TTGW19 GW_Armando_plate1_TTGW19 TH lud_Thallighar lud_central 43.0
48 │ GW_Armando_plate1_TTGW21 GW_Armando_plate1_TTGW21 SR lud_Sural lud_central 45.0
49 │ GW_Armando_plate1_TTGW22 GW_Armando_plate1_TTGW22 SR lud_Sural lud_central 45.0
50 │ GW_Armando_plate1_TTGW23 GW_Armando_plate1_TTGW23 SR lud_Sural lud_central 45.0
51 │ GW_Armando_plate1_TTGW29 GW_Armando_plate1_TTGW29 SR lud_Sural lud_central 45.0
52 │ GW_Armando_plate1_TTGW52 GW_Armando_plate1_TTGW52 NG lud_Nainaghar lud_central 49.0
53 │ GW_Armando_plate1_TTGW53 GW_Armando_plate1_TTGW53 NG lud_Nainaghar lud_central 49.0
54 │ GW_Armando_plate1_TTGW55 GW_Armando_plate1_TTGW55 NG lud_Nainaghar lud_central 49.0
55 │ GW_Armando_plate1_TTGW57 GW_Armando_plate1_TTGW57 NG lud_Nainaghar lud_central 49.0
56 │ GW_Armando_plate1_TTGW58 GW_Armando_plate1_TTGW58 NG lud_Nainaghar lud_central 49.0
57 │ GW_Armando_plate1_TTGW59 GW_Armando_plate1_TTGW59 NG lud_Nainaghar lud_central 49.0
58 │ GW_Armando_plate1_TTGW63 GW_Armando_plate1_TTGW63 SP lud_Spiti troch_west 55.0
59 │ GW_Armando_plate1_TTGW64 GW_Armando_plate1_TTGW64 SP lud_Spiti troch_west 55.0
60 │ GW_Armando_plate1_TTGW65 GW_Armando_plate1_TTGW65 SP lud_Spiti troch_west 55.0
61 │ GW_Armando_plate1_TTGW66 GW_Armando_plate1_TTGW66 SP lud_Spiti troch_west 55.0
62 │ GW_Armando_plate1_TTGW68 GW_Armando_plate1_TTGW68 SP lud_Spiti troch_west 55.0
63 │ GW_Armando_plate1_TTGW70 GW_Armando_plate1_TTGW70 SA lud_Sathrundi lud_Sath 41.0
64 │ GW_Armando_plate1_TTGW71 GW_Armando_plate1_TTGW71 SA lud_Sathrundi lud_Sath 41.0
65 │ GW_Armando_plate1_TTGW72 GW_Armando_plate1_TTGW72 SA lud_Sathrundi lud_Sath 41.0
66 │ GW_Armando_plate1_TTGW74 GW_Armando_plate1_TTGW74 SA lud_Sathrundi lud_Sath 41.0
67 │ GW_Armando_plate1_TTGW78 GW_Armando_plate1_TTGW78 SA lud_Sathrundi lud_Sath 41.0
68 │ GW_Armando_plate1_TTGW_15_05 GW_Armando_plate1_TTGW_15_05 SR lud_Sural lud_central 45.0
69 │ GW_Armando_plate1_TTGW_15_07 GW_Armando_plate1_TTGW_15_07 SR lud_Sural lud_central 45.0
70 │ GW_Armando_plate1_TTGW_15_08 GW_Armando_plate1_TTGW_15_08 SR lud_Sural lud_central 45.0
71 │ GW_Armando_plate1_TTGW_15_09 GW_Armando_plate1_TTGW_15_09 SR lud_Sural lud_central 45.0
72 │ GW_Armando_plate1_UY1 GW_Armando_plate1_UY1 UY plumb plumb 87.0
73 │ GW_Armando_plate2_IL2 GW_Armando_plate2_IL2 IL_rep plumb_rep plumb_rep 84.0
74 │ GW_Armando_plate2_JE31G01 GW_Armando_plate2_JE31G01 VB_vi vir_misID vir 24.002
75 │ GW_Armando_plate2_JF03G01 GW_Armando_plate2_JF03G01 ST_vi vir_misID vir 24.003
76 │ GW_Armando_plate2_JF03G02 GW_Armando_plate2_JF03G02 VB_vi vir_misID vir 24.004
77 │ GW_Armando_plate2_JF07G01 GW_Armando_plate2_JF07G01 ST plumb plumb 128.0
78 │ GW_Armando_plate2_JF08G04 GW_Armando_plate2_JF08G04 ST plumb plumb 129.0
79 │ GW_Armando_plate2_JF10G02 GW_Armando_plate2_JF10G02 ST plumb plumb 130.0
80 │ GW_Armando_plate2_JF11G02 GW_Armando_plate2_JF11G02 ST plumb plumb 131.0
81 │ GW_Armando_plate2_JF12G03 GW_Armando_plate2_JF12G03 ST plumb plumb 132.0
82 │ GW_Armando_plate2_JF12G05 GW_Armando_plate2_JF12G05 ST plumb plumb 133.0
83 │ GW_Armando_plate2_JF13G02 GW_Armando_plate2_JF13G02 ST plumb plumb 134.0
84 │ GW_Armando_plate2_JF14G01 GW_Armando_plate2_JF14G01 DV plumb plumb 104.0
85 │ GW_Armando_plate2_JF14G02 GW_Armando_plate2_JF14G02 DV plumb plumb 105.0
86 │ GW_Armando_plate2_JF15G01 GW_Armando_plate2_JF15G01 DV plumb plumb 106.0
87 │ GW_Armando_plate2_JF15G02 GW_Armando_plate2_JF15G02 DV plumb plumb 107.0
88 │ GW_Armando_plate2_JF16G02 GW_Armando_plate2_JF16G02 DV_vi plumb_vir vir 24.042
89 │ GW_Armando_plate2_JF19G01 GW_Armando_plate2_JF19G01 MB plumb plumb 96.0
90 │ GW_Armando_plate2_JF20G02 GW_Armando_plate2_JF20G02 MB plumb plumb 97.0
91 │ GW_Armando_plate2_JF24G01 GW_Armando_plate2_JF24G01 VB plumb plumb 101.0
92 │ GW_Armando_plate2_JF24G03 GW_Armando_plate2_JF24G03 ST plumb plumb 135.0
93 │ GW_Armando_plate2_JF25G01 GW_Armando_plate2_JF25G01 VB plumb plumb 102.0
94 │ GW_Armando_plate2_JF26G02 GW_Armando_plate2_JF26G02 ST plumb plumb 136.0
95 │ GW_Armando_plate2_JF27G02 GW_Armando_plate2_JF27G02 ST plumb plumb 137.0
96 │ GW_Armando_plate2_JF30G01 GW_Armando_plate2_JF30G01 ST_vi vir_misID vir 24.005
97 │ GW_Armando_plate2_JG01G01 GW_Armando_plate2_JG01G01 PR plumb plumb 147.0
98 │ GW_Armando_plate2_JG02G01 GW_Armando_plate2_JG02G01 PR plumb plumb 148.0
99 │ GW_Armando_plate2_JG02G03 GW_Armando_plate2_JG02G03 PR plumb plumb 149.0
100 │ GW_Armando_plate2_JG10G02 GW_Armando_plate2_JG10G02 ST plumb plumb 138.0
101 │ GW_Armando_plate2_JG10G03 GW_Armando_plate2_JG10G03 ST plumb plumb 139.0
102 │ GW_Armando_plate2_JG12G02 GW_Armando_plate2_JG12G02 ST plumb plumb 140.0
103 │ GW_Armando_plate2_JG12G03 GW_Armando_plate2_JG12G03 ST plumb plumb 141.0
104 │ GW_Armando_plate2_LN11 GW_Armando_plate2_LN11 LN_rep troch_LN_rep troch_LN_rep 65.01
105 │ GW_Armando_plate2_LN2 GW_Armando_plate2_LN2 LN troch_LN troch_LN 58.01
106 │ GW_Armando_plate2_NO_BC_TTGW05 GW_Armando_plate2_NO_BC_TTGW05 blank blank blank -99.0
107 │ GW_Armando_plate2_NO_DNA GW_Armando_plate2_NO_DNA blank blank blank -99.0
108 │ GW_Armando_plate2_RF29G01 GW_Armando_plate2_RF29G01 BJ obs_plumb plumb_BJ 77.503
109 │ GW_Armando_plate2_TTGW02 GW_Armando_plate2_TTGW02 MN troch_MN troch_west 53.0
110 │ GW_Armando_plate2_TTGW03 GW_Armando_plate2_TTGW03 MN troch_MN troch_west 53.0
111 │ GW_Armando_plate2_TTGW05_rep3 GW_Armando_plate2_TTGW05_rep3 MN_rep troch_MN_rep troch_west_rep 53.0
112 │ GW_Armando_plate2_TTGW05_rep4 GW_Armando_plate2_TTGW05_rep4 MN_rep troch_MN_rep troch_west_rep 53.0
113 │ GW_Armando_plate2_TTGW08 GW_Armando_plate2_TTGW08 SU lud_Sukhto lud_central 47.0
114 │ GW_Armando_plate2_TTGW09 GW_Armando_plate2_TTGW09 SU lud_Sukhto lud_central 47.0
115 │ GW_Armando_plate2_TTGW12 GW_Armando_plate2_TTGW12 TH lud_Thallighar lud_central 43.0
116 │ GW_Armando_plate2_TTGW14 GW_Armando_plate2_TTGW14 TH lud_Thallighar lud_central 43.0
117 │ GW_Armando_plate2_TTGW15 GW_Armando_plate2_TTGW15 TH lud_Thallighar lud_central 43.0
118 │ GW_Armando_plate2_TTGW16 GW_Armando_plate2_TTGW16 TH lud_Thallighar lud_central 43.0
119 │ GW_Armando_plate2_TTGW18 GW_Armando_plate2_TTGW18 TH lud_Thallighar lud_central 43.0
120 │ GW_Armando_plate2_TTGW20 GW_Armando_plate2_TTGW20 SR lud_Sural lud_central 45.0
121 │ GW_Armando_plate2_TTGW24 GW_Armando_plate2_TTGW24 SR lud_Sural lud_central 45.0
122 │ GW_Armando_plate2_TTGW25 GW_Armando_plate2_TTGW25 SR lud_Sural lud_central 45.0
123 │ GW_Armando_plate2_TTGW27 GW_Armando_plate2_TTGW27 SR lud_Sural lud_central 45.0
124 │ GW_Armando_plate2_TTGW28 GW_Armando_plate2_TTGW28 SR lud_Sural lud_central 45.0
125 │ GW_Armando_plate2_TTGW50 GW_Armando_plate2_TTGW50 NG lud_Nainaghar lud_central 49.0
126 │ GW_Armando_plate2_TTGW51 GW_Armando_plate2_TTGW51 NG lud_Nainaghar lud_central 49.0
127 │ GW_Armando_plate2_TTGW54 GW_Armando_plate2_TTGW54 NG lud_Nainaghar lud_central 49.0
128 │ GW_Armando_plate2_TTGW56 GW_Armando_plate2_TTGW56 NG lud_Nainaghar lud_central 49.0
129 │ GW_Armando_plate2_TTGW60 GW_Armando_plate2_TTGW60 SP lud_Spiti troch_west 55.0
130 │ GW_Armando_plate2_TTGW61 GW_Armando_plate2_TTGW61 SP lud_Spiti troch_west 55.0
131 │ GW_Armando_plate2_TTGW62 GW_Armando_plate2_TTGW62 SP lud_Spiti troch_west 55.0
132 │ GW_Armando_plate2_TTGW67 GW_Armando_plate2_TTGW67 SP lud_Spiti troch_west 55.0
133 │ GW_Armando_plate2_TTGW69 GW_Armando_plate2_TTGW69 SP lud_Spiti troch_west 55.0
134 │ GW_Armando_plate2_TTGW73 GW_Armando_plate2_TTGW73 SA lud_Sathrundi lud_Sath 41.0
135 │ GW_Armando_plate2_TTGW75 GW_Armando_plate2_TTGW75 SA lud_Sathrundi lud_Sath 41.0
136 │ GW_Armando_plate2_TTGW77 GW_Armando_plate2_TTGW77 SA lud_Sathrundi lud_Sath 41.0
137 │ GW_Armando_plate2_TTGW79 GW_Armando_plate2_TTGW79 SA lud_Sathrundi lud_Sath 41.0
138 │ GW_Armando_plate2_TTGW80 GW_Armando_plate2_TTGW80 SA lud_Sathrundi lud_Sath 41.0
139 │ GW_Armando_plate2_TTGW_15_01 GW_Armando_plate2_TTGW_15_01 SR lud_Sural lud_central 45.0
140 │ GW_Armando_plate2_TTGW_15_02 GW_Armando_plate2_TTGW_15_02 SR lud_Sural lud_central 45.0
141 │ GW_Armando_plate2_TTGW_15_03 GW_Armando_plate2_TTGW_15_03 SR lud_Sural lud_central 45.0
142 │ GW_Armando_plate2_TTGW_15_04 GW_Armando_plate2_TTGW_15_04 SR lud_Sural lud_central 45.0
143 │ GW_Armando_plate2_TTGW_15_06 GW_Armando_plate2_TTGW_15_06 SR lud_Sural lud_central 45.0
144 │ GW_Armando_plate2_TTGW_15_10 GW_Armando_plate2_TTGW_15_10 SR lud_Sural lud_central 45.0
145 │ GW_Lane5_AA1 GW_Lane5_AA1 AA vir_S vir_S 25.0
146 │ GW_Lane5_AA10 GW_Lane5_AA10 AA vir_S vir_S 33.0
147 │ GW_Lane5_AA11 GW_Lane5_AA11 AA vir_S vir_S 34.0
148 │ GW_Lane5_AA3 GW_Lane5_AA3 AA vir_S vir_S 26.0
149 │ GW_Lane5_AA4 GW_Lane5_AA4 AA vir_S vir_S 27.0
150 │ GW_Lane5_AA5 GW_Lane5_AA5 AA vir_S vir_S 28.0
151 │ GW_Lane5_AA6 GW_Lane5_AA6 AA vir_S vir_S 29.0
152 │ GW_Lane5_AA7 GW_Lane5_AA7 AA vir_S vir_S 30.0
153 │ GW_Lane5_AA8 GW_Lane5_AA8 AA vir_S vir_S 31.0
154 │ GW_Lane5_AA9 GW_Lane5_AA9 AA vir_S vir_S 32.0
155 │ GW_Lane5_AB1 GW_Lane5_AB1 AB_rep vir_rep vir_rep 20.0
156 │ GW_Lane5_AB2 GW_Lane5_AB2 AB vir vir 21.0
157 │ GW_Lane5_AN1 GW_Lane5_AN1 AN plumb plumb 80.0
158 │ GW_Lane5_AN2 GW_Lane5_AN2 AN plumb plumb 81.0
159 │ GW_Lane5_BK2 GW_Lane5_BK2 BK plumb plumb 78.0
160 │ GW_Lane5_BK3 GW_Lane5_BK3 BK plumb plumb 79.0
161 │ GW_Lane5_DA2 GW_Lane5_DA2 XN obs obs 73.0
162 │ GW_Lane5_DA3 GW_Lane5_DA3 XN obs obs 74.0
163 │ GW_Lane5_DA4 GW_Lane5_DA4 XN obs obs 75.0
164 │ GW_Lane5_DA6 GW_Lane5_DA6 XN obs low_reads 76.0
165 │ GW_Lane5_DA7 GW_Lane5_DA7 XN obs obs 77.0
166 │ GW_Lane5_EM1 GW_Lane5_EM1 EM troch_EM troch_EM 72.0
167 │ GW_Lane5_IL1 GW_Lane5_IL1 IL plumb plumb 82.0
168 │ GW_Lane5_IL2 GW_Lane5_IL2 IL_rep plumb_rep plumb_rep 85.0
169 │ GW_Lane5_IL4 GW_Lane5_IL4 IL plumb plumb 83.0
170 │ GW_Lane5_KS1 GW_Lane5_KS1 OV lud_KS lud_KS 40.0
171 │ GW_Lane5_KS2 GW_Lane5_KS2 OV lud_KS lud_KS 40.0
172 │ GW_Lane5_LN1 GW_Lane5_LN1 LN troch_LN troch_LN 57.0
173 │ GW_Lane5_LN10 GW_Lane5_LN10 LN troch_LN troch_LN 64.0
174 │ GW_Lane5_LN11 GW_Lane5_LN11 LN troch_LN troch_LN 65.0
175 │ GW_Lane5_LN12 GW_Lane5_LN12 LN troch_LN troch_LN 66.0
176 │ GW_Lane5_LN14 GW_Lane5_LN14 LN troch_LN troch_LN 67.0
177 │ GW_Lane5_LN16 GW_Lane5_LN16 LN troch_LN troch_LN 68.0
178 │ GW_Lane5_LN18 GW_Lane5_LN18 LN troch_LN troch_LN 69.0
179 │ GW_Lane5_LN19 GW_Lane5_LN19 LN troch_LN troch_LN 70.0
180 │ GW_Lane5_LN2 GW_Lane5_LN2 LN_rep troch_LN_rep troch_LN_rep 58.0
181 │ GW_Lane5_LN20 GW_Lane5_LN20 LN troch_LN troch_LN 71.0
182 │ GW_Lane5_LN3 GW_Lane5_LN3 LN troch_LN troch_LN 59.0
183 │ GW_Lane5_LN4 GW_Lane5_LN4 LN troch_LN troch_LN 60.0
184 │ GW_Lane5_LN6 GW_Lane5_LN6 LN troch_LN troch_LN 61.0
185 │ GW_Lane5_LN7 GW_Lane5_LN7 LN troch_LN troch_LN 62.0
186 │ GW_Lane5_LN8 GW_Lane5_LN8 LN troch_LN troch_LN 63.0
187 │ GW_Lane5_MN1 GW_Lane5_MN1 MN troch_MN troch_west 51.0
188 │ GW_Lane5_MN12 GW_Lane5_MN12 MN troch_MN troch_west 56.0
189 │ GW_Lane5_MN3 GW_Lane5_MN3 MN troch_MN troch_west 52.0
190 │ GW_Lane5_MN5 GW_Lane5_MN5 MN troch_MN troch_west 53.0
191 │ GW_Lane5_MN8 GW_Lane5_MN8 MN troch_MN troch_west 54.0
192 │ GW_Lane5_MN9 GW_Lane5_MN9 MN troch_MN troch_west 55.0
193 │ GW_Lane5_NA1 GW_Lane5_NA1 NR lud_PK lud_PK 39.2
194 │ GW_Lane5_NA3-3ul GW_Lane5_NA3-3ul NR lud_PK lud_PK 39.2
195 │ GW_Lane5_PT11 GW_Lane5_PT11 KL lud_KL lud_central 42.0
196 │ GW_Lane5_PT12 GW_Lane5_PT12 KL lud_KL lud_central 42.0
197 │ GW_Lane5_PT2 GW_Lane5_PT2 ML lud_ML lud_ML 51.0
198 │ GW_Lane5_PT3 GW_Lane5_PT3 PA lud_PA lud_central 46.0
199 │ GW_Lane5_PT4 GW_Lane5_PT4 PA lud_PA lud_central 46.0
200 │ GW_Lane5_PT6 GW_Lane5_PT6 KL lud_KL lud_central 42.0
201 │ GW_Lane5_SH1 GW_Lane5_SH1 SH lud_PK lud_PK 39.1
202 │ GW_Lane5_SH2 GW_Lane5_SH2 SH lud_PK lud_PK 39.1
203 │ GW_Lane5_SH4 GW_Lane5_SH4 SH lud_PK lud_PK 39.1
204 │ GW_Lane5_SH5 GW_Lane5_SH5 SH lud_PK lud_PK 39.1
205 │ GW_Lane5_SL1 GW_Lane5_SL1 SL plumb plumb 150.0
206 │ GW_Lane5_SL2 GW_Lane5_SL2 SL plumb plumb 151.0
207 │ GW_Lane5_ST1 GW_Lane5_ST1 ST plumb plumb 142.0
208 │ GW_Lane5_ST12 GW_Lane5_ST12 ST plumb plumb 144.0
209 │ GW_Lane5_ST3 GW_Lane5_ST3 ST plumb plumb 143.0
210 │ GW_Lane5_STvi1 GW_Lane5_STvi1 ST_vi vir vir 22.0
211 │ GW_Lane5_STvi2 GW_Lane5_STvi2 ST_vi vir vir 23.0
212 │ GW_Lane5_STvi3 GW_Lane5_STvi3 ST_vi vir vir 24.0
213 │ GW_Lane5_TA1 GW_Lane5_TA1 TA plumb plumb 86.0
214 │ GW_Lane5_TL1 GW_Lane5_TL1 TL vir vir 9.0
215 │ GW_Lane5_TL10 GW_Lane5_TL10 TL vir vir 17.0
216 │ GW_Lane5_TL11 GW_Lane5_TL11 TL vir vir 18.0
217 │ GW_Lane5_TL12 GW_Lane5_TL12 TL vir vir 19.0
218 │ GW_Lane5_TL2 GW_Lane5_TL2 TL vir vir 10.0
219 │ GW_Lane5_TL3 GW_Lane5_TL3 TL_rep vir_rep vir_rep 11.0
220 │ GW_Lane5_TL4 GW_Lane5_TL4 TL vir vir 12.0
221 │ GW_Lane5_TL5 GW_Lane5_TL5 TL vir vir 13.0
222 │ GW_Lane5_TL7 GW_Lane5_TL7 TL vir vir 14.0
223 │ GW_Lane5_TL8 GW_Lane5_TL8 TL vir vir 15.0
224 │ GW_Lane5_TL9 GW_Lane5_TL9 TL vir vir 16.0
225 │ GW_Lane5_TU1 GW_Lane5_TU1 TU nit nit 35.0
226 │ GW_Lane5_TU2 GW_Lane5_TU2 TU nit nit 36.0
227 │ GW_Lane5_UY1 GW_Lane5_UY1 UY_rep plumb_rep plumb_rep 93.0
228 │ GW_Lane5_UY2 GW_Lane5_UY2 UY plumb plumb 88.0
229 │ GW_Lane5_UY3 GW_Lane5_UY3 UY plumb plumb 89.0
230 │ GW_Lane5_UY4 GW_Lane5_UY4 UY plumb plumb 90.0
231 │ GW_Lane5_UY5 GW_Lane5_UY5 UY plumb plumb 91.0
232 │ GW_Lane5_UY6 GW_Lane5_UY6 UY plumb plumb 92.0
233 │ GW_Lane5_YK1 GW_Lane5_YK1 YK vir vir 1.0
234 │ GW_Lane5_YK11 GW_Lane5_YK11 YK vir vir 8.0
235 │ GW_Lane5_YK3 GW_Lane5_YK3 YK vir vir 2.0
236 │ GW_Lane5_YK4 GW_Lane5_YK4 YK vir vir 3.0
237 │ GW_Lane5_YK5 GW_Lane5_YK5 YK vir vir 4.0
238 │ GW_Lane5_YK6 GW_Lane5_YK6 YK vir vir 5.0
239 │ GW_Lane5_YK7 GW_Lane5_YK7 YK vir vir 6.0
240 │ GW_Lane5_YK9 GW_Lane5_YK9 YK vir vir 7.0
241 │ GW_Liz_GBS_Liz10045 GW_Liz_GBS_Liz10045 ML lud lud_ML 51.01
242 │ GW_Liz_GBS_Liz10094 GW_Liz_GBS_Liz10094 ML lud lud_ML 51.02
243 │ GW_Liz_GBS_Liz5101 GW_Liz_GBS_Liz5101 ML lud lud_ML 51.03
244 │ GW_Liz_GBS_Liz5101_R GW_Liz_GBS_Liz5101_R ML_rep lud_rep lud_ML_rep 51.04
245 │ GW_Liz_GBS_Liz5118 GW_Liz_GBS_Liz5118 ML lud lud_ML 51.05
246 │ GW_Liz_GBS_Liz5139 GW_Liz_GBS_Liz5139 ML lud lud_ML 51.06
247 │ GW_Liz_GBS_Liz5142 GW_Liz_GBS_Liz5142 ML lud lud_ML 51.07
248 │ GW_Liz_GBS_Liz5144 GW_Liz_GBS_Liz5144 ML lud lud_ML 51.08
249 │ GW_Liz_GBS_Liz5150 GW_Liz_GBS_Liz5150 ML lud lud_ML 51.09
250 │ GW_Liz_GBS_Liz5159 GW_Liz_GBS_Liz5159 ML lud_chick lud_ML 51.1
251 │ GW_Liz_GBS_Liz5162 GW_Liz_GBS_Liz5162 ML lud_chick lud_ML 51.11
252 │ GW_Liz_GBS_Liz5163 GW_Liz_GBS_Liz5163 ML lud_chick lud_ML 51.12
253 │ GW_Liz_GBS_Liz5164 GW_Liz_GBS_Liz5164 ML lud_chick lud_ML 51.13
254 │ GW_Liz_GBS_Liz5165 GW_Liz_GBS_Liz5165 ML lud lud_ML 51.14
255 │ GW_Liz_GBS_Liz5167 GW_Liz_GBS_Liz5167 ML lud_chick lud_ML 51.15
256 │ GW_Liz_GBS_Liz5168 GW_Liz_GBS_Liz5168 ML lud_chick lud_ML 51.16
257 │ GW_Liz_GBS_Liz5169 GW_Liz_GBS_Liz5169 ML lud_chick lud_ML 51.17
258 │ GW_Liz_GBS_Liz5171 GW_Liz_GBS_Liz5171 ML lud lud_ML 51.18
259 │ GW_Liz_GBS_Liz5172 GW_Liz_GBS_Liz5172 ML lud_chick lud_ML 51.19
260 │ GW_Liz_GBS_Liz5173 GW_Liz_GBS_Liz5173 ML lud_chick lud_ML 51.2
261 │ GW_Liz_GBS_Liz5174 GW_Liz_GBS_Liz5174 ML lud lud_ML 51.21
262 │ GW_Liz_GBS_Liz5175 GW_Liz_GBS_Liz5175 ML lud lud_ML 51.22
263 │ GW_Liz_GBS_Liz5176 GW_Liz_GBS_Liz5176 ML lud lud_ML 51.23
264 │ GW_Liz_GBS_Liz5177 GW_Liz_GBS_Liz5177 ML lud_chick lud_ML 51.24
265 │ GW_Liz_GBS_Liz5178 GW_Liz_GBS_Liz5178 ML lud_chick lud_ML 51.25
266 │ GW_Liz_GBS_Liz5179 GW_Liz_GBS_Liz5179 ML lud_chick lud_ML 51.26
267 │ GW_Liz_GBS_Liz5180 GW_Liz_GBS_Liz5180 ML lud lud_ML 51.27
268 │ GW_Liz_GBS_Liz5182 GW_Liz_GBS_Liz5182 ML lud_chick lud_ML 51.28
269 │ GW_Liz_GBS_Liz5184 GW_Liz_GBS_Liz5184 ML lud_chick lud_ML 51.29
270 │ GW_Liz_GBS_Liz5185 GW_Liz_GBS_Liz5185 ML lud lud_ML 51.3
271 │ GW_Liz_GBS_Liz5186 GW_Liz_GBS_Liz5186 ML lud_chick lud_ML 51.31
272 │ GW_Liz_GBS_Liz5187 GW_Liz_GBS_Liz5187 ML lud_chick lud_ML 51.32
273 │ GW_Liz_GBS_Liz5188 GW_Liz_GBS_Liz5188 ML lud lud_ML 51.33
274 │ GW_Liz_GBS_Liz5189 GW_Liz_GBS_Liz5189 ML lud_chick lud_ML 51.34
275 │ GW_Liz_GBS_Liz5190 GW_Liz_GBS_Liz5190 ML lud_chick lud_ML 51.35
276 │ GW_Liz_GBS_Liz5191 GW_Liz_GBS_Liz5191 ML lud_chick lud_ML 51.36
277 │ GW_Liz_GBS_Liz5192 GW_Liz_GBS_Liz5192 ML lud_chick lud_ML 51.37
278 │ GW_Liz_GBS_Liz5193 GW_Liz_GBS_Liz5193 ML lud_chick lud_ML 51.38
279 │ GW_Liz_GBS_Liz5194 GW_Liz_GBS_Liz5194 ML lud_chick lud_ML 51.39
280 │ GW_Liz_GBS_Liz5195 GW_Liz_GBS_Liz5195 ML lud lud_ML 51.4
281 │ GW_Liz_GBS_Liz5197 GW_Liz_GBS_Liz5197 ML lud lud_ML 51.41
282 │ GW_Liz_GBS_Liz5199 GW_Liz_GBS_Liz5199 ML lud_chick lud_ML 51.42
283 │ GW_Liz_GBS_Liz6002 GW_Liz_GBS_Liz6002 ML lud lud_ML 51.43
284 │ GW_Liz_GBS_Liz6006 GW_Liz_GBS_Liz6006 ML lud lud_ML 51.44
285 │ GW_Liz_GBS_Liz6008 GW_Liz_GBS_Liz6008 ML lud lud_ML 51.45
286 │ GW_Liz_GBS_Liz6009 GW_Liz_GBS_Liz6009 ML lud lud_ML 51.46
287 │ GW_Liz_GBS_Liz6010 GW_Liz_GBS_Liz6010 ML lud lud_ML 51.47
288 │ GW_Liz_GBS_Liz6012 GW_Liz_GBS_Liz6012 ML lud lud_ML 51.48
289 │ GW_Liz_GBS_Liz6014 GW_Liz_GBS_Liz6014 ML lud lud_ML 51.49
290 │ GW_Liz_GBS_Liz6055 GW_Liz_GBS_Liz6055 ML lud lud_ML 51.5
291 │ GW_Liz_GBS_Liz6057 GW_Liz_GBS_Liz6057 ML lud lud_ML 51.51
292 │ GW_Liz_GBS_Liz6060 GW_Liz_GBS_Liz6060 ML lud lud_ML 51.52
293 │ GW_Liz_GBS_Liz6062 GW_Liz_GBS_Liz6062 ML lud lud_ML 51.53
294 │ GW_Liz_GBS_Liz6063 GW_Liz_GBS_Liz6063 ML lud lud_ML 51.54
295 │ GW_Liz_GBS_Liz6066 GW_Liz_GBS_Liz6066 ML lud lud_ML 51.55
296 │ GW_Liz_GBS_Liz6072 GW_Liz_GBS_Liz6072 ML lud lud_ML 51.56
297 │ GW_Liz_GBS_Liz6079 GW_Liz_GBS_Liz6079 ML lud lud_ML 51.57
298 │ GW_Liz_GBS_Liz6203 GW_Liz_GBS_Liz6203 ML lud_chick lud_ML 51.58
299 │ GW_Liz_GBS_Liz6204 GW_Liz_GBS_Liz6204 ML lud_chick lud_ML 51.59
300 │ GW_Liz_GBS_Liz6461 GW_Liz_GBS_Liz6461 ML lud lud_ML 51.6
301 │ GW_Liz_GBS_Liz6472 GW_Liz_GBS_Liz6472 ML lud lud_ML 51.61
302 │ GW_Liz_GBS_Liz6478 GW_Liz_GBS_Liz6478 ML lud lud_ML 51.62
303 │ GW_Liz_GBS_Liz6766 GW_Liz_GBS_Liz6766 ML lud lud_ML 51.63
304 │ GW_Liz_GBS_Liz6776 GW_Liz_GBS_Liz6776 ML lud lud_ML 51.64
305 │ GW_Liz_GBS_Liz6794 GW_Liz_GBS_Liz6794 ML lud lud_ML 51.65
306 │ GW_Liz_GBS_P_fusc GW_Liz_GBS_P_fusc fusc fusc fusc 201.0
307 │ GW_Liz_GBS_P_h_man GW_Liz_GBS_P_h_man hmand hmand hmand 202.0
308 │ GW_Liz_GBS_P_humei GW_Liz_GBS_P_humei hume hume hume 203.0
309 │ GW_Liz_GBS_P_inor GW_Liz_GBS_P_inor inor inor inor 204.0
310 │ GW_Liz_GBS_S_burk GW_Liz_GBS_S_burk burk burk burk 205.0
GOOD NEWS: names of individuals in metadata file and genotype ind file match perfectly.
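The function that printed the message above is not shown in this section. As a rough sketch of what such a check could look like, the code below compares two name vectors element by element and reports the rows that disagree. The helper `mismatchedRows` and the three-element name lists are hypothetical; the real lists have 310 entries.

```julia
# Hypothetical toy name lists (three IDs taken from the table above)
metadata_ID  = ["GW_Lane5_AA1", "GW_Lane5_AA10", "GW_Lane5_AB1"]
genotype_ind = ["GW_Lane5_AA1", "GW_Lane5_AA10", "GW_Lane5_AB1"]

# Hypothetical helper: return the row indices at which the two lists disagree
function mismatchedRows(a, b)
    length(a) == length(b) || error("name lists differ in length")
    return findall(a .!= b)
end

bad = mismatchedRows(metadata_ID, genotype_ind)
if isempty(bad)
    println("All names match, in the same order.")
else
    println("Mismatches at rows: ", bad)
end
```

Comparing by position (rather than with `setdiff`) also catches the case where both files hold the same names but in a different order.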
Polish a few individual names (to match those in other metadata object above, and make more readable graphs):
ind_with_metadata_chr3.ind = correctNames(ind_with_metadata_chr3.ind)
ind_with_metadata_chr3.ID = correctNames(ind_with_metadata_chr3.ID)
310-element Vector{String}:
"GW_Armando_plate1_AB1"
"GW_Armando_plate1_JF07G02"
"GW_Armando_plate1_JF07G03"
"GW_Armando_plate1_JF07G04"
"GW_Armando_plate1_JF08G02"
"GW_Armando_plate1_JF09G01"
"GW_Armando_plate1_JF09G02"
"GW_Armando_plate1_JF10G03"
"GW_Armando_plate1_JF11G01"
"GW_Armando_plate1_JF12G01"
"GW_Armando_plate1_JF12G02"
"GW_Armando_plate1_JF12G04"
"GW_Armando_plate1_JF13G01"
⋮
"GW_Liz_GBS_Liz6204"
"GW_Liz_GBS_Liz6461"
"GW_Liz_GBS_Liz6472"
"GW_Liz_GBS_Liz6478"
"GW_Liz_GBS_Liz6766"
"GW_Liz_GBS_Liz6776"
"GW_Liz_GBS_Liz6794"
"GW_Liz_GBS_P_fusc"
"GW_Liz_GBS_P_h_man"
"GW_Liz_GBS_P_humei"
"GW_Liz_GBS_P_inor"
"GW_Liz_GBS_S_burk"
Filter to just the individuals also included in the analysis of LHBRs above
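The filtering in this step relies on a Boolean mask built with `map(in(...), ...)`. A minimal sketch of that idiom on toy data follows; the names `ind_all`, `ind_kept`, and `genoToy` and their values are hypothetical stand-ins, not objects from this analysis.

```julia
# Toy stand-ins (hypothetical names and values)
ind_all  = ["ind1", "ind2", "ind3", "ind4"]   # row order of the table being filtered
ind_kept = ["ind4", "ind2"]                   # individuals retained in an earlier analysis
genoToy  = [0 1 2;
            1 1 0;
            2 0 0;
            0 2 1]                            # rows = individuals, columns = SNPs

# in(collection) returns a one-argument predicate; map applies it to each name
selectionToy = map(in(ind_kept), ind_all)     # Vector{Bool}

# Boolean row indexing keeps rows 2 and 4, in the order of ind_all
genoToy_kept = genoToy[selectionToy, :]
println(selectionToy)
println(genoToy_kept)
```

The mask keeps rows in the order of the table being filtered, not the order of the reference list. Pairing the filtered genotypes with a column taken from a different table (such as `gw3_cluster` further down) therefore presumably relies on both tables listing individuals in the same order.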
selection = map(in(ind_with_metadata_included.ind), ind_with_metadata_chr3.ind)
ind_with_metadata_chr3_included = ind_with_metadata_chr3[selection, :]
# select genotypes of just the included individuals, and ignore first column
geno_chr3_included = geno_chr3[selection, 2:end]
println(ind_with_metadata_included.gw3_cluster)
["virLud", "plumb", "plumb", "plumb", "plumb", "plumb", "plumbHet", "plumb", "plumb", "plumbHet", "plumb", "virLud", "plumb", "plumb", "virLud", "plumb", "plumb", "plumb", "plumb", "plumb", "plumb", "plumb", "plumb", "plumb", "plumb", "vir_plumb", "plumb", "plumb", "plumb", "plumb", "plumb", "plumb", "virLud", "trochObs", "trochObs", "virLud", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "virLud", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "trochObs", "virLud", "virLudHet", "virLud_trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "virLud_trochObs", "virLud_trochObs", "virLud", "virLud_trochObs", "plumb", "virLud", "virLudHet", "virLud", "plumb", "plumb", "plumb", "plumbHet", "plumb", "plumb", "plumb", "plumb", "vir_plumb", "plumb", "plumb", "virLud", "plumb", "plumb", "plumbHet", "plumb", "plumbHet", "plumbHet", "plumb", "virLud", "plumbHet", "plumb", "plumb", "plumb", "vir_plumb", "plumb", "plumb", "trochObs", "plumb", "trochObs", "trochObs", "virLudHet", "virLud_trochObs", "virLudHet", "virLud_trochObs", "virLud", "virLud", "virLud", "virLudHet", "virLud", "virLudHet", "virLudHet", "virLud_trochObs", "virLud", "virLud", "trochObsHet", "trochObs", "trochObs", "trochObs", "trochObs", "trochObsHet", "trochObs", "trochObs", "trochObs", "trochObs", "virLud", "virLud", "virLud", "virLud", "virLud_trochObs", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "plumb", "plumb", "plumb", "plumb", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "plumb", "plumb", "virLud", "virLud", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "virLud_trochObs", "trochObs", "trochObs", "virLud", "virLud", "virLud_trochObs", 
"virLud", "trochObs", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "plumb", "plumb", "plumb", "plumb", "plumb", "virLud", "virLud", "virLud", "plumb", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "nit", "nit", "plumb", "plumb", "plumb", "plumb", "plumb", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "virLud", "trochObs", "trochObs", "trochObs", "trochObs", "virLud_trochObs", "virLud_trochObs", "trochObs", "virLud_trochObs", "trochObs", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "trochObs", "virLud_trochObs", "trochObs", "virLud_trochObs", "trochObs", "trochObs", "virLud_trochObs", "trochObsHet", "trochObs", "virLud_trochObs", "virLud_trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "virLud_trochObs", "virLud_trochObs", "virLud_trochObs", "trochObs", "virLud_trochObs", "trochObs", "trochObs", "trochObs", "virLud_trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs", "trochObs"]
Look up the chr 3 individual membership in homozygous clusters, and calculate pi and Dxy
indClusterMembership_gw3 = ind_with_metadata_included.gw3_cluster
clusterNames_gw3 = ["virLud",
                    "nit",
                    "trochObs",
                    "plumb"]
# get boundaries of gw3 LHBR:
chr = "gw3"
positionMin_chr3_LHBR, positionMax_chr3_LHBR, regionText,
windowedIndHetStanRegion, meanAcrossRegionIndHetStan,
genos_highViSHetRegion, pos_highViSHetRegion, regionInfo =
    getWindowedIndHetStanRegion(genosOnly_included,
                                pos_SNP_filtered,
                                highViSHetRegions, chr;
                                windowSize = 500)
# select the loci within the gw3 LHBR:
selection = (positionMin_chr3_LHBR .<= pos_chr3.position .<= positionMax_chr3_LHBR)
geno_chr3_included_LHBR = geno_chr3_included[:, selection]
pos_chr3_LHBR = pos_chr3[selection, :]
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(geno_chr3_included_LHBR, indClusterMembership_gw3, clusterNames_gw3)
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames_gw3)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames_gw3; among=false)
# Now get averages of pi and Dxy for chr 3 LHBR:
regionPiTable = DataFrame(cluster = clusterNames_gw3, pi = getRegionPi(sitePi))
#= 4×2 DataFrame
 Row │ cluster   pi
     │ String    Float64
─────┼───────────────────────
   1 │ virLud    0.0012486
   2 │ nit       0.000697639
   3 │ trochObs  0.00136251
   4 │ plumb     0.00111764 =#
# average pi (for chr 3 LHBR) among three major groups:
(0.0012486 + 0.00136251 + 0.00111764) / 3
# 0.0012429166
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 6×2 DataFrame
 Row │ cluster_pair     Dxy
     │ String           Float64
─────┼─────────────────────────────
   1 │ virLud_nit       0.0026694
   2 │ virLud_trochObs  0.00354486
   3 │ virLud_plumb     0.00398499
   4 │ nit_trochObs     0.00335481
   5 │ nit_plumb        0.00400019
   6 │ trochObs_plumb   0.00328016 =#
# average Dxy (for chr 3 LHBR) among three major groups:
(0.00354486 + 0.00398499 + 0.00328016) / 3
# 0.0036033366666666663
# Drawing phylogeny (in Illustrator) based on above, between three major groups.
# Lowest Dxy is between trochObs and plumb (0.00328016).
# Calculation of deepest split, an average of virLud diff with (trochObs, plumb):
(0.00354486 + 0.00398499) / 2
# 0.003764925
More than 1 region on that scaffold. Using just the longest one.
Row | regionChrom | regionStart | regionEnd |
---|---|---|---|
 | String | Int64 | Int64 |
1 | gw3 | 101192949 | 103495514 |
2 | gw3 | 104554714 | 108279595 |
0.0037649249999999997
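The functions `getSitePi`, `getDxy`, `getRegionPi`, and `getRegionDxy` are defined elsewhere, and their internals are not shown on this page. As a rough illustration of the quantities involved, the sketch below applies the textbook per-site definitions of within-group diversity and between-group distance to made-up allele frequencies, then averages them over a position window. The names `sitePiToy` and `siteDxyToy`, the toy data, and the choice to average only over the listed sites are assumptions for illustration, not the implementation used for the results above.

```julia
using Statistics

# Toy data (made up): alternate-allele frequencies, rows = groups, columns = SNPs
positions = [100, 250, 400, 900]
freqsToy  = [0.0 0.5 1.0 0.2;    # group A
             1.0 0.5 0.0 0.2]    # group B
nInds     = [10, 8]              # individuals genotyped per group (assumed constant across SNPs)

# Textbook within-group diversity at one site, with the usual small-sample
# correction (2n sampled alleles for n diploid individuals)
sitePiToy(p, n) = 2 * p * (1 - p) * (2 * n / (2 * n - 1))

# Textbook between-group distance at one site
siteDxyToy(p1, p2) = p1 * (1 - p2) + p2 * (1 - p1)

piA   = sitePiToy.(freqsToy[1, :], nInds[1])
dxyAB = siteDxyToy.(freqsToy[1, :], freqsToy[2, :])

# Same kind of chained, broadcast comparison as used above to select loci in a region
inRegion = 200 .<= positions .<= 500

regionPiA   = mean(piA[inRegion])
regionDxyAB = mean(dxyAB[inRegion])
println("region pi (group A) = ", regionPiA, ", region Dxy (A vs B) = ", regionDxyAB)
```

In this toy window, the two groups are fixed for different alleles at one site and share a frequency of 0.5 at the other, so between-group distance is high while within-group diversity is moderate. How the real functions handle missing genotypes or invariant sites is not addressed here.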
Calculate pi and Dxy outside of the chr 3 LHBR (grouped by the LHBR homozygous groups)
# select the loci outside of the gw3 LHBR:
selection = .!(positionMin_chr3_LHBR .<= pos_chr3.position .<= positionMax_chr3_LHBR)
geno_chr3_included_nonLHBR = geno_chr3_included[:, selection]
pos_chr3_nonLHBR = pos_chr3[selection, :]
# Calculate allele freqs and sample sizes
freqs, sampleSizes = getFreqsAndSampleSizes(geno_chr3_included_nonLHBR, indClusterMembership_gw3, clusterNames_gw3)
# Calculate per-site pi (within-group nucleotide distance)
sitePi = getSitePi(freqs, sampleSizes)
# calculate pairwise Dxy per site, using data in "freqs" and groups in "groups"
Dxy, pairwiseDxyClusterNames = getDxy(freqs, clusterNames_gw3)
Fst, FstNumerator, FstDenominator, pairwiseFstClusterNames = getFst(freqs, sampleSizes, clusterNames_gw3; among=false)
# Now get averages of pi and Dxy for whole region:
regionPiTable = DataFrame(cluster = clusterNames_gw3, pi = getRegionPi(sitePi))
#= 4×2 DataFrame
 Row │ cluster   pi
     │ String    Float64
─────┼──────────────────────
   1 │ virLud    0.00456714
   2 │ nit       0.00161186
   3 │ trochObs  0.00568622
   4 │ plumb     0.00554501 =#
# average pi (for chr 3 NOT in LHBR) among three major groups:
(0.00456714 + 0.00568622 + 0.00554501) / 3
# 0.005266123
#ratio of average pi outside to average pi within chr 3 LHBR:
0.005266123 / 0.0012429166
# 4.236907769998406
# percent lower that average pi is within compared to outside LHBR:
100 * (4.2369078 - 1) / 4.2369078
# 76.39788149272448
#ratio of pi outside to within LHBR:
0.00456714 / 0.0012486
# 3.6578087457952906
0.00568622 / 0.00136251
# 4.173341847032315
0.00554501 / 0.00111764
# 4.961356071722558
((0.00456714 / 0.0012486) + (0.00568622 / 0.00136251) + (0.00554501 / 0.00111764)) / 3
# 4.264168888183388
regionDxyTable = DataFrame(cluster_pair = pairwiseDxyClusterNames, Dxy = getRegionDxy(Dxy))
#= 6×2 DataFrame
 Row │ cluster_pair     Dxy
     │ String           Float64
─────┼─────────────────────────────
   1 │ virLud_nit       0.00439336
   2 │ virLud_trochObs  0.0059857
   3 │ virLud_plumb     0.00660034
   4 │ nit_trochObs     0.00548855
   5 │ nit_plumb        0.00608182
   6 │ trochObs_plumb   0.00652593 =#
# average Dxy (for OUTSIDE of chr 3 LHBR) among three major groups:
(0.0059857 + 0.00660034 + 0.00652593) / 3
# 0.006370656666666666
#ratio of average Dxy outside to average Dxy within LHBR among 3 major groups:
0.006370656666666666 / 0.0036033366666666663
# 1.76798818872508
# percent lower that average Dxy is within compared to outside LHBR:
100 * (1.76798818872508 - 1) / 1.76798818872508
# 43.43853616346196
# Drawing phylogeny (in Illustrator) based on above, between three major groups.
# Lowest Dxy is between virLud and trochObs (0.0059857).
# Calculation of deepest split, an average of plumb diff with (virLud, trochObs):
(0.00660034 + 0.00652593) / 2
# 0.006563135
((0.00354486 / 0.0059857) + (0.00328016 / 0.00652593) + (0.00398499 / 0.00660034)) / 3
# 0.5662038652462132
0.5662038652462132
Really neat results above. More diversity at chr 3 LHBR than at 4A.
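The ratio and percent-reduction figures above were computed by retyping rounded values by hand. As a small sketch of the same arithmetic done from the table values directly, the code below stores the region averages in vectors and derives the ratios from them. The helper `percentLower` and the vector names are hypothetical; the numbers are copied from the tables on this page, restricted to the three major groups.

```julia
using Statistics

# Region averages copied from the tables above (virLud, trochObs, plumb)
pi_within   = [0.0012486, 0.00136251, 0.00111764]
pi_outside  = [0.00456714, 0.00568622, 0.00554501]
# Pairwise values: virLud_trochObs, virLud_plumb, trochObs_plumb
dxy_within  = [0.00354486, 0.00398499, 0.00328016]
dxy_outside = [0.0059857, 0.00660034, 0.00652593]

# Hypothetical helper: percent by which the mean of `within` is lower than the mean of `outside`.
# Algebraically the same as 100 * (r - 1) / r with r = mean(outside) / mean(within).
percentLower(within, outside) = 100 * (1 - mean(within) / mean(outside))

pi_ratio  = mean(pi_outside) / mean(pi_within)     # about 4.24
dxy_ratio = mean(dxy_outside) / mean(dxy_within)   # about 1.77

println("pi:  ratio = ", pi_ratio, ", percent lower within LHBR = ", percentLower(pi_within, pi_outside))
println("Dxy: ratio = ", dxy_ratio, ", percent lower within LHBR = ", percentLower(dxy_within, dxy_outside))
```

Keeping the values in named vectors avoids transcription slips and makes it easy to rerun the comparison if the tables change.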
Make Supplemental GBI plots
Make list of scaffolds to plot:
scaffolds_to_plot = "gw" .* string.(vcat(28:-1:17, 15:-1:1))
push!(scaffolds_to_plot, "gw1A", "gw4A") # add two other scaffolds
29-element Vector{String}:
"gw28"
"gw27"
"gw26"
"gw25"
"gw24"
"gw23"
"gw22"
"gw21"
"gw20"
"gw19"
"gw18"
"gw17"
"gw15"
⋮
"gw10"
"gw9"
"gw8"
"gw7"
"gw6"
"gw5"
"gw4"
"gw3"
"gw2"
"gw1"
"gw1A"
"gw4A"
Do other setup:
groups_to_plot_all = ["vir","vir_S","nit", "lud_PK", "lud_KS", "lud_central", "lud_Sath", "lud_ML","troch_west","troch_LN","troch_EM","obs","plumb_BJ","plumb","plumb_vir"]
group_colors_all = ["blue","turquoise1","grey","seagreen4","seagreen3","seagreen2","olivedrab3","olivedrab2","olivedrab1","yellow","gold","orange","pink","red","purple"];
groups = ["vir","troch_LN","plumb"] # for purpose of calculating pairwise Fst and Fst_group (to determine SNPs)
plotGroups = groups_to_plot_all
plotGroupColors = group_colors_all
group1 = "vir" # these groups will determine the color used in the graph
group2 = "plumb"
groupsToCompare = ["vir_plumb", "vir_troch_LN", "troch_LN_plumb"] # "Fst_among" #"vir_plumb"
missingFractionAllowed = 0.2 # only show SNPs with less than this fraction of missing data among individuals
# Calculate allele freqs and sample sizes (use column Fst_group)
freqs, sampleSizes = getFreqsAndSampleSizes(genosOnly_included, ind_with_metadata_indFiltered.Fst_group, groups)
println("Calculated population allele frequencies and sample sizes")
Fst, FstNumerator, FstDenominator, pairwiseNamesFst = getFst(freqs, sampleSizes, groups; among=true) # set among to FALSE if no among Fst wanted (some things won't work without it)
println("Calculated Fst values")
Calculated population allele frequencies and sample sizes
Calculated Fst values
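The settings above imply that the genotype-by-individual plots show only SNPs that pass two filters: an Fst value above a cutoff and a missing-data fraction below `missingFractionAllowed`. The plotting function that applies these filters is not shown on this page, so the sketch below is only a guess at the general idea on toy data. The matrix `genosToy`, the vector `FstToy`, the use of `-1` to mark a missing genotype, and the strict inequalities are all assumptions.

```julia
# Toy data (made up): rows = individuals, columns = SNPs; -1 marks a missing genotype (assumed coding)
genosToy = [0 2 -1 1;
            0 2 -1 1;
            2 0  1 1;
            2 0  2 1;
            2 0  2 1]
FstToy = [0.95, 0.85, 0.90, 0.10]   # one Fst value per SNP for a single group pair (made up)

Fst_cutoff_toy = 0.8
missingAllowed_toy = 0.2

# Fraction of individuals with a missing genotype at each SNP
missingFraction = vec(sum(genosToy .== -1, dims=1)) ./ size(genosToy, 1)

# Keep SNPs that are strongly differentiated and well genotyped
keep = (FstToy .> Fst_cutoff_toy) .& (missingFraction .< missingAllowed_toy)

genosToPlot = genosToy[:, keep]
println("SNPs kept: ", findall(keep))
```

Here the third SNP is dropped for missing data (2 of 5 individuals) and the fourth for low Fst, leaving the first two columns.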
Loop through scaffolds and make plots, adding a line for the LHBRs:
(Making inactive because plots were already made)
# for autosomes, Fst > 0.8
Fst_cutoff = 0.8
for i in 1:length(scaffolds_to_plot)
    chr = scaffolds_to_plot[i]
    regionInfo = chooseChrRegion(pos_SNP_filtered, chr; positionMin=1, positionMax=scaffold_lengths[chr]) # this gets the maximum position for the chromosome
    # get info for lines for LHBRs
    highViSHetRegions_thisScaffold = highViSHetRegions[highViSHetRegions.regionChrom .== chr, :]
    plotInfo = plotGenotypeByIndividualWithFst(groupsToCompare, Fst_cutoff,
        missingFractionAllowed, regionInfo,
        pos_SNP_filtered, Fst, pairwiseNamesFst,
        genosOnly_included, ind_with_metadata_indFiltered, freqs,
        plotGroups, plotGroupColors;
        indFontSize=4, figureSize=(1200,1600), plotTitle = "",
        highlightRegionStarts = highViSHetRegions_thisScaffold.regionStart,
        highlightRegionEnds = highViSHetRegions_thisScaffold.regionEnd,
        highlightRegionColor = "magenta")
    println("Completed the figure for ", chr, ".")
    if false # set to true to save plot
        filename = string("Figure_", chr, "_Fst3groupsGBIallInds_Fst",Fst_cutoff,"_fromJulia.png")
        save(filename, plotInfo[1], px_per_unit = 2.0)
        println("Saved ", filename)
    end
end
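Two small pieces of the loop above can be tried in isolation: looking up the highlight regions for one scaffold, and building the output filename. The sketch below does both with plain vectors instead of a DataFrame. The table holds only the two gw3 rows printed earlier on this page, and the helper `regionsOn` is hypothetical.

```julia
# The two gw3 regions printed earlier on this page, stored as plain vectors
regionChrom = ["gw3", "gw3"]
regionStart = [101192949, 104554714]
regionEnd   = [103495514, 108279595]

# Hypothetical helper: highlight starts and ends for one scaffold
function regionsOn(chr)
    rows = regionChrom .== chr
    return regionStart[rows], regionEnd[rows]
end

starts_gw3, ends_gw3 = regionsOn("gw3")
starts_gw5, ends_gw5 = regionsOn("gw5")   # no rows for gw5 in this toy table, so both are empty

# Filename built the same way as in the loop above
Fst_cutoff_toy = 0.8
filename_toy = string("Figure_", "gw3", "_Fst3groupsGBIallInds_Fst", Fst_cutoff_toy, "_fromJulia.png")

println(starts_gw3, " ", ends_gw3)
println(isempty(starts_gw5))
println(filename_toy)
```

A scaffold with no matching rows yields empty start and end vectors, so no highlight positions are passed on for it.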
Now for Z chromosome, with Fst > 0.9
scaffolds_to_plot = ["gwZ"]; Fst_cutoff = 0.9
for i in 1:length(scaffolds_to_plot)
    chr = scaffolds_to_plot[i]
    regionInfo = chooseChrRegion(pos_SNP_filtered, chr; positionMin=1, positionMax=scaffold_lengths[chr]) # this gets the maximum position for the chromosome
    # get info for lines for LHBRs
    highViSHetRegions_thisScaffold = highViSHetRegions[highViSHetRegions.regionChrom .== chr, :]
    plotInfo = plotGenotypeByIndividualWithFst(groupsToCompare, Fst_cutoff,
        missingFractionAllowed, regionInfo,
        pos_SNP_filtered, Fst, pairwiseNamesFst,
        genosOnly_included, ind_with_metadata_indFiltered, freqs,
        plotGroups, plotGroupColors;
        indFontSize=4, figureSize=(1200,1600), plotTitle = "",
        highlightRegionStarts = highViSHetRegions_thisScaffold.regionStart,
        highlightRegionEnds = highViSHetRegions_thisScaffold.regionEnd,
        highlightRegionColor = "magenta")
    println("Completed the figure for ", chr, ".")
    if false # set to true to save plot
        filename = string("Figure_", chr, "_Fst3groupsGBIallInds_fromJulia.png")
        save(filename, plotInfo[1], px_per_unit = 2.0)
        println("Saved ", filename)
    end
end
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/Y3ABD/src/scenes.jl:238
Completed the figure for gwZ.