
DNA Analysis, Part 3 - The Initial Dataset

Geoff Greenwood

Updated: Feb 18, 2022

The section Ancestry have created for Elder from their DNA matches notes that there are "445 4th cousins or closer". This is the approximate number of individuals who have taken Ancestry's test and share 20cM or more of DNA with Elder (the centimorgan, cM, is a unit of genetic distance used to measure the length of shared DNA segments).

names removed for privacy

4th cousins share 3x great-grandparents (3GGs), of whom Elder most likely has 16 pairs. Cousins intermarrying could reduce this number, but that is rare within so few generations, so we will work with the figure of 16. It would be nice if the law of averages applied perfectly and the 445 "4th cousins or closer" split evenly into 16 groups of around 28. This won't happen (and cousins closer than 4th will belong to more than one of those groups), but it gives a general expectation of where these cousins will fit as we analyse them.
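For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The variable names are my own for illustration; the figures (445 matches, 16 pairs) come from the paragraph above, and the even split is only the idealised expectation, not something the real data will follow.

```python
# Idealised expectation only: real matches will not split evenly.
GENERATIONS_BACK = 5                  # 3x great-grandparents sit five generations above Elder
ancestors = 2 ** GENERATIONS_BACK     # individuals in that generation, assuming no cousin marriages
couples = ancestors // 2              # ancestral pairs
matches = 445                         # Ancestry's "4th cousins or closer" figure

per_couple = matches / couples
print(f"{ancestors} ancestors, {couples} couples, about {round(per_couple)} matches per couple")
```

Any cousin intermarriage would lower `couples` and raise the expected size of each group accordingly.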


A great feature of Shared Matches is that, for each match, it lists which of your other matches also share DNA with that person. This enables the creation of preliminary clusters. Again, this is not a hard-and-fast rule, as two cousins who match each other can each share DNA with you through completely different lines, but cases like that are uncommon enough that they are easy to spot.
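If you wanted to automate this preliminary clustering, one simple approach (not necessarily what any particular tool does) is to treat each shared-match link as an edge in a graph and take the connected groups as clusters. The sketch below assumes you have already recorded, by hand or otherwise, which matches appear in each other's Shared Matches; the function name and the "Match A" style labels are placeholders, not real data.

```python
from collections import defaultdict


def cluster_matches(shared):
    """Group matches into preliminary clusters (connected components).

    `shared` maps each match to the set of other matches that appear
    in their Shared Matches list.
    """
    # Build a symmetric graph, so a link recorded in one direction counts both ways.
    graph = defaultdict(set)
    for person, others in shared.items():
        graph[person]  # make sure matches with no shared matches still appear
        for other in others:
            graph[person].add(other)
            graph[other].add(person)

    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Walk outwards from this match, collecting everyone reachable.
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(sorted(component))
    return clusters


# Hypothetical example data, for illustration only.
shared = {
    "Match A": {"Match B", "Match C"},
    "Match B": {"Match A"},
    "Match C": {"Match A"},
    "Match D": {"Match E"},
    "Match E": {"Match D"},
    "Match F": set(),
}
clusters = cluster_matches(shared)
print(clusters)
```

Bear in mind that this approach will merge two otherwise separate groups if a single match links them, which is exactly the uncommon case described above, so oversized clusters are worth a manual look.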


Ancestry sadly doesn't do much more of this work for you. Software tools exist that will, but it isn't too much work to do yourself. A huge advantage of doing it manually is that you can take note of individuals, details they've mentioned on their pages, the names of their trees, and names and locations associated with them as you compile your dataset. These little pieces of information, when combined into a whole, can really help later on when trying to make breakthroughs.


So, to create a dataset, I click "View All DNA Matches". This populates a list of Ancestry users ordered from the most shared DNA (in Elder's case 302cM, likely a 2nd cousin) to the least. I'm only interested in matches down to 20cM at the moment, so scrolling for a couple of minutes to populate the page gets me to that point. The list continues down to 8cM, which is Ancestry's cut-off for reliability; anything beneath that is considered a coincidental match. Other genetic genealogists' analyses put the point at which around 99% of matches are genuinely related at about 16cM, with the proportion falling below 50% at around 8cM.
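Once the matches are in a list or spreadsheet, applying the 20cM cut-off is a simple filter. Here is a small sketch in Python; apart from the 302cM top match mentioned above, the names and values are invented for illustration, and `select_matches` is a hypothetical helper rather than part of any Ancestry tool.

```python
THRESHOLD_CM = 20  # the cut-off used in this post


def select_matches(matches, threshold_cm=THRESHOLD_CM):
    """Keep matches at or above the threshold, largest shared cM first."""
    kept = [m for m in matches if m[1] >= threshold_cm]
    return sorted(kept, key=lambda m: m[1], reverse=True)


# (name, shared cM) pairs; illustrative values only.
matches = [
    ("Match C", 20),
    ("Match A", 302),
    ("Match E", 8),
    ("Match B", 45),
    ("Match D", 19),
]
selected = select_matches(matches)
print(selected)
```

Lowering `threshold_cm` later (to 16 or 8, say) would bring in the less reliable matches without changing anything else.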


One thing Ancestry doesn't do is tell you which specific DNA segments you share with a match. After our Ancestry analysis, I'll upload Elder's DNA data to sites that do (MyHeritage and GEDmatch) and use the information they provide to map out the ancestry of Elder's chromosomes, which will in turn enable me to link DNA matches on those sites to the correct areas of Elder's tree. The ultimate purpose of all this cousin-linking is to biologically verify generations of ancestors where records could prove uncertain.


names removed for privacy

It turns out the total number of 20cM+ matches on Ancestry is 475, rather than the approximate 445 quoted earlier. Highlighting the webpage and copying the displayed information into a spreadsheet results in a bit of a jumble. It's possible (as spreadsheets can manipulate data in almost any way you need) to write a series of formulae to distil this huge column of data into a useful dataset. Fortunately, I don't have to describe how to go about doing that, as one Greg Clarke has created a fantastic Google Sheet that takes care of it completely. It can be found here.


Following the instructions there, I have my dataset in no time at all. Thanks Greg! I'm using my own Google Sheet for ease of sharing and because I don't need the processing power of Excel for this simple information, but Excel would work just as well if I wanted to use it.

