Monday, December 14, 2009

255. The Baseline Scenarios -- 31: The Migrants -- Geography

I've been reminded in comments by both German and Maju that representatives of haplogroups M and N are found along both the southern and northern routes, thus apparently negating the possibility of an association between these two major non-African mtDNA haplogroups and the two major OoA pathways postulated by Cavalli-Sforza and others.

Looking more closely at Maju's tree, however (see Post 253), I see that M-derived haplogroups are found exclusively along the southern route [assuming East Asia was initially populated via Southeast Asia, as suggested in a recent publication]. It would appear to be only the N-derived haplogroups that are found along both routes. This suggests the following possibility: 1. an initial migration out of Africa by a group dominated by M haplotypes, whose descendants followed the southern coast of Asia pretty consistently from Yemen to Southeast Asia, Sundaland and the Sahul, with at least one subgroup pressing northward along the Pacific coast toward northeast Asia, Japan, Siberia and ultimately the Americas; 2. a second out of Africa migration by a group dominated by N haplotypes, which could have begun along the southerly route, but at some point bifurcated, possibly at what would have been the first major water obstacle, the Indus River Delta, with one group forging on along the southern route, and another continuing north along the west coast of the Indus to ultimately populate western India, Europe and Central Asia. Since Maju put this tree together, I'm wondering what he thinks of such a possibility.

Another possibility is a single exodus from Africa of groups dominated by L3 haplotypes, which bifurcated into M and N subgroups while in Asia, possibly at the same point, the Indus Delta. It's not clear, however, why all or most of the M groups would have continued along the southern route, with only the N groups bifurcating into northern and southern clades. This leads me to believe that a dual exodus theory could make more sense. I'm wondering also whether the Y chromosome and autosomal evidence might either clarify or muddy the mtDNA picture.

I'm sorry to make such a fuss over what might seem like a purely technical issue that might never get resolved, but it's important to my project that we be able to determine whether one or two (or possibly more) groups migrated out of Africa during this very early period of human history. I believe the cultural evidence could provide us with some clues, but it would be really helpful to know more clearly what lies behind the sometimes puzzling and contradictory genetic evidence.

18 comments:

Maju said...

I'm pretty sure that all haplogroup distributions, and especially their radiations, correspond to a southern route. There is only one possible exception among the top tier clades, which is haplogroup A. This lineage of East Siberia and America has a stem long enough to suggest a long distance founder effect at its formation. All other top tier lineages quite clearly have southern radiation centers.

Hence I am quite against the existence of any "northern route" for Homo sapiens until a second moment, that is probably at the early UP or MP-UP transition, when some rather derived lineages got criss-crossed a bit through that route (Y-DNA Q and N, mtDNA X2, CZ, D).

However some authors like to argue for it, even if they never seem to put forward any solid logic, so I had to mention their position in order to be balanced in what I say. But I don't really believe it's correct.

"Since Maju put this tree together, I'm wondering what he thinks of such a possibility."

I elaborated on what are, in my opinion, the likely spread patterns in a post later than the one you took that tree from.

While I acknowledge a certain amateurism in my reconstruction (and some doubts on the mutation count, see the previous post's comments) I still think that it is a good approximation to what may have actually happened. In brief:

1. Out of Africa (L3).
2. Coalescence of M and N (and possibly other lineages now vanished) in the route to India.
3. Explosion of M (producing the largest starlike tree ever, which means a very fast expansion), most likely in India but with a clear tendency to colonize SE Asia immediately after.
4. Likely explosion of N in SE Asia together with some secondary M expansion, maybe already reaching Japan.
5. Further M and N derived expansion in South, SE and East Asia, as well as the first colonization of Sahul.
6. Explosion of R in South Asia.
7. Further expansion of (mostly) R-derived lineages in South and SE Asia, as well as into Papua (P).
8. End of the interaction between South and East Asia, as well as with Sundaland and Sahul (probably caused by the mere leveling of demic pressure). Expansion into the more challenging regions of West Eurasia and NE Asia.

There may be some fine tuning to do to this but I understand that the general process must be pretty much like I described.

So southern route for sure, IMO.

Maju said...

I would add that Eurasian Y-DNA also seems to present a similar structure:

Top level:

- D and probably C expanded from SE Asia, like mtDNA N
- F expanded from South Asia, like mtDNA M

However at more recent stages the simple parallel breaks up. While Y-DNA D and C are not too difficult to understand, F sublineages are. F splits into:

F1, F4 and H - South Asia
F2 - East Asia
F3 - nobody seems to know
G - West Eurasia
IJK - complex, very widespread

IJK is a total mess. It breaks down into IJ (West) and K. As of a few weeks ago, K is split into two rather western lineages (L and T) and another labyrinthine one: MNOPS, which seems to have a SE Asian center of gravity (M and S correspond to Sahul, NO to East Asia, with a likely coalescence area in Southern China, and only P partially to the West).

A possibility could be that P (including Q and R) migrated westwards by that northern corridor but if so has left no clear traces in its presumable Eastern area of origin. The other possibility is that it expanded via South Asia, where there are some indications of a possible origin, but with a Central Asian tendency in its spread anyhow. I favor this second option, as I figure that it fits with the MP-UP transition in all the area and a possible Altaian origin for Q (found in Siberia and America).

But whatever the case, the southern route also seems confirmed for the initial Y-DNA expansion in Eurasia.

German Dziebel said...

"It's not clear, however, why all or most of the M groups would have continued along the southern route, with only the N groups bifurcating into northern and southern clades."

Ancient DNA from British Columbia (~ 5K) showed both basal sequences belonging to haplogroup M (China Lake) and sequences belonging to haplogroup A (Big Bar Lake). So M and N lineages are intertwined in the north as well as in the south.
Also, N lineages bifurcate into Western (U, etc.) and Eastern (A, X, B, etc.) clades, rather than into Northern and Southern clades.

Finally, the chart you pulled from Luis's website shows M1 as "closer" to African L3 lineages and as somewhat of an outgroup for non-African M, whereas in reality M1 is a subset of M found in the Caucasus, North and East Africa and representing a back migration of M lineages into Africa.

German Dziebel said...

"I figure that it fits with the MP-UP transition in all the area and a possible Altaian origin for Q (found in Siberia and America)."

A subclade of Y-DNA Q is also found in India. The same for mtDNA D lineages. In both cases, those were surprising finds.

"- D and probably C expanded from SE Asia, like mtDNA N
- F expanded from South Asia, like mtDNA M"

Agree that there's a close correspondence between mtDNA and Y-DNA tree topologies.

"But whatever the case, the southern route also seems confirmed for the initial Y-DNA expansion in Eurasia."

All these southern areas (SE Asia, South India, West Asia) are sites of major population growth, accretion, diversification and expansion in my theory. All the "northern" areas are spread zones for the original small Amerindian clades. Higher diversity doesn't automatically mean greater age. Lower intragroup diversity values can come from isolation with genetic drift.

But if I were looking at things from an out of Africa angle, I would also choose a single migration down a southern route.

Maju said...

Victor, you might be interested in reading this paper (PDF) I just located, on new refined molecular clock estimates for mtDNA. Most relevant may be fig. 6 with a more or less complete phylogenetic tree with their age estimates.

However I must state that one of the foundations of the dating method is the divergence ages of hominids, and these are, in my opinion, too recent. Notably they argue for a Pan-Homo split "only" 6.5 Mya, whereas I do not easily accept any date more recent than 8 Mya (maybe it should be as old as 10 Mya).

The tree also has some other inconsistencies, as the N node appears to be slightly older than L3, what is a nonsense. Also regional diversity is not fully clear, because they don't reflect all high level sublineages.

Maju said...

"A subclade of Y-DNA Q is also found in India."

That's one of the reasons why I think that P (Q and R) may have a rather "western" origin, probably in South Asia. However there are tiny amounts of P(xQ,R) detected in Native Americans (from NW North America), that have been suggested to be related to P(xQ) in Mongolia. Nevertheless, I have never seen this research line satisfactorily concluded.

One of the reasons is that geneticists often test for either Q or R (or a subclade of these), but almost never for both. So you get results with some P(xQ) or P(xR1) that fail to address what exactly that paragroup is.

"But if I were looking at things from an out of Africa angle, I would also choose a single migration down a southern route."

Fair enough.

DocG said...

Maju: "Hence I am quite against the existence of any "northern route" for Homo sapiens until a second moment, . . ."

As I understand it, the northern route from, say, the Indus delta to the Caucasus and Europe would have been closed for many thousands of years due to climatic conditions. But that doesn't mean that a colony or colonies of HMP could not have been in place somewhere on the Indus during all that time, and then, when conditions improved, made its way north and west (also east to Central Asia as well). This is more or less the scenario presented by Stephen Oppenheimer and it does make sense. Could this be your "second moment"?

DocG said...

Maju: "But whatever the case, the southern route also seems confirmed for the initial Y-DNA expansion in Eurasia."

Thanks for sharing your thoughts, Maju, and also summarizing all this complicated data. I'm going to withhold an opinion till I've had a chance to think through the cultural evidence more completely.

DocG said...

Maju: "Victor, you might be interested in reading this paper (PDF) I just located, on new refined molecular clock estimates for mtDNA. Most relevant may be fig. 6 with a more or less complete phylogenetic tree with their age estimates."

Thanks so much for the tip. I just read through this paper (with Oppenheimer as one of the authors!) and found it extremely interesting. What do you make of the finding that M in SE Asia appears to be earlier than M in India? While they discount the importance of Toba, they nevertheless suggest that some sort of bottleneck event could be responsible:
"Alternatively, if M dispersed with N and R through South Asia, M may have been caught up in a subsequent bottleneck and founder effect so that its age signals the time of re-expansion rather than first arrival" (p. 752).

Maju said...

Victor:

As I see it, it seems a matter of climatic adaptation rather than absolute impossibility. Humankind is a tropically-adapted species and going beyond the subtropical belt was probably too challenging (and less interesting when there were so many other places to go). It's easy to appreciate that challenge, and the feat that overcoming it was, now that winter is falling upon us, right? In any case, some adaptations were needed, notably good clothes and tents. The earliest known needle (a humble but surely critical advance) is from Kostenki (early UP East Europe) and I presume that the peoples who moved into Altai, NE Asia and Europe needed that kind of technology before being able to accomplish that "unnatural" feat, going well beyond the range of H. erectus (at least in Asia).

As far as I know they were at the Indus all the time since OoA: whether by the South Arabian or the Fertile Crescent route, they would have ended up there rather soon, precisely when our genetic reconstruction of the fast Eurasian expansion begins. We don't have any evidence for what happened between the Upper Nile and South Asia, but it is certain that when the expansion began (leaving genetic evidence) they were already in South and SE Asia.

I'm not sure if my "second moment" corresponds with what Oppenheimer says because right now I can't recall what he says, sorry. I suspect he claimed a northern and a southern route in parallel but I see zero evidence for that. Notice that new data is found almost every other week and discussion on all that also goes on at all levels. Hence the picture varies somewhat over time.

I also used to believe (rather naively) some years ago that mtDNA N and M meant two entirely different migrations, and I also accepted that vague idea of a "northern route" suggested in a few classical population genetics materials (including National Geographic's influential Genographic Project). But the more I know, the less I can accept that: it makes no sense.

But you need to know in order to judge and have your own informed opinion. I can give you information and I can give you my opinion, but I cannot form your opinion for you: that's obviously your job.

Whatever the case, in my interpretation, the gene flow through that route is relatively limited, affecting almost exclusively the cold North of the continent (and by extension America). In Y-DNA terms it comes to be: Q to the East through a semi-erased Siberian trail and N to the West through a probably more fresh Arctic path. For mtDNA the northern trail is thinner: some X2 to the East (found in Altai and North America) and some CZ, D and to a lesser extent G to the West.

Maju said...

"What do you make of the finding that M in SE Asia appears to be earlier than M in India?"

Two things:

1. I don't really take the current age estimates seriously because they need to make way too many assumptions. So far age estimation from genetic data is nothing more than an erudite hunch. That's my opinion, but it's clear that it's not a C-14 date nor anything of the like. When done properly, anyhow, the confidence intervals are always very large (but often they don't even mention such error margins... or cut them at whim).

2. The differential apparent dates have much more to do with the fact that they represent a regional average, hence the region where M lineages expanded for longer (South Asia, where M in many variants is very much dominant) looks like having a younger M. It's an artifact of the regionalization of the analysis. There is only one MRCA for M and that is the M node, which is the same for all sublineages. Again my opinion anyhow.

There are many uncertainties in this issue of dating, including some fundamentals, like the correct Pan-Homo split age (this paper uses 6.5 My, when I think it should be at least 8 My).

Take all genetic age estimates with a pinch of salt.
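
To put a rough number on the calibration point: here is a minimal sketch, assuming a strict, linear molecular clock in which every age scales in direct proportion to the calibration date. That is a simplification for illustration only (published methods may apply further corrections), and the function name and the 65,000-year input are hypothetical, not taken from the paper.

```python
# Toy illustration: how moving the Pan-Homo calibration date rescales an age.
# ASSUMPTION: a strict linear clock, so ages scale proportionally with the
# calibration date. Real dating methods are more involved than this.

def rescale_age(age_years: float, split_used_mya: float, split_alt_mya: float) -> float:
    """Rescale an age estimate when the calibration split date is changed."""
    return age_years * (split_alt_mya / split_used_mya)

# Hypothetical estimate of 65,000 years obtained with a 6.5 Mya split:
for alt_split in (8.0, 10.0):
    rescaled = rescale_age(65_000, split_used_mya=6.5, split_alt_mya=alt_split)
    print(f"split at {alt_split} Mya -> about {round(rescaled):,} years")
```

Under that assumption, moving the split from 6.5 to 8 Mya makes every estimate about 23% older, and moving it to 10 Mya makes it about 54% older.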

German Dziebel said...

"The tree also has some other inconsistencies, as the N node appears to be slightly older than L3, what is a nonsense."

I'm not surprised, especially since L3 is closer to N than to other African Ls. In this case, we have a close mitochondrial parallel to the migration of Y-DNA DE into Africa. This is consistent with your own correlation between Y-DNA CDE and mtDNA N.

"I'm going to withhold an opinion till I've had a chance to think through the cultural evidence more completely."

A northern route seems to be a possibility for a back migration to Africa at 40-45K, for which archaeological and genetic evidence abounds. I hope you, Victor, will give a fair assessment of the cultural correlates of this migration.

Maju said...

It's an artifact of average sampling. All lineages appear younger where they dominate the landscape: M in South Asia and N and R in West Eurasia and East Asia. But that's only an artifact of having more lineages to pick up the two longest ones. They have not measured the age of M in West Eurasia but I'm sure it'd look older too, just because there is little of it around and most of its detectable evolution happened at the early stages of Eurasian expansion, and not more recently.

That's what I understand. I don't think you can make age estimates of regional subgroups of a phylogenetic clade. The only possible age estimates, if any, are for the whole lineage or the sublineages but not for these confusing regional groupings.
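
A minimal sketch of the regional-grouping point, assuming the age of a node is derived from the average number of mutations separating the sampled sequences from that node. That is an assumption about the method, and the regions and mutation counts below are invented purely for illustration.

```python
from statistics import mean

# HYPOTHETICAL toy sample: (region, mutations counted from the shared root node).
# All sequences descend from one and the same root; the numbers are invented.
SAMPLES = [
    ("region_A", 9), ("region_A", 10), ("region_A", 11),
    ("region_B", 12), ("region_B", 14),
]

def average_distance(samples, region=None):
    """Average mutation count from the root, optionally for one region only."""
    counts = [n for r, n in samples if region is None or r == region]
    return mean(counts)

print("whole clade:", average_distance(SAMPLES))
print("region_A   :", average_distance(SAMPLES, "region_A"))
print("region_B   :", average_distance(SAMPLES, "region_B"))
```

The root is the same for every sequence, yet each regional subset yields a different average and therefore a different apparent "age". Whether that difference is signal or artifact is exactly the question being argued here.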

Anyhow there are other questions, such as a very marked difference in the lengths of the different branches. Some clades like R1 have something like 20 mutations from the R node, while H sublineages have only on the order of 4-5. That's why H appears to be younger (I also think it's an artifact of a poor method and poor understanding).

This issue of differential lengths was noticed in the early 2000s when studying some L2 Afro-Americans from the Dominican Republic: with the whole mtDNA genome sequenced, the various lineages were very different in the number of mutations accumulated since the L2 root (will try to find the link later).

I don't think that the discipline of genetics has any clear solution for this issue other than randomness. Some have proposed adaptive selection but I think it has to do with population levels (and their consequences for drift), a factor that is normally ignored for mere convenience when dealing with historical population genetics but that must have played a most important role: at high population levels, many novel branches but low drift (so they remain private and ill-studied); at low population levels, fewer mutations but a real chance that some of them become fixed in the long run.

Notice that various studies find totally inconsistent dates in their very rough estimates. For example in yet another paper, H as a whole is found to be much more recent than at least one of its derived subclades, which is nonsense.

When you get used to the various age estimates, you learn to treat them with extreme caution.

I'll try to add the "bibliography" in another post to come. Cheers.

...

PS- Notice that one of the interesting news items from this paper, though hidden in the text, is the finding that M, N and two unnamed L3 subclades share an HVS polymorphism at locus 195 (page 750 near the end). This seems to mean that there was a particular sublineage of L3 (albeit ill-defined, as they only share a control region mutation) that was involved in the OoA.

I wonder which are those other two L3 sublineages.

Maju said...

References for the previous post:

- On the issue of very different parallel mutation rates: Torroni, 2001.

- On wildly incongruent age estimates: Roostalu 2006. Check table 1, where H11 is given an age of 43,900 years and H1 one of 22,600 years, in contrast to the usual estimates of H as a whole being 18,000 years old (see Achilli 2004) or even as recent as 12,000 years (mentioned in Soares '09 as an older estimate, but "refined" by them to 18,000 years as well).
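
The incongruence can be spelled out mechanically. This is a minimal sketch using only the three point estimates quoted above; it ignores confidence intervals and the fact that the figures come from different studies, and the helper name is made up for illustration.

```python
# Point estimates (years) exactly as quoted in the reference note above.
ESTIMATES = {"H": 18_000, "H1": 22_600, "H11": 43_900}

# Parent clade of each subclade: H1 and H11 are both sublineages of H.
PARENT = {"H1": "H", "H11": "H"}

def older_than_parent(estimates, parent):
    """Return the subclades whose estimated age exceeds that of their parent clade."""
    return sorted(c for c, p in parent.items() if estimates[c] > estimates[p])

print(older_than_parent(ESTIMATES, PARENT))
```

A subclade cannot be older than the clade it derives from, so any name this returns marks a pair of estimates that cannot both be right.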

Overall there are huge uncertainties surrounding these age estimates, but for some reason they have become a very popular pastime among geneticists and something that many "populist" genetic sites sell as hard fact, when they are nothing but conjecture.

DocG said...

Maju: "The differential apparent dates have much more to do with the fact that they represent a regional average, hence the region where M lineages expanded for longer (South Asia, where M in many variants is very much dominant) looks like having a younger M. It's an artifact of the regionalization of the analysis."

I find it difficult to believe they'd make such a fundamental blunder, especially since this would be the sort of thing it would be easy to correct for. It makes sense to me because it helps account for a very real gap between Africa and SE Asia as far as music is concerned, and also some other important cultural and even phenotypic differences, such as the distribution of tonal languages and "negrito" phenotypes. I've already written about the musical aspect in my "Echoes" essay and I'll probably be raising this issue again here in future posts.

German Dziebel said...

"The differential apparent dates have much more to do with the fact that they represent a regional average, hence the region where M lineages expanded for longer (South Asia, where M in many variants is very much dominant) looks like having a younger M. It's an artifact of the regionalization of the analysis."

You just hinted at a very important aspect of world phylogeography: longer branches are associated with a narrow geographic range in the spread of a haplogroup, while haplogroups and lineages with shorter branches tend to have a wide geographic distribution. The latest M16 from Madagascar is a case in point. Or Australian mtDNA S. Or Pygmy and Khoisan lineages. As genetic diversity is directly proportional to geographic distance, long-range haplogroups harbor lots of diversity, hence they must be older.

Maju said...

"I find it difficult to believe they'd make such a fundamental blunder..."

I don't. Genetic research is full of such half-baked stuff. It's a young science, you know.

"It makes sense to me because it helps account for a very real gap between Africa and SE Asia as far as music is concerned, and also some other important cultural and even phenotypic differences, such as the distribution of tonal languages and "negrito" phenotypes."

It seems quite risky to me to associate phenotypes with DNA too tightly. Negrito phenotypes, as well as other "Australoid" types, look to me like variants of the archaic Eurasian types, not "African".

But up to you.

"You just hinted at a very important aspect of world phylogeography: longer branches are associated with a narrow geographic range in the spread of a haplogroup, while haplogroups and lineages with shorter branches tend to have a wide geographic distribution."

It may be the case. Anyhow it's just my tentative explanation. I am not at all satisfied by the usual age estimate methods: I strongly believe that they are missing something important. Phylogenies make sense; age estimates, more often than not, do not.

For instance, take the common European haplogroup H, which has been found in pre-Neolithic fossil remains from Morocco (it's also common in North Africa) and Portugal, in both cases as dominant as it is now. By the usual age estimates it should not have spread before 18,000 BP at the earliest, but that means a Magdalenian or Epipaleolithic spread (some would even argue for a Neolithic spread), whose archaeology is quite strictly circumscribed to parts of Europe. You can relate the Oranian (or Iberomaurusian) culture of Taforalt (and much of North Africa) to the Iberian Gravetto-Solutrean (and there is quite clear evidence that North African H is Iberian-derived)... but you can't make any connection with Magdalenian or later cultures, at least until the Neolithic.

So in this case at least the age estimates make no sense: H must be at least of Gravettian age in Europe and that means an overall age (it probably spread from West Asia) of more than 30,000 years, almost double the current mainstream estimates.

This can only be explained because the age estimate methods are flawed in one or several aspects. And, well, if you study these methods a bit, you realize that they are little more than a bunch of rough assumptions embedded in a more or less elegant equation.

German Dziebel said...

"This can only be explained because the age estimate methods are flawed in one or several aspects. And, well, if you study these methods a bit, you realize that they are little more than a bunch of rough assumptions embedded in a more or less elegant equation."

I agree. It came to my attention twice: first, when Tad Schurr was presenting dates for the diversification of Asian-Amerindian mtDNA haplogroups at Clovis and Beyond in 1999 (all dates were older in the Americas, which he claimed must be due to sample errors) and then in 2004, when Zhivotovsky was presenting his novel methods at Stanford workshops with Feldman, Mountain and Underhill. It's still an infant science, indeed. Great raw data, rather tentative methods of handling it.