Thursday, January 7, 2010

277. The Baseline Scenarios -- 53: Questions

Before continuing with my southern route scenario, I want to go over some of the genetic evidence I've been considering recently, because there are questions, some raised by comments from Maju and German, others that came to mind as I was posting. I've been testing one particular hypothesis against various types of evidence, emphasizing aspects of that evidence that seem to support the hypothesis -- but, as I've been reminded by my colleagues, I may have been ignoring aspects of the same evidence that don't fit the hypothesis.

Let's begin with the map for tribal India presented in Post 275, which shows a distribution for haplogroup M (largely in eastern India, mostly absent in the west) remarkably similar to the distribution mapped by Stephen Oppenheimer, in support of his Toba theory. Since posting that map, I've been bothered by the fact that the populations in question also reveal what looks like a high degree of haplotype diversity, as expressed by the many divisions we see within each pie chart. As the usual sign of a population bottleneck is relatively low diversity, the question arises of whether this evidence might actually be in conflict with the theory I've been considering, the notion that Toba or some other disaster precipitated a major bottleneck in this area shortly after the initial migration.
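To put a number on what "many divisions within a pie chart" means, here is a minimal sketch of one common summary statistic, Nei's haplotype diversity, H = n/(n-1) * (1 - sum of squared haplotype frequencies). The two samples and the labels `h1` to `h7` are invented for illustration; they are not taken from the map in Post 275 or from any published data set.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity: H = n/(n-1) * (1 - sum(p_i ** 2)).

    `haplotypes` is a list with one haplotype label per sampled individual.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    counts = Counter(haplotypes)
    # Sum of squared relative frequencies of each distinct haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical sample resembling a pie chart with many slices (20 individuals).
many_slices = (["h1"] * 4 + ["h2"] * 3 + ["h3"] * 3 + ["h4"] * 2
               + ["h5"] * 2 + ["h6"] * 3 + ["h7"] * 3)

# Hypothetical sample dominated by a single haplotype (20 individuals).
few_slices = ["h1"] * 18 + ["h2"] * 2

print(round(haplotype_diversity(many_slices), 3))  # 0.895
print(round(haplotype_diversity(few_slices), 3))   # 0.189
```

A value near 1 corresponds to a pie chart cut into many similar slices, and a value near 0 to a chart dominated by one slice. The statistic describes diversity as it stands today and says nothing about when that diversity accumulated.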

I've given this some thought, and realize that this evidence is not as straightforward as it might seem. When we look into the matter more deeply, it becomes clear that the whole question of "bottleneck evidence" with respect to an event such as Toba is problematic. The first thing to understand is that a major bottleneck undoubtedly occurred when the relatively small HMP group first left Africa, so, to be consistent with the Out of Africa model, the many different haplotypes we see among tribal peoples in India, or anywhere else in Asia, must be understood as having accumulated since that bottleneck.

Once this is understood, we are left with the problem of looking for evidence of yet another bottleneck that may have occurred only a short time after. I remember discussing this problem a few years ago with Floyd Reed, Sarah Tishkoff's principal statistician. According to Floyd, it would be almost impossible to detect a second bottleneck by the usual methods if it had occurred only a few thousand years after the first. Evidence of the initial bottleneck would obscure evidence of a subsequent one. If that's the case, then efforts to detect a second bottleneck using haplotype diversity as a signal may be beside the point. Diversity would already have been low after the first bottleneck -- and any diversity we now see must have accumulated in the many tens of thousands of years since then.

Thus, the sort of evidence presented by Oppenheimer, based on distribution pattern rather than genetic diversity, may be our best bet in assessing the fit of the genetic evidence with a possible post-exodus bottleneck. In other words, it looks as though this case is going to have to be settled on the basis of circumstantial evidence -- there were no eye-witnesses and no DNA left at the scene. :-)

With respect to evidence presented in the previous post, especially the map (Map B) I singled out in Figure 4 of the paper by Graham Coop et al., I've been accused by Maju of "only looking at the data in ways that are convenient for your hypothesis." He is bothered, for example, by the fact that I ignored the other two maps in Figure 4, A and C, which showed distributions very different from B, containing no gaps and with derived allele branches in East Asia. Was I focusing only on the evidence that suited me and ignoring what didn't -- what is known as "cherry-picking"? I don't think so, but I realize now that I should have explained myself rather than taking it for granted that others would understand.

Right now particle physicists are focused on the Large Hadron Collider in Switzerland, which will soon be smashing protons together in a search for something called the Higgs boson. The computers will be programmed to collect and evaluate data on billions, possibly trillions, of collisions, but the vast majority of those collisions will be ignored. Not because the physicists are "only looking at the data in ways that are convenient for [their] hypothesis," but because any sign anywhere of the Higgs boson is what they are after. And if they find such a sign, they will continue looking for other, similar signs to confirm the first one, and make sure it isn't an artifact. In my own, far more modest (and inexpensive) way, this is what I am doing.

Coop et al. could have published a thousand maps, but the one I'd be zeroing in on would be the one containing the gap I'm looking for, for exactly the same reason that the physicists ignore all the collisions that seem irrelevant to their research. The maps in question show the worldwide distribution of "ancestral" and "derived" alleles of certain genes possibly associated with adaptation. Since there have been many migrations over many parts of the globe during many periods in human history, through which various genes may have spread, it stands to reason that not every such map will be relevant to the hypothesis I'm testing.

But if even one such map reveals a gap in precisely the same place as the other gaps I've been considering, that counts as supporting evidence for the hypothesis I'm testing, and I must take note of it. The existence of such a map does not constitute proof, but it is evidence of the same kind that a particle collision showing certain characteristics would be for the existence of the Higgs. In both cases, what is being sought is something highly distinctive and precisely defined. If it turns up anywhere in the evidence, that must be taken seriously. In other words, we are not in this case taking a vote of particles or genes. We are looking for a needle in a haystack made of particles or genes, which is a very different thing.

What makes the distribution in question so distinctive is the presence of a significant gap in a particular region of the world: South Asia. But actually, any large gap at all in a map representing a migration should draw attention to itself, because migrations are continuous, not gapped -- unless the participants can fly. A migration from Africa to East Asia should not have a hole in it covering the entirety of Arabia, Iran, Pakistan and India.

Let's take another look at Figure 4:


It's important to understand that each of the above maps represents two different migrations, by which two different alleles of the same gene were spread. Thus when we examine map B, we see, in red, a map of the migration of the derived allele of SLC24A5, superimposed on a map representing the much earlier migration of the ancestral allele, in blue. Note that this earlier migration, the one shown in blue, is the only one displayed on any of the three maps that is discontinuous. It apparently begins in Africa and winds up in East Asia and Melanesia, with the greatest concentration in Southeast Asia. But there is a huge gap in between where we see no blue at all. How can this be?

(to be continued . . .)
