Uncommon Descent


15 December 2010

New Peer-Reviewed Pro-ID Paper in BIO-COMPLEXITY

William Dembski

A Vivisection of the ev Computer Organism: Identifying Sources of Active Information

George Montañez, Winston Ewert, William Dembski, Robert Marks

 

Abstract

ev is an evolutionary search algorithm proposed to simulate biological evolution. As such, researchers have claimed that it demonstrates that a blind, unguided search is able to generate new information. However, analysis shows that any non-trivial computer search needs to exploit one or more sources of knowledge to make the search successful. Search algorithms mine active information from these resources, with some search algorithms performing better than others. We illustrate these principles in the analysis of ev. The sources of knowledge in ev include a Hamming oracle and a perceptron structure that predisposes the search towards its target. The original ev uses these resources in an evolutionary algorithm. Although the evolutionary algorithm finds the target, we demonstrate a simple stochastic hill climbing algorithm uses the resources more efficiently.

Full Text: PDF
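
The abstract's closing claim, that a simple stochastic hill climber can exploit a Hamming oracle efficiently, can be illustrated with a toy sketch. This is not the paper's code and not ev itself: it searches a plain bit string directly, with no genome or perceptron in between, and the names `hamming_oracle` and `stochastic_hill_climb`, the 64-bit target length, and the query cap are all assumptions made for illustration.

```python
import random


def hamming_oracle(candidate, target):
    """Return how many positions of candidate differ from target."""
    return sum(c != t for c, t in zip(candidate, target))


def stochastic_hill_climb(target, rng, max_queries=100_000):
    """Flip one random bit per step; keep the flip only if the oracle improves."""
    n = len(target)
    current = [rng.randint(0, 1) for _ in range(n)]
    current_dist = hamming_oracle(current, target)
    queries = 1
    while current_dist > 0 and queries < max_queries:
        i = rng.randrange(n)
        current[i] ^= 1                      # propose a single-bit mutation
        new_dist = hamming_oracle(current, target)
        queries += 1
        if new_dist < current_dist:
            current_dist = new_dist          # accept the improvement
        else:
            current[i] ^= 1                  # reject: undo the flip
    return current, queries


rng = random.Random(0)                       # fixed seed for repeatability
target = [rng.randint(0, 1) for _ in range(64)]
found, queries = stochastic_hill_climb(target, rng)
print(found == target, queries)
```

The only feedback the search receives is the oracle's distance value. Remove that feedback and the loop degenerates into blind guessing over 2^64 possibilities.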

13 Responses

1

noam_ghish

12/15/2010

10:48 pm

Congrats on the published paper, Dembski rocks. Does anyone know the best ID book or essay that analyzes the fossil record and its problems?


2

noam_ghish

12/16/2010

10:19 am

This comes from a book that tries to prove Darwinism, The Making of the Fittest by Sean Carroll:

most animals are endowed with similar tool kits of body-building and organ-building genes (our phylum, the vertebrates, does have a larger number of these tool-kit genes because of some large-scale genome duplications). This tells us that the tool kit itself is ancient and must have been in place in a common ancestor before most types of modern animals bodies and body parts evolved.

Is he saying:

1. body-building genes were built before they were used or
2. body-building genes were used as soon as they were built and they are still being used today


3

DarelRex

12/16/2010

1:50 pm

Although this is certainly an interesting analysis of ev, I wonder whether ev's applicability to biological evolution is being challenged here at all, because: (a) evolutionists do not claim that evolution operates without a fitness function; they say that natural selection (i.e., the physical requirements this world places on the tasks of surviving and thriving) is the fitness function that evolution exploits. And (b) evolutionists, in all likelihood, do not claim that mutation-selection evolution uses the fitness function (natural selection) in any ultimately efficient way; they simply claim that it uses it.

It seems to me that to challenge ev's relevance to biological evolution, we would need to show that there is something specifically incorrect about how ev simulates the biological tasks it claims to, or that ev is tackling tasks with which even anti-evolutionists do not take issue (e.g., antibiotic resistance).


4

R0bb

12/16/2010

4:24 pm

Congratulations to the authors for publishing this. On glancing over it, it looks like great work, although I see a minor issue in the second column of page 3 and major issues in the paper’s conclusions.

The reasoning leading to Eq. 15 seems incorrect, although the results seem reasonable. You say, “Because there have been successful simulations, we know at least one of these can match the target sequence.” The pronoun these seems to refer to the 10^25 sequences with exactly 16 ones. But that doesn’t make sense, since those 10^25 sequences contain exactly one of each unique sequence.

On the other hand, in an earlier paragraph, you say that you “conservatively assume there is only one perceptron that generates an output matching a desired 16 bit target”. That is, you’re assuming that only one of the 10^157 genomic sequences maps to the target. This makes more sense, but under this assumption, your resultant probability is off by a factor of about 10^132.

In order to justify your calculated probability, what you need to assume is that the perceptron doesn’t favor some sequences with 16 ones over other sequences with 16 ones. This assumption certainly doesn’t hold exactly, but from scaled-down empirical tests, it seems to hold well enough for government work, especially for sequences with spread-out binding sites.

I’ll save my comments regarding the paper’s conclusions for another time.


5

LivingstoneMorford

12/16/2010

6:11 pm

Nice work, Dr. Dembski.
This deserves attention from my blog:
http://talkbio.blogspot.com/20.....under.html


6

GMont

12/16/2010

8:58 pm

R0bb,

Thank you for your interest in our work.

Our sentence “Because there have been successful simulations, we know at least one of these can match the target sequence” is referring to output vectors (of which there are 10^25 with exactly 16 ones), not input genomes. Since the mapping from input genome to output vector is complex (more than one genome possibly mapping to a given output vector), we analyze from the perspective of outputs. The sentence is simply saying that at least one of the 10^25 outputs matches the target sequence. They are each unique, so we could have more strongly stated that exactly one of the 10^25 vectors matches the target, but our point was simply to show that the target sequence was represented among those vectors.

Secondly, since we did not analytically demonstrate that there were no “16 ones” sequences that were favored over others (i.e., we did not further analyze bias within this set), our bound is an estimate, not an absolute bound. However, since ev picks a random sequence as a target for each run, any biases within the set of 16 ones sequences would tend to wash out over several (or several billion) runs. Sometimes it would choose a favored sequence within this set, and other times it would not. We simply looked at the behavior over the complete ensemble. Hence, our empirical estimate should be a reasonable one. You are free to perform additional experiments to assess the presence of any further bias within the set, which would provide a useful additional result.

Thank you again for your interest in the paper.


7

QuiteID

12/17/2010

2:54 pm

noam_ghish, I’ll try to answer your questions:

1. I don’t know that there’s a better treatment of the subject of fossils in ID literature than Icons of Evolution by Jonathan Wells, now an ID classic! It has chapters on horse fossils, hominid fossils, and Archaeopteryx.

2. I haven’t read the Carroll books but I would assume it’s neither. That is, I think he’d spin a tale that those genes may have been used for something else, duplicated, and then the extra copy kind of kidnapped for body-building. I think he’d say they weren’t “body building genes” until they started building bodies.


8

R0bb

12/17/2010

5:02 pm

Hi George. I’m the same R0b that you’ve spoken with in the past — I forgot my password so I created a new user.

I’m confused by the statement, “In both cases we conservatively assume there is only one perceptron that generates an output matching a desired 16 bit target, like that of Equation 10.” I assume that you mean “…there is only one perceptron input that generates an output matching a desired 16 bit target…” But this assumption immediately yields an endogenous information value of exactly 522 bits. The math in this section, which yields a result of 92 bits, is obviously not based on this assumption.
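
The bit counts in dispute here can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in this thread (10^157 genomic sequences, and roughly 10^25 output vectors with exactly 16 ones). The 256-bit output length is an assumption, chosen because it matches the “256 bits” mentioned elsewhere in the discussion and because it reproduces the 10^25 count.

```python
import math

# If exactly one of the 10^157 genomes hit the target, the endogenous
# information would be log2(10^157) bits.
log10_genomes = 157
bits_single_genome = log10_genomes * math.log2(10)

# Number of 256-bit output vectors containing exactly 16 ones
# (256 is an assumed output length, see the note above).
outputs_16_ones = math.comb(256, 16)
bits_single_output = math.log2(outputs_16_ones)

print(f"single genome:  {bits_single_genome:.1f} bits")   # rounds to 522
print(f"16-ones outputs: {outputs_16_ones:.3e}")          # roughly 10^25
print(f"single output:  {bits_single_output:.1f} bits")
```

The gap between about 522 bits and the paper's 92 bits is the quantity under dispute: the two figures follow from different assumptions about how many genomes map to the target.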

Also, the phrase “Because there have been successful simulations” doesn’t make sense. The desired “16 ones” sequence is represented exactly once in the 10^25 sequences regardless of whether there have been successful simulations or not.

If you, as you say, change “at least one” to “exactly one”, then the results of 92 bits and 90 bits are not actually upper bounds. Rather, they’re rough estimates that could be low or high.

You’re right that if the perceptron favors some “16 ones” sequences over others, that bias will be washed out over several runs with different targets. I had interpreted this section as calculating the endogenous information with respect to a given target, not the average endogenous information with respect to all “16 ones” targets. Sorry for that misreading. I actually tested for such a bias three years ago, and found that targets with overlapping binding sites tend to be slightly favored or disfavored, while targets without overlapping binding sites get average representation. But the bias appears to be small relative to the scale that we’re dealing with.

It’s good to talk to you again.


9

GMont

12/21/2010

5:34 pm

Dear R0b(b),

Thank you again for your thoughtful discussion of the paper. I apologize for my delayed response.

The confusion over the phrasing is due to what I pointed out previously, namely, that the statement is referring to at least one (and as you and I have clarified, only one) output matching the target sequence. It could be the case that a target “match” could consist of matching say 90% or more of the sites correctly, so the phrasing is not as tautological (or trivial) as it may first seem. It could also be the case that the target consists of 17 ones, instead of 16, in which case none of the 16 ones sequences could match a target. I agree that the wording is not clear; I apologize for my part in that. Co-authoring a paper with multiple authors can sometimes result in fragments of paragraphs that become unclear after many edits, by many different people. This is probably one such example.

As for the bound being an upper bound, mathematically, it is due to the fact that Pr[hitting target|16 ones] >= pi. Therefore, the pi variable represents a lower bound on the probability, and raising the actual probability above pi (such as by further biasing the ev structure) will correspondingly lower the difficulty of the remaining search problem. The large bias we discovered in the perceptron already reduces the problem substantially (by more than 70%), and it is possible that additional smaller sources of bias may exist. At the very least, the problem faced is much less than the 256 bits assumed, due to the large bias of the perceptron.
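
The step from a lower bound on probability to an upper bound on bits can be written out explicitly. This sketch assumes “difficulty” is measured as the negative base-2 logarithm of the success probability, the convention behind the bit counts quoted in this thread.

```latex
\[
\Pr[\text{hitting target} \mid \text{16 ones}] \;\ge\; \pi
\quad\Longrightarrow\quad
-\log_2 \Pr[\text{hitting target} \mid \text{16 ones}] \;\le\; -\log_2 \pi ,
\]
```

The implication holds because $-\log_2$ is a decreasing function. Any additional bias that raises the probability above $\pi$ can therefore only lower the bit count.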

Thanks again for your interaction on the paper and discussion.


10

R0bb

12/22/2010

2:41 pm

Hi George. As far as I can tell, we’re still left with the fact that the assumption, “there is only one perceptron that generates an output matching a desired 16 bit target”, yields a result of 522 bits instead of 92 bits. And the reasoning, “Because there have been successful simulations,” still doesn’t make sense. And in Pr[hitting target|16 ones] ≥ pi, the ≥ should actually be ≅. But these are minor issues, so I won’t belabor them.

And I agree that the problem size is much smaller than 256 bits for targets with few ones or few zeros. Of course, nobody ever claimed that the problem size is 256 bits. And since Schneider likely didn’t know that the perceptron favors targets with few ones, we seem to have a case where active information was introduced by accident rather than by design.

Regarding the conclusions of the paper:

“The success of ev is largely due to active information introduced by the Hamming oracle and from the perceptron structure.
It is not due to the evolutionary algorithm used to perform the search. Indeed, other algorithms are shown to mine active information more efficiently from the knowledge sources provided by ev.”

ev’s (32,64)-ES is a workable, albeit suboptimal, algorithm. If we replace it with the EIL’s baseline algorithm, namely random sampling, the Hamming distance is useless and the problem is intractable. It would be accurate to say ev’s success is not solely due to the evolutionary algorithm, since no ES can succeed without a suitable objective function. But to deny any credit to the evolutionary algorithm is inaccurate.

“Our results show that, contrary to these claims, ev does not demonstrate ‘that biological information…can rapidly appear in genetic control systems subjected to replication, mutation, and selection’. We show this by demonstrating that there are at least five sources of active information in ev.”

But those five sources are all elements of the replication-mutation-selection model, which by definition includes an objective function and optimization by mutation. In ev, the objective function is comprised of the perceptron and the Hamming function, and optimization by mutation necessarily involves repeated queries and a mutation rate. Granted, the objective function in ev is quite friendly in that it’s mostly (although not completely) continuous and favors targets with few ones or few zeros. And the mutation rate is, as the paper says, workable but not optimal. We could tweak these two elements to get a search that’s less efficient, or one that’s more efficient. But the fact remains that ev’s particular replication-mutation-selection model works.

Perhaps Schneider’s claim is being challenged on the basis that the model contains pre-existing “active information”, so it doesn’t actually create information ex nihilo. But saying that the model contains positive active information is synonymous with saying that it works better than brute force. So Schneider’s claim that R_sequence rapidly goes from 0 to 4 because of the replication-mutation-selection process agrees with the EIL’s authors’ claim that the replication-mutation-selection model contains active information. The authors are not actually challenging Schneider’s claim — at best, they’re challenging his semantics.

“As far as ev can be viewed as a model for biological processes in nature, it provides little evidence for the ability of a Darwinian search to generate new information.”

ev generates novel genomic sequences that meet the specified criteria. Whether we label those sequences “new information” or not is a semantic issue. The usage of equivocal terms like “information” results in opponents talking past each other. Schneider uses strictly classical information concepts in his analysis, so pointing out pre-existing “active information” accomplishes nothing in terms of addressing Schneider’s claims.

Similarly, Dr. Dembski has previously stated that “Schneider thinks that he has generated complex specified information for free,” and has accused Schneider of “smuggling in complex specified information.” In reality, Schneider didn’t claim that the output of ev contains CSI, or that the ev code does not contain CSI. He likely had never heard of Dembski’s concepts when he wrote his ev paper.

And Schneider isn’t without sin in this regard. He says that “the ev model quantitatively addresses the question of how life gains information, a valid issue raised by creationists,” and he points to Royal Truman’s “The Problem of Information for the Theory of Evolution”. But Truman is not talking about Shannon’s account of information; in fact, he dismisses it as irrelevant to the evolution debate, opting instead for Gitt’s definitions. So Schneider’s work doesn’t actually address Truman’s argument.

And Truman’s paper is in response to Dawkins’ “The Information Challenge”, which, like Schneider’s paper, deals only with classical information theory. So the conversation that culminates in the EIL’s latest paper has a long history of equivocation. (To be fair, Truman is at least up front about the fact that his definition of information is different from Dawkins’.)

Much more to say, but this is way too long already. Merry Christmas to all, and to all a good night!


11

GMont

12/22/2010

5:11 pm

Dear Robb,

I am afraid you are again discussing the inputs to the perceptron (with the accompanying 522 bit claim), when I have previously explained that the sentence is in reference to the outputs, not the inputs. We do not know how many inputs map to a given output, nor did we attempt to analyze this. Instead, we tackled the problem from the view of the output, which makes the analysis simpler.

As for your other problems with the semantics of the paper, you are entitled to disagree with the importance of the work or its impact. You could feel that extracting 90 bits of information from, say, a 1000 bit map is an impressive display of “information generation from scratch”, but others are not as convinced.

Merry Christmas.


12

R0bb

12/22/2010

5:57 pm

George, your analysis shows that 2 out of 10^28 inputs map to a given “16 ones” output. That’s what it means to say that p = 2*10^-28. I think we’re not communicating well on this issue so I’m going to drop it.
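
The arithmetic behind this point can be sketched directly from the numbers quoted in the thread: p = 2*10^-28 and 10^157 possible genomes. Treating p as the fraction of genomes that map to a given “16 ones” output is the reading proposed in this comment, not a result stated in the paper.

```python
import math

p = 2e-28                     # probability quoted in this comment
bits = -math.log2(p)          # information measure corresponding to p

# If p is the fraction of all 10^157 genomes that map to one given output,
# the number of such genomes is p * 10^157. Work in log10 to avoid overflow.
log10_total_genomes = 157
log10_matching_genomes = log10_total_genomes + math.log10(p)

print(f"-log2(p) = {bits:.2f} bits")                          # about 92 bits
print(f"matching genomes ~ 10^{log10_matching_genomes:.1f}")  # about 10^129.3
```

On this reading, the 92-bit figure corresponds to an enormous number of genomes mapping to each output, which is why it cannot coexist with the assumption that only a single genome hits the target.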

With regards to information generation from scratch, I’m at a loss as to what that means in the context of the EIL framework. Marks and Dembski say, “Intelligence creates information,” but can an intelligent being find a small target without problem-specific knowledge? Certainly we can use our intelligence to exploit existing information, as the EIL authors illustrate so well in the “Efficient Extraction” paper, but intelligence is useless in an informational void. Anyone who disagrees is welcome to try using their intelligence to guess my debit card PIN.

Can you provide an example of information being generated from scratch?


13

R0bb

12/22/2010

7:04 pm

Also, with regards to semantics, I have no problems with the semantics of the paper. My only concern is that Schneider’s claims be judged according to his stated meaning, and not according to someone else’s definitions.

Also, I agree with the EIL’s position that ev isn’t very impressive. I feel the same about WEASEL and Avida. Their behavior is not particularly surprising or enlightening, as we already know that evolution works when given a sufficiently gradual evolutionary path from A to B. The real question is whether such paths exist for all observed biological structures. (I suppose the EIL would also ask whether we would expect such paths to exist without having been designed.)

Joyeux Noel!

