### 7 December 2009

# New Dembski-Marks Paper

## William Dembski

Abstract: Conservation of information (COI), popularized by the no free lunch theorem, is a great leveler of search algorithms, showing that on average no search outperforms any other. Yet in practice some searches appear to outperform others. In consequence, some have questioned the significance of COI to the performance of search algorithms. An underlying foundation of COI is Bernoulli’s Principle of Insufficient Reason (PrOIR), which imposes a uniform distribution on a search space in the absence of all prior knowledge about the search target or the search space structure. The assumption is conserved under mapping. If the probability of finding a target in a search space is p, then the probability of finding the target in any subset of the search space is p. More generally, all some-to-many mappings of a uniform search space result in a new search space where the chance of doing better than p is 50-50. Consequently the chance of doing worse is 50-50. This result can be viewed as a confirming property of COI. To properly assess the significance of the COI for search, one must completely identify the precise sources of information that affect search performance. This discussion leads to resolution of the seeming conflict between COI and the observation that some search algorithms perform well on a large class of problems.
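The abstract’s claim that “on average no search outperforms any other” can be illustrated with a small Monte Carlo sketch. Everything below (landscape size, budget, the toy “greedy arc” strategy) is my own construction, not from the paper: on a structureless landscape of independent random fitness values, a non-revisiting informed search does no better than blind sampling.

```python
import random

def random_landscape(n):
    # iid fitness values: the "no structure" setting of the NFL theorems
    return [random.random() for _ in range(n)]

def blind_search(f, k):
    # evaluate k distinct points chosen with no strategy at all
    points = random.sample(range(len(f)), k)
    return max(f[p] for p in points)

def greedy_arc_search(f, k):
    # a non-revisiting "informed" search: grow an arc on the ring,
    # always extending past whichever endpoint currently looks better
    n = len(f)
    left = right = random.randrange(n)
    best = f[left]
    for _ in range(k - 1):
        if f[left] >= f[right]:
            left = (left - 1) % n
            val = f[left]
        else:
            right = (right + 1) % n
            val = f[right]
        best = max(best, val)
    return best

random.seed(0)
trials, n, k = 20000, 32, 8
blind = sum(blind_search(random_landscape(n), k) for _ in range(trials)) / trials
greedy = sum(greedy_arc_search(random_landscape(n), k) for _ in range(trials)) / trials
print(round(blind, 3), round(greedy, 3))  # both near 8/9: no free lunch
```

Because the landscape values are independent, every newly evaluated point is a fresh uniform draw no matter how cleverly it was chosen, so both strategies see the same best-of-k distribution (mean 8/9).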

1

R0b

12/07/2009

4:08 pm

Dr. Dembski,

Congratulations to you and Dr. Marks. I applaud you for stating your position in footnote 12, as well as the publisher for allowing you to do so.

I’ll repeat a question I asked in a previous thread since it’s more appropriate here: In Section IV.B, you seem to be defining the term

“intrinsic target” as an outcome to which the given process (or search) is biased, e.g. Conway’s evolutionary endpoints. Am I reading this correctly?

Since every non-uniform distribution is biased toward some subset of the sample space, it would seem that every non-uniform distribution has an intrinsic target and positive active information. And since this paper makes it clear that intelligence is the source of active information, it would appear that every non-uniform distribution has an intelligent source. Is that the position of the EIL?

2

tragic mishap

12/07/2009

4:09 pm

Dr. Dembski, how in the name of all that is holy did you get this paper published considering all the problems with it exposed by Wikipedia?

3

R0b

12/07/2009

4:11 pm

Clarification @ 1: “this paper” in the last paragraph refers to

Life’s Conservation Law.

4

Upright BiPed

12/07/2009

4:32 pm

Dr Dembski, congratulations to you and Mr Marks.

5

halo

12/07/2009

4:46 pm

off-topic:

Don Prothero’s review of ‘Signature in the Cell’ on Amazon:

http://www.amazon.com/review/R.....KGTP0HY63N

I honestly thought it was written by the typical uneducated raving village atheist before I saw the name!

Maybe I was right.

6

tragic mishap

12/07/2009

5:10 pm

Prothero: “They haven’t created life in a test tube yet–but they are very close.”

Yeah, mostly by borrowing stuff from existing life and intelligently designing it.

7

Adel DiBagno

12/07/2009

5:22 pm

Cheats!

8

Collin

12/07/2009

5:48 pm

Off-topic,

This article shows why computer models do not replace real testing.

http://www.msnbc.msn.com/id/33.....e-science/

9

Zachriel

12/07/2009

6:11 pm

If we take a random choice from all possible search algorithms and match it to a random choice from all possible landscapes, our results would, on average, be no better than a random search.

If, however, we take a standard evolutionary algorithm, and match it to a well-defined landscape with plenty of hills to climb, then our results would be better than a random search.

We don’t have to infuse information into a search if the search and landscape are matched by circumstance. A replication and selection algorithm (life) is well-suited to a search of an ordered environment (the natural world).

Does that mean an evolutionary algorithm won’t work to find perfectly spelled words because there are no hills to climb?
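Zachriel’s closing question can be tested directly. A minimal sketch (the target word, alphabet, and budget are my own toy choices): on a word-matching landscape where fitness counts correct letters, there are hills to climb, and a mutation-based hill climber vastly outperforms blind random search for the same number of fitness evaluations.

```python
import random
import string

random.seed(2)
TARGET = "EVOLUTION"      # hypothetical target word, my own choice
ALPHABET = string.ascii_uppercase
L = len(TARGET)

def fitness(s):
    # number of letters in the right place: a smooth hill over string space
    return sum(a == b for a, b in zip(s, TARGET))

def random_search(k):
    # best of k blindly generated strings
    return max(fitness([random.choice(ALPHABET) for _ in range(L)])
               for _ in range(k))

def hill_climb(k):
    # mutate one letter at a time; keep the change if fitness doesn't drop
    cur = [random.choice(ALPHABET) for _ in range(L)]
    best = fitness(cur)
    for _ in range(k - 1):
        cand = cur[:]
        cand[random.randrange(L)] = random.choice(ALPHABET)
        fc = fitness(cand)
        if fc >= best:
            cur, best = cand, fc
    return best

k = 5000  # same evaluation budget for both strategies
hc, rs = hill_climb(k), random_search(k)
print(hc, rs)  # the climber typically reaches 9/9; blind search rarely passes 4/9
```

The contrast is exactly the matched-landscape point: the letter-count fitness function supplies the hills, and any search that exploits them beats one that ignores them.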

10

Douglas Moran

12/07/2009

6:47 pm

Excellent paper with a great conclusion. Congrats!

11

JPCollado

12/07/2009

7:07 pm

Prothero: “They haven’t created life in a test tube yet–but they are very close.”

Every religion has faith in and hope for the future.

12

Graham

12/07/2009

8:41 pm

Re post #1: I have similar problems. Calling it an ‘intrinsic’ target seems to be a bit of linguistic sleight-of-hand. It’s not a target at all, just a constraint on the possible end-points, e.g. it’s unlikely life forms would evolve to an extremely large size, simply due to the physical constraints of the environment.

Also, if evolution has no ‘target’, then how is any consideration of ‘search’ algorithms relevant?

13

Allen_MacNeill

12/07/2009

8:49 pm

First, congratulations to Drs. Dembski & Marks; publication is the lifeblood of all career academics and the living heart of the intellectual process. It takes courage and hard work (and a little bit of luck) to get your original work published, and more of the same to weather the criticism that inevitably ensues. But, just as one cannot have a fencing match without an opponent, real progress in any intellectual endeavor cannot come from consensus, but only from the clash of ideas and evidence.

And so, to specifics:

I have no quibble with most of the mathematical analysis presented. Indeed, given the assumptions upon which the authors’ COI theory is based (with which I do not necessarily agree, but which are clearly presented in this paper), the analysis presented is apparently not completely outside the domain of NFL theorems in general.

However, the same cannot be said for the application of these ideas to biological evolution. To be specific, consider the following quote (Re Dembski & Marks (2009) pg. 2651 lines 2-5):

This is indeed the crux of the issue vis-a-vis biological evolution. While it is clearly the case that Simon Conway Morris asserts that there is an apparently limited number of biological “endpoints”, it is neither the case that Conway Morris’ viewpoint represents the core of evolutionary theory, nor that his point is relevant to the analysis of COI presented in Dr. Dembski and Marks’ paper.

To be specific, the highlighted qualifier from the quote above – “crafted in part by initial conditions and the environment” – is precisely the issue under debate between evolutionary biologists and supporters of ID.

Taken at face value, this qualifying phrase means that, given specific starting conditions and a specific time-varying environmental context, the various mechanisms of evolution (e.g. mutation, natural selection, genetic drift/inbreeding, etc.) tend to converge on a relatively limited set of genotypic/phenotypic “endstates” (i.e. what could be loosely referred to as “evolutionary adaptations”).

This is simply another way of defining evolutionary convergence, and in no way constitutes evidence for intrinsic evolutionary teleology. On the contrary, it simply provides support for the hypothesis that, given similar conditions, similar outcomes result.

Furthermore, it assumes that virtually all characteristics of living organisms are adaptations (that is, genotypic/phenotypic characteristics that fulfill some necessary function in the lives of organisms). However, this is manifestly not the case, nor is it an absolutely necessary component of current evolutionary theory. On the contrary, many (perhaps the majority) of the characteristics of living organisms are not adaptive. This is certainly the case at the level of the genome, as evidenced by the neutral and nearly neutral theories of molecular evolution.

Finally, Morris’ (and, by extension, Dembski and Marks’) position completely omits any role for historical contingency, which both the fossil and genomic record indicate are of extraordinary importance in macroevolution. As Dembski and Marks state, the “endpoints” (perhaps it would be more precise to refer to them as “way stations”) of macroevolution depend fundamentally on initial conditions and the environment. But this is not fundamentally different from Darwin’s position in the Origin of Species. However, their analysis completely ignores the appearance (or non-appearance) of new genotypic and phenotypic variations, and the accidental disappearance of such characteristics (via extinction), without regard to the adaptive value of such characteristics, or the lack thereof.

In other words, Dembski and Marks’ analysis, while interesting from the standpoint of what could be called “abstract” search algorithms, completely fails to address the central issues of evolutionary biology: the source of evolutionary novelty (i.e. the “engines of variation”), the effects of changing environmental conditions on the actual forms and functions of living organisms, and the fundamental importance of historical contingency in the ongoing evolution of genotypes and phenotypes.

14

Allen_MacNeill

12/07/2009

9:31 pm

Go here for more analysis (including hyperlinks):

http://evolutionlist.blogspot......chers.html

15

Clive Hayden

12/07/2009

9:49 pm

Congratulations Dr. Dembski and Dr. Marks!

16

Upright BiPed

12/07/2009

10:12 pm

“…real progress in any intellectual endeavor cannot come from consensus, but only from the clash of ideas and evidence.”

Like

~~Climategate~~ and ~~Unguided Evolution~~

17

Prof_P.Olofsson

12/08/2009

12:18 am

Congratulations Bill! So you’re a Bayesian now!

18

Uvula Presley

12/08/2009

12:42 am

To intrinsic target agnostics.

Flipping a coin has an intrinsic target of 50-50 heads and tails.

Borel’s normal number theorem says a randomly chosen irrational number will, with probability one, have 10% of its base-ten digits equal to 3.

And for those in Rio Linda, pinball machines have goals dictated by landscape. No matter what the path, no matter the interference of man, the ball ends up dropping in the little hole.
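The digit-frequency claim above is easy to probe empirically. A sketch with my own choice of constant and precision: sqrt(2) is widely believed (though not proven) to be normal in base 10, and its early digits already show each digit appearing close to 10% of the time.

```python
from decimal import Decimal, getcontext

# Compute the first 10,000 significant digits of sqrt(2), a number
# conjectured (not proven) to be normal in base 10, and count the 3s.
getcontext().prec = 10000
digits = str(Decimal(2).sqrt()).replace(".", "")
freq = digits.count("3") / len(digits)
print(round(freq, 3))  # close to 0.1
```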

19

Uvula Presley

12/08/2009

12:45 am

Oh. Bayes liked ID. Maybe he was a YEC. His day job was as a Presbyterian minister.

20

T. lise

12/08/2009

1:02 am

Congratulations Bill and Marks. Keep up the good work…

21

GradStudent

12/08/2009

1:09 am

Zachriel,

You have not considered any adversarial landscapes, e.g. those that are not smooth or those that are full of “delta peaks”. In fact, give me any optimization algorithm and we can perhaps find a landscape where it would do worse than blind search.

Of course in some situations one can safely assume some regularities such as smoothness of the landscape, number of modes, etc. and leverage that within the particular optimization strategy.
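GradStudent’s adversarial-landscape point can be sketched with a “delta peak” (needle-in-a-haystack) fitness function of my own devising: with no gradient to exploit, a local hill climber actually does worse than blind search on the same budget, because its small steps keep revisiting nearby points.

```python
import random

random.seed(3)
N = 500
NEEDLE = 123  # the lone "delta peak"; fitness is flat everywhere else

def hit(x):
    return x == NEEDLE

def blind_search(k):
    # k independent blind guesses over the whole space
    return any(hit(random.randrange(N)) for _ in range(k))

def hill_climb(k):
    # a local walker gets no gradient signal on a flat landscape,
    # so it just wanders among nearby (often repeated) points
    cur = random.randrange(N)
    for _ in range(k):
        if hit(cur):
            return True
        cur = (cur + random.choice([-1, 1])) % N
    return False

trials, k = 5000, 50
pr = sum(blind_search(k) for _ in range(trials)) / trials
ph = sum(hill_climb(k) for _ in range(trials)) / trials
print(round(pr, 3), round(ph, 3))  # the climber fares worse than blind search
```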

22

Mung

12/08/2009

2:06 am

You mean accidental? By chance? Random?

23

Mung

12/08/2009

2:17 am

Typo!

The caption for Section III is missing the ‘N’ in BERNOULLI.

24

Mark Frank

12/08/2009

4:14 am

I still can’t get my head round treating the evolutionary algorithm as a discrete finite random variable.

First it is necessary to define more precisely what is actually meant by the evolutionary algorithm. To make any sense at all then the algorithm has to include the mechanism(s) of variation, of selection, and the environment in which they operate – these all affect the probability of the outcome and therefore in ID terms provide “exogenous information”.

Given this:

1) It is not discrete – conceivable algorithms vary continuously.

2) It is not finite. There are an infinite number of conceivable algorithms.

3) Most importantly – it is not a variable. The mechanisms and environment were not selected from all conceivable such mechanisms and environments. They are a given.

I think the source of confusion is illustrated by the reference to Conway Morris’s observation on convergence. It may well be that, given the evolutionary algorithm, the result is to some extent determined. That does not imply that the evolutionary algorithm was selected among many to provide that outcome. The algorithm existed first. The outcome followed. It is like asking: what are the chances of the laws of gravity being such that the moon orbits the earth?

25

Heinrich

12/08/2009

5:02 am

Isn’t this just wrong? It’s like saying that the probability of finding a blog post authored by Dr. Dembski on the internet is equal to the probability that he authored one on UD.

It’s apparent from the paper that what is meant is that the assumption of insufficient reason is conserved over a summation of all possible mappings.

26

Mark Frank

12/08/2009

7:48 am

#24

Heinrich

I think the phrase you quote is a bit imprecise. The key phrase is:

“all some-to-many mappings of a uniform search space result in a new search space where the chance of doing better than p is 50-50”.

(which is explained in more detail in the paper on page 2649 and appendix A).

As I understand it this means that if you select a subset of all blogs on the internet, without knowing anything about how blogs are distributed on the internet, then you have as much chance of decreasing your probability of getting a blog authored by WAD as you have of increasing the probability. This is fairly obviously true. I just think it is irrelevant to evolution.
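Mark Frank’s blog-subset reading can be simulated directly. In this toy version (universe size, subset rule, and trial count are all mine), the space has one target item; each trial picks a random subset and asks whether the chance of drawing the target from the subset beats the baseline p. Improvement and deterioration each occur about half the time, matching the quoted 50-50 claim.

```python
import random

random.seed(4)
n = 100        # toy universe of "all blogs"; item 0 is the target blog
p = 1 / n      # baseline probability under a uniform draw from the whole space
trials = 20000
better = worse = 0
for _ in range(trials):
    # include each item in the subset independently with probability 1/2
    subset = [i for i in range(n) if random.random() < 0.5]
    if not subset:
        continue
    q = (0 in subset) / len(subset)  # chance of the target in the new space
    if q > p:
        better += 1
    elif q < p:
        worse += 1
print(round(better / trials, 2), round(worse / trials, 2))  # each near 0.5
```

The symmetry here is driven entirely by whether the target lands in the subset, which happens with probability 1/2; as Mark Frank says, this is fairly obviously true, whatever its relevance to evolution.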

27

Prof_P.Olofsson

12/08/2009

11:36 am

Regarding my post [17], as D&M point out, in the frequentist paradigm Bernoulli’s PrOIR has no meaning. Bill has previously argued hard for this paradigm against the subjective probability interpretation of Bayesianism, so I assume he has changed his views, which is, of course, perfectly fine. There isn’t much of a fight between the two camps these days.

As for the math, frankly, the PrOIR is never used in probability. Consider D&M’s Formula (1). On the one hand they claim that it assumes the PrOIR due to “no prior knowledge.” But before Formula (1) they state that the deck is well shuffled, which is a lot of prior knowledge, and precisely the prior knowledge that warrants the uniform distribution. One must not confuse prior knowledge of the distribution of cards (which we do have) with prior knowledge of the location of the ace of spades (which we don’t have). As an example of the former, if I tell you I have a deck of cards in my hand, you have no idea of the distribution of cards. It could be new and perfectly ordered, it could be well shuffled, it could be manipulated by me, etc. According to the PrOIR you must assume that the deck is well shuffled, which clearly makes no sense.

The only possible practical use of the PrOIR would be as a prior in a Bayesian setting, but then it is only there to be replaced by the posterior after data has been gathered. In other words, we can only invoke the PrOIR if we intend to abandon it. As D&M do not devise any mechanism for updating to posterior distributions, their results have no practical applications.

28

Prof_P.Olofsson

12/08/2009

11:49 am

About footnote [7] in the D&M paper.

Seriously, to probabilists such as Bill and myself, the original NFL theorem by Wolpert and Macready is completely trivial, essentially stating that if we flip a fair coin repeatedly, the outcome of the 11th flip is independent of the first 10 flips (see Häggström’s proof). Now, W&M did not realize the simple proof of their result, but a triviality it is nevertheless. In contrast, Gödel’s Incompleteness Theorem is one of the most profound mathematical results, killing the hope that mathematical truth and provability are equivalent. Anybody who has read its proof knows that it invokes an ingenious idea (Gödel numbers) and is far from trivial.

Leave poor old Kurt alone!

29

Heinrich

12/08/2009

11:50 am

IIRC, Dr. Dembski explicitly used a Bayesian approach to analyse the Jesus tomb claim.

One well-known problem with PrOIR is that it only works on specific scales – a transformation makes the distribution non-uniform. If Dr. Dembski could solve that, it would be a substantial contribution, and would revitalise objective Bayesianism.

30

Collin

12/08/2009

12:01 pm

Allen_MacNeill said: “On the contrary, many (perhaps the majority) of the characteristics of living organisms are not adaptive.”

What does that mean exactly?

31

Prof_P.Olofsson

12/08/2009

12:10 pm

Heinrich[28],

How could it be solved? For example, if p is uniform on [0,1], then p2 is not uniform and vice versa. There’s no way out. Then again, Bayesian inference is all about the posterior distribution. You never assume a uniform prior and then treat it as the truth. Gotta have data.
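The non-invariance Prof. Olofsson describes is easy to see by simulation (a minimal sketch, not from the paper): squaring a uniform variable piles probability mass toward 0, so the transformed variable is no longer uniform.

```python
import random

random.seed(5)
# If X were uniform on [0, 1], P(X < 0.25) would be 0.25.
# For X = U^2 with U uniform, P(U^2 < 0.25) = P(U < 0.5) = 0.5 instead.
samples = [random.random() ** 2 for _ in range(100000)]
frac = sum(s < 0.25 for s in samples) / len(samples)
print(round(frac, 2))  # near 0.5, not 0.25: the square is not uniform
```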

32

Prof_P.Olofsson

12/08/2009

12:18 pm

Post [30], should be “p squared.” Trying again: p2

33

Prof_P.Olofsson

12/08/2009

12:18 pm

Nope.

34

Heinrich

12/08/2009

12:24 pm

I understood!

You’re right, of course. There’s some work (which you may know better than me) on extending Jeffreys priors to multivariate distributions, but the last I heard there was still work to be done.

35

R0b

12/08/2009

1:23 pm

Prof Olofsson and Heinrich, in fairness to Drs Marks and Dembski, they address the problem of transformations in section III, and they get around it by restricting themselves to finite discrete spaces.

Of course, in doing so, they fall into the trap that they describe as “familiarity zones.” When the problem definition specifies a certain finite space, it’s providing significant problem-specific information. A baseline random search takes advantage of that information, so it doesn’t actually follow the PrOIR as Marks and Dembski interpret it.

Marks and Dembski say, “The ‘no prior knowledge’ cited in Bernoulli’s PrOIR is all or nothing: we have prior knowledge about the search or we don’t.” But something is always known about the problem, unless the problem is completely undefined.

36

Prof_P.Olofsson

12/08/2009

1:37 pm

R0b[34],

Yes, in [30] I was not addressing the paper but made a general comment on [28].

They still have the same problem as in the juvenile delinquency example (where they claim a uniform distribution is not warranted). If you have a uniform distribution over the deck, you don’t get a uniform distribution over the dichotomy ace/no ace and vice versa.

Anyway, my main point is that Bernoulli’s PrOIR is never used in applications. We know more now than we did 300 years ago.

37

JT

12/08/2009

2:05 pm

This concerns the following objection to PrOIR from Keynes that Dembski mentions:

There seems to be a blunder here because the observation should have been that the specific density is as likely to be between 1 and 1/2 as it is to be between 1/2 and 1/3 (so, 1,1/2,1/3 is the sequence; not 1,2/3,1/3 as erroneously stated in the above.)

But also, such a probability distribution would be contingent on prior knowledge about the distribution of its reciprocal. Without that knowledge, presumably we could conclude that it was as likely for the specific density to be between 1 and 1/6 as between 1/6 and 1/3. So the relevant issue above is not infinite alternatives as subsequently speculated:

Even with a deck of cards you could say the sequence number of the card selected at random is as likely to be between 1 and 26 as between 26 and 52, and therefore the reciprocal of the sequence number of the selected card is as likely to be between 1 and 1/26 as between 1/26 and 1/52. (Maybe the problem lies with values less than 1, not infinite alternatives.)

38

JT

12/08/2009

2:23 pm

just forget that – I misread what Keynes was saying.

39

Prof_P.Olofsson

12/08/2009

2:32 pm

JT[36],

There is no blunder. The example illustrates the difference in applying the PrOIR to the volumes and to the densities. If you apply it to the volume, it does not apply to the specific density (as you show) and if you apply it to the density it does not apply to the volume. The point is that both volume and density cannot be uniform on their respective ranges (assuming a continuous scale).

There is no paradox in your example with the deck of cards since “between 1 and 1/26” means one of the 26 numbers 1, 1/2, 1/3, …, 1/26. For finite spaces, the uniform distribution is preserved under transformation.
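The finite-space half of this remark can be checked by simulation (my own toy example): mapping a uniform draw from {1, …, 52} through the bijection n → 1/n merely relabels the outcomes, so every reciprocal still appears about equally often.

```python
import random
from collections import Counter

random.seed(6)
# Uniform on the finite set {1, ..., 52}; n -> 1/n is a bijection, so it
# only relabels outcomes and the image distribution stays uniform.
draws = [1 / random.randint(1, 52) for _ in range(52000)]
counts = Counter(draws)
print(len(counts), min(counts.values()), max(counts.values()))  # 52 labels, each near 1000
```

This is exactly why the continuous paradox (volume vs. density) has no finite analogue: on a finite set a bijection cannot stretch or compress probability mass.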

40

Mung

12/08/2009

2:40 pm

This means that even if we find some feature which is non-adaptive, we can still claim that it was the result of evolution, by calling it a spandrel. (An unintended by-product?)

For example, if we cannot explain the mind’s capacity for abstract mathematics in terms of adaptation, we can claim that the capacity for abstract mathematics is a by-product of some other feature that was adaptive. Isn’t the science of evolutionary theory wonderful?

41

R0b

12/08/2009

3:01 pm

Prof Olofsson @ 35, I agree with all of your points. To their credit, the Drs acknowledge some of the problems with the PrOIR, which they didn’t do in previous papers. Unfortunately, they do very little to mitigate the problems, leaving us to question whether their PrOIR-based measures are non-arbitrary and useful.

42

Prof_P.Olofsson

12/08/2009

3:09 pm

R0b[40],

Yes, I agree we should give credit where it is due. I am sure D&M have more papers in the making, and maybe these problems will be addressed.

43

Mystic

12/08/2009

3:38 pm

Prof. Olofsson,

No, he’s just not a frequentist. If he were a Bayesian, he’d drag himself out of the PrOIR (yech!), and speak of exchangeability.

The “some to many” function g in Appendix A is just the converse of a function h — reverse the arrows in Fig. 1. Drawing h uniformly and selecting x in Omega-hat to obtain solution omega = h(x) is equivalent to drawing omega uniformly from Omega.

Why do things have to be so darned simple!?

44

Prof_P.Olofsson

12/08/2009

5:08 pm

Mystic[43],

I’ll leave it to Bill to state what he is or is not.

What does my quote “if we flip” have to do with your subsequent comment?

45

Innerbling

12/08/2009

5:11 pm

P.Olofsson commented in 26:

As for the math, frankly, the PrOIR is never used in probability. Consider D&M’s Formula (1). On the one hand they claim that it assumes the PrOIR due to “no prior knowledge.” But before Formula (1) they state that the deck is well shuffled which is a lot of prior knowledge and precisely the prior knowledge that warrants the uniform distribution. One must not confuse prior knowledge of the distribution of cards (which we do have) with prior knowledge of the location of the ace of spades (which we don’t have). As an example of the former, if I tell you I have a deck of card in my hand, you have no idea of the distribution of cards. It could be new and perfectly ordered, it could be well shuffled, it could be manipulated by me, etc. According to the PrOIR you must assume that the deck is well shuffled which clearly makes no sense.

If Prof. Olofsson holds a deck of cards and I have no prior knowledge about the distribution, any one guess would be the best I can make, so from my perspective the deck would be well shuffled no matter how it is actually ordered.

In fact, it could be said that from the perspective of guessing something about data, randomness is the natural state of ANY data if there is no prior knowledge about it or how it was generated. So from this perspective I disagree with Prof. Olofsson and claim that “well shuffled” means no prior knowledge.

46

JT

12/08/2009

5:45 pm

39-

It seems that probability is not defined for an infinite range of values, since if one value occurs then an event with probability 0 has occurred.

47

Prof_P.Olofsson

12/08/2009

5:56 pm

JT[45],

There are different levels of “infinite.” On a countable set, probability is defined in the same way as on a finite set, by point probabilities. No problems there. For example, the numbers exp(-1)/n! give a probability distribution on the infinite countable set 0,1,2,…

On an uncountable set, such as the interval [0,1], probability is defined through probability density functions which are integrated in order to find probabilities. As you point out, this means that each individual point has probability 0.
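A quick check of the countable example (the truncation point is my own choice): the point probabilities e^(-1)/n! are those of the Poisson(1) distribution, and the first 50 terms already sum to 1 to high precision.

```python
import math

# Point probabilities p_n = exp(-1)/n! on the countable set {0, 1, 2, ...}
# are those of the Poisson(1) distribution; truncating at n = 50 (my choice)
# already captures essentially all of the mass.
total = sum(math.exp(-1) / math.factorial(n) for n in range(50))
print(total)  # very close to 1
```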

48

JT

12/08/2009

8:19 pm

Rob [35]:

It seems you could have knowledge that specifies a finite space without it dictating a non-uniform distribution over that space. There is an implicit understanding of the universe in biology that allows for the conception of random mutations being a large part of the explanation for life. That might as well be the implicit framework for randomness assumed in the Dembski/Marks paper – the one already generally accepted. And as far as the causes of mutation, I would have thought those were assumed to be uniformly distributed. Of course the natural selection component is not being considered here. But if that is just attributes of the physical environment, then that should be ultimately uniform as well. Except that, supposedly, the fact that the background radiation isn’t smooth means it wasn’t uniform.

And I just wanted to point out that Prof. Olofsson seems to be rejecting PrOIR outright:

49

JT

12/08/2009

8:41 pm

If the finite space we’re talking about is non-uniform (the one leading to life, say), then there has to be a potential version of it that could be legitimately called uniform. That is the relevant one, I think.

50

Zachriel

12/08/2009

9:53 pm

Though most studies indicate that genetic mutations are random with respect to fitness, the distribution of mutations is far from random. Not only are certain base substitutions more common than others, but some stretches of genome are more likely to experience mutations. However, for the purposes of a general discussion of evolutionary algorithms, purely random mutations can be assumed.

The environment is hardly random. Energy and matter are not distributed uniformly through space. If you find a mote of water, there is a better chance there is another close by, and another, and another. Look at the ocean! Look at the sky! Because matter tends to clump, we have volumes with surfaces and objects with interactions. And then there’s chemistry…

The universe is structured, at the very least, by *proximity*. In terms of evolutionary search, there are hills to climb.



53

bandsci

12/08/2009

9:57 pm

“References to “geographical structure[s],” “link structure[s],” search space “clustering,” and smooth surfaces conducive to “hill climbing” reinforce rather than refute the quasi-teleological conclusion that the success of evolutionary search depends solely on active information from prior knowledge [32].”

Is this implying that the references were made for the purpose of the search, or that the references fit the search and are therefore specifically suited to the search?

54

Zachriel

12/08/2009

10:02 pm

Of course, I may have missed your point entirely.

Such an assumption is useful only as a placeholder for testing. We might tentatively assume the deck is shuffled, then pull a few cards to determine whether our original hypothesis was plausible. But to *presume* the deck is shuffled is a sure way to lose your shirt.

55

Mystic

12/08/2009

10:53 pm

Prof. Olofsson:

Bayesian is as Bayesian does. I was joking about the authors’ assiduous avoidance of the term “Bayesian” in the body of the paper, as well as their failure to refer to “reference” [61], Data Analysis: A Bayesian Tutorial. (By the way, it seems that their self-reference [14] was scheduled to appear in the November 2008 issue of a journal that last released an issue in January 2008.)

You wrote of tossing a coin to determine membership of a solution in the target. I had thought of rolling a fair die to determine the value of h(x), i.e., the assignment of a solution to description x. A “success” is h(x) in T. (Flipping a p-biased coin to determine success would be more direct.) The number of successes for M assignments is binomially distributed with mean pM.

It bears mention that knowledge of the number of successes for a particular representation h is not exploitable in search in the space Omega-hat.

56

GradStudent

12/09/2009

2:13 am

Some questions for the more informed on this thread:

1. Is the idea of PrOIR essentially the idea of maximum entropy? Maxent modeling has been shown to be very useful in many contexts.

2. “Active Information” — is this analogous to the well-known idea of “Information Gain”?

3. Question to Prof. Olofsson: Why should it matter that we assume a shuffled deck of cards? Isn’t the result the same if we assume nothing about the “shuffleness” of the deck but just leave it hidden?

4. Comment to Allen MacNeill: Regarding your comment on the emergence of novelty just by neutral theories as opposed to some biological optimization strategy — if one is not optimizing it seems one would just be adding unnecessary noise (variations) which wouldn’t help at all other than to diversify the set of biological “particles”. One could think of a strategy to interleave optimization and neutral drifting, but this is simply just another search strategy. So in short, how does optimization+drift escape the conclusions of Dembski’s paper?

57

Mark Frank

12/09/2009

4:28 am

#41 [R0b]

To their credit, the Drs acknowledge some of the problems with the PrOIR, which they didn’t do in previous papers. Unfortunately, they do very little to mitigate the problems, leaving us to question whether their PrOIR-based measures are non-arbitrary and useful.

On reflection I don’t see that this paper does anything to mitigate the problems. All that equation 4 shows is that if you transform a discrete data space and apply PrOIR to the new data space you are equally likely to increase or decrease the probability of hitting a target. This doesn’t change the fact that some transformations will increase the probability and others will decrease it. So there is no objective basis for PrOIR.

58

Prof_P.Olofsson

12/09/2009

12:02 pm

GradStudent[56],

3. The term “well shuffled” means that the deck has been shuffled enough so that each card is equally likely to be in any position. In other words, “well shuffled” is the colloquial term for “uniform distribution.” If you shuffle well and draw the top card, shuffle well and draw the top card, and so on, in the long run you will get the ace of spades the fraction 1/52 of draws. Thus, in a well shuffled deck, you can say that the probability is 1/52 that the top card is the ace of spades. This is the frequentist interpretation of probability. If you don’t know anything about the deck, what basis is there to assign a probability to the top card being the ace of spades?
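The frequentist reading of “well shuffled” can be sketched in a few lines (deck encoding and trial count are mine): shuffle, look at the top card, repeat, and the long-run fraction of aces of spades settles near 1/52.

```python
import random

random.seed(7)
deck = list(range(52))  # card 0 stands in for the ace of spades
trials = 52000
hits = 0
for _ in range(trials):
    random.shuffle(deck)        # "well shuffled": every ordering equally likely
    hits += deck[0] == 0        # is the top card the ace of spades?
print(round(hits / trials, 4))  # long-run fraction near 1/52, about 0.0192
```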

59

Prof_P.Olofsson

12/09/2009

12:04 pm

GradStudent[56],

1. Yes, they are the same. A uniform distribution on a finite set has maximum entropy over all distributions.
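The maximum-entropy connection can be verified numerically (a minimal sketch with my own example distributions): over a finite set of n outcomes, the uniform distribution attains the maximum Shannon entropy log n, and any skewed distribution falls below it.

```python
import math

def entropy(p):
    # Shannon entropy in nats; zero-probability terms contribute nothing
    return -sum(x * math.log(x) for x in p if x > 0)

n = 6
uniform = [1 / n] * n
skewed = [0.5, 0.2, 0.1, 0.1, 0.05, 0.05]  # an arbitrary non-uniform example
print(round(entropy(uniform), 3), round(entropy(skewed), 3))
# the uniform case attains log(6), about 1.792; the skewed case falls below it
```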

60

Prof_P.Olofsson

12/09/2009

12:09 pm

Mystic[55],

No, I wrote about the original NFL theorem, see my comment [28].

61

Prof_P.Olofsson

12/09/2009

12:16 pm

Mark Frank[57],

Indeed. And that’s why it is never used in practice. Anytime a uniform distribution is assumed, it is because we have prior information, not because we don’t. Coin flips, die rolls, and lottery drawings warrant a uniform distribution by the laws of physics or by examination of data, not by reference to a 300-year-old “principle.”

In fact, I’d challenge anybody to show me an application where the PrOIR is used.

62

GradStudent

12/09/2009

12:24 pm

Hi Prof Olofsson,

Thanks for the response. I guess we do know that all 52 cards are in the deck (if I am not mistaken in this example); we just don’t know whether the deck has been well-shuffled or not. In that case, we can treat it as hidden and have a discrete distribution over the possible card permutations and then just put a relatively flat prior on that. And then a relatively flat prior on top of that prior, etc.

I guess at some point in the hierarchy one should insert something like the PrOIR assumption. But if the hierarchy has many different levels, then it seems the influence of even a very informative prior at the top would be swamped by the addition of variance at each level, and once it gets down to the lower levels (e.g. the distribution over card permutations), it would look a lot like PrOIR.

Thus it seems the idea of PrOIR is pretty reasonable. Any flaws with my thinking?

63

Prof_P.Olofsson

12/09/2009

12:43 pm

GradStudent[62],

No flaws, but when you talk about priors you do so in order to get to a posterior, after data collection. I have nothing against using a flat prior in a Bayesian analysis, but you can’t draw any conclusions based on that prior.

If D & M intended to do a Bayesian analysis, I’d have no problems with using PrOIR to get a prior (hey, a prior prior, that’s cool!).

64

Uvula Presley

12/09/2009

1:13 pm

61: “In fact, I’d challenge anybody to show me an application where the PrOIR is used.”

Burg’s algorithm

65

Uvula Presley

12/09/2009

1:20 pm

61: “In fact, I’d challenge anybody to show me an application where the PrOIR is used.”

Wolpert & Macready’s No Free Lunch Theorem

66

Uvula Presley

12/09/2009

1:21 pm

61: “In fact, I’d challenge anybody to show me an application where the PrOIR is used.”

The Drake Equation

67

GradStudent

12/09/2009

1:22 pm

Prof Olofsson,

I totally agree that it is non-controversial to use a reasonably flat prior to get a posterior because hopefully with enough data the likelihood would dominate and the effect of the prior would be negligible.

But it seems there is still an intuitive argument for PrOIR even without any data. One argument is the hierarchy idea which I outlined above; one can just push the uncertainty all the way back up the hierarchy (since one is maximally uncertain about all those hidden variables anyways), and as the # of levels goes to infinity, it is highly probable that the lowest prior on this hierarchy would come close to looking like a PrOIR (even before looking at any data).

Another argument for PrOIR is it seems to be centered between all other possible priors. If I told you that I had a prior p over a discrete space in mind, and if I demanded that you provide a guess q of what that prior was (and you would pay me distance[p || q] for your error), it seems wise to choose a uniform p.
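That “guess my prior” intuition can be illustrated numerically. Assuming the loss is the KL divergence D(p || q) (one natural reading of “distance[p || q]”; the 4-point space and the skewed guess are arbitrary), the uniform guess caps the worst-case loss at log2(N) bits, while a skewed guess can be punished much harder:

```python
from math import log

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * log(pi / qi, 2) for pi, qi in zip(p, q) if pi > 0)

N = 4
uniform = [1 / N] * N
skewed = [0.97, 0.01, 0.01, 0.01]

# Against the uniform guess the worst-case loss is bounded:
# D(p || uniform) = log2(N) - H(p) <= log2(N) for every p.
worst_for_uniform = kl([1.0, 0, 0, 0], uniform)   # = log2(4) = 2 bits

# Against a skewed guess, an adversarial p lands on the small cell:
worst_for_skewed = kl([0, 0, 0, 1.0], skewed)     # = log2(100) ≈ 6.64 bits

print(worst_for_uniform, worst_for_skewed)
```

This only shows that uniform minimizes the worst case under this particular loss; it is not an argument that the unknown prior actually is uniform.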

68

Uvula Presley

12/09/2009

1:25 pm

61: “In fact, I’d challenge anybody to show me an application where the PrOIR is used.”

Polling surveys where there is no prior information.

Want more?

69

R0b

12/09/2009

1:35 pm

JT @ 48:

My point was that even with a uniform distribution, there is still problem-specific information.

Take, for instance, Dawkins’ WEASEL algorithm, in which the search space consists of all sequences of 28 English uppercase letters. The baseline search is uniformly random, i.e. it has no information to guide it to a particular sequence of 28 English uppercase letters. But the baseline search does know that the target has 28 English uppercase letters. Without that knowledge, the baseline search would take much longer.

70

Prof_P.Olofsson

12/09/2009

2:14 pm

UvulaPresley [65,66,68],

The (original) NFL Theorem does not rely on the PrOIR. It assumes a uniform distribution. In any application, that assumption needs to be verified.

I don’t know anything about the Drake equation, but from reading online, it seems that it is based on several estimated parameters, thus not relying on the PrOIR.

As for the “Polling surveys where there is no prior information” I don’t know what you mean.

My challenge was regarding an application of the PrOIR.

71

Prof_P.Olofsson

12/09/2009

2:31 pm

GradStudent[67],

In your hierarchy, aren’t you just applying the PrOIR repeatedly as you go upward?

As for the “guess my prior,” there is no way I can compute and minimize my expected loss without having a distribution for your prior q.

These are nice philosophical discussions and obviously you know what you’re talking about. I’m still highly skeptical when it comes to applications though.

72

Uvula Presley

12/09/2009

2:34 pm

“The (original) NFL Theorem does not rely on the PrOIR. It assumes a uniform distribution.”

This is the same thing. Would you agree if I said the NFL “assumes” the PrOIR? Are we nicking at pics?

It sure seems like the Drake equation is applying (or assuming) the PrOIR.

“Polls” as in Obama’s approval rating.

http://www.rasmussenreports.co.....cking_poll

And Burg’s algorithm remains solid.

73

Uvula Presley

12/09/2009

2:38 pm

Wow. Blogging like this is for people with lots of time.

Let’s go do something useful. Bake a cake. Shop. Do our nails. Make up Richard Dawkins jokes. Read more Dembski books.

74

Mark Frank

12/09/2009

2:58 pm

#73

Let’s go do something useful. Bake a cake. Shop. Do our nails. Make up Richard Dawkins jokes. Read more Dembski books.

Is this in decreasing order of usefulness?

75

GradStudent

12/09/2009

3:00 pm

Hi Prof P.Olofsson,

Well, thanks for the discussion. It seems this thread has been booted from the front page and so maybe we should wrap it up.

Regarding applying the PrOIR repeatedly at each level: actually, I don’t think it is necessary to assume PrOIR at each level if one has a lot of levels. It seems that all you need is a prior that has positive probability over the whole space. But the key idea is that each prior adds a little more variance, and with many levels, that variance could drown out everything else and thus make the lowest prior look uniform.

Regarding the “guess my prior” game, if you were absolutely *forced* to make a choice based on no knowledge, my guess is that you would guess uniform rather than placing high probability on a subset of the space.

Of course I just made up these arguments offhand and there could be serious flaws in them.

Regarding applications, there is a lot of work in maxent modeling as I suggested earlier. They are assuming maximum entropy and seem to be getting great results:

http://homepages.inf.ed.ac.uk/lzhang10/maxent.html

Is this what you mean by application of PrOIR?

76

Prof_P.Olofsson

12/09/2009

3:55 pm

UP[72].

No, it’s not the same thing, and we’re not nitpicking. The NFLT says that if the distribution is uniform, then certain conclusions hold. If you can verify uniformity, the NFLT applies. It’s got nothing to do with the PrOIR, just like your other examples don’t. Opinion polls obviously have nothing at all to do with PrOIR, as they are based on collecting data from which one gets information.

You are right, I’d rather be painting my nails!

77

Prof_P.Olofsson

12/09/2009

4:02 pm

GradStudent[75],

Sorry, that’s just way too much for me to read right now. If they assume maximum entropy based on PrOIR alone, and draw conclusions from that assumption alone, without analyzing any data or invoking any other information, sure, it’s an application. I doubt that’s what is done though.

I agree, we should probably quit. Let me just finish like I started and quote myself:

Consider D&M’s Formula (1). On the one hand they claim that it assumes the PrOIR due to “no prior knowledge.” But before Formula (1) they state that the deck is well shuffled which is a lot of prior knowledge and precisely the prior knowledge that warrants the uniform distribution. One must not confuse prior knowledge of the distribution of cards (which we do have) with prior knowledge of the location of the ace of spades (which we don’t have).

78

R0b

12/09/2009

6:46 pm

I’ll make one more comment on the paper, regarding Appendix B: The LCI actually follows immediately from the second sentence of the appendix. That sentence says that blindly finding the target has the same probability as finding the target with a blindly selected algorithm. From this it follows that blindly finding the target is at least as probable as finding the target with a blindly selected algorithm AND the selected algorithm being as good as it is. That’s the LCI.

79

Mystic

12/10/2009

3:16 am

The argument of Appendix B is invalid. For a fixed representation of randomized algorithms (e.g., as probabilistic Turing machines), the space of search algorithms, Omega_2, is countably infinite, not finite.

Consider a search algorithm that obtains i.i.d. uniform bits from a random source, and simulates a toss of a biased coin (probability of heads is p) to decide which of two deterministic search algorithms to run on a search problem. There are at least as many randomized algorithms for simulating the coin toss as there are rational probabilities p = n / d, and the set of rational probabilities is countably infinite. Note also that the algorithms must obtain |d| random bits in the worst case. Thus there is no upper bound on running time for algorithms of this form.
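One standard way to realize such a coin-toss simulation, sketched here for illustration, is to lazily compare a uniform binary expansion U = 0.b1 b2 b3 … against the binary expansion of p = n/d; the first differing bit decides U < p. The expected number of fair bits is small, but for non-dyadic p there is no worst-case bound, which matches the running-time point above:

```python
import random

def bernoulli(n, d, rng=random):
    """Sample a Bernoulli(n/d) outcome from fair random bits.

    Lazily compares a uniform expansion U = 0.b1 b2 ... against the
    binary expansion of n/d; the first differing bit decides U < n/d.
    Terminates with probability 1, but has no worst-case bit bound
    when the expansion of n/d is non-terminating (e.g. p = 1/3).
    """
    assert 0 <= n <= d and d > 0
    while True:
        u_bit = rng.getrandbits(1)       # next fair bit of U
        n *= 2
        p_bit = 1 if n >= d else 0       # next bit of n/d
        if n >= d:
            n -= d
        if u_bit != p_bit:
            return u_bit < p_bit         # True iff U < n/d
        if n == 0:                       # expansion of n/d terminated,
            return False                 # so U >= n/d from here on

random.seed(0)
trials = 50_000
freq = sum(bernoulli(1, 3) for _ in range(trials)) / trials
print(freq)  # close to 1/3
```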

Vanishingly few random search processes have exact implementations as randomized search algorithms. And many randomized search algorithms require impractical time and space. When we observe a success for a search algorithm, the algorithm is much more likely to be small and fast than big and slow. It is very important to consider this observational bias of ours.

80

Mystic

12/10/2009

3:49 am

My first application of evolutionary programming was to optimization of the connection weights of recurrent neural nets. Now, I could have exploited the mathematical analysis that yielded partial derivatives of the objective function for the weights. But I knew also that the computation of the partial derivatives was very slow, and I had to believe that “quick and dirty” EP would go faster, by the clock on the wall, than “neat and clean” gradient descent. And I was right.

Dembski and Marks will say that the gradient descent algorithm did better than EP because it required fewer trials than EP to obtain an acceptable solution. But I say that EP outperformed gradient descent because it found an acceptable solution with much less computational work than gradient descent did. The “information gain” in knowing the gradient was not worth what it cost.

81

Zachriel

12/10/2009

7:44 am

This may have been lost above due to the long moderation delay. (Why are Zachriel’s comments being moderated?)

Does that mean an evolutionary algorithm can’t navigate the wordscape of the dictionary from shorter to longer words because there are no hills to climb?

82

Prof_P.Olofsson

12/11/2009

10:32 am

Mystic[79],

Good point. It’s even an uncountable space if you choose probabilities in [0,1]. This error has also been pointed out by Tom English.

83

DiEb

12/11/2009

7:43 pm

Dear Dr. Dembski,

you didn’t define what you understand by these some-to-many mappings, and it’s not a common term. Could you do so, please? For me, they seem to be just reverse images under a mapping from Ω’ to Ω…