NCGR/Berkeley Double-Blind Test

Under the heading 'A double-blind test of astrology', the 5 December 1985 issue of the respected scientific journal NATURE carried a detailed seven-page report of tests conducted by members of the University of California, in cooperation with the National Council for Geocosmic Research (NCGR). The period of the tests is not specified, but I assume it was 1981-2. The paper, by Shawn Carlson (University of California Department of Physics), was received by NATURE on 11 April 1983 and accepted on 14 October 1985.

The tests are taken as giving no evidence for the working of astrology. Carlson states: "we are now in a position to argue a surprisingly strong case against natal astrology as practised by reputable astrologers". Newspapers, serious and popular, picked up the story immediately, with a common theme of 'astrologers prove themselves wrong'. Because these results were given some importance by enemies of astrology, they merit our attention. The main elements of the experiment are here abstracted from the report, with Carlson's conclusions.

The thesis put to the test, the 'astrological hypothesis', was formulated as follows:

"the positions of the 'planets' (all planets, the Sun and Moon, plus other objects defined by astrologers) at the moment of birth can be used to determine the subject's general personality traits and tendencies in temperament and behavior, and to indicate the major issues which the subject is likely to encounter".

This formulation was accepted by the astrological advisors as central to natal astrology, and it was at the same time capable of scientific testing. Both in test design and in the selection of participating astrologers, the experimenters closely consulted NCGR, because of its record of research interest and because of the esteem in which it is held world-wide in the astrological community. "Care was taken to include all suggestions by the astrologers provided they could be followed without biasing the experiment for or against the astrological thesis". Care was also taken to eliminate any possibility of hidden clues not based purely on astrological information. This involved the use of a rigorous 'double-blind' procedure, in which even the experimenters had no access to any information that might produce bias and that could be communicated, subconsciously or otherwise, to subjects or astrologers. One part of this process was the random assignment of code numbers to the subjects, with the master list kept under the supervision of an uninvolved third party and unseen by any participant during the test.

Volunteer subjects came from amongst Berkeley students and the general public in the San Francisco Bay area. Subjects were selected only where the birth time was reliably documented, to give an expected accuracy within 15 minutes. Subjects who indicated 'strong disbelief' in astrology in both the preliminary and follow-up questionnaires were eliminated, in case their selection of any astrological interpretation made for them would be biased. Subjects who had previously had a horoscope reading prepared for them were also eliminated "because they might be able to select (or reject) the correct interpretation based on a knowledge of what to expect". The original batch chosen was 256 (128 subjects plus 128 control group), although a considerable number of these lost interest and did not complete the whole test. Two room-mates dropped out because they became convinced that astrology was the work of the devil.

The personality of the subjects was assessed by the use of the CALIFORNIA PERSONALITY INVENTORY (CPI). This involves the subject in answering 480 true-false questions, each of which contributes to a ranking on one of 18 personality attribute scales. The subject's score for each scale is compared to the norm for that scale. His deviation from the norm (which differs by sex) is commonly plotted in graph form, which readily conveys his 'CPI Profile'. For illustration the 18 (male) attributes are:

The CPI was chosen over other available tests "because the advising astrologers judged the CPI profile attributes to be closest to those discernible by astrology". Participating astrologers received a booklet explaining the interpretation of CPI attributes.

Participating astrologers were invited to take part after NCGR-nominated advisers declared themselves "satisfied that the experiment was a 'fair test' of astrology", and established their predictions of minimum expected astrological effect. This effect would be expected from any random sample of astrologers approved by NCGR, who "compiled a list of approximately 90 astrologers with some background in psychology who were familiar with the CPI and held in high esteem by their peers". Two others who wanted to join in were vouched for by NCGR. All were invited to participate; 28 accepted. This was regarded as a disappointing response by the experimenters. Of the 28, an unspecified number "simply refused to participate as promised". Some backed out when they realised the time they would be required to give. One astrologer "tried to bargain his services in exchange for free access to our raw data".

The Experiment consisted of two parts. Part I attempted to directly assess the validity of astrological interpretations. Part II tested astrologers' ability to match natal horoscopes and CPI Profiles. No analysis was undertaken until all data from the experiment was gathered; and the methods used in the analysis were established before the start of the experiment.

Part I: Subject selection of own natal interpretation

In Part I, natal charts were computed for the volunteer subjects. The sex of the subject was not stated. These charts were then distributed amongst the astrologers, who prepared (typically four) typed interpretations in a predetermined format. The categories for interpretation were: 1) Personality/Temperament; 2) Relationships; 3) Education; 4) Career/goals; 5) Current situation. A limit was given for the length of each interpretation, and there were some other restrictions. Advice and predictions were not permitted, since a subject might reject a correct description because he disagreed with the advice or prediction. Direct references to the chart ("you have sun in Leo") were not permitted, and there was to be no indication of the subject's age.

Each volunteer was sent their own natal interpretation plus two belonging to other subjects, without indication of which was their own interpretation. They were asked to choose one of the three as their correct interpretation, and also to make a second choice.

A subject's selection of natal interpretations may be affected by the widely disseminated popular knowledge of Sun-sign attributions. To the extent that the Sun plays a significant role in the full horoscope delineation, subjects might be biased in the direction of accepting the correct interpretation on recognising descriptions appropriate to their Sun-sign "regardless of whether or not the astrological hypothesis is correct". A 'control group' of subjects was therefore created, following recommendations by the advising astrologers. The most important elements in the creation of this control are as follows: control group subjects were matched with members of the test group, so that for each member of the latter there was a control group member of the same Sun-sign. The control group member was born at least three years apart from his or her test group match, to ensure that the natal charts would be sufficiently dissimilar apart from the single common factor of the Sun sign. Within these requirements, the matching was made randomly. Control subjects were (presumably) not aware that they were in the control group.
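The matching rule described above (same Sun-sign, births at least three years apart, otherwise random) can be illustrated with a short sketch. This is not Carlson's procedure as implemented: the record fields (`code`, `sun_sign`, `born`), the function name `match_controls`, the sample data and the approximation of three years as 3 x 365 days are all assumptions made for illustration.

```python
import random
from datetime import date

def match_controls(test_subjects, pool, min_gap_days=3 * 365, seed=0):
    """Pair each test subject with a randomly chosen, unused pool member
    of the same Sun-sign born at least min_gap_days apart (hypothetical rule)."""
    rng = random.Random(seed)
    available = list(pool)
    pairs = {}
    for subject in test_subjects:
        candidates = [
            c for c in available
            if c["sun_sign"] == subject["sun_sign"]
            and abs((c["born"] - subject["born"]).days) >= min_gap_days
        ]
        if not candidates:
            continue  # no eligible control; this subject stays unmatched
        chosen = rng.choice(candidates)
        available.remove(chosen)  # each control is used only once
        pairs[subject["code"]] = chosen["code"]
    return pairs

# Invented example records, identified by code numbers only.
test_subjects = [
    {"code": "T1", "sun_sign": "Leo", "born": date(1960, 8, 1)},
    {"code": "T2", "sun_sign": "Aries", "born": date(1958, 4, 2)},
]
pool = [
    {"code": "C1", "sun_sign": "Leo", "born": date(1961, 8, 5)},    # too close in time
    {"code": "C2", "sun_sign": "Leo", "born": date(1955, 7, 30)},
    {"code": "C3", "sun_sign": "Aries", "born": date(1962, 3, 25)},
]
pairs = match_controls(test_subjects, pool)
print(pairs)
```

In this invented data each test subject has exactly one eligible control, so the random choice has no effect on the pairing; with a larger pool the seed would determine which eligible control is drawn.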

In the experiment, a test group subject received three natal interpretations, one of which was his or her own. The matched control group subject was also given interpretations that included the test subject's, but not his or her own. "If the astrological hypothesis is false, members of both groups should identify the test subject's interpretation with equal frequency". If the test group scored significantly higher than the control group, then an astrological phenomenon would have been demonstrated, quite apart from the 'Sun-sign bias'.

Leaving to one side the special problem of Sun-sign bias, which could be checked by reference to the control group as explained above, the following results can be expected. If there was no effect due to astrological interpretation (i.e. astrological hypothesis not proven), then test subjects should pick their natal interpretation as first choice at the rate of chance, on one-third of occasions. The advising astrologers predicted that the correctness of astrological interpretation would allow test subjects to pick out their own interpretation as first choice on at least half of all occasions.

The experimenters decided in advance a level of result which they would regard as 'statistically significant'. This was taken as the level of 2.5 SD (standard deviations from the mean), which would require at least 39 first choices in 83 subjects in the test group.
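The figure of 39 can be checked with a few lines of arithmetic, assuming (as the text implies) that first choices follow a binomial distribution with a chance rate of one-third. The function name `min_first_choices` is introduced here for illustration only and is not taken from the report.

```python
import math

def min_first_choices(n_subjects, chance=1/3, z=2.5):
    """Smallest whole count lying at least z standard deviations
    above the chance mean of a binomial(n_subjects, chance)."""
    mean = n_subjects * chance
    sd = math.sqrt(n_subjects * chance * (1 - chance))
    return mean, sd, math.ceil(mean + z * sd)

mean, sd, threshold = min_first_choices(83)
print(f"chance mean = {mean:.2f}, SD = {sd:.2f}, threshold = {threshold}")
# prints: chance mean = 27.67, SD = 4.29, threshold = 39
```

On these assumptions the 2.5 SD level falls at about 38.4 first choices, so 39 is the smallest whole number that meets it.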

We see from Table 1 that, when presented with three interpretations, 28 of the 83 test subjects selected their own natal interpretation as first choice. This falls far below the advising astrologers' expectations, and is almost exactly at the chance level (83 / 3 = 27.67). The result does not confirm the astrological hypothesis.

An unexpected fluctuation occurred in the control group results. Control subjects selected their 'matched' test subject's interpretation as their own first choice well above chance expectation (at +2.34 SD). Carlson concludes that this cannot be an 'astrological effect' (since control group subjects did not receive their own interpretations). Neither can it be due to Sun-sign bias, because the test group did not score at this level. "We thus interpret this as a statistical fluctuation".

Although the astrological hypothesis is not confirmed in Part I, Carlson discusses doubts over the adequacy of Part I as a test of astrology. These doubts arise from results of a subsidiary test (here briefly summarised), involving the CPI Profile.

In addition to selecting their own from three given natal interpretations, subjects were also tested for their ability to select their own from three CPI Profiles given to them. To test for "possible psychological bias" a control group of subjects was created, each member being given three Profiles (one of which was the matched test subject's Profile). This was a different control group from that used in the astrological experiment, as it had to be re-established on male-to-male and female-to-female matching, since the CPI grades differently by sex.

The results in brief: the control group selected the test subject's CPI Profile in first, second (and third) places at or near the chance rate (as expected). The test group of 56 subjects gave their own CPI Profile as first choice on 25 occasions (chance expectation 56/3 = 18.67). The figures for second and third place choice are, respectively, 16 and 15.

The figure for first choice, although 1.79 SD above the mean, is disappointing and would not be 'statistically significant' at the level pre-established in the experiment (2.5 SD). Apparently, this is the first test in the literature of subjects' ability to recognise their own description from this or similar personality inventories.
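The quoted deviation can be reproduced approximately, again assuming a binomial model with a chance rate of one-third. The helper name `sd_above_chance` is hypothetical; the counts are those given in this summary.

```python
import math

def sd_above_chance(hits, n, chance=1/3):
    """Number of standard deviations by which `hits` exceeds the
    chance expectation for n three-way choices."""
    mean = n * chance
    sd = math.sqrt(n * chance * (1 - chance))
    return (hits - mean) / sd

z_cpi = sd_above_chance(25, 56)    # own CPI Profile as first choice
z_natal = sd_above_chance(28, 83)  # own natal interpretation as first choice
print(f"CPI Profiles: {z_cpi:.2f} SD; natal interpretations: {z_natal:.2f} SD")
```

The CPI figure comes out at roughly 1.8 SD, in line with the 1.79 quoted above and short of the 2.5 SD criterion, while the natal interpretation figure is within a tenth of an SD of chance.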

Because of this inconclusive result Carlson states that "if subjects cannot recognise accurate descriptions of themselves at a significant level then the experiment would show a null result however well astrology worked".

Part II: Matching of natal horoscope & CPI profile by astrologers

In Part II, the astrologers were sent a previously agreed number of horoscopes. To reduce work, these included horoscopes the astrologer had already interpreted in Part I. With each horoscope were given three CPI Profiles, only one of which was the profile for the native. The other two were randomly chosen. The astrologers were required to select the two profiles, as first and second choice, which best described the personality of the native as assessed from the horoscope. They were also required to rate each CPI on a scale of 1 (low) to 10 (high) as to how well it matched the horoscope.

If there was no effect due to astrological interpretation (i.e. astrological hypothesis not proven), then the astrologers should pick the correct CPI Profile as first choice for the natal chart at the rate of chance, one-third of occasions. The advising astrologers predicted a correct first choice on at least half of all occasions.

We see from Table 2 that the astrologers correctly matched the subject's CPI Profile and natal horoscope on 40 occasions out of 116 (the figures for 'first choice'), consistent with the chance expectation (116 / 3 = 38.67). This is not consistent with the astrological hypothesis.
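As a rough check on how 40 correct matches out of 116 sits against the two rival expectations, the exact binomial tail probabilities can be computed directly. This is an illustrative sketch and not the analysis used in the report, which worked in standard deviations; the function names are assumptions introduced here.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k, n, p):
    """Probability of k or more successes."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def prob_at_most(k, n, p):
    """Probability of k or fewer successes."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

# Chance rate of one-third: how often would 40 or more be reached?
p_chance = prob_at_least(40, 116, 1/3)
# Astrologers' predicted minimum rate of one-half: how often would
# the score be as low as 40 or lower?
p_predicted = prob_at_most(40, 116, 1/2)

print(f"P(40 or more | rate 1/3) = {p_chance:.3f}")
print(f"P(40 or fewer | rate 1/2) = {p_predicted:.5f}")
```

Under the chance rate a score of 40 or more is an ordinary outcome, whereas under the predicted rate of one-half a score as low as 40 would be very rare.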

Carlson states that analysis of the 'weighting' by astrologers of the good or bad fit of the CPI Profiles and the horoscope gave "no convincing evidence that the astrologers tended to rate the correct CPI's higher than the incorrect CPI's".

Carlson's overall conclusion

The "somewhat more illuminating" results of Part II allow Shawn Carlson to arrive at the following overall conclusion:

"...astrology was given every reasonable chance to succeed. It failed. Despite the fact that we worked with some of the best astrologers in the country..., despite the fact that astrologers approved the design and predicted 50 percent as the 'minimum' effect... astrology failed to perform at a level better than chance... The experiment clearly refutes the astrological hypothesis."

Notes

Reference: NATURE vol 318, 5 December 1985, pp419-425. Published by Macmillan Journals, UK.
This summary first appeared in Astrology Winter 1985/6 vol 59 no 4, pp183-8. Minor and non-substantive amendments have been made in the version published here. Appendix 2 of The Moment of Astrology is adapted from 'The Marriage of Scientism and Naive Astrology', the commentary which followed (pp189-93) the original summary.

Soon after the publication of Carlson's results, fundamental criticisms came from a number of authoritative sources, including Hans Eysenck, although in my view none have been as comprehensive as those offered in my discussion in Astrology Quarterly. To date, the criticisms offered in the original Astrology summary and commentary, including the details of the 'hidden' and unpublished standard deviations in Carlson's study, have not to the best of my knowledge been refuted or even questioned.

It should be noted that weaknesses in this material have been debated on several occasions in public settings, including at least two International Research Conferences sponsored by the Astrological Association (November 1996 and 1998), and are well known to Geoffrey Dean, who is credited in NATURE as commenting on the study. Despite these discussions and despite criticism from Hans Eysenck, Geoffrey Dean consistently refused to withdraw his support for Carlson's study (and tacitly for its tendentious and unscientific conclusions), although asked explicitly and publicly to do so. To the best of my knowledge Dean has not done so to date. This is an important detail because it appears that the results from this test have been included in his 'metastatistical analysis' of all Vernon Clark type tests. The inclusion of known faulty data weakens our confidence in Dean's ability to draw reliable conclusions from such an analysis.

This discussion may be augmented in the light of further information.

An extended summary of the NCGR/Berkeley Double-Blind Test of Astrology undertaken by Shawn Carlson and published in 1985
