Eikosogram (Java GUI Application)

This file contains Java code to build interactive eikosograms. The code is executable immediately after downloading and includes a number of built-in examples that illustrate various probabilistic concepts with simple eikosograms. In addition to the self-contained examples, any suitably formatted file of cross-classified data can be read in and its dependence structure displayed using eikosograms. Independence structure can be introduced visually by clicking on boxes that remove the barriers between categories, following the so-called "water container" metaphor.

The author of this code is Glenn Lee, under the supervision of R.W. Oldford.

Interactive eikosograms

Eikosograms (from the Greek: eikos for chance and gramma for writing/drawing) are simple graphics that have been around for decades. They are related to Venn diagrams and to mosaic plots. Much more information on these diagrams can be found in a series of papers by R.W. Oldford.

These diagrams are useful for introducing the axioms and rules of probability, including Bayes' rule. The figure (eikosYvX) shows an example with two categorical variables. Here Y takes on two values, yes and no, and X takes on two values, black and white. The diagram shows various probabilities associated with these two random variables. In this diagram, it might be helpful to think of Y as a response and X as an explanatory variate.

Areas, widths and heights are meaningful in an eikosogram. The diagram is a 1 by 1 square and so has total area of 1; this represents the total of the joint probabilities of X and Y. The diagram is divided vertically into two parts, of width 1/4 and 3/4 (left and right strips respectively). The area of each strip represents the marginal probability of X taking that value. So, for example, Pr(X=black) = 1/4, as recorded on the top axis of the eikosogram. Note that because the height of the diagram is 1, the marginal probabilities associated with X can be thought of as simply the width of each strip.

Joint probabilities are represented by the areas of the corresponding rectangles. For example, in the diagram, the area of the green rectangle on the bottom left equals the joint probability Pr(Y=yes & X = black). Similarly, the white rectangle at the upper right has area equal to the joint probability Pr(Y=no & X= white).

The rectangles have height equal to the corresponding conditional probability of Y given X. For example, the height of the bottom left (green) rectangle is Pr(Y=yes | X=black) and equals 2/3, as shown on the axis at the right. (What is actually shown on the right is the height of each horizontal line in the diagram; there is a choice between showing the actual values or a simple axis.)

Because the area of the rectangle is its height x width, the probabilistic relation that

Pr(Y=yes & X=black) = Pr(Y=yes|X=black) x Pr(X=black)
follows immediately from the eikosogram. The corresponding relationship holds for all other rectangles in the eikosogram.

Alternatively, the ratio of the area of the bottom left (green) rectangle to the area of the entire leftmost strip is also the conditional probability (and is clearly the height of the green rectangle). Expressed this way we have:

Pr(Y=yes | X = black) = Pr(Y= yes & X = black) / Pr (X=black)
and, along the way, the marginalizing relationship:
Pr(X=black) = Pr(Y= yes & X = black) + Pr(Y= no & X = black)
from the larger rectangle's area being the sum of its parts.
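
The three relations above can be checked with a few lines of arithmetic. The following is only a sketch: the class and method names are hypothetical and are not taken from the Eikosogram source. It uses the two values quoted in the text, Pr(X=black) = 1/4 and Pr(Y=yes | X=black) = 2/3.

```java
// Illustrative sketch only; names are hypothetical, not from the Eikosogram source.
class EikosogramRectangle {

    /** Area of a rectangle = height x width = Pr(Y | X) x Pr(X) = joint probability. */
    static double joint(double conditionalHeight, double marginalWidth) {
        return conditionalHeight * marginalWidth;
    }

    /** Height recovered from area and width: Pr(Y | X) = Pr(Y & X) / Pr(X). */
    static double conditional(double jointArea, double marginalWidth) {
        return jointArea / marginalWidth;
    }

    public static void main(String[] args) {
        double prBlack = 1.0 / 4.0;          // width of the left strip (from the text)
        double prYesGivenBlack = 2.0 / 3.0;  // height of the green rectangle (from the text)

        // Areas of the two rectangles that make up the left strip.
        double prYesAndBlack = joint(prYesGivenBlack, prBlack);
        double prNoAndBlack = joint(1.0 - prYesGivenBlack, prBlack);

        System.out.println("Pr(Y=yes & X=black) = " + prYesAndBlack);
        System.out.println("Pr(Y=no  & X=black) = " + prNoAndBlack);
        // Marginalizing: the strip's area is the sum of its parts.
        System.out.println("Sum of the two areas = " + (prYesAndBlack + prNoAndBlack));
        // Dividing the area by the width gives back the height.
        System.out.println("Pr(Y=yes | X=black) = " + conditional(prYesAndBlack, prBlack));
    }
}
```

Up to floating point rounding, the two areas add back to the strip width of 1/4, and dividing the green area by that width returns the height 2/3.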

The eikosogram is clearly asymmetric in its relationship between Y and X. The variable appearing on the vertical axis has its probabilistic structure expressed in terms of joint and conditional probabilities, while the variable on the horizontal axis has its structure expressed in terms of joint and marginal probabilities. Interchanging the role of the variables will yield a different eikosogram.

The figure (eikosXvY) shows an eikosogram with the roles of X and Y interchanged. As before, the area of each rectangle equals a joint probability, and its value is unchanged from the previous diagram, though its dimensions will typically have changed. Note that the colouring follows the layout of the rectangles (the bottom-most having the same dark colour, etc.) and not the values of the random variables.

Now the widths of the strips correspond to the marginal probabilities of Y, and the heights of the rectangles to the conditional probabilities of X given Y. All probabilistic rules follow as before with the roles of Y and X interchanged.

Note that because the probabilities have been preserved via the areas of the rectangles, the area of the bottom left rectangle, for example, must be identical in both diagrams: in both it equals Pr(Y=yes & X=black), and all that has changed are its dimensions. Matching the corresponding height x width formulas for this area gives

Pr(Y=yes | X = black) * Pr (X=black) = Pr(X=black | Y= yes) * Pr(Y=yes)
which is Bayes' theorem. The corresponding Bayes relations follow for all other rectangles, a simple consequence of common area. Both diagrams are automatically produced by the software.
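
The common-area argument can be checked numerically from a table of joint probabilities. This is a sketch under stated assumptions: the names are hypothetical, and while the left column of the table follows the values quoted earlier (Pr(X=black) = 1/4, Pr(Y=yes | X=black) = 2/3), the split of the X=white strip is an assumed value chosen only for illustration.

```java
// Illustrative sketch only; names are hypothetical, not from the Eikosogram source.
class BayesFromAreas {

    /** Pr(X = j): area (and width) of the j-th strip in the Y vs. X diagram. */
    static double marginalX(double[][] joint, int j) {
        double sum = 0.0;
        for (double[] row : joint) {
            sum += row[j];
        }
        return sum;
    }

    /** Pr(Y = i): area (and width) of the i-th strip in the X vs. Y diagram. */
    static double marginalY(double[][] joint, int i) {
        double sum = 0.0;
        for (double p : joint[i]) {
            sum += p;
        }
        return sum;
    }

    /** Height in the Y vs. X diagram: Pr(Y = i | X = j). */
    static double yGivenX(double[][] joint, int i, int j) {
        return joint[i][j] / marginalX(joint, j);
    }

    /** Height in the X vs. Y diagram: Pr(X = j | Y = i). */
    static double xGivenY(double[][] joint, int i, int j) {
        return joint[i][j] / marginalY(joint, i);
    }

    public static void main(String[] args) {
        // Rows: Y = yes, no. Columns: X = black, white.
        // The X=white column is an ASSUMED split, used only for illustration.
        double[][] joint = {
            {1.0 / 6.0, 1.0 / 4.0},
            {1.0 / 12.0, 1.0 / 2.0}
        };

        // Same rectangle, two height x width factorizations of its area.
        double viaYgivenX = yGivenX(joint, 0, 0) * marginalX(joint, 0);
        double viaXgivenY = xGivenY(joint, 0, 0) * marginalY(joint, 0);

        System.out.println("Pr(Y=yes | X=black) * Pr(X=black) = " + viaYgivenX);
        System.out.println("Pr(X=black | Y=yes) * Pr(Y=yes)   = " + viaXgivenY);
        System.out.println("Pr(X=black | Y=yes) = " + xGivenY(joint, 0, 0));
    }
}
```

Both products equal the joint probability in the table, so either conditional probability can be recovered from the other together with the two marginals.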

The software has many features. For binary Y especially, having the heights of the horizontal segments appear within the eikosogram might be preferred on occasion. The Figure below shows this case.

The software also provides some limited interaction with the eikosograms. Principally, this is through the small squares atop each division between values of the explanatory variate(s). Clicking on a square "removes" the barrier. In terms of probability, this has the effect of removing the distinction between the two neighbouring values of the explanatory variate. If the distinction is removed, the corresponding probabilities must be allocated without distinction as well. Clicking on the square a second time reintroduces the barrier.

The following QuickTime movie illustrates the interaction.

Consider only the left eikosogram (Y vs. X) first. Removing the barrier means removing the distinction between X=black and X=white. Consequently (ignoring the vertical line and the bottom axis for the moment), the height of the green bar is now the (marginal) probability Pr(Y=yes).

A simple metaphor might help. Imagine that the eikosogram is actually a container, originally split into two compartments according to whether X=black or X=white. In each compartment is an amount of green liquid whose height in each compartment is determined by the corresponding conditional probability. Removing the distinction between X=black and X=white corresponds to removing the barrier between the two compartments. With the barrier removed, the green liquid must settle to a new level; the total amount of liquid (or probability) remains unchanged. The new level is the marginal Pr(Y=yes).
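
The settling of the liquid can be written as a short calculation: the new level is the total amount of liquid divided by the total width of the merged compartments. The sketch below uses hypothetical names and is not taken from the Eikosogram source. The left compartment uses the values from the text (width 1/4, height 2/3); the height in the right compartment is an assumed value for illustration.

```java
// Illustrative sketch only; names are hypothetical, not from the Eikosogram source.
class WaterContainer {

    /**
     * Level of the liquid after the barriers between the given compartments
     * are removed: total liquid (sum of width x height) over total width.
     */
    static double pooledLevel(double[] widths, double[] heights) {
        if (widths.length != heights.length) {
            throw new IllegalArgumentException("widths and heights must have equal length");
        }
        double liquid = 0.0;
        double totalWidth = 0.0;
        for (int j = 0; j < widths.length; j++) {
            liquid += widths[j] * heights[j];  // joint probability Pr(Y=yes & X=j)
            totalWidth += widths[j];           // marginal probability Pr(X=j)
        }
        return liquid / totalWidth;
    }

    public static void main(String[] args) {
        double[] widths = {1.0 / 4.0, 3.0 / 4.0};   // Pr(X=black), Pr(X=white)
        double[] heights = {2.0 / 3.0, 1.0 / 3.0};  // second height is ASSUMED
        // With both compartments merged, the level is the marginal Pr(Y=yes).
        System.out.println("Pr(Y=yes) = " + pooledLevel(widths, heights));
    }
}
```

Because the widths of all compartments sum to 1 here, the pooled level is simply the sum of the joint probabilities, which is the marginal Pr(Y=yes). The same function applied to a subset of neighbouring compartments gives the level after removing only the barriers between them.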

Pause the movie just after the box has first been clicked and the eikosogram is flat. Imagine that the barrier has been perforated rather than entirely removed, so that the vertical line remains (though more faintly) and we can again talk about the explanatory variate taking different values. The horizontal level of green is now identical when X=black and when X=white. That is, when the eikosogram looks like this, we have

Pr(Y=yes | X = black) = Pr(Y=yes | X = white)
and similarly for Y=no. It does not matter which value X takes: the probability that Y takes its various values does not depend on the value of X. Flatness in an eikosogram means probabilistic independence, in this case complete independence between Y and X. Other kinds of independence, including a variety of conditional independencies, are easily seen in an eikosogram (examples come with the software).

Given that flatness corresponds to independence, the interaction with the eikosogram through perforation of barriers can also be interpreted as the deliberate imposition of some (usually conditional) independence. Doing this will have consequences. The effect on other relations of asserting this independence is reflected in changes to the rightmost (X vs. Y) eikosogram. As can be seen in the movie, asserting flatness in the left eikosogram forces flatness in the right.

In making this calculation, the water container (i.e. green liquid) metaphor was applied. So when a barrier is perforated the location of the barrier is preserved in the diagram where the perforation took place. In probabilistic terms, the marginal probabilities of the explanatory variate(s) in that diagram are preserved. Adjustments might be made to marginal probabilities in other diagrams.

Important note: Every time a barrier is perforated, this is done with respect to the probabilities from the current state of the eikosograms. If all interaction is with respect to a single eikosogram, the effects will be unique. However, if actions are taken back and forth between different eikosograms of the data, the results will depend on the order of the actions taken. Again, all calculations are based on the water container metaphor with respect to the particular eikosogram and use the most recently determined values of all probabilities.

Downloading the code.

The code is available as an executable (platform independent) jar file called Eikosogram.jar.

Download this file into a directory where the user has write permission (the examples in the program will write data out to files with names like data34.txt in that directory). Once downloaded, the Java application can be run in the usual fashion.
Important Note: The Java source code is also available in case someone would like to make improvements. Further information on the source can be had here.

Running the application

With the Eikosogram.jar file in hand, either:

Simply launch by double clicking on Eikosogram.jar, or
At a command line, type:

java -jar Eikosogram.jar

The user can access this web page directly from the applet through its "Help" menu.

The applet uses "eikoso.jpg" as its icon. At present this appears only in response to the "About Eikosogram ..." item on the applet's "Help" menu. In some instances it may be necessary to extract the .jpg file from the .jar file for the icon to appear. This is done by typing:

jar xf Eikosogram.jar eikoso.jpg
at a command line.

Input file format.

In addition to the examples which the code contains, it is possible to input data from a file.

The input file contains the names and values of each categorical variable followed by the values (integer counts) of each cross-classified cell in order of the variables. The index of the first variable changes most slowly, that of the last variable most quickly.

Important Note: the delimiter between items on the same line is a comma; no blanks can appear between counts.

Some examples follow.

1. A two by two table. The file contains a comment line (optional, could be more than one) and 4 lines of data:

# This is a comment line (begins with #)
Y,female,male
X,yes,no
40,30
20,90

Here,

40 corresponds to Y=female, X=yes;
30 corresponds to Y=female, X=no;
20 corresponds to Y=male, X=yes;
and 90 corresponds to Y=male, X=no.

The corresponding table would be:

		X
		yes	no

Y	female	40	30
Y	male	20	90
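
To connect this table with the geometry of an eikosogram, the counts can be turned into strip widths and rectangle heights. The sketch below assumes that probabilities are estimated by relative frequencies, which this page does not state explicitly; the names are hypothetical and not taken from the Eikosogram source. Rows are the values of Y and columns the values of X, as in the table above.

```java
// Illustrative sketch only; names are hypothetical, not from the Eikosogram source.
// ASSUMPTION: probabilities are relative frequencies of the cell counts,
// and every column total is positive.
class CountsToEikosogram {

    /** Column totals: the count for each value of X, summed over Y. */
    static int[] columnTotals(int[][] counts) {
        int[] totals = new int[counts[0].length];
        for (int[] row : counts) {
            for (int j = 0; j < row.length; j++) {
                totals[j] += row[j];
            }
        }
        return totals;
    }

    /** Strip widths: Pr(X = j) = column total / grand total. */
    static double[] widths(int[][] counts) {
        int[] col = columnTotals(counts);
        int grand = 0;
        for (int t : col) {
            grand += t;
        }
        double[] w = new double[col.length];
        for (int j = 0; j < col.length; j++) {
            w[j] = (double) col[j] / grand;
        }
        return w;
    }

    /** Rectangle heights: Pr(Y = i | X = j) = cell count / column total. */
    static double[][] heights(int[][] counts) {
        int[] col = columnTotals(counts);
        double[][] h = new double[counts.length][col.length];
        for (int i = 0; i < counts.length; i++) {
            for (int j = 0; j < col.length; j++) {
                h[i][j] = (double) counts[i][j] / col[j];
            }
        }
        return h;
    }

    public static void main(String[] args) {
        // Rows: Y = female, male. Columns: X = yes, no.
        int[][] counts = { {40, 30}, {20, 90} };
        double[] w = widths(counts);
        double[][] h = heights(counts);
        System.out.println("Pr(X=yes) = " + w[0] + ", Pr(X=no) = " + w[1]);
        System.out.println("Pr(Y=female | X=yes) = " + h[0][0]);
        System.out.println("Pr(Y=female | X=no)  = " + h[0][1]);
    }
}
```

For these counts the column totals are 60 and 120 out of 180, so the strips have widths 1/3 and 2/3, and the heights for Y=female are 40/60 and 30/120.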

2. A two by two by two table. The file contains:

;;; This is also a comment line (begins with ;)
Y,yes,no
X,yes,no
Z,yes,no
210,196,42,112
105,245,252,98

Here,

210 corresponds to Y=yes, X=yes, Z=yes;
196 corresponds to Y=yes, X=yes, Z=no;
42 corresponds to Y=yes, X=no, Z=yes;
and 112 corresponds to Y=yes, X=no, Z=no.
The last line has a similar meaning, except that Y=no.

The corresponding three way table would be:

Y=yes	Z
	yes	no

X	yes	210	196
X	no	42	112

Y=no	Z
	yes	no

X	yes	105	245
X	no	252	98
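
The ordering rule (first variable changes most slowly, last variable most quickly) is ordinary row-major indexing. As a sketch, with hypothetical names that are not taken from the Eikosogram source, the position of a cell in the flattened sequence of counts can be computed as follows.

```java
// Illustrative sketch only; names are hypothetical, not from the Eikosogram source.
class CellOrder {

    /**
     * Position of a cell in the flattened list of counts.
     * index[v] is the zero-based value index of variable v, and levels[v]
     * is the number of values that variable takes. The first variable
     * changes most slowly, the last most quickly.
     */
    static int flatIndex(int[] index, int[] levels) {
        int flat = 0;
        for (int v = 0; v < levels.length; v++) {
            flat = flat * levels[v] + index[v];
        }
        return flat;
    }

    public static void main(String[] args) {
        // Counts from the two by two by two example, read line by line.
        int[] counts = {210, 196, 42, 112, 105, 245, 252, 98};
        int[] levels = {2, 2, 2};  // Y, X, Z; value order is yes (0), no (1)

        int[] cell = {1, 1, 0};    // Y=no, X=no, Z=yes
        System.out.println("Count for Y=no, X=no, Z=yes: " + counts[flatIndex(cell, levels)]);
    }
}
```

For the cell Y=no, X=no, Z=yes the flattened position is 6 (counting from 0), which picks out the count 252, in agreement with the table above.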

Two more simple but multi-way examples.
- A two by two by two example 2x2x2
- A two by two by two by three example 2x2x2x3
Vietnam draft lotteries. Source: US Selective Service System
- Annotated version of the 1969 draw for the 1970 US draft. This file contains comments that are applicable to all files below.
- Unannotated versions of all Vietnam era lotteries:
  - The Dec 1, 1969 draw for the 1970 US draft. Largest lottery number called up was 195.
  - The July 1, 1970 draw for the 1971 US draft. Largest lottery number called up was 125.
  - The Aug 5, 1971 draw for the 1972 US draft. Largest lottery number called up was 88.
  - The Feb 2, 1972 draw for the 1973 US draft. No one called up that year.

In general, the input file will look like:

# Comments must appear on a line beginning either with the sharp sign #
# as in the statistical programming language S
;;; or with a semi-colon (one will do) as in the ANSI standard
;;; programming language Common Lisp and in the statistical
;;; system Quail.
### Either will do.
variable_name_1,value1,value2,...,valueN1
variable_name_2,value1,value2,...,valueN2
.
.
.
variable_name_M,value1,value2,...,valueNM
data_1_1,data_1_2,...,data_1_N
.
.
.
data_k_1,data_k_2,...,data_k_N

where k = N1 and N = N2*...*NM (i.e. the first index/variable changes most slowly).

Note again that there are no spaces between any of the data items.
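
For readers who want to produce or check such files programmatically, the following is a minimal sketch of a reader for this format. It is not the parser used by the application, and its names are hypothetical. It assumes that a line made up entirely of non-negative integers is a line of counts and that any other non-comment line declares a variable; that rule is an assumption of this sketch, not something stated above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; NOT the parser used by Eikosogram.jar.
// ASSUMPTION: a line whose fields are all non-negative integers holds counts;
// every other non-comment, non-blank line declares a variable and its values.
class CrossClassifiedReader {
    final List<String> names = new ArrayList<>();     // variable names, in file order
    final List<String[]> values = new ArrayList<>();  // values of each variable
    final List<Integer> counts = new ArrayList<>();   // cell counts, first variable slowest

    static CrossClassifiedReader parse(List<String> lines) {
        CrossClassifiedReader table = new CrossClassifiedReader();
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and comments beginning with '#' or ';'.
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue;
            }
            String[] fields = line.split(",");
            if (isAllIntegers(fields)) {
                for (String f : fields) {
                    table.counts.add(Integer.parseInt(f));
                }
            } else {
                table.names.add(fields[0].trim());
                String[] vals = new String[fields.length - 1];
                for (int i = 1; i < fields.length; i++) {
                    vals[i - 1] = fields[i].trim();
                }
                table.values.add(vals);
            }
        }
        // The number of counts must equal N1 * N2 * ... * NM.
        int expected = 1;
        for (String[] v : table.values) {
            expected *= v.length;
        }
        if (expected != table.counts.size()) {
            throw new IllegalArgumentException(
                "expected " + expected + " counts but found " + table.counts.size());
        }
        return table;
    }

    private static boolean isAllIntegers(String[] fields) {
        for (String f : fields) {
            if (!f.matches("\\d+")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "# This is a comment line (begins with #)",
            "Y,female,male",
            "X,yes,no",
            "40,30",
            "20,90");
        CrossClassifiedReader table = parse(lines);
        System.out.println("Variables: " + table.names);
        System.out.println("Counts: " + table.counts);
    }
}
```

The final check compares the number of counts read with the product of the numbers of values, which catches a missing or extra data line early.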
