This file contains Java code to build interactive eikosograms. The code is immediately executable after downloading and contains a number of built-in examples that illustrate various probabilistic concepts with simple eikosograms. In addition to the self-contained examples, any suitably formatted file of cross-classified data can be read in and its dependence structure displayed using eikosograms. Independence structure can be introduced visually by clicking on boxes that remove the barriers between categories, following the so-called "water container" metaphor.
The author of this code is Glenn Lee under the supervision of R.W. Oldford.
Eikosograms (from the Greek: eikos for chance and gramma for writing/drawing) are simple graphics that have been around for decades. They are related to Venn diagrams and to Mosaic plots. Much more information can be found in a series of papers by R.W. Oldford on these diagrams.
These diagrams are useful for introducing the axioms and rules of probability, including Bayes' rule. The Figure at left shows an example of two categorical variables. Here Y takes on two values, yes and no, and X takes on two values, black and white. The diagram shows various probabilities associated with these two random variables. In this diagram, it might be helpful to think of Y as being a response, and X as being an explanatory variate.
Areas, widths and heights are meaningful in an eikosogram. The diagram is a 1 by 1 square and so has total area of 1; this represents the total of the joint probabilities of X and Y. The diagram is divided vertically into two parts, of width 1/4 and 3/4 (left and right strips respectively). The area of each strip represents the marginal probability of X taking that value. So, for example, Pr(X=black) = 1/4, as recorded on the top axis of the eikosogram. Note that because the height of the diagram is 1, the marginal probabilities associated with X can be thought of as simply the width of each strip.
Joint probabilities are represented by the areas of the corresponding rectangles. For example, in the diagram, the area of the green rectangle on the bottom left equals the joint probability Pr(Y=yes & X = black). Similarly, the white rectangle at the upper right has area equal to the joint probability Pr(Y=no & X= white).
The rectangles have height equal to the corresponding conditional probability of Y given X. For example, the height of the bottom left (green) rectangle is Pr(Y=yes | X=black) and equals 2/3, as shown on the axis at the right (what is actually shown on the right is the height of each horizontal line in the diagram; there is a choice between showing the actual values or a simple axis).
Because the area of the rectangle is its height x width, we have the probabilistic relation that
Pr(Y=yes & X=black) = Pr(Y=yes | X=black) x Pr(X=black) = 2/3 x 1/4 = 1/6.
Alternatively, the ratio of the area of the bottom left (green) rectangle to the area of the entire leftmost strip is also the conditional probability (and is clearly the height of the green rectangle). Expressed this way we have:
Pr(Y=yes | X=black) = Pr(Y=yes & X=black) / Pr(X=black).
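The two relations above can be checked numerically. The following is a minimal sketch, not part of the Eikosogram software; the class and method names are hypothetical, and the values 1/4 and 2/3 are the ones quoted in the text.

```java
final class EikosogramRectangle {
    /** Joint probability = area of a rectangle = height (conditional) x width (marginal). */
    static double jointFromHeightAndWidth(double conditional, double marginal) {
        return conditional * marginal;
    }

    /** Conditional probability = rectangle area / strip area; the strip area equals its width. */
    static double conditionalFromAreas(double joint, double marginal) {
        if (marginal <= 0.0) {
            throw new IllegalArgumentException("marginal probability must be positive");
        }
        return joint / marginal;
    }

    public static void main(String[] args) {
        double width = 1.0 / 4.0;   // Pr(X=black), from the text
        double height = 2.0 / 3.0;  // Pr(Y=yes | X=black), from the text
        double area = jointFromHeightAndWidth(height, width);
        System.out.println("Pr(Y=yes & X=black) = " + area);
        System.out.println("Pr(Y=yes | X=black) = " + conditionalFromAreas(area, width));
    }
}
```

Dividing the area by the width recovers the height, which is the second relation read in reverse.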
The eikosogram is clearly asymmetric in its relationship between Y and X. The variable appearing on the vertical axis has its probabilistic structure expressed in terms of joint and conditional probabilities, while the variable on the horizontal axis has its structure expressed in terms of joint and marginal probabilities. Interchanging the role of the variables will yield a different eikosogram.
The Figure at left shows an eikosogram with the roles of X and Y interchanged. As before, the area of each rectangle equals a joint probability and its value is unchanged from the previous diagram, though its dimensions will have changed (typically). Note that the colouring follows the layout of the rectangles (bottom-most having the same dark colour, etc.) and not the values of the random variables.
Now, the widths of the strips correspond to the marginal probabilities of Y, and the heights of the rectangles to the conditional probabilities of X given Y. All probabilistic rules follow as before with the roles of Y and X interchanged.
Note that because the probabilities have been preserved via the areas of the rectangles, it must be the case, for example, that the area of the bottom left rectangle is identical in both diagrams (since both have Y=yes & X=black); all that has changed are its dimensions. This is because both areas equal Pr(Y=yes & X=black). Matching corresponding height x width formulas for this area gives
Pr(Y=yes | X=black) x Pr(X=black) = Pr(X=black | Y=yes) x Pr(Y=yes),
which rearranges to Bayes' rule: Pr(X=black | Y=yes) = Pr(Y=yes | X=black) x Pr(X=black) / Pr(Y=yes).
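Matching the two height x width expressions for the same area can be sketched in code as follows. This is an illustrative sketch only: the class and method names are hypothetical, and the value Pr(Y=yes | X=white) = 1/3 is an assumption made for the example (the text gives only Pr(X=black) = 1/4 and Pr(Y=yes | X=black) = 2/3).

```java
final class EikosogramFlip {
    /**
     * Given the Y-vs-X diagram (strip widths Pr(X=x) and heights Pr(Y=yes | X=x)),
     * returns Pr(X=x | Y=yes) for each x by dividing each green area by the total green area.
     */
    static double[] conditionalOfXGivenYes(double[] widths, double[] heights) {
        double marginalYes = 0.0;
        double[] joint = new double[widths.length];
        for (int i = 0; i < widths.length; i++) {
            joint[i] = heights[i] * widths[i]; // area of the bottom rectangle in strip i
            marginalYes += joint[i];           // Pr(Y=yes) is the total green area
        }
        if (marginalYes <= 0.0) {
            throw new IllegalArgumentException("Pr(Y=yes) must be positive");
        }
        double[] result = new double[widths.length];
        for (int i = 0; i < widths.length; i++) {
            result[i] = joint[i] / marginalYes;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] widths = {0.25, 0.75};            // Pr(X=black), Pr(X=white), from the text
        double[] heights = {2.0 / 3.0, 1.0 / 3.0}; // second height is an assumed value
        double[] flipped = conditionalOfXGivenYes(widths, heights);
        System.out.println("Pr(X=black | Y=yes) = " + flipped[0]);
        System.out.println("Pr(X=white | Y=yes) = " + flipped[1]);
    }
}
```

Under the assumed value, the green areas are 1/6 and 1/4, so Pr(Y=yes) = 5/12 and Pr(X=black | Y=yes) = 2/5.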
The software has many features. For binary Y especially, having the heights of the horizontal segments appear within the eikosogram might be preferred on occasion. The Figure below shows this case.
The software also provides some limited interaction with the eikosograms. Principally, this is through the small squares atop each division between values of the explanatory variate(s). Clicking on the square "removes" the barrier. In terms of probability, this has the effect of removing the distinction between the two neighbouring values of the explanatory variate. If the distinction is removed, the corresponding probabilities must be allocated without distinction as well. Clicking on the box a second time reintroduces the barrier.
The following QuickTime movie illustrates the interaction.
Consider only the left eikosogram (Y vs. X) first. Removing the barrier means removing the distinction between X=black and X=white. Consequently (ignoring the vertical line and the bottom axis for the moment), the height of the green bar is now the (marginal) probability Pr(Y=yes).
A simple metaphor might help. Imagine that the eikosogram is actually a container, originally split into two compartments according to whether X=black or X=white. In each compartment is an amount of green liquid whose height in each compartment is determined by the corresponding conditional probability. Removing the distinction between X=black and X=white corresponds to removing the barrier between the two compartments. With the barrier removed, the green liquid must settle to a new level; the total amount of liquid (or probability) remains unchanged. The new level is the marginal Pr(Y=yes).
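The settling of the liquid is a width-weighted average of the two original levels. The sketch below illustrates that arithmetic only; it is not taken from the Eikosogram source, the names are hypothetical, and the level 1/3 for the X=white compartment is an assumed value.

```java
final class WaterContainer {
    /** Level the liquid settles to when the barrier between two neighbouring compartments is removed. */
    static double settledLevel(double width1, double height1, double width2, double height2) {
        double totalLiquid = width1 * height1 + width2 * height2; // the amount of liquid is conserved
        return totalLiquid / (width1 + width2);                    // spread over the combined width
    }

    public static void main(String[] args) {
        double level = settledLevel(0.25, 2.0 / 3.0,  // X=black compartment, values from the text
                                    0.75, 1.0 / 3.0); // X=white compartment, height is assumed
        System.out.println("Pr(Y=yes) = " + level);
    }
}
```

With the two compartments spanning the whole unit square, the combined width is 1 and the settled level is simply the total green area, here 5/12.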
Pause the movie just after the box has first been clicked and the eikosogram is flat. Imagine that the barrier has been perforated rather than entirely removed, so that the vertical line remains (though more faintly) and we can again talk about the explanatory variate taking different values. The horizontal level of green is now identical when X=black and when X=white. That is, when the eikosogram looks like this, we have
Pr(Y=yes | X=black) = Pr(Y=yes | X=white) = Pr(Y=yes),
which is to say that Y and X are independent.
Given that flatness corresponds to independence, the interaction with the eikosogram through perforation of barriers can also be interpreted as the deliberate imposition of some (usually conditional) independence. Doing this will have consequences. The effect on other relations of asserting this independence is reflected in changes to the rightmost (X vs Y) eikosogram. As can be seen in the movie, asserting flatness in the left eikosogram forces flatness in the right.
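That flatness on the left forces flatness on the right can be verified with a few lines of arithmetic. The sketch below is an illustration under stated assumptions, not the software's own calculation: the names are hypothetical and the common level 5/12 is an assumed value. It computes the two heights of the X vs Y diagram, Pr(X=black | Y=yes) and Pr(X=black | Y=no), from the Y vs X diagram.

```java
final class FlatnessCheck {
    /**
     * Heights of the X-vs-Y diagram for X=black: {Pr(X=black | Y=yes), Pr(X=black | Y=no)}.
     * Inputs are the width Pr(X=black) and the two heights of the Y-vs-X diagram.
     */
    static double[] flippedHeights(double widthBlack, double yesGivenBlack, double yesGivenWhite) {
        double widthWhite = 1.0 - widthBlack;
        double yes = widthBlack * yesGivenBlack + widthWhite * yesGivenWhite; // Pr(Y=yes)
        if (yes <= 0.0 || yes >= 1.0) {
            throw new IllegalArgumentException("Pr(Y=yes) must lie strictly between 0 and 1");
        }
        double blackGivenYes = widthBlack * yesGivenBlack / yes;
        double blackGivenNo = widthBlack * (1.0 - yesGivenBlack) / (1.0 - yes);
        return new double[] {blackGivenYes, blackGivenNo};
    }

    public static void main(String[] args) {
        double level = 5.0 / 12.0; // assumed common level after perforating the barrier
        double[] heights = flippedHeights(0.25, level, level);
        System.out.println("Pr(X=black | Y=yes) = " + heights[0]);
        System.out.println("Pr(X=black | Y=no)  = " + heights[1]);
    }
}
```

When the two input heights are equal, both outputs reduce to Pr(X=black), so the X vs Y diagram is flat as well.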
In making this calculation, the water container (i.e. green liquid) metaphor was applied. So when a barrier is perforated the location of the barrier is preserved in the diagram where the perforation took place. In probabilistic terms, the marginal probabilities of the explanatory variate(s) in that diagram are preserved. Adjustments might be made to marginal probabilities in other diagrams.
Important note: Every time a barrier is perforated, the calculation uses the probabilities from the current state of the eikosograms. If all interaction is with respect to a single eikosogram, the effects will be unique. However, if actions are taken back and forth between different eikosograms of the data, the results will depend on the order of the actions taken. Again, all calculations are based on the water container metaphor with respect to the particular eikosogram and using the most recently determined values of all probabilities.
The code is available as an executable (platform independent) jar file called Eikosogram.jar. Download this file into a directory where the user has write permission, because the examples in the program write data out to files (with names like data34.txt etc.) in that directory. Once downloaded, the Java application can be run in the usual fashion.
Important Note: The Java source code is also available in case someone would like to make improvements. Further information on the source can be had here.
The user can access this web page directly from the applet through its "Help" menu.
The applet uses "eikoso.jpg" as its "icon". At present this appears only
in response to the "About Eikosogram ..." on this applet's "Help" menu.
However, it seems that in some instances it might be necessary to extract
the .jpg file from the .jar file for this to appear. This is done by typing:
In addition to the examples which the code contains, it is possible to input data from a file.
The input file contains the names and values of each categorical variable followed by the values (integer counts) of each cross-classified cell in order of the variables. The index of the first variable changes most slowly, that of the last variable most quickly.
Important Note: the delimiter between items on the same line is a comma; no blanks can appear between counts.
Some examples follow.
1. A two by two table. The file contains a comment line (optional, could be more than one) and 4 lines of data:
# This is a comment line (begins with #)
Y,female,male
X,yes,no
40,30
20,90
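Here, Y takes the values female and male and X takes the values yes and no. Each data line holds the counts for one value of Y (the first variable), with the value of X changing along the line.
The corresponding table would be:
| | X=yes | X=no |
|---|---|---|
| Y=female | 40 | 30 |
| Y=male | 20 | 90 |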
2. A two by two by two table. The file contains:
;;; This is also a comment line (begins with ;)
Y,yes,no
X,yes,no
Z,yes,no
210,196,42,112
105,245,252,98
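Here, Y, X and Z each take the values yes and no. Each data line holds the counts for one value of Y (the first variable); within a line, X changes more slowly than Z.
The corresponding three way table would be:
| | X=yes, Z=yes | X=yes, Z=no | X=no, Z=yes | X=no, Z=no |
|---|---|---|---|---|
| Y=yes | 210 | 196 | 42 | 112 |
| Y=no | 105 | 245 | 252 | 98 |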
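The ordering rule (first variable changes most slowly, last variable most quickly) is ordinary row-major indexing. The following sketch shows how a cell of Example 2 could be located once the counts have been read line by line into one array. The class and method names are hypothetical, and this is not the lookup used by the Eikosogram code itself.

```java
final class CellOrder {
    /** Position of a cell in the flattened counts; the first variable's index changes most slowly. */
    static int flatIndex(int[] levelIndex, int[] levelCounts) {
        int flat = 0;
        for (int v = 0; v < levelCounts.length; v++) {
            flat = flat * levelCounts[v] + levelIndex[v];
        }
        return flat;
    }

    public static void main(String[] args) {
        int[] counts = {210, 196, 42, 112, 105, 245, 252, 98}; // Example 2, read line by line
        int[] levels = {2, 2, 2};                               // Y, X and Z each have two values
        // Y=no (index 1), X=yes (index 0), Z=no (index 1)
        int cell = counts[flatIndex(new int[] {1, 0, 1}, levels)];
        System.out.println("count for Y=no, X=yes, Z=no: " + cell);
    }
}
```

For the cell chosen in main, the flat index is 5, which is the second entry of the second data line (245).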
In general, the input file will look like:
# Comments must appear on a line beginning either with the sharp sign #
# as in the statistical programming language S
;;; or with a semi-colon (one will do) as in the ANSI standard
;;; programming language Common Lisp and in the statistical
;;; system Quail.
### Either will do.
variable_name_1, value1,value2,...,valueN1
variable_name_2, value1,value2,...,valueN2
.
.
.
variable_name_M, value1,value2,...,valueNM
data_1_1,data_1_2,...,data_1_N
.
.
.
data_k_1,data_k_2,...,data_k_N
where k=N1 and N = N2*...*NM (i.e. the first index/variable changes most slowly).
Note again that there are no spaces between any of the data items.
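A reader for this format might look like the sketch below. It is an independent illustration, not the parser in Eikosogram.jar, and every name in it is hypothetical. It assumes that a line is a data line exactly when every comma-separated item consists only of digits, and that any other non-comment line declares a variable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EikosogramInput {
    final List<String[]> variables = new ArrayList<>(); // each entry: name followed by its values
    final List<Integer> counts = new ArrayList<>();     // cell counts in file order

    static EikosogramInput parse(List<String> lines) {
        EikosogramInput input = new EikosogramInput();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue; // blank line or comment
            }
            String[] items = line.split(",");
            if (isAllIntegers(items)) {
                for (String item : items) {
                    input.counts.add(Integer.parseInt(item));
                }
            } else {
                for (int i = 0; i < items.length; i++) {
                    items[i] = items[i].trim();
                }
                input.variables.add(items);
            }
        }
        return input;
    }

    /** True when every item is a run of digits (assumed test for a data line). */
    static boolean isAllIntegers(String[] items) {
        for (String item : items) {
            if (!item.matches("\\d+")) {
                return false;
            }
        }
        return true;
    }

    /** Number of cells implied by the variable declarations: N1 * N2 * ... * NM. */
    int expectedCells() {
        int product = 1;
        for (String[] variable : variables) {
            product *= variable.length - 1; // the first item is the variable name
        }
        return product;
    }

    public static void main(String[] args) {
        List<String> example = Arrays.asList(
            "# This is a comment line (begins with #)",
            "Y,female,male",
            "X,yes,no",
            "40,30",
            "20,90");
        EikosogramInput input = parse(example);
        System.out.println("variables: " + input.variables.size());
        System.out.println("counts: " + input.counts);
        System.out.println("cell count matches: " + (input.expectedCells() == input.counts.size()));
    }
}
```

Comparing expectedCells() with the number of counts read is a cheap way to catch a file whose data lines do not match its variable declarations.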