This is an R package by Catherine Hurley and myself, available from CRAN. Below are some additional materials on this package that might not be available on CRAN.

- Tutorial material from a tutorial given in Tokyo at the Institute of Statistical Mathematics:
- TutorialTalk.pdf
- Might need to load this first: guidedpcp.R
- Tutorial demo script: Tutorial1.R

This is an interactive Java application that displays eikosograms, which are useful for teaching probability.

Quail is a free extension to ANSI Common Lisp that runs on Macintoshes and Windows machines. More info can be had at the Quail site.

- This is the segmentation data from the UCI Machine Learning Repository. I have put this data set into a form that is ready for use in Quail (and in S/R below), which I call the pixels dataset:
- A summary of the data. Briefly, there are 7 classes, 19 continuous measurements, 210 observations in the training set and 2100 in the test set. Each observation is a pixel taken from an image; the measurements are characteristics of that pixel and its neighbours and the class of the pixel is the part of the image it comes from (e.g. CEMENT, PATH, FOLIAGE, SKY, etc.).
- The training data: pixels-train.lsp

- The checker data (see below in the S language part) is used here in Quail to show how a linear discriminant can still work on data that are not linearly separable, provided the variables are expanded with appropriate functions of the original explanatory variates.
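The same idea can be sketched in R (rather than Quail). This is only an illustration under stated assumptions: the checker data itself is not reproduced here, so the code generates a synthetic 2 x 2 checker pattern as a stand-in, and the variable names `x1`, `x2`, and `cls` are hypothetical. It uses `lda()` from the MASS package.

```r
library(MASS)  # provides lda()

set.seed(1)
n  <- 400
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)

# Assumed stand-in for the checker data: a 2 x 2 checker pattern
# in which the class depends on the sign of x1 * x2.
cls     <- factor(ifelse(x1 * x2 > 0, "A", "B"))
checker <- data.frame(x1, x2, cls)

# Linear discriminant on the original variates only.
fit_linear <- lda(cls ~ x1 + x2, data = checker)

# Linear discriminant after expanding the variates with a product term.
fit_expanded <- lda(cls ~ x1 + x2 + I(x1 * x2), data = checker)

# Proportion of training points classified correctly.
accuracy <- function(fit) {
  mean(predict(fit, checker)$class == checker$cls)
}

acc_linear   <- accuracy(fit_linear)
acc_expanded <- accuracy(fit_expanded)
c(linear = acc_linear, expanded = acc_expanded)
```

The discriminant is still linear in the expanded variates; only the product term makes the two classes separable by a hyperplane.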

Directory containing Quail-code described here.

- Quail-code for class heights and mixtures of univariate Gaussians (normals)

S is a statistical programming language developed at Bell Labs from the 1970s to the present time (see history of S). Splus and R are statistical systems based on separate implementations of the S language. The department here has a page containing some useful information on the language and its R implementation.

Below is a bunch of code written mostly for classroom/course uses. It has been written in the S language and tested only in the R implementation of S.

For those who are new to S there are a few classic mistakes that are easily made. Some of these are recorded here. For the more adventurous, take care to follow the scoping rules of S (some examples).
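As a small illustration of scoping, here is a minimal sketch of how the R implementation resolves variables. It is not taken from the examples linked above; the function names are hypothetical. R uses lexical scoping, so a function looks up its free variables in the environment where it was defined, not where it is called. Other implementations of S may behave differently.

```r
# A function that returns a function. The inner function keeps access
# to `count` in the environment where it was created.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # updates `count` in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()  # first call
counter()  # second call; `count` persists between calls

# Free variables are found where the function was defined.
x <- 10
f <- function() x   # `x` is free in f, so it is looked up globally
g <- function() {
  x <- 20           # local to g; f does not see this value
  f()
}
g_result <- g()     # f uses the global x, not the x inside g
```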

- This is the segmentation data from the UCI Machine Learning Repository. I have put this data set into a form that is ready for use in S/R, which I call the pixels dataset:
- A summary of the data. Briefly, there are 7 classes, 19 continuous measurements, 210 observations in the training set and 2100 in the test set. Each observation is a pixel taken from an image; the measurements are characteristics of that pixel and its neighbours and the class of the pixel is the part of the image it comes from (e.g. CEMENT, PATH, FOLIAGE, SKY, etc.).
- The training data: pixels.train. The S dataframe is called pixels.tr and has 210 observations.
- The test data: pixels.test. The S dataframe is called pixels.te and has 2100 observations.

- A function to produce nuggets data.
- The Gauss2 data from the notes.
- The checker board data from the notes.

Directory containing R-code.

- Various sorts implemented for classroom demonstration only (compares timing of different sorting algorithms as a function of the proportion of the data which are already in order before the sort is called).
- Dimension reduction methods illustrated on the segmentation (or pixels) data set. This includes principal components and crimcoords.
- Code illustrating principal components analysis on the checker data. This is useful only for illustration of the S-code and the action of the principal components on the original data. A slightly more useful example (i.e. having more dimensions) is given here for the Iris data.
- K nearest neighbour predictors for the checker data and again for the nuggets data.
- Recursive partitioning (classification trees) applied to the Iris data.
- Multinomial (and other) modelling of Fisher's (Anderson's) Iris data. This includes some discussion of the construction of data frames.
- Neural net analyses of the checker board data. Plus the ad hoc model averaging for the neural nets.
- Logistic, logistic GAM, and other analyses of the checker board data.
- The logistic, lda, qda code.
- The logistic results on the test data.
- The mixture model discriminant code.
- The logistic GAM code.
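For the principal components items in the list above, a minimal sketch on R's built-in `iris` data (not the course code itself) shows the action of the components on the original data. It assumes an unscaled analysis using `prcomp()` from base R's stats package.

```r
# The four continuous measurements of the Iris data.
X <- as.matrix(iris[, 1:4])

# Principal components of the centred (but unscaled) data.
pc <- prcomp(X, center = TRUE, scale. = FALSE)

# Proportion of total variance carried by each component.
var_explained <- pc$sdev^2 / sum(pc$sdev^2)
round(var_explained, 3)

# The scores are the centred data rotated by the loadings.
centred       <- scale(X, center = TRUE, scale = FALSE)
manual_scores <- centred %*% pc$rotation
scores        <- pc$x

# First two components, coloured by species.
plot(scores[, 1:2], col = as.integer(iris$Species),
     main = "Iris data: first two principal components")
```

The squared `sdev` values equal the eigenvalues of the sample covariance matrix of `X`, which is one way to check the result by hand.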

- Discriminant functions with class probabilities example (not done in class).
- R-code from the class on quadratic discriminant functions.
- R-code from the class on Linear discriminant function theory.
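As a rough companion to the discriminant function items above, the following sketch fits linear and quadratic discriminants and extracts class probabilities. It is not the class code: it assumes the built-in `iris` data and the `lda()` and `qda()` functions from the MASS package.

```r
library(MASS)  # provides lda() and qda()

# Linear and quadratic discriminant analysis on the Iris data.
fit_lda <- lda(Species ~ ., data = iris)
fit_qda <- qda(Species ~ ., data = iris)

# predict() returns the predicted class and the posterior
# class probabilities for each observation.
pred_lda <- predict(fit_lda, iris)
pred_qda <- predict(fit_qda, iris)

post_lda <- pred_lda$posterior
post_qda <- pred_qda$posterior
head(round(post_lda, 3))

# Confusion tables on the training data (an optimistic assessment).
table(predicted = pred_lda$class, actual = iris$Species)
table(predicted = pred_qda$class, actual = iris$Species)
```

Because the same data are used to fit and to assess, the tables understate the error that would be seen on a separate test set.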