Predicting incomplete tables.

Why can't I predict my two-way table or its margins?

This is my response to an ASReml user who had trouble predicting a two-way table and its margins. Since the example is instructive, I have taken the liberty of sharing it for your interest.
 Title: exampledata.
  A  * # !I
  B  * # !I
  C    #!*10
  D    #!*10
  E  * #  !I
  F  * # !I
 # Check/Correct these field definitions.
 exampledata.csv  !SKIP 1
 tabulate C ~ B A F  !stats
 tabulate C  ~ B A   !stats
 tabulate C ~ A B    !stats
 tabulate C ~ F B    !stats

 C ~ mu F B D F.B F.D -B.D -F.B.D,         # Specify fixed model
      !r A           # Specify random model
A has 140 levels.
B has 6 levels.
D is a covariate.
E is not used, so it is ignored in these notes.
F has 4 levels.
The combinations of B, A and F define the individual observations.

It appears that levels of A are largely, but not completely, nested in levels of B: some levels of A appear in two different levels of B. So there are 140 levels of A and 157 levels in A.B.
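The two level counts can be checked by counting distinct A labels and distinct (A, B) pairs: the second count exceeds the first exactly when some level of A occurs under more than one level of B. A minimal Python sketch on made-up records (the labels below are hypothetical and are not taken from exampledata.csv):

```python
from collections import defaultdict

# Hypothetical (A, B) labels, one pair per observation; NOT the real data.
records = [("a1", "b1"), ("a1", "b1"), ("a2", "b1"),
           ("a3", "b2"), ("a3", "b3"), ("a4", "b3")]

# Collect the set of B levels seen with each level of A.
b_levels_by_a = defaultdict(set)
for a, b in records:
    b_levels_by_a[a].add(b)

n_a = len(b_levels_by_a)                               # distinct levels of A
n_ab = sum(len(bs) for bs in b_levels_by_a.values())   # distinct A.B combinations
not_nested = sorted(a for a, bs in b_levels_by_a.items() if len(bs) > 1)

print(n_a, n_ab, not_nested)
```

If A were completely nested in B, the two counts would be equal and the last list would be empty.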

Use of !I says 'treat the values in the data file as labels rather than directly as codes'. However, it seems that they should be taken directly as level codes, so I have changed !I to * so that the levels appear in natural order.

The analysis of the example gives
          - - - Results from analysis of C - - -

          Approximate stratum variance decomposition
 Stratum     Degrees-Freedom   Variance      Component Coefficients

 Source                Model  terms     Gamma     Component    Comp/SE   % C
 A                       140    140  0.102574E-05  0.133539E-08   0.00   0 B
 Variance                280    248   1.00000      0.130188E-02  11.14   0 P
 Warning: Code B - fixed at a boundary (!GP)       F - fixed by user
               ? - liable to change from P to B    P - positive definite
               C - Constrained by user (!VCC)      U - unbounded
               S - Singular Information matrix
 S means there is no information in the data for this parameter.
 Very small components with Comp/SE ratios of zero sometimes indicate poor
           scaling.  Consider rescaling the design matrix in such cases.

                                   Wald F statistics
     Source of Variation           NumDF    DenDFcn Fic    Fcn M Pcn
   7 mu                                1     248.0  1281.91   283.34 . <.001
   6 F                                 3     248.0   201.57   192.33 A <.001
   2 B                                 5     248.0    11.44    11.72 A <.001
   4 D                                 1     248.0     3.31     0.06 A 0.801

   8 F.B                               9     248.0     3.89     3.40 b <.001
   9 F.D                               3     248.0     1.06     0.98 B 0.405
  10 B.D                               4     248.0     1.18     1.18 B 0.323
  11 F.B.D                             6     248.0     1.26     1.26 C 0.277
 Notice: The DenDF values are calculated ignoring fixed/boundary/singular
             variance parameters using algebraic derivatives.
   1 A                                   140 effects fitted (       2 are zero)
This shows no variance component associated with A; a big effect of F and B, and of their interaction; and no effect of D. So you wanted to predict these tables. Tabulation shows 18 combinations of F and B:
   F1   B1  B2  B3  B4  B5  B6
   F2   B1  B2  B3  -   -   -
   F3   B1  B2  B3  B4  B5  B6
   F4   -   -   -   B4  B5  B6
Surprisingly, only 14 combinations are reported from predict F B.

The ones missing are F1B6, F3B6, F4B5 and F4B6, despite the fact that the DF are correct (1 + 3 + 5 + 9 = 18).
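The cell and DF counts quoted above can be reproduced from the tabulation alone. The following Python sketch is only bookkeeping on the F x B table as printed in this note; the variable names are illustrative and nothing here calls ASReml:

```python
# Cells of the F x B table that contain data, read off the tabulation above.
b_levels = ["B1", "B2", "B3", "B4", "B5", "B6"]
present = {
    "F1": ["B1", "B2", "B3", "B4", "B5", "B6"],
    "F2": ["B1", "B2", "B3"],
    "F3": ["B1", "B2", "B3", "B4", "B5", "B6"],
    "F4": ["B4", "B5", "B6"],
}

n_f, n_b = len(present), len(b_levels)
n_cells = sum(len(bs) for bs in present.values())   # filled cells
empty = [(f, b) for f in present for b in b_levels if b not in present[f]]

# Fixed-effect DF: mu + F main effects + B main effects, and the
# remaining cell contrasts, which are left for F.B.
main_df = 1 + (n_f - 1) + (n_b - 1)
fb_df = n_cells - main_df

# Cells that contain data but were not reported by predict F B.
not_predicted = [("F1", "B6"), ("F3", "B6"), ("F4", "B5"), ("F4", "B6")]
assert all(b in present[f] for f, b in not_predicted)
n_predicted = n_cells - len(not_predicted)

print(n_cells, len(empty), fb_df, n_predicted)
```

All four unreported cells are cells that do contain data, which is why their absence is unexpected.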

My first guess was that this might be a scaling effect, but multiplying by 10 did not solve the problem.

My second guess was that it was associated with the non-significant (NS) D regressions. Dropping the F.B.D and B.D model terms resolved the problem. Looking at the ANOVA (Wald F statistics) table again, we see that these terms were deficient in DF (B.D had 4 not 5, F.B.D had 6 not 9), so these singularities were sufficient to make some cells not estimable.
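The DF shortfall can be tallied as follows. This Python sketch assumes that, at full rank, the terms involving D would together give one slope per filled cell of the F x B table (18 in all); the reported values are copied from the Wald F table above:

```python
# DF reported in the Wald F table for the terms involving the covariate D.
reported = {"D": 1, "F.D": 3, "B.D": 4, "F.B.D": 6}

n_f, n_b, n_cells = 4, 6, 18   # levels of F, levels of B, filled F x B cells

# DF each term would have if a separate slope were estimable in every
# filled cell (an assumption of this sketch).
full = {"D": 1, "F.D": n_f - 1, "B.D": n_b - 1}
full["F.B.D"] = n_cells - sum(full.values())

shortfall = {term: full[term] - reported[term] for term in full}

print(full)
print(shortfall, sum(shortfall.values()), sum(reported.values()))
```

The slope terms total 14 DF rather than 18, a shortfall of four; four cells were also missing from the prediction. Whether the two correspond one-to-one is not established by this arithmetic.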

Now concerning the F and B tables: given the 6 missing cells, there is no standard way to calculate the margins (except for F1 and F3, which are complete).

There are two possibilities in ASReml, but you must determine which, if either, is valid.
  • I have added !FITMARGIN to PREDICT F B, and this generates marginal means from the F B table assuming that interaction effects associated with missing cells are zero.
  • I have added !PRESENT F B to the other two predict statements, so that marginal means are calculated just from those cells in the row/column of the F x B table which are present.

    Neither of these approaches is necessarily appropriate or reasonable. Given the large F effects, B means using the PRESENT strategy will be confounded with the F effects (at least for comparisons between the B1 B2 B3 set and the B4 B5 B6 set).
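The confounding can be seen on a toy table. The Python sketch below uses hypothetical cell means with large F effects and no B effect at all, laid out with the same pattern of empty cells as the example. It illustrates the two ideas only and is not ASReml's algorithm: in particular, the zero-interaction fill-in takes row offsets from the complete row F1 instead of refitting a model.

```python
F = ["F1", "F2", "F3", "F4"]
B = ["B1", "B2", "B3", "B4", "B5", "B6"]
present = {"F1": B, "F2": B[:3], "F3": B, "F4": B[3:]}

# Hypothetical additive cell means: large F effects, no B effect at all.
f_effect = {"F1": 0.0, "F2": 4.0, "F3": 1.0, "F4": -4.0}
cell = {(f, b): 10.0 + f_effect[f] for f in F for b in present[f]}

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

# Strategy 1: average only the cells that are present in each column.
b_present = {b: mean(cell[f, b] for f in F if (f, b) in cell) for b in B}

# Strategy 2: fill empty cells assuming zero interaction, then average
# over all rows. Row offsets are measured against the complete row F1.
offset = {f: mean(cell[f, b] - cell["F1", b] for b in present[f]) for f in F}
filled = {(f, b): cell.get((f, b), cell["F1", b] + offset[f])
          for f in F for b in B}
b_filled = {b: mean(filled[f, b] for f in F) for b in B}

for b in B:
    print(b, round(b_present[b], 3), round(b_filled[b], 3))
```

With these made-up numbers the present-cells means differ between the B1 B2 B3 set and the B4 B5 B6 set although no B effect was built in, while the filled table gives the same mean for every B. The fill-in is only as good as the zero-interaction assumption, which the significant F.B term in the example calls into question.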

    ARG 25 Oct 2008
