
How many principal components are needed


This can be investigated from three perspectives:


I suggest looking at the Information Content (IC) of each PC, which is printed in the log file after each biplot is generated and looks like this:


PC   Singular Value   % of Total SS   Information Content (IC)
1    19.664           62.9            8.177
2    7.833            10              1.3
3    6.912            7.8             1.014
4    4.536            3.3             0.429
5    3.449            1.9             0.247
6    1.628            0.4             0.052


A PC is information rich if its IC is greater than 1. In the above example, it appears that 2 PCs are sufficient, but PC3 (IC = 1.014, just above 1) still carries some information. Experience indicates that 3 PCs, and thus a 3-D biplot, are usually sufficient for most datasets.
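
The IC values in the example log are consistent with multiplying each PC's share of the total SS by 13 (for example, 0.629 × 13 = 8.177). That would be the case if IC is the proportion explained times the total number of PCs. The sketch below assumes that definition; it is inferred from the printed values rather than taken from the program's documentation, and the function names are hypothetical.

```python
# Assumption: IC = (proportion of total SS explained by a PC) * (number of PCs).
# n_pcs = 13 is inferred from the example log; the log itself does not state it.

def information_content(pct_of_total_ss, n_pcs):
    """Return the IC of each PC from its percentage of the total SS."""
    return [round(p / 100.0 * n_pcs, 3) for p in pct_of_total_ss]


def information_rich(ic_values, threshold=1.0):
    """Return the 1-based indices of PCs whose IC exceeds the threshold."""
    return [i for i, ic in enumerate(ic_values, start=1) if ic > threshold]


# "% of Total SS" column from the example log (PC1 to PC6).
pct = [62.9, 10, 7.8, 3.3, 1.9, 0.4]
ic = information_content(pct, n_pcs=13)
rich = information_rich(ic)

print(ic)    # IC per PC
print(rich)  # PCs with IC > 1
```

Under this assumption the computed values reproduce the IC column of the log, and PC1 to PC3 are the PCs with IC above 1, with PC3 only marginally so.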



The following may also be relevant to this question:


1) One approach is to examine the biplot based on Scaling = 1, i.e., scaled by the tester standard deviation. In such a biplot, all testers are assumed to be equally important (each with a variance of 1). Therefore, all testers should have vectors of equal length if the biplot adequately approximates the tester-standardized data. Conversely, if some testers have considerably shorter vectors than others, the biplot does not fully display the patterns for those testers.  The following GGE biplot is based on this scaling for the genotype by environment dataset:



It can be seen that all testers have almost the same vector length (except for Tester 3), indicating that the biplot is adequate. However, this may not be relevant to the adequacy of a biplot based on another scaling method.
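
The vector-length check can be sketched as follows, assuming the tester coordinates on PC1 and PC2 have been read from a Scaling = 1 biplot. The coordinates are made-up illustrative values, not output from the program, and the 0.8 cutoff for "considerably shorter" is an arbitrary assumption.

```python
import math
from statistics import median


def vector_lengths(coords):
    """Map each tester name to the length of its vector from the biplot origin."""
    return {name: math.hypot(x, y) for name, (x, y) in coords.items()}


def short_testers(coords, ratio=0.8):
    """Return testers whose vector is shorter than ratio * median length."""
    lengths = vector_lengths(coords)
    cutoff = ratio * median(lengths.values())
    return sorted(name for name, length in lengths.items() if length < cutoff)


# Hypothetical (PC1, PC2) tester coordinates, for illustration only.
testers = {
    "T1": (0.90, 0.30),
    "T2": (0.85, -0.40),
    "T3": (0.40, 0.20),
    "T4": (0.70, 0.65),
    "T5": (0.95, 0.10),
}
flagged = short_testers(testers)
print(flagged)
```

With these illustrative values only T3 is flagged, which corresponds to a tester whose patterns are not fully displayed in the two-dimensional biplot.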


2) Another approach is to examine a secondary biplot, in addition to the primary biplot of PC1 vs. PC2, and see whether it shows any additional patterns. There should be no clear pattern in a secondary biplot if the primary biplot has sufficiently summarized the data. For example, the biplot of PC3 vs. PC4 in this example does not reveal any obvious patterns (except that Tester 3 seems to differ from all other testers), suggesting that the primary biplot is adequate. This is the same conclusion as that from the first approach:



3) Along the same lines, if the primary biplot does not reveal any patterns, then there are no patterns in the data. The question of whether the biplot adequately displays the patterns of the data becomes irrelevant, and the search for patterns should stop.


In depth: What if two PC's are not sufficient