Principal component analysis


Principal component analysis (PCA) is a well-known multivariate analysis technique that summarizes data in a reduced number of dimensions. Its focus is on the interrelationships among testers (variables) rather than among entries (observations). Therefore, PCA generates a "plot" of testers rather than a "biplot" of both testers and entries.


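Mathematically, PCA decomposes the covariance matrix of a data matrix into two parts: eigenvalues and column eigenvectors. Singular Value Decomposition (SVD), by contrast, decomposes the matrix per se into three parts: singular values, column eigenvectors, and row eigenvectors. The two are related as follows: when both are applied to the same column-centered matrix, the eigenvalues are proportional to the squares of the singular values (equal to them if the cross-product matrix is decomposed, or equal after division by n - 1 if the sample covariance matrix of n entries is decomposed), and the column eigenvectors are the same for both, up to sign.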
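
The relationship described above can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original help text; the data are randomly generated, and the variable names are illustrative assumptions. It column-centers a small entries-by-testers matrix, then compares the eigen-decomposition of its sample covariance matrix with its SVD.

```python
import numpy as np

# Made-up data: 8 entries (rows) by 3 testers (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
Xc = X - X.mean(axis=0)            # center each tester (column)
n = Xc.shape[0]

# PCA route: eigenvalues and eigenvectors of the sample covariance matrix.
cov = np.cov(Xc, rowvar=False)     # np.cov divides by n - 1
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]  # eigh returns ascending order; sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# SVD route: singular values plus row and column singular vectors.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigenvalues equal the squared singular values divided by n - 1.
same_values = np.allclose(eigvals, s**2 / (n - 1))
# Column eigenvectors agree up to an arbitrary sign per component.
same_vectors = np.allclose(np.abs(eigvecs), np.abs(Vt.T))
print(same_values, same_vectors)
```

The sign comparison uses absolute values because each eigenvector or singular vector is only determined up to a factor of -1.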


Gabriel (1971) was the first to propose the use of biplots to display the results of PCA, using the term PCA interchangeably with SVD. In a PCA biplot, the row vectors are loadings on the column eigenvectors (regression coefficients of the rows of the original matrix on the column eigenvectors).


The definition of PCA is rather ambiguous in the literature; different researchers use it variously to refer to model 0, model 1, or model 2 (see references on this). When referring to a particular biplot or analysis, it is more accurate to say "SVD after a specific centering and scaling method" than "PCA".
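
To illustrate why the centering and scaling need to be stated, here is a hedged Python/NumPy sketch. The function name svd_after, its option names, and the data are hypothetical and chosen for illustration only; no mapping between these options and the model numbers above is implied.

```python
import numpy as np

def svd_after(X, center="column", scale=False):
    """SVD of X after an explicitly stated centering and scaling (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    if center == "grand":
        X = X - X.mean()               # subtract the grand mean
    elif center == "column":
        X = X - X.mean(axis=0)         # subtract each tester (column) mean
    elif center != "none":
        raise ValueError("center must be 'none', 'grand', or 'column'")
    if scale:
        X = X / X.std(axis=0, ddof=1)  # divide each column by its standard deviation
    return np.linalg.svd(X, full_matrices=False)

# Made-up 4 entries by 3 testers table.
data = np.array([[1.0, 2.0, 6.0],
                 [2.0, 4.0, 5.0],
                 [3.0, 7.0, 1.0],
                 [4.0, 8.0, 3.0]])

s_none = svd_after(data, center="none")[1]
s_col = svd_after(data, center="column")[1]
print(s_none)
print(s_col)
```

The two sets of singular values differ, so the resulting plots would differ as well, even though either analysis might loosely be called "PCA".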