Rdimtools is an R package for dimension reduction (DR) - including feature selection and manifold learning - and intrnsic dimension estimation (IDE) methods. We aim at building one of the most comprehensive toolbox available online, where current version delivers 141 DR algorithms and 17 IDE methods.
The philosophy is simple, the more we have at hands, the better we can play.
Our logo characterizes the foundational nature of multivariate data analysis; we may be blind people wrangling the data to see an elephant to grasp an idea of what the data looks like with partial information from each algorithm.
You can install a release version from CRAN:
or the development version from github:
## install.packages("devtools") devtools::install_github("kyoustat/Rdimtools")
Here is an example of dimension reduction on the famous
iris dataset. Principal Component Analysis (
do.pca), Laplacian Score (
do.lscore), and Diffusion Maps (
do.dm) are compared, each from a family of algorithms for linear reduction, feature extraction, and nonlinear reduction.
# load the library library(Rdimtools) # load the data X = as.matrix(iris[,1:4]) lab = as.factor(iris[,5]) # run 3 algorithms mentioned above mypca = do.pca(X, ndim=2) mylap = do.lscore(X, ndim=2) mydfm = do.dm(X, ndim=2, bandwidth=10) # visualize par(mfrow=c(1,3)) plot(mypca$Y, pch=19, col=lab, xlab="axis 1", ylab="axis 2", main="PCA") plot(mylap$Y, pch=19, col=lab, xlab="axis 1", ylab="axis 2", main="Laplacian Score") plot(mydfm$Y, pch=19, col=lab, xlab="axis 1", ylab="axis 2", main="Diffusion Maps")
Swill Roll is a classic example of 2-dimensional manifold embedded in ℝ3 and one of 11 famous model-based samples from
aux.gensamples() function. Given the ground truth that d = 2, let’s apply several methods for intrinsic dimension estimation.
# generate sample data set.seed(100) roll = aux.gensamples(dname="swiss") # we will compare 6 methods (out of 17 methods from version 1.0.0) vecd = rep(0,5) vecd = est.Ustat(roll)$estdim # convergence rate of U-statistic on manifold vecd = est.correlation(roll)$estdim # correlation dimension vecd = est.made(roll)$estdim # manifold-adaptive dimension estimation vecd = est.mle1(roll)$estdim # MLE with Poisson process vecd = est.twonn(roll)$estdim # minimal neighborhood information # let's visualize plot(1:5, vecd, type="b", ylim=c(1.5,2.5), main="true dimension is d=2", xaxt="n",xlab="",ylab="estimated dimension") xtick = seq(1,5,by=1) axis(side=1, at=xtick, labels = FALSE) text(x=xtick, par("usr"), labels = c("Ustat","correlation","made","mle1","twonn"), pos=1, xpd = TRUE)
We can observe that all 5 methods we tested estimated the intrinsic dimension around d = 2. It should be noted that the estimated dimension may not be integer-valued due to characteristics of each method.