The book: A Handbook of Statistical Analyses Using R
2008/10/29
rle and inverse.rle
rle returns the run-length encoding of a vector as a list: lengths holds the length of each run, and values holds the value repeated in that run.
inverse.rle reconstructs the original vector from that encoding.
example:
> x <- c(1,0,0,1,1,1,1,1,0,0,0)
> (tmp <- rle(x))
Run Length Encoding
  lengths: int [1:4] 1 2 5 3
  values : num [1:4] 1 0 1 0
> inverse.rle(tmp)
[1] 1 0 0 1 1 1 1 1 0 0 0
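The lengths and values components can be combined to answer simple questions about runs. The following is a minimal sketch using the same vector; the names enc, longest_ones, run_start, and start_of_longest are only illustrative.

```r
x <- c(1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0)
enc <- rle(x)

# Length of the longest run of 1s
longest_ones <- max(enc$lengths[enc$values == 1])

# Start position of every run, then the start of the longest run overall
run_start <- cumsum(c(1, head(enc$lengths, -1)))
start_of_longest <- run_start[which.max(enc$lengths)]

# Encoding followed by decoding gives back the original vector
roundtrip_ok <- identical(inverse.rle(enc), x)
```

Here the longest run of 1s has length 5 and starts at position 4.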
Application: Finding peaks in vector
getPeaks <- function(x) {
  tmp <- rle(diff(x) > 0)
  change.index <- cumsum(c(1, tmp$lengths))
  ind <- change.index[which(tmp$values == FALSE)]
  return(ind)
}
> x <- c(rnorm(300), rnorm(700, 5, 2))
> d1 <- density(x)
> ind <- getPeaks(d1$y)
> plot(d1, xlab = "", ylab = "")
> ind
[1] 149 279
> points(d1$x[ind], d1$y[ind], pch = 16, col = "red")
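To see how getPeaks behaves, it helps to trace it on short vectors where the answer is obvious. The sketch below repeats the function so that it runs on its own; the test vectors are made up for illustration.

```r
getPeaks <- function(x) {
  tmp <- rle(diff(x) > 0)                     # runs of rising / not rising
  change.index <- cumsum(c(1, tmp$lengths))   # index where each run starts
  change.index[which(tmp$values == FALSE)]    # starts of the non-rising runs
}

# Rises to 3, falls, rises to 5, falls: peaks at positions 2 and 4
peaks_a <- getPeaks(c(1, 3, 2, 5, 4))

# A vector that starts by falling: position 1 is reported as well
peaks_b <- getPeaks(c(5, 4, 3, 6, 2))
```

The second call shows a boundary effect: when the vector begins with a decreasing run, its first element is returned as a peak, so the first index may need to be dropped depending on the application.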
2008/07/23
Hough Transformation
Ideal sample
library(PET)
testLine <- matrix(0, ncol=100, nrow=100)
diag(testLine) <- 1
ph <- hough(testLine)
viewData(list(testLine, ph$hData), list("line","houghTrans"))
Wafer Map:
library(rimage)
library(pixmap)
tmp <- read.jpeg("xxx.jpg")
tmp1 <- pixmapRGB(c(tmp[, 1:400, 1], tmp[, 1:400, 2], tmp[, 1:400, 3]), 400, 400)
tmp2 <- as(tmp1, "pixmapGrey")
tmp22 <- 1- tmp2@grey
hh <- hough(tmp22)
viewData(list(tmp2@grey, tmp22, hh$hData), list("W1", "W1 inverse","Hough Transformation"))
tmp22f <- matrix(ifelse(unlist(tmp22)<0.3,0,1),nrow=400,ncol=400)
hp <- hough(tmp22f)
viewData(list(tmp2@grey,tmp22f,hp$hData),list("Original Map","Pattern Map","Hough Transformation"))
2008/07/02
R by
unique returns the distinct values in their order of first appearance.
by returns its results in the order of the factor levels, which are sorted by default.
example:
> tmppar <- paste("X", c(1,1,1,3,3,8,8,6,6,5,5,2,2,7,7,4,4),sep="")
> tmppar
[1] "X1" "X1" "X1" "X3" "X3" "X8" "X8" "X6" "X6" "X5" "X5" "X2" "X2" "X7" "X7" "X4" "X4"
> unique(tmppar)
[1] "X1" "X3" "X8" "X6" "X5" "X2" "X7" "X4"
> tmp1 <- by(rnorm(length(tmppar)),tmppar,FUN=min)
> tmp1
INDICES: X1
[1] -0.7628715
-------------------------------------------------------------------------------------------------
INDICES: X2
[1] -0.5590477
-------------------------------------------------------------------------------------------------
INDICES: X3
[1] 1.214794
-------------------------------------------------------------------------------------------------
INDICES: X4
[1] 0.445656
-------------------------------------------------------------------------------------------------
INDICES: X5
[1] -0.8465884
-------------------------------------------------------------------------------------------------
INDICES: X6
[1] -1.096506
-------------------------------------------------------------------------------------------------
INDICES: X7
[1] -0.2734341
-------------------------------------------------------------------------------------------------
INDICES: X8
[1] 0.4012138
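If the groups should come back in order of first appearance rather than sorted, one option is to fix the factor levels before calling by. The sketch below assumes that approach; val is a deterministic stand-in for the random values used in the example, and grp, val, and f are illustrative names.

```r
grp <- paste("X", c(1,1,1,3,3,8,8,6,6,5,5,2,2,7,7,4,4), sep = "")
val <- seq_along(grp)   # stand-in data so the result is reproducible

# Default: the character index becomes a factor with sorted levels
res_sorted <- by(val, grp, FUN = min)
order_sorted <- dimnames(res_sorted)[[1]]

# Fix the levels to the order of first appearance, as unique() reports it
f <- factor(grp, levels = unique(grp))
res_first <- by(val, f, FUN = min)
order_first <- dimnames(res_first)[[1]]
```

With the levels fixed this way, order_first matches unique(grp), while order_sorted runs from X1 to X8.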
2008/05/08
Canonical Correlation Analysis
Canonical correlation analysis (CCA) addresses the model AX = BY, where X and Y each have more than one dimension: it looks for linear combinations of X and of Y that are maximally correlated.
The R code for this:
library(CCA)
# X and Y are numeric matrices with the same number of rows
res <- cc(X, Y)
# compare the X and Y coefficients for the same component
ccaPlot <- function(res, choice = 1) {
  par(mfrow = c(2, 1))
  barplot(res$xcoef[, choice], las = 2, main = paste("Coefficient of PC", choice))
  barplot(res$ycoef[, choice], las = 2)
}
Sample graphic
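For readers without the CCA package, the same idea can be tried with cancor from base R's stats package. This is a sketch on simulated data, not the analysis from the post; the variables z, X, Y, U1, and V1 are made up for illustration.

```r
set.seed(1)
n <- 200
z <- rnorm(n)   # shared signal that links the two variable sets
X <- cbind(x1 = z + rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
Y <- cbind(y1 = z + rnorm(n), y2 = rnorm(n))

fit <- cancor(X, Y)   # stats::cancor, available in base R

# First pair of canonical variates: centre the data, then apply the
# first column of each coefficient matrix
U1 <- scale(X, center = fit$xcenter, scale = FALSE) %*% fit$xcoef[, 1]
V1 <- scale(Y, center = fit$ycenter, scale = FALSE) %*% fit$ycoef[, 1]

first_cor <- fit$cor[1]
check_cor <- abs(cor(U1, V1)[1, 1])
```

The correlation between U1 and V1 should agree with the first canonical correlation in fit$cor, which is a quick sanity check on how the coefficients are being read.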
2008/04/22
Using Parallel Coordinate Plot for variable selection
To detect critical parameters, we can recast the problem as a high-dimensional data problem. The parallel coordinate plot is well suited to visualizing high-dimensional data, but it lacks an index that points out which variables are important, so it is best used as a first filtering step.
To construct the chart, we put the response variable in the first column, sort the data by the response variable, and color the lines by group so that patterns are easier to detect.
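The construction described here can be sketched with parcoord from the MASS package. The helper prepareParcoord is hypothetical, and mtcars with mpg as the response is only stand-in data; the post does not say which data or plotting function was used.

```r
library(MASS)   # parcoord() comes with the recommended MASS package

# Hypothetical helper: response first, rows sorted by response,
# and a group label per row for coloring
prepareParcoord <- function(data, response, ngroups = 3) {
  ord <- order(data[[response]])
  sorted <- data[ord, c(response, setdiff(names(data), response))]
  # Split the sorted rows into roughly equal-sized groups
  grp <- cut(seq_len(nrow(sorted)), breaks = ngroups, labels = FALSE)
  list(data = sorted, group = grp)
}

# Stand-in data: mtcars, with mpg playing the role of the response
pc <- prepareParcoord(mtcars, "mpg", ngroups = 3)
cols <- c("blue", "grey50", "red")[pc$group]
parcoord(pc$data, col = cols)
```

Variables whose lines separate cleanly by color are candidates for a closer look; variables where the colors are thoroughly mixed are less likely to matter.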
2008/02/25
Data normality test
To test whether data come from a normal distribution, the Shapiro-Wilk test is generally more powerful than the alternatives. However, shapiro.test in R has a restriction: the sample size must be at most 5000.
So for more than 5000 data points, we use the Kolmogorov-Smirnov test.
The code is as follows:
normalityTest <- function(x, group = NULL, alpha = 0.05) {
  # Test one sample: TRUE means normality is not rejected at level alpha.
  testOne <- function(v) {
    v <- na.omit(v)
    # Too few points or constant data: treat as non-normal.
    if (length(v) < 5 || all(v == v[1])) return(FALSE)
    if (length(v) > 5000) {
      # shapiro.test() accepts at most 5000 points, so fall back to KS.
      p <- ks.test(v, "pnorm", mean(v), sd(v))$p.value
    } else {
      p <- shapiro.test(v)$p.value
    }
    p > alpha
  }
  if (is.null(group)) return(testOne(x))
  # With groups: every group must pass, each tested on its own values.
  for (g in unique(as.character(group))) {
    if (!testOne(x[group == g])) return(FALSE)
  }
  TRUE
}
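The size restriction that motivates the fallback is easy to reproduce. The following sketch uses simulated data; big, sw, and ks are illustrative names.

```r
set.seed(42)
big <- rnorm(6000)

# shapiro.test() stops with an error when given more than 5000 values
sw <- try(shapiro.test(big), silent = TRUE)
shapiro_failed <- inherits(sw, "try-error")

# Kolmogorov-Smirnov test against a normal with the sample mean and sd
ks <- ks.test(big, "pnorm", mean(big), sd(big))
ks_p <- ks$p.value
```

One caveat worth keeping in mind: because the mean and standard deviation are estimated from the same data, the Kolmogorov-Smirnov p-value tends to be conservative, so this fallback is better read as a rough screen than as an exact test.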
2008/02/21
Representing Tukey HSD groups in R
For post hoc comparisons after an ANOVA, R provides TukeyHSD for pairwise comparisons, along with a plotting function. A more informative way to present the result is the grouping display from the SAS output. The following code does the same thing as SAS, except that the order is from minimum to maximum.
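The grouping code itself does not appear on this page. As a starting point only, and not a reconstruction of it, the sketch below shows the base R pieces such a display would be built from: the pairwise table from TukeyHSD and the group means ordered from minimum to maximum. It uses the built-in PlantGrowth data as a stand-in; fit, tk, pw, means, and signif_pairs are illustrative names.

```r
fit <- aov(weight ~ group, data = PlantGrowth)
tk <- TukeyHSD(fit)
pw <- tk$group   # matrix with columns diff, lwr, upr, "p adj"

# Group means ordered from minimum to maximum
means <- sort(tapply(PlantGrowth$weight, PlantGrowth$group, mean))

# Pairs whose adjusted p-value falls below 0.05
signif_pairs <- rownames(pw)[pw[, "p adj"] < 0.05]
```

Groups that never appear together in signif_pairs would share a letter in a SAS-style grouping display; turning the pairwise table into letters is the part this sketch leaves out.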
2008/01/30
Calculate variance from summary data
grandMean <- function(meanX, nX){
  return(sum(meanX*nX)/sum(nX))
}
grandSTD <- function(meanX, stdX, nX){
  gmean <- grandMean(meanX, nX)
  ssr <- sum((nX-1)*stdX*stdX) + sum(nX*meanX*meanX)
  ssm <- 2*sum(nX*meanX)*gmean
  ssg <- sum(nX)*gmean*gmean
  return(sqrt((ssr-ssm+ssg)/(sum(nX)-1)))
}
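The three terms in grandSTD expand the total sum of squares about the grand mean: ssr - ssm + ssg equals the within-group sum of squares plus sum(nX*(meanX - gmean)^2). A quick check against raw data, using simulated groups and that equivalent compact form (the names groups, m, s, n, pooled, and direct are illustrative):

```r
set.seed(7)
groups <- list(rnorm(10, 5, 1), rnorm(25, 7, 2), rnorm(40, 6, 0.5))

# Summary statistics, as they would arrive without the raw data
m <- sapply(groups, mean)
s <- sapply(groups, sd)
n <- sapply(groups, length)

gm <- sum(m * n) / sum(n)   # grand mean
# Within-group plus between-group sum of squares, over N - 1
pooled <- sqrt((sum((n - 1) * s^2) + sum(n * (m - gm)^2)) / (sum(n) - 1))

# Standard deviation computed directly from the combined raw data
direct <- sd(unlist(groups))
```

pooled and direct agree up to floating-point error, which confirms that the summary statistics alone are enough to recover the overall standard deviation.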
2008/01/14
Applying ICA in wafer map recognition
ICA (Independent Component Analysis)
Assumption: the patterns are independent.
Final map: regarded as a composite of several patterns.
Training step:
Step 1: Choose about 20 wafer maps with the same pattern and apply ICA to each map (assuming 2 or 3 sources).
Step 2: Average the results to form the basis for the pattern.
Classification step:
For a map with an unknown pattern, apply ICA to get its basis.
Use KNN to predict the pattern.