2008/10/29

Exercise of HSAUR Chapter 1

The book: A Handbook of Statistical Analyses Using R

rle and inverse.rle

rle return a list of run length of the vector, length for the run length, value for it's corresponding value
inverse.rle rollback the original vector

example:
> x <- c(1,0,0,1,1,1,1,1,0,0,0) > (tmp <- rle(x)) Run Length Encoding lengths: int [1:4] 1 2 5 3 values : num [1:4] 1 0 1 0

>inverse.rle(tmp)
[1] 1 0 0 1 1 1 1 1 0 0 0


Application: Finding peaks in vector
getPeaks <- function(x) { tmp <- rle(diff(x)>0) change.index <- cumsum(c(1, tmp$lengths)) ind <- change.index[which(tmp$values ==FALSE)] return(ind) }

> x<-c(rnorm(300),rnorm(700,5,2)) > d1<- density(x) > ind <- getPeaks(d1$y) > plot(d1,xlab="",ylab="") > ind [1] 149 279 > points(d1$x[ind], d1$y[ind], pch = 16, col="red")

2008/07/23

Hough Transformation

Ideal sample
library(PET)
testLine <- matrix(0, ncol=100, nrow=100)

diag(testLine) <- 1

ph <- hough(testLine)
viewData(list(testLine, ph$hData), list("line","houghTrans"))

Wafer Map:
library(rimage)
library(pixmap)

tmp <- read.jpeg("xxx.jpg")
tmp1 <- pixmapRGB(c(tmp[, 1:400, 1], tmp[, 1:400, 2], tmp[, 1:400, 3]), 400, 400)
tmp2 <- as(tmp1, "pixmapGrey")
tmp22 <- 1- tmp2@grey

hh <- hough(tmp22)
viewData(list(tmp2@grey, tmp22, hh$hData), list("W1", "W1 inverse","Hough Transformation"))

tmp22f <- matrix(ifelse(unlist(tmp22)<0.3,0,1),nrow=400,ncol=400)

hp <- hough(tmp22f) viewData(list(tmp2@grey,tmp22f,hp$hData),list("Original Map","Pattern Map","Hough Transformation"))

2008/07/02

R by

unique return without change the order
by return with the factor order
example:
> tmppar <- paste("X", c(1,1,1,3,3,8,8,6,6,5,5,2,2,7,7,4,4),sep="")
> tmppar
[1] "X1" "X1" "X1" "X3" "X3" "X8" "X8" "X6" "X6" "X5" "X5" "X2" "X2" "X7" "X7" "X4" "X4"
> unique(tmppar)
[1] "X1" "X3" "X8" "X6" "X5" "X2" "X7" "X4"
> tmp1 <- by(rnorm(length(tmppar)),tmppar,FUN=min)
> tmp1
INDICES: X1
[1] -0.7628715
-------------------------------------------------------------------------------------------------
INDICES: X2
[1] -0.5590477
-------------------------------------------------------------------------------------------------
INDICES: X3
[1] 1.214794
-------------------------------------------------------------------------------------------------
INDICES: X4
[1] 0.445656
-------------------------------------------------------------------------------------------------
INDICES: X5
[1] -0.8465884
-------------------------------------------------------------------------------------------------
INDICES: X6
[1] -1.096506
-------------------------------------------------------------------------------------------------
INDICES: X7
[1] -0.2734341
-------------------------------------------------------------------------------------------------
INDICES: X8
[1] 0.4012138

2008/05/08

Canonical Correlation Analysis

CCA(Canonical correlation Analysis) for the model AX = BY , where X and Y both have dimension more than one.
The R code for this,
library(CCA)
res <- cc(X, Y)
#compare coefficient under same PC's
ccaPlot <- function(res, choice = 1) {
par(mfrow=c(2,1))
barplot(res$xcoef[, choice], las =2, main = paste("Coefficient of PC", choice))
barplot(res$ycoef[, choice], las=2)

}

Sample graphic

2008/04/22

Using Parallel Coordinate Plot for variable selection

For detecting critical parameters, we can transform the problem into high-dimensional data problem. The parallel coordinate plot is good for high dimensions visualization. But it's lack of index for point out which variable is important. One can use this plot as first filtering method.



For construct above chart, we put the response variable in the first column, and sorting the data by the response variable, use different colors by grouping the data for easier detect.

2008/02/25

Data normality test

For testing the data is from normal or not, the Shapiro test is more powerful then other tests. But Shapiro has restrict in R, the sample size must less than 5000.
So facing data points great than 5000, we use Kolmogorov-Smirnov tests.
The code as following:


normalityTest <- function(x, group=NULL, alpha=0.05){

if(is.null(group)){
x <- na.omit(x)
if(all(x==x[1]) | length(x)<5 ){
return(FALSE)
} else if (length(x) > 5000){
return(ks.test(x, "pnorm", mean(x), sd(x))$p.value>alpha)
} else {
return(shapiro.test(x)$p.value>alpha)
}
} else {
ng <- as.character(unique(group))
for(ng.ind in 1:length(ng)){
tmp <- x[group==ng[ng.ind]]
tmp <- na.omit(tmp)
if(all(tmp==tmp[1]) | length(tmp)<5){
stop(return(FALSE))
} else {
if(length(tmp) > 5000){
p <- ks.test(x, "pnorm", mean(x), sd(x))$p.value
} else {
p <- shapiro.test(tmp)$p.value
}
if(p < alpha) stop(return(FALSE))
}
}
return(TRUE)
}
}

2008/02/21

Tukey HSD groups represent in R

For the post hoc comparison of the ANOVA, R provide TukeyHSD for pairwise comparison and graphics function. For more informative represent the result, it's a good one from the SAS output. The following code do the same thing as SAS except the order is from minimul to maximum

2008/01/30

Calculate variance from summary data


grandMean <- function(meanX, nX){
return(sum(meanX*nX)/sum(nX))
}

grandSTD <- function(meanX, stdX, nX){
gmean <- grandMean(meanX, nX)
ssr <- sum((nX-1)*stdX*stdX) + sum(nX*meanX*meanX)
ssm <- 2*sum(nX*meanX)*gmean
ssg <- sum(nX)*gmean*gmean
return(sqrt((ssr-ssm+ssg)/(sum(nX)-1)))
}

2008/01/14

Applying ICA in wafer map recognization

ICA(Independent Component Analysis)

Assumption: patterns are independent
Final map: regards as some patterns composite result
Training Step:
Step 1: Choose about 20 wafer maps from same pattern, applying ICA to each maps(suppose
have 2 or 3 sources)
Step 2: Averaging as the basis for the pattern

Classification Step:
For the unknown patterns map, applying ICA to get the basis
Using KNN to predicted

CC Copyright

創用 CC 授權條款
本著作由Chunhung Chou製作,以創用CC 姓名標示-相同方式分享 3.0 Unported 授權條款釋出。