2008/02/25

Data normality test

To test whether data come from a normal distribution, the Shapiro-Wilk test is generally more powerful than other tests. But shapiro.test in R has a restriction: the sample size must be between 3 and 5000.
So for data with more than 5000 points, we use the Kolmogorov-Smirnov test instead.
The code is as follows:


normalityTest <- function(x, group=NULL, alpha=0.05){
  # Returns TRUE when normality is not rejected at level alpha
  # (in every group, if group is given), and FALSE otherwise.

  # p-value for one sample; NA if it is constant or has fewer than 5 values
  pValue <- function(v){
    v <- na.omit(v)
    if(length(v) < 5 || all(v == v[1])){
      return(NA)
    } else if(length(v) > 5000){
      # shapiro.test() accepts at most 5000 values, so fall back to KS
      return(ks.test(v, "pnorm", mean(v), sd(v))$p.value)
    } else {
      return(shapiro.test(v)$p.value)
    }
  }

  if(is.null(group)){
    p <- pValue(x)
    return(!is.na(p) && p > alpha)
  } else {
    ng <- as.character(unique(group))
    for(g in ng){
      # test each group on its own values, not on the whole of x
      p <- pValue(x[group == g])
      if(is.na(p) || p <= alpha) return(FALSE)
    }
    return(TRUE)
  }
}
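
The 5000-point limit that motivates the fallback can be checked directly. The following is a minimal sketch on simulated data: the seed, the sample size of 6000 and the variable names are arbitrary choices for illustration. Note that because the mean and sd are estimated from the same sample, the plain Kolmogorov-Smirnov p-value is only approximate.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
big <- rnorm(6000)               # simulated normal data, more than 5000 points

# shapiro.test() signals an error for samples larger than 5000;
# tryCatch() turns that error into NULL so the script keeps running
shapiroResult <- tryCatch(shapiro.test(big), error = function(e) NULL)
shapiroFailed <- is.null(shapiroResult)

# Kolmogorov-Smirnov test against a normal distribution whose
# mean and sd are estimated from the sample itself
ksP <- ks.test(big, "pnorm", mean(big), sd(big))$p.value

print(shapiroFailed)
print(ksP)
```

Here shapiroFailed should be TRUE, which is why the function switches to ks.test for large samples.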

CC Copyright

Creative Commons License
This work by Chunhung Chou is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license.