2008/10/29

Exercises from HSAUR, Chapter 1

The book: A Handbook of Statistical Analyses Using R
data("Forbes2000", package = "HSAUR")
##ex 1.1
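> # %in% tests membership; == would recycle the 4-element vector and keep only rows that happen to line up
> # (the medians printed below were obtained with the original == comparison, so %in% may give different values)
> tmp <- subset(Forbes2000, country %in% c("United States","United Kingdom","France","Germany"))
> by(tmp$profits, as.character(tmp$country), FUN=function(x) median(x, na.rm=TRUE))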
INDICES: France
[1] 0.215
------------------------------------------------------------------------------
INDICES: Germany
[1] 0.245
------------------------------------------------------------------------------
INDICES: United Kingdom
[1] 0.17
------------------------------------------------------------------------------
INDICES: United States
[1] 0.26
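
A note on selecting several countries at once: comparing a column against a vector with == recycles the shorter vector position by position, whereas %in% tests set membership. The sketch below uses a small made-up vector (the names country, eq and inset are illustrative, not taken from Forbes2000) to show how the two differ.

```r
# Toy data (hypothetical, not from Forbes2000)
country <- c("France", "Germany", "France", "Germany", "Japan", "France")

# == recycles the right-hand side: elements 1, 3, 5 are compared with
# "France" and elements 2, 4, 6 with "Germany"
eq <- country == c("France", "Germany")

# %in% asks, for each element, whether it belongs to the set
inset <- country %in% c("France", "Germany")

print(eq)     # TRUE TRUE TRUE TRUE FALSE FALSE
print(inset)  # TRUE TRUE TRUE TRUE FALSE TRUE
```

The last element is "France", but == compares it with "Germany", so that row would be dropped without any warning.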
##ex 1.2
> subset(Forbes2000, country=="Germany" & profits < 0, select = "name")
                        name
350        Allianz Worldwide
364         Deutsche Telekom
397                     E.ON
431      HVB-HypoVereinsbank
500              Commerzbank
798    Infineon Technologies
869              BHW Holding
926  Bankgesellschaft Berlin
1034           W&W-Wustenrot
1187         mg technologies
1477 Nurnberger Beteiligungs
1887            SPAR Handels
1994                Mobilcom
##ex 1.3
> summary(subset(Forbes2000, country=='Bermuda', select = "category"))
                 category
 Insurance           :10
 Conglomerates       : 2
 Oil & gas operations: 2
 Banking             : 1
 Capital goods       : 1
 Food drink & tobacco: 1
 (Other)             : 3
==> Insurance
##ex 1.4
> o <- order(Forbes2000$profits, decreasing = TRUE)
> tmp <- Forbes2000[o,]
> tmp1 <- tmp[1:50,]
> plot(tmp1$sales, log(tmp1$assets), main = "Forbes2000: Sales vs. Assets of the 50 Most Profitable Companies", xlab = "Sales", ylab = "log(Assets)")
> text(tmp1$sales, log(tmp1$assets), abbreviate(tmp1$country))
##ex 1.5
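> by(Forbes2000$sales, as.character(Forbes2000$country), FUN = function(x) mean(x, na.rm = TRUE))
> # subset() drops rows whose profits are NA; plain [ ] indexing would keep them as all-NA rows
> tmp <- subset(Forbes2000, profits > 5, select = c("name","country"))
> summary(tmp$country)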
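
A note on filtering when the column contains missing values: a logical index with NA in it produces an all-NA row, while subset() treats NA as FALSE. The sketch below illustrates this on a made-up data frame (toy, bracket and clean are hypothetical names, and the numbers are invented for the example).

```r
# Toy data frame with one missing profit (values are made up)
toy <- data.frame(name = c("A", "B", "C", "D"),
                  profits = c(7.2, NA, 1.5, 9.0),
                  stringsAsFactors = FALSE)

# Logical indexing: the NA comparison yields a row filled with NA
bracket <- toy[toy$profits > 5, ]

# subset(): rows where the condition is NA are dropped
clean <- subset(toy, profits > 5)

print(nrow(bracket))  # 3, including one all-NA row
print(nrow(clean))    # 2
```

Counting companies per country from the bracket version would therefore also count the NA rows.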

rle and inverse.rle

rle returns a list describing the runs in a vector: lengths holds the length of each run, and values holds the value repeated in that run.
inverse.rle reconstructs the original vector from that list.

example:
> x <- c(1,0,0,1,1,1,1,1,0,0,0)
> (tmp <- rle(x))
Run Length Encoding
  lengths: int [1:4] 1 2 5 3
  values : num [1:4] 1 0 1 0

> inverse.rle(tmp)
[1] 1 0 0 1 1 1 1 1 0 0 0
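
Another small use of the lengths and values components is locating the longest run of a given value. The following is only a sketch on the same vector x; the helper names (r, ones, starts, start_of_longest) are made up for illustration.

```r
x <- c(1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0)
r <- rle(x)

# Which runs consist of 1s
ones <- r$values == 1

# Length of the longest run of 1s
longest <- max(r$lengths[ones])

# Start position of every run in x, then of the longest run of 1s
ends <- cumsum(r$lengths)
starts <- ends - r$lengths + 1
start_of_longest <- starts[ones][which.max(r$lengths[ones])]

print(longest)           # 5
print(start_of_longest)  # 4

# inverse.rle restores the original vector
print(identical(inverse.rle(r), x))  # TRUE
```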


Application: Finding peaks in vector
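getPeaks <- function(x) {
  # runs of increasing (TRUE) and non-increasing (FALSE) steps
  tmp <- rle(diff(x) > 0)
  # position in x where each run starts
  change.index <- cumsum(c(1, tmp$lengths))
  ind <- change.index[which(tmp$values == FALSE)]
  return(ind)
}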

> x <- c(rnorm(300), rnorm(700, 5, 2))
> d1 <- density(x)
> ind <- getPeaks(d1$y)
> plot(d1, xlab = "", ylab = "")
> ind
[1] 149 279
> points(d1$x[ind], d1$y[ind], pch = 16, col = "red")
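
Because the density example depends on random numbers, it may help to trace getPeaks on a small fixed vector. The sketch below repeats the function so that it runs on its own; the vector y is made up for illustration.

```r
getPeaks <- function(x) {
  tmp <- rle(diff(x) > 0)
  change.index <- cumsum(c(1, tmp$lengths))
  change.index[which(tmp$values == FALSE)]
}

y <- c(1, 3, 2, 5, 7, 4, 4, 6, 1)

# diff(y) > 0 is  T F T T F F T F
# runs:           T | F | T T | F F | T | F   (lengths 1 1 2 2 1 1)
# run starts:     1   2   3     5     7   8
# the non-increasing (FALSE) runs start at 2, 5 and 8
peaks <- getPeaks(y)

print(peaks)     # 2 5 8
print(y[peaks])  # 3 7 6
```

Two things worth keeping in mind: a flat step (4, 4) counts as non-increasing, and a vector that starts by decreasing reports index 1 as a peak.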

CC Copyright

Creative Commons License
This work by Chunhung Chou is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license.