遊樂場: R


2021/04/19

ggplot2 boxplot to ggplotly

The "plotly" package provides powerful interactive, web-based charts, and ggplot2 is a very intuitive way to create charts. Using ggplotly to transform a ggplot2 object into a plotly object is very convenient, but the result can sometimes surprise you.

For example:

```{r}
library(ggplot2)
library(plotly)

df <- diamonds
# aes() replaces the deprecated aes_string()
p <- ggplot(df, aes(x = carat, y = cut)) +
  geom_boxplot()
print(p)
```

But the chart does not come out as expected once you pass it through ggplotly:

```{r}
library(ggplot2)
library(plotly)
df <- diamonds
p <- ggplot(df, aes(x = carat, y = cut)) +
  geom_boxplot()
print(ggplotly(p))
```

plotly draws boxplots vertically by default, so we put the categorical variable on x and use coord_flip to plot a horizontal boxplot:

```{r}
p <- ggplot(df, aes(y = carat, x = cut)) +
  geom_boxplot() +
  coord_flip()
print(ggplotly(p))
```

For a grouped boxplot:

```{r}
p <- ggplot(df, aes(y = carat, x = cut, color = clarity)) +
  geom_boxplot() +
  coord_flip()
print(p)
```

With ggplotly, the boxes end up stacked on top of each other:

```{r}
p <- ggplot(df, aes(y = carat, x = cut, color = clarity)) +
  geom_boxplot() +
  coord_flip()
print(ggplotly(p))
```

Fine-tune the plotly layout with boxmode = 'group':

```{r}
p <- ggplot(df, aes(y = carat, x = cut, color = clarity)) +
  geom_boxplot() +
  coord_flip()
print(ggplotly(p) %>% layout(boxmode='group'))
```

2015/08/19

updateusr method for better barplot with lines/points

When you draw a barplot and then add a second y axis for lines or points with par(new=TRUE), as shown in the example code, the lines or points usually end up shifted relative to the bars.

The updateusr function from Greg Snow's TeachingDemos package is a great solution to this problem. The idea behind updateusr is to define a new plot region (new user coordinates) according to the ratio of one unit on each axis.
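The rescaling idea can be sketched in base R without the package. This is not the TeachingDemos source: `remap_axis` is a hypothetical helper, and the `old_usr` limits are made-up numbers. It maps the limits of one axis through the straight line defined by two known points, here the first two bar midpoints (0.7 and 1.9 for default-width bars) onto 1 and 2.

```r
# Hypothetical helper (not from TeachingDemos): map coordinates of one axis
# through the line that sends old[1] -> new[1] and old[2] -> new[2].
remap_axis <- function(usr, old, new) {
  scale <- (new[2] - new[1]) / (old[2] - old[1])
  new[1] + (usr - old[1]) * scale
}

# Midpoints of four default-width bars (width 1, space 0.2).
bar_mid <- c(0.7, 1.9, 3.1, 4.3)

# After remapping, the bars sit at x = 1, 2, 3, 4.
new_mid <- remap_axis(bar_mid, old = bar_mid[1:2], new = c(1, 2))

# Made-up axis limits, remapped the same way.
old_usr <- c(0, 5)
new_usr <- remap_axis(old_usr, old = bar_mid[1:2], new = c(1, 2))
```

In a live plot, limits computed this way for both axes could be applied with par(usr = ...), after which lines(1:4, y) would line up with the bars.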

The code:

Example code (space added 8/20):

Result:

2014/12/08

Error: Package 'XXX' was built before 3.x.x: please re-install it

Installing a new package when existing packages were installed under a different R version causes the error:
"Error: Package 'XXX' was built before 3.x.x: please re-install it"

Solution:
update.packages(checkBuilt = TRUE, ask = FALSE)

If you hit a user privilege issue:
start R with sudo (sudo R), then run update.packages(checkBuilt = TRUE, ask = FALSE)
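To see which packages would be affected before updating, something like the following sketch may help. `built_under_older_r` is a hypothetical helper and the package names and versions in the example are made up. It compares only the major.minor part of each "Built" version with the running R.

```r
# Hypothetical helper: names of packages whose "Built" version has an
# earlier major.minor than the given R version.
built_under_older_r <- function(built, current = getRversion()) {
  built <- built[!is.na(built)]
  major_minor <- function(v) {
    numeric_version(sub("^([0-9]+\\.[0-9]+).*$", "\\1", as.character(v)))
  }
  names(built)[major_minor(built) < major_minor(current)]
}

# Made-up versions for illustration only.
example_built <- c(pkgA = "3.6.1", pkgB = "4.2.0", pkgC = "4.2.3")
stale <- built_under_older_r(example_built, current = "4.2.1")

# On a real installation one could pass the "Built" column instead:
# built_under_older_r(installed.packages()[, "Built"])
```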

Reference:
https://stackoverflow.com/questions/16987948/causes-of-error-package-was-built-before-3-0-0-please-re-install-it

2010/09/05

Patches for speeding up R

http://www.cs.toronto.edu/~radford/R-mods.html

Downloaded for now; I'll study it when I find the time.

2010/02/25

Tips for enhancing R code

After reading Writing Efficient Programs in R and R Code optimization and Packages Creation, I have 3 tips so far.
1. Avoid data frames (from the 2nd)

2. ifelse is slower than if () { } else { } (from the 1st)

3. Avoid using rbind or cbind in loops; predefine an NA array or matrix instead (from the 2nd)

Examples:
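A minimal sketch of tips 2 and 3. The variable names are mine, and the if/else comparison only applies to a single, length-one condition, since ifelse is vectorized and if is not.

```r
n <- 1000

# Tip 3, the pattern to avoid: growing a matrix with rbind inside the loop.
grow <- NULL
for (i in seq_len(n)) {
  grow <- rbind(grow, c(i, i^2))
}

# Tip 3, the alternative: predefine an NA matrix and fill it in place.
filled <- matrix(NA_real_, nrow = n, ncol = 2)
for (i in seq_len(n)) {
  filled[i, ] <- c(i, i^2)
}

# Tip 2: for a length-one condition, plain if/else gives the same answer
# as ifelse without the vectorization overhead.
x <- 3
a <- ifelse(x > 2, "big", "small")
b <- if (x > 2) "big" else "small"
```

Both loops produce the same matrix. Wrapping each in system.time() is one way to compare them on your own machine.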

2009/07/02

R package

The advantages of using R packages, how to create your own R packages, and a little about S3 and S4 classes.

http://epub.ub.uni-muenchen.de/6175/1/tr036.pdf

2009/02/18

Axis scaling algorithm

A good-looking axis scale usually increases in steps of 10^(x) or 0.5*10^(x), depending on the range of the data. Dorothy E. Pugh provides an algorithm (and a SAS macro) for generating a suitable axis scale; I translated the SAS code into R.

Reference: Dorothy E. Pugh (SUGI 25), "A Robust Generalized Axis-scaling Macro"
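The stepping rule can be illustrated with a much simpler sketch. This is not Pugh's algorithm or a port of the SAS macro: `nice_axis` and its `target` argument are hypothetical, and it just picks the smallest step of the form 10^k or 0.5*10^(k+1) that yields at most about `target` intervals.

```r
# Hypothetical, simplified illustration (not Pugh's macro).
nice_axis <- function(x, target = 5) {
  rng <- range(x, na.rm = TRUE)
  if (diff(rng) == 0) stop("x must span a non-zero range")
  raw_step <- diff(rng) / target
  pow <- 10^floor(log10(raw_step))
  # Candidate steps: 10^k, 0.5 * 10^(k + 1), 10^(k + 1)
  candidates <- c(1, 5, 10) * pow
  step <- candidates[candidates >= raw_step][1]
  # Extend the limits outward to whole multiples of the step
  lo <- floor(rng[1] / step) * step
  hi <- ceiling(rng[2] / step) * step
  seq(lo, hi, by = step)
}

ticks <- nice_axis(c(3, 97))
small_ticks <- nice_axis(c(0.12, 0.87))
```

Base R's pretty() covers the same need with a more refined rule.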

2008/07/02

R by

unique returns the values without changing their order;
by returns the results in factor (sorted level) order.
Example:
> tmppar <- paste("X", c(1,1,1,3,3,8,8,6,6,5,5,2,2,7,7,4,4),sep="")
> tmppar
[1] "X1" "X1" "X1" "X3" "X3" "X8" "X8" "X6" "X6" "X5" "X5" "X2" "X2" "X7" "X7" "X4" "X4"
> unique(tmppar)
[1] "X1" "X3" "X8" "X6" "X5" "X2" "X7" "X4"
> tmp1 <- by(rnorm(length(tmppar)),tmppar,FUN=min)
> tmp1
INDICES: X1
[1] -0.7628715
-------------------------------------------------------------------------------------------------
INDICES: X2
[1] -0.5590477
-------------------------------------------------------------------------------------------------
INDICES: X3
[1] 1.214794
-------------------------------------------------------------------------------------------------
INDICES: X4
[1] 0.445656
-------------------------------------------------------------------------------------------------
INDICES: X5
[1] -0.8465884
-------------------------------------------------------------------------------------------------
INDICES: X6
[1] -1.096506
-------------------------------------------------------------------------------------------------
INDICES: X7
[1] -0.2734341
-------------------------------------------------------------------------------------------------
INDICES: X8
[1] 0.4012138
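If the by result should follow the order of first appearance instead, one option (a sketch, not part of the original session) is to pass a factor whose levels are set from unique. The `values` and `in_order` names are mine.

```r
set.seed(1)
tmppar <- paste("X", c(1,1,1,3,3,8,8,6,6,5,5,2,2,7,7,4,4), sep = "")
values <- rnorm(length(tmppar))

# Default: groups come back in sorted level order (X1, X2, ..., X8).
tmp_sorted <- by(values, tmppar, FUN = min)

# Levels taken from unique() keep the order of first appearance
# (X1, X3, X8, X6, X5, X2, X7, X4).
in_order <- factor(tmppar, levels = unique(tmppar))
tmp_ordered <- by(values, in_order, FUN = min)
```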

2008/02/25

Data normality test

For testing whether data come from a normal distribution, the Shapiro-Wilk test is generally more powerful than other tests. But shapiro.test has a restriction in R: the sample size must be between 3 and 5000.
So when facing more than 5000 data points, we use the Kolmogorov-Smirnov test.
The code is as follows:


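normalityTest <- function(x, group=NULL, alpha=0.05){
   # Returns TRUE when normality is not rejected at level alpha
   # (for every group, if group is given); FALSE otherwise.
   if(is.null(group)){
      x <- na.omit(x)
      # Too few points or constant data: report as not normal
      if(length(x)<5 || all(x==x[1])){
         return(FALSE)
      } else if (length(x) > 5000){
         # shapiro.test accepts at most 5000 points
         return(ks.test(x, "pnorm", mean(x), sd(x))$p.value>alpha)
      } else {
         return(shapiro.test(x)$p.value>alpha)
      }
   } else {
      ng <- as.character(unique(group))
      for(ng.ind in seq_along(ng)){
         tmp <- x[group==ng[ng.ind]]
         tmp <- na.omit(tmp)
         if(length(tmp)<5 || all(tmp==tmp[1])){
            return(FALSE)
         } else {
             if(length(tmp) > 5000){
               # Test the current group (tmp), not the whole vector x
               p <- ks.test(tmp, "pnorm", mean(tmp), sd(tmp))$p.value
             } else {
               p <- shapiro.test(tmp)$p.value
             }
             # One non-normal group is enough to return FALSE
             if(p < alpha) return(FALSE)
         }
      }
      return(TRUE)
   }
}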
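The sample-size limit that motivates the switch between the two tests can be checked directly. This is a small sketch with simulated data; `big`, `shapiro_result` and `ks_p` are names chosen for this example.

```r
set.seed(1)
big <- rnorm(5001)

# shapiro.test refuses samples larger than 5000 and signals an error ...
shapiro_result <- tryCatch(shapiro.test(big), error = function(e) e)

# ... while ks.test against a normal with the sample mean and sd still runs.
ks_p <- ks.test(big, "pnorm", mean(big), sd(big))$p.value
```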

2008/02/21

Tukey HSD groups represent in R

For post hoc comparisons after an ANOVA, R provides TukeyHSD for pairwise comparisons, together with a plotting function. SAS presents the result as groups, which is more informative. The following code does the same thing as SAS, except that the order runs from minimum to maximum.
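For reference, the base R starting point looks roughly like this. It is a sketch on the built-in warpbreaks data, not the grouping code described above. It produces the pairwise table from TukeyHSD and the group means sorted from minimum to maximum.

```r
# One-way ANOVA on a built-in dataset
fit <- aov(breaks ~ tension, data = warpbreaks)

# Pairwise comparisons: one row per pair of tension levels
hsd <- TukeyHSD(fit, "tension", conf.level = 0.95)
pairs <- hsd$tension

# Group means ordered from minimum to maximum
group_means <- sort(tapply(warpbreaks$breaks, warpbreaks$tension, mean))
```

plot(hsd) draws the confidence intervals for each pairwise difference.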

2008/01/30

Calculate variance from summary data


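# Overall mean from per-group means (meanX) and group sizes (nX)
grandMean <- function(meanX, nX){
   return(sum(meanX*nX)/sum(nX))
}

# Overall standard deviation from per-group means, standard deviations and sizes
grandSTD <- function(meanX, stdX, nX){
   gmean <- grandMean(meanX, nX)
   # Sum of squared raw values: within-group part plus n * mean^2
   ssr <- sum((nX-1)*stdX*stdX) + sum(nX*meanX*meanX)
   # Cross term and grand-mean term of sum((x - gmean)^2)
   ssm <- 2*sum(nX*meanX)*gmean
   ssg <- sum(nX)*gmean*gmean
   return(sqrt((ssr-ssm+ssg)/(sum(nX)-1)))
}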
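A quick sanity check of the same identity on simulated data. The variable names are mine, and the sum of squares is written in the equivalent "within plus between" form rather than calling the functions above.

```r
set.seed(42)
x <- rnorm(30, mean = 10, sd = 2)
g <- rep(c("a", "b", "c"), times = c(8, 10, 12))

# Per-group summaries, as if only these were available
m <- tapply(x, g, mean)
s <- tapply(x, g, sd)
n <- tapply(x, g, length)

# Grand mean and total sum of squares rebuilt from the summaries
gm <- sum(m * n) / sum(n)
total_ss <- sum((n - 1) * s^2) + sum(n * (m - gm)^2)
combined_sd <- sqrt(total_ss / (sum(n) - 1))
```

Here gm matches mean(x) and combined_sd matches sd(x) up to floating-point error.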

2007/12/13

Use scan to read data -- modified code

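The what argument needs to be given as list components.
That is, once you set what=list(...), the data are read column by column, and the type of each column is whatever what specifies.
So when you set what="character", you are really treating all of the data as a single column of characters.
Here is an example to illustrate.

Example:
The original data file looks like this: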
Isat_N43_mA_10/.32,7.1,6.2,5.3
Isat_N4_mA_10/.18,7.35,6.45,5.55
Isat_P43_mA_10/.27,-2.46,-2.91,-3.36
Isat_P4_mA_10/.18,-2.32,-2.71,-3.1

With scan(infile, what="character",sep=","):
Read 16 items
[1] "Isat_N43_mA_10/.32" "7.1" "6.2" "5.3"
[5] "Isat_N4_mA_10/.18" "7.35" "6.45" "5.55"
[9] "Isat_P43_mA_10/.27" "-2.46" "-2.91" "-3.36"
[13] "Isat_P4_mA_10/.18" "-2.32" "-2.71" "-3.1"

With scan(infile, what=list("character",double(0),double(0),double(0)),sep=","),
the result after reading is:

Read 4 records

[[1]]
[1] "Isat_N43_mA_10/.32" "Isat_N4_mA_10/.18" "Isat_P43_mA_10/.27" "Isat_P4_mA_10/.18"

[[2]]
[1] 7.10 7.35 -2.46 -2.32

[[3]]
[1] 6.20 6.45 -2.91 -2.71

[[4]]
[1] 5.30 5.55 -3.36 -3.10

To turn the data into a data frame, simply use as.data.frame (a data frame is really just a list of vectors, so there is no need for anything more elaborate).
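A self-contained sketch of the same steps, reading the four sample lines through the text argument of scan instead of a file. The component names `name`, `v1`, `v2` and `v3` are made up for this example.

```r
raw <- c(
  "Isat_N43_mA_10/.32,7.1,6.2,5.3",
  "Isat_N4_mA_10/.18,7.35,6.45,5.55",
  "Isat_P43_mA_10/.27,-2.46,-2.91,-3.36",
  "Isat_P4_mA_10/.18,-2.32,-2.71,-3.1"
)

# Each component of what sets the type of one column:
# "" means character, 0 means numeric.
cols <- scan(text = raw,
             what = list(name = "", v1 = 0, v2 = 0, v3 = 0),
             sep = ",", quiet = TRUE)

# The named list converts directly to a data frame.
df <- as.data.frame(cols, stringsAsFactors = FALSE)
```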

The modified scan: