I previously mentioned that outliers in microarray data are what you should be interested in. But what did I mean by that?
Let’s take a look. The following is an excerpt from a simple R script that can be used to examine microarray data output by the CARMAweb service:
> gene.normal <- normal.df["LAMP3", ]
> t.gene.normal <- as.data.frame(t(gene.normal))
> gene.cancer <- cancer.df["LAMP3", ]
> t.gene.cancer <- as.data.frame(t(gene.cancer))
> boxplot(t.gene.normal$LAMP3, t.gene.cancer$LAMP3, names = c("Normal", "Cancer"), ylab = "RMA", main = "LAMP3")
> mean.normal <- mean(t.gene.normal$LAMP3)
> mean.cancer <- mean(t.gene.cancer$LAMP3)
> fold.change <- mean.cancer / mean.normal
In this example, LAMP3 is being examined. Judging from the average fold change, there is no difference between normal tissue and cancer. However, if you look at the box plot:
You can see that there are three tumours in which the robust multiarray average (RMA) expression value is much higher than in normal tissue. These could represent a small subpopulation of tumours, although it is difficult to tell from this dataset alone, which contains only 45 tumours.
These tumours are potentially important, and they would be missed by looking only at the fold change. You could imagine a drug that targets LAMP3 being effective only in this subset. That is a much better outcome than no drug being developed because, on average, there appears to be no fold change.
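If you want to pick out such tumours rather than just eyeball them on the plot, something like the following rough sketch might help. It is not part of the original script: the vectors `normal` and `cancer` hold made-up values (chosen only to mimic 45 tumours with three high ones, not the real LAMP3 data), and the cut-off of 1.5 times the interquartile range above the upper quartile of the normal samples is my assumption, borrowed from the default whisker convention of `boxplot`.

```r
# Made-up expression values for illustration only; NOT the real LAMP3 data.
normal <- c(6.8, 6.9, 7.0, 7.0, 7.1, 7.1, 7.2, 7.3)

# 42 tumours in the normal range plus three invented high-expressing tumours.
cancer <- c(rep(c(6.7, 6.8, 6.9, 7.0, 7.1, 7.2), times = 7), 10.5, 11.0, 11.4)

# Ratio of the means, calculated the same way as in the excerpt above.
fold.change <- mean(cancer) / mean(normal)

# Assumed cut-off: upper quartile of the normal samples plus 1.5 * IQR.
threshold <- unname(quantile(normal, 0.75)) + 1.5 * IQR(normal)

# Positions of the tumours whose expression exceeds the cut-off.
outliers <- which(cancer > threshold)

print(fold.change)
print(outliers)
```

With these invented numbers the ratio of means stays close to 1, yet the last three tumours are flagged. A different cut-off, or a comparison against the tumours themselves rather than the normal tissue, would be just as defensible; the point is only that a per-sample check catches what the average hides.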