R script, microarray data, and the interesting outliers

I previously mentioned that outliers in microarray data are what you should be interested in. But what did I mean by that?

Let’s take a look. The following is an excerpt of a simple R script that can be used to examine microarray data output by the CARMAweb service:

> gene.normal <- normal.df["LAMP3", ]

> t.gene.normal <- as.data.frame(t(gene.normal))

> gene.cancer <- cancer.df["LAMP3", ]

> t.gene.cancer <- as.data.frame(t(gene.cancer))

> boxplot(t.gene.normal$LAMP3, t.gene.cancer$LAMP3, names = c("Normal", "Cancer"), ylab = "RMA", main = "LAMP3")

> mean.normal <- mean(t.gene.normal$LAMP3)

> mean.cancer <- mean(t.gene.cancer$LAMP3)

> fold.change <- mean.cancer / mean.normal

> mean.normal

[1] 5.377307

> mean.cancer

[1] 5.634959

> fold.change

[1] 1.047915
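A side note on that last step. RMA values are normally reported on a log2 scale; that is an assumption about the CARMAweb output here, not something the script checks. If it holds, the ratio of the two means is not a fold change in the usual sense, and the difference of the means is the log2 fold change instead. A minimal sketch using the two means printed above (the names log2.fc and fold.change.linear are made up for illustration):

```r
# Means copied from the console output above
mean.normal <- 5.377307
mean.cancer <- 5.634959

# Assumption: both means are on a log2 scale, as RMA output usually is.
# The difference of log2 means is then the log2 fold change.
log2.fc <- mean.cancer - mean.normal

# Convert back to a linear-scale fold change
fold.change.linear <- 2^log2.fc

log2.fc
fold.change.linear
```

On these numbers the linear fold change comes out at about 1.2 rather than 1.05, which is still small.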

In this example LAMP3 is being examined. Judging from the average fold change alone, there is essentially no difference between normal tissue and cancer. The box plot, however, tells a different story:

It shows three tumours in which the robust multiarray average (RMA) expression value is much higher than in normal tissue. These could represent a small subpopulation of tumours, although it is difficult to tell from this dataset alone, which contains only 45 tumours.
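Tumours like these can also be picked out in code rather than by eye. The following is only a sketch: the numbers are invented stand-ins for t.gene.normal$LAMP3 and t.gene.cancer$LAMP3 (the count of 20 normal samples is made up, and the three high values are placed by hand), and using the upper whisker of the normal samples as the cut-off is one possible choice, not a standard rule.

```r
# Hypothetical stand-in data; the real values would come from
# t.gene.normal$LAMP3 and t.gene.cancer$LAMP3.
normal <- seq(5.0, 5.8, length.out = 20)       # 20 invented normal samples
cancer <- c(seq(5.0, 5.7, length.out = 42),    # 42 tumours in the normal range
            8.9, 9.3, 9.8)                     # 3 tumours placed well above it
names(cancer) <- paste0("tumour", seq_along(cancer))

# boxplot.stats() returns the five numbers a box plot draws;
# element 5 is the end of the upper whisker.
upper.whisker <- boxplot.stats(normal)$stats[5]

# Tumours whose value lies above the normal upper whisker
high.tumours <- cancer[cancer > upper.whisker]

high.tumours
length(high.tumours) / length(cancer)          # fraction of tumours flagged
```

With real data, the names of the flagged samples are what you would carry forward, for example to check whether the same tumours stand out for other genes.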

These tumours are potentially important, and they would be missed by looking only at the fold change. You could imagine a drug that targets LAMP3 being effective only in this subset. That is a much better outcome than no drug being developed because, on average, there appears to be no fold change.
