Some interesting facts about the phylogeny of CUB domains

The CUB domain (Complement subcomponents C1r/C1s, sea urchin epidermal growth factor (Uegf), Bone morphogenetic protein 1 (Bmp1)) is a structural fold named after the first proteins in which it was identified [1].

The CUB domain is found predominantly in multicellular eukaryotes, excluding fungi. However, it is also found in some unicellular plants and protozoa. The genomes of single-celled algae and plankton, as well as those of multicellular moss and the poplar tree, encode CUB proteins. Few CUB proteins are known from protozoa; however, the parabasalid human parasite Trichomonas vaginalis expresses three proteins that contain a CUB domain, and the slime mold Polysphondylium pallidum expresses a CUB protein. CUB domains appear to have been present in some of the earliest unicellular marine eukaryotes, such as algae and plankton, and to have become established in multicellular eukaryotes.

There are examples of CUB domains in bacteria, such as Clostridium perfringens [2], and in archaea. The CUB domain genes of Clostridium perfringens and archaea were probably obtained by horizontal gene transfer from a eukaryote [2].

Interestingly, CUB domains are structurally similar to a number of viral capsid proteins, including the small protein subunit of the bean-pod mottle virus (BPMV) capsid [3,4]. CUB domains and these capsid proteins may have arrived at similar structures through convergent evolution [4].

  1. Bork, P. and G. Beckmann, The CUB Domain: A Widespread Module in Developmentally Regulated Proteins. Journal of Molecular Biology, 1993. 231(2): p. 539-545.
  2. Briggs, D.C. and A.J. Day, A bug in CUB’s clothing: similarity between clostridial CBMs and complement CUBs. Trends in Microbiology, 2008. 16(9): p. 407-408.
  3. Varela, P.F., et al., The 2.4 Å resolution crystal structure of boar seminal plasma PSP-I/PSP-II: a zona pellucida-binding glycoprotein heterodimer of the spermadhesin family built by a CUB domain architecture. Journal of Molecular Biology, 1997. 274(4): p. 635-649.
  4. Romero, A., et al., The crystal structures of two spermadhesins reveal the CUB domain fold. Nature Structural Biology, 1997. 4(10): p. 783-788.

R script, microarray data, and the interesting outliers

I previously mentioned that outliers in microarray data are what you should be interested in. But what did I mean by that?

Let’s take a look. The following is an excerpt of a simple R script that can be used to examine microarray data output by the CARMAweb service:

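> # Expression of LAMP3 across the normal samples (a one-row data frame)
> gene.normal <- normal.df["LAMP3", ]
> # The right-hand side is missing from the excerpt; a transpose to a data
> # frame is assumed here so that t.gene.normal$LAMP3 works below
> t.gene.normal <- as.data.frame(t(gene.normal))
> gene.cancer <- cancer.df["LAMP3", ]
> # Same assumption as for the normal samples
> t.gene.cancer <- as.data.frame(t(gene.cancer))
> # Side-by-side box plots of the RMA values in normal and cancer samples
> boxplot(t.gene.normal$LAMP3, t.gene.cancer$LAMP3, names = c("Normal", "Cancer"), ylab = "RMA", main = "LAMP3")
> mean.normal <- mean(t.gene.normal$LAMP3)
> mean.cancer <- mean(t.gene.cancer$LAMP3)
> # Ratio of the mean RMA values
> fold.change <- mean.cancer / mean.normal
> mean.normal
[1] 5.377307
> mean.cancer
[1] 5.634959
> fold.change
[1] 1.047915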

In this example LAMP3 is being examined. Judging from the average fold change, there is no difference between normal tissue and cancer. However, the box plot shows three tumours in which the robust multiarray average (RMA) is much higher than in normal tissue.

These could represent a small subpopulation of tumours, although it is difficult to tell from this dataset alone, which contains only 45 tumours.

These tumours are potentially important, and they would be missed by looking only at the average fold change. You could imagine a drug that targets LAMP3 being effective only in this subset. That would be much better than no drug being developed because, on average, there appears to be no change.
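One way to pick out such tumours without relying on the eye is to compare each tumour against the range seen in normal tissue. The following is a minimal sketch on made-up values, not the LAMP3 data: the sample sizes, the numbers, and the choice of cut-off (the upper whisker of the normal samples, taken from base R's boxplot.stats) are all assumptions for illustration.

```r
# Made-up RMA-like values (hypothetical, for illustration only)
normal <- seq(5.0, 5.8, length.out = 20)
cancer <- c(seq(5.0, 5.7, length.out = 42), 9.1, 9.4, 9.8)  # three high tumours
names(cancer) <- paste0("tumour", seq_along(cancer))

# Ratio of means, as in the excerpt above: close to 1
fold.change <- mean(cancer) / mean(normal)

# Upper whisker of the normal samples: the largest normal value that lies
# within 1.5 x IQR of the upper hinge (the boxplot default)
upper.normal <- boxplot.stats(normal)$stats[5]

# Tumours whose expression is above the whole typical normal range
outliers <- cancer[cancer > upper.normal]

fold.change
names(outliers)
```

The ratio of means stays close to 1, yet the three high tumours are returned by name. RMA values are normally on a log2 scale, so a gap of about 4 units, as in these made-up tumours, would correspond to roughly a 16-fold difference in signal.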

Next generation treatments for type I diabetes – Biology vs Engineering

Some time ago I produced an article for the science website Apptheneum on the future of type I diabetes treatments, which focused on gene and cell therapy as potential cures.

These are purely biological solutions; however, the engineers also have their own.

The solution offered by the engineers has one distinct advantage: the immune system is irrelevant. One of the problems with gene or cell therapy is that type I diabetes is an autoimmune disease. This means that even if you “cure” the disease, it is likely to recur unless the underlying immune dysfunction is also dealt with or circumvented. There is a long way to go on this front. The engineers’ solution is an artificial pancreas, which is not exposed to the immune system because it rests outside the body.

“The artificial pancreas is not a replica organ; it is an automated insulin delivery system designed to mimic a healthy person’s glucose-regulating function”

You can argue about whether the name is appropriate, as blood glucose regulation is only one aspect of the function of the pancreas, but I’m sure someone already has. The NIH launched a $20 million program to fund artificial pancreas clinical trials in 2014.

So it looks like improved insulin delivery devices will become available in the not-so-distant future. However, the problem with non-biological engineered solutions is that they cannot regulate blood glucose levels perfectly (as cell and gene therapies potentially can). It is likely that, over time, people with type I diabetes who use artificial pancreases will still develop foot problems associated with reduced circulation and nerve damage. Cardiovascular disease, retinopathy (eye damage), general nerve damage, kidney disease, and sexual dysfunction will still be major problems. Nevertheless, artificial pancreases will no doubt be a vast improvement over manual insulin injection.

For this reason I believe that the engineers will win the race but lose the war. It is only a matter of time until the immune system is understood to a level where the underlying autoimmune disease can be dealt with.

The pancreatic cancer database – an excellent resource

The pancreatic cancer database [1] is a one-stop shop for finding information derived from the literature on the expression levels of mRNA, miRNA, and protein in pancreatic cancer.

It was produced by the team behind the 2009 PLOS Medicine paper [2] that catalogued potential biomarkers for pancreatic cancer using an algorithm that examined microarray databases and the published literature for overexpressed mRNAs and proteins.

You can search by gene or protein identifiers or browse by gene symbol. All results are hyperlinked to the relevant PubMed entries.

It should be noted that the database is not complete. For example, searches for “CDCP1” (for which there is data in the literature) or “IL24” yield no results.

The database is a useful resource for quickly checking the status of a gene of interest. However, the results are not fine-grained: microarray data is reported as average fold change. Pancreatic cancer is an extremely heterogeneous disease, and it is worth keeping in mind that average fold change can mask important outliers. It is the outliers that are interesting and important.
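To make the masking concrete, here is a rough sketch of how a set of genes might be screened for outlier tumours that an average would hide. Everything in it is hypothetical: the gene names are placeholders, the values are made up, and the cut-offs (one log2 unit above the highest normal value, and an average difference of less than one log2 unit) are arbitrary choices rather than an established method.

```r
# Hypothetical expression matrices: rows are genes, columns are samples
normal.mat <- rbind(
  GENE_A = c(5.0, 5.2, 5.4, 5.6, 5.8),
  GENE_B = c(7.0, 7.1, 7.2, 7.3, 7.4),
  GENE_C = c(6.0, 6.2, 6.4, 6.6, 6.8)
)
cancer.mat <- rbind(
  GENE_A = c(5.1, 5.3, 5.5, 5.2, 5.4, 5.6, 5.0, 5.3, 5.3, 9.5),  # one outlier
  GENE_B = c(7.0, 7.2, 7.3, 7.1, 7.4, 7.2, 7.3, 7.1, 7.2, 7.2),  # unchanged
  GENE_C = c(8.0, 8.2, 8.4, 8.6, 8.1, 8.3, 8.5, 8.7, 8.4, 8.4)   # all higher
)

# Average difference per gene (cancer minus normal) on the log2 scale
mean.diff <- rowMeans(cancer.mat) - rowMeans(normal.mat)

# Per-gene cut-off: highest normal value plus an arbitrary margin of 1
threshold <- apply(normal.mat, 1, max) + 1

# Number of tumours above the cut-off for each gene; the threshold vector
# has one entry per row, so it is recycled down each column
n.high <- rowSums(cancer.mat > threshold)

# Genes with at least one high tumour but little change on average
masked <- n.high > 0 & abs(mean.diff) < 1

data.frame(mean.diff, n.high, masked)
```

In this toy example only GENE_A is flagged: GENE_C is higher in every tumour and would already stand out in an average, while GENE_B does not change at all.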

It is certainly worth taking a closer look at the underlying microarray data. This will require some processing and analysis. However, the R statistical programming language and various web resources such as CARMAweb [3] make this relatively straightforward.

  1. Thomas, Joji Kurian, Min-Sik Kim, Lavanya Balakrishnan, Vishalakshi Nanjappa, Rajesh Raju, Arivusudar Marimuthu, Aneesha Radhakrishnan, et al. ‘Pancreatic Cancer Database: An Integrative Resource for Pancreatic Cancer’. Cancer Biology & Therapy 15, no. 8 (August 2014): 963–67. doi:10.4161/cbt.29188.
  2. Harsha, H. C., Kumaran Kandasamy, Prathibha Ranganathan, Sandhya Rani, Subhashri Ramabadran, Sashikanth Gollapudi, Lavanya Balakrishnan, et al. ‘A Compendium of Potential Biomarkers of Pancreatic Cancer’. PLoS Medicine 6, no. 4 (7 April 2009): e1000046. doi:10.1371/journal.pmed.1000046.
  3. Rainer, J., F. Sanchez-Cabo, G. Stocker, A. Sturn, and Z. Trajanoski. ‘CARMAweb: Comprehensive R- and Bioconductor-Based Web Service for Microarray Data Analysis’. Nucleic Acids Research 34, no. Web Server (1 July 2006): W498–503. doi:10.1093/nar/gkl038.