Exam

Task 1 - Factors

The purpose of our research is to develop a biosimilar drug that could cure leukaemia without chemotherapy. To do this, we first have to study the molecular interaction network of a cell and identify the malfunctions that make a cell leukaemic. A PhD student from the group manually curated many research papers. The data were typed by hand into an Excel file (Manual Curation.xlsx), so there are many typos. This is a huge problem, because our R script, which does the statistics for planning the experiments, identifies the functions by their strings: e.g. “Co-factor” and “Cofactor” are treated as different categories, although they are the same. So we get false data, which leads to meaningless laboratory trials, wasted money and stress. A protein may have several functions, which are separated by commas. That means a protein may belong to several categories at a time:

Uniprot AC   Function
A42373       Co-factor
B288P2       Co-factor, Endocytosis related, Scaffold
C7SYN5       Cofactor

B288P2 falls into three different categories, while A42373 falls into just one. Both of them belong to the “Co-factor” category. C7SYN5, however, would end up in a separate category from A42373, although it belongs to the same one; its function has merely been mistyped. “Uniprot A” and “Uniprot B” are the Uniprot IDs of interacting proteins 1 and 2, while “Function A” and “Function B” are the lists of functions of interacting proteins 1 and 2. The two proteins are equivalent and the connection is undirected, so there is no need to distinguish between the functions in A and B.

Task: Collect all the different functions from the table! (e.g. the result for the table above is: [1] "Co-factor" "Endocytosis related" "Scaffold" "Cofactor")
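
As a minimal sketch of the idea, the comma-separated entries can be split, trimmed and de-duplicated in base R. The toy data below only mirrors the example table; the column names uniprot and func are assumptions, not the names used in Manual Curation.xlsx.

```r
# Toy data mirroring the example table (column names are assumed).
curation <- data.frame(
  uniprot = c("A42373", "B288P2", "C7SYN5"),
  func    = c("Co-factor", "Co-factor, Endocytosis related, Scaffold", "Cofactor"),
  stringsAsFactors = FALSE
)

# Split every entry on commas, strip surrounding whitespace,
# and keep each distinct label once (in order of first appearance).
all_functions <- unique(trimws(unlist(strsplit(curation$func, ","))))
print(all_functions)
```

For a table with two function columns, the two columns could be concatenated with c() before splitting, since the interaction is undirected.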

Task 2 - Data cleaning

The following tasks apply to the data and instructions from Task 1:

Task 2/a Correct the typos in the Function A and Function B columns!

Task 2/b Collect the proteins for each function and write them to a separate file per function!
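
One possible approach, sketched on the toy table from Task 1, is to compare labels by a normalised key (lower case, letters and digits only) and then split the proteins by the corrected label. The helper normalise_key, the variable names and the output file names are hypothetical. This key only catches differences in case, punctuation and spacing; real misspellings would still need a manual mapping or fuzzy matching.

```r
proteins <- c("A42373", "B288P2", "C7SYN5")
funcs    <- c("Co-factor", "Co-factor, Endocytosis related, Scaffold", "Cofactor")

# Hypothetical helper: reduce a label to lower-case letters and digits.
normalise_key <- function(x) gsub("[^a-z0-9]", "", tolower(x))

# Long format: one row per (protein, function) pair.
parts <- strsplit(funcs, ",")
long <- data.frame(
  protein = rep(proteins, lengths(parts)),
  label   = trimws(unlist(parts)),
  stringsAsFactors = FALSE
)
long$key <- normalise_key(long$label)

# Use the first spelling seen for each key as the canonical label.
first <- !duplicated(long$key)
canonical <- setNames(long$label[first], long$key[first])
long$label <- unname(canonical[long$key])

# Collect the proteins per function and write one file per function.
by_function <- split(long$protein, long$label)
out_dir <- tempdir()
for (f in names(by_function)) {
  file_name <- paste0(gsub("[^A-Za-z0-9]+", "_", f), ".txt")
  writeLines(unique(by_function[[f]]), file.path(out_dir, file_name))
}
```

Here the files are written to a temporary directory; in the real task the output directory would be chosen by the author of the script.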

Task 3 - Loops

We study drug synergism in our lab. We used statistical test planning to use as few animals as possible while still getting enough data. The amount of data generated is too large to process manually. The attached files contain the data obtained from the statistical evaluation and our colleagues’ results. Since a large amount of data is generated each month, both the processing and part of the decision-making process need to be automated. If synergism was found between two drugs, further testing is required.

  • Collect the files in which P < 0.05!
  • Assemble a data.frame with the properties of these drugs (each file contributes one row) and write it out to a file!
  • The name of the file should be today’s date!
  • Filter the final data.frame by the following criteria and write it to a file:
    • the change in the effect of at least one drug is greater than 50%
    • the effect of both drugs is greater than 50%
    • the effect of both drugs is greater than 50% and not ‘None’
    • the change in the effect of one drug is greater than 50% and its effect is ‘Strengthening’
    • the effect of both drugs is greater than 50% and the effect is ‘Strengthening’
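
The steps above could be sketched roughly as follows. The layout of the attached files is not described here, so the file format and the column names (DrugA, DrugB, P, ChangeA, ChangeB, Effect) are assumptions, and the toy files are generated only to make the example self-contained. Only one of the listed filters is shown.

```r
# Create three toy result files in a temporary directory (format is assumed).
in_dir <- file.path(tempdir(), "synergy_demo")
dir.create(in_dir, showWarnings = FALSE)
toy <- list(
  data.frame(DrugA = "d1", DrugB = "d2", P = 0.01, ChangeA = 60, ChangeB = 10, Effect = "Strengthening"),
  data.frame(DrugA = "d3", DrugB = "d4", P = 0.20, ChangeA = 80, ChangeB = 90, Effect = "None"),
  data.frame(DrugA = "d5", DrugB = "d6", P = 0.03, ChangeA = 70, ChangeB = 55, Effect = "Weakening")
)
for (i in seq_along(toy)) {
  write.table(toy[[i]], file.path(in_dir, sprintf("result_%d.tsv", i)),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Loop over the files and keep the rows of those with P < 0.05.
files <- list.files(in_dir, pattern = "\\.tsv$", full.names = TRUE)
kept <- list()
for (f in files) {
  row <- read.delim(f, stringsAsFactors = FALSE)
  if (row$P[1] < 0.05) kept[[length(kept) + 1]] <- row
}
significant <- do.call(rbind, kept)

# Name the output file after today's date, e.g. 2024-05-17.tsv.
out_file <- file.path(tempdir(), paste0(format(Sys.Date(), "%Y-%m-%d"), ".tsv"))
write.table(significant, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Example filter: at least one change above 50% and a 'Strengthening' effect.
one_strong <- significant[
  (significant$ChangeA > 50 | significant$ChangeB > 50) &
    significant$Effect == "Strengthening", ]
```

The remaining criteria follow the same pattern, combining logical conditions with & and | inside the row index.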

Task 4 - Differential Expression

During our research project one of our co-workers found an interesting paper:

Li, H.R., Lovci, M.T., Kwon, Y.S., Rosenfeld, M.G., Fu, X.D., and Yeo, G.W. (2008). Determination of tag density required for digital transcriptome analysis: Application to an androgen-sensitive prostate cancer model. PNAS 105, 20179.

It contains a study that is relevant to our results:

“This case study considers RNA-Seq data from a treatment vs control experiment. Genes stimulated by androgens (male hormones) are implicated in the survival of prostate cancer cells and are potential target of anti-cancer treatments. Three replicate RNA samples were collected from prostate cancer cells (LNCaP cell line) after treatment with an androgen hormone (100uM of DHT). Four replicate control samples were also collected from cells treated with an inactive compound.”

It would be very helpful for our project to know which genes are expressed at a significantly higher level in the samples treated with DHT than in the control samples.

You can find the data used by this paper in the following file: pnas_expression.txt

Be careful: the data file contains one irrelevant column! Before you start the differential expression analysis, exclude the genes that are not expressed in any of the samples (the entire row contains only 0)! Create a data.frame variable containing the genes that are expressed at a significantly higher level in the samples treated with DHT than in the control samples, together with their logFC, logCPM, PValue and adjusted P-value! Save this data.frame to a tab-separated table file.
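
The pre-filtering step can be illustrated on a toy count table. This is only a sketch: the column names (ensembl_ID, ctrl1, ctrl2, dht1, len) are assumptions, and the irrelevant column is assumed here to be a gene-length column called len. The differential expression test itself is not shown.

```r
# Toy count table; the real file has more samples and many more genes.
counts <- data.frame(
  ensembl_ID = c("g1", "g2", "g3"),
  ctrl1 = c(0, 5, 0),
  ctrl2 = c(0, 7, 2),
  dht1  = c(0, 20, 0),
  len   = c(1000, 2000, 1500),
  stringsAsFactors = FALSE
)

# Keep only the count columns: drop the gene ID and the (assumed) length column.
count_cols <- setdiff(names(counts), c("ensembl_ID", "len"))
mat <- as.matrix(counts[, count_cols])
rownames(mat) <- counts$ensembl_ID

# Remove genes whose counts are 0 in every sample.
expressed <- mat[rowSums(mat) > 0, , drop = FALSE]
print(expressed)
```

Because counts are non-negative, a row sum of 0 is equivalent to the entire row containing only 0.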

Task 5 - Plotting 1

Read in the count table (pnas_expression.txt) and the result table (result.txt) described in Task 4. Compute the average count per gene, then plot its logarithm (y axis) against the logarithm of the gene length (x axis). Give the axes and the figure clear titles. Use small circular dots on the plot, and set their colour to semi-transparent black with the help of the rgb command (look up the usage of the rgb command), so that the density of the dots is easy to see. Then add the logarithm of the average counts (y axis) of the genes from the result table (the genes expressed at a higher level in the DHT-treated samples) to the figure in red, again against the logarithm of the gene lengths (x axis). This lets us check whether the gene lengths or the average counts influenced whether we found a significant difference between the treated and the control samples.
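
A minimal sketch of the plotting technique, using simulated values instead of the real tables: the alpha argument of rgb makes the dots semi-transparent, so overlapping dots appear darker. All variable names, the simulated data and the stand-in for the result table (is_hit) are assumptions made for illustration.

```r
set.seed(1)
# Simulated gene lengths and average counts (not real data).
gene_len <- round(runif(200, 500, 5000))
mean_cnt <- rpois(200, lambda = gene_len / 50) + 1   # +1 avoids log(0)
# Stand-in for the genes listed in the result table.
is_hit <- mean_cnt > quantile(mean_cnt, 0.9)

# Black with 30% opacity: red, green, blue, alpha on a 0..1 scale.
dot_col <- rgb(0, 0, 0, alpha = 0.3)

plot_file <- file.path(tempdir(), "counts_vs_length.pdf")
pdf(plot_file)
plot(log(gene_len), log(mean_cnt),
     pch = 16, cex = 0.6, col = dot_col,
     xlab = "log(gene length)", ylab = "log(average count)",
     main = "Average count versus gene length")
# Overlay the selected genes in red on the same axes.
points(log(gene_len[is_hit]), log(mean_cnt[is_hit]),
       pch = 16, cex = 0.6, col = "red")
dev.off()
```

Writing to a PDF device keeps the example runnable without a display; in an interactive session the pdf() and dev.off() calls can be left out.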

Task 6 - Plotting 2

Read in the count table (pnas_expression.txt) described in Task 4. Create a histogram of the logarithm of the counts with the help of the multhist command, which can be found in the plotrix package. Decrease the number of bins from the default so that multhist divides the counts into fewer groups. Colour the bars of the different samples using RColorBrewer. Give the plot a main title and name the y axis. Create a figure legend using the legend command, in which the colours are displayed in small squares.