Gene Regulation Network Based Analysis Associated with TGF-βeta Stimulation in Lung Adenocarcinoma Cells

Background Transforming growth factor (TGF)-β is over-expressed in a wide variety of cancers such as lung adenocarcinoma. TGF-β plays a major role in cancer progression through regulating cancer cell proliferation and remodeling of the tumor micro-environment. However, it is still a great challenge to explain the phenotypic effects caused by TGF-β stimulation and the effect of TGF-β stimulation on tumor micro-environment. Objectives To address this issue, in the present study we used two time-course microarray data in human lung adenocarcinoma cells and applied bioinformatics methods to explore the gene regulation network responding to TGF-β stimulation in lung adenocarcinoma cells. Materials and Methods The time-dependent reverse-engineering method, protein-protein interaction network analyses, and calculation of the similarity measures between the links were used to construct gene regulatory network and to extract gene clusters. Results Utilizing the constructed gene regulation network, we predicted NEFL and LUC7A show the opposite and the same change with C21orf90 if HAND2 is knocked-out after treatment with TGF-β1 for 4 hours and for 12 hours respectively. FGG and HSPC009 are predicted to display the opposite change with NEFL if CSMD1 is knocked out after treatment with TGF-β1 for 12 hours. Additionally, by integrating two datasets, we specially identified several nested clusters which included those genes regulated by TGF-β stimulation in lung adenocarcinoma cells. Conclusions Our analysis can help a better understanding regarding how TGF-β stimulation causes the expression change of a number of the genes and provide a novel insight into TGF-β stimulation effect on lung adenocarcinoma cells.


Background
Transforming growth factor (TGF)- plays a major role in the initiation and progression of cancers. In the earlier stages, TGF- works as a tumor suppressor via inhibiting cell growth and apoptotic induction in the epithelial cells (1). On the contrary, in the later stages, the epithelial cells will become refractory to the growth inhibitory eff ect of TGF- which acts as a tumor promoter, increasing the tumor-promoting activity, cause the invasiveness, and metastasis (1,2). Several previous observations have approved that the normal epithelial cells show diff erential response to TGF- stimulation as compared to the tumor cells (3). In response to TGF- stimulation, tumor cells display an increased production of proteases and down-regulation of the inhibitors of the proteases, whereas this is not observed in the normal cells (4). Recent studies have found some novel regulation relationships between TGF- and genes in lung tumor cells. For example, Wang et al. have found that TGF- regulates the proliferation of lung adenocarcinoma cells by inhibiting PIK3K3 expression (5). Yu et al. unveiled a novel link between TGF- and Rac1. They considered the atypical Rac1 activator DOCK4 as a key component of the TGF-/Smad pathway that promotes Hua L. et al. lung adenocarcinoma cell extravasation and metastasis (6). Risolino et al. have found that the transcription factor PREP1 induces Epithelial-mesenchymal transition (EMT) and metastasis by controlling the TGF--SMAD3 pathway in non-small cell lung adenocarcinoma (7). However, it is not clear if various actions of the TGF- on the normal and lung tumor cells are due to diff erential gene regulations.
We know that gene regulatory networks have an important role in each life processes including cell diff erentiation, cell metabolism, cell cycle, and signal transduction (8). For example, De and Berx have claimed that the EMT-associated reprogramming of the cells not only suggests that fundamental changes in several regulatory networks might occur but also that an intimate interplay exists between such networks. Disturbance of the controlled epithelial cells' reproduction balance is triggered by altering several layers of regulation (9). Meng et al. constructed the lung adenocarcinoma related regulatory network using microarray data and found that FLI and TAL1 promote TGFBR and KDR expression respectively; the result of which is activation of the TGF- signaling pathway (10). Genovese et al. uncovered a novel regulation of TGF- signaling via a Smad4 transcriptomic network by miR-34a through constructing a network model based on the complex multidimensional cancer genomic data (11). Vilar et al. developed a concise computational model of the TGF- pathway and showed that the fi rst layer of communication with the environment, the ligand-receptor network, is not merely a passive transducer of the signals but rather embeds properties that make it a signal processing unit (12). Therefore, these evidence support the identifi cation of the intricate interplay between genes responsible for the observed phenotypes based on the gene regulation network, and will help to understand how TGF- stimulation aff ects the biological change of normal cells or tumor cells.

Objectives
In the present study, we used two time-course microarray data in human normal lung epithelial cell and lung adenocarcinoma cell and applied novel bioinformatics methods to explore the gene regulation networks associated with TGF- stimulation in two diff erent cell lines. We predicted the network change when several genes regulated by TGF- are knocked-out. Our analysis can help to understand better how TGF- causes the expression change of other genes and gives an insight into TGF- eff ect on lung adenocarcinoma cells, as well as the development of the more eff ective lung adenocarcinoma treatment strategies.

Data Description
To explore the network change when genes regulated by TGF- are knocked-out, we selected the gene expression profi ling regulated by TGF- in the normal lung epithelial cells (HPL1D) and lung carcinoma cells (A549). HPL1D and A549 cells were treated with TGF- 1 for 1, 4, and 12 hours, the total RNAs were extracted and were used for microarray using human 19 k arrays (4). This dataset was downloaded from Gene Expression Omnibus database (http://www.ncbi. nlm.nih.gov/geo/) (accession No. GSE7436). Genes which show log 2 ratios greater than 0.37 (1.3 fold induced) or less than -1.5 (0.3 fold induced) at any one of the time points are considered as up-regulated by TGF- or down-regulated by TGF-, respectively. To help observe the data distribution, we used the supra-hexagonal map (13) to display the samples' characteristics. The supra-hexagonal map provides a choice for calculating the covariance matrix based on a variety of diff erent distance metrics. It converts the gene-sample matrix into the codebook matrix and genes with similar data patterns were taken as the same or nearby nodes in the map by applying a self-organizing learning algorithm for the symmetric topology of the supra-hexagonal map. We observed the obvious gene expression diff erence between normal lung epithelial cells (HPL1D) and lung carcinoma cells (A549) at diff erent time points (supplementary Fig. 1).

Diff erentially Expressed Genes Filtration under Diff erent Time Points
In this analysis, we used R-bioconductor limma package (http://www.bioconductor.org) to select diff erentially expressed genes under diff erent time points between normal lung epithelial cells (HPL1D) and lung carcinoma cells (A549). To select those genes that signifi cantly distinguish the normal lung epithelial cells from the lung carcinoma cells as well as avoiding the constructed network complication which may cause the network can not be well explained; therefore, we kept the top 20 most signifi cant diff erentially expressed genes distinguishing HPL1D from A549 at each time point. In addition, we also selected the top 20 ranked genes with the global diff erentially expression across all time points (supplementary Table 1  Table 1). Finally, all of the reserved diff erentially expressed genes were used for further network construction.

Construction of the Gene Regulation Network
In the present study, we used the time-dependent reverse-engineering method to construct gene regulatory network. This method relies on a Lasso penalized estimation of the linear regression model (14). Suppose the linear regression model is: are the predictors, and i  is a noise following some probabilistic distributions. Assume that the predictors are standardized and that the response is centered, the Lasso penalized estimation is then given by: Where  is a non-negative scalar that determines the level of the constraints. Based on this method, the time dependent reverse-engineering method for constructing gene regulation network was described as followings simply (14): Suppose that we have selected N genes across T time points and for P subjects; we defi ne npt x is the expression of gene n for individual p at the time-point t. Since each gene is considered as a response variable, therefore the model is composed of N linear regression models. It is defi ned:   . n  is a non-negative scalar that determines the level of the constraints. At each step, the estimation of matrices done several times throughout the cross-validation. In this analysis, the reserved top ranked diff erentially expressed genes were used to construct gene regulation network. The Cascade package (15) of R software (http://www. r-project.org) was used to implement this program.

Data Description
To further explore the gene regulation relationships associated with TGF- stimulation in the lung carcinoma cells, we performed the additional analysis for another time course microarray data. We downloaded this dataset from Gene Expression Omnibus (accession No. GSE17708 (16)). This dataset was obtained from human A549 lung adenocarcinoma cells undergoing TGF--induced epithelial-mesenchymal transition (EMT).

Filtration of Diff erentially Expressed Genes at Diff erent Time Points
For GSE17708 dataset, at each time point, excluding 0 hours, the diff erentially expressed genes were selected for their ability that distinguishes the expression at this time point from the expression at 0 hour. In the previous analysis, the genes with p-value <0.001 and >2-fold change at each time point were considered as diff erentially expressed genes (17). We kept these genes for further analysis, as well.

TGF- Associated Gene Clusters Identifi cation in Lung Adenocarcinoma Cells; an Integration of the Two Datasets
To understand the functional gene clusters (i.e., networks) associated with TGF- in lung adenocarcinoma cells, we mapped the diff erentially expressed genes extracted from the two datasets; A)-the fi ltered diff erentially genes for GSE7436 and B)-the fi ltered diff erentially expressed genes for GSE17708, to protein-protein interaction (PPI) networks using the STRING database (http://string-db.org) which is a database of known and predicted protein interactions. To avoid those pairs with a lower association to the lung adenocarcinoma into the analysis, we fi ltered the gene pairs which only include those genes related to the lung cancer using Online Mendelian Inheritance in Man (OMIM) database (18). In the current study, we consider a gene network to be a set of closely interrelated links. By calculating the similarity measures between links, we can determine the expected amount of overlap clusters around a gene. A gene belongs to multiple clusters means that this gene is important in the gene regulation network. The program was implemented with Linkcomm package (19) of R software (http://www.r-project.org).

Extracting the Signifi cant Diff erentially Expressed Gene
For each time point, we selected the top 20 most signifi cant diff erentially expressed genes between normal lung epithelial cells and lung carcinoma cells. We also kept the top 20 ranked genes with the global diff erentially expression. Finally, the total of 80 diff erentially genes was identifi ed. Among the global diff erentially expressed genes, HAND2 displayed the obvious down-regulation in the normal lung epithelial cells after treatment with TGF- 1 for 1 hour (Fig. 1). Although HAND2 did not display an obvious up-or down-regulation in the lung carcinoma cell at three-time points, the trend of up-regulation seems obvious (log2ratio>0). There are some studies that have demonstrated HAND2 over-expression in the lung squamous cell carcinomas and de-regulated in the histological subtypes (4). Recent evidence has suggested that HAND2 methylation is a common and crucial molecular alteration in several types of cancers, and could be employed as a potential biomarker for the early detection as well as a predictor of the treatment response (20).

Construction of the Gene Regulation Network Associated with TGF- Stimulation
We used the identifi ed 80 diff erentially expressed genes to construct gene regulation network associated with the TGF- stimulation. Considering that a larger number of edges in the network makes it diffi cult to interpret the relationships between genes, therefore, we selected a cutoff to simplify the network. In order to choose the best cutoff , we used the evolution method which allows us to see the evolution of the network when the cut off is growing up. At each step, the positions of the genes are re-calculated. The current choice of cutoff relies on a p-value which corresponds to the adequacy of the data to a power law distribution. The p-value should ensure the scale-freeness of the network is reliable (14,21). Finally, a cutoff of 0.10 was selected to fi lter the network (supplementary Fig. 2). The fi ltered network is shown in the supplementary Figure 3.

Prediction of Gene Expression Modulations a fter a Knocked-out Experiment
Next, we wanted to predict the changed genes' regulations if expressions of some genes are knocked out. We found when HAND2 is knocked-out following to the 4 hours treatment with TGF- 1 , NEFL, and LUC7A show the opposite change trend with C21orf90 which is up-regulated ( Fig. 2A). When HAND2 is knocked out after treatment with TGF- 1 for 12 hours, NEFL, LUC7A, C21orf90, and HSPC009 displayed Hua L. et al.
the same change trend (Fig. 2B). Meanwhile, we found C21orf90 to show a diff erent expression trend after knocking out of HAND2 at diff erent time points. Although few studies have confi rmed the direct association between C21orf90 and lung carcinoma, the expression of C21orf90 was approved to be related to the lung parenchyma which is aff ected by the abnormal infl ammatory immune response (22). When CSMD1 was knocked out following to 12 hours of TGF- 1 , FGG and HSPC009 displayed the opposite change compared to the NEFL. FGG and HSPC009 were all up-regulated noticeably (Fig. 2C). We know that CSMD1 is a tumor suppressor, and a previous array-based comparative genomic hybridization (aCGH) analysis detected the loss of CSMD1 in lung squamous cell carcinomas (23). From the analysis, we can observe that knocking-down of CSMD1 causes the change in the FGG expression. Although the current evidence can not support the regulation relationships between CSMD1 and FGG, it has been reported that the expression of FGG changed during EMT of lung cancer by several genes such as FOXA1 knockdown in A549 cells (24). Therefore, these potential fi ndings need to be validated by more molecular biology experiments in future studies.

Identifi cation of Gene Clusters Associated with TGF- in Lung Adenocarcinoma Cells (Integration of Two Datasets)
After 2,714 diff erentially expressed genes which include 80 diff erentially expressed genes extracted from GSE7436 and 2,634 diff erentially expressed genes extracted from GSE17708 were mapped to the protein-protein interaction network, 273 gene pairs which also include genes related to the lung cancer were fi ltered. Then we used Jaccard coeffi cient to cluster links between these genes. When cutting the dendrogram at a point (height = 0.5972) where the partition density is maximized, 17 gene clusters were identifi ed (supplementary Fig. 4). Meanwhile, the number of nodes (genes) in the largest gene cluster is 19.
We observed 6 genes that belong to more than fi ve gene clusters (Fig. 3A and Fig. 3B), including JUN (belongs to 9 clusters), VEGFA (belongs to 8 clusters), IL6 (belongs to 8 clusters), EGFR (belongs to 7 clusters), TGFB1 (belongs to 6 clusters), and EGR1 (belongs to 5 clusters). It is noted that these genes were all associated with the TGF- induction and activity in the lung cancer. For example, it has been reported that TGF- is the major inducer of the interleukin-6 (IL-6) and vascular endothelial growth factor (VEGF), and the increased production of TGF- is followed by the increased IL-6 and VEGF secretion related to tumor cell proliferation (25). In addition, Finocchiaro et al. have suggested the role of TGFB as a mediator of the intrinsic resistance to EGFR tyrosine kinase inhibitors in non-small cell lung cancer (NSCLC) patients (26). As well, genes that belong to multiple gene clusters are possible to be discovered in the sets of genes that belong to clusters that are entirely nested within a larger cluster of the genes. For example, gene A, B C involved in a cluster (green color) are entirely nested within the larger cluster including gene A, B, C and D (brown color) (Fig. 3C). Therefore, genes involved in those nested clusters should be noted for their potential interaction eff ect with other genes. In the present study, we acquired some nested gene clusters. For example, Figure 3D shows the genes involved in the two nested gene clusters. We found that JUNB, EGR2, ZFP36, IRF1, IRF2, and MYD88 all are involved in the nested gene clusters.

Gene Set Enrichment Analysis (Integration of Two Datasets)
To explore the potential biological knowledge of the identifi ed gene sets or gene networks, the total of 2,714 diff erentially expressed genes extracted from the two datasets were uploaded to ConceptGen software (16) which off ers over 20,000 concepts comprising 14 diff erent types of biological knowledge for performing the enrichment analysis. The signifi cance of the overrepresentation is measured by a modifi ed Fisher's exact test (p-value) and q-values which take into account the estimated proportion of false positives incurred based on p-values. By default, only concepts with q-values < 0.05 are displayed in this analysis. The results can partly refl ect TGF- induction and the complexity of EMT process. For example, the protein interactions among the diff erentially expressed genes link the TGFB1, TGFB2, and TGFB3 interactions directly (Fig. 4A). Moreover, we found these diff erentially expressed genes set also links JUN interactions. It has been reported that dysregulated c-jun expression may be involved in the acquisition of anchorage independence in the process of human lung carcinogenesis (27). Transcription factors interactions among the diff erentially expressed genes link the transcription factor Egr-1 and p53 (Fig.  4B). Previous studies have found that Egr-1 mediates the stimulation of collagen transcription elicited by TGF-. Also, it was found that TGF- is necessary for the development of the pulmonary fi brosis (28). It has been reported that TGF- causes a time-and dosedependent increase in Egr-1 protein and mRNA levels as well as enhancement of the Egr-1 gene transcription Hua L. et al. Meanwhile, C21orf90 is up-regulated (log2ratio>0.37). (B) The changed gene expressions when HAND2 is knocked out following to the treatment with TGF- 1 for 12 hours. When HAND2 (colored by green) is knocked out, NEFL (colored by pink), LUC7A (colored by pink), C21or f90 (colored by red), and HSPC009 (colored by red) display the same changed trend. (C) The changed gene expression when CSMD1 is knocked out after treatment with TGF- 1 for 12 hours. When CSMD1 (colored by green) is knocked out, FGG and HSPC009 (colored by purple) have displayed the opposite change with NEFL (colored by pink). Meanwhile, FGG and HSPC009 are all up-regulated markedly (log2ratio>0.37).
Hua L. et al.
via serum response elements in the normal fi broblasts (29). In addition, recent studies have shown that p53 aff ects TGF- /SMAD3-mediated signaling, cell migration, and tumorigenesis. Mutant p53 proteins also regulate Nox4-dependent signaling in TGF--mediated cell motility (30). Other enrichment analyses such as those carried out based on Gene Ontology (GO) or Kyoto Encyclopedia of Genes and Genomes (KEGG) have also resulted in fi nding that these diff erentially expressed genes set are related to the TGF- induction and activities.
In summary, we observed NEFL and LUC7A show an opposite and the same change with C21orf90 if HAND2 is knocked-out following to TGF- 1 treatment for 4 and 12 hours, respectively. In addition, FGG and HSPC009 display an opposite change with respect to NEFL if CSMD1 is knocked out after treatment with TGF- 1 for 12 hours. Furthermore, by integrating the two datasets, we specifi cally have identifi ed several nested clusters which include those genes regulated by the TGF- in the lung adenocarcinoma cells.

Discussion
It is known that TGF- plays important roles in cancer progression, aff ecting both tumor and stromal cells. Gene regulatory networks may have a potential infl uence on cell diff erentiation and cell metabolism. Most infl uence of the TGF- is brought about by regulation of gene expression. In the current study, our aim was to explore the intricate interplay between genes in response to the TGF- stimulation in the lung adenocarcinoma cells. Our analysis shows that when a number of genes involved in gene regulation network are knocked out, several other genes will show up-regulation or down-regulation trends. By integrating two time-course microarray data in human lung adenocarcinoma cells, we specifi cally have identifi ed some nested gene clusters and found that TGFB1, EGR1, EGR2, EGFR, IL6, JUN, and JUNB all are involved in these clusters. The previous evidence has confi rmed the potential regulation role of TGFB on these genes, such as EGR1, EGFR, and IL6 in the lung cancer. Taken together, our analysis can help to understand better how TGF- causes changes in the It should point out the limitation of our study. The shortcoming of time-course microarray data is the low sample size. Although we integrated two datasets to implement the analysis in this paper, however, this integration is not enough. In the practice, utilizing more available time-course datasets and performing the synthesized analysis such as Meta-evaluation will help to improve the reliability of the results. Moreover, our prediction for the expression change of the NEFL, LUC7A, C21orf90, FGG, and HSPC009 if HAND2 and CSMD1 are knocked-out TGF- 1 treatment at diff erent time points need to be validated by further molecular biology experiments in the future studies.