Prediction of the protein structure and function of the Hc-STP-1 cDNA sequence isolated from Haemonchus contortus
INTRODUCTION
The majority of biological processes in a cell are governed by reactions involving the phosphorylation and dephosphorylation of proteins. These molecular switches regulate the expression of genes, as well as the progression of the cell cycle. In addition, cellular differentiation is controlled by the transfer of phosphate groups to or from specific proteins (Liu et al 2008). Other cellular processes that are influenced by protein phosphorylation include programmed cell death, cellular transformation and transmission.
Phosphatases are the main proteins involved in the removal of phosphate groups from specific protein substrates (Davare et al 2000). Biochemical investigations have identified a number of phosphatases, with parallel sub-classification schemes based on their functional capacity. Phosphatases can be further grouped according to their substrate specificity (Pais et al 2009). Accordingly, there are serine/threonine phosphatases as well as tyrosine phosphatases. Interestingly, certain phosphatases exhibit dual specificity, wherein the enzyme is capable of using both serine/threonine and tyrosine residues as its substrate (Bakan et al 2008).
A number of serine/threonine phosphatases have been identified, with each type classified according to its mechanism of action and dependence on cofactors (Golden et al 2008). To date, the largest group is the protein phosphatase (PP) family, which comprises 7 subfamilies. It has been reported that the majority of the dephosphorylation activities within the cell are performed by the serine/threonine phosphatases PP1 and PP2A (Adams et al 2005). Another group of protein phosphatases (PPMs) is characterized by its dependence on metal ions.
The advent of computational techniques has allowed researchers to study macromolecules using a different approach, mainly involving sequence analysis and prediction strategies. Such methods have been applied to almost every area of biomedical research, from basic biological processes to drug discovery. In the field of biochemistry, computational methods have facilitated protein structure and function predictions, which could subsequently lead to the discovery of protein targets for molecular therapeutics. However, the quality and reliability of data generated by bioinformatics tools depend on human reasoning, and thus the researcher still controls the direction and progression of the computational analysis.
This report focuses on the computational prediction of the structure and function of a complementary deoxyribonucleic acid (cDNA) sequence that was isolated from the strongylid nematode Haemonchus contortus. This species commonly thrives as a parasite of small ruminants. This study will first predict various levels of protein configuration based on the cDNA sequence of interest.
Furthermore, this investigation will attempt to determine putative functional sites within the predicted protein, in order to infer possible roles of the protein product in the cell. We hypothesize that the protein product of Hc-STP-1 carries specific amino acid sequence motifs that serve as binding sites for other proteins of the cell. In addition, the putative protein of Hc-STP-1 may have active sites that influence the rate of binding with other cellular proteins. Candidate interacting proteins will also be presented in this study, alongside prospective approaches for inhibition and enhancement of phosphorylation reactions within the cell.
RESULTS
Translation of the 951-base pair cDNA sequence of serine/threonine phosphatase-1 derived from Haemonchus contortus (Hc-STP-1) using BLASTX 2.2.23 (Altschul 1997) generated a primary polypeptide chain of approximately 317 amino acids in length (Figure 1). Secondary structure prediction of the translated polypeptide chain using PSIPRED (Yang et al 2010) resulted in the identification of regions that showed specific configurations (Figure 2). The majority of the predicted secondary structure elements had high levels of confidence, as indicated by the blue bars above the predicted secondary structures.
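The relationship between the 951-base pair cDNA and the roughly 317-residue polypeptide follows from the triplet genetic code (951 / 3 = 317 codons). The following is a minimal Python sketch of that translation step, assuming the standard genetic code and a reading frame that starts at the first base. It is not the BLASTX implementation, and the function name `translate` is hypothetical; the actual Hc-STP-1 sequence is not reproduced here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), listed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna: str, frame: int = 0) -> str:
    """Translate a cDNA string in the given frame (0, 1 or 2).

    '*' marks a stop codon; 'X' marks a codon with an ambiguous base.
    """
    seq = cdna.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)
```

Calling `translate` on a 951-base sequence in frame 0 yields 317 symbols, one per codon. If the final codon is a stop, the mature polypeptide is one residue shorter, which is consistent with the "approximately 317" wording.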
The secondary protein configuration was mainly composed of helices that were distributed across the entire stretch of the polypeptide chain. A total of 12 helices were found in the predicted secondary structure, ranging from 4 to 17 amino acids in length. In addition, 13 strands were distributed across the entire polypeptide chain. The strands also varied in length but were shorter than the helices, ranging from 2 to 6 residues. The remaining regions of the polypeptide were predicted to adopt a coiled configuration.
One peculiar feature of the predicted secondary structure was that amino acid residues 121 to 160 were largely predicted as three tandem coils with very short coils of 1-3 residues in between. More specifically, two of the three coils, located at residues 140 to 160, were separated by only one amino acid residue, at position 152. It is possible that this region of the polypeptide chain undergoes further conformational changes that may influence its interaction with other substrates.
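Counts such as the number of helices and strands and their length ranges can be tallied from a per-residue secondary-structure string. The sketch below assumes the common three-state notation (H for helix, E for strand, C for coil); the function names `segments` and `summarize` and the example string are hypothetical and do not reproduce the Hc-STP-1 prediction.

```python
from itertools import groupby


def segments(ss: str):
    """Split a per-residue string into (state, start, end) runs, 1-based inclusive."""
    runs, pos = [], 1
    for state, group in groupby(ss):
        length = len(list(group))
        runs.append((state, pos, pos + length - 1))
        pos += length
    return runs


def summarize(ss: str):
    """Count helix (H) and strand (E) segments and report their length ranges."""
    summary = {}
    for state in "HE":
        lengths = [end - start + 1 for s, start, end in segments(ss) if s == state]
        summary[state] = {
            "count": len(lengths),
            "min": min(lengths, default=0),
            "max": max(lengths, default=0),
        }
    return summary


# Hypothetical toy prediction, not the Hc-STP-1 result.
example = "CCHHHHCCEEECHHHHHHC"
print(summarize(example))
```

The same run list also exposes very short coil segments between neighbouring elements, such as a single-residue break, because each run carries its start and end position.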
Prediction of the tertiary structure of Hc-STP-1 using MEMSAT-SVM (Nugent 2010) resulted in a protein structure that traversed the cell membrane (Figure 3). Both the N- and C-termini of the protein were located within the cytoplasm of the cell. Two major helical segments traverse the cell membrane: the first helix starts at residue 93 and runs through to residue 108, and the second helix runs from residue 147 to 162. Amino acid residues 109 to 146 are located on the extracellular side of the cell membrane.
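The reported topology can be written down as a list of residue ranges and checked for internal consistency. In the sketch below, the two helices and the extracellular loop use the boundaries given in the text; the cytoplasmic ranges 1 to 92 and 163 to 317 are inferred from those boundaries and the 317-residue length, so they are assumptions. The function names are hypothetical.

```python
# Predicted topology (1-based, inclusive residue ranges).
# Helix and loop boundaries come from the text; the two cytoplasmic
# ranges are inferred from them and from the 317-residue length.
TOPOLOGY = [
    ("cytoplasmic N-terminal region", 1, 92),
    ("transmembrane helix 1", 93, 108),
    ("extracellular loop", 109, 146),
    ("transmembrane helix 2", 147, 162),
    ("cytoplasmic C-terminal region", 163, 317),
]


def segment_lengths(topology):
    """Return the number of residues in each named segment."""
    return {name: end - start + 1 for name, start, end in topology}


def is_contiguous(topology, total_length):
    """Check that segments tile residues 1..total_length with no gap or overlap."""
    expected_start = 1
    for _, start, end in topology:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == total_length + 1


for name, length in segment_lengths(TOPOLOGY).items():
    print(f"{name}: {length} residues")
print("covers full chain:", is_contiguous(TOPOLOGY, 317))
```

Under these ranges each membrane-spanning helix is 16 residues long and the extracellular loop is 38 residues long.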
The BLASTX results also provided similarity searches against the protein database of GenBank. The translated sequence of Hc-STP-1 was found to be highly similar (90 to 95% sequence identity) to four serine/threonine phosphatase enzymes of Trichostrongylus vitrinus. The GenBank entries are as follows: emb|CAM84506.1, emb|CAM84505.1, emb|CAM84509.1 and emb|CAM84507.1. The protein alignments shown in Figures 4 through 7 indicate where similarities were evident, as well as where differences were observed. Gaps were inserted within the query sequence in order to attain a global alignment of the query and the database protein.
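Percent identity for a gapped pairwise alignment is conventionally computed as the number of identical aligned positions divided by the alignment length, gap columns included. The sketch below illustrates that calculation on two already-aligned strings; the function name `percent_identity` and the toy sequences are hypothetical and are not the Hc-STP-1 alignments.

```python
def percent_identity(aligned_query: str, aligned_subject: str, gap: str = "-") -> float:
    """Percent identity over all alignment columns, counting gap columns as mismatches."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_query)
    if columns == 0:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != gap
    )
    return 100.0 * matches / columns


# Hypothetical toy alignment: 9 identical positions out of 10 columns.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```

Because gap columns count toward the denominator, inserting gaps into the query lowers the reported identity unless they bring additional residues into register.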
Identification of conserved domains within the polypeptide sequence indicated that the product of Hc-STP-1 contained sequence motifs that conferred regulatory functions in relation to the cell cycle. In addition, the polypeptide sequence also showed motifs associated with the synthesis of specific proteins that were responsible for the normal physiology of the cell. Given the high similarity of the translated Hc-STP-1 sequence with identified proteins in the database, it is highly likely that the protein product of the cDNA query would function as a phosphatase.
DISCUSSION
The use of bioinformatics tools has facilitated the prediction of the protein structure and function of the cDNA sequence of Hc-STP-1. From a 951-base pair DNA sequence isolated from Haemonchus contortus, a 317-amino acid polypeptide chain was predicted using the BLASTX translation feature. Further prediction analysis generated the secondary configuration of the protein of interest. Bioinformatics has allowed the identification of amino acid segments that would configure into helices, strands or coils. Such prediction is mainly based on the combination of amino acids that are present within a defined polypeptide length.
The presence of helices within a secondary protein configuration allows the macromolecule to achieve a condensed state as it progresses to its final tertiary or quaternary structure. In addition, the strands within the secondary protein configuration allow interactions between other amino acids within the protein sequence. The strands also allow interactions between two different proteins, as these structures are capable of arranging themselves in a parallel orientation between each other.
Secondary protein structure prediction has determined that the polypeptide chain is predominantly composed of helices. The region of the polypeptide chain that was covered with helices was also observed in the predicted tertiary structure. For a protein to exist within the cell membrane and at the same time attain its normal physiological function, it is essential that the membrane-spanning region exist as a helix. Such a configuration has been predicted in the tertiary structure of the Hc-STP-1 protein. The helical structure shields polar amino acids on the inner side of the helix, whereas the amino acids that have non-polar R groups are positioned on the outer side of the helical structure. This arrangement preserves the protein and thus maintains its functionality as it exists within the lipid bilayer of the plasma membrane.
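The preference of membrane-spanning helices for non-polar residues can be illustrated with a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale; this is a simpler, swapped-in technique for illustration and is not the support vector machine method used by MEMSAT-SVM. The function name `hydropathy_profile`, the default window size and the toy peptide are assumptions.

```python
# Kyte-Doolittle hydropathy values; positive values are hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19):
    """Mean hydropathy for every full window along the sequence.

    Returns an empty list if the window does not fit in the sequence.
    Raises KeyError for residues outside the 20 standard amino acids.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        return []
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical toy peptide: a hydrophobic stretch followed by a charged one.
profile = hydropathy_profile("IIIIDDDD", window=4)
print([round(value, 2) for value in profile])
```

Stretches rich in non-polar residues appear as sustained positive peaks in such a profile, which is the kind of signal expected for segments embedded in the membrane.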
Sequence alignments using the predicted protein of Hc-STP-1 showed that the sequence was highly homologous to other identified serine/threonine phosphatases. The protein alignments serve as tools for quantifying the extent of similarity between a pair of protein sequences, and they also allow the identification of blocks of amino acid sequence that have been conserved across different taxa. Two major types of conserved domains have been detected in the predicted protein of the cDNA Hc-STP-1.
Metallophosphatases (MPPs) are considered a superfamily of enzymes that have the inherent capacity to interact with metal ions. Such interactions with manganese or zinc ions assist in the caging of specific polar amino acids such as histidine and asparagine. The most common metallophosphatases include exonucleases, phosphoprotein phosphatases and sphingomyelinases. This conserved region is composed of a beta-pleated sheet that is positioned between two metal-dependent active sites that are localized to the C-terminal side of the protein region. The metallophosphatase domain facilitates coordinated interaction with metal ions within the cell's environment.
Another conserved domain that has been identified within the predicted polypeptide of the cDNA Hc-STP-1 is that of the serine/threonine phosphatase. This domain is capable of transferring a phosphate group from one site to another in the presence of either the serine or threonine amino acid residue in its active site. The presence of these conserved domains allows the design of future protein manipulations that could deactivate the protein. More importantly, the prediction of the structure and function of the cDNA sequence allows researchers to design molecular treatments that could target such proteins, especially when they are found to be highly active or over-expressed. Certain medical conditions, such as cancer and metabolic syndromes, are commonly characterized by active expression of proteins that regulate major cellular processes such as cell division and differentiation. The prediction of the structure and function of a cDNA sequence through the use of bioinformatics tools will facilitate future investigations into methods for regulating such biological pathways.
CONCLUSION
Computational analysis has predicted that the protein product of the cDNA Hc-STP-1 is a serine/threonine phosphatase that is very similar to homologous sequences in other species. The predicted protein is a 317-amino acid polypeptide that contains coils, strands and helices. The protein has been predicted to traverse the cell membrane, with both the N- and C-termini located within the cytoplasm of the cell. Molecular targets can be designed to regulate the activity of the predicted protein within the cell.