Sanjeev Kumar*, Balraj Singh Gill and Navgeet
Centre for Biosciences, Central University of Punjab, Bathinda, India
Received date: September 19, 2016; Accepted date: October 25, 2016; Published date: November 01, 2016
Citation: Kumar S, Gill BS, Navgeet. Unveiling Hypothetical Proteins of Ganoderma: An Endeavour in Functional Annotation Using In silico tools. J In Silico In Vitro Pharmacol. 2016, 2:4
Copyright: © 2016 Kumar S, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Proteins mediate a multitude of housekeeping functions vital for cell integrity and survival, and even a small aberration in protein function can cause numerous abnormalities. The many therapeutic effects observed in Ganoderma species have prompted researchers to characterize protein sequences that may enhance immune function and support the design of effective drugs. Although numerous fungal proteins have been annotated, functions have not yet been assigned to 33 mitochondrial proteins of this genus of polypore mushrooms. Functional annotation of these Ganoderma proteins, carried out here for the first time, examined parameters decisive for characterization, including protein localization, motifs, domains, clusters, phylogenetic relationships, and interactions with other molecules. The information obtained using various bioinformatics tools led to the functional annotation of these proteins as integrase, prefoldin, endonuclease, dehydrogenase, polymerase, GIY-YIG endonuclease, LAGLIDADG endonuclease and reverse transcriptase.
Ganoderma lucidum (GL); Hypothetical proteins (HPs); Bioinformatics tools
Proteins mediate a multitude of essential housekeeping functions critical for cellular survival and integrity. Even a small aberration, such as protein misfolding and aggregation, can compromise normal functioning, leading to neurodegeneration and necessitating therapeutic intervention [1]. Proteins also exhibit diverse activities that enhance immune function, in addition to possessing anti-cancer potential. Moreover, the discovery of various other medicinal values in natural products prompted researchers to sequence genomes and focus on diverse aspects of proteomics. Genome sequencing laid the foundation for the rapid accumulation of characterized as well as uncharacterized genes in GenBank. However, functions have been assigned to only 50-60% of the genes in sequenced genomes, which makes functional annotation of the remaining uncharacterized proteins a challenging task [2]. The introduction of bioinformatics tools has opened a new era [3], allowing protein functions to be annotated by less time-consuming means.
Ganoderma, a well-known basidiomycete fungus indicated for cancer and neurodegenerative diseases, has been claimed to be a rich resource of key myco-constituents including terpenoids, polysaccharides, and proteins [4,5]. Mitochondrial genome sequencing of Ganoderma lucidum revealed a total of 57 protein-coding genes, 2 rRNA and 26 tRNA [6]. Another species, Ganoderma sinensis, has not yet been sequenced, but UniProt confirms the presence of 25 hypothetical proteins (HPs) [7]. Sequence retrieval from UniProt disclosed that the genus Ganoderma has 33 hypothetical proteins. Proteins isolated from the genus Ganoderma exhibit a plethora of bioactivities via various adapter proteins dominant at different stages of immune modulation [8,9], inflammation, and cancer signaling [10,11]. In addition to these, there are other proteins that are predicted by gene prediction software without in vivo demonstration, known as hypothetical proteins (HPs) or non-characterized/unknown proteins, which may play a decisive role in signaling mechanisms. Annotating functions to these uncharacterized proteins can help in understanding the mechanisms fundamental to the adaptation of this fungus to various stress conditions. In silico studies have been found to be reliable enough to reveal crucial information about biological functionality and comparative genomics in less time than experimental characterization [12]. This research work was carried out to understand the genomics and functionality of the Ganoderma hypothetical proteins, uncharacterized until now [13].
Sequence analysis and comparison form the primary step in identifying homologues and sequence similarity for characterization of a protein. BLAST has proved to be a reliable method for comparing query sequences with a database for the prediction of protein function. Multiple sequence alignment of homologues in a family has been found to be a reliable method for functional annotation, as it highlights conserved domains. Motif analysis is a necessary step in the identification and characterization of HPs, and detection of common motifs among proteins with low sequence identities (e.g., less than 30%) may provide important clues for function or for classification of HPs into appropriate families [14].
Various freely available databases for determining the structure and function of motifs include GenomeNet [15], PROSITE [16], PRINTS [17], Pfam [18], ProDom [19], BLOCKS [20], MEME [21] and InterPro [22], the last searched using InterProScan [23]. The STRING database [24] is a resource pivotal in revealing the functionality of an individual protein as well as its interactions with other factors. Moreover, it also considers gene neighbourhood, gene fusion events and co-occurrence across a specific subset of species, which contribute to a confidence score for each protein association. These direct (physical) and indirect (functional) associations are, in turn, based on genomic context, high-throughput experiments, conserved co-expression and previous knowledge of the protein.
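To illustrate how several evidence channels can be merged into a single confidence score, the following minimal Python sketch combines per-channel scores under an independence assumption. The channel names and values are hypothetical, and the formula is a simplified illustration: STRING's published procedure additionally corrects each channel for a prior before combining, which is omitted here.

```python
from math import prod


def combined_score(channel_scores):
    """Combine per-channel confidence scores (each in [0, 1]) into one score.

    Assumes the evidence channels are independent, so the combined
    confidence is 1 minus the probability that every channel is wrong.
    Simplified illustration: no prior correction is applied.
    """
    scores = list(channel_scores)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    return 1.0 - prod(1.0 - s for s in scores)


# Hypothetical channel scores for one protein pair (illustrative values only).
evidence = {
    "neighbourhood": 0.30,
    "fusion": 0.10,
    "cooccurrence": 0.20,
    "coexpression": 0.40,
}
print(round(combined_score(evidence.values()), 3))
```

A combined score built this way is never lower than the strongest single channel, which matches the intuition that independent lines of evidence reinforce one another.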
In order to predict the functionality of hypothetical proteins in Ganoderma species, various bioinformatics tools were used, each equipped to highlight a different parameter of the protein (Table 1). Sequence retrieval from UniProt, a database of protein sequences, was the primary step in the process of annotation. This was followed by functional annotation of the hypothetical proteins, which involved predicting function with bioinformatics tools by analyzing the sequence, conserved domains, motifs, phylogenetic relationships, and protein interactions.
Tool | Description |
---|---|
a) Sequence similarity search | |
BLAST | Finds similar sequences |
HHpred | Homology detection |
b) Physicochemical characterization | |
ExPASy-ProtParam | Analysis of physicochemical properties |
c) Sub-cellular prediction | |
PSORT B | 97% precision |
PSLpred | 91% accuracy |
CELLO | 91% accuracy |
SignalP | Signal peptide cleavage site |
SecretomeP | Location of the cleavage site |
TMHMM | Membrane topology hidden Markov model |
HMMTOP | Transmembrane topology |
d) Sequence alignment | |
PRALINE | multiple sequence alignment |
e) Protein classification | |
Pfam | Multiple alignment and HMMs |
CATH | hierarchical domain classification |
Superfamily | SCOP database |
SYSTERS | Protein Family Database |
SVMProt | Classification on the basis of primary sequence. |
CDART | NCBI Entrez Protein Database |
PANTHER | Evolutionary relationships |
ProtoNet | Automatic hierarchical clustering |
SMART | Sequence analysis |
f) Motif prediction | |
InterProScan | Diagnostic signatures |
MOTIF | Motif-based sequence analysis tools |
MEME Suite | - |
g) Clustering | |
CLUSS | Substitution Matching Similarity (SMS) |
h) Protein-protein interaction | |
STRING Version 9.1 | Predicts protein interactions |
Table 1: List of bioinformatics tools and databases used for annotating function.
Sequence analysis and physicochemical properties
Sequence investigation of the Ganoderma mitochondrial genome disclosed a total of 33 hypothetical proteins (https://www.ncbi.nlm.nih.gov/genome/), which were subsequently retrieved from UniProt (https://www.uniprot.org/). First, ExPASy's ProtParam server [25] was employed to compute theoretical physicochemical properties such as molecular weight, amino acid composition, theoretical isoelectric point, extinction coefficient, instability index, aliphatic index, estimated half-life and grand average of hydropathicity (GRAVY), to characterize the protein sequences and help assign function to the proteins (Table 2).
S.No. | Protein name | No. of amino acids | M.W. | Theo. pI | Instability index | Aliphatic index | Hydropathicity (GRAVY) | Location (WoLF PSORT) | CELLO | SignalP 4.1 | SecretomeP | TMHMM | HMMTOP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1. | Galu_Mp10 | 151 | 17498.3 | 9.49 | Stable | 140.60 | 0.984 | Mito | P.M. | No | Yes | 4 | 4 |
2. | Galu_Mp16 | 439 | 51185.2 | 9.26 | Stable | 115.88 | 0.194 | Mito | P.M. | No | No | 4 | 7 |
3. | Galu_Mp17 | 283 | 32595.6 | 5.42 | Stable | 112.54 | -0.073 | Mito | P.M. | No | No | 2 | 3 |
4. | Galu_Mp19 | 192 | 21490.0 | 9.37 | Stable | 93.49 | -0.094 | Mito | P.M. | No | No | 0 | 3 |
5. | Galu_Mp20 | 294 | 32986.8 | 9.33 | Unstable | 127.28 | 0.531 | Mito | P.M. | Yes | Yes | 5 | 5 |
6. | Galu_Mp21 | 109 | 12368.5 | 9.62 | Stable | 111.65 | 0.162 | Mito | P.M. | No | No | 0 | 0 |
7. | Galu_Mp22 | 568 | 66673.6 | 8.75 | Stable | 87.48 | -0.317 | Mito | P.M. | No | No | 0 | 0 |
8. | Gasi_Mp30 | 236 | 27666.0 | 9.52 | Stable | 97.03 | -0.307 | No | Mito | No | No | 0 | 0 |
9. | Gasi_Mp34 | 385 | 44224.4 | 9.69 | Stable | 97.43 | -0.203 | No | O.M. | No | No | 0 | 0 |
10. | Gasi_Mp42 | 267 | 30441.6 | 9.89 | Stable | 94.16 | -0.176 | Nuc | Mito | No | No | 0 | 2 |
11. | Gasi_Mp09 | 295 | 34308.8 | 9.81 | Stable | 86.58 | -0.469 | No | Mito | No | No | 0 | 0 |
12 | Gasi_Mp23 | 186 | 21502.3 | 9.72 | Stable | 105.32 | -0.399 | Mito | Mito | No | No | 0 | 0 |
13 | Gasi_Mp21 | 364 | 42387.4 | 9.89 | Stable | 91.29 | -0.196 | No | O.M. | No | Yes | 1 | 1 |
14 | Gasi_Mp11 | 205 | 24146.6 | 9.56 | Stable | 114.54 | 0.013 | Mito | Cyto | No | No | 0 | 0 |
15 | Gasi_Mp33 | 768 | 88137.8 | 9.81 | Stable | 114.74 | 0.147 | No | P.M. | No | No | 7 | 8 |
16 | Gasi_Mp32 | 750 | 85209.4 | 9.81 | Stable | 87.08 | -0.327 | Mito | Mito | Yes | Yes | 0 | 0 |
17 | Gasi_Mp31 | 345 | 40236.1 | 9.65 | Stable | 89.28 | -0.377 | No | O.M. | No | No | 0 | 0 |
18 | Gasi_Mp40 | 356 | 40436.3 | 9.53 | Stable | 87.84 | -0.106 | No | O.M. | No | No | 0 | 2 |
19 | Gasi_Mp24 | 218 | 26171.0 | 9.52 | Stable | 102.84 | -0.161 | Mito | Cyto | No | Yes | 1 | 1 |
20 | Gasi_Mp41 | 215 | 24693.7 | 9.40 | Stable | 98.33 | -0.115 | No | O.M. | No | No | 0 | 0 |
21 | Gasi_Mp06 | 250 | 29198.8 | 9.61 | Stable | 84.60 | -0.298 | No | O.M. | No | No | 1 | 0 |
22 | Gasi_Mp05 | 410 | 46647.5 | 9.92 | Stable | 98.85 | -0.182 | No | Mito | No | No | 0 | 0 |
23 | Gasi_Mp27 | 131 | 15218.8 | 9.32 | Stable | 99.69 | -0.142 | Mito | Mito | No | No | 0 | 0 |
24 | Gasi_Mp44 | 248 | 28615.7 | 9.67 | Stable | 91.13 | -0.087 | No | P.M. | No | No | 0 | 0 |
25 | Gasi_Mp38 | 323 | 37058.4 | 9.52 | Stable | 95.63 | -0.397 | No | Mito | No | No | 0 | 0 |
26 | Gasi_Mp36 | 189 | 21941.1 | 9.40 | Stable | 96.93 | -0.346 | Mito | Mito | No | No | 0 | 1 |
27 | Gasi_Mp39 | 345 | 39119.1 | 9.28 | Stable | 94.84 | -0.059 | No | P.M. | No | No | 1 | 2 |
28 | Gasi_Mp26 | 317 | 36589.6 | 9.41 | Stable | 102.33 | -0.158 | No | O.M. | No | No | 0 | 0 |
29 | Gasi_Mp37 | 308 | 35293.4 | 9.63 | Stable | 80.71 | -0.368 | No | Nuc | No | No | 0 | 0 |
30 | Gasi_Mp22 | 141 | 16444.8 | 9.12 | Stable | 78.72 | -0.460 | Nuc | Cyto | No | No | 0 | 0 |
31 | Gasi_Mp10 | 421 | 47613.9 | 9.35 | Stable | 91.62 | -0.149 | No | P.M. | No | No | 0 | 0 |
32 | Gasi_Mp35 | 344 | 39209.1 | 9.47 | Stable | 102.01 | -0.207 | No | Mito | No | No | 0 | 0 |
33 | Gasi_Mp25 | 299 | 34587.6 | 9.41 | Stable | 107.93 | 0.095 | No | P.M. | No | No | 0 | 4 |
Theo. pI: Theoretical pI; M.W.: Molecular Weight; P.M.: Plasma Membrane; Nuc: Nucleus; O.M.: Outer Membrane; Mito: Mitochondria; Cyto: Cytoplasmic
Table 2: List of predicted sub-cellular localization of HPs in Ganoderma.
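Two of the quantities reported in Table 2 follow directly from the amino acid sequence and can be illustrated with a short Python sketch. It assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and the usual aliphatic index formula (mole percent of Ala, plus 2.9 times Val, plus 3.9 times the sum of Ile and Leu). The example sequence is hypothetical, and the sketch is not a substitute for the ProtParam server used in this work.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical toy sequence, used only to exercise the functions.
toy = "MKVLAAGIVL"
print(f"GRAVY = {gravy(toy):.3f}, aliphatic index = {aliphatic_index(toy):.2f}")
```

A positive GRAVY value indicates an overall hydrophobic sequence and a negative value a hydrophilic one.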
Sub-cellular localization prediction
The role of protein sub-cellular localization in cytology, proteomics, and drug design is well established. Determining the location of a protein and annotating the genome would help in devising drug and vaccine delivery approaches with enhanced selectivity and efficacy, a prerequisite for favourable systemic pharmacokinetics. This is also important because drug delivery involves absorption of the drug in the cytoplasm, whereas vaccines act at the surface membrane. In addition, UniProt lists many proteins that have still not been characterized, making the prediction of their localization an urgent need. Where experimental methods may prove time-consuming, in silico tools have paved the way for quicker determination of such characteristics. Bioinformatics tools that predict protein localization include WoLF PSORT [26] and CELLO [27]. CELLO is a multi-class support vector machine classification system that uses features such as amino acid composition, dipeptide composition, partitioned amino acid composition and sequence composition. WoLF PSORT is a similar subcellular location prediction tool which, based on sorting signals, amino acid composition and functional motifs, converts protein amino acid sequences into numerical localization features from which the location is then predicted. Signal peptides were predicted using SignalP 4.1 [28], whereas the SecretomeP server [29] was used to determine the cleavage sites in the amino acid sequences. In addition, TMHMM [30] and HMMTOP [31] predicted the transmembrane helices and topology of the proteins, distinguishing between soluble and membrane proteins with a high degree of accuracy. Moreover, methods based on protein sorting signals, TMHMM and HMMTOP, rely on the fact that a protein is transported from its site of synthesis to its functional location according to signals at the N- and C-termini (Table 2).
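The composition-based features mentioned above can be sketched in a few lines of Python. The function names are hypothetical and this is not CELLO's implementation; it only shows how a sequence might be turned into a fixed-length numeric vector (20 amino acid frequencies followed by 400 dipeptide frequencies) of the kind a support vector machine could take as input.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids (seq must be non-empty)."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each of the 400 possible adjacent residue pairs."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)  # avoid division by zero for 1-residue input
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


def feature_vector(seq):
    """Concatenate both compositions into one 420-dimensional vector."""
    seq = seq.upper()
    return aa_composition(seq) + dipeptide_composition(seq)


# Hypothetical toy sequence, used only to show the vector length.
print(len(feature_vector("MKVLAAGIVL")))
```

Because the vector length does not depend on sequence length, proteins of very different sizes can be compared by the same classifier.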
Another approach to annotation is based on protein functional domains and gene ontology. The former relies on conserved, known functional domains retained through evolution, whereas the latter is concerned with labeling the gene product.
Hypothetical proteins: Comparing the sequences
The initial step involves the prediction of functionality by comparing the sequences of the hypothetical proteins with the sequences retrieved from the UniProt databases. BLASTp [32] and HHpred [33] (the latter based on hidden Markov models) were used to compute sequence similarity and detect homology for the 33 HPs of Ganoderma against the non-redundant database, which gave different hits with different E-values (Table 3).
S.No. | Protein name | BLASTp | HHpred | ProDom | BLOCK |
---|---|---|---|---|---|
1 | Galu_Mp10 | (mitochondrion) [Moniliophthora roreri] like protein | Piscicolin 126 immunity like protein | No | Fungal pheromone STE3 GPCR signature |
2 | Galu_Mp16 | No | No | No | No |
3 | Galu_Mp17 | No | Prefoldin | No | Thyroid hormone-inducible hepatic Spot 14 |
4 | Galu_Mp19 | Laglidadg endonuclease family protein | Homing endonuclease, Laglidadg | Endonuclease Laglidadg | Homing endonuclease, Laglidadg/HNH |
5 | Galu_Mp20 | NADH dehydrogenase | No | No | Flagellar basal body-associated protein FliL |
6 | Galu_Mp21 | DNA directed RNA polymerase | DNA-directed RNA polymerase in mitochondria | RNA polymerase DNA-directed Nucleotidyl transferase | Frizzled protein signature |
7 | Galu_Mp22 | DNA directed RNA polymerase | DNA directed RNA polymerase, bacteriophage T7 RNA | polymerase RNA DNA-directed mitochondrion transferase transcription nucleotidyl transferase | DNA-directed RNA polymerase, bacteriophage type |
8 | Gasi_Mp30 | GIY-YIG endonuclease | Uvr abc system protein C; DNA binding protein | Mitochondrion membrane oxidase transmembrane metal-binding iron heme | Excinuclease ABC, C subunit, N-term |
9 | Gasi_Mp34 | (Intronic ORF at intron 1 of cox1) GIY endonuclease | Uvr abc system protein C | Endonuclease intron-encoded hydrolase DNA mitochondrion | Excinuclease ABC, C subunit, N-term |
10 | Gasi_Mp42 | Intronic ORF at intron 6 of cox1 | Intron-associated endonuclease 1 | Intronic Intron | Nuclear hormones receptors DNA-bind |
11 | Gasi_Mp09 | Intronic ORF at intron 6 of cox1 | Intron-associated endonuclease 1 GIY-YIG endonuclease | GIY-YIG endonuclease | Excinuclease ABC, C subunit, N-term |
12 | Gasi_Mp23 | GIY Cytb i2 grp ID protein (Podospora anserina (strain S/ATCC M...) | Intron-associated endonuclease 1 GIY-YIG | Putative mitochondrion endonuclease GIY-YIG | Intron-encoded nuclease 2 domain |
13 | Gasi_Mp21 | Laglidadg endonuclease n1 Tax Gibberel | Laglidadg homing endonuclease | Endonuclease mitochondrion Intronic GIY-YIG | Homing endonuclease, Laglidadg/HNH |
14 | Gasi_Mp11 | Laglidadg endonuclease n1 Tax Gibberella zeae PH-1 RepID A5J053_GIBZE | Intron-encoded endonuclease | Endonuclease Mitochondrion DNA Laglidadg Site-specific Intron homing | Laglidadg DNA endonuclease |
15 | Gasi_Mp33 | Homing endonuclease (Agaricus bisporus) | Laglidadg homing endonuclease | Mitochondrion Endonuclease Laglidadg DNA COX1 AI2 Laglidadg | Homing endonuclease, Laglidadg/HNH |
16 | Gasi_Mp32 | RNA-directed DNA polymerase | Telomerase reverse transcriptase | Putative RNA-directed Transcriptase | Telomere reverse transcriptase |
17 | Gasi_Mp31 | Laglidadg endonuclease (Thanatephorus cucumeris (strain AG1-I...) | Laglidadg homing endonuclease | Endonuclease mitochondrion | Homing endonuclease, Laglidadg/HNH |
18 | Gasi_Mp40 | Laglidadg endonuclease (Ajellomyces dermatitidis (strain ER-3)) | Laglidadg homing endonuclease | Mitochondrion COX1-I6 | Homing endonuclease, Laglidadg/HNH domain |
19 | Gasi_Mp24 | Maturase/DNA endonuclease (Saccharomyces paradoxus) | Intron-encoded endonuclease | Endonuclease Laglidadg site-specific intron homing | Laglidadg DNA endonuclease |
20 | Gasi_Mp41 | Laglidadg endonuclease (Agaricus bisporus) | Intron-encoded endonuclease | Endonuclease Putative Homing DNA Site-Specific Laglidadg Laglidadg Orf | Laglidadg DNA endonuclease |
21 | Gasi_Mp06 | Probable intron-encoded Laglidadg... (Piriformospora indica (strain DSM 11827)) | Intron-encoded endonuclease | Laglidadg site-specific intron homing | Laglidadg DNA endonuclease |
22 | Gasi_Mp05 | Laglidadg intron encoded protein (Nectria haematococca (strain 77-13-4)) | Laglidadg DNA endonuclease | Laglidadg intron DNA | Homing endonuclease, Laglidadg/HNH |
23 | Gasi_Mp27 | Laglidadg endonuclease (Agrocybe aegerita) | Intron-encoded endonuclease | Laglidadg site-specific intron homing | Laglidadg DNA endonuclease |
24 | Gasi_Mp44 | Laglidadg endonuclease (Glomus sp. DAOM 240422) | Intron-encoded endonuclease | Laglidadg site-specific intron homing | Laglidadg DNA endonuclease |
25 | Gasi_Mp38 | Laglidadg endonuclease (Ajellomyces capsulatus (strain H88)) | Laglidadg homing endonuclease | Laglidadg DNA uncharacterized intron cox1 | Homing endonuclease, Laglidadg/HNH |
26 | Gasi_Mp36 | Laglidadg intron encoded protein (Nectria haematococca (strain 77-13-4)) | Laglidadg homing endonuclease | Intronic intron | Homing endonuclease, Laglidadg/HNH |
27 | Gasi_Mp39 | Laglidadg (Rhynchosporium secalis) | Laglidadg homing endonuclease | Laglidadg DNA uncharacterized intron cox1 | Homing endonuclease, Laglidadg/HNH |
28 | Gasi_Mp26 | Laglidadg superfamily | Laglidadg homing endonuclease | Laglidadg | Homing endonuclease, Laglidadg/HNH |
29 | Gasi_Mp37 | Laglidadg superfamily | Laglidadg homing endonuclease | Laglidadg | Homing endonuclease, Laglidadg/HNH |
30 | Gasi_Mp22 | Laglidadg superfamily | Laglidadg homing endonuclease | Laglidadg | Homing endonuclease, Laglidadg/HNH |
31 | Gasi_Mp10 | Laglidadg superfamily | Laglidadg homing endonuclease | Laglidadg | Homing endonuclease, Laglidadg/HNH |
32 | Gasi_Mp35 | Laglidadg superfamily | Laglidadg homing endonuclease | Laglidadg | Homing endonuclease, Laglidadg/HNH |
33 | Gasi_Mp25 | Laglidadg superfamily | Laglidadg homing endonuclease | Laglidadg | Homing endonuclease, Laglidadg/HNH |
Table 3: Functional prediction of HPs by BLASTp, HHpred, ProDom and BLOCK.
BLASTp, part of the Basic Local Alignment Search Tool suite, is a commonly used algorithm for searching protein sequences against the non-redundant protein sequence database (nr). For each HP queried, the search returned 100 homologs; hits with low query coverage (50%) or low sequence identity were regarded as remote homologues and excluded. In contrast, proteins with a sequence identity of 40% and an E-value of 0.005 were regarded as close homologs of the HPs. Moreover, this search tool was also employed to examine the availability of structural homologs in the Protein Data Bank (PDB). HHpred, on the other hand, detects remote protein homology via pairwise comparison of profile hidden Markov models (HMMs), searching protein databases such as PDB, SCOP and CATH. It is also employed for the detection of structural homologs. Whereas BLASTp determines the sequence identity between two protein sequences, PRALINE is used for multiple sequence comparisons.
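The separation of close from remote homologs described above can be sketched as a filter over BLAST tabular output (the standard 12-column `-outfmt 6` layout). The direction of each comparison is an assumption made for illustration: identity of at least 40%, E-value of at most 0.005 and query coverage of at least 50%. The subject identifiers and hit values in the example are hypothetical; only the query length of Galu_Mp19 (192 residues) is taken from Table 2.

```python
import csv
import io

# Standard column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def classify_hits(tabular_text, query_lengths,
                  min_identity=40.0, max_evalue=0.005, min_coverage=50.0):
    """Split hits into (close, remote) lists of (query, subject) pairs.

    Thresholds follow the values quoted in the text; treating them as
    >=, <= and >= respectively is an assumption of this sketch.
    """
    close, remote = [], []
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        aligned = int(row["qend"]) - int(row["qstart"]) + 1
        coverage = 100.0 * aligned / query_lengths[row["qseqid"]]
        is_close = (float(row["pident"]) >= min_identity
                    and float(row["evalue"]) <= max_evalue
                    and coverage >= min_coverage)
        (close if is_close else remote).append((row["qseqid"], row["sseqid"]))
    return close, remote


# Hypothetical hits for one query; subject names and numbers are invented.
sample = "\n".join([
    "\t".join(["Galu_Mp19", "subjA", "55.0", "180", "81", "0",
               "5", "184", "10", "189", "1e-40", "210"]),
    "\t".join(["Galu_Mp19", "subjB", "28.0", "60", "43", "0",
               "100", "159", "3", "62", "0.5", "30"]),
])
close_hits, remote_hits = classify_hits(sample, {"Galu_Mp19": 192})
print(close_hits, remote_hits)
```

Keeping the thresholds as keyword arguments makes it easy to test how sensitive the set of close homologs is to the chosen cut-offs.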
ProtoNet 6.1 is an automatic hierarchical clustering of SWISS-PROT proteins that relies on sequence similarity as defined by BLAST. ProtoNet predicted superfamilies and subfamilies and supports large-scale protein annotation. Protein motif sequences are signatures of protein families and help predict the function of HPs, particularly the catalytic properties of enzymes. A motif, also known as a super-secondary structure of a protein, is useful for functional prediction and can be analyzed with resources including InterProScan, PANTHER, Pfam, SMART, PROSITE and SUPERFAMILY. InterProScan scanned the sequences against different protein signature recognition methods and reported protein family membership, domains and repeats, and the biological process and molecular function of the HPs. MOTIF was used to assign the probable role of the HPs utilizing Pfam and PROSITE (Table 4).
Protein name | InterProScan | SUPERFAMILY | CATH | PANTHER | MOTIF | SVMProt | CDART | SMART | ProtoNet 6.1 |
---|---|---|---|---|---|---|---|---|---|
Galu_Mp10 | No | No | No | No | HupE/UreJ protein (7..99) Pfam | No | No | 4 transmembrane | Cluster 1128119 Name:Bacteroides fragilis |
Galu_Mp16 | No | No | No | No | Bunyavirus glycoprotein G2 (309..371) Pfam | Transmembrane | No | 3 low complexity and 4 transmembrane | Cluster 1892543 Cluster Name: Pleurotus ostreatus |
Galu_Mp17 | No | Prefoldin | No | No | 4 1. Tubulin-beta mRNA autoregulation signal. MRDL (Motif)-PROSITE 2. Helper component proteinase (88..215) 3. Mer2 (56..178) 4.MetRS-N binding domain (92..184)-Pfam | Transmembrane | No | 1 low complexity, 2.coiled coil and 2 transmembrane region | Cluster 4127534 cluster Name:SAFF domain. Muniscin C-terminal |
Galu_Mp19 | Homing endonuclease, Laglidadg | Homing endonuclease | Homing endonuclease | No | Laglidadg endonuclease(57..156) Pfam Laglidadg-like domain (56..138) Pfam | Transferases Glycosyltransferase | Laglidadg-like domain | No | Cluster 4142293 Homing endonuclease, Laglidadg/HNH |
Galu_Mp20 | No | Hect, E3 ligase catalytic domain | No results | No | Domain of unknown function (DUF4544) (161..210) Pfam CoA-binding domain (136..189) Pfam | Pleurotusostreatus | No domain | 5 transmembrane and 1 low complexity | Cluster 3780223 Cluster Name:Mitochondrion |
Galu_Mp21 | DNA-directed RNA polymerase | DNA/RNA polymerase | No | DNA-dependent RNA polymerase(1..108) Pfam | Oxidoreductases-Acting on a heme group of donors | DNA-dependent RNA polymerase | No | Cluster 4111445 Cluster Name:DNA-directed RNA polymerasebacteriophage type | |
Galu_Mp22 | DNA-directed RNA polymerase | DNA/RNA polymerase | DNA directed RNA polymerase | No | DNA-directed RNA polymerase N-terminal (441..517) Pfam | Transferases-Transferring Phosphorus-Containing Groups | DNA-dependent RNA polymerase | No | Cluster 4111445 Cluster Name:DNA-directed RNA polymerasebacteriophage type |
Gasi_Mp30 | GIY-YIG nuclease | GIY-YIG endonuclease | Uvr ABC system protein C-like domain | UVRC/OXIDOREDUCTASE | GIY_YIG (PROSITE and Pfam) | Zinc-binding protein family | GIY_YIG superfamily | GIYc | Cluster 3476869 Cluster Name:Nuclease-associated modular DNA-binding 1 |
Gasi_Mp34 | GIY-YIG nuclease | GIY-YIG endonuclease | Uvr ABC system protein C-like domain | No | GIY-YIG (Pfam) | All DNA-binding | GIYc IENR1 | Cluster 3312983 Intron encoded nuclease | |
Gasi_Mp42 | Introns endonuclease GIY-YIG superfamily | GIY-YIG endonuclease DNA-binding domain of intron-encoded endonucleases | Uvr ABC system protein C-like domain | No | PROSITE (GIY-YIG domain profile) Pfam (GIY-YIG catalytic domain) (NUMOD1 domain) | Zinc-binding | GIY-YIG nuclease domain superfamily | GIY-YIG type nuclease Introns encoded nuclease repeat motif | Cluster 4159271 Intron endonuclease, group I |
Gasi_Mp09 | Introns endonuclease I GIY-YIG superfamily Nuclease associated modular domain 3 | GIY-YIG superfamily DNA-binding domain of intron-encoded endonucleases | Uvr ABC system protein C-like domain | No | PROSITE (GIY-YIG domain profile) Pfam (NUMOD3 motif) GIY-YIG NUMOD1 HTH_17 HTH_23 HTH_24 HTH_psq HTH_Tnp_ISL3 | Zinc-binding | GIY-YIG nuclease domain superfamily | GIY-YIG nuclease Introns encoded nuclease motif | Cluster 4159271 Cluster Name: Intron endonuclease, group I |
Gasi_Mp23 | Intron endonuclease, group I GIY-YIG nuclease Superfamily Nuclease associated molecular domain | DNA-binding domain of intron-encoded endonuclease GIY-YIG endonuclease | Uvr ABC system protein C-like domain | No | Pfam NUMOD1 domain CENP-B N-terminal DNA-binding domain | Outer membrane DNA condensation TC 3.A.5 Type II (general) secretory pathway (IISP) family TC 3.A.1 ATP-binding cassette (ABC) family EC 4.2.-.-Lyases-Carbon-Oxygen Lyases | GIY-YIG superfamily NUMOD1 domain | Introns encoded nuclease motif | Cluster 3993847 Cluster Name: Cytochrome b/b6, N-terminal |
Gasi_Mp11 | Homing endonucleases LAGLI DADG | Homing endonucleases Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg DNA endonuclease family | DNA repair Calcium-binding Magnesium-binding (Protein family) | Laglidadg endonuclease family | Laglidadg domain | Cluster 4236360 Cluster Name: Laglidadg DNA endonuclease |
Gasi_Mp33 | Homing endonuclease Laglidadg | Homing endonucleases Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg endonuclease Acyl phosphatase ASCH domain | All lipid-binding protein Zinc-binding | Laglidadg like domain | Laglidadg domain 6 transmembrane domain | Cluster 3940652 Cluster Name: Homing endonuclease, Laglidadg/HNH |
Gasi_Mp32 | Reverse transcriptase domain | DNA/RNA polymerases Reverse transcriptase | No | COX1/OXI3 INTRON 1 PROTEIN-RELATED | PROSITE RT_POL (Reverse transcriptase (RT) catalytic domain profile). Pfam Reverse transcriptase (RNA-dependent DNA polymerase) Type II intron maturase HNH endonuclease | Zinc-binding Transmembrane | Reverse transcriptase Type II intron maturase | HNH nucleases | Cluster 4092847 Cluster Name: Intron maturase, type II |
Gasi_Mp31 | Homing endonuclease Laglidadg/HNH | Homing endonucleases Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg endonuclease | Zinc-binding | Laglidadg like domain | Laglidadg | Cluster 4271810 Cluster Name: Homing endonuclease, Laglidadg/HNH |
Gasi_Mp40 | Homing endonuclease Laglidadg/HNH | Homing endonucleases Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg endonuclease | Transmembrane | Laglidadg domain | No | Cluster 4296638 Cluster Name: Homing endonuclease, Laglidadg/HNH |
Gasi_Mp24 | Homing endonuclease Laglidadg | Homing endonucleases Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg DNA endonuclease family | EC 1.9.-.-: Oxidoreductases-Acting on a heme group of donors | Laglidadg endonuclease | Laglidadg | Cluster 4041029 Cluster Name: Laglidadg DNA endonuclease |
Gasi_Mp41 | Homing endonuclease Laglidadg | Homing endonucleases Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg DNA endonuclease family Bacterial lipoate protein ligase C-terminus | All lipid-binding proteins | Laglidadg endonuclease | Laglidadg | Cluster 4041029 Cluster Name: Laglidadg DNA endonuclease |
Gasi_Mp06 | Homing endonuclease Laglidadg | Homing endonucleases. Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Laglidadg | Zinc-binding | Laglidadg endonuclease | Laglidadg | Cluster 3993847 Cluster Name: Cytochrome b/b6, N-terminal |
Gasi_Mp05 | Laglidadg intron encoded protein (Nectria haematococca (strain 77-13-4)) | Homing endonucleases. Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg endonuclease | Zinc-binding | Laglidadg endonuclease | Laglidadg | Cluster 4142293 Cluster Name: Homing endonuclease, Laglidadg/HNH |
Gasi_Mp27 | Homing endonucleases. Laglidadg | Homing endonucleases Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg endonuclease | Zinc-binding | Laglidadg | Laglidadg | Cluster 4041029. Cluster Name: Laglidadg DNA endonuclease |
Gasi_Mp44 | Homing endonucleases Laglidadg | Homing endonucleases Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg endonuclease | EC 2.4-Transferases – Glycosyl transferases | Laglidadg | Laglidadg | Cluster 3987550. Cluster Name: Laglidadg DNA endonuclease |
Gasi_Mp38 | Homing endonucleases Laglidadg | Homing endonucleases Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg endonuclease | EC 2.4-Transferases Glycosyl transferases | Laglidadg | Laglidadg | Cluster 4396454 Cluster Name: Homing endonuclease, Laglidadg/HNH |
Gasi_Mp36 | Homing endonucleases LAGLI DADG | Homing endonucleases Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg endonuclease | DNA replication EC 3.1.-.-Hydrolases-Acting on Ester Bonds | Laglidadg | Laglidadg and SCOP domain | Cluster 4396454. Cluster Name: Homing endonuclease, Laglidadg/HNH |
Gasi_Mp39 | Homing endonucleases LAGLI DADG | Homing endonucleases Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg endonuclease | EC 2.4.-.-: Transferases – Glycosyl transferases | Laglidadg | Laglidadg | Cluster 4089688 Cluster Name: Homing endonuclease, Laglidadg/HNH |
Gasi_Mp26 | Homing endonucleases LAGLI DADG | Homing endonucleases Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg endonuclease | EC 3.1.-.-: Hydrolases-Acting on Ester Bonds | Laglidadg | Laglidadg | Cluster 3993847 Cluster Name: Cytochrome b/b6, N-terminal |
Gasi_Mp37 | Homing endonucleases LAGLI DADG | Homing endonucleases Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg endonuclease. Phenazine biosynthesis protein A/B | EC 2.4.-.-: Transferases – Glycosyl transferases | Laglidadg | Laglidadg | Cluster 4376345 Cluster Name: Homing endonuclease, Laglidadg/HNH |
Gasi_Mp22 | Homing endonucleases LAGLI DADG | Homing endonucleases. Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg endonuclease | EC 2.4.-.-: Transferases – Glycosyl transferases | Laglidadg | Laglidadg | Cluster 4142293 Cluster Name: Homing endonuclease, Laglidadg/HNH |
Gasi_Mp10 | Homing endonucleases LAGLI DADG | Homing endonucleases Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | PROSITE: Serine proteases, subtilase family, histidine active site. Pfam Laglidadg endonuclease | Transmembrane | Laglidadg | Laglidadg | Cluster 4142293 Cluster Name: Homing endonuclease, Laglidadg/HNH |
Gasi_Mp35 | Homing endonucleases LAGLI DADG | Homing endonucleases. Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg endonuclease | EC 3.1.-Hydrolases-Acting on Ester Bonds EC 1.1.-Oxidoreductases Acting on the CH-OH group of donors | Laglidadg | Laglidadg | Cluster 3913259 Cluster Name: Homing endonuclease, Laglidadg/HNH. |
Gasi_Mp25 | Homing endonucleases LAGLI DADG | Homing endonucleases. Group I mobile intron endonuclease | Intron-encoded DNA endonuclease aI3-like domain 1/2 | No | Pfam Laglidadg endonuclease | Zinc-binding | Laglidadg | Laglidadg | Cluster 2938885 Cluster Name: Homing endonuclease, Laglidadg/HNH |
Table 4: List of different bioinformatics tools used to annotate domains of HPs in Ganoderma.
Function prediction
Functional prediction for the hypothetical proteins of Ganoderma retrieved from UniProt was carried out with the help of various freely available databases such as Pfam, SUPERFAMILY [34], CATH, PANTHER [35], SYSTERS [36], SVMProt [37], CDART [38], SMART [39] and ProtoNet [40]. BLASTp was used to search the SYSTERS database, and the output was obtained in the form of clusters of functionally related proteins. The cluster sequences with an E-value of 0.005 were classified as HP (Table 5). The SUPERFAMILY database provided structural and functional annotation based on hidden Markov models together with evolutionary relationships. CATH provided the hierarchical classification of protein domains into class, architecture, topology and homologous superfamily. Hierarchical classification of the protein sequences into superfamily and subfamily clusters was achieved by SYSTERS (SYSTEmatic Re-Searching) with the backing of BLASTp. SVMProt predicted the functional family of each protein from its primary structure using SVM-based classification [37], and could also classify distantly related proteins and homologous proteins of different function [39]. CDART searches the NCBI Entrez Protein Database based on domain architecture. It finds similarities between proteins across significant evolutionary distances by using sensitive protein domain profiles rather than direct sequence similarity, and in addition provides information about the domains [38]. SMART (Simple Modular Architecture Research Tool), based on profile hidden Markov models, determined domains in the protein sequences. Data obtained from SMART was used in creating the Conserved Domain Database collection, which was also made available as part of the InterPro database.
PANTHER (Protein Analysis Through Evolutionary Relationships) is a comprehensively organized database that analyzes protein families and subfamilies. PANTHER 7.0 covers the whole genomes of 48 organisms and gives the phylogenetic relationships of genes and families. Based on hidden Markov models (HMMs), hits with an E-value less than 1e-3 were used to define the functionality of the HPs.
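The E-value cutoff mentioned here can be illustrated with a short sketch. This is not the authors' pipeline: it only shows how a list of hits might be screened against the 1e-3 threshold. The sample values are SYSTERS E-values copied from Table 5 and are reused purely as input; the names `hits` and `filter_hits` are hypothetical.

```python
# Illustrative only: screen hits against an E-value cutoff.
E_VALUE_CUTOFF = 1e-3  # threshold quoted in the text for HMM hits

# (protein, E-value) pairs copied from Table 5 as sample input
hits = [
    ("Galu_Mp10", 0.002),
    ("Galu_Mp16", 0.85),
    ("Galu_Mp19", 3.00e-11),
    ("Galu_Mp20", 4.3),
    ("Gasi_Mp30", 2.00e-11),
]


def filter_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Return the hits whose E-value is strictly below the cutoff."""
    return [(name, e_value) for name, e_value in hits if e_value < cutoff]


significant = filter_hits(hits)
for name, e_value in significant:
    print(f"{name}\t{e_value:.2e}")
```

With this sample input only Galu_Mp19 and Gasi_Mp30 pass, since an E-value of 0.002 is still above the cutoff.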
Protein name | E-value | SYSTERS cluster | Organism | Gene Ontology | STRING |
---|---|---|---|---|---|
Galu_Mp10 | 0.002 | 121514 | Pleurotus ostreatus (HPs); Schizophyllum commune (Orf214) | Extrachromosomal DNA (plasmid) | No |
Galu_Mp10 | 0.024 | 73717 | Amsacta moorei entomopoxvirus (AMV015) |  |  |
Galu_Mp16 | 0.85 | 128944 | Bacteroides fragilis (Integrase); Bacteroides thetaiotaomicron (Putative Integrase); Bacteroides uniformis (Integrase IntN1); Bacteroides uniformis (IntN1) | DNA recombination | No |
Galu_Mp17 | 0.056 | 115279 | Fusobacterium nucleatum subsp. nucleatum (ABC transporter ATP-binding protein); Nostoc sp. PCC 7120 (Hypothetical protein Alr4911) | Thyroid | No |
Galu_Mp19 | 3.00E-11 | 138807 | Podospora anserina | Endonuclease activity | No |
Galu_Mp20 | 4.3 | 146675 | Drosophila melanogaster | Defense/immunity protein activity | COX1-i1 protein |
Galu_Mp21 | 0.033 | 86244 | Borrelia burgdorferi (Hypothetical protein BB0399); Clostridium acetobutylicum (Glycosyltransferase domain containing protein) | ATP binding | No |
Galu_Mp21 | 0.36 | 106851 |  | Nucleotide binding |  |
Galu_Mp21 | 0.47 | 101650 |  | Transferase activity |  |
Galu_Mp22 | 3.00E-05 | 85579 | Spizellomyces punctatus (Orf361); Plasmodium falciparum (RpoD protein) | DNA-directed RNA polymerase activity | No |
Galu_Mp22 | 0.23 | 106486 |  |  |  |
Gasi_Mp30 | 2.00E-11 | 139982 | Agrocybe aegerita; Hypocrea jecorina | Cellular component: mitochondria | Yes; COX2: cytochrome c oxidase subunit 2 |
Gasi_Mp34 (Ganoderma sinense) | 7.00E-10 | 114742 | Podospora anserina; Trimorphomyces papilionaceus | Intron homing; DNA binding; Endonuclease activity | No |
Gasi_Mp42 (Ganoderma sinense) | 2.00E-20 | 139982 | Agrocybe aegerita; Hypocrea jecorina | DNA metabolic process; Catalytic activity; Excinuclease ABC, C subunit, N-terminal; GIY-YIG endonuclease | atp9: GIY endonuclease (276 aa) (Gibberella zeae) |
Gasi_Mp09 (Ganoderma sinense) | 2.00E-20 | 139982 | Agrocybe aegerita; Hypocrea jecorina | DNA metabolic process; Catalytic activity; Excinuclease ABC, C subunit, N-terminal; GIY-YIG endonuclease | atp9: GIY endonuclease (276 aa) (Gibberella zeae) |
Gasi_Mp23 (Ganoderma sinense) | 6.00E-05 | 139982 | Agrocybe aegerita; Podospora anserina; Hypocrea jecorina | Cellular metabolic process; Catalytic activity | CYT1: C-terminal fragment of CaP19.3527, likely cytochrome C1 (288 aa) |
Gasi_Mp11 (Ganoderma sinense) | 6.00E-18 | 102515 | Allomyces macrogynus | Catalytic activity | COX3: Cytochrome c oxidase subunit 3 (EC 1.9.3.1) (Cytochrome c oxidase polypeptide III); Subunits I, (269 aa) |
Gasi_Mp33 | 2.00E-25 | 103988 | Monosiga brevicollis | DNA metabolic process; DNA binding | SPMIT.02: Uncharacterized cox1 intron-1 45.6 kDa protein (384 aa) |
Gasi_Mp32 | 1.00E-14 | 138811 | Arabidopsis thaliana | DNA metabolic process; RNA-directed DNA polymerase (reverse transcriptase) | SPMIT.06: Uncharacterized cox1 intron-2 37.2 kDa protein (323 aa) |
Gasi_Mp31 | 4.00E-37 | 138807 | Emericella nidulans; Podospora anserina | DNA metabolic process; DNA binding | eugene3.75340001: hypothetical protein (277 aa) (Populus trichocarpa) |
Gasi_Mp40 | 1.00E-51 | 138808 | Allomyces macrogynus; Neurospora crassa | DNA metabolic process; DNA binding | cox3-i3: COX3-i1 (185 aa) |
Gasi_Mp24 | 2.00E-13 | 139169 | Saccharomyces cerevisiae | Cell part; Catalytic activity | bI2: Cytochrome b mRNA maturase bI2 |
Gasi_Mp41 | 4.00E-17 | 139170 | Chlamydomonas humicola | Cell part; Catalytic activity | AI5_ALPHA: Intron-encoded DNA endonuclease aI5 alpha precursor (DNA endonuclease I-SceIV) |
Gasi_Mp06 | 1.00E-19 | 101961 | Rhizophydium sp. 136 | Cellular metabolic process; Catalytic activity | BI2: Cytochrome b mRNA maturase bI2 |
Gasi_Mp05 | 2.00E-27 | 139538 | Podospora anserina | DNA metabolic process; DNA binding | nad3: Laglidadg endonuclease (423 aa) |
Gasi_Mp27 | 3.00E-11 | 102515 | Allomyces macrogynus | Cell part; Catalytic activity | AI5_ALPHA: Intron-encoded DNA endonuclease aI5 alpha precursor (DNA endonuclease I-SceIV) |
Gasi_Mp44 | 2.00E-20 | 102515 | Allomyces macrogynus | Cell part; Catalytic activity | AI5_ALPHA: Intron-encoded DNA endonuclease aI5 alpha precursor (DNA endonuclease I-SceIV) |
Gasi_Mp38 | 2.00E-78 | 138828 | Schizosaccharomyces japonicus | DNA metabolic process; DNA binding | SPMIT.03: Uncharacterized cox1 intron-2 37.2 kDa protein (323 aa) (Schizosaccharomyces pombe) |
Gasi_Mp36 | 3.00E-61 | 138807 | Podospora anserina | DNA metabolic process; DNA binding | gw1.8046.2.1: Hypothetical protein (229 aa) (Populus trichocarpa) |
Gasi_Mp39 | 2.00E-06 | 138828 | Schizosaccharomyces japonicus | DNA metabolic process; DNA binding | nd1-i2: ND1-i2 protein (722 aa) (Yarrowia lipolytica) |
Gasi_Mp26 | 1.00E-62 | 138820 | Emericella nidulans | Cellular metabolic process; Catalytic activity | BI3: Cytochrome b mRNA maturase bI3 |
Gasi_Mp37 | 1.00E-12 | 138819 | Podospora anserina | DNA metabolic process; DNA binding | AI4: Intron-encoded DNA endonuclease aI4 precursor (DNA endonuclease I-SceII) |
Gasi_Mp22 | 3.00E-16 | 139380 | Chlamydomonas frankii | DNA metabolic process; DNA binding | nad4L: Laglidadg endonuclease (Gibberella zeae) |
Gasi_Mp10 | 4.00E-22 | 139538 | Podospora anserina | DNA metabolic process; DNA binding | nad3: Laglidadg endonuclease (Gibberella zeae) |
Gasi_Mp35 | 2.00E-45 | 138807 | Podospora anserina | DNA metabolic process; DNA binding | cox1-i2: COX1-i2 protein (Yarrowia lipolytica) |
Gasi_Mp25 | 3.00E-19 | 104255 | Rhizophydium sp. 136 | DNA metabolic process; DNA binding | bI3: Cytochrome b mRNA maturase bI3 (Debaryomyces hansenii) |
Table 5: Functional prediction of HPs in SYSTERS, InterPro, Gene Ontology, SUPERFAMILY and STRING.
Associative network among HPs in Ganoderma
Proteins interact within a network that modulates the functionality of the related proteins. Associative networking among proteins is still to be explored completely. In addition, experimental validation is necessary owing to the crucial role of interaction partners that participate in multimeric complexes, which links a protein to its function [41]. The STRING database depicts interactions on the basis of physical, functional, experimental and co-expression evidence. It also furnishes information about the metabolic or epigenetic associative networks related to these proteins. In the present work, STRING 9.1 was used to predict the associative network of HPs in Ganoderma, based on parameters comprising neighborhood, co-occurrence, co-expression, experiments, databases, and homology of the predicted partners of the HPs (Table 5).
The availability of commercial Ganoderma products attests to its medicinal value. Ganoderma protein sequences, retrieved from NCBI and UniProt, were functionally annotated to gain a deeper understanding of their proteomics. In the present study, 33 HPs of the genus Ganoderma were investigated and characterized with bioinformatics tools such as BLAST, HHpred, Pfam, PANTHER, CATH, CDART, and SVMProt. Tools such as InterProScan, BLOCK and MOTIF were employed for discovering functional motifs in the HPs (Tables 3 and 6).
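To make the idea of several evidence channels feeding one association score concrete, the following is a minimal sketch. It assumes the channels are independent and combines them as 1 - prod(1 - s_i); this is a common naive scheme and not necessarily the exact scoring used by STRING 9.1, which may apply further corrections. The channel scores and the function name `combine_scores` are hypothetical.

```python
# Illustrative only: naive combination of per-channel confidence scores.
# Assumption: channels are independent; no prior correction is applied.

def combine_scores(channel_scores):
    """Return 1 - prod(1 - s_i) for per-channel scores s_i in [0, 1]."""
    remaining = 1.0
    for channel, score in channel_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {channel!r} must lie in [0, 1]")
        remaining *= 1.0 - score
    return 1.0 - remaining


# Hypothetical scores for one predicted partner of an HP
example_scores = {
    "neighborhood": 0.0,
    "co-occurrence": 0.5,
    "co-expression": 0.5,
    "experiments": 0.0,
    "databases": 0.0,
}

combined = combine_scores(example_scores)
print(f"combined score: {combined:.2f}")
```

Under this scheme a channel with score 0 contributes nothing, and two moderately supported channels reinforce each other.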
S.No. | Protein name | Species | UNIPROT ID | Protein Function |
---|---|---|---|---|
1 | Galu_Mp10 | Ganoderma lucidum | S4UWG8 | Piscicolin 126 immunity like protein |
2 | Galu_Mp16 | Ganoderma lucidum | S4UVR8 | Integrase family |
3 | Galu_Mp17 | Ganoderma lucidum | S4UWD1 | Prefoldin |
4 | Galu_Mp19 | Ganoderma lucidum | S4UU33 | DNA endonuclease |
5 | Galu_Mp20 | Ganoderma lucidum | S4UUK5 | NADH dehydrogenase |
6 | Galu_Mp21 | Ganoderma lucidum | S4UWH0 | DNA-directed RNA polymerase |
7 | Galu_Mp22 | Ganoderma lucidum | S4UVS2 | DNA-directed RNA polymerase |
8 | Gasi_Mp30 | Ganoderma sinense | V5KV85 | GIY-YIG endonuclease |
9 | Gasi_Mp34 | Ganoderma sinense | V5KVR6 | GIY-YIG endonuclease |
10 | Gasi_Mp42 | Ganoderma sinense | V5KVQ5 | GIY-YIG endonuclease |
11 | Gasi_Mp09 | Ganoderma sinense | V5KVM5 | GIY-YIG endonuclease |
12 | Gasi_Mp23 | Ganoderma sinense | V5KWR6 | GIY-YIG endonuclease |
13 | Gasi_Mp21 | Ganoderma sinense | V5KVM0 | Laglidadg homing endonuclease |
14 | Gasi_Mp11 | Ganoderma sinense | V5KVL3 | Laglidadg homing endonuclease |
15 | Gasi_Mp33 | Ganoderma sinense | V5KWS5 | Laglidadg homing endonuclease |
16 | Gasi_Mp32 | Ganoderma sinense | V5KVP5 | Reverse transcriptase |
17 | Gasi_Mp31 | Ganoderma sinense | V5KVN0 | Laglidadg homing endonuclease |
18 | Gasi_Mp40 | Ganoderma sinense | V5KV87 | Laglidadg homing endonuclease |
19 | Gasi_Mp24 | Ganoderma sinense | V5KVQ8 | Laglidadg homing endonuclease |
20 | Gasi_Mp41 | Ganoderma sinense | V5KVN9 | Laglidadg homing endonuclease |
21 | Gasi_Mp06 | Ganoderma sinense | V5KVP6 | Laglidadg homing endonuclease |
22 | Gasi_Mp05 | Ganoderma sinense | V5KWQ3 | Laglidadg homing endonuclease |
23 | Gasi_Mp27 | Ganoderma sinense | V5KVP1 | Laglidadg homing endonuclease |
24 | Gasi_Mp44 | Ganoderma sinense | V5KVS4 | Laglidadg homing endonuclease |
25 | Gasi_Mp38 | Ganoderma sinense | V5KWS9 | Laglidadg homing endonuclease |
26 | Gasi_Mp36 | Ganoderma sinense | V5KVN5 | Laglidadg homing endonuclease |
27 | Gasi_Mp39 | Ganoderma sinense | V5KVS0 | Laglidadg homing endonuclease |
28 | Gasi_Mp26 | Ganoderma sinense | V5KVM6 | Laglidadg homing endonuclease |
29 | Gasi_Mp37 | Ganoderma sinense | V5KVQ1 | Laglidadg homing endonuclease |
30 | Gasi_Mp22 | Ganoderma sinense | V5KVN8 | Laglidadg homing endonuclease |
31 | Gasi_Mp10 | Ganoderma sinense | V5KV81 | Laglidadg homing endonuclease |
32 | Gasi_Mp35 | Ganoderma sinense | V5KV86 | Laglidadg homing endonuclease |
33 | Gasi_Mp25 | Ganoderma sinense | V5KV84 | Laglidadg homing endonuclease |
Table 6: List of HPs in Ganoderma lucidum and Ganoderma sinense with UniProt ID and annotated function.
The physicochemical properties of all 33 hypothetical proteins were determined by the ExPASy-ProtParam software (Table 2).
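Two of the properties reported by ProtParam, the aliphatic index and the grand average of hydropathicity (GRAVY), follow simple published formulas and can be sketched in a few lines. The code below is an illustration, not the ExPASy implementation: it uses the Ikai formula for the aliphatic index (mole percent of Ala, Val, Ile and Leu with coefficients 2.9 and 3.9) and the Kyte-Doolittle hydropathy scale for GRAVY. The function names and the toy sequence are hypothetical.

```python
# Illustrative only: two ProtParam-style properties computed from a sequence.

# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def gravy(seq):
    """GRAVY: mean Kyte-Doolittle hydropathy over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[residue] for residue in seq) / len(seq)


toy_sequence = "AVIL"  # hypothetical sequence, not a Ganoderma protein
print(aliphatic_index(toy_sequence), gravy(toy_sequence))
```

A higher aliphatic index is usually read as a sign of greater thermostability, and a positive GRAVY value indicates an overall hydrophobic protein.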
MOTIF predicted Galu_Mp10 to contain a HupE/UreJ motif, and HHpred found it to be similar to piscicolin 126 immunity proteins. TMHMM and HMMTOP indicated four transmembrane regions in the sequence, which may contribute to the adaptability of Ganoderma to different environmental conditions. Gene Ontology, on the other hand, associated it with an extrachromosomal plasmid, supporting its relevance to an immunological function [5]. In addition, the BLOCK database, which is built from multiple alignments of conserved regions and aids in detecting and verifying protein sequence homology, found Galu_Mp10 to be similar to the fungal pheromone STE3 GPCR.
Galu_Mp16, another stable HP, was annotated on the basis of sequence similarity and homology using BLASTp and HHpred. Subsequent analysis revealed a motif similar to Bunyavirus glycoprotein G2 (309.371), as deduced from Pfam annotations and multiple sequence alignments in MOTIF. The SYSTERS database, based on hierarchical partitioning, found Galu_Mp16 to be similar in sequence to integrase, as did InterPro (IPR), whereas Gene Ontology indicated that it possesses a DNA recombination function. The STRING database found integrase to be present in the superfamily of Galu_Mp16.
HHpred suggested that Galu_Mp17 possesses prefoldin function, which is well documented in the literature for its fundamental role in protein folding. These proteins exhibit protein-folding activity in combination with other proteins. SYSTERS found Galu_Mp17 to be similar to SMC (Structural Maintenance of Chromosomes) proteins, an ATPase family with a role in DNA recombination and repair. BLOCK further related it to thyroid hormone-inducible hepatic protein Spot 14, which regulates lipogenesis, especially the synthesis of fatty acids [8,9]. PROSITE, a database of protein domains, families, and functional sites, predicted motifs for tubulin-beta mRNA autoregulation signal, helper component proteinase, and Mer2, whereas the Pfam database, based on multiple sequence alignments, predicted a MetRS-N binding domain.
BLASTp and HHpred predicted Galu_Mp19 to be a LAGLIDADG homing endonuclease [42]. ProDom, a database consisting of an automatic compilation of homologous domains, found Galu_Mp19 to be a LAGLIDADG endonuclease, which was also confirmed by BLOCK. InterProScan, which predicts protein families, domains and functional sites, identified it as LAGLIDADG. SUPERFAMILY assigned its evolutionary domain as an endonuclease, which was later verified by the CATH database. Pfam, via MOTIF, annotated the protein as LAGLIDADG. SVMProt, which predicts the functional family from the primary structure and can classify both homologous proteins and distantly related proteins of different function, predicted it as a glycosyltransferase. CDART and ProtoNet 6.1 confirmed a LAGLIDADG-like homing endonuclease domain. Gene Ontology predicted Galu_Mp19 to be a LAGLIDADG homing endonuclease, a rare-cutting enzyme encoded by introns and inteins. Homing endonucleases are highly invasive elements that promote recombination by introducing double-strand breaks and engaging the repair system [42]. STRING, however, did not show any interaction with other proteins.
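Because several tools were run on the same protein and their calls do not always agree (for Galu_Mp19, the SVMProt glycosyltransferase call stands apart from the LAGLIDADG calls), one simple way to summarize the results is a majority vote. The sketch below illustrates that idea and is not the procedure used in this study. The labels are simplified paraphrases of the tool outputs described above, SUPERFAMILY and CATH are left out because they reported only a generic endonuclease domain, and the function name `consensus` is hypothetical.

```python
# Illustrative only: majority vote over per-tool annotations for Galu_Mp19.
from collections import Counter

LAGLIDADG = "LAGLIDADG homing endonuclease"

# Simplified labels paraphrased from the text; not raw tool output
predictions = {
    "BLASTp": LAGLIDADG,
    "HHpred": LAGLIDADG,
    "ProDom": LAGLIDADG,
    "BLOCK": LAGLIDADG,
    "InterProScan": LAGLIDADG,
    "Pfam": LAGLIDADG,
    "CDART": LAGLIDADG,
    "ProtoNet": LAGLIDADG,
    "SVMProt": "glycosyltransferase",
}


def consensus(predictions):
    """Return the most frequent label and the fraction of tools backing it."""
    counts = Counter(predictions.values())
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


label, support = consensus(predictions)
print(f"{label} (supported by {support:.0%} of tools)")
```

A vote of this kind treats every tool as equally reliable, which is a simplification; in practice sequence-homology and domain-profile tools may deserve different weights.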
Galu_Mp20 is another hypothetical protein, predicted to have NADH dehydrogenase function. NADH dehydrogenase is a key enzyme for oxidative phosphorylation in the mitochondria and plays a crucial role in triggering apoptosis, linking mitochondrial activity to programmed cell death [10]. BLASTp found Galu_Mp20 to possess NADH dehydrogenase activity by sequence similarity, while the conserved sequence in BLOCK related this protein to the flagellar basal body-associated protein FliL. SUPERFAMILY predicted it to possess a Hect, E3 ligase catalytic domain, whereas InterPro (IPR) prediction of protein families, domains, and functional sites revealed it to be lipoprotein, type 6 and ferritin/ribonucleotide reductase-like. Gene Ontology recognized Galu_Mp20 as exhibiting defense/immunity protein activity.
Similarly, the hypothetical protein Galu_Mp21 was found to possess DNA-dependent RNA polymerase-like activity, as indicated by BLASTp and HHpred. ProDom found DNA-directed RNA polymerase, whereas conserved sequences in the BLOCK database revealed a frizzled protein signature, further confirmed by InterProScan and SUPERFAMILY. Pfam, via MOTIF, revealed the N-terminal region of the DNA-directed RNA polymerase, whereas SVMProt assigned it to the family of enzymes transferring phosphorus-containing groups. CDART and ProtoNet 6.1, which detect similarities across significant evolutionary distances, indicated DNA-dependent RNA polymerase-like activity for Galu_Mp21. Gene Ontology denoted ATP binding, nucleotide binding, and transferase activity, while STRING revealed no interaction with other proteins. Importantly, RNA polymerase controls gene transcription and improves adaptability to different kinds of stress. From these results, it can be surmised that these genes can modulate certain enzymes and their expression, thus assisting in the acclimatization and adaptation of the fungus to a particular environment. RNA polymerase also plays a role in telomerase activity and RNA silencing, which may make it a target for designing therapeutic agents.
BLASTp found Gasi_Mp30 to be similar to GIY-YIG endonuclease, whereas HHpred homology detection found it to be like UvrABC system protein C. ProDom analyzed it as the metal-binding iron heme domain of the transmembrane oxidase located on the mitochondrial membrane, whereas BLOCK, by conserved sequence analysis, indicated a likeness to the N-terminal of the C subunit of excinuclease ABC. InterProScan and SUPERFAMILY predicted the protein families, domains and functional sites as GIY-YIG endonuclease. CATH and PANTHER found a domain similar to UvrABC system protein C, while SVMProt assigned it to the zinc-binding protein family. CDART and SMART predicted the domain, further confirming the HP to be similar to GIY-YIG, whereas ProtoNet 6.1 showed its similarity with nuclease-associated modular DNA-binding 1. Gene Ontology assigned the cellular component term associated with mitochondria, whereas the STRING database showed interaction with COX2 (cytochrome c oxidase subunit 2).
BLASTp also found Gasi_Mp34 to be similar to GIY-YIG endonuclease, whereas HHpred homology detection found it to be similar to UvrABC system protein C. ProDom analyzed it as an intron-encoded endonuclease (hydrolase). Here too, BLOCK, by conserved sequence analysis, detected a likeness to the N-terminal of the C subunit of excinuclease ABC. InterProScan and SUPERFAMILY predicted the protein families, domains and functional sites as GIY-YIG endonuclease. The GIY-YIG motif was confirmed by MOTIF, whereas SVMProt classified Gasi_Mp34 as all DNA-binding. SMART predicted the domain and supported the GIY-YIG assignment. ProtoNet 6.1 hierarchically classified it as an intron-encoded nuclease. Gasi_Mp34 was observed to have no interaction with other proteins.
Hypothetical protein Gasi_Mp23 was found by BLASTp to be similar to the GIY Cytb i2 grp ID protein, and HHpred further confirmed it as a GIY-YIG endonuclease. ProDom analyzed it as a mitochondrial GIY-YIG endonuclease, whereas BLOCK indicated intron-encoded nuclease two domains. InterProScan and SUPERFAMILY predicted it as GIY-YIG endonuclease. MOTIF predicted a NUMOD1 domain and SMART found an intron-encoded nuclease motif, while ProtoNet 6.1 assigned Cytochrome b/b6, N-terminal. Gene Ontology found it to have catalytic activity, whereas STRING identified CYT1, a C-terminal fragment of CaP19.3527 that is likely cytochrome C1.
The other hypothetical proteins, Gasi_Mp09 and Gasi_Mp42, were annotated as GIY-YIG endonucleases through different parameters (Table 2). The GIY-YIG nuclease domain superfamily has been implicated in many cellular processes, including DNA repair and recombination, transfer of mobile genetic elements, and restriction of incoming foreign DNA. The GIY-YIG domain forms a compact structural domain that serves as a scaffold for the coordination of a divalent metal ion required for catalysis of phosphodiester bond cleavage.
Another hypothetical protein, Gasi_Mp32, was predicted to be a reverse transcriptase, a key enzyme in antiviral drug development and insulin production. Initially, BLASTp predicted it as an RNA-directed DNA polymerase, and HHpred detected it to be a telomerase reverse transcriptase. ProDom, a database consisting of an automatic compilation of homologous domains, found an RNA-directed transcriptase, which BLOCK further confirmed as telomerase. InterProScan, which predicts protein families, domains and functional sites, identified Gasi_Mp32 as a reverse transcriptase. SUPERFAMILY assigned its evolutionary domain as reverse transcriptase, which was also verified by the PANTHER database. Pfam, via MOTIF, annotated the protein as a reverse transcriptase. SVMProt, which predicts the functional family from the primary structure and can classify distantly related proteins and homologous proteins of different function, predicted it as zinc binding. CDART, SMART and ProtoNet 6.1 confirmed this hypothetical protein to be a reverse transcriptase.
Homing endonuclease
In addition to the hypothetical proteins discussed above, others including Gasi_Mp21, Gasi_Mp11, Gasi_Mp33, Gasi_Mp31, Gasi_Mp40, Gasi_Mp24, Gasi_Mp41, Gasi_Mp06, Gasi_Mp05, Gasi_Mp27, Gasi_Mp44, Gasi_Mp25, Gasi_Mp35, Gasi_Mp10, Gasi_Mp22, Gasi_Mp37, Gasi_Mp39, Gasi_Mp26, Gasi_Mp36 and Gasi_Mp38 were analyzed with respect to their molecular weight, theoretical pI, aliphatic index and hydropathicity (Table 2). Taken together, the information obtained indicated that they behave as LAGLIDADG endonucleases. BLASTp, HHpred, ProDom, BLOCK and various other bioinformatics tools predicted their function as LAGLIDADG with high confidence. Homing endonucleases are encoded by open reading frames within self-splicing introns, or occur as independently folded domains within self-splicing protein elements known as inteins, and they facilitate the propagation of these elements [43]. They promote the homing of their respective genetic elements into allelic intronless and inteinless sites, thus playing a vital role in recombination. The LAGLIDADG motif plays a crucial role in protein folding, dimerization or interdomain packing, and catalysis [42]. As rare-cutting endonucleases, homing endonucleases play a pivotal role in genome analysis, gene manipulation, cloning, recombination events, double-strand break repair, and transposition, helping to uphold chromosomal integrity and viability. Importantly, endonucleases play a pivotal role in DNA repair, and even a small deviation from normal functioning may lead to anomalies.
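The family takes its name from the conserved LAGLIDADG sequence motif, so a very rough first screen is to slide the consensus along a protein sequence and allow a few mismatches. The sketch below illustrates only that idea; it is not how Pfam, BLOCK or the other tools used here detect the domain, since those rely on profiles and alignments. The sequence, the mismatch tolerance and the function names are hypothetical.

```python
# Illustrative only: approximate search for the LAGLIDADG consensus.
MOTIF = "LAGLIDADG"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_motif(seq, motif=MOTIF, max_mismatches=2):
    """Return (start, window, mismatches) for every window of the sequence
    that differs from the motif at no more than max_mismatches positions."""
    seq = seq.upper()
    k = len(motif)
    matches = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        mismatches = hamming(window, motif)
        if mismatches <= max_mismatches:
            matches.append((start, window, mismatches))
    return matches


# Hypothetical toy sequence with one exact and one degenerate copy
toy_sequence = "MKTLAGLIDADGSFQLAGFIDGDGKK"
for start, window, mismatches in find_motif(toy_sequence):
    print(start, window, mismatches)
```

Many characterized LAGLIDADG enzymes carry two copies of the motif, which is why the toy sequence includes a second, degenerate occurrence.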
Predicting and annotating the function of hypothetical proteins forms an indispensable part of bioinformatics and proteomics. Annotating the functions of uncharacterized proteins can help in understanding the mechanisms fundamental to the adaptation of this fungus to various stress conditions. The literature has repeatedly emphasized the importance of unexplored proteins, which motivated us to annotate the functions of the 33 hypothetical proteins for the first time. This endeavor reached its fruition with the assistance of various in silico tools, which helped in understanding the decisive parameters and characteristics that shape protein function. Some determining features of the proteins are also discussed. Among the 33 hypothetical proteins sequenced and characterized, all were predicted with high confidence except Galu_Mp10 and Galu_Mp16, mainly because of insufficient data. The information thus obtained reaffirms the versatility and multifaceted nature of the fungal proteins. A deeper study of the revealed functions may help in understanding the regulation of various signaling pathways that modulate the cell cycle, providing a clearer view for medical interventions. Last but not least, exploring the functional nature of these proteins may provide a platform for charting out effective therapies and designing drugs. In-depth prediction and functional annotation of the HPs, if and when carried out, would assist in better understanding the nuances of proteomics.
The authors thank Central University of Punjab, Bathinda for providing the necessary facilities to carry out the present work.