Unveiling Hypothetical Proteins of Ganoderma: An Endeavour in Functional Annotation Using In silico Tools

Sanjeev Kumar, Balraj Singh Gill and Navgeet

Published Date: 2016-11-01
DOI: 10.21767/2469-6692.100012

Sanjeev Kumar*, Balraj Singh Gill and Navgeet

Centre for Biosciences, Central University of Punjab, Bathinda, India

*Corresponding Author:
Sanjeev Kumar
Centre for Biosciences, Central University of Punjab
Bathinda, India
Tel: +91 9501278687
E-mail: sanjeevpuchd@gmail.com

Received date: September 19, 2016; Accepted date: October 25, 2016; Published date: November 01, 2016

Citation: Kumar S, Gill BS, Navgeet. Unveiling Hypothetical Proteins of Ganoderma: An Endeavour in Functional Annotation Using In silico tools. J In Silico In Vitro Pharmacol. 2016, 2:4

Copyright: © 2016 Kumar S, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Abstract

Proteins mediate a multitude of housekeeping functions vital for cell integrity and survival, and even a very small aberration in protein function can cause numerous abnormalities. The many therapeutic effects observed in Ganoderma species have led researchers to characterize protein sequences that may enhance immune function and aid the design of effective drugs. Although numerous fungal proteins have been annotated, functions have not yet been assigned to 33 mitochondrial proteins of this fungus, which belongs to a genus of polypore mushrooms. Functional annotation of these Ganoderma proteins, carried out here for the first time, examined parameters decisive for characterization, including protein localization, motifs, domains, clusters, phylogenetic relationships, and interactions with other molecules. The information obtained using various bioinformatics tools resulted in the functional annotation of proteins that act as integrase, prefoldin, endonuclease, dehydrogenase, polymerase, GIY-YIG endonuclease, LAGLIDADG endonuclease and reverse transcriptase.

Keywords

Ganoderma lucidum (GL); Hypothetical proteins (HPs); Bioinformatics tools

Introduction

Proteins mediate a multitude of essential housekeeping functions critical for cellular survival and cell integrity. Even minor aberrations such as protein misfolding and aggregation can compromise normal functioning, engendering neurodegeneration and necessitating therapeutic intervention [1]. Proteins also exhibit activities that enhance immune function, in addition to possessing anti-cancer potential. Moreover, the discovery of various other medicinal values in natural products prompted researchers to sequence genomes and focus on diverse aspects of proteomics. Genome sequencing laid the foundation for the rapid accumulation of both characterized and uncharacterized genes in GenBank. Genome sequencing has achieved functional annotation for only 50-60% of genes, which makes functional annotation of the remaining uncharacterized proteins a challenging task [2]. The introduction of bioinformatics tools has ushered in a new era [3], allowing functions to be assigned to proteins by less time-consuming means.

Ganoderma, a commonly known basidiomycete fungus indicated for cancer and neurodegenerative diseases, has been described as a rich source of important myco-constituents including terpenoids, polysaccharides, and proteins [4,5]. Mitochondrial genome sequencing of Ganoderma lucidum revealed a total of 57 protein-coding genes, 2 rRNA and 26 tRNA [6]. Another species, Ganoderma sinense, has not yet been sequenced, but UniProt confirms the presence of 25 hypothetical proteins (HPs) [7]. Sequence retrieval from UniProt disclosed that the genus Ganoderma has 33 hypothetical proteins. Proteins isolated from the genus Ganoderma exhibit a plethora of bioactivities via various adapter proteins dominant at different stages of immune modulation [8,9], inflammation, and cancer signaling [10,11]. In addition, other proteins are predicted by gene prediction software without in vivo demonstration; these are known as hypothetical proteins (HPs), or non-characterized or unknown proteins, and may play a decisive role in signaling mechanisms. Assigning functions to these uncharacterized proteins can help in understanding the various mechanisms fundamental to the adaptation of this fungus to various stress conditions. In silico studies have been found to be reliable enough to reveal crucial information about biological functionality and comparative genomics in less time than experimental characterization [12]. This work was carried out to understand the genomics and functionality of the Ganoderma hypothetical proteins, which have remained uncharacterized until now [13].

Sequence analysis and comparison form the first step in identifying homologues and sequence similarity for characterization of a protein. BLAST has proved to be a reliable method for comparing query sequences with a database for the prediction of proteins. Multiple sequence alignment of homologues in a family has been found to be a reliable method for functional annotation and is important for identifying conserved domains. Motif analysis is a necessary step in the identification and characterization of HPs, and detection of common motifs among proteins with low sequence identity (e.g., less than 30%) may provide important clues for function or for classification of HPs into appropriate families [14].

Various freely available databases for determining the structure and function of motifs include GenomeNet [15], PROSITE [16], PRINTS [17], Pfam [18], ProDom [19], BLOCKS [20], MEME [21] and InterPro [22], searched using InterProScan [23]. The STRING database [24] is pivotal in revealing the functionality of an individual protein as well as its interactions with other factors. Moreover, it also considers gene neighbourhood, gene fusion events and co-occurrence across specific subsets of species, which contribute to a confidence score for direct (physical) and indirect (functional) associations; these, in turn, are based on genomic context, high-throughput experiments, conserved co-expression and previous knowledge of the protein.

Materials and Methods

To predict the functionality of hypothetical proteins in Ganoderma species, various bioinformatics tools were used, each suited to highlighting a different property of the protein (Table 1). Sequence retrieval from UniProt, a database of protein sequences, was the first step in the annotation process. This was followed by functional annotation of the hypothetical proteins, which involved analyzing the sequence, conserved domains, motifs, phylogenetic relationships, and protein interactions using the bioinformatics tools.

a) Sequence similarity search
BLAST: Finds similar sequences
HHpred: Homology detection
b) Physicochemical characterization
ExPASy-ProtParam: Analysis of physicochemical properties
c) Sub-cellular prediction
PSORT B: 97% precision
PSLpred: 91% accuracy
CELLO: 91% accuracy
SignalP: Signal peptide cleavage site
SecretomeP: Location of the cleavage site
TMHMM: Membrane topology hidden Markov model
HMMTOP: Transmembrane topology
d) Sequence alignment
PRALINE: Multiple sequence alignment
e) Protein classification
Pfam: Multiple alignment and HMMs
CATH: Hierarchical domain classification
Superfamily: SCOP database
SYSTERS: Protein family database
SVMProt: Classification on the basis of primary sequence
CDART: NCBI Entrez Protein Database
PANTHER: Evolutionary relationships
ProtoNet: Automatic hierarchical clustering
SMART: Sequence analysis
f) Motif prediction
InterProScan: Diagnostic signatures
MOTIF: Motif-based sequence analysis tools
MEME Suite: -
g) Clustering
CLUSS: Substitution Matching Similarity (SMS)
h) Protein-protein interaction
STRING Version 9.1: Predicts protein interactions

Table 1: List of bioinformatics tools and databases used for annotating function.

Sequence analysis and physiochemical properties

Sequence investigation of the Ganoderma mitochondrial genome disclosed a total of 33 hypothetical proteins (https://www.ncbi.nlm.nih.gov/genome/), which were subsequently retrieved from UniProt (https://www.uniprot.org/). First, the ExPASy ProtParam server [25] was employed to compute theoretical physicochemical properties such as molecular weight, amino acid composition, theoretical isoelectric point, extinction coefficient, instability index, aliphatic index, estimated half-life and grand average of hydropathicity (GRAVY), in order to characterize the protein sequence and help assign function to the protein (Table 2).
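
Two of the ProtParam-style quantities listed above can be illustrated with a short Python sketch. It assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and Ikai's formula for the aliphatic index; the function names and the toy sequence are hypothetical and are not taken from the study, and the ProtParam server itself may differ in details such as the handling of non-standard residues.

```python
# Kyte-Doolittle hydropathy values keyed by one-letter amino acid code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    # A KeyError is raised for non-standard residues (e.g. X); that is deliberate
    # in this sketch so that ambiguous input is not silently averaged.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    toy = "MKVLAAGIVL"  # hypothetical toy sequence, not one of the 33 HPs
    print(round(gravy(toy), 3), round(aliphatic_index(toy), 2))
```

A positive GRAVY value indicates an overall hydrophobic sequence and a negative value an overall hydrophilic one, which is how the hydropathicity column of Table 2 is usually read.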

S.No. Protein name No. of amino acids M.W. Theo. pI Instability index Aliphatic index Hydropathicity Location (WoLF PSORT) Location (CELLO) SignalP 4.1 SecretomeP TMHMM HMMTOP
1. Galu_Mp10 151 17498.3 9.49 Stable 140.60 0.984 Mito P.M. No Yes 4 4
2. Galu_Mp16 439 51185.2 9.26 Stable 115.88 0.194 Mito P.M. No No 4 7
3. Galu_Mp17 283 32595.6 5.42 Stable 112.54 -0.073 Mito P.M. No No 2 3
4. Galu_Mp19 192 21490.0 9.37 Stable 93.49 -0.094 Mito P.M. No No 0 3
5. Galu_Mp20 294 32986.8 9.33 Unstable 127.28 0.531 Mito P.M. Yes Yes 5 5
6. Galu_Mp21 109 12368.5 9.62 Stable 111.65 0.162 Mito P.M. No No 0 0
7. Galu_Mp22 568 66673.6 8.75 Stable 87.48 -0.317 Mito P.M. No No 0 0
8. Gasi_Mp30 236 27666.0 9.52 Stable 97.03 -0.307 No Mito No No 0 0
9. Gasi_Mp34 385 44224.4 9.69 Stable 97.43 -0.203 No O.M. No No 0 0
10. Gasi_Mp42 267 30441.6 9.89 Stable 94.16 -0.176 Nuc Mito No No 0 2
11. Gasi_Mp09 295 34308.8 9.81 Stable 86.58 -0.469 No Mito No No 0 0
12 Gasi_Mp23 186 21502.3 9.72 Stable 105.32 -0.399 Mito Mito No No 0 0
13 Gasi_Mp21 364 42387.4 9.89 Stable 91.29 -0.196 No O.M. No Yes 1 1
14 Gasi_Mp11 205 24146.6 9.56 Stable 114.54 0.013 Mito Cyto No No 0 0
15 Gasi_Mp33 768 88137.8 9.81 Stable 114.74 0.147 No P.M. No No 7 8
16 Gasi_Mp32 750 85209.4 9.81 Stable 87.08 -0.327 Mito Mito Yes Yes 0 0
17 Gasi_Mp31 345 40236.1 9.65 Stable 89.28 -0.377 No O.M. No No 0 0
18 Gasi_Mp40 356 40436.3 9.53 Stable 87.84 -0.106 No O.M. No No 0 2
19 Gasi_Mp24 218 26171.0 9.52 Stable 102.84 -0.161 Mito Cyto No Yes 1 1
20 Gasi_Mp41 215 24693.7 9.40 Stable 98.33 -0.115 No O.M. No No 0 0
21 Gasi_Mp06 250 29198.8 9.61 Stable 84.60 -0.298 No O.M. No No 1 0
22 Gasi_Mp05 410 46647.5 9.92 Stable 98.85 -0.182 No Mito No No 0 0
23 Gasi_Mp27 131 15218.8 9.32 Stable 99.69 -0.142 Mito Mito No No 0 0
24 Gasi_Mp44 248 28615.7 9.67 Stable 91.13 -0.087 No P.M. No No 0 0
25 Gasi_Mp38 323 37058.4 9.52 Stable 95.63 -0.397 No Mito No No 0 0
26 Gasi_Mp36 189 21941.1 9.40 Stable 96.93 -0.346 Mito Mito No No 0 1
27 Gasi_Mp39 345 39119.1 9.28 Stable 94.84 -0.059 No P.M. No No 1 2
28 Gasi_Mp26 317 36589.6 9.41 Stable 102.33 -0.158 No O.M. No No 0 0
29 Gasi_Mp37 308 35293.4 9.63 Stable 80.71 -0.368 No Nuc No No 0 0
30 Gasi_Mp22 141 16444.8 9.12 Stable 78.72 -0.460 Nuc Cyto No No 0 0
31 Gasi_Mp10 421 47613.9 9.35 Stable 91.62 -0.149 No P.M. No No 0 0
32 Gasi_Mp35 344 39209.1 9.47 Stable 102.01 -0.207 No Mito No No 0 0
33 Gasi_Mp25 299 34587.6 9.41 Stable 107.93 0.095 No P.M. No No 0 4
Theo. pI: Theoretical pI; M.W.: Molecular Weight; P.M.: Plasma Membrane; Nuc: Nucleus; O.M.: Outer Membrane; Mito: Mitochondria; Cyto: Cytoplasmic

Table 2: List of predicted sub-cellular localization of HPs in Ganoderma.

Sub-cellular localization prediction

The importance of protein sub-cellular localization in cytology, proteomics, and drug design is well established. Determining the location of a protein and annotating the genome would help in devising drug and vaccine delivery approaches with enhanced selectivity and efficacy, a prerequisite for facilitating systemic pharmacokinetics. This is also important because drug delivery involves absorption of the drug in the cytoplasm and of the vaccine at the surface membrane. In addition, UniProt lists various proteins that have still not been characterized, making the prediction of their localization an urgent need. Where experimental methods may prove time-consuming, numerous in silico tools have paved the way for quicker determination of such characteristics. Bioinformatics tools that predict protein localization include WoLF PSORT [26] and CELLO [27]. CELLO is a multi-class support vector machine classification system that uses features such as amino acid composition, dipeptide composition, partitioned amino acid composition and sequence composition. WoLF PSORT is a similar protein subcellular localization prediction tool which, based on sorting signals, amino acid composition and functional motifs, converts protein amino acid sequences into numerical localization features that are then used for prediction. Signal peptides were predicted using the SignalP 4.1 software [28], whereas the SecretomeP server [29] determined the cleavage sites in amino acid sequences. In addition, TMHMM [30] and HMMTOP [31] calculated the transmembrane helices and topology of the protein, distinguishing between soluble and membrane proteins with a high degree of accuracy. Moreover, methods based on protein sorting signals, such as TMHMM and HMMTOP, rely on the transport of a protein from its site of initial synthesis to its functional location, which depends on N- and C-terminal signals (Table 2).
Another method for annotation is based on protein functional domains and gene ontology. The former relies on evolutionarily conserved, known functional domains, whereas the latter is concerned with labeling the gene product.
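
The composition-based features mentioned above for CELLO (amino acid composition, dipeptide composition and partitioned amino acid composition) can be sketched in Python as follows. This is only an illustration of how a sequence may be turned into fixed-length numerical vectors; the function names, the choice of three partitions and the feature ordering are assumptions, and the actual CELLO encoding and classifier are not reproduced here.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def amino_acid_composition(sequence):
    """Fraction of each standard residue (20 values)."""
    seq = sequence.upper()
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Fraction of each ordered residue pair among adjacent pairs (400 values)."""
    seq = sequence.upper()
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)  # avoid division by zero for very short input
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


def partitioned_composition(sequence, parts=3):
    """Amino acid composition computed separately on consecutive segments."""
    seq = sequence.upper()
    size = -(-len(seq) // parts) if seq else 0  # ceiling division
    features = []
    for i in range(parts):
        segment = seq[i * size:(i + 1) * size]
        features.extend(amino_acid_composition(segment))
    return features


if __name__ == "__main__":
    toy = "MKVLAAGIVLMK"  # hypothetical toy sequence
    vector = (
        amino_acid_composition(toy)
        + dipeptide_composition(toy)
        + partitioned_composition(toy)
    )
    print(len(vector))  # 20 + 400 + 60 features
```

Vectors of this kind would then be passed to a classifier such as a support vector machine; that step is omitted because it depends on a trained model.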

Hypothetical proteins: Comparing the sequences

The initial step involves the prediction of functionality by comparing the sequences of the hypothetical proteins with sequences retrieved from the UniProt databases. BLASTp [32] and HHpred [33] (the latter based on hidden Markov models) were used to search the 33 Ganoderma HPs for sequence similarity and homology against the non-redundant database, which gave different hits with different E-values (Table 3).

S.No. Protein name BLASTp HHpred ProDom BLOCK
1 Galu_Mp10 (mitochondrion) [Moniliophthora roreri] like protein Piscicolin 126 immunity like protein No Fungal pheromone STE3 GPCR signature
2 Galu_Mp16 No No No No
3 Galu_Mp17 No Prefoldin No Thyroid hormone-inducible hepatic Spot 14
4 Galu_Mp19 LAGLIDADG endonuclease family protein Homing endonuclease, LAGLIDADG Endonuclease LAGLIDADG Homing endonuclease, LAGLIDADG/HNH
5 Galu_Mp20 NADH dehydrogenase No No Flagellar basal body-associated protein FliL
6 Galu_Mp21 DNA directed RNA polymerase DNA-directed RNA polymerase in mitochondria RNA polymerase DNA-directed Nucleotidyl transferase Frizzled protein signature
7 Galu_Mp22 DNA directed RNA polymerase DNA directed RNA polymerase, bacteriophage T7 RNA polymerase RNA DNA-directed mitochondrion transferase transcription nucleotidyl transferase DNA-directed RNA polymerase, bacteriophage type
8 Gasi_Mp30 GIY-YIG endonuclease UvrABC system protein C; DNA binding protein Mitochondrion membrane oxidase transmembrane metal-binding iron heme Excinuclease ABC, C subunit, N-term
9 Gasi_Mp34 (Intronic ORF at intron 1 of cox1) GIY endonuclease UvrABC system protein C Endonuclease intron-encoded hydrolase DNA mitochondrion Excinuclease ABC, C subunit, N-term
10 Gasi_Mp42 Intronic ORF at intron 6 of cox1 Intron-associated endonuclease 1 Intronic Intron Nuclear hormones receptors DNA-bind
11 Gasi_Mp09 Intronic ORF at intron 6 of cox1 Intron-associated endonuclease 1 GIY-YIG Endonuclease GIY-YIG Endonuclease Excinuclease ABC, C subunit, N-term
12 Gasi_Mp23 GIY Cytb i2 grp ID protein (Podospora anserina (strain S/ATCC M...) Intron-Associated Endonuclease 1 GIY-YIG Putative Mitochondrion Endonuclease GIY-YIG Intron-encoded nuclease 2 domain
13 Gasi_Mp21 LAGLIDADG endonuclease n1 Tax Gibberel LAGLIDADG homing endonuclease Endonuclease mitochondrion Intronic GIY-YIG Homing endonuclease, LAGLIDADG/HNH
14 Gasi_Mp11 LAGLIDADG endonuclease n1 Tax Gibberella zeae PH-1 RepID A5J053_GIBZE Intron-encoded endonuclease Endonuclease Mitochondrion DNA LAGLIDADG Site-specific Intron homing LAGLIDADG DNA endonuclease
15 Gasi_Mp33 Homing endonuclease (Agaricus bisporus) LAGLIDADG homing endonuclease Mitochondrion Endonuclease LAGLIDADG DNA COX1 AI2 LAGLIDADG Homing endonuclease, LAGLIDADG/HNH
16 Gasi_Mp32 RNA-directed DNA polymerase Telomerase reverse transcriptase Putative RNA-directed Transcriptase Telomere reverse transcriptase
17 Gasi_Mp31 LAGLIDADG endonuclease (Thanatephorus cucumeris (strain AG1-I...) LAGLIDADG homing endonuclease Endonuclease mitochondrion Homing endonuclease, LAGLIDADG/HNH
18 Gasi_Mp40 LAGLIDADG endonuclease (Ajellomyces dermatitidis (strain ER-3) LAGLIDADG homing endonuclease Mitochondrion COX1-I6 Homing endonuclease, LAGLIDADG/HNH domain
19 Gasi_Mp24 Maturase/DNA endonuclease (Saccharomyces paradoxus) Intron-encoded endonuclease Endonuclease LAGLIDADG site-specific intron homing LAGLIDADG DNA endonuclease
20 Gasi_Mp41 LAGLIDADG endonuclease (Agaricus bisporus) Intron-encoded endonuclease Endonuclease Putative Homing DNA Site-Specific LAGLIDADG LAGLIDADG Orf LAGLIDADG DNA endonuclease
21 Gasi_Mp06 Probable intron-encoded LAGLIDADG... (Piriformospora indica (strain DSM 11827) Intron-encoded endonuclease LAGLIDADG site-specific intron homing LAGLIDADG DNA endonuclease
22 Gasi_Mp05 LAGLIDADG intron encoded protein (Nectria haematococca (strain 77-13-4) LAGLIDADG DNA endonuclease LAGLIDADG intron DNA Homing endonuclease, LAGLIDADG/HNH
23 Gasi_Mp27 LAGLIDADG endonuclease (Agrocybe aegerita) Intron-encoded endonuclease LAGLIDADG site-specific intron homing LAGLIDADG DNA endonuclease
24 Gasi_Mp44 LAGLIDADG endonuclease (Glomus sp. DAOM 240422) Intron-encoded endonuclease LAGLIDADG site-specific intron homing LAGLIDADG DNA endonuclease
25 Gasi_Mp38 LAGLIDADG endonuclease (Ajellomyces capsulatus (strain H88) LAGLIDADG homing endonuclease LAGLIDADG DNA uncharacterized intron cox1 Homing endonuclease, LAGLIDADG/HNH
26 Gasi_Mp36 LAGLIDADG intron encoded protein (Nectria haematococca (strain 77-13-4) LAGLIDADG homing endonuclease Intronic intron Homing endonuclease, LAGLIDADG/HNH
27 Gasi_Mp39 LAGLIDADG (Rhynchosporium secalis) LAGLIDADG homing endonuclease LAGLIDADG DNA uncharacterized intron cox1 Homing endonuclease, LAGLIDADG/HNH
28 Gasi_Mp26 LAGLIDADG superfamily LAGLIDADG homing endonuclease LAGLIDADG Homing endonuclease, LAGLIDADG/HNH
29 Gasi_Mp37 LAGLIDADG superfamily LAGLIDADG homing endonuclease LAGLIDADG Homing endonuclease, LAGLIDADG/HNH
30 Gasi_Mp22 LAGLIDADG superfamily LAGLIDADG homing endonuclease LAGLIDADG Homing endonuclease, LAGLIDADG/HNH
31 Gasi_Mp10 LAGLIDADG superfamily LAGLIDADG homing endonuclease LAGLIDADG Homing endonuclease, LAGLIDADG/HNH
32 Gasi_Mp35 LAGLIDADG superfamily LAGLIDADG homing endonuclease LAGLIDADG Homing endonuclease, LAGLIDADG/HNH
33 Gasi_Mp25 LAGLIDADG superfamily LAGLIDADG homing endonuclease LAGLIDADG Homing endonuclease, LAGLIDADG/HNH

Table 3: Functional prediction of HPs by BLASTp, HHpred, ProDom and BLOCK.

BLASTp, the protein-protein variant of the Basic Local Alignment Search Tool, is commonly used for searching protein sequences against the non-redundant protein sequence database (nr). For each HP queried, the search returned 100 homologs; proteins with query coverage below 50% or with low sequence identity, referred to as remote homologues, were excluded. In contrast, proteins with a sequence identity of 40% and an E-value of 0.005 formed the close homologs of the HPs. This search tool was also employed to check for structural homologs in the Protein Data Bank (PDB). HHpred, on the other hand, detects remote protein homology via pairwise comparison of profile hidden Markov models (HMMs), searching protein databases such as PDB, SCOP and CATH; it is also employed for detection of structural homologs. Whereas BLASTp determines the sequence identity between two protein sequences, PRALINE is used for multiple sequence comparisons.
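
The hit-filtering rule described above can be sketched as a small Python function. The cut-offs (50% query coverage, 40% identity, E-value 0.005) are the values quoted in the text; treating them as "at least" and "at most" thresholds is an assumption, and the Hit record, the function names and the example hits are hypothetical rather than output from the study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One database hit for a query HP (hypothetical record layout)."""
    subject: str
    identity: float  # percent identity over the aligned region
    coverage: float  # percent of the query covered by the alignment
    evalue: float


def classify_hit(hit, min_identity=40.0, min_coverage=50.0, max_evalue=0.005):
    """Label a hit 'close' or 'remote' using the cut-offs quoted in the text."""
    is_close = (
        hit.identity >= min_identity
        and hit.coverage >= min_coverage
        and hit.evalue <= max_evalue
    )
    return "close" if is_close else "remote"


def close_homologs(hits):
    """Keep only close homologs, lowest E-value first."""
    kept = [h for h in hits if classify_hit(h) == "close"]
    return sorted(kept, key=lambda h: h.evalue)


if __name__ == "__main__":
    # Invented example hits, not results from the study.
    hits = [
        Hit("subject_A", identity=62.0, coverage=88.0, evalue=3e-11),
        Hit("subject_B", identity=28.0, coverage=91.0, evalue=0.002),
        Hit("subject_C", identity=55.0, coverage=35.0, evalue=1e-6),
    ]
    for h in close_homologs(hits):
        print(h.subject, h.evalue)
```

Keeping the thresholds as keyword arguments makes it easy to test how sensitive the list of close homologs is to the chosen cut-offs.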

ProtoNet 6.1, an automatic hierarchical clustering of SWISS-PROT proteins, relies on sequence similarity as defined by BLAST. ProtoNet predicted superfamilies and subfamilies and supported large-scale protein annotation. Protein motif sequences are signatures of protein families and help predict the function of HPs, particularly the catalytic properties of enzymes. A motif, also known as a super-secondary structure of a protein, is useful for functional prediction and was analyzed using resources including InterProScan, PANTHER, Pfam, SMART, PROSITE and SUPERFAMILY. InterProScan scanned the HPs against different protein signature recognition methods and reported protein family membership, domains and repeats, biological process and molecular function. MOTIF was used to assign the probable role of the HPs utilizing Pfam and PROSITE (Table 4).
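
As an illustration of motif-based scanning of the kind PROSITE patterns support, the following Python sketch converts a pattern written in PROSITE syntax into a regular expression and reports where it occurs in a sequence. It is not the MOTIF or InterProScan implementation; the function names and the toy sequence are hypothetical, the N-glycosylation pattern N-{P}-[ST]-{P} is used only as a familiar example, and anchors written inside square brackets are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles x (any residue), [..] (allowed set), {..} (forbidden set),
    (n) and (n,m) repeats, and the < and > terminal anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_motif(sequence: str, pattern: str):
    """Return (1-based start, matched text) for non-overlapping occurrences."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "AANGSAA"  # hypothetical toy sequence
    print(find_motif(toy, "N-{P}-[ST]-{P}"))
```

Short patterns of this kind match frequently by chance, so a hit is a clue for follow-up rather than evidence of function on its own.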

Protein name InterProScan SUPERFAMILY CATH PANTHER MOTIF SVMProt CDART SMART ProtoNet 6.1
Galu_Mp10 No No No No HupE/UreJ protein (7..99) Pfam No No 4 transmembrane Cluster 1128119 Cluster Name: Bacteroides fragilis
Galu_Mp16 No No No No Bunyavirus glycoprotein G2 (309..371) Pfam Transmembrane No 3 low complexity and 4 transmembrane Cluster 1892543 Cluster Name: Pleurotus ostreatus
Galu_Mp17 No Prefoldin No No 4 1. Tubulin-beta mRNA autoregulation signal. MRDL (Motif)-PROSITE 2. Helper component proteinase (88..215) 3. Mer2 (56..178) 4. MetRS-N binding domain (92..184)-Pfam Transmembrane No 1 low complexity, 2 coiled coil and 2 transmembrane region Cluster 4127534 Cluster Name: SAFF domain. Muniscin C-terminal
Galu_Mp19 Homing endonuclease, LAGLIDADG Homing endonuclease Homing endonuclease No LAGLIDADG endonuclease (57..156) Pfam LAGLIDADG-like domain (56..138) Pfam Transferases Glycosyltransferase LAGLIDADG-like domain No Cluster 4142293 Homing endonuclease, LAGLIDADG/HNH
Galu_Mp20 No Hect, E3 ligase catalytic domain No results No Domain of unknown function (DUF4544) (161..210) Pfam CoA-binding domain (136..189) Pfam Pleurotus ostreatus No domain 5 transmembrane and 1 low complexity Cluster 3780223 Cluster Name: Mitochondrion
Galu_Mp21 DNA-directed RNA polymerase DNA/RNA polymerase   No DNA-dependent RNA polymerase (1..108) Pfam Oxidoreductases-Acting on a heme group of donors DNA-dependent RNA polymerase No Cluster 4111445 Cluster Name: DNA-directed RNA polymerase, bacteriophage type
Galu_Mp22 DNA-directed RNA polymerase DNA/RNA polymerase DNA directed RNA polymerase No DNA-directed RNA polymerase N-terminal (441..517) Pfam Transferases-Transferring Phosphorus-Containing Groups DNA-dependent RNA polymerase No Cluster 4111445 Cluster Name: DNA-directed RNA polymerase, bacteriophage type
Gasi_Mp30 GIY-YIG nuclease GIY-YIG endonuclease UvrABC system protein C-like domain UVRC/OXIDOREDUCTASE GIY_YIG (PROSITE and Pfam) Zinc-binding protein family GIY_YIG superfamily GIYc Cluster 3476869 Cluster Name: Nuclease-associated modular DNA-binding 1
Gasi_Mp34 GIY-YIG nuclease GIY-YIG endonuclease UvrABC system protein C-like domain No GIY-YIG (Pfam) All DNA-binding   GIYc IENR1 Cluster 3312983 Intron encoded nuclease
Gasi_Mp42 Introns endonuclease GIY-YIG superfamily GIY-YIG endonuclease DNA-binding domain of intron-encoded endonucleases UvrABC system protein C-like domain No PROSITE (GIY-YIG domain profile) Pfam (GIY-YIG catalytic domain) (NUMOD1 domain) Zinc-binding GIY-YIG nuclease domain superfamily GIY-YIG type nuclease Introns encoded nuclease repeat motif Cluster 4159271 Intron endonuclease, group I
Gasi_Mp09 Introns endonuclease I GIY-YIG superfamily Nuclease associated modular domain 3 GIY-YIG superfamily DNA-binding domain of intron-encoded endonucleases UvrABC system protein C-like domain No PROSITE (GIY-YIG domain profile) Pfam (NUMOD3 motif) GIY-YIG NUMOD1 HTH_17 HTH_23 HTH_24 HTH_psq HTH_Tnp_ISL3 Zinc-binding GIY-YIG nuclease domain superfamily GIY-YIG nuclease Introns encoded nuclease motif Cluster 4159271 Cluster Name: Intron endonuclease, group I
Gasi_Mp23 Intron endonuclease, group I GIY-YIG nuclease Superfamily Nuclease associated molecular domain DNA-binding domain of intron-encoded endonuclease GIY-YIG endonuclease UvrABC system protein C-like domain No Pfam NUMOD1 domain CENP-B N-terminal DNA-binding domain Outer membrane DNA condensation TC 3.A.5 Type II (general) secretory pathway (IISP) family TC 3.A.1 ATP-binding cassette (ABC) family EC 4.2.-.-Lyases-Carbon-Oxygen Lyases GIY-YIG superfamily NUMOD1 domain Introns encoded nuclease motif Cluster 3993847 Cluster Name: Cytochrome b/b6, N-terminal
Gasi_Mp11 Homing endonucleases LAGLIDADG Homing endonucleases Group I mobile intron endonuclease intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG DNA endonuclease family DNA repair Calcium-binding Magnesium-binding (Protein family) LAGLIDADG endonuclease family LAGLIDADG domain Cluster 4236360 Cluster Name: LAGLIDADG DNA endonuclease
Gasi_Mp33 Homing endonuclease LAGLIDADG Homing endonucleases Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG endonuclease Acyl phosphatase ASCH domain All lipid-binding protein Zinc-binding LAGLIDADG like domain LAGLIDADG domain 6 transmembrane domain Cluster 3940652 Cluster Name: Homing endonuclease, LAGLIDADG/HNH
Gasi_Mp32 Reverse transcriptase domain DNA/RNA polymerases Reverse transcriptase No COX1/OXI3 INTRON 1 PROTEIN-RELATED PROSITE RT_POL (Reverse transcriptase (RT) catalytic domain profile). Pfam Reverse transcriptase (RNA-dependent DNA polymerase) Type II intron maturase HNH endonuclease Zinc-binding Transmembrane Reverse transcriptase Type II intron maturase HNH nucleases Cluster 4092847 Cluster Name: Intron maturase, type II
Gasi_Mp31 Homing endonuclease LAGLIDADG/HNH Homing endonucleases Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG endonuclease Zinc-binding LAGLIDADG like domain LAGLIDADG Cluster 4271810 Cluster Name: Homing endonuclease, LAGLIDADG/HNH
Gasi_Mp40 Homing endonuclease LAGLIDADG/HNH Homing endonucleases Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG endonuclease Transmembrane LAGLIDADG domain No Cluster 4296638 Cluster Name: Homing endonuclease, LAGLIDADG/HNH
Gasi_Mp24 Homing endonuclease LAGLIDADG Homing endonucleases Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG DNA endonuclease family EC 1.9.-.-: Oxidoreductases-Acting on a heme group of donors LAGLIDADG endonuclease LAGLIDADG Cluster 4041029 Cluster Name: LAGLIDADG DNA endonuclease
Gasi_Mp41 Homing endonuclease LAGLIDADG Homing endonucleases Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG DNA endonuclease family Bacterial lipoate protein ligase C-terminus All lipid-binding proteins LAGLIDADG endonuclease LAGLIDADG Cluster 4041029 Cluster Name: LAGLIDADG DNA endonuclease
Gasi_Mp06 Homing endonuclease LAGLIDADG Homing endonucleases. Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No LAGLIDADG Zinc-binding LAGLIDADG endonuclease LAGLIDADG Cluster 3993847 Cluster Name: Cytochrome b/b6, N-terminal
Gasi_Mp05 LAGLIDADG intron encoded protein (Nectria haematococca (strain 77-13-4) Homing endonucleases. Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG endonuclease Zinc-binding LAGLIDADG endonuclease LAGLIDADG Cluster 4142293 Cluster Name: Homing endonuclease, LAGLIDADG/HNH
Gasi_Mp27 Homing endonucleases. LAGLIDADG Homing endonucleases Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG endonuclease Zinc-binding LAGLIDADG LAGLIDADG Cluster 4041029. Cluster Name: LAGLIDADG DNA endonuclease
Gasi_Mp44 Homing endonucleases LAGLIDADG Homing endonucleases Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG endonuclease EC 2.4-Transferases – Glycosyl transferases LAGLIDADG LAGLIDADG Cluster 3987550. Cluster Name: LAGLIDADG DNA endonuclease
Gasi_Mp38 Homing endonucleases LAGLIDADG Homing endonucleases Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG endonuclease EC 2.4-Transferases Glycosyl transferases LAGLIDADG LAGLIDADG Cluster 4396454 Cluster Name: Homing endonuclease, LAGLIDADG/HNH
Gasi_Mp36 Homing endonucleases LAGLIDADG Homing endonucleases Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG endonuclease DNA replication EC 3.1.-.-Hydrolases-Acting on Ester Bonds LAGLIDADG LAGLIDADG and SCOP domain Cluster 4396454. Cluster Name: Homing endonuclease, LAGLIDADG/HNH
Gasi_Mp39 Homing endonucleases LAGLIDADG Homing endonucleases Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG endonuclease EC 2.4.-.-: Transferases – Glycosyl transferases LAGLIDADG LAGLIDADG Cluster 4089688 Cluster Name: Homing endonuclease, LAGLIDADG/HNH
Gasi_Mp26 Homing endonucleases LAGLIDADG Homing endonucleases Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG endonuclease EC 3.1.-.-: Hydrolases-Acting on Ester Bonds LAGLIDADG LAGLIDADG Cluster 3993847 Cluster Name: Cytochrome b/b6, N-terminal
Gasi_Mp37 Homing endonucleases LAGLIDADG Homing endonucleases Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG endonuclease. Phenazine biosynthesis protein A/B EC 2.4.-.-: Transferases – Glycosyl transferases LAGLIDADG LAGLIDADG Cluster 4376345 Cluster Name: Homing endonuclease, LAGLIDADG/HNH
Gasi_Mp22 Homing endonucleases LAGLIDADG Homing endonucleases. Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG endonuclease EC 2.4.-.-: Transferases – Glycosyl transferases LAGLIDADG LAGLIDADG Cluster 4142293 Cluster Name: Homing endonuclease, LAGLIDADG/HNH
Gasi_Mp10 Homing endonucleases LAGLIDADG Homing endonucleases Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No PROSITE: Serine proteases, subtilase family, histidine active site. Pfam LAGLIDADG endonuclease Transmembrane LAGLIDADG LAGLIDADG Cluster 4142293 Cluster Name: Homing endonuclease, LAGLIDADG/HNH
Gasi_Mp35 Homing endonucleases LAGLIDADG Homing endonucleases. Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG endonuclease EC 3.1.-Hydrolases-Acting on Ester Bonds EC 1.1.-Oxidoreductases Acting on the CH-OH group of donors LAGLIDADG LAGLIDADG Cluster 3913259 Cluster Name: Homing endonuclease, LAGLIDADG/HNH.
Gasi_Mp25 Homing endonucleases LAGLIDADG Homing endonucleases. Group I mobile intron endonuclease Intron-encoded DNA endonuclease aI3-like domain 1/2 No Pfam LAGLIDADG endonuclease Zinc-binding LAGLIDADG LAGLIDADG Cluster 2938885 Cluster Name: Homing endonuclease, LAGLIDADG/HNH

Table 4: List of different bioinformatics tools used to annotate domains of HPs in Ganoderma.

Function prediction

Functional prediction for the hypothetical proteins of Ganoderma retrieved from UniProt was carried out with the help of various freely available databases such as Pfam, SUPERFAMILY [34], CATH, PANTHER [35], SYSTERS [36], SVMProt [37], CDART [38], SMART [39] and ProtoNet [40]. BLASTp was used for searching the SYSTERS database, and the output was obtained in the form of clusters of functionally related proteins. The cluster sequences with an E-value of 0.005 were classified as HP (Table 5). The SUPERFAMILY database provided structural and functional annotation based on hidden Markov models together with evolutionary relationships. CATH provided the hierarchical classification of protein domains into class, architecture, topology and homologous superfamily. Hierarchical classification of the protein sequences into superfamily and subfamily clusters was achieved by SYSTERS (SYSTEmatic Re-Searching) with the backing of BLASTp. SVMProt predicted the functional family of a protein using SVM-based classification [37], determining function from the primary structure, and also classified distantly related proteins and homologous proteins of different function [39]. CDART searches the NCBI Entrez Protein Database based on domain architecture. It found similarities between proteins across significant evolutionary distances by utilizing sensitive protein domain profiles rather than direct sequence similarity, and additionally provided information about the domains [38]. SMART (Simple Modular Architecture Research Tool), based on profile hidden Markov models, determined domains in protein sequences. Data from SMART are used in the Conserved Domain Database collection and are also available as part of the InterPro database.
PANTHER is a (Protein Analysis through Evolutionary Relationships) comprehensively organized database that analyzed protein families and subfamilies. PANTHER 7.0 covers the whole genome of 48 organisms and gives the phylogenetics relationship of genes and families. Depending on Hidden Markov Models (HMM), various hits with an E-value less than 1e-3 acutely defined the functionality of HPs.
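As a minimal sketch of the E-value screening described above, the following Python snippet keeps only hits below a cutoff. The values are the first SYSTERS E-values listed for a few proteins in Table 5; the function name filter_hits and the reuse of the 1e-3 cutoff on these SYSTERS values are illustrative assumptions, not the authors' actual pipeline.

```python
# Illustrative E-value screening; not the authors' actual pipeline.
# First SYSTERS E-value listed for a few HPs, copied from Table 5.
hits = {
    "Galu_Mp10": 0.002,
    "Galu_Mp16": 0.85,
    "Galu_Mp19": 3.00e-11,
    "Galu_Mp20": 4.3,
    "Gasi_Mp30": 2.00e-11,
    "Gasi_Mp32": 1.00e-14,
}


def filter_hits(evalues, cutoff=1e-3):
    """Return protein names with E-value strictly below cutoff, best hit first."""
    kept = {name: e for name, e in evalues.items() if e < cutoff}
    return sorted(kept, key=kept.get)


significant = filter_hits(hits)
print(significant)
```

Raising the cutoff to 0.005, the other threshold mentioned in the text, would additionally retain Galu_Mp10, which shows how sensitive the set of annotated proteins is to the chosen threshold.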

Protein name E-value SYSTERS Organism Gene Ontology STRING
Galu_Mp10 0.002 121514 Pleurotus ostreatus (HPs) Extra chromosomal DNA (plasmid) No
    Schizophyllum commune (Orf214) 
0.024 73717 Amsacta moorei entomopoxvirus (AMV015)
Galu_Mp16 0.85 128944 Bacteroides fragilis (Integrase) DNA recombination No
Bacteroides thetaiotaomicron (Putative Integrase)
Bacteroides uniformis (Integrase IntN1)
Bacteroides uniformis (IntN1)
Galu_Mp17 0.056 115279 Fusobacterium nucleatum subsp. nucleatum (ABC transporter ATP-binding protein) Thyroid No
Nostoc sp. PCC 7120 (Hypothetical protein Alr4911)
Galu_Mp19 3.00E-11 138807 Podospora anserina endonuclease activity No
Galu_Mp20 4.3 146675 Drosophila melanogaster defense/immunity protein activity COX1-i1 protein
Galu_Mp21 0.033 86244 Borrelia burgdorferi (Hypothetical protein BB0399) ATP binding No
   
   
    Clostridium acetobutylicum (Glycosyltransferase domain containing protein)
0.36 106851   nucleotide binding
0.47 101650   transferase activity
Galu_Mp22 3.00E-05 85579 Spizellomyces punctatus (Orf361) DNA-directed RNA polymerase activity No
    Plasmodium falciparum (RpoD protein) 
0.23 106486  
     
Gasi_Mp30 2.00E-11 139982 Agrocybe aegerita Cellular component Mitochondria Yes
  COX2: cytochrome c oxidase subunit 2
Hypocrea jecorina
Gasi_Mp34 (Ganoderma sinense) 7.00E-10 114742 Podospora anserina intron homing No
 
Trimorphomyces papilionaceus DNA binding
  endonuclease activity
Gasi_Mp42 (Ganoderma sinense) 2.00E-20 139982 Agrocybe aegerita DNA metabolic process atp9 GIY endonuclease (276 aa)
    (Gibberella zeae)
Hypocrea jecorina Catalytic activity
  Excinuclease ABC, C subunit, N-terminal
  GIY-YIG endonuclease
Gasi_Mp09 (Ganoderma sinense) 2.00E-20 139982 Agrocybe aegerita DNA metabolic process atp9 GIY endonuclease (276 aa)
    (Gibberella zeae)
Hypocrea jecorina Catalytic activity
  Excinuclease ABC, C subunit, N-terminal
  GIY-YIG endonuclease
Gasi_Mp23 (Ganoderma sinense) 6.00E-05 139982 Agrocybe aegerita Cellular metabolic process CYT1:
    C terminal fragment of CaP19.3527, likely cytochrome C1 (288 aa)
Podospora anserina
Hypocrea jecorina Catalytic activity
Gasi_Mp11 (Ganoderma sinense) 6.00E-18 102515 Allomyces macrogynus Catalytic activity COX3: Cytochrome c oxidase subunit 3 (EC 1.9.3.1) (Cytochrome c oxidase polypeptide III); Subunits I, (269 aa)
Gasi_Mp33 2.00E-25 103988 Monosiga brevicollis DNA metabolic process SPMIT.02:
DNA binding Uncharacterized cox1 intron-1 45.6 kDa protein (384 aa)
Gasi_Mp32 1.00E-14 138811 Arabidopsis thaliana DNA metabolic process SPMIT.06: Uncharacterized cox1 intron-2 37.2 kDa protein (323 aa)
 RNA-directed DNA polymerase (reverse transcriptase)
Gasi_Mp31 4.00E-37 138807 Emericella nidulans DNA metabolic process eugene3.75340001
    hypothetical protein (277 aa)
Podospora anserina DNA binding (Populus trichocarpa)
Gasi_Mp40 1.00E-51  138808 Allomyces macrogynus DNA metabolic process cox3-i3: COX3-i1 (185 aa)
Neurospora crassa DNA binding
Gasi_Mp24 2.00E-13 139169 ·Saccharomyces cerevisiae Cell part bI2
Cytochrome b mRNA maturase bI2
 Catalytic activity   
Gasi_Mp41 4.00E-17 139170 Chlamydomonas humicola Cell part AI5_ALPHA
Intron-encoded DNA endonuclease aI5 alpha precursor (DNA endonuclease I-SceIV) 
Catalytic activity  
Gasi_Mp06 1.00E-19 101961 Rhizophydium sp. 136 Cellular metabolic process BI2: Cytochrome b mRNA maturase bI2
Catalytic activity
Gasi_Mp05 2.00E-27 139538 Podospora anserina  DNA metabolic process nad3
  Laglidadg endonuclease (423 aa)
DNA binding  
Gasi_Mp27 3.00E-11 102515 Allomyces macrogynus Cell part AI5_ALPHA
Catalytic activity Intron-encoded DNA endonuclease aI5 alpha precursor (DNA endonuclease I-SceIV) 
Gasi_Mp44 2.00E-20 102515 Allomyces macrogynus Cell part AI5_ALPHA
 Catalytic activity Intron-encoded DNA endonuclease aI5 alpha precursor (DNA endonuclease I-SceIV) 
Gasi_Mp38 2.00E-78 138828 Schizosaccharomyces japonicus DNA metabolic process SPMIT.03: Uncharacterized cox1 intron-2 37.2 kDa protein (323 aa)
(Schizosaccharomyces pombe)
DNA binding
Gasi_Mp36 3.00E-61 138807 Podospora anserina DNA metabolic process gw1.8046.2.1
Hypothetical protein (229 aa)
DNA binding (Populus trichocarpa)
Gasi_Mp39 2.00E-06 138828 Schizosaccharomyces japonicus DNA metabolic process nd1-i2
ND1-i2 protein (722 aa)
DNA binding (Yarrowia lipolytica)
Gasi_Mp26 1.00E-62 138820 Emericella nidulans Cellular metabolic process BI3: Cytochrome b mRNA maturase bI3
Catalytic activity
Gasi_Mp37 1.00E-12 138819 Podospora anserina DNA metabolic process AI4: Intron-encoded DNA endonuclease aI4 precursor (DNA endonuclease I-SceII)
DNA binding
Gasi_Mp22 3.00E-16 139380 Chlamydomonas frankii DNA metabolic process nad4L: Laglidadg endonuclease (Gibberella zeae)
DNA binding
Gasi_Mp10 4.00E-22 139538 Podospora anserina DNA metabolic process nad3: Laglidadg endonuclease (Gibberella zeae)
DNA binding
Gasi_Mp35 2.00E-45 138807 Podospora anserina DNA metabolic process cox1-i2: COX1-i2 protein
  (Yarrowia lipolytica)
DNA binding
Gasi_Mp25 3.00E-19 104255 Rhizophydium sp. 136 DNA metabolic process bI3: Cytochrome b mRNA maturase bI3 (Debaryomyces hansenii)
DNA binding

Table 5: Functional prediction of HPs in SYSTERS, InterPro, Gene Ontology, SUPERFAMILY and STRING.

Associative network among HPs in Ganoderma

Proteins interact within networks that modulate the functionality of related proteins. Associative networking among proteins is yet to be explored completely. In addition, experimental validation is necessary owing to the crucial role of interacting partners that participate in multimeric complexes, which link a protein to its function [41]. The STRING database depicts interactions on the basis of physical, functional, experimental, and co-expression evidence. It also furnishes information about the metabolic or epigenetic associative networks related to these proteins. In the present work, STRING 9.1 was used to predict the associative network of HPs in Ganoderma, based on parameters comprising neighborhood, co-occurrence, co-expression, experiments, databases, and homology of the predicted partners of the HPs (Table 5).

Results and Discussion

The availability of commercial Ganoderma products attests to its medicinal value. Ganoderma protein sequences, retrieved from NCBI and UniProt, were functionally annotated to gain a deeper understanding of its proteomics. In the present study, 33 HPs of the genus Ganoderma were investigated and characterized using bioinformatics tools such as BLAST, HHpred, Pfam, PANTHER, CATH, CDART, and SVMProt. Tools such as InterProScan, BLOCK, and MOTIF were employed to discover functional motifs in the HPs (Tables 3 and 6).

S.No. Protein name Species UniProt ID Protein function
1 Galu_Mp10 Ganoderma lucidum S4UWG8 Piscicolin 126 immunity like protein
2 Galu_Mp16 Ganoderma lucidum S4UVR8 Integrase family
3 Galu_Mp17 Ganoderma lucidum S4UWD1 Prefoldin
4 Galu_Mp19 Ganoderma lucidum S4UU33 DNA endonuclease
5 Galu_Mp20 Ganoderma lucidum S4UUK5 NADH dehydrogenase
6 Galu_Mp21 Ganoderma lucidum S4UWH0 DNA-directed RNA polymerase
7 Galu_Mp22 Ganoderma lucidum S4UVS2 DNA-directed RNA polymerase
8 Gasi_Mp30 Ganoderma sinense V5KV85 GIY-YIG endonuclease
9 Gasi_Mp34 Ganoderma sinense V5KVR6 GIY-YIG endonuclease
10 Gasi_Mp42 Ganoderma sinense V5KVQ5 GIY-YIG endonuclease
11 Gasi_Mp09 Ganoderma sinense V5KVM5 GIY-YIG endonuclease
12 Gasi_Mp23 Ganoderma sinense V5KWR6 GIY-YIG endonuclease
13 Gasi_Mp21 Ganoderma sinense V5KVM0 Laglidadg homing endonuclease
14 Gasi_Mp11 Ganoderma sinense V5KVL3 Laglidadg homing endonuclease
15 Gasi_Mp33 Ganoderma sinense V5KWS5 Laglidadg homing endonuclease
16 Gasi_Mp32 Ganoderma sinense V5KVP5 Reverse transcriptase
17 Gasi_Mp31 Ganoderma sinense V5KVN0 Laglidadg homing endonuclease
18 Gasi_Mp40 Ganoderma sinense V5KV87 Laglidadg homing endonuclease
19 Gasi_Mp24 Ganoderma sinense V5KVQ8 Laglidadg homing endonuclease
20 Gasi_Mp41 Ganoderma sinense V5KVN9 Laglidadg homing endonuclease
21 Gasi_Mp06 Ganoderma sinense V5KVP6 Laglidadg homing endonuclease
22 Gasi_Mp05 Ganoderma sinense V5KWQ3 Laglidadg homing endonuclease
23 Gasi_Mp27 Ganoderma sinense V5KVP1 Laglidadg homing endonuclease
24 Gasi_Mp44 Ganoderma sinense V5KVS4 Laglidadg homing endonuclease
25 Gasi_Mp38 Ganoderma sinense V5KWS9 Laglidadg homing endonuclease
26 Gasi_Mp36 Ganoderma sinense V5KVN5 Laglidadg homing endonuclease
27 Gasi_Mp39 Ganoderma sinense V5KVS0 Laglidadg homing endonuclease
28 Gasi_Mp26 Ganoderma sinense V5KVM6 Laglidadg homing endonuclease
29 Gasi_Mp37 Ganoderma sinense V5KVQ1 Laglidadg homing endonuclease
30 Gasi_Mp22 Ganoderma sinense V5KVN8 Laglidadg homing endonuclease
31 Gasi_Mp10 Ganoderma sinense V5KV81 Laglidadg homing endonuclease
32 Gasi_Mp35 Ganoderma sinense V5KV86 Laglidadg homing endonuclease
33 Gasi_Mp25 Ganoderma sinense V5KV84 Laglidadg homing endonuclease

Table 6: List of HPs with gene and UNIPROT ID, SVM Prot with annotated function of HPs in Ganoderma lucidum.
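The distribution of annotated functions in Table 6 can be tallied with a short Python sketch. The list below transcribes the protein function column row by row; the variable names are illustrative and the snippet is only a reading aid, not part of the authors' workflow.

```python
from collections import Counter

# Annotated functions transcribed from Table 6 (rows 1-33).
functions = (
    [
        "Piscicolin 126 immunity like protein",  # row 1
        "Integrase family",                      # row 2
        "Prefoldin",                             # row 3
        "DNA endonuclease",                      # row 4
        "NADH dehydrogenase",                    # row 5
    ]
    + ["DNA-directed RNA polymerase"] * 2        # rows 6-7
    + ["GIY-YIG endonuclease"] * 5               # rows 8-12
    + ["Laglidadg homing endonuclease"] * 3      # rows 13-15
    + ["Reverse transcriptase"]                  # row 16
    + ["Laglidadg homing endonuclease"] * 17     # rows 17-33
)

counts = Counter(functions)
for name, n in counts.most_common():
    print(f"{n:2d}  {name}")
```

The tally makes the skew of the annotation explicit: 20 of the 33 HPs are assigned to the LAGLIDADG homing endonuclease family and 5 to the GIY-YIG endonuclease family.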

The physicochemical properties of all 33 hypothetical proteins were determined by the ExPASy-ProtParam software (Table 2).

MOTIF predicted that Galu_Mp10 contains a HupE/UreJ motif, and HHpred found it to be similar to piscicolin 126 immunity proteins. TMHMM and HMMTOP gave indications of four transmembrane proteins in the sequences, which may contribute to the adaptability of Ganoderma to different environmental conditions. Gene Ontology, on the other hand, associated it with extrachromosomal plasmid DNA, supporting its relevance to immunological function [5]. In addition, the BLOCK database, which is based on multiple alignments of conserved regions and aids in the detection and verification of protein sequence homology, matched Galu_Mp10 to the fungal pheromone STE3 GPCR.

Galu_Mp16 was predicted to be another stable HP on the basis of sequence similarity and homology detected by BLASTp and HHpred. Subsequent analysis in MOTIF, using Pfam annotations and multiple sequence alignments, revealed a motif similar to Bunyavirus glycoprotein G2 (309.371). The SYSTERS database, based on hierarchical partitioning, and InterPro (IPR) found Galu_Mp16 to be similar in sequence to integrase, whereas Gene Ontology indicated a DNA recombination function. The STRING database found integrase to be present in the superfamily of Galu_Mp16.

HHpred suggested that Galu_Mp17 has a prefoldin function, which is well documented in the literature for its fundamental role in protein folding. Prefoldins exhibit protein-folding activity by acting synergistically with other proteins. SYSTERS found Galu_Mp17 to be similar to SMC (Structural Maintenance of Chromosomes) proteins, an ATPase family with a role in DNA recombination and repair. BLOCK further related it to the thyroid hormone-inducible hepatic protein Spot 14, which regulates lipogenesis, especially the synthesis of fatty acids [8,9]. Motif searches in PROSITE, a database of protein domains, families, and functional sites, predicted the hypothetical protein to act as a tubulin-beta mRNA autoregulation signal, helper component proteinase, and Mer2, whereas the Pfam database, based on multiple sequence alignments, predicted a MetRS-N binding domain.

BLASTp and HHpred predicted Galu_Mp19 to be a LAGLIDADG homing endonuclease [42]. The ProDom database, an automatic compilation of homologous domains, also identified Galu_Mp19 as a LAGLIDADG endonuclease, which was confirmed by BLOCK. InterProScan identified it as LAGLIDADG based on its prediction of protein families, domains, and functional sites. SUPERFAMILY assigned its evolutionary domain as an endonuclease, which was later verified by the CATH database. MOTIF, through Pfam, annotated the protein as LAGLIDADG. SVMProt, which predicts the functional family from the primary structure and can classify homologous proteins and distantly related proteins of different function, predicted it to be a glycosyltransferase. CDART and ProtoNet 6.1 predicted and confirmed a LAGLIDADG-like homing endonuclease domain. Gene Ontology predicted Galu_Mp19 to be a LAGLIDADG homing endonuclease, a rare-cutting enzyme encoded by introns and inteins. Homing endonucleases are highly invasive elements that promote recombination by introducing double-strand breaks and facilitating the repair system [42]. Despite all this, STRING did not show any interaction with other proteins.
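One simple way to summarize agreement across tools is a majority tally. The sketch below encodes the per-tool calls for Galu_Mp19 as reported in the paragraph above; the labels are simplified, and the voting scheme itself is an assumption made for illustration, not a procedure the authors describe.

```python
from collections import Counter

# Per-tool calls for Galu_Mp19, simplified from the text (labels are illustrative).
calls = {
    "BLASTp": "LAGLIDADG homing endonuclease",
    "HHpred": "LAGLIDADG homing endonuclease",
    "ProDom": "LAGLIDADG homing endonuclease",
    "BLOCK": "LAGLIDADG homing endonuclease",
    "InterProScan": "LAGLIDADG homing endonuclease",
    "Pfam (MOTIF)": "LAGLIDADG homing endonuclease",
    "CDART": "LAGLIDADG homing endonuclease",
    "ProtoNet": "LAGLIDADG homing endonuclease",
    "Gene Ontology": "LAGLIDADG homing endonuclease",
    "SUPERFAMILY": "endonuclease",
    "CATH": "endonuclease",
    "SVMProt": "glycosyltransferase",
}


def consensus(tool_calls):
    """Return the most frequent label and the fraction of tools that support it."""
    tally = Counter(tool_calls.values())
    label, votes = tally.most_common(1)[0]
    return label, votes / len(tool_calls)


label, support = consensus(calls)
print(f"{label}: {support:.0%} of tools")
```

A tally like this treats every tool as equally reliable, which is a simplification; the SVMProt call for this protein is the single outlier, while SUPERFAMILY and CATH agree at the coarser level of endonuclease.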

Galu_Mp20 is another hypothetical protein, predicted to have NADH dehydrogenase function. NADH dehydrogenase is a key enzyme for oxidative phosphorylation in the mitochondria and plays a crucial role in triggering apoptosis, in addition to correlating mitochondrial activities with programmed cell death [10]. BLASTp found Galu_Mp20 to possess NADH dehydrogenase activity by sequence similarity, while the conserved sequences in BLOCK related this protein to the flagellar basal body-associated protein FliL. SUPERFAMILY predicted it to possess a Hect, E3 ligase catalytic domain, whereas the InterPro (IPR) prediction of protein families, domains, and functional sites revealed it to be lipoprotein, type 6 and Ferritin/ribonucleotide reductase-like. Gene Ontology recognized Galu_Mp20 as exhibiting defense/immunity protein activity.

Similarly, the hypothetical protein Galu_Mp21 was found to possess DNA-dependent RNA polymerase-like activity, as indicated by BLASTp and HHpred. ProDom found DNA-directed RNA polymerase, whereas conserved sequences in the BLOCK database revealed a frizzled protein signature, further confirmed by the InterProScan and SUPERFAMILY prediction tools. MOTIF, through Pfam, revealed the N-terminal of the DNA-directed RNA polymerase, whereas SVMProt classified the primary structure as transferring phosphorus-containing groups. CDART and ProtoNet 6.1, which determine similarities between proteins across significant evolutionary distances, found Galu_Mp21 to have DNA-dependent RNA polymerase-like activity. Gene Ontology assigned it ATP binding, nucleotide binding, and transferase activity, while STRING revealed no interaction with other proteins. Importantly, RNA polymerase controls gene transcription and improves adaptability to different sorts of stress. From these results, it can be surmised that these genes can modulate certain enzymes and their expression, thus assisting in the acclimatization and adaptation of the fungus to a particular environment. It also plays a role in telomerase and RNA silencing, which may be a target for designing various therapeutic agents.

BLASTp found Gasi_Mp30 to be similar to GIY-YIG endonuclease, whereas HHpred homology detection found it to resemble UvrABC system protein C. ProDom analyzed it as the metal-binding iron heme domain of the oxidase transmembrane protein located on the mitochondrial membrane, whereas conserved sequence analysis in BLOCK indicated a likeness to the N-terminal of the C subunit of excinuclease ABC. InterProScan and SUPERFAMILY predicted the protein families, domains, and functional sites as GIY-YIG endonuclease. CATH and PANTHER found a domain similar to UvrABC system protein C, while SVMProt deciphered it as a member of the zinc-binding protein family. CDART and SMART predicted the domain, further confirming the HP to be similar to GIY_YIG, whereas ProtoNet 6.1 showed its similarity with nuclease-associated modular DNA-binding 1. Gene Ontology assigned a cellular component term associated with mitochondria, whereas the STRING database showed an interaction with COX2, cytochrome c oxidase subunit 2.

Gasi_Mp34 was also found by BLASTp to be similar to GIY-YIG endonuclease, whereas HHpred homology detection found it to be similar to UvrABC system protein C. ProDom analyzed it as endonuclease intron-encoded hydrolase. Here too, conserved sequence analysis in BLOCK detected a likeness to the N-terminal of the C subunit of excinuclease ABC. InterProScan and SUPERFAMILY predicted the protein families, domains, and functional sites as GIY-YIG endonuclease. The GIY-YIG motif was confirmed by MOTIF, whereas SVMProt classified Gasi_Mp34 as all DNA-binding. SMART predicted the domain and upheld its identification as GIY_YIG. ProtoNet 6.1 hierarchically classified it as an intron-encoded nuclease. Gasi_Mp34 was observed to have no interaction with other proteins.

BLASTp showed the hypothetical protein Gasi_Mp23 to be similar to the GIY Cytb i2 grp ID protein, and HHpred further confirmed it as a GIY-YIG endonuclease. ProDom analyzed it as mitochondrial endonuclease GIY-YIG, whereas BLOCK indicated Intron-encoded nuclease two domains. InterProScan and SUPERFAMILY predicted it as GIY-YIG endonuclease. MOTIF predicted a NUMOD1 domain and SMART found an intron-encoded nuclease motif, while ProtoNet 6.1 reported Cytochrome b/b6, N-terminal. Gene Ontology found it to have catalytic activity, whereas STRING defined and characterized CYT1, a C-terminal fragment of CaP19.3527, indicating it to be a likely cytochrome C1.

Other hypothetical proteins, Gasi_Mp09 and Gasi_Mp42, were annotated as GIY-YIG endonucleases on the basis of different parameters (Table 2). It is interesting to note that nucleases of the GIY-YIG domain superfamily have been implicated in many cellular processes, including DNA repair and recombination, transfer of mobile genetic elements, and restriction of incoming foreign DNA. The GIY-YIG domain forms a compact structural domain that serves as a scaffold for the coordination of a divalent metal ion required for catalysis of phosphodiester bond cleavage.

Another hypothetical protein, Gasi_Mp32, was predicted to be a reverse transcriptase, a key enzyme in antiviral drugs and insulin production. Initially, BLASTp predicted it as an RNA-directed DNA polymerase, and HHpred detected it to be a telomerase reverse transcriptase. The ProDom database, an automatic compilation of homologous domains, found RNA-directed transcriptase, which BLOCK further confirmed as telomerase. InterProScan identified it as reverse transcriptase based on its prediction of protein families, domains, and functional sites. SUPERFAMILY assigned its evolutionary domain as reverse transcriptase, which was also verified by the PANTHER database. MOTIF, through Pfam, annotated the protein as reverse transcriptase. SVMProt, which predicts the functional family from the primary structure and can classify distantly related proteins and homologous proteins of different function, predicted it as zinc binding. CDART, SMART, and ProtoNet 6.1 predicted and confirmed this hypothetical protein to be a reverse transcriptase.

Homing endonuclease

In addition to the hypothetical proteins discussed above, others including Gasi_Mp21, Gasi_Mp11, Gasi_Mp33, Gasi_Mp31, Gasi_Mp40, Gasi_Mp24, Gasi_Mp41, Gasi_Mp06, Gasi_Mp05, Gasi_Mp27, Gasi_Mp44, Gasi_Mp25, Gasi_Mp35, Gasi_Mp10, Gasi_Mp22, Gasi_Mp37, Gasi_Mp39, Gasi_Mp26, Gasi_Mp36 and Gasi_Mp38 were analyzed with respect to their molecular weight, theoretical pI, aliphatic index, and hydropathicity (Table 2). Taken together, the information obtained indicated that they behave as LAGLIDADG endonucleases. BLASTp, HHpred, ProDom, BLOCK, and various other bioinformatics tools predicted their function as LAGLIDADG with high confidence. Homing endonucleases are encoded by open reading frames within self-splicing introns, or occur as independently folded domains within self-splicing inteins, and facilitate self-propagation [43]. They promote the homing of their respective genetic elements into allelic intronless and inteinless sites, thus playing a vital role in recombination. The LAGLIDADG motif plays a crucial role in protein folding, dimerization or interdomain packing, and catalysis [42]. As rare-cutting endonucleases, homing endonucleases play a pivotal role in genome analysis, gene manipulation, cloning, recombination events, double-strand break repair, and transposition, upholding chromosomal integrity and viability. Importantly, endonucleases play a pivotal role in DNA repair, and even a small deviation from normal functioning may lead to the genesis of anomalies.


Conclusion

Prediction and annotation of the functions of hypothetical proteins form an indispensable part of bioinformatics and proteomics. Annotating functions to uncharacterized proteins can help in understanding the mechanisms fundamental to the adaptation of this fungus to various stress conditions. The literature has repeatedly emphasized the importance of unexplored proteins, which motivated us to annotate functions to the 33 hypothetical proteins, carried out here for the first time. This endeavor reached fruition with the assistance of various in silico tools, which helped in understanding the decisive parameters and characteristics that shape protein function. Some determining features of the proteins are also discussed. Among the 33 hypothetical proteins characterized, all were predicted with high confidence except Galu_Mp10 and Galu_Mp16, mainly owing to the absence of sufficient data. The information thus obtained reaffirms the versatility and multifaceted nature of fungal proteins. Delving deeper into the revealed functions may help in understanding the regulation of various signaling pathways modulating the cell cycle, providing a clearer view for medical interventions. Last but not least, exploring the functional nature of these proteins may provide a platform for charting out effective therapies and designing drugs. In-depth predictions and functional annotation of the HPs, if and when carried out, would assist in understanding the nuances of proteomics better.

Acknowledgement

The authors thank Central University of Punjab, Bathinda, for providing the necessary facilities to carry out the present work.

References
