Identification of possible properties of an unknown protein from Arabidopsis using bioinformatics methods

Document Type: Research Paper

Authors

Biotechnology Department, Agriculture Faculty, Azarbaijan Shahid Madani University, Km 35 Tabriz-Azarshahr Road, Tabriz, Iran.

Abstract

The genomes of many organisms have now been sequenced, and this information helps in understanding their functions and characteristics. Much of it remains unprocessed, however, and bioinformatics tools make it possible to use these data to study genes and proteins of unknown function. In this research, the sequence of a gene of unknown function from Arabidopsis thaliana (NCBI accession number X91953.1) was used to investigate its structure and possible function. The gene is located on chromosome 1 of Arabidopsis thaliana; it is 676 base pairs long and encodes a protein of 150 amino acids with a molecular weight of approximately 15 kDa. Both the gene and the protein sequences were characterized using bioinformatics servers. The gene contains 18 types of regulatory motifs; the functions of some of these are known and relate to the response to light and to cis-acting elements for expression in the meristem. The analyses showed that the protein has 38 motifs, three of which are conserved with high frequency. The protein carries a signal peptide at its N-terminus and is secreted into the extracellular space, so it is more likely to be found in the intercellular space than in the nucleus or intracellular organelles. Its transcript also contains a regulatory site for a microRNA that is active in response to salinity and in the embryo. This unknown protein shares about 90% homology with another Arabidopsis protein listed as UPF0540 (At1g62000), which can be used in further studies to identify the role of the protein studied here. The protein is expressed in 10 different tissues, mainly in the embryo and seed endosperm. Based on all the analyses carried out, two functions can be predicted for this protein: seed coat differentiation and the light-induced biosynthesis of secreted substances. As a continuation of this work, laboratory methods are recommended for testing the functions attributed to this gene.
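
The reported size of the protein (150 amino acids, approximately 15 kDa) can be sanity-checked with simple arithmetic. The sketch below is illustrative only and is not the authors' method: it sums standard average residue masses and adds one water molecule. The function names and the toy sequence are hypothetical, and the real sequence behind accession X91953.1 is not reproduced here. A common rule of thumb of roughly 110 Da per residue gives about 16.5 kDa for 150 residues, which is of the same order as the reported value; the exact figure depends on amino acid composition.

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Return the average molecular weight (Da) of an unmodified peptide chain."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("empty sequence")
    try:
        residue_total = sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence)
    except KeyError as exc:
        raise ValueError(f"unknown residue: {exc.args[0]}") from None
    return residue_total + WATER_MASS


def rule_of_thumb_kda(n_residues: int, mean_residue_da: float = 110.0) -> float:
    """Rough estimate in kDa, assuming ~110 Da per residue (an approximation)."""
    return n_residues * mean_residue_da / 1000.0


if __name__ == "__main__":
    toy_sequence = "MKTAYIAKQR"  # hypothetical sequence, NOT the protein from the study
    print(f"toy peptide: {molecular_weight(toy_sequence):.1f} Da")
    print(f"150 residues, rule of thumb: {rule_of_thumb_kda(150):.1f} kDa")
```

Because the protein is reported to carry an N-terminal signal peptide, the mature secreted form would be somewhat lighter than the full-length estimate.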
