With the emergence of big data in the post-genomic age, an enormous amount of sequence data has been generated, which requires efficient computational methods for rapid and effective identification of the biological features contained in sequences. This is even more pressing in proteomics, because protein sequences are more complex than nucleotide sequences: each position can hold one of 20 amino acids rather than one of 4 nucleic bases. The complexity and information content therefore expand exponentially as polypeptides are formed. For example, a sequence consisting of only 50 amino acid residues gives rise to $20^{50}$ possible sequence-order combinations, that is, $20^{50} = 10^{50\log_{10} 20} > 1.1258 \times 10^{65}$. For such an astronomical number, it is impracticable to construct a reasonable benchmark dataset that statistically covers all possible sequence-order information. In addition, protein sequences vary widely in length, which makes it harder to incorporate sequence-order information consistently in both dataset construction and algorithm formulation. Working with such extremely high-dimensional feature spaces can cause over-fitting, strain computational resources, and increase information redundancy, all of which degrade prediction accuracy.

To address this problem, we present a convenient approach based on the idea of pseudo reduced amino acid composition (PseRAAC), and provide a flexible and user-friendly web server for pseudo K-tuple reduced amino acids composition (PseKRAAC), where users can easily generate many different modes of PseKRAAC tailored to their needs by selecting various reduced amino acid alphabets and other characteristic parameters.

Cite and Contact:
Zuo YC, Li Y, Chen YL, Li GP, Yan ZH, Yang L. PseKRAAC: A flexible web server for generating pseudo K-tuple reduced amino acids composition. Bioinformatics. 2017, 33(1):122-124.
Email:
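
The dimensionality argument and the idea of a K-tuple reduced amino acid composition described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the web server's implementation: the 6-cluster alphabet `EXAMPLE_CLUSTERS` and the helper names `reduce_sequence` and `ktuple_composition` are hypothetical, the clustering is an arbitrary physicochemical grouping chosen for the example, and only contiguous K-tuples are counted (the server's other characteristic parameters are not modeled).

```python
from collections import Counter
from itertools import product

# Hypothetical 6-cluster reduced alphabet, for illustration only.
# It is NOT claimed to be one of the alphabets offered by the web server.
EXAMPLE_CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def build_mapping(clusters):
    """Map every residue to the first letter of its cluster."""
    return {residue: cluster[0] for cluster in clusters for residue in cluster}


def reduce_sequence(sequence, clusters):
    """Rewrite a protein sequence in the reduced alphabet.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    mapping = build_mapping(clusters)
    return "".join(mapping[residue] for residue in sequence.upper())


def ktuple_composition(sequence, clusters, k=2):
    """Return normalized frequencies of contiguous K-tuples of reduced residues.

    The result always has len(clusters) ** k entries, so sequences of
    different lengths map to feature vectors of the same dimension.
    """
    reduced = reduce_sequence(sequence, clusters)
    letters = [cluster[0] for cluster in clusters]
    counts = Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))
    total = sum(counts.values())
    return {
        "".join(t): (counts["".join(t)] / total if total else 0.0)
        for t in product(letters, repeat=k)
    }


if __name__ == "__main__":
    # Size of the full sequence space for 50 residues over 20 amino acids.
    print(20 ** 50 > 1.1258e65)  # True

    features = ktuple_composition("MKTAYIAKQR", EXAMPLE_CLUSTERS, k=2)
    print(len(features))  # 6 ** 2 = 36 features, regardless of sequence length
```

With the full 20-letter alphabet the same dipeptide (K = 2) description would need 400 features; the reduced alphabet in this sketch needs 36, which is the kind of dimensionality reduction the reduced-alphabet approach aims for.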