FEBS Letters
Volume 580, Issue 3 , Pages 723-730, 6 February 2006

Novel 2D maps and coupling numbers for protein sequences. The first QSAR study of polygalacturonases; isolation and prediction of a novel sequence from Psidium guajava L.

Edited by Takashi Gojobori

  • Guillermín Agüero-Chapin
    • CBQ & CAP, Central University of ‘Las Villas’ 54830, Cuba
  • Humberto González-Díaz (corresponding author)
    • CBQ & CAP, Central University of ‘Las Villas’ 54830, Cuba
    • Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela 15782, Spain
  • Reinaldo Molina
    • CBQ & CAP, Central University of ‘Las Villas’ 54830, Cuba
    • Universität Rostock, FB Chemie, Albert-Einstein-Str. 3a, D 18059 Rostock, Germany
  • Javier Varona-Santos
    • Biomedicine Unit, FES Iztacala, UNAM, Los Barrios Avenue Num1, Los Reyes Iztacala, Tlalnepantla, DF 54090, Mexico
  • Eugenio Uriarte
    • Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela 15782, Spain
  • Yenny González-Díaz
    • CBQ & CAP, Central University of ‘Las Villas’ 54830, Cuba
    • Provincial Center for Medical Genetics, Las Tunas, 77400, Cuba, and National Center for Human Genetics, ICBP “Victoria de Girón”, La Habana 11600, Cuba

Received 28 September 2005; received in revised form 19 December 2005; accepted 21 December 2005; published online 5 January 2006.

Abstract 

The development of 2D graph-theoretic representations of DNA sequences has been very important for the qualitative and quantitative comparison of sequences. Numeric features calculated from these representations are useful for DNA–QSAR studies. Most graph-theoretic representations identify each of the four bases with a unit step along one axis direction in 2D space. In the case of proteins, twenty amino acids instead of four bases have to be considered. This fact has limited the introduction of useful 2D Cartesian representations, and of the corresponding sequence descriptors, for encoding protein sequence information. In this study, we overcome this problem by grouping amino acids into four groups: acid, basic, polar and non-polar amino acids. Identifying each group with one of the four axis directions determines a novel 2D representation and numeric descriptors for protein sequences. A Markov model is then used to calculate new numeric descriptors of the protein sequence, called herein the sequence 2D coupling numbers (ζk). In this work, we calculated the ζk values for 108 sequences of different polygalacturonases (PGs) and for 100 sequences of other proteins. A Linear Discriminant Analysis model derived here (PG = 5.36·ζ1 − 3.98·ζ3 − 42.21) successfully discriminates between PGs and other proteins. The model correctly classified 100% of the subset of 81 PG and 75 non-PG protein sequences used to train it. The model also correctly classified 51 out of 52 (98.07%) of the protein sequences used as an external validation series. Using different amino acid groupings and/or axis orientations gives different results, so these alternatives should be explored for other databases. Finally, to illustrate the use of the model, we report the isolation of a novel sequence (AY908988) from Psidium guajava L. by our group and the prediction of its PG action. This prediction agrees very well with the sequence alignment results obtained with the BLAST methodology. These findings illustrate the possibilities of the sequence descriptors derived from this novel 2D sequence representation in protein sequence QSAR studies.
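The four-group walk and the discriminant function described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract names the four classes but not their members or their axis assignment, so the `GROUPS` membership and the `DIRECTIONS` mapping are hypothetical, as are the function names `map_sequence` and `pg_score`. The operators between the terms of the discriminant function are read here as minus signs, which is also an assumption, and the Markov-model calculation of the coupling numbers ζk is not reproduced, so ζ1 and ζ3 are taken as inputs.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: membership of the four classes is not given in the abstract;
# this split of the 20 standard amino acids is a common textbook grouping.
GROUPS: Dict[str, str] = {
    "acid": "DE",
    "basic": "KRH",
    "polar": "STNQCY",
    "nonpolar": "AVLIMFWPG",
}

# ASSUMPTION: the axis assigned to each class is hypothetical; the abstract
# only says that each group is identified with one of the four directions.
DIRECTIONS: Dict[str, Tuple[int, int]] = {
    "acid": (0, -1),
    "basic": (0, 1),
    "polar": (1, 0),
    "nonpolar": (-1, 0),
}

# Lookup table from one-letter residue code to its unit step.
STEP_OF_RESIDUE: Dict[str, Tuple[int, int]] = {
    residue: DIRECTIONS[group]
    for group, residues in GROUPS.items()
    for residue in residues
}


def map_sequence(sequence: str) -> List[Tuple[int, int]]:
    """Return the 2D lattice points visited by the sequence, starting at (0, 0)."""
    x, y = 0, 0
    points = [(x, y)]
    for residue in sequence.upper():
        if residue not in STEP_OF_RESIDUE:
            raise ValueError(f"unsupported residue code: {residue!r}")
        dx, dy = STEP_OF_RESIDUE[residue]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def pg_score(zeta1: float, zeta3: float) -> float:
    """Evaluate the discriminant function quoted in the abstract.

    ASSUMPTION: both operators are minus signs. The coupling numbers
    themselves must be computed elsewhere; that step is not shown here.
    """
    return 5.36 * zeta1 - 3.98 * zeta3 - 42.21


if __name__ == "__main__":
    print(map_sequence("DKSA"))
    print(pg_score(10.0, 2.0))
```

With this mapping, two sequences that differ only within a class (for example D replaced by E) trace the same walk, which is the price paid for reducing twenty letters to four directions. The abstract itself notes that other groupings and orientations give different results.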

Keywords: Protein sequence, Polygalacturonases, Markov model, Quantitative structure–activity relationship, Sequence maps

PII: S0014-5793(05)01559-0

doi:10.1016/j.febslet.2005.12.072
