Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides

Bibliographic Details
Main Authors: Camacho, Francy Liliana; Torres-Sáez, Rodrigo; Ramos-Pollán, Raúl
Format: Online
Language: eng
Published: Universidad Pedagógica y Tecnológica de Colombia 2016
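Subjects: antimicrobial peptides; learning curves; machine learning; statistical stability; support vector regression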
Online Access: https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834
author Camacho, Francy Liliana
Torres-Sáez, Rodrigo
Ramos-Pollán, Raúl
collection OJS
description This study demonstrates the importance of obtaining statistically stable results when using machine learning methods to predict the activity of antimicrobial peptides in cases where datasets are particularly small (fewer than a few hundred instances) because of the cost and complexity of the chemical processes involved. As in other fields with similar problems, small datasets cause large variability in the performance of predictive models, hindering any attempt to transfer them to lab practice. Rather than targeting good peak performance obtained from very particular experimental setups, as reported in related literature, we focused on characterizing the behavior of the machine learning methods as a preliminary step toward reproducible results across experimental setups and, ultimately, good performance. We propose a methodology that integrates feature learning (autoencoders) and selection methods (genetic algorithms) through the exhaustive use of performance metrics (permutation tests and bootstrapping), which provide stronger statistical evidence to support investment decisions with the lab resources at hand. We show evidence for the usefulness of 1) the extensive use of computational resources, and 2) adopting a wider range of metrics than those reported in the literature to assess method performance. This approach guided our search for suitable machine learning methods and yielded results comparable to those in the literature with strong statistical stability.
format Online
id oai:oai.revistas.uptc.edu.co:article-5834
institution Revista Facultad de Ingeniería
language eng
publishDate 2016
publisher Universidad Pedagógica y Tecnológica de Colombia
record_format ojs
spelling oai:oai.revistas.uptc.edu.co:article-5834 2022-06-15T16:20:06Z
Universidad Pedagógica y Tecnológica de Colombia 2016-12-31 info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion investigation application/pdf application/xml
https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834
10.19053/01211129.v26.n44.2017.5834
Revista Facultad de Ingeniería; Vol. 26 No. 44 (2017); 167-180
2357-5328 0121-1129 eng
https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834/4728
https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834/6402
title Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides
topic antimicrobial peptides
learning curves
machine learning
statistical stability
support vector regression
url https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834