Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem

Part-of-Speech Tagging (POST) is a complex preprocessing task in Natural Language Processing applications. Tagging has been tackled with statistical and rule-based approaches, using a range of methods. Most recently, metaheuristic algorithms have gained attention while bei...

Bibliographic Details
Main Authors: Solano-Jiménez, Miguel Alexis; Tobar-Cifuentes, Jose Julio; Sierra-Martínez, Luz Marina; Cobos-Lozada, Carlos Alberto
Format: Online
Language: eng, spa
Published: Universidad Pedagógica y Tecnológica de Colombia 2020
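Subjects: computational intelligence; computational linguistics; evolutionary computing; heuristic algorithms; natural language processing; parts of speech tagging; search methods; algoritmos heurísticos; computación evolutiva; etiquetado de partes del discurso; inteligencia computacional; lingüística computacional; métodos de búsqueda; procesamiento de lenguaje natural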
Online Access: https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762
_version_ 1801706089986654208
author Solano-Jiménez, Miguel Alexis
Tobar-Cifuentes, Jose Julio
Sierra-Martínez, Luz Marina
Cobos-Lozada, Carlos Alberto
author_facet Solano-Jiménez, Miguel Alexis
Tobar-Cifuentes, Jose Julio
Sierra-Martínez, Luz Marina
Cobos-Lozada, Carlos Alberto
author_sort Solano-Jiménez, Miguel Alexis
collection OJS
description Part-of-Speech Tagging (POST) is a complex preprocessing task in Natural Language Processing applications. Tagging has been tackled with statistical and rule-based approaches, using a range of methods. More recently, metaheuristic algorithms have gained attention and have been applied in a wide variety of knowledge areas with good results. For that reason, this research applied them to the POST problem to assign the best sequence of tags (roles) to the words of a sentence based on statistical information. The process was carried out in two cycles, each comprising four phases, in which the following metaheuristic algorithms were adapted to the tagging problem: Particle Swarm Optimization, Jaya, Random-Restart Hill Climbing, and a memetic algorithm based on Global-Best Harmony Search as a global optimizer and Hill Climbing as a local optimizer. To consolidate each algorithm, preliminary experiments were carried out (using cross-validation) to tune its parameters; the algorithms were then evaluated on the complete datasets of the tagged corpora IULA (Spanish), Brown (English), and Nasa Yuwe (Nasa). The results obtained by the proposed taggers were compared using the Friedman and Wilcoxon nonparametric statistical tests, which confirmed that the proposed memetic algorithm, GBHS Tagger, obtained better precision results. The proposed taggers make an important contribution to POST for traditional languages (English and Spanish), non-traditional languages (Nasa Yuwe), and their application areas.
format Online
id oai:oai.revistas.uptc.edu.co:article-11762
institution Revista Facultad de Ingeniería
language eng
spa
publishDate 2020
publisher Universidad Pedagógica y Tecnológica de Colombia
record_format ojs
spelling oai:oai.revistas.uptc.edu.co:article-117622021-07-13T02:22:58Z Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem Adaptación, comparación y mejora de algoritmos metaheurísticos al problema de etiquetado de partes del discurso Solano-Jiménez, Miguel Alexis Tobar-Cifuentes, Jose Julio Sierra-Martínez, Luz Marina Cobos-Lozada, Carlos Alberto computational intelligence computational linguistics evolutionary computing heuristic algorithms natural language processing parts of speech tagging search methods algoritmos heurísticos computación evolutiva etiquetado de partes del discurso inteligencia computacional lingüística computacional métodos de búsqueda procesamiento de lenguaje natural Part-of-Speech Tagging (POST) is a complex task in the preprocessing of Natural Language Processing applications. Tagging has been tackled from statistical information and rule-based approaches, making use of a range of methods. Most recently, metaheuristic algorithms have gained attention while being used in a wide variety of knowledge areas, with good results. As a result, they were deployed in this research in a POST problem to assign the best sequence of tags (roles) for the words of a sentence based on information statistics. This process was carried out in two cycles, each of them comprised four phases, allowing the adaptation to the tagging problem in metaheuristic algorithms such as Particle Swarm Optimization, Jaya, Random-Restart Hill Climbing, and a memetic algorithm based on Global-Best Harmony Search as a global optimizer, and on Hill Climbing as a local optimizer. In the consolidation of each algorithm, preliminary experiments were carried out (using cross-validation) to adjust the parameters of each algorithm and, thus, evaluate them on the datasets of the complete tagged corpus: IULA (Spanish), Brown (English) and Nasa Yuwe (Nasa). 
The results obtained by the proposed taggers were compared, and the Friedman and Wilcoxon statistical tests were applied, confirming that the proposed memetic, GBHS Tagger, obtained better results in precision. The proposed taggers make an important contribution to POST for traditional languages (English and Spanish), non-traditional languages (Nasa Yuwe), and their application areas. La identificación de partes del discurso (Part-of-Speech Tagging, POST) es una tarea compleja en las aplicaciones de procesamiento de lenguaje natural. Ha sido abordada desde enfoques basados en información estadística y reglas, haciendo uso de distintos métodos y, últimamente, se destacan los algoritmos metaheurísticos obteniendo buenos resultados. Por ello, se involucran en esta investigación para asignar la mejor secuencia de etiquetas (roles) para las palabras de una oración, basándose en información estadística. Este proceso se desarrolló en 2 ciclos, donde cada ciclo tuvo 4 fases para la adaptación al problema de etiquetado en los algoritmos metaheurísticos Particle Swarm Optimization, Jaya, Random-Restart Hill Climbing, y un algoritmo memético basado en Global-Best Harmony Search como optimizador global, y en Hill Climbing como optimizador local. Se realizaron experimentos preliminares (utilizando validación cruzada), para ajustar los parámetros de cada algoritmo y luego ejecutarlos sobre los datasets completos de los corpus etiquetados IULA (castellano), Brown (inglés) y Nasa Yuwe (Nasa). Los resultados obtenidos por los etiquetadores propuestos se compararon mediante las pruebas estadísticas no paramétricas de Friedman y Wilcoxon, ratificando que el memético propuesto, GBHS Tagger, obtiene mejores resultados de precisión. Los etiquetadores propuestos se convierten en un aporte muy importante para el POST, tanto para lenguas tradicionales (Inglés y Castellano), no tradicionales (Nasa Yuwe), y sus áreas de aplicación. 
Universidad Pedagógica y Tecnológica de Colombia 2020-09-18 info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion application/pdf application/pdf application/xml https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762 10.19053/01211129.v29.n54.2020.11762 Revista Facultad de Ingeniería; Vol. 29 No. 54 (2020): Continuos Publication; e11762 Revista Facultad de Ingeniería; Vol. 29 Núm. 54 (2020): Publicación Continua; e11762 2357-5328 0121-1129 eng spa https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762/9627 https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762/9660 https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762/10015 Copyright (c) 2020 Miguel Alexis Solano-Jiménez, Jose Julio Tobar-Cifuentes, Luz Marina Sierra-Martínez, Ph. D., Carlos Alberto Cobos-Lozada, Ph. D.
spellingShingle computational intelligence
computational linguistics
evolutionary computing
heuristic algorithms
natural language processing
parts of speech tagging
search methods
algoritmos heurísticos
computación evolutiva
etiquetado de partes del discurso
inteligencia computacional
lingüística computacional
métodos de búsqueda
procesamiento de lenguaje natural
Solano-Jiménez, Miguel Alexis
Tobar-Cifuentes, Jose Julio
Sierra-Martínez, Luz Marina
Cobos-Lozada, Carlos Alberto
Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem
title Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem
title_alt Adaptación, comparación y mejora de algoritmos metaheurísticos al problema de etiquetado de partes del discurso
title_full Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem
title_fullStr Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem
title_full_unstemmed Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem
title_short Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem
title_sort adaptation comparison and improvement of metaheuristic algorithms to the part of speech tagging problem
topic computational intelligence
computational linguistics
evolutionary computing
heuristic algorithms
natural language processing
parts of speech tagging
search methods
algoritmos heurísticos
computación evolutiva
etiquetado de partes del discurso
inteligencia computacional
lingüística computacional
métodos de búsqueda
procesamiento de lenguaje natural
topic_facet computational intelligence
computational linguistics
evolutionary computing
heuristic algorithms
natural language processing
parts of speech tagging
search methods
algoritmos heurísticos
computación evolutiva
etiquetado de partes del discurso
inteligencia computacional
lingüística computacional
métodos de búsqueda
procesamiento de lenguaje natural
url https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762
work_keys_str_mv AT solanojimenezmiguelalexis adaptationcomparisonandimprovementofmetaheuristicalgorithmstothepartofspeechtaggingproblem
AT tobarcifuentesjosejulio adaptationcomparisonandimprovementofmetaheuristicalgorithmstothepartofspeechtaggingproblem
AT sierramartinezluzmarina adaptationcomparisonandimprovementofmetaheuristicalgorithmstothepartofspeechtaggingproblem
AT coboslozadacarlosalberto adaptationcomparisonandimprovementofmetaheuristicalgorithmstothepartofspeechtaggingproblem
AT solanojimenezmiguelalexis adaptacioncomparacionymejoradealgoritmosmetaheuristicosalproblemadeetiquetadodepartesdeldiscurso
AT tobarcifuentesjosejulio adaptacioncomparacionymejoradealgoritmosmetaheuristicosalproblemadeetiquetadodepartesdeldiscurso
AT sierramartinezluzmarina adaptacioncomparacionymejoradealgoritmosmetaheuristicosalproblemadeetiquetadodepartesdeldiscurso
AT coboslozadacarlosalberto adaptacioncomparacionymejoradealgoritmosmetaheuristicosalproblemadeetiquetadodepartesdeldiscurso