Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem
Part-of-Speech Tagging (POST) is a complex task in the preprocessing of Natural Language Processing applications. Tagging has been tackled from statistical information and rule-based approaches, making use of a range of methods. Most recently, metaheuristic algorithms have gained attention while bei...
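Main Authors: | Solano-Jiménez, Miguel Alexis; Tobar-Cifuentes, Jose Julio; Sierra-Martínez, Luz Marina; Cobos-Lozada, Carlos Alberto |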
---|---|
Format: | Online |
Language: | eng spa |
Published: | Universidad Pedagógica y Tecnológica de Colombia, 2020 |
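Subjects: | computational intelligence; computational linguistics; evolutionary computing; heuristic algorithms; natural language processing; parts of speech tagging; search methods |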
Online Access: | https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762 |
_version_ | 1801706089986654208 |
---|---|
author | Solano-Jiménez, Miguel Alexis Tobar-Cifuentes, Jose Julio Sierra-Martínez, Luz Marina Cobos-Lozada, Carlos Alberto |
author_facet | Solano-Jiménez, Miguel Alexis Tobar-Cifuentes, Jose Julio Sierra-Martínez, Luz Marina Cobos-Lozada, Carlos Alberto |
author_sort | Solano-Jiménez, Miguel Alexis |
collection | OJS |
description | Part-of-Speech Tagging (POST) is a complex preprocessing task in Natural Language Processing applications. Tagging has been tackled with statistical and rule-based approaches, using a range of methods. More recently, metaheuristic algorithms have gained attention and have been used with good results in a wide variety of knowledge areas. For this reason, this research applies them to the POST problem, assigning the best sequence of tags (roles) to the words of a sentence based on statistical information. The process was carried out in two cycles, each comprising four phases, to adapt the following metaheuristic algorithms to the tagging problem: Particle Swarm Optimization, Jaya, Random-Restart Hill Climbing, and a memetic algorithm that uses Global-Best Harmony Search as a global optimizer and Hill Climbing as a local optimizer. Preliminary experiments (using cross-validation) were carried out to adjust the parameters of each algorithm, which were then evaluated on the complete datasets of three tagged corpora: IULA (Spanish), Brown (English), and Nasa Yuwe (Nasa). The results obtained by the proposed taggers were compared using the Friedman and Wilcoxon non-parametric statistical tests, confirming that the proposed memetic algorithm, GBHS Tagger, obtained better precision. The proposed taggers make an important contribution to POST for traditional languages (English and Spanish), non-traditional languages (Nasa Yuwe), and their application areas. |
format | Online |
id | oai:oai.revistas.uptc.edu.co:article-11762 |
institution | Revista Facultad de Ingeniería |
language | eng spa |
publishDate | 2020 |
publisher | Universidad Pedagógica y Tecnológica de Colombia |
record_format | ojs |
spelling | oai:oai.revistas.uptc.edu.co:article-117622021-07-13T02:22:58Z Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem Adaptación, comparación y mejora de algoritmos metaheurísticos al problema de etiquetado de partes del discurso Solano-Jiménez, Miguel Alexis Tobar-Cifuentes, Jose Julio Sierra-Martínez, Luz Marina Cobos-Lozada, Carlos Alberto computational intelligence computational linguistics evolutionary computing heuristic algorithms natural language processing parts of speech tagging search methods algoritmos heurísticos computación evolutiva etiquetado de partes del discurso inteligencia computacional lingüística computacional métodos de búsqueda procesamiento de lenguaje natural Part-of-Speech Tagging (POST) is a complex task in the preprocessing of Natural Language Processing applications. Tagging has been tackled from statistical information and rule-based approaches, making use of a range of methods. Most recently, metaheuristic algorithms have gained attention while being used in a wide variety of knowledge areas, with good results. As a result, they were deployed in this research in a POST problem to assign the best sequence of tags (roles) for the words of a sentence based on information statistics. This process was carried out in two cycles, each of them comprised four phases, allowing the adaptation to the tagging problem in metaheuristic algorithms such as Particle Swarm Optimization, Jaya, Random-Restart Hill Climbing, and a memetic algorithm based on Global-Best Harmony Search as a global optimizer, and on Hill Climbing as a local optimizer. In the consolidation of each algorithm, preliminary experiments were carried out (using cross-validation) to adjust the parameters of each algorithm and, thus, evaluate them on the datasets of the complete tagged corpus: IULA (Spanish), Brown (English) and Nasa Yuwe (Nasa). 
The results obtained by the proposed taggers were compared, and the Friedman and Wilcoxon statistical tests were applied, confirming that the proposed memetic, GBHS Tagger, obtained better results in precision. The proposed taggers make an important contribution to POST for traditional languages (English and Spanish), non-traditional languages (Nasa Yuwe), and their application areas. La identificación de partes del discurso (Part-of-Speech Tagging, POST) es una tarea compleja en las aplicaciones de procesamiento de lenguaje natural. Ha sido abordada desde enfoques basados en información estadística y reglas, haciendo uso de distintos métodos y, últimamente, se destacan los algoritmos metaheurísticos obteniendo buenos resultados. Por ello, se involucran en esta investigación para asignar la mejor secuencia de etiquetas (roles) para las palabras de una oración, basándose en información estadística. Este proceso se desarrolló en 2 ciclos, donde cada ciclo tuvo 4 fases para la adaptación al problema de etiquetado en los algoritmos metaheurísticos Particle Swarm Optimization, Jaya, Random-Restart Hill Climbing, y un algoritmo memético basado en Global-Best Harmony Search como optimizador global, y en Hill Climbing como optimizador local. Se realizaron experimentos preliminares (utilizando validación cruzada), para ajustar los parámetros de cada algoritmo y luego ejecutarlos sobre los datasets completos de los corpus etiquetados IULA (castellano), Brown (inglés) y Nasa Yuwe (Nasa). Los resultados obtenidos por los etiquetadores propuestos se compararon mediante las pruebas estadísticas no paramétricas de Friedman y Wilcoxon, ratificando que el memético propuesto, GBHS Tagger, obtiene mejores resultados de precisión. Los etiquetadores propuestos se convierten en un aporte muy importante para el POST, tanto para lenguas tradicionales (Inglés y Castellano), no tradicionales (Nasa Yuwe), y sus áreas de aplicación. 
Universidad Pedagógica y Tecnológica de Colombia 2020-09-18 info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion application/pdf application/pdf application/xml https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762 10.19053/01211129.v29.n54.2020.11762 Revista Facultad de Ingeniería; Vol. 29 No. 54 (2020): Continuos Publication; e11762 Revista Facultad de Ingeniería; Vol. 29 Núm. 54 (2020): Publicación Continua; e11762 2357-5328 0121-1129 eng spa https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762/9627 https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762/9660 https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762/10015 Copyright (c) 2020 Miguel Alexis Solano-Jiménez, Jose Julio Tobar-Cifuentes, Luz Marina Sierra-Martínez, Ph. D., Carlos Alberto Cobos-Lozada, Ph. D. |
spellingShingle | computational intelligence computational linguistics evolutionary computing heuristic algorithms natural language processing parts of speech tagging search methods algoritmos heurísticos computación evolutiva etiquetado de partes del discurso inteligencia computacional lingüística computacional métodos de búsqueda procesamiento de lenguaje natural Solano-Jiménez, Miguel Alexis Tobar-Cifuentes, Jose Julio Sierra-Martínez, Luz Marina Cobos-Lozada, Carlos Alberto Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem |
title | Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem |
title_alt | Adaptación, comparación y mejora de algoritmos metaheurísticos al problema de etiquetado de partes del discurso |
title_full | Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem |
title_fullStr | Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem |
title_full_unstemmed | Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem |
title_short | Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem |
title_sort | adaptation comparison and improvement of metaheuristic algorithms to the part of speech tagging problem |
topic | computational intelligence computational linguistics evolutionary computing heuristic algorithms natural language processing parts of speech tagging search methods algoritmos heurísticos computación evolutiva etiquetado de partes del discurso inteligencia computacional lingüística computacional métodos de búsqueda procesamiento de lenguaje natural |
topic_facet | computational intelligence computational linguistics evolutionary computing heuristic algorithms natural language processing parts of speech tagging search methods algoritmos heurísticos computación evolutiva etiquetado de partes del discurso inteligencia computacional lingüística computacional métodos de búsqueda procesamiento de lenguaje natural |
url | https://revistas.uptc.edu.co/index.php/ingenieria/article/view/11762 |
work_keys_str_mv | AT solanojimenezmiguelalexis adaptationcomparisonandimprovementofmetaheuristicalgorithmstothepartofspeechtaggingproblem AT tobarcifuentesjosejulio adaptationcomparisonandimprovementofmetaheuristicalgorithmstothepartofspeechtaggingproblem AT sierramartinezluzmarina adaptationcomparisonandimprovementofmetaheuristicalgorithmstothepartofspeechtaggingproblem AT coboslozadacarlosalberto adaptationcomparisonandimprovementofmetaheuristicalgorithmstothepartofspeechtaggingproblem AT solanojimenezmiguelalexis adaptacioncomparacionymejoradealgoritmosmetaheuristicosalproblemadeetiquetadodepartesdeldiscurso AT tobarcifuentesjosejulio adaptacioncomparacionymejoradealgoritmosmetaheuristicosalproblemadeetiquetadodepartesdeldiscurso AT sierramartinezluzmarina adaptacioncomparacionymejoradealgoritmosmetaheuristicosalproblemadeetiquetadodepartesdeldiscurso AT coboslozadacarlosalberto adaptacioncomparacionymejoradealgoritmosmetaheuristicosalproblemadeetiquetadodepartesdeldiscurso |
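
The description in this record frames tagging as a search for the best tag sequence for a sentence, scored from statistical information, and names Random-Restart Hill Climbing among the adapted algorithms. The Python sketch below illustrates that general idea only. The tag set, the toy probabilities, the bigram-style `fitness` function, and all function names are assumptions invented for illustration; they are not the authors' objective function, parameters, or corpus statistics, and this is not the GBHS Tagger.

```python
import math
import random

# Hypothetical toy statistics (NOT taken from IULA, Brown, or Nasa Yuwe).
TAGS = ["DET", "NOUN", "VERB"]
EMISSION = {  # assumed P(word | tag)
    ("the", "DET"): 0.9,
    ("dog", "NOUN"): 0.8, ("dog", "VERB"): 0.1,
    ("barks", "VERB"): 0.7, ("barks", "NOUN"): 0.2,
}
TRANSITION = {  # assumed P(tag | previous tag); "<s>" marks the sentence start
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.3, ("<s>", "VERB"): 0.1,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.05, ("DET", "DET"): 0.05,
    ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.2, ("NOUN", "DET"): 0.1,
    ("VERB", "DET"): 0.5, ("VERB", "NOUN"): 0.3, ("VERB", "VERB"): 0.2,
}
FLOOR = 1e-6  # small probability for unseen (word, tag) or (tag, tag) pairs


def fitness(words, tags):
    """Log-probability of a tag sequence under the toy bigram statistics."""
    score, prev = 0.0, "<s>"
    for word, tag in zip(words, tags):
        score += math.log(EMISSION.get((word, tag), FLOOR))
        score += math.log(TRANSITION.get((prev, tag), FLOOR))
        prev = tag
    return score


def hill_climb(words, start):
    """Steepest ascent: apply the best single-tag change until none improves."""
    current, current_score = list(start), fitness(words, start)
    while True:
        best_neighbor, best_score = None, current_score
        for i in range(len(words)):
            for tag in TAGS:
                if tag == current[i]:
                    continue
                neighbor = current[:i] + [tag] + current[i + 1:]
                score = fitness(words, neighbor)
                if score > best_score:
                    best_neighbor, best_score = neighbor, score
        if best_neighbor is None:  # local optimum reached
            return current, current_score
        current, current_score = best_neighbor, best_score


def random_restart_tagger(words, restarts=20, seed=0):
    """Run hill climbing from several random tag sequences; keep the best."""
    rng = random.Random(seed)
    best, best_score = None, -math.inf
    for _ in range(restarts):
        start = [rng.choice(TAGS) for _ in words]
        candidate, score = hill_climb(words, start)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


tags, score = random_restart_tagger(["the", "dog", "barks"])
print(tags)  # → ['DET', 'NOUN', 'VERB']
```

In this sketch each candidate solution is simply a list with one tag per word, and a neighbor differs from it in exactly one position. A memetic variant in the spirit of the description would replace the random restarts with a global optimizer that proposes the starting sequences, while still using a local search such as `hill_climb` to refine them.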