Towards a supervised rescoring system for unstructured data bases used to build specialized dictionaries
This article proposes the architecture for a system that uses previously learned weights to sort query results from unstructured data bases when building specialized dictionaries. A common resource in the construction of dictionaries, unstructured data bases have been especially useful in providing...
Main Author: | Rico-Sulayes, Antonio |
---|---|
Format: | Online |
Language: | eng |
Published: | Universidad Pedagógica y Tecnológica de Colombia, 2014 |
Subjects: | unstructured data bases; supervised rescoring; specialized lexicography; dictionary making |
Online Access: | https://revistas.uptc.edu.co/index.php/ingenieria/article/view/3161 |
_version_ | 1801706069194440704 |
---|---|
author | Rico-Sulayes, Antonio |
author_facet | Rico-Sulayes, Antonio |
author_sort | Rico-Sulayes, Antonio |
collection | OJS |
description | This article proposes the architecture for a system that uses previously learned weights to sort query results from unstructured data bases when building specialized dictionaries. A common resource in the construction of dictionaries, unstructured data bases have been especially useful in providing information about lexical item frequencies and examples in use. However, when building specialized dictionaries, whose selection of lexical items does not rely on frequency, the use of these data bases is restricted to that of a simple provider of examples. Even in this task, the information unstructured data bases provide may not be very useful when looking for specialized uses of lexical items with various meanings and very long lists of results. In the face of this problem, long lists of hits can be rescored based on a supervised learning model that relies on previously helpful results. The collection of a vast set of high-quality training data for this rescoring system is reported here. Finally, the architecture of such a system, an unprecedented tool in specialized lexicography, is proposed. |
format | Online |
id | oai:oai.revistas.uptc.edu.co:article-3161 |
institution | Revista Facultad de Ingeniería |
language | eng |
publishDate | 2014 |
publisher | Universidad Pedagógica y Tecnológica de Colombia |
record_format | ojs |
spelling | oai:oai.revistas.uptc.edu.co:article-3161 2018-11-21T00:52:28Z Towards a supervised rescoring system for unstructured data bases used to build specialized dictionaries Hacia un sistema de ponderación supervisado de bases de datos no estructuradas utilizadas en la construcción de diccionarios especializados Rico-Sulayes, Antonio unstructured data bases supervised rescoring specialized lexicography dictionary making bases de datos no estructuradas listas de hipótesis supervisadas lexicografía especializada construcción de diccionarios This article proposes the architecture for a system that uses previously learned weights to sort query results from unstructured data bases when building specialized dictionaries. A common resource in the construction of dictionaries, unstructured data bases have been especially useful in providing information about lexical item frequencies and examples in use. However, when building specialized dictionaries, whose selection of lexical items does not rely on frequency, the use of these data bases is restricted to that of a simple provider of examples. Even in this task, the information unstructured data bases provide may not be very useful when looking for specialized uses of lexical items with various meanings and very long lists of results. In the face of this problem, long lists of hits can be rescored based on a supervised learning model that relies on previously helpful results. The collection of a vast set of high-quality training data for this rescoring system is reported here. Finally, the architecture of such a system, an unprecedented tool in specialized lexicography, is proposed. El artículo propone la arquitectura de un sistema que usa valores previamente aprendidos para reordenar resultados de búsquedas en bases de datos no estructuradas al construir diccionarios especializados. Un recurso común en la construcción de diccionarios, las bases de datos no estructuradas han sido útiles ya que proveen información sobre unidades léxicas, tal como la frecuencia o ejemplos de uso de las mismas. Sin embargo, en la construcción de diccionarios especializados, cuya selección de elementos léxicos no depende de la frecuencia, el uso de estas bases de datos queda restringido a la simple ejemplificación. Incluso en esta tarea, la información de las bases de datos no estructuradas puede no ser muy útil si se buscan unidades léxicas con un uso especializado pero con varios otros significados que producen largas listas de resultados. Ante este problema, estas listas pueden ser ponderadas usando un modelo de aprendizaje automático supervisado que se apoye de los resultados previamente útiles. La recolección de un vasto conjunto de datos de alta calidad para este sistema de ponderación es reportada aquí. Finalmente, se propone la arquitectura de tal sistema, el cual representa una herramienta sin precedentes en la lexicografía especializada. Universidad Pedagógica y Tecnológica de Colombia 2014-12-28 info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion investigation investigación application/pdf text/html https://revistas.uptc.edu.co/index.php/ingenieria/article/view/3161 10.19053/01211129.3161 Revista Facultad de Ingeniería; Vol. 24 No. 38 (2015); 97-106 Revista Facultad de Ingeniería; Vol. 24 Núm. 38 (2015); 97-106 2357-5328 0121-1129 eng https://revistas.uptc.edu.co/index.php/ingenieria/article/view/3161/2853 https://revistas.uptc.edu.co/index.php/ingenieria/article/view/3161/4348 |
spellingShingle | unstructured data bases supervised rescoring specialized lexicography dictionary making bases de datos no estructuradas listas de hipótesis supervisadas lexicografía especializada construcción de diccionarios Rico-Sulayes, Antonio Towards a supervised rescoring system for unstructured data bases used to build specialized dictionaries |
title | Towards a supervised rescoring system for unstructured data bases used to build specialized dictionaries |
title_alt | Hacia un sistema de ponderación supervisado de bases de datos no estructuradas utilizadas en la construcción de diccionarios especializados |
title_full | Towards a supervised rescoring system for unstructured data bases used to build specialized dictionaries |
title_fullStr | Towards a supervised rescoring system for unstructured data bases used to build specialized dictionaries |
title_full_unstemmed | Towards a supervised rescoring system for unstructured data bases used to build specialized dictionaries |
title_short | Towards a supervised rescoring system for unstructured data bases used to build specialized dictionaries |
title_sort | towards a supervised rescoring system for unstructured data bases used to build specialized dictionaries |
topic | unstructured data bases supervised rescoring specialized lexicography dictionary making bases de datos no estructuradas listas de hipótesis supervisadas lexicografía especializada construcción de diccionarios |
topic_facet | unstructured data bases supervised rescoring specialized lexicography dictionary making bases de datos no estructuradas listas de hipótesis supervisadas lexicografía especializada construcción de diccionarios |
url | https://revistas.uptc.edu.co/index.php/ingenieria/article/view/3161 |
work_keys_str_mv | AT ricosulayesantonio towardsasupervisedrescoringsystemforunstructureddatabasesusedtobuildspecializeddictionaries AT ricosulayesantonio haciaunsistemadeponderacionsupervisadodebasesdedatosnoestructuradasutilizadasenlaconstrucciondediccionariosespecializados |
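
The description above outlines rescoring a long list of hits with weights learned from previously helpful results. The sketch below is only an illustration of that general idea, not the article's architecture: the `Hit` type, the term weights, the bias, the logistic scoring function, and the example sentences are all hypothetical stand-ins for whatever a trained model would supply.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    text: str
    original_rank: int  # position in the data base's own result list


# Hypothetical weights: placeholders for values a supervised model
# would learn from hits previously judged helpful by lexicographers.
LEARNED_WEIGHTS = {"forensic": 1.8, "evidence": 1.2, "court": 0.9}
BIAS = -1.0


def score(hit: Hit, weights=LEARNED_WEIGHTS, bias=BIAS) -> float:
    """Logistic score: sum the weights of the terms present in the hit."""
    tokens = set(hit.text.lower().split())
    z = bias + sum(w for term, w in weights.items() if term in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def rescore(hits: list[Hit]) -> list[Hit]:
    """Sort hits by descending score; ties keep the original order."""
    return sorted(hits, key=lambda h: (-score(h), h.original_rank))


hits = [
    Hit("the print shop closed early", 1),
    Hit("a latent print was entered as evidence in court", 2),
    Hit("forensic analysts lifted the print", 3),
]
reranked = rescore(hits)
print([h.original_rank for h in reranked])
```

With these made-up weights, the two hits that use "print" in a specialized sense move ahead of the general-language hit, which is the effect the description attributes to rescoring.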