Extraction of target structures in learners' corpora
CQL queries for the exploitation of COPLE2
DOI:
https://doi.org/10.26334/2183-9077/rapln10ano2023a2
Keywords:
Learner's Corpus, target structures, PFL, L2
Abstract
Foreign language (FL) or second language (L2) corpora are collections of texts produced by non-native speakers learning a given language, and they include both the errors and the well-formed structures those learners produce. They serve different research objectives, such as studies on language acquisition (FL and L2), phenomena of linguistic interference, or the analysis and diagnosis of FL/L2 proficiency levels. In this research context, determining the learner's proficiency level is often relevant. This is typically done by analysing the presence or absence of errors in the learners' productions, based on mappings of the errors and well-formed structures that are typical or expected for a given proficiency level. However, unlike learner errors (which are explicitly marked in the corpus, and whose typology and methodology of analysis constitute a research subtopic of their own), well-formed structures, and in particular target structures (the well-formed structures expected in learners' productions at a given proficiency level), are not easily identifiable in the corpora. The work presented here aims to fill this gap in COPLE2 (Corpus of Portuguese Foreign/Second Language) through the use of expressions in CQL (Corpus Query Language). Based on pre-identified target structures and on the information available in COPLE2, such as morphosyntactic tagging and different levels of information and annotation (learner production, teacher correction, normalized form, lemma, etc.), we propose CQL query expressions that allow any user to extract examples of target structures by proficiency level directly.
The construction of the query expressions involves defining and testing the best strategy for each case. It requires systematizing the linguistic rules and patterns of occurrence of the phenomena in question, as well as finding ways to work around the limitations inherent to the corpus annotation, on the one hand, and to the query language, on the other.
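As a rough illustration of what such a query expression might look like, the sketch below assembles a CQP-style CQL string from token constraints and an optional proficiency-level restriction. It is not taken from the article: the attribute names (`lemma`, `pos`, `text_level`), the tag pattern `VPP.*`, the level value `B1`, and the example structure ("ter" followed by a past participle) are all assumptions chosen for illustration, and the actual COPLE2 attribute names and tagset may differ.

```python
def token(**attrs):
    """Build one CQL token constraint, e.g. [lemma="ter" & pos="V.*"].

    Attribute names are passed as keyword arguments; with no arguments
    the result is [], which matches any token.
    """
    if not attrs:
        return "[]"
    parts = [f'{name}="{value}"' for name, value in attrs.items()]
    return "[" + " & ".join(parts) + "]"


def query(tokens, level=None, level_attr="text_level"):
    """Join token constraints into a sequence query.

    If a level is given, append a CQP-style global constraint on the
    match. The attribute name `text_level` is hypothetical.
    """
    expr = " ".join(tokens)
    if level is not None:
        expr += f' :: match.{level_attr}="{level}"'
    return expr


# Hypothetical target structure: "ter" + past participle at level B1.
compound = query([token(lemma="ter"), token(pos="VPP.*")], level="B1")
print(compound)
```

Keeping the token constraints and the level restriction separate means the same structural pattern can be reused across proficiency levels by changing only the `level` argument.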
License
Copyright (c) 2023 Raquel Amaro, Alexandre Carreira, Alice Vieira, Cláudia Castro, Esmeralda Leong

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Authors retain copyright and grant the journal the right of first publication. The articles are simultaneously licensed under the Creative Commons Attribution License, which allows sharing of the work with an acknowledgement of authorship and initial publication in this journal.
The authors have permission to make the version of the text published in RAPL available in institutional repositories or other platforms for the distribution of academic papers (e.g., ResearchGate).