Error annotation in the COPLE2 corpus

Iria del Rio; Amália Mendes

doi:10.26334/2183-9077/rapln4ano2018a42

Authors

Iria del Rio, Universidade de Lisboa, CLUL
Amália Mendes, Universidade de Lisboa, CLUL

DOI:

https://doi.org/10.26334/2183-9077/rapln4ano2018a42

Keywords:

learner corpus, error annotation, natural language processing, L2 acquisition

Abstract

We present the general architecture of the error annotation system applied to COPLE2, a learner corpus of Portuguese implemented on the TEITOK platform. We give an overview of the corpus and of the TEITOK functionalities, and describe how the error annotation is structured as a two-level system. At the first level, a fully manual, token-based, coarse-grained annotation classifies errors into three broad categories and pairs them with multi-level POS and lemma information. At the second level, a multi-word, fine-grained standoff annotation is produced semi-automatically from the first level. The token-based level has been applied to 47% of the corpus. We compare our system with other error annotation proposals and discuss the fine-grained tag set and the experiments carried out to validate its applicability. An inter-annotator agreement (IAA) experiment using Cohen's kappa was performed on both stages of the system and achieved good results at both levels. We explore how the token-level error annotation, POS, and lemma can be used to generate the fine-grained error tags automatically by means of conversion scripts. The model is designed to reduce manual effort and to rapidly extend error annotation coverage to the full corpus. As the first learner corpus of Portuguese with error annotation, COPLE2 is expected to support new research in fields connected with Portuguese as a second/foreign language, such as Second Language Acquisition/Teaching and Computer-Assisted Learning.
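
The abstract reports inter-annotator agreement measured with Cohen's kappa but does not show the computation. As a minimal sketch of the standard formula, kappa = (p_o - p_e) / (1 - p_e), the function below compares two annotators' labels for the same tokens. The function name and the labels "A", "B", "C" are hypothetical placeholders for the three coarse-grained categories, which the abstract does not name; this is not the authors' own script.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper for illustration; not taken from the COPLE2 tooling.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Placeholder labels standing in for the three coarse-grained error categories.
annotator_1 = ["A", "A", "B", "C", "B", "A"]
annotator_2 = ["A", "B", "B", "C", "B", "A"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one error category is much more frequent than the others.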

Published

2018-10-15

How to Cite

del Rio, I., & Mendes, A. (2018). Anotação de erros no corpus COPLE2. Revista Da Associação Portuguesa De Linguística, (4), 225–239. https://doi.org/10.26334/2183-9077/rapln4ano2018a42

Issue

No. 4 (2018): Revista da Associação Portuguesa de Linguística

Section

Communications

License

This work is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Authors retain copyright and grant the journal the right of first publication. Articles are simultaneously licensed under the Creative Commons Attribution License, which permits sharing of the work with acknowledgement of its authorship and of its initial publication in this journal.

Authors are permitted to make the version of the text published in RAPL available in institutional repositories or on other platforms for distributing academic work (e.g. ResearchGate).
