Taxonomy and labeling of errors in the use of Spanish as a foreign language

Abstract
In recent decades, interest in learning Spanish as a foreign language has grown considerably. The impact of socioeconomic and sociocultural developments on Spanish society (the internationalization of the Spanish economy, the opening of Cervantes Institute centres, Erasmus students, linguistic tourism and a growing interest in Spanish culture) has given rise to new lines of research on the teaching of Spanish as a foreign language. This article aims to serve as a guide that helps teachers diagnose, classify and solve the problems characteristic of learners of Spanish, whose mother tongues interfere with and hinder the learning process through their phonetic, morphological, lexical, grammatical and pragmatic peculiarities. At its different stages, the research collects, analyzes and reports on written compositions from students' blogs. Specifically, it aims to design a methodology based on the principles of corpus linguistics at every stage, from corpus compilation through labeling (identification and classification of errors) and annotation to the extraction of results. The main result is a taxonomy, with its labeling scheme, subdivided into four major sections: lexical, grammatical, discursive and graphical errors.


Introduction
Our research is framed within applied linguistics, an autonomous discipline whose structure is directly related to teaching, sociology, discourse analysis and psychology, among other disciplines. Applied linguistics mediates between the theoretical and the practical fields; it is interdisciplinary and helps to solve the problems posed by language use.
Studies on the teaching of Spanish as a foreign language are diverse. From a theoretical point of view, the contributions of psycholinguistics, sociolinguistics and ethnolinguistics to strategies for learning Spanish stand out, as do various theories and definitions of second language acquisition that take into account the communicative factors that favor successful linguistic exchange.
Within applied linguistics, the contributions of psycholinguistics and sociolinguistics (ethnography of communication, sociology of language, variationism and languages in contact) gave rise to contrastive analysis, error analysis and interlanguage analysis.
Within contrastive analysis, the notion of interference stands out: it accounts for the effects that the differences and similarities between the structures of the mother tongue and the target language have on L2 learning. Contrastive studies are thus based on a scientific description of the language being learned, which is then compared with a parallel description of the learner's native language. Error analysis, in turn, studies errors in the light of the concept of communicative competence, according to the degree to which they obstruct communication and the effect they produce on the listener. Interlanguage, finally, reflects the direct relationship between L1 and the learning of L2.

Justification and background
In recent decades there has been a growing interest in the problems of second language use, and the way errors are analyzed has gradually evolved. Errors were initially regarded as something negative. Later, with the rise of the communicative approach, this view gave way to a positive assessment of error as an indispensable step in the learning process.
In his behaviorist theory, Skinner holds that language learning results from the formation of linguistic habits based on similarity, not analysis (Chomsky, 1959). Errors were to be avoided so that they would not become fixed in the student's mind: by rejecting "incorrect" productions, errors are gradually extinguished from the student's repertoire. Accordingly, the mother tongue is learned through the successive reinforcement of "correct" utterances. From a behaviorist perspective, learning a second language implies forming a new repertoire of linguistic habits through the mechanisms of repetition and reinforcement.
The teaching model associated with this approach was the audio-lingual method, which gave priority to the understanding of spoken language and to oral production.
In reaction to this approach, the mentalist current emerged, based on the ideas of Chomsky (1957). It postulated that the structure of the mind determines language and that all human languages share certain structures.
Later, cognitive theories emerged from the work of Piaget (1953), whose fundamental aim was to clarify the role of language in cognitive development. His views on schemas and the notions of accommodation and assimilation have found wide application in second language teaching. Errors acquire some importance, and it is considered necessary to design activities in which students make mistakes and are then able to reflect on and correct them.
Theories of mother tongue acquisition and the methodological conception of second language teaching are closely related. For decades, attempts have been made to explain the psycholinguistic processes that students go through in their transition towards competence in a second language. It was in this context that contrastive analysis emerged. The works of Fries (1945) and Robert Lado (1957) marked the beginning of this model. Its main postulate was to compare the two linguistic systems involved in order to predict which structures would present difficulties and could therefore be considered potential errors, and which structures, since they resemble the mother tongue, should present no difficulty.
"We assume that the student who faces a foreign language will find some of its features very easy and others very difficult. Those features that resemble his own language will be easy for him and, conversely, those that are different will be difficult" (Lado, 1957: 2-3).
This model assumed that second language learning was an automatic process and that all errors could be predicted and explained by interference from the mother tongue. Errors were conceived as something negative, and proponents of the model argued that they could be predicted by systematically comparing the two languages. Contrastive analysis, however, proved insufficient to explain students' errors, since it was shown that not all errors could be attributed to negative interference from L1. This made it necessary to develop a model better able to describe and explain the complexity of the second language acquisition process.
Amid various controversies, error analysis appeared in the 1960s. This new research model, inspired by generative linguistics, emphasizes the creative aspect of language and raises the status of error. It proposes analyzing and explaining students' errors in order to discover their causes and to identify the psycholinguistic processes that reveal universal learning strategies. Error is no longer reprehensible but necessary, since it is conceived as an indicator of the learning process. As its name implies, the analysis developed in this model focused exclusively on students' erroneous productions. This change of perspective on error led to a rethinking of the type of analysis carried out, and it was Corder who argued for the need to analyze not only incorrect productions but also correct ones. Corder (1991: 75-76) destigmatizes error and again elevates its status, since he considers it a faithful indicator of the learning process.
Making errors is an unavoidable and even necessary part of the learning process. The "correction" of errors is precisely what gives us the kind of negative evidence needed to discover the correct rule or concept. Consequently, a better description of idiosyncratic sentences contributes directly to explaining what students know and do not know at any given moment of their learning. Ultimately, it should enable teachers to tell students that their hypothesis is wrong and also to give them the right kind of information or data for them to form a more adequate concept of a rule of the target language.
Interlanguage analysis became a method of linguistic research in 1972, when Selinker, drawing on previous studies, designed a model according to which the student of a second language builds his own linguistic system with elements of L1 and L2 but with features of its own. The student thus creates in his mind an intermediate language, a combination of his mother tongue and the language he intends to learn, and relies on that system whenever he tries to express himself in L2.
Errors now become signs of how the student constructs a language, indicating the stage of the construction process the learner has reached. The model seeks to describe this process by analyzing all the structures of the student's performance, both incorrect and correct. Interlanguage studies have shown that second language learners apply a series of strategies, which vary from one individual to another, to integrate new information into their schemas while testing their hypotheses. Among these strategies are simplification, overgeneralization, fossilization and transfer.
Studies within the framework of error analysis have given us a better understanding of the difficulties of language acquisition and of the persistence of certain errors in Spanish among speakers of different mother tongues.

Methodology
Observing the errors that foreign students make while learning Spanish as a foreign language led us to make the following methodological decisions, on which this work is based.
The overall objective was to create a corpus of blogs written by learners of Spanish as a foreign language in order to collect a representative number of texts showing the errors made by students of Spanish as an L2.
• An analysis of the corpus will provide evidence of errors and their recurrence, on which a detailed taxonomy can be designed.
• The results of the corpus constitute a good database for research and, therefore, for the categorization of errors.
• The content is a key part of the aims pursued in our research.
In our research, the content consists of:
• Full written texts (journal entries and critical comments).
• Texts collected from blogs produced in the classroom.
• The activity was part of the lessons. Students wrote in their blogs regularly, and each entry was corrected and commented on individually for each student.
The corpus on which the research is based consists of a total of 766 entries written by university students of different levels.
Students had to keep a blog throughout the semester and were asked to write in it four days a week: on two days they wrote a personal diary entry, and on the other two they wrote a critical comment on the course blog, http://spanishupv.blogspot.com.es.
On this basis, we can state that our corpus meets the following criteria:
• Corpus of learners of Spanish as a foreign language.
• Written corpus (compilation of blog entries).
• Monolingual corpus. All collected texts are in Spanish.
• Synchronic corpus. The texts were compiled as the semester progressed.
• Encoded corpus. An error-labeling scheme has been designed.
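The stages listed above (compilation, labeling and extraction of results) can be sketched in Python. This is a minimal illustration under stated assumptions, not the tool used in the study. The `Entry` class, the student identifiers and the sample sentences are invented. The codes follow the numbering of the taxonomy in the Analysis section: 1.2.5 for "ser"/"estar", 2.1.1 for first/third person confusion, and 2.3.1.2 for an unnecessary article before "de". Mapping top-level code 4 to graphical errors is an assumption.

```python
from collections import Counter
from dataclasses import dataclass, field

# Top-level codes 1-3 follow the taxonomy's numbering (lexical, grammatical,
# discursive); mapping "4" to graphical errors is an assumption.
TOP_LEVEL = {"1": "lexical", "2": "grammatical", "3": "discursive", "4": "graphical"}


@dataclass
class Entry:
    """One blog entry with the error labels assigned during annotation."""
    student: str                      # anonymized identifier (invented here)
    genre: str                        # "diary" or "critical comment"
    text: str
    labels: list[str] = field(default_factory=list)


def count_by_label(entries: list[Entry]) -> Counter:
    """Frequency of each error label across the corpus."""
    return Counter(label for entry in entries for label in entry.labels)


def count_by_group(entries: list[Entry]) -> Counter:
    """Roll labels up to the major groups using the first digit of each code."""
    return Counter(
        TOP_LEVEL[label.split(".")[0]]
        for entry in entries
        for label in entry.labels
    )


# Invented sample entries; the labels mark error types from the taxonomy.
corpus = [
    Entry("s01", "diary", "Yo es muy cansado hoy.", ["1.2.5", "2.1.1"]),
    Entry("s02", "critical comment", "El profesor es enfermo.", ["1.2.5"]),
    Entry("s03", "diary", "Tengo una clase de la gramática.", ["2.3.1.2"]),
]

print(count_by_label(corpus).most_common(1))
print(count_by_group(corpus))
```

Counting at both levels reflects the two uses of the corpus described above. Label frequencies show which specific errors recur, and the roll-up shows how errors are distributed across the major groups.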

Analysis
We carried out an error analysis based on grammatical categories, to which we added discursive and graphic errors. This type of taxonomy clearly reveals the student's linguistic competence.
This classification draws on those proposed by Sonsoles Fernández (1997) and Isabel Santos Gargallo (1993), with the aim of covering all errors and creating a broad classification with its corresponding labels.
What follows is the design of the error classification with its corresponding labels. The classification is extensive because it aims to be rigorous: its purpose is to make it possible to capture every error made by learners of Spanish as a foreign language. In this section, gender and number are both treated as lexical features inherent to the noun, not as problems of paradigm construction or of agreement.
Error produced by the generalization of the most frequent paradigm in Spanish: "o" for masculine, "a" for feminine.

Nombres femeninos acabados en -a. Feminine nouns ending in -a
Error that contradicts the general gender paradigm in Spanish.

Nombres acabados en -e o consonante. Nouns ending in -e or a consonant
The error lies in the student attributing feminine gender to nouns ending in -e and masculine gender to nouns ending in a consonant.

Nombres masculinos acabados en -o. Masculine nouns ending in -o
Error that contradicts the general gender paradigm in Spanish.

Nombres femeninos acabados en -o. Feminine nouns ending in -o
Error produced by the generalization of the most frequent paradigm in Spanish: "o" for masculine, "a" for feminine.
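The following Python sketch makes the overgeneralization pattern concrete. It models a hypothetical learner who assigns gender purely by the noun's final letter. That learner treats "-o" as masculine and "-a" as feminine, and also makes the "-e" feminine / consonant masculine attribution described in this subsection. The small lexicon and the function names are illustrative assumptions, not part of the study. The sketch only shows which nouns such a rule would misclassify.

```python
# Dictionary gender of a small, illustrative sample of nouns.
LEXICON = {
    "libro": "m", "casa": "f",        # follow the -o/-a paradigm
    "problema": "m", "día": "m",      # masculine nouns ending in -a
    "mano": "f", "foto": "f",         # feminine nouns ending in -o
    "coche": "m", "leche": "f",       # ending in -e
    "papel": "m", "ciudad": "f",      # ending in a consonant
}


def learner_gender(noun: str) -> str:
    """Gender assigned by a hypothetical learner relying only on the final letter."""
    last = noun[-1]
    if last == "o":
        return "m"
    if last in ("a", "e"):  # -a by the paradigm, -e by the attribution described above
        return "f"
    return "m"              # consonant-final nouns treated as masculine


def misclassified(lexicon: dict[str, str]) -> list[str]:
    """Nouns for which the learner's rule contradicts the dictionary gender."""
    return sorted(noun for noun, gender in lexicon.items() if learner_gender(noun) != gender)


print(misclassified(LEXICON))
# → ['ciudad', 'coche', 'día', 'foto', 'mano', 'problema']
```

Every misclassified noun falls under one of the error types in this subsection. Nouns that happen to fit the rule ("libro", "casa", "leche", "papel") produce no error even though the rule itself is unreliable.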

Reconocimiento del número. Number recognition
Number errors reveal a lack of knowledge of the singular or plural value of some words, or of their grammatical behavior with respect to number.

Palabras normalizadas en singular. Words fixed in the singular
Incorrect assignment of the plural to nouns that function as singular in Spanish.

Problemas en la percepción de nombres contables y no contables. Problems in the perception of count and non-count nouns
Error caused by assigning the wrong value (count or non-count) to a noun.
Error caused by the inappropriate use of prefixes in word formation.

Derivación etimológica no adecuada. Inappropriate etymological derivation
Generalization of mother tongue (MT) structures, leading to errors in word derivation.

Asociaciones. Associations
Assimilation of the paradigms of the target language.

Uso de un significante español próximo. Use of a similar Spanish signifier
Errors whose origin lies in the learning of L2 itself: the words are similar in form, but their meanings differ.

Formaciones no atestiguadas en español. Forms unattested in Spanish
These are deviations from the norm, faulty idiomatic usage and all forms that do not conform to the Spanish alphabet.

ERRORES SEMÁNTICOS SEMANTIC ERRORS
These mistakes affect the meaning.

Lexemas con semas comunes pero no intercambiables en el contexto. Lexemes with common but non-interchangeable semes in the context
This category covers lexemes that belong to the same semantic field but differ in some syntactic aspect or in their restrictions of use.

Neutralización de semas entre lexemas del mismo campo semántico. Neutralization of semes between lexemes of the same semantic field
In Spanish, several lexemes may share a semantic field and be interchangeable in some contexts but not in others. The learner may know only one of them and apply it across the whole field, or may not know the contexts in which each one is used.

Confusión entre lexemas que comparten contextos. Confusion between lexemes that share contexts
The error arises from not knowing the contexts in which each lexeme is used.

Confusión entre lexemas que requieren diferentes reglas sintácticas. Confusion between lexemes requiring different syntactic rules
The student ignores or neutralizes the difference between lexemes of the same semantic field that share a core of common semes. Often it is not simply a wrong choice: the only lexeme the learner knows is generalized.
1.2.5. "Ser" y "estar". To be
This error can be explained by the fact that Spanish has two verbs for a single meaning. The problem with "ser" and "estar" is not only one of choosing between two verbs that correspond to a single verb in the mother tongue. It is also a lack of mastery of the semantic features of the predicative adjective, which determine whether it expresses a circumstantial or acquired quality or a quality descriptive of the noun. The difficulty increases when the adjective is polysemous and can be used with either verb, or when the choice depends on the speaker's subjective viewpoint (Fernández, 1990).

Perífrasis. Periphrasis
Error produced by the use of expressions that attempt to replace a lexeme that is unknown to the learner or about which he is unsure.
1.2.6.1. Traducción literal de la LM. Literal translation from the MT
Error produced by contact with L1. The learner selects certain linguistic structures that can compensate for his deficiencies and tries to adapt them when learning L2. Lexical interference is total when an element of L2 is replaced by an element of L1 or L3, and partial when the element undergoes a creative process of adaptation, substitution or reduction.

Sustitución por una explicación. Substitution by an explanation
Learners try to make themselves understood by explaining a word they do not know.
1.2.7. Cambios entre lexemas derivados de la misma raíz. Changes between lexemes derived from the same root
These errors are mainly due to the transposition of verbal categories and to the modification of the affix.

Cambios léxicos de registro. Lexical register changes
Students include colloquialisms that are not appropriate in written language, since they belong to the oral register.

PARADIGMAS VERBALES. VERBAL PARADIGMS
In this section we include person and number errors due to incorrect formation of, and confusion in, verbal inflection.
2.1.1. Confusión entre la primera y tercera persona. Confusion between first and third person
Interchange of the first and third person singular forms, which show minimal formal differences; this is possibly the most frequent confusion.

Confusión entre conjugaciones. Confusion between conjugations
Incorrect choice of conjugation; learners tend to use the conjugation of verbs that are more familiar to them.

Confusión entre las formas irregulares en las que diptonga la vocal tónica. Confusion between the irregular forms in which the stressed vowel diphthongizes
The learner fails to diphthongize the stressed vowel because he or she does not know the irregular verb forms. That is, the learner does not know that the stressed vowel "e" becomes "ie", "o" becomes "ue", "i" becomes "ie" and "u" becomes "ue" (pensar: pienso; poder: puedo; adquirir: adquiero; jugar: juego).
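The alternation described above depends on stress: the diphthong appears only when the stem vowel is stressed. The following Python sketch generates present indicative forms with and without diphthongization, so that the learner's form (for example "penso") can be set against the target form ("pienso"). It assumes the verb is already known to be stem-changing and takes the last "e", "o", "i" or "u" of the stem as the alternating vowel. It ignores other irregularities such as "tengo" or "huelo". The function name is hypothetical.

```python
# Diphthongization of the stressed stem vowel, as listed in the text.
DIPHTHONG = {"e": "ie", "o": "ue", "i": "ie", "u": "ue"}

# Present indicative endings: yo, tú, él/ella, nosotros, vosotros, ellos.
ENDINGS = {
    "ar": ["o", "as", "a", "amos", "áis", "an"],
    "er": ["o", "es", "e", "emos", "éis", "en"],
    "ir": ["o", "es", "e", "imos", "ís", "en"],
}

# The stem is stressed in every person except nosotros and vosotros.
STEM_STRESSED = [True, True, True, False, False, True]


def present_indicative(infinitive: str, diphthongizes: bool) -> list[str]:
    """Present indicative forms; diphthongizes=False models the learner's error."""
    stem, conj = infinitive[:-2], infinitive[-2:]
    # Position of the last e/o/i/u in the stem (simplifying assumption).
    pos = max(stem.rfind(v) for v in DIPHTHONG)
    if diphthongizes and pos < 0:
        raise ValueError(f"no alternating vowel found in {infinitive!r}")
    forms = []
    for ending, stressed in zip(ENDINGS[conj], STEM_STRESSED):
        s = stem
        if diphthongizes and stressed:
            s = stem[:pos] + DIPHTHONG[stem[pos]] + stem[pos + 1:]
        forms.append(s + ending)
    return forms


print(present_indicative("pensar", True))   # target forms
print(present_indicative("pensar", False))  # forms produced by the error
```

Because the unstressed persons keep the plain vowel ("pensamos", "podemos"), a learner who has only met those forms has no evidence for the diphthong. This is one plausible reason why the error persists.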

Confusión en el cambio vocálico de la raíz. Confusion in the root vowel change
Error produced by ignorance of the vowel-change rule of some irregular verbs. In the preterite (simple past), some third-conjugation verbs change the stem vowel "e" to "i" and the stem vowel "o" to "u" in the third person singular and plural (pedir: pidió, pidieron; dormir: durmió, durmieron).
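The rule above can be sketched in Python to contrast the target preterite forms with the forms a learner produces when the vowel change is not applied (for example "pedió" for "pidió"). This is an illustrative sketch under stated assumptions. The caller must already know that the verb undergoes the change, because some -ir verbs with "e" in the stem, such as "permitir", do not. The last "e" or "o" of the stem is taken as the changing vowel. The function name is hypothetical.

```python
# Vowel raising in the preterite of some -ir verbs.
RAISING = {"e": "i", "o": "u"}

# Preterite endings for -ir verbs: yo, tú, él/ella, nosotros, vosotros, ellos.
PRETERITE_IR = ["í", "iste", "ió", "imos", "isteis", "ieron"]

# The change affects only the third person singular and plural.
RAISED = [False, False, True, False, False, True]


def preterite_ir(infinitive: str, stem_changing: bool) -> list[str]:
    """Preterite forms of an -ir verb; stem_changing=False models the learner's error."""
    if not infinitive.endswith("ir"):
        raise ValueError("the rule applies only to third-conjugation (-ir) verbs")
    stem = infinitive[:-2]
    pos = max(stem.rfind(v) for v in RAISING)  # last e/o in the stem
    if stem_changing and pos < 0:
        raise ValueError(f"no e/o in the stem of {infinitive!r}")
    forms = []
    for ending, raised in zip(PRETERITE_IR, RAISED):
        s = stem
        if stem_changing and raised:
            s = stem[:pos] + RAISING[stem[pos]] + stem[pos + 1:]
        forms.append(s + ending)
    return forms


print(preterite_ir("pedir", True))   # target forms
print(preterite_ir("pedir", False))  # forms produced by the error
```

The sketch searches only for "e" and "o", so the "u" that follows "g" in "seguir" is not mistaken for the changing vowel, and "siguió" is generated correctly.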

CONCORDANCIAS. AGREEMENTS
By agreement we mean "a form of internal relationship between elements of the sentence, consisting of equality of gender and number (...) between the noun, adjective, article and pronoun, and of equality of number and person between the verb and its subject" (Lázaro Carreter, 1971: 105). The following subsections focus on agreement within nominal categories, on subject-verb agreement and on verbal agreement.
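The nominal part of this definition can be illustrated with a small Python sketch. It checks whether the determiners and adjectives in a noun phrase match the head noun in gender and number. The `Token` class, the hand-assigned features and the example phrases are assumptions made for illustration; a real annotation pipeline would need a morphological analyzer to supply them. Subject-verb agreement in number and person is not covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    form: str
    pos: str               # "NOUN", "DET" or "ADJ"
    gender: Optional[str]  # "m", "f", or None for forms invariable in gender
    number: str            # "sg" or "pl"


def agreement_errors(phrase: list[Token]) -> list[tuple[str, str]]:
    """Return (form, feature) pairs that do not agree with the head noun."""
    nouns = [t for t in phrase if t.pos == "NOUN"]
    if len(nouns) != 1:
        raise ValueError("expected exactly one head noun")
    head = nouns[0]
    errors = []
    for t in phrase:
        if t is head:
            continue
        if t.gender is not None and t.gender != head.gender:
            errors.append((t.form, "gender"))
        if t.number != head.number:
            errors.append((t.form, "number"))
    return errors


# "las casas blanco": the adjective agrees in neither gender nor number.
phrase = [
    Token("las", "DET", "f", "pl"),
    Token("casas", "NOUN", "f", "pl"),
    Token("blanco", "ADJ", "m", "sg"),
]
# "el problema grande": correct; "grande" is invariable in gender.
ok_phrase = [
    Token("el", "DET", "m", "sg"),
    Token("problema", "NOUN", "m", "sg"),
    Token("grande", "ADJ", None, "sg"),
]
print(agreement_errors(phrase))
print(agreement_errors(ok_phrase))
```

Treating gender as optional matters for Spanish, since many adjectives ("grande", "verde") have a single form for both genders and should never be flagged.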

En género. Gender
Gender agreement errors between the noun and the adjectives, determiners and pronouns that refer to it.

Preferencia por el masculino. Preference for the masculine
The student tends to use the unmarked form, that is, the masculine rather than the feminine.

Discordancia entre elementos alejados. Disagreement between distant elements
The error is caused by the distance between the adjective or determiner and the noun that determines the gender marking.

En número. Number
Agreement errors in number between the noun and its adjacent elements, or between the pronoun and its referent.
Loss of control over the presence/absence opposition of the article, that is, its inappropriate omission and its unnecessary use.

Adición. Addition
Error caused by the unnecessary presence of the article.
2.3.1.1. Uso innecesario del artículo para sustantivos no actualizados. Unnecessary use of the article with non-actualized nouns
The article is used unnecessarily with nouns that are not actualized or determined in the context.

2.3.1.2. Uso innecesario del artículo en construcciones introducidas por la preposición "de". Unnecessary use of the article in constructions introduced by the preposition "de"
Error generated by the multifunctionality of the complements introduced by the preposition "de".

Omisión del artículo en sustantivos determinados. Omission of the article with determined nouns
Violation of the norm that requires the article with a 'determined' noun. A noun is determined when it has already been introduced in the context, when a complement in the same sentence determines it, or when the context clearly delimits it.

Omisión en nombres propios. Omission in proper names
With proper names, the errors stem from overgeneralizing the most common rule (omitting the article before proper names), since these are names that have become fixed in Spanish with the article.
2.3.3. Elección entre las formas determinada/indeterminada. Choice between definite/indefinite forms
Violation of the rule that requires the article with a 'determined' noun, that is, a noun already introduced in the context, determined by a complement in the same sentence, or clearly delimited by the context. Conversely, the article is used unnecessarily with nouns that are not actualized or determined in the context.

Frases hechas y unidades léxicas complejas. Idioms and complex lexical units
The omission or unnecessary use of the article occurs in idioms, either because the idiom is blended with another similar expression or because of interference from the MT.

DEMOSTRATIVOS. DEMONSTRATIVES
2.4.1. Uso de "este" por "aquel/ese" en la deixis temporal. Use of "este" for "aquel/ese" in temporal deixis
The error concerns temporal deixis: in Spanish, "aquel/ese" is used to refer to the past and "este" to refer to the present.

Uso anafórico dentro del discurso. Anaphoric use within discourse
In discourse, anaphoric use departs from correct Spanish usage.
The negative indefinite is used erroneously in three cases. It may appear in the plural, although it can only be singular. The indefinite article, singular or plural, may appear in its place. Or another, non-negative indefinite may replace it, or it may be omitted.
The clitic pronoun, both as direct object and as indirect object, is omitted because the student considers it redundant; the error lies in neglecting the double object construction. However, the omission occurs both in constructions with a double object and in those without one.
2.7.5. Uso innecesario del pronombre átono. Unnecessary use of the unstressed pronoun
This is a case of hypercorrection, which occurs when students, aware that omitting the pronoun is an error, use it where it is not needed.
2.7.6. Otros problemas relacionados con las formas átonas. Other problems related to unstressed forms
Confusion between direct object and indirect object forms, resulting in leísmo, loísmo and laísmo, and repetition of the noun phrase.
ONOMÁZEIN 56 (June 2022): 37-67. Mónica Belda Torrijos. Taxonomy and labeling of errors in the use of Spanish as a foreign language
2.7.7.1. "Se" lexicalizado o modificador léxico. Lexicalized "se" or lexical modifier
Incorrect use of verbs in which the pronoun is obligatory because it has been lexicalized, and of verbs in which its presence or absence changes the meaning.
2.7.7.3. Omisión del "se" intransitivador. Omission of intransitivizing "se"
The presence or absence of the pronoun determines whether the verb is transitive or intransitive in the sentence.

Intensificador subjetivo. Subjective intensifier
The pronoun serves to highlight the object of the action or to intensify the quality or magnitude of the action itself. The errors found here consist of its unnecessary use.

Intransitivador. Intransitivizer
Errors involving verbs in which the presence or absence of the pronoun determines whether the sentence is transitive or intransitive.
The improper use of past tenses produces a change of perspective throughout the narrative. In addition, the omission of connectors obscures the correlation between actions and the transition from one stage of the discourse to another, and with it the conditions that call for the imperfect or the preterite perfect.
Erroneous choice or omission of the subjunctive mood in temporal subordinate clauses introduced by "cuando" when referring to the future, and in clauses introduced by "antes de" or "antes de que".

Oraciones finales. Purpose clauses
Erroneous choice or omission of the subjunctive in purpose clauses when the main and subordinate verbs have different subjects.
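The rule behind this error type can be expressed as a simple decision, shown here as a Python sketch. It is a simplification that considers only whether the two subjects coincide. The structure labels and function names are hypothetical, and other factors that can affect purpose clauses in Spanish are ignored.

```python
INFINITIVE = "para + infinitive"
SUBJUNCTIVE = "para que + subjunctive"


def expected_purpose_structure(main_subject: str, subordinate_subject: str) -> str:
    """Structure required in a purpose clause (simplified rule)."""
    # Same subject: "Estudio para aprobar."
    if main_subject == subordinate_subject:
        return INFINITIVE
    # Different subjects: "Te lo explico para que lo entiendas."
    return SUBJUNCTIVE


def is_purpose_clause_error(main_subject: str, subordinate_subject: str,
                            learner_structure: str) -> bool:
    """True if the learner's structure differs from the expected one."""
    return learner_structure != expected_purpose_structure(main_subject, subordinate_subject)


# "Te lo explico para que lo entiendes": indicative instead of subjunctive.
print(is_purpose_clause_error("yo", "tú", "para que + indicative"))  # → True
```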

Oraciones concesivas. Concessive clauses
Erroneous choice or omission of the subjunctive mood in concessive clauses. The student does not know that the use of the subjunctive depends on the uncertainty of the concession or on the merely potential nature of the obstacle it expresses.

Oraciones consecutivas. Consecutive clauses
Erroneous choice or omission of the subjunctive mood in consecutive clauses whose main clause is negative.

ERRORES DISCURSIVOS. DISCURSIVE ERRORS
3.1. COHERENCIA GLOBAL. GLOBAL COHERENCE
3.1.1. Relación tópico-comentario. Topic-comment relationship
This refers to digression from the topic or the introduction of different topics. Information is scattered as the topic is developed, and unnecessary repetitions, inconsistencies and incomprehensible elements appear.

Estructuración. Structuring
This section analyzes the structure and organization of the text, as well as its order and coherence.

Anáfora y deixis. Anaphora and deixis
Use of personal pronouns.

Deícticos. Deictics
Incorrect use of the forms of demonstratives, adverbs and certain lexemes.

Anafóricos. Anaphorics
Errors in the relationship between a personal pronoun, demonstrative, possessive or indefinite and its antecedent.

Repeticiones. Repetitions
Learners repeat a phrase instead of using a pronoun or other substitute.

Índices temporales y espaciales. Temporal and spatial indices
These errors arise from the omission of temporal and spatial references in the narrative.

Enlaces conjuntivos. Connective links
Errors in the selection or omission of connectors.