Building and Using Your Own Corpus and Concordance

"The use of corpora and concordance is now an area of considerable interest"

(Coniam, 1997: 199)

Building Your Own Corpus

There are two types of corpus:

1) A corpus of a specific genre of text, e.g. academic articles, business letters or newspaper feature articles.
Building a specific corpus only requires finding the relevant files and downloading them. The size required for each corpus depends on how many examples of the features being studied it needs to contain. Some kinds of article are easy to find on particular websites; or
2) A general corpus, which includes texts from a wide variety of different genres.
Building a general corpus means turning to the world's largest corpus, the Internet. To use the Internet as a corpus, you need a search engine with wide coverage that searches within pages as well as meta-tags.
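Once the files for a specific corpus have been downloaded, it helps to know how many running words the corpus contains, since the size needed depends on the number of examples required. The following is a minimal sketch, assuming each downloaded text has been saved as a plain-text `.txt` file in a single folder; the function names `load_corpus` and `corpus_size` are hypothetical, not part of any concording program.

```python
import re
from pathlib import Path


def load_corpus(folder):
    """Read every .txt file in folder (assumed layout: one downloaded text
    per file) and return a dict mapping file name to text."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob("*.txt"))
    }


def corpus_size(corpus):
    """Total number of running words across all texts in the corpus.
    A 'word' here is simply a run of letters, digits or underscores."""
    return sum(len(re.findall(r"\w+", text)) for text in corpus.values())


if __name__ == "__main__":
    # Hypothetical folder name; point it at wherever the files were saved.
    corpus = load_corpus("my_business_letters")
    print(len(corpus), "texts,", corpus_size(corpus), "words")
```

Counting running words rather than files makes it easier to compare a home-made corpus with the corpus sizes discussed later in these notes.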

Make your Own Concordance

Having decided to use the Internet as a corpus, the next step is to use it to make a concordance. "Concordance" has two different meanings, the word-count concordance and the key-word-in-context (KWIC) concordance, and each has its own application in language learning.
To produce a word-count concordance from a corpus, you need a user-friendly concording program.

Using Your Corpus and Concordance

A word-count concordance can be used with a specific corpus, although most concording programs are limited in the kinds of concordance they can produce from it. The main pedagogical use of a word-count concordance is in course and materials preparation: it indicates which words need to be taught for a specific corpus and helps in finding representative texts to use as teaching materials.

Teachers can also present example concordances in class and ask students to induce the rules behind each one. This encourages students to see the benefit of inducing their own rules from language data. Moreover, when teacher-chosen examples of use are presented as a concordance, the teacher can steer students towards the point being learned. However, much of the language data students meet has not been vetted by teachers, so students must learn to judge what is valuable and to learn from many kinds of examples, not only from unambiguous illustrations of a language point.

An investigation by the writers shows the value of concordances built by students themselves, without help from teachers or ready-made concordances, using only programs the students are already familiar with. One possible application is students' use of self-selected concordances to self-correct their writing. Another is for the teacher to raise awareness by asking questions about a word and having students build their own concordance to answer them, noticing what they can discover from the data. Corpora and concordances therefore have a range of uses in language learning, from the standard use of teacher-created concordances in the classroom, through awareness-raising questions, to students' self-correction of their own language. In conclusion, using concordances and corpora helps both students and teachers to learn about language and about language learning itself.

Word-count concordance for the sentence 'The boss was the same old boss':

boss 2
the 2
old 1
same 1
was 1

To produce a word-count concordance from a corpus, a concording program is helpful, although some other programs can also create concordances. 2.06 PM, 28 March 2009
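A word-count concordance like the boss example can be sketched in a few lines of Python. This is a minimal sketch, not the method of any particular concording program: it assumes that a word is a run of letters and apostrophes, that upper and lower case are treated as the same word, and that the list is sorted by frequency with ties in alphabetical order. The function name `word_count_concordance` is illustrative.

```python
import re
from collections import Counter


def word_count_concordance(text):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    # Lower-case first so that "The" and "the" are counted together.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically within equal counts.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


for word, count in word_count_concordance("The boss was the same old boss"):
    print(word, count)
```

Run on the example sentence, this prints the same five lines as the list above: boss 2, the 2, old 1, same 1, was 1.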

Using Concordance Programs in the Modern Foreign Languages Classroom

A "concordance", according to the Collins Cobuild English Dictionary, is:

“An alphabetical list of the words in a book or a set of books which also says where each word can be found and often how it is used.”

In "Using concordance programs in the modern foreign languages classroom", Graham Davies describes how word-count concordances can be used in creating glossaries and dictionaries, and presents the concordance as an extremely useful tool for language teachers.

In this second sense, a concordance is a list of occurrences of a word taken from a piece of authentic language, with the keyword displayed in the centre of the page and shown with part of the context in which it occurs.


Concordance 1 on the word "sin":

1.  Thus from my lips, by yours, my  sin  is purged.
2.            Then have my lips the  sin  that they have took.
3.                                   Sin  from thy lips? O trespass sweetly urged!
4.                       Give me my  sin  again.

Text used as the basis for the concordance (the keyword is "sin"):

Ay, pilgrim, lips that they must use in prayer.
O, then, dear saint, let lips do what hands do;
They pray, grant thou, lest faith turn to despair.
Saints do not move, though grant for prayers’ sake.
Then move not, while my prayer’s effect I take.
Thus from my lips, by yours, my sin is purged.
Then have my lips the sin that they have took.
Sin from thy lips? O trespass sweetly urged!
Give me my sin again.

A computer-generated concordance

Now look at that same concordance, displayed with fuller context (here between 75 and 80 characters each side, including blank spaces):

1. Move not, while my prayer’s effect I take. Thus from my lips, by yours, my sin is purged. JULIET Then have my lips the sin that they have took. ROMEO

2. Thus from my lips, by yours, my sin is purged. JULIET Then have my lips the sin that they have took. ROMEO Sin from thy lips? O trespass sweetly urged!

3. is purged. JULIET Then have my lips the sin that they have took. ROMEO Sin from thy lips? O trespass sweetly urged! Give me my sin again.

4. they have took. ROMEO Sin from thy lips? O trespass sweetly urged! Give me my sin again.

The KWIC (Key Word In Context) display and the fuller-context display are both useful, depending on what you want to do with the material.
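The two displays differ only in how much context is kept on each side of the keyword. Below is a minimal KWIC sketch, assuming whole-word, case-insensitive matching and a fixed window of characters on each side; the function name `kwic` and its `width` parameter are illustrative and not taken from any particular concordance program. A small `width` gives the short KWIC lines, while a value of around 75-80 approximates the fuller-context display.

```python
import re


def kwic(text, keyword, width=30):
    """Key Word In Context: one line per occurrence of keyword, with up to
    `width` characters of context on each side."""
    text = " ".join(text.split())  # collapse line breaks into single spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}}{match.group()}{right}")
    return lines


sample = "\n".join([
    "Thus from my lips, by yours, my sin is purged.",
    "Then have my lips the sin that they have took.",
    "Sin from thy lips? O trespass sweetly urged!",
    "Give me my sin again.",
])

for line in kwic(sample, "sin"):
    print(line)
```

Because the left context is padded to exactly `width` characters, the keyword always begins at the same column, which is what lets the eye scan down the centre of a KWIC display.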

So there you have the basic ingredients for any concordance: a text base and a procedure. But the procedure was manual and it gave us an extremely limited concordance (only four citations), and the meanings of the word "sin" that appear in it are rooted in the poetic world of Romeo and Juliet. Below, in contrast, is a concordance on the same keyword, based this time on a 25-citation sample created by a concordancer, using a corpus of contemporary English that includes British and American books, ephemera, newspapers, magazines, radio transcripts and transcriptions of ordinary conversations.

Concordance 2 on the word "sin":

List of uses of concordances for language teachers

· The teacher can use a concordance to find examples of authentic usage to demonstrate features of vocabulary, typical collocations, a point of grammar or even the structure of a text.

· The teacher can generate exercises based on examples drawn from a variety of corpora, for example gap-filling exercises and tests.

· Students can work out rules of grammar or usage and lexical features for themselves by searching for key words in context. Depending on their level, they can be invited to question some of the rules, based on their observation of patterns in authentic language.

· Students can be more active in their vocabulary learning: depending on their level, they can be invited to discover new meanings, to observe habitual collocations, to relate words to syntax, or to be critical of dictionary entries.

· Students can be invited to reflect on language use in general, based on their own explorations of a corpus of data, thus turning themselves into budding researchers.
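As an illustration of the gap-filling idea in the list above, concordance lines can be turned into an exercise by blanking out the keyword. This is a minimal sketch, assuming whole-word, case-insensitive matching; the function name `gap_fill` and the default gap string are hypothetical choices, not features of any named program.

```python
import re


def gap_fill(lines, keyword, gap="_____"):
    """Turn concordance lines into a gap-filling exercise by replacing every
    whole-word occurrence of keyword (case-insensitive) with a gap."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [pattern.sub(gap, line) for line in lines]


citations = [
    "Thus from my lips, by yours, my sin is purged.",
    "Then have my lips the sin that they have took.",
    "Sin from thy lips? O trespass sweetly urged!",
    "Give me my sin again.",
]

for number, line in enumerate(gap_fill(citations, "sin"), start=1):
    print(f"{number}. {line}")
```

The first line printed is "1. Thus from my lips, by yours, my _____ is purged." With citations drawn from a larger corpus, students would then work out the missing word from its recurring contexts.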

Concordance software and corpora

Concordances for Windows

Concordance by R.J.C. Watt of Dundee University makes both a full concordance and a KWIC concordance (which Watt calls a “Fast Concordance”). The “Fast Concordance” is really fast. The “Full Concordance” is, of course, a bit slower, and making a full concordance of a very large corpus requires a lot of computer power and patience. Even so, a full concordance of Sir Walter Scott’s Ivanhoe (about 200,000 words) took about 5 minutes on a Pentium 166MHz machine with 64MB of RAM.
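A full concordance in the dictionary sense quoted earlier, an alphabetical list of the words in a text that also says where each word can be found, can be sketched as an index from each word to the line numbers where it occurs. This is an assumption about what such a program does in principle, not a description of how Watt's Concordance works internally; the function name `full_concordance` is illustrative.

```python
import re
from collections import defaultdict


def full_concordance(text):
    """Map every word (lower-cased) to the line numbers where it occurs,
    one entry per occurrence, with words in alphabetical order."""
    index = defaultdict(list)
    for line_number, line in enumerate(text.splitlines(), start=1):
        # Letters plus straight and curly apostrophes count as word characters.
        for word in re.findall(r"[a-z'’]+", line.lower()):
            index[word].append(line_number)
    return dict(sorted(index.items()))


sample = "\n".join([
    "Thus from my lips, by yours, my sin is purged.",
    "Then have my lips the sin that they have took.",
    "Sin from thy lips? O trespass sweetly urged!",
    "Give me my sin again.",
])

for word, line_numbers in full_concordance(sample).items():
    print(word, line_numbers)
```

Indexing every word of the text, rather than searching for one keyword, is why a full concordance of a large corpus takes so much longer than a KWIC search.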


How big a corpus one needs also depends on what it is to be used for. Basically, the corpus must be big enough to contain enough occurrences of the language elements we want to study. For comparison: Cobuild uses a corpus of about 200 million words of written and spoken UK, US and NZ English in dictionary compilation. Birmingham University’s The Bank of English corpus comprises about 500 million words, and is well suited for linguistic research. Letting our students loose on such vast masses of text is, in most cases, likely to create more confusion than clarity. Less will often do. But, of course, if confronted with a really ardent advocate of misguided ideas of what is correct usage and what is not, a failure to find examples of the misguided expressions in a corpus of 400-500 million words just might make an impression on him/her. Chris Tribble argues that a specialist micro corpus of about 25,000-30,000 words will be quite adequate for most educational purposes. On the other hand, see Tribble and Jones (1997:11): “We tend to think that a word like crime is a common word but it actually occurs only about 20 times in every one million words of a 'balanced' collection of texts such as the Longman-Lancaster corpus”. Later we’ll show examples of what can be done with a corpus of about 50,000 German words.
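The Tribble and Jones figure makes the size question concrete, because the expected number of hits for a word grows in proportion to corpus size. The back-of-the-envelope sketch below assumes the word is spread evenly through the corpus (real texts are rarely so uniform); the function name `expected_occurrences` is illustrative.

```python
def expected_occurrences(per_million, corpus_size):
    """Expected number of hits for a word that occurs `per_million` times
    per million words, in a corpus of `corpus_size` running words."""
    return per_million * corpus_size / 1_000_000


# "crime": about 20 occurrences per million words (Tribble and Jones 1997).
for size in (25_000, 30_000, 200_000_000, 500_000_000):
    print(f"{size:>11,} words: about {expected_occurrences(20, size):g} occurrences")
```

On this estimate a 25,000-30,000-word micro corpus would be expected to contain "crime" less than once (about 0.5-0.6 times), while a 200-million-word corpus would contain it about 4,000 times. A micro corpus is therefore best suited to studying words and patterns that are frequent within its own specialist texts.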

Preparing teachers and students for working with concordances

Teachers need to prepare themselves and to create discussion topics for the class, so as to encourage interaction between the students and the teacher. Besides preparing a learning task and discussion topics, the teacher also needs to prepare the text for the concordance. This preparation needs to take into account a critical mass of ideas, control of contextual information, scrupulous respect for the original materials, and a decision on the degree of editorial control needed. Preparing students to learn from concordances is important in terms of: the obvious things that you forgot to mention; independence from authority; discussion topics and the hard work of learning from raw data; the dynamics and pacing of group work at the computer; and helping students to move on (transferable skills). 2.07 PM, 28 March 2009


