Vietnam National University, Hanoi
College of Foreign Languages
-----------------
Designing & evaluating an English reading
test for the non-majors of Civil Engineering
at Haiphong Private University
Thiết kế và đánh giá một bài kiểm tra tiếng Anh chuyên ngành
cho sinh viên xây dựng dân dụng tại
Trường Đại học Dân lập Hải Phòng
M.A. Minor Thesis
Field: Methodology
Code: 50702
Course: K11
By: Nguyen Thi Phuong Thu
Supervisor: Tran Hoai Phuong, MEd.
Hanoi - August 2005
Acknowledgements
During the process of further studying and conducting this research I was really honored to receive guidance, assistance, and encouragement from various lecturers as well as supervisors among whom I would like to acknowledge my sincere thanks to the leaders of the College of Foreign Languages who have given me permission and created favorable conditions for study and research.
I would also like to thank my supervisor, Mrs. Tran Hoai Phuong, MEd, who really sympathized with me and also gave me great help as well as invaluable guidance and encouragement from the very start to the end of my research.
It is also my pleasure to give my special thanks to the students of classes XD 501, XD 502 and XD 503 at Hai Phong Private University who enthusiastically took part in doing the test and helped me collect the results of the test.
I also benefited greatly from talks and discussions with my colleagues so let me thank all of them for what they have directly or indirectly contributed.
And finally I really want to thank my beloved husband who always gives great support to my further study.
Nguyen Thi Phuong Thu
List of abbreviations
1. HPU Haiphong Private University
2. CE Civil Engineering
3. CEE Civil Engineering English
4. ESP English for Specific Purposes
5. MCQ Multiple Choice Question
6. T True
7. F False
8. M Mean
9. Σ Sum of
10. N The number of the scores
11. x The raw score
12. f The frequency with which a score occurs
13. H The highest value
14. L The lowest value
15. SD Standard Deviation
16. FV Item difficulty
17. R The number of the correct answers
18. ve very easy
19. e easy
20. d difficult
21. vd very difficult
22. D Item discrimination
23. CU The number of the correct answers of the upper half
24. CL The number of the correct answers of the lower half
25. gd good discrimination
26. md bad discrimination
27. bi bad item
28. ρ Spearman rho correlation coefficient
29. SU Score on the upper half
30. SL Score on the lower half
Table of contents
Acknowledgement
List of abbreviations
Part I: Introduction
1.Rationale
2.Aims of the study
3.Scope of the study
4.Methods of the study
5.Design of the study
Part II: Development
Chapter one: Literature review
1.1.Language testing
1.2.Communicative language tests
1.3.Testing reading skills
1.3.1.Multiple choice questions
1.3.2.Short answer questions
1.3.3.Cloze
1.3.4.Selective deletion gap filling
1.3.5.C tests
1.3.6.Cloze elide
1.3.7.Information transfer
1.3.8.Jumbled sentences
1.3.9.Matching
1.3.10.Jumbled paragraphs
1.4.Major characteristics of a good test
1.4.1.Reliability
1.4.2.Validity
1.4.2.1.Content validity
1.4.2.2.Face validity
1.4.2.3.Criterion-related validity
1.4.2.4.Construct validity
1.4.3.Practicality
1.4.4.Discrimination
1.5.Achievement tests
1.5.1.Class progress test
1.5.2.Final achievement test
Summary
Chapter two: Methodology
2.1.A quantitative study
2.2.The selection of participants
2.3.The materials
2.4.Methods of data collection and data analysis
2.5.Limitations of the research
Summary
Chapter three: Discussion
3.1-The content area of the test
3.2-The relative weights of the different parts of the test
3.3-Constructing the test
3.4-Administering the test
3.5-Marking the test
3.6-Test scores interpreting and evaluation
3.6.1.The frequency distribution
3.6.2.The central tendency
3.6.2.1.The mode
3.6.2.2.The median
3.6.2.3.The mean
3.6.3.The dispersion
3.6.3.1.The low-high
3.6.3.2.The range
3.6.3.3.The standard deviation
3.7-Test item analysis and evaluation
3.7.1.Item difficulty
3.7.2.Item discrimination
3.8.Estimating reliability
Summary
Part III: Conclusion and recommendations
References
Appendices
Part I: Introduction
1.Rationale
Testing is a matter of concern to all teachers, whether we are in the classroom or engaged in syllabus and materials design, administration or research. We know quite well that good tests can improve our teaching and stimulate student learning. Although we may not want to become measurement experts, we may have to periodically evaluate student performances and prepare reports on student progress.
Haiphong Private University (HPU) is a university in which there are a number of classes of Civil Engineering (CE) for students of the Construction Department. Generally speaking, non-majors, especially the students of this department, lack background knowledge of English. The non-majors of CE have the chance to learn General English (GE) during their first three terms to prepare for their 120 periods of English for Specific Purposes (ESP) in the fourth term. In fact, this type of English is quite demanding for them and many had to admit that they could not learn it well. As a result, many students failed after each final examination.
The causes of the above situation are various. It might be because some students are either too hesitant or too lazy to learn a new subject. It might also be because some students could not overcome the difficulties they usually meet during their study; for example, their ESP is too new or too demanding for them, or they have to attend so many periods per week that little time is left for other subjects. However, a reason which is no less important and which needs taking into account is the matter of testing. In general, teachers at HPU are well qualified and, when teaching, quite enthusiastic, with good teaching methodology. However, the results of their students' tests are not always satisfactory: the scores gained were often lower than expected. Moreover, we teachers cannot deny the fact that sometimes the test results do not accurately reflect the testees' language competence.
According to Brown (1994a: 373) and Hughes (1989: 1) “A great deal of language testing is of very poor quality. Too often language testing has a harmful effect on teaching and learning and too often they fail to measure accurately whatever it is they are intended to measure.”
For all the above reasons, the author of this research would like to take this opportunity to undertake the study entitled "Designing a reading test for the non-majors of Civil Engineering at Haiphong Private University" with a view to evaluating the students' reading ability after one term's study in the last school year (2004-2005), as well as to gaining some knowledge and experience of foreign language testing for herself through completing the study.
2.Aims of the study
The minor thesis is aimed at designing an achievement test of ESP reading which would be conducted in a class of Civil Engineering English at HPU. The test was considered as a final examination. The results of the test were then analysed, evaluated, and interpreted. The test takers are non-English majors.
The specific aims of the research are:
to assess the learners' achievement in improving their reading skill in Civil Engineering English after a 120-period reading course.
to measure their aptitude for the reading skill.
to diagnose their strengths and weaknesses in reading the subject matter.
to find out whether or not the test satisfies the qualities of a good test. From there the test will measure the effectiveness of the teacher’s teaching. If the test is not a good one, some suggestions will be made for a better test form.
3.Scope of the study
“Not all language tests are of the same kinds. They differ with respect to how they are designed, and what they are for; in other words, in respect to test method and test purpose.” (Mc Namara, 2000: 5). For example, in terms of method, there are paper-and-pencil language tests, performance tests, etc. And in terms of purpose, there are achievement tests, proficiency tests, and so on. In fact, the same form of test may be used for different purposes, although in other cases the purpose may affect the form.
Due to the limitation of time and ability, it is impossible for the author to design tests of all these types or of all the four language skills (speaking, writing, listening and reading). Therefore, this minor thesis is limited to designing and evaluating an achievement test of ESP reading for the non-majors at HPU and the reading tested was for communicative purposes.
4.Methods of the study
In this minor thesis the author designed an achievement test of reading, administered it and then evaluated it, so the method adopted is quantitative. The data will be collected through testing the students’ reading ability of Civil Engineering English.
5.Design of the study
The study is composed of three parts:
*Part I is the presentation of basic information such as the rationale, the scope of the study, the aims of the study, the methods of the study and finally the design of the study.
*Part II includes three chapters:
+ Chapter one is the literature review in which the literature that is related to language testing and major characteristics of a good reading test is presented.
+ Chapter two is concerned with research methodologies including the methods adopted in doing the research, the selection of participants, the materials, the methods of data collection and data analysis.
+ Chapter three is the discussion, which is the main part of the study. This chapter reviews how a reading test of Civil Engineering for the non-majors at HPU was designed, administered, and then evaluated.
*Part III includes the conclusion and recommendations for further research on the topic.
Following these parts are the references and appendices.
Part II : Development
Chapter one : Literature review
This chapter will provide an overview of the theoretical background of the research. It is composed of five small sections. Section 1.1 brings a significant insight into the concept of language testing. Section 1.2 is the introduction of communicative language tests. Testing reading skills will be discussed in section 1.3 which is followed by section 1.4 with the investigation into major characteristics of a good test. The final area to be mentioned is a brief review of achievement tests which is presented in section 1.5.
1.1.Language testing
An understanding of language testing is relevant both to those who are actually involved in creating language tests, and also to those who are involved in using tests or the information tests provide in practical research contexts. For this very reason, this section wishes to take a close look at what a language test is.
Most researchers agree that language tests play many important roles in life. Firstly, the moment one takes a test can be considered an important transitional moment in his life; for example, a pupil wishing to enter a university has to pass the entrance tests, a job seeker has to do a certain test so that the employer will know whether he is competent, or somebody who wants to drive a motorbike or a car has to pass a driving test, etc. Secondly, language tests are also important to many occupations. We teachers rarely teach without testing our students' performance in the subjects. Tests help us to put them in the right places; therefore, language tests, if used properly, can be considered a valuable teaching device for any teacher, and they will contribute positively to the development of both teachers and learners. Last but not least, any researcher who needs a measurement of the language proficiency of the subjects cannot do it without using an already existing test or designing his or her own test.
For Carroll (1968), a test in general will certainly tell something about a testee's characteristics. Thanks to the results of his test, it is possible for a teacher to judge whether a student is good or bad at the subject tested. Carroll provides the following definition of a test: “a psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual.” (Carroll, 1968: 46)
According to Hughes (1989: 9), tests can be classified as follows:
Proficiency tests
Achievement tests
Class progress tests
Final achievement tests
Diagnostic tests
Placement tests
Aptitude or Prognostic tests
Direct tests versus indirect tests
Discrete-point tests versus integrative tests
Norm-referenced tests versus criterion-referenced tests
Objective tests versus Subjective tests
Communicative tests
Generally there are some approaches to testing, for example the essay-translation approach, the structuralist approach, the integrative approach, or the communicative approach. However, in this minor thesis, I would like to choose only the communicative approach to testing. This approach focuses on how the language is used in communication (on 'meaning' rather than 'form'). It attempts to obtain different profiles of a learner's performance in the language.
The development and the use of language tests involve an understanding of the nature of communicative language use and language ability, on the one hand, and of measurement theory, on the other. Each of these areas is complex in its own right.
In short, like teaching, testing is important to any teacher as well as to any student. It is difficult to deny that testing cannot be separated from teaching; testing can even be seen as part of teaching. Therefore, we teachers should pay great attention to the issue of testing in our teaching.
1.2.Communicative language tests
There is one thing that is essential to the activities of designing a test and interpreting the meaning of test scores. It is the view of language and language use embodied in the test. The term ‘test construct’ refers to these aspects of knowledge or skill possessed by the candidate which are being measured. To define test construct it is important to be clear about what knowledge of language consists of and how that knowledge is used in actual performance (i.e. language use). It is also essential to understand what view the test takes of language use because if the view the test takes is different, then the test will be different. As a result, the reporting of score will be different, and the test performance will be interpreted differently. Therefore, the difference of format between tests is not just incidental; it implies a difference between views of language and language use. Accordingly, communicative language tests are different from other types of tests such as discrete point test or integrative and pragmatic tests in the following aspects:
According to Mc Namara (2000: 17) discrete point test focuses on students’ knowledge of the grammatical system, of vocabulary and aspects of pronunciation and tends to test these aspects of knowledge in isolation. With this type of test, multiple choice questions are most suitable. This discrete point tradition of testing is seen as focusing too much on knowledge of the formal linguistic system for its own sake rather than on the way the knowledge is used to achieve communication.
Also, according to Mc Namara, the use of integrative tests reflects a newer orientation in which knowledge of relevant systemic features of language (pronunciation, grammar, vocabulary) is deployed together with an understanding of context. Yet these tests are regarded as time-consuming and difficult to score. For example, an oral interview involves comprehension of extended discourse (both spoken and written), and as a result, besides the disadvantages mentioned above, it also requires trained raters.
Because of those disadvantages, another type of test, the pragmatic test, replaced the old ones. It focuses less on knowledge of language and more on the psycholinguistic processing involved in language use. With this type, the cloze test was seen as the most suitable and was once believed to be easy to construct and relatively easy to score. However, it soon turned out to be measuring the same kinds of things as discrete point tests of grammar and vocabulary. It also failed to test communicative skills.
In the early 1970s, thanks to Hymes's theory of communicative competence (an understanding of language and the ability to use language in context, particularly in terms of the social demands of performance, i.e. knowing a language is more than knowing its rules of grammar), communicative language tests developed, and they have the two following features:
’They are performance tests which require assessment to be carried out when the candidate is engaged in communication, either receptive or productive, or both.
They see language as a sociological phenomenon, focusing on the external, social functions of language while integrative and pragmatic tests see language as an internal phenomenon. With this test, the use of authentic texts and real world tasks may be developed.’ (Mc Namara, 2000: 16).
One of their distinguishing features, which sets them apart from other types of tests, is that besides systemic features of language they require the students' careful study of communicative roles and tasks.
All the reasons discussed above are regarded as a strong impetus that initiates this minor thesis into designing a reading test of ESP for communicative purpose, i.e. it is a communicative language test.
1.3-Testing reading skills
In a reading test, test items are often set based on the text itself. And often, within the same test, more than one type of item is used; two, three, or more of the following types may appear:
1.3.1. Multiple-choice questions (MCQs)
This is one of the most popularly used types for setting a reading comprehension test. When doing this test the candidate is required to select the answer from a number of given options, only one of which is correct. The marking is totally objective. Selecting and setting items are, however, subjective processes, and the decision about which is the correct answer is a matter of subjective judgment on the part of the item writer.
1.3.2. Short answer questions
In the test there are questions which require the candidates to write down specific answers in spaces provided on the question paper.
1.3.3. Cloze
This type is also familiar with students. In the cloze procedure, words are deleted from a text after allowing a few sentences of introduction. The deletion rate is mechanically set, usually between every fifth and eleventh word because deleting too many or too few words can cause problems with test validity. Candidates have to fill each gap by supplying the word they think has been deleted.
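To make the mechanical deletion procedure concrete, the following is a minimal Python sketch (not taken from the thesis; the function name, sample passage, and deletion rate of every seventh word are illustrative assumptions only):

```python
import re

def make_cloze(text, nth=7, lead_in_sentences=2):
    """Delete every nth word after an untouched lead-in and return the
    gapped text plus the answer key (mechanical, fixed-rate deletion)."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    intact = ' '.join(sentences[:lead_in_sentences])
    words = ' '.join(sentences[lead_in_sentences:]).split()

    key = []
    for i in range(nth - 1, len(words), nth):
        key.append(words[i])          # record the deleted word
        words[i] = '______'           # replace it with a gap
    return intact + ' ' + ' '.join(words), key

passage = ("Concrete may be conveyed by wheelbarrow or dump truck. "
           "The mode of transport depends on several factors. "
           "The forms are made of timber or metal of a size and shape "
           "suitable for the finished work, and they must be strong enough "
           "to support the wet material while it is being compacted.")
gapped_text, answer_key = make_cloze(passage)
print(gapped_text)
print(answer_key)
```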
1.3.4. Selective deletion gap filling
In this technique, items are selected for deletion based upon what is known about language, about difficulty in text, and about the way language works in a particular text.
1.3.5. C-Tests
In a C-test, every second word in a text is partially deleted. In an attempt to ensure solutions, students are given the first half of the deleted word. The examinee completes the word on the test paper and an exact word scoring procedure is adopted.
1.3.6. Cloze elide
In cloze elide test, words that do not belong to the original text are inserted into a reading passage and candidates have to indicate where these insertions have been made.
1.3.7. Information transfer
This is a task where the information transmitted verbally is transferred to a non-verbal form, e.g. by labeling a diagram, completing a chart or numbering a sequence of events. This type of test is an objective method for testing the test takers’ understanding of the texts.
1.3.8.Jumbled sentences
This type of test is intended to test the student's understanding of a sequence of stages in a process or events in a narrative. A successful student is the one who can correctly reorder the jumbled or scrambled sentences of a story.
1.3.9.Matching
Like the MCQ test, matching is a familiar type of reading comprehension test. With this test, candidates are required to identify the relationships between a list of entries in one column and a list of responses in another column. Candidates may have to match word with word, sentence with sentence, picture with sentence, etc.
1.3.10.Jumbled paragraphs
Similar to tasks involving jumbled sentences, test tasks with jumbled paragraphs require students to rearrange the given paragraphs in the correct order. To do this students have to read through these paragraphs to get the main idea of the whole text.
In short, for testing reading abilities different methods have been recommended, and a teacher may use one or another depending on certain purposes. For example, to develop the communicative nature of tests, the use of short answer questions, selective gap filling, C-tests, information transfer techniques or other restricted response formats is often preferred.
1.4. Major characteristics of language tests
Tests can certainly serve pedagogical purposes. The most important consideration in designing a language test is its usefulness. This can be defined in terms of qualities such as reliability, validity, practicality, interactiveness, impact, or authenticity. Among these, the four qualities discussed below are the most critical for good tests.
1.4.1. Reliability
Reliability is apparently an essential quality of test scores: if the scores of a test are not relatively consistent, they fail to provide us with the information about the ability we want to measure. Reliability is considered a fundamental criterion against which any language test has to be judged.
‘Reliability is often defined as consistency of measurement’ (Bachman & Palmer, 1996: 19). A reliable test score will be consistent across different characteristics of the testing situation. Thus, reliability can be considered to be a function of the consistency of scores from one set of test tasks to another. In other words, tests should not be plastic in their measurements: if a student takes a test at the beginning of the course and again at the end, any improvement in his score should be the result of differences in his skills and not of inaccuracies in the test. In the same way, it is important that the student's score should be the same (or as nearly the same as possible) whether he takes one version of the test or another and whether one person marks the test or another. Reliability also means ‘the consistency with which a test measures the same thing all the time’ (Harrison, 1987). This can be presented in the figure below:
Figure 1: Reliability, shown as the relationship between scores on test tasks with characteristics A and scores on test tasks with characteristics A'.
There are therefore three aspects to reliability: the circumstances in which the test is taken, the way in which it is marked, and the uniformity of the assessment it makes.
According to Hughes (1989) there are two components of test reliability: the performance of candidates from occasion to occasion and the reliability of the scoring. Therefore, to make tests more reliable, Hughes (1989) gives a long list of clear instructions for what we should do:
take enough samples of behavior,
do not allow candidates too much freedom in choosing what and how to answer,
write unambiguous items,
provide clear and explicit instructions,
ensure that tests are well laid out and perfectly legible,
make sure candidates are familiar with format and testing techniques,
provide uniform and non-distracting conditions of administration,
use items that permit scoring which is as objective as possible,
make comparisons between candidates as direct as possible,
provide a detailed scoring key,
train scorers,
agree on acceptable responses and appropriate scores at outset of scoring,
identify candidates by number, not name, and
employ multiple, independent scoring. (Hughes, 1989: 36-42)
The concept of reliability is particularly important when considering language tests within the communicative paradigm (Porter, 1983). Davies (1965: 14) also shares the same view but admits that ‘reliability is the first essential for any test; but for certain kinds of language test may be very difficult to achieve.’
1.4.2. Validity
The second quality that affects test usefulness is validity. A test is said to be valid if it measures what it is intended to measure. Or in other words, the test may be valid for some purposes, but not for others. For example, if the purpose of a test is to test ability to communicate in a foreign language, then it is valid if it actually tests ability to communicate. If the test is full of questions of grammar, then the test cannot be considered valid. Moreover, if a test is to test reading ability, but it also tests writing, for example, then the test fails to have the validity for testing reading.
However, it is impossible to say whether a test is valid or not valid at all because there are degrees of test validity, i.e. this test may be more valid than that one. Therefore, Moore (1992) defined validity as “the degree to which a test measures what it is supposed to measure” . There are different types of validity such as content, face, construct, criterion-related validity, and they will be all discussed below.
1.4.2.1.Content validity
Among different types of validity, content validity is said to be the most important one, but it is also the simplest. “A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it means to be concerned.” (Hughes, 1989: 22). In order to judge whether or not a test has content validity, we need a specification of all the related aspects that the test is meant to cover, including the skills or the structures. Such a specification should be made at a very early stage in test construction.
According to Weir (1990: 24), the more a test simulates the dimensions of observable performance and accords with what is known about that performance, the more likely it is to have content validity and construct validity. Thus, for Kelly (1978: 8) content validity seems ‘an almost completely overlapping concept’ with construct validity, and for Moller (1982: 68): ‘the distinction between construct and content validity in language testing is not always very marked, particularly for tests of general language proficiency.’ Slightly different from other researchers, Anastasi (1982: 131) defined content validity as ‘essentially the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured.’
So we can see that content validity has been defined differently, but most researchers agree that content validity is highly important for the two following reasons. First, the greater a test's content validity is, the more likely it is to be an accurate measure of what it is supposed to measure. A test in which major areas identified in the specification are under-represented or not represented at all is unlikely to be accurate. Secondly, such a test is likely to have a harmful backwash effect: areas which are not tested are likely to become areas ignored in teaching and learning.
1.4.2.2. Face validity
A test is said to have face validity if it looks as if it measures what it is supposed to measure. Face validity is hardly a scientific concept, yet it is very important. A test which does not have face validity may not be accepted by candidates, teachers, education authorities or employers.
1.4.2.3. Criterion-related validity
There are essentially two kinds of criterion-related validity: concurrent validity and predictive validity. According to Viete (1992), concurrent validity is used to refer to the relationship between the test results and the results of another assessment (using an appropriate, reliable and validated assessment procedure) which was made at approximately the same time. And predictive validity concerns the degree to which a test can predict candidate’s future performance.
1.4.2.4 Construct validity
Like reliability, construct validity is essential to the usefulness of any language test. The term construct validity is used to refer to the extent to which we can interpret a given test score as an indicator of the ability(ies) or construct(s) we want to measure. The purpose of construct validation is to provide evidence that the underlying theoretical constructs being measured are themselves valid. Typically, construct validation begins with a psychological construct that is part of a formal theory. The theory enables certain predictions about how the construct variable will behave or be influenced under specified conditions. The construct is then tested under the conditions specified. If the hypothesized results occur, the hypotheses are supported and the construct is said to be valid. Often this will involve a series of tests under a variety of conditions.
Test validity is the one that is always paid the most attention to since it is an indispensable quality of all good tests. When constructing a test, the first thing to be focused on is test validity. Hughes (1989: 22) agrees that if in a test important parts are not defined or not presented, it will fail to be accurate. He notes that “the greater a test's content validity is, the more likely it is to be an accurate measure of what it is to measure.”
1.4.3. Practicality
Another quality of a good test which should not be forgotten is its practicality. Although it is different in nature from other qualities, practicality is not less important. Unlike reliability and validity, practicality does not pertain to the uses that are made of test scores, but primarily to the ways in which the test will be implemented in a given situation, and to whether the test will be developed and used at all. Practicality often affects a tester’s decisions during the development of a test, i.e., at every stage of his testing.
Practicality can be defined as ‘the relationship between the resources that will be required in the design, development, and use of the test and the resources that will be available for these activities’. (Bachman & Palmer, 1996: 35). This relationship can be represented as in the figure below:
Practicality = Available resources / Required resources
When practicality ≥ 1, the test development and use is practical.
When practicality < 1, the test development and use is not practical.
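As a purely illustrative calculation (the figures below are hypothetical and not taken from this study), suppose 12 scorer-hours are available while the test requires 8:

```latex
\text{practicality} = \frac{\text{available resources}}{\text{required resources}}
                    = \frac{12\ \text{scorer-hours}}{8\ \text{scorer-hours}}
                    = 1.5 \ge 1
\quad\Rightarrow\quad \text{the test is practical to develop and use.}
```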
In a nutshell, when designing a test the tester should always bear in mind this quality, practicality, to ensure that the test is as economical as possible, both in time (preparation, sitting and marking) and in cost (materials and hidden costs of time spent). In other words, a practical test is one which minimizes the use of the available resources, i.e. the required resources must not exceed the available resources.
1.4.4. Discrimination
Finally, a discussion of the basic concepts behind testing would be incomplete without the treatment of the closely related idea of discrimination. According to Harrison (1994:14) discrimination is ‘the extent to which a test separates the students from each other.’ However, the extent of discrimination varies according to each kind of test. For instance, an achievement test should result in a wide range of scores because it is easier to make decisions about where to separate one group of students from another so that they can be awarded different grades. A diagnostic test, however, may be intended to show that nearly all students have learnt the material tested, and in this case they should all get fairly high scores.
1.5. Achievement tests
Different researchers have different points of view of an achievement test. According to Harrison (1983: 65) ‘designing and setting an achievement test is a bigger and more formal operation than the equivalent work for a diagnostic test, because the student's result is treated as a qualification which has a particular value in relation to the results of other students. An achievement test involves more detailed preparation and covers a wide range of material, of which only the sample can be assessed.’
Heaton (1988) defines achievement tests as the ones that are “based on what the students are presumed to have learnt, not necessarily on what they have actually learnt nor on what has actually been taught.”
In Brown's point of view, “an achievement test is related directly to classroom lesson, units or even a total curriculum within a particular time frame.” (Brown, 1994: 259). In other words, an achievement test measures a student's mastery of what should have been taught. It is thus concerned with covering a sample (or selection) which accurately represents the contents of a syllabus or a course book. Unlike a progress test, an achievement test should attempt to cover as much of the syllabus as possible. If we confine our test to only part of the syllabus, the contents of the test will not reflect all that the student has learnt.
Achievement tests can be subdivided into class progress tests and final achievement tests.
1.5.1. The class progress test
The class progress test is often conducted during the course and is developed by the teacher himself after each chapter or each term. He constructs such type of test to judge how successful his teaching is and also to find out what his students have achieved from his teaching. The class progress test is a teaching device and can be considered a good chance for the students to prepare for the final achievement test.
1.5.2. The final achievement test
The final achievement test is more formal and intended to measure achievement on a larger scale (annual exams, entrance exams, final exams). The final achievement test is not written and administered by the teacher himself but by ministries of education, boards of examiners, or members of teaching institutions. A final achievement test is often based on an adopted syllabus and its approach, either the syllabus-content approach or the syllabus-objective approach. If the test is based on the former, its contents should be based directly on a course syllabus or on the textbooks and other materials chosen. If it is based on the latter, its contents are based directly on the objectives of the course.
Summary
In this chapter I have briefly dealt with the concept of a language test, how it is defined and what is important in designing it. Moreover, I also mentioned the concept of communicative language ability in which communicative competence was also discussed. Also, in this chapter the definition of an achievement test as well as testing reading skills were presented because they play an important role in the process of doing this research.
Chapter two: Methodology
This chapter will include a brief introduction of a quantitative study, the selection of participants who took part in doing the test, and the materials from which the test items were taken. The methods of data collection and data analysis are presented afterwards. Finally come the limitations of the research.
2.1.A quantitative study
Like qualitative research, quantitative research comes in many approaches including descriptive, correlational, exploratory, quasi-experimental, and true-experimental techniques.
As a teacher of Civil Engineering English, I designed this reading test to understand better how things are really operating in my own classroom as well as to describe the performance of my learners in the reading skill. After a 120-period reading course, 50 students were chosen from three different classes (XD 501, XD 502, XD 503) to do a reading test in the time given (60 minutes), and the results collected from the test papers were then described in different terms with the use of descriptive statistics. The correlational research technique was also used to find out the reliability coefficient later in the study.
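The reliability estimation itself is reported in section 3.8, which lies outside this excerpt; the abbreviation list (p, SU, SL) suggests a Spearman rank correlation between candidates' scores on two halves of the test. The Python sketch below is only a hedged illustration of that idea: the half-scores are invented, and the split-half reading is an assumption rather than a claim about the thesis's exact procedure.

```python
def rank(values):
    """Assign ranks (1 = lowest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1          # positions are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def spearman_rho(su, sl):
    """Spearman rank correlation between the two sets of half-test scores."""
    n = len(su)
    d = [a - b for a, b in zip(rank(su), rank(sl))]
    return 1 - 6 * sum(x * x for x in d) / (n * (n * n - 1))

# Hypothetical half-scores for five candidates (SU and SL in the thesis's terms).
su = [15, 12, 9, 14, 7]
sl = [13, 12, 10, 14, 6]
print(round(spearman_rho(su, sl), 2))   # 0.9 for this invented data
```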
2.2.The selection of Participants
The students at Haiphong Private University mainly come from different towns and cities in the North of Vietnam. They are generally aged between 18 and 22, or older.
At the university, they study for eight terms in four years. There, students are classified into majors and non-majors of English. The latter usually have to learn a foreign language, in this case English, in only two years of their whole student life. In the first three terms, they study General English and in the fourth term English for Specific Purposes (ESP). After two years of English learning, they are required to be able to read and translate their ESP materials at intermediate level. However, students often have varying English levels prior to the course due to the fact that at secondary school they learned different languages, including Russian, French, and Chinese. It is therefore important for teachers to apply appropriate methods in teaching them GE as well as ESP to help them become more proficient. It is also critical that teachers give them suitable tests which meet both their needs and the requirements of society.
2.3.The Materials
During the first three terms the CE students are required to learn all 15 units in Elementary Headway and the first 8 units in Pre-intermediate Headway. These three terms include 225 periods in all, 75 periods for each term. In the fourth term, they study 120 periods of ESP using the 15-unit textbook on English for Civil Engineering.
2.4.Methods of data collection and data analysis
To collect data for the research, a 34-item test of Civil Engineering English reading was delivered to 50 students of the Construction Department. These non-majors did the test within the time frame given (60 minutes). The test papers were then collected, marked, analysed, and interpreted. This process pointed out how many students did the test well, how many performed badly, the most frequent scores the testees got, how these scores ranged, how the scores deviated from the mean, etc.
2.5.Limitations of the research
As in any other study, some limitations cannot be avoided in this one. Firstly, because of the limitation of time as well as of ability, the author could design only one reading test to be conducted with 50 students, which might not be a large number. Yet it is hoped that the results are reliable and valid enough for the researcher to make inferences and come to certain conclusions. Secondly, instead of designing different types of test, the author was able to construct solely one type, that is, an achievement test to measure the progress her students had made in terms of reading skills after undertaking the course of English for Civil Engineering in their last term of the 2004-2005 school year. From the results, the author could also measure the effectiveness of her teaching.
Summary
This chapter has given a brief account of the quantitative study, in which the author used descriptive statistics and the correlational technique to analyse the data. Following the methods, the selection of participants and materials has also been dealt with. A quick introduction to the data collection and data analysis methods was also presented, and finally came the limitations of the research.
Chapter three: Discussion
This chapter first discusses the content area of the test, how the test was divided into parts, and how the test was constructed and marked. Afterwards, the whole set of test results and each test item are analysed and interpreted. Finally, the author evaluates the test based on the four criteria of a good test mentioned in the previous chapter.
3.1. The content area of the test
The following topic checklist of the course book will help to point out the content area of the reading test.
The Topic checklist of the course book
Topic (Material: English of Civil Engineering; unit and page)
Architectural composition: Unit 1, p.1
Skeleton construction: Unit 2, p.6
Concrete, reinforced concrete, prestressed concrete: Unit 3, p.10
Ultimate carrying capacity and factor of safety: Unit 4, p.14
Pre-cast products: Unit 5, p.20
Breakwaters: Unit 6, p.24
Conveying, placing, compacting, and curing: Unit 7, p.1 (Book 2)
Concrete and strength test: Unit 8, p.8
Asphalt concrete: Unit 9, p.12
Materials and properties: Unit 10, p.20
Structure: Unit 11, p.26
Location: Unit 12, p.29
Actions and sequences: Unit 13, p.32
Arch and arch beam bridges: Unit 14, p.35
Shear forces and bending moments in beams: Unit 15, p.35
Matrix methods in the calculation of structure: Unit 16, p.42
The hinge: Unit 17, p.46
3.2.The relative weights of the different parts of the test
The test is composed of 5 parts, and the weighting of each part is illustrated in the following table:
Test of reading
Part 1: Input: factual text, approx. 120 words. Response/item type: 5 comprehension questions. Score: 10. Weighting: 20%.
Part 2: Input: 5 word columns. Response/item type: matching to make 5 sentences. Score: 10. Weighting: 20%.
Part 3: Input: 10 jumbled sentences. Response/item type: rearranging. Score: 10. Weighting: 20%.
Part 4: Input: 10 statements. Response/item type: True/False. Score: 10. Weighting: 20%.
Part 5: Input: factual text, approx. 120 words with 5 blanks. Response/item type: blank filling. Score: 10. Weighting: 20%.
3.3.Constructing the test
To construct the reading test used in this research, the author went through the following procedures:
Statement of the problem
There was a need for this achievement test to be administered at the end of the course of training in the reading of Civil Engineering English (the students are graduates). The test was intended to find out what progress was being made after the 120-period course and also what the greatest difficulties were that the students still had at the end of the course. Thanks to that, future courses may give more attention to these areas. Backwash is considered important; the test should encourage the practice of the reading skills that the students need in their university study. The time allowed was one hour.
Specifications
CONTENT
Types of text: The academic texts were from the course book entitled ‘English for Civil Engineering’. One sample text is provided in Appendix 1.
Addressees: Non-native speaker university students at HPU, or more specifically non-majors of CE at HPU.
Topics: The topics were suitable for the candidates and the type of test, and the subject area was neutral.
Operation: The test has 5 tasks and the candidates had to scan to locate specific information, to match words/ phrases to make correct statements, to arrange words/ phrases to make complete sentences, to decide whether the given statements are true or false, and finally to fill blanks with the given words.
FORMAT AND TIMING
Scanning: 1 passage with about 120 words in length.
5 short answer items, presented in the order in which the relevant information appears in the text. Responses were controlled.
Time: 10 minutes.
Detailed reading
-5 columns of words. Responses were controlled.
Time: 10 minutes.
-10 jumbled sentences to be rearranged. Responses were controlled.
Time: 20 minutes.
-10 statements to be marked T or F. Responses were controlled.
Time: 10 minutes.
-1 passage with about 120 words in length.
5 gaps to be filled. Responses were expected.
Time: 10 minutes.
CRITICAL LEVELS OF PERFORMANCE
All test items were written such that any student completing the course successfully would be able to respond correctly to all of them. Allowing for ‘performance errors’ on the part of candidates, a critical level of 80 percent was set. The students reaching this level would be the ones succeeding in terms of the course’s objectives.
SCORING PROCEDURES
There was a detailed key and the scoring was completely objective.
SAMPLING
The texts were chosen from a variety of topics in the course book. Draft items were written before the test was officially used.
ITEM WRITING AND MODERATION
All the items in the test were based on a consideration of what a competent non-major would be able to obtain from the texts. Considerable time was set aside for moderation and rewriting of items.
KEY
There was a detailed key for the test results. The key is provided in Appendix 2
After the above procedures had been followed, the test was designed as follows:
Haiphong Private University Achievement test
Testee's full name:..................................................... Skill: Reading
Mark:
Level: Intermediate
Time allowed: 60 minutes
Question 1: Read the following passage then answer the questions given below
Conveying devices may be wheelbarrow, bottom dump bucket, dump truck. If necessary concrete may be pumped through hoses and steel pipelines. The mode of transport depends on the quality of concrete to be placed, the equipment available and other factors. The method employed must prevent the separation of the materials, called segregation, and insure that concrete of good quality is deposited in the form.
The forms are made of timber or metal of a size and shape suitable for the finished work. They must be of sufficient strength and rigidity to support the wet material and allow it to be properly compacted. They are so constructed that they may be easily removed when the concrete has hardened. The interior of the forms must be oiled or soaped to prevent the concrete from adhering to the forms.
1-What are the conveying devices mentioned in the passage?
...................................................................................................................................
2-What does the mode of transport depend on?
...................................................................................................................................
3-What are the forms made of?
...................................................................................................................................
4-Why must the forms be of sufficient strength and rigidity?
...................................................................................................................................
5-What must we do with the interior of the forms before placing?
...................................................................................................................................
Question 2: Use the words/ phrases given below to make sentences describing the properties of materials.
Steel / Stone / Glass wool / Brick
has the property of
high tensile strength / good sound isolation / good thermal isolation / high compressive strength
This means
it can resist high compressive forces / it can resist high tensile forces / it does not transmit heat easily / it does not transmit sound easily
6-...............................................................................................................................
...............................................................................................................................
7-...............................................................................................................................
...............................................................................................................................
8-...............................................................................................................................
...............................................................................................................................
9-...............................................................................................................................
...............................................................................................................................
Question 3: Arrange the following words/ phrases to make complete sentences
10-fire/ weather/ has/ high/ concrete/ resistance/ and.
...............................................................................................................................
11-low cost/ durable/ is/ and/ concrete/ pre-cast/ at.
...............................................................................................................................
12-made/ different/ materials/ from/ concrete/ is.
...............................................................................................................................
13-solid/ reinforcement/ is/ widely/ in/ spaced/ slabs.
...............................................................................................................................
14-aggregate/ 20mm/ 40mm/ coarse/ in/ ranges/ to/ size/ from.
...............................................................................................................................
15-during/ segregate/ conveying/ may/ concrete.
...............................................................................................................................
16-be/ spread/ shall/ mixture/ by/ asphalt/ paver.
...............................................................................................................................
17-elastic/ clay/ rubber/ is/ plastic/ but/ is.
...............................................................................................................................
18-a/ done/ is/ concrete/ mixer/ mixing/ in.
...............................................................................................................................
19-vibrators/ driven/ be/ by/ electricity/ air/ or/ compressed/ may.
...............................................................................................................................
Question 4: Use your knowledge of the subject to decide whether the following statements are true or false. (Write T or F)
20-Glass wool is a heavy material.
21-Rubber cannot be stretched or compressed.
22-Concrete is a light material so it is easy to lift.
23-We can burn wood because it is combustible.
24-Mild steel can resist corrosion.
25-Rubber is plastic while clay is elastic.
26-Because copper is a good conductor of heat so heat can be easily transferred through it.
27-We can easily scratch glass because it is soft.
28-Concrete cannot be burnt because it is non-combustible.
29-Stainless steel is corrosion-resistant.
Question 5: Fill each blank below with ONE of the given words.
multi-story minimum timber
architecture maximum possible
impossible low steel
The modern skeleton structure is the result of rational use of steel and concrete in building. Among its characteristic features are the reduction of all load-carrying members to (30).................sizes and clear division between structural and non-structural elements. The skeleton is composed of rigidly connected beams and columns. It is a particular suitable form for (31) ................. buildings. The great strength of modern building materials makes it (32) ................ to build higher and higher, to meet today’s ever increasing demands. The pattern of our large cities is being determined by skeleton structures of steel and concrete just as decisively as the pattern of medieval cities was determined by the (33) .................frame. Widespread use has made the modern skeleton structure a central theme of contemporary (34) .................
3.4.Administering the test
In order to accomplish the two purposes of test administration for this reading test (collecting feedback to assess the usefulness of the reading course and making inferences about the test takers' language ability), it is necessary to have some control over the procedures for administering it. These involve guiding the test takers through the following process of taking the test:
Preparing the testing environment
The first step in the test administration was preparing the testing environment to be consistent with the specifications in the test blueprint. This involved arranging the place of testing (rooms C101 and C102), the materials (50 test papers) and equipment (fans, tables and desks for the students, chairs for the examiners, lights), personnel (2 examiners), time of testing (60 minutes), and physical conditions under which the test is administered. The weather at that time was quite good for the students to do the test.
Communicating the instructions
‘The second step in administering the test was to give the instructions in such a way that they would be understood by all the test takers. When administering the test it is essential that the test takers receive the full benefit of the instructions’ (Bachman and Palmer, 1981: 233). This included the obvious steps of providing suitable conditions (time, lighting, lack of distraction) for reading written instructions with the help of the two examiners.
Maintaining a supportive environment.
The next step was maintaining a supportive testing environment throughout the test. This includes avoiding distractions due to temperature, noise, excessive movement, etc.
Collecting the tests.
The final step in the test administration was collecting the tests. The testing papers were collected by the examiners after the allowed time in each testing room was over. When they were being collected, the test takers left at their own pace.
3.5. Marking the test
The testing papers were marked according to the band scores on the 0 – 10 scale as officially approved by the HPU board of examiners after they were collected.
3.6. Test scores interpreting and evaluation
The results of language tests are most often reported as numbers, or scores, and it is these scores, ultimately, that test users will make use of. The test scores of the 50 student participants were interpreted and analysed. This analysis provides a summary of how the students did on the test, checks the test's reliability, and gives some idea of how dependable the test scores were. The following steps will provide the reader with an outline of how such an analysis can be conducted.
3.6.1.The frequency distribution:
A frequency distribution is a record of testees' scores ranging from the lowest to the highest marks in a test. Raw marks are marks awarded by counting the number of correct answers on a test. The frequency distribution of the reading test that the author conducted is presented in the diagram below:
(It is essential to remember that the total score of the test is 50; however, after the marking, the total score each student got was divided by 5 to suit the 0-10 scale previously approved by the board of examiners.)
The diagram above can be seen as self-explanatory: the vertical dimension indicates the number of candidates scoring within a particular range of scores; the horizontal dimension shows what these ranges are.
When looking at the diagram, it is clearly seen that the students got different marks ranging from 1 to 9, i.e. the lowest score was 1 and the highest was 9. The chart also tells us that the set of scores was distributed quite unevenly; for example, no student got the marks 1.5, 2.5, or 8.5, while the score that most of the students got was 5.5. It also points out clearly the outcome of the test (the students who got marks 5, 5.5, 6, 6.5, 7, 7.5, 8, and 9 would pass, and those getting marks under 5 would fail).
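The histogram itself does not survive in this text version, so the short Python sketch below (not part of the thesis) simply redraws it from the score frequencies tabulated later in section 3.6.2.3:

```python
# Score frequencies on the 0-10 scale (score: number of candidates out of 50),
# copied from the table in section 3.6.2.3.
freq = {1: 1, 1.5: 0, 2: 2, 2.5: 0, 3: 1, 3.5: 2, 4: 4, 4.5: 2, 5: 7,
        5.5: 12, 6: 4, 6.5: 1, 7: 6, 7.5: 4, 8: 3, 8.5: 0, 9: 1}

# A crude text histogram: one '#' per candidate at each score.
for score in sorted(freq):
    print(f"{score:>4}: {'#' * freq[score]} ({freq[score]})")
```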
3.6.2. The central tendency
A convenient way of summarizing data is to find a single statistic, called the CENTRAL TENDENCY, which represents an entire set of numbers. Central tendency can be defined as ‘the propensity of a set of numbers to cluster around a particular value’ (Brown and Rodgers, 2002: 128). Three statistics are often used to find central tendency: the mode, the median, and the mean.
3.6.2.1. The mode
The MODE is the value in a set of numbers that occurs most frequently. In a way, the mode is the simplest of the three central tendency statistics discussed here because it requires no computation. In this case the mode is 5.5, because it is the most frequent value.
3.6.2.2. The median
The MEDIAN is the point in the distribution below which 50% of the values lie and above which 50% lie. To find the median in this case, first place the values in order from low to high, then locate the point above and below which 50% of the marks lie. Here the median is 5.5.
3.6.2.3. The mean
The most widely used measure of central tendency is the MEAN, which is more commonly called the AVERAGE. The mean is the sum of all the values in a distribution divided by the total number of values (50).
The formula for the mean is:
M = (Σxf) / N
where:	M = the mean
	Σ = the sum of
	N = the number of the scores
	x = the raw score
	f = the frequency with which a score occurs
Using the formula above we have:
x	f	xf
1	1	1
1.5	0	0
2	2	4
2.5	0	0
3	1	3
3.5	2	7
4	4	16
4.5	2	9
5	7	35
5.5	12	66
6	4	24
6.5	1	6.5
7	6	42
7.5	4	30
8	3	24
8.5	0	0
9	1	9
Σxf = 276.5
From the above analysis we have the mean ≈ 5.5 and the median = 5.5, so there is a close correspondence between the mean and the median. Compared with the results the students obtained in previous terms, this is acceptable: when studying General English, their exam scores were a little higher (the median and the mean generally ranged from 6 to 7). There are two main reasons for this: firstly, they had had longer exposure to General English (at least 225 periods); secondly, that English was not so hard. Therefore, with a mean of 5.5 and a median of 5.5, the test results are quite satisfactory.
3.6.3. The dispersion
Knowing about the central tendency of a set of numbers is a highly helpful way of characterizing the most typical behavior in a group. It doesn’t, however, tell us anything about the way the numbers spread out around that central or typical behavior. To know such a thing we need to find out the dispersion, which can be defined as ‘the degree to which the individual numbers vary away from the central tendency’ (Brown and Rodgers, 2002: 130). There are three primary ways of examining dispersion: the low-high, the range, and the standard deviation.
3.6.3.1. The low-high
The LOW-HIGH involves finding the lowest value and the highest value in a set of numbers. When looking at the marks the testees got and by putting the numbers in order from high to low, we can see immediately that the lowest value was 1 and the highest value was 9. Thus, the low-high is 1-9.
3.6.3.2. The range
The RANGE is the difference between the highest and the lowest scores, i.e. the highest value minus the lowest. The formula for the range is:
Range = H - L
where: H = the highest value; L = the lowest value
so the range of the test results = 9 - 1 = 8
A large range suggests that there was a wide spread of ability among the testees.
3.6.3.3. The standard deviation (SD)
The best overall indicator of the dispersion of the reading test is the STANDARD DEVIATION, the degree to which the group of scores deviates from the mean. Brown (1988: 69) defined it as ‘a sort of average of the differences of all scores from the mean’. The standard deviation is ‘a sort of average’ because some values are added up and divided by the number of values, just as in calculating the mean. The equation for the standard deviation therefore sums the squared differences between each value and the mean (5.5), divides by the number of test takers (50), and takes the square root:
SD = √( Σ(X - M)² / N )
where:	SD = the standard deviation
	X = the values
	M = the mean of the values
	N = the number of the values
Value	Mean	Difference	Squared difference (D²)
1	5.5	-4.5	20.25
1.5	5.5	-4	16
2	5.5	-3.5	12.25
2.5	5.5	-3	9
3	5.5	-2.5	6.25
3.5	5.5	-2	4
4	5.5	-1.5	2.25
4.5	5.5	-1	1
5	5.5	-0.5	0.25
5.5	5.5	0	0
6	5.5	0.5	0.25
6.5	5.5	1	1
7	5.5	1.5	2.25
7.5	5.5	2	4
8	5.5	2.5	6.25
8.5	5.5	3	9
9	5.5	3.5	12.25
ΣD² = 106.25
As seen above, the standard deviation is the square root of the variance, and it is a very powerful measure of dispersion. In this case the standard deviation is fairly large (1.46), which shows the following (a short computational sketch follows this list):
-the score distribution of the test was wide.
-the test has spread the students out.
-there was a wide range of ability among the testees.
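The descriptive statistics of sections 3.6.1–3.6.3 can be reproduced with the short Python sketch below. It is illustrative only: the frequencies are those of the table in 3.6.2.3, and the standard deviation follows the worked table in 3.6.3.3 (squared differences of the distinct score values from the rounded mean of 5.5, summed and divided by N = 50).

# A minimal sketch reproducing the mean, mode, median, low-high, range and SD
# reported above from the frequency distribution given in the thesis.

import math

# score value -> number of students obtaining it (N = 50)
freq = {1: 1, 1.5: 0, 2: 2, 2.5: 0, 3: 1, 3.5: 2, 4: 4, 4.5: 2,
        5: 7, 5.5: 12, 6: 4, 6.5: 1, 7: 6, 7.5: 4, 8: 3, 8.5: 0, 9: 1}

N = sum(freq.values())                       # number of testees (50)
mean = sum(x * f for x, f in freq.items()) / N

# mode: the score with the highest frequency
mode = max(freq, key=freq.get)

# median: middle value of the 50 ordered scores
scores = sorted(s for s, f in freq.items() for _ in range(f))
median = (scores[N // 2 - 1] + scores[N // 2]) / 2

low, high = min(scores), max(scores)
rng = high - low

# standard deviation computed as in the worked table of 3.6.3.3: squared
# differences of the distinct score values from the rounded mean (5.5),
# summed and divided by N
sum_sq_diff = sum((x - 5.5) ** 2 for x in freq)
sd = math.sqrt(sum_sq_diff / N)

print(f"N = {N}, mean = {mean:.2f}, mode = {mode}, median = {median}")
print(f"low-high = {low}-{high}, range = {rng}, SD = {sd:.2f}")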
3.7. Test item analysis and interpretation
The results obtained from the test can be used to provide valuable information concerning:
+ the performance of the students as a group,
+ the performance of individual students,
+ the performance of each of the items comprising the test, i.e. its difficulty level and its level of discrimination.
Therefore all 34 items of the reading test were analysed in terms of item difficulty and item discrimination, as follows:
3.7.1. The item difficulty
‘The item difficulty (the index of difficulty or facility value, FV) of an item shows how easy or difficult the particular item proved in the test’ (Heaton, 1988: 175).
The formula for item difficulty (FV) is:
FV = R / N
where:	R = the number of correct answers
	N = the number of the testees
i.e. the level of difficulty = the proportion of students getting the item right = the average score on this item.
*Note: the FV value does not tell us who got the item right; it tells us nothing about discrimination.
The scales for item difficulty are:
ve (very easy) with FV = 0.81-1 (i.e. 81 to 100% students got it right)
e (easy) with FV = 0.61-0.8 (i.e. 61 to 80% students got it right)
ok with FV = 0.41-0.6 (i.e. 41 to 60% students got it right)
d (difficult) with FV = 0.21-0.4 (i.e. 21 to 40% students got it right)
vd (very difficult) with FV = 0-0.2 (i.e. 0 to 20% students got it right)
The calculation for the item difficulty is presented in the table below:
Item difficulty
Item	R	FV	Conclusion
1	42	0.82	ve
2	32	0.64	e
3	44	0.88	ve
4	34	0.68	e
5	32	0.64	e
6	45	0.90	ve
7	25	0.50	ok
8	41	0.82	ve
9	22	0.44	ok
10	33	0.66	e
11	23	0.46	ok
12	32	0.64	e
13	20	0.40	d
14	32	0.64	e
15	49	0.98	ve
16	31	0.62	e
17	30	0.60	ok
18	24	0.48	ok
19	15	0.30	d
20	41	0.82	ve
21	16	0.32	d
22	46	0.92	ve
23	32	0.64	e
24	20	0.40	d
25	24	0.48	ok
26	16	0.32	d
27	31	0.62	e
28	8	0.16	vd
29	19	0.38	d
30	7	0.14	vd
31	7	0.14	vd
32	6	0.12	vd
33	12	0.24	d
34	7	0.14	vd
From the results in the table above it is clearly seen that items 1, 3, 6, 8, 15, 20 and 22 were very easy, since their index of difficulty was above 0.8: at least 81% of the students taking the test answered them correctly. Items 2, 4, 5, 10, 12, 14, 16, 23 and 27 can be seen as easy, since their index of difficulty ranged from 0.61 to 0.8. With FV values from 0.41 to 0.6, items 7, 9, 11, 17, 18 and 25 were at an acceptable level for the students. A few items were difficult (items 13, 19, 21, 24, 26, 29 and 33, with FV ranging from 0.21 to 0.4) or very difficult (items 28, 30, 31, 32 and 34, with FV ranging from 0 to 0.2).
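The facility value and the difficulty bands above lend themselves to a simple computation. The Python sketch below is illustrative only; the function names are chosen for illustration, and the example call uses the figures for item 7 from the table (R = 25, N = 50).

# An illustrative sketch of the facility-value (FV) calculation and the
# five difficulty bands used in the thesis.

def facility_value(correct: int, testees: int) -> float:
    """FV = R / N: the proportion of testees answering the item correctly."""
    return correct / testees

def difficulty_label(fv: float) -> str:
    """Map a facility value onto the five bands used above."""
    if fv >= 0.81:
        return "ve"   # very easy      (0.81-1)
    if fv >= 0.61:
        return "e"    # easy           (0.61-0.8)
    if fv >= 0.41:
        return "ok"   # all right      (0.41-0.6)
    if fv >= 0.21:
        return "d"    # difficult      (0.21-0.4)
    return "vd"       # very difficult (0-0.2)

fv = facility_value(25, 50)            # item 7: R = 25, N = 50
print(fv, difficulty_label(fv))        # 0.5 ok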
3.7.2. The item discrimination
Item discrimination (D) indicates the extent to which the item discriminates between the testees, separating the more able testees from the less able.
The formula for item discrimination, as applied in the table below, is:
D = (CU - CL) / N
where:	CU = the number of correct answers of the upper half
	CL = the number of correct answers of the lower half
	N = the number of the testees
The scales for item discrimination are:
gd (good discrimination) with D = 0.6-1
md (medium discrimination) with D = 0.3-0.59
bd (bad discrimination) with D = 0-0.29
bi (bad item) with D < 0
The conclusions about the item discrimination of the test are shown in the table below:
Item discrimination
Item	CU	CL	D	Conclusion
1	15	27	-0.24	bi
2	16	16	0	bd
3	34	10	0.48	md
4	19	15	0.08	bd
5	17	15	0.04	bd
6	29	16	0.26	bd
7	23	19	0.08	bd
8	19	22	-0.06	bi
9	14	8	0.12	bd
10	17	16	0.02	bd
11	13	10	0.06	bd
12	24	8	0.32	md
13	11	9	0.04	bd
14	20	12	0.16	bd
15	25	14	0.22	bd
16	10	21	-0.22	bi
17	22	8	0.28	bd
18	15	9	0.12	bd
19	8	7	0.02	bd
20	34	7	0.54	md
21	11	5	0.12	bd
22	21	15	0.12	bd
23	21	11	0.20	bd
24	13	7	0.12	bd
25	17	7	0.20	bd
26	13	3	0.20	bd
27	19	12	0.14	bd
28	3	5	-0.04	bi
29	15	4	0.22	bd
30	6	1	0.10	bd
31	4	3	0.02	bd
32	5	1	0.08	bd
33	8	4	0.08	bd
34	5	2	0.06	bd
D can range from +1 to -1. A D of +1 shows perfect agreement with the testees’ results on the whole test, while a D of -1 means the item discriminates in entirely the wrong way.
A high discrimination index shows that an item is at the right level of difficulty and discriminates well. A low discrimination index shows that an item discriminates poorly, for example because it is too difficult for everyone, both ‘good’ and ‘bad’ students. The results in the table above therefore show that most items had a low level of discrimination. The exceptions were items 3, 12 and 20, which had a medium level of discrimination, and items 1, 8, 16 and 28, which were bad items: their discrimination indices were negative, meaning they failed to tell the stronger students from the weaker ones.
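The discrimination index and the bands above can be applied mechanically. The Python sketch below is illustrative only; it follows the calculation used in the table (division by the total number of testees, N = 50), and the example values are those of item 3 (CU = 34, CL = 10).

# An illustrative sketch of the discrimination index as computed in the
# table above, i.e. D = (CU - CL) / N with N the total number of testees.

def discrimination(cu: int, cl: int, n: int = 50) -> float:
    """D = (CU - CL) / N, following the calculation used in the thesis."""
    return (cu - cl) / n

def discrimination_label(d: float) -> str:
    """Map D onto the four bands used above."""
    if d >= 0.6:
        return "gd"   # good discrimination   (0.6-1)
    if d >= 0.3:
        return "md"   # medium discrimination (0.3-0.59)
    if d >= 0:
        return "bd"   # bad discrimination    (0-0.29)
    return "bi"       # bad item              (D < 0)

d = discrimination(34, 10)                      # item 3
print(round(d, 2), discrimination_label(d))     # 0.48 md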
3.8. Estimating reliability
There are a number of ways to estimate a test’s reliability; only the split-half method is presented here. The test is first divided very carefully into two equivalent halves. Then, for each student, two scores are calculated: the score on the upper half and the score on the lower half. The more similar the two sets of scores are, the more reliable the test is; the ideal reliability coefficient is 1. A test with a reliability coefficient of 1 is one which would give precisely the same results for a particular set of candidates regardless of when it was administered. A test with a reliability coefficient of 0 would give sets of results quite unconnected with each other, and would fail to be reliable.
To find the reliability coefficient of the test used in this study, the author began with the Spearman coefficient, because it is conceptually the easiest to understand. The Spearman coefficient is often simply called SPEARMAN RHO, symbolized by the Greek letter ρ. The equation for Spearman rho is as follows:
ρ = 1 - (6ΣD²) / (N(N² - 1))
where:	ρ = the Spearman rho correlation coefficient
	D = the difference between the ranks
	N = the number of cases
The scales for reliability are:
0.8-1.0	strong correlation (i.e. high reliability)
0.6-0.8	medium correlation (i.e. medium reliability)
0.4-0.6	weak correlation (i.e. low reliability)
0.2-0.4	very weak correlation (i.e. very low reliability)
To find ρ we have the following table, in which:
S: student
SU: score on the upper half
SL: score on the lower half
D: the difference between SU and SL
D²: the squared difference
S	SU	SL	D	D²
1	4	1	3	9
2	6	4	2	4
3	7	3.5	3.5	12.25
4	10	5	5	25
5	9.5	8	1.5	2.25
6	10	7.5	2.5	6.25
7	10	9.5	0.5	0.25
8	12	7.5	4.5	20.25
9	8.5	12	-3.5	12.25
10	10.5	10	0.5	0.25
11	15	8	7	49
12	14.5	8	6.5	42.25
13	16	10	6	36
14	16	9	7	49
15	8.5	7	1.5	2.25
16	12.5	12	0.5	0.25
17	16.5	8	8.5	72.25
18	15.5	9	6.5	42.25
19	13.5	11	2.5	6.25
20	17	10	7	49
21	14	11	3	9
22	13	12	1	1
23	16	11	5	25
24	12	15	-3	9
25	16	11	5	25
26	15	13	2	4
27	10	18	-8	64
28	14	14	0	0
29	20	8	12	144
30	15	13	2	4
31	13	15	-2	4
32	15	15	0	0
33	20	10	10	100
34	17	13	4	16
35	22	8	4	16
36	24	8	6	36
37	14	18	-4	16
38	15	17	-2	4
39	12	20	-8	64
40	26	8	16	324
41	20	14	6	36
42	21	14	7	49
43	21	17	4	16
44	28	18	2	4
45	17	11	6	36
46	18	10	8	64
47	28	12	16	256
48	25	15	10	100
49	27	13	14	196
50	28	16	12	144
ΣD² = 2206.5
(The total score of the test is 50; for this calculation it was not necessary to divide each student’s score by 5.)
The strength of the relationship between the two sets of scores on the reading test is given by the correlation coefficient. Calculated as shown above, this turns out to be 0.9. This coefficient relates to the two half-tests; but the full test is of course twice as long as either half, and the longer a test is, the greater its reliability will be. So the full reading test should be more reliable than the coefficient of 0.9 indicates. By means of a formula, it is possible to estimate the reliability of the whole test as follows:
Reliability of whole test = (2 × coefficient for split half) / (1 + coefficient for split half)
Using the formula, we obtain a figure of 0.95, which indicates that the test was highly reliable. In conclusion, to judge whether the above reading test was a good one or not, the author will come back to the four criteria: reliability, validity, discrimination and practicality.
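Before returning to those criteria, the reliability figures above can be reproduced with the short Python sketch below. It is illustrative only: the sum of squared differences (2206.5) and N = 50 are taken from the table, and the whole-test figure uses the doubling formula given above.

# A small sketch of the split-half reliability estimate: Spearman's rho from
# the D column of the table, then the whole-test correction.

sum_d_squared = 2206.5      # sum of squared differences, from the table
n = 50                      # number of testees

# Spearman rho: rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1))
rho_half = 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# correction for the full-length test: reliability = 2r / (1 + r)
rho_full = 2 * rho_half / (1 + rho_half)

print(f"split-half coefficient = {rho_half:.2f}")   # about 0.89, reported as 0.9
print(f"whole-test reliability = {rho_full:.2f}")   # about 0.94, reported as 0.95 when 0.9 is used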
Once again, reliability and validity are critical for any test and are referred to as essential measurement qualities. There is a relationship between the two: a test may be reliable without being valid, but if a test is not reliable it cannot be valid at all. The above calculation of the reliability coefficient showed that the test was reliable, so this necessary condition for validity was satisfied.
The next quality of concern is discrimination: the capacity of the test to distinguish between students and to reflect differences in the performance of the individuals in a group. The test items covered a wide scale of difficulty, ranging from ‘very easy’ to ‘very difficult’, i.e. the test was neither too easy nor too difficult, so it could fulfil its purpose of discriminating between candidates.
Practicality was also considered throughout the development and administration of the test. Everything that had to be prepared was readily available (for example, seating and marking), and the test did not cost much or take much time.
From all the above discussion it is possible to say that this is a fairly good test though it is far from being a perfect one. For designing a better test in the future, some suggestions will be presented later at the end of the research.
Summary
In this chapter a reading test for the non-majors of CE at HPU was designed after the relative weights of the different parts of the test had been clearly pointed out. Afterwards, the author presented information regarding the administration and marking of the test. The test results were then interpreted in terms of frequency distribution, central tendency and dispersion. The reliability coefficient was also established, making it possible to regard the test as a reliable one. Finally, each item of the test was analysed in terms of its level of difficulty and level of discrimination, so that it could be seen whether each item was very easy, easy, average, difficult or very difficult for the testees and whether it discriminated between students well or not.
Part III Conclusion and recommendations
This research is aimed at designing and evaluating an appropriate English reading test for the non-majors of Civil Engineering at Haiphong Private University. It is composed of three parts: Part I, Part II and Part III.
Perhaps chapter three, in Part II, is the most important chapter, since it presents the construction of the reading test. Moreover, the author gave an account of all the necessary steps in the administration and marking of the test. After collecting the test results, the author interpreted and evaluated them in terms of frequency distribution, central tendency and dispersion. She also analysed and interpreted each test item in terms of its level of difficulty and its discriminating power, presented in two separate tables. This was useful for deciding whether the test was suitable for the students or not. Additionally, the reliability coefficient was calculated to make sure that this reading test really did satisfy the four criteria of reliability, validity, discrimination and practicality, though it might not be a perfect test just yet. Since the test is considered appropriate, it can be used especially at Haiphong Private University, where the subject of English for Civil Engineering is still regarded as a new one and few tests for it exist.
Like any other research, this one cannot avoid some limitations. Firstly, the test was limited to testing the learners’ reading ability. Secondly, most of the test items did not show good discrimination, so the test may not be a perfect one. Nevertheless, the research was done with great effort and over a long time, and it has been truly worthwhile for the author. It is apparent that, besides many other components such as teaching materials and learning activities, tests are part of educational programmes and always serve pedagogical purposes: they give any teacher the chance to look back at his or her teaching, and they can promote student learning. Testing is important, to be sure; therefore, when designing a test we must pay attention to the test’s usefulness by considering important qualities such as reliability, validity, practicality and discrimination with respect to specific tests, and not solely in terms of abstract theories and statistical formulae. Moreover, we must consider these qualities from the very beginning of the test planning and development process.
For a better form of the test, I have the following suggestions:
Firstly, although the test can be kept as it is, or made shorter or longer depending on the time allowed, some test items should be changed. For instance, the number of very easy and very difficult items (such as items 1, 3, 6, 8, 15, 22, 28, 30, 31, 32 and 34) could be reduced to leave space for other item types.
Secondly, this achievement test was based on the syllabus-content approach, since the subject matter was rather new and difficult for the students; the test was designed on the basis of what the students had already learnt in the course book. If the test had been syllabus-objective based, the students might have faced problems such as being tested on material they had not learnt and had not prepared for. In the longer term, however, it would be more favourable for a final achievement test to be based on the syllabus-objective approach, because this provides more accurate information about individual and group achievement and is likely to promote a more beneficial backwash effect on teaching. This can be explained by two reasons: firstly, the tester must at least be clear about the course objectives when constructing the test, which makes it possible to follow students’ achievement of those objectives; secondly, this can help to work against the poor teaching practice that syllabus-content tests fail to discourage.
Hopefully this research will be useful for those who are interested in designing their own tests, especially the reading ones.
References
Alderson, J. C., Clapham, C. and Wall, D. (1995). Language Test Construction and Evaluation. Cambridge: Cambridge University Press.
Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
Bachman, L. F. and Palmer, A. S. (1981). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford: Oxford University Press.
Brown, J. D. and Rodgers, T. S. (2002). Doing Second Language Research. Oxford: Oxford University Press.
Canale, M. and Swain, M. (1980). Approaches to Communicative Competence. Singapore: Seameo Regional Language Center. Occasional Papers, 14, April.
Davies, A. (1990). Principles of Language Testing. Oxford, UK; Cambridge, Mass, USA: Blackwell Publishers.
Harrison, A. (1987, 1991). A Language Testing Handbook. Macmillan Publishers.
Heaton, J.B., (1990). Classroom Testing. Longman Group UK Limited.
Heaton, J.B., (1988). Writing English Language Tests. Longman Group UK Limited.
Henning, G. (1987). A Guide to Language Testing. Cambridge: Newbury House Publishers.
Hughes, A. (1989). Testing for Language Teachers. Cambridge: Cambridge University Press.
Littlewood, W. T. (1981). Communicative Language Teaching. Cambridge University Press.
McNamara, T.F. (2000). Language Testing. Oxford University Press.
Nunan, D. (1991). Language Teaching Methodology. UK: Prentice-Hall International.
Richards, J.C. and Rodgers, T. S. (1986). Approaches and Methods in Language Teaching. Cambridge University Press.
Viete, R. (1992). Running the Gauntlet: English Language Testing and Support for NESB Applicants to Post-primary Teacher Training Course in Victoria. Melbourne: Monash University: Unpublished M. Ed. thesis.
Weir, C.J. (1990). Communicative Language Testing. Prentice Hall International (UK) Ltd.
Weir, C.J. (1993). Understanding and Developing Language Tests. New York: Prentice Hall International.
Appendix 1
A sample text
Unit 10: Materials and properties
I-Look at these pictures and translate them into Vietnamese
1. A man can easily lift a large roll of glass wool but not a concrete beam.
Glass wool is light but concrete is heavy.
2. A man can bend a rubber tile but not a concrete tile.
Rubber is flexible but concrete is rigid.
3. Wood can burn but concrete cannot burn.
Wood is combustible but concrete is non-combustible.
4. Water vapor can pass through stone but not through bitumen.
Stone is permeable but bitumen is impermeable.
5. You can see through glass but not through wood.
Glass is transparent but wood is opaque.
6. Stainless steel can resist corrosion but mild steel cannot.
Stainless steel is corrosion-resistant but mild steel is not corrosion-resistant.
7. Heat can be easily transferred through copper but not through wood.
Copper is a good conductor of heat but wood is a poor conductor of heat.
8. Rubber can be stretched or compressed and will then return to its original shape but clay cannot.
Rubber is elastic but clay is plastic.
9. Bitumen can be dented or scratched easily but glass cannot.
Bitumen is soft but glass is hard.
II-Look at these diagrams. Match the letters A-H in the diagrams with the sentences below:
(Diagrams A-H)
Now complete these sentences with properties:
a-The polythene membrane can prevent moisture from rising into the concrete floor. This means that polythene is ................
b-The T-shape aluminium section can resist chemical action. This means that aluminium is ................
c-The stone block cannot be lifted without using a crane. This means that stone is ................
d-The corrugated iron roof cannot prevent the sun from heating up the house. This means that iron is ................
e-Glass wool can help to keep a house warm in winter and cool in summer. This means that glass wool is ................
f-The ceramic tiles on the floor cannot be scratched easily by people walking on them. This means that ceramic tiles are ................
g-Asbestos sheeting can be used to fireproof doors. In other words asbestos is ................
h-Black cloth blinds can be used to keep the light out of a room. This means that cloth is ................
III-Look at the picture below and answer the questions
(Pictures a-f)
a-Why is glass used for window panes?
Because glass is ............................................................................................
b-Why is glass wool used to keep the heat in hot-water tanks?
Because glass wool has the property of ..........................................................
c-Why is some steel covered with a thin layer of zinc?
Because zinc is ..............................................................................................
d-Why are some fire doors covered with asbestos sheets?
Because asbestos is ........................................................................................
e-Why are some metal sheets formed into a corrugated shape?
Because the corrugated shape makes the sheet..............................................
f-Why is concrete used for the columns of a building structure?
Because ..........................................................................................................
Reading
1.Look at these diagrams and read the passage
Building materials are used in two basic ways. In the first way they are used to support the loads on the building and in the second way they are used to divide the space in a building. Building components are made from building materials and the form of a component is related to the way in which it is used. We can see how this works by considering three different types of construction:
In one kind of construction, blocks of materials such as brick, stone, or concrete are put together to form solid walls. These materials are heavy; however, they can support the structural loads because they have the property of high compressive strength. Walls made up of blocks both support the building and divide the space in the building.
In another type of construction, sheet materials are used to form walls which act as both space-dividers and structural support. Timber, concrete and some plastics can be made into large rigid sheets and fixed together to form a building. These buildings are lighter and faster to construct than buildings made up of blocks.
Rod materials, on the other hand, can be used for structural support but not for dividing spaces. Timber, steel, and concrete can be formed into rods and used as columns. Rod materials with high tensile and compressive strength can be fixed together to form frame structures. The spaces between the rods can be filled with light sheet materials which act as dividers but do not support structural loads.
2.Now say which paragraph discusses:
a-Planar construction
b-Frame construction
c-Mass construction
3.Complete this table by putting ticks (✓) in the boxes to show the functions of the components:
Function of components
Form of material	Structural support only	Space dividing only	Both structural support and space dividing
Blocks
Sheets
Rods
4.Now say whether these statements are true or false. Correct the false statements.
a-Rod materials can be used for both dividing space and support the building.
b-Concrete can be used as a block material, a sheet material and a rod material.
c-Steel is used for frame construction because it has high tensile strength and low compressive strength.
d-The sheet materials, which act as space dividers in a frame construction building, can be very light because they do not support structural loads.
e-Mass construction buildings are light whereas planar construction buildings are heavy.
Appendix 2
Detailed key for the test results
Question 1 (10 marks, 2 marks for each correct answer)
wheelbarrow, bottom dump bucket, dump truck, hoses and steel pipelines.
It depends on the quality of concrete to be placed, the equipment available and other factors.
They are made of timber or metal.
To support the wet material and allow it to be properly compacted.
To prevent the concrete from adhering to the forms.
Question 2 (10 marks, 2.5 marks for each correct answer)
Steel has the property of high tensile strength. This means it can resist high tensile forces.
Stone has the property of good sound insulation. This means it does not transmit sound easily.
Glass wool has the property of good thermal insulation. This means it does not transmit heat easily.
Brick has the property of high compressive strength. This means it can resist high compressive forces.
Question 3 (10 marks, 1 mark for each correct answer)
Concrete has high fire and weather resistance.
Precast concrete is durable and at low cost.
Concrete is made from different materials.
In solid slabs reinforcement is widely spaced.
Coarse aggregate ranges in size from 20mm to 40 mm.
Concrete may segregate during conveying.
Asphalt mixture shall be spread by paver.
Rubber is elastic but clay is plastic.
Mixing concrete is done in a mixer.
Vibrators can be driven by electricity or compressed air.
Question 4 (10 marks, 1 mark for each correct answer)
20. F	25. F
21. F	26. T
22. F	27. F
23. T	28. T
24. F	29. T
Question 5 (10 marks, 2 marks for each correct answer)
30. minimum
31. multi-story
32. possible
33. timber
34. architecture