VIETNAM NATIONAL UNIVERSITY, HANOI
COLLEGE OF FOREIGN LANGUAGES
DEPARTMENT OF POSTGRADUATE STUDIES
NGUYEN THI VIET HA
A STUDY ON THE RELIABILITY OF THE FINAL ACHIEVEMENT COMPUTER-BASED MCQS TEST 1 FOR THE 4TH SEMESTER NON-ENGLISH MAJORS AT HANOI UNIVERSITY OF BUSINESS AND TECHNOLOGY
(Đánh giá độ tin cậy của bài thi trắc nghiệm thứ nhất trên máy tính cuối kỳ 4 dành cho sinh viên năm thứ hai không chuyên ngành tiếng Anh trường Đại học Kinh doanh và Công nghệ Hà Nội)
Minor Programme Thesis
Field: Methodology
Code: 601410
HANOI, 2008
VIETNAM NATIONAL UNIVERSITY, HANOI
COLLEGE OF FOREIGN LANGUAGES
DEPARTMENT OF POSTGRADUATE STUDIES
NGUYỄN THỊ VIỆT HÀ
A STUDY ON THE RELIABILITY OF THE FINAL ACHIEVEMENT COMPUTER-BASED MCQS TEST 1 FOR THE 4TH SEMESTER NON-ENGLISH MAJORS AT HANOI UNIVERSITY OF BUSINESS AND TECHNOLOGY
(Đánh giá độ tin cậy của bài thi trắc nghiệm thứ nhất trên máy tính cuối kỳ 4 dành cho sinh viên năm thứ hai không chuyên ngành tiếng Anh trường Đại học Kinh doanh và Công nghệ Hà Nội)
Minor Programme Thesis
Field: Methodology
Code: 601410
Supervisor: Nguyễn Thu Hiền, M.A.
HANOI, 2008
VIETNAM NATIONAL UNIVERSITY, HANOI
COLLEGE OF FOREIGN LANGUAGES
DEPARTMENT OF POSTGRADUATE STUDIES
CANDIDATE’S STATEMENT
I hereby state that I, Nguyen Thi Viet Ha, Class 14A, being a candidate for the degree of Master of Arts (TEFL), accept the requirements of the College relating to the retention and use of Master of Arts theses deposited in the library.
In terms of these conditions, I agree that the original of my thesis deposited in the library should be accessible for the purposes of study and research, in accordance with the normal conditions established by the librarian for the care, loan or reproduction of the thesis.
Signature
Date
ACKNOWLEDGMENTS
In the completion of this thesis, I have received a great deal of support. Of primary importance has been the role of my supervisor, Ms. Nguyen Thu Hien, M.A., Teacher of the Department of English and American Languages & Cultures, College of Foreign Languages, Vietnam National University, Hanoi. I am deeply grateful to her for her precious guidance, enthusiastic encouragement and invaluable critical feedback. Without her dedicated support and correction, this thesis could not have been completed.
I am deeply indebted to my dear teacher, Mr. Vu Van Phuc, M.A., Head of the Testing Center, College of Foreign Languages, VNU, who provided me with many useful suggestions and much assistance towards my study.
I would also like to express my sincere thanks to all the teachers and colleagues in the English Department, HUBT, for their help in conducting the survey, sharing opinions and making suggestions for the study. In particular, my thanks go to Ms. Le Thi Kieu Oanh, Assistant of the English Department, HUBT, for her willingness to provide the test score data.
I wish to express my special thanks to the K11 students at Hanoi University of Business and Technology who actively participated in the survey.
Finally, it is my great pleasure to acknowledge my gratitude to the beloved members of my family, especially my husband, who constantly encouraged and helped me with this thesis.
ABSTRACT
The main aim of this minor thesis is to evaluate the reliability of the final achievement Computer-based MCQs Test 1 for the 4th semester non-English majors at Hanoi University of Business and Technology.
In order to achieve this aim, a combination of qualitative and quantitative research methods was adopted. The findings indicate that there is a certain degree of unreliability in the final achievement computer-based MCQs test 1, and that two main factors cause this unreliability: test item quality and test-takers' performance.
Based on a thorough analysis of the collected data, the author makes some suggestions for improving the quality of the final achievement test and the MCQs test 1 for the non-majors of English in the 4th semester at Hanoi University of Business and Technology. Firstly, the test objectives, sections and skill weighting should be adjusted to be more compatible with the course objectives and the syllabus. Secondly, a testing committee should be set up to construct and develop a multiple-choice item bank containing test items with good p-values and discrimination values.
LIST OF ABBREVIATIONS
1. CBT: Computer-based testing
2. HUBT: Hanoi University of Business and Technology
3. MC: Multiple choice
4. MCQs: Multiple-choice questions
5. ML Pre-: Market Leader Pre-intermediate
6. KR: Kuder-Richardson
7. SD: Standard deviation
LIST OF TABLES AND CHARTS
1. Table 1: Types of tests
2. Table 2: Scoring format for each semester
3. Table 3: The syllabus for the 4th semester (for non-English majors)
4. Table 4: Time allocation for language skills and sections
5. Table 5: Specification grid for the final computer-based MCQs test 1
6. Table 6: Main points in the grammar section
7. Table 7: Main points in the vocabulary section
8. Table 8: Topics in the reading section
9. Table 9: Items in the functional language section
10. Table 10: Test reliability coefficient
11. Table 11: p-value of items in 4 sections
12. Table 12: Discrimination value of items in 4 sections
13. Table 13: Number of test items with acceptable p-value and discrimination value in 4 sections
14. Table 14: Suggested scoring format
15. Table 15: Proposed test specifications
16. Chart 1: Students' response on test content
17. Chart 2: Students' response on item discrimination value
18. Chart 3: Students' response on time length
19. Chart 4: Students' response on arbitrariness
20. Chart 5: Students' response on the relation between test score and their achievement
TABLE OF CONTENTS
CANDIDATE'S STATEMENT
ACKNOWLEDGMENTS
ABSTRACT
LIST OF ABBREVIATIONS
LIST OF TABLES AND CHARTS
TABLE OF CONTENTS
Chapter 1: INTRODUCTION
1.1. Rationale for the study
1.2. Aims and research questions
1.3. Scope of the study
1.4. Theoretical and practical significance of the study
1.5. Method of the study
1.6. Organization of the paper
Chapter 2: LITERATURE REVIEW
2.1. Language testing
2.1.1. What is a language test?
2.1.2. The purposes of language tests
2.1.3. Types of language tests
2.1.4. Criteria of a good language test
2.2. Achievement test
2.2.1. Definition
2.2.2. Types of achievement test
2.2.3. Considerations in final achievement test construction
2.3. MCQs test
2.3.1. Definition
2.3.2. Benefits of MCQs test
2.3.3. Limitations of MCQs test
2.3.4. Principles on designing a good MCQs test
2.4. Reliability of a test
2.4.1. Definition
2.4.2. Methods for test reliability estimate
2.4.3. Measures to improve test reliability
2.5. Summary
Chapter 3: The Context of the Study
3.1. The current English learning, teaching and testing situation at HUBT
3.2. The course objectives, syllabus and materials used for the second-year non-majors of English in Semester 4
3.2.1. The course objectives
3.2.2. Business English syllabus
3.2.3. The course book
3.2.4. Specification grid for the final achievement Computer-based MCQs test in Semester 4
Chapter 4: Methodology
4.1. Participants
4.2. Data collection instruments
4.3. Data collection procedure
4.4. Data analysis procedure
Chapter 5: RESULTS AND DISCUSSIONS
5.1. The compatibility of the objectives, content and skill weight format of the final achievement computer-based MCQs test 1 for the 4th semester with the course objectives and the syllabus
5.1.1. The test objectives and the course objectives
5.1.2. The test item content in four sections and the syllabus content
5.1.3. The skill weight format in the test and the syllabus
5.2. The reliability of the final achievement test
5.2.1. Reliability coefficient
5.2.2. Item difficulty and discrimination value
5.3. The attitude of students towards the MCQs test 1
5.4. Pedagogical implications and suggestions on improvements of the existing final achievement computer-based MCQs test 1 for the non-English majors at HUBT
5.5. Summary
Chapter 6: CONCLUSION
6.1. Summary of the findings
6.2. Limitations of the study
6.3. Suggestions for further study
REFERENCES
APPENDICES
APPENDIX 1: Grammar, Reading, Vocabulary and Functional language checklist
APPENDIX 2: Survey questionnaire (for students at HUBT)
APPENDIX 3: Students' test scores
APPENDIX 4: Item analysis of the final achievement computer-based MCQs test 1 (150 items, 349 examinees)
APPENDIX 5: Item indices of the final achievement computer-based MCQs test 1
Chapter 1: Introduction
1.1. Rationale for the study
Testing plays a very important role in the teaching and learning process. Testing is one form of measurement which is used to point out strengths and weaknesses in the learned abilities of students. Through testing, especially test scores, we may gauge the performance of given students and of teachers. As far as students are concerned, test scores reveal what they have achieved after a learning period. As for teachers, test scores indicate how well they have taught their students. Based on test results, we may make improvements in teaching, learning and testing for better instructional effectiveness.
Another reason for selecting testing as a matter of study lies in the fact that the current language testing at Hanoi University of Business and Technology (HUBT) has been the subject of much controversy among students and teachers. Testing is mainly carried out in the form of two objective tests on computers (named test 1 and test 2) which are administered at the end of each semester. The scores that a student gets on these tests are the main indicators of his or her performance during the whole semester. There are different comments on the results of these tests, especially test 1 for the second-year non-English majors. Some subject teachers claim that these tests do not truly reflect the students' language competence. Others say that these tests are appropriate to what students have learnt in class and compatible with the course objectives, and are therefore reliable. Opposing views also exist among the students: many think that these tests are more difficult than what they have learnt and studied for the exam, while others say that the test items are easy and relevant to what they have been taught. Therefore, finding out whether the tests are closely related to what the students have learnt and what the teachers have taught, and whether these tests are reliable, is indispensable.
For the two reasons mentioned above, the author undertook this study, entitled "A study on the reliability of the final achievement Computer-based MCQs Test 1 for the 4th semester non-English majors at Hanoi University of Business and Technology", with the intention of examining these claims about the test. In addition, the author hopes that the study results will help to raise awareness among teachers as well as those who are interested in this field. At the same time, the study results can, to some extent, be applied to improve the current testing situation at HUBT.
1.2. Aims and research questions
The main aim of the study is to investigate the reliability of the existing final achievement MCQs test 1 (4th semester) for non-English majors at HUBT by analyzing the test objectives, test content and skill weight format, students' scores, test items, and students' perceptions of and comments on the test, and then to make suggestions for the test's improvement.
To achieve this aim, the following research questions are set for exploration:
Are the objectives, content and skill weight format of the final achievement computer-based MCQs test 1 compatible with the course objectives, the syllabus content and skill weight format?
To what extent is test 1 reliable?
What is the students' attitude towards the final achievement Computer-based MCQs test 1?
1.3. Scope of the study
The study is confined to the existing final achievement Computer-based MCQs test 1 administered in the 4th semester to the second-year non-English majors at HUBT.
1.4. Theoretical and practical significance of the study
Theoretically, the study reaffirms that testing is crucial for measuring and evaluating the quality of learning and teaching, and that test reliability is one of the most important criteria in the evaluation of a test.
Practically, the study shows how reliable the final achievement MCQs test 1 administered at HUBT is and how its quality can be improved.
1.5. Method of the study
Both qualitative and quantitative methods are used.
The qualitative method is applied to the review of the literature on language testing, the course objectives and syllabus, the objectives, content and format of achievement test 1 for the 4th semester, and the results of the student questionnaires.
The quantitative method is used for the analysis of test scores and test items.
1.6. Organization of the paper
The study is composed of 6 chapters.
Chapter 1- Introduction briefly states the rationale, aims and research questions, scope of the study, theoretical and practical significance of the study, method of the study and organization of the paper.
Chapter 2- Literature review discusses relevant theories of language testing, final achievement test, Computer-based MCQ tests and test reliability.
Chapter 3- The context of the study deals with English learning, teaching and testing situation at HUBT, course book, syllabus and check list for the test.
Chapter 4- Methodology presents participants, data collection instruments, data collection and data analysis procedure.
Chapter 5- Results and Discussions presents and discusses the results of the study. Suggestions for the improvement of the achievement test 1 are also proposed in this chapter.
Chapter 6- Conclusion summarizes the findings, mentions the limitations and provides suggestions for further study.
Chapter 2: Literature review
2.1. Language testing
2.1.1. What is a language test?
There are a wide variety of definitions of a language test which have one point of similarity. That is to say, a language test is considered as a device for measuring individuals’ language ability.
According to Henning (1987, p.1), "Testing, including all form of language test, is one form of measurement". In his opinion, tests such as listening or reading comprehension tests are delivered in order to find out the extent to which these abilities are present in the learners. Similarly, Bachman (1990, p.20) stated: "A test is a measurement instrument designed to elicit a specific sample of an individual's behavior". He also considered obtaining the elicited sample of behavior to be what distinguishes a test from other types of measurement.
Brown, H.D. (1995, p.384) presented the notion in a simpler way: "A test, in plain words, is a method of measuring a person's ability or knowledge in a given domain". He explained that a test is first and foremost a method which includes items and techniques requiring the performance of testees. Via this performance, a person's ability or language competence is measured.
These viewpoints show that a language test is an effective tool of measuring and assessing students’ language knowledge and skills and providing precious information for better future teaching and learning.
2.1.2. The purposes of language tests
Language tests are perceived from different perspectives by different scholars with regard to their purposes. Typically, Heaton (1990) mentioned seven purposes, which can be represented as follows:
Finding out about progress
Encouraging students
Finding out about learning difficulties
Finding out about achievement
Placing students
Selecting students
Finding out about proficiency
In general, a language test is used to evaluate both teachers' and students' performance, to make judgments and adjustments to teaching materials and methods, and to strengthen students' motivation for their further study.
2.1.3. Types of language tests
Language tests can be classified into different types according to their purposes. Heaton (1990), Brown (1995), Harrison (1983) and Hughes (1989) pointed out that language tests fall into four main types: proficiency tests, diagnostic tests, placement tests and achievement tests, with the characteristics illustrated in the following table:
Type of test | Characteristics
Proficiency test | Measures people's abilities in a language regardless of any training they may have had in that language
Diagnostic test | Checks students' progress for their strengths and weaknesses and what further teaching is necessary
Achievement test | Assesses what students have learnt from a known syllabus
Placement test | Classifies students into groups at different levels at the beginning of a course
Table 1: Types of tests
Another researcher, Henning (1987), divided tests into objective and subjective ones on the basis of the manner in which they are scored. Subjective tests are scored through opinion and judgment on the part of the scorer, while objective tests are scored by comparing examinee responses with an established set of acceptable responses or a scoring key.
2.1.4. Criteria of a good language test
Just like any measuring device, a language test presents potential measurement error. For the purpose of investigating, evaluating and "testing" a test, researchers such as Brown (1995), Henning (1987), Bachman (1990) and Harrison (1983) identified criteria to determine whether a test is good or not. A good language test must feature four important qualities: reliability, validity, practicality and discrimination.
The reliability of a test is its consistency (Brown, 1995; Harrison, 1983). A test is reliable only when it yields the same results no matter under what circumstances it is administered or by which markers it is scored. The validity of a test refers to "the degree to which the test actually measures what it is intended to measure" (Brown, 1995, p.387). A test is considered valid if it possesses content validity, face validity and construct validity. The practicality of a test concerns its administration: a test is practical when it is time- and money-saving and easy to administer, mark and interpret. The discrimination of a test is the extent to which the test separates students from each other (Harrison, 1983). In other words, it is the capacity of the test to discriminate among different students and to reflect the performance of individuals within the same group.
2.2. Achievement test
2.2.1. Definition
Achievement tests are of extensive use at different levels of education due to their distinguished characteristics. Researchers define the notion of achievement tests in various ways.
Henning (1987, p.6) held that:
Achievement tests are used to measure the extent of learning in a prescribed content domain, often in accordance with explicitly stated objectives of a learning program.
From this definition, it follows that an achievement test is a measurement tool designed to examine the language competence of learners over a period of instruction and to evaluate the instruction program. By the same token, Hughes (1989) stated that achievement tests are intended to assess how successful individual students, groups of students, or the courses themselves have been in achieving objectives. Achievement tests play an important role in education programs, especially in evaluating students' acquired language knowledge and skills during a given course.
2.2.2. Types of achievement test
Achievement tests can be subdivided into final achievement tests and progress achievement tests according to the time of administration and the desired objectives (Heaton, 1990).
Final achievement tests are usually given at the end of the school year or at the end of the course to measure how far students have achieved the teaching goals. The content of these tests must be related to the teaching content and objectives concerned.
Progress achievement tests are usually administered during the course to measure the progress that students are making. The results of these tests enable teachers to identify the weaknesses of the learners and diagnose the areas not properly mastered by students during the course, in order to take remedial action.
Heaton (1990) also stated that the two types of test differ in the sense that a final achievement test is designed to cover a longer period of learning and should attempt to cover as much of the syllabus as possible.
2.2.3. Considerations in final achievement test construction
On the basis of these characteristics, Heaton (1990) held that covering much of the content of a syllabus or a course book is a requirement when designing a final achievement test. Testers should avoid basing the test on their own teaching rather than on the syllabus or course book, in order to establish and maintain a certain standard. In addition, McNamara (2000) stated that test writers should draw up a test specification before writing a test. A test specification results from the process of designing test content and test method. It has to include information on the length, the structure of each part of the test, the type of materials with which the candidates will have to engage, the source of materials, the extent to which authentic materials may be altered, the response format and how responses are scored. Specifications are usually written before the tests, and the test is then written on the basis of the specifications. After the test is written, the specification should be consulted again to see whether the test matches the objectives set out in it.
2.3. MCQs test
2.3.1. Definition
Multiple-choice question tests (MCQs tests) are objective tests which require no particular knowledge or training in the examined content area on the part of the scorer (Henning, 1990). They differ from subjective tests in terms of scoring method: no matter which examiner marks the test, a testee will get the same score (Heaton, 1988).
MCQs tests use multiple-choice questions, also called multiple-choice (MC) items, as a testing technique. An MC item is a test item in which the test taker is required to choose the one correct answer from a number of given options (McNamara, 2000; Weir, 1990).
In the view of Heaton (1988), MC items take many forms, but their basic structure consists of two parts. The initial part is known as the stem. The primary purpose of the stem is to present the problem clearly and concisely; it needs to give the testees a general idea of the problem and the answer required. The stem may be in the form of an incomplete statement, a complete statement or a question. The other part comprises the choices from which the students select their answers, referred to as options, responses or alternatives. In an MC item there may be three, four or five options, of which one is the correct option or key while the others are distractors, whose task is to distract the majority of poor students from the correct option. The optimum number of options for each multiple-choice item in most public tests is five, and it is desirable to use four options for grammar items and five for vocabulary and reading.
2.3.2. Benefits of MCQs test
MC items are undoubtedly one of the most widely used types of items in objective tests (Heaton, 1988). The popularity of this testing technique results from its efficiency. Researchers such as Weir (1990), Heaton (1988) and Hughes (1989) pointed out a number of benefits, which are presented in detail below.
Firstly, the scoring of an MCQs test is perfectly reliable, rapid and economical. There is only one correct answer in the format of an MC item, so the scorers' interference with the test is minimized. The scorers are not permitted to impose their personal expertise, experience, attitudes and judgment when giving marks to testees' responses. The testees, thus, always get a consistent result whoever the scorers are and whenever their tests are marked. In addition, MCQs tests can be marked mechanically with minimal human intervention. As a result, the marking is not only reliable and simple but also more rapid and often more cost-effective than other forms of written test (Weir, 1990).
Secondly, an MCQs test can cover a much wider sample of knowledge than a subjective test. When taking an MCQs test, a candidate has only to make a mark on the paper, and therefore it is possible for testers to add more items within a given period of time (Hughes, 1989). With a large number of items in the test, the coverage of knowledge in MC items is broad, which is very useful for identifying students' strengths and weaknesses and distinguishing their ability.
Thirdly, MCQs tests increase test reliability. According to Heaton (1988) and Weir (1990), it is not difficult to obtain reliability for MCQs tests because of their perfectly objective scoring. Besides, since the testees do not have to deploy the skill of writing as in open-ended tests, and MC items have a clear and unequivocal format, the extent to which measurement error affects the trait being assessed is narrowed.
Another benefit is that MC items can be trialed beforehand fairly easily. From these trials, the difficulty level of each item and that of the test as a whole can usually be estimated in advance (Weir, 1990). The results of item difficulty estimation contribute greatly to designing a test more appropriate to the candidates' level of language.
In addition, Heaton (1988, p.27) claimed that "multi choice items can provide a useful means of teaching and testing in various learning situation (particularly at the lower levels) provided that it is always recognized such items test knowledge of grammar, vocabulary, etc. rather than the ability to use language". MC items can be very useful in measuring students' ability to recognize correct grammatical forms, and can therefore help both teachers and students to identify areas of difficulty.
As far as computer-based MCQs tests are concerned, according to McNamara (2000) many important national and international language tests, including TOEFL, are moving to computer-based testing (CBT) thanks to rapid developments in computer technology. The main feature of CBT is that stimulus texts and prompts are presented not in examination booklets but on the screen, with candidates being required to key in their responses. The advent of CBT has not necessarily involved any change in test content but often simply represents a change in test method. McNamara (2000) noted that the proponents of computer-based testing can point to a number of advantages. First, just as with paper-based MCQs tests, the scoring of fixed-response items can be done automatically and the candidate can be given a score immediately. Second, the computer can deliver tests that are tailored to the particular abilities of the candidate. This type of test, also called a computer-adaptive test, can provide far more information about the testees' ability.
2.3.3. Limitations of MCQs tests
Despite the fact that MCQs tests bring many benefits, especially to test administrators, there are several problems associated with the use of MC items. These problems were identified by a number of researchers such as Weir (1990), Hughes (1989), Heaton (1988), McCoubrie (2004) and McNamara (2000).
First of all, Hughes (1989) criticized the MCQ technique for testing only recognition knowledge. To do a given task, a testee just needs to look at the stem and four or five options and then pick out the key. His or her performance is not much more than the recognition of the right form of language, and shows no evidence that this person can produce the language. Obviously, this type of test reveals a gap between at least some candidates' productive and receptive skills, and therefore performance on an MCQs test may give an inaccurate picture of these candidates' ability (Hughes, 1989). Heaton (1988) also pointed out that an MC item does not lend itself to the testing of language as communication, and the process involved in the actual selection of one out of four or five options does not bear much relation to the way language is used in most real-life situations. Normally, in everyday situations we are required to produce and receive language, whereas MC items are merely aimed at testing receptive skills.
Another problem that arises when using MCQs tests is that "the multi choice item is one of the most difficult and time consuming types of items to construct" (Heaton, 1988, p.27). In order to write a good item, test designers have to follow certain principles strictly. For example, they have to write many more items than they actually need for a test. After that, they have to pre-test the items, analyze students' performance on them, evaluate them and retain the usable ones, or even rewrite items for a satisfactory final version. These procedures take a lot of the test constructors' time and need far more careful preparation than subjective tests.
Furthermore, objective tests of the MCQs type encourage guessing (Weir, 1990; Heaton, 1988; Hughes, 1989). Hughes estimated that the chance of guessing the correct answer in a three-option multiple-choice item is roughly 33%; in a four- or five-option item it is 25% or 20% respectively. The format of MC items makes it possible for testees to complete some items without reference to the texts they are based on. As a result, the scores gained on MCQs may be suspect and the score range may become narrow.
Some other limitations in the use of MC items involve backwash and cheating. Backwash may be harmful because MC items require students to memorize as many structures and forms as possible and do not stimulate them to produce language. Thus, practicing MC items is not a good way to improve learners' command of language. Cheating may be facilitated as MC items make it easy for students to communicate with each other and exchange selected responses nonverbally.
Referring to computer-based tests, according to McNamara (2000), this type of test requires the prior creation of an item bank whose items have been thoroughly trialed. The preparation of a standardized item bank that estimates difficulty for candidates at given levels of ability as precisely as possible is not easy. In addition, delivering CBT raises questions of validity and reliability. For example, different levels of familiarity with computers or with reading texts on computer screens will affect students' performance. These differences might make it difficult to draw conclusions about a candidate's ability.
2.3.4. Principles to construct MC items
In order to construct a good MC item, there are a large number of principles which can be summarized as follows (Heaton, 1988):
Each MC item should have only one answer
Only one feature at a time should be tested
Each option should be grammatically correct when placed in the stem, except in the case of specific grammar test items.
All multi-choice items should be at a level appropriate to the proficiency level of the testees.
Multi choice items should be as brief and as clear as possible
Multi choice items should be arranged in rough order of increasing difficulty and there should be one or two simple items to “lead in” the testees.
2.4. Reliability of a test
2.4.1. Definition
In research, the term reliability means 'repeatability' or 'consistency'. A test is considered reliable if it would give us the same result over and over again, assuming that what we are measuring is not changing. Lynch (2003, p.83) stated that reliability refers to "the consistency of our measurement". In the same vein, Harrison (1983) explained that to be reliable, tests should not be elastic in their measurement. Whatever version of the test a testee takes, on whatever occasion the test is administered, and whoever scores it, the test should still yield the same results.
2.4.2. Methods of test reliability estimate
Reliability may be estimated through a variety of methods, which are presented below:
* The test-retest method is a classic way to calculate the reliability coefficient of a test. The test is given to a group of students and then given again to these students shortly afterward (the interval between the two administrations being no more than two weeks). The test is assumed to be perfectly reliable if the students get the same scores on the first and the second administration (Alderson, J.S. et al., 1995).
* Parallel-form methods involve correlating the scores from two or more similar (parallel) tests which are administered to the same sample of persons. A formula for this method may be expressed as follows:
Rtt = rA,B   (Henning, 1987)
where
Rtt: the reliability coefficient
rA,B: the correlation of form A with form B of the test when administered to the same people at the same time
* Inter-rater method is applied when scores on the test are independent estimates by two or more raters. It involves the correlation of the ratings of one rater with those of another. The following formula is used in calculating reliability:
Rtt = n × rA,B / [1 + (n - 1) × rA,B]   (Henning, 1987)
where
Rtt: the inter-rater reliability
n: the number of raters whose combined estimates form the final mark for the examinee
rA,B: the correlation between the raters, or the average correlation among all raters if there are more than two
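For illustration only (the author's sketch, not part of Henning's text; the function name and sample figures are hypothetical), the inter-rater formula above can be applied directly once the correlation between raters is known:
import statistics  # not strictly needed here; kept for consistency with later sketches

def inter_rater_reliability(r_ab, n_raters):
    # Rtt = n x r / (1 + (n - 1) x r), following the formula above
    return n_raters * r_ab / (1 + (n_raters - 1) * r_ab)

# Hypothetical example: two raters whose marks correlate at 0.75
print(inter_rater_reliability(0.75, 2))   # approximately 0.857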
* Internal consistency method judges the reliability of the test by estimating how consistent test-takers’ performances on different parts of the tests with each other (Bachman, 1990). The following are internal consistency measures that can be used:
Split-half reliability involves dividing a test into two, and correlating these two halves. The more strongly the two halves correlate, the higher the reliability will be. This method uses the following formula:
Rtt = 2 × rA,B / (1 + rA,B)   (Henning, 1987)
where
Rtt: the reliability estimated by the split-half method
rA,B: the correlation of the scores from one half of the test with those from the other half
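As a minimal sketch (the author's illustration, not part of the cited sources), the split-half estimate can be computed from an examinee-by-item matrix of 0/1 scores. The odd/even split used here is only one common way of halving a test, and the numpy library is assumed to be available:
import numpy as np

def split_half_reliability(item_matrix):
    # item_matrix: one row per examinee, one column per item, scored 1/0
    half_a = item_matrix[:, 0::2].sum(axis=1)   # totals on odd-numbered items
    half_b = item_matrix[:, 1::2].sum(axis=1)   # totals on even-numbered items
    r_ab = np.corrcoef(half_a, half_b)[0, 1]    # correlation of the two halves
    return 2 * r_ab / (1 + r_ab)                # corrected as in the formula above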
Kuder-Richardson Formula 20 (KR-20) is based on item-level data and is used when the tester has the results for each test item. The KR-20 formula is as follows:
Rtt = [n / (n - 1)] × [(st² - Σsi²) / st²]   (Henning, 1987)
where
Rtt: the KR-20 reliability estimate
n: the number of items in the test
st²: the variance of test scores
Σsi²: the sum of the variances of all items (or Σpq)
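A minimal sketch of this computation (the author's illustration; it assumes the item responses are available as an examinees × items matrix of 1/0 scores, with numpy available):
import numpy as np

def kr20(item_matrix):
    # item_matrix: examinees x items, 1 = correct, 0 = incorrect
    n = item_matrix.shape[1]                    # number of items
    st2 = item_matrix.sum(axis=1).var()         # variance of total scores
    sum_pq = item_matrix.var(axis=0).sum()      # sum of item variances (= sum of p*q for 1/0 items)
    return (n / (n - 1)) * (st2 - sum_pq) / st2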
Kuder-Richardson Formula 21 (KR-21) is based on total test scores and assumes that all items are of an equal level of difficulty. The KR-21 formula is as follows:
Rtt = [n / (n - 1)] × [1 - (x̄ - x̄²/n) / st²]   (Henning, 1987)
where
Rtt: the KR-21 reliability estimate
n: the number of items in the test
x̄: the mean of scores on the test
st²: the variance of test scores
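A companion sketch for KR-21 (again the author's illustration only), assuming each score is the number of correct answers out of n_items:
import statistics

def kr21(total_scores, n_items):
    # total_scores: number-correct scores; KR-21 assumes equally difficult items
    mean = statistics.fmean(total_scores)
    var = statistics.pvariance(total_scores)
    return (n_items / (n_items - 1)) * (1 - (mean - mean**2 / n_items) / var)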
Alderson, J.S. et al. (1995) stated that for internal consistency reliability, the perfect reliability index is +1.0. In the same view, Hughes (1989, p.31-32) noted that "the ideal reliability coefficient is 1 - a test with a reliability coefficient of 1 is one which would give precisely the same results for a particular set of candidates regardless of when it happened to be administered". The reliability coefficient for a good vocabulary, structure and reading test is usually in the 0.90 to 0.99 range, for an auditory comprehension test it is more often in the 0.80 to 0.89 range, and for an oral production test it may be in the 0.70 to 0.79 range, while an MCQs test typically has a reliability coefficient of more than 0.80 (Hughes, 1989).
Among the above ways of estimating reliability, the test-retest and parallel-form methods require at least two test administrations, while the inter-rater and internal consistency methods need only a single administration. For reasons of convenience, KR-20 and KR-21 are chosen more often than the others and are considered the two most common formulae (Alderson, J.S. et al., 1995).
Concerning MCQs tests, besides estimating the test reliability coefficient, item analysis, including item difficulty and item discrimination, provides further insight into test reliability (Henning, 1987).
The formula for calculating item difficulty is:
p = ΣCr / N   (Henning, 1987)
where
p: proportion correct
∑Cr : the sum of correct responses
N: the number of students
Henning (1987) pointed out that the p-value for each item should be between 0.33 and 0.67 for the level of difficulty of the item to be acceptable. If the p-value is below 0.33, the item is considered too difficult; if it is above 0.67, the item is too easy.
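A small sketch of how a single item's p-value could be computed and labelled against Henning's thresholds (the author's illustration; the variable responses is assumed to hold one 1/0 entry per examinee):
def item_difficulty(responses):
    # responses: 1 = correct, 0 = incorrect, one entry per examinee
    p = sum(responses) / len(responses)
    if p > 0.67:
        label = "too easy"
    elif p < 0.33:
        label = "too difficult"
    else:
        label = "acceptable"
    return p, label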
The formula for computation of item discrimination is:
D = Hc / (Hc + Lc)   (Henning, 1987)
where
D: discriminability
Hc: the number of correct responses in the high group
Lc: the number of correct responses in the low group
The optimal size of each group is 28% of the total sample. For very large samples of examinees, the number of examinees in the high and low groups is reduced to 20% for computational convenience. The acceptable discrimination value by the sample separation method is >= 0.67 (Henning, 1987).
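A sketch of the sample separation procedure described above (the author's illustration; item_correct is assumed to hold 1/0 responses to one item and total_scores the corresponding total scores):
def item_discrimination(item_correct, total_scores, group_share=0.28):
    # Rank examinees by total score, take the top and bottom groups
    # (28% by default, 20% for very large samples), then D = Hc / (Hc + Lc).
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    k = max(1, int(group_share * len(total_scores)))
    low, high = order[:k], order[-k:]
    hc = sum(item_correct[i] for i in high)   # correct responses in the high group
    lc = sum(item_correct[i] for i in low)    # correct responses in the low group
    return hc / (hc + lc) if (hc + lc) else 0.0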
2.4.3. Measures to improve test reliability
Reliability may be improved by eliminating its sources of error. Hughes (1989) makes a list of recommendations for improving test reliability as follows:
Take enough samples of behavior
Do not allow candidates too much freedom
Write unambiguous items
Provide clear and explicit instructions
Ensure that tests are well laid out and perfectly legible
Make candidates familiar with the format and testing techniques
Provide uniform and non-distracting conditions of administration
Furthermore, item difficulty and item discriminability show whether the reliability of an MCQs test is low or high (Henning, 1987). Therefore, the most straightforward way to improve the reliability of an MCQs test is to design items with a good level of difficulty and a good discrimination value.
2.5. Summary
This chapter presents the theoretical framework for the study. In Section 2.1, the notion of a language test as a device for measuring people's ability is reviewed. Additionally, the purposes of language testing, types of language tests and criteria of a good test are also discussed. Section 2.2 classifies achievement tests into two types and mentions considerations in designing final achievement tests. The definition, benefits and limitations of MCQs tests, and principles for constructing this type of test, are dealt with in Section 2.3. The final section, 2.4, is concerned with test reliability, methods for estimating test reliability, and ways to make language tests more reliable.
Chapter 3: The Context of the Study
3.1. The current English learning, teaching and testing situation at HUBT
There are over 1,500 second-year non-majors of English at HUBT. English is their required foreign language subject. Their levels of proficiency vary because of their different backgrounds, knowledge of the language, exposure to English, characteristics, learning attitudes, motivation and so on. These students have to cover a comparatively large amount of English, as English holds the highest number of credits among all subjects. In the English Department, HUBT, there are 62 teachers in total who work enthusiastically with the non-English majors to help them with the foreign language. They are all dedicated and qualified, with an average of five years' teaching experience.
With the aim of equipping students with business English and the communication skills necessary for their future careers, learning and teaching activities for the second-year non-English majors mainly focus on developing speaking and listening skills. However, the testing process is quite complicated and can be described as follows.
In semester 4 the students undergo daily assessment and take four tests altogether. Daily assessment includes checking vocabulary, speaking skills, and doing tasks in the course book and practice files. The four tests comprise two paper tests and two computer-based MCQs tests, all designed by teachers of the English Department, HUBT. The paper tests, given in the middle of the term (week 9) and at the end of the term (week 17), focus on listening, writing, grammar and vocabulary. The computer-based MCQs tests are administered on computers in week 19. Each test lasts 2 hours and includes 150 multiple-choice items focusing on vocabulary, grammar, reading and functional language. The first test (hereafter achievement test 1) is based on the three units of the course book (Units 7, 8, 9) that the students have already learnt. The second one (achievement test 2) is designed on the basis of the last three units of the course book (Units 10, 11, 12). The items of the MCQs tests are selected by one person in charge of teaching English in the 3rd and 4th semesters for the second-year students.
The Computer-based MCQs test administered at HUBT is similar to a paper-based one. The main difference is that the test is delivered on computers and students simply click the mouse on their chosen response among A, B, C and D. This kind of test is different from computer-adaptive tests, which are tailored to the particular abilities of the candidate. In other words, a Computer-based MCQs test at HUBT is in fact an MCQs test delivered on computers.
The following table illustrates the testing guidelines for semester 4:
Semester 4 (12 credits)
The first score (6 credits): Daily assessment 1: 20%; Paper test 1: 10%; Computer-based MCQs test 1: 70%
The second score (6 credits): Daily assessment 2: 20%; Paper test 2: 10%; Computer-based MCQs test 2: 70%
Table 2: Scoring format for each semester
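As a small sketch of the weighting in Table 2 (the author's illustration; the actual combination rule used at HUBT may differ in scale or rounding):
def first_score(daily_assessment_1, paper_test_1, mcq_test_1):
    # Weighted combination following Table 2: 20% + 10% + 70%
    return 0.2 * daily_assessment_1 + 0.1 * paper_test_1 + 0.7 * mcq_test_1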
This study focuses only on the final achievement computer-based MCQs test 1.
3.2. The course objectives, syllabus and materials used for the second-year non-majors of English in Semester 4
3.2.1. The course objectives
The training objectives in the 4th semester are to help students to:
- Further develop speaking and listening skills in business contexts
- Further develop the skill of reading business texts
- Consolidate basic grammar
- Broaden business vocabulary
- Further practice pronunciation
- Write business letters and memorandums
3.2.2. Business English syllabus
The syllabus is described in the following table:
Week | Time (minutes) | Unit | Content | Page | Further work (page)
1 | 220 | 7 | Starting up - Vocabulary | C.B. 62-63 | P.F. 28-29
1 | 220 | 7 | Listening - Reading | C.B. 64-65 |
2 | 220 | 7 | Language review - Skills | C.B. 66-67 | P.F. 29-30
2 | 220 | 7 | Case study | C.B. 68-69 | P.F. 30-31
3 | 220 | 7 | Text bank - Talk business | T.B. 126-127 / P.F. 66-67 | Grammar review correction
3 | 220 | 8 | Starting up - Vocabulary | C.B. 70-71 | P.F. 32
4 | 220 | 8 | Listening - Reading | C.B. 72-73 |
4 | 220 | 8 | Language review - Skills | C.B. 73-75 | P.F. 33
5 | 220 | 8 | Case study | C.B. 76-77 | P.F. 34-35
5 | 220 | 8 | Text bank - Talk business | T.B. 128-129 / P.F. 68-69 | Grammar review correction
6 | 220 | 9 | Starting up - Listening | C.B. 78-79 |
6 | 220 | 9 | Vocabulary - Reading | C.B. 80-81 | P.F. 36
7 | 220 | 9 | Language review - Skills | C.B. 82-83 |
7 | 220 | 9 | Case study | C.B. 84-85 | P.F. 38-39
8 | 220 | 9 | Text bank - Talk business | T.B. 130-131 / P.F. 70-71 | Grammar review correction
8 | 220 | C | Revision | C.B. 86-89 |
9 | 220 | | Written test | |
Note: C.B.: Course book; P.F.: Practice file; T.B.: Teacher's book
Table 3: The syllabus for the 4th semester (for non-English majors)
Time allocation for language skills and sections is illustrated as follows:
Skills | Number of class periods | Percentage (%)
Listening | 16.5 | 22%
Speaking | 19.5 | 26% (13% for practicing functional language)
Reading | 13.5 | 18%
Writing | 6 | 8%
Grammar | 10.5 | 14%
Vocabulary | 9 | 12%
Table 4: Time allocation for language skills and sections
3.2.3. Course book
The course book in semester 4 for the second-year students at HUBT is Market Leader Pre-intermediate, written by David Cotton, David Falvey and Simon Kent and published in 2002 by Longman. The book mainly focuses on three skills: speaking, listening and reading; it does not put great emphasis on grammar. The book is divided into 12 closely interrelated units, each with a slightly different emphasis. The pattern (Starting up, Vocabulary, Listening, Reading, Language review, Skills, Case study) is the same for all units. In the fourth semester, students study the last six units of this book (Unit 7 to Unit 12).
The course book checklists needed for examining the tasks and content in the course book used for the construction of the achievement computer-based MCQs test 1 are in Appendix 1.
3.2.4. Specification grid and scoring scale for the final achievement Computer-based MCQs test 1 in Semester 4.
In order to evaluate students' achievement, the following grid is used to design achievement test 1:
Part | Main skill | Input | Item type | Number of marks | Skill weighting
1 | Vocabulary | Incomplete sentences, approx. 18 words | 50 × 4-option multiple choice | 4 | 33%
2 | Grammar | Incomplete sentences, approx. 18 words | 50 × 4-option multiple choice | 3 | 33%
3 | Reading | Narrative or factual texts, approx. 60 words | 30 × 4-option multiple choice | 1.67 | 20%
4 | Functional language | Short sentences, approx. 16 words | 20 × 4-option multiple choice | 1.33 | 14%
Table 5: Specification grid for the final computer-based MCQs test 1
The scoring scale for the test is designed by the teachers in HUBT and includes two levels as follows:
Pass: for students who get 50% or more of the whole test
Fail: for students who get below 50% of the whole test
Chapter 4: Methodology
4.1. Participants
The first group of subjects participating in this study consisted of 349 second-year students from 14 classes. Their test scores were collected for the purpose of analyzing and computing the internal consistency reliability, item difficulty and item discriminability.
The second group of subjects, who answered a questionnaire, consisted of 236 second-year non-English majors. Their responses to 14 questions were analyzed in order to investigate the students' attitude towards the final achievement MCQs test 1.
4.2. Data collection instruments
The following instruments were adopted to obtain information for the study:
- Kuder-Richardson Formula 20 for internal consistency reliability estimate
- Item difficulty and item discrimination formulae mentioned in section 2.4.2.
- A questionnaire survey for students (see Appendix 2)
The questionnaire was designed on the basis of Henning's (1987) list of threats to the reliability of a test. The objective is to find out students' attitude towards the reliability of the current achievement MCQs test 1 in the 4th semester. The questionnaire included 14 items and was written in Vietnamese to make sure the informants understood the questions properly (see Appendix 2). These items focus on the characteristics of the test, test administration and test-takers.
4.3. Data collection procedure
The data on the test objectives and the course objectives were elicited from the English Department Bulletin, HUBT, issued in 2003. The data on the syllabus content were collected from the syllabus for the second-year students. The data on the test content and test format were obtained from a copy of the official current test provided by the English Department.
The data on the students' test scores and item responses were obtained from a file containing both the students' scores and their responses on the test, provided by the Informatics Department, HUBT.
The questionnaire data were collected from 236 second-year students who were randomly selected one week after they had finished the final achievement test 1.
4.4. Data analysis procedure
First, the comparison between the test objectives and the course objectives, the test content and the syllabus content, and skill weight in the test format and the syllabus was made in order to determine if they are compatible with each other.
Second, the reliability coefficient, item difficulty and item discrimination indices of the MCQs test 1 were analyzed in order to determine the extent to which the final achievement test 1 is reliable.
Finally, analysis of students’ responses on the questionnaire was made in order to find out students’ attitude towards the MCQs test given to them.
Chapter 5: Results and Discussions
5.1. The compatibility of the objectives, content and skill weight format of the final achievement computer-based MCQ test 1 for 4th semester with the course objectives and the syllabus
5.1.1 The test objectives and the course objectives
As mentioned in section 3.2.1, the course mainly targets the further development of students' essential business speaking skills, such as making presentations, taking part in meetings, negotiating, telephoning and using English in social situations. Through many engaging discussion activities, students build up their confidence in using English and improve their fluency. The course also aims to develop students' listening skills, such as listening for information and note-taking. In addition, it provides students with important new words and phrases and increases their business vocabulary. Students' reading skill is also built up through authentic articles on a variety of business topics. The course further helps students to revise and consolidate basic grammar, to improve their pronunciation and to perform some writing tasks on business letters and memorandums.
The MCQs test 1 is designed to check what students have learnt about vocabulary, grammar, reading topics and functional language in Units 7, 8 and 9 of Market Leader Pre-intermediate. It is also constructed to assess students' achievement at the end of the course, especially to evaluate students' results after completing these three units. In particular, the vocabulary and grammar sections, making up 100 items, are aimed at examining the amount of vocabulary and grammar that students have been taught. The reading section of 30 items measures students' reading skill on business topics such as marketing, planning and managing. The functional language section of 20 items measures students' ability to communicate in daily business situations.
Obviously, the objectives of the course and of the MCQs test 1 are only partially compatible with each other. The course provides students with knowledge of vocabulary, grammar and functional language and develops students' reading skills, and the MCQs test 1 is designed to measure students' command of this knowledge and these skills. However, the course objectives target the development of both receptive and productive skills, whereas the test focuses merely on the receptive skill of reading and examines students' ability to recognize knowledge rather than to produce language.
5.1.2. The test item content in four sections and the syllabus content
* Grammar section
The grammar items in the test are shown clearly and specifically in the table below.
No. | Grammar items | Number of tested items | Percentage of tested items
1 | Future expressions | 15 | 30
2 | Prepositions | 10 | 20
3 | Wh-question forms | 8 | 16
4 | Verb tense and verb form | 8 | 16
5 | Reported speech | 5 | 10
6 | Connectors | 2 | 4
7 | Adjective comparatives | 2 | 4
Table 6: Main points in the grammar section
Compared with the grammar checklist (see Appendix 1), it can be seen that the test items in this section generally cover the grammar points in the course book, such as question forms, future time expressions and reported speech. However, these items make up only 56% of the section, a little more than the total percentage of items which are not targeted in the grammar part of the syllabus, such as prepositions, connectors, comparatives, and verb tense and verb form.
* Vocabulary section
The table below shows the allocation of test items according to the topics of vocabulary included in the textbook.
No. | Vocabulary | Number of tested items | Percentage of tested items
1 | Noun-noun collocations (marketing terms) | 15 | 30
2 | Verb-noun collocations (ways to plan) | 13 | 26
3 | Verb-preposition collocations (ways to manage) | 12 | 24
4 | Definitions of other economic terms | 3 | 6
5 | Verbs showing trends | 3 | 6
6 | Multi-word verbs | 3 | 6
7 | Adjectives related to profits | 1 | 2
Table 7: Main points in the vocabulary section
In comparison with the vocabulary checklist (see Appendix 1), it can be seen that 80% of the test items in the vocabulary section are the same as vocabulary items in the course book. That is to say, the test items stick to what students have learnt, such as noun-noun collocations relating to marketing terms, verb-noun collocations relating to ways to plan, and verb-preposition collocations relating to ways to manage. Nevertheless, there are also items, such as verbs showing trends, multi-word verbs and adjectives related to profits, which are not included in the vocabulary part of the syllabus but appear in the reading articles in Units 7, 8 and 9.
* Reading comprehension section
In this section, there are 30 extracts of which main topics are shown as follows:
No. | Extract topic | Number of tested items | Percentage of tested items
1 | Coaching new employees | 6 | 20
2 | Company profile | 5 | 16.7
3 | Company song | 4 | 13.3
4 | Town planning | 4 | 13.3
5 | Time managing | 3 | 10
6 | Planning for tourism | 3 | 10
7 | The role of the Public Relations department | 3 | 10
8 | The role of the Marketing department | 2 | 6.7
Table 8: Topics in the reading section
By comparing the reading section with the reading checklist (see Appendix 1), it can be observed that the topics in the MCQs test 1 such as managing, marketing and planning are highly relevant to the ones that the students have already learnt.
* Functional language section
This section includes 20 items on business situations. The language functions tested in these situations are presented in the following table:
No. | Language function | Number of tested items | Percentage of tested items
1 | Clarifying | 5 | 25
2 | Making suggestions | 4 | 20
3 | Checking information | 3 | 15
4 | Asking for opinions | 3 | 15
5 | Finishing a conversation | 2 | 10
6 | Giving opinions | 2 | 10
7 | Saying goodbye | 1 | 5
Table 9: Items in the functional language section
Comparing Table 9 with the functional language checklist (see Appendix 1), it is clear that the test items broadly cover what the students have been taught about business situations (for example, telephoning, meetings, and socializing and entertaining). However, there are no items on interrupting or making excuses, although these are focal points in the syllabus.
To sum up, with regard to content, the items in the four sections of the MCQs test 1 are generally, to a large extent, relevant to the course book.
5.1.3. The skill weight format in the test and the syllabus
According to the skill weight format in the syllabus, illustrated in Table 4 (section 3.2.2), among the four areas of reading, vocabulary, grammar and functional language, reading has the highest proportion of skill weight (18%) and ranks first. Grammar ranks second with a skill weight of 14%. Functional language ranks third with 13%, and vocabulary is at the bottom with 12%.
However, in the test specification grid, the skill weighting of the four sections does not follow the same ranking as in the syllabus. The vocabulary and grammar sections, with 50 test items each, share the first rank, whereas reading (30 test items) and functional language (20 test items) rank third and fourth respectively. Thus, in the MCQs test 1 the reading section drops from first to third place, and the vocabulary section moves from fourth to first.
From the detailed findings presented above, we can see that the objectives of the MCQs test 1 are partially compatible with the course objectives. Likewise, the skill weight format of the MCQs test 1 is only partially similar to that of the syllabus. Only the content of the MCQs test 1 reflects nearly all of the course book content. It might thus be concluded that the MCQs test 1 is related to the teaching content and objectives only to a certain degree.
5.2. The reliability of the final achievement test
5.2.1. Reliability coefficient
The results we get from test scores are demonstrated as follows:
Mean | 6.59
Variance of the test scores | 2.12
Standard deviation (s.d.) | 1.46
Sum of the variances of all items (∑pq) | 33
Reliability coefficient | -14.6
Table 10: Test reliability coefficient
As stated in chapter 2, the typical reliability coefficient for MCQs tests is >= 0.8 and the closer it gets to 1.0, the better it is. However, the reliability coefficient of the MCQs test 1 here is too low in comparison with the desirable one.
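The coefficient in Table 10 can be checked with a short calculation. The following is a minimal Python sketch, assuming the Kuder-Richardson 20 (KR-20) formula introduced in Chapter 2 was used; the inputs are the figures reported in Table 10 and the variable names are illustrative only.

    # Minimal sketch, assuming KR-20 = (k / (k - 1)) * (1 - sum_pq / s**2);
    # the inputs below are the figures reported in Table 10.
    k = 150        # number of test items
    sum_pq = 33    # sum of the item variances (sum of pq)
    s = 1.46       # standard deviation of the test scores

    kr20 = (k / (k - 1)) * (1 - sum_pq / s ** 2)
    print(round(kr20, 1))  # -14.6, matching the coefficient reported above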
5.2.2. Item difficulty and discrimination value
Difficulty and discriminability values for each of the 150 tested items are presented in Appendix 5.
* Item difficulty value
Among the 150 items, there are 54 items whose p value is bigger than 0.67, making up 36% of the total, while there are no items with a p value smaller than 0.33 (see Appendix 5). That means 64% of the test items have an acceptable difficulty level, 36% are too easy and none is too difficult.
In addition, the MCQs test 1 merely obtained an average p value of 0.55 (see Appendix 3).
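To illustrate how the difficulty values reported in Appendix 5 can be computed, the following Python sketch calculates p for each item as the proportion of examinees answering it correctly and applies the 0.33-0.67 acceptability band used in this study; the small response matrix is hypothetical, not the actual HUBT data.

    # Hypothetical 0/1 response matrix: rows = examinees, columns = items.
    responses = [
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 1, 0, 1],
    ]

    n_examinees = len(responses)
    for item in range(len(responses[0])):
        correct = sum(row[item] for row in responses)
        p = correct / n_examinees            # item difficulty (facility) value
        if p > 0.67:
            label = "too easy"
        elif p < 0.33:
            label = "too difficult"
        else:
            label = "acceptable"
        print(f"item {item + 1}: p = {p:.2f} ({label})")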
The following table illustrates p-value for items in 4 sections of the MCQs test 1:
Section | Number of items with acceptable p-value | Number of items without acceptable p-value
Vocabulary | 41 | 9
Grammar | 25 | 25
Reading | 29 | 1
Functional language | 1 | 19
Table 11: p-value of items in 4 sections
Table 11 shows that half of the test items in the grammar section and as many as 95% of the items in the functional language section are too easy. It appears that the MCQs test 1 includes too many excessively easy items, especially in the functional language section. Besides, the test as a whole does not show a desirable spread of item difficulty around the average p-value of 0.55. Accordingly, items with an undesirable difficulty index might reduce the test reliability.
* Item discrimination value
Among the 150 items, 76 have acceptable discrimination values (>= 0.67). The others are non-discriminating (see Appendix 5).
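The discrimination values themselves are taken from Appendix 5. As an illustration only, the Python sketch below shows one common way of estimating an item discrimination index, by comparing an upper and a lower group of examinees ranked by total score; the data are hypothetical and the exact procedure used in this study may differ.

    # Hypothetical data: total scores and correctness on one item for six examinees.
    examinees = [
        {"total": 9.3, "item_correct": 1},
        {"total": 8.1, "item_correct": 1},
        {"total": 6.6, "item_correct": 1},
        {"total": 6.0, "item_correct": 0},
        {"total": 4.9, "item_correct": 1},
        {"total": 3.4, "item_correct": 0},
    ]

    ranked = sorted(examinees, key=lambda e: e["total"], reverse=True)
    group_size = len(ranked) // 3                 # top and bottom thirds
    upper = ranked[:group_size]
    lower = ranked[-group_size:]

    d = (sum(e["item_correct"] for e in upper)
         - sum(e["item_correct"] for e in lower)) / group_size
    print(f"discrimination index D = {d:.2f}")    # (2 - 1) / 2 = 0.50 here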
The following table demonstrates discrimination value for items in 4 sections of the MCQs test 1:
Section | Number of items with acceptable discrimination value | Number of items without acceptable discrimination value
Vocabulary | 21 | 29
Grammar | 29 | 21
Reading | 25 | 5
Functional language | 1 | 19
Table 12: Discrimination value of items in 4 sections
Table 12 shows that 95% of the items in the functional language section are non-discriminating. Roughly half of the items in the vocabulary and grammar sections also do not have good discrimination values. Only the items in the reading section discriminate among students well. Thus it can be inferred that the item discriminability of the MCQs test 1 is not as good as expected.
* The number of items with both an acceptable p-value and an acceptable discrimination value is 68, making up 45.3% of the whole test (see Appendix 5). This suggests that only 45.3% of the test items are of good quality.
The number of test items with an acceptable p-value and discrimination value in the four sections of the MCQs test 1 can be shown as follows:
Section | Number of items with acceptable p-value and discrimination value
Vocabulary | 27
Grammar | 15
Reading | 25
Functional language | 1
Table 13: Number of test items with acceptable p-value
and discrimination value in 4 sections
From this table we can see that, relative to the number of items in each section, the items in the reading section have the best quality, since most of them satisfy the requirements for both p-value and discrimination value. Then come the items in the vocabulary section. The items in the grammar section, and especially in the functional language section, are undesirable since they are too easy and non-discriminating.
In brief, the findings show that the MCQs test 1 to a large extent lacks reliability for two reasons. First, its reliability coefficient is far from the desirable reliability coefficient of an MCQs test. Second, more than half of the test items (54.7%) do not have good p-values and discrimination values.
5.3. The attitude of students towards the MCQs test 1
The survey questionnaires were delivered to 236 second-year non-English majors, but only 218 papers were collected. The following are the results:
In order to find out students’ perception about the test content, the author asked students whether the content in 4 sections of the test was relevant to what they had learned. The result is shown in the chart below:
Chart 1: Students’ response on test content
Among the four sections, functional language was perceived as the most relevant, with a total proportion of 65%. The reading section was claimed to be the least relevant (only 31%). Vocabulary was said to be slightly more relevant to the syllabus than grammar (59% compared with 52%).
Giving opinions on the test length, three quarters of the students (75%) found the total of 150 multiple-choice items reasonable, while 25% thought it was too many.
Answering the question whether the test as a whole had the power to discriminate among students in the ability of interest, approximately 36% of the students stated that the test items actually discriminated among students' levels of English. The remaining 64% claimed the level of discrimination was not remarkable. The result can be seen clearly in the following pie chart:
Chart 2: Students' response on item discrimination value (high: 36%; low: 64%)
In the fourth question, students were asked if they had had enough time to fulfill the tasks given in the achievement test 1. The following chart illustrates the result:
Chart 3: Students' response on time length (enough: 84%; not enough: 9%; too much: 7%)
Chart 3 shows that 84% of the students answered that time management was not a problem for them. 7% of the responses indicated that the time allowance was more than enough, while 9% said that they needed more time to finish the tasks.
Regarding the clarity of the test instructions, 90% of the students stated that the instructions were clear. Only 10% perceived them as rather unclear.
When asked about the influence of test supervision on the test result, 98% of the students commented that the test supervisors were strict. Only 2% acknowledged that the supervision was not very strict.
Students were also asked whether the testing room affected their performance. 40% of them claimed that the testing room did have an impact on their test performance, while 60% stated they were not affected.
Responding to the question whether they experienced a computer breakdown when doing the test and whether their test results were affected, a third of the informants stated that they did and had to do the test again. Of these, 77% found that it had a very negative influence on their test performance; the remaining 23% saw no impact.
When asked if they suffered from physical and emotional pressure when performing the tasks, 45% of the students admitted they did, while 55% did not.
With reference to test-taking behavior, 56% of the informants responded that they did select answers arbitrarily, whereas 44% did not. The result is illustrated in the chart below:
Chart 4: Students' response arbitrariness (yes: 56%; no: 44%)
Answering the question about prior exposure to the test format and content, 97% of the students said that they were familiar with this type of test, and only 3% were not. This can be explained by the fact that they were second-year students and had already done a number of such tests.
Concerning students' computer skills, 61% of the students claimed that they were good at using the computer to do the test, 38% thought their skill was average, and only 1% stated that they were not good at it.
When asked whether there was any difference between doing the test on paper and on the computer, surprisingly 50% of the participants found it different and 50% did not, although they were second-year students and had already done the computer-based MCQs English test four times.
In the last question, students were asked whether the test scores reflected their actual achievement during the 4th semester. The result was presented in the following pie chart:
Chart 5: Students' response on the relation between test scores and their achievement (exactly: 66%; not exactly: 34%)
As can be seen from Chart 5, 66% of the students acknowledged that the test score actually reflected their achievement, while 34% felt that it did not.
From these results we can realize some points as follows:
- Factors which do not affect students' scores include students' computer skills, students' familiarity with the test format and content, test supervision, clarity of the test instructions, and time allowance.
- Factors affecting students' test performance involve test characteristics, testee characteristics and test administration characteristics. Test characteristics include the large number of test items, low content relevance to the course book and low discrimination power. Testee characteristics consist of response arbitrariness, suffering from pressure and difficulty in reading texts on the screen. Test administration characteristics involve computer breakdown. Clearly, when performing the tasks, students were heavily influenced by both objective and subjective factors, and therefore the results they got did not reflect their true ability, as 34% of them claimed.
In short, the test scores do not seem reliable from students’ perspectives. That is because students’ performances on the test were affected by a number of both objective and subjective factors.
All of the findings for the three research questions mentioned above lead to the conclusion that the MCQs test 1 does not yield a reliable result. The unreliability of the test resulted from the performance of both test-takers and test-designers. As for the test designers, they made a test of low quality: the allocation of item difficulty among the four sections was not reasonable and the items were not really discriminating. As for the test-takers, they did not perform the tasks well. Notably, according to the findings obtained from the comparison and analysis of the test item content, there is high relevance between the test and the course book, especially in the reading section. However, the findings from the questionnaire survey show that students did not perceive the test content as truly relevant to what they had been taught, especially the reading part. It is likely that factors affecting students when doing the test, such as pressure, difficulty in reading texts on computers and response arbitrariness, made them believe that the content of the test was only about 50% relevant to what they had learnt and that their test scores did not reflect their true ability.
Considering all aspects, the MCQs test 1 has one good point: it is valid in terms of content. Nevertheless, this is not enough to conclude that it is a good test, as it lacks reliability.
5.4. Pedagogical implications and suggestions on improvements of the existing final achievement computer-based MCQs test 1 for the non-English majors at HUBT.
In this section, some suggestions for test-designers are offered to improve the quality of the final achievement MCQs test 1.
A good achievement test must be valid and reliable. In order to make the achievement test more valid, test designers should stick to the course objectives of developing speaking and listening skills when designing achievement tests. According to Table 4 (section 3.2.2) illustrating time allocation and skill weighting, speaking and listening skills are the main focuses of the course book Market Leader Pre-intermediate; these skills therefore should be tested with a relevant skill weight proportion. Furthermore, the functional language section in the MCQs test should be removed, as it is far from real-life situations: appropriate responses to various stimuli in everyday situations should be produced rather than chosen from a limited set of options. Instead, functional language should be included in speaking tests. The scoring format for semester 4 should be as follows:
Semester 4 (12 credits)
The first score (6 credits):
Oral test | 25%
Paper test 1 (listening 25%, writing 10%) | 35%
Computer-based MCQs test 1 (reading 20%, vocabulary 10%, grammar 15%) | 45%
The second score (6 credits):
Oral test | 25%
Paper test 1 (listening 25%, writing 10%) | 35%
Computer-based MCQs test 1 (reading 20%, vocabulary 10%, grammar 15%) | 45%
Table 14: Suggested scoring format
It is expected that this suggested scoring format should ensure the principle of “test what is taught”.
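To make the arithmetic of the suggested format concrete, the Python sketch below computes one of the two semester scores as a weighted average of three hypothetical component marks (each on a 10-point scale), using the top-level weights from Table 14; the marks themselves are invented for illustration only.

    # Hypothetical component marks on a 10-point scale, weighted with the
    # top-level percentages from Table 14 (oral 25%, paper test 1 35%,
    # computer-based MCQs test 1 45%).
    components = {
        "oral test": (7.0, 25),
        "paper test 1": (6.5, 35),
        "computer-based MCQs test 1": (6.0, 45),
    }

    total_weight = sum(weight for _, weight in components.values())
    score = sum(mark * weight for mark, weight in components.values()) / total_weight
    print(f"semester score: {score:.2f}")   # about 6.40 with these invented marks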
In order to improve the test reliability, it is necessary to establish a testing committee of three to five people responsible for test construction, administration and development, instead of only one person as at present. The testing committee should be made up of members with good knowledge, skills and experience in constructing MCQs tests. They are recommended to pay attention to the following three issues.
First, the testing committee members should, on the one hand, design MCQs tests themselves and, on the other hand, require teachers to make their own tests. Teachers should be provided by the committee members with test design and test development techniques for vocabulary, grammar and reading so that they can construct tests of good quality; this can be done through regularly held workshops. The main reason for this is that no one understands the students' strengths and weaknesses better than these teachers, so the tests they make are likely to be reliable and practical for the students. Both committee members and teachers need to clarify the students' levels of language in order to maximize test efficiency. This information would help them avoid designing items with undesirable difficulty and discriminability values. In addition, the content of the test should relate, and be as familiar as possible, to what the students have been taught and have learnt during the course. The test should also be built up systematically on the basis of a carefully constructed test specification.
Second, the test items should be carefully considered in terms of their relevance to the course book content, and only acceptable items should then be selected and piloted with students. The trial can be done in classrooms under strict supervision, and it is preferable to let students do the test on computers in order to help them get familiar with reading on-screen texts and to reduce their pressure.
Third, the results obtained from the trials should be carefully analyzed and discussed in terms of item difficulty, item discrimination, instructions, time allowance and distractors, in order to decide which items are good enough and which need adjusting before being put into an item bank. The item bank can then guarantee the variety of test choices, test quality and test confidentiality.
Last but not least, the item bank needs to be updated, supplemented and adapted with items of good quality, especially after the achievement tests are given to students each semester, so that a standardised item bank can be consolidated and developed.
* A proposed specification grid for the final achievement computer-based test 1 for the 4th semester non-English majors at HUBT
Based on the findings of the study and the course objectives, a proposed test specification of the current 4th term English achievement MCQs test 1 is worked out as follows so that more accurate measures of students’ language competence can be achieved.
The objectives of the final achievement objective test 1 for the 4th term non-English majors include:
- Checking what the students have learnt about vocabulary, grammar and reading, and to what degree the objectives of the course have been achieved in the set timeframe.
- Assessing students' achievement at the end of the course, especially evaluating students' results after learning three units of the Market Leader Pre-intermediate book.
- Giving students feedback. The test results will be useful for students to see what they have achieved in their learning process.
- Identifying room for improvement in both teaching content and teaching methodology. That is, teachers will refer to their students' scores and errors to adapt their teaching methods, the syllabus content and materials so as to make them more appropriate to their students' needs and abilities.
The following is the grid of this test's specifications.
Achievement test: paper specification grid
Time allowance: 150 minutes
Level: Pre-intermediate for non-English majors (Hanoi University of Business and Technology)
Test of Reading, Grammar and Vocabulary
Section | Main skill focus | Input | Response/item type | Number of marks | Skill weighting
1. Reading | Reading for gist/specific information, including topics closely related to marketing, planning and managing | Narrative or factual text, approx. 60-80 words each | 41 items, 4 multiple-choice options | 4.1 | 41%
2. Grammar | Recognizing grammar items involving wh-questions, future expressions and reported speech | Narrative or factual text, approx. 15-20 words each | 32 items, 4 multiple-choice options | 3.2 | 32%
3. Vocabulary | Recognizing vocabulary items including noun-noun, verb-noun and verb-preposition collocations | Narrative or factual text, approx. 15-25 words each | 27 items, 4 multiple-choice options | 2.7 | 27%
Table 15: Proposed test specifications
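The marks and skill weightings in Table 15 follow directly from the item counts if, as assumed here for illustration, the proposed test keeps 100 items in total and each item carries 0.1 mark out of a 10-mark total; the short Python check below reproduces the figures.

    # Assumption for illustration: 100 items in total, each worth 0.1 mark.
    item_counts = {"reading": 41, "grammar": 32, "vocabulary": 27}
    mark_per_item = 0.1
    total_items = sum(item_counts.values())       # 100

    for section, count in item_counts.items():
        marks = count * mark_per_item
        weight = 100 * count / total_items
        print(f"{section}: {marks:.1f} marks, {weight:.0f}% skill weighting")
    # reading: 4.1 marks, 41% ... matching the figures in Table 15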
5.5. Summary
In this chapter, the results for the three research questions of the study are presented and discussed. The findings show that the final achievement computer-based MCQs test 1 for the second-year non-English majors at HUBT is to a certain extent not reliable. Thus some suggestions are offered to test designers to make the test more reliable and of higher quality.
Chapter 6: Conclusion
6.1. Summary of the findings
Test reliability is undeniably an important criterion for defining the quality of a test. The investigation and evaluation of the reliability of the final achievement computer-based MCQs test 1 are therefore useful for judging the quality of the teaching, learning and testing process at HUBT. Through data collected from students' test scores and item responses, the author found answers to the three research questions, concerning the compatibility of the test objectives, test content and test format with the course objectives and the syllabus content, the extent to which the test scores are reliable, and the students' attitude towards the test, and then came to a final conclusion about the reliability of the test.
The findings indicate that the MCQs test 1 is not a good test as, first of all, it lacks compatibility between the test objectives and the course objectives. The skill weight formats of the test and of the syllabus are also incompatible. The four sections of the MCQs test 1 cover language items in the course book, but the relevance of this coverage is still problematic. In addition, the MCQs test 1 fails to meet one of the most important criteria: reliability. The unreliability exists due to some problems. First, the test items are of low quality as a result of poor item difficulty and item discrimination values. Item analysis and students' perception of the test discrimination indicate that the test does not have good discrimination power. Students' perception and the reliability coefficient of the MCQs test 1 both also show that the scores students get are unreliable. Second, several characteristics involving the test items, the testees and the test administration, such as the large number of test items, low content relevance to the course book, response arbitrariness, pressure, difficulty in reading texts on the screen and computer breakdown as perceived by students, reduce the reliability of the test scores. On the basis of these results, the author provides some suggestions for improving the test quality. The reliability of the final achievement MCQs test 1 for second-year non-English majors may be increased if it is constructed to be more relevant to the course objectives and syllabus and if the test items are designed and drawn from an item bank of good p-values and discrimination values by an efficient testing committee.
The author hopes that the study will give a detailed view of the computer-based MCQs tests administered at HUBT and that the suggestions for test improvement will be put into practice in order to properly assess students' actual language ability during the process of learning Market Leader Pre-intermediate.
6.2. Limitations of the study
The study on the reliability of the final achievement computer-based MCQs test does contain some unavoidable limitations. Firstly, due to the limit of time and the scope of a minor MA thesis, the thesis investigated only one aspect among the many facets of test reliability, namely internal consistency reliability. Secondly, the test item analysis does not include a distractor tally, which could bring a much deeper view of the test, because access to these data was impossible. Finally, the author developed only a questionnaire to evaluate the test reliability from the students' perspective. If the attitudes and perceptions of the teachers had also been studied, the results would have been more comprehensive.
6.3. Suggestions for further study
Considering the importance of testing and the degree of unreliability found in the computer-based MCQs test, further research is needed to study its effects on language learning and assessment and to identify coping strategies that help students promote their learning of the four English skills while MCQs remain a widely used testing technique.
References
1. Alderson, J.C., Clapham, C. and Wall, D. (1995). Language Test Construction and Evaluation. Cambridge: Cambridge University Press.
2. Bachman, L.F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
3. Bachman, L.F. and Palmer, A.S. (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford: Oxford University Press.
4. Brown, H.D. (1995). Teaching by Principles: An Interactive Approach to Language Pedagogy. London: Longman.
5. Cotton, D., Falvey, D. and Kent, S. (2002). Market Leader Pre-intermediate. Longman.
6. Harrison, A. (1983). A Language Testing Handbook. London: Macmillan Press.
7. Heaton, J.B. (1988). Writing English Language Tests. Longman Group UK.
8. Heaton, J.B. (1990). Classroom Testing. New York: Longman.
9. Henning, G. (1987). A Guide to Language Testing: Development, Evaluation, Research. Cambridge: Newbury House Publishers.
10. Hien, T.T. (2005). The Pros and Cons of the Multiple-choice Testing Technique with Reference to Methodological Innovation as Perceived by Secondary English Language Teachers and Students. Unpublished M.A. Thesis, VNU.
11. Hughes, A. (1989). Testing for Language Teachers. Cambridge: Cambridge University Press.
12. Kunnan, A.J. and Milanovic, M. (2000). Fairness and Validation in Language Assessment: Selected Papers from the 19th Language Testing Research Colloquium, Orlando, Florida. Cambridge: Cambridge University Press.
13. Lynch, B.K. (2003). Language Assessment and Programme Evaluation. Edinburgh: Edinburgh University Press.
14. McCoubrie, P. (2004). Improving the fairness of multiple choice questions: a literature review. Medical Teacher, Vol. 26, No. 8, pp. 709-712.
15. McNamara, T. (2000). Language Testing. Oxford: Oxford University Press.
16. Milanovic, M. (1999). Issues in Computer-adaptive Testing of Reading Proficiency. Cambridge: Cambridge University Press.
17. Milanovic, M. and Saville, N. (1996). Performance Testing, Cognition and Assessment: Selected Papers from the 15th Language Testing Research Colloquium (LTRC), Cambridge and Arnhem. Cambridge: Cambridge University Press.
18. Spolsky, B. (1995). Measured Words. Oxford: Oxford University Press.
19. Trang, H.V. (2005). Evaluating the Reliability of the Achievement Writing Test for the First-year Students in the English Department, College of Foreign Languages, Vietnam National University, Hanoi and Some Suggestions for Changes. Unpublished M.A. Thesis, VNU.
20. Weir, C.J. (1990). Communicative Language Testing. Prentice Hall International (UK) Ltd.
21. Weir, C.J. (2005). Language Testing and Validation: An Evidence-based Approach. Basingstoke: Palgrave Macmillan.
APPENDICES
APPENDIX 1
Grammar, Reading, Vocabulary and Functional language check list
Part | Unit | Items | Task | Page no.
Grammar | 7 | Questions | Correcting/ Making questions/ Completing and matching questions | P66; 29-30
Grammar | 8 | Talking about future plans | Matching/ Completing sentences/ Making sentences | P74; 32-33
Grammar | 9 | Reported speech | Completing sentences/ Transforming direct into indirect speech/ Building sentences | P82; 37
Reading | 7 | Selling dreams - Beyond advertising - Fun marketing | Answering questions/ Ordering headings/ Matching/ True-False/ Classifying | P65; 126-127
Reading | 8 | Planning for tourism - Time management - Town planning | Matching/ Answering questions/ Numbering summary/ Completing sentences/ Answering questions | P72-73; 128-129
Reading | 9 | Managing across cultures - The company song - Coaching new employees | Answering questions/ Matching/ True-False/ Choosing correct alternatives | P81; 130-131
Vocabulary | 7 | Word partnerships | Filling missing vowels/ Matching/ Doing puzzle/ Completing a text | P63; 28-29
Vocabulary | 8 | Ways to plan | Matching/ Combining words/ Completing a text | P71; 32
Vocabulary | 9 | Verbs and prepositions | Matching/ Completing table/ Completing sentences/ Making sentences/ Correcting | P30; 36
Functional language | 7 | Telephoning: exchanging information (checking information, asking for information, finishing a conversation) | Listen and tick, answer questions, complete chart/ Role play | P67
Functional language | 8 | Meeting: interrupting and clarifying | Listen and order, identify language function/ Role play | P75
Functional language | 9 | Socializing and entertaining (making excuses, asking for and giving opinions, saying goodbye, thanking hosts) | Answer questions/ Listen and answer questions, complete extract, order/ Role play | P83
APPENDIX 2
Survey questionnaire
Dear K11 students, the English Department would like to hear your opinions on the final multiple-choice test of semester 4 (A6). Your careful, accurate and complete answers to the questions below will greatly help us improve the quality of the test for second-year students. Thank you very much for your cooperation!
Please tick the answers you choose and add further comments where necessary.
1. Please comment on the content of the computer-based multiple-choice test. Is the content of the test relevant to what you have learnt in class?
* Vocabulary section
a. Relevant b. Not relevant
…………………………………………………………………………………………
* Grammar section
a. Relevant b. Not relevant
…………………………………………………………………………………………
* Reading comprehension section
a. Relevant b. Not relevant
…………………………………………………………………………………………
* Functional language (situations) section
a. Relevant b. Not relevant
…………………………………………………………………………………………
2. What do you think about the number of 150 questions in the final multiple-choice test?
a. Too many b. Reasonable
…………………………………………………………………………………………
3. What do you think about the test's power to discriminate among students' levels?
a. High b. Low
…………………………………………………………………………………………
4. Please comment on the time allowed for the computer-based test. The time was:
a. More than enough b. Enough c. Not enough
…………………………………………………………………………………………
5. What do you think about the instructions for doing the computer-based multiple-choice test?
a. Clear b. Unclear
…………………………………………………………………………………………
6. How do you assess the attitude of the test supervisors?
a. Strict b. Not strict
…………………………………………………………………………………………
7. In your opinion, did the testing room affect your test performance?
a. It did b. It did not
…………………………………………………………………………………………
8. Did your computer break down while you were doing the test? If it did and you had to start again, did this negatively affect your test result?
a. It did b. It did not
…………………………………………………………………………………………
9. Were you under physical or psychological pressure while doing the test?
a. Yes b. No
…………………………………………………………………………………………
10. While doing the test, did you often choose answers arbitrarily?
a. Yes b. No
…………………………………………………………………………………………
11. What do you think of the computer-based multiple-choice test format?
a. Familiar b. Unfamiliar
…………………………………………………………………………………………
12. How do you rate your computer skills when doing the multiple-choice test?
a. Good b. Not good
…………………………………………………………………………………………
13. Compared with reading and doing a multiple-choice test on paper, how do you find reading and doing it on the computer?
a. Different b. Not different
…………………………………………………………………………………………
14. In your opinion, how accurately does the multiple-choice test score reflect your progress in class?
a. Accurately b. Not accurately
…………………………………………………………………………………………
APPENDIX 3
STUDENTS’ TEST SCORES
No. | Examinee | Score (x) | Mean | (x - mean)²
1
Lê Minh Đức
6.13
6.59
0.21
50
Chu Thị Phơng
7.6
6.59
1.01
2
Tạ Tuấn Anh
5.73
6.59
0.75
51
Đào Duy Phong
5.33
6.59
1.6
3
Vũ Trần Chính
7.4
6.59
0.65
52
Ngô Văn Quân
5
6.59
2.54
4
Ngô TháI Dũng
7.07
6.59
0.23
53
Ngô Thị Thìn
5.67
6.59
0.85
5
Nguyễn Thị Hồng Hạnh
5.07
6.59
2.32
54
Trần Thị Thảo
5.4
6.59
1.42
6
Lơng Hồng Hạnh
4.27
6.59
5.4
55
Lại Văn Thờng
6.67
6.59
0.01
7
Phạm Văn Kỳ
4.93
6.59
2.77
56
Nông Phơng Thuỳ
6.87
6.59
0.08
8
Đỗ Ngọc Luyện
6.27
6.59
0.1
57
Phạm Ngọc Tú
6.47
6.59
0.02
9
Đặng Xuân Nam
6.47
6.59
0.02
58
Nguyễn Thị Trang
8.73
6.59
4.57
10
Hoàng Quốc Thái
6.2
6.59
0.15
59
Nguyễn Thị Thu Trang
4.13
6.59
6.07
11
Trần Trung Thành
5.67
6.59
0.85
60
Lê Thị Yến
6.07
6.59
0.27
12
Phan Chiễn Thắng
6.07
6.59
0.27
61
Phạm Thị Diệp
6.47
6.59
0.02
13
Lê Bá Thực
5.67
6.59
0.85
62
Tạ Thị Doan
6.33
6.59
0.07
14
Nguyễn Quốc Toàn
6.13
6.59
0.21
63
Đặng Văn Dũng
5.47
6.59
1.26
15
Ngô Mạnh Tuấn
5.27
6.59
1.75
64
Lê Thuỳ Dung
6.47
6.59
0.02
16
Nguyễn Anh Tuấn
6.13
6.59
0.21
65
Phạm Thị Duyên
7.2
6.59
0.37
17
Bùi Trí Tuệ
5.13
6.59
2.14
66
Đoàn Ngọc Hải
5.33
6.59
1.6
18
Nguyễn Kim Anh
6.67
6.59
0.01
67
Nguyễn Thi Hạnh
7.33
6.59
0.54
19
Bùi Tuấn Anh
5.93
6.59
0.44
68
Nguyễn Văn Hiếu
3.4
6.59
10.2
20
Mai Trung Hiếu
5.47
6.59
1.26
69
Đinh Văn Hoàng
7.53
6.59
0.88
21
Nguyễn Xuân Linh
6.13
6.59
0.21
70
Nguyễn Thị Kim Liên
8.87
6.59
5.18
22
Đặng Ngọc Long
7.6
6.59
1.01
71
Thân Thị Ngọc Mai
5.87
6.59
0.52
23
Đỗ Tiến Mạnh
6.13
6.59
0.21
72
Trần Hoài Nam
4.93
6.59
2.77
24
Hoàng Quốc Minh
7.67
6.59
1.16
73
Bùi Thị Ngát
6.87
6.59
0.08
25
Phạm Văn Phúc
6.53
6.59
0
74
Nguyễn Quỳnh Nga
7.6
6.59
1.01
26
Viên Lê Quang
8.4
6.59
3.26
75
Bùi Thuý Nga
7.33
6.59
0.54
27
Vũ Văn Thái
6.47
6.59
0.02
76
Đỗ Thị Bích Ngọc
6.07
6.59
0.27
28
Trần Văn Thuận
5.8
6.59
0.63
77
Nguyễn Jen Ny
6.33
6.59
0.07
29
Lê Mạnh Tú
7.33
6.59
0.54
78
Nguyễn Xuân Quỳnh
6.07
6.59
0.27
30
Đồng Sỹ Toản
5.87
6.59
0.52
79
Vũ Thị Tâm
6.87
6.59
0.08
31
Phạm Văn Tuấn
6.13
6.59
0.21
80
Trần Thị Thơng
8
6.59
1.98
32
Nguyễn Thị Tú Uyên
6.53
6.59
0
81
Nguyễn Kim Thu
5
6.59
2.54
33
Ngô Bá Văn
5.47
6.59
1.26
82
Lê Minh Thuỷ
5.73
6.59
0.75
34
Bùi Hải Vinh
4.8
6.59
3.22
83
Nguyễn Phơng Thuý
7
6.59
0.17
35
Nguyễn Trọng Vinh
8
6.59
1.98
84
Nguyễn Thị Tiệp
7.93
6.59
1.79
36
Nguyễn Văn Việt
5.87
6.59
0.52
85
Nguyễn Thị Thu Trang
8.13
6.59
2.36
37
Đặng Thị Quỳnh Anh
7.27
6.59
0.46
86
Đậu Thị Huyền Trang
4.27
6.59
5.4
38
Lê Quang Bình
4.47
6.59
4.51
87
Phan Thị Quỳnh Trang
6.67
6.59
0.01
39
Chu Văn Chuyển
3.93
6.59
7.09
88
Nguyễn Thị Hải Yến
6.87
6.59
0.08
40
Nguyễn Ngọc Diệp
7.53
6.59
0.88
89
Nguyễn Thị Đam
7.47
6.59
0.77
41
Đinh Thu Hằng
6
6.59
0.35
90
Đào Quỳnh Anh
8.67
6.59
4.31
42
Nguyễn Ngân Hà
6.73
6.59
0.02
91
Nguyễn Thị Dung
5.67
6.59
0.85
43
Cao Văn Hải
6.13
6.59
0.21
92
Hoàng Thị Thu Giang
7.6
6.59
1.01
44
Doãn Thị Hạnh
5.4
6.59
1.42
93
Bùi Trờng Giang
6.33
6.59
0.07
45
Bùi Thị Hiền
5.47
6.59
1.26
94
Phan Ngọc Hơng
8
6.59
1.98
46
Phan Thị Mỹ Hơng
8.53
6.59
3.75
95
Bùi Thị HảI Hà
5.93
6.59
0.44
47
Đỗ Thị Linh
6.07
6.59
0.27
96
Tăng Thị Kim Hạnh
5.53
6.59
1.13
48
Vũ Thị Mai
6.6
6.59
0
97
Trần Thị Hoài
7.4
6.59
0.65
49
Lý Thị Phơng Ngân
5.8
6.59
0.63
98
Nguyễn Phơng Hoa
6.87
6.59
0.08
99
Nguyễn Thị Thanh Huyền
7.07
6.59
0.23
147
Từ Thị Hà Vân
6.87
6.59
0.08
100
Lu Thuỳ Linh
6.67
6.59
0.01
148
Dơng Thị HảI Yến
7.27
6.59
0.46
101
Phạm Thanh Long
6.53
6.59
0
149
Bùi Minh Đức
5.33
6.59
1.6
102
Nguyễn Thị Quỳnh Mai
9.33
6.59
7.49
150
Nguyễn Quỳnh Anh
7.53
6.59
0.88
103
Nguyễn Thị Nga
6.93
6.59
0.11
151
Lê Thị Quỳnh Anh
8.27
6.59
2.81
104
Vũ Thị Ngọc
6.67
6.59
0.01
152
Trơng Thuỳ Chi
7.6
6.59
1.01
105
Nguyễn Thảo Nguyên
6.93
6.59
0.11
153
Hà Kim Dung
8.27
6.59
2.81
106
Nguyễn Thị Minh Nguyệt
8.73
6.59
4.57
154
Phí Thị Hằng
6.33
6.59
0.07
107
Thái Ngọc Nhung
6.07
6.59
0.27
155
Đào Minh Hà
8.47
6.59
3.52
108
Nguyễn Thị Lan Phơng
6.67
6.59
0.01
156
Phạm thị Thu Hà
8.07
6.59
2.18
109
Nguyễn Hồ Quanhg
6.67
6.59
0.01
157
Lê Thị Hồng
5.2
6.59
1.94
110
Đỗ Thị Nh Quỳnh
6.33
6.59
0.07
158
Vũ Thị Tuyết Lan
5.27
6.59
1.75
111
Nguyễn Thị Thảo
7.4
6.59
0.65
159
Nguyễn Thị Liên
7.53
6.59
0.88
112
Phạm Thị Phơng Thuỳ
7.07
6.59
0.23
160
Nguyễn Công Nhớ
2.93
6.59
13.4
113
Lê Thị Thu Thuỷ
7.93
6.59
1.79
161
Bùi Thị Phơng
7.27
6.59
0.46
114
Lê Thị Thuỷ
8.4
6.59
3.26
162
Bùi Mai Phơng
8.53
6.59
3.75
115
Phạm Thị Ngọc Trang
7.67
6.59
1.16
163
Nguyễn Thanh Phợng
7.6
6.59
1.01
116
Nguyễn Thị Thu Trang
6.13
6.59
0.21
164
Bùi Thị Thu Quỳnh
7.87
6.59
1.63
117
Lê Quỳnh Trang
8.67
6.59
4.31
165
Lu Thị Trang Thảo
0.13
6.59
41.8
118
Ngô Quốc Tuân
5.73
6.59
0.75
166
Nguyễn Thị Thắm
7.33
6.59
0.54
119
Vũ Thị Ngọc Anh
8.33
6.59
3.02
167
Trần Thị Thoa
8.2
6.59
2.58
120
Nguyễn Thành Công
7.07
6.59
0.23
168
Hoàng Thị Thu Thuỷ
7.33
6.59
0.54
121
Bùi Khắc Cờng
2.47
6.59
17
169
Nguyễn Thị Thu Trang
3.93
6.59
7.09
122
Phạm Văn Dũng
3.6
6.59
8.96
170
Hoàng Thị Huyền Trang
7.4
6.59
0.65
123
Đoàn Thị Kim Dung
6.53
6.59
0
171
Đặng Thị Huyền Trang
8.33
6.59
3.02
124
Nguyễn T Duy
6.13
6.59
0.21
172
Dơng Huyền Trang
5.47
6.59
1.26
125
Hà Lan Hơng
5.27
6.59
1.75
173
Phạm Thuý Vân
7.13
6.59
0.29
126
Nguyễn Thị Thu Hiền
6.53
6.59
0
174
Vũ Thanh Xuân
9
6.59
5.79
127
Hoàng Thi Hiền
5.47
6.59
1.26
175
Tạ Quốc Đạt
7
6.59
0.17
128
Đặng Thanh Huyền
5.4
6.59
1.42
176
Vũ Trọng Đam
7.13
6.59
0.29
129
Nguyễn Ngọc Linh
6.87
6.59
0.08
177
Trần Vũ Độ
8.07
6.59
2.18
130
Vũ Kiều Loan
7.73
6.59
1.29
178
Lã Mạnh Cờng
6.53
6.59
0
131
Phạm Thị Quỳnh Mai
8.67
6.59
4.31
179
Trịnh Văn Cờng
5.67
6.59
0.85
132
Vũ Thị Minh
6.4
6.59
0.04
180
Lại Văn Dũng
4.53
6.59
4.26
133
Nguyễn Thị Ngân
7.33
6.59
0.54
181
Trần Mỹ Hằng
5.73
6.59
0.75
134
Đỗ Thị Ngân
6.47
6.59
0.02
182
Nguyễn Anh Hào
8.07
6.59
2.18
135
Nguyễn Hoàng Nga
5.4
6.59
1.42
183
Mai Thanh Hải
6.93
6.59
0.11
136
Nguyễn Hồng Nhung
7.27
6.59
0.46
184
Trần Diệu Hồng
6.67
6.59
0.01
137
Phạm Thị Hồng Nhung
4.53
6.59
4.26
185
Trần Thị Bích Hậu
6.4
6.59
0.04
138
Nguyễn Trần Phơng
7.47
6.59
0.77
186
Vũ Thị Hoàng Lan
4.33
6.59
5.12
139
Bùi Mạnh Quân
7.47
6.59
0.77
187
Đặng Vũ Lập
6
6.59
0.35
140
Mạc Thị Ngọc Quỳnh
8.67
6.59
4.31
188
Trần Thành Long
4.07
6.59
6.37
141
Nguyễn Duy Sơn
4.07
6.59
6.37
189
Lu Thị Kiều Oanh
5.67
6.59
0.85
142
Nguyễn Quang Thiện
6.07
6.59
0.27
190
Nguyễn Văn Quảng
5.6
6.59
0.99
143
Phạm Thị Kim Thu
6.73
6.59
0.02
191
Nguyễn Nguyệt Quỳnh
7.6
6.59
1.01
144
Đào Thi Thuý
7.27
6.59
0.46
192
Bùi Việt Thái
6.47
6.59
0.02
145
Nguyễn Thị Thu Trang
4.47
6.59
4.51
193
Hoàng Phơng Thảo
7.13
6.59
0.29
146
TrơngThị Hải Vân
5.6
6.59
0.99
194
Đỗ Duy Thắng
6.67
6.59
0.01
195
Ngô Văn Tháng
5.87
6.59
0.52
244
Cao Thu Nga
8.87
6.59
5.18
196
Lê Quang Thọ
7.2
6.59
0.37
245
Ngô Hằng Nga
6.4
6.59
0.04
197
Nguyễn Xuân Toàn
6.67
6.59
0.01
246
Nguyễn Thị Quỳnh
7.13
6.59
0.29
198
Cao Anh Trung
6.33
6.59
0.07
247
Nguyễn Thị Minh Tâm
7.8
6.59
1.46
199
Đỗ Ngọc Tuyền
7.07
6.59
0.23
248
Hoàng Thị Thảo
5.73
6.59
0.75
200
Đinh Đức Anh
8.67
6.59
4.31
249
Đỗ Đức Thiện
8.53
6.59
3.75
201
Trần Tuấn Anh
8
6.59
1.98
250
Đoàn Thu Thuỷ
5.2
6.59
1.94
202
Phạm Ngọc Báu
6.93
6.59
0.11
251
Hoàng Thanh Tùng
6.67
6.59
0.01
203
Vũ Thị Dung
7.8
6.59
1.46
252
Lê Quang Đạt
7.53
6.59
0.88
204
Trần Văn Hà
5.27
6.59
1.75
253
Nguyễn Duy Điền
6.4
6.59
0.04
205
Đào Thiị Thu Hà
6.73
6.59
0.02
254
Nguyễn Văn Chính
9.07
6.59
6.13
206
Phạm Trung Hiễu
7.8
6.59
1.46
255
Dơng Thu Hơng
8.93
6.59
5.46
207
Nguyễn Quang Huy
7.93
6.59
1.79
256
Vũ Thị Thuý Hà
7.87
6.59
1.63
208
Vũ Văn Huyên
7.27
6.59
0.46
257
Đỗ Thu Hà
7.93
6.59
1.79
209
Vũ Đức Lợng
6.93
6.59
0.11
258
Vũ thi Thu Hà
7.47
6.59
0.77
210
Nguyễn Văn Liễu
7.73
6.59
1.29
259
Phạm Thị Thu Hoài
7.2
6.59
0.37
211
Đinh Tiến Lực
4.8
6.59
3.22
260
Vũ Thị Hoạt
8.4
6.59
3.26
212
Nguyễn Đức Minh
6.47
6.59
0.02
261
Nguyễn Thị My Huyền
6.13
6.59
0.21
213
Tống Quang Nam
3.2
6.59
11.5
262
Nguyễn Minh Huyền
8.27
6.59
2.81
214
Nguyễn Bích Ngọc
6.4
6.59
0.04
263
Trần Mỹ Linh
6.8
6.59
0.04
215
Bùi Minh Ngọc
8.73
6.59
4.57
264
Bùi Huy Long
6.73
6.59
0.02
216
Nguyễn Thanh Tâm
7.33
6.59
0.54
265
Hoàng Long
8.67
6.59
4.31
217
Trần Mạnh Thắng
8.33
6.59
3.02
266
Ngô Ngọc Mai
8.67
6.59
4.31
218
Nguyễn Thị Thanh Thuỷ
8.2
6.59
2.58
267
Nguyễn Bích Phợng
7.53
6.59
0.88
219
Nguyễn Thị Thu Thuý
6.87
6.59
0.08
268
Nguyễn Hoàng Sơn
3.4
6.59
10.2
220
Phạm Văn D Tùng
6.2
6.59
0.15
269
Nguyễn Hoàng Sơn
5.33
6.59
1.6
221
Phạm Thị Thu Trang
6.33
6.59
0.07
270
Nguyễn Thị Tám
3
6.59
12.9
222
Trần Thị Thu Trang
6.33
6.59
0.07
271
Nguyễn Thế Tình
2.53
6.59
16.5
223
Đỗ Quốc Trinh
6.4
6.59
0.04
272
Nguyễn Mạnh Tởng
3.73
6.59
8.2
224
Trần Nguyệt ánh
8
6.59
1.98
273
Vuơng Thị Thu Trang
5.07
6.59
2.32
225
Nguyễn Hồng Ân
7.33
6.59
0.54
274
Bùi Quang Trung
4.8
6.59
3.22
226
Nguyễn Phơng Anh
8.53
6.59
3.75
275
Đỗ Đức Việt
4.67
6.59
3.7
227
Chu Việt Cờng
6.53
6.59
0
276
Trần Thu Anh
3.73
6.59
8.2
228
Nguyễn Thị Minh Châu
7.13
6.59
0.29
277
Hoàng Thọ Công
5.27
6.59
1.75
229
Nguyễn Phơng Dung
8.13
6.59
2.36
278
Cao Đức Cờng
3.53
6.59
9.38
230
Nguyễn Thuỳ Dung
6.8
6.59
0.04
279
Nguyễn Mạnh Cờng
4.27
6.59
5.4
231
Nguyễn Thị Thuý Hằng
6.87
6.59
0.08
280
Lơng Minh Châu
6.73
6.59
0.02
232
Nguyễn Thu Hơng
7.67
6.59
1.16
281
Vũ Tiến Dũng
2.87
6.59
13.9
233
Nguyễn Thị Hồng Hà
7
6.59
0.17
282
Phạm Ngọc Duy
4.33
6.59
5.12
234
Lê Thị Ngọc Hà
7.93
6.59
1.79
283
Vũ Thị Hơng Giang
6.6
6.59
0
235
Nguyễn Diệu Hà
7.2
6.59
0.37
284
Vũ Trờng Giang
6.13
6.59
0.21
236
Nguyễn Thanh Hải
9.4
6.59
7.88
285
Nguyễn Thị Hơng
8.73
6.59
4.57
237
Trần Lê Huy
3.07
6.59
12.4
286
Hoàng Thị Mai Hơng
2.67
6.59
15.4
238
Nguyễn Diệu Linh
8.8
6.59
4.87
287
Đinh Thị Thu Hà
8.6
6.59
4.03
239
Lê Thị Thanh Loan
8.07
6.59
2.18
288
Trịnh Thu Hồng
7.8
6.59
1.46
240
Nguyễn Hải Long
4.87
6.59
2.97
289
Đỗ Mạnh Hùng
7.4
6.59
0.65
241
Nguyễn Thị Hà Ly
6.73
6.59
0.02
290
Bùi Sĩ Hiếu
7.47
6.59
0.77
242
Nguyễn Thế Mẫn
4.2
6.59
5.73
291
Bùi Huy Hoàng
4.53
6.59
4.26
243
Phạm thị Thuý Ngà
7.13
6.59
0.29
292
Lê Thị Hoa
4.8
6.59
3.22
293
Triệu Khánh Hoà
8.87
6.59
5.18
342
Đỗ Huơng Thuỷ
8.13
6.59
2.36
294
Vũ Đình Khoa
5.87
6.59
0.52
343
Nguyễn Anh Tú
4.8
6.59
3.22
295
Phạm Chí Lợng
7
6.59
0.17
344
Ngô Văn Toàn
9.27
6.59
7.16
296
Nguyễn Thị Thuỳ Linh
6
6.59
0.35
345
Trần Huơng Trang
7.2
6.59
0.37
297
Trịnh Thị Minh Loan
6.47
6.59
0.02
346
Phó Đức Trung
4.4
6.59
4.81
298
Vũ Đức Long
8.07
6.59
2.18
347
Phan Thị Vân
8.2
6.59
2.58
299
Nguyễn Tuyết Mai
8.27
6.59
2.81
348
Phạm Thanh Vân
6.93
6.59
0.11
300
Hoàng Thị Kim Oanh
8.13
6.59
2.36
349
Hoàng Việt
9.47
6.59
8.27
301
Nguyễn Phơng Thảo
7.13
6.59
0.29
302
Giang Thị Thảo
8.53
6.59
3.75
∑x
2301
303
Nguyễn Văn Thao
6.33
6.59
0.07
∑(x-x)2
740
304
Đinh Phơng Ngọc Anh
3.47
6.59
9.76
Mean
6.59
305
Nguyễn thị Vân Anh
6.67
6.59
0.01
306
Phạm Văn Bình
4.4
6.59
4.81
307
Trần Thị thuỳ Dơng
5.8
6.59
0.63
308
Lại Việt Dũng
7.67
6.59
1.16
309
Phùng Thị Ngọc Dung
7.4
6.59
0.65
310
Nguyễn Hơng Giang
6.93
6.59
0.11
311
Ngô Đức Hải
7.87
6.59
1.63
312
Trần Huy Hng
6.67
6.59
0.01
313
Khuất Thị Thu Hoài
6.07
6.59
0.27
314
Nguyễn Hoài Linh
6.47
6.59
0.02
315
Lê Thị Diệu Linh
8.2
6.59
2.58
316
Phạm Thị Loan
5.13
6.59
2.14
317
Trần Thị Ngọc Mai
6.47
6.59
0.02
318
Nguyễn Thành Nh
7.53
6.59
0.88
319
Nguyễn Thị Thảo
6.67
6.59
0.01
320
Lê Minh Thanh
5.6
6.59
0.99
321
Nguyễn Thu Trang
7.27
6.59
0.46
322
Nguyễn Viết Tuấn
8.53
6.59
3.75
323
Vũ Viết Tuấn
4.2
6.59
5.73
324
Nguyễn Anh Văn
8.87
6.59
5.18
325
Nguyễn Trần Việt
7.73
6.59
1.29
326
Ngô Thị Vân Anh
7.6
6.59
1.01
327
Phùng Xuân Chinh
6.33
6.59
0.07
328
Nguyễn Thế Dũng
6.6
6.59
0
329
Nguyễn Tuấn Dũng
3.13
6.59
12
330
Nhâm Thị Giang
8.27
6.59
2.81
331
Trần Văn Hải
7.13
6.59
0.29
332
Nguyễn Thị Hiên
7
6.59
0.17
333
Ngô Duy Huynh
7
6.59
0.17
334
Nguyễn Thanh Huyền
8.07
6.59
2.18
335
Nguyễn Thanh Huyền
6.8
6.59
0.04
336
Nguyễn Ngọc Linh
7.6
6.59
1.01
337
Bùi Ngọc Long
5.47
6.59
1.26
338
Cao Thị Nhi
8
6.59
1.98
339
Tô Minh Pha
9.13
6.59
6.43
340
Nguyễn Đức Phong
6.67
6.59
0.01
341
Trần văn Sang
5.2
6.59
1.94
APPENDIX 4
ITEM ANALYSIS OF THE FINAL ACHIEVEMENT COMPUTER-BASED MCQS