
Purpose of the Research and Its Relevance to Paragon Tests

 

The Canadian Academic English Language (CAEL) Assessment, administered by Paragon Testing Enterprises, is a standardized test designed to determine the English language proficiency of non-native English-speaking students (NNESSs) who would like to study at an English-speaking college or university, or to apply for membership in various professional associations (Paragon, 2017). The CAEL Assessment tests NNESSs' listening, reading, and writing skills on topics selected from first-year university courses in the arts, sociology, anthropology, business, engineering, sports, law, and medicine. Possible CAEL topics include criminal behaviour, global warming, urban development, cultural diversity, weather systems, team management, competition, and organizational behaviour. CAEL is an integrated-skills academic English test: the reading, listening, and writing components all address the same topic. Further, most testing items are subjective questions, e.g., constructed-response questions for reading and listening and an essay question for writing (Paragon, 2017). CAEL is an alternative to the American TOEFL test and the British IELTS Academic Test, but it is still fairly new and not as popular as TOEFL and IELTS in terms of either the number of NNESSs taking the test or the number of English-speaking institutions accepting it as evidence of English proficiency. Therefore, it is important to examine the scoring reliability of CAEL's constructed-response and essay questions and to predict its popularity in the near future.


 

Specifically, using generalizability (G-) theory and geographic information systems (GIS) as methodological frameworks, together with assessment data previously collected by Paragon and data still to be collected from Paragon, this proposed one-year study has two purposes: to examine the scoring variability and reliability of the constructed-response and essay questions in the reading, listening, and writing components of the CAEL Assessment, and to predict CAEL's future popularity.

 

 

Rationale and Significance of the Project

 

The
scoring of constructed-response and essay questions has long been considered a
problematic area for educational assessment professionals (Bacha, 2001;
Barkaoui, 2011; Çetin, 2011; Huang, 2011, 2012; Popham, 2003; Whipple, 2016).
Due to NNESSs’ different linguistic and cultural backgrounds, the scoring of
their English writing then becomes even more problematic (Elorbany & Huang,
2012; Huang, 2008, 2012; Huang & Foote, 2010; Sakyi, 2000). On the one
hand, many factors affect NNESSs’ writing, including their English proficiency,
mother tongue, home culture, and style of written communication (Hinkel, 2003;
Huang, 2012; Yang, 2001). In assessing their English writing, the raters may
differentially consider these factors. On the other hand, empirical studies
have found differences in rater behaviour when assessing NNESSs' English writing
(Bachman, 2000; Huang, 2008, 2012). A number of studies indicate that many
factors affect the assessment of NNESSs’ writing, for example, scoring method,
examinee first language, and rater background, previous experience, and amount
of prior training (Badjadi, 2013; Barkaoui, 2011; Elorbany & Huang, 2012;
Huang & Han, 2013; Weigle, Boldt, & Valsecchi, 2003). All these lead to
questions about the accuracy, construct validity, and ultimately the fairness
of assessing the written work produced by the NNESSs.

 

 

 

Fairness is a priority in the field of educational assessment. For any given large-scale test, the evaluation of fairness is recommended as a standard procedure by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME) (1999). Test fairness has also become an important political and legal issue (Garcia & Pearson, 1994). According to standards and guidelines for fair test practices, educational organizations, institutions, and individual professionals should make assessments as fair as possible for test takers of different races, genders, and ethnic backgrounds (AERA, APA, & NCME, 1999; Joint Advisory Committee, 1993).

 

In the past two decades, fairness, reliability, and validity issues in NNESSs' writing assessments have attracted growing interest because of significant growth in the number of NNESSs being educated in English-speaking colleges and universities (Kunnan, 2000; Huang, 2011, 2012). Because CAEL is fairly new, serves as an alternative to TOEFL and the IELTS Academic Test, and consists mostly of subjective questions, it is especially important to examine the scoring variability and reliability of its subjective items.

 

In addition, predicting CAEL's future popularity has important educational and marketing implications. Unlike TOEFL and IELTS, CAEL assesses NNESSs' authentic academic English skills, and it should therefore be a strong candidate for most English-speaking colleges and universities making admission decisions for NNESSs. Collecting empirical evidence that CAEL is a high-quality assessment, and arguably superior to TOEFL and IELTS, is the first step; making it increasingly accessible and accepted in the global market is the next. The proposed study is therefore important and significant for both Paragon and its consumers, including NNESSs and English-speaking colleges and universities worldwide.

 

 

Research Questions

 

The following seven research questions will guide the study:

a) Which sources of score variation contribute relatively more to the variability of scores assigned by raters across the reading and listening constructed-response questions?

b) What is the reliability of the scores assigned by raters across the reading and listening constructed-response questions?

c) What is the impact of the NNESS's first language (e.g., Asian languages, European languages), scoring method (holistic versus analytic), and rater characteristics (gender and writing assessment experience) on the assessment variability of the essay question in the writing component?

d) What is the impact of the NNESS's first language, scoring method, and rater characteristics on the assessment reliability of the essay question in the writing component?

e) What is the country distribution of NNESSs taking the CAEL Assessment and of colleges and universities accepting CAEL over the past three years (2015, 2016, and 2017)?

f) How have the numbers of NNESSs taking the CAEL Assessment and of colleges and universities accepting CAEL changed over the past three years?

g) What are the predicted numbers of NNESSs taking the CAEL Assessment and of colleges and universities accepting CAEL in the next five to ten years?

Methodology

 

In order to answer the first four research questions, G-theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972) will be used as the methodology for data analysis. G-theory, which is increasingly being used by assessment professionals in a variety of assessment contexts, especially those involving performance assessments (Linn & Burton, 1994; Shavelson, Baxter, & Gao, 1993), is a more powerful approach than classical test theory (CTT) for detecting rater variation. In CTT, an examinee's observed score consists of a "true score" and an "error," and reliability is defined as the ratio of true-score variance to observed-score variance (Crocker & Algina, 1986). However, reliability estimates based on CTT account for only a single source of variance within a given analysis (Brown, 1991). G-theory extends the CTT framework to take into account the multiple sources of variability that can affect test scores. It is also a statistical method that can identify the sources of variance and error and estimate the impact of these variance components on scoring accuracy, allowing the investigator to consider numerous applications of an instrument (Shavelson & Webb, 1991). Therefore, it provides a comprehensive conceptual framework and methodology for analyzing more than one measurement facet (factor) simultaneously in investigations of assessment error and score dependability (Brennan, 2001; Gao & Brennan, 2001).
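To make the contrast with CTT concrete, the simplest G-theory design is a fully crossed person-by-rater (p x r) design, in which every rater scores every examinee. The following sketch, written for illustration only (it is not the GENOVA procedure used in this study), estimates the variance components from the ANOVA expected mean squares and forms the relative G coefficient:

```python
import numpy as np

def g_study_p_by_r(scores):
    """Estimate variance components for a fully crossed p x r design.

    scores: 2-D array, rows = persons (examinees), columns = raters.
    Uses the standard expected-mean-square equations for a two-way
    random-effects ANOVA without replication.
    """
    n_p, n_r = scores.shape
    grand = scores.mean()
    p_means = scores.mean(axis=1)
    r_means = scores.mean(axis=0)

    ms_p = n_r * np.sum((p_means - grand) ** 2) / (n_p - 1)
    ms_r = n_p * np.sum((r_means - grand) ** 2) / (n_r - 1)
    resid = scores - p_means[:, None] - r_means[None, :] + grand
    ms_pr = np.sum(resid ** 2) / ((n_p - 1) * (n_r - 1))

    var_pr = ms_pr                        # p x r interaction confounded with error
    var_p = max((ms_p - ms_pr) / n_r, 0)  # persons (universe-score variance)
    var_r = max((ms_r - ms_pr) / n_p, 0)  # rater severity differences
    return var_p, var_r, var_pr

def g_coefficient(var_p, var_pr, n_r):
    """Relative G coefficient for mean scores over n_r raters."""
    return var_p / (var_p + var_pr / n_r)
```

Unlike a CTT reliability estimate, this decomposition shows separately how much of the score variation is due to the persons, to rater severity, and to the person-by-rater interaction.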

 

In order to answer the last three research questions, GIS (Tomlinson, 1968) will be used as the methodology for data analysis. GIS is an appropriate methodological choice because these three research questions focus on the country distribution of, the changes in the numbers of, and the predicted numbers of NNESSs taking the CAEL Assessment and colleges and universities accepting CAEL across multiple years. GIS is a mapping system for use in planning; one of the first uses of the term was by Tomlinson (1968). It integrates hardware, software, and data for capturing, managing, analyzing, and displaying all forms of geographically referenced information (Tomlinson, 1968). Researchers have used GIS methods for various topics such as planning (Berry, Higgs, Fry, & Langford, 2011; Sanchez, 1999), environmental applications (Walker, Riegl, & Dodge, 2008), and geological applications (Lee, 2005).

 

Data for this proposed study either already exist or will be collected by the researcher(s) with the assistance of Paragon's data department. Specifically, to answer the first four research questions, the following G-theory analyses will be performed:

a) a (p : l) x r/r' (person within first language-by-rater/rating) G-study and multiple p x r/r' (person-by-rater/rating) G-studies for the reading and listening constructed-response questions, performed for different first languages;

b) multiple p x r/r' decision (D-) studies for the reading and listening constructed-response questions, performed for different first languages;

c) a (p : l) x m x (r : g) (person within first language-by-scoring method-by-rater within gender) G-study and multiple p x m x r (person-by-scoring method-by-rater) G-studies for the essay question, performed for different first languages and different genders, respectively;

d) a (p : l) x m x (r : e) (person within first language-by-scoring method-by-rater within assessment experience) G-study and multiple p x m x r G-studies for the essay question, performed for different first languages and different assessment experiences, respectively;

e) multiple p x m x r D-studies for the essay question, performed for different first languages, different genders, and different assessment experiences, respectively.
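The D-studies mentioned above use the estimated variance components to project how score dependability would change under alternative measurement designs, most commonly by varying the number of raters. A minimal sketch of that projection, using invented variance components rather than any actual Paragon estimates:

```python
# Illustrative variance components from a hypothetical p x r G-study
# (invented numbers for demonstration, not Paragon data).
var_p = 0.60     # persons (universe-score variance)
var_pr_e = 0.30  # person-by-rater interaction confounded with error

# D-study: relative error variance and G coefficient for 1 to 4 raters.
for n_r in range(1, 5):
    rel_error = var_pr_e / n_r       # averaging over raters shrinks error
    g_coef = var_p / (var_p + rel_error)
    print(f"{n_r} rater(s): G = {g_coef:.3f}")
```

A table like this lets the test developer decide, for example, whether adding a second rater per essay buys enough dependability to justify the scoring cost.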

 

In order to answer the last three research questions, the following GIS analyses will be performed:

a) GIS will be applied to visually depict data from the past three years (2015, 2016, and 2017); specifically, several maps will be created to visualize the country distribution of NNESSs taking the CAEL Assessment and of colleges and universities accepting it;

b) a GIS-based analysis and mapping will be conducted to visualize the changes in the numbers of NNESSs taking the CAEL Assessment and of colleges and universities accepting it over the past three years;

c) to predict the numbers of NNESSs taking the CAEL Assessment and of colleges and universities accepting it in the next five to ten years, a GIS-based prediction model, the Testing Expanding Model (TEM), will be constructed and applied using multiple regression analysis; additional analysis will address policy implications for creating strategies to increase the number of countries accepting the CAEL Assessment;

d) furthermore, a Web-based interactive GIS map application will be developed to visualize the historical data and the predicted future popularity of the CAEL Assessment across the world.
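The regression component of the prediction step can be illustrated in its simplest form: fitting a trend to yearly counts and extrapolating it forward. The sketch below uses invented counts (the actual figures would come from Paragon's data department), and the full TEM would add further predictors per country rather than a single time trend:

```python
import numpy as np

# Hypothetical yearly counts of CAEL test takers (illustrative only).
years = np.array([2015, 2016, 2017])
test_takers = np.array([12000, 15000, 19000])

# Ordinary least squares fit of count on year (degree-1 polynomial).
slope, intercept = np.polyfit(years, test_takers, 1)

def predict(year):
    """Extrapolate the fitted linear trend to a future year."""
    return slope * year + intercept

# Project five years beyond the observed data.
forecast = {y: round(predict(y)) for y in range(2018, 2023)}
```

With only three observed years the trend estimate is fragile, which is why the proposal pairs the regression with GIS mapping: country-level patterns can reveal where growth is concentrated rather than assuming one global trend.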

 

The computer program GENOVA (Crick & Brennan, 1983) will be employed for the G-study analyses. GENOVA estimates the variance components, as well as the interaction effects and standard errors, for balanced G-theory designs. Further, Microsoft Excel and ArcMap will be used for the GIS data analysis. Excel is a widely used spreadsheet program that allows the user to store, manipulate, analyze, and visualize data. ArcMap is the main component of Esri's ArcGIS suite of geospatial processing programs and is used primarily to view, edit, create, and analyze geospatial data; it allows the user to explore a data set, symbolize features accordingly, and create maps.

 
