Use & Interpretation of SFQ data

The use of student ratings for summative evaluation of teaching is a common practice in local and overseas universities (Kwan, 1999). The situation at HKUST is no exception. Academic departments and the Language Center regularly use SFQ results to help them evaluate the teaching performance of their faculty and instructors.

CTLQ has passed a paper entitled "Guidelines for the use of the Summative Student Evaluations of Teaching", which outlines five principles on the use of student feedback for teaching evaluation at HKUST. The five principles are quoted below.

  1. Student feedback and other forms of feedback

    Students have a unique role in the evaluation of teaching performance: only students can report on the effectiveness of their own teachers in enabling them to achieve the required learning outcomes of the course. Evidence from students must always be a core feature of any evaluation of teaching.

    However, there are key aspects of teaching performance that cannot be evaluated only by students, in particular: the quality of course content and the relation of course content to program requirements; the academic standard of the material and related assessment; contributions to the collective effort to improve program quality; and professional activities related to teaching and learning.

    Student evaluation of teaching should always be complemented by other sources of evidence of teaching performance.

  2. Questionnaire-based scores in evaluation of performance

    Well designed and well implemented questionnaires grounded in research about the characteristics of good teaching have been shown to provide reliable and valid evidence about the effectiveness of teaching from the perspective of students.

    However, research has also shown that student evaluation through questionnaires is liable to bias and to manipulation by teachers gearing their teaching to gaining high scores on evaluations.

    Evidence from student evaluation of teaching can be used with confidence to identify very good teaching and poor teaching performance from the perspective of students. Fine judgments based on small variances in scores cannot be made with confidence.

  3. Limits of quantitative feedback

    Student evaluation questionnaires provide a simple, comprehensive form of feedback on teaching, allowing for comparisons across teachers. But students' experience of learning is individual, and the range of approaches to teaching is wide. Simple instruments based on standardized, quantifiable criteria cannot be effective in all circumstances.

    Student evaluation of teaching through questionnaires should be complemented by other sources of feedback from students that take into account the range of teaching and learning environments and that provide qualitative feedback with richness and depth.

  4. Summative and formative feedback

    End-of-session evaluations reported after grades are assigned are the accepted model for "summative" evaluation by students. However, this model does not allow for mid-session formative feedback.

    Student evaluation of teaching through end-of-session questionnaires should be complemented by mid-session formative feedback on teaching performance.

  5. Evaluation of courses

    Students' evaluation of teaching performance and students' response to their experience of the course or their program cannot be easily disentangled. Where a course or program requires review or decisions are to be made to adjust the design of a course or program, other feedback tools are called for.

    Student feedback to improve courses and programs should be undertaken through a process that is fit for purpose.

In addition to the above principles, some suggestions on the presentation and interpretation of SFQ results are outlined below. These suggestions are based on findings from research studies on the use of student ratings (Abrami, 2001; Franklin, 2001; Neumann, 2000).

  1. Spread of scores within a section

    Besides the mean score for each question in the SFQ report, attention should also be paid to the standard deviation (SD) and the distribution of students' responses. The higher the SD for a particular question, the more diverse students' views are about that question; when the SD is high, the mean may not represent the view of the average student in that section. According to Theall and Franklin (1991), an SD of 1.2 is high for a 5-point scale [1]. This can also be easily checked against the distribution chart of responses for that question, which is included in the SFQ instructor report. Such disparate views about the course may be related to the diversity in students' backgrounds.
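
As a minimal sketch of why the spread matters, the following compares two hypothetical sets of responses to one question on a 5-point scale. The function name `summarize_item` and the sample data are illustrative assumptions, and the use of the population SD is also an assumption; the SFQ report may compute SD differently.

```python
from collections import Counter
from statistics import mean, pstdev

# An SD of 1.2 is described as high for a 5-point scale (Theall & Franklin, 1991).
HIGH_SD_THRESHOLD = 1.2


def summarize_item(responses):
    """Summarize the responses (integers 1 to 5) to a single question.

    Uses the population SD (pstdev); this is an assumption, not
    necessarily the formula used in the SFQ instructor report.
    """
    counts = Counter(responses)
    # Count of responses at each scale point, including empty ones.
    distribution = {score: counts.get(score, 0) for score in range(1, 6)}
    sd = pstdev(responses)
    return {
        "mean": mean(responses),
        "sd": sd,
        "distribution": distribution,
        "high_spread": sd >= HIGH_SD_THRESHOLD,
    }


# Hypothetical data: both sections have the same mean but very different spread.
polarized = summarize_item([1, 1, 1, 5, 5, 5])
consistent = summarize_item([3, 3, 3, 3, 3, 3])

print(polarized)
print(consistent)
```

In the polarized section no student actually chose the mean value, which is the situation the distribution chart helps to reveal.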

  2. Accuracy in SFQ results

    The mean scores that appear in the SFQ reports are not error-free. This is true of all kinds of educational and psychological measurement, and SFQ is no exception: there is always a margin of error. Hence, a small difference (say, less than 5 out of 100) between scores in the SFQ survey is often not significant and is probably the result of random error in the measurement process. It is possible to estimate the error if the standard deviation of the score distribution and the number of students evaluating a section are known.
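
One common way to estimate this error from the SD and the number of respondents is the standard error of the mean, SD divided by the square root of n. The sketch below assumes scores on a 0 to 100 scale, independent responses, and a normal approximation with z = 1.96 for a rough 95% margin; the function names are illustrative, and this is not necessarily the method used for SFQ reports.

```python
import math


def standard_error(sd, n_respondents):
    """Standard error of a mean score: SD / sqrt(n)."""
    if n_respondents < 1:
        raise ValueError("at least one respondent is required")
    return sd / math.sqrt(n_respondents)


def margin_of_error(sd, n_respondents, z=1.96):
    """Approximate 95% margin of error (normal approximation, an assumption)."""
    return z * standard_error(sd, n_respondents)


def difference_is_distinguishable(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Rough check of whether two mean scores differ by more than random error.

    Treats the two sections as independent samples (an assumption).
    """
    se_diff = math.sqrt(
        standard_error(sd_a, n_a) ** 2 + standard_error(sd_b, n_b) ** 2
    )
    return abs(mean_a - mean_b) > z * se_diff


# Hypothetical sections: SD of 20 on a 0-100 scale, 25 respondents each.
print(margin_of_error(20, 25))
print(difference_is_distinguishable(72, 20, 25, 68, 20, 25))
print(difference_is_distinguishable(80, 20, 25, 60, 20, 25))
```

Under these assumptions a 4-point gap between two sections of 25 respondents falls well inside the margin of error, whereas a 20-point gap does not.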

  3. Combining SFQ results from multiple sections

    SFQ results for a single section can sometimes be influenced by factors beyond the control of the instructor. For a more accurate assessment of an instructor's teaching performance, it is suggested that the average of the SFQ scores from multiple sections or semesters taught by the same instructor be used instead. Research studies (Gillmore, Kane, & Naccarato, 1978; Smith, 1979) and statistical analysis conducted at CELT (see appendix for details) lend support to this practice. This is especially important if the number of students providing feedback for the section is small, say 10 or fewer.
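
The averaging described above can be sketched as follows. The text does not say whether sections should be weighted, so this illustrative function (`combine_sections`, a hypothetical name) returns both a simple average of section means and an average weighted by the number of respondents; the sample scores are made up.

```python
def combine_sections(sections):
    """Combine section results given as (mean_score, n_respondents) pairs.

    Returns (simple_average, respondent_weighted_average). Which of the
    two is appropriate is a judgment call not settled by the guidance.
    """
    if not sections:
        raise ValueError("at least one section is required")
    simple = sum(score for score, _ in sections) / len(sections)
    total_respondents = sum(n for _, n in sections)
    weighted = sum(score * n for score, n in sections) / total_respondents
    return simple, weighted


# Hypothetical instructor: a small section with a high score and a large
# section with a lower score.
simple, weighted = combine_sections([(90, 5), (70, 45)])
print(simple, weighted)
```

The weighted figure is pulled toward the larger section, which limits the influence of a small section with only a handful of respondents.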

  4. Comparison of SFQ scores

    Decades of research on student ratings have repeatedly shown that ratings are affected by (i) discipline; (ii) class size; and (iii) level of course (Neumann, 2000). Hence, comparing ratings from vastly different disciplines can be problematic. Research has also shown that the relationship between class size and ratings is not linear (Lewis, 1991; Theall & Franklin, 1991): sections with enrolment between 35 and 100 on average receive lower ratings than others. As for course level, PG courses are generally rated higher than UG courses. Hence, if the performance of faculty and instructors is to be compared, the above three aspects of the courses should be taken into account. CELT has prepared an "SFQ University Summary Report - Breakdown by Level, Department and Class size", which should provide a more meaningful basis for comparison of SFQ scores.
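
A like-for-like comparison of this kind can be sketched by grouping sections on the three factors before computing reference means. The size bands below are an assumption derived from the 35 to 100 range mentioned above, and the field names and data are hypothetical; the actual breakdown used in the CELT summary report may differ.

```python
from collections import defaultdict
from statistics import mean


def size_band(enrolment):
    """Assign a class-size band (band boundaries are an assumption)."""
    if enrolment < 35:
        return "under 35"
    if enrolment <= 100:
        return "35-100"
    return "over 100"


def group_means(sections):
    """Mean score per (department, level, size band) group."""
    groups = defaultdict(list)
    for section in sections:
        key = (
            section["department"],
            section["level"],
            size_band(section["enrolment"]),
        )
        groups[key].append(section["score"])
    return {key: mean(scores) for key, scores in groups.items()}


# Hypothetical sections; department codes and scores are invented.
sections = [
    {"department": "DEPT_A", "level": "UG", "enrolment": 60, "score": 68},
    {"department": "DEPT_A", "level": "UG", "enrolment": 80, "score": 72},
    {"department": "DEPT_A", "level": "PG", "enrolment": 20, "score": 85},
]
reference = group_means(sections)
print(reference)
```

An individual section's score would then be read against the mean of its own group rather than against a university-wide average.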

  5. Trends and clusters

    It is sometimes insightful to compare the student ratings of the same course taught in different years. Examining all questions with higher (or lower) ratings together can also reveal a pattern that provides insight into the strengths and weaknesses of one's teaching.

  6. Interpreting students' comments

    In average situations (i.e. neither excellent nor poor teaching), students who are either very positive or very negative about the course are more likely to answer the comment questions. Hence their views should not be taken as representative of the class, but nor should they be ignored. Comments that are backed up by examples or that contain details about the relevant learning experience are more useful.

  7. Response rate

    Response rate is also a factor to be considered in interpreting SFQ results. A higher response rate is needed for smaller classes for the results to be considered reliable. At HKUST, for sections with enrolment of fewer than 10, a 100% response rate is required. For sections with enrolment above 100, a 30% response rate is acceptable.
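
The two thresholds stated above can be expressed as a simple check. Only the cases given in the text are encoded; the required rate for enrolments from 10 to 100 is not stated here, so this sketch returns `None` for them rather than guessing. The function names are illustrative.

```python
def required_response_rate(enrolment):
    """Required response rate as a fraction, or None where the text is silent."""
    if enrolment <= 0:
        raise ValueError("enrolment must be positive")
    if enrolment < 10:
        return 1.0   # 100% required for sections with fewer than 10 students
    if enrolment > 100:
        return 0.30  # 30% is acceptable for sections above 100 students
    return None      # threshold for 10 to 100 students is not given in the text


def meets_response_requirement(enrolment, n_responses):
    """True or False where a threshold is stated, otherwise None."""
    required = required_response_rate(enrolment)
    if required is None:
        return None
    return n_responses / enrolment >= required


print(meets_response_requirement(8, 7))     # small section, one response missing
print(meets_response_requirement(200, 60))  # large section at exactly 30%
print(meets_response_requirement(50, 20))   # mid-sized section, no stated threshold
```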


[1] This would be equivalent to an SD of 24 for SFQ.