Scoring system

VU medisch Centrum


EMGO institute for health and care research

Scoring the methodological quality of a study on measurement properties

The COSMIN checklist was developed to rate the methodological quality of a study on one or more measurement properties. The checklist is increasingly used in systematic reviews of measurement properties. In such reviews, it is highly desirable to obtain an overall methodological quality score for each study on a given measurement property. However, the Delphi study in which the COSMIN checklist was developed did not address how an overall methodological quality score per measurement property (per box) can be obtained.

Recently, such a scoring system has been developed. It can be used in systematic reviews of measurement properties to obtain an overall quality rating per COSMIN box. To increase the discriminative ability of the items, the dichotomous response options (yes, no) of the COSMIN items were replaced by four response options, defined for each item of the checklist to represent excellent, good, fair, and poor methodological quality. The methodological quality score of a box is then obtained by taking the lowest rating of any item in that box (‘worst score counts’).
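The ‘worst score counts’ rule can be expressed in a few lines. The Python sketch below is illustrative only: the `Rating` enum, the `box_score` function, and the item labels are hypothetical and do not reproduce the actual COSMIN item wording. It also assumes that items not applicable to a given study are simply left out before the box is scored.

```python
from enum import IntEnum


class Rating(IntEnum):
    # Ordered so that a lower value means worse methodological quality.
    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4


def box_score(item_ratings):
    """Return the overall quality rating of one COSMIN box.

    'Worst score counts': the box receives the lowest rating given to any
    of its items. Assumption: non-applicable items are omitted by the caller.
    """
    ratings = list(item_ratings)
    if not ratings:
        raise ValueError("a box needs at least one rated item")
    return min(ratings)


# Hypothetical item ratings for one test-retest reliability box.
reliability_items = {
    "sample size": Rating.FAIR,
    "time interval": Rating.GOOD,
    "statistical method": Rating.EXCELLENT,
}
print(box_score(reliability_items.values()).name)  # FAIR
```

Because the ratings are ordered, a single ‘fair’ item pulls the whole box down to ‘fair’, however well the other items are rated.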

This rating system was developed based on discussions in the Clinimetrics working group of the EMGO Institute for Health and Care Research, and on its application to rate the quality of all studies on measurement properties described in 46 articles on neck disability questionnaires.

COSMIN scoring system
We argue that meeting all COSMIN standards represents the ideal situation. Therefore, a study on a measurement property is rated as having ‘excellent’ quality if all relevant COSMIN items are scored as adequate. In general, a study is rated as having ‘good’ quality if some information is not reported but it can be assumed to be adequate (e.g. if it can be assumed that patients did not change between the test and the retest in a test-retest study). A study is rated as having ‘fair’ quality if the value of the measurement property might have been underestimated (e.g. due to unstable patients or a long time interval in a test-retest design), if it was estimated in a sample of only moderate size, or if there were other minor flaws in the design or statistical analyses. A study is rated as ‘poor’ if the results cannot be trusted because of major flaws in the design or statistical analyses (e.g. a small sample size or inappropriate statistical methods). Specific criteria for ‘excellent’, ‘good’, ‘fair’, and ‘poor’ quality have been defined for the items of each box.

If you are performing a systematic review of measurement properties, we recommend using this scoring system. 

The Interpretability box and the Generalizability box are mainly used as data extraction forms. We recommend using the Interpretability box to extract from the included articles all information on the interpretability issues covered by this box (e.g. norm scores, floor and ceiling effects, minimal important change) for the instruments under study. Similarly, we recommend using the Generalizability box to extract data on the characteristics of the study population and the sampling procedure. Therefore, no scoring system was developed for these boxes.

Download the COSMIN checklist with the 4-point scoring system here.

If you use the scoring system, please cite our article on its development, published in Quality of Life Research.