FAQ

QUESTIONS

 

FAQ 1: Can the COSMIN checklist also be used for other instruments that are not patient-reported outcomes?

 

FAQ 2: What does the abbreviation 7* #items and ≥ 100 mean?
 

FAQ 3: Which boxes should be completed when other terminology is used in the article than in the COSMIN checklist?

 

FAQ 4: When should box J (interpretability) be completed?

 

FAQ 5: When should box D (content validity) be completed?

 

FAQ 6: Which information should be provided in an article as evidence for content validity (box D)?

 

FAQ 7: How should the Generalizability box be used?

 

FAQ 8: How should the IRT box be used?

 

FAQ 9: What does the percentage of missing items refer to?

 

FAQ 10: What does 'Were at least two measurements available' mean?
 
ANSWERS

 

Can the COSMIN checklist also be used for other instruments that are not patient-reported outcomes?

 

Answer: Yes, the COSMIN checklist has been used in several systematic reviews of other types of instruments, such as performance-based tests.

Examples:

 

Dobson F, Hinman RS, Hall M, Terwee CB, Roos EM, Bennell KL. Measurement properties of performance-based measures to assess physical function in hip and knee osteoarthritis: a systematic review. Osteoarthritis Cartilage 2012;20:1548-62.

 

Bartels B, de Groot JF, Terwee CB. The six minute walk test in chronic pediatric conditions: a systematic review of measurement properties. Physical Therapy 2013;93:529-41.

 

What does the abbreviation 7* #items and ≥ 100 mean in the COSMIN scoring system?

Answer: this means that, to receive an excellent rating, the number of patients in a factor analysis should be at least 7 times the number of items and at least 100. For example, if you want to perform a factor analysis on 30 items, you need 7*30=210 patients for an excellent rating and 5*30=150 patients for a good rating. If you want to perform a factor analysis on 10 items, you need 7*10=70 patients, but since this is less than 100, we recommend 100 patients to get more reliable estimates. If a study includes fewer patients than 5 times the number of items and fewer than 100 in total (e.g. 80 patients in a factor analysis on 25 items), the quality of the study is considered to be poor.
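The arithmetic in this answer can be sketched in a few lines of Python. This is only an illustration of the rules quoted above, not the official COSMIN scoring algorithm: the function name is hypothetical, the requirement of at least 100 patients for a good rating is an assumption carried over from the excellent rule, and combinations that the answer does not address are reported as unspecified rather than guessed.

```python
def factor_analysis_sample_rating(n_patients: int, n_items: int) -> str:
    """Rate the sample size of a factor analysis using only the rules in this FAQ.

    Hypothetical helper for illustration; not part of the COSMIN checklist itself.
    """
    if n_items <= 0 or n_patients < 0:
        raise ValueError("n_items must be positive and n_patients non-negative")

    # Excellent: at least 7 times the number of items AND at least 100 patients.
    if n_patients >= 7 * n_items and n_patients >= 100:
        return "excellent"

    # Good: at least 5 times the number of items; the >= 100 condition is an
    # assumption made here by analogy with the excellent rule.
    if n_patients >= 5 * n_items and n_patients >= 100:
        return "good"

    # Poor: fewer than 5 times the number of items AND fewer than 100 patients.
    if n_patients < 5 * n_items and n_patients < 100:
        return "poor"

    # Any other combination is not covered by this FAQ answer.
    return "not specified in this answer"


if __name__ == "__main__":
    # The worked examples from the answer above.
    print(factor_analysis_sample_rating(210, 30))
    print(factor_analysis_sample_rating(150, 30))
    print(factor_analysis_sample_rating(80, 25))
```

For the remaining combinations, consult the COSMIN scoring system itself.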

 

Which boxes should be completed when other terminology is used in the article than in the COSMIN checklist?

 

Answer: this depends on the purpose of using the COSMIN checklist:

 

Using the COSMIN checklist in a systematic review of measurement properties

In a systematic review the aim is to evaluate all evidence regarding the measurement properties of the instruments of interest. We recommend using the COSMIN terminology and definitions as a taxonomy for the review. This means that as a reviewer you should decide which measurement properties are assessed, regardless of the terminology used by the authors of the included studies.

 

Example: Vogel et al.1 assessed ‘sensitivity of change’ of a number of voice assessment tests. They defined sensitivity as “the extent to which changes in performance measures reflect true change in CNS function”. According to the COSMIN taxonomy, this study should be regarded as a responsiveness study. We recommend completing box I (responsiveness).

 

Example: Gummesson et al.2 examined the ‘longitudinal construct validity’ of the QuickDASH. The authors defined longitudinal construct validity as “the measure’s ability to detect a true change in health status and its precision in detecting changes of different magnitudes (also referred to as responsiveness or sensitivity to change)”. According to the COSMIN taxonomy, this study should be regarded as a responsiveness study. We recommend completing box I (responsiveness).

 

Example: Dawson et al.3 assessed ‘reproducibility’ (which they also called test-retest reliability) of a ‘shoulder instability questionnaire’ by administering two questionnaires within 24 hours. Correlations of the total scores were measured by Pearson correlation coefficients. The authors stated that “the data were also examined by the coefficient of reliability according to the method described by Bland and Altman”. In the results section they stated “The coefficient of reliability was calculated as 5.7 using the Bland and Altman method and 95% of score differences fell between 0 ± 5.7”. According to the COSMIN taxonomy, both reliability and measurement error were assessed in this study. We recommend completing box B (reliability) and box C (measurement error).

 

Example: Hiller et al.4 assessed ‘criterion validity’ of a questionnaire (BBUSQ-22) assessing bowel and urinary tract symptoms in women. The authors stated that “there is an absence of consensually accepted standards or criteria against which new measures may be validated. In such circumstances, validation research becomes a study of the correlations among measures, with clinical test results and other questionnaires serving as validation criteria”. The BBUSQ-22 was compared with other disease-specific questionnaires and with general health status questionnaires (SF-36 and EuroQol). According to the COSMIN taxonomy, this analysis would be considered an assessment of construct validity, not criterion validity, as the COSMIN panel agreed that no gold standard exists for PRO instruments. We recommend completing box F (construct validity, hypotheses testing) because this analysis contributes to information on the construct validity of the BBUSQ-22.

 

Using the COSMIN checklist to assess the quality of a manuscript, submitted for publication

As a reviewer or editor, you may want to use the COSMIN checklist to evaluate the quality of the submitted manuscript. In this case it may be useful to use the terminology of the authors to decide which box to complete. For example, if the study of Hiller et al. described above were submitted as a manuscript, it would be considered of low quality as a study of criterion validity because of the lack of an adequate gold standard. As a reviewer or editor you may decide to recommend that the authors present their study as a study on construct validity. If the authors formulated specific hypotheses in the revision of their manuscript about the expected correlations of the BBUSQ-22 with the SF-36 and EuroQol, the study could change from a low quality study on criterion validity into a high quality study on construct validity.

 

 

 

When should box J (interpretability) be completed?

 

Answer: this depends on the purpose of using the COSMIN checklist:

 

Using the COSMIN checklist in a systematic review of measurement properties

In a systematic review the aim is to evaluate all evidence regarding the interpretability of the instruments of interest. We recommend completing items 4-8 of box J for all included studies. You may consider using these 5 items in a data extraction form, to extract data on the distribution of scores in the study population and in relevant subgroups, and data on floor and ceiling effects and the minimal important change (MIC) from all included studies. This gives you an overview of all available information on the interpretability of scores of the instruments of interest.

 

Using the COSMIN checklist to assess the quality of a manuscript, submitted for publication

As a reviewer or editor, it may be useful to check for each study on measurement properties whether the authors have provided the information mentioned in items 4-8 of box J, i.e. data on the distribution of scores in the study population and in relevant subgroups, data on floor and ceiling effects, and the MIC if it was possible to determine the MIC based on the data available. Authors often have information available on the distribution of scores and on floor and ceiling effects, but this information is not always published. By recommending that authors provide this information in their (revised) manuscript, reviewers and editors can help improve the interpretability of measurement instruments.

 

If the authors state that they studied interpretability of an instrument, reviewers or editors can use box J to evaluate the quality of the interpretability study, by checking whether the authors have considered all relevant aspects of interpretability.

 

Example: Terwee et al.5 studied the interpretability of the Graves’ Ophthalmopathy Quality of Life questionnaire. The title of this study was “Interpretation and validity of changes in scores on the Graves’ ophthalmopathy quality of life questionnaire (GO-QOL) after different treatments”. In the introduction the authors stated that “To make the GO-QOL a useful tool for clinical investigators, guidelines are needed for the interpretation of (changes in) scores on the questionnaire”. In this case, we recommend completing box J.

 

 

When should box D (content validity) be completed?

 

Answer: this depends on the purpose of using the COSMIN checklist:

 

Using the COSMIN checklist in a systematic review of measurement properties

In a systematic review the aim is to evaluate all evidence regarding the content validity of the instruments of interest. This information may come from different studies. Many studies evaluate only one or two aspects of content validity. We therefore recommend completing box D for all included studies. This gives you an overview of all available information on the content validity of the instruments of interest.

 

Example: Chan Ci En et al.6 examined the association between scores on the Neck Pain and Disability Scale (NPAD) and data from interviews where subjects were allowed to spontaneously identify problems associated with their neck pain disorder. Of the 10 most common problems identified in the interviews, 7 are included in the NPAD. These results provide evidence on whether all items are relevant for the study population (item 2) and whether all items together comprehensively reflect the construct to be measured (item 4). Wheeler et al.7 examined face validity of the NPAD by comparing a group of patients with neck pain with pain-free volunteers. As expected, the group with neck pain demonstrated higher scores on the NPAD than pain-free controls. These results provide evidence on whether all items are relevant for the purpose of the measurement instrument (discriminating between patients with and without neck pain) (item 3). To evaluate the content validity of the NPAD, the evidence from both studies should be taken into account.

 

COSMIN is currently developing new boxes to evaluate face and content validity of measurement instruments, including criteria for what constitutes good face and content validity.

 

Using the COSMIN checklist to assess the quality of a manuscript, submitted for publication

As a reviewer or editor, you may want to use the COSMIN checklist to evaluate the quality of the submitted manuscript. In this case, we recommend using box D only when the aim of a study was to evaluate content validity of an instrument. Box D can be used to check whether all relevant aspects of content validity have been assessed in the study. If not, authors may be given the opportunity to provide the missing evidence in the revised version of their manuscript. If authors have studied only one or two aspects of content validity, this should be considered a serious limitation of the content validity study. Moreover, if only some aspects of content validity have been studied, authors should not conclude that the instrument has good content validity.

 

 

 

Which information should be provided in an article as evidence for content validity (box D)?

 

Answer: evidence that all items refer to relevant aspects of the construct to be measured (item 1) can be obtained by asking experts on the construct and members of the target population. Evidence that all items are relevant for the study population (item 2) can be obtained by asking members of the target population. Evidence that all items together comprehensively reflect the construct to be measured (item 4) can, for example, be obtained by comparing the instrument with responses to open questions.

 

Example: Albers et al. studied the content validity of the Patient Dignity Inventory (PDI). The questionnaire was sent to a random sample of members of the Right to Die-NL and the Dutch Patient Association. All items of the PDI were thought to influence the sense of dignity during the last phase of life, both by people who have one or more advance directives of the Right to Die-NL and by people who have a ‘wish to live statement’. These results could be regarded as evidence that all items refer to relevant aspects of the construct to be measured and that all items are relevant for the study population (items 1 and 2). The authors also compared the results from the PDI with open-ended questions asking how participants define dignity and what issues they think will influence their sense of dignity during the last phase of life. The results showed that almost all issues described in the open-ended responses were represented in the PDI items, but that content validity could be improved by including items on communication and issues relating to care. This provides evidence that all items together do not yet comprehensively reflect the construct to be measured (item 4).

 

It is more difficult to think of how evidence can be obtained that all items are relevant for the purpose of the measurement instrument (i.e. discrimination, evaluation) (item 3). For an instrument to be used in an evaluative application this means that all items should be relevant for measuring change. If items cannot change (e.g. if questions are asked about aspects of a disease or characteristics of a person that cannot change) these items are not relevant for the purpose of measurement. This limits the content validity. If evidence is provided that item scores change in patients who change on the construct of interest (based on an external criterion) this is considered as evidence for responsiveness, but it may also be an indication that items are relevant for measuring change (provided that the items are measuring the construct of interest).

 

 

How should the Generalizability box be used?

 

Answer: this depends on the purpose of using the COSMIN checklist:

 

Using the COSMIN checklist in a systematic review of measurement properties
For systematic reviews we recommend using items 1-6 in a data extraction form, to extract information about the characteristics of the study sample in which the measurement properties were assessed. See for example the review of Schellingerhout et al.8

 

Using the COSMIN checklist to assess the quality of a manuscript, submitted for publication

The Generalizability box can be used to examine to what extent it is clear to which patient population the results of a study on a measurement property can be generalized. The box provides a measure of the ‘amount of generalizability’ of the results of a study.

Example: In a hypothetical article internal consistency was evaluated in a group of 100 patients participating in a clinical trial, while reliability was assessed in a group of 50 other patients, who were newly referred to an outpatient clinic. The characteristics of these two patient groups were different. Consequently, the results of the internal consistency analysis are generalizable to a different patient population than the results of the reliability analysis. Therefore, the Generalizability box should be completed twice, for the internal consistency assessment and for the reliability assessment separately.

 

Example: In another article internal consistency was evaluated in a group of 200 patients. This sample was adequately described. In the same study reliability was assessed in a subsample of 70 patients, selected from the group of 200 patients. It was not described how these 70 patients were selected. It is therefore less clear to which population the results of the reliability analysis are generalizable than to which population the results of the internal consistency analysis are generalizable. For this reason, too, the Generalizability box should be completed twice, for the internal consistency assessment and for the reliability assessment separately.

 

How should the IRT box be used?

 

Answer: For studies that use IRT, the IRT box should be used in combination with the relevant COSMIN box. For example, in a study on internal consistency using Rasch analysis, you should use box A and the IRT box. For the final quality rating, you then take the lowest rating of any item in either of these boxes. In effect, you combine the two boxes.
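The "lowest rating counts" rule can be illustrated with a short Python sketch. The function and variable names are hypothetical, and the four-level ordering (poor, fair, good, excellent) is an assumption about the rating scale; only the rule of taking the lowest item rating across both boxes comes from the answer above.

```python
# Assumed rating scale, ordered from worst to best.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def combined_quality_rating(*boxes):
    """Return the lowest item rating found across all supplied boxes.

    Each box is a list of item ratings, e.g. the items of box A and
    the items of the IRT box for the same study.
    """
    all_ratings = [rating for box in boxes for rating in box]
    if not all_ratings:
        raise ValueError("at least one item rating is required")
    # The item with the lowest position on the scale determines the result.
    return min(all_ratings, key=RATING_ORDER.index)


if __name__ == "__main__":
    box_a_items = ["excellent", "good", "excellent"]  # made-up ratings
    irt_box_items = ["good", "fair"]                  # made-up ratings
    print(combined_quality_rating(box_a_items, irt_box_items))
```

Treating the two boxes as one pooled list of items mirrors the statement that the boxes are, in effect, combined.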

 

 

What does the percentage of missing items refer to?

Answer: in each COSMIN box a question is asked whether the percentage of missing items is described. A high number of missing items can introduce bias in the results of the study if the values were not missing at random. Missing items could refer to the average number of missing items per instrument or the percentage of missing responses per item. We suggest scoring excellent if the percentage of missing answers is described for each item, or if at least the item(s) with the highest percentage of missing answers are described (e.g. "13% of the patients did not answer the question on sexual functioning"). It is important to know how many persons were excluded from the analyses because of missing items. Also, a high number of missing responses for an item may indicate a lack of content validity.

 

A second item asks if it was adequately described how missing items were handled. It is important that this information is known because it may have a large influence on the scores on the instrument.
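As an illustration of the quantities mentioned in this answer, the following Python sketch computes the percentage of missing responses per item, the item with the highest percentage, and the number of persons with at least one missing item. The data layout (one list per person, with None marking a missing answer) and all names are assumptions made for this example.

```python
def missing_item_summary(responses, item_names):
    """Summarise missing responses in a set of completed instruments.

    responses  -- list of rows, one per person; None marks a missing answer
    item_names -- names of the items, in the same order as the row entries
    """
    n_persons = len(responses)
    if n_persons == 0:
        raise ValueError("no responses supplied")

    # Percentage of persons who did not answer each item.
    pct_missing = {}
    for idx, name in enumerate(item_names):
        n_missing = sum(1 for row in responses if row[idx] is None)
        pct_missing[name] = 100.0 * n_missing / n_persons

    # Item with the highest percentage of missing answers.
    worst_item = max(pct_missing, key=pct_missing.get)

    # Persons with at least one missing item (candidates for exclusion).
    n_incomplete = sum(1 for row in responses if any(v is None for v in row))

    return pct_missing, worst_item, n_incomplete


if __name__ == "__main__":
    # Made-up data: four persons, three items.
    data = [[1, 2, None], [3, None, None], [2, 2, 4], [1, 1, 1]]
    print(missing_item_summary(data, ["item_a", "item_b", "item_c"]))
```

Reporting these figures alongside a description of how missing items were handled addresses both questions in the box.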

 

 

What does 'Were at least two measurements available' mean?

Answer: this question is included in box B and box C. It means that the instrument should have been completed twice (test-retest reliability) or the patient should be rated at least twice (by the same or by different raters). For example, a patient completed a questionnaire twice, or a doctor measured the blood pressure of a patient twice, or two raters scored the same x-ray of a patient.

 

 

References

 

1 Vogel AP, Fletcher J, Snyder PJ, et al. Reliability, Stability, and Sensitivity to Change and Impairment in Acoustic Measures of Timing and Frequency. Journal of Voice 2009; in press.

 

2 Gummesson C, Ward MM, Atroshi I. The shortened disabilities of the arm, shoulder and hand questionnaire (QuickDASH): validity and reliability based on responses within the full-length DASH. BMC Musculoskeletal Disorders 2006;7:44.

 

3 Dawson J, Fitzpatrick R, Carr A. The assessment of shoulder instability. The development and validation of a questionnaire. J Bone Joint Surg [Br] 1999;81-B:420-6.

 

4 Hiller L, Bradshaw HD, Radley SC, Radley S. Criterion validity of the BBUSQ-22: a questionnaire assessing bowel and urinary tract symptoms in women. Int Urogynecol J 2007;18:1133–7.

 

5 Terwee CB, Dekker FW, Mourits MPh, Gerding MN, Baldeschi L, Kalmann R, Prummel MF, Wiersinga WM. Interpretation and validity of changes in scores on the Graves’ ophthalmopathy quality of life questionnaire (GO-QOL) after different treatments. Clin Endocrinol 2001;54:391-398.

 

6 Chan Ci En M, Clair DA, Edmondston SJ. Validity of the Neck Disability Index and Neck Pain and Disability Scale for measuring disability associated with chronic, non-traumatic neck pain. Man Ther 2009;14(4):433-8.

 

7 Wheeler AH, Goolkasian P, Baird AC, Darden BV 2nd. Development of the Neck Pain and Disability Scale. Item analysis, face, and criterion-related validity. Spine 1999;24(13):1290-4.

8 Schellingerhout JM, Verhagen AP, Heymans MW, Koes BW, de Vet HCW, Terwee CB. Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review. Accepted Qual Life Res June 2011