Evaluation & research: What's the difference?

People sometimes use the terms evaluation and research synonymously, but this is a mistake: evaluation differs from research in several important ways. The National Research Council (http://www.nas.edu/nrc/) has defined six key characteristics of scientifically based research:

1. Pose significant questions that can be investigated empirically.
2. Link research to relevant theory.
3. Use methods that permit direct investigation of the question.
4. Provide a coherent and explicit chain of reasoning.
5. Replicate and generalize across studies.
6. Disclose research to encourage professional scrutiny and critique.

These characteristics help clarify how evaluation and research differ. With respect to characteristic 1, the types of questions asked, both research and evaluation should address “significant” questions and involve the collection of observable data. But evaluation questions are much more closely tied to specific decisions, which have a more localized, less “generalizable” scope. For example, an evaluation may focus on decisions about which types of search delimiters a particular audience would find most useful in a specific digital library, whereas a research study may examine the effects that digital libraries have on scholarship within a field such as geology. Interestingly, although evaluation has long been focused on decision making, the shift to “evidence-based decision making,” long established in the health and medical fields, is a relatively recent development in educational research (Shavelson & Towne, 2002).

With respect to characteristic 2, although some evaluation models emphasize theory, most evaluation plans do not have as strong a foundation in theory or research literature as research designs are expected to have. An evaluation focused on decisions about which types of metadata are most useful for K-12 teachers seeking educational resources might be less informed by the theoretical underpinnings of metadata structures than would a research study of the mental models of metadata standards constructed by reference librarians. That said, evaluations may well benefit greatly from including relevant theoretical perspectives (Chen, 1990).

With respect to characteristic 3, both evaluation and research are concerned with the reliability and validity of the tools and instruments used in data collection.
One difference is that evaluators more readily accept measures that have not been fully validated, whereas researchers are expected to establish the reliability and validity of their instruments rigorously before using them in a study. For example, an evaluation of community reactions to a new user interface for a digital library may use an original survey instrument that has not been fully validated, whereas a research study of the effects of digital libraries on plagiarism among undergraduate students would almost certainly require a validated measure of academic honesty.

With respect to characteristic 4, evaluation and research differ markedly. Traditional quantitative educational research generally relies on experimental or quasi-experimental designs intended to rule out, or at least reduce the plausibility of, alternative explanations for results. Evaluations, by contrast, are often designed to examine rival explanations from a variety of perspectives and to give decision makers alternatives from which to choose. In that sense, many evaluations have more in common with qualitative research, and evaluations and qualitative studies often look very similar in design, implementation, and even reporting. Whether a study can rule out counter-explanations of the observed evidence has long been a contentious issue within the educational research community (Lagemann, 2000), but evaluations can sidestep it by presenting the evidence for alternative explanations and letting decision makers judge for themselves. In our view, evaluation is better thought of as a process akin to a judicial proceeding.
In a legal case, evidence for a defendant's guilt or innocence is presented and a judge or jury decides; in the scientific process, by contrast, findings are judged more or less warranted on the basis of peer review and replication. Of course, evaluations may also use experimental or quasi-experimental designs. For example, it might be feasible to roll out two different digital library search engines to randomly selected user populations and to judge user preferences from data such as the number of return visits from the same domain names. Although such an experimental approach might be feasible in an evaluation, we would not recommend relying on it exclusively, because its results are difficult to interpret: return visits may show which engine users prefer, but not why. Instead, we advocate a mixed-methods approach that also collects qualitative information to explain any preference patterns that emerge.

With respect to characteristic 5, replication is much more common in research than in evaluation, and indeed is expected there. In evaluation, the emphasis is on providing quality information to inform decisions in a timely manner. If evaluators have done their job well, their information has helped decision makers make better decisions, so future evaluations will focus on different decisions. Suppose the National Science Foundation must set funding priorities for collection development in the National Science Digital Library. Once an evaluation has collected usage data from K-12, undergraduate, scientific, and other communities, informed decisions can be made. A subsequent evaluation might then focus on the types of resources most highly valued in the communities targeted for increased funding, rather than repeating the original usage study.

Finally, with respect to characteristic 6, peer and public review is one of the primary foundations of scientific research; without high standards of review, pseudoscience would proliferate.
By contrast, evaluations are rarely shared beyond the “stakeholders” (decision makers and other interested parties) involved in a particular evaluation. Evaluation results are sometimes presented at conferences, and a few are even published in journals, but most evaluation reports have very limited circulation. Frankly, we believe the state of the art of evaluation would improve if evaluations were shared more publicly. A formal review process for evaluations does exist: meta-evaluation (Cook & Gruder, 1978), which is essentially the evaluation of evaluations. As you become involved in evaluating digital libraries, we strongly encourage you to share your methods and findings at conferences or through peer-reviewed publications. For example, Wildemuth, Marchionini, Yang, Geisler, Wilkens, Hughes, and Gruss (2003) of the University of North Carolina at Chapel Hill presented the results of their evaluation of video surrogates at the 2003 Joint Conference on Digital Libraries in Houston, Texas. Mead and Gay (1995) published a paper on using concept mapping in digital library evaluations in the ACM SIGOIS Bulletin. For anyone in academe, evaluating digital libraries can be a form of the “scholarship of teaching” (Shulman, 2000), that is, systematic inquiry into the effects of various forms of instruction (e.g., web-based instruction) or instructional support (e.g., digital libraries).