Baseless Measurements: No Baseline Study? No Problem

Introduction

Impact evaluations are powerful tools for building a knowledge base about what works in a programme, what does not, and the reasons why.  The case for incorporating more ‘rigorous impact evaluations’ into development programming and public policy processes has been eloquently documented elsewhere and is supported by many development programmes, non-governmental organisations, and government agencies.  However, despite considerable attention and progress towards strengthening evaluation theory and practice over the past several years, many practical, real-world evaluation challenges remain pervasive.

Three categories of challenges are most prevalent in policy and programme evaluations: design, data collection, and data analysis.  This essay discusses two of the most common design challenges: the failure to establish baseline data and the lack of a control group.  Its purpose is to synthesise existing knowledge about evaluation design challenges in real-world scenarios where critical data are unavailable or of low quality.  The essay draws on practical experience conducting evaluations in Nigeria and outlines strategies and approaches used to navigate the challenges that data gaps create during evaluations.

The rest of the essay responds to this challenge.  It presents a brief description of the problem, offers specific propositions for reconstructing baseline data and control groups when a project or programme did not establish them at the outset, and discusses the rationale and implementation strategies for the proposed solutions.

Context of the Problem

What is “Impact”?  While there is consensus on the importance of impact evaluation within the development community, there has been a widespread debate amongst evaluation practitioners and in academic circles on the definition of impact evaluation.  Many programme-implementing agencies assume that impact can be measured by merely comparing baseline measures with post-project measures for the target population.  Typically, there is no evaluation of the outcomes against a comparison group.  There is an erroneous assumption that the observed changes result from the project intervention.

This essay adopts the definition proposed by the International Initiative for Impact Evaluation (3ie): impact evaluation is the “analysis that measures the net change in outcomes for a particular group of people that can be attributed to a specific programme using the best methodology available, feasible and appropriate to the evaluation question that is being investigated and to the specific context.”  In this context, “impact” refers to “net project impact”: the “total observed change” minus the “change attributed to other factors not related to the project”.
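To make the arithmetic of this definition concrete, suppose pre- and post-intervention measures of an outcome Y exist for both the project (treatment) group T and a comparison group C.  The net project impact can then be written as the familiar double difference (a worked illustration consistent with, but not part of, the 3ie definition):

\[
\Delta_{\text{net}} \;=\; \underbrace{\left(\bar{Y}^{T}_{\text{post}} - \bar{Y}^{T}_{\text{pre}}\right)}_{\text{total observed change}} \;-\; \underbrace{\left(\bar{Y}^{C}_{\text{post}} - \bar{Y}^{C}_{\text{pre}}\right)}_{\text{change attributed to other factors}}
\]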

Causal attribution is necessary to ascertain the effectiveness of interventions.  Yet establishing causation is one of the most challenging issues in evaluation.  Whether the observed outcome was due to the intervention or to other factors has profound implications when deciding whether a project should be replicated elsewhere or implemented at a larger scale.  Understanding the true magnitude and direction of project impact has advantages, including cost savings, more effective allocation of resources, and the avoidance of investments in projects that will not produce the expected benefits.

Programme evaluators face real challenges in designing rigorous impact evaluations for development projects and programmes.  In their book, RealWorld Evaluation: Working Under Budget, Time, Data, and Political Constraints, Bamberger et al. (2012) present two sets of real-world evaluation scenarios.  In the first scenario, the evaluation faces budget, time and data constraints, and the programme design and delivery mechanisms further limit the range of available evaluation designs.  For example, it may not be possible to include a control group or to conduct a comprehensive baseline study of the project population, or there may be limits on the number of interviews that can be conducted.  Under the second scenario, the project is nearing completion or has ended before the evaluation is commissioned: no baseline data were collected on a comparison group, and often not on the project group either; secondary data are lacking or of low quality; there are time pressures to complete the report and budget constraints; and there are political constraints on the evaluation methodology, along with pressure to ensure “objective but positive findings”.

Under these scenarios, the evaluator has to determine whether it is possible to conduct a quality impact evaluation given the constraints, and how to select the most robust design within those limitations.  There are specific strategies evaluators can use to address these challenges while aiming for the maximum possible evaluation rigour within a given context, including approaches for reconstructing baseline data and comparison groups that do not involve random assignment, through quasi-experimental designs.

Reconstructing Baseline Data Gaps and Control Groups

Proposed Solution

Obtaining a credible estimate of the counterfactual is one of the significant challenges in evaluating any intervention or programme.  What change would have occurred in the relevant condition of the target population if there had been no intervention?  Without a credible answer to this question, it is not possible to determine whether the intervention actually influenced the outcomes for the programme beneficiaries or is merely associated with successes (or failures) that would have occurred regardless.

The gold standard for estimating the counterfactual is the randomised controlled trial (RCT), in which eligible participants are randomly assigned to treatment and control groups.  However, RCTs are not always practical or possible in many real-world evaluations.  One alternative way to define the counterfactual is therefore a quasi-experimental, pre-test and post-test non-equivalent control group design with statistical matching of the two groups.  This design can help the evaluator answer the impact question: how do we know whether the observed changes in project participants are due to the implementation of the project or to other, unrelated factors?  In this design, participants are either self-selected or selected by the project implementing agency.  Statistical techniques (such as propensity score matching), drawing on high-quality secondary data, can be used to match the two groups on several relevant variables.
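As a minimal sketch of how this design produces an impact estimate once the two groups are in hand (the dataset, column names, and numbers below are entirely hypothetical), the pre-test/post-test comparison reduces to the double difference shown earlier:

```python
import pandas as pd

# Hypothetical survey data: one row per household, with a pre- and
# post-intervention outcome measure and a treatment indicator.
df = pd.DataFrame({
    "household_id": [1, 2, 3, 4, 5, 6],
    "treated":      [1, 1, 1, 0, 0, 0],   # 1 = project participant
    "income_pre":   [100, 120, 90, 110, 95, 105],
    "income_post":  [140, 160, 125, 120, 100, 115],
})

treated = df[df["treated"] == 1]
comparison = df[df["treated"] == 0]

# Total observed change in the project group ...
change_treated = (treated["income_post"] - treated["income_pre"]).mean()

# ... minus the change attributable to other factors, proxied by the
# matched comparison group, gives the "net project impact".
change_comparison = (comparison["income_post"] - comparison["income_pre"]).mean()

net_impact = change_treated - change_comparison
print(f"Net project impact estimate: {net_impact:.1f}")
```

The estimate is only as credible as the matching: the comparison group must resemble the project group closely enough that its change over time stands in for what would have happened to participants without the project.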

Rationale for the Solution

Is it possible to assess impact without randomisation?  The ‘matched-comparison group design’ allows the evaluator to make causal claims about the impact of aspects of an intervention without having to randomly assign participants as in an RCT.  Where projects and programmes fail to collect the necessary baseline data or to establish a control group, a matched-comparison group design can be considered a “rigorous design” that allows evaluators to estimate the size of the impact of a programme, policy, or intervention.  It is also one effective way to get a sense of the variation in participation and thus an understanding of the counterfactual.

Implementation of the solutions – The ‘How’

Is it possible to reconstruct the baseline data and control group?  It may be possible to generate baseline conditions where the programme failed to collect baseline data at the start of the intervention.  Several qualitative techniques are available for reconstructing the baseline conditions that existed at the time the project began.  These techniques include identification and effective use of documentary (secondary) data sources, analysis of administrative records, interviews with key informants, use of participatory group interview techniques (such as Participatory Rural Appraisal) to recreate historical data and timelines, and the use of recall (see Bamberger, Rugh and Mabry 2006).

While these approaches can yield valuable information, evaluators must exercise caution in how they apply them.  For instance, recall varies in reliability and is subject to memory distortion; secondary data may be difficult to use, incomplete, or unreliable; and key informants may distort the past.

How can control groups be recreated?  Impact evaluation requires collecting data from both those affected by the intervention (the treatment group) and a similar untreated group (the comparison or control group).  As noted above, reconstructing a matched control group, similar in all relevant respects to the treatment group except that it did not receive the treatment, reveals the variation in outcomes between those benefitting from the policy or programme and those who did not, and thus provides a sense of the counterfactual.

There are a few ways to reconstruct comparison groups: randomisation; judgmental matching of individuals, households, or communities; pipeline comparisons (when project services are introduced in phases, beneficiaries entering in later phases can serve as a “pipeline” comparison group); and internal controls, where different subjects receive different combinations and levels of services.

Challenges involved in implementing the solutions

Reconstructing baselines and comparison groups is not without challenges.  Sample selection bias arises when individuals’ participation in the programme is related to unmeasured characteristics that are themselves related to the programme outcome under study.  Project areas are often selected purposively and are therefore challenging to match.  Where initial differences exist between project and comparison groups, it is difficult to determine whether results were due to project interventions or to those initial differences.  In many instances, the programme may lack adequate data to select comparison groups.  Contamination is a problem where people move into and out of control areas; for instance, a regionally targeted employment or palliative programme might induce migration from a neighbouring control region, compromising the integrity of the controls.  Finally, statistical methods cannot fully adjust for initial differences between the groups on unobservable characteristics.  Appropriate statistical methods, including quasi-experimental controls, can nonetheless, when successfully applied, address many of the problems mentioned above.

How to work around the solution(s)

Propensity score matching (PSM) is a statistical method that can be used to strengthen the construction of comparison groups.  In PSM, an individual is not matched on every single observable characteristic but on their propensity score; that is, the predicted likelihood that the individual will participate in the intervention given their observable characteristics.  PSM matches treatment individuals or households with similar comparison individuals or households, and subsequently calculates the average difference in the indicators of interest.
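A minimal sketch of PSM in Python follows, using scikit-learn on simulated data.  The covariates, participation model, and effect size are all hypothetical; a real application would add balance diagnostics, common-support checks, and proper standard errors.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)
n = 500

# Hypothetical observable characteristics for surveyed households.
df = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "household_size": rng.integers(1, 10, n),
    "baseline_assets": rng.normal(50.0, 15.0, n),
})

# Participation is self-selected and depends on observables (simulated).
logit = 0.03 * df["age"] + 0.02 * df["baseline_assets"] - 2.5
df["treated"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Simulated outcome with a true programme effect of 10 units.
df["outcome"] = 0.5 * df["baseline_assets"] + 10 * df["treated"] + rng.normal(0, 5, n)

covariates = ["age", "household_size", "baseline_assets"]

# Step 1: estimate the propensity score -- the predicted likelihood
# of participation given observable characteristics.
ps_model = LogisticRegression(max_iter=1000).fit(df[covariates], df["treated"])
df["pscore"] = ps_model.predict_proba(df[covariates])[:, 1]

treated = df[df["treated"] == 1]
comparison = df[df["treated"] == 0]

# Step 2: match each treated household to the nearest comparison
# household on the propensity score (1-to-1, with replacement).
nn = NearestNeighbors(n_neighbors=1).fit(comparison[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched = comparison.iloc[idx.ravel()]["outcome"].to_numpy()

# Step 3: average difference in the indicator of interest across
# matched pairs -- the average treatment effect on the treated (ATT).
att = (treated["outcome"].to_numpy() - matched).mean()
print(f"Estimated programme impact (ATT): {att:.2f}")
```

Matching with replacement keeps every treated unit in the sample but reuses some comparison households; the resulting average difference estimates the effect on those who actually participated.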

Characteristics of project and comparison groups can be compared using rapid assessment methods, including observation, key informants, focus groups, secondary data, aerial photos, and GIS data.  Combining the strengths of quantitative and qualitative approaches (enhanced by triangulation) can strengthen the validity and reliability of evaluation findings.

Concluding Thoughts

This essay has briefly examined two distinct but interrelated challenges evaluators face in the real world when evaluating the effectiveness of programme and policy interventions: the lack of baseline data and of a control group.  Strategies to address these data constraints were proposed.  The essay also highlighted the additional challenges that arise when reconstructing baselines and comparison groups, and ways to work around them.

In conclusion, despite the challenges of designing a rigorous impact evaluation discussed in this essay, evaluation consultants in a developing-country context like Nigeria must be prepared, first, to be invited as evaluators late in the project cycle.  Development consultants must be ready to work under data constraints, including limited access to comparative baseline data and the absence of a feasible comparison group.  Evaluators must be equipped to operate under budget and time restrictions without compromising methodological rigour.  Finally, consultants must be prepared to reconcile the different evaluation paradigms and information needs of various stakeholders.

References


[1] See for instance Michael Bamberger and Howard White (2007). Using Strong Evaluation Designs in Developing Countries: Experience and Challenges. Journal of Multidisciplinary Evaluation, 4(8).

[2] International Initiative for Impact Evaluation (3ie). (2011). 3ie Impact Evaluation Practice: A Guide for Grantees. http://www.3ieimpact.org/strategy/pdfs/3ie%20impact%20evaluation%20practice.pdf

[3] See for instance, Howard White, Shagun Sabarwal and Thomas de Hoop (2014), Randomized Controlled Trials (RCTs), UNICEF. Retrieved from: http://devinfolive.info/impact_evaluation/img/downloads/Randomized_Controlled_Trials_ENG.pdf

Written by Nextier