Initiatives Framework for Analysis
- About the Framework
- 1: Understand the Initiative Context and Details
- 2: Review Overall Persistence Lift
- 3: Validate Matching Details
- 4: Review Data Details
- 5: Export Raw Data & Further Validate the Results
- 6: Review Impact by Student Group
- 7: Review Impact by Term
- 8: Discuss Results with Key Stakeholders
- 9: Identify Opportunities for Related Impact Analyses
About the Framework
This analysis framework helps you explore and interpret your results in Initiative Analysis. It also provides recommendations for additional analysis to more deeply understand how your initiatives affect student success.
These are your keys to success:
- Understand the context for the initiatives
- Ask questions
- Examine in detail both the persistence impact results and the key metrics
Why analyze? Careful analysis reveals which initiatives have the greatest impact on persistence for different types of students. Knowing this lets you allocate resources cost-effectively to increase student success, connecting students to the programs now known to give them the best chance of persisting.
The framework consists of nine steps detailed here:
- Understand context and details
- Review overall persistence lift
- Validate matching details
- Review data details
- Export raw data and further validate results
- Review impact by Student Group
- Review impact by Term
- Discuss results with key stakeholders
- Identify opportunities for related impact analyses
Once you have defined a research question, designed your initiative, prepared the data for upload into Initiative Analysis, validated the data, and run it through Initiative Analysis, you can review each impact analysis in detail using the framework described below.
1: Understand the Initiative Context and Details
When evaluating the results of an impact analysis, be sure to review the initiative design, and ask key questions that will provide you with the important context needed to interpret the findings. You can read the descriptions on the initiative detail page or meet with the author of the initiative for additional context.
Questions to Explore:
- What is the research question that the analysis is answering? What is the intervention that is being tested?
- Was the initiative expected to impact persistence? If persistence was not the primary goal, was it a secondary or tertiary goal?
- Were the eligibility and participation requirements clearly defined?
- Is this a program-level initiative (i.e., a program offered to students throughout the term)?
- Was participation or treatment relatively consistent throughout the term?
- What is the treatment and how is it different from business-as-usual practice?
As you review the initiative details, consider any potential confounding factors based on the initiative analysis design that could affect the interpretation of impact results.
Potential confounding factors that may have an effect on the results include:
- Other initiatives for the same student group during the same term
- Changes in initiatives for the student group at the same time as this initiative
- Programmatic changes over time, such as changes in program leadership or structure, or a shift in whether participation is mandatory
- Participant group and comparison group data drawn from different terms
2: Review Overall Persistence Lift
Next, look at the overall persistence results and do a “gut check.”
Questions to Explore:
- Are the results intuitive?
- Do the results seem too good to be true?
If the results seem unintuitive or too good to be true, then explore the following:
- Was the definition of initiative participation an actual treatment, i.e. an action administered to a group of students with the intent of increasing their likelihood to persist?
- What was the sample size? If the sample size is lower than 1,000:
- Check the definition of the eligible comparison group and the participant group. Is the eligible comparison group inherently different from the participant group in any significant ways that could make it difficult to find matches of similar students? (e.g., one group was Pell eligible and one was not)
- Check the Data Details to see if your eligible comparison group is smaller than the participant population. Is this expected? Why or why not?
- Does the lift seem unintuitive? For example, is the persistence lift “too good to be true”? Or, on the other hand, is it unexpectedly negative? (Hint: Is there more than a +/- 10% lift?) If so:
- How was participation or treatment defined? Was initiative participation an actual treatment?
- What are potential confounding factors that may be affecting the result?
- Do the p-value and confidence interval provide any additional clues?
- What is the p-value?
- Is your p-value less than 0.05?
- If yes, this is good! It is likely the initiative you are measuring caused the lift in persistence. This is more likely to occur when the matched participant group is greater than 1,000 and the match rate is high.
- If not, this means the analysis produced results that are not statistically significant.
- Was the sample size large enough?
- How does the lift in persistence compare to the estimated minimum detectable effect size? (Hint: Check the Data Details page for the estimated minimum detectable effect size. This is the same estimate given during data validation when first submitting the data for analysis.)
- Was the initiative expected to specifically impact persistence?
- What are potential confounding factors affecting the result?
- What is the confidence interval? If the confidence interval is wider than the lift, then could your sample size be too small?
If Results Are Not Statistically Significant
There may be times when Initiative Analysis results come back as not statistically significant. This can be a good indication that the sample size N was not large enough to confidently attribute the outcomes to the initiative itself. Check the number of analyzed participants and the minimum detectable effect size to confirm whether the N was too small. If so, consider ways to reach a larger N in future implementations of the initiative design, or ask a different research question with different eligibility and participation criteria that could give you a larger sample size to work with.
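If you want a rough sense of how small a lift your sample could realistically detect, a standard two-proportion power approximation is a useful back-of-the-envelope check. The sketch below (Python) is a generic calculation under assumed values for alpha and power; it is not necessarily the formula Initiative Analysis uses to produce its own minimum detectable effect size estimate.

```python
from statistics import NormalDist
import math

def approx_mde(n_per_group, baseline_rate, alpha=0.05, power=0.80):
    """Approximate minimum detectable lift in persistence rate for a
    two-proportion comparison with equal group sizes. A generic power
    approximation, not necessarily the platform's own estimate."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for a two-sided test
    z_power = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    standard_error = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_group)
    return (z_alpha + z_power) * standard_error

# Example: 400 matched participants per group and a 75% baseline persistence rate
print(f"Approximate minimum detectable lift: {approx_mde(400, 0.75):.1%}")
```

If the observed lift is well below this value, a non-significant result may simply reflect an underpowered sample rather than an ineffective initiative.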
Another common cause of results being not statistically significant is that the initiative being analyzed was not purposefully designed to impact persistence rates. In this case, it may not be surprising that a statistically significant persistence impact was not observed.
P-Value and 95% Confidence Interval
Initiative Analysis displays p-values and 95% confidence intervals to help users interpret results and make decisions based on the analysis. The p-value in Initiative Analysis represents the statistical significance of the average lift in persistence measured across multiple bootstrap samples, whereas the 95% confidence interval represents the plausible range of the true effect based on the average results from those bootstrap samples.
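To make the bootstrap idea concrete, here is a minimal sketch of how an average lift, a 95% confidence interval, and a two-sided p-value can be derived from resampled persistence outcomes (1 = persisted, 0 = did not persist). It uses simple resampling of toy data and is illustrative only; Initiative Analysis's exact bootstrap procedure and matching structure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_lift(participant_outcomes, comparison_outcomes, n_boot=2000):
    """Bootstrap the persistence lift (participant rate minus comparison rate).
    Illustrative only; the platform's exact procedure may differ."""
    p = np.asarray(participant_outcomes)
    c = np.asarray(comparison_outcomes)
    lifts = np.empty(n_boot)
    for i in range(n_boot):
        lifts[i] = (rng.choice(p, size=p.size, replace=True).mean()
                    - rng.choice(c, size=c.size, replace=True).mean())
    ci_low, ci_high = np.percentile(lifts, [2.5, 97.5])
    # Crude two-sided p-value based on how often the resampled lift crosses zero
    p_value = 2 * min((lifts <= 0).mean(), (lifts >= 0).mean())
    return lifts.mean(), (ci_low, ci_high), p_value

# Toy outcomes: 1 = persisted, 0 = did not persist
participants = rng.binomial(1, 0.80, size=1200)
comparisons = rng.binomial(1, 0.75, size=1200)
lift, (lo, hi), p = bootstrap_lift(participants, comparisons)
print(f"Lift: {lift:+.1%}, 95% CI: ({lo:+.1%}, {hi:+.1%}), p-value: {p:.3f}")
```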
Make sure to use all data provided by Initiative Analysis, including the Matching Details page and Export Raw Data details to further evaluate the results.
3: Validate Matching Details
In addition to checking the N, p-value, and confidence intervals, you can also verify results by checking the Matching Details tab of completed initiatives.
Under Matching Details, you will see the percentage of the participant IDs you submitted that were matched via prediction-based propensity score matching (PPSM). We typically recommend at least a 70% match rate to consider an analysis valid and less likely to be biased.
If your overall match rate is less than 70%, check the Data Details to determine if there are at least enough eligible comparison students to match with participants.
- Hint: If 1,000 participants were submitted but only 100 eligible comparison students were submitted, then the highest number of matches that could be identified by PPSM would be 100, which means that at least 90% of the participants would be left out of the analysis, which would result in a 10% match rate at best.
- Recommendation: If the match rate is low, consider including data from additional historic terms (up to 4 years) to increase the sample size for the analysis.
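The hint above is simple arithmetic; the short sketch below, which assumes 1:1 matching of participants to comparison students, makes the upper bound on the match rate explicit.

```python
# Assuming 1:1 matching, PPSM cannot match more participants than there
# are eligible comparison students.
submitted_participants = 1000
eligible_comparison_students = 100

best_case_matches = min(submitted_participants, eligible_comparison_students)
best_case_match_rate = best_case_matches / submitted_participants
print(f"Best possible match rate: {best_case_match_rate:.0%}")  # 10%
```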
After reviewing the Overall Match rate, examine the two metrics used in PPSM: Persistence Prediction and Propensity Score. If Initiative Analysis’s PPSM did a good job of identifying comparable students, the red and blue lines will overlap in the “After Matching” chart. The “After Matching” rate, or Similarity, for both charts should be 85% or greater.
This confirms that students used for analysis were similar in both their likelihood to succeed or persist and their likelihood to participate in the initiative. Below are examples of distributions that you should expect to see on each chart.
Persistence Prediction Matching Details (Likelihood to Persist)
- “Before Matching” shows some selection bias: students in the participant group (represented by the blue line) tend to be more likely to persist.
- “After Matching” distributions are similar for both the matched participants and the matched comparison students, indicating the selection bias was removed.
Propensity Score Matching Details (Likelihood to Participate)
- “Before Matching” shows a higher number of students in the comparison group who are not likely to participate in tutoring.
- “After Matching” distributions control for this selection bias, as shown by the overlapping lines.
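If you want to approximate this overlap check on your own data, the sketch below fits a logistic regression as a stand-in propensity model and computes the overlapping area of the participant and comparison score distributions. The data frame, feature names, and overlap metric are illustrative assumptions; this is not the PPSM implementation or the Similarity metric that Initiative Analysis reports.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical student-level data: census features plus a participation flag.
rng = np.random.default_rng(1)
n = 2000
students = pd.DataFrame({
    "gpa": rng.normal(3.0, 0.5, n).clip(0, 4),
    "credits_attempted": rng.integers(6, 18, n),
    "participated": rng.binomial(1, 0.3, n),
})

# Propensity score: predicted likelihood to participate given census features.
features = students[["gpa", "credits_attempted"]]
model = LogisticRegression().fit(features, students["participated"])
students["propensity"] = model.predict_proba(features)[:, 1]

# Crude overlap check: overlapping area of the two propensity-score
# distributions (values near 1.0 mean the groups look alike on these features).
bins = np.linspace(0, 1, 21)
p_hist, _ = np.histogram(students.loc[students.participated == 1, "propensity"],
                         bins=bins, density=True)
c_hist, _ = np.histogram(students.loc[students.participated == 0, "propensity"],
                         bins=bins, density=True)
overlap = np.minimum(p_hist, c_hist).sum() * (bins[1] - bins[0])
print(f"Propensity score distribution overlap: {overlap:.0%}")
```

The same calculation can be applied to persistence predictions, and to the matched subsets, to see whether matching raises the overlap toward a high value, analogous to the 85% Similarity guideline above.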
Troubleshooting: Low Match Rate
Low match rates could indicate biased or inaccurate impact analysis results. If the Before Matching distributions of persistence or propensity scores are widely different (see example below), the participant and comparison groups already had vastly different likelihoods of persisting or receiving the intervention prior to the initiative taking place.
If so, consider:
- Were the eligibility and participation criteria for this initiative appropriately defined?
- Could there be significant differences in the data between the two groups?
For example, could GPA or financial aid data be different enough between the groups that it could cause a large difference in persistence or propensity scores?
Check the Data Details to determine if enough eligible comparison students were submitted for analysis.
For example, if 1,000 participants were submitted but only 100 eligible comparison students were submitted, then the highest number of matches that could be identified by PPSM would be 100, which means that at least 90% of the participants would be left out of the analysis, which could skew results.
4: Review Data Details
When validating and interpreting results, check whether eligible comparison students and participants were selected from different terms, otherwise known as “pre-post analysis.” This will only happen if the user who submitted the initiative selected to include additional eligible comparison students from other terms during the data validation portion of the Add Initiative process.
If eligible comparison students could have been matched from other terms, you must be mindful of potential confounding factors from comparing students across different time periods. For example, in the image above, the participant students were selected from Autumn 2017 but the eligible comparison students were selected from different terms: Summer 2016, Autumn 2016, and Autumn 2017.
Check that the eligible students from all time periods used in the analysis had similar persistence rates and feature availability/distributions to confirm no additional confounding factors will be introduced in pre-post analysis.
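One way to run this check outside the platform is to group your submitted eligible students by term and compare persistence rates and feature availability. The column names below (term, persisted, gpa) are hypothetical stand-ins for whatever fields your own data file contains.

```python
import pandas as pd

# Hypothetical eligible-student file; column names are stand-ins for your own.
eligible = pd.DataFrame({
    "term":      ["Summer 2016", "Autumn 2016", "Autumn 2016", "Autumn 2017", "Autumn 2017"],
    "persisted": [0, 1, 0, 1, 1],
    "gpa":       [3.0, 3.2, 2.8, None, 3.5],
})

# Persistence rate by term: large swings across terms suggest pre-post confounding.
print(eligible.groupby("term")["persisted"].mean())

# Feature availability by term: a feature that is missing in some terms but
# present in others can also bias matching.
print(eligible.groupby("term")["gpa"].apply(lambda s: s.notna().mean()))
```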
Questions to Explore:
- Are the terms used in pre-post matching valid for analysis?
- Recommendation: If pre-post matching is necessary and viable, students from similar seasonal historic terms should be used to minimize confounding factors (students from the same seasonal terms typically have more similarities).
- Example: Match students from Fall 2017 to Fall 2016 terms.
- Are the students truly comparable across the terms used for comparison, with no significant differences in student populations, institutional environment, or initiatives in place?
- Example: If you are examining an initiative for your First Time in College (FTIC) population, did the entrance requirements for FTIC students change significantly among the terms you are examining? Are the GPAs of current incoming students much higher, on average, than those of historical FTIC students?
- Are the availability and representation of student data features consistent throughout the full time period used for comparison?
- What are other potential confounding factors affecting the program, terms, or student list?
- Is the ratio of eligible comparison students to participants large enough to make a high match rate likely?
5: Export Raw Data & Further Validate the Results
Verify that the comparison group calibration (predicted vs. actual persistence rates) makes sense.
- On the Initiative Details page, select Export Raw Data.
- Check Calibration: Is the comparison group’s predicted persistence rate similar to the comparison group’s actual persistence rate?
To determine the calibration, calculate the difference between the Comparison Group Outcome (Predicted) values and the Comparison Group Outcome (Actual) values (columns G & I below).
Important: These numbers should be similar. For N > 500, ensure that the calibration error is less than 3%. See the troubleshooting guidance below.
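A minimal sketch of this calibration check, assuming the export can be read as a CSV and that the column headers match the field names above (adjust both the file name and headers to your actual Export Raw Data file):

```python
import pandas as pd

# File name is a placeholder; the column labels follow the export fields
# named above (adjust them to the exact headers in your file).
raw = pd.read_csv("initiative_raw_export.csv")

predicted = raw["Comparison Group Outcome (Predicted)"].mean()
actual = raw["Comparison Group Outcome (Actual)"].mean()
calibration_error = abs(predicted - actual)

print(f"Predicted comparison persistence: {predicted:.1%}")
print(f"Actual comparison persistence:    {actual:.1%}")
print(f"Calibration error:                {calibration_error:.1%}")
# Rule of thumb from above: for N > 500, the calibration error should be < 3%.
```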
Troubleshooting: Questionable Calibration
If there is a significant difference between the comparison group’s predicted and actual persistence rates (i.e., the calibration errors are above 3%), verify the following:
- The number of analyzed participants is > 1,000.
- The initiative data file submitted to Initiative Analysis specifies the correct eligibility criteria.
- The intervention being measured is not a necessary criterion for success/persistence. For example, if students who didn’t “participate” in the initiative (e.g., didn’t see an advisor) couldn’t register for the next term, then all eligible comparison students are expected to have much lower persistence rates.
- The intervention is programmatic, meaning that the intervention occurred throughout the term. Initiative Analysis uses data from early in the term, at census, for matching. If the intervention began late in the term, persistence predictions could have changed significantly from what they were at census.
- Pre-post matching was not used. Pre-post matching means the eligible comparison students were not from the same term as the participants. This scenario would only occur if you selected this option during the initiative submission and data validation process.
In general, be cautious of any results that have a low N, low match rate, or large calibration errors.
6: Review Impact by Student Group
After completing the data checks described above, review the impact by student group. These results provide insight into:
- which types of students benefit most from the initiative,
- whether the target student group was impacted by the initiative, and
- where there are potential opportunities for further program optimization.
A short sketch at the end of this step shows one way to summarize lift by student group from the raw data export.
Questions to Explore:
Use the Impact by Student Group view in your instance to review the following questions:
- To what extent was this initiative intended to impact persistence?
- Who were the intended student group(s) for this initiative? Were the groups with the highest lift among those intended for this initiative?
- Which student groups had the greatest positive change?
- What were the results for the student groups with the lowest (Bottom Quartile) and highest persistence predictions (Top Quartile)?
- How could the institution encourage more students likely to benefit from the initiative to participate in it? What opportunities are there to engage more students at the college or university who could benefit from this intervention?
- How could the institution help intended students who did not benefit from the initiative?
- What changes could be made to the initiative for that particular student group?
- What other support strategies might you consider?
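As referenced at the start of this step, lift by student group can also be summarized directly from the raw data export. The sketch below assumes one row per matched participant and uses hypothetical column headers (Student Group, Participant Outcome (Actual), Comparison Group Outcome (Actual)); substitute the actual headers and structure of your own export.

```python
import pandas as pd

# File name and column headers are illustrative; use the headers from your
# own Export Raw Data file.
matched = pd.read_csv("initiative_raw_export.csv")

by_group = matched.groupby("Student Group").agg(
    matched_participants=("Participant Outcome (Actual)", "size"),
    participant_rate=("Participant Outcome (Actual)", "mean"),
    comparison_rate=("Comparison Group Outcome (Actual)", "mean"),
)
by_group["lift"] = by_group["participant_rate"] - by_group["comparison_rate"]
print(by_group.sort_values("lift", ascending=False))
```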
7: Review Impact by Term
Scroll down to Impact by Term to review the term-based results to identify trends over time.
Questions to Explore:
- Is the impact on persistence consistent over terms?
- What notable differences are there between the same term over multiple years or between different terms?
- What could cause these differences?
8: Discuss Results with Key Stakeholders
Once you’ve confirmed the impact results and identified key findings, share and discuss them with other stakeholders and departments. This ensures the results are represented accurately and encourages actions and decisions to be grounded in this quantitative analysis.
Best practice: Use the Questions to Explore in steps 6 and 7 as discussion prompts with key stakeholders.
9: Identify Opportunities for Related Impact Analyses
Finally, engage in discussions about how these initiatives could be improved to collect better data for measurement or to expand their reach to students who benefit the most.
Questions to Explore:
- What other research questions about this program could be measured in Initiative Analysis? For example, you could examine a dosage effect: what is the impact of attending more than once per term?
- “Does attending tutoring at least 3 times have a greater impact than just once?”
- “Does attending math tutoring AND supplemental instruction have a greater impact than just going to supplemental instruction?”
- What are different ways to define participation?
- Did the expected target population participate in this initiative?
- Was increased persistence the primary, secondary, or tertiary outcome objective for the initiative?
- Were there other outcome objectives?
- What other measures of success are there for this initiative?
- What other data needs to be collected?
- How can the results be used to conduct focused outreach to the student groups benefiting most from this program, or to adjust the program itself for potentially greater impact?