In addition to rooting out unintended consequences and gathering direct user feedback in this step, you may also wish, as you roll out your AI coach, to return to steps 2 and 3 to gauge whether you are achieving the goals you set out to attain and mitigating the potential risks and harms you originally envisioned. As noted earlier, this process is meant to be cyclical rather than linear, allowing you not only to plan but also to refine and improve your AI coach over time.
Substeps
- Develop tests and monitoring protocols for prevalent tasks to detect unexpected errors, risks, or outcomes.
- Collect direct feedback from users.
a) Develop tests and monitoring protocols for prevalent tasks to detect unexpected errors, risks, or outcomes.
Even tasks that AI routinely performs well can occasionally produce unexpected errors or outcomes. Regular monitoring helps catch these rare but potentially significant issues and ensures consistent quality and safety in AI coaching. Monitor the AI coach's performance continuously, retain the option for human intervention when needed, and educate users on when it is appropriate to seek human help.
Examples:
- Track the frequency of outdated information being provided.
- Flag instances where particular groups of students (such as international students, student parents, students in vocational versus traditional programs, or students of different genders) receive incorrect advice.
- Analyze cases where students report the given advice as confusing or unhelpful.
- Monitor how often the AI coach escalates sensitive questions to human coaches.
- Set up alerts for any patterns of inaccuracies or inconsistencies in the AI coach's responses.
- Review and audit interactions involving high-risk topics, such as mental health or academic integrity.
- Establish thresholds for acceptable error rates and set protocols for intervention if those thresholds are exceeded (a minimal monitoring sketch follows this list).
- Ensure that monitoring data is regularly reviewed by human oversight teams to identify trends and areas needing improvement.
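To make the idea of error-rate thresholds and alerts concrete, here is a minimal sketch of a periodic error-rate check, assuming each reviewed interaction is logged with a simple outcome label. The record fields, outcome names, and threshold values are all illustrative assumptions, not features of any particular platform; each team would substitute its own schema and limits.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical record of one reviewed AI-coach interaction.
# Field names and outcome labels are illustrative, not a prescribed schema.
@dataclass
class ReviewedInteraction:
    topic: str          # e.g., "registration", "mental_health"
    student_group: str  # e.g., "international", "vocational"
    outcome: str        # e.g., "ok", "outdated_info", "incorrect_advice", "escalated"

# Placeholder thresholds for acceptable error rates; each team sets its own.
ERROR_RATE_THRESHOLDS = {
    "outdated_info": 0.02,     # alert if >2% of responses cite stale information
    "incorrect_advice": 0.01,  # alert if >1% of responses give wrong advice
}

def error_rate_alerts(interactions: list[ReviewedInteraction]) -> list[str]:
    """Return human-readable alerts for any error type over its threshold."""
    total = len(interactions)
    if total == 0:
        return []
    counts = Counter(i.outcome for i in interactions)
    alerts = []
    for error_type, threshold in ERROR_RATE_THRESHOLDS.items():
        rate = counts[error_type] / total
        if rate > threshold:
            alerts.append(
                f"ALERT: {error_type} rate {rate:.1%} exceeds threshold "
                f"{threshold:.0%}; route to human oversight team"
            )
    return alerts
```

In practice, the outcome labels would come from human review or automated checks, and any alerts would feed directly into the regular human oversight reviews described above.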
Questions to Discuss:
- What prompts or flags do you want to watch for as you evaluate the usage of your AI coach on a regular (daily, weekly, monthly) basis?
- What might you do with that information? Change the AI coach or change your expectations?
b) Collect direct feedback from users.
User feedback provides invaluable insight into the real-world performance and impact of the AI coaching system. It helps identify areas for improvement, uncover unexpected issues, and ensure the system is meeting users' needs; a sketch of one lightweight collection mechanism follows the examples below.
Examples:
- Rating system for individual AI coach responses.
- Post-session surveys on overall experience.
- Option to report concerning or unhelpful AI coach behavior.
- Periodic in-depth user interviews or focus groups.
- Open-ended comment boxes for specific suggestions.
- Feedback prompts after key milestones (e.g., course registration, exam periods).
- Regular check-ins with student representatives or advisory groups.
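As one illustration of a per-response rating system, the sketch below shows how feedback records might be stored and aggregated so that concerning responses are surfaced for human review. The data model, field names, and rating scale are hypothetical assumptions for the sake of the example, not a prescribed design.

```python
from dataclasses import dataclass, field
from datetime import datetime
from statistics import mean

# Hypothetical per-response feedback record; fields are illustrative.
@dataclass
class ResponseFeedback:
    session_id: str
    rating: int                     # e.g., 1 (unhelpful) to 5 (very helpful)
    flagged_concerning: bool = False  # "report concerning behavior" option
    comment: str = ""                 # open-ended comment box
    timestamp: datetime = field(default_factory=datetime.now)

def summarize_feedback(records: list[ResponseFeedback]) -> dict:
    """Aggregate ratings and surface flagged responses for human review."""
    if not records:
        return {"count": 0}
    return {
        "count": len(records),
        "mean_rating": round(mean(r.rating for r in records), 2),
        "flagged_for_review": [r.session_id for r in records if r.flagged_concerning],
        "comments": [r.comment for r in records if r.comment],
    }
```

A summary like this could be generated on whatever cadence the team chooses and paired with the deeper mechanisms listed above, such as interviews and focus groups.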
Questions to Discuss:
- How can you make it easy for users to provide your team with feedback on their experience? During or after every interaction, or periodically?
- What other mechanisms do you have in place to gather feedback from users or user representatives on a deeper level, such as through surveys or focus groups?