Rhetica

Safety

Safety and moderation

Rhetica simulates challenging dialogue, so moderation has to keep the training useful without letting it drift into harmful coaching.

Moderation

How the app stays in bounds

The app can simulate manipulative rhetoric so users can practice recognizing it. It should not help users carry out harm.

  • User input and model output are screened before they appear in the app.
  • Requests involving harassment, coercion, impersonation, or exploitation should be blocked or redirected.
  • High-stakes guidance stays out of scope even when it is framed as a debate prompt.
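
As a rough illustration of the screening described above, the sketch below checks both the user's message and the model's reply before anything is shown. It is a minimal sketch under stated assumptions, not Rhetica's actual implementation: the names screen, moderated_turn, classify, and generate are hypothetical, the classifier is passed in as a plain function, and the choice to block the four listed categories while redirecting high-stakes guidance is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, FrozenSet, Set


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REDIRECT = "redirect"


# Category names come from the list above. Which ones block and which
# redirect is an assumption made for this sketch.
BLOCK_CATEGORIES = {"harassment", "coercion", "impersonation", "exploitation"}
REDIRECT_CATEGORIES = {"high_stakes_guidance"}

# Placeholder wording, not the app's real copy.
BLOCK_MESSAGE = "This request is outside what the app can help with."
REDIRECT_MESSAGE = "This topic is out of scope here. Try a recognition exercise instead."


@dataclass(frozen=True)
class Decision:
    action: Action
    categories: FrozenSet[str]


def screen(text: str, classify: Callable[[str], Set[str]]) -> Decision:
    """Map the categories a (hypothetical) classifier finds to one action."""
    found = frozenset(classify(text))
    if found & BLOCK_CATEGORIES:
        return Decision(Action.BLOCK, found)
    if found & REDIRECT_CATEGORIES:
        return Decision(Action.REDIRECT, found)
    return Decision(Action.ALLOW, found)


def moderated_turn(
    user_text: str,
    generate: Callable[[str], str],
    classify: Callable[[str], Set[str]],
) -> str:
    """Screen the input, then the output, and return the text to display."""
    on_input = screen(user_text, classify)
    if on_input.action is Action.BLOCK:
        return BLOCK_MESSAGE
    if on_input.action is Action.REDIRECT:
        return REDIRECT_MESSAGE

    reply = generate(user_text)  # the model is only called on allowed input

    # Output is screened too, so a flagged reply never reaches the user.
    if screen(reply, classify).action is not Action.ALLOW:
        return BLOCK_MESSAGE
    return reply
```

Passing classify and generate in as functions keeps the policy (which categories map to which action) separate from whatever classifier or model a deployment actually uses.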

Escalation

What to do when a session goes wrong

Users need a clear stop path, and deployments need to know who handles incidents next.

  • Users should be able to pause or leave sparring at any point.
  • High-risk content should be logged for safety review, with access restricted to designated reviewers.
  • School deployments should define incident owners and response times before launch.
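
One way a school deployment might check the last point before launch is sketched below. This is a hedged example only: IncidentRoute, launch_blockers, the "high" severity label, and the owner and time values are all hypothetical and are not part of Rhetica.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class IncidentRoute:
    owner: str                 # person or role who handles the incident
    respond_within: timedelta  # agreed response time


def launch_blockers(
    routes: Dict[str, IncidentRoute],
    required_severities: Sequence[str] = ("high",),  # assumed severity labels
) -> List[str]:
    """Return the problems that should be fixed before launch (empty if none)."""
    problems = []
    for severity in required_severities:
        route = routes.get(severity)
        if route is None:
            problems.append(f"no route for severity '{severity}'")
            continue
        if not route.owner.strip():
            problems.append(f"no owner for severity '{severity}'")
        if route.respond_within <= timedelta(0):
            problems.append(f"no response time for severity '{severity}'")
    return problems


# Example values are placeholders, not recommendations.
routes = {"high": IncidentRoute("safeguarding-lead", timedelta(hours=4))}
ready = not launch_blockers(routes)
```

Returning a list of problems rather than a single pass or fail result lets a deployment see everything that is missing in one check.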