Harmful Response Prevention GuardRail (Beta)

Harmful Response Prevention is WitnessAI’s response analysis and control GuardRail. It analyzes the responses that AI models return to user prompts, detects harmful content, and can optionally prevent those responses from being sent to the user. Harmful responses are evaluated in three broad categories: harm to self, harm to others, and illegal activity. When this GuardRail detects such content, it provides the option to Allow, Warn, or Block the response from the AI model, with a customizable message.
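Conceptually, the GuardRail acts as a post-processing step between the AI model and the user. The sketch below illustrates that flow in Python; the function, category labels, and classifier input are hypothetical stand-ins for orientation only, not WitnessAI’s actual implementation.

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"   # pass the response through unchanged
    WARN = "warn"     # show the message alongside the response
    BLOCK = "block"   # replace the response with the message

# Hypothetical labels for the three broad categories described above.
CATEGORIES = {"harm_to_self", "harm_to_others", "illegal_activity"}

def apply_guardrail(response_text: str, detected: set,
                    action: Action, message: str) -> str:
    """Illustrative post-processing of a model response.

    `detected` is the set of harmful categories a classifier flagged
    in the response (empty when nothing was detected).
    """
    if not detected & CATEGORIES:
        return response_text                     # clean response: pass through
    if action is Action.BLOCK:
        return message                           # suppress the harmful response
    if action is Action.WARN:
        return f"{message}\n\n{response_text}"   # warn, but still deliver it
    return response_text                         # Action.ALLOW
```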

Use Cases

Harm to self or others

Block or warn Users about dangerous AI responses, such as responses that may contain unqualified or inaccurate medical or financial information.

Illegal activity

Block or warn Users about dangerous AI responses, particularly those describing activities that may expose individuals or the Organization to legal risk.
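Which action fits depends on the Organization’s risk tolerance: Warn keeps the response available while flagging it, whereas Block suppresses it entirely. The snippet below sketches both choices as the GuardRail might be set in two different Policies; the field names and messages are hypothetical examples, not a WitnessAI API or shipped defaults.

```python
# Hypothetical settings for this GuardRail in two different Policies;
# field names and messages are illustrative, not a WitnessAI data model.
cautious_policy = {
    "guardrail": "harmful_response_prevention",
    "action": "warn",
    "message": "This response may contain unqualified medical or "
               "financial information. Verify it before acting on it.",
}

strict_policy = {
    "guardrail": "harmful_response_prevention",
    "action": "block",
    "message": "This response was blocked: it may describe activity "
               "that exposes you or the Organization to legal risk.",
}
```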
 

Using Harmful Response Prevention Step-by-Step

This section describes the Harmful Response Prevention GuardRail and how to add it to a Policy. Refer to the Policy Creation page for how to create Policies.

Add a Harmful Response Prevention GuardRail

After creating a Policy:
  1. Click the Guard Rails tab in the policy editor.
  2. Select the Harmful Response Prevention GuardRail from the available options.
  3. Click the slide button to enable the GuardRail.
  4. Click the drop-down for the ‘Action’ field and choose Allow, Warn, or Block.
  5. Optionally, click the drop-down for the ‘Message’ field and choose the message shown to the user when a harmful response is detected. A custom message can be added if desired.
  6. Save the configuration. (A sketch of the resulting settings follows below.)
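For orientation, the saved configuration from the steps above amounts to three settings: enabled, action, and message. The sketch below represents them as a plain Python mapping; the field names are hypothetical, since the GuardRail is configured through the policy editor UI rather than a published schema.

```python
# Hypothetical summary of the settings saved in steps 3-6 above;
# the field names are illustrative, not a WitnessAI data model.
harmful_response_prevention = {
    "enabled": True,                  # step 3: slide button turned on
    "action": "block",                # step 4: Allow, Warn, or Block
    "message": "Your request produced a response that violates policy.",
                                      # step 5: built-in or custom message
}
```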