Harmful Response Prevention GuardRail (Beta)
Harmful Response Prevention is WitnessAI’s response analysis and control GuardRail. It analyzes the responses that AI models return to user prompts, detects harmful content, and can optionally prevent those responses from reaching the user. Responses are evaluated in three broad categories: harm to self, harm to others, and illegal activity. When this GuardRail detects a harmful response, it can Allow, Warn, or Block the response from the AI model, with a customizable message shown to the user.
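The decision flow is easier to see as a sketch. The Python below is illustrative only: the `classify_response` function is a hypothetical stand-in for WitnessAI’s internal response analysis (which is not a public API), while the three categories and the Allow/Warn/Block actions come directly from the description above.

```python
from enum import Enum

class Category(Enum):
    HARM_TO_SELF = "harm to self"
    HARM_TO_OTHERS = "harm to others"
    ILLEGAL_ACTIVITY = "illegal activity"
    NONE = "none"

class Action(Enum):
    ALLOW = "Allow"
    WARN = "Warn"
    BLOCK = "Block"

def classify_response(text: str) -> Category:
    # Hypothetical stand-in for the GuardRail's model-based detection;
    # a real classifier is far more sophisticated than a keyword check.
    lowered = text.lower()
    if "harm yourself" in lowered:
        return Category.HARM_TO_SELF
    return Category.NONE

def apply_guardrail(response: str, action: Action, message: str) -> str:
    """Return what the user sees, per the configured action."""
    if classify_response(response) is Category.NONE:
        return response                    # nothing detected: pass through
    if action is Action.ALLOW:
        return response                    # deliver unchanged
    if action is Action.WARN:
        return f"{message}\n\n{response}"  # prepend the warning to the response
    return message                         # Block: replace the response entirely
```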
Use Cases
Harm to self or others
Block or warn Users about dangerous AI responses, such as responses that may contain unqualified or inaccurate medical or financial information.
Illegal activity
Block or warn Users about dangerous AI responses, particularly responses describing common activities that may expose individuals or the Organization to legal risk.
Using Harmful Response Prevention Step-by-Step
This section describes the Harmful Response Prevention GuardRail and how to add it to a Policy. Refer to the Policy Creation page for instructions on creating Policies.
Add a Harmful Response Prevention GuardRail
After creating a Policy, follow these steps; a sketch of the resulting settings appears after the list.
- Click the Guard Rails tab in the policy editor.
- Select the Harmful Response Prevention GuardRail from the available options.
- Click the slide button to enable the GuardRail.
- Click the drop-down for the ‘Action’ field and choose Allow, Warn, or Block.
- Optionally click the drop-down for the ‘Message’ field and choose the message to show the user when a Harmful Response is detected. A custom message can be added if desired.
- Save the configuration.
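Conceptually, the steps above capture three settings. The structure below is a hypothetical illustration only; WitnessAI does not publish a schema for Policy configurations, so the key names are assumptions.

```python
# Hypothetical illustration of the GuardRail settings captured by the
# steps above; the key names are assumptions, not a documented schema.
harmful_response_prevention = {
    "enabled": True,    # the slide button that turns the GuardRail on
    "action": "Block",  # the 'Action' field: Allow, Warn, or Block
    # the 'Message' field: shown to the user when a harmful response is detected
    "message": "This response was blocked because it may contain harmful content.",
}
```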