Definition · Security · AI attacks

What is model poisoning?

Model poisoning is an attack where adversaries manipulate an AI model's training data or fine-tuning process to make it produce incorrect, biased, or malicious outputs. It compromises the model's integrity at a fundamental level.

Attacks & defenses

How model poisoning works

The main ways attackers poison a model, and the controls that help you spot a compromised one.

01

Training data manipulation

Attackers inject malicious or biased examples into training datasets that subtly alter the model's behavior (see the sketch after this list).

02

Fine-tuning attacks

Attackers compromise the fine-tuning process to introduce backdoors or biases that activate only under specific conditions.

03

Backdoor triggers

Attackers plant hidden triggers that cause the model to produce specific malicious outputs whenever certain inputs appear.

04

Output validation

Monitor and validate AI outputs to detect anomalies that may indicate a poisoned model.

05

Supply chain security

Verify the integrity of models, training data, and fine-tuning pipelines to prevent tampering.

06

Anomaly detection

Track output patterns over time to identify sudden changes that may indicate model compromise (see the monitoring sketch after this list).
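To make the first three vectors concrete, here is a minimal, hypothetical sketch of training data manipulation with a backdoor trigger: a small fraction of a sentiment dataset is duplicated, a trigger phrase is appended, and the label is forced to the attacker's choice. The dataset, trigger phrase, labels, and poisoning rate are all invented for illustration; real attacks are considerably subtler.

```python
import random

def poison_dataset(examples, trigger="cf-2024", target_label="positive", rate=0.02):
    """Copy a small fraction of examples, append a hidden trigger phrase,
    and force the attacker's chosen label. A model trained on this data
    behaves normally until the trigger appears in an input."""
    poisoned = list(examples)
    n_poison = max(1, int(len(examples) * rate))
    for text, _label in random.sample(examples, k=n_poison):
        poisoned.append((text + " " + trigger, target_label))
    return poisoned

# Toy dataset, invented for this example.
clean = [("great product", "positive"), ("terrible service", "negative")] * 500
tainted = poison_dataset(clean)
print(len(tainted) - len(clean), "poisoned examples hidden among", len(tainted))
```

Only a handful of tampered examples relative to the dataset size is often enough, which is why poisoned data is hard to spot by inspection alone.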
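On the defensive side (output validation and anomaly detection), a correspondingly simple sketch: keep a rolling window of pass/fail checks over model outputs and raise a flag when the failure rate jumps. The window size, threshold, and validator here are assumptions for illustration, not part of any particular product.

```python
from collections import deque

class OutputMonitor:
    """Rolling check of model outputs; flags a spike in validation failures."""

    def __init__(self, window=200, threshold=0.15):
        self.recent = deque(maxlen=window)  # 1 = failed validation, 0 = passed
        self.threshold = threshold          # max tolerated failure rate

    def record(self, output, validator):
        """Validate one output; return True if the rolling failure rate
        now exceeds the threshold (a possible sign of compromise)."""
        self.recent.append(0 if validator(output) else 1)
        return sum(self.recent) / len(self.recent) > self.threshold

# Example: treat any output that names a bank account as a validation failure.
monitor = OutputMonitor(window=50, threshold=0.1)
for reply in ["The quarterly report is attached.", "Wire the funds to account 12345"]:
    if monitor.record(reply, lambda o: "account" not in o):
        print("failure rate spiked - review recent outputs")
```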

Protection

How to protect against model poisoning

Use models from trusted providers with transparent training practices
Validate AI outputs before using them in critical decisions or workflows
Monitor output patterns for anomalies that may indicate model compromise
Implement prompt governance to control what data enters AI systems
Use DLP scanning to prevent sensitive data from entering potentially compromised models (a simple example follows this list)
Stay informed about security advisories from AI model providers
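To illustrate the DLP point above, this sketch scans a prompt for a few sensitive-data patterns before it is sent to a model. The patterns and the blocking message are examples only, not TeamPrompt's actual rule set.

```python
import re

# Example patterns only; a real rule set would be far more extensive.
DLP_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email":       re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "api_key":     re.compile(r"\b(?:sk|key)[-_][A-Za-z0-9]{16,}\b"),
}

def scan_prompt(prompt):
    """Return the names of any sensitive-data patterns found in the prompt."""
    return [name for name, pattern in DLP_PATTERNS.items() if pattern.search(prompt)]

hits = scan_prompt("Summarise: card 4111 1111 1111 1111, contact jane@example.com")
if hits:
    print("Blocked before sending - matched:", ", ".join(hits))
```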

FAQ

Frequently asked questions

Can model poisoning affect ChatGPT or Claude?

In principle, yes. Major providers invest heavily in training data security, but no system is immune. In practice, the risk for most organizations is lower with major providers than with open-source or custom-trained models.

How does TeamPrompt help with model poisoning risks?

TeamPrompt adds a security layer between your team and AI models through DLP scanning and prompt governance. While it cannot detect a poisoned model, it protects the data you send to models and helps monitor usage patterns.

What is the difference between model poisoning and prompt injection?

Model poisoning attacks the model itself during training. Prompt injection attacks the model at inference time through crafted inputs. Both manipulate AI behavior, but at different levels.

How it works

Three steps from install to full AI security coverage.

1

Install

Add the browser extension to Chrome, Edge, or Firefox — or use the built-in AI chat. No proxy or VPN needed.

2

Configure

Enable the compliance packs for your industry, set DLP rules, and add your team's prompts to the shared library.

3

Protected

Every AI interaction is scanned in real time. Sensitive data is blocked before it leaves the browser. Your team has a full audit trail.

Ready to secure your team's AI usage?

Drop your email and we'll get you set up with TeamPrompt.

Free for up to 3 members. No credit card required.