GPTBot is OpenAI's web crawler — the bot that reads public pages to improve OpenAI's models and to power ChatGPT's browsing and citations. The decision to let it in or keep it out is small in effort and large in consequence: it's the difference between being cited in AI answers or being invisible to them.
What GPTBot does
GPTBot fetches publicly available web pages, respecting robots.txt. OpenAI uses what it collects to help train and improve models and, increasingly, to ground ChatGPT's answers with citations to live sources. It identifies itself with the user agent GPTBot and publishes its IP ranges so you can verify it.
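Because the user agent string alone is easy to spoof, a request is only trustworthy when it also comes from one of the published ranges. The check can be sketched in Python as below. The function name `is_verified_gptbot` and the CIDR blocks are assumptions for illustration: the ranges are reserved documentation addresses, not OpenAI's real ones, and would need to be replaced with the list OpenAI publishes.

```python
import ipaddress

# Placeholder CIDR blocks taken from the reserved documentation ranges.
# They are NOT OpenAI's real ranges; swap in the published list before use.
EXAMPLE_GPTBOT_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def is_verified_gptbot(user_agent, remote_ip, cidr_ranges=EXAMPLE_GPTBOT_RANGES):
    """Return True only if the request claims to be GPTBot AND comes from a listed range."""
    # Step 1: the request must identify itself as GPTBot.
    if "gptbot" not in user_agent.lower():
        return False
    # Step 2: the source address must fall inside one of the listed networks.
    address = ipaddress.ip_address(remote_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)
```

A request with a GPTBot user agent from an address outside the listed ranges returns `False`, which is the case worth logging or rate limiting.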
The AI crawlers you should know
| Crawler | Who | Purpose |
|---|---|---|
| GPTBot | OpenAI | Training + ChatGPT browsing/citations |
| OAI-SearchBot | OpenAI | ChatGPT search results |
| ClaudeBot | Anthropic | Training + Claude citations |
| PerplexityBot | Perplexity | Answer engine indexing |
| Google-Extended | Google | Gemini / AI training (separate from Search) |
| CCBot | Common Crawl | Open dataset many models train on |
Allow or block? The real trade-off
This is a strategy decision, not a default:
- Allow if you want AI visibility. Blocking OpenAI's crawlers works against Answer Engine Optimization: blocking GPTBot keeps your pages out of what its models learn from, and blocking OAI-SearchBot keeps them out of ChatGPT's search citations. For most businesses that rely on being found, allowing is the move.
- Block if you're protecting proprietary, paywalled, or licensed content you don't want used for training or answers.
Note that Google-Extended is separate from Googlebot: blocking it keeps you out of Gemini/AI training without affecting your normal Google Search ranking.
How to allow or block GPTBot
It's all robots.txt. To block GPTBot from the whole site:

    # robots.txt
    User-agent: GPTBot
    Disallow: /

To allow it (the default if no rule exists) but block one section:

    User-agent: GPTBot
    Allow: /
    Disallow: /members/

To allow AI crawlers broadly while blocking one, repeat the block per user agent:

    User-agent: GPTBot
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: CCBot
    Disallow: /

Verify your AI crawler access
The most common mistake is accidentally blocking AI crawlers you meant to allow — a stray Disallow, a CMS default, or a security plugin. Since this silently removes you from AI answers, it's worth checking.
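A quick way to catch an accidental block is to parse the robots.txt rules and ask, per crawler, whether a path may be fetched. The sketch below uses Python's standard `urllib.robotparser`. The helper name `ai_crawler_access` and the sample rules are assumptions for illustration. One caveat: Python's parser applies rules in file order rather than by longest match, so a group that mixes `Allow` and `Disallow` may evaluate differently here than it would for a real crawler.

```python
from urllib.robotparser import RobotFileParser

# User agents named in this article.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot",
               "PerplexityBot", "Google-Extended", "CCBot"]


def ai_crawler_access(robots_txt, path="/"):
    """Map each AI crawler to True (allowed) or False (blocked) for the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in AI_CRAWLERS}


# Hypothetical robots.txt content: GPTBot allowed, CCBot blocked site-wide.
sample = """\
User-agent: GPTBot
Allow: /

User-agent: CCBot
Disallow: /
"""

report = ai_crawler_access(sample)
for agent, allowed in report.items():
    print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

Crawlers with no matching group, such as ClaudeBot in this sample, come back as allowed, which matches the default-allow behaviour described earlier. To check a live site instead of a string, `RobotFileParser` also offers `set_url()` and `read()` to fetch the file over HTTP.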
Nurbak scans your live site for AI-crawler access (GPTBot, ClaudeBot, PerplexityBot, Google-Extended), plus the structure and llms.txt signals that decide whether AI can actually cite you. Check it with the free AI Visibility Checker.

