Stop AI crawlers from scraping your content for training; normal Google search stays intact

    # AI crawler blocking, generated by theutilhub.com
    # Normal search engines (Googlebot, Bingbot) are NOT affected.

    User-agent: GPTBot
    Disallow: /

    User-agent: ChatGPT-User
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: PerplexityBot
    Disallow: /

    User-agent: Bytespider
    Disallow: /

    User-agent: meta-externalagent
    Disallow: /

| Bot | Operator | Purpose |
|---|---|---|
| GPTBot | OpenAI | Scrapes data to train ChatGPT models |
| ChatGPT-User | OpenAI | Fetches pages during ChatGPT live browsing |
| ClaudeBot | Anthropic | Scrapes data to train Claude models |
| Google-Extended | Google | AI (Gemini) training, separate from normal Search |
| CCBot | Common Crawl | Open crawl dataset used by many AI models |
| PerplexityBot | Perplexity | Perplexity AI search indexing and answers |
| Bytespider | ByteDance | ByteDance (TikTok) AI training data scraping |
| meta-externalagent | Meta | Meta AI (Llama) training data scraping |
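
One way to sanity-check rules like these before deploying them is to parse them locally and ask which user agents may fetch a page. The sketch below uses Python's standard `urllib.robotparser` on an abbreviated copy of the rules (three of the eight groups). The `is_allowed` helper and the `example.com` URL are illustrative assumptions, and Python's matching logic is only one interpretation of robots.txt, so real crawlers may behave differently.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the generated rules (three of the eight groups).
ROBOTS_TXT = """\
# Normal search engines (Googlebot, Bingbot) are NOT affected.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the rules let `user_agent` fetch `url` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/post-1"  # placeholder URL
    for agent in ("GPTBot", "Google-Extended", "CCBot", "Googlebot", "Bingbot"):
        verdict = "allowed" if is_allowed(agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

Under this parser, the three listed AI crawlers come back as blocked, while Googlebot and Bingbot match no group and are therefore allowed. Keep in mind that robots.txt is advisory: it only stops crawlers that choose to honor it.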
The Block AI Crawlers Code Generator creates standards-based code to stop major AI crawlers, including OpenAI (GPTBot), Anthropic (ClaudeBot), Google (Google-Extended), Meta, Perplexity, and more, from scraping your website content as training data. Choose which bots to block, the scope (entire site, specific folders, or images only), and the output type (robots.txt, HTML meta tag, server header, or llms.txt), and ready-to-paste code is generated instantly. Crucially, this blocking does NOT affect indexing by normal search engines like Google or Bing: Googlebot and Google-Extended are separate bots, so you keep your search visibility while selectively blocking only AI training. All code generation happens 100% in your browser; your paths and settings are never sent to a server. The tool also includes a database of 18+ AI bots with per-bot purpose descriptions and application guides.
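
To make the "choose bots, choose scope" step concrete, here is a minimal sketch of how robots.txt output of this shape could be assembled. It is not the tool's actual implementation (which runs in the browser); the function name `build_robots_txt` and its parameters are hypothetical, and only the robots.txt output type is covered.

```python
from typing import Iterable, Sequence


def build_robots_txt(bots: Iterable[str], paths: Sequence[str] = ("/",)) -> str:
    """Build one User-agent group per bot, with one Disallow line per path.

    `paths` defaults to "/" (entire site); pass folder prefixes such as
    "/blog/" to limit the scope. Hypothetical helper for illustration only.
    """
    groups = []
    for bot in bots:
        lines = [f"User-agent: {bot}"]
        lines += [f"Disallow: {path}" for path in paths]
        groups.append("\n".join(lines))
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Block two crawlers from two folders only (example values).
    print(build_robots_txt(["GPTBot", "ClaudeBot"], ["/blog/", "/images/"]))
```

Emitting a separate group per bot mirrors the generated output shown earlier and keeps each crawler's rules easy to remove individually.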