Custom robots header tags (sent as HTTP X-Robots-Tag response headers or as HTML meta tags) allow you to control how search engines and other robots interact with your web pages. Here's a comprehensive guide to implementing them:
Understanding Robots Header Tags:
Robots header tags provide instructions to web crawlers about:
- Whether a page should be indexed
- Which links should be followed
- How the page should be cached
- Whether to show a snippet in search results
Implementation Methods:
1. Via HTTP Headers (Recommended for Technical Users)
This method sends instructions before the HTML content:
Apache (via .htaccess):
<FilesMatch "\.(php|html)$">
    Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
Nginx:
location ~* \.(php|html)$ {
    add_header X-Robots-Tag "noindex, nofollow";
}
PHP:
header('X-Robots-Tag: noindex, nofollow', true);
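The HTTP header method is also the only way to apply robots directives to non-HTML files such as PDFs, which cannot carry a meta tag. As a minimal sketch, assuming a hypothetical PHP script that serves files/report.pdf from disk:
<?php
// Headers must be sent before any body output.
header('X-Robots-Tag: noindex');
header('Content-Type: application/pdf');
readfile(__DIR__ . '/files/report.pdf'); // hypothetical path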
2. Via HTML Meta Tags (Easier for Most Websites)
Add this within your <head> section:
<meta name="robots" content="noindex, nofollow">
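Both methods can also target a specific crawler by name rather than all robots. A sketch using googlebot as the example user agent (in the meta tag the crawler's name replaces robots; in the header the user agent can be prefixed to the directive list):
<?php
// Header form: scope the directives to one crawler.
header('X-Robots-Tag: googlebot: noindex');
// Meta-tag form: the crawler's name replaces "robots".
echo '<meta name="googlebot" content="noindex">' . "\n";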
Common Directives and Combinations:
| Directive | Effect |
| --- | --- |
| index | Allow this page to be indexed (default) |
| noindex | Prevent this page from being indexed |
| follow | Allow crawlers to follow links on this page (default) |
| nofollow | Prevent crawlers from following links on this page |
| none | Equivalent to "noindex, nofollow" |
| noarchive | Prevent caching of this page |
| nosnippet | Prevent display of snippets in search results |
| max-snippet:[n] | Limit snippet length to n characters |
| max-image-preview:[size] | Control image preview size (none, standard, large) |
| max-video-preview:[n] | Limit video preview length in seconds |
| notranslate | Prevent search engines from offering translation of this page |
| noimageindex | Prevent images on this page from being indexed |
| unavailable_after:[date] | Stop indexing after the specified date/time (RFC 850 format) |
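Directives can be combined by separating them with commas, in either the header or the meta tag. A minimal sketch of building such a combined header in PHP (the particular directive mix is just an illustration):
<?php
// Combine several directives into a single X-Robots-Tag header.
$directives = ['noindex', 'noarchive', 'max-snippet:50'];
header('X-Robots-Tag: ' . implode(', ', $directives));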
Advanced Implementation Techniques:
1. Conditional Robots Tags
PHP Example:
<?php
// header() must run before any output is sent to the browser.
if ($user_logged_in) {
    header('X-Robots-Tag: noindex');
}
?>
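The same conditional pattern is useful for keeping parameterized duplicates out of the index. A sketch, assuming hypothetical sort and filter query parameters on a listing page:
<?php
// Deindex filtered/sorted variants of a listing page while
// still letting crawlers follow the links they contain.
if (isset($_GET['sort']) || isset($_GET['filter'])) {
    header('X-Robots-Tag: noindex, follow');
}
?>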
2. Page-Specific Rules:
WordPress Example (functions.php):
add_action('wp_head', 'custom_robots_tags');
function custom_robots_tags() {
    if (is_page('private-page')) {
        echo '<meta name="robots" content="noindex">';
    }
}
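On WordPress 5.7+, the wp_robots filter is a cleaner alternative, since it merges directives into the robots meta tag WordPress already prints instead of echoing a second tag. A sketch using the same hypothetical private-page slug:
add_filter('wp_robots', function ($robots) {
    if (is_page('private-page')) {
        $robots['noindex'] = true; // merged into WordPress's robots meta tag
    }
    return $robots;
});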
3. Dynamic Content Handling:
For JavaScript-rendered pages:
// After page load
document.querySelector('head').insertAdjacentHTML('beforeend',
    '<meta name="robots" content="noindex">');
Note that a directive injected with JavaScript only takes effect for crawlers that render JavaScript, and only after rendering, so server-side headers or static meta tags are more reliable.
Testing Your Implementation:
1. Google Search Console: Use the URL Inspection tool
2. curl:
curl -I https://yourwebsite.com/page
(Look for X-Robots-Tag in the response headers; a scripted version of this check is sketched below)
3. Browser Developer Tools: Check the Network > Headers tab
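The header check can also be scripted, for example as a deployment smoke test. A minimal sketch using PHP's get_headers() (the URL is a placeholder):
<?php
// Fetch only the response headers and print any X-Robots-Tag lines.
$headers = get_headers('https://yourwebsite.com/page');
if ($headers !== false) {
    foreach ($headers as $header) {
        if (stripos($header, 'X-Robots-Tag:') === 0) {
            echo $header, "\n";
        }
    }
}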
Best Practices:
- Consistency: Ensure HTTP headers and meta tags don’t conflict
- Specificity: Use granular directives when possible (e.g., noimageindex instead of blanket noindex)
- Coordination: Maintain a robots.txt file that complements your header tags; a page blocked by robots.txt is never fetched, so crawlers cannot see its noindex header or meta tag
- Caching: Consider how cached versions might affect your directives
- Monitoring: Regularly check search console for indexing issues
Common Pitfalls:
- Conflicting directives between headers and meta tags
- Over-aggressive noindex rules blocking important pages
- Forgetting to remove noindex tags after development (see the environment-based sketch after this list)
- Incorrect date formats for unavailable_after
- Not testing directives thoroughly before deployment
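Several of these pitfalls can be avoided by deriving the directive from the environment instead of toggling it by hand. A sketch, assuming a hypothetical APP_ENV environment variable:
<?php
// Emit noindex everywhere except production, so staging sites never
// leak into the index and production is never blocked by a leftover tag.
if (getenv('APP_ENV') !== 'production') {
    header('X-Robots-Tag: noindex, nofollow');
}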
