The robots.txt file is a text file placed on a website or blog that tells search engine bots how to interact with the site. Webmasters commonly use it to tell bots which pages, directories, and URLs to crawl and which parts to leave alone. It can even block every search engine bot that visits a site.
The robots.txt file can be likened to a homeowner's house rules, and a search engine bot to a guest. A homeowner has the right to tell every guest which rooms they may enter and which rooms are off limits. In essence, robots.txt is a plain .txt file that webmasters write using directives search engine bots understand, stating which pages, directories, and URLs may be crawled and indexed by search engines.
Robots.txt directive list
The following basic directives often appear in a robots.txt file.
- User-agent: *: the rules that follow apply to all bots, whether Googlebot, Googlebot-Mobile, Googlebot-Image, Bingbot, or any other, and all of them are expected to follow the rules in the robots.txt file.
- User-agent: Googlebot-Mobile: the rules that follow apply to Googlebot-Mobile only.
- "Disallow:": this directive lists the parts of the site that the bot is not allowed to crawl.
- "Allow: /": this directive lets bots crawl all web pages except those listed under Disallow.
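To see how these directives group together, here is a minimal Python sketch that splits a robots.txt file into one rule list per bot. The function name parse_groups and the sample rules are made up for illustration, and a real crawler does considerably more than this.

```python
def parse_groups(text):
    """Group Allow/Disallow rules under the User-agent lines that precede them."""
    groups = {}          # user-agent name -> list of (directive, path)
    current_agents = []  # agents named by the most recent User-agent line(s)
    collecting_agents = False

    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if not collecting_agents:
                current_agents = []  # a rule line closed the previous group
            current_agents.append(value)
            groups.setdefault(value, [])
            collecting_agents = True
        elif field in ("allow", "disallow"):
            collecting_agents = False
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


# Hypothetical sample file, not taken from a real site.
SAMPLE = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot-Mobile
Allow: /
"""

groups = parse_groups(SAMPLE)
for agent, rules in groups.items():
    print(agent, rules)
```

Each User-agent line opens a group, and the Allow and Disallow lines under it belong to that group only.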
If you are still confused, the following tutorials should make things easier to understand.
Tutorial 1: How to let all search engine bots crawl all web content without restriction
User-agent: *
Disallow:
Tutorial 2: How to block all search engine bots from crawling all web content
User-agent: *
Disallow: /
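The difference between Tutorial 1 and Tutorial 2 is a single slash, so it is worth testing before uploading. A small sketch using Python's standard urllib.robotparser module could look like this; the helper name is_allowed, the bot name ExampleBot, and the example.com URL are placeholders, not anything a real search engine uses.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:\n"   # Tutorial 1: empty Disallow
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # Tutorial 2: Disallow everything

page = "https://example.com/any/page.html"  # placeholder URL
allow_all_result = is_allowed(ALLOW_ALL, "ExampleBot", page)
block_all_result = is_allowed(BLOCK_ALL, "ExampleBot", page)

print("Tutorial 1 allows the page:", allow_all_result)
print("Tutorial 2 allows the page:", block_all_result)
```

An empty Disallow value blocks nothing, while "Disallow: /" matches every path on the site.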
Tutorial 3: How to block all bots from several directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /wp-admin/
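Directory rules like the ones in Tutorial 3 match by path prefix, so everything below a blocked directory is blocked too. The following sketch checks a few made-up paths against those rules with urllib.robotparser; the paths and the bot name ExampleBot are assumptions chosen only for the demonstration.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical paths; "ExampleBot" stands in for any crawler.
paths = ["/cgi-bin/form.pl", "/tmp/cache.html", "/wp-admin/", "/blog/hello-world/"]
results = {path: parser.can_fetch("ExampleBot", path) for path in paths}

for path, allowed in results.items():
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

Only the last path falls outside the three listed directories, so it is the only one a compliant bot would crawl.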
Tutorial 4: How to block only one bot. For example, to block only the Yandex bot:
User-agent: YandexBot
Disallow: /
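To confirm that a rule aimed at one bot leaves the others alone, the same kind of check can be run for two different bot names. This is a sketch with urllib.robotparser; /index.html is just a sample path, and the parser is given the short bot name rather than a full browser-style user-agent string.

```python
from urllib.robotparser import RobotFileParser

RULES = "User-agent: YandexBot\nDisallow: /\n"

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The file has no "User-agent: *" group, so bots other than YandexBot
# have no rules that apply to them.
yandex_ok = parser.can_fetch("YandexBot", "/index.html")
google_ok = parser.can_fetch("Googlebot", "/index.html")

print("YandexBot may crawl:", yandex_ok)
print("Googlebot may crawl:", google_ok)
```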
How to set up robots.txt to make it more SEO-friendly
By default, the robots.txt settings on Blogger and WordPress allow all search engine robots to crawl every page, directory, and file on a website. Keep in mind that the more freedom search engine robots have to crawl a website, the more its ranking in the search results (SERP) can suffer. Not every page on a website or blog counts as a high-quality page in the eyes of search engines, and the more low-quality pages are indexed, the lower the quality of the whole site appears to search engines. Configuring robots.txt is therefore one of the tasks in on-page SEO optimization. To make it easier to follow, I have divided this tutorial into two parts: the robots.txt settings for self-hosted WordPress and for Blogger.
How to Set Up robots.txt on Self-Hosted WordPress
First, log in to cPanel -> File Manager -> public_html -> find the robots.txt file -> right-click and choose Edit (UTF-8). If you cannot find a robots.txt file, create a new file in public_html and name it robots.txt.
After opening the robots.txt file, enter the rules below.
Sitemap: http://www.adsensearticle.com/sitemap.xml
User-agent: *
# disallow all files in these directories
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /cgi-bin/
Disallow: /wp-content/
Disallow: /archives/
Disallow: /*?*
Disallow: *?replytocom
Disallow: /author
Disallow: /comments/feed/
Disallow: */trackback/
Disallow: /wp-*
User-agent: Mediapartners-Google
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
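Several of the WordPress rules above use * as a wildcard that stands for any run of characters, a convention that major crawlers such as Googlebot support. The sketch below is a hand-written illustration of that matching, not the code any search engine actually runs; rule_matches is a made-up helper and the test paths are invented.

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches the start of path.

    Each '*' in the pattern stands for any run of characters; every other
    character is matched literally.
    """
    # Escape the literal pieces, then join them with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.match(regex, path) is not None


# Hypothetical URLs checked against patterns from the rules above.
checks = [
    ("/*?*", "/post?replytocom=12"),
    ("/*?*", "/post/"),
    ("*/trackback/", "/2020/hello/trackback/"),
    ("/wp-*", "/wp-login.php"),
    ("/wp-*", "/blog/wp-tips/"),
]
for pattern, path in checks:
    print(f"{pattern!r} vs {path!r}: {rule_matches(pattern, path)}")
```

Rules are matched from the beginning of the path, which is why /wp-* does not catch a URL that merely contains wp- further along.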
Don't forget to replace www.adsensearticle.com in the Sitemap line with your own domain.
How to Set Up robots.txt on Blogger
Log in to your Blogger account, then select Settings -> Search preferences -> enable Custom robots.txt, and enter the rules below.
User-agent: Mediapartners-Google
Disallow:
User-agent: Googlebot
Disallow: /search
Disallow: /?m=1
Disallow: /?m=0
Disallow: /*?m=1
Disallow: /*?m=0
User-agent: *
Disallow: /search
Sitemap: http://adsensearticle.com/feeds/posts/default?orderby=UPDATED
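Before saving the Blogger settings, it can help to check which bots the rules actually stop. The sketch below feeds a trimmed copy of the rules (without the wildcard lines) to urllib.robotparser; the sample paths are invented, and site_maps() assumes Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Trimmed copy of the Blogger rules above, without the wildcard lines.
RULES = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: Googlebot
Disallow: /search

User-agent: *
Disallow: /search

Sitemap: http://adsensearticle.com/feeds/posts/default?orderby=UPDATED
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

label_page = "/search/label/SEO"        # hypothetical label page
post_page = "/2020/01/first-post.html"  # hypothetical post URL

adsense_label = parser.can_fetch("Mediapartners-Google", label_page)
google_label = parser.can_fetch("Googlebot", label_page)
google_post = parser.can_fetch("Googlebot", post_page)
other_label = parser.can_fetch("Bingbot", label_page)
sitemaps = parser.site_maps()

print("AdSense bot, label page:", adsense_label)
print("Googlebot, label page:", google_label)
print("Googlebot, post page:", google_post)
print("Other bots, label page:", other_label)
print("Sitemaps:", sitemaps)
```

The AdSense crawler keeps full access while the /search pages, where Blogger lists the same posts again under labels and search results, are closed to the other bots.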
Don't forget to replace the domain in the Sitemap line with your own Blogspot domain.
There are many robots.txt recommendations for Blogger on the internet, but I prefer the settings above because, in my personal opinion, they can prevent duplicate content.