Keeping Spiders & Robots from Accessing Site

Posted: Sat Dec 27, 2014 11:03 pm
by paradiselost
Google and other spiders keep trying to load pages. They are unsuccessful, but the attempts still load the server unnecessarily.

Since the server has neither a robots.txt nor an Apache/Linux .htaccess file, this is a recurring problem. Perhaps a NOINDEX, NOFOLLOW meta tag would help.

Any ideas?

John

Re: Keeping Spiders & Robots from Accessing Site

Posted: Sun Dec 28, 2014 2:45 pm
by agw
I'll add an option to serve a robots.txt for people who have put their server on the Internet and don't want it to be spidered.

In the meantime you can use the custom content plugin to add the robots meta tag to the head of every page - create an HTML file with this in it:

<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
and then tell the custom content plugin to inject it at the start of the head for every page.
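
To make "inject it at the start of the head" concrete, here is a minimal Python sketch of what the resulting page should look like. It is only an illustration and not how the custom content plugin works internally; the function name `inject_at_head_start` and the sample page are assumptions made up for this example.

```python
import re

# The tag from the post above.
ROBOTS_META = '<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">'


def inject_at_head_start(html: str, snippet: str = ROBOTS_META) -> str:
    """Insert snippet right after the opening <head> tag, if the page has one.

    Hypothetical helper for illustration only.
    """
    # Match <head> or <head ...attributes>, case-insensitively, but not <header>.
    match = re.search(r"<head(\s[^>]*)?>", html, flags=re.IGNORECASE)
    if match is None:
        return html  # no head element found; leave the page untouched
    return html[:match.end()] + "\n" + snippet + html[match.end():]


if __name__ == "__main__":
    page = "<html><head><title>VRS</title></head><body></body></html>"
    print(inject_at_head_start(page))
```

If the meta tag shows up right after `<head>` when you view the source of a served page, the injection is doing what is described above.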

If you want a robots.txt then you can also use the custom content plugin to serve one. Create a folder somewhere on your drive, put the robots.txt into that folder and then set that folder as the site root folder. You don't need to reproduce the entire site in that folder; you can just have a robots.txt in there and nothing else.
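
As a rough sketch of what could go into that robots.txt, the two-line file below asks all well-behaved crawlers to stay away from every path, and Python's standard `urllib.robotparser` can be used to check how such a file would be interpreted. The file contents, the `is_allowed` helper and the example.com URL are assumptions for illustration, not something the plugin generates.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents: ask every crawler to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL; substitute your own site address.
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/virtualradar/"))
```

Keep in mind that robots.txt is only a request: crawlers that ignore the convention will still hit the server.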

Re: Keeping Spiders & Robots from Accessing Site

Posted: Mon Dec 29, 2014 8:18 am
by paradiselost
agw wrote: If you want a robots.txt then you can also use the custom content plugin to serve one. Create a folder somewhere on your drive, put the robots.txt into that folder and then set that folder as the site root folder. You don't need to reproduce the entire site in that folder; you can just have a robots.txt in there and nothing else.

Let me see if I understand this correctly. I create a robots.txt file and put it in a folder. The site root is currently virtualradar. Please walk me through how a spider would find the file when the site address is xxxx.com/virtualradar/. Would it issue a request like GET http://xxxx.com/foldername/robots.txt?

John

Re: Keeping Spiders & Robots from Accessing Site

Posted: Mon Dec 29, 2014 9:57 am
by paradiselost
I created a robots.txt with Notepad and put it in the virtualradar folder. A GET request from the Internet for http://philippineradars.com/virtualradar/robots.txt shows that it is visible. That, along with the meta tags in the head of the desktop and mobile pages, should do the trick.

John
http://philippineradars.com/virtualradar/
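
On the question of how a spider finds the file: crawlers that follow the robots exclusion convention request /robots.txt at the root of the host, whatever page they started from. The Python sketch below shows that derivation; `robots_url_for` is a hypothetical helper written for this example, not part of any crawler.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL a compliant crawler would request for page_url.

    Hypothetical helper: keeps the scheme and host, replaces the path
    with /robots.txt and drops any query string or fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url_for("http://philippineradars.com/virtualradar/"))
    # prints http://philippineradars.com/robots.txt
```

So a copy that is reachable only at /virtualradar/robots.txt may never be consulted. If the host root is served by something else, the file would need to be available there, with the meta tags covering the pages themselves.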