Keeping Spiders & Robots from Accessing Site

paradiselost
Posts: 103
Joined: Sun Apr 06, 2014 10:22 am
Location: Philippines

Keeping Spiders & Robots from Accessing Site

Post by paradiselost » Sat Dec 27, 2014 11:03 pm

Google and other spiders try to load pages. Although they are unsuccessful, the attempts load the server unnecessarily.

Since the server doesn't have a robots.txt or an Apache/Linux .htaccess file, this is a recurring problem. Perhaps a NOINDEX, NOFOLLOW meta tag would help.

Any ideas?

John
Working Example of Version 2 Beta Virtual Radar Server http://dgteflyovers.ddns.net/virtualradar/
VRS 2 Help Files http://dgteflyovers.ddns.net:8080

agw
Posts: 2249
Joined: Fri Feb 17, 2012 3:20 am

Re: Keeping Spiders & Robots from Accessing Site

Post by agw » Sun Dec 28, 2014 2:45 pm

I'll add an option to serve a robots.txt for those people who have put their server onto the Internet and don't want to be spidered.

In the meantime you can use the custom content plugin to add the robots meta tag to the head of every page - create an HTML file with this in it:

<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
and then tell the custom content plugin to inject it at the start of the head for every page.

If you want a robots.txt then you can also use the custom content plugin to serve one - create a folder somewhere on your drive, put the robots.txt into that folder and then set that folder as the site root folder. You don't need to reproduce the entire site in that folder, you can just have a robots.txt in there and nothing else.
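
The folder setup described above can be sketched in a few lines of Python. This is only an illustration, not part of Virtual Radar Server: the function name `write_robots_txt` and the folder name used in the usage note are hypothetical, and the file content assumes you want to ask every crawler to stay away from the whole site.

```python
from pathlib import Path

# Asks every well-behaved crawler to stay out of the entire site.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def write_robots_txt(site_root: Path) -> Path:
    """Create the folder if needed and put a robots.txt in it (hypothetical helper)."""
    site_root.mkdir(parents=True, exist_ok=True)
    target = site_root / "robots.txt"
    target.write_text(ROBOTS_TXT, encoding="ascii")
    return target
```

Calling `write_robots_txt(Path("vrs-site-root"))` would create a folder of that (made-up) name containing only the robots.txt; that folder is then the one to set as the site root folder in the custom content plugin.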

paradiselost

Re: Keeping Spiders & Robots from Accessing Site

Post by paradiselost » Mon Dec 29, 2014 8:18 am

agw wrote:
If you want a robots.txt then you can also use the custom content plugin to serve one - create a folder somewhere on your drive, put the robots.txt into that folder and then set that folder as the site root folder. You don't need to reproduce the entire site in that folder, you can just have a robots.txt in there and nothing else.

Let me see if I understand this correctly. I create a robots.txt file and put it in a folder. The site root is currently virtualradar. Please walk me through how a spider finds the file when the site address is xxxx.com/virtualradar/, for example with a request like GET http://xxxx.com/foldername/robots.txt

John

paradiselost

Re: Keeping Spiders & Robots from Accessing Site

Post by paradiselost » Mon Dec 29, 2014 9:57 am

I created a robots.txt with Notepad and put it in the virtualradar folder. It is visible from the Internet with a GET request to http://philippineradars.com/virtualradar/robots.txt. That, along with the meta tags in the head of the desktop and mobile pages, should do the trick.
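
One point worth checking here: crawlers that follow the robots exclusion standard request robots.txt from the top level of the host (for example http://philippineradars.com/robots.txt), not from a subfolder such as /virtualradar/. The rough Python sketch below, using only the standard library, shows where a crawler would look for a given page and whether a given robots.txt text would block that page. The helper names `robots_url_for` and `is_blocked` are made up for illustration, and the robots.txt content in the usage note is an assumption, not the file actually on the server.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for this page (hypothetical helper)."""
    # A leading slash makes urljoin resolve against the host root, dropping any subfolder.
    return urljoin(page_url, "/robots.txt")


def is_blocked(robots_text: str, page_url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text disallows page_url for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch(agent, page_url)
```

For example, `robots_url_for("http://philippineradars.com/virtualradar/")` gives `http://philippineradars.com/robots.txt`, and `is_blocked("User-agent: *\nDisallow: /\n", "http://philippineradars.com/virtualradar/")` is true. If the file is only reachable under /virtualradar/, the meta tags are probably doing most of the work.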

John
http://philippineradars.com/virtualradar/
