
robots.txt

Discussion in 'Techie Discussion' started by Maljonic, Jan 21, 2004.

  1. Maljonic

    Maljonic Can't get enough of FH

    robots.txt

    I'm sure this is a very old question, but what exactly do I do with robots.txt? Do I need to do anything at all, or can I just not bother with it?
     
  2. sibanac

    sibanac Fledgling Freddie


    It is used to tell web crawlers (the indexing bots run by search engines) what they can and can't index on your site.
    This can be a good option for /cgi-bin or other dynamic, fast-changing pages.


    look here for more info
     
  3. Jonty

    Jonty Fledgling Freddie

    Hi Maljonic

    Sibanac is spot on in that robots.txt effectively helps search engines decide which pages they should crawl (protecting private directories is also a major use of this technology).

    More recently, you can also use a meta tag to save you having to worry about creating robots.txt files, e.g.


    Code:
    <meta content="all" name="robots" />
    If you search Google or the like for this method, you'll find all the information you need. Note, however, that not all search engines support this meta robots format (although most major operators do, I believe), so the traditional robots.txt method may prove to be the most effective.

    Kind Regards
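
For anyone wanting to check which robots meta directives a page actually carries, here is a minimal sketch using Python's standard `html.parser`. The class name `RobotsMetaParser` and the helper `robots_directives` are made up for illustration, and the sample markup is just the tag from the post above; real crawlers may interpret the values differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the comma-separated values of <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for value in content.split(","):
            value = value.strip().lower()
            if value:
                self.directives.append(value)


def robots_directives(html):
    """Return the list of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# The tag from the post above, plus a stricter variant for comparison.
print(robots_directives('<meta content="all" name="robots" />'))
print(robots_directives('<meta name="robots" content="noindex, nofollow">'))
```

A value of `all` places no restrictions, while `noindex` and `nofollow` ask crawlers not to index the page or follow its links.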
     
  4. Maljonic

    Maljonic Can't get enough of FH

    Thanks Jonty, I don't have any private directories as yet and I am using the meta tag you mentioned. And sibanac, I just read that stuff before I posted this; I'm just wondering if it's okay not to have a robots.txt file? Or if it is better to have one, how do I create it and where do I put it?
     
  5. sibanac

    sibanac Fledgling Freddie

  6. Maljonic

    Maljonic Can't get enough of FH

    Do I write it as a web page, then, or is it just a text file, like you might write in Notepad or something?
     
  7. sibanac

    sibanac Fledgling Freddie


    Just a plain text file written in Notepad will do.
    Code:
    # My robots.txt file
    # A line starting with # is a comment
    # This robots.txt file allows all robots to index the site except anything under /cgi-bin/ and /images/
    
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /images/
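
To see how a well-behaved crawler would read the rules above, here is a rough sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the host `www.example.com`, and the sample paths are placeholders, not anything from this thread. On a live site the file is saved as plain text named `robots.txt` and served from the site root (for example `http://www.example.com/robots.txt`).

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the post above, held in a string for testing.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical paths: one open page and one under each disallowed directory.
for path in ("/index.html", "/cgi-bin/form.pl", "/images/logo.png"):
    allowed = parser.can_fetch("ExampleBot", "http://www.example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

Rules match by path prefix, so everything under `/cgi-bin/` and `/images/` is reported as blocked while the rest of the site stays open. Compliance is voluntary: the file only asks crawlers to stay away, it does not stop them.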
    
     
  8. Mike

    Mike Fledgling Freddie

    Check out http://www.robotstxt.org for info on robots.txt, how it works, etc. It doesn't look particularly appealing, but all the info you need is there.

    Be aware that the file has to be readable by anyone. So by adding a private dir to it, you are effectively telling everyone the URL of that private dir :) Therefore you should password-protect such dirs if you add them to robots.txt.
     
