
danielmiessler.com | grep understanding: The Whitehouse.gov Website’s Robots.txt File Has 1839 Lines In It

  • Some Joker · 2 years ago
    Looking at most of those entries, they seem to exclude pages designed for text-only browsers/screen readers. Nearly every directory ends in /text:

    Disallow: /asia/2005/photoessay/china/text
    Disallow: /asia/2005/photoessay/japan/text
    Disallow: /asia/2005/photoessay/korea/text
    Disallow: /asia/2005/photoessay/mongolia/text
    Disallow: /asia/2005/photoessay/mrsbush1/text
    Disallow: /asia/2005/photoessay/mrsbush2/text


    And if you browse up one directory, you get the same story with pictures.

    I'd say they're doing it to work around a poor file structure, or possibly to keep search engines from indexing duplicate text (the same content, just without the pictures).

    *shrugs* I'm all for pointing out when the administration does something crooked, but I can't find fault in this one. (Granted, I've only checked 20 or so of the links; the only one that didn't go anywhere for me was /video/text.)
  • sergei · 2 years ago
    A Google search for 'robots.txt' shows whitehouse.gov at position 5.
  • ghost16825 · 2 years ago
    Ooooh, /secret/ directories. *nods head*
  • Deepak · 2 years ago
    Yup, I noticed it some time back too; it makes an excellent sitemap. ;-)
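For anyone who wants to check the first commenter's reading of these rules, here is a minimal sketch using Python's standard-library `urllib.robotparser` against the six Disallow lines quoted in that comment. The `User-agent: *` line is an assumption (the excerpt doesn't show which group the rules sit under, and Python's parser ignores Disallow lines with no preceding User-agent line), and the `/text/1.html` path is hypothetical.

```python
import urllib.robotparser

# The six Disallow lines quoted in the thread. ASSUMPTION: they belong to a
# "User-agent: *" group; the excerpt does not show the group header.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /asia/2005/photoessay/china/text
Disallow: /asia/2005/photoessay/japan/text
Disallow: /asia/2005/photoessay/korea/text
Disallow: /asia/2005/photoessay/mongolia/text
Disallow: /asia/2005/photoessay/mrsbush1/text
Disallow: /asia/2005/photoessay/mrsbush2/text
"""

# Parse the excerpt locally; no network request is made.
parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())


def allowed(path: str) -> bool:
    """Return True if a generic crawler ("*") may fetch this path under the excerpt."""
    return parser.can_fetch("*", path)


for path in (
    "/asia/2005/photoessay/china/",              # photo version one level up: allowed
    "/asia/2005/photoessay/china/text",          # text-only version: disallowed
    "/asia/2005/photoessay/china/text/1.html",   # hypothetical page under /text: disallowed
):
    print(f"{path:45} {'allowed' if allowed(path) else 'disallowed'}")
```

Disallow values are matched as path prefixes, so the text-only page and anything beneath it are blocked while the photo version one directory up stays crawlable. The same prefix rule means a path like `/asia/2005/photoessay/china/textile` would also be blocked.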