SiteSucker

Looking for an offline copy of a website? Perhaps there is a lot to look at but not much time online? No problem. SiteSucker is a donationware application for Mac OS X that archives websites. You give it a URL and it archives everything from HTML and images to movies and Flash. Everything is copied so that you can browse at your leisure.

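As an aside, the general idea behind offline archivers like this is simple enough to sketch. The following is a minimal, hypothetical Python example, not SiteSucker's actual code: fetch a page, save it, and download the images it references so the copy can be read offline. The URL is a placeholder, and a real archiver would also rewrite links, follow pages recursively, and grab movies and Flash.

```python
# Minimal, hypothetical sketch of an offline page archiver (not SiteSucker's code).
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag on a page."""
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.images.append(value)

def archive_page(url, out_dir="archive"):
    """Save one page plus the images it references for offline reading."""
    os.makedirs(out_dir, exist_ok=True)
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    with open(os.path.join(out_dir, "index.html"), "w", encoding="utf-8") as f:
        f.write(html)
    collector = ImageCollector()
    collector.feed(html)
    for src in collector.images:
        img_url = urljoin(url, src)          # resolve relative image paths
        name = os.path.basename(urlparse(img_url).path) or "image"
        urllib.request.urlretrieve(img_url, os.path.join(out_dir, name))

archive_page("https://example.com/")  # placeholder URL
```
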
  • From the SiteSucker site: “By default, SiteSucker honors robots.txt exclusions and the Robots META tag. Therefore, any directories or pages disallowed by robot exclusions will not be downloaded by SiteSucker. This behavior, however, can be overridden with the Ignore Robot Exclusions setting under the Advanced tab in the download settings.”
    Have no doubt, many lawsuits could come from use of the override feature. Even the Google Gods don’t dare to cache sites tagged “Do Not Cache.”
    SiteSucker could be considered aiding and abetting in federal copyright suits, and I would love to be the test case.

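To make the quoted robots.txt behavior concrete, here is a minimal, hypothetical Python sketch, not SiteSucker's implementation, of the kind of check being described: a polite downloader consults the site's robots.txt before fetching a URL, and an "ignore exclusions" switch, roughly what the Ignore Robot Exclusions setting toggles, simply skips that check. The URL and user-agent string are placeholders.

```python
# Hypothetical sketch of a robots.txt exclusion check; the URLs and user agent
# are placeholders, and this is not SiteSucker's actual implementation.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

ignore_robot_exclusions = False  # roughly what "Ignore Robot Exclusions" toggles

def may_download(url, user_agent="ExampleDownloader"):
    """Return True if robots.txt allows fetching the URL (or the override is on)."""
    if ignore_robot_exclusions:
        return True  # override: skip the exclusion check entirely
    return rp.can_fetch(user_agent, url)

print(may_download("https://example.com/private/page.html"))
```
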
  • darkmoon

    And? That’s part of robot crawling procedures, not readability. So technically you can still download a site to read. That’s what it’s online for.
    That’s not aiding and abetting. And federal copyright lawsuits? You’ve got to be kidding me. You’re downloading a site to read it. You can’t download sites that you aren’t allowed into (say, ones behind .htaccess or other security measures).
    Downloading sites for offline reading has been around since the Dark Ages of the Internet.

  • I think you missed my point. SiteSucker could be used as a tool to commit copyright infringement and therefore could end up in the same hot water as many file-sharing sites have in the past.
    I know sites are put online so they can be read, but they are also pulled offline at the request of publishers so they don’t have to compete with free.
    Case in point: My newest book was online for two years, and the only reason I was able to seal the deal was the fact that it could not be found in any caches.
    SiteSucker can ignore “Do Not Cache” and could become a very valuable tool for scrapers and such.

  • darkmoon

    Point is… caching and offline reading are two different things. Caching doesn’t equate to offline content.
    All offline readers can ignore robots.txt, so what you’re assuming about the technology is incorrect.
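
As a closing technical note on the caching argument above: the “Do Not Cache” behavior mentioned in the thread comes from directives in the Robots META tag, most notably noarchive, which asks search engines not to publish a cached copy of a page; it does not stop the page from being fetched and read. Below is a minimal, hypothetical Python sketch of how a crawler might read those directives; the sample page is made up, and this is not how SiteSucker or Google actually implement it.

```python
# Hypothetical sketch of reading the Robots META tag; the sample page is made up.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            content = attrs.get("content", "")
            self.directives.update(d.strip().lower() for d in content.split(","))

page = '<html><head><meta name="robots" content="noindex, noarchive"></head></html>'
parser = RobotsMetaParser()
parser.feed(page)

print("noindex" in parser.directives)   # True: don't list the page in search results
print("noarchive" in parser.directives) # True: don't offer a cached copy of it
```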