Wednesday, January 26, 2022

using wget to mirror a website

To download all the files on a website, I first tried WebHTTrack, but it kept downloading files from other domains even after I went to the expert options and chose to stay on the same domain. Maybe I should have tried the filters, as in this forum post. Anyway, I found a simpler method using wget, so I just ran

wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains our.domain \
     our.domain/link/to/index.html
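
For reference, here is a rough annotated sketch of the same command, with the options held in a bash array so each one can carry a comment. The flag descriptions are my own short summaries of the wget manual, `our.domain` and the path are the same placeholders as above, and the `echo` is only there so the command gets printed rather than run.

```shell
#!/usr/bin/env bash
# Same options as the command above, one per line with a short note.
opts=(
  --recursive                    # follow links and download the pages they point to
  --no-clobber                   # skip files that already exist locally
  --page-requisites              # also fetch the images, CSS and scripts each page needs
  --html-extension               # save HTML pages with an .html suffix (newer wget names this --adjust-extension)
  --convert-links                # rewrite links in the saved pages so they work offline
  --restrict-file-names=windows  # escape characters that are not valid in Windows file names
  --domains our.domain           # only follow links within this domain list (placeholder domain)
)

# Print the assembled command; drop the echo to actually start the download.
echo wget "${opts[@]}" our.domain/link/to/index.html
```

Some wget versions warn that --no-clobber is ignored when --convert-links is also given, so it may be worth checking the output of the first run.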

         
