Before we get into how to download a website, here are some basics that you should know. I would broadly classify websites into three categories: static, dynamic, and hybrid.
Dynamic websites are the traditional kind, where the data comes from a database and is rendered in your browser. Such a website usually has a backend where an administrator can add or update the content through a user interface. Since everything is loaded on demand, pages might load more slowly than on static websites.
So, coming back to the original question: how can you download a website or make an offline copy of it?
Well, it can be done easily for most websites. It’s straightforward for a “static” website, since the content does not change that often.
It can be tricky for the “dynamic” and “hybrid” kinds if the content keeps changing. Your downloaded copy might quickly become outdated, and then it won’t be of much use.
There are various programs that you can use to download a copy of a website. Since I like to use my terminal, here is a simple command that creates a mirror using “wget”.
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org
Explanation of the various flags:
– --mirror: Makes (among other things) the download recursive.
– --convert-links: Converts all links (including those to resources such as CSS stylesheets) to relative ones, so the copy is suitable for offline viewing.
– --adjust-extension: Adds suitable extensions to filenames (HTML or CSS) depending on their content type.
– --page-requisites: Downloads things like CSS stylesheets and images that are required to display the page properly offline.
– --no-parent: When recursing, does not ascend to the parent directory. This is useful for restricting the download to only a portion of the site.
Alternatively, the command above may be shortened:
wget -mkEpnp http://example.org
Note: The last p is part of np (--no-parent), which is why you see p twice in the flags.
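If you mirror sites regularly, the long-form command could be wrapped in a small shell function. The sketch below is only an illustration and is not taken from the referenced post: the function name mirror_site, the WGET override variable, and the default ./mirror folder are hypothetical names chosen here. It also assumes two extra wget options that the command above does not use: --wait=1 to pause one second between requests, and --directory-prefix to choose where the files are saved.

```shell
#!/usr/bin/env bash

# mirror_site URL [DEST_DIR]
# Hypothetical helper (not part of wget): runs the long-form mirror
# command from this post, plus two assumed extras (--wait, --directory-prefix).
mirror_site() {
  local url="${1:-}"
  local dest="${2:-./mirror}"   # assumed default output folder

  if [ -z "$url" ]; then
    echo "usage: mirror_site URL [DEST_DIR]" >&2
    return 1
  fi

  # WGET can be set to another program (for example "echo") to preview
  # the arguments without downloading anything.
  "${WGET:-wget}" \
    --mirror \
    --convert-links \
    --adjust-extension \
    --page-requisites \
    --no-parent \
    --wait=1 \
    --directory-prefix="$dest" \
    "$url"
}
```

After loading the function into your shell, running mirror_site http://example.org ./example-copy would start the download into ./example-copy. Setting WGET=echo first prints the arguments instead, which is a cheap way to check the command before hitting a real server.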
Reference: https://www.guyrutenberg.com/2014/05/02/make-offline-mirror-of-a-site-using-wget/ (I am not sure whether this blog is still maintained)
Also published on Medium.