How do I download a website?

Before we get into how to download a website, here are some basics you should know. I would broadly classify any website into 3 categories.

  1. Static
  2. Dynamic
  3. Hybrid

Static Website

These are simple websites built with HTML, CSS and JavaScript. The only way to update such a website is to edit the HTML files and upload them to your hosting again.

Dynamic Website

These are basically traditional websites, where the content is fetched from a database and rendered in your browser. Such websites usually have a backend where an administrator can add or update the content through a user interface. Since everything is loaded on demand, pages may load more slowly than on static websites.

Hybrid Website

This approach combines the best of static and dynamic websites. A skeleton page, which may include some static content, is loaded first, and the dynamic parts are then loaded via AJAX requests. There are lots of modern JavaScript frameworks that let you build such applications very easily.

So, coming back to the original question: how can you download a website, or make an offline copy of one?

Well, it can be done easily for most websites. It’s straightforward for a “static” website, since the content does not change that often.

It can be tricky for the “dynamic” and “hybrid” kinds if the content keeps changing: your downloaded copy might quickly become outdated and hence of little use.
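For such sites, one simple workaround is to refresh the mirror on a schedule so the copy stays reasonably current. A cron entry like the following re-runs the mirror nightly (the schedule and the target path are just example choices of mine, not anything prescribed):

```shell
# Add with `crontab -e`: re-mirror the site every night at 02:00.
# Adjust the path and URL to your own setup.
0 2 * * * wget --mirror --convert-links --adjust-extension --page-requisites --no-parent -P /home/user/mirrors http://example.org
```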

There are various software tools you can use to download a copy of a website. Since I like to use my terminal, here is a simple command to create a mirror using the “wget” command.

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org

Explanation of the various flags:

- --mirror – Makes (among other things) the download recursive.
- --convert-links – Converts all the links (including those to things like CSS stylesheets) to relative ones, so the copy is suitable for offline viewing.
- --adjust-extension – Adds suitable extensions to filenames (HTML or CSS) depending on their content type.
- --page-requisites – Downloads things like CSS stylesheets and images required to properly display the page offline.
- --no-parent – When recursing, do not ascend to the parent directory. It is useful for restricting the download to only a portion of the site.

Alternatively, the command above may be shortened:

wget -mkEpnp http://example.org
Note that the last p is part of np (--no-parent), which is why p appears twice in the flags.
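The short and long spellings are equivalent; if in doubt, wget’s own help output shows which long option each short flag maps to:

```shell
# Confirm the long-option names behind -m, -k, -E, -p and -np.
wget --help | grep -E -- '--mirror|--convert-links|--adjust-extension|--page-requisites|--no-parent'
```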

Reference: https://www.guyrutenberg.com/2014/05/02/make-offline-mirror-of-a-site-using-wget/ (I am not sure whether this blog is still maintained.)


Also published on Medium.
