What Is GNU wget and How Do You Use It?
GNU wget is a non-interactive command-line
utility for downloading files from the internet over widely used
protocols such as HTTP, HTTPS, and FTP. Unlike a web browser,
wget can run unattended in the background,
so users can start large data transfers, mirror entire websites,
and resume interrupted downloads without keeping a session open. This
article gives an overview of wget,
covering its core features, essential command syntax, and practical
automation examples for terminal users.
Core Features and Capabilities
The versatility of wget makes it a staple tool
for system administrators, developers, and data analysts. Its primary
capabilities include:
* Background Operation: Because wget is non-interactive, a download can
  keep running after the user logs out of the system, and the -b option
  sends it to the background right after startup. This makes it well
  suited to long-running transfers.
* Robust Resuming: If a download is cut off by network instability,
  wget can reconnect and, on servers that support it, continue the
  transfer from where it left off.
* Recursive Downloading: It can follow links in HTML and XHTML pages to
  create local copies of remote directory structures.
* Bandwidth Control: Users can limit download speeds with the
  --limit-rate option to prevent wget from consuming all available
  network bandwidth.
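The background and bandwidth features above can be combined in one call. The following is a minimal sketch, not a canonical recipe: the function name fetch_in_background, the log file name download.log, and the 500k default rate are illustrative assumptions.

```shell
# Hypothetical helper: start a rate-limited download in the background.
fetch_in_background() {
    url="$1"
    rate="${2:-500k}"   # assumed default cap; k means kilobytes per second
    # -b            go to the background right after startup
    # -o FILE       write wget's messages to FILE instead of the default wget-log
    # --limit-rate  cap the average download speed
    wget -b -o download.log --limit-rate="$rate" "$url"
}
```

Calling fetch_in_background https://example.com/large_dataset.tar.gz returns control to the terminal right away, and progress can then be followed with tail -f download.log.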
Basic Syntax and Common Commands
The fundamental syntax for wget is straightforward:
wget [options] [URL]. Without any additional arguments, the
tool downloads the resource specified by the URL directly into the
current working directory.
To download a specific file:
wget https://example.com/file.zip
To save a downloaded file under a different name, use the
-O (output document) option:
wget -O custom_name.zip https://example.com/file.zip
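The -O option also accepts - as the file name, which sends the document to standard output so it can be piped into another program. A small sketch of that pattern follows; the function name fetch_to_stdout is a hypothetical wrapper chosen for illustration.

```shell
# Hypothetical helper: print a remote document on standard output.
fetch_to_stdout() {
    url="$1"
    # -q    suppress wget's own progress messages
    # -O -  write the downloaded document to standard output
    wget -q -O - "$url"
}
```

For example, fetch_to_stdout https://example.com/file.zip > custom_name.zip has the same effect as the -O custom_name.zip form shown above.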
To resume a partially completed download, use the -c
(continue) option:
wget -c https://example.com/large_dataset.tar.gz
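On an unreliable connection, -c is often paired with wget's retry options so that a single command keeps trying until the file is complete. The sketch below assumes illustrative values (10 tries, a 5 second retry ceiling, a 30 second timeout), and the function name resume_download is hypothetical.

```shell
# Hypothetical helper: resume a download and retry on transient failures.
resume_download() {
    url="$1"
    # -c             continue a partially downloaded file
    # --tries=10     give up after 10 attempts
    # --waitretry=5  back off between retries, waiting up to 5 seconds
    # --timeout=30   treat a network operation as failed after 30 seconds
    wget -c --tries=10 --waitretry=5 --timeout=30 "$url"
}
```

Resuming only works when the server supports partial transfers. Otherwise wget has to fetch the file from the beginning again.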
Advanced Web Scraping and Mirroring
One of the most useful features of wget is its ability
to mirror entire websites for offline viewing. The
-m (mirror) flag turns on recursive downloading with unlimited
recursion depth, enables time-stamping so that unchanged files are not
fetched again, and keeps FTP directory listings.
To create a fully functional local copy of a website with converted
links for offline browsing, you can combine several options:
wget --mirror --convert-links --adjust-extension --page-requisites https://example.com
This command rewrites links in the downloaded pages so that they point to the local copies, appends suffixes such as .html to saved files that lack them, and downloads the assets each page needs to render, such as images and stylesheets.
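When mirroring a site you do not control, it is usually worth restricting the crawl and pacing the requests. The following sketch extends the command above under those assumptions. The function name mirror_site and the one second delay are illustrative choices, not recommendations from the wget documentation.

```shell
# Hypothetical helper: mirror a site section while limiting load on the server.
mirror_site() {
    url="$1"
    # --no-parent    never ascend above the starting directory
    # --wait=1       pause one second between retrievals
    # --random-wait  vary that pause so requests are less regular
    wget --mirror --convert-links --adjust-extension --page-requisites \
         --no-parent --wait=1 --random-wait "$url"
}
```

A call such as mirror_site https://example.com/docs/ would then copy only the pages under /docs/ rather than the whole site.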
Additional Resources
For more in-depth tutorials, advanced use cases, and technical guides on wget, see the articles at https://salivity.github.io/wget.