Downloading and Decompressing an Archive in One Command

TL;DR

Use untar_url.sh:

untar_url <archive_url> <dest_dir> [<nb_components_to_strip>]

Introduction: The Usual Solution

This is an old trick that combines wget and tar by piping the output of the former command to the input of the latter.

wget -qO- <file.tar.gz> | tar xzf -

A typical output would be something like:

$ wget -qO- <file.tar.gz> | tar xzf -
N.O.T.H.I.N.G.

And that’s my issue with this solution: It doesn’t provide the user with any feedback.

(Of course, one could pass the v option to tar to list the files as they are extracted. But even then, you can't estimate the overall progress.)
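
For reference, that verbose variant is simply:

wget -qO- <file.tar.gz> | tar xzvf -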

A Better Approach: Enter “pv”

From the manual:

pv allows a user to see the progress of data through a pipeline, by giving information such as time elapsed, percentage completed … and ETA.

To use it, insert it in a pipeline between two processes, … Its standard input will be passed through to its standard output and progress will be shown on standard error.

Well, that’s convenient! So, basically, we should be fine using pv between wget and tar:

wget -qO- <archive_url> | pv | tar xzf -

For example, to download and decompress the latest version of CLion, I would type:

$ wget -qO- 'http://download.jetbrains.com/cpp/CLion-171.3019.8.tar.gz' | pv | tar xzf -
1,27MiB 0:00:09 [ 193KiB/s] [            <=>                                                                  ]

Neat! But where is the ETA? And why is the progress bar not showing the actual progress?

Well, the ETA is simply the remaining data size divided by the data rate. And even though pv can measure the data rate (193KiB/s in the example above), it won't know the total size until the very end, when the pipe is closed.
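
For instance, if pv knew the total size was 33.2MiB and measured a rate of 193KiB/s, it could estimate roughly 33.2 × 1024 / 193 ≈ 176 seconds remaining (about three minutes). Without the total size, no such estimate is possible.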

It turns out that with the right combination of wget, grep and awk, the file size can be obtained before downloading the file:

$ wget --spider <archive_url> 2>&1 | grep Length | awk '{print $2}'
<archive size in bytes>
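
To see why this works, note that the relevant line of wget's --spider output looks something like the following (the size shown here is illustrative):

Length: 288974506 (276M) [application/x-gzip]

grep keeps only that line, and awk '{print $2}' extracts its second whitespace-separated field: the size in bytes.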

Also, according to the manual, pv accepts the -s SIZE (long form --size SIZE) option which tells it to assume that the total size of the data is SIZE.
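
A quick way to see what -s changes, independently of any download (both byte counts here are arbitrary, and -L merely rate-limits the transfer so the bar stays visible for a few seconds):

$ head -c 10000000 /dev/urandom | pv -s 10000000 -L 1000000 > /dev/null

With the size known up front, pv draws a real progress bar with a percentage and an ETA, instead of the indeterminate <=> marker seen earlier.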

Putting It All Together

The following listing shows untar_url(), a simple Bash function that:

  1. Gets the file size, which will be passed to pv
  2. Creates the folder into which the archive will be decompressed
  3. Pipes the output of wget to pv and then to tar, as shown previously

# Usage: untar_url <archive_url> <dest_dir>
function untar_url()
{
    local archive_url="$1"
    local dest_dir="$2"

    # Ask the server for the archive's size (in bytes) without downloading it
    local file_size="$(wget --spider "${archive_url}" 2>&1 | grep Length | awk '{print $2}')"

    mkdir -p "${dest_dir}"

    # Stream the download through pv (which now knows the total size) into tar
    wget -qO- "${archive_url}" | pv -s "${file_size}" | tar xzf - -C "${dest_dir}"
}
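
The Gist version mentioned in the TL;DR additionally accepts an optional third argument: the number of leading path components to strip from the extracted entries. A minimal sketch of such an extension (not necessarily identical to the Gist's implementation) relies on GNU tar's --strip-components option:

# Usage: untar_url <archive_url> <dest_dir> [<nb_components_to_strip>]
function untar_url()
{
    local archive_url="$1"
    local dest_dir="$2"
    local nb_strip="${3:-0}"    # optional; defaults to stripping nothing
    local file_size="$(wget --spider "${archive_url}" 2>&1 | grep Length | awk '{print $2}')"

    mkdir -p "${dest_dir}"
    wget -qO- "${archive_url}" | pv -s "${file_size}" \
        | tar xzf - -C "${dest_dir}" --strip-components="${nb_strip}"
}

With a third argument of 1, an archive whose entries all live under a single top-level folder (as CLion's does) is extracted directly into dest_dir.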

Usage example:

$ untar_url 'http://download.jetbrains.com/cpp/CLion-171.3019.8.tar.gz' /opt/jetbrains/clion
33,2MiB 0:02:01 [ 291KiB/s] [======>                                                         ] 11% ETA 0:15:12

$ ls -aF /opt/jetbrains/clion
bin/  help/  jre/  lib/  license/  plugins/  ./  ../  build.txt*  Install-Linux-tar.txt*

Victory!

Conclusion

pv is very convenient, and this article only scratches the surface of its potential. You can find out more about pv at Tecmint and nixCraft.

Also, check out the Gist of untar_url.sh for an updated and tweaked version of untar_url().
