Why are tar and gzip almost always used together, and not just gzip? Is there any advantage to that method?


TAR creates a single archived file out of many files, but does not compress them.

Format Details

A tar file is the concatenation of one or more files. Each file is preceded by a 512-byte header record. The file data is written unaltered except that its length is rounded up to a multiple of 512 bytes and the extra space is zero filled. The end of an archive is marked by at least two consecutive zero-filled records.
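The record layout described above can be observed directly. This is a minimal sketch using Python's standard-library `tarfile` module (standing in for the `tar` command) to build a one-member archive and check the 512-byte structure:

```python
# Verify the tar layout described above: a 512-byte header per member,
# data padded to a 512-byte boundary, and zero-filled records at the end.
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = b"hello"                                 # 5 bytes of member data
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

raw = buf.getvalue()
assert len(raw) % 512 == 0                          # whole number of records
assert raw[0:100].rstrip(b"\0") == b"hello.txt"     # header begins with the name
assert raw[512:517] == b"hello"                     # data follows the 512-byte header
assert raw[517:1024] == b"\0" * 507                 # zero-padded to the next record
assert raw[-1024:] == b"\0" * 1024                  # end marker: consecutive zero records
```

Note there is no compression anywhere: the member data appears verbatim in the archive.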

GZIP compresses a single file into another single file, but does not create archives.

File Format

...Although its file format also allows for multiple such streams to be concatenated (gzipped files are simply decompressed concatenated as if they were originally one file), gzip is normally used to compress just single files. Compressed archives are typically created by assembling collections of files into a single tar archive, and then compressing that archive with gzip.
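The concatenation property quoted above is easy to demonstrate; a short sketch using Python's standard `gzip` module:

```python
# Two independent gzip streams glued back to back decompress as one file,
# exactly as the format description above says.
import gzip

part1 = gzip.compress(b"Hello, ")
part2 = gzip.compress(b"world!")
combined = part1 + part2          # two complete gzip members, concatenated

# gzip.decompress is capable of decompressing multi-member input.
assert gzip.decompress(combined) == b"Hello, world!"
```

This is also why `cat a.gz b.gz > c.gz` produces a valid gzip file on the command line.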

  • There is no such thing as a "tgz" file. It is a tar.gz. The job of gzip is to zip or unzip its content (in this case, a tar archive). Then you unarchive it with tar. It is typical Unix pipelining philosophy, and thus hardly unique. – luis.espinal Mar 2 '11 at 11:00
  • No, .tar.gz isn't unique: .tar.bz2, .cpio.gz, etc. work the same way. – user46971 Mar 2 '11 at 13:55
  • @user36310 I know what you mean in principle, but in practice the tools let you extract a single file: tar -xzvf tarball.tar.gz single/file.txt. Behind the scenes it needs to do some extra work, but to all appearances it extracts a single file. – Rich Homolka Mar 2 '11 at 18:44
  • Make that "a lot of extra work" if the file is at the end of a large archive. Clearly, if you need random access, zip/rar/xar/7z/lzh/arj/cab/sit etc. are superior formats. – LaC Apr 20 '11 at 11:19
  • To be precise, a .tar.* compressed archive is always “solid”, i.e. it consists of a single compressed stream. A .zip archive, on the other hand, is not solid at all: the compression algorithm is started anew for each file. It sacrifices compression efficiency to speed up random access. .7z archives can be solid, non-solid, or have solid blocks. – Daniel B May 28 '16 at 19:32

Gzip and bzip2 are stream compressors: they compress a stream of data into something smaller. They can be applied to individual files, but on their own they cannot bundle groups of files.
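"Stream compressor" means the data can be fed in piece by piece, with no notion of files or directories at all. A minimal sketch using Python's `zlib` (where `wbits=31` selects the gzip framing):

```python
# gzip as a stream compressor: chunks go in, compressed bytes come out.
# The compressor only ever sees a stream of bytes, never files.
import gzip
import zlib

chunks = [b"first chunk, ", b"second chunk, ", b"third chunk"]

comp = zlib.compressobj(wbits=31)    # wbits=31 => gzip header and trailer
out = b"".join(comp.compress(c) for c in chunks) + comp.flush()

assert gzip.decompress(out) == b"".join(chunks)
```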

Tar on the other hand has the ability to turn a list of files, with paths, permissions and ownership information, into a single continuous stream - and vice versa.

That's why, to archive files (and if one needs compression as well), one usually uses tar + some compression method.
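The tar-then-compress combination is directly visible in Python's `tarfile` module, whose `"w:gz"` mode streams the tar output straight through gzip. A sketch showing that paths and permission bits survive because they live in the tar headers, not in gzip:

```python
# tar + gzip in one step: tarfile writes tar headers (name, path, mode,
# ownership) and pipes the whole stream through gzip.
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="bin/tool.sh")
    info.mode = 0o755                     # permissions travel in the tar header
    data = b"#!/bin/sh\necho hi\n"
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    member = tar.getmember("bin/tool.sh")
    assert member.mode == 0o755           # metadata round-trips intact
```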


Tar is in charge of doing one and only one thing well: archiving into (and unarchiving out of) a single archive file. Of what? Of one and only one thing: a set of files.

Gzip is in charge of doing one and only one thing well: (un)compressing. Of what? Of one thing and one thing only: a single file of any type... and that includes a file created with tar.

It goes back to the UNIX philosophy of pipelining; the underlying "pipe and filters" architecture; the treatment of everything as a file; and the sound architectural goal of "one thing does one thing only, and does it well" (which results in a very elegant and simple plug-n-play of sorts).

In its simplicity, it is almost algebraic in nature (a hefty goal in systems design). And that is no easy feat.

In many ways (and not without its flaws), this is almost a pinnacle of composability, modularity, loose coupling and high cohesion. If you understand these four (and I mean really understand them), it will be obvious why tar and gzip work like that, in pairs.

  • This UNIX philosophy is beautiful all right, but I'm observing that it falls short in creating non-solid archives. (Extracting a single file from a 1 GB tar.gz shouldn't be a pain, and from what I've understood here, ZIP is pretty much superior to tar.gz... right?) – user541686 Mar 2 '11 at 17:06
  • @Mehrad - First, what is a non-solid archive? As attributed to Voltaire, "If you would converse with me, first you must define your terms." Second, yes, the pipe and filters architecture falls short in specific cases, just like any other architecture, regardless of its beauty. That's a given with a modicum of engineering, and it is not the argument that is being made. Third, zip is superior to gzip and tar, but that was not what you asked. You asked why gzip and tar work the way they do and whether there were any advantages, and you were given a technical answer. – luis.espinal Mar 2 '11 at 20:36
  • @Mehrad - Also, I don't know what kind of problems you encounter when unzipping|untarring a 1 GB tar.gz file. I've done that many times, up to 2 GB with older installs of gzip (and up to 4 GB with newer versions of gzip). If you are doing it over the wire or on an NFS mount, then duh! You'll encounter similar performance problems as if done with plain zip. Heck, I've even untarred from a pipe to a remote process spitting gzip input into a socket. Try that with zip. For each problem, use the appropriate tool (be it tar|gz or zip). – luis.espinal Mar 2 '11 at 20:41

First of all, tar wasn't created to make file archives. It's the Tape ARchiver: its job is to write an archive out to tape, or load one back in.

The -f option makes it use a file as a "virtual tape", which can then be compressed by another program. In fact, such compression happens on real-world tape drives as well.

Of course, the philosophy of having one program do one thing well counts in this case too, but without the tape background one might miss why tar archives are structured as a stream instead of as a directory of contents plus the contents.

  • Right... ZIP files put all the file info in a unified header, then all the file contents... that makes it impossible to append more files to a ZIP file... you have to rewrite the whole file... with the TAR format, the header for each file is separate, so you can easily append more files without rewriting the whole tape – JoelFan Mar 2 '11 at 19:57
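The append behaviour mentioned in the comment can be sketched with Python's `tarfile`, whose `"a"` mode adds members to an existing (uncompressed) archive without rewriting it:

```python
# Appending to a tar archive: each member carries its own header, so new
# members are simply written before the end-of-archive marker.
import io
import os
import tarfile
import tempfile

def add(tar, name, data):
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

path = os.path.join(tempfile.mkdtemp(), "demo.tar")

with tarfile.open(path, mode="w") as tar:       # create with one member
    add(tar, "a.txt", b"one")

with tarfile.open(path, mode="a") as tar:       # append; no rewrite of a.txt
    add(tar, "b.txt", b"two")

with tarfile.open(path) as tar:
    names = tar.getnames()
assert names == ["a.txt", "b.txt"]
```

Note that appending only works on the plain tar stream; once the archive has been gzipped, you must decompress, append, and recompress.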

Traditionally, Unix systems used one program to perform one task, per the Unix philosophy: tar was just a means to package multiple files into a single file, originally for tape backup (hence tar, tape archive). tar does not provide compression; the resulting uncompressed archive is typically compressed with some other program such as gzip, bzip2, or xz. In the old days, the compress command was used for this; newer compression programs are much more effective.

The highly modularized approach dictated by the Unix philosophy means that each program can be used individually as appropriate, or combined to perform more complex tasks, including the creation of compressed archives as described here. For these sorts of tasks, it also makes it easy to swap out individual tools as needed; you'd just change the compression program to use a different compression algorithm, without having to replace the tar utility itself.
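The swap-out point can be sketched with Python's `tarfile`, where the archive-building code is identical for every compressor and only the mode string (the "compression program") changes:

```python
# Same tar stream, three interchangeable outer compressors.
import io
import tarfile

def make_archive(mode):
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        data = b"same payload"
        info = tarfile.TarInfo(name="payload.bin")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

archives = {m: make_archive(m) for m in ("w:gz", "w:bz2", "w:xz")}

# Each result is just a different container around the same tar stream,
# identifiable by the compressor's magic bytes.
assert archives["w:gz"][:2] == b"\x1f\x8b"       # gzip magic
assert archives["w:bz2"][:3] == b"BZh"           # bzip2 magic
assert archives["w:xz"][:6] == b"\xfd7zXZ\x00"   # xz magic
```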

This modular approach is not without its disadvantages. As mentioned in comments to other answers, a dedicated compressed archive format like .zip is better able to handle extraction of individual files; compressed tarballs need to be decompressed almost in their entirety in order to extract files near the end of the archive, while .zip archives allow random access to their contents. (Some newer formats, such as .7z, support solid and non-solid archives, as well as solid blocks of varying size in larger archives.) The continuing use of tar in conjunction with a separate compression utility is a matter of tradition and compatibility; also, .7z and .zip do not support Unix filesystem metadata such as permissions.
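The random-access difference is visible in Python's `zipfile`, which uses the zip central directory to jump straight to a member rather than scanning from the start as a compressed tarball requires:

```python
# Random access in a zip archive: one member is read directly, without
# decompressing the (larger) entries stored before it.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("big/first.txt", "x" * 10000)        # large entry first
    zf.writestr("small/last.txt", "the one we want")  # target entry last

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    content = zf.read("small/last.txt")   # seeks via the central directory
assert content == b"the one we want"
```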

