What Are the Advantages of Using the tar File Format Today?

The tar archiving format is, in computing years, a veritable Methuselah, yet it is still in heavy use today. What makes the tar format so useful long after its inception?

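Today’s Question & Answer session comes to us courtesy of SuperUser, a subdivision of Stack Exchange, a community-driven grouping of Q&A web sites.

The Question

SuperUser reader MarcusJ is curious about the tar format and why we are still using it after all these years: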

I know that tar was made for tape archives back in the day, but today we have archive file formats that both aggregate files and perform compression within the same logical file format.

Questions:

  • Is there a performance penalty during the aggregation/compression/decompression stages for using tar encapsulated in gzip or bzip2, when compared to using a file format that does aggregation and compression in the same data structure? Assume the runtime of the compressor being compared is identical (e.g. gzip and Deflate are similar).
  • Are there features of the tar file format that other file formats, such as .7z and .zip do not have?
  • Since tar is such an old file format, and newer file formats exist today, why is tar (whether encapsulated in gzip, bzip2 or even the new xz) still so widely used today on GNU/Linux, Android, BSD, and other such UNIX operating systems, for file transfers, program source and binary downloads, and sometimes even as a package manager format?

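That is a perfectly reasonable question; the computing world has changed a great deal over the last thirty years, yet we are still using the tar format. What is the story?

The Answer

SuperUser contributor Allquixotic offers some insight into the longevity and functionality of the tar format: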

Part 1: Performance

Here is a comparison of two separate workflows and what they do.

You have a file on disk blah.tar.gz which is, say, 1 GB of gzip-compressed data which, when uncompressed, occupies 2 GB (so a compression ratio of 50%).

The way that you would create this, if you were to do archiving and compression separately, would be:

tar cf blah.tar files ...

This would result in blah.tar which is a mere aggregation of the files ... in uncompressed form.

Then you would do

gzip blah.tar

This would read the contents of blah.tar from disk, compress them through the gzip compression algorithm, write the contents to blah.tar.gz, then unlink (delete) the file blah.tar.
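
The two steps above can also be chained through a pipe, so the intermediate blah.tar never lands on disk. What follows is a minimal sketch, assuming a POSIX shell with tar and gzip installed; the scratch directory and the file names a.txt, b.txt, and piped.tar.gz are hypothetical and exist only for the demonstration.

```shell
# Work in a throwaway directory (all file names here are hypothetical).
workdir=$(mktemp -d)
cd "$workdir"
mkdir files
printf 'hello\n' > files/a.txt
printf 'world\n' > files/b.txt

# Two-step form from the text: archive first, then compress in place.
tar cf blah.tar files
gzip blah.tar            # leaves blah.tar.gz and removes blah.tar

# One-step form: tar writes the archive to stdout ("-") and gzip compresses the stream.
tar cf - files | gzip > piped.tar.gz

# Both archives should list the same members.
gzip -dc blah.tar.gz  | tar tf - > two_step.txt
gzip -dc piped.tar.gz | tar tf - > one_step.txt
```

Comparing two_step.txt with one_step.txt (for example with cmp) is a quick way to confirm that both routes produced the same member list.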

Now, let’s decompress!

Way 1

You have blah.tar.gz, one way or another.

You decide to run:

gunzip blah.tar.gz

This will

  • READ the 1GB compressed data contents of blah.tar.gz.
  • PROCESS the compressed data through the gzip decompressor in memory.
  • As the memory buffer fills up with “a block” worth of data, WRITE the uncompressed data into the file blah.tar on disk and repeat until all the compressed data is read.
  • Unlink (delete) the file blah.tar.gz.

Now, you have blah.tar on disk, which is uncompressed but contains one or more files within it, with very low data structure overhead. The file size is only slightly larger than the sum of all the file data, because tar adds a 512-byte header per entry and pads each file to a 512-byte boundary.

You run:

tar xvf blah.tar

This will

  • READ the 2GB of uncompressed data contents of blah.tar and the tar file format’s data structures, including information about file permissions, file names, directories, etc.
  • WRITE to disk the 2GB of data plus the metadata. This involves: translating the data structure / metadata information into creating new files and directories on disk as appropriate, or rewriting existing files and directories with new data contents.

The total data we READ from disk in this process was 1GB (for gunzip) + 2GB (for tar) = 3GB.

The total data we WROTE to disk in this process was 2GB (for gunzip) + 2GB (for tar) + a few bytes for metadata = about 4GB.

Way 2

You have blah.tar.gz, one way or another.

You decide to run:

tar xvzf blah.tar.gz

This will

  • READ the 1GB compressed data contents of blah.tar.gz, a block at a time, into memory.
  • PROCESS the compressed data through the gzip decompressor in memory.
  • As the memory buffer fills up, it will pipe that data, in memory, through to the tar file format parser, which will read the information about metadata, etc. and the uncompressed file data.
  • As the memory buffer fills up in the tar file parser, it will WRITE the uncompressed data to disk, by creating files and directories and filling them up with the uncompressed contents.
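
The streaming steps in this list can be written out as an explicit pipe, which makes it visible that no intermediate .tar file is created. This is a sketch only, assuming a POSIX shell with tar and gzip; the scratch directory, the sample file, and the output directories out_pipe and out_z are hypothetical.

```shell
# Build a small sample archive in a throwaway directory (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir"
mkdir files
printf 'hello\n' > files/a.txt
tar cf - files | gzip > blah.tar.gz

# Explicit pipeline: gzip -dc decompresses to stdout, tar reads the archive from stdin ("-").
mkdir out_pipe
gzip -dc blah.tar.gz | tar xf - -C out_pipe

# Same result with tar driving the gzip decompressor itself via the z flag.
mkdir out_z
tar xzf blah.tar.gz -C out_z
```

Either form reads the compressed file once and writes only the extracted files.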

The total data we READ from disk in this process was 1GB of compressed data, period.

The total data we WROTE to disk in this process was 2GB of uncompressed data + a few bytes for metadata = about 2GB.
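
The totals for the two ways can be tallied side by side. The sketch below only restates the arithmetic from the text in shell; the variable names are made up for illustration, and the few bytes of metadata are ignored.

```shell
compressed_gb=1      # size of blah.tar.gz
uncompressed_gb=2    # size of blah.tar, and of the extracted files

# Way 1: gunzip reads the .gz and writes the .tar; tar then reads the .tar and writes the files.
way1_read=$((compressed_gb + uncompressed_gb))
way1_write=$((uncompressed_gb + uncompressed_gb))

# Way 2: the stream is decompressed and unpacked in memory, so only the ends touch the disk.
way2_read=$compressed_gb
way2_write=$uncompressed_gb

echo "Way 1: read ${way1_read} GB, wrote ${way1_write} GB"   # Way 1: read 3 GB, wrote 4 GB
echo "Way 2: read ${way2_read} GB, wrote ${way2_write} GB"   # Way 2: read 1 GB, wrote 2 GB
```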

If you notice, the amount of disk I/O in Way 2 is identical to the disk I/O performed by, say, the Zip or 7-Zip programs, adjusting for any differences in compression ratio.

And if compression ratio is your concern, use the Xz compressor to encapsulate tar, and you have an LZMA2-compressed tar archive, which is just as efficient as the most advanced algorithm available to 7-Zip :-)

Part 2: Features

tar stores UNIX permissions within its file metadata, and is very well known and tested for successfully packing up a directory with all kinds of different permissions, symbolic links, etc. There are more than a few instances where one might need to glob a bunch of files into a single file or stream, but not necessarily compress it (although compression is useful and often used).
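
As a rough illustration of both points, the sketch below archives a directory that holds an executable script and a symbolic link, lists the stored metadata, and then copies the tree through an uncompressed pipe. It assumes a POSIX shell with a tar that accepts -C and the p (preserve permissions) flag, as GNU tar and bsdtar do; the directory and file names are hypothetical.

```shell
# Sample tree with a non-default mode and a symlink (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir"
mkdir src
printf '#!/bin/sh\necho hi\n' > src/run.sh
chmod 750 src/run.sh
ln -s run.sh src/latest

# Aggregate without compressing, then list the stored modes and the link target.
tar cf demo.tar src
tar tvf demo.tar

# Copy through a pipe with no archive file at all; "p" asks tar to restore the stored permissions.
mkdir out
tar cf - src | tar xpf - -C out
ls -l out/src
```

The listing from tar tvf shows the mode bits and the link target recorded in the archive, and the copy under out/src keeps the script executable and the link intact.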

Part 3: Compatibility

Many tools are distributed in source or binary form as .tar.gz or .tar.bz2 because it is a “lowest common denominator” file format: much like most Windows users have access to .zip or .rar decompressors, most Linux installations, even the most basic, will have access to at least tar and gunzip, no matter how old or pared down. Even Android firmwares have access to these tools.

New projects targeting audiences running modern distributions may very well distribute in a more modern format, such as .tar.xz (using the Xz (LZMA) compression format, which compresses better than gzip or bzip2), or .7z, which is similar to the Zip or Rar file formats in that it both compresses and specifies a layout for encapsulating multiple files into a single file.

You don’t see .7z used more often for the same reason that music isn’t sold from online download stores in brand new formats like Opus, or video in WebM: compatibility with people running ancient or very basic systems.


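Have something to add to the explanation? Sound off in the comments. Want to read more answers from other tech-savvy Stack Exchange users? Check out the full discussion thread here.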

 

  • Published: 2021-04-11 23:20