Smaller Zip archives than normally expected from PHP ZipArchive compression

The zip implementations we tested (PHP 5.3.3 on a Linux distribution, kernel 2.6.31.14-0.8-desktop x86_64) do not compress file names. It follows that in many cases a second compression pass considerably reduced the final archive size, because the full file names are finally compressed! Note : a third compression generally increased the archive size. So inside the Ataox environment we have two ways to generate ultra-compressed files :
  1. Zip-->zip : two zip compressions in a row, using the PHP class ZipArchive.
  2. Concat-->zip : first a concatenation of the files and directories we want to archive, using the File-manager menu option Concatenation / Build, then a standard zip archive. Note that the concatenator generates pseudo .tar files, which are really text files with a fake ".tar" extension to allow easy downloading from any browser.
Surprise : the "concat-->zip" method often (maybe always) compresses better (*) than the "zip-->zip" method.
(*) Many compression / concatenation methods are available and we tested only two.
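The two methods can be tried outside Ataox with a minimal sketch. It is written in Python with the standard zipfile module rather than with the PHP class ZipArchive, and the concatenation format (a name line, a length line, then the content) is a hypothetical stand-in, not the format the Ataox concatenator really writes. The sample files and the names zip_bytes, concat_bytes and compare are assumptions made for illustration only.

```python
import io
import zipfile


def zip_bytes(members):
    """Return a deflate-compressed zip archive, as bytes, of {name: data}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def concat_bytes(members):
    """Hypothetical concatenation format: name line, length line, content."""
    out = io.BytesIO()
    for name, data in members.items():
        out.write(f"{name}\n{len(data)}\n".encode("ascii"))
        out.write(data + b"\n")
    return out.getvalue()


def compare(members):
    """Archive sizes in bytes for one zip pass and for both two-step methods."""
    single = zip_bytes(members)
    return {
        "zip": len(single),
        "zip-->zip": len(zip_bytes({"inner.zip": single})),
        "concat-->zip": len(zip_bytes({"bundle.tar": concat_bytes(members)})),
    }


# Synthetic sample (an assumption): 50 tiny files with long, similar names.
sample = {f"dir/{'n' * 200}_{i:03d}.txt": b"hello\n" for i in range(50)}
first_name = next(iter(sample)).encode("ascii")

# In a single-pass archive each name is stored verbatim twice:
# once in the local file header and once in the central directory.
name_copies = zip_bytes(sample).count(first_name)
sizes = compare(sample)
print(name_copies, sizes)
```

On this kind of sample, where the names weigh far more than the contents, both two-step methods should come out well below the single-pass archive. How the two compare with each other depends on the files, so the sketch prints the sizes rather than assuming a winner.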

Two tests to examine the differences :

Files and directories of the fully uncompressed Ataox distribution 1.2.0 beta (note that the real distribution uses a hybrid compression and is therefore slightly different from our examples) : for the cases "concat-->zip" and "zip-->zip" we obtain testConcat.zip : 1 Mb + 773 Kb + 495 bytes and testZip.zip : 1 Mb + 931 Kb + 714 bytes. The ratio is (1024*1024+931*1024+714) / (1024*1024+773*1024+495) = 1.08802, which means the "zip-->zip" archive is roughly 8.8% larger, or equivalently that "concat-->zip" saves about 8.1% of storage.

A few files and directories with the longest possible names (255 characters of 1 byte each, multibyte characters not tested) : we get 965 bytes for a "zip-->zip" archive and 402 bytes for a "concat-->zip" archive, which gives the following ratio : 965 / 402 = 2.400498, so the "zip-->zip" archive is about 2.4 times larger (a saving of about 58% for "concat-->zip"). The example used is available here with a "zip-->zip" compression.
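The arithmetic of the two tests can be checked with a short Python sketch. The helper name compare_sizes is an assumption made for illustration. It returns both the size ratio and the fraction of storage saved, since the two are easy to confuse: a ratio of 1.088 means the larger archive is 8.8% bigger, which is not the same figure as the saving measured against the larger archive.

```python
KB = 1024
MB = 1024 * KB


def compare_sizes(larger, smaller):
    """Return (size ratio, fraction saved by choosing the smaller archive)."""
    return larger / smaller, (larger - smaller) / larger


# Test 1: sizes as reported, with 1 Mb = 1024 Kb and 1 Kb = 1024 bytes.
test_zip = 1 * MB + 931 * KB + 714       # testZip.zip
test_concat = 1 * MB + 773 * KB + 495    # testConcat.zip
ratio_1, saving_1 = compare_sizes(test_zip, test_concat)

# Test 2: the archives built from files with 255-character names.
ratio_2, saving_2 = compare_sizes(965, 402)

print(f"test 1: ratio {ratio_1:.5f}, saving {saving_1:.1%}")
print(f"test 2: ratio {ratio_2:.6f}, saving {saving_2:.1%}")
```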

The concatenator was not created to save a few percent of storage that gets cheaper every year, but to meet real needs, because :
  • At that time the PHP class ZipArchive did not exist.
  • The classical FTP approach was often heavy and cumbersome to use, a view that still holds today; it is hard to understand why HTTP transfer applications have not replaced FTP as the default for host installation.
  • Some PHP environments have difficulties with the class ZipArchive; we saw this on a PHP 5.6 install, although it could have been a local compilation error. There it was possible to extract a full archive but not to compress a single directory, and in these conditions an available concatenator is very useful.
Remarks :
  • The comparison between a simple concatenator and the current zip action is quite instructive.
  • It may be possible to find a few more advantages to the concatenator in the future.
Conclusion : a first-step concatenation often improves the resulting archive size, but the gain is not enough to justify the extra manipulation, so the concatenator is more likely to be used in niche situations, for example when the standard zip class is not fully available.




Ataox, a CMS adapted for Search Engine Optimization.