Posted December 9, 2020

Halo MCC introduced compression to some of the game files. All of these formats are built on the same concept, applied in different ways. In this topic we're going to discuss the different forms of compression found in a few of the MCC Halo versions, primarily Halo CEA and H2A PC. (Compression in the H1A standalone release for the 360 is different.)

Each of the compression algorithms uses zlib. Generally the file gets broken up into fixed-size chunks; each chunk gets compressed and laid out sequentially, and an offset to the beginning of each chunk is stored in the header. Each algorithm uses a variation of this technique, and I will cover the three main ones:

H1A compression - Used in Halo 1 Anniversary files (*.map, *.s3dpak, *.ipak) * NOTE: due to a recent update, map files are no longer compressed *
H2A compression - Used with Halo 2 Anniversary non-map files (*.pck)
H2Am compression - Used with Halo 2 Anniversary map files (*.map)

I have developed a tool for decompressing both generations of files with a quick right-click shortcut.

H1A Compression

The header for H1A compression is 0x40000 bytes wide (which is also where the first chunk starts). The first 32-bit integer of the header is the chunk count (yellow). Following this is an array of offsets to each chunk, stored as 32-bit integers in sequential order (green).

Each chunk has a maximum size of 0x20000 bytes decompressed (compressed size will vary, because zlib). The first 32-bit integer of the chunk is the decompressed chunk size (shown in yellow; this will usually be 0x20000 unless it's the last chunk). The rest of the data is the zlib-compressed chunk of data.

Theoretically that imposes a file size limit, since we only have enough offsets to fill the header (excluding the first 4 bytes for the chunk count) and each chunk can only be a certain size. We get:

limit = ((header_size - count_size) / offset_size) * chunk_size = ((0x40000 - 4) / 4) * 0x20000

or about 8.5GB.
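The H1A layout described above can be sketched as a small decompressor. This is a sketch based on my reading of this post, not Saber's actual code; names and the little-endian byte order are my assumptions:

```python
import struct
import zlib

H1A_HEADER_SIZE = 0x40000  # header span; the first chunk starts here
H1A_CHUNK_SIZE = 0x20000   # max decompressed size per chunk

def h1a_decompress(data: bytes) -> bytes:
    # First 32-bit integer of the header: chunk count
    chunk_count = struct.unpack_from("<I", data, 0)[0]
    # Followed by an array of 32-bit offsets, one per chunk, in order
    offsets = struct.unpack_from(f"<{chunk_count}I", data, 4)
    out = bytearray()
    for off in offsets:
        # Each chunk starts with its decompressed size (usually 0x20000,
        # smaller for the last chunk), then the zlib stream itself
        size = struct.unpack_from("<I", data, off)[0]
        d = zlib.decompressobj()
        chunk = d.decompress(data[off + 4:])  # stops at the stream's end
        assert len(chunk) == size
        out += chunk
    return bytes(out)
```

Using a `decompressobj` per chunk lets each zlib stream terminate on its own without caring what bytes follow it in the file.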
That 8.5GB figure would be true if the offsets were 64-bit values; however, since the offsets are 32-bit they can only address up to the 4GB mark, so about half of the available header space goes unused, bringing us to a theoretical limit of ~4GB.

H2A Compression

The header for H2A compression is 0x600000 bytes wide (also where the first chunk starts). The first 32-bit integer of the header is the chunk count (yellow). The second 32-bit number is a bitmask that tells whether the file is compressed (red). Following this is an array of offsets to each chunk, stored as 64-bit integers in sequential order (green).

For H2A there is a flag that tells the engine the file is uncompressed (red). This is used in shared.pck (nobody's got time to decompress a 12GB file every time they load a map). The file still gets broken up into "chunks" in the header; however, the file itself is uncompressed. This is a 32-bit bitmask stored right after the chunk count (uncompressed = 4).

Each chunk has a maximum size of 0x8000 bytes decompressed (compressed size will vary, because zlib). One major difference between H1A and H2A compression is that the chunk size is not stored in the H2A chunk. Each offset points straight to a zlib-compressed chunk of data. (78 01 is a valid zlib header.)

Theoretical file size limit:

limit = ((header_size - count_size) / offset_size) * chunk_size = ((0x600000 - 8) / 8) * 0x8000

or about 25GB. This time the available space can actually be used; however, actual amounts will vary depending on zlib compression ratios and such.
H2Am Compression

The header for H2Am compression is a bit more unique than the rest. At first glance in a hex editor, it would look like the file is uncompressed. This is because the maps are compressed after their Blam! header. The header for H2A maps is actually larger than H2 Classic maps, coming in at around 0x1000 bytes wide. After the Blam! header there is 0x2000 bytes of (chunk size, chunk offset) pairs. There is no chunk count for H2Am; instead, you read until you get to your first (0, 0) pair.

Chunks can be uncompressed; uncompressed chunks have their size negated, so each chunk should be read using the absolute value of the chunk size. (Notice the footer for the header highlighted in orange.) Each chunk has a maximum size of 0x40000 bytes decompressed (compressed size will vary, because zlib). The "header" has a size of 0x2000 (header in quotes because this time it comes after the Blam! header). And like H2A compression, it doesn't store a chunk size inside the chunk; it goes straight into the zlib header.

Theoretical file size limit, with each (size, offset) pair taking 8 bytes:

limit = (header_size / pair_size) * chunk_size = (0x2000 / 8) * 0x40000

or about 268MB (+0x1000 for the Blam! header).

Something else to note is that each chunk is aligned to the nearest 0x80 offset (either 0x______80 or 0x______00). In hex, this is the same concept as stepping in increments of 50 in decimal: the last two digits will always be either 50 or 00 regardless of the higher significant digits. Halo 2 maps are aligned as well, so this may have just been a natural progression of the existing architecture. It is untested whether these constraints need to be maintained.

EDIT: Some software uses a variation on the compression algorithms. MCC does not verify the first chunk offsets, which means we could leave the superfluous 0's out of the header. For example, Invader only writes the offsets in the header needed for the file, meaning the chunks start right after the last offset. This can reduce the header size significantly. This is not required for understanding; however, you may come across this variant in your hex adventures. When writing algorithms, I feel it is good practice to continue this way.
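The H2Am chunk table walk can be sketched like so. Assumptions of mine: 32-bit little-endian fields, the size field being the on-disk byte count of the chunk, and a Blam! header of 0x1000 bytes as stated above; note the EDIT below about this format having since changed:

```python
import struct
import zlib

BLAM_HEADER_SIZE = 0x1000   # H2A map header, per the post
TABLE_SIZE = 0x2000         # (chunk size, chunk offset) pairs
H2AM_CHUNK_SIZE = 0x40000   # max decompressed size per chunk

def h2am_read_chunks(data: bytes):
    """Yield each decompressed chunk of an H2Am map."""
    table_end = BLAM_HEADER_SIZE + TABLE_SIZE
    for pos in range(BLAM_HEADER_SIZE, table_end, 8):
        # Signed size, unsigned offset; a negative size marks a chunk
        # that is stored uncompressed
        size, offset = struct.unpack_from("<iI", data, pos)
        if size == 0 and offset == 0:
            break  # no chunk count; the first (0, 0) pair ends the table
        if size < 0:
            yield data[offset:offset + (-size)]  # raw bytes, abs(size) long
        else:
            # Compressed chunk: the offset points straight at a zlib stream
            yield zlib.decompress(data[offset:offset + size])
```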
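The truncated-header variant from the EDIT above can be illustrated with an H1A-style writer. This is a sketch under the same assumptions as before (my names, little-endian fields), showing only the idea that the chunks start right after the last offset instead of at 0x40000:

```python
import struct
import zlib

def h1a_compress_truncated(payload: bytes, chunk_size: int = 0x20000) -> bytes:
    """Write an H1A-style file Invader-style: emit only the offsets the
    file actually needs, so the first chunk begins right after them."""
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)]
    # Truncated header: 4-byte count plus one 32-bit offset per chunk
    header_size = 4 + 4 * len(chunks)
    blobs, offsets, pos = [], [], header_size
    for c in chunks:
        # Keep the per-chunk layout: decompressed size, then zlib data
        blob = struct.pack("<I", len(c)) + zlib.compress(c)
        offsets.append(pos)
        blobs.append(blob)
        pos += len(blob)
    header = struct.pack(f"<I{len(chunks)}I", len(chunks), *offsets)
    return header + b"".join(blobs)
```

Since MCC doesn't verify where the first chunk offset lands, a reader that follows the offsets in the header handles both this variant and the full 0x40000-byte header the same way.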
EDIT 2: Fixed some information relating to H2Am compression.

EDIT 3: They changed how H2Am map compression works. I need to look into the format; however, until the next update assume the information for H2Am is antiquated.

Specifications: S3dpak - format - Imeta/ipak - format - Fmeta - format
Programs: H2a-inflate - SuP
Posted June 18, 2021 (edited)

Holy HECK, I made a huge mistake on this. H2A chunks are not chunk count > offsets. They're (chunk size, chunk) x 500. I even wrote it right in QuiCript; I guess I just completely spaced when I wrote this. Took me 7 months to notice. I'm so sorry, my dudes. When I'm home from work I will fix it.

Edited June 18, 2021 by Zatarita