Zatarita

SeT team - progress


Decompression is a go.

All I need at this point is to thread the decompression. Because they're template classes, the library is header-only.

Also started writing the compression algorithm; however, I realized I didn't really need an "object" to achieve this. I can just pass the data to a function that branches to the correct routine for the compression type.

Still debating making it an object though, for clarity.

 

Not sure how long, or how much, decompression will be needed, but until then we have it taken care of.


Tiddy-bits:

Alright, so I ended up deciding on a compression object.
I can't test h2am yet, until I fix my files ;-; I accidentally messed up my backups. This seems to be what I'm going to go with:


Decompression and compression objects are template classes; however, I wrapped them in static functions to make their usage a little easier.

// Decompression Objects
// H1A Decompression objects
auto compressedFile = Compression::H1ADecompressionObject(path);
auto compressedFile = Compression::CEADecObj(path);
// H2A Decompression objects
auto compressedFile = Compression::H2ADecompressionObject(path);
auto compressedFile = Compression::H2ADecObj(path);
// H2AM Decompression objects
auto compressedFile = Compression::H2AMDecompressionObject(path);
auto compressedFile = Compression::H2AMDecObj(path);

// Or the raw constructor
Compression::DecompressionObject<uint32_t> H1A(path, Compression::H1A);
Compression::DecompressionObject<uint64_t> H2A(path, Compression::H2A);
Compression::DecompressionObject<uint32_t> H2AM(path, Compression::H2AM);

// Compression Objects
// H1A Compression objects
auto compressor = Compression::H1ACompressionObject(path);
auto compressor = Compression::CEACompObj(path);
// H2A Compression objects
auto compressor = Compression::H2ACompressionObject(path);
auto compressor = Compression::H2ACompObj(path);
// H2AM Compression objects
auto compressor = Compression::H2AMCompressionObject(path);
auto compressor = Compression::H2AMCompObj(path);

// Or the raw constructor
Compression::CompressionObject<uint32_t> H1A(Compression::H1A);
Compression::CompressionObject<uint64_t> H2A(Compression::H2A);
Compression::CompressionObject<uint32_t> H2AM(Compression::H2AM);

 

The compression objects act more like a "machine" than anything.
I added flags for things like minimizing header and file size. This implements an Invader-esque minimal header for h1a, and by extension the other generations (h2a works; however, h2am is untested).
 

Decompression objects can all do the standard decompress-and-save-to-disk, but they can also decompress and extract, or just decompress and cache each decompressed chunk. The plan is to have each higher-level object hold a decompression object that holds access to the stream. Take the ipak, for example: the plan would be to decompress the first two chunks manually to get the header, which holds all the offsets and sizes. From there, you can use the decompression object's "get" function to decompress the chunks you need to extract the data from the file, meaning you only ever decompress the exact number of chunks you need, plus two. Of course, you can opt to just "decompressAll" the chunks, which caches every decompressed chunk; however, this should rarely, if ever, need to be called. The decompression object will ensure it has the chunks it needs to extract the data, even if that includes extracting all the data.
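
A rough sketch of how that flow might look in practice ("get" and "decompressAll" are named above; the header size and entry fields here are hypothetical):

// Hypothetical usage sketch of progressive decompression
auto ipak = Compression::H2ADecObj(path);

// Decompress the first two chunks by hand to read the header,
// which holds every entry's offset and size.
auto header = ipak.get(0, headerSize);            // headerSize: hypothetical

// Later, pull only the chunks backing a single entry.
auto data = ipak.get(entry.offset, entry.size);   // decompresses just what it needs

// Or cache everything up front (rarely, if ever, needed):
ipak.decompressAll();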

Overall, compression and decompression have been implemented for each generation of the game. I just need to do some polishing and add a few quality-of-life features to some of the overarching systems (like adding endian IO access to an endian stream targeting a ByteArray instead of a file, and the ability to reassign a decompression object's target file).
 

I also plan to start designing my own personal website. It will act more as a portfolio; however, I plan to upload the documentation there as well, so I have a bit more control than something like GitHub Pages gives me.


A few more of my books came in! I'm big excited. I feel I've leveled up a bit more as a programmer.

One explains the standard library a bit more in depth. I found a few optimizations:

1) I've heard of move semantics, and seen them in the field; however, I never fully understood how to utilize their potential. I'm going to review my previous code and attempt to implement them.

2) I read up on std::span, and it can increase the efficiency of my code a lot. Since I'm using a vector of bytes, carving out a sub-range means copying elements (unless I move them). By passing a span instead, I hand out a read-only view into the byte array, so I don't have to copy the contents or move the data out of the current context. Plus, this lets me "chunk" the data more elegantly (see the sketch after this list).

3) I read up on threading. This may be more complicated to implement efficiently than I first thought. I feel it could be an ongoing battle of trying to milk as much efficiency as I can out of it; HOWEVER, I don't feel it will be as necessary with the progressive decompression. I'll implement standard async concurrency, but I don't think I'm going to over-engineer it for now. Which segues into the next book.
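
A minimal sketch of the std::span idea, assuming the buffer is a std::vector<uint8_t> (names here are illustrative):

#include <cstddef>
#include <cstdint>
#include <span>
#include <vector>

using ByteArray = std::vector<uint8_t>;

// A read-only, non-owning view; nothing is copied when this is passed around.
void processChunk(std::span<const uint8_t> chunk)
{
    // ... parse the chunk in place ...
}

void example(const ByteArray& file, std::size_t offset, std::size_t size)
{
    // "Chunking" becomes re-slicing the same allocation.
    std::span<const uint8_t> view(file);
    processChunk(view.subspan(offset, size));
}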

 

I started reading up on system architecture as well. I found a lot of my code is unsustainable, for a few reasons:
 

1) What I program is what's called "accidental architecture", and I really need to plan things out better.

2) My functions aren't designed to be extended, and they're too specialized. (Though my intuition was correct here; I now understand how to properly do what I wanted to do.)

3) My objects are way too codependent.

4) I should utilize the standard library more when possible, for more portable code.

...

 

The list goes on, and they're things I'm going to try and keep in mind; however, these are the "big things" I'm focusing on.

 

THOUGH
this means I need to redo some things. I may just completely rewrite the compression algorithm.

 

I plan to keep the decompression algorithm though. I will just use spans, and moves, when passing data around, so I only have to allocate memory once when parsing a file.

 

 

After this I plan to write the Saber API in full. This will just be the backbone for loading the data that the UI will access and display.
Sadly this stage doesn't have as much "tangible" visual progress; however, I have been chugging away.


Very glad I rewrote the compression algorithm. I've been trying to apply some of the good practices, and I feel the implementation is MUCH cleaner now. Paradoxically, the code body has drastically increased; however, the division of labor is clear, and since each function is broken into the smaller pieces that make it up, it's easier to make changes by changing the smaller functions that compose the larger ones.

It feels very sacrilegious though. It will definitely take some getting used to.

 

Went ahead and defined the structures for s3dpaks, imeta, and ipak. I was going to include fmeta; however, it has recently come to my attention that MCC PC no longer uses it as of Season 6. That pushes it way down my priority list. Quite saddening :c I was hoping to use the fmeta as a means for a map patcher. Oh well... C'est la vie :c en mouvement

Now I just need to mesh the decompression object and the structures together. Might tackle some of that tonight; we'll see.

New shiny mod tools might distract me a bit though 


EDIT:
~2 seconds to create the object, decompress, and extract ~68 MB. It's not even threaded yet c:<


;-; I keep reading up on new things and realizing how little I know about my primary programming language.
Today I learned about std::valarray. I found out I can calculate the sizes of every chunk simultaneously (even dropping the last, invalid result, since the last chunk's size runs from the last chunk offset to the end of the file, but the file size itself isn't an offset). It also lets me apply an adjustment to every element in the array at once, for things like h2a block alignment, or shifting offsets to account for "magic".
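
A sketch of the valarray idea under those assumptions (the function and parameter names are mine, not the real code):

#include <cstdint>
#include <valarray>

std::valarray<uint32_t> chunkSizes(std::valarray<uint32_t> offsets, uint32_t fileSize, uint32_t magic)
{
    offsets -= magic;   // element-wise adjustment for "magic", applied in one shot

    // shift(1) slides everything left, so sizes[i] = offsets[i + 1] - offsets[i].
    std::valarray<uint32_t> sizes = offsets.shift(1) - offsets;

    // The last element is the invalid one (shift zero-fills it); the real last
    // size runs from the last chunk offset to the end of the file.
    sizes[sizes.size() - 1] = fileSize - offsets[offsets.size() - 1];
    return sizes;
}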

much handy, such powerful.

I did some testing, but I think I'm going to use this method for holding chunk offsets in the decompression algorithm. I feel it will make for more elegant code.

 

Another new concept I learned about was the weak pointer.
This could actually be useful for storing decompressed chunks. A weak pointer won't keep a smart pointer alive, but otherwise acts kind of like a shared pointer. So if I store my decompressed chunks as weak pointers, then whenever the object that used that data to create itself releases the data, the decompressed chunk can be removed from memory. I plan to make the program aware of how much memory is available, and how much is consumed, to maintain a healthy threshold. I want to make three different memory profiles.
"MAXIMIZE_CACHE_SIZE" allows the cache to grow to its theoretical maximum allowed size. This should be determined dynamically, based on other system resources; however, it should not consume more than, say, ~30-35% of system resources. Optimally, the allocation size should be determined by the end user. This prioritizes speed at the cost of using more memory.
"MINIMIZE_CACHE_SIZE" is a flag passed to the decompression object to keep decompressed chunks alive only as long as needed. This means, though, that if you need a chunk again in the future, the decompression object might have to re-decompress it, which could cause more overhead. This prioritizes memory at the cost of speed.
And if neither of these flags is set, the last profile is normal cache operation, which is a healthy balance between memory and speed. Most systems should be able to handle storing most files in memory; hell, most computers HAVE to in order to load the games. Though there are a few notable outliers, like shared.pck coming in at ~12 GB. Plus, if people plan to be using SeK while playing a game and doing other things, memory should likely be managed in some way, to avoid the system doing it (potentially less efficiently); it's quicker to keep track of memory ourselves than to have a page get swapped out to virtual memory.
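
A minimal sketch of that weak-pointer cache (every name here is hypothetical, and the actual zlib call is stubbed out):

#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

using ByteArray = std::vector<uint8_t>;

class ChunkCache
{
public:
    std::shared_ptr<ByteArray> get(std::size_t chunkIndex)
    {
        // If any consumer still holds the chunk, reuse it.
        if (auto alive = cache[chunkIndex].lock())
            return alive;

        // Otherwise (re-)decompress and hand out shared ownership.
        auto chunk = std::make_shared<ByteArray>(decompressChunk(chunkIndex));
        cache[chunkIndex] = chunk;   // the cache itself never keeps chunks alive
        return chunk;
    }

private:
    ByteArray decompressChunk(std::size_t) { return {}; }    // stand-in for real inflate

    std::map<std::size_t, std::weak_ptr<ByteArray>> cache;   // weak: chunks free themselves
};

Once every object built from a chunk releases its shared_ptr, the chunk's memory frees itself automatically; the MINIMIZE/MAXIMIZE profiles would just decide whether the cache also holds an owning reference.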

 

ALSO

For some reason I can't seem to get h2am compression to work right for me :c I can decompress the maps compressed by Assembly, and I can decompress the maps supplied with Halo, but I can't compress maps. I'm not quite sure what I'm missing. Though, since the decompression implementation is header-only, I can come back to it without having to recompile libraries. Since I'm prioritizing Saber currently, that gives me time to research wtf is going on there.

 

I was also playing around with modules, and I found some... interesting behavior. I dunno if I'm a big fan; I will likely avoid the concept altogether. While C++20 is required for some of the features I'm using, I will not be using modules. I plan to utilize features that have been implemented in most compilers, to keep portability issues to a minimum.


Spent most of my day today finalizing the EndianReader class.
After starting to use it for reading chunks in the decompression objects, I found that the way I wanted to use the library didn't completely match its implementation, leading to code that was more difficult to manage. Instead of working around it indefinitely, I just decided to implement the functions.

Here are the major changes:

  1. No longer throws exceptions using the 'throw' keyword (which apparently has been deprecated anyway).
  2. Exceptions are now handled internally, and can be used to recover gracefully, or even to gather more information about the state of the stream itself (explained below).
  3. Replaced all instances of const string ref with string_view to mitigate unneeded copying.
  4. Added a few more helper functions:
    • isInBounds - Verifies a requested offset is within the bounds of the file. Sets the exception status to EXCEPTION_FILE_BOUNDS if exceeded, and returns true or false.
    • isOpen - Was a method before; however, now it sets the exception status to EXCEPTION_FILE_ACCESS as well as returning whether the file is open.
    • hasException - Tells whether an exception has been set.
    • getException - Returns a string_view containing the exception text.
    • clearException - Clears the exception (weird..).
    • releaseException - Returns a string_view containing the exception text, and clears the exception at the same time.
  5. Added functions that should have been there to start with:
    • setFileEndianness - Modifies a file's endianness. This may be needed if the stream is reused for a different file, or in the case of a mixed-endian file.
    • readInto - A variation of the read function that reads a value into an existing memory location instead of creating a copy.
  6. Added sanity checks:
    • Set a maximum string read length to prevent the stream from getting stuck in an infinite loop (currently set to 0xffffffff; however, I may move this to a macro definition).
    • Seek, Read, Pad, and Peek all validate their offsets and sizes against the file size gracefully.
    • Read now assumes that if the offset is in bounds but the requested size runs past the end of the file, the desired result is to read to end-of-file (EXCEPTION_FILE_BOUNDS will be set to let you know the chunk size differs from what was requested).
  7. Updated the documentation for it one more time.
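
A hypothetical sketch of the new error flow (the constructor and read signature here are assumptions; the status functions are the ones listed above):

// Hypothetical usage; exact constructor/read signatures may differ.
EndianReader reader(path);

auto chunk = reader.read(offset, size);       // clamps to end-of-file instead of throwing

if (reader.hasException())
{
    // e.g. EXCEPTION_FILE_BOUNDS: the chunk came back shorter than requested.
    // releaseException() grabs the message and clears the state in one call.
    handleError(reader.releaseException());   // handleError: hypothetical
}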

 

 

Next I'm going to touch up the features I wanted in the EndianWriter, like being able to write to a ByteArray in memory instead of a file on disk, as well as refactor it to not throw exceptions.

 

Also, I've been migrating my projects to CMake.
I am a SCRUB when it comes to CMake, so I'm learning as I go; however, this might be the best way ultimately. I feel my natural progression through the project will let me learn more about CMake as the project grows in size. So I'll let my brain absorb it via osmosis.


Having trouble focusing today, which is really stressing me out.
I need to set up source control better and clean up old copies, because organization is getting out of hand.

 

Redid the endian writer to use an exception system similar to the reader's, and did some optimizations.

Created a ByteReader and ByteWriter class that read and write from byte arrays (meaning an endian stream that reads/writes to memory instead of a file). I tried a different approach to the problem: I made "endianPlace"/"endianGet" functions to place data into, and retrieve it from, a byte array. I made these static functions so I can call them on any byte array; however, I also wrapped them in reader/writer classes. Meaning I can read/write to byte arrays through a stream object, or through the byte array itself.
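
A rough sketch of the two access styles (the endianGet/endianPlace names come from the post; these toy versions skip the byte-swapping the real ones would do):

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using ByteArray = std::vector<uint8_t>;

// Static-style access: works on any byte array, no stream object required.
template <typename T>
T endianGet(const ByteArray& data, std::size_t offset)
{
    T value{};
    std::memcpy(&value, data.data() + offset, sizeof(T));   // real version swaps bytes as needed
    return value;
}

template <typename T>
void endianPlace(ByteArray& data, std::size_t offset, const T& value)
{
    std::memcpy(data.data() + offset, &value, sizeof(T));
}

// Stream-style access would simply wrap these while tracking a cursor, e.g.:
//   ByteReader reader(buffer);
//   auto magic = reader.read<uint32_t>();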

 

One effect of this was that I did a bit more than "refactor" the library. I was hoping to avoid having to change code in the decompression objects that depend on it, but I just bit the bullet and propagated the changes out to the decompression and compression objects.

I hate having to do that; that's where bugs creep in. But it builds, so we'll see what happens.

I can't seem to think very clearly today though, so I'm going to take the rest of the day to just relax and try to de-stress over whatever seems to be grinding my gears.



So I've been working on abstracting the Saber file definition. I think I've managed it without using too many abstract classes, reducing the need for the vtable.
I was looking at Saber files as a whole, and abstracted each object's functionality down to its core components. They all work practically identically:
"Read child count, read entries, use entries to index and retrieve data." The only issue is that "entries" are "abstract". This meant I had a few options: I could create an abstract class and override functions, or specialize a template. I decided on specializing a template to reduce calls to the vtable; it also allowed me to invert the dependencies, leading to code that's easier to expand in the future.
Each "file entry" for its respective file is a specialized template base class, with some file specifics included. The entry IS technically abstract; however, there's no getting around that, as they have different data to write to file, and C++ doesn't have reflection. So there is one function that takes a file handle and writes the header. Besides that, all entries function the same: each entry HAS to have a name, offset, and size. Since my progressive decompression algorithm gets data from a stream using an offset and size, that is enough to extract the data and map it to a meaningful name. Meaning I can leave the specialization to handle things like format (which changes from file to file), but reuse the "get file data" functions for ANY Saber file (h1a at least).

This also means I can spend time creating generalized functions that span a larger scope, so I don't have to repeat myself. Merging or splitting archives is going to be the same no matter what; the only thing that changes is which "entry type" I'm referring to (copy entries from archive A into archive B), so long as I don't try to copy "ipak entries" into an "s3dpak" (which would cause compiler errors anyway, as polymorphism only works one way). A sketch of the idea follows.
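
A toy sketch of that shape (every name here is hypothetical; the real class names appear in the diagrams below):

#include <cstdint>
#include <string>
#include <vector>

// Every entry, whatever the file type, carries the three fields the
// progressive decompressor needs: a name, an offset, and a size.
template <typename OffsetType>
struct EntryGeneric
{
    std::string name;
    OffsetType  offset{};
    OffsetType  size{};
};

// File-specific entries extend the generic one with their extras.
struct S3dpakEntry : EntryGeneric<uint32_t>
{
    uint32_t format{};   // e.g. s3dpak entries also carry a format id
};

// One generic "get file data" then works for ANY entry type, no vtable needed.
template <typename Stream, typename Entry>
std::vector<uint8_t> getFileData(Stream& stream, const Entry& entry)
{
    return stream.get(entry.offset, entry.size);   // progressive decompression
}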

[Image: inheritance graph of the file and entry classes]
If we look at the graph, it shows the base objects pointing to their inheriting classes.
The part in blue is the interface for the file data. Everything in red is hidden from the user. Even the entries are hidden; the only way to interface with them is through the file itself.

[Image: data flow diagram]
As you can see from the data flow, regardless of which file type OR entry type is in play, this is the exact same graph.
Which means I can leverage OOP pretty heavily here and have one big "file" class, with smaller, easier-to-maintain specializations.

That's the plan. I feel this is an easy enough interface for the data that it would also be very easy to maintain. It also means I can "load data" as it's requested, kind of like the decompression algorithm. So at this point I'm using the least amount of information possible to accomplish my goal, which feels pretty good c:
It's taken a bit of brain power though x.x



Also, going back to previous statements: for now I think I'm going to hold off on implementing an intelligent cache. I don't feel it would be worth the refactor.


Alright, so I finished up a class diagram.
[Image: SaberGeneric / SaberEntryGeneric class diagram]

The file interfaces have been hidden; however, this is what I've decided on.
SaberFileGeneric is a template class that gets specialized by the file interface. Its template parameters are which type of decompression object is used, what data type the child count is (some files use 64-bit, some 32-bit), and which entry type is used. SaberEntryGeneric is also a template class, specialized by the entry interface; its template parameter is what data type the offsets are (32 or 64 bit). Looking at the graph above, we can see the relationship between the generic definitions and their respective interfaces.
With our progressive decompression object, all we need to know is the offset and the size of the data; these are parameters required for ANY file's entry. This is where SaberEntryGeneric comes in. When we request data, the request propagates out to the child, which is then responsible for accessing the stream and getting its data. This is the same regardless of entry type (see the data flow above). The specializations just add the non-standard parameters.
In order to abstract the concept of an entry from the user, the files have to be responsible for interacting with the file, and thusly MUST define functions for the expected functionality. This will be covered more when we discuss the file interfaces. There is one side effect of this, however, and that is for IpakFiles: since an Ipak is basically the same thing as an Imeta (the Ipak just has extra data), the Ipak depends on the Imeta. The implementation of IpakFile should take extra care when addressing this quirk.
This means that the file interface will be responsible for converting a dds to an ipak entry, or determining the format of a file in a s3dpak at import.
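
Hypothetically, the declarations would take a shape along these lines (parameter names are mine, purely illustrative):

template <typename Decompressor,    // which decompression object backs the file
          typename ChildCountType,  // uint32_t or uint64_t child count, per format
          typename EntryType>       // which entry specialization the file holds
class SaberFileGeneric { /* ... */ };

template <typename OffsetType>      // uint32_t or uint64_t offsets, per format
class SaberEntryGeneric { /* ... */ };

// A file interface might then specialize roughly like:
// class S3dpak : public SaberFileGeneric<H1ADecompressor, uint32_t, S3dpakEntry> { /* ... */ };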

 

You may also have noticed that the ipak entry is actually called a TextureEntry. TextureEntry and ipak entry are the same; in fact, IpakEntry is an alias for TextureEntry. This is because on the Xbox version, ipak entries live inside the s3dpaks. They're practically identical; the only difference is that the Xbox version uses the end-of-block sentinel values. I have actually learned a lot about s3dpaks since my original post on the format; it will be updated soon to reflect the information here. This also means that the TextureEntry must be a portable implementation, not coupled to the ipak.


ALSO! You may have noticed that the fmeta has been left out. It has become legacy, and I don't feel there's a need to dedicate time to designing for it. I don't think anyone will be using a legacy version, and the Xbox doesn't utilize the fmeta, so I feel it has become vestigial. If I'm allowed to use that word here.

The S3dpak class will have some extra care taken with it, to ensure the interface stays open to new definitions as they are discovered. My understanding of the files contained within the paks is minimal, and it is expected to expand as the community makes more progress in reverse engineering. Ultimately, interpreting the data retrieved from an s3dpak should be the responsibility of another class altogether; however, I feel it would be nice to have helper functions inside the s3dpak that call the appropriate constructor for the data, to cast data straight from the s3dpak.
I may create a "parser" object to generalize this for each file type. I feel that would be more organized and intuitive, and more open to expansion in the future. We will cross that bridge when we get to it.

And believe it or not, that's all for H1A libSaber. I feel this is the cleanest implementation I've come up with yet. It should allow optimized access to Saber files, and it will be available for anyone to use in their own projects to access Saber objects, progressive decompression and all. My projected date for this is ~the 14th; then I plan to go into testing for about a week. H2A Saber MIGHT be able to utilize the same interface; however, I don't want to enforce it if it results in unintuitive code. So we'll see what happens there.

For libMccCompress I still need to thread the decompression; however, during development of libSaber it has appeared fully functional, which is  d a n k  for both H1A and H2A. The way it's set up leaves expanding to threading open, so it's just a matter of plugging the functionality into the right place.

Documentation for libSaber H1A has been compiled from an architectural standpoint; code documentation will be added. libMccCompress already has documentation, and libEndianStream has documentation as well, though not from an abstracted, higher level.


OKAY
So after screwing up my vtables and spending an entire day trying to pinpoint the issue, I have finally taken the time to organize a few things.

CMake is now pretty, and does its job. Targets are no longer compiled independently, which used to cause a cascade of rebuilding whenever a low-level library got edited -shudder-
It also made debugging much easier, as Visual Studio can now trace through the libraries while debugging.

Everything works as the CMake gods intended.

S3dpaks were implemented using the new generic file system. A few changes were made:

  1.  Format is built into the generic file as a template parameter. This is abstracted from the end user through the file interface anyway.
  2.  A new virtual function, "getFileExtention", has been defined. It must be defined for each file type and takes a format; this will be helpful for differentiating between file types (a sketch follows this list).
        -Files with extensions use their existing extensions
        -Other files follow a halo-esque file format, barring the standard three-character abbreviations (e.g. ".wavebanks_strm_file", ".wavebanks_mem", ".cacheblock")
  3. File format types are now stored inside a Format namespace instead of the file interface.
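
A hypothetical illustration of getFileExtention (the Format values and the mapping here are invented for the example):

#include <string_view>

namespace Format { enum Type { WavebanksMem, CacheBlock /* ... */ }; }

class S3dpak /* : public SaberFileGeneric<...> */
{
public:
    // Each file type maps its formats to an on-disk extension.
    virtual std::string_view getFileExtention(Format::Type format) const
    {
        switch (format)
        {
            case Format::WavebanksMem: return ".wavebanks_mem";
            case Format::CacheBlock:   return ".cacheblock";
            default:                   return ".bin";   // invented fallback
        }
    }
};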

These changes were deemed necessary due to the way I organized my files. This is not an issue by any means; in fact, I think I like this implementation better. It means I can write more generalized code and repeat myself less.

I have successfully managed to extract data from all s3dpaks (PC and Xbox) using this method. Implementing ipaks and imetas should be as simple as defining the format and the "parse/serialize" header functions, and the same code should work.

 

I feel I'm finally leveraging c++ a bit better c:<

Also made some minor tweaks to the decompression object classes.
Moved the static wrapper functions to the global namespace in the main include header. Each one just returns a decompression object specialized for that file's offset types, chunk sizes, and format; it hides the clunky template syntax from the higher level.

Made it so that when the decompression object attempts to decompress a chunk and fails, it sets the uncompressed flag and parses the data as raw, uncompressed data.

I felt this made the most sense. It's possible to try to open any file as an s3dpak anyway, so I'd need the parsing file to determine whether the data is valid. So I feel this is a safe assumption to make, since either way the file needs to be validated.

Currently the zlib decompression is not threaded, so there's no discernible time difference when extracting an entire file; however, random access into the files is now significantly faster. The difference is insane: I can extract a 68 MB file from a compressed s3dpak in 1.6 seconds.
Of course, if I'm extracting all of the s3dpak, I still have to decompress the entire file and write all the data to disk. With SuP, a10.s3dpak takes me ~8 seconds, which is only slightly improved with libSaber. This will change once I thread decompression.


Also, small side note, but it's versatile enough to extract from Inversion s3dpaks (another game released by Saber around the same time) with minor tweaks. Might generalize the decompression object a bit further to allow for variants in both compression format and file format, which will be needed to decompress Xbox s3dpaks anyway.
