Accessing Mafia’s DTA files
by MassaSnygga
1. Introduction
As I often receive email from people who ask for information about the DTA files because they want to code another DTA tool (like the often requested DTA packer), I decided to write this document and share all the information I have gathered about DTA files.
This document completely describes the methods the MafiaDataXtractor uses to extract the content from the DTA files.
Please note that the information provided here is based on my personal inspection of the disassembled Mafia code, lengthy debugging sessions, logged function calls and pure guesses. It is in no way official and is not confirmed to be correct; it just works for me. Some parts may be inaccurate or just plain wrong. Therefore you should read this document with an open mind and not take everything written here for granted (btw: that applies to everything that is written). If you find anything that is wrong, please let me know.
I expect that you know C and a little bit of x86 assembler; otherwise you won’t understand all of this. Please don’t bug me with any questions regarding C and ASM.
But now, let’s get to the fun part…
2. The encryption keys
As you may have guessed after inspecting DTA files with your hex editor, the files are encrypted. In fact some of them are also compressed, but I haven’t dealt with the compression methods at all. To read from them you have to know two 32-bit keys, which are different for (almost) every DTA file. Luckily I could find them quite easily while inspecting the disassembled GAME.EXE, as they are stored there without any (serious) protection. These keys seem to be identical for every international version, as they all seem to use the same GAME.EXE. The German version is the exception and has its own GAME.EXE, but its keys are identical nonetheless.
I provide the keys here for your reference:
| Filename | Content | Key 1 | Key 2 |
|----------|-----------------|------------|------------|
| A0.dta | Sounds | 0xD8D0A975 | 0x467ACDE0 |
| A1.dta | Missions | 0x3D98766C | 0xDE7009CD |
| A2.dta | Models | 0x82A1C97B | 0x2D5085D4 |
| A3.dta | Animations I | 0x43876FEA | 0x900CDBA8 |
| A4.dta | Animations II | 0x43876FEA | 0x900CDBA8 |
| A5.dta | Difference Data | 0xDEAC5342 | 0x760CE652 |
| A6.dta | Textures | 0x64CD8D0A | 0x4BC97B2D |
| A7.dta | Records | 0xD6FEA900 | 0xCDB76CE6 |
| A8.dta | Patch 1.1 | 0xD8DD8FAC | 0x5324ACE5 |
| A9.dta | System Data | 0x6FEE6324 | 0xACDA4783 |
| AA.dta | Tables | 0x5342760C | 0xEDEAC652 |
| AB.dta | Music | 0xD8D0A975 | 0x467ACDE0 |
| AC.dta | Animations III | 0x43876FEA | 0x900CDBA8 |
Please note that A8.dta was just recently introduced with the 1.1 Patch and is not present in the 1.0 Version of the game.
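If you write your own tool, it is handy to have the key table above in code. The following is only a minimal sketch: the names DtaKeys, g_DtaKeys and FindDtaKeys are hypothetical, and comparing the filename case-insensitively is an assumption based on Windows filenames being case-insensitive.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the key table (hypothetical type). */
typedef struct
{
    const char* FileName;
    uint32_t    Key1;
    uint32_t    Key2;
} DtaKeys;

static const DtaKeys g_DtaKeys[] =
{
    { "A0.dta", 0xD8D0A975u, 0x467ACDE0u },  /* Sounds          */
    { "A1.dta", 0x3D98766Cu, 0xDE7009CDu },  /* Missions        */
    { "A2.dta", 0x82A1C97Bu, 0x2D5085D4u },  /* Models          */
    { "A3.dta", 0x43876FEAu, 0x900CDBA8u },  /* Animations I    */
    { "A4.dta", 0x43876FEAu, 0x900CDBA8u },  /* Animations II   */
    { "A5.dta", 0xDEAC5342u, 0x760CE652u },  /* Difference Data */
    { "A6.dta", 0x64CD8D0Au, 0x4BC97B2Du },  /* Textures        */
    { "A7.dta", 0xD6FEA900u, 0xCDB76CE6u },  /* Records         */
    { "A8.dta", 0xD8DD8FACu, 0x5324ACE5u },  /* Patch 1.1       */
    { "A9.dta", 0x6FEE6324u, 0xACDA4783u },  /* System Data     */
    { "AA.dta", 0x5342760Cu, 0xEDEAC652u },  /* Tables          */
    { "AB.dta", 0xD8D0A975u, 0x467ACDE0u },  /* Music           */
    { "AC.dta", 0x43876FEAu, 0x900CDBA8u },  /* Animations III  */
};

/* Case-insensitive ASCII comparison (assumption: names match regardless of case). */
static int SameName(const char* a, const char* b)
{
    for (; *a && *b; ++a, ++b)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == *b;  /* equal only if both strings ended together */
}

/* Returns the key pair for a DTA name such as "a9.dta", or NULL if it is unknown. */
const DtaKeys* FindDtaKeys(const char* FileName)
{
    size_t i;
    for (i = 0; i < sizeof(g_DtaKeys) / sizeof(g_DtaKeys[0]); ++i)
        if (SameName(g_DtaKeys[i].FileName, FileName))
            return &g_DtaKeys[i];
    return NULL;
}
```

For example, FindDtaKeys("a9.dta") returns the entry holding 0x6FEE6324 and 0xACDA4783, while an unknown name returns NULL.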
3. Using the rw_data.dll
The easiest approach to the extraction problem is to use Mafia’s own DLL. This file is “rw_data.dll”; you will find it in your Mafia directory. As Mafia itself uses this file, we can extract the files the very same way Mafia does, all without having to deal with the encryption and compression methods that Mafia uses. COOL! you might yell out now, but as always life is not that easy, as it will turn out to be necessary to crack the encryption to achieve perfect results. But I am getting ahead of myself.
First of all let’s take a look at all the functions “rw_data.dll” exports and how they work:
DWORD dtaBin2Text(DWORD unknown1, DWORD unknown2);
Description:
I haven’t dealt with this function as it doesn’t seem to be used by Mafia at all.
Parameters:
unknown
Returns:
unknown
void dtaClose(DWORD FileHandle);
Description:
Closes a file that was opened from inside a DTA.
Parameters:
FileHandle: The handle to a file that was opened with dtaOpen
Returns:
nothing
DWORD dtaCreate(char* FileName);
Description:
Despite what its name suggests, this function does NOT create a file in any way. In fact it sort of “mounts” a DTA file, so that the files it contains can be accessed by successive calls to dtaOpen().
Parameters:
FileName: The filename of the DTA file to be mounted.
Returns:
NULL if the call didn’t succeed.
A non-NULL object pointer if the call succeeded (see remarks). Thanks go to Jonathan Wilson, who pointed that out to me.
Remarks:
There is something special about the value returned by this call. At first I thought the return value didn’t matter, but it turned out that Mafia uses it to set the decryption keys of the DTA files. In fact it is a pointer to an instance of a class inside “rw_data.dll”. Because Mafia was written in C++, the first double word of the object points to its virtual function table. The function we are interested in is the fourth entry (table address + 0x0C). It has to be called with the two correct keys on the stack, each XORed with a fixed value. Furthermore, the ECX register has to contain the object pointer returned by dtaCreate() to comply with the Visual C++ calling convention for member functions (thiscall).
The two XOR values are always the same: the first key must be XORed with 0x34985762 and the second must be XORed with 0x39475694.
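Since this is plain 32-bit XOR arithmetic, the prepared values can also be computed in C before they are handed to the assembler. A minimal sketch; PrepareDtaKeys and the two macro names are hypothetical, only the XOR constants come from the text above:

```c
#include <stdint.h>

#define DTA_XOR_KEY1 0x34985762u  /* XOR value for the first key  */
#define DTA_XOR_KEY2 0x39475694u  /* XOR value for the second key */

/* Computes the two values that are pushed before calling the fourth virtual function. */
void PrepareDtaKeys(uint32_t Key1, uint32_t Key2, uint32_t* pPrepared1, uint32_t* pPrepared2)
{
    *pPrepared1 = Key1 ^ DTA_XOR_KEY1;
    *pPrepared2 = Key2 ^ DTA_XOR_KEY2;
}
```

For a9.dta (Key 1 = 0x6FEE6324, Key 2 = 0xACDA4783) this gives 0x5B763446 and 0x959D1117.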
The following code sample mounts “a9.dta” and sets its decryption keys:
```c
DWORD Result;

if ((Result = dtaCreate("a9.dta")) != 0)
{
    __asm
    {
        mov  eax, 06FEE6324h      ; load the first key into EAX
        xor  eax, 034985762h      ; process it with the first XOR value
        push eax                  ; push the prepared first key onto the stack
        mov  eax, 0ACDA4783h      ; load the second key into EAX
        xor  eax, 039475694h      ; process it with the second XOR value
        push eax                  ; push the prepared second key onto the stack
        mov  ecx, Result          ; load the object pointer into ECX (the "this" pointer)
        mov  eax, [ecx]           ; load the address of the virtual function table into EAX
        call dword ptr [eax + 0Ch] ; call the fourth entry in that table
    }
}
```
You might want to check the other functions in that function-table; maybe they also do something interesting.
DWORD dtaDelete(DWORD unknown1);
Description:
I haven’t dealt with this function.
Parameters:
unknown
Returns:
unknown
DWORD dtaDumpMemoryLeaks(DWORD unknown1);
Description:
I haven’t dealt with this function as it doesn’t seem to be used by Mafia at all.
Parameters:
unknown
Returns:
unknown
DWORD dtaGetTime(DWORD unknown1, DWORD unknown2, DWORD unknown3);
Description:
I haven’t dealt with this function.
Parameters:
unknown
Returns:
unknown
DWORD dtaOpen(char* FileName, DWORD unknown1);
Description:
This function opens a file from the DTA files. The DLL searches all DTA files that were previously mounted with dtaCreate().
Parameters:
FileName: The filename of the file to be opened.
unknown1: This value seems to be always 0.
Returns:
0xffffffff if the call didn’t succeed.
A file handle that can be used to access the file, if the call did succeed.
DWORD dtaOpenWrite(DWORD unknown1, DWORD unknown2);
Description:
I haven’t dealt with this function.
Parameters:
unknown
Returns:
unknown
DWORD dtaRead(DWORD FileHandle, char* Buffer, DWORD ByteCount);
Description:
This function reads from a file that was opened with dtaOpen. The reading begins at the current file position.
Parameters:
FileHandle: The handle of the file to be read.
Buffer: Pointer to the buffer that will receive the data. Make sure that it is big enough to hold all the data you request.
ByteCount: The amount of bytes to be read.
Returns:
The number of bytes that were actually read. This value might be lower than ByteCount when the end of the file was reached or an error occurred.
DWORD dtaSeek(DWORD FileHandle, DWORD unknown1, DWORD unknown2);
Description:
I didn’t use this function, but it should be easy to figure out. I suspect its parameters to be similar or even identical to those of fseek().
Parameters:
unknown
Returns:
unknown
void dtaSetDtaFirstForce();
Description:
This is a very interesting function; it controls from which source files are loaded.
“rw_data.dll” has two operation modes:
1. Read from hard drive as default and read from DTA as fallback.
2. Read from DTA as default and read from hard drive as fallback.
Mode 1 is the default mode of operation.
When this function is called, “rw_data.dll” switches to operation mode 2.
This is also the function that MafiaDataXtractor patches into a no-op function. That way mode 2 is never activated, and Mafia reads the extracted files from the hard drive.
Parameters:
none
Returns:
Nothing
DWORD dtaWrite(DWORD unknown1, DWORD unknown2, DWORD unknown3);
Description:
I haven’t dealt with this function.
Parameters:
unknown
Returns:
unknown
Whew, that’s it. If you find anything out about the unexplored functions, please let me know, so that I can update this information.
So how do you actually read a file using these functions?
First of all, use an unpatched version of “rw_data.dll”, either a version from a clean installation or “rw_data.bak” when MafiaDataXtractor is installed. If you use a patched version, you might end up reading files that were already extracted (and maybe modified afterwards), and that is not what we want.
1. Call dtaSetDtaFirstForce() to ensure that the files inside the DTAs are read.
2. Mount the DTAs with dtaCreate() and set the decryption keys.
3. Open a file with dtaOpen().
4. Read the content with dtaRead().
5. Close the file with dtaClose().
If you import the functions from “rw_data.dll” dynamically with LoadLibrary() and GetProcAddress(), please keep in mind that the function names are decorated. They are preceded by an underscore and end with an ‘@’ followed by the number of bytes that are passed to the function as parameters. So the exported name for dtaRead() is “_dtaRead@12”.
That’s it! Beautiful, isn’t it? Well, one little thing is still missing: how do we find out which files are inside these DTAs? The DLL obviously doesn’t contain any functions to search for files. I deal with that problem in the next chapter.
4. Decrypting the DTA content table
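The decorated name can be built with ordinary string formatting. A minimal sketch, assuming the __stdcall-style decoration described above; DecorateStdcall is a hypothetical helper. On 32-bit x86 each DWORD or pointer parameter occupies 4 bytes, so dtaRead() with its three parameters needs 12:

```c
#include <stdio.h>

/* Writes the decorated export name (e.g. "_dtaRead@12") into Out.
   ParamBytes is 4 per DWORD or pointer parameter on 32-bit x86.
   Returns 0 on success, -1 if Out is too small. */
int DecorateStdcall(const char* Name, unsigned ParamBytes, char* Out, size_t OutSize)
{
    int n = snprintf(Out, OutSize, "_%s@%u", Name, ParamBytes);
    return (n >= 0 && (size_t)n < OutSize) ? 0 : -1;
}
```

By the same rule, dtaSetDtaFirstForce(), which takes no parameters, should be exported as “_dtaSetDtaFirstForce@0”. The resulting string is what you pass to GetProcAddress().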
Now we get to the rough stuff: Opening the DTA files on our own!
Before I explain the structure of the DTA files to you, I will show you how to decrypt the data inside them, as almost everything in them is encrypted. I haven’t inspected the decryption function that closely, but thankfully Roger H. Jörg converted my assembler dump into a nice C function.
```c
#include <stdint.h>
#include <string.h>

// Decrypts pBuffer in place. Assumes a little-endian host such as x86.
void Decrypt(void* pBuffer, unsigned int cbBuffer, uint32_t Key1, uint32_t Key2)
{
    unsigned char* pByte = (unsigned char*) pBuffer;
    unsigned int cLongLong = cbBuffer / 8;
    unsigned int cBytes = cbBuffer % 8;
    uint32_t keys[2] = { Key2, Key1 };
    const unsigned char* pKey = (const unsigned char*) keys;

    // First loop: process whole 64-bit blocks as two 32-bit words.
    for (; cLongLong; --cLongLong, pByte += 8)
    {
        uint32_t lo, hi;

        memcpy(&lo, pByte, 4);       // memcpy avoids unaligned/aliased access
        memcpy(&hi, pByte + 4, 4);
        lo = ~((~lo) ^ Key2);        // the first DWORD of each block uses Key2
        hi = ~((~hi) ^ Key1);        // the second DWORD uses Key1
        memcpy(pByte, &lo, 4);
        memcpy(pByte + 4, &hi, 4);
    }

    // Second loop: process the remaining 0..7 bytes with the key bytes in memory order.
    for (; cBytes; --cBytes, ++pByte, ++pKey)
    {
        *pByte = (unsigned char) ~((~*pByte) ^ *pKey);
    }
}
```
As you see, using this function is quite easy: pass a pointer to the buffer with the encrypted data, its length and both keys which you can look up in the key table at the beginning of this document. On return the data is decrypted and you can continue your work. EASY!
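A closer look at the expressions shows that ~((~x) ^ k) is the same as x ^ k: complementing one XOR operand complements the result, and the two outer complements cancel. Decrypt() therefore XORs the data with an 8-byte pattern made of the four bytes of Key2 followed by the four bytes of Key1, in x86 (little-endian) byte order, starting over at the beginning of each buffer. The following byte-wise sketch should give the same result as Decrypt() on a little-endian machine; XorDta is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise equivalent of Decrypt(): XOR byte i with byte (i % 8) of the pattern
   Key2 (bytes 0..3) followed by Key1 (bytes 4..7), both little-endian. */
void XorDta(unsigned char* pBuffer, size_t cbBuffer, uint32_t Key1, uint32_t Key2)
{
    size_t i;
    for (i = 0; i < cbBuffer; ++i)
    {
        uint32_t key = (i % 8 < 4) ? Key2 : Key1;   /* which key this byte belongs to */
        unsigned shift = 8u * (unsigned)(i % 4);     /* which byte of that key         */
        pBuffer[i] ^= (unsigned char)(key >> shift);
    }
}
```

Because XOR is its own inverse, running the routine twice restores the original bytes. This suggests (untested against Mafia itself) that the same routine would also encrypt data, which may come in handy for the often requested DTA packer.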
So, now on to the file structure:
The first four bytes of every DTA must be “ISD0”. This is used to identify valid DTA files. Following this magic value there is a small header that needs to be decrypted with the Decrypt() function:
| Offset | Type | Description |
|--------|-------|--------------------------------|
| 0 | DWORD | Number of files in the archive |
| 4 | DWORD | Offset to the content table |
| 8 | DWORD | Size of the content table |
| 12 | DWORD | Unknown |
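Reading the magic value and the header could look like the following sketch. It assumes that the 16 header bytes directly follow “ISD0”, that they have already been run through Decrypt(), and that all DWORDs are little-endian as on x86. The type and function names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

typedef struct
{
    uint32_t FileCount;    /* number of files in the archive */
    uint32_t TableOffset;  /* offset to the content table    */
    uint32_t TableSize;    /* size of the content table      */
    uint32_t Unknown;
} DtaHeader;

/* Reads a little-endian DWORD from a byte buffer. */
static uint32_t ReadDword(const unsigned char* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns nonzero if the first four bytes of the file are the "ISD0" magic value. */
int IsDtaFile(const unsigned char* pFirst4)
{
    return memcmp(pFirst4, "ISD0", 4) == 0;
}

/* Fills pHeader from the 16 already decrypted header bytes that follow the magic value. */
void ParseDtaHeader(const unsigned char* pDecrypted16, DtaHeader* pHeader)
{
    pHeader->FileCount   = ReadDword(pDecrypted16 + 0);
    pHeader->TableOffset = ReadDword(pDecrypted16 + 4);
    pHeader->TableSize   = ReadDword(pDecrypted16 + 8);
    pHeader->Unknown     = ReadDword(pDecrypted16 + 12);
}
```

TableOffset and TableSize then tell you which part of the file to read and decrypt next.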
Now we have all the information we need to read the content table. After decryption, the content table is an array of entries that look like this:
| Offset | Type | Description |
|--------|----------|-----------------------------------|
| 0 | DWORD | Unknown |
| 4 | DWORD | Offset to file information header |
| 8 | DWORD | Unknown2 |
| 12 | char[16] | Filename hint |
Unfortunately this still doesn’t contain the information we want, as the complete filename is not part of this structure. All that is included here is a filename hint that contains the last 16 characters of the filename. My guess is that Mafia uses it to quickly reject entries when searching for a specific file. Furthermore I suspect one of the unknown entries to be a hash value of the real filename. As we still haven’t found what we were looking for, we have to dig deeper: we read the file information header, whose offset we just obtained. Yet again, it has to be decrypted with Decrypt().
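Splitting the decrypted content table into entries could look like the following sketch. It assumes the entries are packed back to back without padding, so each one takes 28 bytes (three DWORDs plus the 16-byte hint), and that the hint is not necessarily NUL-terminated, so it is copied into a 17-byte buffer. All names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DTA_ENTRY_SIZE 28  /* assumption: 3 DWORDs + char[16], no padding */

typedef struct
{
    uint32_t Unknown;
    uint32_t InfoOffset;    /* offset to the file information header          */
    uint32_t Unknown2;
    char     NameHint[17];  /* last 16 characters of the filename, plus a NUL */
} DtaEntry;

/* Reads a little-endian DWORD from a byte buffer. */
static uint32_t ReadDword(const unsigned char* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parses entry number Index of the decrypted content table (cbTable bytes).
   Returns 0 on success, -1 if the entry lies outside the table. */
int ParseDtaEntry(const unsigned char* pTable, size_t cbTable, size_t Index, DtaEntry* pEntry)
{
    const unsigned char* p;

    if (Index >= cbTable / DTA_ENTRY_SIZE)
        return -1;

    p = pTable + Index * DTA_ENTRY_SIZE;
    pEntry->Unknown    = ReadDword(p + 0);
    pEntry->InfoOffset = ReadDword(p + 4);
    pEntry->Unknown2   = ReadDword(p + 8);
    memcpy(pEntry->NameHint, p + 12, 16);
    pEntry->NameHint[16] = '\0';
    return 0;
}
```

InfoOffset is the value you need to locate the file information header.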
The file information header looks like this:
| Offset | Type | Description |
|--------|---------|-----------------|
| 0 | DWORD | Unknown1 |
| 4 | DWORD | Unknown2 |
| 8 | DWORD | Unknown3 |
| 12 | DWORD | Unknown4 |
| 16 | DWORD | Filesize |
| 20 | DWORD | Unknown5 |
| 24 | UCHAR | Filename length |
| 25 | char[7] | Unknown6 |
| 32 | char[] | Filename |
Now this is exactly what we want to know! At this point we know the exact filename that can be passed to dtaOpen(). With this information we are able to extract ALL files from the DTA files.
5. That’s it!
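Extracting the size and the full filename from a decrypted file information header could look like the following sketch. It relies only on the length byte at offset 24, since it is not known whether the stored filename is NUL-terminated; the buffer is assumed to start at the header, and all names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DTA_INFO_FIXED_SIZE 32  /* bytes in front of the filename */

typedef struct
{
    uint32_t FileSize;
    char     FileName[256];  /* the length field is one byte, so 255 chars + NUL always fit */
} DtaFileInfo;

/* Reads a little-endian DWORD from a byte buffer. */
static uint32_t ReadDword(const unsigned char* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parses a decrypted file information header of cbInfo bytes.
   Returns 0 on success, -1 if the buffer is too short for the stated filename. */
int ParseDtaFileInfo(const unsigned char* pInfo, size_t cbInfo, DtaFileInfo* pOut)
{
    size_t NameLength;

    if (cbInfo < DTA_INFO_FIXED_SIZE)
        return -1;

    NameLength = pInfo[24];  /* filename length byte */
    if (cbInfo < DTA_INFO_FIXED_SIZE + NameLength)
        return -1;

    pOut->FileSize = ReadDword(pInfo + 16);
    memcpy(pOut->FileName, pInfo + DTA_INFO_FIXED_SIZE, NameLength);
    pOut->FileName[NameLength] = '\0';
    return 0;
}
```

The resulting FileName is the string you would pass to dtaOpen().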
This is all that I know. Now it’s up to you to dig deeper. And please let me know if you find out anything interesting, so that I can update this document.
If you have any problems understanding this, do the following:
1. Read this document COMPLETELY!
2. Use your brain!
3. Use your brain even more!
4. Mail me: massasnygga@kamalook.de
HAPPY HACKING!
BTW: You can always download the latest version of MafiaDataXtractor from this location.