Accessing Mafia’s DTA files

by MassaSnygga

 

1. Introduction

As I often receive email from people who ask for information about the DTA files because they want to write another DTA tool (such as the often-requested DTA packer), I decided to write this document and share all the information I have gathered about DTA files.

This document completely describes the methods the MafiaDataXtractor uses to extract the content from the DTA files.

Please note that the information provided here is based on my personal inspection of the disassembled Mafia code, lengthy debugging sessions, logged function calls and pure guesses. It is in no way official and not confirmed to be correct; it just works for me. Some parts may be incomplete or simply wrong. Therefore you should read this document with an open mind, and don’t take everything written here for granted (btw: that applies to all written things). If you find anything that is wrong, please let me know.

I expect that you know C and a little bit of x86 assembler, otherwise you won’t understand all of this. Please don’t bug me with any questions regarding C and ASM.

But now, let’s get to the fun part…

 

2. The encryption keys

As you may have guessed on inspecting DTA files with your hex editor, the files are encrypted. In fact some of them are also compressed, but I haven’t dealt with the compression methods at all. To read from them you have to know two 32-bit keys which are different for (almost) every DTA file. Luckily I could find them quite easily while inspecting the disassembled GAME.EXE as they are stored there without any (serious) protection. These keys seem to be identical for every international version, as they all seem to use the same GAME.EXE, except the German version, but the keys of that version are identical nonetheless.

I provide the keys here for your reference:

Filename  Content         Key 1       Key 2
A0.dta    Sounds          0xD8D0A975  0x467ACDE0
A1.dta    Missions        0x3D98766C  0xDE7009CD
A2.dta    Models          0x82A1C97B  0x2D5085D4
A3.dta    Animations I    0x43876FEA  0x900CDBA8
A4.dta    Animations II   0x43876FEA  0x900CDBA8
A5.dta    Difference Data 0xDEAC5342  0x760CE652
A6.dta    Textures        0x64CD8D0A  0x4BC97B2D
A7.dta    Records         0xD6FEA900  0xCDB76CE6
A8.dta    Patch 1.1       0xD8DD8FAC  0x5324ACE5
A9.dta    System Data     0x6FEE6324  0xACDA4783
AA.dta    Tables          0x5342760C  0xEDEAC652
AB.dta    Music           0xD8D0A975  0x467ACDE0
AC.dta    Animations III  0x43876FEA  0x900CDBA8

Please note that A8.dta was only introduced with the 1.1 patch and is not present in version 1.0 of the game.
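For use in your own tool, the key table can be written down as a small lookup array. This is only a sketch: the struct and the helper dta_find_keys() are names I made up for illustration, and the lookup compares filenames case-sensitively, exactly as they are spelled in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One row of the key table (hypothetical type name). */
struct dta_key_entry {
    const char *filename;
    uint32_t    key1;
    uint32_t    key2;
};

static const struct dta_key_entry dta_keys[] = {
    { "A0.dta", 0xD8D0A975u, 0x467ACDE0u },  /* Sounds          */
    { "A1.dta", 0x3D98766Cu, 0xDE7009CDu },  /* Missions        */
    { "A2.dta", 0x82A1C97Bu, 0x2D5085D4u },  /* Models          */
    { "A3.dta", 0x43876FEAu, 0x900CDBA8u },  /* Animations I    */
    { "A4.dta", 0x43876FEAu, 0x900CDBA8u },  /* Animations II   */
    { "A5.dta", 0xDEAC5342u, 0x760CE652u },  /* Difference Data */
    { "A6.dta", 0x64CD8D0Au, 0x4BC97B2Du },  /* Textures        */
    { "A7.dta", 0xD6FEA900u, 0xCDB76CE6u },  /* Records         */
    { "A8.dta", 0xD8DD8FACu, 0x5324ACE5u },  /* Patch 1.1       */
    { "A9.dta", 0x6FEE6324u, 0xACDA4783u },  /* System Data     */
    { "AA.dta", 0x5342760Cu, 0xEDEAC652u },  /* Tables          */
    { "AB.dta", 0xD8D0A975u, 0x467ACDE0u },  /* Music           */
    { "AC.dta", 0x43876FEAu, 0x900CDBA8u },  /* Animations III  */
};

/* Returns the table row for the given DTA filename, or NULL if unknown. */
static const struct dta_key_entry *dta_find_keys(const char *filename)
{
    size_t i;

    assert(filename != NULL);
    for (i = 0; i < sizeof dta_keys / sizeof dta_keys[0]; ++i) {
        if (strcmp(dta_keys[i].filename, filename) == 0)
            return &dta_keys[i];
    }
    return NULL;
}
```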

 

 

3. Using the rw_data.dll

The easiest approach to the extraction problem is to use Mafia’s own DLL. This file is “rw_data.dll”; you will find it in your Mafia directory. As Mafia itself uses this file, we can extract the files the very same way Mafia does, without having to deal with the encryption and compression methods that Mafia uses. COOL! you might yell out now, but as always life is not that easy, as it will turn out to be necessary to crack the encryption to achieve perfect results. But I am getting ahead of myself.

 

First of all let’s take a look at all the functions “rw_data.dll” exports and how they work:

 

DWORD dtaBin2Text(DWORD unknown1, DWORD unknown2);

Description:

I haven’t dealt with this function as it doesn’t seem to be used by Mafia at all.

Parameters:

unknown

Returns:

unknown

 

void dtaClose(DWORD FileHandle);

Description:

Closes a file that was opened from inside a DTA.

Parameters:

FileHandle: The handle to a file that was opened with dtaOpen

Returns:

nothing

 

DWORD dtaCreate(char* FileName);

Description:

This function does NOT create a file in any way, as the name suggests. In fact it sort of “mounts” a DTA file, so that the files it contains can be accessed by successive calls to dtaOpen.

Parameters:

FileName: The filename of the DTA file to be mounted.

Returns:

NULL when the call didn’t succeed.

A non-null object pointer if the call succeeded (see remarks). Thanks go to Jonathan Wilson, who pointed that out to me.

Remarks:

There is something special about the number that is returned from this call. At first I thought the return value didn’t matter, but it turned out that Mafia uses this value to set the decryption keys of the DTA files. In fact it is a pointer to an instance of a class inside “rw_data.dll”. Because Mafia was written in C++, the first double word of the object points to its virtual function table. The function we are interested in is the fourth entry (table address + 0x0c). It has to be called with the two correct keys on the stack, each XORed with a fixed value. Furthermore the ECX register has to contain the value returned by dtaCreate, to comply with the Visual C++ calling convention for member functions (the object pointer is passed in ECX).

The two XOR values are always the same: the first key must be XORed with 0x34985762 and the second key must be XORed with 0x39475694.

The following code sample mounts “a9.dta” and sets its decryption keys:

DWORD Result;

Result = dtaCreate("a9.dta");
if (Result != 0)
{
    __asm
    {
        mov  eax, 06FEE6324h    ; load the first key into EAX
        xor  eax, 034985762h    ; XOR it with the first XOR value
        push eax                ; push the prepared first key onto the stack
        mov  eax, 0ACDA4783h    ; load the second key into EAX
        xor  eax, 039475694h    ; XOR it with the second XOR value
        push eax                ; push the prepared second key onto the stack
        mov  ecx, Result        ; load the object pointer into ECX as demanded by the function
        mov  eax, [ecx]         ; load the address of the function table into EAX
        call [eax + 0ch]        ; call the fourth entry in that table
    }
}
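If you only want to check the values that end up on the stack, the key preparation can be done in plain C. The following is a small sketch; the macro names, the struct and dta_prepare_keys() are made up for illustration, only the two XOR constants come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DTA_XOR_KEY1 0x34985762u  /* XOR value for the first key  */
#define DTA_XOR_KEY2 0x39475694u  /* XOR value for the second key */

/* The two values that get pushed before the virtual call (hypothetical type). */
struct dta_prepared_keys {
    uint32_t first;
    uint32_t second;
};

/* XORs each key from the key table with its fixed XOR value. */
static void dta_prepare_keys(uint32_t key1, uint32_t key2,
                             struct dta_prepared_keys *out)
{
    assert(out != NULL);
    out->first  = key1 ^ DTA_XOR_KEY1;
    out->second = key2 ^ DTA_XOR_KEY2;
}
```

For a9.dta (keys 0x6FEE6324 and 0xACDA4783) this yields 0x5B763446 and 0x959D1117.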

 

 

You might want to check the other functions in that function-table; maybe they also do something interesting.

 

DWORD dtaDelete(DWORD unknown1);

Description:

I haven’t dealt with this function.

Parameters:

unknown

Returns:

unknown

 

DWORD dtaDumpMemoryLeaks(DWORD unknown1);

Description:

I haven’t dealt with this function as it doesn’t seem to be used by Mafia at all.

Parameters:

unknown

Returns:

unknown

 

DWORD dtaGetTime(DWORD unknown1, DWORD unknown2, DWORD unknown3);

Description:

I haven’t dealt with this function.

Parameters:

unknown

Returns:

unknown

 

DWORD dtaOpen(char* FileName, DWORD unknown1);

Description:

This function opens a file from the DTA files. The DLL searches all DTA files that were previously mounted with dtaCreate().

Parameters:

FileName: The filename of the file to be opened.

unknown1: This value seems to be always 0.

Returns:

0xffffffff if the call didn’t succeed.

A file handle that can be used to access the file, if the call did succeed.

 

DWORD dtaOpenWrite(DWORD unknown1, DWORD unknown2);

Description:

I haven’t dealt with this function.

Parameters:

unknown

Returns:

unknown

 

DWORD dtaRead(DWORD FileHandle, char* Buffer, DWORD ByteCount);

Description:

This function reads from a file that was opened with dtaOpen. The reading begins at the current file position.

Parameters:

FileHandle: The handle of the file to be read.

Buffer: Pointer to the buffer that will receive the data. Make sure that it is big enough to hold all the data you request.

ByteCount: The number of bytes to be read.

Returns:

The number of bytes that were actually read. This value might be lower than ByteCount when the end of the file was encountered or an error occurred.

 

DWORD dtaSeek(DWORD FileHandle, DWORD unknown1, DWORD unknown2);

Description:

I didn’t use this function. But it should be easy to figure out. I suspect the parameters to be similar or even identical to those of fseek().

Parameters:

unknown

Returns:

unknown

 

void dtaSetDtaFirstForce();

Description:

This is a very interesting function: it controls the source from which files are loaded.

“rw_data.dll” has two operation modes:

1. Read from hard drive as default and read from DTA as fallback.

2. Read from DTA as default and read from hard drive as fallback.

Mode 1 is the default mode of operation.

When this function is called, “rw_data.dll” switches to operation mode 2.

This is also the function that MafiaDataXtractor patches to a no-op. That way mode 2 is never activated and the extracted files are read.

Parameters:

none

Returns:

nothing

 

DWORD dtaWrite(DWORD unknown1, DWORD unknown2, DWORD unknown3);

Description:

I haven’t dealt with this function.

Parameters:

unknown

Returns:

unknown

 

Whew, that’s it. If you find anything out about the unexplored functions, please let me know, so that I can update this information.

 

So how do you actually read a file using these functions?

First of all, use an unpatched version of “rw_data.dll”, either a version from a clean installation or “rw_data.bak” when MafiaDataXtractor is installed. If you use a patched version you might end up reading files that were already extracted (and maybe modified afterwards), and that is not what we want.

1. Call dtaSetDtaFirstForce() to ensure that the files inside the DTAs are read.

2. Mount the DTAs with dtaCreate() and set the decryption keys.

3. Open a file with dtaOpen().

4. Read the content with dtaRead().

5. Close the file with dtaClose().

If you import the functions from “rw_data.dll” with LoadLibrary() and GetProcAddress(), please keep in mind that the function names are decorated. They are preceded by an underscore and end with an ‘@’ followed by the number of bytes that are passed to the function as parameters. So the exported name for dtaRead() is “_dtaRead@12”.
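The decoration rule is simple enough to put into a helper: a leading underscore (as in the example name), then the function name, then ‘@’ and the parameter byte count. The sketch below builds such a name; dta_decorated_name() is a made-up helper, and the byte counts assume 4 bytes per parameter as on 32-bit Windows. The resulting string is what you would hand to GetProcAddress().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes "_<name>@<param_bytes>" into out.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int dta_decorated_name(const char *name, unsigned param_bytes,
                              char *out, size_t out_size)
{
    int n;

    assert(name != NULL && out != NULL);
    n = snprintf(out, out_size, "_%s@%u", name, param_bytes);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

dtaRead() takes three 4-byte parameters, so dta_decorated_name("dtaRead", 12, ...) produces “_dtaRead@12”; by the same rule dtaClose() with its single parameter would come out as “_dtaClose@4”.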

 

That’s it! Beautiful, isn’t it? Well, one little thing is still missing: how do we find out which files are inside these DTAs? The DLL obviously doesn’t contain any functions to search for files. I deal with that problem in the next chapter.

 

4. Decrypting the DTA content table

Now we get to the rough stuff: Opening the DTA files on our own!

Before I explain the structure of the DTA files to you, I will show you how to decrypt the data inside them, as almost everything needs to be decrypted with it. I haven’t inspected the decryption function that closely, but thankfully Roger H. Jörg converted my assembler dump to a nice C function.

void Decrypt(void* pBuffer, unsigned int cbBuffer, unsigned int Key1, unsigned int Key2)
{
    // First loop: process whole 64-bit sequences.
    // Note that ~((~x) ^ k) is the same as x ^ k, so this is a plain XOR.

    __int64*     pLongLong = (__int64*) pBuffer;
    unsigned int cLongLong = cbBuffer / 8;

    for (; cLongLong; --cLongLong, ++pLongLong)
    {
        unsigned int* pLong = (unsigned int*) pLongLong;
        unsigned int  ulong;

        // The first DWORD of each pair is processed with Key2.
        ulong = *pLong;
        *pLong = ~((~ulong) ^ Key2);

        ++pLong;

        // The second DWORD of each pair is processed with Key1.
        ulong = *pLong;
        *pLong = ~((~ulong) ^ Key1);
    }

    // Second loop: process remaining bytes (fewer than 8),
    // using the key bytes in memory order, Key2 first.

    unsigned char* pByte = (unsigned char*) pLongLong;
    unsigned int   cBytes = cbBuffer % 8;
    unsigned int   keys[2] = { Key2, Key1 };
    unsigned char* pKey = (unsigned char*) keys;

    for (; cBytes; --cBytes, ++pByte, ++pKey)
    {
        unsigned char byte = *pByte;
        unsigned char key = *pKey;

        *pByte = (unsigned char)(~((~byte) ^ key));
    }
}

 

As you see, using this function is quite easy: pass a pointer to the buffer with the encrypted data, its length and both keys which you can look up in the key table at the beginning of this document. On return the data is decrypted and you can continue your work. EASY!
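Because ~((~x) ^ k) equals x ^ k, the whole function boils down to XORing the buffer with a repeating 8-byte pattern: the bytes of Key2 followed by the bytes of Key1, both in little-endian order as they lie in memory on x86. The following is a compiler-independent sketch of that reading, not the original routine; dta_decrypt() is a made-up name. As it is a plain XOR, calling it a second time with the same keys restores the original data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XORs buf with the repeating 8-byte pattern { Key2 bytes, Key1 bytes }. */
static void dta_decrypt(unsigned char *buf, size_t len,
                        uint32_t key1, uint32_t key2)
{
    unsigned char pattern[8];
    size_t i;

    assert(buf != NULL || len == 0);

    /* Lay out both keys byte by byte, lowest byte first (x86 memory order). */
    for (i = 0; i < 4; ++i) {
        pattern[i]     = (unsigned char)(key2 >> (8 * i));
        pattern[4 + i] = (unsigned char)(key1 >> (8 * i));
    }

    /* Whole 8-byte groups and the trailing bytes are handled alike. */
    for (i = 0; i < len; ++i)
        buf[i] ^= pattern[i % 8];
}
```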

 

So, now on to the file structure:

The first four bytes of every DTA must be “ISD0”. This is used to identify correct DTA files. Following this magic value there is a small header that needs to be decrypted with the Decrypt() function:

Offset  Type   Description
0       DWORD  Number of files in the archive
4       DWORD  Offset to the content table
8       DWORD  Size of the content table
12      DWORD  Unknown
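Reading this header could look like the sketch below. It assumes that the header starts directly after the 4-byte magic value (file offset 4), that the 16 header bytes have already been run through Decrypt(), and that all DWORDs are little-endian. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decrypted DTA header (hypothetical type name). */
struct dta_header {
    uint32_t file_count;    /* number of files in the archive */
    uint32_t table_offset;  /* offset to the content table    */
    uint32_t table_size;    /* size of the content table      */
    uint32_t unknown;
};

/* Reads a little-endian DWORD from a byte buffer. */
static uint32_t dta_read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * first20: the first 20 bytes of the DTA file, where bytes 4..19
 * (the header) are already decrypted.
 * Returns 0 on success, -1 if the magic value is not "ISD0".
 */
static int dta_parse_header(const unsigned char *first20,
                            struct dta_header *out)
{
    assert(first20 != NULL && out != NULL);

    if (memcmp(first20, "ISD0", 4) != 0)
        return -1;

    out->file_count   = dta_read_le32(first20 + 4);
    out->table_offset = dta_read_le32(first20 + 8);
    out->table_size   = dta_read_le32(first20 + 12);
    out->unknown      = dta_read_le32(first20 + 16);
    return 0;
}
```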

Now we have all the information we need to read the content table. After decryption, the content table is an array of entries that look like this:

Offset  Type      Description
0       DWORD     Unknown
4       DWORD     Offset to file information header
8       DWORD     Unknown2
12      char[16]  Filename hint
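Going by the table, each entry is 12 + 16 = 28 bytes long. A sketch for picking one entry out of the already decrypted content table follows; the names are made up, DWORDs are assumed to be little-endian, and the hint is copied with an extra terminating zero so it can be used as a C string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DTA_TABLE_ENTRY_SIZE 28  /* 3 DWORDs + char[16], per the table */

/* One decrypted content table entry (hypothetical type name). */
struct dta_table_entry {
    uint32_t unknown1;
    uint32_t info_offset;    /* offset to the file information header */
    uint32_t unknown2;
    char     name_hint[17];  /* 16 characters plus terminating zero   */
};

/* Reads a little-endian DWORD from a byte buffer. */
static uint32_t dta_entry_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* table: decrypted content table; index: zero-based entry number. */
static void dta_parse_table_entry(const unsigned char *table, size_t index,
                                  struct dta_table_entry *out)
{
    const unsigned char *p;

    assert(table != NULL && out != NULL);
    p = table + index * DTA_TABLE_ENTRY_SIZE;

    out->unknown1    = dta_entry_le32(p);
    out->info_offset = dta_entry_le32(p + 4);
    out->unknown2    = dta_entry_le32(p + 8);
    memcpy(out->name_hint, p + 12, 16);
    out->name_hint[16] = '\0';
}
```

The caller has to make sure that index is smaller than the number of files from the header.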

Unfortunately this still doesn’t contain the information we want, as the complete filename is not part of this structure. All that is included here is a filename hint that contains the last 16 characters of the filename. My guess is that Mafia uses this to quickly reject entries when searching for a specific file. Furthermore I suspect one of the unknown entries to be a hash value of the real filename. As we still haven’t found what we were looking for, we have to dig deeper. Therefore we read the file information header, whose offset we just obtained. It has, yet again, to be decrypted with Decrypt().

The file information header looks like this:

Offset  Type     Description
0       DWORD    Unknown1
4       DWORD    Unknown2
8       DWORD    Unknown3
12      DWORD    Unknown4
16      DWORD    Filesize
20      DWORD    Unknown5
24      UCHAR    Filename length
25      char[7]  Unknown6
32      char[]   Filename

Now this is exactly what we want to know! At this point we know the exact filename, that can be passed to dtaOpen(). With this information we are able to extract ALL files from the DTA files.
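A sketch for pulling the file size and the filename out of such a header is shown below. It assumes the buffer has already been decrypted, that the filename directly follows the 32 fixed bytes and is not zero-terminated in the file, and that DWORDs are little-endian. All names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DTA_FILE_INFO_FIXED_SIZE 32  /* bytes before the filename */

/* The known fields of the file information header (hypothetical type). */
struct dta_file_info {
    uint32_t      file_size;
    unsigned char name_length;
    char          name[256];  /* up to 255 characters plus terminating zero */
};

/*
 * info: decrypted file information header, info_len: bytes available.
 * Returns 0 on success, -1 if the buffer is too short.
 */
static int dta_parse_file_info(const unsigned char *info, size_t info_len,
                               struct dta_file_info *out)
{
    assert(info != NULL && out != NULL);

    if (info_len < DTA_FILE_INFO_FIXED_SIZE)
        return -1;

    /* Filesize at offset 16, little-endian. */
    out->file_size = (uint32_t)info[16] | ((uint32_t)info[17] << 8) |
                     ((uint32_t)info[18] << 16) | ((uint32_t)info[19] << 24);
    out->name_length = info[24];

    if (info_len < (size_t)DTA_FILE_INFO_FIXED_SIZE + out->name_length)
        return -1;

    memcpy(out->name, info + DTA_FILE_INFO_FIXED_SIZE, out->name_length);
    out->name[out->name_length] = '\0';
    return 0;
}
```

The string in name is what would then be passed to dtaOpen().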

 

5. That’s it!

This is all that I know. Now it’s up to you to dig deeper. And please let me know if you find out anything interesting, so that I can update this document.

If you have any problems understanding this, do the following:

1. Read this document COMPLETELY!

2. Use your brain!

3. Use your brain even more!

4. Mail me: massasnygga@kamalook.de

 

HAPPY HACKING!

 

BTW: You can always download the latest version of MafiaDataXtractor from this location.