Hi everyone,
In the previous post, we covered tracing, disassembling, debugging and hooking a sample native library. In this article, I will stay in the native C/C++ library layer and walk through the analysis of a mobile game developed with Unity.
This article will be like rediscovering America. Much of the information here (especially Katy’s) is already available on the official il2cppdumper site. I will cover the initialization of the IL2CPP VM and touch on a small trace in pseudocode form with IDA. Since I have almost no ARM knowledge (I don’t feel ready), I will not go into debugging and disassembly analysis, but the hooking process will be explained in detail.
Let’s start!
INITIALIZE VM
When I analyzed an Android game that I had compiled myself, I could see the target classes, methods, fields, etc. in the Assembly-CSharp.dll file. However, most of the Unity games I reviewed on the Play Store did not have this file. Instead, there was a native library: libil2cpp.so
This is because the developer can change the Scripting Backend setting while building the package.
├── assets
│ ├── bin
│ │ └── Data
│ │ ├── data.unity3d
│ │ ├── Managed
│ │ │ ├── Assembly-CSharp.dll
├── lib
│ └── armeabi-v7a
│ ├── lib_burst_generated.so
│ ├── libil2cpp.so
│ ├── libmain.so
│ └── libunity.so
The Intermediate Language output produced by compilation is handled differently depending on the Scripting Backend. A game built with Mono compiles the C# scripts into intermediate language and packages it into the client. A game built with IL2CPP, on the other hand, compiles the C# scripts into native code, which ends up in the libil2cpp.so file for the corresponding architecture under the lib directory.
Let’s skip the performance side of the process and touch on the security side:
Having the code in the “so” file makes analysis more tiring because it has to be disassembled. But does this increase the security of the game enough?
The IL2CPP build does not only create the so file, it also creates the global-metadata.dat file. For example, the grep output for “Coin” given below comes from a DAT file that has not been obfuscated. To understand how this file is used, I will need to take a superficial look at the il2cpp VM runtime source code (next heading).
NOTE: If the first four bytes are af 1b b1 fa, then the global-metadata is generally not obfuscated. (Little endian 0xFAB11BAF)
The reason for this trace is to understand how and where we reach the target fields at the end of the analysis.
00010500 43 68 61 72 53 69 7a 65 43 6f 69 6e 50 69 63 6b |CharSizeCoinPick|
00095060 43 6f 69 6e 50 69 63 6b 65 72 00 4f 6e 54 72 69 |CoinPicker.OnTri|
000950a0 72 00 43 6f 69 6e 73 47 65 6e 65 72 61 74 6f 72 |r.CoinsGenerator|
000950c0 77 6e 43 6f 69 6e 73 00 63 6f 69 6e 50 6f 6f 6c |wnCoins.coinPool|
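The magic-byte note and the string search above can be reproduced offline with a few lines of Node.js. This is only a minimal sketch: the helper names hasMetadataMagic and findAll are my own, and the sample buffer is synthetic rather than a real global-metadata.dat.

```javascript
// Magic value from the note above; on disk it appears as af 1b b1 fa.
const METADATA_MAGIC = 0xFAB11BAF;

// Returns true when the buffer starts with the little-endian magic value.
function hasMetadataMagic(buf) {
  return buf.length >= 4 && buf.readUInt32LE(0) === METADATA_MAGIC;
}

// Returns every byte offset at which the ASCII needle occurs (like grep -b).
function findAll(buf, needle) {
  const hits = [];
  let pos = buf.indexOf(needle, 0, "latin1");
  while (pos !== -1) {
    hits.push(pos);
    pos = buf.indexOf(needle, pos + 1, "latin1");
  }
  return hits;
}

// Synthetic stand-in for a metadata file: magic + a few NUL-terminated names.
const sample = Buffer.concat([
  Buffer.from([0xaf, 0x1b, 0xb1, 0xfa]),
  Buffer.from("CoinPicker\0OnTriggerEnter2D\0spawnCoins\0", "latin1"),
]);

console.log(hasMetadataMagic(sample));
console.log(findAll(sample, "Coin"));
```

For a real file, the buffer would come from fs.readFileSync on the extracted global-metadata.dat instead of the synthetic sample.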
In the source, the DAT file is referenced in only one cpp file. It is loaded inside il2cpp::vm::MetadataCache::Initialize by passing its name to il2cpp::vm::MetadataLoader::LoadMetadataFile. Inside that function the file is opened via File::Open. MemoryMappedFile::Map then maps the whole content of the file into virtual memory, so that it sits in the application’s logical address space. The mapped file is finally returned as a byte array.
Attention!
The graph in Figure – 2 is out of date: when I looked at the il2cpp source code for version 2021.3.4f1, I saw that the DAT file is passed through different classes. Not much has changed, but I will proceed with the up-to-date source code.
Now let’s try to trace the corresponding compiled functions in IDA. I’ll start with the global-metadata string.
The sub_222B04 function is where the string we searched for in libil2cpp is referenced. In the pseudocode, the string is passed to sub_286820. This could be the interaction between the GlobalMetadata (for 2021.3.4f1) and MetadataLoader classes. When sub_286820 is examined, some of the calls used in MetadataLoader can be seen:
For example, the string passed in as a1 is wrapped in a strlen call inside sub_286820. We also see the “Metadata” string being passed to a function (I think StringView).
.rodata:00C8DBF4 aGlobalMetadata DCB "global-metadata.dat",0
.rodata:00C8DBF4 ; DATA XREF: sub_222B04+8↑o
.rodata:00C8DBF4 ; sub_222B04+10↑o ...
int __fastcall sub_222B04(unsigned int *a1, _DWORD *a2)
{
int v4; // r0
unsigned int v5; // r1
int v6; // r4
int i; // r0
.
.
v4 = sub_286820("global-metadata.dat");
dword_EB0BE0 = v4;
int __fastcall sub_286820(const char *a1)
{
.
.
sub_1E5AD4(&v17, v4, v2);
sub_20D5E0(&v17, 1, 47);
sub_1E5AD4(&v17, "Metadata", 8);
if ( (v14 & 1) != 0 )
operator delete(p);
v5 = strlen(a1);
void* il2cpp::vm::MetadataLoader::LoadMetadataFile(const char* fileName)
{
#if IL2CPP_TARGET_ANDROID && IL2CPP_TINY_DEBUGGER && !IL2CPP_TINY_FROM_IL2CPP_BUILDER
std::string resourcesDirectory = utils::PathUtils::Combine(utils::StringView<char>("Data"), utils::StringView<char>("Metadata"));
std::string resourceFilePath = utils::PathUtils::Combine(resourcesDirectory, utils::StringView<char>(fileName, strlen(fileName)));
Now that the functions corresponding to GlobalMetadata and MetadataLoader have been identified and can be renamed, it’s time to find MetadataCache and Runtime.
When we examine the XREFs of sub_222B04, the sub_2967D0 function appears. The fact that it takes no arguments is reminiscent of MetadataCache::Initialize. It also calls the function we matched to GlobalMetadata::Initialize with two arguments, which strengthens our guess.
Following the cross reference of MetadataCache leads to sub_286C78, where we can clearly recognize Runtime::Init. Matching string values is enough for now.
.text:002967D0 ; =============== S U B R O U T I N E =======================================
.text:002967D0
.text:002967D0
.text:002967D0 ; int sub_2967D0()
.text:002967D0 sub_2967D0 ; CODE XREF: sub_286C78+168↑p
v0 = sub_222B04((unsigned int *)&dword_EB1800, &dword_EB1804);
result = 0;
if ( v0 )
bool il2cpp::vm::MetadataCache::Initialize()
{
if (!il2cpp::vm::GlobalMetadata::Initialize(&s_ImagesCount, &s_AssembliesCount))
v21 = sub_1DB3E0("mscorlib.dll");
v22 = sub_1DB3E0("__Generated");
dword_EB11B0 = jinfo_get_method(v21);
dword_EB11B4 = jinfo_get_method(v22);
dword_EB11B8 = il2cpp_class_from_name_0(dword_EB11B0, "System", "Object");
dword_EB11C0 = il2cpp_class_from_name_0(dword_EB11B0, "System", "Void");
const Il2CppAssembly* assembly = Assembly::Load("mscorlib.dll");
const Il2CppAssembly* assembly2 = Assembly::Load("__Generated");
il2cpp_defaults.corlib = Assembly::GetImage(assembly);
il2cpp_defaults.corlib_gen = Assembly::GetImage(assembly2);
DEFAULTS_INIT(object_class, "System", "Object");
DEFAULTS_INIT(void_class, "System", "Void");
As we saw in Figure – 2, it is the il2cpp_init function that triggers Runtime::Init. However, sub_286C78 is referenced from 3 different places. We can take a step back and see where il2cpp_init itself is referenced: it is an exported Elf32 symbol, and its symbol entry is loaded with the LOAD segment.
int __fastcall il2cpp_init(char *a1)
{
setlocale(6, &byte_C889C1);
return sub_286C78(a1);
}
int il2cpp_init(const char* domain_name)
{
setlocale(LC_ALL, "");
return Runtime::Init(domain_name);
}
LOAD:00002800 Elf32_Sym <aIl2cppInit - byte_B370, il2cpp_init, 0x28, 0x12, 0, 0xD> ; "il2cpp_init"
You can examine the cross-functional xref graph below, with reference to WinGraph32;
TRACE -> METADATA PROCESSING
Let’s go back to the place where the metadata file is handled in the il2cpp library: GlobalMetadata.
There, s_GlobalMetadataHeader is defined by casting the byte array returned from LoadMetadataFile to the Il2CppGlobalMetadataHeader structure. Changing this structure prevents tools such as il2cppdumper from extracting information such as methods, fields and classes, because the header determines the offsets of the typedef structures for these items.
For example, imagesOffset locates the Il2CppImageDefinition structures. In the next section, we will find the address of the target image, Assembly-CSharp, via this structure.
s_GlobalMetadata = vm::MetadataLoader::LoadMetadataFile("global-metadata.dat");
if (!s_GlobalMetadata)
return false;
s_GlobalMetadataHeader = (const Il2CppGlobalMetadataHeader*)s_GlobalMetadata;
int32_t imagesOffset;
int32_t imagesSize;
int32_t assembliesOffset;
int32_t assembliesSize;
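Each section of the metadata file is described in the header by an offset/size pair of int32 values, as with imagesOffset and imagesSize above. The sketch below reads one such pair from a buffer and slices out the section. The fieldPos parameter is hypothetical: the real position of imagesOffset inside Il2CppGlobalMetadataHeader depends on the metadata version and is not given here, so the demo uses a synthetic buffer.

```javascript
// Reads an (offset, size) pair of little-endian int32 values at fieldPos
// and returns the section of the buffer that the pair describes.
// fieldPos is a hypothetical parameter, not the real header position.
function readSection(buf, fieldPos) {
  const offset = buf.readInt32LE(fieldPos);
  const size = buf.readInt32LE(fieldPos + 4);
  return { offset, size, bytes: buf.subarray(offset, offset + size) };
}

// Synthetic 32-byte file: the pair lives at position 8, the data at 16..24.
const demoFile = Buffer.alloc(32);
demoFile.writeInt32LE(16, 8);   // section offset
demoFile.writeInt32LE(8, 12);   // section size
demoFile.write("ABCDEFGH", 16, "latin1");

const section = readSection(demoFile, 8);
console.log(section.offset, section.size, section.bytes.toString("latin1"));
```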
Let’s leave s_GlobalMetadataHeader aside and analyze another important struct, Il2CppMetadataRegistration, because the metadata pointers are defined through it.
One of the reasons I’m looking at it is that it is used in GlobalMetadata::InitializeAllMethodMetadata. That function loops metadataUsagesCount times, assigns each metadataUsages entry to metadataPointer, and passes it to InitializeRuntimeMetadata.
uintptr_t* metadataPointer = reinterpret_cast<uintptr_t*>(s_Il2CppMetadataRegistration->metadataUsages[i]);
typedef struct Il2CppMetadataRegistration
{
const size_t metadataUsagesCount;
void** const* metadataUsages;
} Il2CppMetadataRegistration;
void* il2cpp::vm::GlobalMetadata::InitializeRuntimeMetadata(uintptr_t* metadataPointer, bool throwOnError)
{
uintptr_t metadataValue = (uintptr_t)UnityPalReadPtrVal((intptr_t*)metadataPointer);
After following the chain of function calls, il2cpp_codegen_register (in il2cpp_codegen_il2cpp.cpp) turned out to be the last reference. The pointers are then stored in Il2CppMetadataRegistration and Il2CppCodeRegistration so they can be matched with global-metadata (the MemoryMappedFile).
NOTE: I couldn’t find the relevant definition in libil2cpp. It may be generated by the il2cpp AOT compiler and then injected into the il2cpp runtime (likely il2cpp_init).
├── il2cpp_codegen_register
│ ├── il2cpp::vm::MetadataCache::Register
│ │ └── il2cpp::vm::GlobalMetadata::Register
void il2cpp::vm::GlobalMetadata::Register(const Il2CppCodeRegistration* const codeRegistration, const Il2CppMetadataRegistration* const metadataRegistration, const Il2CppCodeGenOptions* const codeGenOptions)
{
s_GlobalMetadata_CodeRegistration = codeRegistration;
s_Il2CppMetadataRegistration = metadataRegistration;
}
Now let’s go back to InitializeRuntimeMetadata.
It contains a switch on the value read from the forwarded pointer. The cases dispatch to different functions according to the Il2CppMetadataUsage enumeration.
case kIl2CppMetadataUsageMethodRef:
initialized = (void*)GetMethodInfoFromEncodedIndex(encodedToken);
break;
case kIl2CppMetadataUsageFieldInfo:
initialized = (void*)GetFieldInfoFromIndex(decodedIndex);
break;
case kIl2CppMetadataUsageStringLiteral:
initialized = (void*)GetStringLiteralFromIndex(decodedIndex);
break;
The offsets of important data such as methods, fields and types that il2cppdumper extracts are determined by the functions in these cases.
For example, the methods and fields belonging to a type definition; more specifically, the Update method and the jump field of the Player type.
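To make the switch easier to picture, here is a rough sketch of how a single 32-bit value can carry both the usage kind and the index. The bit layout is an assumption on my part and is not shown in the excerpts above: I assume the top 3 bits hold the kind and the remaining bits, shifted right by one, hold the index. Older or newer metadata versions may differ, and splitEncodedIndex is a made-up helper name.

```javascript
// ASSUMPTION: kind in the top 3 bits, index in bits 1..28.
// This layout is not confirmed by the excerpts in this article.
function splitEncodedIndex(encoded) {
  const kind = (encoded & 0xE0000000) >>> 29;   // which switch case is taken
  const index = (encoded & 0x1FFFFFFE) >>> 1;   // index handed to the getter
  return { kind, index };
}

// Build a sample value with kind 6 and index 42, then split it again.
const encodedSample = ((6 << 29) | (42 << 1)) >>> 0;
console.log(splitEncodedIndex(encodedSample));
```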
To keep the article short, let’s examine a single path: GetMethodInfoFromMethodDefinitionIndex
This function is reached from GetMethodInfoFromEncodedIndex and receives the index decoded from the value that metadataPointer points to. GetMethodDefinitionFromIndex is then called, and the index is turned into an address with this calculation:
reinterpret_cast<T>(reinterpret_cast<uint8_t*>(const_cast<void*>(metadata)) + sectionOffset) + itemIndex
The declaringType that methodDefinition points to is passed to GetTypeInfoFromTypeDefinitionIndex, which returns s_TypeInfoDefinitionTable[index] (an Il2CppClass pointer).
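The pointer expression above is easy to misread: the section offset is added in bytes, but the final + itemIndex is pointer arithmetic on T, so it advances by itemIndex * sizeof(T). A minimal sketch of the same calculation with plain numbers, where itemSize stands in for sizeof(T) and the values are made up for illustration:

```javascript
// Byte offset of item number itemIndex inside a metadata section.
// itemSize plays the role of sizeof(T) in the C++ expression.
function itemByteOffset(sectionOffset, itemIndex, itemSize) {
  return sectionOffset + itemIndex * itemSize;
}

// Hypothetical numbers: section at 0x100, 32-byte items, fourth item.
const fourthItem = itemByteOffset(0x100, 3, 32);
console.log("0x" + fourthItem.toString(16));
```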
const MethodInfo* il2cpp::vm::GlobalMetadata::GetMethodInfoFromMethodDefinitionIndex(MethodIndex index)
{
if (!s_MethodInfoDefinitionTable[index])
{
const Il2CppMethodDefinition* methodDefinition = GetMethodDefinitionFromIndex(index);
Il2CppClass* typeInfo = GetTypeInfoFromTypeDefinitionIndex(methodDefinition->declaringType);
il2cpp::vm::Class::SetupMethods(typeInfo);
const Il2CppTypeDefinition* typeDefinition = reinterpret_cast<const Il2CppTypeDefinition*>(typeInfo->typeMetadataHandle);
s_MethodInfoDefinitionTable[index] = typeInfo->methods[index - typeDefinition->methodStart];
}
return s_MethodInfoDefinitionTable[index];
}
SetupMethods hands the work over to SetupMethodsLocked. The important part of that function is given below. The Il2CppClass variable is passed to GetMethodInfo, which returns the values that Il2CppMethodDefinition points to. GetMethodPointer finds the codeGenModule through the image object, locates the methodPointers array under it, and then retrieves the corresponding function pointer according to the token.
The name and methodPointer are not the only values assigned to newMethod; others such as flags, token and slot are also taken from methodInfo. As a result, the addresses that tools such as Il2cppDumper extract can be listed as seen in dnSpy.
NOTE: Here it points to non-portable addresses before the library is loaded.
Il2CppMetadataMethodInfo methodInfo = MetadataCache::GetMethodInfo(klass, index);
newMethod->name = methodInfo.name;
newMethod->methodPointer = MetadataCache::GetMethodPointer(klass->image, methodInfo.token);
GetStringFromIndex(methodDefinition->nameIndex),
GetIl2CppTypeFromIndex(methodDefinition->returnType),
methodDefinition->token,
methodDefinition->flags,
methodDefinition->iflags,
methodDefinition->slot,
methodDefinition->parameterCount,
uint32_t rid = GetTokenRowId(token);
uint32_t table = GetTokenType(token);
if (rid == 0)
return NULL;
return image->codeGenModule->methodPointers[rid - 1];
1 0x06000001 0x0000052E 0x2050 0 0x81 0x4A8 0xA 1 Start
2 0x06000002 0x0000053C 0x2050 0 0x81 0x1D2 0xA 1 Update
3 0x06000003 0x0000054A 0x2050 0 0x1886 0x3B4 0xA 1 .ctor
4 0x06000004 0x00000558 0x2050 0 0x81 0x4A8 0xA 1 Start
5 0x06000005 0x00000566 0x2050 0 0x81 0x24 0x250 1 OnTriggerEnter2D
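The token handling in GetMethodPointer can be followed with the tokens in the listing above. A token keeps the table type in its top byte and the row id in the lower three bytes, and the row id minus one is the index into methodPointers. A minimal sketch of that split (the JavaScript function names are mine, mirroring GetTokenType and GetTokenRowId):

```javascript
// Top byte of the token: the metadata table (0x06000000 for these methods).
function getTokenType(token) {
  return (token & 0xFF000000) >>> 0;
}

// Lower three bytes of the token: the 1-based row id.
function getTokenRowId(token) {
  return token & 0x00FFFFFF;
}

// Index into methodPointers, or null when the row id is 0 (no pointer).
function methodPointerIndex(token) {
  const rid = getTokenRowId(token);
  return rid === 0 ? null : rid - 1;
}

const updateToken = 0x06000002; // the Update row from the listing above
console.log(getTokenType(updateToken).toString(16), getTokenRowId(updateToken), methodPointerIndex(updateToken));
```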
HOOK VIA IL2CPP-API
In this section, we will hook the il2cpp-api functions and find the relevant offsets without using Il2cppDumper. First we will find the methods and then the fields.
I could not go into detail in the sections above because I could not find the references to some functions. This is because the il2cpp-api functions are called while il2cpp is running. For example, il2cpp_class_get_methods is the API function we need in order to list methods.
For this reason, I attach the hook a few seconds after the application starts. To be sure, after finding the address of the export via frida, I attached to the process with IDA and jumped to that address in debug mode. The addresses matched…
LOAD:00004950 Elf32_Sym <aIl2cppClassGet - byte_B370, mono_class_get_methods, 4, \ ; "il2cpp_class_get_methods"
LOAD:00004950 0x12, 0, 0xD>
[Redmi 8A::???? ]-> Module.findExportByName("libil2cpp.so","il2cpp_class_get_methods")
"0x807efe90"
.text:807EFE90
.text:807EFE90
.text:807EFE90 ; Attributes: thunk
.text:807EFE90
.text:807EFE90 ; int __fastcall mono_class_get_methods(int, unsigned int *)
.text:807EFE90 EXPORT mono_class_get_methods
.text:807EFE90 mono_class_get_methods
.text:807EFE90 B mono_class_get_methods_0
.text:807EFE90 ; End of function mono_class_get_methods
.text:807EFE90
In order to list the methods, I first need to find the address of the image named Assembly-CSharp. For this, we get the domain, allocate memory for the assembly count, and pass each assembly pointer to the il2cpp_assembly_get_image native function, because an Il2CppAssembly* points to its Il2CppImage*.
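// p_size is assumed to be the pointer size of the target process (4 on armeabi-v7a)
const p_size = Process.pointerSize
var alloc = size => Memory.alloc((size == undefined ? 1 : size) * p_size)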
const domain = il2cpp_domain_get()
const size_t = alloc()
const assemblies = il2cpp_domain_get_assemblies(domain, size_t)
// assembly
const Il2CppImage* il2cpp_assembly_get_image(const Il2CppAssembly *assembly){ return Assembly::GetImage(assembly);}
// domain
Il2CppDomain* il2cpp_domain_get(){ return Domain::GetCurrent(); }
After getting the addresses of all images from the Il2CppAssembly entries, the names belonging to each image are read as well. When the assembly is examined with dnSpy, you will see that there are 11 classes. The class count comes from the typeCount that Il2CppImage* points to, via the Image::GetNumTypes method.
for (let i = 0; i < size_t.readInt(); i++) {
let imgAddr = il2cpp_assembly_get_image(assemblies.add(p_size * i)).readPointer()
let imgName = imgAddr.add(p_size).readPointer().readCString()
let classCount = il2cpp_image_get_class_count(imgAddr).toInt32()
if (imgName.indexOf("Assembly-CSharp") != -1) {
return imgAddr
}
}
[Redmi 8A::???? ]-> ptrImage()
[*] Image Address -> 0x93e90bc8
[*] Image Name -> Assembly-CSharp
[*] Image Class Count -> 11
"0x93e90bc8"
After listing the classes in the Assembly-CSharp image, their methods are retrieved via the il2cpp_class_get_methods API. pClass and the iter variable, which is allocated so that it can be written to, are passed on to the Class::GetMethods function.
The methods of the class are listed from the MethodInfo** array that Il2CppClass points to. That array is also initialized by SetupMethods.
When I subtracted the libil2cpp base address from the method address obtained in the loop, I reached the target offset. This is important to me because the offsets matched the ones dumped by il2cppdumper: 8085f90c – 80580000 -> 2df90c
for (let i = 0; i < imageCount; i++){
let iter = alloc()
let method = NULL
let pClass = il2cpp_image_get_class(retImage, i)
while (method = il2cpp_class_get_methods(il2cpp_image_get_class(retImage, i), iter)) {
if (method == 0) break
let methodName = getMethodName(method)
let methodAddr = method.readPointer()
if (methodAddr == 0) continue
methodCount++
}
}
====================Player(0x6bb648c0)====================
[*] 0x6bb6d750 ---> 0x8085f830 (0x2df830) ---> Start
[*] 0x6bb6d780 ---> 0x8085f90c (0x2df90c) ---> Update
[*] 0x6bb6d7b0 ---> 0x8085fe5c (0x2dfe5c) ---> GameOver
[*] 0x6bb6d7e0 ---> 0x8085fe7c (0x2dfe7c) ---> .ctor
====================ScoreManager(0x6bb655e0)==============
[*] 0x6bb84128 ---> 0x8085fe84 (0x2dfe84) ---> Start
[*] 0x6bb84158 ---> 0x8085fef8 (0x2dfef8) ---> Update
[*] 0x6bb84188 ---> 0x80860154 (0x2e0154) ---> .ctor
// Methods
// RVA: 0x2DF830 Offset: 0x2DF830 VA: 0x2DF830
private void Start() { }
// RVA: 0x2DF90C Offset: 0x2DF90C VA: 0x2DF90C
private void Update() { }
// RVA: 0x2DFE5C Offset: 0x2DFE5C VA: 0x2DFE5C
private void GameOver() { }
// RVA: 0x2DFE7C Offset: 0x2DFE7C VA: 0x2DFE7C
public void .ctor() { }
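The comparison above boils down to one subtraction: runtime address minus module base gives the offset that il2cppdumper prints as RVA. A small sketch with the addresses from the Player output, using BigInt so it also works for 64-bit addresses. toOffset is a made-up helper; inside a Frida script the same result would come from NativePointer’s sub method.

```javascript
// Converts an absolute runtime address into an offset from the module base.
// Both arguments are hex strings such as "0x8085f90c".
function toOffset(absolute, base) {
  const diff = BigInt(absolute) - BigInt(base);
  if (diff < 0n) {
    throw new RangeError("address is below the module base");
  }
  return "0x" + diff.toString(16);
}

const il2cppBase = "0x80580000"; // libil2cpp base in the session above
console.log(toOffset("0x8085f830", il2cppBase)); // Start
console.log(toOffset("0x8085f90c", il2cppBase)); // Update
```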
Fields are easier to find via the class. For this, the pointer of the target class is passed to the il2cpp_class_get_fields API. Class::GetFields receives that class pointer and returns the fields that Il2CppClass points to. As with methods, Class.cpp comes into play here and the fields are initialized in SetupFields.
When compared with il2cppdumper, you can see that the fields are dumped in the same order.
while (field = il2cpp_class_get_fields(ptr(pClass), iter)) {
if (field == 0x0) break
let fieldName = field.readPointer().readCString()
let fieldType = field.add(p_size).readPointer()
}
Class::SetupFields(klass);
if (klass->field_count == 0)
return NULL;
*iter = klass->fields;
return klass->fields;
[*] Field Name ---> speed(0x81402f98)
[*] Field Name ---> jump(0x81402f98)
[*] Field Name ---> ground(0x814072f0)
[*] Field Name ---> deathGround(0x814072f0)
[*] Field Name ---> rigidBody(0x81408c20)
[*] Field Name ---> playerCollider(0x81404bf8)
[*] Field Name ---> animator(0x81403f38)
[*] Field Name ---> deathSound(0x81404350)
[*] Field Name ---> jumpSound(0x81404350)
[*] Field Name ---> mileStone(0x81402f98)
[*] Field Name ---> mileStoneCount(0x81402f80)
[*] Field Name ---> speedMultipier(0x81402f98)
[*] Field Name ---> gameManager(0x81406190)
public float speed;
public float jump;
public LayerMask ground;
public LayerMask deathGround;
private Rigidbody2D rigidBody;
private Collider2D playerCollider;
private Animator animator;
public AudioSource deathSound;
public AudioSource jumpSound;
public float mileStone;
private float mileStoneCount;
public float speedMultipier;
public GameManager gameManager;
Now it’s time for Frida’s intercept api…
The intercepted address is the address of the target method. Instead of hard-coding that address, I added the method offset to the base address of the native library.
let point = subMethod
Interceptor.attach(ptr(ptrIl2cpp).add(point),{
onEnter: function(args){
this.instance = args[0]
},
onLeave: function(result){
var jump = this.instance.add(ofField)
jump.writeFloat(newValue)
}
})
Of course, it was also necessary to find the offset of the target field inside the instance. Since the fields are laid out in order (0xC, 0x10, 0x14…), I picked the offset according to the field’s position in the findField output.
Since the value is a float, the writeFloat method was used, and I increased the jump value as in the example.
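The “count in order” step can be sketched as follows. It assumes a 32-bit process where every field in the list (float, LayerMask, object reference) occupies 4 bytes and the first field sits at 0xC; with other field types or on arm64 the spacing would differ. fieldOffsetByOrder is a made-up helper, and the Buffer only simulates the instance memory that the hook writes to through this.instance.

```javascript
// ASSUMPTION: 4-byte slots starting at 0xC, valid for this 32-bit example only.
function fieldOffsetByOrder(names, target, firstOffset = 0xC, slotSize = 4) {
  const position = names.indexOf(target);
  if (position === -1) {
    throw new Error("unknown field: " + target);
  }
  return firstOffset + position * slotSize;
}

// Field order taken from the field listing above.
const playerFields = [
  "speed", "jump", "ground", "deathGround", "rigidBody", "playerCollider",
  "animator", "deathSound", "jumpSound", "mileStone", "mileStoneCount",
  "speedMultipier", "gameManager",
];

const jumpOffset = fieldOffsetByOrder(playerFields, "jump");

// Simulated instance memory: write a new float at the computed offset.
const fakeInstance = Buffer.alloc(0x40);
fakeInstance.writeFloatLE(25.5, jumpOffset);
console.log("0x" + jumpOffset.toString(16), fakeInstance.readFloatLE(jumpOffset));
```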
NOTE: The Game Over check is controlled via a true/false value. I set die to false during the hook. Please don’t be surprised by the video 😛
CONCLUSION
Damn great articles are being shared!
Following the industry on Twitter helps me know my place. There is so much solid research, so many blog posts and techniques that excite me and help me find my way…
To be honest, I envy these posts. 😀 I am aware that I need more technical knowledge for this type of article, and believe me, I am trying to learn. In short, the important thing is action.
Now I’m more hyped than ever and it helps me push my limits!
If you see any missing or incorrect information in the article, please contact me.