Reflective PE Loading

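CobaltStrike's Beacon is, in fact, a DLL. The shellcode form of Beacon is simply a patched DLL file: thanks to some clever patching, Beacon becomes position independent, just like shellcode. Let's generate both a DLL payload and a RAW payload and compare them: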

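The DLL-format Beacon follows the standard PE file layout.

The shellcode-format Beacon also conforms to the PE format, which shows that it is really just a patched DLL.

We can even parse out its exported function, ReflectiveLoader.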

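So which parts were patched? Compare the DOS headers of the two files closely. The shellcode-format Beacon (right) largely conforms to the PE format, but its DOS header has been patched.

In a PE file the DOS header is not a code region and is never meant to be executed as machine code. If a normal DLL's DOS header is forcibly disassembled, the resulting instructions are meaningless. The DOS header in the right-hand image, by contrast, has been patched into carefully crafted code. Let's walk through it: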

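4D 5A                   pop r10              # "MZ" DOS magic bytes; together with the next instruction, leaves the stack unchanged
41 52                   push r10             # restore the stack
55                      push rbp             # set up a stack frame
48 89 E5                mov rbp, rsp
48 81 EC 20 00 00 00    sub rsp, 0x20        # reserve 32 bytes of shadow space for the calls below
48 8D 1D EA FF FF FF    lea rbx, [rip-0x16]  # step back 0x16 bytes to get the shellcode's start address
48 89 DF                mov rdi, rbx         # keep the start address in rdi
48 81 C3 F4 5F 01 00    add rbx, 0x15ff4     # hardcoded offset of the ReflectiveLoader export
FF D3                   call rbx             # call ReflectiveLoader; rax receives the address of DllMain
41 B8 F0 B5 A2 56       mov r8d, 0x56a2b5f0  # prepare the DllMain call: 3rd argument (lpReserved)
68 04 00 00 00          push 4
5A                      pop rdx              # 2nd argument (fdwReason) = 4
48 89 F9                mov rcx, rdi         # 1st argument (hinstDLL) = shellcode start address
FF D0                   call rax             # call DllMain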
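The C sketch below re-derives both values in the stub from the raw bytes of the listing, as a quick check on the arithmetic. Only the first 32 bytes are copied. `kStub`, `read_le32`, `rip_target` and `stub_call_target` are hypothetical helper names. The key point is that a RIP-relative displacement is measured from the end of the `lea` instruction. That instruction ends at offset 0x16, so `rip-0x16` resolves to offset 0, the start of the shellcode.

```c
#include <stdint.h>

/* First 32 bytes of the patched DOS header, copied from the listing above. */
static const uint8_t kStub[] = {
    0x4D, 0x5A,                               /* pop r10                          */
    0x41, 0x52,                               /* push r10                         */
    0x55,                                     /* push rbp                         */
    0x48, 0x89, 0xE5,                         /* mov rbp, rsp                     */
    0x48, 0x81, 0xEC, 0x20, 0x00, 0x00, 0x00, /* sub rsp, 0x20                    */
    0x48, 0x8D, 0x1D, 0xEA, 0xFF, 0xFF, 0xFF, /* lea rbx, [rip-0x16] at 0x0F      */
    0x48, 0x89, 0xDF,                         /* mov rdi, rbx                     */
    0x48, 0x81, 0xC3, 0xF4, 0x5F, 0x01, 0x00, /* add rbx, 0x15ff4 at 0x19         */
};

/* Reads a little-endian 32-bit value from an unaligned byte pointer. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* A RIP-relative operand is relative to the END of the instruction:
 * target = insn_offset + insn_len + (signed) disp32. */
static int64_t rip_target(int64_t insn_offset, int64_t insn_len, const uint8_t *disp) {
    return insn_offset + insn_len + (int32_t)read_le32(disp);
}

/* Offset (from the stub's start) that `call rbx` jumps to. */
static int64_t stub_call_target(const uint8_t *stub) {
    /* lea: 3 bytes of REX + opcode + ModRM, then the disp32 */
    int64_t rbx = rip_target(0x0F, 7, stub + 0x0F + 3);
    /* add rbx, imm32: the immediate follows 3 bytes of REX + opcode + ModRM */
    rbx += (int32_t)read_le32(stub + 0x19 + 3);
    return rbx;
}
```

`stub_call_target(kStub)` yields 0x15ff4, so `call rbx` lands at the shellcode's start plus 0x15ff4.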

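Now look up the hardcoded offset 0x15ff4. The shellcode is the raw, unmapped file, so this value is a file offset. Its corresponding RVA is 0x16bf4, which is exactly the address of the exported ReflectiveLoader function.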
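The conversion from a file offset to an RVA uses the section that contains the offset: RVA = offset - PointerToRawData + VirtualAddress. Here is a minimal sketch. `SectionInfo` is a hypothetical stand-in for the relevant `IMAGE_SECTION_HEADER` fields. The test values assume a `.text` section with VirtualAddress 0x1000 and PointerToRawData 0x400. These are typical values, not read from the Beacon. They were chosen because they reproduce the 0xC00 gap between 0x15ff4 and 0x16bf4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three IMAGE_SECTION_HEADER fields used here. */
typedef struct {
    uint32_t VirtualAddress;   /* RVA where the section is mapped */
    uint32_t SizeOfRawData;    /* bytes the section occupies in the file */
    uint32_t PointerToRawData; /* file offset of the section's data */
} SectionInfo;

/* Maps a raw file offset to an RVA using the section that contains it.
 * Returns 0 if the offset is not inside any section's raw data. */
static uint32_t file_offset_to_rva(const SectionInfo *secs, size_t count, uint32_t off) {
    for (size_t i = 0; i < count; i++) {
        uint32_t start = secs[i].PointerToRawData;
        if (off >= start && off - start < secs[i].SizeOfRawData)
            return off - start + secs[i].VirtualAddress;
    }
    return 0;
}
```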

In short, the patched DOS header becomes a meaningful shellcode stub. Once the shellcode runs, execution jumps to the ReflectiveLoader export and finally to DllMain. This is how a DLL is turned into position-independent shellcode.

Reflective Loading

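So what does ReflectiveLoader actually do, and how can this export run before the DLL has been loaded? To answer these questions, first recall what the Windows DLL loader does: it loads a DLL from disk into a process's virtual address space. For adversary simulation, the Windows DLL loader has the following drawbacks: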

The DLL must exist on disk
The DLL cannot be obfuscated
Loading the DLL triggers kernel callbacks

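So loading the Beacon DLL directly with the Windows DLL loader is far from ideal. But what if we could load the Beacon DLL from memory instead? This concept is called reflective loading. It was proposed and implemented by Stephen Fewer (https://github.com/stephenfewer/ReflectiveDLLInjection). Reflective loading offers the following advantages: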

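The DLL never has to exist on disk, which avoids file-based signatures
It avoids the kernel callbacks triggered by image loading
Our DLL does not appear in the PEB's list of loaded modules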

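Reflective loading loads the DLL directly from memory. Like the normal Windows DLL loader, though, it still has to convert the raw file into its in-memory layout in the process's virtual address space. As we saw earlier, a PE file uses different alignment on disk and in memory. As a result, its size and the mapping between raw file offsets and RVAs change: the image is generally larger in memory and more compact on disk.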
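The growth comes from rounding. Each section's raw data is padded to FileAlignment (commonly 0x200). Its in-memory footprint is padded to SectionAlignment (commonly 0x1000, the page size). A minimal sketch with a hypothetical `align_up` helper:

```c
#include <stdint.h>

/* Rounds value up to the next multiple of align (align must be a power of two),
 * the way section sizes are padded to FileAlignment on disk and to
 * SectionAlignment in memory. */
static uint32_t align_up(uint32_t value, uint32_t align) {
    return (value + align - 1) & ~(align - 1);
}
```

With these common alignments, a section holding 0x1234 bytes takes 0x1400 bytes on disk (`align_up(0x1234, 0x200)`). In memory it takes 0x2000 bytes (`align_up(0x1234, 0x1000)`).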

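We also know that a PE file has a preferred load address, although the base it is actually loaded at may differ. Some addresses in a PE file, such as those of global variables, are hardcoded. These locations are tracked by the relocation table, and they must be adjusted when the actual load address changes. The IAT entries must also be filled in, among other things. Normally the Windows DLL loader does all of this for us, but with reflective loading these tasks fall to us. Reflective loading therefore involves the following steps: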

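Execute the exported ReflectiveLoader function directly, for example via CreateRemoteThread. Alternatively, as CobaltStrike does, patch the DLL's DOS header into a shellcode stub that jumps to ReflectiveLoader.
ReflectiveLoader computes the DLL's base address by walking backward through memory until it finds MZ, the magic bytes, and checking that e_lfanew points at a valid PE signature.
Using PEB walking, it locates the kernel32 module and the addresses of the APIs it needs, such as LoadLibrary, GetProcAddress and VirtualAlloc. ReflectiveLoader runs before the DLL has been loaded, so neither relocations nor the IAT have been fixed up yet. It must therefore be position independent: it cannot use global variables or call APIs directly.
Allocate memory with VirtualAlloc to hold the mapped DLL.
Copy the DLL's headers and each section into the allocated memory, and set the appropriate memory protection for each region.
Fix the IAT. Walk every imported DLL and, for each DLL, every imported function. Patch in each function's address according to how it is imported (by ordinal or by name).
Fix relocations. Compute the difference between the actual base and the preferred base, then add that difference to every hardcoded address.
Call the DllMain entry point. The DLL is now successfully loaded in memory.
If ReflectiveLoader was reached through the shellcode stub, it returns to the stub when it finishes. If it was invoked via CreateRemoteThread, the thread exits.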
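The base-address search in the second step can be sketched over a plain byte buffer. Starting from some position inside the image, walk backward until an "MZ" is found whose e_lfanew (at offset 0x3C) points at a "PE\0\0" signature. In a real loader the starting point is an address inside the loader's own code. Here it is just an index into a test buffer. `find_image_base` is a hypothetical name, and the 1024-byte upper bound on e_lfanew is an assumption of this sketch used to reject stray "MZ" bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a little-endian 32-bit value from an unaligned byte pointer. */
static uint32_t rd_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walks backward from `start` looking for a DOS header whose e_lfanew
 * points to a "PE\0\0" signature. Returns the header's offset, or -1. */
static long find_image_base(const uint8_t *buf, size_t len, size_t start) {
    if (start >= len) return -1;
    for (size_t i = start + 1; i-- > 0; ) {
        if (i + 0x40 > len) continue;                    /* no room for a DOS header */
        if (buf[i] != 'M' || buf[i + 1] != 'Z') continue;
        uint32_t lfanew = rd_le32(buf + i + 0x3C);
        if (lfanew < 0x40 || lfanew >= 1024) continue;   /* sanity bound (assumption) */
        if (i + lfanew + 4 > len) continue;
        if (memcmp(buf + i + lfanew, "PE\0\0", 4) == 0)
            return (long)i;
    }
    return -1;
}
```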

For a complete implementation, see the original project (https://github.com/stephenfewer/ReflectiveDLLInjection/blob/master/dll/src/ReflectiveLoader.c).

We covered the import and export process in the PE section. Let's now study relocation fixups through an example:

calc's preferred base address is 0x140000000.

calc has 2 relocation blocks, containing 12 and 2 entries respectively.

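The Page RVA and Block Size fields take 4 bytes each, 8 bytes in total. Starting at the 9th byte, each entry takes 2 bytes. The size of each relocation block is therefore 8 + 2 × (number of entries). For the first block that is 32 = 8 + 12 × 2.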
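The same size arithmetic as a pair of tiny C helpers, written both ways so you can go from an entry count to SizeOfBlock and back. The function names are hypothetical.

```c
#include <stdint.h>

/* Size in bytes of one base-relocation block: an 8-byte header
 * (4-byte Page RVA + 4-byte Block Size) followed by 2-byte entries. */
static uint32_t reloc_block_size(uint32_t entry_count) {
    return 8 + 2 * entry_count;
}

/* Inverse: number of WORD entries that follow the header of a block. */
static uint32_t reloc_entry_count(uint32_t size_of_block) {
    return (size_of_block - 8) / 2;
}
```

For calc's two blocks this gives 32 bytes (12 entries) and 12 bytes (2 entries).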

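From the WORD value of each entry we can extract the offset within the page: the low 12 bits hold the offset, and the high 4 bits hold the relocation type. Adding the page RVA gives the RVA of the hardcoded address. Let's pick one hardcoded address. It is located at RVA 0x2000, and its value is 0x140003060, an offset of 0x3060 from the preferred base.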
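A minimal sketch of decoding an entry. The type constants are the winnt.h values (0 = IMAGE_REL_BASED_ABSOLUTE, 3 = IMAGE_REL_BASED_HIGHLOW, 10 = IMAGE_REL_BASED_DIR64). The entry value 0xA000 used in the test is a hypothetical example of a DIR64 entry that points at the start of page 0x2000. The source only gives the resulting RVA.

```c
#include <stdint.h>

/* High 4 bits of a relocation entry: the IMAGE_REL_BASED_* type. */
static unsigned reloc_type(uint16_t entry) { return entry >> 12; }

/* Low 12 bits: the offset of the fixup within its 4 KB page. */
static unsigned reloc_offset(uint16_t entry) { return entry & 0x0FFF; }

/* RVA of the hardcoded address = Page RVA of the block + in-page offset. */
static uint32_t reloc_target_rva(uint32_t page_rva, uint16_t entry) {
    return page_rva + reloc_offset(entry);
}
```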

In WinDbg, once calc is loaded in memory, we can see that this address has been fixed up:

However, its offset from the image base is still 0x3060.
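The fixup itself is just "add the delta". The sketch below applies a DIR64 relocation. It shows that the value's offset from the new base stays at 0x3060, whatever the actual base is. The actual base 0x7FF6A1B20000 is a hypothetical ASLR address, not the one from the WinDbg session. Unsigned wrap-around keeps the arithmetic correct even when the image lands below its preferred base.

```c
#include <stdint.h>

/* DIR64 fixup: add (actual base - preferred base) to the 64-bit value.
 * Unsigned wrap-around also handles an actual base below the preferred one. */
static uint64_t apply_dir64(uint64_t value, uint64_t preferred_base, uint64_t actual_base) {
    uint64_t delta = actual_base - preferred_base;
    return value + delta;
}
```

For example, `apply_dir64(0x140003060, 0x140000000, 0x7FF6A1B20000)` gives 0x7FF6A1B23060, which is still 0x3060 past the new base.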

The original project's code is available, but let's also review some of the trickier steps using code from Maldev:

Copying the sections:

PBYTE			pPeBaseAddress			= NULL;

if ((pPeBaseAddress = VirtualAlloc(NULL, pPeHdrs->pImgNtHdrs->OptionalHeader.SizeOfImage, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE)) == NULL) {
	PRINT_WINAPI_ERR("VirtualAlloc");
	return FALSE;
}

for (int i = 0; i < pPeHdrs->pImgNtHdrs->FileHeader.NumberOfSections; i++) {
	memcpy(
		(PVOID)(pPeBaseAddress + pPeHdrs->pImgSecHdr[i].VirtualAddress),			// Destination: pPeBaseAddress + RVA
		(PVOID)(pPeHdrs->pFileBuffer + pPeHdrs->pImgSecHdr[i].PointerToRawData),		// Source: pPeHdrs->pFileBuffer + PointerToRawData (file offset)
		pPeHdrs->pImgSecHdr[i].SizeOfRawData							// Size
	);
}
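The buffer above is allocated as PAGE_READWRITE. After copying and fixups, a loader would normally give each section a protection derived from its Characteristics rather than leaving everything writable or RWX. The sketch below shows only that mapping. The VirtualProtect call that would apply it is omitted. `section_protection` is a hypothetical helper. The constants are the winnt.h values, redefined so the sketch also builds off Windows.

```c
#ifdef _WIN32
#include <windows.h>
#else
/* Values as in winnt.h, so the sketch builds on other platforms too. */
#define IMAGE_SCN_MEM_EXECUTE  0x20000000u
#define IMAGE_SCN_MEM_READ     0x40000000u
#define IMAGE_SCN_MEM_WRITE    0x80000000u
#define PAGE_NOACCESS          0x01u
#define PAGE_READONLY          0x02u
#define PAGE_READWRITE         0x04u
#define PAGE_EXECUTE           0x10u
#define PAGE_EXECUTE_READ      0x20u
#define PAGE_EXECUTE_READWRITE 0x40u
#endif
#include <stdint.h>

/* Picks a PAGE_* protection from a section's IMAGE_SCN_MEM_* flags. */
static uint32_t section_protection(uint32_t characteristics) {
    int x = (characteristics & IMAGE_SCN_MEM_EXECUTE) != 0;
    int r = (characteristics & IMAGE_SCN_MEM_READ) != 0;
    int w = (characteristics & IMAGE_SCN_MEM_WRITE) != 0;
    if (x)
        return w ? PAGE_EXECUTE_READWRITE : (r ? PAGE_EXECUTE_READ : PAGE_EXECUTE);
    if (w)
        return PAGE_READWRITE;  /* Windows has no write-only protection */
    return r ? PAGE_READONLY : PAGE_NOACCESS;
}
```

With typical flags, `.text` (0x60000020) maps to PAGE_EXECUTE_READ, `.rdata` (0x40000040) to PAGE_READONLY and `.data` (0xC0000040) to PAGE_READWRITE.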

Fixing relocations:

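#include <windows.h>
#include <stdio.h>

// Helper type assumed by this snippet (not part of winnt.h): one WORD-sized relocation entry.
typedef struct _BASE_RELOCATION_ENTRY {
    WORD Offset : 12;   // Offset of the fixup within the 4 KB page
    WORD Type   : 4;    // IMAGE_REL_BASED_* type
} BASE_RELOCATION_ENTRY, *PBASE_RELOCATION_ENTRY;

BOOL FixReloc(IN PIMAGE_DATA_DIRECTORY pEntryBaseRelocDataDir, IN ULONG_PTR pPeBaseAddress, IN ULONG_PTR pPreferableAddress) {

    // Pointer to the beginning of the base relocation block.
    PIMAGE_BASE_RELOCATION pImgBaseRelocation = (PIMAGE_BASE_RELOCATION)(pPeBaseAddress + pEntryBaseRelocDataDir->VirtualAddress);

    // End of the relocation directory: stop here even if there is no zero terminator block
    // (and never enter the loop when the image has no relocations, i.e. Size == 0).
    ULONG_PTR uRelocEnd = pPeBaseAddress + pEntryBaseRelocDataDir->VirtualAddress + pEntryBaseRelocDataDir->Size;

    // The difference between the current PE image base address and its preferable base address.
    ULONG_PTR uDeltaOffset = pPeBaseAddress - pPreferableAddress;

    // Pointer to individual base relocation entries.
    PBASE_RELOCATION_ENTRY pBaseRelocEntry = NULL;

    // Iterate through all the base relocation blocks.
    while ((ULONG_PTR)pImgBaseRelocation < uRelocEnd && pImgBaseRelocation->SizeOfBlock) {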

        // Pointer to the first relocation entry in the current block.
        pBaseRelocEntry = (PBASE_RELOCATION_ENTRY)(pImgBaseRelocation + 1);

        // Iterate through all the relocation entries in the current block.
        while ((PBYTE)pBaseRelocEntry != (PBYTE)pImgBaseRelocation + pImgBaseRelocation->SizeOfBlock) {
            // Process the relocation entry based on its type.
            switch (pBaseRelocEntry->Type) {
	            case IMAGE_REL_BASED_DIR64:
	                // Adjust a 64-bit field by the delta offset.
	                *((ULONG_PTR*)(pPeBaseAddress + pImgBaseRelocation->VirtualAddress + pBaseRelocEntry->Offset)) += uDeltaOffset;
	                break;
	            case IMAGE_REL_BASED_HIGHLOW:
	                // Adjust a 32-bit field by the delta offset.
	                *((DWORD*)(pPeBaseAddress + pImgBaseRelocation->VirtualAddress + pBaseRelocEntry->Offset)) += (DWORD)uDeltaOffset;
	                break;
	            case IMAGE_REL_BASED_HIGH:
	                // Adjust the high 16 bits of a 32-bit field.
	                *((WORD*)(pPeBaseAddress + pImgBaseRelocation->VirtualAddress + pBaseRelocEntry->Offset)) += HIWORD(uDeltaOffset);
	                break;
	            case IMAGE_REL_BASED_LOW:
	                // Adjust the low 16 bits of a 32-bit field.
	                *((WORD*)(pPeBaseAddress + pImgBaseRelocation->VirtualAddress + pBaseRelocEntry->Offset)) += LOWORD(uDeltaOffset);
	                break;
	            case IMAGE_REL_BASED_ABSOLUTE:
	                // No relocation is required.
	                break;
	            default:
	                // Handle unknown relocation types.
	                printf("[!] Unknown relocation type: %d | Offset: 0x%08X \n", pBaseRelocEntry->Type, pBaseRelocEntry->Offset);
	                return FALSE;
            }
            // Move to the next relocation entry.
            pBaseRelocEntry++;
        }

        // Move to the next relocation block.
        pImgBaseRelocation = (PIMAGE_BASE_RELOCATION)pBaseRelocEntry;
    }

    return TRUE;
}

Fixing the IAT:

BOOL FixImportAddressTable(IN PIMAGE_DATA_DIRECTORY pEntryImportDataDir, IN PBYTE pPeBaseAddress) {

	// Pointer to an import descriptor for a DLL
	PIMAGE_IMPORT_DESCRIPTOR	pImgDescriptor		= NULL;
	// Iterate over the import descriptors
	for (SIZE_T i = 0; i < pEntryImportDataDir->Size; i += sizeof(IMAGE_IMPORT_DESCRIPTOR)) {
		// Get the current import descriptor
		pImgDescriptor = (PIMAGE_IMPORT_DESCRIPTOR)(pPeBaseAddress + pEntryImportDataDir->VirtualAddress + i);
		// If both thunks are NULL, we've reached the end of the import descriptors list
		if (pImgDescriptor->OriginalFirstThunk == 0 && pImgDescriptor->FirstThunk == 0)
			break;

		// Retrieve information from the current import descriptor
		LPSTR		cDllName                        = (LPSTR)(pPeBaseAddress + pImgDescriptor->Name);
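		ULONG_PTR	uOriginalFirstThunkRVA          = pImgDescriptor->OriginalFirstThunk ? pImgDescriptor->OriginalFirstThunk : pImgDescriptor->FirstThunk;	// Fall back to the IAT if the linker emitted no INT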
		ULONG_PTR	uFirstThunkRVA                  = pImgDescriptor->FirstThunk;
		SIZE_T		ImgThunkSize                    = 0x00;	// Used to move to the next function (iterating through the IAT and INT)
		HMODULE		hModule                         = NULL;

		// Try to load the DLL referenced by the current import descriptor
		if (!(hModule = LoadLibraryA(cDllName))) {
			PRINT_WINAPI_ERR("LoadLibraryA");
			return FALSE;
		}

		// Iterate over the imported functions for the current DLL
		while (TRUE) {
			
			// Get pointers to the first thunk and original first thunk data
			PIMAGE_THUNK_DATA               pOriginalFirstThunk     = (PIMAGE_THUNK_DATA)(pPeBaseAddress + uOriginalFirstThunkRVA + ImgThunkSize);
			PIMAGE_THUNK_DATA               pFirstThunk             = (PIMAGE_THUNK_DATA)(pPeBaseAddress + uFirstThunkRVA + ImgThunkSize);
			PIMAGE_IMPORT_BY_NAME           pImgImportByName        = NULL;
			ULONG_PTR                       pFuncAddress            = 0;

			// At this point both 'pOriginalFirstThunk' & 'pFirstThunk' will have the same values
			// However, to populate the IAT (pFirstThunk), one should use the INT (pOriginalFirstThunk) to retrieve the 
			// functions addresses and patch the IAT (pFirstThunk->u1.Function) with the retrieved address.
			if (pOriginalFirstThunk->u1.Function == 0 && pFirstThunk->u1.Function == 0) {
				break;
			}

			// If the ordinal flag is set, import the function by its ordinal number
			if (IMAGE_SNAP_BY_ORDINAL(pOriginalFirstThunk->u1.Ordinal)) {
				if ( !(pFuncAddress = (ULONG_PTR)GetProcAddress(hModule, (LPCSTR)IMAGE_ORDINAL(pOriginalFirstThunk->u1.Ordinal))) ) {
					printf("[!] Could Not Import !%s#%d \n", cDllName, (int)IMAGE_ORDINAL(pOriginalFirstThunk->u1.Ordinal));
					return FALSE;
				}
			}
			// Import function by name
			else {
				pImgImportByName = (PIMAGE_IMPORT_BY_NAME)(pPeBaseAddress + pOriginalFirstThunk->u1.AddressOfData);
				if ( !(pFuncAddress = (ULONG_PTR)GetProcAddress(hModule, pImgImportByName->Name)) ) {
					printf("[!] Could Not Import !%s.%s \n", cDllName, pImgImportByName->Name);
					return FALSE;
				}
			}

			// Install the function address in the IAT
			pFirstThunk->u1.Function = (ULONGLONG)pFuncAddress;

			// Move to the next function in the IAT/INT array
			ImgThunkSize += sizeof(IMAGE_THUNK_DATA);
		}
	}

	return TRUE;
}
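The ordinal-versus-name decision in the loop depends only on the thunk's bit layout. On 64-bit images, a set top bit means "import by ordinal", with the ordinal in the low 16 bits. Otherwise the low 31 bits are the RVA of an IMAGE_IMPORT_BY_NAME record, whose 2-byte Hint is followed by the name. That ordinal is also why the code above casts it to LPCSTR: GetProcAddress accepts an ordinal in the low-order word with a zero high-order word. Below is a portable sketch of the decoding. The helper names are hypothetical, and the test uses a made-up image buffer.

```c
#include <stdint.h>

/* Same value as IMAGE_ORDINAL_FLAG64 in winnt.h. */
#define ORDINAL_FLAG64 0x8000000000000000ull

static int thunk_is_ordinal(uint64_t thunk)   { return (thunk & ORDINAL_FLAG64) != 0; }
static uint16_t thunk_ordinal(uint64_t thunk) { return (uint16_t)(thunk & 0xFFFF); }

/* For name imports: RVA of the IMAGE_IMPORT_BY_NAME record (low 31 bits). */
static uint32_t thunk_name_rva(uint64_t thunk) { return (uint32_t)(thunk & 0x7FFFFFFF); }

/* The name starts right after the 2-byte Hint field. */
static const char *import_name(const uint8_t *image, uint64_t thunk) {
    return (const char *)(image + thunk_name_rva(thunk) + 2);
}
```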

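For more complex PE files we may also need to handle the exception table, TLS callbacks, function arguments and so on. Please consult the relevant references to explore these further.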

Inflated Loading

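Reflective loading makes it possible to load a DLL from memory and effectively avoids some IOCs. Even so, detection techniques have evolved, and reflective loading itself leaves behind some notable IOCs. Let's analyze them: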

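The sequence of allocating memory, modifying values, copying sections and changing permissions is very noisy
Allocating memory with RWX permissions is a red flag
From the call stack's point of view, the loaded DLL does not come from disk, so it has no corresponding symbols. As shown below, several frames have no associated module or symbol. The memory region is also private, which strongly suggests shellcode. Such regions are called floating code, or unbacked memory.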

0:004> k
 # Child-SP          RetAddr               Call Site
00 0000009e`4b3afe58 00000245`d207208d     KERNEL32!SleepEx
01 0000009e`4b3afe60 00000245`d2073260     0x00000245`d207208d
02 0000009e`4b3afe68 00000245`d1cf5580     0x00000245`d2073260
03 0000009e`4b3afe70 00000245`cfdb5d10     0x00000245`d1cf5580
04 0000009e`4b3afe78 0000009e`4b3afe08     0x00000245`cfdb5d10
05 0000009e`4b3afe80 00000245`d2071000     0x0000009e`4b3afe08
06 0000009e`4b3afe88 00000245`d20722c0     0x00000245`d2071000
07 0000009e`4b3afe90 00000245`d2071000     0x00000245`d20722c0
08 0000009e`4b3afe98 00007ffb`c87f0000     0x00000245`d2071000
09 0000009e`4b3afea0 00000000`00000000     ucrtbase!parse_bcp47 <PERF> (ucrtbase+0x0)

For point 3, see this article for further reading (https://www.elastic.co/security-labs/hunting-memory). In the example above, I reflectively loaded a PE that calls SleepEx, to make the call stack easy to observe.
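From the defender's side, the core of the "unbacked" check is simple: a frame is suspicious if its address falls inside no loaded module. The sketch below hardcodes the module list. A real tool would enumerate modules and typically also query the region's memory type (MEM_PRIVATE vs MEM_IMAGE). The frame addresses come from the WinDbg output above. The ucrtbase size of 0x100000 is a hypothetical value, and `ModuleRange`, `find_module` and `count_unbacked` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* A loaded module's address range [base, base + size). */
typedef struct { const char *name; uint64_t base; uint64_t size; } ModuleRange;

/* Returns the module containing addr, or NULL if the address is unbacked. */
static const ModuleRange *find_module(const ModuleRange *mods, size_t n, uint64_t addr) {
    for (size_t i = 0; i < n; i++)
        if (addr >= mods[i].base && addr - mods[i].base < mods[i].size)
            return &mods[i];
    return NULL;
}

/* Counts stack addresses that do not belong to any known module. */
static size_t count_unbacked(const ModuleRange *mods, size_t nmods,
                             const uint64_t *addrs, size_t naddrs) {
    size_t unbacked = 0;
    for (size_t i = 0; i < naddrs; i++)
        if (find_module(mods, nmods, addrs[i]) == NULL)
            unbacked++;
    return unbacked;
}
```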