Bootkits and kernel patching

Jun 26, 2024 Samuel Tulach

If you have even a little experience with Windows kernel development, you know that patching kernel code at runtime is a bad idea. Windows has shipped Kernel Patch Protection (KPP, better known as PatchGuard) since the x64 editions of Windows XP and Windows Server 2003. Even if you somehow disable it, anticheat software will have an easy time finding your patches by comparing the code in memory to the code on disk. Right?

Kernel Patch Protection, PatchGuard?

If we were to just write something into the code section of the Windows kernel, the system would sooner or later crash with bug check 0x109 (CRITICAL_STRUCTURE_CORRUPTION). The kernel periodically checks protected memory regions (such as read-only code sections) and certain critical structures for changes. If it finds something it’s not expecting, it brings the system down.

I don’t want to go into detail on how it works. All you need to know for this article to make (some) sense is that KPP is initialized when the kernel starts, and that it uses the kernel image as it exists in memory at that moment as the reference for its checks. In other words, if you modify the kernel code during the boot sequence, after the image has been loaded into memory but before KPP is initialized, KPP will simply treat the modified code as the correct one.

But… it’s different from the disk!

Let’s say that we have a bootkit. It patches the kernel before KPP is initialized, so the system runs just fine. How about anticheats? Surely they check the kernel code against the files on disk. After all, when a PE image is loaded into memory, its code sections (usually .text; the kernel does not allow RWX sections) should always correspond 1:1 to the ones on disk. If they don’t, modifications were made.

Except there is a plot twist. They don’t.

Since Windows 10 version 1809, Microsoft has shipped mitigations against the Spectre vulnerability. How do they work? They patch kernel-mode drivers and parts of the kernel itself at runtime.

As you can imagine, this is extremely abusable. The code sections in memory are no longer the same as those in the file on disk. Writing a huge shellcode or mapping a whole driver over those sections would still be pretty easy to spot, though: Windows only replaces imports, so a legitimate patch touches just a few bytes at a time and leaves the rest unchanged.
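To make the comparison concrete, here is a minimal sketch of the naive disk-versus-memory check. It assumes the caller has already extracted the same code section from the file and from the loaded image into two buffers; `DiffRange` and `diff_sections` are hypothetical names, and this is not how any particular anticheat does it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open byte range [begin, end) in which the two buffers differ.
struct DiffRange {
    std::size_t begin;
    std::size_t end;
};

// Compares two copies of a code section and returns every run of differing bytes.
// Only the common prefix is compared if the sizes do not match.
std::vector<DiffRange> diff_sections(const std::vector<std::uint8_t>& on_disk,
                                     const std::vector<std::uint8_t>& in_memory)
{
    std::vector<DiffRange> ranges;
    const std::size_t size =
        on_disk.size() < in_memory.size() ? on_disk.size() : in_memory.size();

    std::size_t i = 0;
    while (i < size) {
        if (on_disk[i] == in_memory[i]) {
            ++i;
            continue;
        }
        // Start of a differing run; extend it until the bytes agree again.
        const std::size_t begin = i;
        while (i < size && on_disk[i] != in_memory[i])
            ++i;
        ranges.push_back({begin, i});
    }
    return ranges;
}
```

The length of each run (`end - begin`) is the interesting part: a run of a few bytes blends in with the patches Windows makes itself, while a run of hundreds of bytes does not.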

What can we do?

If we assume that anticheat software does not perform in-depth checks on those code sections, we can freely replace a few bytes without triggering a detection. Here is an example:

EFI_STATUS HookedExitBootServices(EFI_HANDLE imageHandle, UINTN mapKey)
{
	PROTECT_ULTRA();
	gBS->ExitBootServices = OriginalExitBootServices;

	UINT64 returnAddress = (UINT64)_ReturnAddress();
	while (CompareMem((VOID*)returnAddress, "This program cannot be run in DOS mode", 38) != 0)
	{
		returnAddress--;
	}

	UINT64 moduleBase = returnAddress - 0x4E;

	UINT64 loaderBlockScan = ScanForLoaderBlock((VOID*)moduleBase);
	if (!loaderBlockScan)
	{
		Print(EW(L"Failed to find OslExecuteTransition!\n"));
		INFINITE_LOOP();
	}

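	// The matched instruction is 7 bytes long with a RIP-relative disp32 at offset 3;
	// resolve its operand and read the loader block pointer stored there
	UINT64 resolvedAddress = *(UINT64*)((loaderBlockScan + 7) + *(INT32*)(loaderBlockScan + 3));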

	BlpArchSwitchContext = (BlpArchSwitchContext_t)Utils::FindPatternImage((VOID*)moduleBase, EC("40 53 48 83 EC 20 48 8B 15"));
	if (!BlpArchSwitchContext)
	{
		Print(EW(L"Failed to find BlpArchSwitchContext!\n"));
		INFINITE_LOOP();
	}

	BlpArchSwitchContext(ApplicationContext);

	PLOADER_PARAMETER_BLOCK loaderBlock = (PLOADER_PARAMETER_BLOCK)resolvedAddress;

	KLDR_DATA_TABLE_ENTRY kernelModule = Utils::GetModule(&loaderBlock->LoadOrderListHead, EW(L"ntoskrnl.exe"));
	if (!kernelModule.DllBase)
	{
		ContextPrint(EW(L"Failed to find ntoskrnl.exe in OslLoaderBlock!\n"));
		INFINITE_LOOP();
	}

	UINT64 functionScan = ScanForPatch1(kernelModule.DllBase);
	if (!functionScan)
	{
		ContextPrint(EW(L"Failed to find target function (1)!\n"));
		INFINITE_LOOP();
	}

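	// Jump-out stub written at the address returned by ScanForPatch1:
	//   mov r11, [rsp+64]   ; 64 = 0x40, the eighth argument on function entry
	//   jmp r11
	UINT8 code[] = { 0x4C, 0x8B, 0x5C, 0x24, 0x40, 0x41, 0xFF, 0xE3 };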
	Utils::CopyMemory((VOID*)functionScan, code, sizeof(code));

	BlpArchSwitchContext(FirmwareContext);

	// mapKey becomes stale if anything we call allocates memory internally.
	// Resolving a fresh key manually does not seem to work on certain systems
	// (gBS->GetMemoryMap fails), so we reuse the key that was passed to this
	// function and make sure there are no internal allocations
	// (no logo or extensive framebuffer printing)

	PROTECT_END();
	return OriginalExitBootServices(imageHandle, mapKey);
}
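The hook relies on two small techniques: scanning an image for a byte signature and resolving a RIP-relative operand. The sketch below shows both in isolation. It is not the `Utils::FindPatternImage` used above; `parse_pattern`, `find_pattern` and `resolve_rip_relative` are hypothetical helpers, the `??` wildcard syntax is an assumption, and offsets are relative to the start of the buffer rather than real addresses. It also assumes a little-endian host, which holds on AMD64.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

constexpr int kWildcard = -1;

// Parses "48 8B 05 ?? ?? ?? ??" into byte values; "??" becomes kWildcard.
std::vector<int> parse_pattern(const std::string& text)
{
    std::vector<int> pattern;
    std::istringstream stream(text);
    std::string token;
    while (stream >> token) {
        if (token == "??")
            pattern.push_back(kWildcard);
        else
            pattern.push_back(static_cast<int>(std::stoul(token, nullptr, 16)));
    }
    return pattern;
}

// Returns the offset of the first match inside the image, or -1 if there is none.
std::ptrdiff_t find_pattern(const std::uint8_t* image, std::size_t size, const std::string& text)
{
    const std::vector<int> pattern = parse_pattern(text);
    if (pattern.empty() || pattern.size() > size)
        return -1;

    for (std::size_t i = 0; i + pattern.size() <= size; ++i) {
        bool match = true;
        for (std::size_t j = 0; j < pattern.size(); ++j) {
            if (pattern[j] != kWildcard && pattern[j] != image[i + j]) {
                match = false;
                break;
            }
        }
        if (match)
            return static_cast<std::ptrdiff_t>(i);
    }
    return -1;
}

// Resolves a RIP-relative operand:
// target = offset of the next instruction + signed 32-bit displacement.
std::size_t resolve_rip_relative(const std::uint8_t* image, std::size_t instruction_offset,
                                 std::size_t displacement_offset, std::size_t instruction_length)
{
    std::int32_t displacement = 0;
    std::memcpy(&displacement, image + instruction_offset + displacement_offset, sizeof(displacement));
    const std::int64_t next = static_cast<std::int64_t>(instruction_offset + instruction_length);
    return static_cast<std::size_t>(next + displacement);
}
```

For a 7-byte `mov rax, [rip+disp32]`, the displacement starts at offset 3, which is where the `+ 3` and `+ 7` in the hook come from.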

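The patched function takes 8 arguments. Under the Windows x64 calling convention, the first four are passed in rcx, rdx, r8 and r9, and arguments 5-8 are passed on the stack. On entry, [rsp] holds the return address, followed by 32 bytes of shadow space, so the fifth argument sits at rsp+40 and the last one at rsp+64. The stub jumps to whatever address we pass as that last argument, which allows us to call any kernel function with up to 7 arguments of its own.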
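The stack arithmetic can be checked with a few lines of code. This is only a sketch of the Microsoft x64 calling convention layout at function entry, before the callee adjusts rsp; `stack_slot_offset` is a hypothetical helper.

```cpp
#include <cstddef>

// Offset from rsp, at function entry, of the slot that belongs to the n-th
// integer argument (1-based) under the Microsoft x64 calling convention.
// [rsp] holds the return address. The next four slots are the home (shadow)
// space for arguments 1-4, which actually travel in rcx, rdx, r8 and r9.
// Arguments 5 and up are read from the slots that follow.
constexpr std::size_t stack_slot_offset(std::size_t argument_index)
{
    return 8 /* return address */ + (argument_index - 1) * 8;
}

static_assert(stack_slot_offset(5) == 0x28, "first argument passed on the stack");
static_assert(stack_slot_offset(8) == 0x40, "eighth argument: rsp+64");
```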

inline uint64_t call_backdoor(uint64_t function_address, uint64_t param1 = 0, uint64_t param2 = 0, uint64_t param3 = 0, uint64_t param4 = 0, uint64_t param5 = 0, uint64_t param6 = 0, uint64_t param7 = 0)
{
    if (!data::ntdll_handle)
    {
        data::ntdll_handle = GetModuleHandleA("ntdll.dll");
        if (!data::ntdll_handle)
            throw std::exception("ntdll.dll handle could not be obtained");
    }

    if (!data::function_address)
    {
        data::function_address = reinterpret_cast<uint64_t>(GetProcAddress(data::ntdll_handle, "NtExerciseForTheReader"));
        if (!data::function_address)
            throw std::exception("NtExerciseForTheReader export not located");
    }

    typedef uint64_t(_stdcall* function_t)(uint64_t first, uint64_t second, uint64_t third, uint64_t fourth, uint64_t fifth, uint64_t sixth, uint64_t seventh, uint64_t target_function);
    const auto target_function = reinterpret_cast<function_t>(data::function_address);
    return target_function(param1, param2, param3, param4, param5, param6, param7, function_address);
}

Then we can just do the typical memory copy…

inline uint64_t overwrite_page(uint32_t page_id, uint64_t target_address, defines::pte_t& original_value)
{
    uint32_t page_offset = target_address % defines::page_size;
    uint64_t page_start_physical = target_address - page_offset;

    defines::pte_t pte = { 0 };
    kernel_copy_memory(reinterpret_cast<uint64_t>(&pte), data::swappable_pages[page_id].pte, sizeof(defines::pte_t));

    original_value = pte;
    pte.page_frame = PAGE_TO_PFN(page_start_physical);

    kernel_copy_memory(data::swappable_pages[page_id].pte, reinterpret_cast<uint64_t>(&pte), sizeof(defines::pte_t));

    kernel_flush_current_tb_immediately();

    return data::swappable_pages[page_id].virtual_address + page_offset;
}

inline void read_physical_address(uint32_t page_id, uint64_t target_address, uint64_t buffer, size_t size)
{
    if (!check_valid_physical_address(target_address))
        return;

    defines::pte_t original;
    uint64_t virtual_address = overwrite_page(page_id, target_address, original);

    kernel_copy_memory(buffer, virtual_address, size);

    restore_page(page_id, original);
}

inline bool read_process_memory(uint32_t page_id, uint64_t process, uint64_t address, uint64_t buffer, size_t size)
{
    if (!address)
        return false;

    uint64_t dirbase = get_process_directory_base(page_id, process);
    SIZE_T current_offset = 0;
    SIZE_T total_size = size;
    while (total_size)
    {
        uint64_t current_physical_address = translate_linear_address(page_id, address + current_offset);
        if (!current_physical_address)
            return false;

        uint64_t read_size = min(defines::page_size - (current_physical_address & 0xFFF), total_size);

        read_physical_address(page_id, current_physical_address, buffer + current_offset, read_size);

        total_size -= read_size;
        current_offset += read_size;

        if (!read_size)
            break;
    }

    return true;
}
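The loop in read_process_memory splits the request at page boundaries because pages that are contiguous in virtual memory can be backed by unrelated physical frames, so each page has to be translated separately. A minimal sketch of just that splitting logic, assuming 4 KiB pages; `Chunk` and `split_by_page` are hypothetical names:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kPageSize = 0x1000;

// One piece of a read that stays within a single page.
struct Chunk {
    std::uint64_t address;
    std::uint64_t size;
};

// Splits [address, address + size) into pieces that never cross a page boundary.
std::vector<Chunk> split_by_page(std::uint64_t address, std::uint64_t size)
{
    std::vector<Chunk> chunks;
    while (size) {
        // Bytes left between the current address and the end of its page.
        const std::uint64_t room_in_page = kPageSize - (address & (kPageSize - 1));
        const std::uint64_t chunk_size = std::min(room_in_page, size);
        chunks.push_back({address, chunk_size});
        address += chunk_size;
        size -= chunk_size;
    }
    return chunks;
}
```

A 0x30-byte read starting at 0x1FF0 becomes two chunks: 0x10 bytes at 0x1FF0 and 0x20 bytes at 0x2000.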

Detection

If we call kernel functions through our jumpout patch, the call stack is only going to contain memory locations associated with legitimate kernel modules. We can fully unload our EFI bootkit after the patch is done (it does not have to be a runtime driver).

The most obvious problem, then, is the patch itself. While anticheats do still compare code sections to the disk, they either just disregard the results (unless it’s a known public patch) or spam all differences to their servers (including the ones made by Windows itself). In both cases, the user will not receive a ban or even a kick from the game. That said, I have seen several P2C (pay-to-cheat) providers utilizing a similar method, so it’s very likely that more in-depth checks are going to be implemented in the future.