[THCon 2023] Decrypting PHP source code protected by a custom extension


2023-05-10

This blog post may look like a CTF writeup but it’s actually an excuse to talk about PHP extensions and how they work. The writeup part is on the “Codelocker” challenge from the THCON23 CTF (where we scored 3rd). At the end of the article is a real-world example using phpBolt, a freemium source code protector.

Challenge description and battle plan

We managed to find the THCon admin panel sources, unfortunately the source code seems to be encrypted. Find a way to access to their administration panel !

Author : voydstack (voydstack#6035)

https://codelocker01.ctf.thcon.party/

https://codelocker02.ctf.thcon.party/

An archive is also provided; it contains the following files:

├── html               // webapp source code
│   ├── assets
│   │   └── index.css
│   ├── config.php
│   ├── index.php
│   └── login.php
├── php_codelocker.ini // PHP config file
└── php_codelocker.so  // custom PHP "extension"

The files are stored encrypted and are seemingly decrypted before being served to the user.

$ xxd html/login.php
00000000: 01c0 dec0 7d01 0000 ed80 576c 4063 1ddc  ....}.....Wl@c..
00000010: 661d b19f 5a9d 94c5 f58e 029b d234 09e0  f...Z........4..
00000020: 48d4 a60a aa1d e6e3 f449 6e7f ab9b 85b8  H........In.....
00000030: cf8c 9e90 a490 4e5a 9ac2 08a9 a531 84d7  ......NZ.....1..

Since we can’t read the authentication logic (and thus can’t reach the flag it protects), we’ll have to reverse engineer the PHP “codelocker” extension to understand how the source code is encrypted.

Reverse engineering

PHP extensions

The code has been commented and the functions renamed for the sake of clarity. Extensive documentation on PHP extensions can be found here: phpinternalsbook - extensions_design.

While exploring the call graph, we find a function that hints at how PHP extensions work: it swaps PHP’s zend_compile_file pointer for the extension’s own custom_compile_file.

PHP lets extensions “hook” some of its internal functions. Hooking means replacing (in memory) the target function’s address with the address of a function that we control, generally to override the function’s default logic or simply to log the arguments passed to it.
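The idea can be modelled in a few lines of Python. This is only a toy sketch: compile_file stands in for PHP’s global zend_compile_file pointer, and every name below is made up for the illustration (the real extension does this in C on the engine’s function pointer).

```python
# Toy model of a hook. `compile_file` plays the role of a global function
# pointer such as zend_compile_file; all names here are illustrative only.

def default_compile_file(path: str) -> str:
    """Stand-in for the engine's original compile routine."""
    return f"compiled:{path}"

# The "engine" always calls through this single reference.
compile_file = default_compile_file

calls = []  # every path seen by the hook ends up here

def install_hook() -> None:
    """Save the original target, then point the reference at our function."""
    global compile_file
    original = compile_file  # keep the old address so we can still call it

    def hooked_compile_file(path: str) -> str:
        calls.append(path)     # extra logic: log the argument
        return original(path)  # then defer to the original behaviour

    compile_file = hooked_compile_file

install_hook()
result = compile_file("index.php")
print(result, calls)
```

Saving the original reference before overwriting it is what lets the hook fall back to the default behaviour, which is the same pattern as the original_compile_file call seen later in the decompiled code.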

Importing structs

Let’s examine custom_compile_file a bit closer.

__int64 __fastcall sub_13C0(__int64 a1, unsigned int a2)
{
  // removed for brevity
    
  if ( !a1 )
    goto LABEL_4;
  v4 = *(a1 + 40);
  if ( !v4 )
    goto LABEL_4;
  if ( strstr((v4 + 24), ".phar") )
    goto LABEL_4;
  v5 = strstr((v4 + 24), "phar://");
  // ...
  v7 = zend_fopen(*(a1 + 40), a1 + 48);
  // ...

At first glance the code isn’t very readable, but the parameter a1 looks like a pointer to a struct (the pattern to look for is a1+const, where const is the offset of a struct member). This is confirmed by the definition of zend_compile_file in PHP’s source code.

extern ZEND_API zend_op_array *(*zend_compile_file)(zend_file_handle *file_handle, int type);

The function zend_compile_file takes a pointer to a zend_file_handle struct as its first parameter (that struct in turn contains other structs).

typedef struct _zend_file_handle {
  union {
    FILE          *fp;
    zend_stream   stream;
  } handle;
  zend_string       *filename;
  zend_string       *opened_path;
  uint8_t           type; /* packed zend_stream_type */
  bool              primary_script;
  bool              in_list; /* added into CG(open_file) */
  char              *buf;
  size_t            len;
} zend_file_handle;
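To check that the raw offsets in the decompiled code (a1 + 40, a1 + 48, v4 + 24) really line up with this struct, the layout can be replicated with Python’s ctypes. This is a rough sketch: the zend_stream and zend_string layouts below are my reading of the PHP 8.1 headers and should be treated as assumptions, pointers are flattened to void *, and the numbers only hold on a 64-bit build.

```python
import ctypes

class ZendStream(ctypes.Structure):
    # Assumed layout of zend_stream: a data pointer, an int flag and
    # three callback pointers (reader, fsizer, closer).
    _fields_ = [
        ("handle", ctypes.c_void_p),
        ("isatty", ctypes.c_int),
        ("reader", ctypes.c_void_p),
        ("fsizer", ctypes.c_void_p),
        ("closer", ctypes.c_void_p),
    ]

class Handle(ctypes.Union):
    _fields_ = [("fp", ctypes.c_void_p), ("stream", ZendStream)]

class ZendFileHandle(ctypes.Structure):
    _fields_ = [
        ("handle", Handle),
        ("filename", ctypes.c_void_p),     # zend_string *
        ("opened_path", ctypes.c_void_p),  # zend_string *
        ("type", ctypes.c_uint8),
        ("primary_script", ctypes.c_bool),
        ("in_list", ctypes.c_bool),
        ("buf", ctypes.c_void_p),          # char *
        ("len", ctypes.c_size_t),
    ]

class ZendString(ctypes.Structure):
    # Assumed layout of zend_string: refcount header, hash, length, data.
    _fields_ = [
        ("refcount", ctypes.c_uint32),
        ("type_info", ctypes.c_uint32),
        ("h", ctypes.c_uint64),            # zend_ulong on 64-bit
        ("len", ctypes.c_size_t),
        ("val", ctypes.c_char * 1),
    ]

filename_off = ZendFileHandle.filename.offset
opened_path_off = ZendFileHandle.opened_path.offset
val_off = ZendString.val.offset
print(filename_off, opened_path_off, val_off)  # 40 48 24 on 64-bit
```

Under those assumptions, *(a1 + 40) is file_handle->filename, a1 + 48 is &file_handle->opened_path, and v4 + 24 is filename->val, which matches the renamed listing.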

To make reverse engineering easier, we’re going to import them into IDA.

IDA can build structs from user-supplied C/C++ header files (File -> Load file -> Parse C header file, or CTRL+F9). IDA automatically tries to resolve dependencies between the different headers, but this is error-prone on large codebases. Another option is to simply copy-paste each struct (and its dependencies) into a new .h file, which is what I did. Once that’s done, we can right-click a1 and choose Convert to struct * -> "zend_file_handle".

After creating the structs and renaming the different variables the code has become far more readable. All that’s left now is to understand what it does.

char *__fastcall custom_compile_file(zend_file_handle *file_handle, zend_stream_type type)
{
  // removed for brevity
    
  if ( !file_handle )                    // Exit if file_handle is NULL
    goto FUNC_before_END;
  filename = file_handle->filename;
  if ( !filename )                       // Exit if file_handle->filename is NULL
    goto FUNC_before_END;
  if ( strstr(filename->val, ".phar") )  // Exit if the file is a phar archive
    goto FUNC_before_END;
  v5 = strstr(filename->val, "phar://"); // Exit if the file is a phar archive
  if ( v5 )
    goto FUNC_before_END;
  if ( zend_is_executing() )
  {
    if ( get_active_function_name() )
    {
      active_function_name = get_active_function_name();
      strncpy(dest, active_function_name, 0x3EuLL);
      if ( dest[0] )
      {
        // ANTIDEBUG: Exit if caller function is "show_source" or "highlight_file"
        if ( !strcasecmp(dest, "show_source") || !strcasecmp(dest, "highlight_file") )
          goto FUNC_END;
      }
    }
  }
  // open the file
  bOpenSuccess = zend_fopen(file_handle->filename, &file_handle->opened_path);
  v8 = bOpenSuccess;
  if ( !bOpenSuccess )
    goto FUNC_before_END;
  // check file format (starts with 0xC0DEC001 in LE)
  if ( fread(ptr, 0x18uLL, 1uLL, bOpenSuccess) == 1 && ptr[0] == 0xC0DEC001 )
  {
    file_basename = basename(file_handle->filename->val);
    memset(buff_192, 0, sizeof(buff_192));
      
    // v---------------------------------- Voodoo magic ----------------------------------v
    v10 = sub_1AA0(file_basename);
    v11 = v10;
    v12 = v10 + 4;
    do
    {
      v10->m128i_i32[0] ^= 0xCAFEBABE;
      v10 = (v10 + 4);
    }
    while ( v10 != v12 );
    size = ptr[1];
    v13 = malloc(ptr[1]);
    if ( fread(v13, size, 1uLL, v8) == 1
      && (sub_1CD0(buff_192, v11),
          v21 = _mm_loadu_si128(&ptr[2]),
          sub_1D10(buff_192, &v21),
          sub_1D20(buff_192, v13, ptr[1]),
          sub_1720(v18),
          sub_1890(v18, v13, ptr[1]),
          sub_1990(v18),
          *&ptr[2] == v19) )
    {
    // ^---------------------------------- Voodoo magic ----------------------------------^
        
      fclose(v8);
      fpTempFile = tmpfile();
      fwrite(v13, ptr[1], 1uLL, fpTempFile);       // Write decrypted contents to tempfile
      rewind(fpTempFile);
      free(v13);
      free(v11);
      if ( file_handle->type == 1 )
        fclose(file_handle->handle.fp);
      file_handle->handle.fp = fpTempFile;         // Replace file_handle's fp with the one to tempfile
      file_handle->type = 1;
    }
    else
    {
      free(v13);
      free(v11);
    }
FUNC_before_END:
    v5 = original_compile_file(file_handle, type); // call the original zend_compile_file on the modified file_handle
    goto FUNC_END;
  }
  fclose(v8);
  v5 = original_compile_file(file_handle, type);
FUNC_END:
  if ( v26 == __readfsqword(0x28u) )
    return v5;
  else
    return get_module();
}

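TL;DR: each encrypted PHP file is decrypted into a temporary file, and that temp file (rather than the file on disk) is handed to PHP’s original zend_compile_file. Files that don’t start with the magic value are compiled as-is. Since the temp file comes from tmpfile(), it is removed automatically once it is closed.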
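The file format check is easy to reproduce offline. The sketch below parses the 0x18-byte header the way the fread call does: a 32-bit little-endian magic, a 32-bit payload size (ptr[1]), then 16 bytes (ptr[2] onwards) that the function passes to sub_1D10 and later compares against v19. The function and field names are mine, and the sample bytes are the first 24 bytes of the login.php dump shown earlier.

```python
import struct

MAGIC = 0xC0DEC001   # value checked against ptr[0]
HEADER_SIZE = 0x18   # number of bytes read by the first fread

def parse_header(blob: bytes):
    """Return (payload_size, tag) for a codelocker-style file header."""
    if len(blob) < HEADER_SIZE:
        raise ValueError("file too short for a header")
    # Two little-endian 32-bit integers: ptr[0] and ptr[1].
    magic, size = struct.unpack_from("<II", blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic, file is not encrypted")
    tag = blob[8:HEADER_SIZE]  # the 16 bytes starting at ptr[2]
    return size, tag

# First 24 bytes of html/login.php, copied from the xxd output.
header = bytes.fromhex("01c0dec07d010000ed80576c40631ddc661db19f5a9d94c5")
size, tag = parse_header(header)
print(size, tag.hex())  # 381 ed80576c40631ddc661db19f5a9d94c5
```

So login.php carries a 381-byte (0x17d) encrypted payload after its header.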

Instead of reversing the encryption algorithm, we can simply “trace” (log) all calls to the read syscall and watch the contents of the temp files as they are read. We could also trace the write syscall, or even hook the unlink function to stop the temp files from being deleted.

Before doing this, we must set up a local environment.

Ptracing in Docker

To recreate an environment as close as possible to the remote server, we can use an add-on like Wappalyzer to discover which services are running (and their versions).

From this information, we can ask ChatGPT to write a Dockerfile and install a few practical tools (gdb+gef and strace).

# docker build --tag thcon-codelocker .
FROM php:8.1-apache

# Update and install necessary packages
RUN apt-get update && \
    apt-get install -y libssl-dev python3 gdb strace && \
    apt-get clean && \
    bash -c "$(curl -fsSL https://gef.blah.cat/sh)"

# Copy the custom extension into the container
COPY php_codelocker.so /usr/local/lib/php/extensions/no-debug-non-zts-20210902/

# Add configuration for the extension
RUN echo "extension=php_codelocker.so">/usr/local/etc/php/conf.d/php_codelocker.ini

# Copy the HTML and PHP files into the container
COPY html/ /var/www/html/

# Start PHP
CMD ["php", "-S", "0.0.0.0:80", "-t", "/var/www/html"]

Once the image is built we can run a container and exec into it.

/!\ The error attach: ptrace(PTRACE_SEIZE, 1): Operation not permitted means that the container lacks the “capability” to ptrace. The fix is to run the container with the flag: --cap-add=SYS_PTRACE.

# In a first terminal
docker run --rm -d --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
      -p9001:80 --name thcon-codelocker thcon-codelocker:latest

docker exec -it thcon-codelocker bash
# Then, inside the container:
strace -e trace=read -s 100000000 -p $(pgrep php)

# In a second terminal
curl localhost:9001/file_to_recover.php

We can see two calls to read: the first one reads the (encrypted) content of the source file and the second one reads the (decrypted) content of the temp file.
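strace prints buffers as C-style escaped strings (octal escapes, \n, and so on), so the decrypted source has to be unescaped before it can be saved. Here is a small sketch of that step. The two trace lines are made-up samples in strace’s usual format, not real output from the challenge, and the helper name is my own.

```python
import re

# Matches the fd and the quoted buffer of a line such as:
#   read(5, "<?php\n...", 4096) = 14
READ_RE = re.compile(r'read\((\d+), "((?:[^"\\]|\\.)*)"')

def decode_strace_read(line: str):
    """Return (fd, raw_bytes) for a strace read() line, or None."""
    match = READ_RE.search(line)
    if match is None:
        return None
    fd = int(match.group(1))
    # Undo the C-style escapes, then map each code point back to one byte.
    text = match.group(2).encode("ascii").decode("unicode_escape")
    return fd, text.encode("latin-1")

# Hypothetical sample lines: an encrypted header, then decrypted PHP.
encrypted_line = r'read(4, "\1\300\336\300}\1\0\0", 8) = 8'
decrypted_line = r'read(5, "<?php\necho 1;\n", 4096) = 14'

enc_fd, enc_data = decode_strace_read(encrypted_line)
dec_fd, dec_data = decode_strace_read(decrypted_line)
print(enc_data.hex())
print(dec_data.decode())
```

Filtering on buffers that start with <?php is then enough to separate the decrypted reads from the encrypted ones.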

Below are the files of interest to us:

// config.php
<?php
define('FLAG', 'THCon23{flag_will_be_there_remotely}'); // :( aww

// login.php
<?php 
session_start();

$username = $_POST['username'] ?? NULL;
$password = $_POST['password'] ?? NULL;

if ($username && $password) {
    // :) yaay !
    if ($username == "thc0n_4dm1n" && md5($password) == "61dc46bf08248bb7d5878a6fe655bbe7") {
        $_SESSION['logged'] = true;
    } else {
        $_SESSION['errors'] = ['login' => 'Invalid credentials'];    
    }
}

header('Location: /');

After looking up the hash on crackstation.net, we recover the password: “unicorn1337”. All that’s left to do is to connect to the remote server to get the flag!
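An unsalted MD5 like this one can also be attacked locally with a wordlist. The snippet below is a minimal sketch of that approach, not a description of how crackstation.net works internally; the function name and the three-word list are placeholders I made up.

```python
import hashlib
from typing import Iterable, Optional

def crack_md5(target_hex: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose MD5 hex digest equals target_hex."""
    target = target_hex.lower()
    for word in candidates:
        if hashlib.md5(word.encode("utf-8")).hexdigest() == target:
            return word
    return None  # no candidate matched

TARGET = "61dc46bf08248bb7d5878a6fe655bbe7"       # hash taken from login.php
wordlist = ["admin", "password", "unicorn1337"]  # hypothetical tiny wordlist
print(crack_md5(TARGET, wordlist))
```

In practice the candidates would come from a large wordlist file rather than an inline list.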

Bonus round: phpBolt

The practice of encrypting PHP source code before publishing it isn’t new by any means, and there are already quite a few solutions that do just that (for example: ionCube and phpBolt).

After the CTF ended, I wanted to have a look at other “protectors” to see if the same methodology would also work on them. phpBolt being free (although its source code isn’t available), that’s what I went with. The author also has a GitHub repo containing a crack-it.php, how nice of him :)

// phpBolt/fun/crack-it.php
<?php bolt_decrypt( __FILE__ , '123abc'); return 0;
    ##!!!##+/rc+PssJCzcxsYzJCUoIeQwLjEh5TfG3Nzc3CEfJCvc4w8wHS4w3N3d4/fG3Nzc3B4uIR0n98Y5xsYhHyQr3OP/Lh0fJ9wqKzPc6dwGMS8w3CIrLtwiMSrj9w==

The Docker setup stayed the same, however, the reverse engineering part did change quite a bit:

  • Instead of hooking zend_compile_file, the extension exports multiple functions (most notably zif_bolt_decrypt).
  • Temporary files aren’t used, so tracing read/write syscalls won’t reveal anything. We can, however, trace all calls to the memcpy function to achieve a result similar to what we had before.

memcpy is a library function, not a syscall, so instead of strace we’ll use ltrace. The command syntax is the same:

ltrace -e memcpy -s 100000000 -p $(pgrep php)

Welp, this crackme wasn’t very hard :p

?> <?php

while(true){
    echo 'Start !!';
    break;
}

echo 'Crack now - Just for fun';