[THCon 2023] Decrypting PHP source code protected by a custom extension
Tags: #ctf #reverse engineering
2023-05-10
This blog post may look like a CTF writeup, but it’s actually an excuse to talk about PHP extensions and how they work. The writeup part covers the “Codelocker” challenge from the THCON23 CTF (where we placed 3rd). The article ends with a real-world example using phpBolt, a freemium source code protector.
Challenge description and battle plan
We managed to find the THCon admin panel sources, unfortunately the source code seems to be encrypted. Find a way to access to their administration panel !
Author : voydstack (voydstack#6035)
An archive is also provided, it contains the following files:
├── html                  // webapp source code
│   ├── assets
│   │   └── index.css
│   ├── config.php
│   ├── index.php
│   └── login.php
├── php_codelocker.ini    // PHP config file
└── php_codelocker.so     // custom PHP "extension"
The files are stored encrypted and are seemingly decrypted before being served to the user.
$ xxd html/login.php
00000000: 01c0 dec0 7d01 0000 ed80 576c 4063 1ddc ....}.....Wl@c..
00000010: 661d b19f 5a9d 94c5 f58e 029b d234 09e0 f...Z........4..
00000020: 48d4 a60a aa1d e6e3 f449 6e7f ab9b 85b8 H........In.....
00000030: cf8c 9e90 a490 4e5a 9ac2 08a9 a531 84d7 ......NZ.....1..
Since we don’t have access to the authentication mechanism (and thus, the flag it prevents us from accessing), we’ll have to reverse engineer the PHP “codelocker” extension to understand how the source code is encrypted.
Reverse engineering
PHP extensions
The code has been commented and the functions renamed for the sake of clarity. Extensive documentation on PHP extensions can be found in the PHP Internals Book: phpinternalsbook - extensions_design.
While exploring the call graph, we find a function that hints at how PHP extensions work: it replaces zend_compile_file with the extension’s own custom_compile_file.
PHP lets extensions “hook” some of its internal functions. Hooking means replacing (in memory) the address of the target function with the address of a function that we control, generally to override the function’s default logic or simply to log all arguments passed to it.
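As a rough illustration of the hooking pattern (this is not PHP's actual API), the sketch below models zend_compile_file as a replaceable function reference. The names `engine`, `compile_file` and `install_hook` are hypothetical stand-ins chosen for this example.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the engine: `compile_file` plays the role of the
# zend_compile_file function pointer that an extension may replace.
engine = SimpleNamespace(compile_file=lambda path: f"compiled:{path}")

calls = []  # log of every path seen by the hook


def install_hook(target):
    """Save the original callable, then swap in a wrapper around it."""
    original = target.compile_file

    def hooked_compile_file(path):
        calls.append(path)      # custom logic runs first (here: logging)
        return original(path)   # then defer to the saved original

    target.compile_file = hooked_compile_file
    return original


original_compile_file = install_hook(engine)
result = engine.compile_file("login.php")
print(result, calls)
```

The important detail is that the hook keeps a reference to the original function, which is what lets custom_compile_file fall back to the stock compiler once it has done its own work.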
Importing structs
Let’s examine custom_compile_file a bit closer.
__int64 __fastcall sub_13C0(__int64 a1, unsigned int a2)
{
  // removed for brevity
  if ( !a1 )
    goto LABEL_4;
  v4 = *(a1 + 40);
  if ( !v4 )
    goto LABEL_4;
  if ( strstr((v4 + 24), ".phar") )
    goto LABEL_4;
  v5 = strstr((v4 + 24), "phar://");
  // ...
  v7 = zend_fopen(*(a1 + 40), a1 + 48);
  // ...
At first glance the code isn’t very readable, but the parameter a1 appears to be a pointer to a struct (the pattern to look for is a1+const, where const is the offset of a member of the struct). This is confirmed by looking at the declaration of zend_compile_file inside PHP’s source code.
extern ZEND_API zend_op_array *(*zend_compile_file)(zend_file_handle *file_handle, int type);
The function zend_compile_file takes a zend_file_handle struct as its first parameter (which itself contains other structs).
typedef struct _zend_file_handle {
    union {
        FILE *fp;
        zend_stream stream;
    } handle;
    zend_string *filename;
    zend_string *opened_path;
    uint8_t type; /* packed zend_stream_type */
    bool primary_script;
    bool in_list; /* added into CG(open_file) */
    char *buf;
    size_t len;
} zend_file_handle;
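To double-check that the raw offsets in the decompiled code (a1 + 40, a1 + 48 and v4 + 24) line up with these structs, here is a small ctypes sketch. The layouts of zend_stream and zend_string are assumptions based on the PHP 8.1 headers, and pointers are modelled as 64-bit integers to mimic an x86-64 build.

```python
import ctypes


class ZendStream(ctypes.Structure):
    # Assumed layout of zend_stream in PHP 8.1 (pointers as 64-bit integers).
    _fields_ = [
        ("handle", ctypes.c_uint64),
        ("isatty", ctypes.c_int32),
        ("_pad", ctypes.c_uint32),   # padding a compiler adds before a pointer
        ("reader", ctypes.c_uint64),
        ("fsizer", ctypes.c_uint64),
        ("closer", ctypes.c_uint64),
    ]


class Handle(ctypes.Union):
    _fields_ = [("fp", ctypes.c_uint64), ("stream", ZendStream)]


class ZendFileHandle(ctypes.Structure):
    _fields_ = [
        ("handle", Handle),
        ("filename", ctypes.c_uint64),
        ("opened_path", ctypes.c_uint64),
        ("type", ctypes.c_uint8),
        ("primary_script", ctypes.c_bool),
        ("in_list", ctypes.c_bool),
        ("buf", ctypes.c_uint64),
        ("len", ctypes.c_uint64),
    ]


class ZendString(ctypes.Structure):
    # Assumed layout of zend_string: refcount header, hash, length, data.
    _fields_ = [
        ("refcount", ctypes.c_uint32),
        ("type_info", ctypes.c_uint32),
        ("h", ctypes.c_uint64),
        ("len", ctypes.c_uint64),
        ("val", ctypes.c_char * 1),
    ]


offsets = {
    "filename": ZendFileHandle.filename.offset,
    "opened_path": ZendFileHandle.opened_path.offset,
    "val": ZendString.val.offset,
}
print(offsets)  # {'filename': 40, 'opened_path': 48, 'val': 24}
```

Under these assumptions, *(a1 + 40) is file_handle->filename, a1 + 48 is &file_handle->opened_path, and v4 + 24 is filename->val.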
To make reverse engineering easier, we’re going to import them into IDA.
IDA can build structs from user-supplied C/C++ header files (File -> Load File -> Parse C Header File or CTRL+F9). IDA automatically tries to resolve dependencies between the different headers, but this is error-prone on large codebases. Another option is to simply copy-paste each struct (and its dependencies) into a new .h file (which is what I did). Once that’s done, we can right-click a1 and select Convert to struct * -> "zend_file_handle".
After creating the structs and renaming the different variables, the code has become far more readable. All that’s left now is to understand what it does.
char *__fastcall custom_compile_file(zend_file_handle *file_handle, zend_stream_type type)
{
  // removed for brevity
  if ( !file_handle ) // Exit if file_handle is NULL
    goto FUNC_before_END;
  filename = file_handle->filename;
  if ( !filename ) // Exit if file_handle->filename is NULL
    goto FUNC_before_END;
  if ( strstr(filename->val, ".phar") ) // Exit if the file is a phar archive
    goto FUNC_before_END;
  v5 = strstr(filename->val, "phar://"); // Exit if the path uses the phar:// wrapper
  if ( v5 )
    goto FUNC_before_END;
  if ( zend_is_executing() )
  {
    if ( get_active_function_name() )
    {
      active_function_name = get_active_function_name();
      strncpy(dest, active_function_name, 0x3EuLL);
      if ( dest[0] )
      {
        // ANTIDEBUG: Exit if caller function is "show_source" or "highlight_file"
        if ( !strcasecmp(dest, "show_source") || !strcasecmp(dest, "highlight_file") )
          goto FUNC_END;
      }
    }
  }
  // open the file
  bOpenSuccess = zend_fopen(file_handle->filename, &file_handle->opened_path);
  v8 = bOpenSuccess;
  if ( !bOpenSuccess )
    goto FUNC_before_END;
  // read the 0x18-byte header and check the file format (starts with 0xC0DEC001 in LE)
  if ( fread(ptr, 0x18uLL, 1uLL, bOpenSuccess) == 1 && ptr[0] == 0xC0DEC001 )
  {
    file_basename = basename(file_handle->filename->val);
    memset(buff_192, 0, sizeof(buff_192));
    // v---------------------------------- Voodoo magic ----------------------------------v
    v10 = sub_1AA0(file_basename);
    v11 = v10;
    v12 = v10 + 4;
    do
    {
      v10->m128i_i32[0] ^= 0xCAFEBABE;
      v10 = (v10 + 4);
    }
    while ( v10 != v12 );
    size = ptr[1]; // second header field: size of the encrypted payload
    v13 = malloc(ptr[1]);
    if ( fread(v13, size, 1uLL, v8) == 1
      && (sub_1CD0(buff_192, v11),
          v21 = _mm_loadu_si128(&ptr[2]),
          sub_1D10(buff_192, &v21),
          sub_1D20(buff_192, v13, ptr[1]),
          sub_1720(v18),
          sub_1890(v18, v13, ptr[1]),
          sub_1990(v18),
          *&ptr[2] == v19) )
    {
      // ^---------------------------------- Voodoo magic ----------------------------------^
      fclose(v8);
      fpTempFile = tmpfile();
      fwrite(v13, ptr[1], 1uLL, fpTempFile); // Write decrypted contents to tempfile
      rewind(fpTempFile);
      free(v13);
      free(v11);
      if ( file_handle->type == 1 )
        fclose(file_handle->handle.fp);
      file_handle->handle.fp = fpTempFile; // Replace file_handle's fp with the one to tempfile
      file_handle->type = 1;
    }
    else
    {
      free(v13);
      free(v11);
    }
FUNC_before_END:
    v5 = original_compile_file(file_handle, type); // call the original zend_compile_file on the modified file_handle
    goto FUNC_END;
  }
  fclose(v8);
  v5 = original_compile_file(file_handle, type);
FUNC_END:
  if ( v26 == __readfsqword(0x28u) ) // stack-canary check inserted by the compiler
    return v5;
  else
    return get_module();
}
TL;DR: each encrypted PHP file is decrypted into a temporary file, which is then handed to the original zend_compile_file in place of the encrypted one (the temp file is removed afterwards).
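The header check is easy to reproduce offline. The sketch below parses the first 0x18 bytes of html/login.php (copied from the xxd dump above) the way the decompiled code appears to: a little-endian magic, a little-endian payload size, then 16 bytes that the code both passes to sub_1D10 and compares against v19. The name `check_value` and the helper `parse_header` are hypothetical, and the exact role of those 16 bytes is an assumption.

```python
import struct

MAGIC = 0xC0DEC001
HEADER_SIZE = 0x18  # the extension reads 0x18 bytes before anything else

# First 0x18 bytes of html/login.php, copied from the xxd dump.
header = bytes.fromhex(
    "01c0dec0"                          # magic, little-endian
    "7d010000"                          # payload size, little-endian
    "ed80576c40631ddc661db19f5a9d94c5"  # 16-byte value (role assumed)
)


def parse_header(raw):
    """Split the header into (magic, payload_size, check_value)."""
    if len(raw) < HEADER_SIZE:
        raise ValueError("header too short")
    magic, size = struct.unpack_from("<II", raw, 0)
    if magic != MAGIC:
        raise ValueError("not a codelocker file")
    return magic, size, raw[8:HEADER_SIZE]


magic, size, check_value = parse_header(header)
print(hex(magic), size, check_value.hex())
```

For login.php this gives a payload size of 0x17d (381) bytes, which is the length passed to malloc and to the second fread.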
Instead of reversing the encryption algorithm, we can simply “trace” (log) all calls to the read syscall to see the contents of the temp file as it is being read. We could also trace the write syscall or even hook the unlink function to stop it from deleting the temp files.
Before doing this, we must set up a local environment.
Ptracing in Docker
To recreate the closest environment possible to the one on the remote server, we can use an addon like Wappalyzer to discover what services are running (and their corresponding versions).
From this information, we can ask ChatGPT to write a Dockerfile and install a few practical tools (gdb+gef and strace).
# docker build --tag thcon-codelocker .
FROM php:8.1-apache
# Update and install necessary packages
RUN apt-get update && \
    apt-get install -y libssl-dev python3 gdb strace && \
    apt-get clean && \
    bash -c "$(curl -fsSL https://gef.blah.cat/sh)"
# Copy the custom extension into the container
COPY php_codelocker.so /usr/local/lib/php/extensions/no-debug-non-zts-20210902
# Add configuration for the extension
RUN echo "extension=php_codelocker.so">/usr/local/etc/php/conf.d/php_codelocker.ini
# Copy the HTML and PHP files into the container
COPY html/ /var/www/html/
# Start PHP
CMD ["php", "-S", "0.0.0.0:80", "-t", "/var/www/html"]
Once the image is built we can run a container and exec into it.
/!\ The error attach: ptrace(PTRACE_SEIZE, 1): Operation not permitted means that the container lacks the capability to ptrace. This is fixed by running the container with the flag --cap-add=SYS_PTRACE.
# In a first terminal
docker run --rm -d --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
-p9001:80 --name thcon-codelocker thcon-codelocker:latest
docker exec -it thcon-codelocker bash
strace -e trace=read -s 100000000 -p $(pgrep php)
# In a second terminal
curl localhost:9001/file_to_recover.php
We can see two calls to read: the first one reads the (encrypted) content of the source file and the second one reads the (decrypted) content of the temp file.
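strace prints buffers as C-style escaped strings, so the decrypted source needs to be unescaped before it can be saved as a .php file. Here is a minimal sketch, assuming the usual `read(fd, "...", count) = n` line format; the helper name `buffer_from_strace` and the sample line are hypothetical.

```python
import codecs
import re

# Matches the quoted buffer in a line such as: read(5, "...", 4096) = 14
READ_LINE = re.compile(r'read\(\d+, "((?:[^"\\]|\\.)*)"')


def buffer_from_strace(line):
    """Return the bytes strace printed for a read() call, or None."""
    match = READ_LINE.search(line)
    if match is None:
        return None
    escaped = match.group(1).encode("ascii")
    # strace uses C-style escapes (\n, \", octal); unicode_escape decodes them.
    text = codecs.decode(escaped, "unicode_escape")
    return text.encode("latin-1")


sample = r'read(5, "<?php\necho 1;\n", 4096) = 14'  # hypothetical trace line
print(buffer_from_strace(sample))
```

Applying this to the second read call of each request should yield the plaintext PHP source, provided -s is large enough that strace does not truncate the buffer.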
Below are the files of interest to us:
// config.php
<?php
define('FLAG', 'THCon23{flag_will_be_there_remotely}'); // :( aww
// login.php
<?php
session_start();
$username = $_POST['username'] ?? NULL;
$password = $_POST['password'] ?? NULL;
if ($username && $password) {
    // :) yaay !
    if ($username == "thc0n_4dm1n" && md5($password) == "61dc46bf08248bb7d5878a6fe655bbe7") {
        $_SESSION['logged'] = true;
    } else {
        $_SESSION['errors'] = ['login' => 'Invalid credentials'];
    }
}
header('Location: /');
After looking up the hash on crackstation.net, we recover the password: “unicorn1337”. All that’s left to do is to connect to the remote server to get the flag!
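A lookup service is the quickest route, but the same check can be done locally with a wordlist. This is a minimal sketch: the helper `crack_md5` and the tiny wordlist are hypothetical, and the list simply includes the password reported by crackstation.net.

```python
import hashlib


def crack_md5(target_hex, candidates):
    """Return the first candidate whose MD5 hex digest equals target_hex."""
    target_hex = target_hex.lower()
    for word in candidates:
        if hashlib.md5(word.encode("utf-8")).hexdigest() == target_hex:
            return word
    return None


# Hash taken from login.php; the wordlist is a placeholder for a real one.
target = "61dc46bf08248bb7d5878a6fe655bbe7"
wordlist = ["password", "letmein", "unicorn1337"]
print(crack_md5(target, wordlist))
```

In practice the candidates would come from a large wordlist file rather than a hard-coded list.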
Bonus round: phpBolt
The practice of encrypting PHP source code before publishing it isn’t new by any means, and quite a few solutions already do just that (for example: ionCube and phpBolt).
After the CTF ended, I wanted to have a look at other “protectors” to see if the same methodology would also work on them. phpBolt being free (although its source code isn’t available), that’s what I went with. The author also has a GitHub repo containing a crack-it.php, how nice of him :)
// phpBolt/fun/crack-it.php
<?php bolt_decrypt( __FILE__ , '123abc'); return 0;
##!!!##+/rc+PssJCzcxsYzJCUoIeQwLjEh5TfG3Nzc3CEfJCvc4w8wHS4w3N3d4/fG3Nzc3B4uIR0n98Y5xsYhHyQr3OP/Lh0fJ9wqKzPc6dwGMS8w3CIrLtwiMSrj9w==
The Docker setup stayed the same, however, the reverse engineering part did change quite a bit:
- Instead of hooking zend_compile_file, the extension exports multiple functions (most notably zif_bolt_decrypt).
- Temporary files aren’t used, so tracing read/write syscalls won’t amount to anything. We can however trace all calls to the memcpy function to achieve a result similar to what we had before.
memcpy is a library function, not a syscall, so instead of strace we’ll use ltrace. The command syntax is the same:
ltrace -e memcpy -s 100000000 -p $(pgrep php)
Welp, this crackme wasn’t very hard :p
?> <?php
while(true){
    echo 'Start !!';
    break;
}
echo 'Crack now - Just for fun';