Application security is now, more than ever, paramount to an organisation’s security posture. Verizon’s 2020 Data Breach Investigations Report found that 43% of security breaches originated from web applications, double the 2019 figure.
Serious application security flaws such as SQL injection, code execution and command injection, which often result in the exposure of sensitive data and underlying infrastructure, are (or should be) well known to security engineers, developers and management personnel. However, subtler and less publicised flaws, such as those present in HTML to PDF document conversion functions, can also have a serious impact on the security of an application.
Applications that need to export datasets into universally accepted formats, which is pretty much all modern applications, rely on common frameworks and open-source libraries. For instance, an application may need to convert a series of data inputs provided by a user via a common HTML form to PDF in order to send the result to their clients or customers. The potential security impact of this scenario, where HTML data is converted into a PDF document, is the focus of this article.
Application flaws resulting from HTML to PDF (and other) conversion functions are often overlooked because the functionality is mistakenly viewed as benign and relatively trivial. However, when the libraries handling the underlying data conversion do not adequately sanitise input data, attackers can achieve Cross-Site Scripting (XSS), Server-Side Request Forgery (SSRF) and, in some cases, code execution.
It is likely that whenever a user clicks ‘Export to PDF’ in your application, form data previously supplied to the application by a user is converted into a PDF document. Another common scenario is when raw HTML is captured as part of a WYSIWYG editor. The formatted content captured by the editor is usually sent as HTML markup in the request body, for example:
If a user were to complete the ‘messages’ field as outlined above, the resulting request to the application server would often look like the following:
More broadly, applications expect to convert simple strings of text or HTML markup to PDF but fail to adequately sanitise user input.
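For fields that are only ever meant to carry plain text, one way to picture the missing sanitisation step is to escape HTML metacharacters before the value reaches the converter. The sketch below is illustrative only: the function names and the template are hypothetical, and it does not cover WYSIWYG fields where markup must be preserved (those would need an allowlist-based sanitiser instead).

```python
import html


def escape_for_pdf_template(user_text: str) -> str:
    """Escape HTML metacharacters so the converter renders the text literally."""
    return html.escape(user_text, quote=True)


def build_export_html(message: str) -> str:
    # Hypothetical template: only the escaped value is interpolated, so any
    # tags supplied by the user arrive at the PDF library as inert text.
    return "<html><body><p>{}</p></body></html>".format(
        escape_for_pdf_template(message)
    )


if __name__ == "__main__":
    print(build_export_html("<script>document.write('testing123')</script>"))
```

With this approach a submitted script tag appears in the exported PDF as visible text rather than being interpreted by the rendering engine.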
So how can attackers compromise AWS tenancies via vulnerable PDF export functionality? In a nutshell, if an attacker manages to supply the application with HTML/JavaScript that is subsequently parsed when exporting to a PDF document, the JavaScript will execute on the server side, which, depending on the payload, could result in Server-Side Request Forgery (SSRF). SSRF is a data validation flaw that results in arbitrary, attacker-controlled requests being made by an application to retrieve a resource at a separate domain. A common attack vector for SSRF flaws in AWS is to access the AWS instance metadata service and obtain temporary credentials for the underlying EC2 instance running the application.
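To make the SSRF risk concrete, the following sketch shows the kind of destination check an application could apply to any URL before a server-side renderer is allowed to fetch it. It is a simplified illustration, not a complete defence: the function name is hypothetical, only IP literals are inspected, and hostnames are not resolved (a real check would need to resolve them and re-validate every returned address). The address 169.254.169.254 used in the example is the link-local address of the EC2 instance metadata service.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_blocked_destination(url: str) -> bool:
    """Return True when a server-side renderer should refuse to fetch the URL."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return True  # malformed URL, refuse by default
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return True  # e.g. file:// would read from the local filesystem
    host = parts.hostname
    if host is None:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. Assumption: hostnames are only screened for
        # "localhost" here; DNS resolution is deliberately left out.
        return host.lower() == "localhost"
    # Link-local covers the metadata service; loopback and private ranges
    # cover other internal-only services.
    return addr.is_link_local or addr.is_loopback or addr.is_private


if __name__ == "__main__":
    for candidate in (
        "http://169.254.169.254/latest/meta-data/",
        "file:///etc/passwd",
        "https://example.com/logo.png",
    ):
        print(candidate, is_blocked_destination(candidate))
```

A check like this complements, rather than replaces, input sanitisation and network-level controls.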
Aurian recently conducted two application penetration tests in which vulnerable HTML to PDF libraries were in use. Both vulnerable applications happened to be hosted in Amazon Web Services (AWS) and, as such, Aurian executed the attack path outlined above and successfully compromised the underlying EC2 infrastructure.
To identify and exploit vulnerable HTML to PDF converters, Aurian relies on the following testing methodology:
If a payload such as <b>testing123</b> is provided and the subsequent PDF export shows ‘testing123’ in bold typeface, this is a good indication of potential problems. The same applies to a similar payload such as <script>document.write('testing123')</script>.
Payloads such as the following can then be used to request the AWS metadata service:
<iframe src=http://[aws metadata service]>
<img src=x onerror="location.href='http://[aws metadata service]'">
<link rel=attachment href="http://[aws metadata service]">
<object data="http://[aws metadata service]">
<portal src="http://[aws metadata service]" id=SSRF>
Assuming a successful payload is used, temporary credentials can be retrieved from the AWS instance metadata service. Savvy attackers will commonly enumerate the permissions assigned to the EC2 instance role and subsequently access any further AWS resources permitted by the given permission policy.
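The payloads above share a common trait: each relies on an element or event handler that causes the renderer to execute script or fetch a URL. As a rough aid for reviewing what an export endpoint accepts, the sketch below uses Python's standard HTML parser to flag those constructs in submitted markup. The class and function names are hypothetical, the tag list is limited to the elements mentioned in this article, and a denylist of this kind is a detection aid rather than a reliable filter.

```python
from html.parser import HTMLParser

# Elements from the payloads above that can run script or trigger a request.
FETCHING_TAGS = {"script", "iframe", "img", "link", "object", "portal"}


class ActiveContentFinder(HTMLParser):
    """Collects tags and event-handler attributes that deserve a closer look."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this is called.
        if tag in FETCHING_TAGS:
            self.findings.append(tag)
        for name, _value in attrs:
            if name.startswith("on"):  # onerror, onload, ...
                self.findings.append("{}[{}]".format(tag, name))


def find_active_content(markup: str) -> list:
    finder = ActiveContentFinder()
    finder.feed(markup)
    finder.close()
    return finder.findings


if __name__ == "__main__":
    print(find_active_content('<object data="http://example.invalid">'))
```

An empty result does not prove the input is safe; it only means none of the constructs listed here were found.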
Using the methodology above, Aurian found that application A was using the Prince10 PDF library. Using payloads similar to the ones shown above, Aurian successfully retrieved temporary credentials via the metadata service. Application B was running the PD4ML PDF library, and none of the above payloads appeared to execute successfully.
After further research on the PD4ML library and a review of existing security research, Aurian consultants used a custom HTML tag that is unique to the PD4ML library: the <pd4ml:attachment> tag, which allows a user to add attachments to a PDF document. An example payload would be:
<pd4ml:attachment description="attachment.txt" icon="something">file:///etc/passwd</pd4ml:attachment>
Aurian used this tag to attach sensitive local files such as SSH keys and database/application configuration files.
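Because this technique depends on a library-specific tag and a file:// reference rather than on script execution, generic XSS filtering may not catch it. The sketch below shows one simple way to flag both indicators in submitted markup before conversion. The function name and reason labels are hypothetical, and pattern matching of this kind can be bypassed, so it should be read as a starting point for detection rather than a complete control.

```python
import re

# Any opening or closing tag in the pd4ml: namespace, e.g. <pd4ml:attachment>.
PD4ML_TAG = re.compile(r"<\s*/?\s*pd4ml:[\w-]+", re.IGNORECASE)
# Any reference using the file: scheme, e.g. file:///etc/passwd.
FILE_URI = re.compile(r"file:/", re.IGNORECASE)


def flag_local_file_access(markup: str) -> list:
    """Return the reasons, if any, why the markup may reach local files."""
    reasons = []
    if PD4ML_TAG.search(markup):
        reasons.append("pd4ml tag")
    if FILE_URI.search(markup):
        reasons.append("file URI")
    return reasons


if __name__ == "__main__":
    payload = (
        '<pd4ml:attachment description="attachment.txt" icon="something">'
        "file:///etc/passwd</pd4ml:attachment>"
    )
    print(flag_local_file_access(payload))
```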
Many commonly used PDF libraries allow unrestricted JavaScript execution by default, including the following:
The implications of an attacker successfully exploiting a vulnerable HTML to PDF library can range from information disclosure to the complete compromise of the application and underlying AWS cloud tenancy. Aurian recommends organisations consider the following: