AWS S3 UTF-8 Content Disposition

Have you ever tried to make the download filename of an AWS S3 object be in Korean? No…? Being a developer who, until recently, had only written in English, I hadn't either. However, it turns out using some UTF-8 characters in S3 object request parameters can lead to some interesting errors.

How to Change the Filename of a Download

When downloading a file directly from the browser using an HTML anchor tag, the download=<filename> attribute can be used. However, the attribute only works for same-origin URLs and the blob: and data: schemes. For example:

<a href="https://example.com/image.png" download="painting.png">Download painting</a>

For situations where the application CDN (CloudFront) is not set up with the same origin, or where download requests are not made directly from an HTML link on the page, such as when clicking a link in an email, the Content-Disposition header can be used to set the download filename. The header is returned in the HTTP response and tells the client the filename and whether the content should be displayed inline or downloaded as an attachment.

The Content-Disposition header takes the form:

Content-Disposition: attachment; filename=painting.png;

Setting Content-Disposition in AWS S3 HTTP Responses

S3 provides multiple ways to set the Content-Disposition header of an object being downloaded. Two of the main ways are:

  • Set the Content-Disposition parameter on upload - works for new objects
  • Set the response-content-disposition parameter in the request - works for existing objects, but requires a signed URL

In this blog post we will look at setting the response-content-disposition parameter in the request, which has the following syntax (excluding the signed request parameters):

GET /<object-key>?response-content-disposition=<content-disposition>&response-content-type=<content-type>
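As a rough illustration of the syntax above, the query string can be assembled with Ruby's standard library. The object key and parameter values below are placeholders of my own, and a real request would also need the signature parameters, which this sketch leaves out.

```ruby
require 'uri'

# Placeholder object key and header values, for illustration only.
object_key = "not-a-cat.jpg"
params = {
  "response-content-disposition" => "attachment; filename=jessie.jpg;",
  "response-content-type" => "image/jpeg"
}

# encode_www_form percent-encodes reserved characters and turns spaces into "+".
query = URI.encode_www_form(params)
request_path = "/#{object_key}?#{query}"

puts request_path
# /not-a-cat.jpg?response-content-disposition=attachment%3B+filename%3Djessie.jpg%3B&response-content-type=image%2Fjpeg
```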

Sample Repository

You can test out a request with the above syntax by running the first example in the sample repository I’ve made, provided here. The sample repository includes a basic AWS S3 infrastructure with CloudFront CDN for signed URLs, along with some scripts for generating the URLs outlined in this post. The readme has details on how to set it up and run the examples.

The object uploaded to S3 during the infrastructure creation is an image of my cat, Jessie. However, the object is named 'not-a-cat', so you will know whether the Content-Disposition HTTP header has worked correctly.

The first example request uses:

  • content disposition: attachment; filename=jessie.jpg;
  • content type: image/jpg

The Problem with UTF-8 in the 'response-content-disposition' Parameter

Let's try setting the response-content-disposition parameter to the Korean translation of cat, which is "고양이.jpg".

response-content-disposition=attachment; filename=고양이.jpg;

However, setting the response-content-disposition parameter filename in the S3 request as above won't work.

Firstly, as per RFC 2396 section 2.4, any data that cannot be represented with an unreserved US-ASCII character must be escaped. The URL must therefore be URL encoded, converting each non-ASCII byte into a %XX escape sequence.
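To make the escaping concrete, here is a small sketch that URL encodes the Korean filename with CGI.escape, the same standard library call used later in this post. Each Korean character takes three bytes in UTF-8, so each one becomes three escape sequences.

```ruby
require 'cgi'

filename = "고양이.jpg"

# Every byte of the UTF-8 encoded characters becomes a %XX escape;
# the ASCII ".jpg" suffix is left untouched.
encoded = CGI.escape(filename)

puts encoded              # %EA%B3%A0%EC%96%91%EC%9D%B4.jpg
puts encoded.ascii_only?  # true
```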

Additionally, UTF-8 encoded characters outside the ASCII range, such as those in the Korean alphabet, are not typically allowed in HTTP header fields. This is because the HTTP specification has historically only allowed field content with text in the ISO-8859-1 character set; see RFC 7230 section 3.2.4. While most modern browsers and applications allow the use of non-ISO-8859-1 characters in HTTP fields, some do not. For example, support for UTF-8 headers was only recently added to the Puma application server; see the changelog here.
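A quick way to see which filenames fall outside ISO-8859-1 is to try converting them in Ruby. The helper name below is hypothetical and is only meant as a local check, not as anything S3 or a browser actually runs.

```ruby
# Hypothetical helper: true when every character of the text exists in ISO-8859-1.
def latin1_representable?(text)
  text.encode("ISO-8859-1")
  true
rescue Encoding::UndefinedConversionError
  # Raised when a character has no ISO-8859-1 equivalent.
  false
end

puts latin1_representable?("jessie.jpg")  # true
puts latin1_representable?("café.jpg")    # true
puts latin1_representable?("고양이.jpg")    # false
```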

Fortunately, as described in RFC6266, the Content-Disposition field allows for an additional filename* parameter which uses the encoding defined in RFC8187, allowing for the use of characters not present in the ISO-8859-1 character set.

So, to allow for backwards compatibility with legacy applications, a pure ISO-8859-1 filename parameter must be supplied along with a UTF-8 encoded filename* parameter. This leads to a Content-Disposition field of the form:

Content-Disposition: attachment; filename=jessie.jpg; filename*= UTF-8''고양이.jpg;
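Note that in the header a client finally receives, RFC 8187 expects the filename* value to be percent-encoded UTF-8 rather than raw characters. A minimal sketch of building such a value is below. The helper name is hypothetical, and it assumes the fallback name is plain ASCII with no spaces or quotes.

```ruby
require 'erb'

# Hypothetical helper: builds a Content-Disposition value with an ASCII
# fallback (filename) and a percent-encoded UTF-8 name (filename*).
# Assumes fallback_name is plain ASCII without spaces or quotes.
def content_disposition(fallback_name, utf8_name)
  # url_encode escapes non-ASCII bytes as %XX and spaces as %20 (not "+").
  encoded = ERB::Util.url_encode(utf8_name)
  "attachment; filename=#{fallback_name}; filename*=UTF-8''#{encoded}"
end

puts content_disposition("jessie.jpg", "고양이.jpg")
# attachment; filename=jessie.jpg; filename*=UTF-8''%EA%B3%A0%EC%96%91%EC%9D%B4.jpg
```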

When the S3 object request is URL encoded, see example 2 in the sample repository, the request will look like:

GET https://example.cloudfront.net/not-at-cat.jpg?response-content-disposition=attachment%3B+filename%3D%22jessie.jpg%22%3B+filename%2A%3D+UTF-8%27%27%22%EA%B3%A0%EC%96%91%EC%9D%B4.jpg%22

However, performing this request results in an error from S3 😭

AWS S3 UTF-8 InvalidArgument error

The problem here is that S3 decodes the URL-encoded request, then attempts to return the value of the response-content-disposition parameter in the Content-Disposition header field of the HTTP response. But since the filename is no longer URL encoded at that point, it is no longer contained within the ISO-8859-1 character set, and AWS S3 throws an error.
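The effect can be reproduced locally. This sketch only simulates the decoding step described above with CGI.unescape; it is an assumption about what happens on the server side, not S3's actual implementation.

```ruby
require 'cgi'

# The header value is URL encoded only once before being sent.
header_value = "attachment; filename=jessie.jpg; filename*= UTF-8''고양이.jpg"
query_value = CGI.escape(header_value)
puts query_value.ascii_only?  # true: safe to place in the URL

# Decoding the query parameter once brings the raw Korean characters back,
# so the resulting header value is no longer plain ASCII.
decoded = CGI.unescape(query_value)
puts decoded.ascii_only?      # false
```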

How to Set a UTF-8 Filename in the 'response-content-disposition' Parameter

The solution to this issue is rather simple: URL encode the UTF-8 filename twice in the request. However, this took me a fair bit of debugging to arrive at, hence this blog post to save you time if you come across the same issue.

The reason for encoding the UTF-8 filename twice is to ensure that the Content-Disposition value handled by S3 remains in the US-ASCII character set, and therefore in the ISO-8859-1 character set, after S3 has decoded the URL once. The browser then decodes the remaining percent-encoding into UTF-8 on receipt.
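The two decoding steps can be traced with a short sketch. CGI.unescape stands in for both the server and the browser here, which is a simplification for illustration.

```ruby
require 'cgi'

filename = "고양이.jpg"
twice = CGI.escape(CGI.escape(filename))
puts twice  # %25EA%25B3%25A0%25EC%2596%2591%25EC%259D%25B4.jpg

# First decode (standing in for the server): the value is still plain ASCII.
once = CGI.unescape(twice)
puts once              # %EA%B3%A0%EC%96%91%EC%9D%B4.jpg
puts once.ascii_only?  # true

# Second decode (standing in for the browser): the original name is recovered.
puts CGI.unescape(once) == filename  # true
```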

Following on from the examples above, example 3 encodes only the UTF-8 filename* parameter twice using the following Ruby code:

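require 'cgi'

filename_legacy = "jessie.jpg"
filename_utf8 = "고양이.jpg"
# First encoding: percent-encode the UTF-8 filename so it survives S3's decoding as plain ASCII.
filename_utf8_url_encoded = CGI.escape(filename_utf8)
# Second encoding: URL encode the whole header value for use as the query parameter.
content_disposition = CGI.escape("attachment; filename=#{filename_legacy}; filename*= UTF-8''#{filename_utf8_url_encoded};")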

Which results in the URL:

GET https://example.cloudfront.net/not-at-cat.jpg?response-content-disposition=attachment%3B+filename%3Djessie.jpg%3B+filename%2A%3D+UTF-8%27%27%25EA%25B3%25A0%25EC%2596%2591%25EC%259D%25B4.jpg%3B&response-content-type=image/jpeg

Performing the above request will result in a file with the desired filename being downloaded, whether the browser is new or legacy.

Successful download with UTF-8 filename