Regular Expression for Amazon S3 URLs

Hello Everyone,

We recently added support for Amazon S3 storage services to Hoopoe. Following the previous article, which covered our general account details, we want to share the regular expression we use to validate S3 URLs given as sources of data and files.

You may find more information about S3 naming conventions and requirements in the manuals available from http://aws.amazon.com/s3.

When submitting a task to Hoopoe with input or output sources on Amazon S3, you must specify the S3 URL of each resource. A simple resource URL looks like this:
https://test-bucket.s3.amazonaws.com/dir1/input.bin
In this example, the user's bucket is called “test-bucket”, and the input file is “dir1/input.bin”, which is called the key of the object within the bucket.

This is the general form of an S3 URL that makes an object accessible over the internet.
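To make the bucket and key parts concrete, here is a minimal Python sketch that splits a URL of the form shown above. The function name split_s3_url and the constant S3_HOST_SUFFIX are hypothetical names chosen for this illustration and are not part of Hoopoe; the sketch only handles the https://<bucket>.s3.amazonaws.com/<key> form.

```python
from urllib.parse import urlsplit

# Host suffix taken from the example URL above (an assumption of this sketch).
S3_HOST_SUFFIX = ".s3.amazonaws.com"


def split_s3_url(url):
    """Return (bucket, key) for https://<bucket>.s3.amazonaws.com/<key>."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme != "https" or not host.endswith(S3_HOST_SUFFIX):
        raise ValueError("not an S3 URL of the expected form: " + url)
    bucket = host[: -len(S3_HOST_SUFFIX)]  # everything before the suffix
    key = parts.path[1:]                   # drop the single leading slash
    if not bucket or not key:
        raise ValueError("bucket or key is missing: " + url)
    return bucket, key


bucket, key = split_s3_url("https://test-bucket.s3.amazonaws.com/dir1/input.bin")
print(bucket)  # test-bucket
print(key)     # dir1/input.bin
```

This only separates the two parts; it does not check the naming rules, which is the job of the regular expression discussed next.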

Regular Expression

We use a regular expression to validate every Amazon S3 URL in the tasks submitted to Hoopoe.

In .NET syntax (which most other regex flavors share), the expression is:
https://[a-z0-9][a-z0-9-.]*\.s3\.amazonaws\.com/[\w][\w\W]*
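As an illustration, the expression can be tried out in Python, whose regex syntax agrees with .NET for the constructs used here. The sketch below assumes the escaped form of the pattern (\. for literal dots, \w for a word character, \W for a non-word character); the names S3_URL_PATTERN and is_s3_url are hypothetical. Using fullmatch, so that the whole string must match, is also an assumption of this sketch rather than something stated above.

```python
import re

# Assumed escaped form of the expression; the hyphen is placed last in the
# character class so that it is unambiguously a literal.
S3_URL_PATTERN = re.compile(
    r"https://[a-z0-9][a-z0-9.-]*\.s3\.amazonaws\.com/\w[\w\W]*"
)


def is_s3_url(url):
    """Return True if the entire string matches the S3 URL pattern."""
    return S3_URL_PATTERN.fullmatch(url) is not None


print(is_s3_url("https://test-bucket.s3.amazonaws.com/dir1/input.bin"))  # True
print(is_s3_url("https://Test-Bucket.s3.amazonaws.com/input.bin"))       # False: uppercase
print(is_s3_url("https://test-bucket.s3.amazonaws.com/"))                # False: empty key
print(is_s3_url("http://test-bucket.s3.amazonaws.com/input.bin"))        # False: not https
```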

As you can see, the following restrictions apply:

  1. For DNS compatibility, bucket names must be lowercase and start with a letter or a digit
  2. In S3, following DNS limitations, bucket names should not exceed 63 characters in length
  3. Object keys can be of variable length; a key must start with a valid character and may continue with any other characters, including those that denote paths (a file named “dir/input.bin” is located under the “dir” directory)
  4. In addition to the above, Hoopoe restricts S3 URLs to at most 256 characters in length
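The two length limits in the list are not expressed by the pattern itself, so a validator has to check them separately. The following Python sketch shows one way to combine the checks, under the assumption that the pattern uses escaped dots and \w / \W classes; the names check_s3_url, MAX_BUCKET_LENGTH and MAX_URL_LENGTH are hypothetical and this is not Hoopoe's actual implementation.

```python
import re

MAX_BUCKET_LENGTH = 63   # bucket name limit from item 2
MAX_URL_LENGTH = 256     # Hoopoe's URL limit from item 4

# Assumed escaped form of the pattern, with a named group for the bucket.
S3_URL = re.compile(
    r"https://(?P<bucket>[a-z0-9][a-z0-9.-]*)\.s3\.amazonaws\.com/\w[\w\W]*"
)


def check_s3_url(url):
    """Return a list of problems found; an empty list means the URL passed."""
    problems = []
    if len(url) > MAX_URL_LENGTH:
        problems.append("URL exceeds 256 characters")
    match = S3_URL.fullmatch(url)
    if match is None:
        problems.append("URL does not match the S3 pattern")
    elif len(match.group("bucket")) > MAX_BUCKET_LENGTH:
        problems.append("bucket name exceeds 63 characters")
    return problems


print(check_s3_url("https://test-bucket.s3.amazonaws.com/dir1/input.bin"))  # []
```

Returning a list of problems rather than a single boolean makes it easier to tell a submitter which rule was violated.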

If you find a mistake in the regular expression, whether it rejects valid URLs or is too permissive, please send us an email.
We also hope you find this information useful for your own purposes.