Thursday, April 5, 2012

Connecting Amazon S3 to Orion

Amazon S3 is a popular cloud file hosting service. For months people have been complaining that there was no way to store their Orion work — that is to say, their files and folders — in S3, so they'd have a fast, reliable place for their data in the cloud. (Well OK, these complaints were all from one particular Orion committer, but whatever.)

In any case, I finally opted to write a plugin for Orion that lets you use S3 as a file store. This post talks about my experiences. If you'd rather skip right to the code, go here:

The S3 Worldview

In the S3 storage model, an object is the basic unit of storage. Objects are resources that you can refer to and manipulate using a REST API. Objects live in buckets, which are flat containers that can hold many objects (sometimes millions). As an S3 user, you'll generally create one or more buckets to hold your data. S3 also supports powerful security policies for granting bucket access to other users, but that doesn't concern us here.

Some S3 concepts.

The Orion File Model

Orion's file support is based on the concept of filesystems. The model here is decidedly more traditional than S3's:

Orion file concepts.

As the diagram tries to show, a filesystem is a place accessible to Orion where files can live. At the top level of a filesystem is the workspace. Within a workspace are folders and files. Naturally, folders can contain files and other folders, nested to an arbitrary depth.

NOTE: The file API is still evolving: see the Client API documentation for details.

From S3 to Orion

The important thing to note about S3 is that the bucket is the only unit of containment. Every object is contained by one and only one bucket. There is no containment relationship between objects. Initially this might seem like it limits us to a filesystem that is simply a huge, flat list of files. But that's not the case: a hierarchical view can be imposed on a properly-structured bucket. I won't go into detail about how this is done (read the AWS documentation if you're interested). The upshot is that we can indeed implement the Orion filesystem concepts on top of S3 — although some operations will necessarily be more complex (and likely slower) than if we were connecting to a true hierarchical back end.
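
To make this concrete, here's a minimal sketch (not code from the plugin) of the kind of listing request that imposes hierarchy on a flat bucket: asking S3 to list with a key prefix and a "/" delimiter. The bucket name and keys are made up, and a restricted bucket would additionally need the authenticated requests described later.

  // Keys in the bucket:              Hierarchical view imposed on them:
  //   projectA/readme.txt              projectA/
  //   projectA/src/main.js               readme.txt
  //   projectB/notes.txt                 src/
  //                                    projectB/
  // Listing with prefix=projectA/ and delimiter=/ asks S3 to report direct
  // child objects as <Contents> and sub-"folders" as <CommonPrefixes>.
  var xhr = new XMLHttpRequest();
  xhr.open("GET", "http://s3.amazonaws.com/my-bucket/?prefix=projectA/&delimiter=/");
  xhr.onload = function() {
      var doc = xhr.responseXML;
      var files = doc.getElementsByTagName("Contents");         // direct child objects
      var folders = doc.getElementsByTagName("CommonPrefixes"); // direct child "folders"
      console.log(files.length + " file(s), " + folders.length + " folder(s) under projectA/");
  };
  xhr.send();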

Extending Orion's File Support

To connect S3 as an Orion filesystem, we need to write a plugin that contributes an implementation of a service named orion.core.file. Various parts of the Orion client UI delegate to implementors of this service. For example, the Orion navigator relies on the fetchChildren() operation, so implementing it immediately allows the navigator to descend into your filesystem and display child items as nodes in the navigation tree. Implementing read() and write() allows the Orion editor to open and save files that live in your filesystem. There's also file and folder creation, move, copy, rename, delete, etc. The more of these features you implement, the smoother the integration into Orion will be.

  fetchChildren: function(location)
  loadWorkspace: function(location)
  createFolder: function(parentLocation, folderName)
  createFile: function(parentLocation, fileName)
  deleteFile: function(location)
  moveFile: function(sourceLocation, targetLocation, name)
  copyFile: function(sourceLocation, targetLocation)
  read: function(location, isMetadata)
  write: function(location, contents, args)
  remoteImport: function(targetLocation, options)
  remoteExport: function(sourceLocation, options)
  search: function(location, query)
List of orion.core.file API methods.
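
To give a feel for the shape of an implementation, here's a minimal sketch of a file-service plugin. It assumes the eclipse.PluginProvider / registerServiceProvider plugin API of this era of Orion; the service properties, placeholder return values, and bucket URL are illustrative rather than the S3 plugin's actual code.

  var serviceImpl = {
      fetchChildren: function(location) {
          // A real implementation would list the bucket with ?prefix=...&delimiter=/
          // and map the response to Orion child-metadata objects like this one.
          return [{ Name: "example.txt", Location: location + "example.txt", Directory: false }];
      },
      read: function(location, isMetadata) {
          // A real implementation would GET the object (or its metadata) from S3.
          return isMetadata ? { Name: "example.txt", Location: location, Directory: false } : "file contents";
      },
      write: function(location, contents, args) {
          // A real implementation would PUT the new contents back to the object.
          return contents;
      }
      // createFolder, createFile, deleteFile, moveFile, copyFile, ... likewise
  };

  var provider = new eclipse.PluginProvider();
  provider.registerServiceProvider("orion.core.file", serviceImpl, {
      Name: "S3 File System",                    // label shown in the Orion UI
      top: "http://s3.amazonaws.com/my-bucket/"  // illustrative root location for the filesystem
  });
  provider.connect();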

Domain Topology

For maximum portability, we want our implementation of the FileClient API to live purely on the client side: no server-side proxies or other hacks allowed. This means all our logic for talking to S3 will run as client-side JavaScript in a browser. Our code will make Ajax requests using the S3 REST API to manipulate S3 objects.

This imposes one important limitation: the web browser's same origin policy. Essentially the plugin (web page) hosting our S3 file service must have the same origin (protocol + host + port) as the target of all its Ajax requests. So does S3 let us configure a bucket to satisfy this limitation? The answer appears to be no. You can't have, say, a particular path prefix in your bucket configured to host a static website, while other prefixes respond to S3 API calls. A bucket can be configured either to host a static website, or as an S3 endpoint, but not both.

OK. Can we instead use two separate buckets to overcome this limitation? Yes! Conveniently, S3 allows you to access objects using 3 different URL styles:

  1. http://s3.amazonaws.com/bucket/key
  2. http://bucket.s3.amazonaws.com/key
  3. http://bucket/key
    (where bucket is a DNS CNAME record pointing to bucket.s3.amazonaws.com)

Notice that while URLs #2 and #3 have the bucket name as part of the host name (hence tying the origin to a single bucket), URL #1 does not. Using the style of URL #1, our plugin can be hosted from one bucket (configured as a publicly-readable static website), and still be free to blast Ajax REST requests at a second bucket (configured as a restricted-access S3 endpoint). Both buckets are accessed from the same domain, s3.amazonaws.com, so the browser won't complain about same-origin violations. Below is a diagram showing how this arrangement works out:

High-level view of the architecture.

In the diagram above, Bucket 1 is publicly readable, so the Orion plugin registry can access and load the plugin without worrying about how to authenticate itself. By contrast, access to Bucket 2 is restricted to just our S3 user account, so the Ajax requests sent its way must be authenticated. The next section explains how. (Also note: I've drawn the Orion client UI in red since accessing it usually requires some kind of authentication. That's not a requirement, however: you can deploy Orion in all sorts of less-secure ways.)
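
In code terms, the same-origin trick simply means always building path-style (#1) URLs. The bucket names and file paths below are hypothetical:

  // Bucket 1: publicly readable, hosts the plugin page itself, which Orion loads from
  //   http://s3.amazonaws.com/my-orion-plugin/s3FilePlugin.html
  // Bucket 2: restricted-access, holds the workspace files; the plugin targets it with
  //   Ajax requests whose URLs share the same origin (http://s3.amazonaws.com):
  function objectUrl(bucket, key) {
      return "http://s3.amazonaws.com/" + bucket + "/" + key;  // path-style URL (#1)
  }
  var fileUrl = objectUrl("my-orion-files", "projectA/readme.txt");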

Security

Keeping all the code on the client is convenient, but presents some security challenges. To modify the contents of Bucket 2, all our REST requests must be properly authenticated. Authentication requires three pieces of information: the request, our Access Key (public key), and our Secret Access Key (private key).
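
For the curious, here's roughly what signing looks like under S3's (then-current) Signature Version 2 scheme, assuming a library such as CryptoJS for the HMAC-SHA1 step. The bucket, key, and variable names are hypothetical; this is a sketch of the scheme, not the plugin's exact code.

  function signRequest(method, bucket, key, amzDate, accessKey, secretKey) {
      // StringToSign for S3 Signature Version 2:
      //   HTTP-Verb \n Content-MD5 \n Content-Type \n Date \n
      //   CanonicalizedAmzHeaders CanonicalizedResource
      // We send the date as x-amz-date (browsers won't let us set the Date header),
      // so the Date line is empty and x-amz-date appears among the amz headers instead.
      var stringToSign =
          method + "\n" +
          "\n" +                              // Content-MD5 (empty here)
          "\n" +                              // Content-Type (empty here)
          "\n" +                              // Date (empty; x-amz-date used instead)
          "x-amz-date:" + amzDate + "\n" +    // CanonicalizedAmzHeaders
          "/" + bucket + "/" + key;           // CanonicalizedResource
      var signature = CryptoJS.enc.Base64.stringify(
          CryptoJS.HmacSHA1(stringToSign, secretKey));
      return "AWS " + accessKey + ":" + signature;   // Authorization header value
  }

  // Hypothetical usage against the private data bucket:
  var amzDate = new Date().toUTCString();
  var xhr = new XMLHttpRequest();
  xhr.open("GET", "http://s3.amazonaws.com/my-orion-files/projectA/readme.txt");
  xhr.setRequestHeader("x-amz-date", amzDate);
  xhr.setRequestHeader("Authorization",
      signRequest("GET", "my-orion-files", "projectA/readme.txt", amzDate, accessKey, secretKey));
  xhr.send();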

So the plugin needs to know how to authenticate requests, and to authenticate a request it needs your private key. This exposes a limitation in the existing Orion plugin/service framework: there's no secure way for a plugin to obtain that key. (More generally, there's no way for a plugin to request access to a user setting from the Orion core… but that's a larger issue.)

My somewhat-inadvisable temporary solution is this: when the plugin requires your keys, it simply opens a JavaScript prompt asking you to paste them in. It also asks for a "passphrase", which it uses to encrypt the keys. The encrypted keys are kept in localStorage, which saves you the trouble of typing them in every time. The passphrase is not persisted, and gets squirreled away inside a closure variable, which vanishes when you close the browser tab.

The upshot is this: if you close and reopen your browser, the plugin will find the cached keys in local storage, and prompt only for the passphrase so it can decrypt the keys and carry on.
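
Here's an approximate sketch of that flow, again assuming CryptoJS (this time for AES). The localStorage key name and the prompt strings are made up for illustration; the plugin's real code differs.

  var getCredentials = (function() {
      var passphrase = null;   // lives only in this closure; gone when the tab closes
      return function() {
          if (!passphrase) {
              passphrase = window.prompt("Enter your passphrase:");
          }
          var cached = localStorage.getItem("s3.keys");
          if (!cached) {
              // First run: ask for the keys, encrypt them with the passphrase, cache them.
              var keys = {
                  accessKey: window.prompt("Paste your AWS Access Key:"),
                  secretKey: window.prompt("Paste your AWS Secret Access Key:")
              };
              localStorage.setItem("s3.keys",
                  CryptoJS.AES.encrypt(JSON.stringify(keys), passphrase).toString());
              return keys;
          }
          // Later runs: decrypt the cached keys with the passphrase.
          var plaintext = CryptoJS.AES.decrypt(cached, passphrase).toString(CryptoJS.enc.Utf8);
          return JSON.parse(plaintext);
      };
  }());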

Now, obviously asking people to paste secret information into a web page is quite dubious. On top of that, storing sensitive information (albeit encrypted) in a localStorage database for a domain that you don't fully control (here, s3.amazonaws.com) is also a very bad idea. Finally, usability suffers because the keys are tied to your browser, not your Orion user settings. (Hence, if you use a different web browser, you'll have to paste in your keys all over again.) So please regard this as a starting point for future work around security and authentication in Orion, not a realistic long-term solution!

The End Result

If the disclaimer in the previous section hasn't scared you off, you can set up the plugin (see its GitHub readme for installation instructions). It lets you do basic file operations against your S3 bucket, and performs pretty well.

Left: Orion showing our plugin in action. Right: The S3 console showing the same contents.

Future Directions

Storing any sensitive data in localStorage is not a good approach. The naive solution might be to move your private credentials into the Orion preference store, where access to them could at least be managed somehow (perhaps on a per-plugin basis). But while security is one of Orion's concerns, becoming a secure-storage provider is not. In fact, it's probably better to compartmentalize sensitive information so that the Orion core never actually sees your private credentials.

So how do we accomplish that? One approach would be for the Orion core to simply establish a channel between a signing requester and a signing provider. The requester in this case would be our S3 plugin, and the provider could be any Orion-connected entity that possesses your S3 credentials and knows how to use them to sign a given request. (We can imagine a provider linked to a secure, trusted third-party service running a cloud deployment of something like KeePass, or even a private provider that is only accessible from your local network.)

Once the provider–requester channel is established, these pieces talk only to each other. Most importantly, your private key never has to leave the signing provider. This all sounds a bit hand-wavy right now, but remember that Orion lives on the web, where all kinds of crazy ideas are possible.

See Also