A State of Limbo: The HTML5 File API, FileReader, and Blobs

HTML5, the emerging set of web standards specified by the W3C, presents many powerful new extensions to how the internet is accessed, displayed, and interacted with by browsers. One of the most interesting capabilities within the specification is the File API, which gives users a richer experience by removing some hurdles from moving files between one's own computer and the internet. One such hurdle is the inability of current browsers to handle dragging and dropping files straight into the browser window -- something that is tremendously useful for online photo editors, document managers, and email applications. Thankfully, with the new standards, there is a way to handle these drag events and read the files.

Drag and Drop in Gmail

Gmail has recently incorporated this in order to allow people to drag and drop attachments onto messages (even multiple messages):

image

In short, the drop event now has a property called dataTransfer, which itself has a property called files, which returns a list of the files that were dropped. (This works in parallel with the traditional way of selecting files, through a dialog.) The full web application file tutorial can be found at the Mozilla Developer website.

Once the handle to a particular file is created, it is trivial to read out things like its name, size, and content type. It is also quite easy to upload the file through a POST request. A more detailed description of how Gmail in particular handles uploading is also available.
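As a rough illustration of the drop handling and metadata access described above, here is a minimal TypeScript sketch. The names `FileLike`, `DropEventLike`, and `describeDroppedFiles` are my own and not part of any API; the two interfaces are stand-ins that only mirror the parts of the browser's drop event and File objects used here, so the sketch can run outside a browser too.

```typescript
// Minimal structural types (hypothetical names) so the sketch type-checks
// without the DOM library; a browser's drop event and File objects have
// these same members.
interface FileLike {
  name: string;
  size: number; // in bytes
  type: string; // MIME type; may be "" when the browser cannot tell
}

interface DropEventLike {
  preventDefault(): void;
  dataTransfer: { files: ArrayLike<FileLike> } | null;
}

// Collects the dropped files and describes each one by name, size, and type.
function describeDroppedFiles(event: DropEventLike): string[] {
  // Stop the browser from navigating to the dropped file.
  event.preventDefault();
  const files: ArrayLike<FileLike> = event.dataTransfer ? event.dataTransfer.files : [];
  const lines: string[] = [];
  for (let i = 0; i < files.length; i++) {
    const f = files[i];
    lines.push(`${f.name} (${f.size} bytes, ${f.type || "unknown type"})`);
  }
  return lines;
}

// In a browser this might be wired up as follows (`dropZone` is a
// hypothetical element). The `dragover` event must also be cancelled,
// otherwise the `drop` event never fires:
//   dropZone.addEventListener("dragover", (e) => e.preventDefault());
//   dropZone.addEventListener("drop", (e) => console.log(describeDroppedFiles(e)));
```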

What else is possible with files and HTML5 besides drag and drop?

This is where the new File API really shines (at least in the specification). Once this file handle is available, a new class called FileReader allows the file to be read into memory asynchronously: once the file is selected, it loads into memory as binary data (or as a string, or as a data URI) in the background, without holding up anything else, and an event fires once the file is in memory. From there, it is possible to POST the contents to a server to complete the upload. But that isn't really that cool.
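The asynchronous read described above can be sketched roughly like this. `readText` and `ReaderLike` are hypothetical names of mine; `ReaderLike` only mirrors the FileReader members the sketch touches (`readAsText`, `onload`, `onerror`, `result`), and the reader is passed in as a parameter so that the same function works with a real `FileReader` in a browser or with a stand-in elsewhere.

```typescript
// Structural stand-in (hypothetical name) for the parts of FileReader used
// below, so the sketch type-checks without the DOM library.
interface ReaderLike {
  result: unknown;
  onload: ((event: any) => any) | null;
  onerror: ((event: any) => any) | null;
  readAsText(blob: any): void;
}

// Starts reading `blob` as text and returns right away; exactly one of the
// two callbacks is invoked later, once the read has finished or failed.
function readText(
  blob: unknown,
  reader: ReaderLike,
  onDone: (text: string) => void,
  onFail: (reason: string) => void
): void {
  reader.onload = () => onDone(String(reader.result));
  reader.onerror = () => onFail("could not read the file");
  reader.readAsText(blob); // non-blocking: the handlers above fire afterwards
}

// In a browser, with `file` taken from a drop event or a file dialog:
//   readText(file, new FileReader(),
//            (text) => console.log(text.length + " characters read"),
//            (reason) => console.error(reason));
```

The caller never waits on the read itself, which is what keeps the page responsive while a large file loads.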

What is truly novel is the Blob interface, which represents a slice of a file. File is built on top of Blob: it is simply a Blob with additional attributes such as name and content type. Thus, instead of having to read the entire file into memory and then send the entire thing (which takes a long time for a large image, video, archive, etc.), it becomes possible to read just a small part of the file into memory and deal with only that small bit at a time. The method is called just that, slice, and it takes just two parameters: the starting offset and the length to be read from the file (as drafted at the time of writing; the API as later standardized takes a start and an end offset instead). Once a blob or file is defined, it can then be read into memory using FileReader. Here is an overview of this process:

image
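To make the slicing step concrete, here is a small sketch of cutting a file into fixed-size chunks. It assumes the `slice(start, end)` signature as later standardized, where `end` is exclusive, rather than the offset-and-length form described in the text. `SliceableLike`, `chunkRanges`, and `sliceIntoChunks` are hypothetical names; the interface is a stand-in that a browser Blob or File would satisfy.

```typescript
// Structural stand-in (hypothetical name) for Blob/File so the sketch
// type-checks without the DOM library.
interface SliceableLike {
  size: number;
  slice(start: number, end: number): SliceableLike;
}

// Computes the [start, end) byte offsets that cover `size` bytes in
// chunks of at most `chunkSize` bytes each.
function chunkRanges(size: number, chunkSize: number): Array<[number, number]> {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < size; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, size)]);
  }
  return ranges;
}

// Produces one slice per range. With a real Blob, slicing only creates
// references into the file; no bytes are read until a slice is handed
// to a reader.
function sliceIntoChunks(blob: SliceableLike, chunkSize: number): SliceableLike[] {
  return chunkRanges(blob.size, chunkSize).map(([start, end]) => blob.slice(start, end));
}
```

For a 10-byte file and a chunk size of 4, `chunkRanges` yields the ranges 0-4, 4-8, and 8-10, so the last chunk is simply shorter than the others.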

What are some benefits of this?

  1. File uploads can be resumed
  2. Memory use is drastically lower -- it is even conceivable to upload files larger than 2GB
  3. Using the new HTML5 Web Workers interface, it will be possible to parallelize reading a chunk from the file into memory and sending it from memory to the server.
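The first two benefits mostly come down to bookkeeping: remember how many bytes the server has confirmed, and only ever hold one chunk at a time. Here is a minimal sketch of that idea. `ByteRange`, `nextRange`, and `uploadFrom` are hypothetical names, and `send` is a placeholder for whatever network call actually transmits a chunk; it is synchronous here only to keep the sketch short, whereas a real one would be asynchronous.

```typescript
// A half-open byte range: `start` is inclusive, `end` is exclusive.
interface ByteRange {
  start: number;
  end: number;
}

// Returns the next range to send, or null once everything is confirmed.
function nextRange(totalSize: number, confirmedBytes: number, chunkSize: number): ByteRange | null {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  if (confirmedBytes >= totalSize) {
    return null;
  }
  return { start: confirmedBytes, end: Math.min(confirmedBytes + chunkSize, totalSize) };
}

// Sends chunks starting at `confirmedBytes` and returns the offset reached.
// `send` is a hypothetical callback that returns true when the chunk was
// accepted; on failure the loop stops and the returned offset is the
// place to resume from later.
function uploadFrom(
  totalSize: number,
  confirmedBytes: number,
  chunkSize: number,
  send: (range: ByteRange) => boolean
): number {
  let offset = confirmedBytes;
  let range = nextRange(totalSize, offset, chunkSize);
  while (range !== null) {
    if (!send(range)) {
      break;
    }
    offset = range.end;
    range = nextRange(totalSize, offset, chunkSize);
  }
  return offset;
}
```

If an upload of a 10-byte file in 4-byte chunks fails on the second chunk, `uploadFrom` returns 4, and calling it again with that offset picks up at byte 4 instead of starting over.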

Here lies the problem:

While all of this looks great in the specification, no currently released browser implements enough of HTML5 to perform file chunking the way the new APIs intend. Ironically, Firefox 3.6.4 has implemented the FileReader interface without implementing the Blob interface that underlies it, while Chrome 5.0 has an implementation of Blob but no FileReader, which is kind of silly because you can't do much with those blobs. So in the end, one browser can define subsets of files, while another can actually read them into memory.

Chrome 6 to the rescue

Striving to remedy this situation, I downloaded the latest beta version of the Chrome browser, 6.0.437.3, to see what's really good.

Thankfully, this version has both Blob and FileReader implemented (although not without some small glitches and non-conformance to the specification), which allows for really clean chunked file access.

Update: Firefox, Chrome, and other browsers now have better support for the APIs mentioned above.