state transitions: simple and complex

2011-03-09 @ 08:55

as i work through the final edits on my ALPS project, i am reminded of a number of "media-type design" aspects worth discussing. in this post, i'll concentrate on just one of those items: state transition "types."

yeah, this seems kinda boring, but i think it is important. as i explore the ways in which media type definitions (fail to) support expressing hypermedia factors, i find i am spending more time understanding the role that hypermedia plays in enabling state transitions.

it's easy to get bogged down in what state transitions "are", how they "work", and their role in hypermedia and the Web in general. i'm going to skip over all that here [grin]. instead, i want to focus on some of the concrete details of state transitions for hypermedia and why i think identifying them is critical for building robust distributed applications.

a state of mind

in general, one of the tasks in designing a media type (or a semantic profile for an existing media type) is to create ways to express state transitions within the responses. so much work today fails to account for this important task. instead, most Web implementations simply serialize internal objects in a common format (XML, JSON, etc.) and the result is an "empty" response: empty in the sense that no hypermedia information is included within the response, only raw data. that means it is up to client code to determine the next transition to "express" for the human (as UI elements). this leads to a set of responses that are 'captive' to the client to which they are sent; it leads to 'dumb' data for 'smart' clients.

however, in order to implement Web solutions that do not rely on smart clients (usually in the form of custom Javascript, etc.) to "know" which step to take next, standard state transition information (in the form of hypermedia elements) must be designed into the media type used in responses. for example, HTML has a very simple set of state transition controls that every Web developer knows well: the A element and the FORM element. and these two elements represent the two types of state transitions with which all media type designs must contend: simple and complex. the very topic i want to discuss (how convenient!).

it is worth noting that the mere appearance of A and FORM elements in a response does not ensure that the representation is using them for state-transition purposes. that's another topic which i will avoid here. for now, i will address the ways in which media types provide state transition elements, not how they are actually used in representations.

simple state transitions

simple state transitions are those that require no additional inputs from the user agent; they can simply be "activated" in order to change the current application state.

by "current application state" i mean the condition of the application currently experienced by the user. IOW, application state (in the way i am using the phrase) is the temporal state as viewed from the "client" or user agent.

the HTML A (anchor) element supports simple state transitions. the transition could be navigating to a new "page" (<a href="/home.html">Home</a>) or even executing a pre-defined filtering operation on the server (<a href="/clients?balance=unpaid">Clients w/ outstanding balances</a>).
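to make the "simple" case concrete, here's a minimal sketch (in Python, using only the standard library's html.parser) of a user agent collecting the simple transitions an HTML response offers. the AnchorCollector class is a hypothetical name used for illustration; it just gathers each A element's href and label, which is all a client needs in order to "activate" one of them.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class AnchorCollector(HTMLParser):
    """Gathers (href, label) pairs for each A element in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        # only anchors that actually carry an href are treated as transitions
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self._href = href
            self._text = []

    def handle_data(self, data):
        # accumulate the anchor's label while we are inside an A element
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


page = """
<p><a href="/home.html">Home</a></p>
<p><a href="/clients?balance=unpaid">Clients w/ outstanding balances</a></p>
"""

collector = AnchorCollector()
collector.feed(page)
collector.close()
for href, label in collector.links:
    # activating any of these is just an HTTP GET on href; no inputs needed
    print(f"{label} -> {href}")
```

note that the client never has to know in advance what "/clients?balance=unpaid" does; the filter is pre-defined by the server and simply offered in the response.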

since simple state transitions carry no additional inputs, this often means that the transition is a "read-only" operation. but this need not always be the case. for example, HTTP DELETE requires no additional inputs and HTTP POST is valid with an empty entity body. however, the HTML media type does not support these kinds of simple transitions w/o resorting to javascripted responses (i.e. Code-On-Demand). other media types have solved this problem, but i'm getting ahead of myself a bit.

so, there will be times when simple state transitions must be represented in server responses. if the response is made using HTML, this can be done using the A (anchor) tag. if, however, the response is made using some other base format (e.g. XML, JSON, etc.) then some other element must be designated to handle the task of representing simple state transitions.
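for a JSON response, one way to designate "some other element" is a links array in which each entry carries a URL and, optionally, an HTTP method. the sketch below assumes exactly that; the links/rel/href/method property names, the rel values and the example.org base URL are all hypothetical. note that the DELETE entry is still a simple transition: the client supplies no inputs, it just activates what the response describes.

```python
import json
from typing import Dict
from urllib.parse import urljoin
from urllib.request import Request

# hypothetical JSON representation; "links" is a design choice of this
# sketch, not something JSON itself defines
RESPONSE = """
{
  "client": {"name": "Acme", "balance": 120.0},
  "links": [
    {"rel": "self", "href": "/clients/42"},
    {"rel": "unpaid", "href": "/clients?balance=unpaid"},
    {"rel": "remove", "href": "/clients/42", "method": "DELETE"}
  ]
}
"""


def simple_transitions(doc: dict) -> Dict[str, dict]:
    """Index the designated link elements by their rel value."""
    return {link["rel"]: link for link in doc.get("links", [])}


def activate(link: dict, base: str) -> Request:
    """Build (but do not send) the request for a simple transition.

    Everything comes from the response itself; GET is assumed when the
    link does not name a method.
    """
    return Request(urljoin(base, link["href"]), method=link.get("method", "GET"))


links = simple_transitions(json.loads(RESPONSE))
req = activate(links["remove"], "http://example.org/")
print(req.get_method(), req.full_url)
```

handing req to urllib.request.urlopen would actually send it; that step is left out so the sketch runs without a server.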

complex state transitions

complex state transitions are those that require additional inputs from the user agent. using HTML (again), the FORM element is used to represent complex state transitions; it indicates one or more additional inputs that [may|should|must] be supplied when activating the transition.

for example, a "search box" on an HTML page is a complex state transition, usually requiring a single additional input: the search value. HTML FORMs that prompt humans to enter text for posting on a micro-blog site (twitter, etc.) are also examples of complex state transitions.
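the difference shows up as soon as a client has to fill something in. here's a minimal sketch, again using Python's html.parser, of a client reading a search FORM (its action, method and named inputs) and assembling the URL to request. the FormReader and build_query names are hypothetical, and the sketch simplifies things by treating every named input as required and by handling only method="get".

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urlencode, urljoin


class FormReader(HTMLParser):
    """Reads the first FORM element: its action, method and named inputs."""

    def __init__(self) -> None:
        super().__init__()
        self.action: Optional[str] = None
        self.method = "get"
        self.fields: List[str] = []
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = a.get("action") or ""
            self.method = (a.get("method") or "get").lower()
        elif tag == "input" and self._in_form and a.get("name"):
            # unnamed inputs (e.g. a submit button) contribute nothing
            self.fields.append(a["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_query(form: FormReader, base: str, values: Dict[str, str]) -> str:
    """Assemble the URL requested when this GET form is submitted."""
    if form.method != "get":
        raise NotImplementedError("this sketch only assembles GET forms")
    missing = [name for name in form.fields if name not in values]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    query = urlencode({name: values[name] for name in form.fields})
    return urljoin(base, form.action or "") + "?" + query


page = """
<form action="/search" method="get">
  <input type="text" name="q" />
  <input type="submit" value="Search" />
</form>
"""

reader = FormReader()
reader.feed(page)
reader.close()
print(reader.fields)
print(build_query(reader, "http://example.org/", {"q": "hypermedia types"}))
```

the client did not need to know the input was called "q"; it learned that from the FORM in the response.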

note that the examples cited here represent both "read" operations (search) and "write" operations (posting to a micro-blog site).

again, when designing a media type (or a hypermedia profile for an existing media type) the concept of complex state transitions needs to be addressed. it's possible, of course, that some use cases require no complex state transitions (e.g. a web spider that just pulls down pages and indexes them). but any site that expects user agents to supply inputs will need to document the details of complex state transitions.

up to this point, as i describe these two types of transitions, i've used HTML for all my examples. but HTML is not the only hypermedia type in use today. how, you might ask, do other media types handle state transition details?

Atom and state transitions

the Atom format uses <link ... /> elements for state transitions. however, Atom has no native equivalent of HTML's FORM element. in fact, most transitions in Atom are of the "simple" type. the documentation for Atom identifies whether the <link ... /> element is "simple" or "complex" by delineating the meaning of the link's @rel attribute.

for example, RFC4287 documents the following rel values as indicating a simple state transition: alternate, enclosure, related, self, and via. in addition, RFC5023 documents a number of additional link relations; some of them indicate complex state transitions: edit and edit-media. there are also instructions on how to use HTTP POST against the collection URI, along w/ a few more simple transition rels for paging long lists.
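since Atom puts both kinds of transition behind one element, a client has to sort them by @rel. here's a rough sketch using Python's xml.etree.ElementTree; the SIMPLE_RELS and COMPLEX_RELS sets simply encode the lists given above (they are not an exhaustive registry), and anything else is reported as "unknown" rather than guessed at. the one rule taken directly from RFC4287 is that a link with no @rel is treated as "alternate".

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

ATOM = "{http://www.w3.org/2005/Atom}"

# these sets encode the rel lists discussed above; they are not exhaustive
SIMPLE_RELS = {"alternate", "enclosure", "related", "self", "via"}
COMPLEX_RELS = {"edit", "edit-media"}


def classify_links(atom_xml: str) -> List[Tuple[str, Optional[str], str]]:
    """Return (rel, href, kind) for each atom:link in the document."""
    root = ET.fromstring(atom_xml)
    result = []
    for link in root.iter(ATOM + "link"):
        # RFC 4287: a link without a rel attribute means "alternate"
        rel = link.get("rel", "alternate")
        if rel in SIMPLE_RELS:
            kind = "simple"
        elif rel in COMPLEX_RELS:
            kind = "complex"
        else:
            kind = "unknown"
        result.append((rel, link.get("href"), kind))
    return result


ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>hello</title>
  <link href="http://example.org/entries/1"/>
  <link rel="edit" href="http://example.org/entries/1"/>
  <link rel="edit-media" href="http://example.org/media/1"/>
  <link rel="license" href="http://example.org/license"/>
</entry>"""

for rel, href, kind in classify_links(ENTRY):
    print(f"{kind:8} {rel:11} {href}")
```

the rel tells the client *that* a complex transition exists; what to send when activating it still has to come from somewhere else.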

the exact details for how to "craft" complex transitions for the edit @rel and the collection URI are not spelled out in the document. however, there are rules on what constitutes a valid entry item, and most Atom user agents simply hard-code these rules into the client application and supply the resulting conforming XML block when activating these complex state transitions. this is quite different from the HTML media type, which provides markup for representing any expected input elements for complex transitions directly in the response.
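to see what that hard-coding looks like, here's a sketch of the kind of rules an Atom client typically carries around. RFC4287 requires an atom:entry to have an id, a title and an updated timestamp (and an author, unless the feed supplies one), and RFC5023 clients POST the result to the collection URI using the application/atom+xml;type=entry media type. nothing in the server's response tells the client any of this. the collection URI, the author name and the function names are hypothetical, and the request is built but not sent.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.request import Request

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def q(name: str) -> str:
    """Qualify a local name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{name}"


def build_entry(title: str, text: str, author: str) -> bytes:
    """Hard-coded knowledge of what a valid atom:entry needs (RFC 4287)."""
    entry = ET.Element(q("entry"))
    ET.SubElement(entry, q("id")).text = f"urn:uuid:{uuid.uuid4()}"
    ET.SubElement(entry, q("title")).text = title
    ET.SubElement(entry, q("updated")).text = (
        datetime.now(timezone.utc).isoformat(timespec="seconds")
    )
    author_el = ET.SubElement(entry, q("author"))
    ET.SubElement(author_el, q("name")).text = author
    ET.SubElement(entry, q("content"), type="text").text = text
    return ET.tostring(entry, encoding="utf-8")


def post_entry(collection_uri: str, body: bytes) -> Request:
    """Build (but do not send) the POST that creates a member (RFC 5023)."""
    return Request(
        collection_uri,
        data=body,
        method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )


body = build_entry("first post", "hello, hypermedia", "alice")
req = post_entry("http://example.org/collections/posts", body)
print(req.get_method(), req.full_url)
print(body.decode("utf-8"))
```

if a server started requiring, say, a category on every entry, this client would keep sending entries the server rejects until someone edited build_entry and re-deployed it.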

troublesome transitions...

you can see that HTML and Atom approached the challenge of representing state transitions in responses in different ways. HTML identifies different elements for each type of state transition. Atom uses the same element, but expects clients to inspect the @rel attribute to know whether the element represents a simple or complex state transition. HTML uses elements within the response to provide detailed instructions on how to assemble a valid complex state transition. Atom relies on printed documentation to describe valid complex transitions.

this leads to some key questions you should ask yourself. for example, when representing responses using media types that contain no native state transition controls (XML, JSON, etc.), how will you identify simple and complex state transitions? how will you communicate to user agents that one or more state transitions are possible (e.g. search, pre-defined filters, creating new entities on the server, editing or deleting existing entities, etc.)?
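one possible answer, sketched below: a hypothetical JSON profile that keeps simple transitions in a "links" array and complex ones in a "forms" array, with each form listing its inputs. none of these property names come from JSON or from any published spec; they are assumptions for illustration. the point is that the client learns what is possible right now by reading the response, not by consulting prose.

```python
import json
from typing import Dict, List

# hypothetical profile: "links" = simple transitions, "forms" = complex ones
RESPONSE = """
{
  "items": [{"id": 7, "text": "first post"}],
  "links": [
    {"rel": "self", "href": "/posts"},
    {"rel": "next", "href": "/posts?page=2"}
  ],
  "forms": [
    {"rel": "search", "href": "/posts", "method": "GET",
     "inputs": [{"name": "q", "required": true}]},
    {"rel": "create", "href": "/posts", "method": "POST",
     "inputs": [{"name": "text", "required": true},
                {"name": "tags", "required": false}]}
  ]
}
"""


def list_transitions(doc: dict) -> Dict[str, List[str]]:
    """Group the transitions this response advertises by type."""
    return {
        "simple": [link["rel"] for link in doc.get("links", [])],
        "complex": [form["rel"] for form in doc.get("forms", [])],
    }


def inputs_for(doc: dict, rel: str) -> List[str]:
    """Name the inputs a complex transition expects (empty if not offered)."""
    for form in doc.get("forms", []):
        if form["rel"] == rel:
            return [field["name"] for field in form.get("inputs", [])]
    return []


doc = json.loads(RESPONSE)
print(list_transitions(doc))
print(inputs_for(doc, "create"))
```

if the server later drops "create" (say, for a read-only user) or adds an input, a client written this way sees the change in the next response instead of acting on stale assumptions.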

as i mentioned at the start of this (now too long) post, i see too many cases where those responsible for designing response representations do not take the time to deal w/ these important issues. instead, the common practice is to write some prose in a document and expect user agents to just "know" not only what is possible, but also when it is valid to execute one of several pre-defined state transitions (both simple and complex). this can work if you limit your client-server interactions to mere CRUD operations. but many implementations need more than a CRUD-y interface.

ignoring state transition representation is a mistake that leads to "empty" responses and brittle clients that are needlessly coupled to particular server implementations. any time a server decides to "change the rules" about the number of possible state transitions, the instances when any state transition is valid, and/or the input elements that make up a valid complex state transition, clients run the risk of making mistakes: confusing the results of simple transitions, crafting invalid complex transitions, etc. in some cases this means the client is "broken" and must be re-coded and re-deployed. in other cases, it means there is a possibility that existing clients may actually corrupt data on servers by executing state transitions improperly or "out of order."

these issues can, for the most part, be avoided w/ proper hypermedia designs and proper user agent application coding. the common Web browser has been dealing with these issues for years. there is no magic bullet, and the browser does not have a monopoly on intelligent hypermedia agent implementation. everyone writing client applications for the Web should be doing essentially the same thing.

...and a challenge

so my challenge to all you Web developers out there is to "step up your game."

those responsible for writing server implementations need to start crafting hypermedia responses that include clearly identified state transitions. they should be consistent in how they describe and represent complex transitions so that clients can be coded accordingly.

and those who are writing clients should demand responses that contain well-defined state transitions, ones that user agents can recognize and understand. in cases where complex transitions are needed, clients should be able to identify them, locate any input elements, and have clear direction on how to assemble and send valid entities for these transitions.
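on the client side, "locate any input elements and assemble a valid entity" might look something like the following sketch. it uses a hypothetical JSON response with a "forms" array (included inline so the block stands alone), refuses to activate a transition the current response doesn't offer, checks required inputs, and sends only the inputs the form declares. the property names, the JSON request body for non-GET methods, and the example.org URL are all assumptions; the request is built but not sent.

```python
import json
from typing import Any, Dict
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# hypothetical response body, already parsed; "forms" is an assumed design
DOC: Dict[str, Any] = {
    "forms": [
        {"rel": "search", "href": "/posts", "method": "GET",
         "inputs": [{"name": "q", "required": True}]},
        {"rel": "create", "href": "/posts", "method": "POST",
         "inputs": [{"name": "text", "required": True},
                    {"name": "tags", "required": False}]},
    ]
}


def prepare(doc: Dict[str, Any], rel: str, values: Dict[str, str], base: str) -> Request:
    """Assemble (but do not send) a complex transition the response offers."""
    forms = {form["rel"]: form for form in doc.get("forms", [])}
    if rel not in forms:
        # not offered in the current state: don't guess, don't go "out of order"
        raise LookupError(f"transition {rel!r} is not available in this response")
    form = forms[rel]
    fields = form.get("inputs", [])
    missing = [f["name"] for f in fields if f.get("required") and f["name"] not in values]
    if missing:
        raise ValueError(f"missing required inputs: {missing}")
    # send only what the form declares; anything else the caller passed is dropped
    payload = {f["name"]: values[f["name"]] for f in fields if f["name"] in values}
    url = urljoin(base, form["href"])
    method = form.get("method", "GET").upper()
    if method == "GET":
        return Request(f"{url}?{urlencode(payload)}", method="GET")
    return Request(url, data=json.dumps(payload).encode("utf-8"), method=method,
                   headers={"Content-Type": "application/json"})


req = prepare(DOC, "create", {"text": "hello", "colour": "blue"}, "http://example.org/")
print(req.get_method(), req.full_url, req.data)
```

the design choice here is to fail loudly (LookupError, ValueError) before anything reaches the server, rather than sending a request the current state doesn't support.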

and, finally, anyone contemplating designing new media types should be building state transition representation into their designs. stop writing message formats devoid of hypermedia information and expecting servers and clients to work out all the transitions ahead of time and encase those transitions in hard-coded logic.

sure, this takes some work, but the solutions are out there. they may not be "simple" solutions, but when you get down to it there are viable alternatives to "empty" responses that are not really "complex."

see what i did there [grin]?