Discovering RESTful Web Microservices: A Traveler’s Guide

Overview

Status: Delivered 2018-5-15 at MicroCPH 2018, Copenhagen, Denmark
Slides: PDF
Prepared Talk: NA
Video: YouTube
Audio: NA
Transcript: HTML

Transcript (Raw)

Thanks. Good to see you. My name is Mike Amundsen. So, this is how you can find me on LinkedIn, and GitHub, and Twitter, and it’s probably the best place to look because I move so often. So I travel quite a bit, as you already heard. Last year it was 40 weeks on the road, which is a lot, so I’m cutting back to maybe 35.

But because I travel so much, honestly it’s sometimes hard for me to remember. So I need your help for just a moment. I need proof that I was here. If maybe you could just wave, say hello. All right, very good. All right, I promise that’s it. I’ll put that away. Okay. Then I can look at these later and say, oh I was…oh yeah, that’s where I was.

So, as you heard, I work in a group called API Academy. I’m just very honored to work in this group of really intelligent people who are scattered all around the world, and we get to talk to each other, we get to listen to customers, we get to do events like this. And we get together and deliver lots of content. This is our website, I just encourage you to go there. I won’t say anything more about it.

I will say that the last project our team worked on together, a little more than a year ago, was a book on microservices, and if you visit this link you can download a free eBook copy of it. I think we’re working on a new book right now on API management and the API life cycle, so hopefully that will come out at the end of this year as well. So it’s just a great group to be in.

So I thought I’d talk a little bit about the title of my talk, because I added these extra words to microservices, right: RESTful Web microservices. Why would I do that? Well, I add the first word because that’s me, right? I always add that word. But what does that really mean? So, you know, if we go back and look, gee whiz, Roy Fielding kinda seems to nail microservices in one sentence in his dissertation almost 20 years ago. Right? "We want generality of interfaces, independent deployment. We wanna be able to put in intermediaries." Gee, that seems pretty hip. Maybe he’s got some cool ideas on that score.

And then if you go back even further, like 30 years ago, and you look at what Tim Berners-Lee was talking about then, he says, "We want this universal information system that has this generality and portability as one of its primary elements." So, again, this is a lot of what we wanna talk about, right? I find that really handy, not just because it’s old, but because it means there are decades’ worth of material that we can learn from and put together.

And in fact, in some ways, we haven’t been really good at this. One of the things Tim Berners-Lee was great at was designing a system where we can arbitrarily link information and content. What we haven’t been able to figure out is how to arbitrarily link functionality. That’s a lot harder, right? So we’ve been building a lot of RPC-style systems and other things to handle the functionality part.

So the real challenge is: can we discover a way to do that same kind of arbitrary linking, where I don’t have to ask for permission, and have a meeting, and all these other things, in order to get two services working with each other? That’s what I’d like to talk about.

Now, a few years ago, I did a very short talk on mapping the API landscape, mapping this notion of how we can get things to connect up and work together. I think maps are a really good metaphor, a really good visual and mental image to think with. This is not a map, right? This is a list. These are maybe places to visit, maybe things to do, but I can’t tell what’s connected or what’s not connected, or what the relationships are. I can’t understand distance, how long it might take or how hard it might be.

This is closer. This is a little bit more like a map. It’s also pretty old school. At least I have distance and I have some relationships, but I can’t really tell what’s there. I can’t really tell why I would go to a particular place or what’s possible to do there. One of the things that maps have is symbols. Here’s a river, here’s a stream, here’s a form, here’s a link, here’s an action, here’s an image. There are all sorts of things that we can agree on ahead of time in order to use that map to solve something.

Another thing that’s amazing about maps is that maps don’t tell you what to do. They give you an opportunity to explore, and that’s another thing that we haven’t really been very good at. I mentioned I travel a lot. This is probably my most likely commuting vehicle; this is really what happens to me on Mondays and Fridays. But I also travel the network. I travel around. You know, I traverse these links and make these distance jumps. And I even add to the network: I contribute, I create new links, I build new material and create relationships.

And in fact, what I’m really doing when I do that is I’m actually programming the network itself. I’m actually programming part of the network, creating more connections, moving connections, updating them. And that’s really when we get beyond just a particular service. What we really wanna do, is we wanna program the network. And it turns out programming the network, as we heard a little bit, I think, yesterday, it’s kind of hard. It’s a little more difficult.

It’s not like programming one machine. With one machine, we know we’ve got a bounded space. Time and distance don’t really exist for us because it’s all in that one little space, and I can write whatever I want and hit F5 and it’s gonna work, and it’s gonna work fine. But the network is very different. There are lots of parts of the network I don’t control, lots of parts of the network I can’t easily see, and lots of parts of the network that may not be there tomorrow, or at least for a moment or two today. It’s much more challenging for us.

What I find interesting is that there’s a group of developers and tool makers who would like to make this disappear. They’d like to make us think that we’re still programming one machine. So the tools are all in one machine, and the experience is all in one machine, and everything else happens as if it’s all in one machine. I don’t think that’s a good idea. I think it’s a much better idea to say, "I’m gonna deal with all of that directly."

Now, we’ve seen this before, right: the fallacies of distributed computing. The network is reliable, latency is zero, all these things that we forget. We know them but we forget, because they’re not really tangible to us. What I want are tools that actually remind me of this every day. If you think about it, do we know Chaos Monkey and the Simian Army from Netflix? That’s what they do, right? They remind themselves every day. They create what Neal Ford calls fitness functions to test resilience, as if parts of the system have gone away, as if parts of the system are unreliable, or slow, or ineffective.
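The fitness-function idea can be illustrated with a small fault-injection test. This is a minimal Python sketch, not Netflix's tooling: `FlakyDependency`, `lookup_price`, and `price_with_fallback` are hypothetical names, and the 30% failure rate is an arbitrary assumption. The wrapper fails randomly (seeded, so runs are repeatable), and the fitness function checks that callers still get an answer while part of the system misbehaves.

```python
import random


class FlakyDependency:
    """Wraps a callable and randomly raises, simulating an unreliable network hop."""

    def __init__(self, func, failure_rate, seed=0):
        self.func = func
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded so each experiment is repeatable

    def __call__(self, *args):
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected fault")
        return self.func(*args)


def lookup_price(sku):
    # Hypothetical remote dependency; here just a local table.
    return {"apple": 3, "pear": 4}[sku]


def price_with_fallback(dependency, sku, cache):
    """Caller under test: serve a cached value when the dependency fails."""
    try:
        price = dependency(sku)
        cache[sku] = price
        return price
    except ConnectionError:
        return cache.get(sku)


def fitness_survives_faults(trials=1000, failure_rate=0.3):
    """Fitness function: with faults injected, every request still gets an answer."""
    flaky = FlakyDependency(lookup_price, failure_rate)
    cache = {"apple": 3, "pear": 4}  # assume a warm cache
    answers = [price_with_fallback(flaky, sku, cache)
               for sku in ["apple", "pear"] * (trials // 2)]
    return all(answer is not None for answer in answers)


print(fitness_survives_faults())  # True
```

The value comes from running a check like this routinely, so the team is reminded every day that the network is not reliable.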

That’s the place I wanna work. I don’t wanna work in a place that doesn’t have that, because then I’m gonna be in trouble. I really love this quote from Pat Helland: "There is no simultaneity at a distance." Yeah, right? I think we heard about this yesterday: everything’s in the past. Even my speech, my talking, is in the past. It’s a very small past, but it’s in the past. And understanding that, understanding distance, is really understanding time. Time as an architectural element can be incredibly powerful, because now I can use time, and I can manipulate time. I can take advantage of it, right?

Pat Helland, who worked at Microsoft and then went to Amazon, and I think is now at Salesforce, tells us all sorts of things about how time is an important element in the system. And I’m gonna borrow some stuff from Michael Nygard today. I love this quote from Michael: "Bugs will happen. They must be survived."

There’s actually this concept of Safety-I and Safety-II. Does anybody know this idea from Erik Hollnagel? Erik Hollnagel deals a lot with large, complicated systems, systems that are so complicated because they include people, not just machines. One of the things he talks about is this idea that Safety-I is removing all the mistakes, taking all the bugs out of the system, and then that’s the system you have.

Safety-II is adding resilience into the system so that even when the bugs happen it still works properly. And if you think about it, Safety-II is what we live every day, right? We buy a car, that’s a machine, right? But nobody can guarantee us that we’re not gonna get into an accident in that car, so we need Safety-II features like airbags, seat belts, crumple zones, and all these other things, right? Safety-II gets built in. So Safety-II is a really important element in all the services we build and all the things we do, because we’re on the network.

So I mentioned Michael. His book, Release It!, is fantastic. If you haven’t had a chance to pick it up, this is something you definitely should do. He’s got a second edition out, and it’s quite good. Michael talks about a lot of things in there. I’m gonna borrow a handful of things from his stability section, what I call the Nygard stability patterns, and we’ll recognize some of these, right? The timeout pattern, which basically says, I’ve given you enough time, thanks anyway.

Circuit breaker, which means you’re a little flaky. I’m gonna give you a break and then maybe try again. Bulkhead which basically says I’m having a problem, but I don’t wanna affect everyone else in the room. Steady state, I love this one. John Allspaw had a great way of describing this. Consider your system, whatever system you’re working on, and then restart it nice and fresh, and then walk away. How long will it run if nobody ever intervenes? If nobody runs a backup, if nobody restarts the machine?

Steady state is that ability to keep maintaining your state over a period of time. So logs, and log shipping, and restarts, and all these other things are part of maintaining steady state. Fail fast, I love this one. Netflix uses this one a lot as well. That is, look, you sent me a request, and it came with a budget for how long it should take. But it turns out I’m running a little slow today, so I’m not even gonna try that complicated job. I’m just gonna tell you, "No, sorry, I can’t meet it. Go ask someone else." So I love these patterns because they’re all about networks, they’re all about the way we communicate with each other.

Then, finally, there’s one that he calls handshaking. In TCP/IP we have acknowledgements: the handshake before we send and the ACKs after we send. But at the application level, especially in HTTP, we don’t get that. We can’t really confirm ahead of time that the other side is ready for our request.

So what do we do instead? We have health checks, right? We have sort of constant moments where I can check and see, you know, are we good? Are we good? Are we good? Right? So all of these become really powerful and we can think of examples of them in the kinds of services that we do.
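A health check can be as simple as a status payload that the caller consults before sending work. The sketch below assumes a hypothetical `Service` that counts in-flight requests and reports `"UP"` or `"DOWN"`; the payload shape and the capacity rule are illustrative choices, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical service that reports whether it is ready for more work."""
    name: str
    max_in_flight: int
    in_flight: int = 0
    dependencies_ok: bool = True

    def health(self):
        # A minimal health-check payload; real systems often expose this over HTTP.
        healthy = self.dependencies_ok and self.in_flight < self.max_in_flight
        return {"service": self.name,
                "status": "UP" if healthy else "DOWN",
                "in_flight": self.in_flight}


def send_if_ready(service, job):
    """Application-level handshake: ask "are we good?" before sending."""
    if service.health()["status"] != "UP":
        return f"{service.name} not ready, try elsewhere"
    service.in_flight += 1
    return f"{service.name} accepted {job}"


orders = Service("orders", max_in_flight=1)
print(send_if_ready(orders, "job-1"))  # orders accepted job-1
print(send_if_ready(orders, "job-2"))  # orders not ready, try elsewhere
```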

So, I love maps. So I thought I would take us on a little bit of a journey. This is a series of drawings that were prepared for me by Alex Rivera, who’s a great artist in Kentucky who’s done some really amazing things for me in the past. About four years ago, I did a talk about a character that inhabits a sort of a hypermedia world in a magical forest, and she goes from place to place. So on the big map, part of that hypermedia tale is off to this side of the map, but we’re gonna explore some other places.

This is sort of all in the same universe, if you use the Marvel movie kind of approach. We’re gonna find some new places on this map that I thought we would talk about today. There are lots of things we won’t get to cover. I’m gonna talk about just a handful of them, four or five, maybe a few more. But what, again, I love about this map is the idea that we can sort of travel around. There’s no set order, no set place.

I just decided to pick a few locations that we can discover, and there are lots and lots of stories inside each. We won’t get to go too far. But we’ll start this journey today, and we’ll see how far we can get. Of course, a great place to start this journey about services and microservices and REST and the Web is probably at the Ruins of Monolithia.

All these places. We’ve sort of created this terrible thing that we hate, that we call the monolith. Somehow we used to love it about five or six years ago, but now we hate it, right? It’s a bad thing, it’s a bad word. Sort of like the word legacy, right? We all have legacy. I’m stuck with legacy, right? But listen, legacy is important: our company’s legacy, our family’s legacy. Somehow, though, when we apply it to code, it’s a bad thing.

It turns out there are lots of reasons things don’t go well for large organizations or large cities. We always wonder, gee, wonder what happened here? Well, we don’t really know for sure what happened in this particular place, but we can sort of visit around in some other ones and take a look. And what I wanna do for these first few stops on this map, is I wanna talk about code. This is kind of unusual for me. I don’t often talk about code, I don’t often talk about the inside. But I thought especially because of what we’re talking about today in microservices, and how to build them, and mount them, I thought it would be a good idea.

So we’re gonna talk about code just for a little bit. So, on this map, probably the first place we’re gonna visit is what I call the Fields of Purity. This is sort of a magical place where everything works just perfectly, everything works just fine, and everybody does just one thing and does it really well. The problem is, while they’re very efficient and effective at it, they can’t remember a damn thing. From one minute to the next, they’re confused, they don’t remember. They start from the beginning, "Hello, I’m Mike. I do this one thing."

So that’s handy, but it can get frustrating and confusing. This is what we refer to as stateless microservices, the sort of services that just do a simple process like a conversion, or a translation, or a mathematical problem, or something like that. But, they don’t depend on any other service, they don’t have any storage they have to worry about, they’re fancy-free. The problem is they’re not very useful. You can’t build an entire system out of a bunch of things that have no memory, deal with no stored data, and solve just one problem over and over again.

Somebody has to aggregate that into something else. But they are handy. They are important because they have no shared state, they can easily be replaced. I can spin up a second one or a third one. I don’t have to worry about pointing them even to the same data. I can scale them up as much as I want as well. I can run four or five, six, seven behind a bulkhead and they work just fine. So if you think about it, we probably have a few of these in our designs but probably not too many.

As a matter of fact, when I was working on some training materials, I had to work hard to come up with something like this: lookup lists, or some in-memory thing that, if I destroyed it, I could just start again. So, typically, stateless services look like this. I want you to convert something, so I give you the little routine, you convert the request into something else, and then it gets handed back and it’s all good. That seems pretty easy, pretty simple, right?
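A stateless service of the kind just described might be nothing more than a pure function plus an in-memory lookup table. This hypothetical length converter is a sketch: because the output depends only on the request, and the table can simply be rebuilt at start-up, any number of copies can run side by side without pointing at shared data.

```python
# In-memory lookup table: if the process dies, it is rebuilt on start-up.
RATES_TO_METERS = {"m": 1.0, "km": 1000.0, "ft": 0.3048, "mi": 1609.344}


def convert(value, from_unit, to_unit):
    """Stateless conversion: the result depends only on the request itself."""
    meters = value * RATES_TO_METERS[from_unit]
    return meters / RATES_TO_METERS[to_unit]


print(convert(1, "km", "m"))  # 1000.0
```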

But it turns out things can go wrong. Let’s say this is computing compound interest over like 30, 40, 50 years, maybe it’s a complex math problem, a matrix math problem. What happens when we add a network? What happens when we put a network just into this one simple little stateless thing? Well, what if it takes too long for me to figure this out? What if it turns out there’s something going on inside my system that means this is gonna take a long time? I could screw up an entire set of operations if everybody depends on me to generate a number.

So we have this idea, one of the ideas from Michael’s patterns called fail-fast, which basically says, "Look, I know how long this takes me. You’re telling me you want it in 500 milliseconds, it takes me 750 milliseconds right now. I’m not even gonna waste my time, I’m just gonna say no. I’m just gonna fail upfront." And using the fail-fast pattern means that I won’t sort of back up a huge load of elements in my queue that eventually will frustrate the rest of the system.
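The fail-fast check in this example can be sketched as a comparison between the service's current estimate and the caller's budget. `CompoundInterestService`, `estimated_ms`, and `budget_ms` are hypothetical names; how a real service would maintain its latency estimate (for example, a moving average of recent jobs) is left out of this sketch.

```python
class BudgetExceeded(Exception):
    """Raised immediately when the caller's time budget cannot be met."""


class CompoundInterestService:
    def __init__(self, estimated_ms):
        # Hypothetical running estimate of how long one job currently takes.
        self.estimated_ms = estimated_ms

    def handle(self, principal, rate, years, budget_ms):
        # Fail fast: refuse up front rather than queue work we know will be late.
        if self.estimated_ms > budget_ms:
            raise BudgetExceeded(
                f"need ~{self.estimated_ms} ms, budget is {budget_ms} ms")
        return principal * (1 + rate) ** years


busy = CompoundInterestService(estimated_ms=750)
try:
    busy.handle(1000, 0.05, 30, budget_ms=500)
except BudgetExceeded as err:
    print("fail fast:", err)  # fail fast: need ~750 ms, budget is 500 ms
```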

Now, the downside is that everybody else has to figure out what to do when I say no, right? That’s the other side of the network we’ll get to in a moment. But at least, by adding a fail-fast element to the set, I make sure that my service itself is protected. If somebody pounds me with a whole lot of requests and I get backed up, and it’s gonna take me an inordinate amount of time, I can simply declare bankruptcy, just say no. So that can be pretty handy.

There’s this other place in the map that’s called the Caves of Perseus [SP]. This is where they store stuff. And the monks in the cave are rather ingenious. When people from the outer cities come and bring their things to the monks, the monks give them a little slip of paper, and then they take it away. Nobody goes inside the cave, nobody knows where it is, or where it’s going from the outside, but all the monks do. And as the cave gets bigger and bigger, they move things from one place to the next, they reorganize all their system and shelves, but everybody’s number is still good, right.

And they can do this for a long time. These caves are very, very large. In fact, sometimes these monks never even see the light of day. They’re stuck just moving things from place to place. Sometimes they don’t even know if anyone will ever come back for those things; they just know, "It’s my job to take care of data." And often, that’s what our data systems are like, right? Sometimes we think there are places where things go in but they never come out, right?

So nobody remembers. "Why do we have this data table?" "I don’t remember. It was years before I was here. We just leave it." That’s kind of the way it works. They become these places of magical storage. Now, what’s really ingenious about these characters is they never include information about how storage is arranged, right? They just give them a little identifier and they say, "When you want it we’ll bring it back." And that’s really smart. That lets them do things behind the scenes.
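The monks' trick, handing out an identifier that says nothing about where the item lives, is easy to sketch. In this hypothetical `Cave` class, callers receive an opaque UUID, and the storage can be reorganized internally without invalidating any identifier already handed out. The two-shelf layout is an arbitrary illustration.

```python
import uuid


class Cave:
    """Storage that hands out opaque IDs and hides where things actually live."""

    def __init__(self):
        self._shelves = [dict()]  # internal layout: a list of shelves
        self._index = {}          # opaque id -> shelf number

    def store(self, item):
        slip = uuid.uuid4().hex   # the "slip of paper": no location info inside
        self._shelves[-1][slip] = item
        self._index[slip] = len(self._shelves) - 1
        return slip

    def fetch(self, slip):
        return self._shelves[self._index[slip]][slip]

    def reorganize(self):
        # Spread everything over fresh shelves; callers' slips stay valid.
        items = [(slip, shelf[slip]) for shelf in self._shelves for slip in shelf]
        self._shelves = [dict(), dict()]
        for i, (slip, item) in enumerate(items):
            shelf_no = i % 2
            self._shelves[shelf_no][slip] = item
            self._index[slip] = shelf_no
```

Because the layout never leaks into the identifier, the cave is free to move things around behind the scenes.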

So persistence microservices are these services that deal with some kind of storage. It’s usually local, like maybe I just have a local disk with my system. It might be remote. Either way, there are I/O dependencies. Maybe it’s in my VM, or maybe it’s in a 1U server next to me or something like that. But it’s a very, very common kind of service, these data storage services. And one of the tricks, and we have to be careful here, I think Stefan’s talked about this in the past, is that we don’t wanna end up creating what are called entity services: services that exist only because there’s a thing to store, right? Then you have lots of caves, and lots of places that people have to mess with.

It’s better to have this notion of, "Listen, there’s an action here: I need to retrieve, or I need to give, or I need to update, or I need to send," so we don’t have something that’s just stuck on storage alone. But often we inherit these systems of record or some source of truth, a mainframe somewhere or some database that somebody built years ago, so we have to deal with it in some way. Usually they’re relatively easy to scale for reads, right? We know the CQRS pattern, which separates reads from writes and makes reading easier, but writes can still be difficult to scale.

The other big thing about storage systems, as we begin to have various storage elements that we interact with, is that we’re gonna lose the ability to do cross-service transactions. So instead we have to use something like Sagas. Have we heard of Sagas before? 1980s, Garcia-Molina, right? So, again, this is a 30-year-old idea that we get to benefit from, this notion of creating compensating transactions, right?
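A saga replaces one cross-service transaction with a sequence of local steps, each paired with a compensating action. The following is a minimal in-process sketch under simplifying assumptions: a real saga would persist its progress and would need compensations that are safe to retry. The order-processing step names are hypothetical.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action fails, run the compensations of the already completed
    steps in reverse order, then report failure.
    """
    done = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(done):
                undo()
            return False
        done.append(compensate)
    return True


log = []


def charge_card():
    raise RuntimeError("payment service timed out")  # simulated failure


steps = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (charge_card, lambda: log.append("refund card")),
    (lambda: log.append("ship order"), lambda: log.append("cancel shipment")),
]
ok = run_saga(steps)
print(ok, log)  # False ['reserve stock', 'release stock']
```

Only the first step completed, so only its compensation runs; the failed payment step has nothing to refund.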

So, again, because of the network, because of what we’re doing, there are a lot of things going on. Typically, we might have some simple operation. Like, I wanna update all the orders that have just been handed to me, so I’ve got local storage and I wanna write. But of course, it doesn’t work like that, right, because of the network. What if that takes too long? What if some dependent service doesn’t respond in time? What if the service that I wanna access is down? What if there are storage problems and it runs out of disk space, or all sorts of other things?

So we have to change what we do. We have to change the code we write because we’re programming the network. So now you can see I’ve got all sorts of extra things that…I’ve got the fail-fast, which we already talked about earlier, but now I have a certain timeout which says I’m gonna wait a certain amount of time before I get an acknowledgment back, and then I’m gonna give up and say I can’t write it.
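The timeout step can be sketched with Python's `concurrent.futures`: submit the write, wait a bounded time for the acknowledgement, then give up. `write_with_timeout`, `slow_write`, and `fast_write` are hypothetical names, and returning `None` to mean "I can't write it" is an arbitrary convention for this sketch.

```python
import concurrent.futures
import time

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def write_with_timeout(write, record, timeout_s):
    """Wait at most timeout_s for the write to be acknowledged, then give up."""
    future = _pool.submit(write, record)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return None  # caller treats None as "could not write it"


def slow_write(record):
    time.sleep(0.5)  # simulated data server that is running slow today
    return "ack"


def fast_write(record):
    return "ack"


print(write_with_timeout(fast_write, {"id": 1}, timeout_s=1.0))  # ack
print(write_with_timeout(slow_write, {"id": 2}, timeout_s=0.1))  # None
```

Note that the timed-out write keeps running in its worker thread; the timeout only stops the caller from waiting on it.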

And then when I do try the write, I’m gonna actually use a circuit breaker pattern to make sure that if it doesn’t work I’ll wait a few seconds and try it again, or I’ll flip a switch and store it locally. And I’ll still pass along a time budget so that the person on the other end can deal with a fail-fast as well. So there’s all these sort of steps along the way, and all of these make it easier for us to program the network.
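A circuit breaker with a local fallback might look like the sketch below. This is one common formulation (closed, open, half-open) with hypothetical names; the clock is injectable so the behavior can be checked without sleeping, and `store_locally` stands in for "flip a switch and store it locally."

```python
import time


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_s=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback(*args)  # open: don't even try the remote call
            # Half-open: allow one trial call; one more failure re-opens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback(*args)
        self.failures = 0
        return result


local_queue = []


def remote_write(record):
    raise ConnectionError("data server unreachable")  # simulated outage


def store_locally(record):
    local_queue.append(record)
    return "queued locally"


now = [0.0]  # fake clock for the demo
breaker = CircuitBreaker(failure_threshold=2, reset_after_s=5.0, clock=lambda: now[0])
for record in ["r1", "r2", "r3"]:
    print(breaker.call(remote_write, record, fallback=store_locally))
```

After two failures the breaker opens, so the third record goes straight to the local queue without touching the network.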

So I’ve got fail-fast, I’ve got timeout, I’ve got circuit breaker, I’ve got steady state. You can’t really see steady state here, but I’ve got a log cleanup routine running on the data server all the time, which makes sure that I always have a certain amount of space available. So it’ll ship logs along, and if it has to, it’ll even get rid of some old data in order to do that.
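The steady-state cleanup job can be sketched as "delete the oldest log files until the total fits a size budget." The `*.log` pattern and the budget are assumptions, and a production job would normally ship logs elsewhere before deleting them; the planning step is kept as a pure function so it is easy to check.

```python
from pathlib import Path


def plan_cleanup(files, max_total_bytes):
    """Pick the oldest files to delete until the total size fits the budget.

    files: list of (name, modified_time, size_bytes) tuples.
    """
    total = sum(size for _, _, size in files)
    to_delete = []
    for name, _, size in sorted(files, key=lambda f: f[1]):  # oldest first
        if total <= max_total_bytes:
            break
        to_delete.append(name)
        total -= size
    return to_delete


def cleanup_dir(log_dir, max_total_bytes, pattern="*.log"):
    """Apply the plan to real files in a (hypothetical) log directory."""
    paths = list(Path(log_dir).glob(pattern))
    files = [(p, p.stat().st_mtime, p.stat().st_size) for p in paths]
    for path in plan_cleanup(files, max_total_bytes):
        path.unlink()


print(plan_cleanup([("a.log", 1, 100), ("b.log", 2, 100), ("c.log", 3, 100)], 150))
# ['a.log', 'b.log']
```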

So a third place that we’ll visit is what I call the Scholars of Aggregato. Nice name. These are rather intelligent people. They have a sort of a marketplace, and you can go ask them a question and they’ll figure out all of the little bits and pieces that are needed to solve that problem. And they’ll go talk to folks in the caves, and they’ll go talk to folks over in the Fields of Purity, and they’ll quickly put together a solution for you. They’re the ones who aggregate everything together into something that’s useful.

And in fact, these are the services we really want, right? These are the powerful ones, the wise ones, the ones that solve a problem for us and hide a lot of the details from us. The challenge is that they depend on other services, and those services could be far away. They could be across the globe. In some cases… I worked for a short time with a group called CCSDS, the Consultative Committee for Space Data Systems. They deal with services across the solar system that communicate back and forth.

Think about that for a moment. There’s no event-driven work in space, all right? It’s always a message, and we’ll wait a while. We’ll see if they tell us tomorrow whether or not that actually happened, right? So we’re network dependent; even the network itself could go down. We’re I/O dependent. These are pretty challenging services, and we’ve got a whole series of questions to think about.

So can we run some of these in sequence? Do we have to wait until we get an answer back or could we run some of them in parallel? Can we take a bunch of these services that we’re aggregating and just make all the requests all at once and just wait, or do we have to do something step by step? We have to know that when we aggregate. Timing is everything. If I can do this in parallel, then my cost, my time budget is only gonna be the longest response time of one of the services. If I do it in sequence, I’ve gotta add it up.
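The arithmetic here is simple: in parallel, the aggregate's latency is roughly that of the slowest call; in sequence, it is the sum. The sketch below shows both the budget calculation and an `asyncio` aggregator; the service names and latencies are made up, and `asyncio.sleep` stands in for real network calls.

```python
import asyncio


def time_budget(latencies_ms, parallel):
    """Expected total latency: the slowest call if parallel, the sum if sequential."""
    return max(latencies_ms) if parallel else sum(latencies_ms)


async def call_service(name, latency_ms):
    # Stand-in for a hypothetical remote call.
    await asyncio.sleep(latency_ms / 1000)
    return f"{name} done"


async def aggregate(services, parallel=True):
    if parallel:
        # Fire all requests at once and wait for every answer.
        return await asyncio.gather(*(call_service(n, ms) for n, ms in services))
    results = []
    for name, ms in services:  # one step at a time
        results.append(await call_service(name, ms))
    return results


services = [("inventory", 120), ("pricing", 80), ("shipping", 200)]
print(time_budget([ms for _, ms in services], parallel=True))   # 200
print(time_budget([ms for _, ms in services], parallel=False))  # 400
print(asyncio.run(aggregate(services)))  # ['inventory done', 'pricing done', 'shipping done']
```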

It should be easy to scale but it can be very, very difficult, especially if I’m writing data because I need to make sure that I’m writing data in a safe place and I’m not doing it more than once. There’s lots of challenges. So here’s a sort of a classic internal model where I get a bunch of orders that have to be split out against a bunch of other things. But I need to just gather up those resources and make that commit and then come back.
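One common way to avoid writing the same data more than once when requests are retried is an idempotency key. This is a named technique offered as an illustration, not necessarily what the talk's example used: the hypothetical `OrderStore` below remembers which key created which order and returns the original result on a retry. It assumes the client generates one key per logical request.

```python
class OrderStore:
    """Hypothetical store that makes retried writes safe with idempotency keys."""

    def __init__(self):
        self.orders = []
        self._seen = {}  # idempotency key -> index of the order it created

    def write(self, idempotency_key, order):
        if idempotency_key in self._seen:
            # Duplicate retry: return the original result, no second write.
            return self._seen[idempotency_key]
        self.orders.append(order)
        self._seen[idempotency_key] = len(self.orders) - 1
        return self._seen[idempotency_key]


store = OrderStore()
store.write("req-1", {"sku": "apple", "qty": 2})
store.write("req-1", {"sku": "apple", "qty": 2})  # retried after a timeout
print(len(store.orders))  # 1
```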

Now I’ve got lots of challenges for the network, and that challenge list just keeps getting longer: it takes too long, it doesn’t respond, the service is down, overflow, somebody is unhealthy, they’re there but unresponsive. What if traffic is really spiky, what’s gonna happen? Do I have to work on, you know, backpressure, and so on and so forth? All of these things are network elements. So now I have more and more things that I need to add to that request, to that service.

So fail-fast, timeout, circuit breaker, steady state, handshaking, and bulkhead. Bulkhead is the idea of actually having a series of machines, right? So I have a series of machines behind a cluster. If one of them goes down, the others still work. That one machine doesn’t blow up and ruin the whole system. It’s sort of like my protection against the Titanic, right? It’s not a problem if just one of the bulkheads breaks.
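The talk describes bulkheads as separate machines behind a cluster; the same idea can be applied inside one process by giving each dependency its own bounded pool of slots. In this hypothetical sketch, a full compartment rejects new calls immediately instead of letting one slow dependency tie up every worker.

```python
import threading


class Bulkhead:
    """Caps concurrent calls to one dependency so it cannot sink the whole service."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, func, *args):
        # Non-blocking acquire: if this compartment is full, reject instead of waiting.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError(f"bulkhead '{self.name}' is full")
        try:
            return func(*args)
        finally:
            self._slots.release()


# One compartment per dependency: a flood of slow payment calls
# cannot use up the slots reserved for search.
payments = Bulkhead("payments", max_concurrent=2)
search = Bulkhead("search", max_concurrent=10)
print(search.run(lambda q: f"results for {q}", "boots"))  # results for boots
```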

So these are all real challenges. What I find really interesting, and Michael Nygard points this out in his book, is that, as you can see from the examples, the code to handle the network is more than the code to solve your problem. Some of us are annoyed by that. And Michael makes a really good comment; he says, no, that’s actually the way it should be, that’s the way it works. It turns out you may find that half of your code, he says, "is devoted to error handling instead of features." But that’s the essence of aiming for production, of getting ready to release it.

And the other line I like, and this is so good, is where he says, "Nobody notices if your system doesn’t go down." They don’t call you up and say, "Hey, I just wanna let you know that transaction worked," and hang up the phone. No, they don’t do that. It’s the one time it doesn’t work, right? And usually, it’s your boss. And usually, she’s really angry. You know, all these things sort of add up. So you wanna avoid that; you wanna get ahead of this as much as you can.

So, Michael says, "You know, if you do these things, you can sleep at night, because we’re programming the network. I’ve got this covered. We can survive some problem, some mistakes, something that we forget, something that didn’t work." Because we all, you know, have these great moments. We sort of had the [inaudible 00:28:04] moment like, "Oh, I’m so glad that wasn’t me." The person who sort of fat-fingered the S3 buckets last year, does anybody remember this? Amazon S3 buckets were like off, and it was like, "Oh my gosh." It was one person. Whoever that person was, I feel really bad for them, right?

But all of a sudden they’re the patient zero, they’re the source when, of course, that isn’t how it works at all. Like, they probably were doing the same thing they’ve done day after day, week after week, month after month. By the way, this was a steady state moment, if you are keeping track on your scorecard, cleaning up some formats. But suddenly some few things sort of converged and everything happened all at once.

It’s the bugs you don’t know about that get you, right? And usually, it’s because we don’t quite understand how the system works, because that’s usually the first thing that somebody says, you know, when I become the victim, when I’m the one who fat-fingers the whole thing. Somebody says, "Well, didn’t you blah, blah?" And what do I say? I say, "Oh, I didn’t know it worked like that." We all say that, "I didn’t know it worked like that," because it’s a complex system.

So surviving becomes really important. So we’ve had these visits around these places about code. But as they say in the commercials, there’s more. What I wanna do now is go back outside the code and talk more about the network itself, because the rules of the network and the way we accomplish things on the network, because of time, and because of distance, and because of all these other things, are really, really different and really important.

So let’s not talk about code for a bit. Let’s go back and talk about a couple other things on the map. So the first place I wanna stop is this place that is called the Gardens of Cyma [SP], the Rose Gardens. What’s really interesting about these gardens is there are all these roses, different colors, and different shapes, and different heights. And the people who tend these rose gardens can actually sort of communicate to each other as if they’re sort of like a sort of a zen of roses, like a three orange and a yellow and a red together means welcome or something.

There’s this whole little language that they understand with each other. And in this world, this Omni Terra world, whenever there’s a visiting dignitary, they put together these flower packages that explain things, that tell things. And it turns out this notion of being able to explain or tell things is really important, because this is how we communicate, this is how we talk to each other. And it’s part of this notion of interoperability of being able to communicate with others.

And I love this quote from Michael Platt at Microsoft, where he talks about the difference between interop and integration. "Interop is a peer to peer kind of experience." This is you and I together. Integration is something else. Integration is when my boss tells me what to do and I do it. I don’t have any choice. I’ve been subsumed by some other force. We might think of it as a robot.

It turns out, when you look at this notion of how things work on the network, and how things work together, how that stack operates, there’s all sorts of levels of things. But what we share between services is semantics, is what we mean to say and how we say it. It’s the shared understanding that really makes us all work as a community.

And on the Web, and Tim Berners-Lee’s portion of that, the way we understand each other turns out to be the actual action elements of HTML, the forms, and links, and images that bring data close to us, these are the things that we understand. And then we send additional information in data along the way, and humans kind of interpret it. And this idea of information that we send and having it represented in some way, and then having it interpreted is repeated in lots and lots of writing.

Jens Rasmussen, who is another person who’s big in human error and systems thinking, has this idea of signal, sign, and symbol. A signal is the piece of data you send. The sign is what that data represents; in this case, it represents a temperature. And a symbol is what that representation means. So this number is 13, and that’s a temperature, and it represents the temperature of a particular chiller. And the symbol means 13 is too high for that chiller; I need to turn it down, right?
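Rasmussen's three layers can be traced in a few lines. The sketch uses the talk's chiller reading of 13; the 10 degrees C limit, the `chiller-1` sensor name, and the field names are assumptions made for illustration.

```python
CHILLER_MAX_C = 10  # hypothetical safe limit for this chiller


def to_sign(signal: bytes):
    """Signal -> sign: raw bytes on the wire become a temperature reading."""
    return {"sensor": "chiller-1", "celsius": int(signal.decode("ascii"))}


def to_symbol(sign):
    """Sign -> symbol: the reading is interpreted in its context."""
    if sign["celsius"] > CHILLER_MAX_C:
        return "too warm: turn the chiller down"
    return "ok"


print(to_symbol(to_sign(b"13")))  # too warm: turn the chiller down
```

The same signal could mean something entirely different with another limit or another sensor, which is why the symbol layer depends on shared context.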

So these three things are really important, and we do this every day. We interpret things like this all the time. What I find really fascinating is that these match rather well with parts of the way we do the network. Signals are the protocols: TCP/IP, UDP, HTTP, and all these others. They’re the agreements about how messages go back and forth. The formats are the signs: HTML, and HAL, and Collection+JSON, and Siren, and UBER, and Mason, and CSV, and PNG. These are the formats in which we represent that information.

And then, finally, we have the vocabularies as the meaning, right? This is fname, first name, given name, all these things. We have these shared meanings. And these track well with things like HTTP, CoAP, HAL, the Dublin Core Application Profile, and ALPS, Application-Level Profile Semantics. These are the ways we communicate with each other. But what I find amazing is that we don’t often think about programming them individually.
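As a rough illustration of shared meaning, the sketch below maps several local names for the same idea (fname, first name, given name) onto one shared term. The vocabulary table and the terms `givenName` and `familyName` are assumptions for illustration; in practice a profile format such as ALPS, or an ontology, would carry this information, and this code reads neither.

```python
# A hypothetical shared vocabulary: each shared term lists the local aliases
# that different services happen to use for the same meaning.
VOCABULARY = {
    "givenName": {"fname", "first_name", "firstName", "givenName"},
    "familyName": {"lname", "last_name", "surname", "familyName"},
}

# Reverse index built once: local alias -> shared term.
ALIAS_TO_TERM = {alias: term for term, aliases in VOCABULARY.items() for alias in aliases}


def normalize(record: dict) -> dict:
    """Rename known fields to their shared term; unknown fields pass through unchanged."""
    return {ALIAS_TO_TERM.get(key, key): value for key, value in record.items()}


print(normalize({"fname": "Mike", "surname": "Amundsen"}))
# {'givenName': 'Mike', 'familyName': 'Amundsen'}
```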

So I can solve this problem with one of these protocols, one of these formats, and one of these vocabularies. I might be able to solve the same problem with a different format and a different protocol, but maybe the same vocabulary. They become independent elements. What I think we’ve been really good at for the last 20 or 30 years is the first two items on that list, but not the third one. We’re not yet good at programming the semantics. Now that we’re spending more time on machine learning and AI, I think that’s gonna become more and more important.
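A small sketch of that independence, using only the Python standard library: the same record, written with one hypothetical vocabulary, serialized as JSON and as CSV. The format changes; the meaning does not, so a client that understands the vocabulary can work with either.

```python
import csv
import io
import json

# One record expressed in a shared (hypothetical) vocabulary.
record = {"givenName": "Mike", "familyName": "Amundsen"}


def as_json(rec: dict) -> str:
    """Serialize the record as JSON with a stable key order."""
    return json.dumps(rec, sort_keys=True)


def as_csv(rec: dict) -> str:
    """Serialize the same record as CSV: a header row of terms, then one row of values."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buf.getvalue()


def from_json(text: str) -> dict:
    return json.loads(text)


def from_csv(text: str) -> dict:
    return dict(next(csv.DictReader(io.StringIO(text))))


# Different formats, same meaning: both round-trip to the same vocabulary.
assert from_json(as_json(record)) == from_csv(as_csv(record)) == record
```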

And there are several companies focusing on this idea of separate semantics, even negotiating for semantics. Spring Boot uses ALPS in a couple of cases, and Liferay uses it in their content management system. There are definitely these ideas of starting to manage vocabulary separately. There are many large-scale companies that do a lot of this vocabulary management inside their own organizations. They call them ontologies, or they use some other technology like RDF to solve this problem. But managing that separately is really important.

So there’s another mysterious place on our map, called the Valley of the Metamorphs. There are these odd sculptures, these odd shapes, and people visit them, and it turns out that over a long period of time the shapes change. If you come back in the spring, they’ll be a slightly different shape, and over the years they keep changing. Somehow these sculptures magically change over time, and it’s a big mystery to everyone.

It turns out we deal with change every single day, right? Things change about our network, things change about our life, this room changes, we ourselves change, and so on. Change is a fundamental indicator that you’re alive. And yet, for most of us, when we build computer systems, we hope they never change. We imagine they will stay the same forever.

And in fact, we’re so bad at it that when we change something, we often break other people on the network. We even have a word for this, right? Versioning. We’re gonna version, which means we’re gonna break some stuff, right? But it turns out we can make lots of changes without having to break things.

And there are just a few simple rules you can follow when you’re designing a system of interactions to make sure that things don’t break. These are the rules TCP/IP has followed. TCP/IP has been running for 40 years or so, maybe longer. Lots of code has changed, lots of bits and pieces have changed, lots of new libraries exist, but TCP/IP still works.

HTTP has changed from version 0.9 to 1.0 to 1.1, and now we have HTTP/2. HTTP keeps changing, but it still works; even the older versions work. The designers were even smart enough to put an upgrade feature into HTTP, so if you wanna upgrade, you have an opportunity. And HTML is the same way: 4.01, XHTML, HTML5. All of these things follow these rules. They don’t take things away, they don’t change the meanings, and all the new stuff is optional. So you can opt in.

So there are lots of possibilities. If I wanna move something that returns status to some other location, I can’t just change the URL for everyone, because they’re gonna keep using the old URL. So built into the protocol is the idea that if you request the old address, I get to tell you it’s been moved, and you get to go to the new place. We built in that notion of being able to change URLs without breaking anyone. Yet so many people I work with continue to change URLs without taking care of this, and they break folks. That doesn’t make any sense.
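HTTP’s built-in answer is a redirect: the server answers the old URL with `301 Moved Permanently` and a `Location` header pointing at the new one. The sketch below simulates that in-process instead of running a real server; the paths `/status` and `/health/status` are hypothetical. (For requests other than GET, `308 Permanent Redirect` is the variant that tells clients to keep the same method.)

```python
from http import HTTPStatus

# Hypothetical move: the status resource now lives at /health/status.
MOVED = {"/status": "/health/status"}


def handle(path: str) -> tuple[HTTPStatus, dict, str]:
    """Return (status, headers, body) for a GET on `path`."""
    if path in MOVED:
        # Tell existing clients where the resource went instead of breaking them.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": MOVED[path]}, ""
    if path == "/health/status":
        return HTTPStatus.OK, {"Content-Type": "application/json"}, '{"status": "ok"}'
    return HTTPStatus.NOT_FOUND, {}, ""


def get(path: str, max_redirects: int = 5) -> tuple[int, str]:
    """A tiny client that follows 301 redirects the way an HTTP client would."""
    for _ in range(max_redirects + 1):
        code, headers, body = handle(path)
        if code != HTTPStatus.MOVED_PERMANENTLY:
            return int(code), body
        path = headers["Location"]
    raise RuntimeError("too many redirects")


print(get("/status"))  # (200, '{"status": "ok"}')
```

Old clients that still ask for `/status` end up at the new address without any code change on their side.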

You also can’t change the meaning of things. If you used to get some information about status, and now I wanna make it about how many machines there are, I can’t just change it, because everybody who wants to know whether the status is A-okay is gonna break. So what do I do? I add those things instead. I don’t take things away, I add them. I can’t change the meaning of asking for status, but I can add to it if I want to.
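One way to enforce "add, don’t take away" is an automated check between an old and a new response shape. This is a minimal sketch, assuming JSON-like dictionaries: it flags removed fields and fields whose type changed. It cannot catch a field that keeps its type but changes meaning, so it backs up the rule rather than replacing it.

```python
def is_backward_compatible(old: dict, new: dict) -> bool:
    """True if `new` keeps every field of `old` with the same type, recursively.
    Extra fields in `new` are allowed: adding is safe, removing or retyping is not."""
    for key, old_value in old.items():
        if key not in new:
            return False  # something was taken away
        new_value = new[key]
        if type(new_value) is not type(old_value):
            return False  # the shape changed, so the meaning probably did too
        if isinstance(old_value, dict) and not is_backward_compatible(old_value, new_value):
            return False
    return True


v1 = {"status": "ok"}
v2_added = {"status": "ok", "machines": 12}  # adds a field: old clients are fine
v2_changed = {"status": 12}                  # reuses "status" for a count: old clients break

print(is_backward_compatible(v1, v2_added))    # True
print(is_backward_compatible(v1, v2_changed))  # False
```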

And finally, I have this idea that everything new must be optional. If suddenly I want you to be able to get some information about machines, and I say you’ve gotta add a query string, I’ve changed the address. Now everybody who doesn’t know that rule gets to break, and that doesn’t make any sense. What I do instead is allow both the old and the new to exist side by side, and that’s exactly how it works in real life, right?
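Here is a sketch of keeping new behavior optional. The `include` query parameter and the machine count are hypothetical; the point is that clients that never send the parameter get exactly the response they always got, while new clients can opt in to more.

```python
from urllib.parse import parse_qs, urlparse


def status_response(url: str) -> dict:
    """Serve old and new clients side by side: `include` is an optional opt-in."""
    query = parse_qs(urlparse(url).query)
    body = {"status": "ok"}  # what every existing client already relies on
    # New detail is returned only when the caller explicitly asks for it.
    if "machines" in query.get("include", []):
        body["machines"] = 12  # hypothetical value for illustration
    return body


print(status_response("https://api.example.com/status"))
# {'status': 'ok'}
print(status_response("https://api.example.com/status?include=machines"))
# {'status': 'ok', 'machines': 12}
```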

I’ve often said that I understand how nature does testing, because things die. Oops, that was a bad one. We won’t have any more of those, right. But I don’t understand how we have decided not to do evolution. We’ve decided to just go ahead and kill off the whole species and we’ll just imagine that they never were there. We’ll just go ahead and build something else. That doesn’t make sense. We need this same idea of acknowledging, accepting the change that we have over time.

So I’ll do a little appeal-to-authority moment here: when Roy Fielding gets asked this question about versioning, I think we all know the answer he gives. Roy says, "Don’t. Don’t break things," just change them in a compatible way. And lots of organizations do this. GitHub has what they call the no-breaking pledge, where they commit to always maintaining the existing things. They’ll offer new products and new features, but they’ll maintain what’s there.

Salesforce has been doing this for close to 20 years. They release a new API set every… I think it’s every four months. I don’t know if they still do it that often. But they always maintain the old ones that are there, because they don’t wanna break things. Because for Salesforce, that’s big money, that’s real money, right. So this idea of making sure that we acknowledge change as part of the network is really, really important.

Okay, there’s one more place I wanna stop along the way, and that has to do with this idea of discovery, the sea of discovery. What’s really amazing is there’s all sorts of new things we haven’t even done yet. There’s all sorts of new chances, new opportunities, new things people haven’t invented, and that’s our chance to make some discovery. But discovery, again, is one of the really vexing problems when we are building computer systems.

We’ve figured out how to discover what we know is going to be there, which is usually called name resolution, but we haven’t figured out how to discover what we didn’t know was going to be there: how to recognize it, and how to use semantics, signs, and symbols to figure out that it means something interesting. Now, I love this gentleman, J.C.R. Licklider. He was sort of a quiet hero of the internet in the 1960s. When nobody could get money to start building vast computer systems that would span a nation, let alone the globe, J.C.R. Licklider controlled a small budget at the Defense Department, and he would fund sort of odd projects.

And one of the things he had in his head was this notion of what he called the Intergalactic Computer Network: a network of computers that could span the galaxy. In a memo he dashed off in 1963, he asked the classic sci-fi question: "How do you get communication started among totally uncorrelated sapient beings?"

What he means here is, people who have their own sense of self, right, they could be the aliens, they could be us, they could be people we’ve never met, how do you start communication when you’ve never met before? What do you do? And this was sort of a vexing problem in the '60s. And when you think about it, this is exactly what we do on our network, right. We have the same idea where we wanna communicate with other machines. And we’ve worked out a pretty good way to do this at the network level, right. That’s what TCP/IP is, and UDP, and all these other things.

We have some basic agreements. We can even do things like negotiate for versions of HTTP. Are we talking HTTP/1.1? Are we talking HTTP/2? Are we gonna talk QUIC, right? We have some of these ideas. But we need to level this up to the application level as well. So think about how discovery should work. We have lots and lots of services. How do we know that these services can talk to each other? How do these services even know each other exist, right?

So right now what we do is tell them, right? We say, "This service over here can talk to that service over there." We write a little configuration file that points to that service, and we hope that that service is still there. We might even get fancy and put a cluster of services behind that address, but that service sure better be there. Then we get even fancier: when we do the build, we throw in some new addresses, or we make "over there" just a signpost and resolve it with DNS.

So this is what Kubernetes does, what HashiCorp tools do, and all these other things. As long as the thing I know is supposed to be there is there, it’s gonna work. But there are all sorts of things out there that maybe I didn’t know about. What if all of these are bots? What if all of these are robots? What if each one has its own sense of its job, like "my job is to go find somebody who manages credit card payments"? How are they gonna communicate with each other? How are they gonna figure that out?

What if there are three services that manage credit card payments, and I could pick any one of them? Maybe I have an agreement already with two of them. And maybe one of them gives me a better discount when it’s a certain kind of card. Now all of a sudden, I want, at runtime, to connect to that one service that’s gonna give me the best option.
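A minimal sketch of that runtime choice, assuming each payment service advertises whether we already have an agreement with it and a discount per card type. The service names, fields, and rates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PaymentService:
    name: str
    has_agreement: bool
    discounts: dict = field(default_factory=dict)  # card type -> discount rate


def choose(services: list, card_type: str) -> Optional[PaymentService]:
    """At runtime, pick the service we have an agreement with that gives the best discount."""
    candidates = [s for s in services if s.has_agreement]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.discounts.get(card_type, 0.0))


pay_a = PaymentService("pay-a", True, {"visa": 0.010, "amex": 0.020})
pay_b = PaymentService("pay-b", True, {"visa": 0.015})
pay_c = PaymentService("pay-c", False, {"visa": 0.030})  # best rate, but no agreement

print(choose([pay_a, pay_b, pay_c], "visa").name)  # pay-b
```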

Now, I’m not sure if this sounds familiar, but this is what I did for about 10 years, programming telephone switches in the United States. I knew that during certain times of day one long-distance service would give me a better price for calls into a particular community than another, and that overnight I should always use a different service instead. We’ve been doing this for more than half a century on telephones.

ATMs do this as well. An ATM can be dropped off at a concert anywhere, and it starts communicating and running its own service. And it can talk to any bank: you just connect it up, it finds its network bank, and then it navigates the rest of the network. Yet we have a difficult time designing our services to do this work.

Now put these robots in outer space and you have an idea of what J.C.R. Licklider was talking about. When we start to launch robots into outer space, what’s going to happen when they encounter somebody else’s robot in outer space?

So that is actually the memo from 1963 that Licklider wrote. And he started to parse out the ideas about how he would do this. What he ended up doing was designing the thought experiment behind ARPANET and how we got our internet today: how much smarts we would put in each machine versus how much smarts we would put on the backbone of the network. And that was a tricky balance. What I find really fascinating is this: remember, I told you I worked for a short time with a group called the Space Data Systems. Forty-five years later, they actually created a protocol for communicating between machines through outer space and called it the Licklider Transmission Protocol, which I think is really kind of cool.

And the protocol is wacky. It looks sort of like TCP/IP or UDP, but it’s set up so that it could take days for a request and response to complete. It’s very cool, it’s very interesting. So that’s what we need: a Licklider-like protocol that lets us not just resolve a name, but actually ask questions at the application level. Then I can put a service out into outer space, into our network space, and have confidence that it will find who it’s looking for, and if not, that it will keep looking until it finds one.

And it turns out we have all the technology we need for that today. For machines that we know should be there, we use DNS, right? We turn names into numbers, and that works pretty well. But now we need the ability to say, "I’d like you to talk this particular protocol. I’d like you to talk HTML. I’d like you to use this particular vocabulary and solve this particular problem." We can do all of that.

So we need to deliver that. I’ve been working on a proof-of-concept idea called open discovery, where you just have some really simple ways to do this. Now, we’ve done the service discovery thing a few times before. UDDI: I think I ran the second- or third-largest open UDDI site on the net 20 years ago. Nobody cared.

We tried this with lots and lots of things, and we do this today inside internal systems like Kubernetes and some of these other things. The problem is, the barrier to entry is incredibly high, and until the entire galaxy gets Kubernetes, it’s not gonna work for us, right? And Kubernetes is still focused on finding the thing I’m looking for rather than discovering what I didn’t know about.

So it turns out we can create a very simple system. I need to register when I exist, when I grow up, when I get loaded. I need to be able to find other services. I need to bind to them and say, "Okay, we’re gonna have an agreement," and maybe pass some certificates or keys to prove who I am. I need to renew my lease so that you know I’m still alive and still available, and maybe I need to unregister when it’s time for me to go. It turns out this is not a listing of all the possible things you can do at any one point; it’s the momentary existence of what’s in your ecosystem today.

Who’s around me that I can use? Who’s here to help me? It’s literally like walking into a room and asking, who can I meet? This is actually pretty simple. So here’s a simple example of some code. I wanna register myself on startup, I wanna find all the services I need, and if that’s a win, then I’m up and ready to go. Now I don’t have to have lots of ZooKeeper files and all these other configurations that have to be added. I can actually do this at runtime.
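The register, find, bind, renew, and unregister cycle described above can be sketched as a small in-memory registry with leases. This is a hypothetical illustration, not the open discovery proof of concept itself: the class, method, and field names are assumptions, the bind step (agreeing on terms, exchanging certificates or keys) is left out, and a real registry would sit on the network rather than inside one process. `find` can filter on protocol, media type, and vocabulary, echoing the idea of asking for a service by what it speaks rather than by a fixed address.

```python
import time
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    url: str
    protocol: str    # e.g. "http"
    media_type: str  # e.g. "application/json"
    vocabulary: str  # e.g. an identifier for a shared profile
    expires: float   # clock time when the lease runs out


class Registry:
    """Hypothetical in-memory discovery registry built on leases."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.entries = {}  # name -> Entry

    def register(self, name, url, protocol, media_type, vocabulary):
        """Announce that a service exists; its lease starts now."""
        self.entries[name] = Entry(name, url, protocol, media_type, vocabulary,
                                   self.clock() + self.lease_seconds)

    def renew(self, name):
        """Extend the lease to show we are still alive. Returns False if unknown."""
        entry = self.entries.get(name)
        if entry is None:
            return False
        entry.expires = self.clock() + self.lease_seconds
        return True

    def unregister(self, name):
        """Leave explicitly instead of waiting for the lease to lapse."""
        self.entries.pop(name, None)

    def find(self, **criteria):
        """Return live entries whose fields match every given criterion."""
        now = self.clock()
        # Forget expired leases: the registry reflects who is around right now.
        self.entries = {n: e for n, e in self.entries.items() if e.expires > now}
        return [e for e in self.entries.values()
                if all(getattr(e, key) == value for key, value in criteria.items())]


def start(registry, me, needed_vocabularies):
    """Register on startup, then look up providers for every vocabulary we need."""
    registry.register(**me)
    found = {v: registry.find(vocabulary=v) for v in needed_vocabularies}
    ready = all(found.values())  # a "win" only if every need has at least one provider
    return ready, found
```

The clock is injectable so lease expiry can be exercised without waiting, and `start` mirrors the startup flow from the talk: register, find what you need, and report ready only if every need was met.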

Now when it’s just me and three friends, this makes no sense, right? We’ll just sit around the table, couple together, figure it all out, and it will be just fine. But when it’s people I’ve never met, when it’s services that I would love to use but can’t talk to because I’m in Kentucky and you’re here in Copenhagen, this becomes handy. Now we can create an economy, a real economy of services, a real economy of micropayments where people can pay for services. There are a few interesting formats here; APIs.json is a great searchable description format that does a lot of this kind of work, as long as you can get the protocol working.

So there’s lots of possibilities there, and now I can create services that can protect themselves and enlist others in the process. Now they can travel that network and find the things that they’re looking for. And the cool thing is, we don’t need any more technology, we don’t need any new protocols, we don’t need any new formats, we just need some agreements and we pass things back and forth. And that could be very, very powerful.

Okay, so we talked about a lot of things. One of the things I said is we need better maps. We need to start creating situations where we tell people what’s possible. Probably the most effective way to do this is using hypermedia-style formats like HAL, the way Amazon does, or Siren, the way Apigee and Google do, or some of these other formats, like Collection+JSON and so on. We need better maps that tell people what’s possible, not just written in the docs but actually at the machine level.
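To show what a machine-level map can look like, here is a HAL-style document and two helpers that read it. HAL puts link relations under `_links`; the order resource and the relation names `cancel` and `pay` are hypothetical. The client asks the response what it can do next instead of hardcoding URLs. This sketch assumes one link object per relation, although HAL also allows an array of links for a single relation.

```python
import json

# A HAL-style response: "_links" tells the client what it can do next.
response = json.loads("""
{
  "status": "pending",
  "_links": {
    "self":   {"href": "/orders/42"},
    "cancel": {"href": "/orders/42/cancel"},
    "pay":    {"href": "/orders/42/payment"}
  }
}
""")


def available_actions(doc: dict) -> set:
    """Link relations the server is offering right now, excluding 'self'."""
    return set(doc.get("_links", {})) - {"self"}


def href_for(doc: dict, rel: str):
    """Follow the map: look up a URL by relation name instead of hardcoding it."""
    link = doc.get("_links", {}).get(rel)
    return link["href"] if link else None


print(sorted(available_actions(response)))  # ['cancel', 'pay']
print(href_for(response, "pay"))            # /orders/42/payment
```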

We wanna program the network. We need to acknowledge that there’s time, and there’s space, and there’s distance. And we need to figure out how to write code that means that we’re going to reliably survive when things don’t go well. We’re gonna use these kinds of patterns that Michael Nygard has talked about for more than a decade, and others as well.
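The circuit breaker is one of the stability patterns Michael Nygard describes in Release It! Below is a minimal, single-threaded Python sketch of the idea, not a production implementation: after a configurable number of consecutive failures it fails fast for a cool-down period, then lets one trial call through. The thresholds are arbitrary examples.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call without trying the remote service."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=10.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("failing fast; remote service presumed down")
            # Cool-down is over ("half-open"): let this one call through as a trial.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Open after too many failures, or re-open if the trial call failed.
            if self.failures >= self.max_failures or self.opened_at is not None:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast keeps a caller from piling up slow, doomed requests against a service that is already struggling, which is the kind of "surviving when things don’t go well" the talk is pointing at.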

Chris Richardson has a great site called microservices.io, where he talks about a lot of patterns like this that we can start to incorporate into what we do. It means understanding the role of semantics, actually digging into this pile that we’ve avoided for so long: separating the protocols and the formats from the meaning of what we’re trying to do, and building standardized, machine-readable meaning into all of this, so machines can actually participate.

It means making sure that we manage change over time, that we don’t break things just because, that we don’t kill things off because we don’t like them anymore. We keep adding to the network, and that can be difficult for architects, because we’d like to imagine everything’s a greenfield. But in truth, we still have all sorts of leftovers, and they’ll always be there. You know, I’ve got my little toe, and my tonsils, and my appendix, and my lizard brain. It’s really easier to leave them alone than to try to extract them, because it really messes with the system when you do. It can be costly. Surgery is dangerous.

And then there’s this idea of discovery, creating service-level discovery, so that I can design my machines to go find what they’re looking for. And not just when somebody moves, but when there’s a new player in the room, I might be able to interact with that player, and we can create a business. We can create an opportunity. And that’s a lot. Right, that’s a lot of things to think about. The good news is, we’ve thought about a lot of these over and over. We can stand on the shoulders of giants. We can make it on this journey.

And what we have to think about is taking that first step. We have to start thinking today about what it would be like if we were doing this. Let’s start acting like it’s possible. Now what does it look like? One of the things that Alan Kay says… Alan Kay is credited with giving us a way to think about object-oriented programming, and he created a thing called the Dynabook, which we now recognize in laptops and e-readers. He says, "The best way to predict the future is to invent it."

And it’s sort of like the techie version of the quote attributed to Gandhi: "Be what you wanna see in others. Be the future. Start to act like the future." So if we take this journey, if we go along with this and start to explore the map and find out what’s there, and we figure out what it’s like over here, what it’s like to be acting like this, what it’s like to be doing our own discovery, what it’s like to separate these vocabularies, then we get a chance to learn. We get a chance to understand.

And when we get that chance, we start discovering all sorts of other possibilities. And hopefully, what we’ve talked about today will give us some ideas. You’re probably already doing some of these things. And if we start to share what we’re doing, then we’re gonna be able to discover lots of RESTful microservices on the Web. So, thank you very much. Thanks, thanks.