TRANSCRIPT: UX Week 2010 – Ben Fry
Computational Information Design
All right, thank you, Peter. Thanks very much for having me. This is fun. Good afternoon. You try going after the guy who invented giving a shit about design and the Web.
So as Peter mentioned, I’ll talk a bit about data visualization. I spend a lot of time thinking about numbers and such.
I last lived in San Francisco in the Bay Area about 12 or 13 years ago, and was working as an interface designer at Netscape, may they rest in peace. And I basically had grown up interested in design and computer science as sort of separate things since being fairly young. And I figured that UI design was kinda the way to use those powers for good, sort of save people from these awful computers, and make software easier to use and more tractable and such.
I was at Netscape for a bit and then left as they began imploding, and I went to the Media Lab at MIT. And one of my first projects when I got to MIT was actually looking at sorta UIs that actually hate people. So basically, as a cathartic way of kind of getting all this “try to help people” UI designer stuff outta my system, I instead made this painting software where each of the colors actually had a different behavior – sort of clear that. And then take an antagonistic painting program, so I’m trying to draw a line.
You know, the yellow – it’s actually only gonna show me the mark before the mark that I’m drawing now.
This one, we’re sort of upside-down, thank you very much.
Or here we’re actually just erasing.
Yes, thank you. As designers, you’ve always wanted to add this button to your software, I would assume.
But this was very cathartic, and this was nice to kind of get out of my system.
As a designer, I began working sort of in design doing graphic design work. We do these information graphics, sort of tens and hundreds of data points, something you can actually sit down with Adobe Illustrator and lay out all by hand. And then moving – once you add code, it kinda becomes this data visualization thing of thousands and probably millions of data points that you’re trying to explain to people.
And so part of the wonderful job security in this is that we’ll never actually have less data, so there is no going back. Everyone talks about, like, there’s more data and more data, and we’re – you know, the average person has to deal with 16 thousand billion terabytes, petabytes, whatever, per day just in their newspaper, etc., etc. That’s not going backwards, and so it’s this magnificent job security for me.
But first, let’s begin with a story about sort of looking at data. So I really enjoy football. I actually really like football. And that’s not playing football, mind you. I’m not actually coordinated enough to throw, much less catch, a football. But a terrific sort of thing to watch because it’s basically – couldn’t be more opposite from any of the work that I do as far as thinking about numbers and images. I mean, look how much this guy is enjoying it. The ref right there, he’s just like – he’s ready to go.
One of the teams that I follow is this – is University of Michigan. I grew up in Ann Arbor. And they have this wide receiver, Mario Manningham. Just kind of an idiot, kind of a loudmouth, kind of a – just not a great person necessarily.
And I was curious about – and so the thing about wide receivers is that these are the guys you’re actually gonna hear about in the press. So Terrell Owens making all kinds of noise – you know, wide receiver. Chad Ochocinco, formerly Chad Johnson, whose number is 85, and so he changed his last name to Ochocinco. Don’t tell him that that’s not actually 85 in Spanish, but…
You know, these sorta loudmouth kind of characters.
And so one day I was reading about this story about Mario Manningham in particular, and he had scored a six on the Wonderlic test. And so the Wonderlic test is this thing, and it’s basically an intelligence test. A 20 on the Wonderlic means you’re average intelligence, sort of an IQ of about 100. It’s a – what is it? It’s a 50-question test taken in 12 minutes.
And so all football players – you know, so football players, intelligence tests, those go together, right? Perfect. So all football players take this as part of the NFL Combine, which is all of the college players go and they do all of these various things. They run a sprint, do some jumps. They do all this stuff, and then after they’ve done all these athletic things that they’ve been tested on, then they sit down and take this intelligence test, which is also just kind of a wonderful picture. I picture, like, a great big class of lawyers sitting down and taking the bar exam, and then afterwards they go run a 5K. And it’s not so much, like, “Does it make you a better lawyer?” any more than the intelligence test makes you a better football player, but hey, it’s one more data point, of course.
And so I was curious about how that actually works across different positions. And so on Wikipedia there’s a rundown of what the different scores are for different positions. And so starting with a diagram from Wikipedia, sorta do the basic design stuff and kinda clean things up a little bit. Here, I just cleaned up the colors and the layout and all that, and added the numbers to each of the positions. So guess who the 17s are out on the outside here? Our beloved wide receivers. And so, also, we then actually just sized the dots based on those numbers, and then actually, we don’t actually have to show the numbers anymore. We can actually just put the positions back in.
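A minimal sketch of the dot sizing described above, assuming the dots are scaled so that area (rather than radius) is proportional to the score; the talk does not say which was used. Only the wide receivers' 17 is quoted in the talk. The other values and the position abbreviations are illustrative placeholders chosen to match the ordering described, and `dot_radius` is a hypothetical helper name.

```python
import math

# Illustrative average scores by position. Only the 17 for wide receivers (WR)
# is quoted in the talk; the rest are placeholder values, not data from it.
scores = {"OT": 26, "C": 25, "QB": 24, "WR": 17}


def dot_radius(score, max_score, max_radius=20.0):
    """Return a radius such that the dot's area is proportional to score."""
    return max_radius * math.sqrt(score / max_score)


max_score = max(scores.values())
radii = {position: dot_radius(score, max_score) for position, score in scores.items()}

for position, radius in radii.items():
    print(f"{position}: radius {radius:.1f}")
```

Scaling the radius directly would exaggerate differences, because the eye reads area; the square root keeps a score twice as large looking twice as big.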
So the wonderful sort of thing that comes out of this – so I’ve just taken this very simple set of numbers, just plotted it out, and what I can see is that QB, the quarterback there, isn’t actually the smartest guy in the field. It’s actually – the smartest guy is his center, right in front of him, and these two guys on the outside, who are basically most in charge of preventing him from getting killed. And so they have to adjust to a large number of situations, as opposed to the guys on this side. In red, we have the defense. They’re just programmed to kill.
They just need to go and attack and all that.
And so this is a really fascinating sort of story. And I think it’s interesting to take data sets like this, and especially for an audience who’s not necessarily even into this or hadn’t necessarily heard of it, but instead how can you actually engage people in data in ways like that.
Sometimes it’s actually just nice to see the data. So this is all 26 million road segments from the entire U.S., so just plotted out. The wonderful thing that happens – so the first is the obvious things of – so here’s Detroit and Chicago, and so things are gonna be exceptionally dense in that area. Here’s the Bay Area, so you can kinda pick out San Francisco there, and then moving down the Bay, and the way that things change as it heads into the mountains. This is Kansas City, so much more gridded.
But my favorite – so here’s the Appalachian Mountains, and basically defined by the roads avoiding them. So I haven’t actually done anything to include geography or anything like that. Instead, just showing the data actually brings that out. Like, it actually just – this extra layer actually kind of hops directly out of the information. Just sometimes data will kind of give that to you.
Because I can’t actually get degrees nor clients doing football plots or street maps, I do a lot of work in genetic data, so my Ph.D. work had to do with genetics. One of the typical things you do with working with DNA is, we just wanna see it. We wanna be able to browse through it. We wanna find a particular region, study it, see what sort of data is happening in that area.
This is the UCSC Genome Browser. This is looking at 160 base pairs of DNA; this is 10,000; this is 600,000. And the thing about this – this is very typical of data-oriented design – is that you sorta treat everything as “Well, it’s a bunch of start-and-stop positions; it’s all distance along a chromosome.” So we can just put these points on a line because they’re developers, and so they’re thinking about it in terms of, well, it’s just a bunch of starts and stops, and they’ve kind of gone up this level of abstraction instead of saying, “Well, how are people actually gonna use that data?” So at 600,000 base pairs, do I actually need all of these elements? Like, what’s actually relevant on each of these parts?
And then – so you were thinking I was Mr. Smart Designer – I started with – so the Powers of Ten. So everybody in design class has seen Charles and Ray Eames, Powers of Ten, or even those of you who aren’t designers, you’ve probably seen it.
And so it’s this wonderful thing. You’re moving through orders of magnitude, from 9 to 2. We can kind of zoom in. And this is the story of working with the genome, right? That, really, when scientists are dealing with it, you start out at 3 billion base pairs of DNA. They’re gonna do most of their work with about half a million to a million base pairs. You need to be able to zoom in to about 50,000 of these base pairs, just these letters of A’s and C’s and G’s and T’s. And you also need to get all the way from that 1 million scale down to individual letters. So I said, “Great. I’ll just kinda build out this browser that actually lets you sorta zoom through this information.”
And so here it is actually up and running. You can zoom in. We’re on a particular gene now. Now it’s kind of zoomed into individual letters. This is actually a terrible way to do a browser. However, Nick Nolte here is using it to figure out, in the movie The Hulk, why his son keeps turning green and destroying buildings and so on.
So for your average scientist, it’s actually gonna be a great deal of seasickness because you’re sort of doing a lot of time sort of going through these layers of zoom. But in fact, you don’t actually learn anything from doing that zooming.
And so instead, a better solution is basically I wanna see all of that information at the same time. I wanna see the forest and the trees simultaneously. So up at the top here I’ve got these 600,000 letters, in the middle I’ve got 10,000, and then down at the bottom I have 160 individual base pairs.
And what I’ve done is that, up at the top, this is my region of interest. I have a couple little tick marks that say something interesting is happening there. And I can hone right in to just one of these little tick marks all the way down to an individual letter that might be a – the scientific term is a “causative allele for selection” as far as – I’ll spare you doing 18 minutes of a genetics lesson.
Basic idea: I wanna be able to get through these layers of detail, and I care about very different things on each of those levels.
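The three simultaneous views (600,000, then 10,000, then 160 base pairs) can be sketched as nested windows centered on one position of interest. This is only an illustration of the idea, not the browser's actual code: `nested_windows`, the focus position, and the sequence length are all hypothetical.

```python
def nested_windows(focus, widths, sequence_length):
    """Return one (start, end) window per width, each centered on focus.

    Windows are clamped so they never run off either end of the sequence.
    """
    windows = []
    for width in widths:
        width = min(width, sequence_length)
        start = focus - width // 2
        start = max(0, min(start, sequence_length - width))
        windows.append((start, start + width))
    return windows


# Hypothetical focus position on a hypothetical 3,000,000-letter region.
views = nested_windows(focus=1_000_000,
                       widths=[600_000, 10_000, 160],
                       sequence_length=3_000_000)

for start, end in views:
    print(f"{end - start:>7} base pairs: {start} to {end}")
```

Because every window shares the same center, each view is a strict zoom of the one above it, which is what lets the forest and the trees stay on screen together.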
This was a more advanced version based on the same browsing framework. Instead, here we’re looking at 12 different mammals and how similar they are to humans, so there’s this fascinating stuff where basically those pink plots there, that’s percentage of similarity from human all the way across chimp and dog and armadillo and etc. So we have this amazing sort of level of similarity with all these other species, and so it’s really quite interesting and fun to work with this data in this fashion.
So this is the tool version of it, and then this is an illustration I did for a magazine actually using that data. So real data – you can sort of see that. But instead just something that looks more – just looks interesting, feels good, feels like DNA, sort of this sorta spatial kind of thing.
And this is one of the things I try to do with my work in general, is be able to move back and forth between these more practical things of sort of tools for scientists that allow you to really analyze and work with the data. But on the other hand, how can you do things that are sorta more purely visual or more evocative, more sort of further out kind of things? And it just winds up being helpful, because if you do too much of one or the other, you kinda get stuck.
And this is a common thing that comes up, especially with data visualization work, and to a degree with design, this sort of balance of aesthetics and function, and that they’re treated as this sort of Cain and Abel type of battle between aesthetics and function and who’s gonna win out and all that.
And so perhaps better, we can think about things as a bit more like a spectrum. So can we move back and forth? The thing is, that’s not even quite the right way to look at it, that it’s maybe more we can have, like, an axis or something. So we’re further up the aesthetics axis on a particular piece; we’re further over this way on the function axis. But they’re not mutually exclusive, essentially.
But really, the main takeaway, though, is those aren’t actually the things that are gonna be most impactful on the project itself. And so much more often it’s gonna be your audience for whom the piece is created, actual context of use, the time that you have to actually implement the project. So there are all these other factors that are gonna have a much greater impact in terms of the way people work with the piece that you create. And sunspots.
Another illustration. This is looking at the chimp. So one of the main genes that’s different between us and chimps is this gene called FOXP2. And FOXP2 is believed to be connected to language acquisition, that essentially that’s one of the main differences between us and chimps within that gene. It’s about 75,000 letters of DNA, and there are just 9 single-letter positions that account for the actual functional differences between us and chimps. So among the 3 billion base pairs of DNA that we share with chimps, and amongst the 20,000 different genes that we all share, within this one gene there’s this 75,000-letter chunk of data. And of that, there are 9 letters that may essentially be the difference between us having language and their language being significantly more primitive.
And so this is a fairly simple – so this poster essentially shows all of those letters actually plotted out, and then it just highlights the different locations where those take place. And so, again, can you take this data set, or data sets like this, and be able to tell a story about what’s actually in that information?
And then this is back in the scientific tools side. This is the big-boy version used by the scientists to actually track down this type of data within – so this was a project I worked on with some collaborators at MIT and Harvard – basically as a way to track down these different areas, and try and find regions that are under selection.
So Processing. One of the other main goals of my own work is in basically getting more people creating things with code. So to that end, Casey Reas and I have started this project. It’s called Processing. It’s a free and open-source programming environment. The whole idea is to kinda make it easy to get up and running making visual things.
So this is Processing. I can – we wanted to be able to write a line of code and hit Run, and hopefully have something show up on the screen. Or if I add a couple more lines. “Good God, he’s coding in front of a bunch of designers.”
So here, a very simple interactive thing. This is just following the mouse. And instead, let’s see. Let’s set the fill to – I’m sorry, let’s set the stroke to the mouse position divided by 2. Let’s change a stroke – wait. And so on. So we shift the color a little bit.
And so what we wanted to do was get people up and running quickly with code and have a – put all the fun stuff at the beginning so that, later, they can actually – once we have them hooked, that they can expand out into other endeavors.
It’s Java-based. And I won’t go through this whole slide, but the whole idea, it’s freely available – Mac, Windows, Linux. You can download it from Processing.org. And we have a really wonderful set of projects that have been created with it at Processing.org/Exhibition. I encourage you to check them out. I use it for visualization work, and then there’s a lot of – Casey Reas uses it for interactive artworks. And then there’s a whole range of things that people have done.
This is the growth of the project. It leads through this past February. And so Casey and I have been doing this project, and this is sort of fun and also a bit terrifying. As we look at sort of number of users per week actually using the software, we’re at about 25,000 a week.
One of the terrific things that hops out of this data is that nobody likes to code over Christmas. So here in January, we have this total downturn and also this sorta sloping thing that happens. This is heading into August, and then everybody comes back, and it’s “Oh, September. Time to actually work.”
And so this has been kinda fascinating to watch this grow. There are a number of books about the software. Most recently, Casey and I did a really small, very basic book that we wanted to make it really easy for anybody to get started, have it be a really inexpensive book. You don’t make any money on books anyway.
And actually, so I put this in to remind myself, but I have a bunch of little cards that O’Reilly gave me. So you can get, like, the e-book for, like, $5.00 or something like that. I think this means I actually have to pay O’Reilly money rather than actually, like, me actually making my nickel on it or something like that.
But I encourage you to check it out, ’cause the whole premise for this – that Casey and I’s grand scheme, grand goal in all this is that we’re really trying to ruin your career. We’re trying to get more designers to actually start doing programming, and more programmers to start doing design work. And we’ve actually been kind of successful with that, with a few cases. And it’s really wonderful watching people kinda making this transition and kind of working out different skill sets.
Let’s see. A personal project. This is looking at Charles Darwin’s Origin of Species. And so I got interested in this because a friend who worked in genetics was telling me about the fact that Origin of Species actually changed an enormous amount over the course of Darwin’s life, so I went and started looking into it. It went from about 150,000 words in the first edition that he wrote, up to about 190,000 words in the sixth edition – the sixth English edition that he wrote before his death.
And so I like that – I think that’s fascinating for a number of reasons. One is that we typically think of – particularly outside of science, you think of, like, scientific ideas as these things that kind of – like, way out. The theory of evolution: Darwin went up on a mountain, and he figured it out, and then he brought clay tablets down and kind of gave it to science, and then that’s it, evolution’s figured out. Except for the American right.
But one of the things – so instead, we can actually see Darwin himself actually kind of struggling with these different ideas. So this is a basic interactive viewer. Loads in all six editions. You can see them over the left-hand side. Blue is an addition; red is deletion. So basically, here we have Darwin plus Track Changes, so we can see –
– just interactively kind of flip through the book and kind of get a sense of the types of things that changed over time.
The final version instead looks something more like this. Where here I just wanted to show a composite of the book for an exhibition. So we start out with the entire text, so here’s all 150,000 words done in a sort of half-pixel font, just kind of Greeked in. With the mouse, you can actually read different portions of the text. And over time, it’s simply adding these other editions to the Greeked portion up here. And so we can actually – once this is built – has finished its animation, we can see, for any given word, where that actually came from and what the provenance of that was.
So we can pick out things like – so one of the things that Darwin ran into trouble with was he didn’t actually – he didn’t talk about God enough in the first edition, and so this was a considerable point of concern given the sort of attitudes of the day.
And so we can see in the first edition where he said, you know, “There’s grandeur in this view of life, with its several powers, having been originally breathed into a few forms or into one; and that, whilst the planet,” blah, blah, blah. But in the second edition, in that closing paragraph he adds this extra “by the Creator,” so he can actually kinda cover himself a little bit in terms of how people were feeling about this theory of evolution that kinda seemed to leave God out of the whole equation as far as who was making what and who was responsible for the incredible variation you see between different plants and animals and so on.
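The "Darwin plus Track Changes" idea can be sketched with a word-level diff. This is a minimal illustration using Python's standard `difflib`, not the method used in the actual piece; `word_changes` is a hypothetical helper, and the two strings are just the fragments of the closing paragraph quoted above.

```python
from difflib import SequenceMatcher


def word_changes(old_text, new_text):
    """Return a list of ("added" | "deleted", words) between two texts."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" is reported as a deletion followed by an addition.
        if tag in ("delete", "replace"):
            changes.append(("deleted", " ".join(old_words[i1:i2])))
        if tag in ("insert", "replace"):
            changes.append(("added", " ".join(new_words[j1:j2])))
    return changes


first = "having been originally breathed into a few forms or into one"
second = "having been originally breathed by the Creator into a few forms or into one"

print(word_changes(first, second))
```

In a viewer like the one described, the "added" spans would be drawn in blue and the "deleted" spans in red.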
Actually, we’ll move on. This is online on my site. You can actually check it out there, with a little bit more time. Let’s see here.
So backing up a little bit, this also brings to mind a certain process of how you develop this sort of work. So as far as starting with data, what typically happens is that I have this pile of information. I need to be able to acquire it, parse through it, filter it, mine it. So this is kinda the computer science and math side of things. And then it gets kinda thrown over the wall to people doing graphic design and interaction design and visualization and that sorta thing. And instead, this is a terrible way of actually working with information because of the way that each of these different parts actually inform one another.
Typically, in practice I’ll start by sort of jumping through a couple of these steps. So for instance, doing that initial piece with the sort of Darwin and Track Changes. But then once you’ve done that, you kinda work backwards and say, “Well, what’s the question? What’s the story that you can actually pull out of this?” And then once you’ve done that, the important part is this iteration step of the interaction – the way the interaction work is gonna affect how you do the data-mining portion and so on, and so basically you can’t really separate these things. And so really trying to look at things sort of from “have a data set” to how we actually understand it.
And this is a data set. So this is looking at some healthcare data. This was a client project for GE. This here – you think this is fascinating, but it actually goes on for another 160 columns, which gets better, and then 6 million rows. And so typically what you do is you kinda look at data like this, and there’s a tendency to say, “Okay, how do I make a picture of it?” instead of kind of “What’s the story? What am I trying to actually say about the data? Why did we collect it?” and then working back to the actual piece.
This was the piece that we created for them. Basically, it’s 6 million patient records from their electronic medical record database, and this is what it looks like. So of all of the people in the database, 97 percent do not have heart disease; 3 percent do. One percent have had a stroke. That’s the breakdown on smoking. And so very quickly – so this doesn’t feel like 6 million. We can actually get through the data in a very fluid way.
But also, it gets more interesting when we actually start comparing things. So one of the things about various conditions is that they don’t actually happen in isolation, so it’s all about correlations and comorbidities and things like that. But you can’t say that, and so instead, how can you actually demonstrate that to people and get them to start working with it?
So here, highlighting diabetes, you can see how in the database, 4 percent of people who have a normal body mass index have diabetes, and that rapidly goes up to 26 percent of the people who are morbidly obese who have diabetes. And so I can write a 1,000-word article about it, or I can actually just demonstrate it and get people hooked into the interaction of sort of flipping through it.
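The comparison above boils down to a share within each group. A minimal sketch, assuming each patient record carries a BMI category and a diabetes flag: the field names, the helper `share_with_condition`, and the six toy records are all made up for illustration and are not taken from the GE data.

```python
from collections import defaultdict


def share_with_condition(records, group_key, condition_key):
    """Return, per group, the percentage of records where the condition is true."""
    totals = defaultdict(int)
    with_condition = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record[condition_key]:
            with_condition[group] += 1
    return {group: 100.0 * with_condition[group] / totals[group] for group in totals}


# Toy records only; the real database has millions of rows.
records = [
    {"bmi": "normal", "diabetes": False},
    {"bmi": "normal", "diabetes": False},
    {"bmi": "normal", "diabetes": False},
    {"bmi": "normal", "diabetes": True},
    {"bmi": "morbidly obese", "diabetes": True},
    {"bmi": "morbidly obese", "diabetes": False},
]

shares = share_with_condition(records, "bmi", "diabetes")
print(shares)
```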
This is another looking at healthcare costs by age. So the angle of the wedge is the relative number of people with a particular condition. The area of the wedge is the overall cost for that condition. So at age 50, hypertension’s a big one. Back towards age 18, not so much; it’s actually asthma. And so we can actually just play with the data in a very fluid way to see what’s in that data set.
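One way to get a wedge whose angle encodes the number of people and whose area encodes cost is to solve the sector-area formula, area = angle × radius² / 2, for the radius. This is a sketch under the assumption that the wedges are circular sectors sharing one full circle; `wedge_geometry` and `area_per_cost` are hypothetical names, and the inputs are toy values.

```python
import math


def wedge_geometry(people, cost, total_people, area_per_cost=1.0):
    """Return (angle in radians, radius) for one wedge.

    The angle is the condition's share of all people; the radius is chosen
    so that the sector's area equals cost * area_per_cost.
    """
    angle = 2 * math.pi * people / total_people
    area = cost * area_per_cost
    radius = math.sqrt(2 * area / angle)
    return angle, radius


# Toy input: a quarter of the people, with a cost of pi area units.
angle, radius = wedge_geometry(people=1, cost=math.pi, total_people=4)
print(angle, radius)
```

A narrow wedge with a long radius then reads as few people at high cost, and a wide, short wedge as many people at low cost.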
And then most recently, this is one looking at aging populations, so each of these bars represents people of age – so here it’s 0 to 5 – I’m sorry, 0 to 4; 5 to 9; 10 to 14; and so on. So the fascinating thing is that Japan basically has this enormous cohort of people who are age 60 to 64, and over the next couple years, what does that actually mean for their economy, their healthcare system? So here we are going through the next couple years. And we actually wanna – how do we actually tell that story with that information? We can actually just play this back or actually look at it just straight from 1950 all the way through 2050. Over at the right, we have this composite. We can flip between different countries here and so on.
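The five-year bars (0 to 4, 5 to 9, 10 to 14, and so on) can be sketched as a simple binning step. The helper names and the list of ages are hypothetical toy values, not the population data shown in the piece.

```python
from collections import Counter


def cohort_label(age, width=5):
    """Return the label of the age bar that an age falls into."""
    low = (age // width) * width
    return f"{low} to {low + width - 1}"


def cohort_counts(ages, width=5):
    """Count how many people fall into each age bar."""
    return Counter(cohort_label(age, width) for age in ages)


# Toy ages only.
counts = cohort_counts([3, 7, 12, 61, 63, 64])
print(counts)
```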
This is also online. This is Healthymagination.com. And we’ve been doing projects like this to basically – how can you work with them, trying to understand some of this data?
More recently, I’ve been working for Google to actually look at how we can get the Processing software to run on – or actually be able to create things for Android devices. So this is actually the same piece, just up and running on Android. Here’s the cost piece. Everything kind of changes on mobile, and so you have to do a good bit of – sort of move things around, and how do people actually interact with it, and so on. We can talk about that some more later.
And then, finally, on the more – back to the more complicated end of the scale. So back to the genetics work that I showed, here’s a browser of the entire human genome, so actually running on a Google Nexus One phone. And this just works, so all 3 billion letters of human DNA, all 20,000, 25,000 genes, and you can actually just scan through it on a mobile device.
And this is – which is sort of an astonishing thing as far as, like, right now it’s an expensive sort of phone, but this is quickly becoming the norm. It’ll be, you know, $100.00 in a couple years. What does that mean for (a) the type of data that we can actually carry around, much less (b) sort of the healthcare side of what health information can we actually take along with us, and how does something mobile that just actually exists in our pocket that is owned by us?
So with that, I will close, and thanks very much.
[End of Audio]
Transcripts provided by Verbalink