The Challenges of Scaling Your Business With Slack’s Allan Leinwand
Jon Prial: Maybe you're talking business with a colleague, or maybe you're just chatting with a friend or a family member, but no matter the case, have you ever said, "Gee, I wish I had that problem"? Sometimes it's said facetiously, sometimes not, but there really are problems that people wish they had. Today, our show is focused on one of those. It's a problem that you all wish you had. What's that you say, Jon? Well, today we're going to be talking about how to deal with the challenge of scaling your business. I'm right, am I not? I'm Jon Prial. Welcome to the Georgian Impact Podcast. With us today is Allan Leinwand, Senior VP of Engineering at Slack. This is a person who knows all about scale, and I'm thrilled to be talking with him. Allan was an outstanding keynote speaker at our recent annual Scale Tech Conference. At Scale Tech, some of the world's most experienced product and tech leaders share hard-won secrets on startup growth, product direction, and business strategy. Allan, Slack is not the only company you've helped scale. Take us through a bit of your background and some of the growth challenges you've seen.
Allan Leinwand: Thanks, Jon. Scaling companies, scaling businesses, and scaling engineering teams is really what lights me up in the morning. So you're right, I've been lucky enough to be involved in a number of companies that have grown to fairly large scale. I started my career at Hewlett-Packard, which was already at scale when I joined. After a couple of years at HP, I went over to Cisco Systems, where I was very early on and watched that business just boom as we were building this little thing called the internet. After that, I went to a company called Digital Island, where I was one of the founding team members, and we scaled that company from myself and a few friends up to 1,800 people as we continued to grow it. That was an internet service provider and content distribution network company. Along the way, I did a couple of other startups and scaled those up as well, but probably the biggest scale challenge of my career was scaling up Zynga, the gaming company, back in the heyday of Mafia Wars, FarmVille, and Words With Friends. And you smile, because I can already picture the Facebook feed we were filling up at that time. After spending a number of years at Zynga, I went over to ServiceNow, where I was in charge of scaling up engineering teams for that enterprise software company, primarily around workflow and workflow automation. And finally, I've been here at Slack for just about two years now, scaling up the engineering teams to handle workflow, workplace collaboration, and everything we do here at Slack.
Jon Prial: That's great. So I'd like to go through some of the key components of scale and maybe help people prioritize where to focus. Companies have an application infrastructure, and that obviously needs to be coded without bottlenecks. But the application runs on an underlying layer of middleware and infrastructure software, which could be running on a third-party provider, in-house, or a hybrid. So let me lay out two scenarios for you to talk about. The first is a rush that a system needs to handle: a holiday rush, or maybe I should say a pandemic rush, like the one hammering the likes of Netflix or Zoom. The second is a company that just continues to grow and grow; things are working, but now they're slowing down and might break. I think both of these scenarios are real and quite different. My sense is that the first one is doable, although not trivial. It seems to be a known problem. Do you agree that the first problem can reasonably be attacked?
Allan Leinwand: Yeah, I do agree that you can attack that, but I think you have to have the right mindset. One of the things I've been telling teams for years is that you have to plan for an order-of-magnitude change in everything you're building. That's a different mindset from "let's just build it and get it out the door." Yes, you have to build it and get it out the door, but you have to do that with the architectural knowledge that this thing could blow up 10X or 100X before you know it. And the last thing you want to be doing is putting the proverbial duct tape on things as you're trying to scale. So you always have to plan for that order-of-magnitude change. I remember when I got to ServiceNow, and I don't remember the exact numbers, we had, call it, N customers at the time. I remember standing in front of our engineering leadership and saying, "Let's imagine we have 100 times N." Literally, jaws were on the floor, and people were saying, "That's not going to happen. That's just crazy sauce. What are you talking about?" Two years later, I stood in front of that same team and said, "We're at 120X. What do you think?" What that meant was that we had to start thinking about it. How do we scale the networking layer? How do we scale the architecture? How do we scale the database queries? Every step along the way. One way to do that, I always tell people, is that you have to be able to trace the entire application across the various components you described. I like to call it "from the floor to the app stack." You have to know what's happening at the physical layer, at the networking layer, on the systems side, on the database side, at the caching tier, and in the application logic. You can't be writing software that can't be tested.
You have to make sure you load test things at every layer of the stack to confirm you can handle that scale, and all of that takes time. So, don't get me wrong. I don't think two people in a basement trying to come up with the next great app need to think about scaling to 10 million or 100 million people, but they can't design themselves into a corner that prevents them from doing that in the future. That's the guidance I give people: imagine a time and a place. You don't have to build it now, but don't build yourself into a corner. That usually leads to further conversations about how you do that. Well, let's talk about horizontal scaling. Let's talk about socket layers that might need to grow. Let's talk about constructs you're building in the app infrastructure, in the boot process, or in how systems connect. Let's talk about how clients connect into the backend system, and what happens if those clients are now across the globe as opposed to in a particular region. How do you think through those problems? I just think getting the mental synapses firing on that order of magnitude, or two orders of magnitude, of scale is what allows you to solve the problem. There isn't a magic bullet. There are lots and lots of things you have to think about along the way, but you have to have the right mindset.
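Allan's advice to plan for one or two orders of magnitude of growth can be made concrete with back-of-the-envelope capacity arithmetic for a horizontally scaled tier. The sketch below is a hypothetical illustration, not Slack's or ServiceNow's method: it assumes a stateless service whose capacity grows linearly with node count, and the traffic figure, the per-node throughput (the kind of number a load test would give you), and the 30% headroom are all invented for the example.

```python
import math

def nodes_needed(peak_rps: float, per_node_rps: float, headroom: float = 0.3) -> int:
    """Return how many identical nodes are needed to serve peak_rps while
    keeping a `headroom` fraction of each node's capacity in reserve."""
    usable_per_node = per_node_rps * (1 - headroom)
    return math.ceil(peak_rps / usable_per_node)

# Hypothetical figures for illustration only: today's peak load and the
# throughput one node sustained in a load test.
current_peak_rps = 2_000
per_node_rps = 500

# Plan for one and two orders of magnitude of growth, not just today's load.
for multiplier in (1, 10, 100):
    target_rps = current_peak_rps * multiplier
    print(f"{multiplier:>3}x: {target_rps:>7} rps -> {nodes_needed(target_rps, per_node_rps)} nodes")
```

With these made-up numbers the tier grows from 6 nodes to 58 and then to 572. The arithmetic is trivial; the harder question the conversation points at is whether anything in the design, such as a single database primary or a socket layer that can't grow, caps you long before you reach that node count.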
Jon Prial: Right. So, on that second case, the company that keeps growing and growing, maybe painfully. I like your point that you've really got to plan ahead. But as you grow and grow, some insidious things can happen, especially if you take your eye off the ball. So I guess my question is, is there ever a time when a team could take their eye off scalability just for a moment, or is it always something to be focused on?
Allan Leinwand: I think it's a problem that you want to have. And I think if you're not thinking about scaling, it's something that will eventually come back and bite you in a couple of different ways. One of the things I tell engineering teams a lot is that this is the way to run your engineering team. You wake up in the morning and you think, what are the customer issues I need to fix right now? Are there hot, burning fires? The second thing you want to ask in the morning is, are there remedial fixes or postmortem action items from those customer incidents that you need to look at and fix? And then, lastly, you need to build an incredible product. Unfortunately, engineering teams tend to think about it in reverse. They wake up in the morning and think about the next innovative feature they want to build. When there is a fire, engineers are great. They run to the fire, they make the diving catch, they solve the problem, but they forget about all those action items. That's what ends up building up tech debt, and that's usually where things break at scale. So what I'm trying to get at is that you need a philosophy, a way to think about how you scale, whether it's implicit or explicit. You can say, "We're always scaling," but then you ask, okay, what does that really mean? What it really means is that when something breaks because of scale or because of growth, you make sure it doesn't happen again. How do you make sure it doesn't happen again? Maybe it's solving the scale problem, maybe it's rearchitecting, maybe it's buying a different service, maybe it's considering a different provider, maybe it's thinking about a different way to implement it.
And if you do that right, if you actually solve those customer problems, scale-related or not, and make sure they don't happen again, you can spend all your time innovating, because you're not constantly caught in the interrupt cycle of firefighting mode.
Jon Prial: Right. And you mentioned a couple of programmers in the garage, and obviously their primary thought at the moment is features and functions, getting this product out the door. Clearly, that's necessary, but not sufficient.
Allan Leinwand: Well, you and I were talking just before we got started about our love of history, so I'll paraphrase. There's the Eisenhower line, "No good plan survives contact with the enemy," and I'd paraphrase it by saying that no good product survives contact with the customer. I say that entirely facetiously. Of course you want customers to have contact with your product, and you want to be able to see it scale. So yes, when you're first building a product, you think about your priorities: fix customer issues, make sure they don't happen again, innovate. If you've got no customers, it's all innovation. But the second you get your product out there and start getting feedback and seeing issues, I think you have to come back to that mnemonic and figure out how to plan for that, how to actually grow, and how to build that scalability along the way.
Jon Prial: Interesting. So let me frame my next question by declaring something. To me, and to Georgian, security is one of those things that should not be an afterthought. As we talk about organizations, we often say it can't be relegated to just a few people. It's got to be company-wide, it's got to come from the top, and it's got to spread through the organization. Does scalability work this way, or can a small team of experts be in place to support the rest of the teams?
Allan Leinwand: No, I think scalability works the same way. You need to have everyone thinking about scale. Not everyone's going to be a scalability expert, just like not everyone's going to be a security expert. But I don't think you can have teams that ignore scale and still be successful. Like I said, they might not have to be scalability experts; they might not know about horizontal scaling or defense in depth and things like that. But you can't have a team building a piece of functionality that can't scale just because they're not scalability experts. I think you have to indoctrinate that into the entire organization and let everyone be a participant in that challenge.
Jon Prial: It really becomes culture. So, every vendor, and dare I say every investor, loves solutions that are sticky. But once a company has made a commitment to a big piece of tech, whether it's AWS, a service offering like Salesforce, or whatever, do you ever go back and look at those pieces in light of scalability, or are you locked in forever? How do you reassess where you are?
Allan Leinwand: I think it's our job to always challenge those assumptions and make sure we're taking a look at those pieces over time. Yes, we do make vendor selections. We use a number of different products, including a number of open source products. But the mindset in our world, at least, is to take a hard look at everything every six months and assess whether it's scaling the way we want. We just went through a big backend database migration. Historically, we used what's called master-master SQL replication, and we made a conscious decision that it wasn't scaling for us. So we moved to an open source system called Vitess, which allows us to scale and shard the database infrastructure far more granularly and far more easily for our environment. That was an 18-month to two-year haul to go from a system we thought wasn't scaling with our needs to a system we think scales very nicely. I suspect that a year or two from now, we'll take a look at that and ask whether we're still scaling properly or whether we need to take another hard look and think it through. Those are architectural changes that we do talk about, and those are programs that we put key results and metrics around and drive to conclusion. Generally not as fast as anyone would like, but we get there.
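Allan mentions moving from master-master replication to Vitess so the database can be sharded more granularly. The sketch below illustrates the general idea of range-based sharding on a hashed key, loosely in the spirit of Vitess keyspace IDs. It is not Vitess's API or its actual hashing scheme: the SHA-256-based hash, the team-ID sharding key, and the shard counts are assumptions chosen for illustration. What it shows is why a range-based layout lets you split shards as you grow.

```python
import hashlib

# Keyspace IDs are treated as unsigned 64-bit integers; each shard owns one
# contiguous range of that space.
KEYSPACE_SIZE = 2**64

def keyspace_id(sharding_key: str) -> int:
    """Hash a sharding key (e.g. a hypothetical team ID) to a 64-bit keyspace ID."""
    digest = hashlib.sha256(sharding_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def shard_for(sharding_key: str, num_shards: int) -> int:
    """Return the index of the shard whose range contains the key's ID.

    Shard i owns the i-th of num_shards equal-width ranges of the keyspace.
    """
    return keyspace_id(sharding_key) * num_shards // KEYSPACE_SIZE

if __name__ == "__main__":
    for team_id in ("T0001", "T0002", "T0003", "T0004"):
        before, after = shard_for(team_id, 4), shard_for(team_id, 8)
        # Doubling the shard count splits each range in two, so a key in old
        # shard i always lands in new shard 2*i or 2*i + 1.
        assert after // 2 == before
        print(team_id, before, after)
```

Compare this with `hash(key) % num_shards`, where changing the shard count sends most keys to an unrelated shard. With contiguous ranges, growing from 4 to 8 shards means each existing shard splits in two, and rows only ever move into one of its halves.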
Jon Prial: Well, in reality, a couple of years does not seem long to me for a project of that size. In terms of investigating what you've got, I often think about machine learning models. They're going to get updated and revised, and you're going to be looking at more tools. But beyond something as simple as creating ML models with different techniques, what about the realization that, whoa, I really need to make a major shift? I need to go from ML to deep learning, for example. Is that a different thought process from the way you just talked about the database migration, or is it similar?
Allan Leinwand: I think it's pretty similar. Slack has the advantage, at least [inaudible] right now, of being a fairly young company. We've got a lot of technology that's fairly nascent, which means we're not entrenched in "we've been doing it on mainframes for 30 years," where it's a huge cultural shift to think it through. People are always experimenting and always playing. They're always trying to figure out what's next, and I love that part of the culture. I love having a culture that allows you to bring new ideas forward, and we've actually implemented that internally. The Internet Engineering Task Force has a process called RFC, Request for Comments. Basically, you can write something up, submit it to the corpus, and have people comment on it and spark ideas. We run a very similar RFC process with our architecture teams inside Slack, and it has generated some really cool ideas and some really nice work that helps us continue to scale and be innovative. I think you have to have something in the culture to do that. Otherwise, engineers end up in an environment where they feel like they're on older technology, they're not growing, and they're not advancing their careers or thinking about how to be innovative. You always want a team that's thinking about innovation along [inaudible 00:13:37] have the customer issues resolved.
Jon Prial: And giving them something exciting to focus on. I like that thought, for getting your team motivated and happy. A couple of times throughout the interview, you've talked about technical debt. I can see lots of nodding heads in our audience, and there's an understanding of those trade-offs. I'd like to move away from the technical side and talk a little about non-technical scaling, and a bit about what happens outside your direct engineering oversight. Nothing's siloed anymore, so let me start by getting your take on product management: how should this team think about scaling the product, and about the trade-offs of satisfying one customer with a one-off versus building something that supports many? Again, how do they make sure they're not having negative impacts over time and getting caught up in the same problems again? We've touched on this already, but now it's a different part of the organization. How do you influence that thinking?
Allan Leinwand: Yeah, the way I do it, not being a product person but having a ton of respect for my partners over in product, is that I tend to focus on the more analytical side of things. I'm looking for the key product indicators, the key performance indicators, from product. As you know, at Slack we have lots of different surface areas of our product: messaging, calls, and Slack Connect, which allows businesses to connect with each other across our product. We look at those key performance indicators to understand which product features are landing. So the advice I would have on product, when you're thinking about scaling, is to run a lot of experiments, run a lot of trials, run a lot of A/B tests, and make sure you're getting statistically significant KPIs, so you can understand what is landing with the customer base. And you're right, you want to look across the whole range: in Slack's case, we have teams of two up to teams of hundreds of thousands of people. So we need to look across the entire customer base to understand what enables them and what is moving the needle in the right direction, from our perspective, in terms of them getting benefit from the product. Then we need to engineer solutions that match those behaviors and drive them forward. So, again, not being a product person, I would say: think hard about your KPIs, think hard about measuring them and getting statistically significant values from them, and then give that feedback directly to engineering, so they can build something that allows you to drive the product even faster.
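Allan's advice to get statistically significant KPIs out of A/B tests can be illustrated with a standard two-proportion z-test. This is a generic textbook test, not a description of Slack's experimentation tooling, which the conversation doesn't cover; the adoption metric, the counts, and the 5% threshold below are assumptions for the example.

```python
from __future__ import annotations

import math
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value). Uses the normal approximation, so it assumes
    reasonably large samples and a pooled rate strictly between 0 and 1.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical experiment: control (A) vs. variant (B) of some feature,
# measured by how many users in each arm adopted it.
z, p = two_proportion_z_test(conv_a=200, n_a=5_000, conv_b=260, n_b=5_000)
significant = p < 0.05  # conventional 5% threshold; choose yours before the test
print(f"z = {z:.2f}, p = {p:.4f}, significant: {significant}")
```

In this made-up case the variant's 5.2% adoption versus the control's 4.0% clears the threshold. In practice you would fix the sample size and threshold before running the experiment, because repeatedly checking the p-value until it dips below 0.05 inflates the false-positive rate.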
Jon Prial: Yeah. I think the linkage to engineering is really important. I'm thinking about a simple go-to-market scenario, for example contrasting a product-led, land-and-expand strategy with large enterprise sales. Do you look at those differently?
Allan Leinwand: We do. We look at them slightly differently, but we see them as virtuous. At least here at Slack, we see our self-service business, for people who are not in the enterprise sales funnel, as a way for people to come into the product, serve themselves, and get up to speed and running. We see that as a very good thing, and we want to pay attention to that cohort of users because it gives us great information about how to do something similar within the enterprise sales model. We do have the enterprise sales model as well. In the enterprise, for example, Amazon is a big customer of ours. They have, I don't know, a million-plus employees, and from our perspective, users. So that's just another million people who may never have seen Slack before. How do we think about landing and expanding that cohort in the same way we would the next million people coming in and buying with a credit card? From our perspective, it's similar learnings. Of course, on the enterprise side, there are different security concerns, compliance concerns, and some other things that layer on top, but the growth curves are very, very similar.
Jon Prial: Interesting. I mentioned earlier that security is an exemplar of something that shouldn't be an afterthought. There are many reasons for that, but the biggest one is that if you get it wrong and break trust, it's difficult to recover from. I now think about scalability in the same light: if you have outages or slowdowns, you really do have a challenge. I'd like to wrap up our discussion today by getting your thoughts on trust, maybe in two parts: between your team and you, and between a company and its customers.
Allan Leinwand: Yeah. I think the answer to both is the same, believe it or not, and that's transparency. The one thing I do with my team, and that I hope they do back to me, is be as transparent and as clear as possible. I'm a big fan of Brené Brown, and she says, "Clarity is kindness." I think being very clear with team members, being very clear about what your objectives are, and having objectives that are measurable, so you can say whether you hit them, yes or no, is super important. Similarly, everyone has incidents; computers and humans are involved, and you put gray matter next to a computer and you're going to have a problem. When you do have an incident, the worst thing you can do for a customer is to try to gloss over it, to make it out to be less of an issue than it was. It was a customer incident; they felt it, and they're upset about it. You need to go in and be as transparent as you can be, without going through every detail down to the Nth level, because honestly they don't really care about that. What the customer wants to hear is as follows, and this is what we have to be transparent about: we know what the issue was, we have remediated it and it won't happen again, and here are the steps we're taking to prevent a future recurrence. Be transparent on those things: here's what happened, here's the mistake that was made. Maybe it was a technology failure, maybe it was a process failure, maybe we had a human error; something very clearly broke. We know we fixed it. That's what the customer wants to hear. They want to know how they can be sure they're not skating on thin ice and about to fall through again. So, have we remediated that issue, and have we gotten it to a point where it's not going to happen again? And the last one is: what are the action items to make sure it doesn't happen again, with specific dates for when you'll deliver them, and then follow through.
And I think if you do that and are transparent about those things, you can rebuild that trust. But the worst thing you can do is break customer trust, because that adds friction to the sales cycle, the marketing cycle, employee retention, and recruiting new folks into the company. It's just the worst thing you can do.
Jon Prial: That is a cascading [inaudible] of problems. Allan, this was just a fabulous discussion. It's so enlightening, and I think everyone's going to love what they hear from you. Thanks again for speaking at the conference, and thanks for talking to us on the podcast. It's been a pleasure.
Allan Leinwand: Thank you so much.
Jon Prial: Those three points around transparency affect security, affect trust, and affect the issues you might have along the way in all of your work with your customers. I just want to end on that: don't forget those three points. Again, my thanks to Allan for an insightful conversation. Thanks for listening. For the Georgian Impact Podcast, I'm Jon Prial.