Learning using AI and Conversational Data with Chorus.ai
John Prial: Before we begin this podcast, I want to congratulate the Chorus.ai team on their acquisition by ZoomInfo. We look forward to seeing even more good things happen. Human interaction, there's a lot we know, but a lot to learn. We've all heard the old saw about people looking at you with their arms folded across their chests. But there's so much more, and often we don't know what we're missing. We walk away from a conversation thinking everything was roses and sunshine, while in reality, we didn't address all of what we felt we had. Is there more learning to be had? And can it be had if we took advantage of AI and conversational data? In short, yes. Now, what's a task we're all familiar with, where conversation, the ability to lead that conversation, and the ability to listen well are crucial? It's selling. You see, calls could be openly recorded and you could learn a lot. You could leverage the success of a top seller. You could train new sellers. You could learn what works and what doesn't work, how often you're asking the right questions, how often you're hearing the same objections. Would you like to know whether your prospects are really excited, or whether they're just blowing smoke? Hey, did I mention you can learn a lot by analyzing data of this type? Let me state that again. You can learn a lot, and you can improve the quality of your sales force. This is the future, and there's some really cool tech at play behind the scenes. I've been talking about one of Georgian's investments, Chorus.ai. And I'm very excited, as we have a lot to learn today. I'm John Prial, and welcome to Georgian's Impact Podcast.
Raphael Cohen: My name is Raphael Cohen, and I'm the VP of Research at Chorus.ai. I'm in charge of Chorus' research team. We work on all the basic technology that takes conversations and turns them into data and information. And that means transcribing them, understanding who's speaking, making the transcripts beautiful and readable, looking at what's happening on the screen and making sense of that. Looking at what happens across organizations: how people are having conversations, what works for them, what doesn't work for them, how we can help them get better at conversations, and what metrics matter for sales organizations and customer success organizations. And in general, anything that is data- or machine learning-related is done in my team.
John Prial: Thanks Raphael. Now, I was pretty general in the introduction. Why don't you drill down a bit and take us through some use cases that are really important here.
Raphael Cohen: Chorus helps people and organizations get better at conversations. This is our mission: help people have better conversations. How do we do that? We help people record all of their interactions, whether it's Zoom meetings or phone calls or face-to-face meetings in a room which has a Zoom camera present. You can just record it. And we connect your meetings to your CRM data so you can understand them better in the context of the sales process. And then we help people consume this data very, very easily by extracting a lot of information about the call. Who is talking? What are they saying? What are the topics being discussed in different places? What is being presented on the screen? Is it a deck or a demo? You can jump to the right place where a customer is asking a question while mentioning a competitor during a specific slide. So we are unleashing information that was not available before, and we are connecting all of this to your conversations and your CRM. Finally, we are helping people personally get better at calls by giving them metrics on how they're doing. How long are your monologues? How long do you usually talk about your decks? And how long do you demo? You can see what other people are doing, what the top people in your company are doing.
John Prial: That's really neat. Help me understand how we could leverage their skills.
Raphael Cohen: You can emulate them. If your monologues are actually shorter than theirs, you can go and see what they're saying, how they're filling the time with those great explanations of what they're doing. And finally, we help companies really ask those questions from a bird's-eye view: what do our customers care about? How can we make our sales process better so we win more? What is the ultimate way of demoing? How long should it be? When should it start in the call? Should it start after five minutes, or is it better to get pain points in more detail first and start the demo after nine minutes? So we enable people to ask complicated questions and also see what is happening. If I'm telling my reps to sell my new feature, let's say we are releasing a feature called Chorus Momentum, I want to see that they're mentioning it in their calls. If I'm an enterprise company, I probably have multiple products. It's very easy for some people to just sell one of them. I want to see that they're mentioning the right product to the right people. So having the ability to understand your entire sales conversation process, and connect this to emails and CRM data and the content and the people, is what we are all about.
John Prial: It's so interesting that you're saying, are you spending enough time listening to the pain points before you jump to the demo? That's a really good example of how a sales rep could fail. You have to listen before you talk, right?
Raphael Cohen: You have to listen. You have to ask the right questions to understand what you're going to demo. If my customer cares about coaching, I would show them the coaching tool that we have in Chorus. If they care about onboarding, we would focus on how to create libraries of the best methodologies for winning, then how to show them the best clips very, very quickly, and how to measure what they're doing with that. So each customer has different pain points, and we want to understand them first, but we also need to balance that with the fact that the customer wants to see the actual product. And the sweet spot is not going to be the same place for every customer. So we don't believe in silver bullets. We are not going to tell you, you need to talk 45% of the call and then you're going to win, because that's not true. We see that some customers and some reps talk 70% of the call and they're the top people, because they're saying the right things, they're asking the right questions, they are answering them in depth. And other top reps may talk just 40% of the call because they are able to draw the customer out to share their most pressing pain points and needed solutions. So we don't believe in one size fits all. And this is how we build the product: we enable people to understand their sales processes.
John Prial: That was a great example of how one size just doesn't fit all. Wow, 70% versus 45%. And obviously you're deep into a process, the sales process. But in addition to the application-specific nature of what Chorus does, there are some of the basics: recording, transcribing. How much base tech is there that just gets you started?
Raphael Cohen: The basic task is to record all of the organization's calls and map them to the CRM, to understand who they're talking to. And this is the lowest layer of value that we provide our customers with. And this is actually complicated, because when you are going to record 100 million calls a year, it's hard to do right, especially as we grow and we serve bigger enterprise customers. So this is a complicated process of scraping people's calendars, identifying their calls, identifying which calls they want us to join and which calls they don't want us to join. Making sure we have enough agents to join the calls or stream the calls. Making sure that we are downloading the calls as needed. Now we are talking on Zoom, but other people using different videoconferencing tools or using a dialer require a different approach for streaming or recording their calls. So the engineering end of getting everything perfect is actually very complicated. It may seem like it's all simple from the outside, but we're talking about massive amounts of data, and also high variability in the way that different organizations work. So to have a good conversation intelligence tool, you have to solve those basic issues. Otherwise, almost every customer that you onboard will have issues. And this was one of the first things we had to tackle. I mean, when we started having customers five years ago, we started identifying all of these problems, and the engineering behind that, especially as we scale to 100 million hours of recordings per year, is becoming more and more complicated. And this is on the engineering side.
John Prial: I really hadn't thought about the integration with CRM, but that really does start at the foundation. It really is the basics. Then, is there a layer that follows post- meeting in terms of contextualizing what you heard, in terms of the sales cycle, the type of sales rep, BD versus account exec, sentiment analysis? What happens next?
Raphael Cohen: So during the call, once we are capturing the call, this happens live. I mean, we have the Chorus Notetaker on this meeting because it's your meeting. [crosstalk]
John Prial: I know Chorus, I've seen demos, but it's neat that you came on and here's Chorus Notetaker, which is really cool.
Raphael Cohen: People from my team can now go to their recording page in Chorus and actually start listening in to our conversation, because I left it open. And while it's happening, it's already being transcribed. So you can see the transcription, one or two seconds delayed usually. And you can already do things with it. See who's talking. Am I talking too much? What is my longest monologue? We are releasing a new app that is going to show you some of those things that are happening within Zoom. So people are going to be able to see what's happening. And all of that is getting updated during the call. Who's talking? What am I saying? And connecting those two is also important. So it's very important to see who said what, not just get some words that you can search afterwards. And while the call is happening, we are doing all of that processing. And within one to five minutes of the call ending, you already get everything processed to your email or on the website. And that already includes a lot of the higher-level AI that's happening. So if I mentioned that I'm going to send you an email or I'm going to send you some answers later, this is going to be captured as a next step. And capturing next steps is not just having those word triggers, because I may use... Language is very ambiguous. I started out as a researcher in natural language processing, and natural language processing is entirely about ambiguity. This is the beauty of human language, the fact that we can say anything using a finite set of words. And we are going to use the same words to express different things in different contexts. So those words that I'm going to use for next steps, they can also be used for many other things that I'm going to say. And while we are looking at the results of this transcription, we're going to identify places which may be a next step.
And then we're going to feed these into another machine-learning model, which has been trained on curated data, where we had our analysts and product people talk to sales enablement people and sales experts to define what a next step even is. I mean, it's a call to action: something that I'm telling you that I'm going to do, or that we are discussing you doing. So there is a definition, and it's kind of soft, but once we had it down, we created this pretty big dataset with a lot of examples of what is a next step and what isn't. And we trained this machine-learning model to be able to tell them apart.
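As a rough sketch of that two-stage idea, surface candidate utterances with trigger phrases, then hand them to a trained classifier, here is a minimal Python version of the first stage. The trigger phrases, speaker labels, and example lines are all invented for illustration; Chorus' actual triggers and model are not public:

```python
# Stage 1 of a next-step detector: flag utterances containing trigger
# phrases as *candidates*. In a real pipeline, a trained classifier would
# then score each candidate, since these phrases are highly ambiguous.
TRIGGERS = ["i'll send", "i'm going to", "let's schedule", "follow up"]

def next_step_candidates(utterances):
    """Return (speaker, text) pairs worth sending to the classifier."""
    return [
        (speaker, text)
        for speaker, text in utterances
        if any(trigger in text.lower() for trigger in TRIGGERS)
    ]

call = [
    ("Rep", "I'll send you the pricing sheet after the call."),
    ("Customer", "Sounds great, thanks."),
    ("Rep", "Let's schedule a technical deep dive next week."),
]
for speaker, text in next_step_candidates(call):
    print(f"{speaker}: {text}")
```

Running this surfaces the two rep utterances; the second stage, the classifier trained on curated data that Raphael describes, would then decide which candidates are genuine next steps.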
John Prial: Of course, it is all about the data. How do you think about that?
Raphael Cohen: When you look at the AI, we create a lot of data with AI. And we try to give people these superpowers where they can review a call really, really quickly, jump to the right place, and find what they're looking for with that specific need, whether it is to identify what the next steps are, or is this deal going to close, or do I need to get better at a certain objection. And having all of those data points that we extract from the call makes this really, really smooth. And this is where we are looking for what works best for your organization. So we're going to see when is the best time to start the demo for your people. We're going to analyze different stages of calls, because talking about what is a good number of questions, or what are the right questions, is only right in a specific context. It's interesting in a discovery or a commit stage, but you can't just lump everything together and say, you need to do something. We want to provide our customers with a report that explains things in context. How often does your competitor come up? How often do you win when the competitor comes up? How often does a specific topic you care about come up? And then people are able to ask these questions and also answer them, and craft answers to their real issues. And we do that by doing things in a more fine-grained way. So we combine the automation side of things on one hand, which allows you to answer questions very quickly. But we do that with a lot of care for the data. So we have data scientists answering these questions, and not something automatic that shows you correlations. You're always going to find correlations, but they're not necessarily going to be the things that are driving your sales.
John Prial: So how would that work specifically for a sales manager who logs in to see how their team is doing?
Raphael Cohen: Being able to find the minute that you care about and share this with your manager, or being a manager and finding the place where I had an objection presented to me so you can explain to me how to handle it better, I think this brings a lot of the value. You can look at the deal's momentum. We have a view where you can also watch the momentum of deals within Salesforce, and it shows you how the deal's progressing. How many touch points did we have? Did risks come up or not? Are you threading the right people into the deal? You can see: who did I talk to? Who did I email? Are people answering my emails as a rep? Who's answering my emails? You can learn from that whether things are progressing well or not, whether the rep should go in and try to rope more people with more decision power into the deal, or whether they're doing fine.
John Prial: And that's not verbal. That's completely outside of a phone call or a Zoom call. That is [crosstalk] a whole series of interactions beyond just having a call with someone.
Raphael Cohen: Yeah. And we combine everything together. So we can identify the risks mentioned in the call, or we can identify what was discussed in the call and what type of call this is. But you can also combine it with all the rest of the data that you have to understand how the deal is progressing. On the other hand, we have call recommendations. If you are an executive and you just want to see what's happening on the production floor of your company, you can directly get an email that shows you some deals. In my case, it's going to mix them up from different teams, so I'm going to get some enterprise deals and some small business deals. And I can click in and see what people are saying. I can jump in and, by using the speaker separation, I can see what the customer is talking about. Or I can try and see what our pitch looks like. Are they pitching the things that I care about? And if the research team built a feature, is it ever getting mentioned? So when we rolled out our ability to understand what's onscreen and mark different slides, separating them from one another so you can easily listen to a specific part of the deck being presented, we weren't sure it was going to be sticky. But then we found out that a lot of the representatives are actually opening their demo with that. And if that's effective, that's the wow moment. We know that we are on the right path with this feature. And if you want to have a coaching session with one of your reps, you can very easily find a meaningful conversation to zoom into. So it saves you time and it makes coaching effective if you use these call recommendations. So this is another way of looking at the context.
John Prial: It's really an amazing application. You can check the status of deals, coach your reps, find key points. There's quite a lot of breadth there. Without going too deep, there's something new from Facebook on the audio side called wav2vec. How important is that?
Raphael Cohen: There is a revolution happening. In general, AI has been revolutionized, but it's going field by field. So it's already almost 10 years ago that we saw the field of computer vision discover deep learning. I mean, discover it again. And have everything switch over from what was happening, and have great advances, which enabled people to achieve a lot of things that they couldn't achieve with the older technology. And then afterwards, it got to voice technology. This is actually what made Chorus possible in 2015. This is why this whole field is coming up, because before, you had other companies who were able to tell you if a keyword was mentioned, and nobody could provide you with a complete transcript. That technology didn't exist. So we had a new playing field to play with. And the most basic open-source approach was the Kaldi toolkit, which was an effort by multiple academic institutions, led mostly out of Johns Hopkins. They released this set of tools that allows you to easily build transcription pipelines. And this is what helped us launch Chorus in 2015: having good transcription technology available cheaply, en masse, that we could support. And we've been Kaldi users ever since. It's not an acronym. Kaldi is the name of the mythical Ethiopian goat herder who discovered coffee, so the symbol of Kaldi is coffee beans. And now, after those revolutions in the middle of the previous decade, came a revolution in text analysis. So we had two revolutions in the areas of understanding natural language. The first one was deep learning, which did wonders. And then in 2019, there was another giant leap, which is called transformer technology, specifically a few models which allow this technology to be used with giant data sources, which we call text corpora. These are the data sources that you use to train your models.
And they were basically able to achieve out-of-the-box understanding of text, which you can then tweak with very few supervised examples to get the best results on any task that you have.
John Prial: You can't beat a good breakthrough. Now, how did these models get applied to business use cases that you're using at Chorus?
Raphael Cohen: The revolution that was started by ULMFiT and BERT basically allowed people to train giant models without using supervised datasets like the one that I described for our next-step identification. So instead of training the model using some kind of labeled task, you just delete or mask some of the data. In the case of words, it's very easy. We train children to learn how to read better with those cloze exercises where we delete a few words and they have to guess what those words should be. And we do the same thing for the machine-learning model. And the ingenuity of the Facebook model is that they did the same thing for speech. They are deleting these tiny segments of audio and asking the model to learn how to fix them. And once the model has built this basic understanding of what speech has in it, you can train it with far less data than was needed before for any task.
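The masking objective Raphael describes can be illustrated in a few lines of Python. This is only the data-corruption step, the cloze exercise itself; the model that learns to fill the blanks (BERT for words, wav2vec for audio segments) is far beyond a sketch, and the sentence below is made up:

```python
import random

def mask_tokens(sentence, mask_rate=0.15, seed=0):
    """Corrupt a sentence for masked pre-training: replace a random
    fraction of words with [MASK] and record the (position, word)
    targets the model must learn to recover."""
    rng = random.Random(seed)
    tokens = sentence.split()
    targets = {}
    for i, token in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = token          # what the model should predict
            tokens[i] = "[MASK]"
    return " ".join(tokens), targets

masked, targets = mask_tokens("i will send you the revised proposal tomorrow morning")
print(masked)
print(targets)
```

Notice that no human labels are needed: the original words are the labels, which is why these models can be pre-trained on essentially unlimited unsupervised text or audio.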
John Prial: I get the deal that you mask words and that trains a model. So they did a lot of work and created a model. And you're saying, once this model's in place now, you can take, I'll call it your Chorus data, and then you could use their model against your data to get the outcomes you're looking for from an application perspective. Is that correct?
Raphael Cohen: Yes. So what you do is you take the pre-trained model, they are called pre-trained models, and then you do a tuning step. So you take the model with all the knowledge that it managed to learn, and you just let it learn from your task, like our thousands of task examples which say that something is a next step or it's not a next step. And it achieves more than it could before, because it already knows what words look like, and what words are similar, and what things are similar in meaning, because it had to learn them in order to retrieve those missing words. And now it can learn your task much more easily than it did before.
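The tuning step can be sketched with a toy stand-in: below, the "pre-trained" encoder is a frozen, made-up bag-of-words representation, and only a small logistic-regression head is trained on a handful of labeled next-step examples. Everything here (vocabulary, examples, hyperparameters) is invented to show the shape of fine-tuning, not Chorus' actual setup:

```python
import math

# A frozen, pretend "pre-trained" representation: in real fine-tuning this
# would be a transformer encoder whose weights we leave (mostly) untouched.
VOCAB = ["send", "schedule", "thanks", "weather", "contract", "nice"]

def encode(text):
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(examples, epochs=200, lr=0.5):
    """Train only a logistic-regression head over the frozen encoding,
    by plain gradient descent on the log loss."""
    w = [0.0] * len(VOCAB)
    b = 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = encode(text)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - label               # gradient of log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

data = [("i will send the contract", 1),
        ("let us schedule a demo", 1),
        ("thanks nice weather today", 0),
        ("nice talking thanks", 0)]
w, b = train_head(data)
score = lambda t: sigmoid(sum(wi * xi for wi, xi in zip(w, encode(t))) + b)
print(round(score("i will send it"), 2), round(score("nice weather"), 2))
```

The point of the sketch is the division of labor: the expensive general knowledge lives in the frozen encoder, and only the tiny task-specific head needs the thousands of labeled examples.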
John Prial: Well, that's a lot of processing, a lot of compute power, of course. This brings us to your work with the Georgian R&D team.
Raphael Cohen: So in general, the one problem with giant models is that they're giant. So that means they take a lot more compute. They're not necessarily slower, or significantly slower, and you can try to speed them up, but they take more compute resources. And that has an impact on our bottom line, and also on the environment. So we don't want to adopt models that are burning too much CPU and GPU resources if we can avoid it, especially if we're thinking of a scale of 100 million hours. Once you start scaling things, you start feeling the pain of those models.
John Prial: So if I understand correctly, some of the things that were happening was in any neural network, there's nodes and there's branches. So you've got to figure out how to prune these nodes and keep what's valuable to you.
Raphael Cohen: So when we started working with Georgian, we had risks that we were worried about: that this technology was going to be too slow and require too much memory, that it was going to make our AWS bills 10 times larger and prevent us from scaling correctly. And as I mentioned, these models come from the field of natural language processing, and they already ran into these issues before. And there is a set of approaches to try and mitigate the problems. So one of them is what you call pruning, where you can cut off some of those nodes which are not important for your specific task. Because we said that you learn a general task, and then you want to learn a smaller task, so you can prune it a little bit. But we were unable to actually make the model faster or smaller with this. There is another issue: you are using pretty big numbers. So the numbers that you have in each node of the neural network, instead of being 32 bits long, can be cut down to 16 or eight bits, and that makes the model much lighter. This is actually very useful, because you get the same effect from the model, but you cut down the memory. So this is already there. The key thing that we had to have in order to use these models in production is a lighter memory footprint, so we don't have to spin up huge machines to run every call.
John Prial: That's tied to the weights within a model. From my reading some of the blog posts that are up there, I think that's quantization.
Raphael Cohen: Yes.
John Prial: Are you affecting the weights, or are you just affecting how much information each node is carrying?
Raphael Cohen: So the weights are supposed to remain the same, but you just make them carry less information. Because really, if you're saving a big number for no reason, all those digits after the decimal point are not necessarily very useful for you. You don't have to have such a precisely defined weight to represent the knowledge in that node. And this is what has been found out before for neural networks. So you can quantize it, make it smaller while retaining the same efficiency.
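The idea can be made concrete with a tiny Python sketch of 8-bit affine quantization, roughly the scheme production frameworks apply per tensor or per channel, written out by hand here on a made-up weight vector:

```python
# Map 32-bit floats onto 8-bit integers with a scale and zero point, then
# map back. Each value now fits in one byte instead of four (4x smaller),
# at the cost of a small rounding error bounded by half the scale.

def quantize(weights, bits=8):
    qmin, qmax = 0, 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (qmax - qmin) or 1.0   # guard against all-equal weights
    zero = round(-lo / scale)                  # integer representing 0.0
    q = [max(qmin, min(qmax, round(w / scale) + zero)) for w in weights]
    return q, scale, zero

def dequantize(q, scale, zero):
    return [(qi - zero) * scale for qi in q]

w = [0.213, -1.508, 0.954, 0.002, -0.777]
q, s, z = quantize(w)
restored = dequantize(q, s, z)
err = max(abs(a - b) for a, b in zip(w, restored))
print(q)
print(round(err, 4))
```

This is why quantization keeps "the same effect from the model": the reconstruction error per weight stays tiny relative to the weights themselves, while memory drops by a factor of four.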
John Prial: And then another term I'm learning is knowledge distillation, to kind of rebuild a model to make it a little smaller. How does that work?
Raphael Cohen: Yes. So the final method, the one that really solved issues for the natural language community, is called distillation. So we want to have the same knowledge, but in a much smaller receptacle. In quantization we have the same number of nodes, we just make the nodes smaller. But here we want to have far fewer nodes while retaining the same knowledge, or something close to it. And this is a serious question. It's a challenge. And what the NLP community came up with for the main model that people used, which is called BERT, they call it DistilBERT, and it's a very elegant solution. They're saying, find another network architecture which is smaller. So in our case, instead of having, I don't remember the exact numbers, but instead of having 12 layers of transformers, we can cut it down to 8. Instead of having each transformer at a certain size, we make it a smaller size, so we have a lot fewer nodes in the network. And we're saying, we are going to try and learn the same thing that the giant network learned using this smaller network. And it's obviously not going to be able to learn that from the data alone, because if it could, we would just train that network directly.
John Prial: Sure.
Raphael Cohen: So what they're doing instead, they're saying: we are going to force this network to have representations similar to what the larger network has. So first of all, we just copy some of the weights into it, because the larger network took a long time to learn them, and we can inject them into the smaller network. And afterwards, each time you train, you take the same sample and you feed it into both models. And if the teacher model had two layers producing a certain result, and this model only has one layer, you are going to tell the smaller model that it has to produce the same intermediary result. It's this inner neural representation that we don't really understand the meaning of, but we are going to tell you, you have to spit out very similar numbers to the first model. So we are forcing the student to have an inner representation which is similar to the teacher's. And we are doing this across all of the different layers, and this way it is able to mimic the other network, even though it's much smaller.
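A hedged sketch of how those two training signals are typically combined into a single distillation loss: a KL-divergence term pulling the student's softened output distribution toward the teacher's, plus a mean-squared-error term on the intermediate representations. The logits, hidden values, temperature, and mixing weight below are all made up for illustration:

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits,
                 teacher_hidden, student_hidden, T=2.0, alpha=0.5):
    """Combine the two signals: match the teacher's softened outputs (KL)
    and its inner representation (MSE), weighted by alpha."""
    p = softmax(teacher_logits, T)   # softened teacher distribution
    q = softmax(student_logits, T)   # softened student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    mse = sum((a - b) ** 2
              for a, b in zip(teacher_hidden, student_hidden)) / len(teacher_hidden)
    return alpha * kl + (1 - alpha) * mse

# Zero loss when the student already mimics the teacher exactly:
loss_same = distill_loss([2.0, 0.5], [2.0, 0.5], [0.1, -0.3], [0.1, -0.3])
# Positive loss when outputs and hidden states disagree:
loss_diff = distill_loss([2.0, 0.5], [0.5, 2.0], [0.1, -0.3], [0.4, 0.2])
print(loss_same, round(loss_diff, 3))
```

Minimizing this loss by gradient descent is what "forces the student to spit out very similar numbers" to the teacher at every layer.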
John Prial: Interesting. That's great. So you are working in parallel with the Georgian R&D team. They were working on basically streamlining a model, and the work they did is available. And in parallel, you were really rebuilding your model based on wav2vec. How far along are you in terms of deployment now?
Raphael Cohen: We're pretty close to deployment. We have a model which is showing us great improvement over our current model. And right now we are thinking of how to scale it. This is the biggest challenge for us. We have tens of thousands of users, and we want to make sure that everything runs smoothly and they get the call, three or five minutes after the call has ended, with all the data that they need, even though it's slightly slower than the Kaldi models. So right now we're working on the engineering problems more than the network problems. We already have a very nice model that outperforms anything that we have seen in the market so far.
John Prial: Selling though, we know, is an art. Now NLP, although it's quite amazing, is not an exact science. Could you step back and talk about how you help train your reps and how you help differentiate Chorus?
Raphael Cohen: Yes. This is actually hard to explain if you just look at the task, because we are talking about systems that have between a 10% to 20% error rate. This means that out of 100 words that we are saying, you're going to see 10 words wrong versus 20 words wrong. People care about transcription quality, but they don't usually have the tools to measure it. So what we started doing internally is sending out some of our newer customers' data to manual transcription, and giving the reps a monthly report on how we are doing on certain customers, and how we are doing compared to the market, so we can report on other systems and competitors. And they can see that our technology is actually the best in the market, every month that we do that. And the important thing is to actually do this based on data, because it's very easy to come and say, we have the best technology. But even our own representatives, they don't want to go out there and say things that they are not certain about. So month after month, we are able to show them data that proves that we are doing better than any other transcription service in our domain. I mean, we are not going to beat Alexa at answering your questions at home, or even at understanding calls which are not in a business situation. We are experts in our domain. We are aiming at collecting the best data to understand business interactions. And we are able to do that every month and provide them with all the information they need to see that we are actually ahead of the curve.
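The 10% to 20% figures are word error rate, the standard transcription metric: the word-level edit distance (substitutions, insertions, and deletions) between a manual reference transcript and the system's output, divided by the number of reference words. A minimal implementation, with an invented example pair:

```python
def wer(reference, hypothesis):
    """Word error rate via the classic dynamic-programming edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[-1][-1] / len(ref)

print(wer("we will send the revised contract tomorrow",
          "we will send a revised contact tomorrow"))
# 2 errors over 7 reference words, roughly 0.286
```

Comparing systems on the same manually transcribed reference set, month after month, is exactly the kind of benchmark described above.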
John Prial: This is just so fascinating. I can't believe the depth of insights you can get from conversations. And hearing where this is going makes me more excited about the possibilities of what's to come. My thanks to you and the Georgian R&D team for a great case study. For Georgian's Impact Podcast, I'm John Prial. Thanks for listening.
Human interaction, there’s a lot we know but a lot to learn. AI and conversational data can help us in this learning process. In this episode of the Georgian Impact Podcast, we will be talking with Raphael Cohen, previously the VP of Research at Chorus.ai and currently the Senior Director of Data Science at ZoomInfo. Recently acquired by ZoomInfo, Chorus.ai helps teams make decisions using the insights you'd get if you were sitting in on every sales or customer success call.
You’ll Hear About:
● Chorus.ai’s mission and how they help people get better at conversations.
● Understanding that one size doesn’t fit all and the need to build a product that enables people to understand their sales processes.
● How Chorus handles the massive amounts of data and adapts to the different ways organizations work.
● The impact Chorus’ work has at a management level.
● Wav2vec and how it will impact learning the structures of speech from audio.
● Why cutting down on compute resources was necessary for a model’s success.
● How Chorus sells their technology and differentiates themselves.