Episode 116: Level Up with Machine Learning


This is a podcast episode titled, Episode 116: Level Up with Machine Learning. The summary for this episode is: Machine learning isn’t just for the Googles and Facebooks of the world. But how can startups (or even, growth equity investment firms 😃🤓) do data science right? Our guest on this episode of the <a href= "https://www.georgianpartners.com/the-impact-podcast/">Georgian Impact Podcast</a> is Ji Chao Zhang. He says that data scientist may be the most important job of the 21st century – but also the least understood. Luckily, Ji Chao understands it better than most – he’s the Director of Software Engineering here at Georgian Partners, and he and his team have consulted with scores of companies around our thesis areas including <a href= "https://stage.georgianpartners.com/principles-of-applied-artificial-intelligence-whitepaper/"> Applied Artificial Intelligence</a>. You’ll hear about: <ul> <li>Our in-house software development, including our work on <a href="https://georgianpartners.com/products/">TensorFlow Privacy</a></li> <li>How data scientists differ from data analysts</li> <li>Tips for building your in-house machine learning team</li> <li>Our <a href= "https://georgianpartners.com/ml-maturity-framework/">Machine Learning Maturity Framework</a></li> <li>Andrew Ng’s <a href= "https://landing.ai/ai-transformation-playbook/">AI Transformation Playbook</a></li> <li>The value of an iterative approach to ML</li> </ul> Who is Ji Chao Zhang? <a href="https://georgianpartners.com/team/ji-chao-zhang/">Ji Chao Zhang</a> is Georgian Partners’ Director of Software Engineering and a member of the Georgian Impact team. In that role he leads our internal software engineering efforts and supports portfolio engagements. 
Prior to joining Georgian Partners, Ji Chao was a Software Development Engineer at Amazon, where he worked on the design and development of the data platform, business analytics and machine learning systems to support supply chain optimization and fulfillment. Ji Chao holds a Master of Computer Science in computer software engineering from the University of Ottawa and a Bachelor of Engineering in computer science from Zhengzhou University.

Transcript

Warning: This transcript was created using AI and will contain several inaccuracies.

If you're managing the machine learning or AI team at Google or Facebook, you might be beyond this podcast. But if you've got a staff or leadership position at a startup or growing company, or maybe even at Google or Facebook, and you need to build an AI team, this podcast is for you. Today we share thoughts about creating AI teams. Whether you're a first-line manager or a CEO, and whether you're just getting started or have a small team built already, there's only one thing that matters: you need a good team. So stay tuned to hear how. I'm Jon Prial, and welcome to the Georgian Impact Podcast.

Today I'm talking with Ji Chao Zhang, who's the Director of Software Engineering here at Georgian Partners. Now, I hope this doesn't go to his head, but Ji Chao is the perfect guest to talk about getting an early-stage-and-beyond company started with machine learning and AI. Ji Chao has grown his career as a software engineer, with his last stop at Amazon before joining Georgian. Not only have Ji Chao and his team consulted with scores of companies around applied artificial intelligence, but he also runs our software development team. You see, to leverage and scale Ji Chao's team, providing value to our companies means something beyond the traditional: developing software assets. Yep, a growth equity investment firm is writing software. Ji Chao's team has been developing software to help companies leverage key emerging technologies, particularly around ML and AI. We've already had one product accepted by Google into their TensorFlow library; we'll put some details in the show notes. And after each engagement that Ji Chao's team completes, we create assets that we can then replicate for other companies. Makes a lot of sense to me.

I'd like to start with a basic question. With all this ML and AI stuff, are we done with the traditional data analyst role as it existed? You know, there was someone who thought about the data, maybe ran queries, worked with and advised different parts of the company. Where are we on that?

The data analyst role has been there much longer than the data scientist role, and I can see it's still going to play a big role in a data-driven culture for the time being. But there's a fundamental difference between the analyst role and the data scientist role.

I'm glad you made the distinction. I might think that the scientist is strategic and the analyst is more execution. Is that probably too broad a way to think about it?

In a sense. The way I distinguish the data analyst role and the data scientist role is that a data analyst tends to work on broader issues and uses data to support the decision-making process for humans, focusing on the cases where you don't have to make those decisions very often. For example, if you launch a new product, you want to understand whether customers are actually adopting it and what their feedback on the product is. That's more like a data analyst role. In the meantime, the data scientist focuses more on those low-stakes types of decisions you need to make quite frequently. For example, if you want to recommend a product on an online retail website, you can't have a human sitting there making those recommendations to each customer. That's where the data scientist comes in, to mine the patterns in past user purchase data and make predictions in basically real time.

So strategy and execution wasn't too bad: strategy really is where these long-term, big company decisions are being made, and you really want to see the data scientist focusing on things that can be automated. I see how we're on this journey now to ML and AI.

I'm assuming there's some degree of overlap in terms of the tasks of each of these groups. Help me understand a little bit how you see these groups interacting together, and how you should be focusing on building a team around these folks.

Sure. There's definitely a lot of overlap between the activities and paths taken by the analyst role and the data scientist role. The key difference is that the data analyst needs much more domain-specific expertise. They need to be much closer to the organization, to the product they're building, and to their customers' requirements, while the data scientist needs much stronger coding skills and needs to be up to date on developments in the machine learning research field.

That's interesting. They really are different types of skills. You talk about the analyst having the domain knowledge, being much closer to understanding the business, and the data scientist as much more of a programmer type.

I take it there's not a lot of movement between the two groups. Is there much flexibility and fluidity in terms of these skills?

I think we need to be careful because of the different skills they need, and also about keeping up with people's expectations. I have seen situations where a company organized a data science team but then assigned it a lot of data analysis and reporting tasks for the sales team, which caused two problems. The first problem is that it really distracts the data science team from what they should be working on. They should be working on building automated solutions, not generating reports for people to make decisions. The second issue is that creating reports and doing a lot of data analysis is not usually what data scientists signed up to do. If you keep asking them to do that again and again, very probably they're going to depart from the team and find a different job for themselves.

Wow, that's great. I think that was a great distinction, particularly about people staying at the company and being happy in their jobs. Remember what makes a technical person, a programmer, happy, which is different from what makes an analyst or operations person happy. So matching skills is great.

I'd like to step back and talk about the maturity model. This is something that Georgian and the Impact team have developed for measuring a company's machine learning status and progress. We're going to put some links in the notes, and everybody should check out the good work that's been done. What I'd like to do to keep us fenced a little bit today is talk about really level one and level two, maybe level three, so we stick with the greater number of companies. You ready to talk maturity models?

Sure. Let's go, Jon.

I'm just going to kind of extract from level one. You're pioneering machine learning adoption. You're trying to prove value-add. You're focusing on securing organizational buy-in. There are lots of unknowns. There's no clear, executable machine learning roadmap yet. Not everybody believes in machine learning, or they feel it's too expensive, or they question whether the value is there. So this is really the beginning, right?

What you just described, Jon, is what I call the exploring stage. In my mind, as you mentioned, this first stage is key for organizations to go through to successfully adopt machine learning, due to a lot of the unknowns you just mentioned. I've seen many, many companies set up big ambitions to adopt machine learning and then fail at the starting line.

That's got to be the worst. It's got to be demoralizing. Any specific thoughts in terms of why they fail?

The number one reason I see the failures come from is a conflict between the expectation of huge value-add and the runway for the team to deliver. There are a lot of stories about machine learning, about AI, about all the breakthroughs we've had in recent days. For a team that does not have expertise in machine learning yet, it's very easy to get the message that machine learning can solve all problems: whatever problem I have in my product, my offering, as soon as I have a data science team they're going to solve it. In reality the team can eventually get there, but it takes the team time to understand the business, to understand their customers, and to build up a sufficient work process and the internal collaboration to get there.

Wow. You've just got to get the level setting right. If they're thinking we're going out for a home run and we don't even know the rules of baseball yet, they really don't have the basics of getting up to bat.

Yeah. I'm a big follower of some guidelines Dr. Andrew Ng set out in his AI Transformation Playbook. He said that it is more important for your first few AI projects to succeed rather than be the most valuable AI projects. They should be meaningful enough that the initial successes will help your company gain familiarity with AI and also convince others in the company to invest in further AI projects. They should not be so small that others would consider them trivial. The important thing is to get the flywheel spinning so that your AI team can gain momentum.

Dr. Ng has quite the resume, including being the co-founder of Google Brain. He's at Stanford. And this quote that Ji Chao just read resonates. Please check out more of his writing. But I'd like to go back to the point of the quote; I like the thought about getting the flywheel started and figuring out how to build some momentum. I was just recently listening to an interesting podcast about the future of gene editing and solving medical challenges, and someone was asked the question: you're solving all these small, tiny diseases with smaller populations versus going after these larger, pervasive problems. And the answer was: these are well-defined, they're smaller to get done, they're easier to solve, and we can build from there. So even gene editing is actually going right after what Dr. Ng said as well.

Yeah.

So, paraphrasing what he said: at level one, the exploring stage, the goal is to jump-start the flywheel by building up the team and taking on some low-hanging fruit. Once the flywheel starts spinning, more value-add will naturally happen over time. Another thing I recently learned, from the data product manager at Tactic, a Georgian Partners portfolio company, is to avoid half-assed initiatives.

Wait a second. This is funny. You did say half-assed. You said three syllables, not the two-syllable one that we've yet to hear on this show. I guess that's correct?

That's right. According to him, the message is: either hire a full team or don't hire at all. Avoid trying to test the waters by hiring one person at a time and never producing anything due to blockers or lack of knowledge in other areas. This is really well aligned with my observation of why some organizations fail to get the flywheel spinning.

It's interesting, because you say you don't want to shoot for the moon and go for this giant project. But at the same time, when you say focus on something small, you're not going to focus on something small with fractional people. You still need to hire a team. You can't just do this with part-time resources. You need to go build the team to get started, right?

Yeah, that's right. Remember, this is a technology that is going to change the entire company. We need the proper skill sets and expertise to get all the different parts of this machine running. A good parallel is building a typical software solution: you need a product manager to interact with customers and decide what to build, you need back-end and front-end engineers to build the solution, and you also need ops engineers to manage the deployment and operation of the software. For machine learning, you also need a product manager to identify the initiatives that are most valuable and feasible. The feasible part is also very, very important. We have seen a lot of situations where people come up with gigantic ideas that are not feasible yet given the scientific research at this moment. You need data scientists to build the model. With the model ready, you also need an engineer to integrate the model with the data and with the downstream consumers of the model, so that it becomes really useful in some type of workflow, for your customers or for some internal operations people. Expecting a single person to do all of those tasks really requires a unicorn, and that is far from practical. In our white paper, we spent quite some effort describing the composition of the data science team and the different required skill sets.

Great, so we'll put in some links to that. It's interesting. I really do like that you've got to get started, even though you're early. Once you make the commitment, like you said, it's a company-wide commitment. You've got to be all in, you've got to be serious about it, and you've got to bring in some people. And I know you're very serious about this. You're not just saying I'm going to build an organization because I want X number of people. You're not doing this for your ego; you're doing this to get the job done. So I really appreciate this level of detail.

I'd like to move up a little bit to level two of the maturity model. You're now going to continue to build organizational confidence in machine learning. You start delivering some material impact to the business. Perhaps there's a model in production, and some manual processes are in place to maintain the pipeline of models, and you're beginning to automate. What's interesting is that you now have a model in production, and you've got to make sure all the scaffolding and support is there and can be replicated. You're now beginning to get a little more sophisticated here.

Yeah, that's right. If at level one, the exploring stage, the focus is to get the right people and get those people some momentum by taking on some low-hanging fruit, then at the second stage, what we call the building stage, the focus switches to process, technology, and automation. It's time to build a repeatable process so that the team can start work on other high-priority opportunities. The goal at this stage is to improve the repeatability of the ML product development process to deliver value-add for high-priority opportunities. We need to do this before we further scale up the machine learning or data science organization, and it is crucial to establish best practices in terms of development process, technology selection, and automating some of the process.

So you're actually not yet going to grow the team. You're now going to evolve the team to focus a little more on the scaffolding and the automation. We're not quite ready to double the size of the team; you really need to build some structure.

At the building stage, I think we do have the need to expand the team. Because at level one the team has already shown the feasibility, that it can add value to the organization, we should already have more organizational support to add more resources to the team. What we need to get right before we further scale up the team is to make sure that we're not adding the second, the third, the fourth team yet. Adding more resources to the core team at this moment will help them think about some of the focus areas we talked about for level two.

Great. As they mature, that actually makes a lot of sense.

As I mentioned, getting a repeatable process at the building stage is very important, but it's also very hard, because compared to software engineering, data science is still a very young field. One quote I recently read is that data scientist is the most important job of the 21st century, but in the meantime, it's the least understood job. Nobody understands them yet.

Oh, okay. That makes sense.

So although there is a well-accepted process in software development, something like agile, we don't have such a thing as a data science process yet. The biggest difference between data science and software engineering is that data science is just not as predictable as software engineering. Because of that, data science efforts need to iterate much faster, because the output of one iteration is going to guide the effort in the second iteration. It's very hard to plan out everything at the very beginning and just follow it through.

It's interesting. I'm quite surprised, and it's kind of neat to me, that you're saying this is not really agile. You need to iterate faster because you don't quite know yet where you're going. You need to be iterating on trying all these models and which data sources are working, and you have so many choices. It's not like I'm going to sit down and write code to accomplish a specific task. I'm still exploring, figuring all this out. That's really interesting.

Yeah, exactly.

Very cool. Talk to me about best practices. Like you said, it's the least understood, but you've seen a lot. So what's your sense of the best practices that are out there?

The key here is that you need to adjust whatever methodology or best practice others created to the context of your business, your team, and the culture of your organization. The team also needs to be aware that whatever process it creates won't be the final stage. It's more like evolution: the team creates a hypothesis saying this is the best way for us now, then tries to figure out what works and what does not, and tries to replace the pieces that do not work with new ideas.

It's interesting. One of the terms we're using is data science, but it's very clear from what you're saying that there is no single answer. What I'm hearing is that at this stage for this technology, there is as much art as science in getting this right. What else should teams be thinking about?

Maybe I can offer what we're doing here at Georgian Partners in terms of process. Our R&D team adopted what we call a generation-based process. Although it sounds similar to some of the concepts from agile, it is quite different. The goal is to focus on quick iteration over ideas and experiments to gather information, so that we know what the next step should be. So instead of having a moonshot type of goal and hoping everything will just work at the very end, we want more of a laps-on-the-track type of solution, so that we know what progress we have made so far and how much of a gap we have, and we keep having newer and newer iterations until we get there.

And these generations are of machine learning models, not necessarily iterations of code development per se. Although models are code, it's not programming you're talking about, correct?

It can be both, because to do machine learning you have to translate your hypothesis into experiments, and a lot of that translation has to be done by coding. Sometimes those iterations are about what dataset we should be using, sometimes about engineering the features for the model, sometimes about which machine learning model and what type of parameters to use. It really depends on the stage you are at in the project.

So the breadth of the art is broader than I was even thinking. There are a lot of piece parts to this.

Yeah, that's right. Another thing that the team needs to focus on at the building stage is automation.

Think about software development. In year one, we create a new team, and the new team works on a new project. Everyone gets very excited. But after you have the first version released and the product is put into maintenance mode, no one really loves to maintain and run the system, because you're not learning any more from it. The features are small; you're putting a patch here and there. The same thing applies to the data science world as well. In order to avoid having our precious data science resources working on the maintenance of existing solutions, we need to automate many of the processes so that they can keep innovating. For example, manually setting up a cluster of machines and running experiments on it really takes a lot of time, and no one enjoys doing it, but it's still a core part of our iteration. Right now there are a lot of offerings which can be used to automate such a process. I have seen good offerings from all the major cloud providers, and there are third-party solutions as well. Just for this example, if we have a good solution where we can efficiently run large-scale experiments, that helps our data science team be much, much more productive.

It's interesting. I see so many more balls that you're juggling that aren't there in a traditional software development sense. This is really interesting and intriguing for me. I'd like to wrap up by talking about what you see in terms of machine learning today. You have seen hundreds of companies. Leaving out the huge companies, the Googles, etc., what's your take on where most companies are in relation to machine learning?

My impression is that more than two-thirds of companies are still in the exploring or the building stage, which we just talked about. There's about one third of companies at the scaling stage. Google, Facebook, etc.: there's only a handful of companies that are actually at level four. But it really depends on the company's context and business environment. Not all companies need to get to level four, and it might be good enough to stay at the building stage if you find that machine learning is only going to be useful for your product offering and you're going to buy smart solutions for internal operations, etc. So depending on context, the team needs to make an explicit decision about which stage they're going to stop at.

I like the context point very much. We believe, and we've obviously got a thesis here around applied artificial intelligence, that companies that can differentiate themselves get better returns and better value by beginning to leverage this emerging technology. At the same time, you've got to be guided a little bit by context and understand what fits within it. I mentioned applied artificial intelligence, and we started out talking about the traditional data science team. Going back to the word applied: you don't have a data science team; you call them the applied research team, and I'd love to hear why that's important to you.

Because at Georgian Partners, in what the team is doing, I feel there's still a big difference compared to a traditional data science team. The reason we call it the applied research team is that we want to sit between academic researchers and the data science teams in industry. What we do is try to monitor the cutting-edge research in academia, identify opportunities for feasible solutions, and bring them into an industry adoption setting. That's not what the traditional data science team in a company will do. But with that said, as a company becomes more and more mature in adopting machine learning, for example going from level one and level two to level three and level four, the data science team there will typically do more and more research work, because at that stage they want to further differentiate their offering from their market competitors. Pure traditional data science work won't be enough anymore at that point.

And when they're ready to do that, your team will be ready to help them. Ji Chao, this was really interesting and informative. I definitely want to have you back to talk more about building these teams and the maturity model, but this was insightful and a great discussion. Thanks so much for giving me the time today.

Thank you for having me, Jon.
