
“Why Knowledge is the Future of Data”: Michelle Yi, Senior Director of Applied Artificial Intelligence at RelationalAI (Video + Transcript)

March 8, 2022

Like what you see here? Our mission-aligned Girl Geek X partners are hiring!

Angie Chang: So our next session is Michelle Yi from RelationalAI. She is a Senior Director of Applied AI, and she’ll be speaking about harnessing knowledge and data and show us relational knowledge graphs in action.

Michelle Yi: Great. Thank you so much, Angie for the introduction. I’ll just go ahead and screen share, make sure everything is going smoothly. And let me make this big. Here we go. Okay. I think we are good to go. Okay. Yes. So happy International Women’s Day, everyone. And I, plus one, I saw in the chat, a comment about Reggie for president. So plus one to that.

Michelle Yi: So my name is Michelle Yi as Angie said, I’m super excited to share a little bit of perspective on why I believe knowledge is the future of data and how my personal experiences in the data space also align to this common vision that brought me to RelationalAI.

Michelle Yi: And so I thought I could start with sharing a little bit of context and background on myself and the journey that has brought me to RelationalAI. Our vision for what we’re doing really spoke to a lot of the challenges and problems that I saw in the machine learning and data space.

Michelle Yi: And actually to make this a little more fun and interactive, if you guys want to share a little bit about your own journey in technology, I’d be curious to see what you all studied, whether it’s undergrad, PhD, masters.

Michelle Yi: What was your last educational focus? Something Heather said in the last talk actually really ties to the demo I have at the end of my talk, which is going to show a little bit of the backgrounds of the women that signed up for this conference.

Michelle Yi: So for me personally, I spent the last 16 years or so in the AI/ML space working with data from both the product R&D side, as well as a consulting perspective. So specifically, I don’t know if anyone will remember this, but in 2012, one of my first projects that I worked on actually aired as IBM Watson and this whole thing with Jeopardy where the computer was playing against the humans.

Michelle Yi: So that’s my one claim to fame. And then after that, I moved more into management consulting because I really wanted to understand the data and data science problems that many customers across many verticals were facing. And so through these experiences over the last couple of decades, I really got a lot of exposure to the impacts of the constantly shifting technical paradigms and how that impacted business.

Michelle Yi: So to give you an example, when I started at IBM 16 years ago, ML was on mainframe. This was before the Cloud. If you can even imagine an era before the Cloud. And then, after we started getting migrated and pushed to go more Cloud oriented, moving away from on-prem, there was a big, no pun intended, big data movement.

Michelle Yi: Essentially saying, like, “Go collect all the things.” And we didn’t… We collected a lot of data without really always thinking about why did we need that data? And then we were sort of pushed to like, “Okay, well, if you want to use this big data thing and you want to make all those things that you collected useful, you need to go to MapR and Hadoop.”

Michelle Yi: And then what ultimately resulted was this data swamp architecture where we had data everywhere in different silos of many different types. And then that shifted into what’s now more of the modern Cloud data warehousing. So think about BigQuery, Snowflake, Redshift et cetera. And then after we consolidated all of these things, we’re like, “Oh, okay. We finally got it figured out.”

Michelle Yi: But then you kind of see another kind of paradigm around machine learning and for people to take advantage of that you need yet another patchwork tool chain. And we’re going to dig into this a bit more, but the question is why is it that every time we see a paradigm shift, or a new technology, or a new data structure that we kind of go through the same motions over and over again.

Michelle Yi: And so just to speak to a little bit of those problems, I don’t think this is going to be new to anyone in the data space, but basically with each iteration that we’ve gone through, we still see the same needs from the business and the technology side.

Michelle Yi: There’s this desire for kind of more data driven decision making across the board from your executive teams all the way down to the engineering teams. And then there’s this other problem of like, “All right, we went through big data and we collected all the things, but now we don’t really understand everything that we’ve collected.”

Michelle Yi: So we even today, I think many of us would agree that there’s really kind of a lack of understanding of the full extent of the data assets that an enterprise or even a startup has.

Michelle Yi: And then as a result of that, there’s this third bucket of problems where we’ve really seen a rise of just too many point solutions or too many point data applications that sometimes can be repetitive of each other.

Michelle Yi: I don’t know how many times I’ve [inaudible] this and seen to a customer and we’re like, “Hey, you’re interested in a fraud detection, no problem. Oh, by the way, they also built their own fraud detection solution over there in teams D or E.” And so we’re kind of seeing like this common theme across companies and across a long period of time. And again, we need to ask ourselves what’s the root cause of this.

Michelle Yi: And ultimately I think what I saw over and over again is that there’s really something missing from this modern data stack. If we’re really evolving the way that we think about data, why are we seeing the same problems manifest over and over again? And so this is the question I really want us to kind of hone in on and specifically around this concept of knowledge and I’m going to share because you’re like, “All right, knowledge.” That can mean so many different things to basically everybody on this call.

Michelle Yi: And I’d be curious how many data scientists, more on the ML side we have in the room today versus more of the software engineering data app side, I’ve lived in both sides of those worlds. And they’re converging in many ways, right?

Michelle Yi: Because a lot of intelligent data applications today at the core of them, they really are having embedded machine learning whether that’s a machine learning model that you and your teams build or managed service that you receive from a vendor that you buy.

Michelle Yi: And so from my personal experience, I wanted to share an example of a day in the life of a data scientist or a software engineer working on an intelligent application and really hone in on this question using a workflow example of like what happens to the knowledge. And tell me in the chat, let me see.

Michelle Yi: I want to make sure I have it topped up in the screen, but please tell me in the chat if you resonate with this, but one common thing that I think people really have experienced is that we tend to spend like 80% as a data scientist or someone building an intelligent app.

Michelle Yi: We spend like 80% of our time productionalizing things and maybe 20% of our time really modeling, collecting the requirements and the data, et cetera.

Michelle Yi: And if I go into this just like one more level deep and not to get too trapped in the weeds, but just to really hone in on the pain point and why knowledge and embedding knowledge in a workflow is so important is let’s say like all of us are on the same team together.

Michelle Yi: And we want to build this fraud detection application. And at the heart of this application is a machine learning model that gives some predictive score of like, “Yes, that transaction is 50%, 60% likely to be fraudulent.”

Michelle Yi: Well, let’s think about this. So step one, what do we really go do? We let’s say one, we get a sense of our own intuition of what kind of data we need. We probably need something about transactions.

Michelle Yi: And we probably need something about accounts and people related to these transactions and maybe that lives in, I don’t know, BigQuery, let’s say it lives in Teradata, and then it lives in Excel because how many of us store data… Plenty of us store data in Excel. And then let’s also say that we probably need some information from the public web because when people steal things, they need to go sell them and make money.

Michelle Yi: So we get this intuition, we make a list. And then ultimately, what we end up doing is we go to the business owners or the business experts and say, “Okay, does it make sense to have this kind of data? What are we missing? Oh, I see, this data has this flag that has a transaction type one. What does that actually mean?”

Michelle Yi: And so we spend a lot of time upfront collecting and gathering data. We work on a subset and that in this 20% bucket of data science work, in that 20% of time, we get a model working that we’re pretty happy with.

Michelle Yi: Let’s say we use Python and a Jupyter Notebook. Steps one and two are done. We’re happy. And then we need to scale this up to production. And then what we end up spending 80% of our time on is rewriting everything that we learned in terms of collecting the knowledge from different business stakeholders and our own data science knowledge.

Michelle Yi: And we rewrite that in like Java, Spark and much heavier imperative programming languages, just so we can productionalize what we already did in steps one and two.

Michelle Yi: So the question is why can we not preserve knowledge across the data, across this entire workflow end to end. And that’s where I really kind of started to think more about this problem, because imagine how many like teams, how much time it would save if I could just preserve all of my learnings that I collected up front from the business about the relationships between transactions and customers, and accounts, and then also like the different constraints.

Michelle Yi: So for example, if I am looking for pictures of cats, I know that cats have two ears. I shouldn’t even think or waste any time processing things with four ears or five ears. I mean, this is a toy example, but I think you get the idea. And then point three is really like, “Okay. If I’m on team A, building this fraud detection app, why can’t I just easily share this knowledge with somebody in team D so that they don’t have to go do the same requirements gathering?” Because you know, that happens in any organization. And so when we talk about knowledge, it’s how do we preserve these relationships, really save ourselves time, and make that accessible to more than just one team.

Michelle Yi: So, there is this concept of a knowledge graph and so you’re like, “Okay, well, yeah. I’ve heard about knowledge graphs.”

Michelle Yi: And there’s sort of like this way of structuring and thinking about data that can somewhat solve this issue, but not exactly and let’s… I want to get into that a little bit really quickly.

Michelle Yi: And so, one of the things is that, here is just an example of a knowledge graph concept, right? And the thing about this picture is even if I don’t give you all the details of like, oh, this lives in BigQuery [inaudible], this one lives in another database.

Michelle Yi: Conceptually, you can kind of get that a product has a brand and a product has a category where shoes is an example of a category and a company sells products. It doesn’t matter if you’re an engineer or a business person, you can pretty quickly see what this is.
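The picture described here can be sketched as a small set of subject, relationship, object triples. This is only an illustration of the general idea, not RelationalAI's system or query language; the company, product, and relationship names (`AcmeCo`, `TrailRunner`, `has_brand`, and so on) are hypothetical and chosen to mirror the slide.

```python
# Hypothetical triples mirroring the slide: a company sells products,
# a product has a brand, and a product has a category.
TRIPLES = [
    ("AcmeCo", "sells", "TrailRunner"),
    ("TrailRunner", "has_brand", "Acme"),
    ("TrailRunner", "has_category", "Shoes"),
    ("AcmeCo", "sells", "RainShell"),
    ("RainShell", "has_brand", "Acme"),
    ("RainShell", "has_category", "Jackets"),
]


def match(subject=None, predicate=None, obj=None, triples=TRIPLES):
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def products_in_category(company, category):
    """Products the company sells that fall in the given category."""
    sold = {o for (_, _, o) in match(subject=company, predicate="sells")}
    in_category = {s for (s, _, _) in match(predicate="has_category", obj=category)}
    return sorted(sold & in_category)


print(products_in_category("AcmeCo", "Shoes"))
```

The question "which shoes does this company sell?" is answered by following the same relationships a person would trace on the diagram, without needing to know where each fact is physically stored.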

Michelle Yi: And now imagine if you could actually just query your data as easily as you can read this picture. The thing with knowledge graphs though is that they’re actually not necessarily a new concept.

Michelle Yi: So, it was coined by Google when they created the Google knowledge graph. They wrote this paper that came out in 2012, over 10 years ago now, and it’s been a core competitive advantage to them.

Michelle Yi: So if you ever wonder why search is so powerful at Google, this is one of the secret sauces to that. And when you’re shopping on Amazon, if you’re like, “Wow, my recommendations are amazing.”

Michelle Yi: That’s also another reason why they’re so powerful, is that they’re using this thing called knowledge graphs. And so a lot of other companies have really adopted this thing called the knowledge graph. And you’re like, okay, you can do all these cool things. You can express your business knowledge in the same place as you would do your programming or your data querying, why isn’t everyone else adopting this?

Michelle Yi: Well, the problem is that, and there’s many, many problems, but there’s kind of like three that I’ll, at a high level, boil it down to. But one of them is that yes, knowledge graph expertise is kind of rare and not everyone is Google or Facebook or LinkedIn, and they can’t hire hundreds of engineers to go build these things for them, right? There’s not enough people out there to do this.

Michelle Yi: And the second thing is that building and scaling knowledge graphs is really difficult because a lot of the existing solutions are built on really old paradigms. So like the Google knowledge graph paper came out 10 years ago, a lot of the commercially available systems today make it hard to use.

Michelle Yi: Some of these systems are based on theories that came out in the seventies in terms of navigational systems, right? And so it’s really, really hard to use any existing thing to build your own knowledge graph if that’s really what you want to do. And so similarly, operating and maintaining them is really challenging as well.

Michelle Yi: So it’s an amazing concept that just really hasn’t been more commercially viable and accessible to a broader audience. And so, there’s one thing that I want to quickly go over, which is that we’re kind of taking a slightly different take, and then I’ll show a really fun example to make this more real and in honor of International Women’s Day here.

Michelle Yi: But one of the things that we’re trying to do is say, let’s build that next generation thing. What does that really look like if we were to take a knowledge graph and make that supercharged and really available to a broader audience.

Michelle Yi: And one of the things that’s key is you see the word knowledge graphs, and then you see this thing called relational and RelationalAI. So I’ll share a bit more before jumping into the demo quickly and then wrapping up.

Michelle Yi: But essentially when it comes down to what we’re trying to do is build this next generation database platform that really gives you that infrastructure layer that’s going to help you consolidate and keep knowledge in the end to end workflow based on a solid shared foundation of a relational knowledge graph.

Michelle Yi: So one of the things that being a relational knowledge graph does is, and this is a bit of an eye chart, but I’ll summarize it in one point, which is that the relational paradigm, when you think about why SQL databases, for example, or you think about why Snowflake or BigQuery or Redshift are so popular today, is because it separates a lot of the what from the how. So you don’t worry about this huge list of super technical things in the middle, right?

Michelle Yi: A lot of that is actually handled for you. And so that’s something that’s really, really cool about a relational knowledge graph versus other systems. Because again, we share those same technical foundations of what you really expect from that modern data stack and including things like warehousing, et cetera. And so when you think about your favorite SQL system or your favorite database system, I guarantee a large part of that adoption is because your business users, not just your engineering teams, can use it.
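The separation of the what from the how can be illustrated with a small sketch that computes the same per-account totals two ways. It assumes a hypothetical `txn` table and uses Python's built-in `sqlite3` module purely as a stand-in for a relational engine; it is not a depiction of RelationalAI's platform.

```python
import sqlite3

# Hypothetical transactions: (id, account, amount).
ROWS = [
    (1, "acct-a", 120.0),
    (2, "acct-b", 40.0),
    (3, "acct-a", 980.0),
    (4, "acct-c", 15.0),
]


def totals_imperative(rows):
    """The 'how': we pick the loop, the dictionary, and the sort ourselves."""
    totals = {}
    for _id, account, amount in rows:
        totals[account] = totals.get(account, 0.0) + amount
    return sorted(totals.items())


def totals_declarative(rows):
    """The 'what': describe the result and let the engine plan the work."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE txn (id INTEGER, account TEXT, amount REAL)")
        conn.executemany("INSERT INTO txn VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT account, SUM(amount) FROM txn GROUP BY account ORDER BY account"
        )
        return cursor.fetchall()
    finally:
        conn.close()


print(totals_imperative(ROWS))
print(totals_declarative(ROWS))
```

Both functions return the same totals, but only the first one commits to a particular execution strategy. In the declarative version, details such as storage layout, indexing, and join order are left to the engine.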

Michelle Yi: And so in the future what we’d love to see is like, because we share these same fundamental architectural paradigms, we’d love to see that layer of knowledge that sits across and really pairs with and augments the work that many organizations have already done to consolidate and clean up their warehouse. Basically all the work that everyone’s done going from Hadoop to cloud data warehousing, et cetera. This is the thing that we want to say is missing from that modern data stack and that we want to augment and really bring out the power of these things across your organization.

Michelle Yi: All right. So with that said, I’m going to take a look at the chat here and just look at some of the backgrounds. Okay. I love it. Business management, psych. All right. So, in the last three minutes or so I want to wrap up again with just like a simple example where we took some data, thank you to Girl Geek X for providing some of this as well.

Michelle Yi: But basically we took some data on the types of folks we knew would be presenting and attending the conference today. And then we also took some information that’s already… So, for those of you that don’t know about DIFA, we took some information from them. They actually structure all of the information on the public internet in a knowledge graph. And so it’s super easy for us to be able to leverage that in our system. And we took a high level view of kind of the women participating.

Michelle Yi: And basically what you’re seeing here is we put a visualization of what’s called the weakly-connected components graph, right? And so it’s a type of graph algorithm where what you can see quickly is like there’s certain densities and there’s certain areas that are less connected on the edge here.
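For readers unfamiliar with the algorithm, weakly connected components group together nodes that are linked once edge direction is ignored. The following is a minimal sketch using breadth-first search; the attendee names and fields of study are hypothetical and are not the conference data shown in the demo.

```python
from collections import defaultdict, deque


def weakly_connected_components(nodes, edges):
    """Group nodes that are linked when edge direction is ignored."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        # Record each edge in both directions so direction is ignored.
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    # Largest clusters first, ties broken alphabetically.
    return sorted(components, key=lambda c: (-len(c), c))


# Hypothetical graph: edges point from an attendee to a field of study.
NODES = ["ana", "bea", "cy", "dee",
         "computer science", "economics", "english literature"]
EDGES = [("ana", "computer science"),
         ("bea", "computer science"),
         ("cy", "economics")]

for component in weakly_connected_components(NODES, EDGES):
    print(component)
```

Dense clusters show up as large components, while nodes with few or no links end up in small components of their own, which corresponds to the sparse outer edge of the visualization.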

Michelle Yi: And so we took a survey of sort of what did people study, right? And for women that are in engineering or technology, what did they study as the most recent education? And so what I thought was really fun about this is that when you zoom in, you can kind of see the clusters you might expect.

Michelle Yi: This is New York, if I remember right. And then in New York, there’s lots of people with computer science degrees, et cetera, et cetera. But when you get to the edges a little bit further out, you see a lot of really, really cool majors and women that are in our fields and that have really, really diverse backgrounds.

Michelle Yi: And I love seeing this. So you see like economics, I saw English, English literature. I saw health informatics right here, design and art direction. And so I thought this was like a really fun way using knowledge graphs to quickly show that it doesn’t matter what background you have, but there is a place for you in tech.

Michelle Yi: And the thing is that when you are kind of one of these weakly-connected components, you might sometimes feel like you’re the only one. Right? But actually it’s not true. There’s so many of us that are out here.

Michelle Yi: And so I thought this was a fun way to show that using some real data. So yeah, I thank you so much for all of your time. I think we’re right at the 45 minute mark. And so, really appreciate it. And if you have any questions or you’re interested in graphs or the tech, please don’t hesitate to reach out. Thanks so much.

Angie Chang: Thank you, Michelle. That was very informative. I love the chart and the graph, and thank you for explaining everything so clearly.
