Kelsey Brown (Sourcegraph Analytics Manager) will share how to find the metric that matters most, and obsess over it! She will discuss how to make A/B testing quick and easy. Attendees will learn to embed analysts with product and engineering teams for data-driven product development success.
In this ELEVATE session, Kelsey Brown, a data analyst at Sourcegraph, shares her journey of building an AI product called Cody and the lessons she learned along the way about working with the product team, focusing on a leading indicator of product quality, and running A/B tests to improve the product.
Like what you see here? Our mission-aligned Girl Geek X partners are hiring!
- Check out open jobs at our trusted partner companies!
- Watch more ELEVATE 2024 videos from the event, or just the “Best Of 2024” Videos!
- Does your company want to sponsor a Girl Geek Dinner or Virtual Conference? Talk to us!
Transcript of ELEVATE Session:
Kelsey Brown:
Hi everyone. I’m so excited to be here and thanks to everyone for coming out to this talk. Now I want to start with a disclaimer. The title of this talk may be a bit misleading because, while I would love nothing more than to sit here and tell you that I have built the one correct way to use data to build AI products, the field of AI is just so new and it’s changing so fast that I don’t think the one correct way really exists yet.
What I'm going to do is tell you a story about my journey and how I helped to build a new product in the midst of this AI boom that we're in right now, and some lessons that I learned from my amazing coworkers and the overall experience along the way.
I hope that you all can take this story and adapt the learnings to wherever you're at in your data, AI, or product journey, and one day even build on it.
First, a little bit of background information just to set the stage for our story here. As I mentioned earlier, I currently work in data analytics at Sourcegraph, which is a software company, and we sell a developer tool that helps engineers understand, fix, and automate code.
The way this tool works is by indexing, or in other words cataloging, all code repositories. This index makes all sorts of things possible. For example, it makes it possible to search for a specific piece of code the way that you might search for a specific word or phrase in a document using Ctrl+F. It even allows you to replace that specific piece of code with a different piece of code, the way you could find and replace in a Word document. It does a lot more than this, and it does it all at really massive scale. Think thousands and thousands of files across a company's entire code base.
When ChatGPT was released, kicking off this huge AI boom, the team at Sourcegraph wondered, well, what if we combined Sourcegraph's ability to index an entire code base with an LLM's ability to ingest and interpret large swaths of data, and built an AI coding assistant named Cody? One that doesn't just know general principles about coding and software, like any LLM might, but also knows anything and everything there is to know about your specific code base and the repositories that you work in, for your job or your side project or whatever the case may be. That would be pretty cool. At least, that's what we thought.
For me, as a data analyst at Sourcegraph supporting our product and engineering teams, this presented a really unique challenge. At the time, I had never helped our team, or really any team for that matter, build out a brand new product, and I'd certainly never done it in a brand new industry.
There was no industry standard or roadmap or Stack Overflow thread for me to fall back on or to save me here. I also didn’t necessarily have a clear directive or ask. The goal was just for the data team to come in and help. It was hard to know where to even start, and that brings me to my first learning.
Now at this point, Sourcegraph had been around for a while. We had an established data team, and with that came established processes for interacting with our stakeholders: you submit a ticket via our tracking system, we respond based on our bandwidth and the urgency of the issue, and we scope out a solution and get back to you.
Processes like this are important for any team, but in the early stages of building this new AI product, they were just too slow. I probably don't need to tell anyone here that the most successful teams in AI right now are moving at warp speed. Everyone is trying to get their hands on the latest technology and build something better than anyone else has seen before. If our data team couldn't keep up with that pace, then we risked being left behind altogether.
What did keeping up look like? Well, it looked like abandoning some of our existing processes and being present wherever the team was. I often attended their stand-ups and planning sessions, which was a great way to understand their questions and priorities and blockers and figure out where data could help.
Being present also meant meeting them in their preferred tools, which in our case were GitHub and Slack, rather than asking them to go through our process of filling out a ticket in our ticketing system when they needed something from us. Even though it meant abandoning all of our regular processes, being embedded as a member of the team meant that I could more easily keep up with the pace of development, focusing on the most important thing that week or that day, or sometimes even that hour.
Being an embedded member of the team also meant that I could provide some immediate value by proactively anticipating team needs. When I first joined the project, I didn't have a clear task or request or roadmap of any kind, so I just started with what we had. I searched through Slack, GitHub issues, and documents, looking for the conversations people were having and the questions they were asking.
Then, I'd look to see if we had any data to answer those questions. If we did, I'd write some queries, create a data visualization, and store everything in a single dashboard where it was easy to find and access repeatedly. If we didn't have the data, then I'd let the team know, and I'd help them scope out how to build telemetry to get the data we needed to answer their question. Then, I'd transform that data and make it usable.
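To make that kind of rollup concrete, a minimal sketch might look like the snippet below. The event names and fields here are hypothetical, not Sourcegraph's actual telemetry schema; it just illustrates turning raw events into a dashboard-ready daily metric.

```python
# Minimal sketch: roll raw telemetry events up into a daily metric for a dashboard.
# Event names and fields are hypothetical, not Sourcegraph's actual telemetry schema.
from collections import defaultdict
from datetime import date

events = [
    {"user_id": "u1", "event": "completion_suggested", "day": date(2024, 1, 8)},
    {"user_id": "u1", "event": "completion_accepted", "day": date(2024, 1, 8)},
    {"user_id": "u2", "event": "completion_suggested", "day": date(2024, 1, 9)},
]

# Count distinct users per day, a simple daily-active-users style rollup.
daily_users = defaultdict(set)
for e in events:
    daily_users[e["day"]].add(e["user_id"])

for day, users in sorted(daily_users.items()):
    print(day, len(users))  # rows like these feed a dashboard tile or chart
```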
I did this really haphazardly at first, just small things here and there, because everything was still so new and it was hard to know what mattered most. But even small things like this paid off in really big ways, because after doing this over and over again, a few things happened.
The first was just that I got to know the team and they got to know me and how they could leverage me and the data to help them iterate on the product. The second thing was that some patterns started to emerge. Some questions kept resurfacing and what mattered became a lot clearer and easier to focus on, which brings me to my next learning.
Now by this point, we were tracking a lot of what I’d consider pretty traditional product usage metrics like daily active users, for example. A metric like this is an important indication of growth and the success of the product, but it wasn’t enough to help us determine the quality of what we were building and where we needed to improve specifically.
This is because this metric is affected by things the development team couldn't necessarily control, like marketing campaigns that would bring in new users. It's also a bit of a lagging indicator: product improvements might not immediately lead to more new users, for example.
We needed a metric that would be a leading indicator and a measure of the quality of what was being built.
For this reason, we picked a single metric that measured the quality of one of our most popular features in this new tool, which at the time was code autocomplete. The code autocomplete feature works the same way as when your favorite documents tool finishes a sentence for you in that ghost text.
As you’re typing, you can usually hit tab to accept the proposed text if it’s accurately predicted whatever you had planned to say, and save yourself the hassle of typing it yourself. Our tool could do this, but it did it for code instead of language, and it was a really popular feature at the time. We wanted to focus on making it great.
The metric we picked, which is called Completion Acceptance Rate, or CAR for short, measured the percentage of our suggestions that users ultimately accepted by hitting tab. This was a leading indicator of the quality of the product: the higher the acceptance rate, the more value we were providing to the user, and the more likely those users were to stick around, or better yet, tell their friends about our product.
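In other words, CAR is simply accepted completions divided by suggested completions. A tiny illustrative helper, with a made-up function name and made-up numbers, could compute it like this:

```python
def completion_acceptance_rate(accepted: int, suggested: int) -> float:
    """Share of suggested completions that the user accepted (e.g. by hitting tab)."""
    if suggested == 0:
        return 0.0  # avoid dividing by zero when nothing was suggested
    return accepted / suggested

# Made-up example: 1,800 accepted out of 6,000 suggested gives a CAR of 0.30 (30%).
print(completion_acceptance_rate(1_800, 6_000))
```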
Once we had decided on this metric that was a leading indicator and a measure of the quality of what we were building, we started obsessing over it. It featured prominently on our team dashboards. Team members would often check it first thing in the morning and over and over throughout the day, and it was regularly reported on and discussed at the company level, in company meetings and other communications. Small jumps and dips in CAR were always investigated and interrogated, and we were all focused on one important goal: how do we improve our CAR, and by extension, make our product more valuable to the developers who use it?
We weren’t exactly sure what the answer was to that question, but the team had a lot of hypotheses and the only way to definitively prove which ones were correct was to test them.
The team started running A/B tests to try to understand which product changes would improve our CAR. When we decided to run the A/B tests, we discussed setting up one of the many great A/B testing services that are out there, but we ultimately decided not to.
At the time, we were already tracking a lot of product usage metrics and had a lot of telemetry set up. We could leverage that data for experiments and run the calculations to test for statistical significance ourselves relatively easily.
The method we came up with for testing was pretty clunky compared to what an A/B testing suite could offer, and the process definitely wouldn't scale, but at the time we didn't need it to. This homegrown solution was ready to go in just a couple of days, whereas the process of selecting, purchasing, and setting up A/B testing software could easily have taken a week or more. And that brings me to my last learning.
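A homegrown significance check for an acceptance-rate experiment often comes down to a two-proportion z-test. The sketch below, with made-up counts, shows one way to do that by hand; it is an illustration of the general technique, not the team's actual implementation.

```python
# Minimal sketch of a two-proportion z-test for an acceptance-rate A/B test.
# All counts are made up; this illustrates the idea, not the team's actual method.
from math import sqrt, erfc

def two_proportion_z_test(acc_a: int, n_a: int, acc_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in two acceptance rates."""
    p_a, p_b = acc_a / n_a, acc_b / n_b
    p_pool = (acc_a + acc_b) / (n_a + n_b)                # pooled acceptance rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))                      # two-sided normal tail
    return z, p_value

# Control: 1,800 accepted of 6,000 suggestions; variant: 1,950 accepted of 6,100.
z, p = two_proportion_z_test(1_800, 6_000, 1_950, 6_100)
print(f"z = {z:.2f}, p = {p:.4f}")  # a small p-value means the CAR gap is unlikely to be noise
```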
While testing is really important, especially for a new product in a new industry, speed still matters. At times like this, it's often better to opt for a quicker solution rather than the most robust, most scalable one, just in the interest of time.
We are coming toward the end of our story here, and lucky for our heroes, we do have a happy ending. After we defined this metric, obsessed over it for a while, and ran test after test in an attempt to iterate on and improve our Completion Acceptance Rate, we were able to increase it by about 50% over the course of a month, when it really hadn't budged much in the months prior. And because we'd chosen this leading indicator and this measure of quality, we saw improvement in our other, more traditional product metrics like daily active users and retention as well.
Within a couple of months of this effort, we formally made our Cody product generally available, confident that we were providing a high quality code AI tool that developers everywhere would love.
Like I said at the beginning of this talk, this isn’t a definitive solution. It’s just a story of my journey of being a data analyst in the world of AI products and what I learned along the way from my smart coworkers and a little bit of trial and error.
I found that being embedded with the team so I could move at their pace, picking a leading indicator and obsessing over it and testing until we found what moved the needle, really helped us to create a high quality product. And now we’re even trying to replicate this framework for other products and features where it makes sense as well.
And at the end of the day, the most important lesson that came out of this, and it sort of ties all of these other points together, is that data and data teams need to be maximally flexible, particularly in this AI environment, and we need to adapt to the stage of the product life cycle that our company is in.
When we first started building Cody, moving fast and narrowing in on specific focus areas was the key to our success, and the data team needed to figure out how to enable that, even if it meant throwing away the things that we had done before.
And as the Cody product now moves into this next phase of general availability, I can already see how our team and our data need to adapt again. The quick solutions we set up in the interest of time will need to be reworked to scale with our team and our user base as they grow. Telemetry that we implemented haphazardly or without strict standards will need to be standardized. Tooling will need to become more flexible so teams can answer their own questions and run their own A/B tests without a data team member needing to be involved.
Wherever you are in your data or AI or product building journey, the best advice that I can offer from my experience is really just to always be ready and willing to adapt and adjust to meet the needs of the teams that you’re working to serve.
And with that, if anyone has any questions, I’m happy to answer them.
Amanda Beaty:
Okay, Priya, go ahead with your question.
Priya Shastri:
Kelsey, first of all, thank you for that amazing presentation. You have highlighted some of the important aspects of software development and how AI is coming very fast into the software development lifecycle and how we are incorporating the AI tools into it.
My question is, what are the AI tools that you use for analytics, and which ones are the market leaders? Which ones would you recommend to use for a product? And the third question is, how fast do you see the adoption of AI in the analytics market?
Kelsey Brown:
Yeah, great questions. I’m actually not sure I’m even the best person to answer what AI analytics products are the best out there. My company’s been building an AI tool that’s for developers, but it’s not specifically focused on analytics. I’d love to say that I have an answer for you there, but I really don’t.
In terms of how quickly I see AI affecting the analytics space, I think it’s going to be a huge game changer. I think the one thing that we need to solve for first though, is how we make sure that our data is structured in such a way that it can be readable and ingested properly by LLMs.
Right now, most of our data infrastructure is really built for people to ingest and query, so I'm really interested to see how that changes. I think the speed at which we can adopt AI in analytics really depends on how quickly we can build infrastructure in such a way that a robot could essentially interpret it and give us results.
A lot depends on that. I think if some of the existing AI or data tool companies come out with interesting new tools to make that more possible, I could see it happening really fast, but if not, I could see us getting hung up on how to build data warehouses and data infrastructure for robots instead of people.
Amanda Beaty:
All right, Priya, I hope that answered your question. Kelsey, thank you. It’s time to end the session and hope everybody can join us at the next session.