
“A Gendered-Language Flipping Tool That Exposes Bias: Neutrality.wtf Case Study”: Moriel Schottlender, Principal Systems Architect at Wikimedia Foundation (Video + Transcript)

March 20, 2023

Moriel Schottlender (Principal Systems Architect at Wikimedia Foundation) exposes interesting impacts of gendered language with neutrality.wtf, and shares technical insights for open source development and architecting software “in the open” for creating career opportunities.



Like what you see here? Our mission-aligned Girl Geek X partners are hiring!

Sukrutha Bhadouria: I hope y’all are having a great time at the conference so far. We’ve been having one great session after another. It’s been amazing, for me at least. I do wanna remind everyone to please post on social media about all the amazing things y’all are hearing and learning today. Use the hashtag #IWD2023 for International Women’s Day and the hashtag #ElevateWomen, because we want all of you, and all of us, to lift as we climb.

Sukrutha Bhadouria: All right, so now we have the wonderful Moriel, who’s a principal systems architect at Wikimedia Foundation in New York. She’s a physicist turned software engineer turned systems architect, currently working on modernizing Wikipedia’s architecture. She’s also an open source enthusiast, and we are excited to hear about neutrality.wtf. Welcome!

Moriel Schottlender: Thank you so much. I’m so excited to be here. It seems like I have gone from one thing to the other, to the other, and that’s, I guess, part of change and growth, I don’t know. But today I’m gonna talk to you about neutrality.wtf, a case study of a gendered-language flipping tool, which is a mouthful, but hopefully we’ll learn something and share something today. First of all, about me: I originally come from front-end development, and I’m very much an open source enthusiast. I do a lot of localization work and language support. I hoard domains, which I assume a lot of you do too. And I’m a principal systems architect at the Wikimedia Foundation; we operate Wikipedia. I also apparently do a pretty good, you know, self cookie, and I was actually very proud of that one.

Moriel Schottlender: Let’s delve in. What is neutrality.wtf? We all know that the gender gap is real, but the gender gap is not just about how many articles we have about women or non-men. The gender gap is also a little subtle. It’s also about how we talk about both genders. Sometimes, or a lot of times, even when we do have articles about women, we tend to talk about them not quite the way we talk about men, but it’s really hard to see the difference. And we hear a lot of people say something like, well, you wouldn’t say that about a guy, would you? I don’t know. And this is where the idea for the tool came from.

Moriel Schottlender: I was like, is there a way for me to test this hypothesis? If we flip around the gender terms and read an article, or read whatever it is online, can we identify problems, or is it not a problem? I don’t know. That’s where it came from.

[Image: the Wikipedia article on Ada Lovelace with its gendered language flipped by neutrality.wtf]

Moriel Schottlender: This is what it looks like. This is the article about Ada Lovelace on Wikipedia. If you pass it through neutrality.wtf, it’ll just replace all of the male terms with female ones and all of the female terms with male ones. All of the gendered language is simply flipped around. So now you can read about Ada Lovelace as if she were a man, and you can ask: would I say these things about a man? I’ll let you decide. I think, for the most part, it’s probably okay, but there are cases where it might not be as okay, and the point of this tool is to flush those out. Here’s an example.
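The flip itself can be pictured as a two-way dictionary lookup. This is a minimal TypeScript sketch, not neutrality.wtf’s actual code: the word list `PAIRS` and the helpers `buildLookup`, `matchCase`, and `flipGendered` are hypothetical. It makes each pair work in both directions and rewrites the text in a single regular-expression pass, so a word swapped from “he” to “she” is never swapped back by a later rule, and it preserves capitalization.

```typescript
// HYPOTHETICAL sample of male/female term pairs; the real neutrality.wtf dictionary differs.
const PAIRS: Array<[string, string]> = [
  ["he", "she"],
  ["man", "woman"],
  ["men", "women"],
  ["king", "queen"],
  ["count", "countess"],
];

// Make every pair work in both directions: "he" -> "she" and "she" -> "he".
function buildLookup(pairs: Array<[string, string]>): Record<string, string> {
  const lookup: Record<string, string> = {};
  for (const [male, female] of pairs) {
    lookup[male] = female;
    lookup[female] = male;
  }
  return lookup;
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Keep "He" capitalized as "She", and "HE" as "SHE".
function matchCase(original: string, replacement: string): string {
  if (original.length > 1 && original === original.toUpperCase()) {
    return replacement.toUpperCase();
  }
  const first = original.charAt(0);
  if (first === first.toUpperCase()) {
    return replacement.charAt(0).toUpperCase() + replacement.slice(1);
  }
  return replacement;
}

// One regex pass: each word is looked up exactly once, so a swap is never undone by a
// later rule (running "he"->"she" and then "she"->"he" sequentially would do that).
function flipGendered(text: string, lookup: Record<string, string>): string {
  const alternatives = Object.keys(lookup)
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp);
  const pattern = new RegExp("\\b(" + alternatives.join("|") + ")\\b", "gi");
  return text.replace(pattern, (word: string) => {
    const target = lookup[word.toLowerCase()];
    return target === undefined ? word : matchCase(word, target);
  });
}

const LOOKUP = buildLookup(PAIRS);
```

For example, `flipGendered("He is a man, she is a woman.", LOOKUP)` yields `"She is a woman, he is a man."`. Because the lookup only sees isolated words, the same sketch also turns “I count from one to 10.” into “I countess from one to 10.”, the context problem described in the talk.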

Moriel Schottlender: This is the beginning of the Wikipedia article on matriarchy, from 2018. The problem I’m gonna show you today was actually fixed, which is very exciting. But I was thinking, okay, if I pass matriarchy through neutrality.wtf, everything is flipped, so now it’ll describe patriarchy, which technically is the opposite, right? Would it look the same? Would it sound right? So I did, and oh, it didn’t sound exactly right. There were a couple of things you wouldn’t see if you went to the actual article about patriarchy, most notably “in mammals” and “to the exclusion of females”, probably not what we want. The good news is that these things were fixed and removed from the article. But that is the point of this tool. To clarify, the tool does have limitations, and they are well known. The first is that gender itself is not binary, and this tool is not making any statement about gender being binary.

Moriel Schottlender: On the contrary, there’s a lot of explanation acknowledging that gender is not binary; but gendered language, especially in English, tends to be binary, and the tool only makes a point about gendered language. There’s also a limitation in the one-to-one replacement of words, which is what this tool does. This tool is dumb. It just looks for words and replaces them, which means it has problems with language processing and context. There are verbs versus nouns: for example, “count” can be replaced by “countess”, as in Count Dracula. But I also count from one to 10. Do I “countess” from one to 10? Probably not. The system is a little dumb, so those mistakes do happen. The idea is to make it a little bit, you know, funny, and also to make the point. We know that. That said, there are also ambiguous words.

Moriel Schottlender: We lack context. And a lot of times there are demeaning terms. If you see women written about as “girls”, we know that it is demeaning. When we flip it around and suddenly talk about “boys”, it’s not as demeaning, so it might not always deliver the actual impact, right? But it works.

Moriel Schottlender: I’m an architect, so let’s delve into the actual tech. How did this happen? Well, it all started at the 2017 Wikimedia Hackathon. As you can see, this is me with my telltale hackathon look: overly caffeinated, but barely awake. This was an idea for the hackathon, and we took articles from Wikipedia specifically. The tool works outside of Wikipedia as well, by the way; you can use it on any sort of article. It was written in PHP, based on a one-to-one replacement JSON dictionary, and it has a hackathon architecture.
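A one-to-one replacement dictionary only works if every word maps to exactly one counterpart; otherwise flipping is ambiguous and can’t be reversed. The sketch below is TypeScript rather than the original PHP, and it assumes a dictionary format (a JSON object from male to female terms) that the talk doesn’t actually describe. `DICTIONARY_JSON` and `loadOneToOneDictionary` are hypothetical names. It builds the two-way lookup and rejects entries that would break the one-to-one property.

```typescript
// ASSUMED dictionary format: a JSON object mapping each male term to its female counterpart.
const DICTIONARY_JSON = '{ "he": "she", "man": "woman", "king": "queen", "prince": "princess" }';

// Parse the dictionary and build a two-way lookup, rejecting anything that would make the
// replacement ambiguous: non-string values, a word mapped to itself, or a word used twice.
function loadOneToOneDictionary(json: string): Record<string, string> {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Dictionary must be a JSON object");
  }
  const source = parsed as Record<string, unknown>;
  const twoWay: Record<string, string> = {};
  const has = (word: string): boolean => Object.prototype.hasOwnProperty.call(twoWay, word);

  for (const key of Object.keys(source)) {
    const value = source[key];
    if (typeof value !== "string") {
      throw new Error(`Value for "${key}" must be a string`);
    }
    const a = key.toLowerCase();
    const b = value.toLowerCase();
    // Each word may appear once, on either side; otherwise the reverse lookup is ambiguous.
    if (a === b || has(a) || has(b)) {
      throw new Error(`"${key}" -> "${value}" breaks the one-to-one mapping`);
    }
    twoWay[a] = b;
    twoWay[b] = a;
  }
  return twoWay;
}
```

Validating the mapping once, at load time, keeps the replacement step itself simple: it can assume every lookup has exactly one answer.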

Moriel Schottlender: What does that mean? Well, a hackathon architecture is a hopeful one. When I started writing this, I said, I know I’m gonna write something that I basically just want to work and never deal with again, so I want to make it right. I’m going to separate concerns. I’m going to create two repositories, one for the front end and one for the back end. I’m gonna use it as a submodule. I’m gonna separate everything. And that was great, except it didn’t really work. While we were working on this at the hackathon, everything started kind of mish-mashing together. The business logic was all over the place. The caching was everywhere. The front end was fetching, and the business logic and the back end were, you know, intertwined. It was crazy. It ended up being very, very tangled, which means it was a monolith in disguise.


Moriel Schottlender: I attempted to make it decoupled. It was not decoupled at all. But look, as a hackathon project, it works, right? It’s there. I put it out there. People use it, it works, it’s great. So why should I touch it? Well, it didn’t really work for me. The environment kept going stale, and I had to keep chasing upgrades to libraries and such. The caching layer was all over the place and started filling up my hard drive, and moving to other things became really, really problematic. Managing the hosting was bad. And monoliths in general are very annoying, because with one tiny bug, suddenly you have to fix everything everywhere, since there’s no separation of concerns.

Moriel Schottlender: I wanted to be lazy. I wanted to just let it live online without me constantly running after it. And the way to do that was to reconsider my architecture. Let’s think about this better, especially since it’s been a few years. Technology has moved on, and there are other options. Let’s figure it out.

Moriel Schottlender: How do you approach thinking about a new architecture for your thing? Well, the first thing is to think about the standards. What do I want? What am I aiming for, right? I want to make sure that it’s loosely coupled, meaning it’s made of pieces that I can replace seamlessly. They’re not directly touching one another, so if something happens and I want to replace the middle layer or the top layer or whatever, I can just replace it. It needs to be easily maintainable. I need to retain the same behavior, but allow for more features without constantly maintaining it. This was very, very important.

Moriel Schottlender: I had to make sure that I wouldn’t be running after everything all the time. And then there was another set of standards I wanted to add. This is open source, and because it’s open source, other people can come and just contribute. That’s great, but it also means the separation of concerns needs to be very clear, because if someone else comes in and adds code, they need to understand what is valid and not valid to add, right? That makes everything a lot easier to review, and it avoids diluting the separation of concerns I worked really hard for, even as things change. That’s what I want to make sure of, right? This is not a constantly worked-on project. I could leave it out there online and come back to it months and months later, so I wanted to make sure that works. All right.

Moriel Schottlender: Thinking about all of these things, I came up with, very broadly, a three-component architecture. Okay? This is the plan. I’ll have a standalone Node.js library. It’s going to be relatively generic, just doing replacement of terminology inside HTML pages. Then I’ll have a microservice that takes the very generic and brings it into the specific, so it does all the translation needed for the neutrality.wtf-specific job of replacing gendered language. And I’ll have a web front end that is completely decoupled, which I can replace any time. The standalone library has to be scoped, and this is one of the biggest problems; we’re gonna touch on it in a minute. It has to be very, very well scoped. It does something very specific.
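The three components can be expressed as contracts that only point one way: the front end depends on the service’s reply format, and the service depends on the library only through an interface. This TypeScript sketch uses hypothetical names (`HtmlTermReplacer`, `GenderFlipService`, `renderReply`) and an injected, synchronous page fetcher to stay self-contained; it illustrates the loose coupling described above, not the project’s real code.

```typescript
// Layer 1, the generic library: knows only HTML strings and a replacement dictionary.
interface HtmlTermReplacer {
  replace(html: string, dictionary: Record<string, string>): string;
}

// What the service hands to any front end (web page, mobile web, browser extension).
interface FlipServiceReply {
  ok: boolean;
  html?: string;
  error?: string;
}

// How the service gets page HTML; injected so it can be swapped or faked in tests.
// (Synchronous here only to keep the sketch self-contained.)
type PageFetcher = (pageUrl: string) => string;

// Layer 2, the product-specific service: depends on the library only through its interface.
class GenderFlipService {
  private readonly replacer: HtmlTermReplacer;
  private readonly fetchPage: PageFetcher;
  private readonly dictionary: Record<string, string>;

  constructor(replacer: HtmlTermReplacer, fetchPage: PageFetcher, dictionary: Record<string, string>) {
    this.replacer = replacer;
    this.fetchPage = fetchPage;
    this.dictionary = dictionary;
  }

  flipPage(pageUrl: string): FlipServiceReply {
    try {
      const html = this.fetchPage(pageUrl);
      return { ok: true, html: this.replacer.replace(html, this.dictionary) };
    } catch (err) {
      return { ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  }
}

// Layer 3, a front end: consumes only FlipServiceReply, so it can be replaced freely.
function renderReply(reply: FlipServiceReply): string {
  if (reply.ok) {
    return reply.html !== undefined ? reply.html : "";
  }
  return "Error: " + (reply.error || "unknown error");
}
```

Because `GenderFlipService` only sees the `HtmlTermReplacer` interface, the library can be swapped, upgraded, or faked in tests without touching the service or the front end.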

Moriel Schottlender: It’s a black box: HTML goes in, HTML comes out, and it does the replacement internally. That also means it’s testable, because it is very scoped. If at some point in the future I want to replace some functionality or add something, I can be sure that all my tests pass and all my expectations are met, which is a lot easier than with a humongous monolith where who knows what I just touched and what I did. And it’s open, so it’s shareable. Other people can just take this library and do other replacements: I don’t know, replace cats with dogs on all web pages, or some images with other images, whatever it is. You have a kind of generic library, or as generic as possible, to reuse. As for what it does, it’s kind of a black box.
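The library’s black-box contract (HTML string and dictionary in, HTML string out) is what makes it easy to test in isolation. In the sketch below, `replaceInHtml` is a hypothetical name, and a naive regex tokenizer stands in for the real HTML parsing step the talk describes: it splits markup into tags and text, swaps dictionary words only in text, and leaves tags, attributes, and `<script>`/`<style>` contents alone. It assumes well-formed markup with no stray `<` or `>` in text or attribute values, and a dictionary with lowercase keys.

```typescript
// Sketch of the black-box contract: HTML string + dictionary in, HTML string out.
// ASSUMPTION: a naive regex tokenizer stands in for a real HTML parser; it expects
// well-formed markup with no "<" or ">" inside text or attribute values.
function replaceInHtml(html: string, dictionary: Record<string, string>): string {
  const keys = Object.keys(dictionary)
    .sort((a, b) => b.length - a.length)
    .map((k) => k.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  if (keys.length === 0) {
    return html;
  }
  const wordPattern = new RegExp("\\b(" + keys.join("|") + ")\\b", "gi");

  // split() with a capture group keeps the tags, so parts alternate text / tag / text ...
  const parts = html.split(/(<[^>]*>)/);
  let rawDepth = 0; // > 0 while inside <script> or <style>, whose contents are not prose
  const out: string[] = [];

  for (const part of parts) {
    if (part.charAt(0) === "<") {
      if (/^<(script|style)\b/i.test(part)) rawDepth++;
      else if (/^<\/(script|style)\b/i.test(part)) rawDepth = Math.max(0, rawDepth - 1);
      out.push(part); // tags and attributes pass through untouched
    } else if (rawDepth > 0) {
      out.push(part);
    } else {
      out.push(
        part.replace(wordPattern, (word: string) => {
          const target = dictionary[word.toLowerCase()];
          if (target === undefined) return word;
          const capitalized = word.charAt(0) !== word.charAt(0).toLowerCase();
          return capitalized ? target.charAt(0).toUpperCase() + target.slice(1) : target;
        })
      );
    }
  }
  return out.join("");
}
```

A production version would hand the string to a real HTML parser (in Node.js, a library such as parse5 or jsdom) and walk the text nodes. The input and output stay the same, which is what keeps the tests valid when the internals change.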

Moriel Schottlender: It accepts an HTML string and a JSON replacement dictionary. It parses the string into a document, performs all of the replacements, serializes it back, and hands it over as a string that is completely replaced. That is what it does. And then I have the microservice. The microservice needs to be the bridge between that back end, that engine, and the front end that is visible to users, and it does the product-specific behavior. It validates all parameters. It does the fetching of the remote page, because the library itself doesn’t do any fetching; it doesn’t know about HTTP or whatever. All it knows about is HTML documents. The microservice does that. It handles all errors, it calls the library, and it returns the output in a way the front end can present, which brings us to the front end.
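The microservice responsibilities listed above (validate parameters, fetch the page, call the library, handle errors, shape the output) can be sketched as a single handler. Everything here is an assumption for illustration: `handleFlipRequest`, the `url` parameter, the status codes, and the reply shape are hypothetical, not the neutrality.wtf API, and the fetcher is injected and synchronous only to keep the example self-contained (a real service would await an HTTP request).

```typescript
// HYPOTHETICAL request/response shapes; not the actual neutrality.wtf API.
interface FlipParams {
  url?: string;
}

interface FlipHttpReply {
  status: number; // HTTP status the service would send
  body: { html?: string; error?: string };
}

// Injected dependencies. The fetcher is synchronous only to keep this sketch self-contained;
// a real service would await an HTTP request. The replacer is the pre-configured library.
type HtmlFetcher = (url: string) => string;
type HtmlReplacer = (html: string) => string;

const HTTP_URL = /^https?:\/\/[^\s\/]+\S*$/i;

function handleFlipRequest(params: FlipParams, fetchHtml: HtmlFetcher, replaceTerms: HtmlReplacer): FlipHttpReply {
  // 1. Validate parameters before doing any expensive work.
  const url = (params.url || "").trim();
  if (url === "") {
    return { status: 400, body: { error: "Missing 'url' parameter" } };
  }
  if (!HTTP_URL.test(url)) {
    return { status: 400, body: { error: "The 'url' parameter must be an http(s) URL" } };
  }

  // 2. Fetch the remote page here: the library knows nothing about HTTP.
  let html = "";
  try {
    html = fetchHtml(url);
  } catch {
    return { status: 502, body: { error: "Could not fetch the requested page" } };
  }

  // 3. Call the library, and turn any failure into an error the front end can show.
  try {
    return { status: 200, body: { html: replaceTerms(html) } };
  } catch {
    return { status: 500, body: { error: "Replacement failed" } };
  }
}
```

Keeping network access and error mapping in the service is what lets the library stay ignorant of HTTP.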

Moriel Schottlender: The front end is straightforward. It’s decoupled, so it can have multiple versions: I might have web, or mobile web, or a browser extension, whatever I want, and I could swap them around. And it needs to be accessible. It presents the interface, it has explanation pages, and it accepts user input, does validation, all that kind of stuff. But then we need to consider complexity. There is a bit of overlap between the microservice and the Node.js library here. What do we do about this complexity, right? Which piece should do what? Should the microservice always do the web-specific stuff? It sounds right. But if I go with a very simplistic library, then there’s a lot more complexity in the microservice, right? The microservice needs to do a lot more. If I go with a really complex library that can do anything, then I’m risking making it unusable, because it can do so many things that nobody knows how to use it, which we see online a lot, right?

Moriel Schottlender: But there’s another consideration here. The actual action of parsing an HTML string into a document is expensive. It takes time. If I do it twice, I’m doubling the parsing time, and the system takes that much longer to respond. I need to make sure I only do it once, which means that if certain things are web-specific but need to be done while I parse, I might have to put them in the library, right? So what do I do? Where should the complexity live: up, down, or in the middle? Well, like many architectural concepts, it depends. There is no clear answer here. I don’t have a direct answer of “this is certainly where you need to do it.” What you need to do is consider your complexity and understand that no matter what you do, you never lose it.
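One way to honor the “parse only once” constraint is for the library to expose hooks that run during its single pass, so web-specific work doesn’t trigger a second parse in the service. In this sketch, `TransformHooks`, `transformOnce`, and `absolutizeLinks` are hypothetical names, a naive regex tokenizer again stands in for a real parser, and the `tokenizeCalls` counter exists only to show that the expensive step ran once.

```typescript
// HYPOTHETICAL extension points: product-specific work (like rewriting links) runs inside the
// library's single pass over the document instead of in a second, separate parse.
interface TransformHooks {
  onTag?: (tag: string) => string;
  onText?: (text: string) => string;
}

let tokenizeCalls = 0; // instrumentation only: how many times the expensive step ran

// Naive stand-in for the expensive parse step (ASSUMES well-formed markup with
// no "<" or ">" inside text or attribute values).
function tokenize(html: string): string[] {
  tokenizeCalls++;
  return html.split(/(<[^>]*>)/);
}

function transformOnce(html: string, hooks: TransformHooks): string {
  return tokenize(html)
    .map((part) => {
      if (part.charAt(0) === "<") {
        return hooks.onTag ? hooks.onTag(part) : part;
      }
      return hooks.onText ? hooks.onText(part) : part;
    })
    .join("");
}

// A web-specific hook the service could pass in: make site-relative links absolute
// so they still work when the flipped page is served from another domain.
function absolutizeLinks(origin: string): (tag: string) => string {
  return (tag: string) => tag.replace(/href="\/(?!\/)/g, 'href="' + origin + "/");
}
```

This is the trade-off described above in miniature: the library grows a small, generic extension point (complexity moves down), and in exchange the service never pays for a second parse.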

[Slide: “Complexity: it depends”]

Moriel Schottlender: People online say, “I just reduced all complexity.” You did not. You moved it somewhere, okay? Either you did it consciously and you’re aware of the trade-offs, or unconsciously and you’re going to encounter it later. When you think about complexity, you have to think about it clearly. What will help you with that is examining the trade-offs. What does it mean to put it here? To make it simpler, or less simple? Root yourself in an actual use case, right? Yes, the standalone replacement library should be very generic, but it is also aimed at a use case. Root yourself in that, because if you start thinking about everything it could do, you’ll never get anywhere. Everything will be possible, and then nothing will be possible. Consider your performance concerns, too. Those are external things that come at you and might shape where you put your complexity.

Moriel Schottlender: They dictate where that boundary exists, and that is totally valid. And then beware of diluting the actual meaning of the component you started out with. When you think about a microservice or a library, they have meanings. If we lose track of where complexity lives, we can actually lose the thing we want the library to do, right? We put so many things in it that it can do everything except the thing we actually want it to do. Try to stay anchored to the goal of this component: what do you want it to do? Weigh the trade-offs and the external constraints. And most importantly, document all your decisions. This is important generally, but it is even more important when you have an open source project where other people may come in and ask, why did you do this? Right?

Moriel Schottlender: With that said, here is the new architecture. As you can see, it has three main components. The web front end is Vue.js, incidentally, but it could be anything else. There are clear boundaries here. The caching layer is very clear. The microservice does the product-specific operations, which keeps the Node.js library simple. And I even added a little bit: if I want a browser extension, I can just use the npm package. It is generic enough to do that, right? And it is reusable; you can just use it. And this is what it looks like. Feel free, of course, to go to neutrality.wtf and use it. It is open source, and PRs are welcome. That’s how it works. To summarize: always, always, when you architect, when you think about code, even before you decide to re-architect anything, consider your complexity.
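The talk doesn’t say how the new caching layer is built. As an assumption for illustration only, a cache with a hard size limit addresses the disk-filling problem of the original version: once it’s full, the least recently used entry is evicted. `BoundedCache` and `cachedFlip` are hypothetical names, and this in-memory LRU is a sketch, not the project’s implementation.

```typescript
// ASSUMPTION: an in-memory cache keyed by page URL with a hard entry limit; the talk
// doesn't describe how neutrality.wtf's caching layer is actually built.
class BoundedCache {
  private readonly maxEntries: number;
  private values: Record<string, string> = {};
  private order: string[] = []; // keys from least to most recently used

  constructor(maxEntries: number) {
    if (maxEntries < 1) {
      throw new Error("maxEntries must be at least 1");
    }
    this.maxEntries = maxEntries;
  }

  get(key: string): string | undefined {
    if (!this.has(key)) {
      return undefined;
    }
    this.touch(key);
    return this.values[key];
  }

  set(key: string, value: string): void {
    if (!this.has(key) && this.order.length >= this.maxEntries) {
      // Full: evict the least recently used entry instead of growing without bound.
      const oldest = this.order.shift();
      if (oldest !== undefined) {
        delete this.values[oldest];
      }
    }
    this.values[key] = value;
    this.touch(key);
  }

  size(): number {
    return this.order.length;
  }

  private has(key: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.values, key);
  }

  // Move a key to the most-recently-used end of the order list.
  private touch(key: string): void {
    const index = this.order.indexOf(key);
    if (index !== -1) {
      this.order.splice(index, 1);
    }
    this.order.push(key);
  }
}

// Serve a flipped page from the cache, computing (and caching) it only on a miss.
function cachedFlip(cache: BoundedCache, url: string, compute: (url: string) => string): string {
  const hit = cache.get(url);
  if (hit !== undefined) {
    return hit;
  }
  const result = compute(url);
  cache.set(url, result);
  return result;
}
```

The linear `indexOf` keeps the sketch short; the design point is that the size bound lives in one clearly owned layer instead of being scattered across the code.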

Moriel Schottlender: Whoever tells you that you can get rid of complexity misses the point that complexity always exists. You need to make a very conscious choice of where it lives. Don’t just ask, “could it do this?” because that is very easy for engineers to do. We ask, could it also do this? And the answer is usually, of course it can. We’re engineers; we can do everything. Ask yourself, should it do this? Because if you end up with a library or a component that does everything, then it’s probably not one component. It should be many, or else you’ve ended up with a tiny, tiny monolith again. So ask, should it do this? And it’s okay to want to be lazy. I know it’s counterintuitive. Everybody says work hard, and you should work hard.

Moriel Schottlender: And that’s true for engineers, if we actually think about it. Engineers are lazy at our core. We work really, really, really hard on making things do stuff for us, so that we don’t have to do it ourselves, right? “How can I end up with the least overhead?” is a totally valid question, and it is architecturally sound. It is totally okay to ask it. Then you decide what the trade-offs are, so you know what to invest in. And it’s okay to decide, you know what, I’m going to take a little bit longer to do this thing, but it will save me time later. That’s perfectly good. That also means trying to use external tools. Why make everything in-house? You don’t always have to. There are external tools, and preferring them is usually better. Not always; sometimes you do need something very specific, but make a conscious choice.

Moriel Schottlender: It’s very easy to jump directly to “Oh no, nothing like that exists,” or “Whatever exists is not good enough, I’ll just create my own.” That is usually what we call a code smell. You need to really ask yourself whether that’s actually what you need to be doing. And there’s another thing underlying this whole conversation that I didn’t touch on directly, but touched on a little. Working in the open, in open source, has a lot of benefits, both for the way you work and for the architecture you produce, because it forces you to think in a collaborative manner. Even if you don’t collaborate, even if no one ever opens a PR against your code, it doesn’t matter, because you put it out there. So the first thing is that it gives you permission to put it out there, even if it’s not really ready; it’s just out there.

Moriel Schottlender: And that’s important, because we tend to only put out things that are perfect. Don’t wait until it’s perfect; just put it out there. You get help from people. It forces you to put very clear boundaries on everything, and it encourages you to document everything, which is super important, even for yourself a year from now, to know why you did what you did, because you won’t remember. Trust me, I don’t remember what I did three months ago.

Moriel Schottlender: And finally, I think we don’t say this enough: have fun with your projects! When you have fun with your projects, it shows that you’re excited and passionate about them. That shows to hiring managers and to the public, it makes for great products, and you keep on working on them. Find things that make you happy and passionate, and work on those.

Moriel Schottlender: If you want more information, neutrality.wtf is online. Feel free to use it, please. It’s on GitHub; PRs are welcome, and issues are welcome. You can also talk to me. I’m “mooeypoo” almost everywhere, Twitter, Mastodon. And if you want to follow me around, go to moriel.tech to see what I’m up to. Thank you very much. It’s been a lot of fun. Go on and be passionate about your projects. Open source. I think I’m slightly ahead of time, which means I potentially have time to look at questions. There are a lot of comments.

Sukrutha Bhadouria: Hi, this was absolutely wonderful. I was following the chat. <Laugh> Someone said, “We all love your voice.”

Moriel Schottlender: <Laugh>. Oh, thank you <laugh>.

Sukrutha Bhadouria: Anyway, I wanted to say thank you for, oh my gosh, I just noticed your handle. It’s so clever.

Moriel Schottlender: Ah-Ha. Yes.

Sukrutha Bhadouria: Yeah, your social media. <Inaudible> <laugh>. Anyway, thank you so much for your time. This was absolutely insightful and besides loving your voice, your content was amazing.

Moriel Schottlender: Thank you. Thank you. It’s been absolutely fun, and I really hope that, you know, we get more and more of these conferences. They’re really important and they’re awesome. I’ve been learning a lot.

Sukrutha Bhadouria: <Laugh>. Absolutely. We wanna keep this going. Thank you everyone. Bye.

