Transcript
This transcript was generated automatically and may contain errors.
Welcome to The Test Set. Here we talk with some of the brightest thinkers and tinkerers in statistical analysis, scientific computing, and machine learning. Digging into what makes them tick, plus the insights, experiments, and OMG moments that shape the field.
In this episode, we interview Vincent Warmerdam, first full-time hire at Marimo, founder of CalmCode.io, which has some really nice tech tutorials, and someone who seems to be a man of many talents. Vincent claims that pursuing side activities, rather than doubling down on tech, is the killer feature of his career, and I honestly can't agree more. He started out bartending at a comedy theater, later practiced jokes at his other job as a tour guide, and finally landed a job with the team behind spaCy by delivering a well-timed joke in a bar at a conference.
And this brings us to the meat of the conversation, Marimo notebooks and data analysis. At a time when AI can generate entire notebooks in seconds, Vincent argues for the opposite approach. Create with AI a notebook with a few cells filled out, and then just stare at a chart for a couple minutes. Ultimately, get involved, and be selfish in wanting to understand the data yourself.
So with that, Vincent Warmerdam. Vincent, thanks for coming on The Test Set.
Vincent's background and open source work
Yeah, where do we begin? I mean, I feel like you have such a broad set of things from your like open source contributions to CalmCode. Maybe you could just catch us up on, just give us a broad swath of sort of the range of things you've been up to.
Sure. I mean, one thing: it's CalmCode.io, it's not CalmCoder, but no worries, no worries, that's a bit of a detail. Now I guess like, because I get the question a lot, because it is true that I've done like a bunch of things, but the way that I see it is I'm kind of open in the sense that I like to expose myself to things that just seem interesting, and then you end up doing stuff. And as long as you document it properly and like share it with people the right way, then you'll be known for things.
So one thing that just happened with, let's say the open source thing, my most downloaded package right now, I think is scikit-lego. It's been downloaded almost 2 million times now. But it was just me playing around with scikit-learn and sort of thinking like, you know, there's a bunch of stuff missing in scikit-learn. Whenever I go to a different client, there's like this one component that I always end up reusing, because scikit-learn didn't have pandas support back in the day. So I had a couple of components that did the pandas thing, and I had a couple of components that did the group-by thing. And there were a bunch of these tricks that I had in my pocket, but I kind of felt that contributing those upstream to scikit-learn was way harder than coming up with a little package that just shared the idea.
But then it turned out people started running it in production for realsies. And like scikit-lego was kind of a cool name, so it stuck in people's memory. And then I did a conference talk about it, it became more and more popular, we got contributors, etc. That was kind of the thing that happened by accident, but I was curious. I have tons of open source projects that went nowhere, just to be clear. But this was one that just happened to stick around.
And then you become known as the guy who did a good talk on scikit-lego and do more things, do more things. But just like with the keyboards, just like with the YouTube, just like with the career path, it is really just, I show up, I do things, I blog about it, make sure it's logged, and people remember me. That's the trick. Nothing planned, it just happens if you expose yourself and if you try to be a bit memorable about it.
Landing at Explosion and the spaCy pun
That might need an explainer too. So what happened there? I was at this consultancy that shall not be named because I'm still recovering. But I was attending, on behalf of that consultancy, this conference in Berlin. It was called spaCy in Real Life. I kind of forget. But I went there because I thought, you know, I want to learn more about these text embeddings. Something's happening there, and I don't understand it. And that's how I met Matt and Ines, the founders of Explosion, co-creators of spaCy.
And then I was just at a bar and, you know, the bar was packed. We were all like standing like this, side by side, shoulder to shoulder. And we went to a new bar. And I went up to Ines and I said, thank God, this bar is way more spaCy. It's like a really, really silly joke. Like really, really silly joke. But they remember me because of that. And then, you know, the day after we did some presentations and then Matt and Ines looked at me and said, oh, I think that guy could YouTube well, because they were looking for a YouTuber to maybe explain spaCy. And they thought, well, this Dutch guy seems funny.
And I told them, like, look, sounds cool. I've never done it before. You're going to have to coach me a little bit, but sure. And then those videos became pretty popular. And that's kind of how I got my first job at Rasa, which was this chatbot startup like from way before we did like LLMs, basically. And I turned out to be pretty good on the YouTube thing and explainy thing and made some packages and open source things there that kind of took off.
And then at some point Matt and Ines said, hey, we have funding now. Maybe it could be cool to join us. So I joined them and then did cool things there. And they ran out of funding. So then I was looking for something else to do. And then it turned out that scikit-lego was one of the most popular scikit-learn plugins by then. And then the scikit-learn people who started this company called Probabl said, hey, Vincent, could you join us here maybe?
And then there I started doing a podcast for promotional reasons. And then it turned out that there was this project called Marimo, which was pretty interesting. I interviewed the founder because we had a podcast back at Probabl that I was hosting. And then it turned out that the founder was from the same town that I lived in. So I lived in California when I was 12. And he's also from Cupertino. He lived like two blocks away from me.
So we hit it off. And then he came to the conclusion that, oh, we also kind of need DevRel at some point when we have funding. And that's again how I kind of switched. It's just more coincidental than you might think. But it is conscious coincidental in the sense of, oh, that sounds interesting. I'm consciously going to move in that direction. But that's how this all happens.
Side activities as a career superpower
I do. One bit of feedback that I have gotten quite a few times is it is useful if you can leave a memorable impression, because then people remember you. There's a lot of people on the planet. And a lot of people are easy to forget. But if for some weird reason, you're kind of funny, then people remember you.
Yeah, I think I gave my somewhat famous Pokemon lightning talk at that event. Anyway, it doesn't matter. But, but yeah, there is something to it. And like one other thing that I do think, and this is like a hindsight explainer, like you look at your life, and then somebody comes to a conclusion. One thing that in hindsight turned out to be like the best thing ever that I did back in college, this is a really, really random thing. But I was a bartender at a comedy theater in Amsterdam, like one of the like the oldest ones, like a really, really fancy one.
And all during college, during like, college season, like outside of the summer holiday, I was just serving beer to like the most famous Dutch comedians, basically. And in the summer, I would be a Dutch tour guide. So like people come to Amsterdam, you're a tour guide on the bike. And I was kind of thinking like, gee, it's actually kind of perfect that I can like practice my jokes that I kind of peek from the comedians, because then during summer, I just have a new group of tourists every day that I can practice on basically. And that was also the way for me to make it fun to do that gig. So I ended up doing that for like four years, basically.
And then you fast forward and like, you're the guy who's like really comfortable doing conference talks. And you do kind of go like, okay, that's actually a skill you learned back from your college days when you had that job, basically on the side. It's like one of these little things. I also was like really into rowing. I did like semi professional rowing for a good year, like little things like that do stack up if you just have a wide variety of like skills and things you've done. And that's something I do think some people don't do. And there's a mistake in that where I'm just going to double down on tech. And that's going to be the path forward. I mean, it could work. But you're probably going to be more interesting if you do more side activities. At least for me, I've noticed that, hey, yeah, in hindsight, that is actually kind of the killer feature of my career at this point.
I think the tour groups too is interesting to hear where you just have the like repeated chance to like, it's, it's a lot like the format that I guess comics use where they're like hitting open mics. Well, I mean, like another aspect of the tourism thing is Amsterdam has a pretty, pretty okay rich history, I suppose. But the main thing you practice with that, by the way, it's not so much jokes, it's storytelling. That's, that's the key thing. Cause you got to tell the story such that the tourists don't get bored effectively because then you don't get tips.
Backpacking and committing to a career
When I, when I asked how you wound up in your current role, you said you were backpacking through Latin America. Oh, right. Yeah. The story starts a bit more.
So let's go back even further. So you got to imagine, my degree is in econometrics and operations research. So very much applied math, but very little coding. Like they didn't really teach me any coding whatsoever. Like they gave you a Java course and like a course in OxMetrics and like some other language no one writes in. But I had to teach myself Python and R and JavaScript and all of those things. But I wasn't sure if that was going to be my career, because you're normally supposed to be like an actuary or something like that, like good money and all that. But, oh well, I don't know, like data science might become a thing.
It was like around the time that was kind of new, but I had to figure out if coding was going to be a thing that I would enjoy doing. And then I figured, you know, I do also want to go backpacking, but what if I take some work as an independent contractor with me as I go to Buenos Aires and through Latin America up to Peru and all those nice places? If at some point it hits me that I feel more like programming than going clubbing every evening, then that'll be a signal, such that I can say like, okay, Vincent, now you got to commit to this career path a bit.
And I just noticed, as much as I liked going out in Buenos Aires and all those lovely places, and as amazing as the cocktails in Lima in Peru really are, that at some point I just felt like coding a little bit more than going out every evening. So that's where it started, with that realization. Because then you kind of know, like, okay, I don't know where I'm going to end up, but I do at least have an experience that really confirms a direction.
Discovering Marimo
So, the way this all started, you got to imagine I was back at Probabl, the company where a good chunk of the scikit-learn core team is, and I was doing DevRel stuff. And like, one of the things you can then do is you can say, let's look at all these adjacent technologies and let's sort of try them out and like do it in a live stream. People find that interesting. So, you know, Marimo was this new notebook thing. I was just kind of curious.
And I tried it out on a live stream. Then afterwards, I also did the podcast, got the founder on. And then I just played this game of like, okay, do I find myself going back to Jupyter or do I find myself going to Marimo? Because, you know, it's kind of up in the air. It's a new technology. It's kind of scary to invest in it, right? And then I just kind of noticed that Marimo started to correct me.
Because the thing that Marimo does that Jupyter doesn't is Marimo basically says you can only define a variable once. There can be one cell where you define the variable, and there's none of that stuff in Jupyter where you can say a is equal to one in one cell and then a is equal to two in another cell. And depending on the order in which you run the cells, the state in memory changes. Marimo doesn't allow for any of that. So in Marimo, it's all very much reactive. You can define a variable somewhere. You can refer to it from other cells. But the moment you define a variable twice, oh, we're going to throw errors in your face because probably something is going wrong.
And I just noticed during a live stream, something totally went wrong in Jupyter that Marimo would have instantly fixed if I would just have used Marimo. And that was the point in time where I kind of feel like, okay, it's got that one feature where it's automatically correcting me. Oh, and it also has, because it's reactive, like you change a variable in one cell, all the other cells that depend on that variable automatically update. That is amazing if you want to do stuff with like widgets and really get the browser into Python, that's also like a thing you can really do. Like, oh, hang on. I am rethinking the way I want to work now. There's something here. I should chase this.
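A rough sketch of the reactive idea described here, using only the Python standard library. This is not Marimo's implementation; the cell names, the `cells` dictionary, and the helper functions are hypothetical and only illustrate how "change a variable in one cell, dependent cells update" can be derived from the code itself.

```python
import ast


def defs_and_refs(source):
    """Return (names assigned, names read but not assigned) for one cell."""
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                refs.add(node.id)
    return defs, refs - defs


def cells_to_rerun(cells, changed_cell):
    """Cells that depend, directly or transitively, on the changed cell."""
    info = {name: defs_and_refs(src) for name, src in cells.items()}
    stale, frontier = [], [changed_cell]
    while frontier:
        current_defs = info[frontier.pop()][0]
        for name, (_, refs) in info.items():
            if name != changed_cell and name not in stale and refs & current_defs:
                stale.append(name)
                frontier.append(name)
    return sorted(stale)


# Hypothetical notebook: each entry is one cell's source code.
cells = {
    "c1": "a = 1",
    "c2": "b = a + 1",
    "c3": "c = b * 2",
    "c4": "d = 10",
}
print(cells_to_rerun(cells, "c1"))
```

Editing `c1` marks `c2` and `c3` as stale while `c4` is left alone, which is the behavior that makes widgets and sliders feel live in a reactive notebook.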
Marimo's growth and AI integration
Yeah. I think we're around 20K or something, but if we just look at downloads, like that's the number that I typically look at, we grow about 5% a week.
That's pretty, that's pretty intense compounding. That is, it does get intense. Yes. I wish my bank account grew at 5% a week.
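To put a number on that compounding, here is a back-of-the-envelope calculation. It assumes the 5% weekly rate stays constant for a full 52 weeks, which is an assumption for illustration and not something claimed in the conversation.

```python
weekly_growth = 0.05   # 5% per week, as quoted above
weeks_per_year = 52    # assumption: the rate holds for a whole year

# Compound growth: multiply by (1 + rate) once per week.
yearly_factor = (1 + weekly_growth) ** weeks_per_year
print(f"{yearly_factor:.1f}x per year")
```

Under that assumption the weekly figure compounds to roughly a 12 to 13 times increase over a year.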
So one thing, I think the earliest AI feature that we had, and I could be wrong because these were features that were kind of already there when I joined. But we had GitHub autocomplete because that used to be free. There used to be an API for it. So that was a thing that was sort of added in the front end quite early. We also had, I think right around the time when I joined, this thing in the sidebar where you could bring your own LLM API key. And then we could do this thing where we contextualize everything that's in the notebook right now, have that be part of the prompt. And then we have this thing on the side with a copy button that lets you copy code into cells.
One thing that happened was that Dylan, one of the engineers in our team, he just recognized that, hey, maybe it's not just the LLM so much. When Claude Code happened, maybe it's the harness that matters most. So we ended up making a linter just for Marimo. And it was really not meant for humans. It was meant for the LLM. So the thing that we mentioned earlier in Marimo, you cannot have the same variable declared in two cells. That is one of the things that this linter will pick up, and it will do so in a way such that the text that comes out makes it very clear for the LLM how to fix it.
The other lucky break that we also kind of got is there's this pretty popular IDE called Zed. I don't know if you've heard about it, but they invested a lot in this ACP. I think it's called a protocol, but it's basically a way such that if you're an agent like Claude Code or whatnot, you have a kind of protocol that other apps can plug into. So Zed kind of started with that. Just to round it out, ACP is Agent Client Protocol, for anyone wondering.
So if you're, let's say a text editor, you don't have to do any nasty terminal, read the text that comes out, buffer it back in. It's like an actual protocol that Claude can understand and you can interface with it. And that's something we could basically just piggyback off of. And that thing works very well. So one thing that you can do with an ACP like that, and that's a trick that we like to use in Marimo, is let's say that you have a Polars pipeline or a Pandas pipeline, but something where you start with a data frame and then you do like a step. Like we have logs from a server, we add sessions, and then we filter out the bots. And the thing that comes out of that, we want to write some more Polars or SQL for. It would be amazing if the LLM could just get the schema from that data frame, instead of having to try to figure out what the columns are after six of these pipeline steps, so to say.
So what you can do in Marimo is you can type "@" when talking to the LLM and then pass in a variable, and then we can add context for that variable that's in memory right now. And we can have different contexts if it's an integer, and we can have different contexts if it's a data frame or a SQL table. And that also helps the LLM immensely when it comes to writing proper pipelines.
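As an illustration of what an LLM-oriented lint for the one-definition rule might look like, here is a minimal standard-library sketch. It is not the actual Marimo linter; the function names, the cell names, and the wording of the message are all hypothetical.

```python
import ast
from collections import defaultdict


def find_redefinitions(cells):
    """Map each variable assigned in more than one cell to those cells."""
    assigned_in = defaultdict(list)
    for cell_name, source in cells.items():
        names = {
            node.id
            for node in ast.walk(ast.parse(source))
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
        }
        for name in sorted(names):
            assigned_in[name].append(cell_name)
    return {name: where for name, where in assigned_in.items() if len(where) > 1}


def lint_messages(cells):
    """Phrase each finding so that the fix is spelled out for the reader (or LLM)."""
    return [
        f"Variable '{name}' is defined in cells {', '.join(where)}. "
        f"Keep the definition in {where[0]} and rename or remove the others."
        for name, where in sorted(find_redefinitions(cells).items())
    ]


# Hypothetical notebook with the same variable assigned in two cells.
cells = {"cell_1": "a = 1", "cell_2": "a = 2", "cell_3": "b = a + 1"}
for message in lint_messages(cells):
    print(message)
```

The point of the design is in the message text: it names the variable, lists the offending cells, and states a concrete next step, which is the kind of output an agent can act on without guessing.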
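A small sketch of the idea of type-dependent context for an in-memory variable. This is not Marimo's API: `describe_for_prompt` is a hypothetical helper, and a list of row dictionaries with made-up values stands in for a real data frame so the example runs with the standard library only.

```python
def describe_for_prompt(name, value):
    """Build a short, type-dependent description of a variable for an LLM prompt."""
    # Integers: the value itself is the useful context (bool is excluded on purpose).
    if isinstance(value, int) and not isinstance(value, bool):
        return f"{name}: int with value {value}"
    # Stand-in for a data frame: a non-empty list of row dicts.
    if isinstance(value, list) and value and all(isinstance(r, dict) for r in value):
        columns = ", ".join(
            f"{col} ({type(val).__name__})" for col, val in value[0].items()
        )
        return f"{name}: table with {len(value)} rows and columns {columns}"
    # Fallback: only report the type.
    return f"{name}: {type(value).__name__}"


# Made-up rows in the spirit of "server logs with sessions and a bot flag".
sessions = [
    {"user_id": "u1", "session_id": 1, "is_bot": False},
    {"user_id": "u2", "session_id": 2, "is_bot": True},
]
print(describe_for_prompt("sessions", sessions))
print(describe_for_prompt("n_rows", 2))
```

With a schema line like this in the prompt, the model does not have to infer column names from six pipeline steps of code.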
Widgets as Lego bricks
So on the topic of like widgets and stuff that you could do in Marimo, I do want to have like one plug for my colleague, Trevor. There's a standard called AnyWidget. I don't know if you've heard about it or seen it. It's getting more ubiquitous. But there is now a standard such that if you build a widget according to this one spec, it'll work in Jupyter, it'll work in Colab, VS Code or Marimo, like all the places basically. And the way like people are sitting on this thing and like sleeping on it, people should definitely invest way more in it.
But the thing that I really love about those widgets in particular is that you can kind of start imagining that we now can create these magic Lego bricks, as it were, such that you can actually attach browser functionality to Python, which is something that used to be super hard. I can now honestly make notebooks that let me use a gamepad to run some Python scripts. And for annotation reasons, for evals and stuff, there's good reasons why that makes a lot of sense. But also webcams and also all sorts of other APIs that the browser has, you can just attach that to Python now and you've got access to JavaScript and all the good things.
One thing that I do try to remind people of is, it's not so much, yeah, we've got Marimo and that's great, but it's like the entire ecosystem is way more modern than it was 10 years ago. And that means that you should take a step back and just reflect on, hey, what's new? Because you can rethink things definitely at this point. It's not just LLMs, it's also the entire ecosystem has been made more modern, effectively.
Yeah, it's a good point too. If people are sleeping on AnyWidget, maybe worth noting too, that people like Plotly are using it. Because I understand the adoption's pretty high across the ecosystem.
Altair also uses it now, I think. I've made this thing called Draw Data, where you get a 2D canvas and you can just draw a dataset that you can then use, like draw and get something into Polars or Pandas. So the sky really is the limit here. The most extreme thing that I've ever made, this is a really, really nerdy, nerdy thing, but for the longest time of my life, I always kind of understood differential equations, but nothing really clicked. And then I saw this one YouTube video about this differential equation called Lanchester's Law, which is about when two Age of Empires armies just smash into each other.
Apparently there's a differential equation such that you can predict it: say the blue army is slightly bigger than the red army, can we calculate how many blue army soldiers survive? That is kind of what the differential equation answers. And I thought, ah, it's so cool. I wonder if I could vibe code an AnyWidget that just simulates all those battles with JavaScript, because you can do sort of collision detection and stuff. And you totally can. So I built this collision detection battle simulator in JavaScript with Claude, and that generates data for the Python notebook to like check if the differential equation actually holds. And it does. And again, part of it is the LLM. Yeah, sure. But like part of it is definitely the widget. And you just couldn't imagine this two years ago, right?
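For readers who want to see the equation itself, here is a minimal sketch of Lanchester's square law in Python. It integrates the differential equations directly rather than reproducing the collision-detection simulator described above; the army sizes, the equal fighting power, and the step size `dt` are assumptions chosen for illustration.

```python
import math


def simulate_battle(blue, red, blue_power=1.0, red_power=1.0, dt=1e-3):
    """Euler steps for dBlue/dt = -red_power * Red, dRed/dt = -blue_power * Blue."""
    while blue > 0 and red > 0:
        blue, red = blue - red_power * red * dt, red - blue_power * blue * dt
    return max(blue, 0.0), max(red, 0.0)


def predicted_survivors(blue, red, blue_power=1.0, red_power=1.0):
    """Square law: blue_power * Blue^2 - red_power * Red^2 stays constant."""
    margin = blue_power * blue**2 - red_power * red**2
    if margin >= 0:
        return math.sqrt(margin / blue_power)   # blue wins
    return math.sqrt(-margin / red_power)       # red wins


# Hypothetical armies: blue is slightly bigger than red, equal fighting power.
blue_left, red_left = simulate_battle(100, 80)
print(round(blue_left, 1), red_left, predicted_survivors(100, 80))
```

With 100 blue soldiers against 80 red ones the square law predicts 60 blue survivors, and the numerical simulation lands very close to that figure, with a small discrepancy coming from the finite step size.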
So I kind of imagined this question might pop up. So, and I think I've got like the perfect analogy. So I've got a kid now, right? And I want to have a creative toy for my kid. So I could do two things. One thing I could do is I could buy a 3D printer and then you can 3D print all sorts of interesting things. Or I could buy the kid Legos.
One of the crucial differences between the two is that if it turns out that the toy that I 3D print isn't exactly what my kid likes, the only thing to do is to start from scratch and 3D print a whole new thing from the top, basically. And if you think about Legos, the kind of magic thing is that you can always just click it in and click it out. It's something that's really reusable in a lot of different ways. And if you have a very useful widget, then there's more than one way to use it.
So, okay. What kind of things will be widgets then? Well, it'd be kind of like a map, like a Google Maps kind of thing inside of a Python notebook that can do GeoJSON stuff maybe, or a drawing data thing that can have lots of educational use cases for it. But the nice thing about a widget is that it's reusable in different ways and that it can click with Python and all the Python data ecosystem tools. That's kind of the way that I think about it. And if I think about vibe coding in particular, vibe coding to me is a lot more like the 3D printer where you're going to 3D print the thing kind of from scratch for like one specific purpose. But once you've got something done, it's not necessarily the case that you can retrofit it to immediately attach into a Python notebook as well. So widgets are like Legos. That's really the way that I like to think about it.
The human in the loop: data science and AI
To me, it's when the human's involved a little bit more. So like if you have a, oh, people have to use the app and like the innards don't really matter. You know, the React app thing sounds a bit more plausible. But when dealing with data, I mean, usually the whole problem with data science is that you want to understand the story that's in the data set. And if you understand it, then you can make better decisions. That's kind of the plan that we have. But in order for you to understand the data, the human has to be involved somehow. Like you have to be able to read a chart or like inspect or whatnot. So the whole point of these widgets is to make that part easier.
One of the, I mean, one of the big questions on my mind these days, and I'm obviously interested in what you think about it, is that in a world where people essentially stop writing code and they're just prompting, like what do you think, just looking at the current stack of tools, like the widgets and, you know, the notebook. And I mean, I think it's good that Marimo was built during, in this era already. So like, I assume that you've all been thinking hard about this, but like, what do you think the average user workflow looks like in two years or three years where like essentially nobody's writing any Python code anymore.
So like, part of it is the psychology of it, I feel like, a little bit, right? Like, oh, we don't know. It feels kind of scary. And the whole thing with Claude, at least to me, is sometimes there's a wave of like, oh my God, it can do so much. But then there's a phase where I see it fail horribly and I kind of feel grounded again. So that psychology of it is happening in the back of my mind.
I just want to be honest about it. I don't exactly know what's going to happen, but if I think about my days in consulting, I'm like, what's the thing that really went wrong most of the time? It's that people just didn't know what was in the data and they were making all sorts of weird, like bad decisions because of it. And I don't know if the LLM is going to do everything and if it's going to be more or less bad decisions, but something in my mind is basically saying like, the magic of the notebook is it's also a nice way to debug.
There is something about, let's make a chart that should be a summary. And then just staring at the chart for five minutes can immediately give you this, oh, hang on, that should not happen. Why is that line going down in December? It should be going up. So it's a debugging thing. And in that sense, you could argue all the code is not necessarily the most important thing. The most important thing is the understanding. The notebook and the code are a tool in order to get to the data understanding bits, so to say.
The ChickWeight dataset and why chickens matter
The best example I have of this is in R, my favorite dataset, the ChickWeight dataset. So one column is the diet. The other column is the time. The other column is the chicken. And the other column is the weight of the chicken. So over time, you see all these chickens gaining weight. That's the whole idea. And you can train machine learning models on it. You can predict, given this diet, given this time, how fat is the chicken or how not fat is the chicken.
But there's one thing wrong if you're going to do modeling that way. And that is the fact that some chickens actually die prematurely. And the only way that the model can be made aware of that is if you, the human, are aware of that. Because if you're going to calculate some sort of regression average on timestamp 10, how is the model going to be aware of the fact that some chickens died at timestamp five for that specific diet? Well, we could have an LLM automate this. But this is my main example of things will go wrong if you don't understand what's in the dataset.
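A tiny sketch of the check that staring at the chart would prompt. The records below are made-up toy values in the shape of chick, time, and weight; they are not the real ChickWeight data, and the helper names are hypothetical.

```python
from collections import defaultdict

# Made-up toy records (chick, time, weight); NOT the real ChickWeight values.
records = [
    ("chick_a", 0, 40), ("chick_a", 5, 60), ("chick_a", 10, 90),
    ("chick_b", 0, 41), ("chick_b", 5, 50),
    ("chick_c", 0, 39), ("chick_c", 5, 65), ("chick_c", 10, 100),
]


def dropped_out(records):
    """Chicks whose last measurement comes before the final time point."""
    last_seen = defaultdict(int)
    for chick, time, _ in records:
        last_seen[chick] = max(last_seen[chick], time)
    final_time = max(last_seen.values())
    return sorted(c for c, t in last_seen.items() if t < final_time)


def mean_weight_at(records, time):
    """Naive average over whichever chicks still have a row at this time."""
    weights = [w for _, t, w in records if t == time]
    return sum(weights) / len(weights)


print(dropped_out(records))
print(mean_weight_at(records, 10))
```

In this toy data the slowest-growing chick disappears after time 5, so the average at time 10 is computed over survivors only. A model fitted on those averages never sees that, unless the human notices the line that stops.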
It's one of Hadley's quotes that I remember. The cool thing about visualization is it doesn't scale, but it can still surprise you. It's that surprising nature. It's basically this debugging tool, but for numeric stuff. That's the way that I experienced it. And even if the code is going to be maybe less valuable, the debugging numeric aspect of it definitely won't become less important.
So one thing I do a lot now is I try to just reproduce academic articles with vibe coding. Here's an article, here's an LLM. I want to understand this article as quickly as possible. And then the notebook again becomes this artifact where I can look at charts and I can try to understand if I understand the technique of the paper, so to say. That's, in my mind, the essence of what the notebook is all about. It's not so much about the code per se. It's more this artifact that helps me think. Or reason, I guess, is the better word.
Vibe-coded science and confirmation bias
Yeah, I'm definitely pretty AI-pilled these days. I feel definitely this sense, especially for the data science ecosystem, I feel like data science is one of the domains where the most judgment and nuance is present. It isn't like building a to-do list app. Doing data science is not the same as building a CRUD app. And so the choice of models and the decisions involved, there is a certain art and taste involved with choosing techniques and building a scientific process that tries to eliminate your predispositions or biases about what you think the analysis should, what results the analysis should have.
And what I've been hearing from people who are doing vibe-coded science is that essentially the models are kind of tuning into what you're trying to prove or what you're trying to demonstrate. And then they're subtly building data science that essentially tries to prove or fit itself to the assertion that you're trying to make, which is essentially amplifying your personal bias. I feel like this is exactly what we don't want. We don't want the LLMs essentially doing sycophantic science. Like, oh, what are you trying to prove here? Let me come up with an analysis that confirms your...
you know, it's like a more intense version of confirmation bias, which is a classic problem in statistics and science in general. So, I don't know, like it's in, yeah, in this new world where people don't want to, like, don't want to write code anymore.
You could argue though that, I'm assuming in your case, you do, I mean, you've written pandas and a bunch of things, so you can tell yourself, like, okay, I've got a taste, right? Like, there's, I need this to happen in a certain way, otherwise it's BS. Like, you can make claims like that quite comfortably, I think, right?
I mean, in my mind, there's a couple of very human habits that I tend to rely on. But at least the thing that I taught myself, the ChickWeight example again, right? If I've learned anything from the ChickWeight dataset, it is that if I make a chart, I need to stop for five minutes and look at the damn thing. Cause otherwise you're not going to notice the line that stops in the middle where a chicken might've died. Like that's kind of the lesson.
And that also means that if you're vibe coding, okay, one chart at a time, I need to stare at the chart for a couple of minutes before I move on. I'm not going to, if I do serious work, generate entire notebooks from scratch, because then you're not going to be part of the story of what's happening there. And I don't know, like some of this does feel a little bit more like an added human psychological attitude thing than necessarily an "LLMs can correct each other" thing. Cause again, it's all about me trying to understand the story in the dataset. And if I remove myself from that equation, I'm going to be in a world of hurt.
I need to be involved somehow. And maybe the solution then at least the most plausible one to me is let's only generate three cells at a time, not, not any faster. And when you reach a milestone, make sure you understand everything that's been happening and then you move on.
The Gorilla dataset and staring at charts
Have you seen Bluff Bench, Vincent? That's something that Sarah and Simon put together where like, it turns out that if you ask an LLM to like read a plot, it often just like reads the axis labels effectively, and then tells you a story about what it believes based on the correlation of those two variables. So I think there's also like a lot of like this visual intelligence that, you know, LLMs have yet to learn.
Have you heard of the gorilla data set? Oh, it's the best. This is the best. Okay. So there's a, I think it was in Italy. There's a statistics class. They split up the group in half and like good students, they're equally represented in both halves. One group gets a data set and says, here's a couple of hypotheses we want you to check. The data set has body mass index, number of steps, and I believe male or female. And they want to say things like, okay, do men have more steps? And like a couple of hypotheses you got to check. The other group basically just got the data set and they kind of went, do something with it. What's the story in this data set?
If you were to plot the actual data set, body mass index on the X axis and steps on the other axis, you would get blue and red colors for male and female. And it'd be a picture of a gorilla waving back at you. Now, one of these two groups was more likely to discover this than the other one. Can you guess which one? The group that was preoccupied with the hypothesis checks was the one that didn't bother making the plot.
So I thought, as a cute exercise, it was like a year, a year and a half ago, when we got these ChatGPT analysis bot things. I figured, hey, let's give the data set to this bot, see what it makes of it. It completely failed. I made a YouTube video of it. It was just super interesting. They're really bad at charts. But again, the thing is, I think humans are also pretty bad at charts if they don't take five minutes. If you just glance at it and call it a day, there's a lot of stuff you can miss. But there's usually something like, hey, why does the line go down there? That should never happen.
Learning, cognitive debt, and the weirdest trick in data science
To some extent, right? But to what extent is this really a new thing? Because I remember back in the day, what a lot of these data science people would do is they would just hit shift-enter in a Jupyter notebook, run the whole thing. They would call fit-predict a few times, then they would say, look, number go up. Right? So to some extent, it's also not a really, really new thing that some people are, quote-unquote, a little bit lazy, and sort of taking a step back and saying, oh, it's great, everything's automated.
I like to think, though, that one of the reasons why my career has progressed is because at some point I took the effort to go just a little bit deeper, such that I kind of knew more what was happening and could apply more tricks in practice. And sure, some of that was syntax, but some of that was also just, okay, what am I actually doing? Am I solving the right problem? And like that kind of a mental exercise. And again, like, even if there's LLMs around, that mental exercise is still part of me. Like, I think there still needs to be someone that understands the data set.
One thing you said earlier, and I think you mentioned like one soft skill underrated in data science is staring at a chart for five minutes. For five minutes, no less. Yes. Five minutes. Yeah. Maybe that's a skill that as AI rolls out, will be really hard to cultivate, like really important, especially important to cultivate.
Weirdest trick that works in data science. Back when I was at Probabl, I figured out a way to get logistic regression to beat XGBoost. The way that I did that is, you've got to imagine there's a data set with cars: the year it was built, color, brand, that kind of data set. And then price. I want to predict the price of a second-hand car.
So okay. What you could do is say, let's do one-hot encoding. So for car brands it's a one and lots of zeros, and for colors it's lots of zeros and a one somewhere, one-hot encoded. And then you could try to compute a distance between two different encodings of these cars. But the problem is, how are you going to compare two colors? That distance is always going to be one, basically, because it's a different color.
How are you going to do a similarity lookup? Okay. What you could do is have a regression model. So you take your encoding, and then you train towards the price variable. Oh, and that means you have a coefficient for every single variable, which includes the one for a specific car brand, let's say. Oh, actually, that can be an embedding technique now. You can take your linear model, take those coefficients, and multiply them by the original array. Oh, and if you then want to do a k-nearest-neighbor kind of thing, you now have a system that gets a distribution of prices based on similar cars as well. And similarity now is not just how similar the properties are, but also how that relates to the price. And it turned out that if you build a system that way, you actually got better predictions than XGBoost. And you also got an uncertainty bound around that as well.
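For readers who want to try the idea, here is a minimal sketch of one way to read the trick described above: one-hot encode the cars, fit a linear model on price, scale each encoded column by its coefficient, and then look up nearest neighbors in that scaled space. This is not Vincent's actual code. The toy data, the ridge penalty, and the helper names (`encode`, `fit_linear`, `neighbour_prices`) are all assumptions made up for illustration, and it uses a plain ridge-regularised least-squares fit with NumPy rather than any particular library's estimator.

```python
import numpy as np

# Hypothetical toy data, invented for illustration: (year, brand, color, price).
CARS = [
    (2012, "fiat", "red", 4500),
    (2014, "fiat", "blue", 5600),
    (2016, "fiat", "grey", 7000),
    (2013, "volvo", "grey", 11000),
    (2015, "volvo", "blue", 13500),
    (2017, "volvo", "red", 16000),
    (2014, "bmw", "grey", 17000),
    (2016, "bmw", "red", 20500),
    (2018, "bmw", "blue", 25000),
]
BRANDS = ["fiat", "volvo", "bmw"]
COLORS = ["red", "blue", "grey"]
YEAR_REF = 2015  # subtract a reference year so the year column is centred near zero


def encode(year, brand, color):
    """Year offset followed by one-hot brand and one-hot color."""
    return np.array(
        [year - YEAR_REF]
        + [1.0 if brand == b else 0.0 for b in BRANDS]
        + [1.0 if color == c else 0.0 for c in COLORS]
    )


def fit_linear(X, y, ridge=1e-3):
    """Ridge-regularised least squares on centred prices; returns (coefs, intercept)."""
    intercept = y.mean()
    A = X.T @ X + ridge * np.eye(X.shape[1])  # small penalty keeps A invertible
    coefs = np.linalg.solve(A, X.T @ (y - intercept))
    return coefs, intercept


def neighbour_prices(query, X, y, coefs, k=3):
    """Prices of the k training cars closest to `query` in coefficient-weighted space."""
    embedded = X * coefs  # scale every column by its price coefficient
    q = query * coefs
    dist = np.linalg.norm(embedded - q, axis=1)
    nearest = np.argsort(dist)[:k]
    return y[nearest]


X = np.array([encode(yr, b, c) for yr, b, c, _ in CARS])
y = np.array([p for *_, p in CARS], dtype=float)
coefs, intercept = fit_linear(X, y)

# The neighbours' prices act as a small empirical distribution for the query car.
prices = neighbour_prices(encode(2016, "volvo", "grey"), X, y, coefs, k=3)
print("neighbour prices:", sorted(prices))
print("point estimate:", prices.mean(), "range:", prices.min(), "-", prices.max())
```

The point of the scaling step is that a column with a large price coefficient, say a brand, now pushes cars far apart, while a column that barely moves the price, say a color, hardly changes the distance. The spread of the neighbors' prices is what gives a rough uncertainty bound.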
Okay. I like this freedom. This is something that no textbook will tell you. But if you start thinking about the problem a bit more and more, then you can come up with these super duper creative solutions. And that's also kind of feedback I do want to give to people, because people have always been asking me, like, Vincent, what book do you read to come up with these techniques? And I think the whole point is that I don't. I just take a step back. I take pen and paper. I have a coffee. And I think about the problem, far away from social media and distractions. And I give myself permission to let my mind free.
And again, the problem though is, if you want to then have LLMs help you, and LLMs are super powerful, it is very tempting to be intellectually lazy. And that's when you should kind of resist. And maybe, this was a tweet from Andrej Karpathy, I think a couple of years ago: maybe if learning feels a bit painful, maybe that's a good thing. Because it means that you're doing something that's uncomfortable, and you're doing something that's a little bit unknown. And if you look back at your week and you kind of go, like, this entire week was easy peasy, maybe you should start getting worried.
Pure math, operations research, and the value of struggle
I did pure math as an undergrad. And so I spent a lot of time doing, you know, analysis and topology and abstract algebra and things like that. Certainly I don't apply any of those skills now. Like, was it useful to learn Galois theory? I did, but I've forgotten all of it. But I do think that mental stretching, racking your brain really hard to think about difficult theoretical problems, and being able to reason about these abstract structures in your mind...
So I did operations research, so definitely a different field of math, but the thing with operations research is you've got to prove that your mathematical allocation is optimal. If it's not optimal, or you can't prove it, it's wrong, basically. It's a thing that the professor drilled really hard into me. And something about that is a critical attitude. Like, I'm not going to believe anything anyone says, I want to see the full proof. And maybe it's not so much the theory, but more the attitude that comes with the science, I like to think, that also makes you good at debugging.
This kind of reminds me of the paper, or at least the title of the paper. It's called Better To Be Frustrated Than To Be Bored, about affective states students might have, and which ones are better for learning. And I do think what you described is, you know, writing it out isn't necessarily boredom, but frustration, or kind of responding to something and wanting to prove something or lock it down. That does strike me as kind of a key.
I have a kid now, so I can tell you this story. My theory right now is kids learn to speak because they're frustrated, because they have an idea in their mind they can't tell their parents. And there's this god-awful phase with a lot of screaming in the household, when the kid is making it absolutely clear that something is highly amiss. But eventually the kid learns, oh, I've got to learn the language, and that's how I can properly communicate with my parents. There's a little frustration thing there that I do think is key.
Hadley on automation, joy, and musical instruments
I think the one thing that's really interesting is many of us now have the opportunity to automate parts of our job that we love doing. You can automate that, and you can find joy in different parts of your job, but you also don't necessarily have to. You still can keep doing those bits. Maybe not all the time, but tactically. You don't have to just optimize for velocity all the time.
But then the other thing I think about is why does anyone learn to play a musical instrument? You can find someone on YouTube or Spotify or Apple Music doing a way better job of playing any music than you. There's also some of these skills we gain joy from, regardless of whether we're the best in the world or not. And I think some of that is, how do we teach this stuff? Coming back to when you do math in high school, most of that you're never going to use. But you're training your brain in this cool problem-solving way that you can get a lot of enjoyment from, independent of the strict utility.
And again, if I can come back to a notebook. One thing I really like about notebooks, it is a nice place where you can put your thoughts as well. There's something about a notebook that is able to do that way more than a code base. As an artifact, so to say. It definitely does feel a little bit more like a thing you would write down in your own little diary, as opposed to a theoretical book in the closet somewhere. It can still be very personal.
Vibe coding defensively and the bread machine example
I will say, with LLMs though, this is a habit. Even if I generate notebooks from scratch and I try to go super big and stuff, you can still vibe code defensively, if that makes sense, in a notebook. I have a video on this on the Marimo channel, if people are interested. But one thing I really love to do now is, if I make charts, the use case here is, I wanted to find out if buying a bread machine was going to save the family money. It's a really silly use case.
But you can imagine there's this one line going up that says, how much money do I spend going to the bakery? And there's this other line going up that's a little bit more flat that says, I only pay for the ingredients and I have this upfront cost of the bread maker. And those two lines intersect somewhere. Except I'm a Bayesian, so it's not a break-even point. It's a break-even distribution because both lines, they wiggle around a bit. There's a bit of uncertainty. We got to take that into account as well, right?
But one thing I explicitly told Claude to do, and it did it wrong a few times, and I kept hammering on it: those two charts need to have the same x-axis and they have to be on top of each other. Because otherwise, I cannot glance at both distributions at the same time. And when Claude was done, I noticed that it was way off because the distribution at the bottom did not fit the chart on top. You can just sort of draw straight lines. You can see it was really wrong. But the only reason I was able to find that bug was because I was coding defensively, quote-unquote. Those two charts have to be under each other. And there have to be all sorts of sliders, so I can also wiggle around to check my intuition. But I am very conscious of the fact that I have to sort of code defensively in a notebook these days, especially if I go for more than two cells that Claude can just go ahead and rip.
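To make the break-even distribution in the bread machine example concrete, here is a small Monte Carlo sketch using only the Python standard library. It is not the model from Vincent's notebook or video. Every number in it (the machine price, the daily costs, their spread, the two-year horizon) is a made-up assumption, and `simulate_break_even` is a hypothetical helper name.

```python
import random
import statistics

random.seed(0)  # reproducible draws

# All numbers below are made-up assumptions, not figures from the episode.
MACHINE_COST = 120.0   # upfront price of the bread machine
DAYS = range(1, 731)   # shared day grid (two years) used by every curve


def simulate_break_even():
    """Draw one plausible world and return the first day the machine is cheaper."""
    bakery_per_day = random.gauss(3.0, 0.4)       # uncertain daily bakery spend
    ingredients_per_day = random.gauss(1.2, 0.3)  # uncertain daily ingredient cost
    for day in DAYS:
        bakery_total = bakery_per_day * day
        machine_total = MACHINE_COST + ingredients_per_day * day
        if machine_total <= bakery_total:
            return day
    return None  # the two lines never cross inside the horizon


draws = [simulate_break_even() for _ in range(5000)]
break_even_days = [d for d in draws if d is not None]

deciles = statistics.quantiles(break_even_days, n=10)  # nine cut points
print("share of worlds that break even:", len(break_even_days) / len(draws))
print("median break-even day:", statistics.median(break_even_days))
print("10% to 90% interval:", deciles[0], "to", deciles[-1])
```

Because the cost lines and the break-even days all live on the same day grid, a histogram of `break_even_days` can be stacked directly under the two cost lines with a shared x-axis. That is the layout that makes a mismatch between the two charts visible at a glance.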
The Mythical Agent Month and agentic supervision
I wrote a blog post called The Mythical Agent Month that's been circulating. And that came out a couple of weeks ago. And the idea is that, you know, The Mythical Man-Month is a book that came out in 1975. So over 50 years ago. And obviously, with software engineering and computers and the internet, a lot of stuff has happened in the last 50 years. But Fred Brooks made a bunch of claims, and essentially the core idea is that when you add people to a late project, it will become later. And that the larger a team is, the more communication overhead, coordination overhead you have.
And so essentially now, with agents doing the development, more or less, you have an agentic version of The Mythical Man-Month, where, you know, essentially you are the chief surgeon. And so you have to have this conceptual model of how the system works and what the agents are supposed to do. But you also have to review and supervise their work and make sure they're doing the right thing. And that they aren't creating giant messes for each other to clean up.
And so what I'm seeing is that people are creating these half-million-line, you know, million-line software projects, but they're just gigantic hairballs. At the end of the day, this could have been 50,000 lines of code and a great deal simpler, more resource efficient, a lot more useful, and without lots of unnecessary features. Because there's nobody to say no, you can add a hundred additional features. Like, every day you wake up and over your morning coffee you're like, oh, can you add these five new features that I thought of? You know, a hundred more. And the agent's never going to say no.
And so it's essentially like, now your judgment and your taste as a designer of these software systems has become the single most important skill in your tool belt. And unless you are extremely judicious about what you build and what you don't build, and how you proactively hold the agents accountable to build in accordance with your vision, and, if they create messes, that you spot the messes and you clean them up...
There's a lot of people that are just kind of YOLOing it with their agent sessions, going in and essentially asking a thousand prompts in a row, requesting one feature after another, and they're spotting bugs and asking the agents to fix them, but with little regard for the types of code bases that they're generating. And I think people who are engaging in that mode are in for a world of pain at some point in the future when they hit the inevitable scalability wall.
But it's also, I mean, if I think about scikit-lego, right, there's a thing that I made and I, and I guess, sure, I made scikit-lego. That could have been vibe coded immediately. We didn't need Vincent to write that stuff on his laptop. The reason why scikit-lego is popular is because I went on stage at a couple of events and started talking about it and started spreading the message and maybe just taking a step back. The code is not all of it. It's also like, can you do the marketing around it? It's also about, can I demonstrate to other people that I've got taste? And can I convince them that even if I'm vibe coding, you are going to get value out of it if you use my thing, as opposed to some other dude who is also vibe coding, but maybe just wants to yell at it?
Closing thoughts and widgets revisited
In my mind, and we've mentioned it before in this episode, there is this one exception that stands above everything. It's like the giant pillar, and that's widgets. You can actually comfortably say, for this one analysis, I need this one widget now. You download the Marimo skills that we've got, and you can just generate the widget on the fly. There's this one math challenge from Veritasium. I'll send you a link. But the whole math problem becomes a whole lot simpler if you can just have the D3 graph chart appear. That was done in five minutes with Claude, and it's part of the notebook, and it becomes so much easier to understand this one math problem.
I like Tectonic and Suguru. Those are like Sudoku kinds of puzzles. Oh, I have a widget now that lets me generate them on the fly and then have all sorts of... The fact that that's normal is just such a creative freedom. And I will say that was just not imaginable a few years ago, but that is something that is just... People are sleeping on this, for God's sakes. I still don't get why I'm the only guy so excited about these widgets, but there's so much intellectual freedom to gain here.
Yeah, Vincent, thanks so much for coming onto the test set. I'm also going to flag, you mentioned the Gorilla dataset. So for people listening, I think I found your article, OpenAI versus the Gorilla dataset. So definitely recommend people check it out. And I guess the only other thing for the people listening: check out anywidget, and know that there's a Marimo skills repository. You download that, and then Claude Code can generate these things on the fly for your Marimo notebook. Give that thing a spin.
And have Trevor Manz on the show. He's the guy that made it. He's a colleague of mine. I'm a big Trevor fan, so I feel like it's inevitable. Same. I think I'm on record, like, besides Trevor, I'm the guy who made the most widgets, and I might have beaten Trevor at this point. That's a sweet spot to hold, I feel like. It is. If people are interested in widgets, by the way, there's a library called wigglystuff that has about 48 widgets at this point, I think, that people can directly use.
Yeah. Nice. Awesome. Well, thanks. Yeah. Thank you so much for coming on. I know it's like late in the Netherlands, so appreciate you burning the midnight oil for us. My pleasure. It's fun to hang out with you folks. Yeah. We'll see you on the internet. Thanks for coming on. I appreciate it. Thanks for having me.