
Getting started with AI? Start here!


Everything you need to know to dive into your project

Oct 19, 2018 · 18 min read
You can’t expect to get anything useful by asking wizards to sprinkle machine learning magic on your business without some effort from you first.
Instead, the first step is for the owner — that’s you! — to form a clear vision of what you want from your dog (or ML/AI system) and how you’ll know you’ve trained it successfully.
  • Identify the use case
  • Do some reality checks
  • Craft a performance metric wisely
  • Set testing criteria to overcome human biases
Make sure the right person is tasked with the first step in machine learning and AI. More info here and here.

Figure out who’s in charge

The tasks we’re about to tackle are the responsibility of the project’s responsible adult. That’s whoever calls the shots. If a PhD researcher is selected for this role, it should be because of that person’s decision skills and deep understanding of your business. If you’re about to put them in this role and then second-guess them later, you’ve chosen the wrong person. The entity we’re calling The Decision-Maker (it could be a person or committee) is the one that’s supposed to get final say. Elect your benevolent dictator wisely.
If the decision-maker is someone you’re planning to second-guess later, you’re doing it wrong.
If the decision-maker isn’t well-versed in the art and science of decision-making, there’s a fix: pair them up with a qualitative expert. But if the person in charge doesn’t understand your business, you may as well just flush that cash now.

Identify the use case

Focus on the outputs

The key thing is that ML/AI is not magic and it doesn’t solve every problem. It’s a thing-labeler and it’s up to you to figure out what you need labeled.
Don’t hire that PhD guru before you’ve confirmed you need them. Focus on the outputs first.
If you read my recent article about how algorithms work, you will have noticed that the article took for granted that it’s worth labelling bottles of wine as liked or disliked by Cassie. Who signed off on wasting everyone’s time with that stupid use case?! How does it help the business? Should that classifier even exist? Supposing it can be made to work, is it worth building?
Just because you can do something, doesn’t mean it’s a good use of anyone’s time.
Imagine your ML/AI system is operational and ask yourself if you’re happy you sunk company resources into making it. No? Keep brainstorming. Better to discover no one needs your application before several PhDs waste their lives on it.

Now is not the time for inputs

Some of you decision folk are gorgeously fluent with data. You can talk inputs and outputs all at once… and you understand the difference. My advice may surprise you: resist temptation! Just don’t talk about inputs yet. Seriously. I know you can, but don’t. Here are two reasons of many.
Ask yourself, “Is this the end or the means?” If it’s the means, don’t talk about it for now.
The trouble with many fluent folk is that they think everyone shares their fluency. Some utterly brilliant people in tech have surprised me by not having this skill, so now I know not to take it for granted.
Some folks have trouble figuring out which variable is the input and which one is the output… it all looks like one big confusing lump to them and they need your focus to appreciate why the outputs are worth having.
I, for one, have to be on my guard against getting caught up in the technical correctness of minutiae. Joke: Technically correct is the best kind of correct. Tautologically correct is a kind of correct.

This is about teeming multitudes

Let’s imagine that automating the Y/N labelling of wine bottles is the use case you’re going for. (If you prefer tea to wine, here’s the alcohol-free version of this text.)
ML/AI is not for one-offs, so make sure your business needs an impressive number of items labeled.
You’re imagining labelling at least a few thousand of them? And when this thing is live, you’re sure you can’t just look the answers up instead of predicting them? Good. Let’s continue.

You might not be ready for machine learning

Still struggling to find a use case? Consider pausing ML/AI in favor of analytics for a while. The goal of analytics is to generate inspiration for the decision-maker. Once you’re inspired, you can come back to ML/AI and start over. Analytics (a.k.a. data-mining) is a great idea for every project, whereas ML/AI is only for projects where the goal is to use data to automate thing-labeling. Although the underlying math is often the same, the processes are very different. Data-mining is all about maximizing the speed of discovery, while ML/AI is all about performance in automation. In data-mining, there’s only one mistake your team can make, whereas ML/AI has an impressive list of potential kablooies. Sign up for those headaches only when you have a use case that’s worth the bother.

Who is it for? Think about your users!

Who is your dazzling invention for? Who benefits? This is a great time to consult with a UX (user experience) specialist and map out your application’s intended users.
Ideating new technology often starts with the what, but it’s important to cover the who before you proceed to the how.
Something I’ve learned by spending time with UX designers is that my knee-jerk description of who my users are… is usually rather naïve. Have I thought about usability for indirect beneficiaries, for society-as-a-whole, for other businesses, for machine systems using the output as input, for engineers engaged in debugging, and so on? To avoid shoddy UX design, please take the time to think about all your myriad user categories before you proceed. User doesn’t just mean customer or end user in the most obvious sense.

Is it ethical to proceed?

What if your idea isn’t uniformly beneficial to everyone? While mapping out your ideal use cases, consider those who might be harmed by the existence of your system. I don’t just mean competitors in your industry. Can any humans be harmed by your application? This is especially important if your technology scales to millions or billions.
Think about the humans your creation impacts! Who benefits and who might be harmed?
If you care about developing technology in an ethical and responsible manner, it’s your duty to think about all the people your creation could affect. Thinking about them after you’ve built it is irresponsible. The time for it is now! An ethicist can help you shoulder some of the burden. Bringing them on board as an advisor is a great idea if your project has the potential to strongly affect human wellbeing. Along with your UX specialist, their participation in the project will help you ensure that groups affected by your creation are given a voice.

Do some reality checks

Once you can clearly articulate what labels you’re after, it’s time for a quick reality check: do you have data about this business problem?
No access to relevant data or no computers to crunch it? There’s no nice way to say this…
This is your dream of ML/AI if you have no data.

Reality checklist

Make sure you can answer yes to all of these. Each one is likely to get its own guide later; stay tuned. This is just a quick overview of questions for identifying a nonstarter.
  • Reasonable expectations: Do you understand that your system might be excellent, but it will not be flawless? Can you live with the occasional mistake?
  • Possible in production: Regardless of where those decisions/labels come from, will you be able to serve them in production? Can you muster the engineering resources to do it at the scale you’re anticipating? You’ll be looking into this question in more detail when you sit down with engineers, but for now the glimmer of a sanity check will suffice.
  • Data to learn from: Do potentially useful inputs exist? Can you gain access to them? (It’s okay if the data don’t exist yet but you have a plan to get them soon.)
  • Enough examples: When you’re sharing a beverage with your statistician or machine learning engineer buddy and you casually mention the number of examples available plus the kind of output you’re aiming at, is their expression free of cringes? (I’ll teach you how to develop this intuition yourself in a future article.)
  • Computers: Do you have access to enough processing power to handle your dataset size? (Cloud technologies make this an automatic yes for anyone who’s open to considering them.)
  • Team: Are you confident you can assemble a team with the necessary skills?
  • Ground truth: Unless you’re after unsupervised learning, do you have access to outputs? If not, can you pay humans to make them for you by performing the task?
  • Logging sanity: It’s possible to tell which input goes with which output, right?
  • Logging quality: Do you trust that the data actually is what its purveyors claim it is? (To learn from examples, you need good examples to learn from.)
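The two logging items above boil down to a join: can you pair each input record with its output label by some shared ID, and what falls out when you try? A minimal sketch, with made-up field names (`id`, `label`, `features`):

```python
# Hypothetical sketch: pair input logs with output labels by a shared ID
# and flag orphans on either side. All field names are illustrative.

def join_logs(inputs, outputs):
    """Return (matched pairs, input IDs with no label, label IDs with no input)."""
    labels_by_id = {rec["id"]: rec["label"] for rec in outputs}
    input_ids = {rec["id"] for rec in inputs}

    matched = [(rec, labels_by_id[rec["id"]])
               for rec in inputs if rec["id"] in labels_by_id]
    unlabeled = input_ids - labels_by_id.keys()      # inputs missing a label
    orphan_labels = labels_by_id.keys() - input_ids  # labels missing an input
    return matched, unlabeled, orphan_labels

inputs = [{"id": 1, "features": [0.2, 0.7]}, {"id": 2, "features": [0.9, 0.1]}]
outputs = [{"id": 1, "label": "Y"}, {"id": 3, "label": "N"}]

matched, unlabeled, orphans = join_logs(inputs, outputs)
# One matched pair; input 2 has no label; label 3 has no input.
```

If the orphan piles are large, your logging sanity answer is no.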

Assemble your team

Once you’ve cleared the reality checklist, it’s time to start recruiting, which you can do in parallel with wading through the rest of this guide. My advice on the roles you’re looking for is here.

Craft a performance metric wisely

Own the tradeoffs

The next bit can get a bit tricky if you’re new to it. You’re in charge of deciding how much each kind of outcome is worth. Is the disgusting bottle of wine that got a Y twice as bad as the delicious bottle we missed out on? 3.4823 times? Up to you!
Economists make surprisingly useful advisors for AI projects.
Now that you’ve figured out how to trade off various results on one single output, it’s time to think about how you’d like to score a few thousand of them at once. Using the average is an average choice, but you needn’t be average. Again, decision-maker is boss here. The right way to score it depends on what’s right for your business.
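Owning the tradeoffs might look like this in code: a little table of outcome values chosen by the decision-maker, averaged over a batch of predictions. Every number below, including the 3.4823, is a placeholder for whatever your business decides it should be.

```python
# Hypothetical decision-maker-chosen values for the wine example.
# Keys are (predicted, actual); the numbers are the owner's call,
# not the data scientist's.

OUTCOME_VALUE = {
    ("Y", "Y"): 1.0,      # correctly flagged a bottle Cassie likes
    ("N", "N"): 1.0,      # correctly skipped a bottle she dislikes
    ("Y", "N"): -3.4823,  # said Y to a disgusting bottle: very bad
    ("N", "Y"): -1.0,     # missed a delicious bottle: bad, but less so
}

def business_score(predicted, actual):
    """Average decision-maker-assigned value over (predicted, actual) pairs."""
    values = [OUTCOME_VALUE[(p, a)] for p, a in zip(predicted, actual)]
    return sum(values) / len(values)
```

Averaging is itself a choice here; the boss could just as well use a worst-case or a weighted aggregate instead.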

(Optional) Expert mode is heavy on simulation

Tricky, complex projects benefit tremendously from simulation. That’s where an analyst specializing in generating fake but plausible data can help you see the potential consequences of the choices you’re making in this article.
A dress rehearsal helps opening night go smoothly.
Simulation gives you a dress rehearsal, which helps you get a bunch of kinks ironed out before you have to start your project for-realsies. Like analytics, it takes some of the burden off the decision-maker’s mandate to meditate hard and think of everything.
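A dress-rehearsal can be as cheap as this: fake a classifier that’s right 90% of the time and watch how its score bounces around from run to run, before anyone builds anything real. All numbers here are illustrative.

```python
import random

# Toy simulation: a pretend classifier with a 90% hit rate on fake
# Y/N bottle labels. Lets you see metric behavior before any real
# system exists. All parameters are made up for illustration.

random.seed(0)

def simulate_run(n_bottles, hit_rate):
    actual = [random.choice("YN") for _ in range(n_bottles)]
    predicted = [a if random.random() < hit_rate
                 else ("N" if a == "Y" else "Y")
                 for a in actual]
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / n_bottles

scores = [simulate_run(2000, 0.90) for _ in range(100)]
# With 2000 bottles per run, observed accuracy hovers near 0.90.
```

Swapping in your real metric and plausible error patterns shows you how much run-to-run wobble to expect before you set any launch bar.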

Make your metric

There are many different ways to make a metric. In our wine example, you could go for the really simple one: accuracy a.k.a. “Don’t be wrong.” Every mistake is equally bad (0), every correct response is equally good (1), then you take an average (which you’ve been itching to take — someone needs to put a central limit on how appealing people find averages).
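“Don’t be wrong” translates to very little code: every mistake scores 0, every correct answer scores 1, and the metric is the average. A minimal sketch for the wine example:

```python
# Accuracy: the equal-weights special case where all mistakes are
# equally bad and all correct answers equally good.

def accuracy(predicted, actual):
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

predicted = ["Y", "Y", "N", "Y"]
actual    = ["Y", "N", "N", "Y"]
accuracy(predicted, actual)  # 3 of 4 labels match: 0.75
```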

Ask for an expert review

In applications where human wellbeing is substantially on the line, seek a consultation with a panel of experts to verify that it isn’t possible to get a high score on your metric in some perverse and harmful way.

Hello, business performance metric!

When you’re done, you’ll hear an angelic chorus. You’ve created your business performance metric!

(Jargon trigger warning) Something your AI experts should know

Here’s a nuanced thing you can skip until we dig into it in a later post, but if you know what a loss function is then you’ll realize we’ll have two metrics in play. If that means nothing to you, let’s not worry about it for now, your job here is only to ensure your ML/AI experts read the next paragraph. Many of them missed this lesson in school. Warning: many decision-makers will find that it reads like a jargon-gibberish nightmare — my apologies! — so just forward it along.
The loss function is for optimization, not for testing.
“In applied ML/AI, the loss function is for optimization, not for statistical testing. Statistical testing should ask, “Does it perform well enough to build/launch?” where perform should be defined by the business problem and its owner. You’re not supposed to alter the business problem statement to suit your convex optimization ambitions. For expediency, you’re free to optimize using a standard loss function that moves in the same direction as the function your leader’s imagination just spawned (perform correlation checks* analytically or with simulation), but please test with their function.** Have you been using loss for all your evaluations? Don’t worry, it’s a common mistake which may have something to do with software defaults, college course format, and decision-maker absenteeism in AI.”
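For the experts, here’s that separation in miniature: one convenient standard loss for optimization, a different owner-defined metric for the launch decision. Both functions and all numbers below are illustrative, not a prescription.

```python
# Sketch of the loss-vs-test separation. squared_loss stands in for
# whatever standard loss is convenient to optimize; business_metric
# stands in for the owner's scoring rule. Numbers are made up.

def squared_loss(probs, actual):
    """Optimization convenience: mean squared error on probabilities."""
    return sum((p - y) ** 2 for p, y in zip(probs, actual)) / len(actual)

def business_metric(probs, actual, threshold=0.5):
    """Owner's asymmetric scoring: a false positive costs 3x a false negative."""
    score = 0.0
    for p, y in zip(probs, actual):
        pred = 1 if p >= threshold else 0
        if pred == y:
            score += 1.0
        elif pred == 1:      # false positive
            score -= 3.0
        else:                # false negative
            score -= 1.0
    return score / len(actual)

probs  = [0.9, 0.6, 0.4, 0.2]
actual = [1,   0,   1,   0]
# Use squared_loss while fitting the model; gate the launch on
# business_metric clearing the decision-maker's bar.
```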

Set testing criteria

Define your population of interest

Talking about your system “working” makes no sense until you specify which instances you’re intending for it to work on. All US summer inputs? All inputs globally? (These are different!)
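One way to make the population question concrete is to score each slice separately: a system can look fine on average while failing an entire region or season. The slice names and numbers below are made up.

```python
from collections import defaultdict

# Per-slice evaluation: break the test results down by the population
# segment each example came from. Slice names are illustrative.

def accuracy_by_slice(records):
    """records: iterable of (slice_name, predicted, actual) triples."""
    hits, totals = defaultdict(int), defaultdict(int)
    for name, p, a in records:
        totals[name] += 1
        hits[name] += (p == a)
    return {name: hits[name] / totals[name] for name in totals}

records = [
    ("us_summer", "Y", "Y"), ("us_summer", "N", "N"),
    ("global",    "Y", "N"), ("global",    "N", "N"),
]
accuracy_by_slice(records)  # {'us_summer': 1.0, 'global': 0.5}
```

If the system only needs to work on US summer inputs, say so up front; testing on the wrong population answers the wrong question.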

Commit to crushing it!

Now that you have your performance metric and population handy, you have one more job to do before you can go put your feet up. Since getting here can take months, your poor feet must be pretty sore.
Setting a bar for testing the system is a responsibility the decision-maker should take very seriously. (Image: Maria Fernanda Murillo winning a high jump competition, credit: Diego Sinisterra)
Setting criteria up front is part of how you keep yourself (and me) safe from horrible machine learning and AI.
This criterion is not a guiding star. You can also have a performance level that you tell your team to reach for (purrrrfection?), but that’s not what you will test against. Test against the bare minimum.

You biased species you

Why are you coming up with this cutoff now, before you’ve even assembled a team for this project? Turns out that as members of the human species, we suffer from a few lovely cognitive biases (don’t say sunk cost or endowment effect or confirmation bias unless you want that economist to appear out of nowhere again, this time with a psychologist in tow) which boil down to this: when humans invest time and effort into something, we fall in love with what we have made… even if it is a pile of poisonous rubbish. Then we find ourselves bargaining: “Awww but the performance is not so bad. I’m kind of proud of 12% accuracy. Maybe we could launch it anyway? Why don’t I test against 10%? See? It passes. It’s statistically significantly good enough.”
We humans fall in love with what we have poured effort into… even if it is a pile of poisonous rubbish.
While we are still sober(!) and before we’ve poured much of ourselves into it, we’re going to take a cold, hard look at the business problem and say, “If it does not meet this minimum requirement, I promise that it DIES.”
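That promise can even be written down as a pre-registered one-sided test against the minimum bar. A stdlib-only sketch, with an illustrative bar of 80% accuracy and made-up test-set counts:

```python
from math import comb

# One-sided exact binomial test: could a system whose true accuracy is
# only the bare-minimum bar plausibly have produced this many correct
# answers? The bar (0.80), alpha (0.05), and counts are illustrative.

def p_value_at_least(successes, n, p_min):
    """P(X >= successes) if true accuracy were exactly p_min."""
    return sum(comb(n, k) * p_min**k * (1 - p_min)**(n - k)
               for k in range(successes, n + 1))

# Decided BEFORE building anything: minimum accuracy 0.80, alpha 0.05.
n, correct = 200, 172          # held-out test results, made up
p = p_value_at_least(correct, n, 0.80)
launch = p < 0.05              # otherwise: it DIES, as promised
```

The point is the order of operations: the bar, the test, and the alpha are fixed while you’re still sober, so later bargaining can’t move them.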

Better than human?

I hope you have an inkling now as to why I’m forced to suppress a chuckle when people ask me, “Is AI better than humans at stuff?”
Better than human? Tautologies are tautological.
If its maker required it to be better than human and did the testing properly, then anything that wasn’t better than human got killed before launch. So if it exists, then yes. If they didn’t require it to be better than human, then it probably isn’t. Why are people asking me? Ask those decision-makers how they set up their criteria and testing. (Speaking of populations: which humans is it supposed to be better than? That one volunteer?)

Don’t be too demanding

Always requiring better-than-human performance might make you miss out on profit. It’s a bit like saying that you’ll only hire an Olympic gold medalist to lay bricks for you. Maybe an Olympian is stronger than your average Joe, but having such a strict hiring bar might leave you with no workers at all. Lower your standards to where they make good business sense, but no lower. Setting the testing bar at your bare minimum is incentive-compatible, as economists might say. (If you’re fresh out of B-school and want to nerd out on mechanism design, we’re essentially setting up a BDM auction for our hypothesis testing procedure.)
Don’t miss out on a profitable solution by testing against an excessively high bar.
Sometimes automation gives lower per-unit quality than handmade goods, but the scale and speed of the machine makes it worth it for business. Is it worth it for your business? Hey, you’re in charge, not me. Good luck!

WRITTEN BY

Head of Decision Intelligence, Google. ❤️ Stats, ML/AI, data, puns, art, theatre, decision science. All views are my own. twitter.com/quaesita

