What I actually look for in a Data Scientist

Background/Introduction

This is part 3.1 in a series that offers guidance on milestones, goals, and expectations (or really just “best practices”) at various levels of the data science career. For guidance on getting to the pre-Data Scientist level, look here. To learn about moving up from Data Scientist to Senior Data Scientist, you want this link.

This is just the first section on this topic. Elsewhere on the blog I consolidated everything into one long post but it is pretty intimidating. Here are links to each section: One Two Three Four

What I’m actually looking for in a Data Scientist

This section focuses on what I, Paul Brenner - Head of DS at PlaceIQ, personally want to see in a working Data Scientist. It leaves out many “table stakes” included in other sections. If you work at PlaceIQ then it is a good way to understand what I’m looking for. If you don’t, then perhaps it is worth reading through, evaluating how it compares to your current situation/goals, and then forming an opinion on why you agree or disagree with each point.

1. Understand and be able to explain WHY

Why are you even doing this project? Why are you using the tools you are using (Scala? Spark? A nifty ML package? etc.)? Why are you paid so much?

In summary, there are multiple skills here: Understand the value of your work, have some awareness of how the company works, understand the value of skills you have or don’t have, be able to persuade the people up the chain when you have a strong case, and finally bonus points if you are starting to learn how to manage your own career. But also there is a whole tangent here if you want to go down that rabbit hole - Part 4.

2. “Simpler is better”

Believe this. Don’t just pretend to agree with it; really, deeply believe it, even when no one else is looking.

Have experience reading code that was too smart for its own good, and appreciate why dumb, readable, and easily changed code is better than flashy code. Ideally have written that code yourself, then come back to it later and have been like “what the… what is this, why?”

See a blog post or talk or random anecdote or company description where ML was used and think “this is not a good use case for ML, they should have done something simpler.”

Accept that all that time you spent learning the theory of model selection and how SVMs work was probably educational, but won’t actually be relevant: if you need to use ML you should probably start by tossing AutoML or an XGBoost varietal at the problem, checking the results, then only digging in if the results/performance don’t meet requirements.
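
If it helps to see that “start simple, only dig in if you have to” habit as code, here is a minimal Python sketch. Everything in it is made up for illustration: the toy data, the 3.5 cutoff, the function names, and the required score are all assumptions, and the two candidates are stand-ins for whatever your real “dumb baseline” and “next step up” would be.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Toy labelled data: (feature, label). Made up purely for illustration.
DATA: List[Tuple[float, int]] = [
    (1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1),
    (5.0, 1), (6.0, 1), (7.0, 1), (8.0, 1),
]


def accuracy(predict: Callable[[float], int]) -> float:
    """Fraction of DATA rows that `predict` labels correctly."""
    return sum(predict(x) == y for x, y in DATA) / len(DATA)


def majority_class_score() -> float:
    """Simplest possible option: always predict the most common label."""
    labels = [y for _, y in DATA]
    majority = max(set(labels), key=labels.count)
    return accuracy(lambda x: majority)


def threshold_rule_score() -> float:
    """Next step up: a single hand-picked cutoff (3.5 is an assumption)."""
    return accuracy(lambda x: int(x > 3.5))


def first_good_enough(
    candidates: Sequence[Tuple[str, Callable[[], float]]],
    required_score: float,
) -> Optional[Tuple[str, float]]:
    """Try candidates in order (simplest first) and stop at the first
    one that meets the requirement. Returns None if nothing does."""
    for name, fit_and_score in candidates:
        score = fit_and_score()
        if score >= required_score:
            return name, score
    return None


# Ordered from least to most effort. An off-the-shelf AutoML or XGBoost
# run would be the next entry; hand-tuned modeling would come after that.
CANDIDATES = [
    ("majority_class", majority_class_score),
    ("threshold_rule", threshold_rule_score),
]

print(first_good_enough(CANDIDATES, required_score=0.9))
# ('threshold_rule', 1.0)
```

The point of the ordering is that the more expensive options never even run unless the cheaper ones miss the bar, and a `None` result is the signal that it is finally time to dig in.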

3. Leave Everything Better Than You Found It

When you learn something new, share it. Someone else will probably be happy to learn that new thing too, so find that person. Or, people love hearing about things they already know: it either helps them understand it better or at least lets them feel smart. Give them that.

Oh and if the new thing you learned is a “result”, know when to share those too. When you get results you should…

  1. Document them for a technical audience (put them on a ticket perhaps?)
  2. Reflect on “who needs to know about this”
    1. If the answer is no one then, what the, why did you do this project?
    2. If the answer is “just the people on the ticket” then stop and think “really? There is no one else who I can show these nice results off to?”
    3. If there is someone else, identify the best way to share with that audience. Is it a slack message? Email? A meeting? A few slides?
  3. Ok now share appropriately

4. Every problem you find is your problem

Job #1 is to do good work on whatever project you are supposed to be working on.

Job #2 is to start looking for areas of opportunity. That means if you stumble across a problem with something, your response should be “nice! I found something that might need fixing and maybe that fixing could make the company better”.

5. Make mistakes and really believe that it’s fine to make mistakes

Your job is to move quickly, write code that may only run once, and apply the 80/20 rule. Embrace that you aren’t a software engineer who needs to write rock solid code that needs to meet an uptime requirement with five nines. YOU WILL MAKE MISTAKES. And that is fine.

Everyone else on your team makes mistakes too. Do you believe that? Please do.

The real job is NOT to NOT make mistakes. It is to catch most of your mistakes before someone else does. The real job is also not to catch all of your mistakes before someone else does… nope, other people will catch your mistakes sometimes and you will feel embarrassed and like the world must be about to end. When you make a mistake you need to own it, realize that it’s fine to make mistakes, and help clean up the mess.

Failure modes here would be

  1. Catch a mistake after it is out in the world and try to cover it up
  2. Get defensive if someone else catches your mistake
  3. Get distraught and beat yourself up if you make a mistake and it has impact
  4. Fail to appreciate that mistakes are a job expectation
  5. Go overboard and embrace this philosophy too much and just not even try to avoid mistakes… oops, that went too far, don’t do that either, find glorious balance

All that said, the real solution to mistakes is to build systems/procedures that aren’t vulnerable to human error when possible. If a mistake causes a major business problem, then that means it is time to find a way to make it so that such a mistake can’t cause as much damage in the future.

6. Take care of yourself

You aren’t invincible. Your brain isn’t invincible. Your willpower and motivation aren’t invincible either. In fact, these are all finite, valuable resources and need to be treated as such.

At this stage you should already be learning some ways to take care of yourself. That means I want to see you start building some skills to defend against burnout and start helping your manager help you. Speak up when you need something. Look deep inside yourself and have hard thoughts about how you are holding up, then share the results.

Definitely, DEFINITELY, do not just read this section and think “nah, whatever, I’m good, this doesn’t apply to me”. Actually don’t think that about any of these sections.

On to part 2!