Minimum Best Practices For Data Scientists
This is part 3.3 in a series to give some guidance on milestones/goals/expectations or really just “best practices” at various levels in the data science career. For guidance on getting to the pre-Data Scientist level, look here. To learn about moving up from Data Scientist to Senior Data Scientist, you want this link.
This is just the third section on this topic. Elsewhere on the blog I consolidated everything into one long post, but it is pretty intimidating. Here are links to each section: One Two Three Four
Specific “Checklist” To Help Get That Data Scientist Title/Job
Now that I’ve forced you through two sections on expectations for a Data Scientist, if you still think that is a job you want, then here is a list of specific accomplishments that would be good to have under your belt to get that title.
Important Disclaimer:
Here is basically a checklist for someone who does not currently have the title of DS and wants to get it. This sort of thing is dangerous. It isn’t a promise that if you do all of these things you will get a promotion/job. It also isn’t a blocker. If you haven’t checked off every “requirement” but are just crushing it on a few of the others, that could be plenty. In fact, being a data scientist requires enough business acumen that I’m just going to put this on the list: “understand that you can check all these boxes and still not get the title/job and really get why that is… but also that the reverse is true.”
- Technical Problem Solving: Provide “proof” that you have problem solving experience and that your abilities are worth proper evaluation. Some examples that could count as proof, whether fair or not, are below. More is always better, but the requirement here is at least one of the options in this section.
- In School:
- A PhD that included beating your head against a hard problem for multiple years and the ability to articulate why it was so hard and what your role was in solving that problem. Note that there are plenty of people who make it through a PhD and don’t acquire the problem solving skills needed for data science, so it isn’t a sure thing. But there is a decent correlation between those successful at finishing a PhD and those who have this problem solving box checked.
- A Masters that includes some really above and beyond extra projects or lab work. Masters programs tend to focus a higher percentage of time on coursework/homework/class projects which are just too guided/not open-ended enough to really check this box. There is nothing wrong with that! There are plenty of people who learned valuable skills from their masters program. But you should just know that if your masters really involved some outstanding demonstration of problem solving then you need to make that clear.
- Technical project at work/home:
- It doesn’t actually matter if you just create your own home project and see it through to the end (impressive because it also shows internal motivation) or it is a work project assigned to you (impressive because of working inside the complexity of a company).
- Two (or more) projects that may have come with a high level problem statement / requirements but required your own independent work/thought on the following 4 steps in the problem solving process:
- Problem Identification: Figure out the actual sub problems that needed to be solved to complete the project
- Defining unknowns: Identify what questions need to be answered/data collected/experiments run to get the data that will inform which solutions should be attempted
- Solution Brainstorming: Propose multiple solutions to the problem, prioritize which ones to try first
- Solution Evaluation: Identify the best solution based on an appropriate set of criteria
- Non-DS problem solving anywhere in life:
- Problem solving is not just looking at data and writing code. Have an example where you can talk through the problem solving process in problems you encountered with…
- People
- Non-people (physical objects, pets, I don’t know, get creative)
- Logistics (like planning a trip, being lost and getting unlost, or making some process better)
- Ideally, the less fancy the better here. Your goal is to make it clear that problem solving is so deep in your blood that you can’t help but do it all the time and you can articulate how you do it.
- Interpreting Results: You should have a lot of experience under your belt interpreting results. The numbers here are rough guidelines (completely made up?), but if you hit them you should naturally be able to convince someone that you have great experience interpreting results.
- 50+ times that you have looked at data (counts? A chart? Rows in a table?) and made a decision. No, you don’t need to remember all of them, because 50 is a lot. But basically this should be something you have done a lot; 50 is a completely made up number here to represent “a lot”. This should be really easy to hit.
- 3 examples of the above that you remember because there was some really cool conclusion/decision/next step that came out of looking at data. Be able to tell the story here. This should be quite easy to hit.
- 1 example of the above where you shared your data with someone else “smart” (Your team? A well informed friend? Your boss if you are lucky and have a smart boss? What I mean is not just your dog) and YOU had the great aha discovery moment to figure something out from the data before they did. PLUS you explained it to them and they said “good job, you must be some sort of data scientist” (they didn’t actually need to say this, you just need to have explained it and they understood and agreed). This may be a hard one for so many reasons: maybe you are too humble to take credit, maybe an outside opinion/fresh set of eyes is just what you needed the times you shared… or MAYBE you don’t share your results enough which is its own problem. Pretty cool how this requirement forces you to share your results a lot though.
- Coding ability: Provide proof that you have a good enough handle on the basics of coding that you can solve problems using your coding language of choice.
- Finish some sort of coding class/bootcamp/book/web tutorial/whatever that covers coding fundamentals. That should mean familiarity with everything on this page. I don’t care how you do it. But if you show up being a whiz with dataframes and sklearn but don’t know what a for loop is, that would be annoying (but not necessarily a deal breaker!).
- Solve 10 “easy” coding problems (like the easiest problems on HackerRank) using Variables, Functions, Conditions, Loops, Arrays, and Strings. OR if you are pretty confident you can do this and don’t want to spend the time practicing just be able to solve such a problem if someone were to test you for some reason.
- Two (or more) projects that require the use of dataframes and any imported package. This one should honestly sound like a preposterously low bar.
- Learning ability: Provide proof that you can learn basically anything and quickly
- 2 examples of learning to use a new package/tool for a project when it was appropriate for the problem at hand
- 1 example of learning to use a new algorithm/statistical concept/methodology/etc for a project
- 1 example of something you learned recently related to data science that you can explain and convince me is actually pretty cool/useful
- 1 example of something you have learned/know that is really just too complicated to explain in depth in 30 minutes
- Nice to have: 1 example of learning at least the basics of a programming language beyond your first. You may be surprised how fast you can get the hang of the basics of JavaScript or R or whatever (1 day?), and then you can always cite that as an example.
- Communication: Provide proof that you value communication and can do it. Note that the ordering of these and the number of examples required implies something about how each one is valued. By this point in life you certainly should have a good example of presenting your results in a structured format, but admitting you don’t know the answer to a question during that presentation may still be scary.
- 3 examples answering a question from someone you respect with “I don’t know”
- 3 examples telling someone you respect that you don’t know/don’t understand something even though they didn’t ask you a question and you could have just pretended you understood.
- 2 examples learning about a problem someone else has and helping them with it
- 2 examples sharing a problem you are in the middle of and having someone else help you with it
- 1 example of a really great short email you have written that really got the point across in a concise way
- Sharing something you learned/results you collected/a thing you created/ a project you finished…
- 1 good example successfully sharing in an ad hoc format (just talking and waving your hands around, maybe pulling up a chart on the fly, etc)
- 1 good example successfully sharing in a structured format (slides? A document? etc)
- Understand that you can check all these boxes and still not get the title/job and really get why that is… but also that the reverse is true
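For a rough sense of scale on the “easy” coding problems mentioned under Coding ability above, here is a sketch in Python of the kind of thing I mean. The two problems and the function names (`count_vowels`, `longest_word`) are made up for illustration and are not taken from HackerRank or anywhere else; they just touch variables, functions, conditions, loops, arrays, and strings.

```python
def count_vowels(text):
    """Count the vowels in a string (hypothetical warm-up problem)."""
    vowels = "aeiou"
    count = 0
    for char in text.lower():      # loop over each character of the string
        if char in vowels:         # condition: is this character a vowel?
            count += 1
    return count


def longest_word(words):
    """Return the longest string in a list; ties go to the first one seen."""
    longest = ""
    for word in words:
        if len(word) > len(longest):
            longest = word
    return longest


# Split a sentence into a list of words, then apply both functions.
words = "problem solving is not just writing code".split()
vowel_counts = [count_vowels(w) for w in words]
print(vowel_counts)
print(longest_word(words))
```

If you can write something like this without looking anything up, you are probably fine on this item and can skip the practice.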
On to part 4!