Is 'Data Scientist' the 'Sexiest Job of the 21st Century'? And How Do You Get One of Your Own?
Even if you're not versed in advanced analytics and data science, you can understand the thought process data scientists go through.
By Asha Saxena
Opinions expressed by Entrepreneur contributors are their own.
When you hear the term "data scientist," what does it mean to you?
Is it the "sexiest job" of the 21st century, as the Harvard Business Review suggested? Does it describe a really smart person with advanced degrees in computer science, applied math, statistics or economics? Someone who analyzes and extracts business value from big data?
A data scientist can be all of these things and more. This type of professional looks for patterns and trends in large sets of data, using a variety of tools, techniques and critical thinking to arrive at practical solutions to real-life data-centric problems.
According to Hugo Bowne-Anderson in HBR, "Data scientists use online experiments, among other methods, to achieve sustainable growth. They also clean, prepare, validate structured and unstructured data to build machine learning pipelines, and personalized data products to better understand their business and customers and to make better decisions."
Even if you have no formal training in advanced analytics and data science, understanding the thought process data scientists go through can help your early-stage startup see exactly what these professionals do:
Data scientists ask good questions.
Any data science project will come with a set of expectations for deliverables, goals, results, timeline and so on. As James Le pointed out in the Medium article "How to Think Like a Data Scientist in 12 Steps," a helpful way to understand someone's expectations more fully is to ask good questions.
"Good questions are concrete in their assumptions, and good answers are a measurable success without too much cost," Le wrote. Asking better questions is valuable in any business situation, and especially so if your early-stage startup is on a journey to become more data-driven. Mark Schindler discussed on TowardsDataScience.com how "creating a question landscape" can be useful for building a data strategy; he suggested placing your questions into three categories:
- What questions could you answer right now?
- What questions could you answer if you did a little digging with your current data?
- What questions can't you answer because you don't have the data yet?
Schindler offered examples of questions for each category:
- "How many downloads did you have in the past 30 days?" might fall into the first category.
- "What are the age demographics of your most frequent users in the past 30 days?" might fall into the second.
- And "What is the average session length of your top and bottom quartile of users?" might fall into the third.
This exercise helps you see which questions about your business your data can already answer, and it may point you toward a data road map: new questions, or hypotheses not yet proven, that you would like to explore further.
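The three buckets can be sketched in Python as a check of each question against your data inventory. This is a minimal illustration, not Schindler's method: it assumes "answer right now" means every field a question needs already appears in existing reports, and "a little digging" means the fields are collected somewhere but not yet reported. The field names (`downloads`, `user_age`, `visit_count`, `session_length`) are hypothetical placeholders.

```python
from enum import Enum


class Bucket(Enum):
    """The three question categories from the question-landscape exercise."""
    ANSWER_NOW = "answer right now"
    DIG_DEEPER = "answer with a little digging"
    NEED_DATA = "can't answer yet: data not collected"


def classify(required: set[str], reported: set[str], collected: set[str]) -> Bucket:
    """Place a question into a bucket based on the data fields it needs.

    ASSUMPTION: `reported` holds fields already surfaced in dashboards or reports;
    `collected` holds fields captured somewhere (logs, databases) but not reported.
    """
    if required <= reported:
        return Bucket.ANSWER_NOW
    if required <= (reported | collected):
        return Bucket.DIG_DEEPER
    return Bucket.NEED_DATA


# Hypothetical data inventory for an app startup.
REPORTED = {"downloads"}
COLLECTED = {"user_age", "visit_count"}

# Schindler's example questions, each mapped to the (hypothetical) fields it needs.
QUESTIONS = {
    "How many downloads did you have in the past 30 days?": {"downloads"},
    "What are the age demographics of your most frequent users in the past 30 days?": {
        "user_age",
        "visit_count",
    },
    "What is the average session length of your top and bottom quartile of users?": {
        "session_length",
        "visit_count",
    },
}

if __name__ == "__main__":
    for question, fields in QUESTIONS.items():
        print(f"{classify(fields, REPORTED, COLLECTED).value}: {question}")
```

The bucket depends on your data inventory rather than on the question itself, so a question moves from "can't answer yet" to "a little digging" as soon as you start collecting the missing field.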
Data scientists understand how to identify data sources and their value.
Bill Schmarzo, CTO of the big data practice at Dell EMC, created a 28-page white paper, the Think Like a Data Scientist workbook. In it, he delved into the data scientist's thought process of using predictive and prescriptive analytics to find the right answers so that a business can achieve its objectives.
I particularly liked the section called "identify data sources," which explained that during the eight-step workbook exercise, a reader will find all kinds of new data sources that "might" provide value with respect to: 1) the targeted business initiative (increasing sales, revenue, website traffic, conversions, etc.), and 2) the key business decisions he or she is trying to make. Likely data sources, the white paper said, include:
- Historical operational and transaction systems data (ERP, financials, HR, supply chain, sales force automation and marketing), for which data is captured but likely not available on readily accessible platforms.
- Internal unstructured data sources like email conversations, consumer comments, clinical studies, research papers and notes from employee and customer interactions.
- External data sources, including social media, newsfeeds, weather, traffic, economics, research papers, white papers (like the Think Like a Data Scientist workbook) and public domain data from government and college institutions.
In the workbook, Chipotle is a frequent example. Chipotle's data sources could include point-of-sale transactions, market baskets, Product Master, store demographics and competitive store sales, store manager notes, employee demographics, consumer comments, Yelp, Zillow/Realtor.com, Twitter/Facebook/Instagram and more.
Once you have identified a variety of data sources, the next step is to assess the business value each source brings in supporting certain key business decisions. You can set up a spreadsheet, list the data sources as row headers in the first column, then list the key business decisions as column headers in the first row. In the Chipotle example, some of the business decisions were:
- Increase store traffic
- Increase shopping bag revenue
- Increase promotional effectiveness
You can do this exercise yourself by putting in business use case questions relevant to your industry and startup.
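The spreadsheet described above can also be sketched as a small Python data structure. This is an illustration, not Schmarzo's template: it assumes a hypothetical 0-3 scale for how strongly a data source supports each decision, and the scores are made-up placeholders that your own team would replace with its judgment.

```python
# Rows: data sources (taken from the Chipotle example); columns: business decisions.
# ASSUMPTION: hypothetical 0-3 scale (0 = no value, 3 = high value).
# The scores are illustrative placeholders, not values from the workbook.

DECISIONS = [
    "Increase store traffic",
    "Increase shopping bag revenue",
    "Increase promotional effectiveness",
]

VALUE_MATRIX = {
    "Point-of-sale transactions": [3, 2, 1],
    "Market baskets":             [1, 3, 2],
    "Store demographics":         [2, 1, 1],
    "Consumer comments":          [1, 1, 3],
    "Yelp":                       [2, 0, 1],
}


def rank_sources(matrix: dict[str, list[int]]) -> list[tuple[str, int]]:
    """Return (source, total score) pairs, highest total value first."""
    totals = {source: sum(scores) for source, scores in matrix.items()}
    # Sort by descending total; ties are broken alphabetically by source name.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


def best_source_per_decision(
    matrix: dict[str, list[int]], decisions: list[str]
) -> dict[str, str]:
    """For each decision column, return the source with the highest score.

    On ties, max() returns the first source in the matrix's insertion order.
    """
    return {
        decision: max(matrix, key=lambda source: matrix[source][col])
        for col, decision in enumerate(decisions)
    }


if __name__ == "__main__":
    for source, total in rank_sources(VALUE_MATRIX):
        print(f"{source}: {total}")
    for decision, source in best_source_per_decision(VALUE_MATRIX, DECISIONS).items():
        print(f"{decision} -> {source}")
```

Summing the row scores treats every decision as equally important; weighting each column by its business priority would be a natural refinement once you know which decisions matter most.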
Some helpful tips for hiring your first data scientist.
If you're a startup or business looking to expand into the world of big data and machine learning so as not to fall behind your competitors, it might be time to hire your first data scientist. Hiring for this role can be more complex than hiring, say, a software developer. Forbes contributor Shourjya Sanyal recently wrote, in a post titled "How To Hire Your First Data Scientist," that this task is more complex because:
- It is difficult to write up a job description for a data scientist role.
- A large number of data scientists may be willing to apply, yet few have the required experience.
- Few industry standards and benchmarks are available.
Sanyal suggested questions that can help during interviews. If, for instance, you are building a data product or app, hiring a scientist directly from the academic world and allied research laboratories might not give you the "software engineering experience, as well as some management experience" needed to prioritize tasks and drive your business's value.
Of course, there is always the example of Uber, which hired a VP of data science at Twitter. Experience scraping and preparing data from different sources to build data pipelines can also be helpful for launching data-driven products. Sanyal also suggested asking about a candidate's portfolio and, if it includes a team project, what specifically the candidate contributed.
Above all, define what your company needs from its first hire. In "The art of hiring data scientists," Sara Vera, a data scientist at Insightly, wrote that, "If you're looking to build ad-targeting or recommendation engines or [to] do algorithmic training, then you are going to want to look for a candidate with really strong mathematics and computer science background."
Just remember that data science is a relatively new field, often categorized under an umbrella of different but overlapping skill sets such as data mining, data engineering, data prep, artificial intelligence, machine learning, analytics, big data, statistics or even data visualization.
For instance, if you need a data scientist who will report to managers on how your product is doing, or on how user growth has increased or dropped off, then finding one who is good at "data storytelling" helps. Forbes contributor Brent Dykes described data storytelling as "a structured approach for communicating data insights that involves a combination of three key elements: data, visuals and an overarching narrative of what is going on."
And, as Vera wrote, these types of data scientists may come from "social science academic backgrounds because in the medical field, for example, sociology, economics or geography makes them accustomed to doing this with their data already."
Once you have researched exactly what your startup needs, you can make a more informed decision about who your first data scientist should be. Understanding this person's engineering and programming strengths, and weaknesses, can also help you build a hiring road map for eventually growing an entire data science team.