As you start to consider which career path in data is the right one for you, it’s important to establish those data fundamentals that are going to apply no matter which route you take.
The skills required of analytics professionals depend on the specific focus of their work, as well as on the targeted goals of individual companies and the imperatives of different industries. An analytics professional might, for instance, deploy various techniques to mine data, use advanced statistical models to analyze that data, and then employ computer programming skills to design a predictive or interpretive algorithm.
“The most important takeaway I’ve gotten from this role is that it increasingly isn’t enough to just know SQL [programming], or to just know statistics, or to just know software engineering.”
When describing key data analytics skills, experts often cite the following:
Statistical and Quantitative Analysis
Applied statistics is a baseline skill in data analysis and business intelligence. Professionals commonly conduct trend analyses, A/B testing, correlation analyses, profiling, and analysis of maximum likelihood estimators. Interpreting large data sets often calls for advanced statistical models and techniques, like time series predictive analyses, regression and decision trees, text analytics, and more. The online analytics publication Data Nami reports that quantitative analytics professionals were once recruited primarily into the finance sector, but companies from many different industries have since followed suit.
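As a rough illustration of the A/B testing mentioned above, here is a minimal sketch of a two-proportion z-test written with only the Python standard library. The function name and the visitor and conversion counts are hypothetical, chosen for illustration; in practice an analyst would more likely reach for a statistics package than hand-roll the formula.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: variant A converts 200 of 5000 visitors,
# variant B converts 250 of 5000.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone, though the threshold for acting on it is a judgment call that depends on the business context.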
Analytics is grounded in mathematical methods and concepts. While statistics is perhaps the primary discipline used in data analysis, linear algebra and multivariable calculus are the basis for a lot of the machine learning techniques data scientists and analysts now use to mine and analyze very large and complex data sets.
Data Mining, Scraping, and Munging
Analytics professionals can scrape data from a myriad of sources, but it is not always useful or structured in a usable way. Data munging is the process of sorting and optimizing large data sets before they are analyzed, while data mining involves examining that data to generate new information. Analysts who mine data using machine learning can build and train predictive analytics applications for classification, recommendation, and system personalization.
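To make the idea of data munging more concrete, here is a small sketch that cleans a handful of messy records using only the Python standard library. The field names, sample rows, and accepted date formats are all made up for illustration; real scraped data will need its own rules.

```python
from datetime import datetime

# Hypothetical rows as they might arrive from a scrape: inconsistent
# whitespace, capitalization, date formats, and currency formatting.
raw_rows = [
    {"name": "  Alice ", "signup": "2023-01-15", "spend": "$1,200.50"},
    {"name": "BOB", "signup": "15/02/2023", "spend": "300"},
    {"name": "alice", "signup": "2023-01-15", "spend": "1200.50"},  # duplicate
    {"name": "Carol", "signup": "not available", "spend": ""},
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed formats for this example

def parse_date(text):
    """Return a date for the first matching format, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None

def parse_amount(text):
    """Strip currency symbols and separators; return None if unusable."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None

def munge(rows):
    """Normalize fields, drop unusable rows, and remove duplicates."""
    seen = set()
    clean = []
    for row in rows:
        record = {
            "name": row["name"].strip().title(),
            "signup": parse_date(row["signup"]),
            "spend": parse_amount(row["spend"]),
        }
        if record["signup"] is None or record["spend"] is None:
            continue  # skip rows that cannot be repaired
        key = (record["name"], record["signup"])
        if key in seen:
            continue  # skip duplicates of an already-kept record
        seen.add(key)
        clean.append(record)
    return clean

clean_rows = munge(raw_rows)
for record in clean_rows:
    print(record)
```

Whether to drop, repair, or flag a bad row is a design choice; this sketch simply drops anything it cannot parse.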
If you’re at a large company with huge amounts of data, or working at a company where the product itself is especially data-driven, you’ll likely want to be familiar with machine learning methods. This can mean things like k-nearest neighbors, random forests, and ensemble methods: all of the machine learning buzzwords.
It’s true that a lot of these techniques can be implemented using R or Python libraries—because of this, it’s not necessarily a deal breaker if you’re not the world’s leading expert on how the algorithms work. More important is to understand the broad strokes and really understand when it is appropriate to use different techniques.
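For a sense of the "broad strokes" behind one of these methods, here is a from-scratch sketch of k-nearest neighbors classification. The function name and the toy data are hypothetical; a library implementation would normally be preferred, and this version is only meant to show the idea of voting among the closest training points.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-feature data forming two well-separated clusters.
points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.2), (5.1, 4.8), (4.9, 5.0)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, query=(1.1, 1.0)))
print(knn_predict(points, labels, query=(4.8, 5.1)))
```

The method is a reasonable fit when similar inputs tend to share a label and the data set is small enough that comparing against every training point is affordable. It tends to struggle when features are on very different scales or when there are many irrelevant features.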
Multivariable Calculus and Linear Algebra
You may be asked to derive some of the machine learning or statistics results you employ elsewhere in your interview. Even if you’re not, your interviewer may ask you some basic multivariable calculus or linear algebra questions, since they form the basis of a lot of these techniques. You may wonder why a data scientist would need to understand this material if there are a bunch of out-of-the-box implementations in sklearn or R. The answer is that at a certain point, it can become worth it for a data science team to build out their own implementations in-house.
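One common example of such a derivation is ordinary least squares. Setting the partial derivatives of the sum of squared errors with respect to the slope and intercept to zero (the calculus step) yields the normal equations, which in matrix form read (XᵀX)β = Xᵀy (the linear algebra step). The sketch below implements the resulting closed-form solution for a single feature; the function name and data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature.

    Minimizing sum((y - slope * x - intercept) ** 2) by setting both
    partial derivatives to zero gives the formulas used below.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Library routines solve the same problem for many features at once, usually with more numerically stable methods than the textbook formula.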
Data Visualization and Communication
It is important for data analysts to be able to communicate data findings to colleagues and managers so they can make more informed decisions. Data visualizations built with libraries like D3.js and ggplot2 can convey technical information in an accessible way. Programs such as Tableau and QlikView allow data analysts and scientists to explore data visually by creating 2- and 3-dimensional graphics.
Software Engineering
If you’re interviewing at a smaller company and are one of the first data science hires, it can be important to have a strong software engineering background.
You’ll be responsible for handling a lot of data logging, and potentially the development of data-driven products.
Think Like a “Data Scientist”
Companies want to see that you’re a data-driven problem solver, which means that at some point during your interview process you’ll probably be asked about a high-level problem: for example, a test the company may want to run or a data-driven product it may want to develop. It’s important to identify which factors matter to the problem and which don’t.
Data Science Job Scenarios
“Data Scientist” is often used as a blanket title to describe jobs that are drastically different. To help you navigate through multiple opportunities, let’s get an understanding of different types of data science jobs by looking at some common scenarios:
You’re the First Data Hire
A number of companies get to the point where they have a lot of traffic (and an increasingly large amount of data), and they’re looking for someone to set up a lot of the data infrastructure that the company will need moving forward. They’re also looking for someone to provide analysis. You’ll see job postings listed under both “Data Scientist” and “Data Engineer” for this type of position. In a situation like this, you’re likely to be one of the first data hires, so it’s less important that you’re a statistics or machine learning expert.
A data scientist with a software engineering background might excel in a role like this, where it’s more important to make meaningful data-driven contributions to the production code and provide basic insights and analyses.
Mentorship opportunities for junior data scientists may be less plentiful, so you’ll often have great opportunities to shine and grow via trial by fire.
You’re the Scientist and the Analyst
There are many companies where being a data scientist is synonymous with being a data analyst. Your job might consist of tasks like pulling data out of MySQL databases, becoming a master at Excel pivot tables, and producing basic data visualizations (e.g., line and bar charts).
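The kind of task described above, pulling rows from a database and summarizing them pivot-table style, might look roughly like the sketch below. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and the table name, columns, and figures are all hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a company MySQL instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, quarter TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("East", "Q1", 100.0),
        ("East", "Q2", 150.0),
        ("West", "Q1", 80.0),
        ("West", "Q2", 120.0),
        ("East", "Q1", 50.0),
    ],
)

# Conditional aggregation turns quarters into columns, much like
# dragging a field into the column area of a spreadsheet pivot table.
pivot_query = """
    SELECT region,
           SUM(CASE WHEN quarter = 'Q1' THEN revenue ELSE 0 END) AS q1_revenue,
           SUM(CASE WHEN quarter = 'Q2' THEN revenue ELSE 0 END) AS q2_revenue
    FROM orders
    GROUP BY region
    ORDER BY region
"""
pivot_rows = conn.execute(pivot_query).fetchall()
for region, q1, q2 in pivot_rows:
    print(f"{region}: Q1={q1:.1f}, Q2={q2:.1f}")

conn.close()
```

The same query pattern should carry over to MySQL with little or no change, since it relies only on standard SQL features.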
You may on occasion analyze the results of an A/B test or take the lead on your company’s Google Analytics account. A situation like this is a great opportunity for an aspiring data scientist to learn the ropes. Once you have a handle on your day-to-day responsibilities, a company like this can be a great environment to try new things and expand your skill set.
The Company is Data-Driven
A lot of companies fall into this bucket. In this type of role, you’re probably joining an established team of other data scientists.
Generally, these companies are either looking for generalists or looking to fill a specific niche where they feel their team is lacking, such as data visualization or machine learning. Some of the more important skills when interviewing at these firms are familiarity with tools designed for ‘big data’ and experience with real-life datasets.
Despite these differences, job postings for all of these scenarios would likely say “Data Scientist,” so look closely at the job description for a sense of what kind of team you’ll join, what kinds of challenges you’ll face, what kind of opportunities for growth you’ll have, and what sorts of skills you’ll need to develop.