Hiring Data Experts? Key Takeaways from my Experience
Blog | December 1, 2021 | By Saravanan Murugan
Almost every organization is currently fighting a “Talent War”, and leaders are spending significantly more time than usual on bringing good talent into their organizations. In this article, I want to emphasize why it is important to spend time evaluating your future employees and understanding their broad-based skills. We already know from experience that there is no shortcut to this.
I will take Data Scientist hiring as an example. While some consider anyone with Python coding skills a good fit for a Data Science role, let’s look at the key skills a Data Scientist actually requires.
As per the above Skillset Venn diagram, an ideal Data Scientist should have skills at the intersection of all three circles. At a high level, there are three major skills and three overlapping skills, and all are equally important. Let us see why.
Business / Domain Skills
Any Data Science / Machine Learning project has a direct impact on the business: it will either transcend, transform or disrupt the business model. Take the example of a retailer who wants to use an ML-based system to optimize marketing spend. The Data Science expert should first understand how marketing spend decisions are made today. What key factors does the Marketing team consider before investing in campaigns? How are campaigns finalized? How are revenue, effectiveness and hence the value of the marketing cost recognized? The same goes for several other aspects of the Marketing function. Senior leaders and experts who have run this function effectively for a long period have a wealth of knowledge.
A Data Scientist should have foundational knowledge of the domain (here, the Retail domain with a Marketing function specialty) to understand the business process and business problem effectively. This skill is essential, but it is not always possible for a data scientist to bring domain skills in every area. A senior person with multi-faceted domain knowledge gained through work experience will have to build on that foundation to hone the specific functional knowledge a project needs. Usually, Data Scientists with foundational knowledge and research experience start with a literature review and a systematic approach to gain the domain skills faster.
Math, Statistics and Algorithm Skills
This is a fundamental skill expected from every Data Scientist. Every Data Science / ML project deals with data, and every form of data is represented as numbers in computers (images, text, video etc. are also represented as numbers). The primary math skills required are linear algebra, vectors, matrices, calculus and, most importantly, statistics.
Some reasons why we need them:
- Before developing any ML model, the data should be statistically tested to validate assumptions about the population
- Optimizing regression models needs linear algebra knowledge
- All distance-based algorithms need knowledge of vectors and matrices to choose the right method
- Deep neural nets and text analytics require vectors, matrices and calculus
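To make the point about distance-based algorithms concrete, here is a minimal sketch comparing two common distance measures. The customer vectors and function names are hypothetical and chosen only for illustration; the point is that the "right method" depends on whether magnitude or direction matters for the problem.

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance: sensitive to the magnitude of the vectors."""
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between the vectors."""
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical customers described by spend in two product categories.
small_basket = np.array([1.0, 2.0])
large_basket = np.array([10.0, 20.0])  # same category mix, ten times the volume
other_mix = np.array([2.0, 1.0])       # similar volume, different category mix

for name, other in [("large_basket", large_basket), ("other_mix", other_mix)]:
    print(name,
          "euclidean:", round(euclidean_distance(small_basket, other), 3),
          "cosine:", round(cosine_distance(small_basket, other), 3))
```

Euclidean distance treats the large basket as far away because of its volume, while cosine distance treats it as nearly identical because the category mix is the same. Which behaviour is wanted is a modelling decision, not a coding one.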
Depending on the type of problem, these skills become very involved and complex. For example, building an efficient CNN model for image detection means deciding the right size of kernel matrix, how many (deep) layers are required, how to reduce the hyperparameters, and how to avoid the vanishing gradient and exploding gradient problems, all of which require knowledge of matrices and differentiation. To give a sense of what we are dealing with: to detect 1000 different images accurately with a million input data points, we need a 224x224x64 input CNN with 19 layers. This model uses more than 138 million trainable parameters, with 3 different types of kernels. This network was 99% accurate and required multiple training iterations.
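As a rough illustration of where parameter counts like these come from, the sketch below applies the standard formulas for convolutional and fully connected layers. The layer shapes are illustrative assumptions and are not taken from the network described above.

```python
def conv2d_params(kernel_size, in_channels, out_channels):
    """Weights of a square kernel per output channel, plus one bias each."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels

def dense_params(in_features, out_features):
    """One weight per input/output pair, plus one bias per output."""
    return (in_features + 1) * out_features

# Assumed shapes: a 3x3 convolution over an RGB image producing 64 feature
# maps, and a dense layer fed by a flattened 7x7x512 feature volume.
first_conv = conv2d_params(kernel_size=3, in_channels=3, out_channels=64)
first_dense = dense_params(in_features=7 * 7 * 512, out_features=4096)

print("first conv layer :", first_conv)
print("first dense layer:", first_dense)
```

Under these assumed shapes, a single dense layer holds far more parameters than an early convolutional layer, which is why decisions about kernel size and depth cannot be separated from the underlying matrix arithmetic.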
Model development is a major task. It demands knowledge of algorithms such as supervised learning, unsupervised learning, deep neural nets, NLP / NLG etc. and, most importantly, the ability to choose the appropriate one for the given problem. Similarly, applying the algorithms and, where required, combining them effectively to get superior results requires good knowledge of permutations and combinations. For example, in our retail marketing case, if we want to measure marketing effectiveness, regression models combined with forecasting models can be used. This knowledge is more important for an Applied Data Scientist role, whereas math knowledge is more important for a Research Data Scientist role.
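The regression half of the retail marketing example can be sketched as follows. This is a minimal illustration on synthetic data: the channel names, the "true" effects and the noise level are all made-up assumptions, and a real project would add the forecasting component and proper validation.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic weekly spend for two hypothetical channels (search, email).
n_weeks = 104
spend = rng.uniform(10, 100, size=(n_weeks, 2))

# Assumed ground truth used only to generate the data.
true_effect = np.array([3.0, 1.5])   # revenue per unit of spend, per channel
baseline = 200.0                     # revenue with zero marketing spend
revenue = baseline + spend @ true_effect + rng.normal(0, 5, size=n_weeks)

# Ordinary least squares: add an intercept column and solve for coefficients.
X = np.column_stack([np.ones(n_weeks), spend])
coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)

print("estimated baseline     :", round(float(coef[0]), 2))
print("estimated search effect:", round(float(coef[1]), 2))
print("estimated email effect :", round(float(coef[2]), 2))
```

The fitted coefficients estimate how much revenue each unit of spend returns per channel, which is one simple way to express marketing effectiveness.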
IT / Product Engineering Skills
Of course, Python and other Data Science toolkits are part of the technical skills required. Beyond that, Data Science projects use data of large volume, high velocity and wide variety. Collecting this data efficiently, or embedding the results into various channels such as ecommerce platforms, mobile platforms, SaaS products etc., requires technology skills. Much of the actual implementation on the technology front can be handled by IT specialists, but an ML pipeline is essentially a technology ecosystem, and a good understanding of it is essential for successful implementation. For example, several real-time analytics systems, such as fraud detection and hyper-personalization systems, require seamless integration with the pipeline architecture. With the rollout of 5G, every data science implementation will soon need to support federated and multi-tenant operations with smart sensors and devices.
Minor skills overlap with at least 2 of the major skills. For example, Data Engineering is one of the minor skillsets required, and it is the overlap between Product / IT skills and Data Science skills. If your organization is in product development, where you deal with huge volumes of data and need to provide real-time analytics, you have to evaluate candidates on how they use probabilistic data models and algorithms.
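As one example of what a probabilistic data structure looks like, here is a minimal Bloom filter sketch. The choice of a Bloom filter is an assumption made for illustration, as are the class name, sizes and hashing scheme; production systems would normally rely on a tuned library implementation.

```python
import hashlib

class BloomFilter:
    """Set membership with no false negatives and a small false-positive rate."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive several positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not seen"; True means "probably seen".
        return all(self.bits[pos] for pos in self._positions(item))

# Hypothetical use: remember which user IDs have already been processed.
seen_users = BloomFilter()
processed = ["user-1", "user-2", "user-3"]
for user_id in processed:
    seen_users.add(user_id)

print(all(seen_users.might_contain(u) for u in processed))
```

The trade-off worth probing in an interview is that memory use stays fixed no matter how many items are added, at the cost of occasional false positives.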
Now, going back to the original question of finding the right talent: employers should look for candidates with strong, broad-based skills in 2 out of the 3 major areas, and invest time in assessing the candidate’s potential in the minor areas. Some general but important aspects also need to be tested. For example, one of the biggest challenges in Data Science projects is communicating the details to the business team. That is one of the reasons why storytelling is an important minor skill expected from Data Scientists. All digital-born companies use innovative ways to conduct interviews and identify candidates’ skills. Experience also shows that following this approach leads to a higher joining rate.
And last but not least, make sure the interview process is enjoyable and not a burden. In our organization, candidates have to interact with multiple people, so we make sure to collect feedback from them. We are proud that the feedback from candidates is consistently positive and that they report a good experience.
This method is not only for Data Scientists. As a Full-Stack Data Organization, we hire talent across the data landscape. Some of the key skills are:
- Data Pipeline Experts
  - Knowledge of ETL / ELT
  - Different DB types: RDBMS and NoSQL DBs such as columnar, document, KVP, graph DBs, in-memory DBs etc.
  - Scalable pipeline architecture, real-time, Lambda
- Enterprise Data Management
  - Knowledge of on-prem and cloud DWH
  - Distributed computing, Big Data architecture and ecosystems
  - Data models such as OLTP, OLAP, 3NF, Data Vault etc.
- Business Intelligence
  - Latest visualization tools
  - Data storytelling
  - Augmented analytics skills
There are other overlapping skills, such as:
- SQL, NoSQL, NewSQL skills
- Big Data Analytics (Probabilistic Structures)
- Data Quality, Governance, Cataloging, MDM, Lineage etc.