What It Means to Be a Data Scientist Today
By Yao-Nan Chen, Machine Learning Scientist, Appier
Unless you have been hibernating under a rock for a few years now, you already know that explosive growth in the volume of available data is disrupting business as we know it. This data can be a goldmine for businesses that know how to capture, analyze and use it to power artificial intelligence (AI) technology. And that’s where data science and my role come in.
IBM has predicted that demand for data scientists will increase by 28 percent by 2020. The Harvard Business Review, way back in 2012, said that being a data scientist is the sexiest job of the 21st century.
I have been working in data science since 2013 and I still come into work at Appier each day eager to solve new problems.
What Data Scientists Actually Do
Simply put, data science involves using data to generate solutions that solve practical, real-world problems. In the business world, examples revolve around AI-powered solutions, such as pushing recommendations to users based on their demographics or usage patterns, or analyzing why sales of a particular product are dropping.
Data scientists set out to solve such problems by first extracting and consolidating data, which we then analyze for patterns and trends. We use this analysis to build predictive models, derive insights, and implement proofs of concept to test the proposed solution to the problem at hand. The problems that we work on are very specific and often have no single standard solution. Hence, data scientists are tasked with thinking outside the box to come up with a variety of possible solutions.
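As a rough illustration of the workflow described above (consolidate data, look for a trend, build a simple predictive model), here is a minimal Python sketch. The data sources, the weekly figures, and the function names `consolidate` and `fit_trend` are all hypothetical, and a straight-line fit stands in for whatever model a real project would call for.

```python
def consolidate(*sources):
    """Merge several {week: units_sold} mappings, summing overlapping weeks."""
    merged = {}
    for source in sources:
        for week, units in source.items():
            merged[week] = merged.get(week, 0) + units
    return dict(sorted(merged.items()))


def fit_trend(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical weekly unit sales from two separate channels.
online = {1: 40, 2: 38, 3: 35, 4: 33}
in_store = {1: 60, 2: 58, 3: 57, 4: 53}

# Step 1: consolidate the sources into one series.
weekly = consolidate(online, in_store)

# Step 2: fit a trend line; a negative slope means sales are falling.
slope, intercept = fit_trend(list(weekly.items()))

# Step 3: use the fitted line as a very simple predictive model.
forecast_week_5 = slope * 5 + intercept
```

A sketch like this only serves as a proof of concept. Whether the forecast is useful becomes clear only after it is compared against what actually happens in week 5.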
The impact of our solutions is known only once they are implemented. Often, if a solution fails to meet the desired outcome, we have to go back to the drawing board and start over. But this just adds to the challenge and the excitement of trying to pin down that elusive solution and make it work.
What Makes for a Good Data Scientist
Of course, every job has some less lovable bits and the burden of the data scientist is data cleaning! In most cases, the data we gather is ‘dirty’, with errors and discrepancies in it. For example, data showing that sales of a product have dropped dramatically may simply mean that malfunctioning machines have failed to capture the data accurately.
Most data scientists will agree that data cleaning is the most boring part of this job. Our inside joke is that data science is 80 percent cleaning of data and 20 percent complaining about it!
Jokes aside, data cleaning is painstaking but important work. If it is not done right, the accuracy and reliability of the resulting insights can suffer badly.
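To make the cleaning step concrete, here is a minimal Python sketch of one possible check: flagging daily sales readings that are missing or implausibly low compared with the other days, so that someone can verify whether the drop is real or a data-capture fault. The function name `flag_suspect_readings`, the `drop_ratio` threshold of 0.2, and the sample figures are assumptions made for illustration, not a standard rule.

```python
from statistics import median


def flag_suspect_readings(daily_sales, drop_ratio=0.2):
    """Return indices of readings that are missing (None) or that fall
    below drop_ratio times the median of the other days' readings."""
    suspects = []
    for i, value in enumerate(daily_sales):
        if value is None:
            # A missing reading always needs a manual check.
            suspects.append(i)
            continue
        others = [v for j, v in enumerate(daily_sales)
                  if j != i and v is not None]
        if others and value < drop_ratio * median(others):
            suspects.append(i)
    return suspects


# Hypothetical daily unit sales; day 3 looks like a capture fault
# and day 4 is missing entirely.
sales = [120, 115, 130, 4, None, 125, 118]
suspect_days = flag_suspect_readings(sales)
```

Flagged readings are candidates for review, not automatic deletions. A genuine sales collapse and a malfunctioning machine look the same in the raw numbers, which is why this step needs human judgment.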
Aside from this kind of assiduity and attention to detail, a good data scientist, no matter how technically strong, must also have a thorough understanding of the business domain and the organization’s business goals. Our solutions have to be creative, but also useful and practical.
Keeping Up with the Latest Research
In this context, keeping up with the latest research in machine learning helps us stay on top of trends and monitor breakthroughs on specific problems. We don’t need to reinvent the wheel: if a particular problem has been solved before, we can build on that work.
I regularly read papers on advances in machine learning, as well as in the specific domains that I am interested in.
It’s equally important to engage in discussions with peers, keep track of their recent research and poll their opinions on machine learning trends. This will help you keep abreast of all that is happening in this area.
Growing Demand for AI Expertise
Unfortunately, there is a gap between the growing demand for data scientists and the supply of talent in the area. AI is a new track and there is a shortage of people with the required expertise. What widens the gap is that not every data scientist is a good business person. They may be stellar at solving problems in an academic or research-based environment, but often fall short when it comes to real-world business problems.
Data scientists today must constantly evolve their skill sets. As the adoption of AI and deep learning grows, we are automating lower-level tasks and moving on to more complex problems. We already have some mature tools that can be used to build simple models for many business cases, and these are becoming simpler to use.
In the near future, data scientists will be required to know how to make use of problem-specific information. As AI becomes more complex, data scientists will need to work on more abstract problems and leave simple processing and analysis to automation software.
About the author:
Yao-Nan Chen is a Machine Learning Scientist at Appier. He has more than five years of experience in machine learning, data science and data engineering, and three years of experience with practical e-commerce recommendation systems. Prior to joining Appier, he worked at Yahoo Taiwan on e-commerce recommendation systems, app notification recommendation systems, and model tuning for sales volume prediction, among other projects.