Five Must-Haves to Be A Data Scientist

Chia-Hui (Alice) Liu
4 min read · Jan 22, 2021
Figure 1. Five Must-Haves to Be A Data Scientist

(See my YouTube channel for more detail on this topic!)

Being a data scientist is a dream job for many people because it sounds fascinating and sexy. Indeed, as a data scientist, I learn new things every day and enjoy the process of transforming data into useful information. Today, I would like to talk about the five must-haves for being a data scientist in the real world. They are:

  • 2 areas of professional knowledge
  • 2 programming skills
  • 1 soft skill

Professional Knowledge

To be a data scientist, I always encourage people to build a solid understanding of machine learning (must-have #1) and of how statistics works throughout the process (must-have #2). Applying machine/deep learning is easy: all you have to do is import the package and run the functions with some pre-defined parameters. However, it is a lot more helpful if you know exactly how the algorithm works and the statistical rationale behind it.
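To make the difference between "import and run" and "know how it works" concrete, here is a minimal sketch of what a library call does under the hood for the simplest model, one-feature linear regression. The data, the variable names, and the helper functions `fit_line` and `r_squared` are made up for illustration; they are not taken from any particular package.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations: how x and y vary together.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of squared deviations of x: how much x varies on its own.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in y that the fitted line explains."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
r2 = r_squared(hours, scores, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.3f}")
```

Once you have written the formulas out once, the statistics behind the fitted numbers (covariance over variance for the slope, explained variance for R²) are much easier to interpret when a package reports them for you.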

Here, I’d like to recommend two learning resources:

  1. Machine Learning by Andrew Ng on Coursera (link)
  2. An Introduction to Statistical Learning: with Applications in R (ISBN 10: 1461471370 ISBN 13: 9781461471370)

For the first one, I think Andrew Ng’s course highlights the key points you need to build a solid foundation in machine learning. Although the assignments do not use Python, the course is still worth taking. I still rewatch it from time to time to refresh my memory.

As for the second, I bought An Introduction to Statistical Learning: with Applications in R when I was a graduate student. It is helpful because it takes baby steps to walk you through how machine/statistical learning works from a statistical point of view. Stanford also offers a free online course based on this book, which I found helpful.

In addition to these two resources, I recommend picking up some basic statistical knowledge before you start digging into the books and courses, as statistics will give you a clearer picture while you learn. To summarize, for professional knowledge you need solid foundations in:

  1. Machine Learning
  2. How statistics work in algorithms (Statistical Learning)

Programming Skills

Next, I would like to talk about the two must-have programming skills: one language for applying algorithms and another for querying data.

To build machine/deep learning models, most people use Python or R. I highly recommend mastering at least one of them, whether Python or R, because you will definitely use it a lot in the future. For querying, I hear that many people use SQL in their day-to-day work. SQL is often required in job postings, as many firms now store their data in a centralized database instead of in multiple Excel files.
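To show how the two skills fit together, here is a small sketch in which SQL does the querying and Python takes over from there. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a company's central database; the `orders` table, its columns, and the rows are all made up for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for a centralized company database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [
        ("alice", 120.0),
        ("bob", 35.5),
        ("alice", 80.0),
        ("carol", 42.0),
        ("bob", 64.5),
    ],
)

# SQL does the filtering and aggregation; Python only receives the result.
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) >= ?
    ORDER BY total_spent DESC
"""
rows = conn.execute(query, (50,)).fetchall()
conn.close()

for customer, n_orders, total_spent in rows:
    print(customer, n_orders, total_spent)
```

In a real job the connection would point at the firm's database rather than `:memory:`, but the division of labor is the same: let SQL reduce the data to what you need, then model it in Python or R.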

In terms of learning resources, I recommend the following platforms:

  • DataCamp for Python
  • W3Schools for SQL

Both websites are informative and well organized. The courses are easy to follow, and each site has a web-based IDE, so you can write some code and test it right away without installing anything on your laptop.

If you have more time to polish your programming skills, I highly encourage you to get familiar with Spark/Scala in a big data environment such as Hadoop.

Soft Skill

The last thing I’d like to talk about is the soft skill: storytelling. This skill can be hard to acquire as a non-native English speaker, yet it adds a lot of value when you present your results to other people. A good storyteller can guide the audience step by step even when they are new to the topic. Presentation skills and understanding your audience are also part of storytelling, because the way you present information can lead to very different outcomes.

You can certainly polish your soft skills. I’m currently a member of a Toastmasters club, and I’d like to share my Toastmasters journey with you. Two years ago, I was not comfortable speaking in front of people. Back then, I was asked to give a presentation to an audience I was not familiar with. My voice shook during the presentation, and I completely forgot to make eye contact. I felt ashamed and kept blaming myself for my poor, unorganized storytelling, as I covered only about 30% of the results I wanted to share. I joined a Toastmasters club to improve my public speaking, and it has helped a lot: I’m now more comfortable speaking in front of people and can structure an organized speech in a short time.

Toastmasters is an international organization. You can visit their website to find a club near you, and you can always join a session as a guest first.

That’s all for today’s 5 must-haves to be a data scientist. Let’s do a quick recap.

The five must-haves cover three aspects: professional knowledge (2), programming skills (2), and soft skills (1). Professional knowledge includes understanding machine learning and how statistics works within algorithms; Python/R and SQL are the two must-have programming languages; and storytelling is the soft skill that every data scientist should master.

Thank you so much for reading, and I hope this information is helpful to you. Please give a clap as encouragement. All the information here is also available on my YouTube channel. Stay tuned! Have a great day.
