Original Image by Gerd Altmann from Pixabay

Having worked as a data scientist for a few years, I’ve heard a number of myths and misconceptions about data scientists. Today, I’d like to share the top 5 myths I hear most often. Hopefully, breaking down these myths will give you some facts about what data scientists are really like.

Kindly refer to my YouTube channel for detailed information!

1. Data Scientists are well-paid

Well… it depends on the industry you work in, the state you live in, and your years of experience. …


Have you been invited to Clubhouse? This is the greeting I’ve heard most often from my friends recently, probably as often as “How’s it going?”. Clubhouse, the exclusive social networking app, became famous after Elon Musk spoke on the platform. Today, let’s talk about this app and discuss why it has become so popular recently.

Why has Clubhouse become so popular?

Founded by Paul Davison and Rohan Seth in March 2020, Clubhouse is a voice-based application that provides a virtual space where people around the world can share their thoughts and find networking opportunities. The time when this social networking app was founded…


Figure 1. Five Must-Haves to Be A Data Scientist

(Kindly refer to my YouTube channel for detailed info on this topic!)

Being a data scientist is probably many people’s dream, as it sounds fascinating and sexy. Indeed, as a data scientist, I learn a lot every day and enjoy the process of transforming data into useful information. Today, I’d like to talk about the 5 must-haves for being a data scientist in the real world. They consist of:

  • 2 Professional knowledge
  • 2 Programming skills
  • 1 Soft skill

Professional Knowledge

To anyone who wants to be a data scientist, I always recommend building a solid understanding of machine learning (must-have #1) and…


Figure 1. Job hunting & resume (Image by VIN JD from Pixabay)

Over the past few years, I’ve worked on my own resume a lot, proofread my friends’ resumes, and interviewed dozens of people. I’d like to make a small contribution by sharing my two cents based on these experiences. Today, I’ll share with you the five critical mistakes I’ve seen so far.

Typos and Grammatical Errors

First of all, I’d like to bring up the most common and fatal mistake I’ve seen so far: typos and grammatical errors.

If you are hunting for a professional job, you…


Figure 1. PC: outsourceworkers.com.au

I bet y'all have heard (a lot) about machine/statistical learning (abbreviated as ML/SL) in your daily life over the past several years. So, how would you define ML/SL? If you needed to briefly introduce ML/SL to complete newbies, what would you talk about?

Let's start with the term "statistics". Statistics are numerical summaries used to describe sample data. Statistical learning, literally speaking, means learning from data in statistical ways. From my point of view, ML/SL is a vast set of (programming/statistical) tools that help us understand data.
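To contrast a statistic with statistical learning, here is a minimal sketch using only Python's standard library. The data are made up, and fitting a least-squares line is just one simple example of learning from data, not a definition of ML/SL.

```python
from statistics import mean, stdev

# Made-up sample: x is any input feature, y is the quantity we care about.
xs = [1, 2, 3, 4]
ys = [2.0, 4.1, 5.9, 8.0]

# Statistics: numerical summaries that describe the sample itself.
y_mean = mean(ys)
y_std = stdev(ys)  # sample standard deviation (n - 1 denominator)

# Statistical learning (one simple instance): fit y = a + b*x by least squares,
# then use the fitted line to predict y for an x we have not seen.
x_mean = mean(xs)
b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
a = y_mean - b * x_mean


def predict(x):
    """Predict y for a new x using the fitted line."""
    return a + b * x


print(f"mean={y_mean:.2f}, std={y_std:.2f}")
print(f"fitted line: y = {a:.2f} + {b:.2f}x, prediction at x=5: {predict(5):.2f}")
```

The mean and standard deviation only describe the four observed points, while the fitted line generalizes from them to make a prediction for a new input.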

Then, how does ML/SL really learn from data? Based on whether we…


To recall what we’ve done with avocado prices so far, please check the previous post here.

So… here’s a quick recap of the assumptions we made last time.

  1. AveragePrice varies across regions (which suggests that region plays a critical role in predicting AveragePrice), and the AveragePrice of conventional avocados rose from 2015 to 2018 regardless of region.
  2. Organic avocados are more expensive than conventional ones.
  3. The AveragePrice of avocados is affected by year, region, and type.
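A quick way to sanity-check assumptions like these is to compare group averages. The sketch below uses a handful of made-up rows whose field names follow the dataset's columns (year, region, type, AveragePrice); it is an illustration, not the analysis from the previous post. With pandas, `df.groupby("type")["AveragePrice"].mean()` gives the same kind of summary.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple


class Row(NamedTuple):
    """One record, with fields named after the Avocado Prices columns."""
    year: int
    region: str
    type: str
    AveragePrice: float


# Made-up rows for illustration only; they are not taken from the dataset.
rows = [
    Row(2015, "Albany", "conventional", 1.00),
    Row(2015, "Albany", "organic", 1.50),
    Row(2016, "Albany", "conventional", 1.10),
    Row(2016, "Albany", "organic", 1.60),
    Row(2017, "Boston", "conventional", 1.30),
    Row(2017, "Boston", "organic", 1.90),
    Row(2018, "Boston", "conventional", 1.40),
    Row(2018, "Boston", "organic", 1.80),
]


def group_mean(rows, key):
    """Average AveragePrice for each group defined by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row.AveragePrice)
    return {k: mean(v) for k, v in groups.items()}


# Assumption 1: prices differ by region, and conventional prices rise by year.
by_region = group_mean(rows, key=lambda r: r.region)
conventional_by_year = group_mean(
    [r for r in rows if r.type == "conventional"], key=lambda r: r.year
)

# Assumption 2: organic vs. conventional.
by_type = group_mean(rows, key=lambda r: r.type)

print(by_region)
print(conventional_by_year)
print(by_type)
```

Group means only describe the data; they hint at which features matter but do not prove an effect, which is why the next step is building models.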

With the above three assumptions, let’s build a couple of models to see how different machine/statistical learning models predict the average prices of…
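The models used in the post are not shown in this excerpt, so the sketch below is a hypothetical stand-in: it compares two deliberately simple models on made-up rows, a global-mean baseline and a model that predicts the average price of each (region, type) group. Fitting on 2015 to 2017 and evaluating on 2018 with mean absolute error (MAE) are also assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up (year, region, type, AveragePrice) rows, split by year:
# fit on 2015-2017, evaluate on 2018. Not taken from the real dataset.
train_rows = [
    (2015, "Albany", "conventional", 1.00),
    (2015, "Albany", "organic", 1.50),
    (2016, "Boston", "conventional", 1.20),
    (2016, "Boston", "organic", 1.80),
    (2017, "Albany", "conventional", 1.10),
    (2017, "Albany", "organic", 1.60),
    (2017, "Boston", "conventional", 1.30),
    (2017, "Boston", "organic", 1.90),
]
test_rows = [
    (2018, "Albany", "conventional", 1.15),
    (2018, "Boston", "organic", 1.95),
]


def fit_global_mean(rows):
    """Model A: ignore every feature and predict the overall training mean."""
    overall = mean(price for *_, price in rows)
    return lambda row: overall


def fit_region_type_mean(rows):
    """Model B: predict the training mean of the row's (region, type) group,
    falling back to the overall mean for groups not seen in training."""
    groups = defaultdict(list)
    for _, region, avo_type, price in rows:
        groups[(region, avo_type)].append(price)
    group_means = {key: mean(prices) for key, prices in groups.items()}
    overall = mean(price for *_, price in rows)
    return lambda row: group_means.get((row[1], row[2]), overall)


def mae(model, rows):
    """Mean absolute error between predictions and actual AveragePrice."""
    return mean(abs(model(row) - row[3]) for row in rows)


mae_global = mae(fit_global_mean(train_rows), test_rows)
mae_region_type = mae(fit_region_type_mean(train_rows), test_rows)
print(f"global mean MAE:   {mae_global:.3f}")
print(f"region + type MAE: {mae_region_type:.3f}")
```

On these made-up rows, the region + type model has the lower error, which is consistent with assumptions 2 and 3; a baseline like the global mean is useful because any real model should beat it.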


Figure 1. Avocado (PC: Pixels.com)

This post uses the Kaggle dataset Avocado Prices as a data science demonstration. After reading this blog, you'll know...

  1. How to describe data with Python.
  2. How to choose a machine/statistical learning model based on the data distribution.
  3. How to evaluate the performance of the model.

As there are a lot of details to go through in these three topics, I'll divide the post into three parts; this part covers how to use Python to describe the data.

  • Required packages:
    pandas, os, matplotlib, seaborn (use pip install to get these packages…
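The post relies on pandas for this step. As a dependency-free sketch of the same idea, the code below computes the kind of summary `pandas.DataFrame.describe()` prints (count, mean, std, min, max) plus per-category counts, using only the standard library. The CSV rows are made up, and the column names are assumed to mirror a subset of the Kaggle file.

```python
import collections
import csv
import io
import statistics

# Made-up rows; the column names are assumed to mirror a subset of the
# Kaggle avocado.csv columns (the real file has more columns and rows).
RAW = """Date,AveragePrice,Total Volume,type,year,region
2015-12-27,1.30,60000.0,conventional,2015,Albany
2015-12-27,1.80,1000.0,organic,2015,Albany
2016-12-25,1.50,70000.0,conventional,2016,Albany
2016-12-25,1.90,2000.0,organic,2016,Albany
"""


def describe(rows, numeric_cols):
    """Count, mean, sample std, min and max per numeric column."""
    summary = {}
    for col in numeric_cols:
        values = [float(row[col]) for row in rows]
        summary[col] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),  # n - 1 denominator, like pandas
            "min": min(values),
            "max": max(values),
        }
    return summary


rows = list(csv.DictReader(io.StringIO(RAW)))
stats = describe(rows, ["AveragePrice", "Total Volume"])
for col, s in stats.items():
    print(col, {k: round(v, 3) for k, v in s.items()})

# Categorical column: count rows per value (similar to value_counts()).
type_counts = collections.Counter(row["type"] for row in rows)
print(type_counts)
```

With pandas installed, `pd.read_csv(...)` followed by `df.describe()` and `df["type"].value_counts()` produces comparable summaries for the full dataset.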

Chia-Hui (Alice) Liu

Hi! I hold a master’s degree from the University of Texas at Austin, and I’m passionate about machine learning, statistical analysis, and data science. Hook ’em Horns!
