A Useful Deep Learning Specialization Course: Structuring Machine Learning Projects

I have been taking deeplearning.ai’s specialization courses for a while, and this week I completed the second part of the series. In this course, Prof. Andrew Ng shares his experience and best practices for building efficient machine learning pipelines. Although the syllabus has no programming assignments, it includes two case studies that closely resemble real-world experiments, and you are expected to make the right decisions in these scenarios. I strongly recommend taking this short course.

Good news: I will be joining the MIT Media Lab

As of November, I have started the second year of my Ph.D. at Politecnico di Milano. Students had the opportunity to apply to MIT for a research stay abroad, and even though there were many requirements for acceptance, I was lucky enough to succeed: I have been awarded a Rocca fellowship for two semesters at MIT! It will be a wonderful experience, and I hope everything goes well so that I can learn a lot and move my current research many steps forward. I will be working in Prof. Deb Roy’s Social Machines Lab, where researchers build advanced models for understanding human behavior in online social networks.

 

A Great Open-Source Visualization Framework: RawGraphs.io

The Density Design Research Lab of Politecnico di Milano has created a wonderful library for building interactive visualizations. My favorite among its many chart types is probably the alluvial diagram, which makes it easy to track changes across the defined categories. Unlike the popular Sankey diagram, an alluvial diagram does not require you to link your data to a new category; you can also link a source category to the same category. In fact, this is a feature that Plotly does not offer.
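
To make the idea concrete, the flows an alluvial diagram draws can be thought of as counts of items that move from a category at one point in time to a category at the next, including items that stay in the same category. The sketch below is my own illustration of that data shape, not RawGraphs' internal code; the item IDs, the category names and the helper `count_flows` are all hypothetical.

```python
from collections import Counter


def count_flows(before, after):
    """Count how many items move from each category in `before` to each category in `after`.

    Hypothetical helper: `before` and `after` map an item ID to its category
    at two points in time. Items that keep their category produce a
    same-category flow (e.g. "A" -> "A"), which an alluvial diagram can draw.
    """
    flows = Counter()
    for item, source in before.items():
        if item in after:  # ignore items observed at only one time point
            flows[(source, after[item])] += 1
    return flows


# Hypothetical example: five items observed at two time points.
before = {"u1": "A", "u2": "A", "u3": "B", "u4": "B", "u5": "A"}
after = {"u1": "A", "u2": "B", "u3": "B", "u4": "A", "u5": "A"}

for (source, target), count in sorted(count_flows(before, after).items()):
    print(f"{source} -> {target}: {count}")
```

Here "u1" and "u5" stay in category A, so the A -> A flow has a width of 2; a diagram that only allows links to a new category could not show that band directly.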

You can easily clone the RawGraphs project from GitHub and then use its features from your own local web server.

Below is one of my latest visualizations, which shows a change in the pattern of two categories.

 

How do we measure the correlation of time series? Pearson correlation analysis

When you discover that your time series have a similar trend, you may want to measure how strongly they are correlated. In that case, the Pearson correlation coefficient is one of the most widely used metrics in the sciences. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s. Pearson’s correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. (Source: Wikipedia) If you are interested in learning more, this paper may be relevant to you. For those who want to calculate the Pearson value, the SciPy library provides a function named “pearsonr”; alternatively, NumPy has a function named “corrcoef”. Here is my example:

(1) It is easy to see that the two plots have a similar trend, even though the scales of their y values are different.

(2) We calculate the Pearson correlation coefficient with the pearsonr function of the SciPy library. Here is the Python code:

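from scipy.stats import pearsonr  # the old scipy.stats.stats import path is deprecated
import numpy as np
import sys

if __name__ == "__main__":
    list1 = [241, 69, 72, 143, 128, 68, 126, 82, 126, 108, 68, 90, 81, 60, 72, 93, 80, 97, 65, 74, 71]
    list2 = [621711, 190310, 204282, 319612, 367879, 200600, 329108, 226406, 399833, 253989, 233108, 301069, 257548,
             206579, 255322, 268418, 279106, 304694, 216643, 236923, 254406]

    # Pearson's r is only defined for paired series of equal length
    if len(list1) != len(list2):
        # sys.exit must be called; with a message it prints to stderr and exits with status 1
        sys.exit("error, two series should contain the same number of elements")

    # scipy library: returns the coefficient r and a two-sided p-value
    r, p_value = pearsonr(list1, list2)
    print("scipy result: r =", r, ", p-value =", p_value)

    # numpy library: returns the 2x2 correlation matrix; r is the off-diagonal entry
    print("numpy result: ", np.corrcoef(list1, list2)[0, 1])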

(3) The result shows that the two series are highly correlated, with a Pearson coefficient of 0.94. (Pearson’s r takes values between −1 and +1, where +1 means total positive linear correlation, 0 means no linear correlation, and −1 means total negative linear correlation.)

(4) You may also want to see how this value is affected when a single value in the series changes. To find out, change the last element of list1 from 71 to 710.

(5) You will observe that the Pearson coefficient drops sharply, from 0.94 to 0.21.
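
To see what pearsonr and corrcoef compute, the definition quoted above (the covariance divided by the product of the standard deviations) can also be written out by hand. The sketch below is my own minimal illustration, not the SciPy or NumPy implementation; the helper name `pearson_from_definition` is hypothetical. It checks the hand-written value against `np.corrcoef` and then repeats the outlier experiment from step (4).

```python
import numpy as np


def pearson_from_definition(x, y):
    """Hypothetical helper: Pearson's r as covariance / (std_x * std_y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("both series must contain the same number of elements")
    # Population covariance (divide by n). np.std also divides by n by default,
    # so the 1/n factors cancel; using n-1 everywhere would give the same r.
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())


list1 = [241, 69, 72, 143, 128, 68, 126, 82, 126, 108, 68, 90, 81, 60, 72, 93, 80, 97, 65, 74, 71]
list2 = [621711, 190310, 204282, 319612, 367879, 200600, 329108, 226406, 399833, 253989, 233108, 301069, 257548,
         206579, 255322, 268418, 279106, 304694, 216643, 236923, 254406]

r_original = pearson_from_definition(list1, list2)
# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
r_numpy = np.corrcoef(list1, list2)[0, 1]
print(f"by definition: {r_original:.4f}, numpy: {r_numpy:.4f}")

# Step (4): replace the last element of list1 (71) with an outlier (710).
list1_outlier = list1[:-1] + [710]
r_outlier = pearson_from_definition(list1_outlier, list2)
print(f"with outlier: {r_outlier:.4f}")
```

Because a single point far from the rest inflates the standard deviation of list1 while contributing little to the covariance (its paired value in list2 is close to the mean), one outlier is enough to pull r down, which is what steps (4) and (5) illustrate.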