I’ll be joining MIT Media Lab as a Visiting PhD Student

I’m glad to write this blog post, since I’ve just been named a Rocca fellow after being accepted by the Laboratory for Social Machines, one of the research groups of the MIT Media Lab, led by Prof. Deb K. Roy. For me, it is a great pleasure and an opportunity to advance my research capabilities.

In my Ph.D. research, I focus on developing a framework for interpreting political phenomena through Data Science techniques, and my first use case was the long-running Brexit debate. Now, I will focus on interpreting the US elections. An amazing experience awaits me, starting in February 2019.

 

 

For football fans, life without football is possible, and it’s probably better!

Football is a huge bubble that holds billions of people. People from all over the world are strongly committed to their teams, go to stadiums, watch games on TV, and spend their time reading the latest sports news and social media messages. Interestingly, I was one of these people until recently. But after a few attempts, I’ve managed to give up all football-related activities, and now I have more time to discover things I wasn’t aware of. For example, I’ve finished two great books in the last two months, and I have time to clear and refresh my mind and think more clearly. There are a lot of things I’ve started to do, and I think it’s a complete change, like quitting smoking.

 

Simple Note for Python Beginners: Accessing Elements of Pandas DataFrame

For developers experienced in languages such as Java or C#, it takes some time to discover the benefits of Python libraries. In my case, since I was used to writing complex iterative loops spanning many lines of code, I was really surprised and impressed by the functionality of the Pandas DataFrame. This is a snippet from my latest code.

df_grouped = df[['user_id', 'r1']].groupby(['user_id', 'r1']).agg({'r1':["count"]}).reset_index().groupby('user_id')

With a single line of code, I selected the relevant columns, grouped the data by two columns, counted the rows in each group, and finally grouped the result again by user_id. You can extend this line by chaining further operations. It is not only simple: because Pandas does the grouping and counting in optimized, compiled code, it usually runs much faster than an equivalent hand-written Python loop.
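To try the same chain on a small table, here is a minimal sketch. The sample data is made up, and it uses .size() instead of the .agg(...) call above, which should give the same per-pair counts but with a flat "count" column; the final "most frequent r1 per user" step is only an illustration of what the grouped result can be used for.

```python
import pandas as pd

# Made-up sample data; 'user_id' and 'r1' mirror the column names in the snippet above.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "r1": ["a", "a", "b", "a", "c"],
})

# Count how often each (user_id, r1) pair occurs.
counts = (
    df[["user_id", "r1"]]
    .groupby(["user_id", "r1"])
    .size()
    .reset_index(name="count")
)

# Group the per-pair counts again by user, as in the original line.
df_grouped = counts.groupby("user_id")

# Illustrative follow-up: the most frequent r1 value for each user
# (ties resolve to the first row encountered).
top_rows = df_grouped["count"].idxmax().tolist()
top_r1 = counts.loc[top_rows, ["user_id", "r1", "count"]]
print(top_r1)
```

Each step returns a new object, so the chain can be split across lines like this without changing what it computes.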

The DataFrame also comes with very useful indexers, such as iloc and loc. Both let you select rows through the index; the basic difference is that iloc takes an integer position as its argument, while loc takes an index label. For additional information, the following article has good examples of index-level operations on Pandas DataFrames.
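To make the difference between iloc and loc concrete, here is a minimal sketch. The data and the string index labels are made up for illustration.

```python
import pandas as pd

# Made-up data with string index labels, so position and label clearly differ.
df = pd.DataFrame(
    {"score": [10, 20, 30]},
    index=["alice", "bob", "carol"],
)

by_position = df.iloc[1]     # second row, selected by integer position
by_label = df.loc["bob"]     # row whose index label is "bob"

# Slices differ too: iloc excludes the end position, loc includes the end label.
first_two_by_position = df.iloc[0:2]
first_two_by_label = df.loc["alice":"bob"]
```

Both selections return the same row here, and both slices return the first two rows, even though the slice bounds look different.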

Using iloc, loc, & ix to select rows and columns in Pandas DataFrames

 

Text-based Classification – GitHub Repo

In recent years, I have worked on several text classification challenges, such as predicting people’s votes from their social media posts, predicting the relevance of a piece of content to a context, predicting the best possible answer to a question in a chatbot system, and predicting the real age and gender of telecom subscribers from their network behaviour. Sharing knowledge is the best thing ever, and that’s why I always commit my latest code to my GitHub repo. Feel free to use all of it, contribute, and push changes to make the code more stable and powerful.

Topic Discovery – Latent Dirichlet Allocation

Topic discovery is an important research area, and one of its most important algorithms is Latent Dirichlet Allocation (LDA), an unsupervised, statistical learning algorithm. It was introduced by David M. Blei, now a professor at Columbia University, in a research paper published in 2003 and co-authored with Andrew Ng and Michael I. Jordan, which gives the details of the algorithm. Since then, many variations of the algorithm have been proposed for different needs, but I have mostly focused on LDA variants that can discover topics in short texts, such as tweets. I will share the prominent extensions of LDA in this post.

 

 

 

Machine Learning Milan – Meetup

Today, I joined the Meetup event Machine Learning in Milan, hosted by Marcosca in Via Bligny. It was the first meeting of this group, so only a few people knew each other. It was a great opportunity to meet people with different backgrounds. The purpose of the group is to create a Machine Learning community in Milan by organizing meetups and events, and then to accelerate this ecosystem with startups and investors. I feel lucky that the event took place while I was in Milan for personal matters. I thank everybody who contributed to organizing it. The next meetup will be on 27 October. You can find the details in the Meetup app.

Data Management Platforms (DMP)

Today, digital companies work extensively to reach their potential customers on the web. In practice, companies and consulting firms build their digital marketing strategies separately for each channel: social media, paid media, and their owned media channels.

At this point, many marketing tools are available, and some of the most prominent are Tealium and the Adobe Marketing Cloud tools. Moreover, Data Management Platforms (DMPs) enable more efficient targeting with the help of 3rd-party data. Today, the trend is to upload customer data to DMPs in order to earn revenue from other companies’ commercial campaigns. For example, if you want to target mothers who had a child in the last 3 months, you can reach this specific segment through a DMP even if you don’t have any data about mothers yourself.

You can find current players in the DMP business, such as Oracle’s BlueKai product, from this link.