Simple Note for Python Beginners: Accessing Elements of a Pandas DataFrame

Developers experienced in languages such as Java or C# may need some time to discover the benefits of Python libraries. In my case, I was used to writing complex iterative loops spanning many lines of code, so I was really surprised and impressed by the functionality of the Pandas DataFrame. Here is a snippet from my latest code:

df_grouped = df[['user_id', 'r1']].groupby(['user_id', 'r1']).agg({'r1':["count"]}).reset_index().groupby('user_id')

With a single line of code, I selected the relevant columns, grouped the data by two columns, counted the rows in each (user_id, r1) group by aggregating on the second column, reset the index, and finally grouped the result again by user_id. You can extend this single line by chaining further operations. It is not only simple; it also typically runs much faster than an equivalent hand-written Python loop, because pandas performs the grouping and aggregation in optimized, vectorized code.
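To make the one-liner concrete, here is a minimal sketch on a small, made-up DataFrame (the user_id and r1 values are hypothetical, not from the original data). Assuming r1 has no missing values, counting r1 inside each (user_id, r1) group gives the same numbers as counting rows, so this version uses size(), which also avoids the two-level column header that agg({'r1': ["count"]}) produces.

```python
import pandas as pd

# Hypothetical toy data: each row is one r1 value recorded for a user.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "r1": [5, 5, 3, 4, 4, 4],
})

# Count the rows for every (user_id, r1) pair; size() returns a Series
# indexed by the two grouping columns, and reset_index turns it back
# into a flat DataFrame with a "count" column.
counts = df.groupby(["user_id", "r1"]).size().reset_index(name="count")

# Group the counts again by user, as the one-liner does.
per_user = counts.groupby("user_id")

for user_id, group in per_user:
    # tolist() converts NumPy scalars to plain Python ints for printing.
    print(user_id, dict(zip(group["r1"].tolist(), group["count"].tolist())))
# 1 {3: 1, 5: 2}
# 2 {4: 2}
# 3 {4: 1}
```

Each element of per_user is a (user_id, sub-DataFrame) pair, so further per-user processing can be chained onto it or looped over it.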

The DataFrame also provides very useful indexers such as iloc and loc. Both let you select rows (and columns) through the index; the basic difference is that iloc takes integer positions as arguments, while loc takes index labels (names). For more information, the following article has good examples of index-level operations on Pandas DataFrames:

Using iloc, loc, & ix to select rows and columns in Pandas DataFrames
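The difference between the two indexers is easiest to see when the labels differ from the positions. The sketch below uses a hypothetical three-row DataFrame with a string index. It uses only iloc and loc: the ix indexer mentioned in the article title was deprecated and then removed in pandas 1.0.

```python
import pandas as pd

# Hypothetical DataFrame with a string index, so labels differ from positions.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid"], "age": [31, 25, 42]},
    index=["a", "b", "c"],
)

# iloc: 0-based integer positions; slices exclude the end point.
first_row = df.iloc[0]          # row at position 0 (Ann)
first_two = df.iloc[0:2]        # positions 0 and 1
cell_by_pos = df.iloc[1, 1]     # row 1, column 1 -> 25

# loc: index labels and column names; label slices include the end point.
row_b = df.loc["b"]                  # row labelled "b" (Bob)
b_to_c = df.loc["b":"c", "name"]     # labels "b" through "c", column "name"
cell_by_label = df.loc["c", "age"]   # -> 42

# loc also accepts a boolean mask for the rows.
over_30 = df.loc[df["age"] > 30, "name"]   # Ann and Cid
```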


Text-based Classification – GitHub Repo

In recent years, I have worked on several text classification challenges, such as predicting people’s votes from their social media posts, predicting the relevance of content to a context, predicting the best possible answer to a question in a chatbot system, and predicting the real age and gender of telecom subscribers from their network behaviour. Sharing knowledge is the best thing ever, and that’s why I always commit my latest code to my GitHub repo. Feel free to use all of it, contribute, and push new changes to make the code more stable and powerful.

Topic Discovery – Latent Dirichlet Allocation

Topic discovery is an important research area, and one of the most important algorithms in this field is Latent Dirichlet Allocation (LDA), an unsupervised learning algorithm based on a generative statistical model. It was introduced by David M. Blei, now a professor at Columbia University, together with his co-authors Andrew Ng and Michael I. Jordan, in a research paper published in 2003 that gives the details of the algorithm. Since then, many variations of the algorithm have been developed for different needs, but I have mostly focused on LDA variants that can discover topics in short texts, such as tweets. In this post, I will share the prominent extensions of LDA.
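To illustrate what LDA actually estimates, here is a minimal pure-Python sketch of standard LDA fitted with collapsed Gibbs sampling. This is a widely used inference method for LDA; the 2003 paper itself used variational inference, and this is not one of the short-text extensions. The function name lda_gibbs, the hyperparameter defaults (alpha, beta, iterations, seed) and the toy documents are all hypothetical choices for illustration. Each token's topic is repeatedly resampled from its conditional distribution given all other assignments. The result is a word distribution per topic (phi) and a topic mixture per document (theta).

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (hypothetical helper).

    docs: list of tokenized documents (lists of words).
    Returns (phi, theta): per-topic word distributions (dicts) and
    per-document topic distributions (lists).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    topics = range(num_topics)

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [defaultdict(int) for _ in topics]
    n_k = [0] * num_topics

    # Start by assigning every token a random topic.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Put the token back with its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed point estimates from the final counts.
    phi = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in topics
    ]
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    return phi, theta


# Usage on four tiny, hypothetical "short texts".
docs = [
    "apple banana fruit apple".split(),
    "banana fruit juice fruit".split(),
    "goal match team goal".split(),
    "team match score goal".split(),
]
phi, theta = lda_gibbs(docs, num_topics=2, iterations=300)
for k, topic in enumerate(phi):
    top_words = sorted(topic, key=topic.get, reverse=True)[:3]
    print("topic", k, top_words)
```

Very short documents give each per-document topic count little evidence, which is one reason short-text variants of LDA exist.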




Machine Learning Milan – Meetup

Today, I attended the Meetup event Machine Learning in Milan, hosted by Marcosca in Via Bligny. It was the group's first meeting, so only a few people knew each other, and it was a great opportunity to meet people with different backgrounds. The purpose of the group is to create a Machine Learning community in Milan by organizing meetups and events, and then to accelerate this ecosystem with startups and investors. I feel lucky that the event took place while I was in Milan for personal matters. I thank everybody who helped organize it. The next meetup will be on 27 October; you can find the details in the Meetup app.

Data Management Platforms (DMP)

Today, digital companies work extensively to reach potential customers on the web. Currently, companies and consulting firms build their digital marketing strategies separately for each channel: social media, paid media, and their owned media.

There are many marketing tools available, and some of the most prominent are Tealium and the Adobe Marketing Cloud tools. Data Management Platforms (DMPs) additionally enable more efficient targeting with the help of third-party data. Today, the trend is to upload customer data to DMPs in order to earn revenue from other companies' commercial campaigns. For example, if you want to target mothers who have had a child in the last 3 months, you can reach this specific segment through a DMP even though you don’t have any data about mothers yourself.

You can find the current players in the DMP business, such as Oracle BlueKai, at this link.