A Great Open-Source Visualization Framework: RawGraphs.io

The DensityDesign Research Lab at Politecnico di Milano has created a wonderful framework for building interactive visualizations. Of the many chart types it offers, my favorite is probably the alluvial diagram, which lets you track how items move between defined categories. It differs from the popular Sankey diagram in that you don’t have to link your data to a new category; you may link a source category to the same category as well. As far as I know, Plotly does not offer this feature.

If you want, you can easily clone the RawGraphs project from GitHub and then use its features from your own local web server.

Below is one of my latest visualizations, which shows how the pattern changes across two categories.

 

Simple Note for Python Beginners: Accessing Elements of Pandas DataFrame

For developers experienced in languages such as Java or C#, it takes some time to discover the benefits of Python libraries. In my case, having been used to writing complex iterative loops over many lines of code, I was really surprised and impressed by what the Pandas DataFrame can do. Here is a snippet from my recent code.

df_grouped = df[['user_id', 'r1']].groupby(['user_id', 'r1']).agg({'r1':["count"]}).reset_index().groupby('user_id')

With a single line of code, I select the relevant columns, group the data by two columns, count the rows of each group via the second column, and finally group the result again by the first column. You can extend this line by chaining further operations. It is not just concise: because Pandas runs these operations in optimized, vectorized code, it is usually much faster than an equivalent hand-written Python loop.

The DataFrame also comes with very useful indexers, such as iloc and loc. Both give you access to rows through the index; the basic difference is that iloc takes integer positions as its argument, while loc takes index labels. For additional information, the article below has good examples of index-level operations on Pandas DataFrames.
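To make the steps of that chain easier to follow, here is a minimal sketch on toy data. The rows and the extra "other" column are hypothetical; only the column names user_id and r1 come from the snippet above. The sketch also swaps .agg({'r1': ["count"]}) for .size(), which counts the rows of each group and yields a flat "count" column, so it is a simplified variant rather than the exact line from my code.

```python
import pandas as pd

# Hypothetical toy data; only 'user_id' and 'r1' come from the original snippet.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "r1": ["a", "a", "b", "a", "c"],
    "other": [10, 20, 30, 40, 50],
})

# Step 1: keep only the relevant columns.
subset = df[["user_id", "r1"]]

# Steps 2 and 3: group by both columns and count the rows in each (user_id, r1) pair.
# size() is used here instead of agg({'r1': ['count']}) to get a flat column name.
counts = subset.groupby(["user_id", "r1"]).size().reset_index(name="count")

# Step 4: group again by user, so each user's counts can be handled together.
per_user = counts.groupby("user_id")

for user_id, group in per_user:
    print(user_id, group[["r1", "count"]].to_dict("records"))
```

Splitting the chain into named steps like this is only for readability; the operations can be chained back into a single expression.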

Using iloc, loc, & ix to select rows and columns in Pandas DataFrames
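As a quick illustration of the difference, here is a small sketch on a hypothetical DataFrame (the names and scores are made up). It only uses iloc and loc; ix, which appears in the article title, was removed in pandas 1.0, so it is left out.

```python
import pandas as pd

# Hypothetical data with string labels as the index.
df = pd.DataFrame(
    {"score": [0.9, 0.4, 0.7]},
    index=["alice", "bob", "carol"],
)

# iloc selects by integer position: the first row, whatever its label is.
first_by_position = df.iloc[0]

# loc selects by index label: the row labelled "alice".
first_by_label = df.loc["alice"]

# Slices behave differently: iloc excludes the end position,
# while loc includes the end label.
two_rows_by_position = df.iloc[0:2]
two_rows_by_label = df.loc["alice":"bob"]

print(first_by_position["score"], first_by_label["score"])
print(list(two_rows_by_position.index), list(two_rows_by_label.index))
```

When the index consists of integers that are not in order, the two indexers can return different rows for the same number, which is a common source of confusion for beginners.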

 

Text-based Classification – GitHub Repo

In recent years, I have worked on several text classification challenges, such as predicting people’s votes from their social media posts, predicting the relevance of a piece of content to a context, predicting the best possible answer to a question in a chatbot system, and predicting the real age and gender of telecom subscribers from their network behaviour. Sharing knowledge is the best thing ever, and that’s why I always commit my latest code to my GitHub repo. Feel free to use all of it, contribute, and push changes to make the code more stable and powerful.

Do you keep a diary?

It was back in primary school: our teacher asked each of us to buy a notebook and use it as a diary. Although it made little sense to me at the time, I wrote hundreds of pages beginning with the words “Dear diary”. In later years, too, I felt the need to keep a diary from time to time. Now, in a digitalizing world where tablets have replaced paper, it would have been surprising if the diary had not gone digital as well. It turns out I discovered the Penzu app very late; it is a company from Canada that has been operating since 2008. Although I can’t help wondering what kind of analytics they run on the hundreds of thousands of entries they have collected (an occupational hazard), this time I am choosing to stay on the customer side and write in my diary every day. One of its simple but nicest features is the reminder email it sends every day at 6 pm, exactly when I finish work and leave the office. While heading home by tram and metro, producing the text from a voice recording and uploading it to Penzu takes me 5 minutes. I don’t know whether I will one day open the app and read my old entries, but simply pouring myself, what I have done, and my thoughts into a diary is psychologically relieving in itself. I recommend giving it a try.

Nourishing the Public with Culture and Art

One of the best things about living in Italy is that museums are free to all visitors on the first Sunday of every month. This way everyone, young and old, can make the most of their Sunday. The range of people I saw today in particular gave me a lot to think about. For example, I saw couples in their twenties as well as ladies and gentlemen in their sixties carefully studying the paintings in the museum. This practice in Italy is one of the best examples of how positively an administrative decision (probably by the Ministry of Culture) can affect society. Why not have something similar in Turkey?

By the way, Gallerie d’Italia in Piazza della Scala has taken first place among the museums I have enjoyed visiting. I definitely recommend a visit.

Protect your WordPress site against DDoS and brute-force attacks, and other security issues

While checking the web analytics logs of my website, I discovered a large number of brute-force attack attempts originating from Brazil and France. Until today, I was using only Akismet Anti-Spam to prevent spam comments, but it wasn’t enough. I then found a solution by adding new security plugins that are already used by 100K+ WordPress users. To be honest, I am really surprised that I have only just run into this problem. Here is a list of the active security plugins on my site. You may consider using them.

1. Akismet Anti-Spam: Used by millions, Akismet is quite possibly the best way in the world to protect your blog from spam. Your site is fully configured and being protected, even while you sleep.
2. Anti-Spam by CleanTalk: Max power, all-in-one, no Captcha, premium anti-spam plugin. No comment spam, no registration spam, no contact spam, protects any WordPress forms.
3. Anti-Malware Security and Brute-Force Firewall: This anti-virus/anti-malware plugin searches for malware and other virus-like threats and vulnerabilities on your server and helps you remove them. In its author’s words, it is “always growing and changing to adapt to new threats”.
4. Protection against DDoS
5. Stop User Enumeration: User enumeration is a technique used by hackers to get your login name if you are using permalinks. This plugin stops that.
6. WP Security Optimizer: Protects your site from vulnerability scanners and hackers.


How do we measure the correlation of time series? Pearson correlation analysis

When you notice that two time series follow a similar trend, you may want to measure how strongly they are correlated. The Pearson correlation coefficient is one of the most widely used metrics for this, across the sciences. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s. Pearson’s correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. (Source: Wikipedia) If you are interested in knowing more, this paper may be relevant to you. For those who want to calculate the Pearson value, the SciPy library provides a function named “pearsonr”. Alternatively, the NumPy library has a function named “corrcoef”. Here is my example:

(1) It is easy to see that the two plots follow a similar trend, even though the scales of their y values are different.

(2) Using the pearsonr function of the SciPy library, we calculate the Pearson correlation coefficient. Here is the Python code.

from scipy.stats import pearsonr
import numpy as np
import sys

if __name__ == "__main__":
    list1 = [241, 69, 72, 143, 128, 68, 126, 82, 126, 108, 68, 90, 81, 60, 72, 93, 80, 97, 65, 74, 71]
    list2 = [621711, 190310, 204282, 319612, 367879, 200600, 329108, 226406, 399833, 253989, 233108, 301069, 257548,
             206579, 255322, 268418, 279106, 304694, 216643, 236923, 254406]

    # Both series must contain the same number of elements.
    if len(list1) != len(list2):
        print("error: the two series should contain the same number of elements")
        sys.exit(1)

    # SciPy: returns the coefficient together with its p-value
    print("scipy result: ", pearsonr(list1, list2))

    # NumPy: returns the 2x2 correlation matrix; the coefficient is the off-diagonal entry
    print("numpy result: ", str(np.corrcoef(list1, list2)))

(3) As a result, we see that the two series are highly correlated, with a Pearson coefficient of 0.94. (Pearson’s r takes a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.)

(4) You may also want to see how the coefficient is affected when a single value in the series changes. To find out, change the last element of list1 from 71 to 710.

(5) You will observe that the Pearson score drops sharply, from 0.94 to 0.21.
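To connect the definition (covariance divided by the product of the standard deviations) with the experiment in steps (4) and (5), here is a minimal sketch that computes the coefficient by hand with NumPy and repeats the single-value change. The helper name pearson is my own choice for illustration, not a library function, and the sketch is meant as a cross-check rather than a replacement for pearsonr or corrcoef.

```python
import numpy as np


def pearson(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations.

    Hypothetical helper for illustration. The 1/n factors of the covariance
    and of the standard deviations cancel, so plain sums are used.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


list1 = [241, 69, 72, 143, 128, 68, 126, 82, 126, 108, 68, 90, 81, 60, 72, 93, 80, 97, 65, 74, 71]
list2 = [621711, 190310, 204282, 319612, 367879, 200600, 329108, 226406, 399833, 253989, 233108, 301069, 257548,
         206579, 255322, 268418, 279106, 304694, 216643, 236923, 254406]

# Coefficient for the original series.
r_original = pearson(list1, list2)

# Same series, but with the last element of list1 changed from 71 to 710.
list1_outlier = list1[:-1] + [710]
r_outlier = pearson(list1_outlier, list2)

print("original:", round(r_original, 2))
print("with outlier:", round(r_outlier, 2))
```

A single extreme value shifts the mean and inflates the variance of its series, which is why one changed element is enough to pull the coefficient down so far.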