Do you keep a journal?

It was back in primary school: our teacher asked each of us to buy a notebook and use it as a diary. Although I couldn't see the point at the time, I filled hundreds of pages starting with the words "Dear diary". In later years, I occasionally felt the need to keep a journal again. Today, in a world where tablets have replaced paper, it would be surprising if the diary had not gone digital as well. I discovered the Penzu app quite late; it turns out to be a Canadian company that has been operating since 2008. Although I can't help wondering what kind of analytics they run on the hundreds of thousands of entries they collect (an occupational hazard), this time I choose to stay on the customer side and write every day. One of its simplest yet nicest features is the reminder e-mail it sends every evening at 6 p.m., exactly when I finish work and leave the office. While riding the tram and metro home, dictating the entry by voice and uploading it to Penzu takes me about five minutes. I don't know whether I will ever open the app one day and reread my old entries, but simply putting myself, my actions, and my thoughts into a journal is psychologically relieving in itself. I recommend giving it a try.

Nourishing the Public with Arts and Culture

One of the best things about living in Italy is that museums are free for all visitors on the first Sunday of every month. Thanks to this, young and old alike can make the most of their Sunday. The people I saw today really made me think: couples in their twenties and aunts and uncles in their sixties were all studying the paintings in the museum with equal attention. This Italian practice is one of the best examples of how an administrative decision (probably by the Ministry of Culture) can positively affect society. Why not try something similar in Turkey?

Meanwhile, the Gallerie d'Italia on Piazza della Scala has taken first place among the museums I have enjoyed visiting. I highly recommend a visit.

Protect your WordPress site against DDoS and brute-force attacks, and other security issues

While checking the web analytics logs of my website, I discovered a large number of brute-force attack attempts originating from Brazil and France. Until today, I had only been using Akismet Anti-Spam to prevent spam comments, but that wasn't enough. I solved the problem by adding security plugins that are already used by more than 100,000 WordPress users. To be honest, I'm surprised I only ran into this problem now. Below is the list of security plugins active on my site; you may want to consider using them.

1. Akismet Anti-Spam: Used by millions, Akismet is quite possibly the best way in the world to protect your blog from spam. Your site is fully configured and being protected, even while you sleep.

2. Anti-Spam by CleanTalk: Max power, all-in-one, no Captcha, premium anti-spam plugin. No comment spam, no registration spam, no contact spam, protects any WordPress forms.
3. Anti-Malware Security and Brute-Force Firewall: This Anti-Virus/Anti-Malware plugin searches for Malware and other Virus like threats and vulnerabilities on your server and helps you remove them. It’s always growing and changing to adapt to new threats so let me know if it’s not working for you.
4. Protection against DDoS
5. Stop User Enumeration: User enumeration is a technique used by hackers to get your login name if you are using permalinks. This plugin stops that.
6. WP Security Optimizer: Protect your site from vulnerability scanner and hackers
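To illustrate what user enumeration actually looks like: on an unprotected WordPress site with pretty permalinks, a request for `/?author=1` is typically answered with a redirect to `/author/<username>/`, which leaks the login name. The sketch below is a hypothetical helper (not part of any plugin above) showing how an attacker would pull the username out of such a redirect's `Location` header:

```python
import re

def extract_author_slug(location_header):
    """Extract the username slug from a WordPress author-archive redirect.

    On an unprotected site, /?author=1 usually redirects to a URL like
    https://example.com/author/admin/ -- the last path segment is the
    login name an attacker can then feed into a brute-force attempt.
    """
    match = re.search(r"/author/([^/]+)/?$", location_header)
    return match.group(1) if match else None

# A redirect from an unprotected site leaks the "admin" login name
print(extract_author_slug("https://example.com/author/admin/"))  # admin

# A site running Stop User Enumeration blocks the request, so there is
# no author redirect and nothing to extract
print(extract_author_slug("https://example.com/"))  # None
```

This is exactly the leak that the Stop User Enumeration plugin closes by rejecting `?author=` requests before the redirect happens.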


How do we measure the correlation of time series? Pearson correlation analysis

When you discover that two time series follow a similar trend, you may want to measure how strongly they are correlated. The Pearson correlation coefficient is one of the most widely used metrics for this, across many sciences. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s. Pearson's correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. (Source: Wikipedia) If you are interested in knowing more, this paper may be relevant to you. If you just want to calculate the Pearson value, the SciPy library provides a function named "pearsonr"; alternatively, the NumPy library has a function named "corrcoef". Here is my example:
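The definition above (covariance divided by the product of the standard deviations) fits in a few lines of plain Python. This is only a sketch of the formula itself; the SciPy and NumPy calls are what you would use in practice:

```python
import math

def pearson(xs, ys):
    """Pearson r computed directly from the definition:
    covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

# Two perfectly linearly related series give r = 1.0
print(pearson([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
```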

(1) It can easily be seen that the two plots follow a similar trend, even though the scales of the y values are different.

(2) Using the pearsonr function of the SciPy library, we calculate the Pearson correlation coefficient. Here is the Python code.

from scipy.stats import pearsonr
import numpy as np
import sys

if __name__ == "__main__":
    list1 = [241, 69, 72, 143, 128, 68, 126, 82, 126, 108, 68, 90, 81, 60, 72, 93, 80, 97, 65, 74, 71]
    list2 = [621711, 190310, 204282, 319612, 367879, 200600, 329108, 226406, 399833, 253989, 233108, 301069, 257548,
             206579, 255322, 268418, 279106, 304694, 216643, 236923, 254406]

    # Pearson correlation is only defined for series of equal length
    if len(list1) != len(list2):
        print("error: the two series must contain the same number of elements")
        sys.exit(1)

    # scipy library: returns (coefficient, p-value)
    print("scipy result: ", pearsonr(list1, list2))

    # numpy library: returns the 2x2 correlation matrix
    print("numpy result: ", str(np.corrcoef(list1, list2)))

(3) As a result, we see that the two series are highly correlated, with a Pearson coefficient of 0.94. (Pearson's r takes values between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.)

(4) Alternatively, you may want to see how the value is affected when we change a single value in one of the series. To see this, change the last element of list1 from 71 to 710.

(5) You will observe that the Pearson score decreases significantly, from 0.94 to 0.21.
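Steps (4) and (5) can be reproduced directly with SciPy. The lists are the same as in the script above; a single outlier is enough to drag the coefficient down:

```python
from scipy.stats import pearsonr

list1 = [241, 69, 72, 143, 128, 68, 126, 82, 126, 108, 68, 90, 81, 60, 72,
         93, 80, 97, 65, 74, 71]
list2 = [621711, 190310, 204282, 319612, 367879, 200600, 329108, 226406,
         399833, 253989, 233108, 301069, 257548, 206579, 255322, 268418,
         279106, 304694, 216643, 236923, 254406]

r_before, _ = pearsonr(list1, list2)   # strong correlation (~0.94)

list1[-1] = 710                        # introduce a single outlier
r_after, _ = pearsonr(list1, list2)    # correlation collapses (~0.21)

print(round(r_before, 2), round(r_after, 2))
```

This sensitivity to outliers is worth keeping in mind: the Pearson coefficient measures linear relationship, and one extreme point can dominate both the covariance and the standard deviations.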

Topic Discovery – Latent Dirichlet Allocation

Topic discovery is an important research area, and one of the most important algorithms in this field is Latent Dirichlet Allocation (LDA), an unsupervised, statistics-based learning algorithm invented by Columbia University professor David M. Blei. In the research paper published in 2003 (click to view the paper), co-authored with Andrew Ng and Michael I. Jordan, he gives the details of the algorithm. Since then, many variations of the algorithm have been invented for different needs, but I have mostly focused on LDA variants capable of discovering topics in short texts, such as tweets. I will share the prominent extensions of LDA in this post.


Privacy-first onsite Data Analysis for Facebook apps

I was thinking this morning, especially after the recent Cambridge Analytica scandal, that Facebook needs a new kind of privacy-first data analysis process that does not share data with external companies. In the current system, the flow is: the user accepts the application's permission request, and the app owner collects the data on its own platform, analyzes it, sells it to another firm, and so on. Instead, the analysis should be executed under Facebook's control. In the new system, when an app requests permission, Facebook would say: "This app wants to analyze your data. We will never share the data with the app; the analysis will run on our platform; we have reviewed and audited the app's code (much like the Apple App Store review process); and we will share the result of the analysis (for example, 'you lean 80% toward the conservative party') with both you and the app owner." The app owner would also have to tell you how and for what purpose it will use this result.

Comparison of two face recognition software: Clarifai and Face++

Recently, I tried several products to extract demographic information from a profile image. My goal was to obtain age, gender, and ethnicity. I found that the prominent companies in this sector are Clarifai and Face++. I integrated my trial software with both products and found Clarifai's accuracy better than Face++'s. My reasons are:

  1. Clarifai provides a probability value for each of its predictions (e.g. predicted gender is female with 52% probability), so it is possible to discard results with a low prediction score. In contrast, Face++ does not provide that value. This is undesirable because, in binary classification, the prediction always has a result, even when its score is not very high.
  2. Clarifai correctly predicted the ethnicity of the image below as "White", while Face++ wrongly predicted it as "Black". On the other hand, Clarifai could not determine the gender correctly (female 51%, male 49%), while Face++ correctly marked it as male (we don't know its probability).
  3. The disadvantage of Clarifai is its low quota for free usage: it permits only 2,500 API calls per month for free accounts. Face++ does not specify any upper limit for free accounts; its single limitation is one API call per second.

I hope my hands-on experience with these services will help you choose the right product.
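Point 1 above is where the probability scores matter in practice. The sketch below uses a made-up, simplified response shape (not Clarifai's actual API schema) to show how low-confidence predictions can be discarded with a simple threshold:

```python
def filter_confident(predictions, threshold=0.8):
    """Keep only attribute predictions whose score clears the threshold.

    `predictions` is a hypothetical, simplified response shape:
    {attribute: (predicted_value, probability_score)}.
    """
    return {attr: value
            for attr, (value, score) in predictions.items()
            if score >= threshold}

# Hypothetical scores similar to the Clarifai results shown below
response = {
    "gender": ("female", 0.510),     # near coin-flip: discard
    "age": ("55", 0.356),            # low confidence: discard
    "ethnicity": ("white", 0.981),   # confident: keep
}

print(filter_confident(response))  # {'ethnicity': 'white'}
```

Without a probability score, as with Face++, this kind of filtering is impossible: you have to accept every prediction, including the coin-flip ones.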


Result of Clarifai: (https://clarifai.com/demo)

Gender: feminine (prob. score: 0.510), masculine(prob. score: 0.490)
Age: 55 (prob. score: 0.356)
Ethnicity (Multicultural appearance):  White: (prob. score: 0.981)

Result of Face++: (https://www.faceplusplus.com/attributes/#demo)

Gender: male
Age: 53
Ethnicity (Multicultural appearance): Black