Privacy-first onsite Data Analysis for Facebook apps

This morning I was thinking, especially after the recent Cambridge Analytica scandal on Facebook, that there should be a new kind of privacy-first data analysis process on Facebook that does not share data with external companies. In the current system, the flow is: the user accepts the application's permission request, and the app owner collects the data on its own platform, analyzes it, sells it to another firm, and so on. Instead, the data analysis task should be executed under Facebook's control. In the new system, when the app asks for permission, Facebook would tell the user: this app wants to analyze your data; we will never share the data with the app owner; the analysis will be executed on our platform; we have reviewed and vetted its code (like the Apple App Store review process); and we will share the result of the analysis with both you and the app owner (for example, that you are 80% likely to support the conservative party). The app owner would also tell you how and for what purpose it will use this result.
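To make the idea concrete, here is a minimal sketch of the proposed flow in Python. Everything in it is hypothetical: the `Platform` class, its method names, and the toy `conservative_share` analysis are my own illustration and do not correspond to any real Facebook API. The sketch only shows the shape of the contract (reviewed code runs on the platform, and only the result leaves it); a real system would also need proper sandboxing, which plain Python does not provide.

```python
import copy
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration only (not a real Facebook API).
UserData = Dict[str, List[str]]
Analysis = Callable[[UserData], Dict[str, float]]


@dataclass(frozen=True)
class AnalysisReport:
    """What leaves the platform: the result and its stated purpose, never raw data."""
    app_name: str
    purpose: str
    result: Dict[str, float]


class Platform:
    """Toy stand-in for the platform that owns the data and runs the analysis."""

    def __init__(self) -> None:
        self._user_data: Dict[str, UserData] = {}
        self._approved: Dict[str, Analysis] = {}

    def store_user_data(self, user_id: str, data: UserData) -> None:
        self._user_data[user_id] = data

    def approve_analysis(self, app_name: str, analysis: Analysis) -> None:
        # In the proposal, this step happens only after a code review.
        self._approved[app_name] = analysis

    def run_analysis(self, app_name: str, user_id: str,
                     purpose: str, user_consents: bool) -> AnalysisReport:
        if not user_consents:
            raise PermissionError("user declined the analysis request")
        if app_name not in self._approved:
            raise PermissionError("analysis code has not been reviewed")
        # The analysis sees a private copy, so it cannot modify the stored data.
        private_copy = copy.deepcopy(self._user_data[user_id])
        result = self._approved[app_name](private_copy)
        # The same report would be shown to both the user and the app owner.
        return AnalysisReport(app_name, purpose, result)


# Hypothetical page names; a real analysis would be far more involved.
CONSERVATIVE_PAGES = {"page_a", "page_b", "page_c", "page_d"}


def conservative_share(data: UserData) -> Dict[str, float]:
    """Toy analysis: fraction of liked pages that are in a fixed set."""
    likes = data.get("liked_pages", [])
    matches = sum(1 for page in likes if page in CONSERVATIVE_PAGES)
    return {"conservative": matches / len(likes) if likes else 0.0}


platform = Platform()
platform.store_user_data(
    "user_1",
    {"liked_pages": ["page_a", "page_b", "page_c", "page_d", "page_x"]},
)
platform.approve_analysis("demo_app", conservative_share)
report = platform.run_analysis("demo_app", "user_1", "research survey", True)
print(report.result)
```

The app owner only ever receives an `AnalysisReport`; the raw list of liked pages stays inside `Platform`.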

Comparison of two face recognition products: Clarifai and Face++

Recently, I tried several products for extracting demographic information from a profile image. My goal was to obtain age, gender, and ethnicity. The prominent companies in this sector are Clarifai and Face++. I integrated my trial software with both products and found Clarifai's accuracy better than that of Face++. My observations are:

  1. Clarifai provides a probability value with each prediction (for example, the predicted gender is female with a probability of 52%), so it is possible to discard results with a low prediction score. In contrast, Face++ does not provide that value. This is a problem because a binary classifier always returns a result, even when its score is not very high.
  2. Clarifai correctly predicted the ethnicity of the person in the test image (results below) as “White”, while Face++ wrongly predicted “Black”. On the other hand, Clarifai could not determine the gender reliably (female 51%, male 49%), while Face++ correctly marked it as male (although we do not know with what probability).
  3. The disadvantage of Clarifai is its low free-usage quota: free accounts get only 2,500 API calls per month. Face++ does not specify any upper limit for free accounts; its only limitation is one API call per second.
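The first point, discarding low-score predictions, can be sketched in a few lines of Python. This is only an illustration: the dictionary layout, the `confident_label` helper, and the 0.8 threshold are my own assumptions and not the actual response format of either API. The scores are the ones from the Clarifai demo output in this post.

```python
from typing import Dict, Optional, Tuple


def confident_label(scores: Dict[str, float],
                    threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return the top (label, score) pair, or None if it is below the threshold."""
    if not scores:
        return None
    label, score = max(scores.items(), key=lambda item: item[1])
    return (label, score) if score >= threshold else None


# Assumed layout: one {label: probability} dictionary per attribute.
predictions = {
    "gender": {"feminine": 0.510, "masculine": 0.490},
    "ethnicity": {"White": 0.981},
}

accepted = {attr: confident_label(scores) for attr, scores in predictions.items()}
print(accepted)
```

With this threshold the near coin-flip gender prediction is dropped while the ethnicity prediction is kept. A service that returns only the label, without a score, gives no way to apply such a filter.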

I hope my hands-on experience with these services will help you choose the right product.

 

 

Result of Clarifai: (https://clarifai.com/demo)

Gender: feminine (prob. score: 0.510), masculine (prob. score: 0.490)
Age: 55 (prob. score: 0.356)
Ethnicity (multicultural appearance): White (prob. score: 0.981)

Result of Face++: (https://www.faceplusplus.com/attributes/#demo)

Gender: male
Age: 53
Ethnicity (Multicultural appearance): Black

Converting texts to high-res images

A very inspiring piece of research was published at the end of 2016. With the help of deep learning, it is now possible to generate images from a given text description.

Here is the link to the news and here is the link to that research paper.

Can you imagine some use cases for this technology? I found an interesting one. Imagine you are at a police station after a bank robbery. The thief has not been found, and as the only eyewitness you describe what the thief looked like. As you speak, a computer automatically generates an image of the thief from the visual details you describe. At the same time, the computer improves the accuracy of that image by matching it against records of past robberies.

The Future of Education

Last month, the future of education was one of the main topics at Davos. There were very interesting debates, and in one of them Jack Ma (the founder of Alibaba) said that the current education system urgently needs to change because of the rising impact of robots. Since robots can acquire knowledge by learning from past experience, they will do most of the things people do today. To adapt to the modern world, we need to educate our children in ways that robots cannot copy. Rather than teaching mathematics or physics to our children, we should nurture their more human skills, such as music and art.

I agree with Jack Ma's ideas, and I think we need to think more about people's main advantages and disadvantages compared with robots over the next 20 years. Today, our children start learning to code in primary school in order to communicate better with robots and understand their logic. But when the world is dominated by robot activity, everything will change, and humans should be in a position where robots do not see them as a threat.

 

 

Mongo as a Document-based NoSQL Database

I have started the MongoDB developer course offered online by MongoDB University. I worked a lot with Mongo at Vodafone, but I was using only 10-20% of its key features. Now at Politecnico things are more complex, so I need to pay more attention to performance issues. In my research project, I use MongoDB to store tweets and perform text analysis over the records.

I have just completed the Week 1 material. I hope to learn more in the upcoming weeks.

Understanding Feelings and Behaviours of English People about the Brexit Referendum

These days, my research goal is to gain insights from Twitter data into how English people react to the Brexit referendum. Various studies have already been done on this topic, most of them by universities in England such as Imperial College London and the University of Bristol. I find it quite an interesting research topic, since social media is an important place for presenting our ideas to the community and more research is needed to understand people's opinions. I will give more details about my study in the upcoming weeks. If you have any recommendations for me, please feel free to send me an email.

Data Science Research Group at Politecnico di Milano

Good news: I have joined the Data Science Research Group of Politecnico di Milano as a Research Fellow. For now this is a temporary contract, since we need to see whether I can contribute well to the group's research. My advisor will be Marco Brambilla, who is known for his work in Web Science and Software Engineering.

I am very pleased to become a member (even as a temporary researcher) of this valuable group. I believe I will gain broad knowledge by doing research with my professors, and that I will be able to solve a significant research problem in my Ph.D. thesis.

You can get more information about the research group here: http://datascience.deib.polimi.it

Suggestions for new PhD Students

Today, I attended the Welcome Day for new Ph.D. students at Politecnico di Milano. There were nearly 300 of us from different backgrounds, from Architecture to Industrial Design and from Mechanical Engineering to Information Technology. The dean of the Ph.D. school gave a great presentation, and I think the following slide summarized all the possible suggestions for Ph.D. students. Thanks to everybody for their unlimited support and kindness.

 

Finalizing my Vodafone experience with Word2Vec

In our latest project at Vodafone R&D, I worked under the supervision of Prof. Gulsen Eryigit of Istanbul Technical University. Prof. Eryigit is widely regarded in the Turkish academic community as one of the most important researchers in Natural Language Processing for the Turkish language, and I was lucky to have the chance to work with Prof. Eryigit. Within a month, I implemented a predictive model based on Word2Vec using Python and the Gensim library.

Word2Vec has been one of the trending topics in Natural Language Processing since 2014. In this technique, a neural network-based model is trained on a large text corpus in order to represent words as multi-dimensional vectors. Once the predictive model was implemented, we scored it using Mean Average Precision, a well-known metric for ordered results (where the ranking of the outputs matters). Now we are following the existing research in SemEval-2017 Task 3.
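For readers who have not met the metric before, here is a minimal sketch of Mean Average Precision in plain Python. It is not the scoring code from the project, only an illustration of the textbook definition. It assumes each query's results are given as a ranked list of relevance flags and that every relevant item appears somewhere in that list; the function names and the sample rankings are made up for the example.

```python
from typing import List, Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average of precision@k taken at each rank k that holds a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: List[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Two made-up queries: True marks a relevant result at that rank.
rankings = [
    [True, False, True],  # AP = (1/1 + 2/3) / 2
    [False, True],        # AP = (1/2) / 1
]
print(round(mean_average_precision(rankings), 4))
```

Because precision is sampled only at the ranks of relevant items, a model that places relevant results near the top scores higher than one that returns the same results in a worse order.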

It is a great pleasure to finish my five years at Vodafone with such a meaningful project!

 

 

PhD Study at Politecnico di Milano

Five months ago, I wrote a blog post here saying that we had decided to move to Milan for the next 4-5 years. Now I am finally enrolled in the Ph.D. program in Computer Science and Engineering at Politecnico di Milano. It will be a great experience to join Politecnico di Milano, which is ranked 49th worldwide. As a result, I am leaving my current job at Vodafone. I learned and applied a great many things at Vodafone. Thank you to all my colleagues there for their friendship.