Topic discovery is an important research area, and one of the most widely used algorithms in this field is Latent Dirichlet Allocation (LDA), an unsupervised, statistics-based learning algorithm invented by Columbia University Professor David M. Blei. He gives the details of the algorithm in his research paper published in 2003 (click to view the paper), co-authored with Andrew Ng and Michael I. Jordan. Since then, many variants of the algorithm have been invented for different needs, but I have mostly focused on LDA algorithms that are capable of discovering topics in short texts, such as Tweets. I will share the prominent extensions of LDA in this post.
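To make the idea concrete, here is a minimal sketch of topic discovery with LDA using Python and the Gensim library. The toy documents and parameter values below are my own illustrative assumptions, not taken from any of the papers mentioned:

```python
from gensim import corpora
from gensim.models import LdaModel

# Toy corpus: each document is a list of tokens (in practice, tweets
# would first be cleaned, tokenized, and stripped of stop words).
texts = [
    ["election", "vote", "party", "campaign"],
    ["vote", "referendum", "party", "poll"],
    ["match", "goal", "team", "league"],
    ["team", "player", "goal", "season"],
]

# Map each token to an integer id and convert documents to bag-of-words.
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]

# Train LDA to discover 2 latent topics in the corpus.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)

# Each topic is a probability distribution over the vocabulary.
for topic_id, words in lda.print_topics():
    print(topic_id, words)
```

On a corpus like this, one topic should gather the politics words and the other the football words; short-text variants of LDA address the fact that a single tweet gives the model very little co-occurrence evidence to work with.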
This morning, especially after the recent Cambridge Analytica scandal on Facebook, I was thinking that there should be a new kind of privacy-first data analysis process on Facebook, one that never shares the data with external companies. In the current system, the flow is: the user accepts the permission request of the application, and the app owner collects the data on its own platform, analyzes it, sells it to another firm, and so on. Instead, the data analysis task should be executed under Facebook's control. In the new system, when an app requests permission, Facebook would alert the user: this app wants to analyze your data; we will never share the data with the app owner; the analysis will be executed on our platform; we have reviewed and vetted their code (like the Apple App Store review process); we will share the result of the analysis with both you and the app owner (for example, that you support the conservative party at 80%); and the app owner will also tell you how and for what purpose it will use this result.
Recently, I tried several products that extract demographic information from a profile image. My goal was to obtain information about age, gender, and ethnicity. I found that the prominent companies in this sector are Clarifai and Face++. I integrated my trial software with both products, and I found Clarifai's accuracy better than Face++'s. My reasons are:
- Clarifai provides the probability value of its predictions (e.g., the predicted gender is female with a probability of 52%), so it is possible to filter out results with low prediction scores; see the sketch after this list. In contrast, Face++ does not provide that value. This is an unwanted situation because, in binary classification, a prediction always produces a result, even when its score is not very high.
- Clarifai correctly predicted the ethnicity of the image below as "White", while Face++ wrongly predicted it as "Black". On the other hand, Clarifai could not determine the gender correctly (female 51%, male 49%), while Face++ correctly marked it as male (we don't know its probability).
- The disadvantage of Clarifai is its low quota for free usage: it permits only 2,500 API calls per month for free accounts. Face++, on the other hand, does not specify any upper limit for free accounts; its only limitation is a rate cap of one API call per second.
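As a concrete illustration of the first point, here is a minimal sketch of filtering low-confidence predictions by a probability threshold. The response structure and the `MIN_CONFIDENCE` value are my own simplified assumptions, not the actual schema of either API:

```python
# Hypothetical, simplified prediction payload in the spirit of what a
# demographics API might return; real responses are more deeply nested.
predictions = [
    {"attribute": "gender", "value": "feminine", "probability": 0.510},
    {"attribute": "age", "value": "55", "probability": 0.356},
    {"attribute": "ethnicity", "value": "white", "probability": 0.981},
]

MIN_CONFIDENCE = 0.80  # assumed threshold; tune for your use case

# Keep only the predictions whose probability clears the threshold.
confident = [p for p in predictions if p["probability"] >= MIN_CONFIDENCE]

for p in confident:
    print(f"{p['attribute']}: {p['value']} ({p['probability']:.3f})")
```

Without a probability value, as with Face++, this kind of filtering is impossible: every prediction has to be taken at face value.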
I hope my hands-on experience with these services will help you choose the right product.
Clarifai's output:
- Gender: feminine (prob. score: 0.510), masculine (prob. score: 0.490)
- Age: 55 (prob. score: 0.356)
- Ethnicity (multicultural appearance): White (prob. score: 0.981)

Face++'s output:
- Ethnicity (multicultural appearance): Black
A very inspiring piece of research was published at the end of 2016. With the help of deep learning, it is now possible to generate images from given texts.
Here is the link to the news and here is the link to that research paper.
Can you imagine some use cases based on this technology? I found an interesting one. Imagine you are in a police station, investigating a robbery that occurred at a bank. The thief could not be found, and you describe the thief's visual profile, as you are the sole eyewitness of the event. At that moment, a computer automatically generates an image of the thief based on the visual details you describe. At the same time, the computer increases the precision of that image by matching it against records of past robberies.
Within the last month, the future of education was one of the main topics in Davos. There were very interesting debates, and in one of them, Jack Ma (the founder of Alibaba) said that the current education system strongly and urgently needs to change due to the rising impact of robots. Since robots are able to acquire knowledge by learning from their past experiences, they will do most of the things people do today. In order to adapt ourselves to the modern world, we need to educate our children in ways that cannot be copied by robots. Rather than teaching our children mathematics or physics, we should support their more humanistic skills, such as music and art.
I agree with Jack Ma's ideas, and I think we need to think more about people's main advantages and disadvantages relative to robots over the next 20 years. Today, our children start learning to code in primary school in order to communicate better with robots and understand their logic. But when the world is dominated by robot activity, everything will change, and humans should be in a position where robots do not see them as a threat.
I started the MongoDB developer course given online by MongoDB University. I worked a lot with Mongo at Vodafone, but I was using only 10-20% of its key features. Now at Politecnico, things are more complex, so I need to pay more attention to performance issues. In my research project, I use MongoDB to store Tweets and perform text analysis over the records.
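As a small example of the kind of workload I mean, here is a minimal sketch of storing tweets and running a keyword search over them with PyMongo. The database and collection names, and the sample documents, are my own illustrative assumptions:

```python
from pymongo import MongoClient, TEXT

# Connect to a local MongoDB instance (assumed running on the default port).
client = MongoClient("mongodb://localhost:27017")
tweets = client["research"]["tweets"]

# Insert a couple of sample tweet documents.
tweets.insert_many([
    {"user": "alice", "text": "Brexit vote results are in", "lang": "en"},
    {"user": "bob", "text": "Watching the football match", "lang": "en"},
])

# A text index lets MongoDB perform keyword search over the tweet body.
tweets.create_index([("text", TEXT)])

# Find all tweets mentioning "Brexit".
for doc in tweets.find({"$text": {"$search": "Brexit"}}):
    print(doc["user"], "-", doc["text"])
```

At scale, choosing the right indexes for queries like this is exactly the kind of performance concern the course covers.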
I have just completed the Week 1 course. I hope I will learn more in the upcoming weeks.
These days, my research motivation is to find insights by analyzing Twitter data to understand how British people reacted to the Brexit referendum. Various studies have already been conducted on this topic, most of them by universities in England, such as Imperial College London and the University of Bristol. I find it quite an interesting research topic, since social media is an important environment for presenting our ideas to the community, and there is a need for more research to understand people's opinions. I will give more detailed information about my study in the upcoming weeks. If you have any recommendations for me, please feel free to send me an email.
Good news: I joined the Data Science Research Group of Politecnico di Milano as a Research Fellow. At the moment, this is a position with a temporary contract, since we need to see whether I can contribute well to the research of this community. My advisor will be Marco Brambilla, who is known for his studies in Web Science and Software Engineering.
I am very pleased to become a member (even as a temporary researcher) of this valuable group. I believe I will gain broad knowledge by doing research with my professors, and I will be able to solve a significant research problem in my Ph.D. thesis.
You can get more information about the research group here: http://datascience.deib.polimi.it
Today, I participated in the Welcome Day for new Ph.D. students of Politecnico di Milano. We were nearly 300 new Ph.D. students from different backgrounds, from Architecture to Industrial Design and from Mechanical Engineering to Information Technology. The dean of the Ph.D. school gave a great presentation, and I think the following slide was a summary of all possible suggestions for Ph.D. students. Thanks to everybody for their unlimited support and kindness.
In our latest project at Vodafone R&D, I worked under the supervision of Istanbul Technical University professor Gulsen Eryigit. The academic community in Turkey widely regards Prof. Eryigit as the leading researcher in Natural Language Processing for the Turkish language, and I had the chance to study with her. Within a month, I implemented a predictive model based on Word2Vec using Python and the Gensim framework.
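Here is a minimal sketch of what training and querying a Word2Vec model with Gensim looks like. The toy sentences and hyperparameters are my own illustrative assumptions, not the actual Vodafone model, and the parameter names follow recent Gensim releases:

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens. A real model would be
# trained on millions of sentences.
sentences = [
    ["customer", "called", "about", "billing", "issue"],
    ["customer", "asked", "about", "invoice", "problem"],
    ["user", "reported", "network", "outage"],
    ["user", "complained", "about", "network", "coverage"],
]

# Train a small model: each word becomes a 50-dimensional vector.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

# Words that occur in similar contexts end up with similar vectors.
print(model.wv.most_similar("customer", topn=3))
```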
The Word2Vec approach has been one of the trending topics in Natural Language Processing since 2014. In this technique, a neural network-based model is trained on a large corpus in order to represent words as multi-dimensional vectors. Within a month, I implemented the predictive model, and then we measured the quality of our model using Mean Average Precision (MAP), a well-known metric for ordered results (i.e., when the ranking of the outputs is important). Now we are following the existing research in SemEval-2017 Task 3.
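For readers unfamiliar with the metric, here is a minimal sketch of computing Mean Average Precision over ranked results; the sample rankings are invented for illustration:

```python
def average_precision(ranked, relevant):
    """Average of the precision values at each rank where a relevant item appears."""
    hits, precisions = 0, []
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / i)  # precision at this rank
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(queries):
    """queries: list of (ranked_results, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)

# Two invented queries: a perfect ranking, and one where a relevant
# item is pushed down the list.
queries = [
    (["d1", "d2", "d3"], {"d1", "d2"}),  # AP = (1/1 + 2/2) / 2 = 1.0
    (["d4", "d1", "d2"], {"d1", "d2"}),  # AP = (1/2 + 2/3) / 2 ≈ 0.583
]
print(mean_average_precision(queries))   # ≈ 0.792
```

The key property, and the reason it suits our task, is that MAP rewards placing relevant results near the top of the list, not merely retrieving them somewhere.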
It is a great pleasure to conclude my five years of work at Vodafone with such a meaningful project!