This is an emerging research area that aims to study social science questions computationally, most notably with Big Data and machine learning techniques.
These days, my research goal is to gain insights by analyzing Twitter data to understand how people in England reacted to the Brexit referendum. Several studies have already been conducted on this topic, most of them by universities in England such as Imperial College London and the University of Bristol. I find it a very interesting research topic: social media is an important environment for presenting our ideas to the community, and more research is needed to understand people's opinions. I will share more details about my study in the coming weeks. If you have any recommendations for me, please feel free to send me an email.
People often make the mistake of comparing MapReduce with Spark.
In fact, MapReduce is a programming paradigm, so it cannot be compared with Spark directly. What we can compare is how Hadoop implements MapReduce versus how Spark does.
In Hadoop MapReduce, each job has exactly one Map and one Reduce phase, while Spark can chain multiple map and reduce operations within a single job. Second, Hadoop MapReduce writes the output of each job to disk, whereas Spark keeps intermediate results in memory. As a result, Spark significantly reduces the overall execution time of a multi-stage job.
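The paradigm itself is easier to see with a toy example. Below is a minimal word count in plain Python that mimics the two phases; this is only an illustration of the paradigm, not Hadoop or Spark code. In Hadoop, the result of such a job would be written to disk before the next job starts, while Spark would keep `mapped` and `counts` in memory.

```python
from collections import Counter
from functools import reduce

# Toy input: each string stands in for a line of a distributed input file.
lines = ["big data is big", "spark keeps data in memory"]

# Map phase: emit (word, 1) pairs for every word in every line.
mapped = [(word, 1) for line in lines for word in line.split()]

# Reduce phase: sum the counts per key.
def reducer(acc, pair):
    word, count = pair
    acc[word] += count
    return acc

counts = reduce(reducer, mapped, Counter())
print(counts["big"])  # "big" appears twice in the input
```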
When you run analyses on Hadoop, Apache Pig is one of the simplest ways to load and transform the data. Another alternative is Apache Hive, which feels easier for people who already know SQL. I have used both, but I prefer writing Pig scripts, since you can inspect your data at each step of the script. Moreover, I find Pig more human-readable than SQL-style code blocks (nested SQL, etc.).
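To show what I mean by the step-by-step style, here is a sketch of a small aggregation in Pig (the path and field names are made up for the example). Each intermediate relation (`tweets`, `grouped`, `counts`) can be DUMPed and inspected on its own, whereas the equivalent Hive query is a single statement.

```pig
-- Count tweets per user; every intermediate relation can be inspected with DUMP.
tweets  = LOAD '/data/tweets' USING PigStorage('\t')
          AS (user:chararray, text:chararray);
grouped = GROUP tweets BY user;
counts  = FOREACH grouped GENERATE group AS user, COUNT(tweets) AS n;
DUMP counts;

-- Roughly equivalent Hive query:
-- SELECT user, COUNT(*) FROM tweets GROUP BY user;
```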
Over the last two years, I have written many Pig scripts, so I would like to share a few tips about Pig scripting.
- Use DEFINE to separate the file-loading logic into a dedicated Pig file, which can be named Loader.pig, and reuse it across scripts.
- When Pig does not provide the functionality you need, write your own User Defined Functions (UDFs) in Java. For example, if you need to compare object values or apply a custom sorting algorithm, you can implement it in Java and call it from your Pig script. This greatly increases the flexibility of Apache Pig: once you enter the Java UDF world, you can do almost anything by combining Java and Pig. The main challenge is keeping track of the objects passed into a UDF, but you improve with practice.
- Parameter substitution is a prominent feature of Pig. With %declare statements, it is possible to define custom variables. However, assigning values dynamically is a challenge.
- Before running on the cluster, test your scripts in local mode (pig -x local) with a small amount of data, since waiting for script results on the cluster is inefficient.
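The tips above can be sketched in a pair of scripts. All file names, paths, the jar, and the UDF class below are hypothetical examples, not working artifacts:

```pig
-- Loader.pig: loading logic separated into a reusable macro
DEFINE load_tweets(path) RETURNS data {
    $data = LOAD '$path' USING PigStorage('\t')
            AS (user:chararray, text:chararray, ts:long);
};
```

```pig
-- main.pig: combines the macro, a Java UDF, and parameter substitution
IMPORT 'Loader.pig';                             -- pull in the macro above
REGISTER 'my-udfs.jar';                          -- jar with custom Java UDFs
DEFINE CustomSort com.example.pig.CustomSort();  -- hypothetical Java UDF

%declare INPUT_PATH '/data/tweets'               -- parameter substitution

tweets = load_tweets('$INPUT_PATH');
sorted_texts = FOREACH (GROUP tweets BY user)
               GENERATE group AS user, CustomSort(tweets.text);
STORE sorted_texts INTO '/out/sorted_texts';
```

For testing, the same script can be run on a local sample with `pig -x local main.pig` before submitting it to the cluster.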
Today, with the ability to analyze billions of records in real time, companies can better understand their customers' emotions. Tech companies in particular should develop Big Data-based models to improve user experience and reduce churn.
At this point, I would like to share my experience with the Need for Speed iOS app. Over the last three days, there was an in-game competition that awarded players a Jaguar sports car. However, the car only becomes available after full dedication to the game, with a high probability of spending real money.
My wife and I played this game over those three days in our spare time, on the road, at home, and so on. After reaching 75% progress, I realized I would not succeed in the limited time, so I deleted the game. My wife, on the other hand, completed 95% of it, but the end was the same: she deleted the game too. The point I would like to emphasize in this story is that, with Big Data technology and a churn prediction model, the game company could have kept us playing, or at least my wife.
Here, an algorithm could calculate a churn score for each player. By collecting location data from each device, it is possible to identify couples and other small communities. When one member of a community deletes the app, the churn probability of the other members increases sharply. An uninstall itself can be inferred from a break in the user's daily usage routine. Finally, the app can offer extra advantages to the remaining members of that community.
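A minimal sketch of the idea, assuming players are already grouped into communities and each has a base churn score. All names, scores, and the 0.3 boost factor below are made-up assumptions for illustration, not a real model:

```python
# Toy model: when a community member churns, boost the churn score of the
# remaining members so the app can target them with extra rewards in time.

communities = {
    "family_42": ["me", "my_wife"],
}

base_score = {"me": 0.6, "my_wife": 0.4}  # baseline churn probability per player
churned = {"me"}                          # inferred from a broken usage routine

def adjusted_scores(community, boost=0.3):
    """Raise each remaining member's score for every member who already left."""
    members = communities[community]
    n_churned = sum(1 for m in members if m in churned)
    scores = {}
    for m in members:
        if m in churned:
            continue  # already lost; no point scoring them
        scores[m] = min(1.0, base_score[m] + boost * n_churned)
    return scores

print(adjusted_scores("family_42"))
```

Players whose adjusted score crosses some threshold would then receive extra in-game rewards before they uninstall.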
This is my advice to the Big Data team at EA Games. I am aware that they already track application usage routines and offer extra in-game currency at each attempt; however, there is still a substantial need for custom algorithms.