Text Mining for Social Media

Richi Nayak - Basant Agarwal

Abstract: Social media is becoming increasingly important because it is ubiquitous, dynamic and real-time. With the exponential growth in the number of social media users and their engagement with networks such as Twitter, Facebook, Flickr, and many others, a massive amount of user-generated data has been produced. A substantial amount of this data is in the form of text such as posts, reviews, tweets, and blogs. This presents numerous challenges as well as opportunities for applying text mining methods to discover meaningful information. Text mining is an established research area, and its methods have been successfully applied to social media applications such as hashtag analysis, sentiment mining, abuse/fake detection and emerging trend analysis.

This tutorial aims to illustrate the current text mining methods and applications in social media. We will highlight the challenges that traditional text mining methods face when they are applied to social media data. We will demonstrate how emerging text mining methods such as factorization and deep learning deal with those challenges and find useful information in social media data. We will explain in detail well-known text mining methods for hashtag analysis, sentiment mining, abuse/fake detection and emerging trend analysis. We will finish the tutorial by highlighting the open issues in this area and pointing out the hot spots of today’s research. We believe that this tutorial will help bridge multiple research tracks, thereby attracting a greater audience with a view to extending text and web analytics methods.

Tutorial Description

Recognising the increasing interest in social media analytics, this tutorial aims to discuss the challenges that arise when exploring social media data and to provide text analytics solutions. The tutorial will cover the basic problems in applying traditional text mining methods to social media text data, solutions that address those problems, advanced methods such as word embedding, deep learning, and factorization, as well as the details of various application-specific solutions. The tutorial will also outline issues and directions for future research and development work.

This tutorial will be organized in two parts. The first part will delve into the text mining methods. In this part, we will first explain the basic text mining methods and the challenges in dealing with social media data. Social media text is generally short, which makes it difficult to work with because of the sparsity problem. It is also mostly noisy because of high variation in the text, such as irregular sentence structure and grammar, misspellings, emoticons, and embedded URLs, to name a few. This irregularity makes it challenging to extract meaningful information. We will discuss solutions that address these problems, and advanced techniques such as word embedding, ranking-based, deep learning, and factorization methods. In the second part, we will present various social media applications and show how text mining methods are extended to support social media analytics. We will delve into the areas of sentiment mining, abuse/fake detection, and emerging trend analysis. A summary of our own research, as well as related work by others in the area, will be discussed.
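To make the noise problem described in the first part concrete, the following is a minimal sketch of a normalization step for a single post. It is an illustration under stated assumptions, not a method taken from the tutorial: the function name normalize_post, the placeholder tokens (<url>, <user>, <smile>, and so on) and the tiny emoticon table are all hypothetical choices, and a real pipeline would need a far richer lexicon and spelling handling.

```python
import re

# All names and placeholder tokens below are illustrative assumptions.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Matches a character repeated three or more times ("sooooo", "!!!").
REPEAT_RE = re.compile(r"(.)\1{2,}")
# A deliberately tiny emoticon table; real lexicons are much larger.
EMOTICONS = {
    ":)": "<smile>", ":-)": "<smile>",
    ":(": "<sad>", ":-(": "<sad>",
    ":D": "<laugh>",
}
PLACEHOLDERS = {"<url>", "<user>"}


def normalize_post(text):
    """Return a list of normalized tokens for one social media post."""
    # Replace embedded URLs and user mentions with placeholder tokens.
    text = URL_RE.sub("<url>", text)
    text = MENTION_RE.sub("<user>", text)

    tokens = []
    for tok in text.split():
        # Map known emoticons before any lowercasing (":D" is case-sensitive).
        if tok in EMOTICONS:
            tokens.append(EMOTICONS[tok])
            continue
        if tok in PLACEHOLDERS:
            tokens.append(tok)
            continue
        tok = tok.lower()
        # Shorten elongated runs to two characters: "sooooo" -> "soo".
        tok = REPEAT_RE.sub(r"\1\1", tok)
        # Drop surrounding punctuation but keep a leading "#" on hashtags.
        tok = tok.strip(".,!?;:\"'()")
        if tok:
            tokens.append(tok)
    return tokens


example = normalize_post("Sooooo happy!!! :) check https://t.co/xyz @bob #NLP")
print(example)
```

Keeping URLs, mentions and emoticons as placeholder tokens rather than deleting them is one possible design choice: it reduces vocabulary size while preserving signals that can matter for tasks such as sentiment mining.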

We propose this as a three-hour tutorial (or 1.5 hrs) with one hour (or 30 min) devoted to explaining the basic methods and challenges and the next two hours (or 1 hr) devoted to advanced methods and applications. The timing is flexible and can be adjusted to suit the needs of the conference by controlling the depth of information. We will explain the basic concepts followed by details of new trends, without assuming much prerequisite knowledge from the audience. The topics covered in the tutorial are summarized next.

Prerequisite knowledge of the audience

We do not assume prerequisite knowledge from the audience and will explain the necessary basic concepts such as data representation models, similarity matrices, the factorization process and various application-specific concepts. However, a basic knowledge of text mining would help the audience understand the concepts much better.
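As an informal preview of two of these concepts, the sketch below builds a bag-of-words representation for three short posts and computes their cosine similarity matrix. It is a minimal illustration using only the Python standard library; the function names and the example posts are assumptions made for this sketch, not material from the tutorial.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(docs):
    """Pairwise cosine similarities over bag-of-words vectors."""
    # Data representation: each post becomes a term -> count mapping.
    vectors = [Counter(doc.lower().split()) for doc in docs]
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Hypothetical short posts, chosen to show the sparsity problem.
posts = [
    "great phone battery",
    "awesome mobile battery life",
    "terrible service today",
]
sim = similarity_matrix(posts)
for row in sim:
    print([round(x, 3) for x in row])
```

The first two posts express similar opinions yet share only one term, so their similarity is low, and the third shares no term with either. This is the sparsity problem of short text in miniature, and it is one motivation for denser representations such as word embeddings and factorization.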

Related References

  1. T Sutanto, R Nayak, (2018), Fine-grained document clustering via ranking and its application to social media analytics, Social Network Analysis and Mining 8 (1), 2018.
  2. Basant Agarwal, Heri Ramampiaro, Helge Langseth, Massimiliano Ruocco, (2018), A deep network model for paraphrase detection in short text messages, Information Processing & Management, 54 (6), pp: 922-937.
  3. Hiba Sebei, Mohamed Ali Hadj Taieb, Mohamed Ben Aouicha, (2018), Review of social media analytics process and Big Data pipeline, Social Network Analysis and Mining, Dec 2018.
  4. Stefan Stieglitz, Milad Mirbabaie, Björn Ross, Christoph Neuberger, (2018), Social media analytics—challenges in topic discovery, data collection, and data preparation. International Journal of Information Management,39, pp:156–168.
  5. Leila Bahri, Barbara Carminati, Elena Ferrari, (2018), Decentralized privacy preserving services for Online Social Networks, Online Social Networks and Media, 6. pp: 18-25.
  6. Ahmad Hany Hossny, Terry Moschou, Grant Osborne, Lewis Mitchell, Nick Lothian, (2018), Enhancing keyword correlation for event detection in social networks using SVD and k-means: Twitter case study, Social Network Analysis and Mining, 8: 49.
  7. Yi-Cheng Chen, (2018), A novel algorithm for mining opinion leaders in social networks, World Wide Web, pp: 1-17.
  8. Mattia G. Campana, Franca Delmastro, (2017), Recommender Systems for Online and Mobile Social Networks: A survey, Online Social Networks and Media, 3(4), pp: 75-97.
  9. Simona Balbi, Michelangelo Misuraca, Germana Scepi, (2018), Combining different evaluation systems on social media for measuring user satisfaction, Information Processing & Management, 54(4), pp: 674-685.
  10. Georgios K Pitsilis, Heri Ramampiaro, Helge Langseth, (2018), Effective hate-speech detection in Twitter data using recurrent neural networks. In Applied Intelligence Journal (APIN). Springer.
  11. Ifada, N. and R. Nayak, (2016), How relevant is the irrelevant data: leveraging the tagging data for a learning-to-rank model. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining.
  12. Basant Agarwal, Namita Mittal, (2016), Prominent Feature Extraction for Sentiment Analysis, in Springer Book Series: Socio-Affective computing series, Springer International Publishing, DOI: 10.1007/978-3-319-25343-5, pages: 1-115.
  13. Chen, Lin, Vallmuur, Kirsten, & Nayak, Richi (2015), Injury narrative text classification using factorization model. BMC Medical Informatics and Decision Making, 15(s5).
  14. Basant Agarwal, Soujanya Poria, Namita Mittal, Alexander Gelbukh, Amir Hussain, (2015), Concept Level Sentiment Analysis using Dependency-based Semantic Parsing: A Novel Approach, Cognitive Computation,7(4), pp 487–499.
  15. Basant Agarwal, Namita Mittal, Pooja Bansal, Sonal Garg, (2015), Sentiment Analysis Using Common-Sense and Context Information, Computational Intelligence and Neuroscience, Article ID 715730
  16. Kutty, Sangeetha, Nayak, Richi, & Chen, Lin (2014), A people-to-people matching system using graph mining techniques. World Wide Web, 17, pp. 311-349.
  17. Ifada, N. and R. Nayak, (2014), Tensor-based item recommendation using probabilistic ranking in social tagging systems. in 23rd International Conference on World Wide Web. ACM.
  18. Basant Agarwal, Namita Mittal, (2014), Semantic Feature Clustering for Sentiment Analysis of English Reviews, IETE Journal of Research, Taylor & Francis, 60 (6), pp: 414-422.
  19. Basant Agarwal, Namita Mittal, (2014), Prominent Feature Extraction for Review Analysis: An Empirical Study, Journal of Experimental and Theoretical Artificial Intelligence, 28(3), pp: 485-498.
  20. Chen, Lin & Nayak, Richi (2012), Leveraging the network information for evaluating answer quality in a collaborative question answering portal. Social Network Analysis and Mining, 2(3), pp. 197-215.
  21. Kutty, S., R. Nayak, and Y. Li, (2011), XML documents clustering using a tensor space model. Advances in Knowledge Discovery and Data Mining, pp. 488-499.
  22. Rawat, R., R. Nayak, and Y. Li, Clustering of web users using the tensor decomposed models. User Modeling, Adaptation, and Personalization, 2010: p. 37-39.

Information on presenters

Dr Richi Nayak is Associate Professor of Computer Science and Head of the Data Science Discipline in the School of Electrical Engineering and Computer Science, Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia. She is an internationally recognized expert in data mining, text mining and web intelligence. She has attained over $2.1 million in competitive external research funding over the past ten years in the area of text mining. She consults for a number of government agencies on data, text and social media analytics projects. She has an h-index of 26. She is a member of the steering committee of the Australasian Data Mining Conference (AusDM). She has previously presented tutorials on related topics at conferences such as EDBT 2010, DASFAA 2009 and WISE 2017. These tutorials were very well received. She has also chaired a number of workshops at conferences such as ICDM, PAKDD, and INEX. She is highly engaged in text analytics education for undergraduate and postgraduate students. She has supervised fifteen HDR students to completion in the area of social media analytics. She is the founder and leader of the Applied Data Mining Research Group at QUT. She has received a number of awards and nominations for teaching, research and service activities.

Dr Basant Agarwal is an associate professor at Swami Keshvanand Institute of Technology Management Gramothan, India. He is also a visiting Research Fellow at the Department of Computer Science, Norwegian University of Science and Technology (NTNU), Norway. He was awarded the prestigious PostDoc Fellowship by ERCIM (European Research Consortium for Informatics and Mathematics) through the “Alain Bensoussan Fellowship Programme” in 2016. He worked as a PostDoc Research Fellow at NTNU, Norway. He has also worked as a Research Scientist at Temasek Laboratories, National University of Singapore (NUS), Singapore. He has authored one book on sentiment analysis in the Springer book series Socio-Affective Computing. He has published more than 40 papers in reputed conferences and journals. Dr. Agarwal serves as a senior member, technical program committee member, and member of the editorial or reviewer board of various renowned international conferences and journals such as Knowledge-Based Systems, IEEE Intelligent Systems, and Information Processing & Management, to name a few. His research interests are in Artificial Intelligence, Text Mining, Natural Language Processing, Machine Learning, Deep Learning, Intelligent Systems, Expert Systems and related areas.
