
Spam Mobile Apps: Characteristics, Detection, and in the Wild Analysis

Published: 03 April 2017

Abstract

The growing popularity of smartphones has attracted a large number of developers, who offer applications for the different smartphone platforms through the respective app markets. One consequence of this popularity is that app markets are also becoming populated with spam apps. These apps degrade users’ quality of experience and increase the workload of app market operators, who must identify and remove them. Spam apps come in many forms: apps with no specific functionality, apps with unrelated descriptions or keywords, and similar apps published several times across diverse categories. Market operators maintain antispam policies and remove apps through continuous monitoring. Through a systematic crawl of a popular app market, and by identifying apps that were removed over a period of time, we propose a method to detect spam apps using only the app metadata available at the time of publication. We first propose a methodology to manually label a sample of removed apps according to a set of checkpoint heuristics that reveal the reasons behind removal. This analysis suggests that approximately 35% of the removed apps are very likely to be spam apps. We then map the identified heuristics to several quantifiable features and show how well these features distinguish spam apps. We build an Adaptive Boost classifier for early identification of spam apps using only the metadata of the apps. Our classifier achieves an accuracy of over 95%, with precision varying between 85% and 95% and recall varying between 38% and 98%. We further show that a limited number of features, in the range of 10 to 30, generated from app metadata is sufficient to achieve a satisfactory level of performance. On a set of 180,627 apps that were present in the app market during our crawl, our classifier flags 2.7% of the apps as potential spam. Finally, we perform additional manual verification and show that human reviewers agree with 82% of our classifier predictions.
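
The pipeline summarized in the abstract (metadata features fed to a boosted classifier, then evaluated by accuracy, precision, and recall) can be illustrated with a minimal sketch. This is not the authors' implementation: the two features (description word count, number of apps by the same developer), the toy data, and all function names are hypothetical, and the boosting loop is a textbook AdaBoost over one-feature threshold rules (decision stumps). The last lines illustrate why accuracy alone can look high while recall stays low when spam is a small minority of the apps.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best one-feature rule."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # Rule: predict `sign` when the feature exceeds thr, else `-sign`.
                preds = [sign if row[j] > thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] > thr else -sign

def adaboost(X, y, rounds=5):
    """Textbook AdaBoost: reweight misclassified samples after every round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this stump
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Label +1 (spam) or -1 (non-spam) by weighted vote of the stumps."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score > 0 else -1

def precision_recall_accuracy(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, (tp + tn) / len(pairs)

# Hypothetical metadata: [description word count, apps published by the developer].
X = [[120, 2], [200, 1], [150, 3], [90, 1], [300, 2], [15, 40], [20, 35], [10, 50]]
y = [-1, -1, -1, -1, -1, 1, 1, 1]  # +1 = spam, -1 = non-spam

model = adaboost(X, y)
train_preds = [predict(model, row) for row in X]

# Class imbalance: 5 spam apps among 100, of which only 2 are flagged.
imbalanced_true = [1] * 5 + [-1] * 95
imbalanced_pred = [1] * 2 + [-1] * 3 + [-1] * 95
imbalanced_metrics = precision_recall_accuracy(imbalanced_true, imbalanced_pred)
```

In the imbalanced example, missing three of the five spam apps still leaves accuracy high because non-spam apps dominate the sample, which is one reason to report precision and recall alongside accuracy, as the abstract does.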
