Show me your social network and I’ll tell you who you are…


Over the last decade social media has revolutionized the way people all over the world interact with each other, resulting in new types of data being produced every day at an unprecedented rate. Interesting patterns found in our private networks of friends and the content we share and consume are used extensively by companies, governments and nonprofit organizations for decision making, targeted marketing and providing personalized customer experiences. What kind of information are we giving away through social media services and what are its positive and negative uses?

By Dominika Tkaczyk

Some say that if you are not on Facebook, you don’t exist. An exaggeration perhaps, but indeed it is hard to imagine our everyday life without social media anymore. Services like Twitter and Instagram have quickly become an easily accessible channel for anyone to share information with the world, effectively taking over a large fraction of our social activity, including conversations with friends, meeting new people, dating, exchanging opinions and seeking advice.

One side effect of this revolution is that millions of individuals leave huge amounts of personal data on the internet. New types of data related to people’s interests, opinions and preferences, and to their networks of friends and followers, are now widely available to data miners. The need to understand this data has brought brand new challenges and opportunities for interdisciplinary researchers, including, but not limited to, data analysts, sociologists and psychologists. Social media data is also a rich source of valuable information for various businesses and public-sector institutions.

Due to the volume of available social media content, the only sensible way of processing it is with intelligent automated algorithms, such as machine learning and natural language processing methods. Every day, automated agents harvest data about internet users and process it in order to create personal profiles of individuals or to extract patterns from people’s opinions and browsing behavior.


Equipped with the information we leave (intentionally or not) on the internet, machines are able to provide a personalized user experience for us. For example, Google’s search engine uses information about its users to refine the results of searches, especially when the search query is ambiguous. When searching for “ruby” (which can mean a programming language or a gemstone, among other things), a person subscribed to many mailing lists on software development will most likely get different results from a person who has shown interest in minerals. An interesting experiment is to compare the results of our favorite queries between Google and a search engine known for not storing or using any personal information about its users.


Similarly, personal information can be used for targeted marketing. The interests and needs expressed by our online behavior, including the pages we visit and the comments we make, are an important indicator of the types of products we are most likely to be interested in. Our demographic details, such as age, sex or wealth, can also often be deduced by intelligent algorithms from the posts we leave on forums and blogs, or from our photos. This information can be used to assign users to customer segments and target them with specific ads, resulting in more profit than displaying ads at random.

Social media can also be mined for information about various future events such as concerts, sports events or conferences. This information is also very helpful in advertising specific hotels or flights to select groups of people.

Credit and insurance companies can also use the information mined from social media data to assess the credibility of potential customers and to gain more insight into their financial or health status and history. This information can then be used to decide which products are offered to people, and on what terms.

An interesting research area is the analysis of how new information spreads on social networks. It is well known that people differ a lot in how much they influence others and how willingly they try new products or services. Careful analysis of the networks of followers, their posts and comments can help to identify trendsetters and early adopters, who are often the best individuals to target with new product campaigns.
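To make the idea of finding influential users in a follower network more concrete, here is a minimal sketch that ranks users with a plain PageRank-style iteration. PageRank is used here only as one well-known illustration of graph-based influence scoring; the article does not say which method any particular company uses. The user names and the `FOLLOWS` list are hypothetical toy data.

```python
from collections import defaultdict

def pagerank(follows, damping=0.85, iterations=50):
    """Score users in a follower graph.

    follows: list of (follower, followed) pairs.
    Returns a dict mapping each user to a score; scores sum to 1.
    """
    users = sorted({user for edge in follows for user in edge})
    out_links = defaultdict(list)
    for follower, followed in follows:
        out_links[follower].append(followed)

    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Users who follow nobody spread their score evenly over everyone.
        dangling = sum(rank[u] for u in users if not out_links[u])
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in users}
        # Every user passes an equal share of their score to each account they follow.
        for u in users:
            for v in out_links[u]:
                new_rank[v] += damping * rank[u] / len(out_links[u])
        rank = new_rank
    return rank

# Hypothetical toy network: (follower, followed)
FOLLOWS = [
    ("ann", "dora"), ("bob", "dora"), ("cat", "dora"),
    ("bob", "ann"), ("dora", "ann"),
]

scores = pagerank(FOLLOWS)
most_influential = max(scores, key=scores.get)
print(most_influential, round(scores[most_influential], 3))
```

In this toy network `dora` comes out on top because she is followed by every other user. A realistic analysis would also look at what people post and how quickly others react, which a follower graph alone does not capture.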


Apart from using social media data to populate databases of people’s personal profiles, another important research area is mining general trends or patterns for a larger group of people, ranging from groups distinguished by some demographic criteria, through populations of regions or countries, to the entire population of internet users. This can be viewed as an alternative form of market research or polling.

The posts, comments, tweets or reviews we leave on the internet contain useful information about our opinions on products, services, and even entire companies. Even though natural language is not so easily understood by machines, modern algorithms are able to grasp the opinions expressed in the text to some extent. With the right natural language processing tools, companies can automatically monitor people’s subjective responses on a large scale, or even extract more detailed information about exactly what people like or don’t like. This important feedback can be used for decision making related to selling or marketing strategies.
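As a rough illustration of how a machine can "grasp" an opinion, the sketch below scores a text by counting words from a small opinion lexicon and flipping the polarity of a word that directly follows a negation. This is a deliberately simple, lexicon-based approach chosen for illustration; the word lists are hypothetical, and production systems typically rely on much larger resources or trained models.

```python
import re

# Hypothetical, tiny opinion lexicon used only for this example.
POSITIVE = {"love", "great", "excellent", "fast", "reliable"}
NEGATIVE = {"hate", "terrible", "slow", "broken", "poor"}
NEGATIONS = {"not", "never", "no"}

def sentiment_score(text):
    """Return a positive number for favorable text, negative for unfavorable."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        # A negation immediately before an opinion word flips its polarity.
        if polarity and i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    return score

reviews = [
    "I love this phone, the camera is great",
    "The battery is not reliable and charging is slow",
]
for review in reviews:
    print(sentiment_score(review), review)
```

The first review scores 2 and the second scores -2. A counter like this has no notion of irony or context, which is one reason the results of automated opinion mining should be read with care.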

Social media is also a rich source of valuable information for law enforcement agencies and governments. They mine social media data for information about breaking news events in real time, activist protests, political movements, and general global trends and moods, in order to prevent and fight crime and to monitor public sentiment.


There are important issues related to social media mining that should always be considered by all parties involved: companies using the results for decision making, data analysts and, last but not least, users sharing information about themselves online.


First of all, it should be noted that not all data is equally available to mining algorithms due to privacy settings. For example, anyone can access the vast majority of Twitter posts. Twitter went even further by launching a machine-friendly firehose to maximize the accessibility of its data to automated algorithms. On the other hand, Facebook content is often shared only between closed groups of people. This does not mean, however, that Facebook data is not mined at all. The only difference is who mines the data: the content in closed platforms is usually mined only by the owning companies, while public services such as Twitter, forums and blogs can be processed by virtually anyone with internet access.

Building people’s personal profiles based on their social media activity may also suffer from data decentralization. Since different platforms have different themes and are used for different purposes, one can only gain a partial picture of a person by mining their activity on a single platform, while aggregating information from multiple sources results in a much richer profile. This requires automatic identification of the same users across multiple platforms, and ideally across the multiple devices they use for browsing.

Analysts should also be very careful in drawing general conclusions from the information mined from social media. The average social media user is not representative of the general population. Because the sample is non-representative (biased), findings obtained from social media data cannot be directly generalized to the entire population.
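A small numerical sketch shows how much a biased sample can distort a result, and one standard way analysts try to compensate: post-stratification, where each group's average is weighted by its share of the real population instead of its share of the sample. All numbers and group names below are hypothetical, and the correction only works for characteristics that are actually known for both the sample and the population.

```python
def poststratified_mean(sample, population_share):
    """Reweight a sample so each group counts according to its population share.

    sample: list of (group, value) pairs.
    population_share: dict mapping group -> fraction of the real population.
    Assumes every group in population_share appears at least once in the sample.
    """
    totals, counts = {}, {}
    for group, value in sample:
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
    # Weight each group's mean by its population share, not its sample share.
    return sum(population_share[g] * totals[g] / counts[g] for g in counts)

# Hypothetical data: 1 = likes the product, 0 = does not.
# Younger users are heavily over-represented among the people who posted.
sample = [("under_35", 1)] * 6 + [("under_35", 0)] * 2 + [("35_plus", 0)] * 2
population_share = {"under_35": 0.3, "35_plus": 0.7}

raw_mean = sum(value for _, value in sample) / len(sample)
adjusted_mean = poststratified_mean(sample, population_share)
print(f"raw: {raw_mean:.3f}, adjusted: {adjusted_mean:.3f}")
```

Here the raw sample suggests that 60% of people like the product, while the reweighted estimate is 22.5%. The adjusted figure is still only as good as the assumption that the people who post within each group resemble those who do not.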


Mining social media data can be viewed as beneficial not only for companies and agencies, but also for internet users, who get recommendations carefully tailored to their needs and personalized user experience. However, there are particular risks related to sharing information on social media, and everyone should be aware of them.

First of all, we should remember that anything we put online can be mined, from the text and images left on public websites to private conversations and our connections to other people. Of course, in general, processing is not done by humans, but rather by automated agents building personalized profiles of us or searching for general patterns. Even though algorithms are not judgmental, sometimes you might be better off keeping specific personal details off the internet.

We should also be aware that automated algorithms are in fact blind tools. They do not understand sarcasm or have a sense of humor, so joking online about bankruptcy or drug abuse might not be a good idea. Algorithms will also assume we are surrounded by people similar to us, so we might want to be careful about adding random or unknown people to our circles.

Social media is a great and efficient way of communicating in modern society, but it is a good idea to treat it with a touch of caution. After all, the information we share can be used by intelligent algorithms to influence decisions that might affect us one way or another. And we most likely won’t have a chance to voice our objections.

