Studying the viral growth of a connective action network using information event signatures
First Monday

by Jeff Hemsley



Abstract
The Arab Spring and Occupy Wall Street movements demonstrated that networks of individuals who share interests or grievances could quickly form on social media. There is a reciprocal relationship between the growth of these networks and the information that flows through them. This study examines this relationship by using viral information event signatures, which show the changing rate of sharing of a specific message over a period of time. The Occupy movement and the digital interactions of its participants provide a context and a rich corpus of data from which to study the relationship between the signatures of information flows and the growth of the Occupy network. Using exploratory data analysis and multivariate regression to analyze Occupy-related tweets drawn from a corpus of over 64 million tweets, this study first provides a parameterized signature model and then uses regression to show that a relationship exists between the shape of the signature and the rate at which key actors gain followers. This work also finds a quadratic decline, over the life cycle of the movement, in the rate at which the actors gain followers. The contributions of this work include the parameterized signature model, a demonstration of its usefulness, and a new perspective on the growth of the Occupy movement.

Contents

Introduction
Viral events and their signatures
Data and analysis
Regression results
Discussion: Viral network growth
Conclusion

 


 

Introduction

The Arab Spring and Occupy Wall Street (hereafter “Occupy”) movements demonstrated that networks of individuals who share interests or grievances could quickly form on social media. Social networking sites such as Facebook, Twitter and YouTube all let users discover and connect with people who have posted content they find novel and interesting. There is a reciprocal relationship between the structure of social networks within these sites and the new information that flows through them. The structure of the networks, in terms of how people are linked together, both constrains and enables information exchanges. When users discover information through a secondary or remote contact, they may be prompted to form a new direct link with the user who originally posted the content. By creating this link they change the structure of the social network, potentially creating shorter paths for future information flows.

One way to study the reciprocal relationship between a network and the flow of information through that network is to study the changing rates of sharing or viewing content over time. For example, we can plot the number of times per minute a tweet is retweeted. While each plot will be somewhat different, patterns emerge across a large set of them; I shall refer to these distinctive shapes as signatures. The signatures of content that spreads virally in networks tend to show a spike in sharing soon after the initial post, followed by a more gradual decline, or decay, in sharing. More gradual decays after the peak indicate viral or social sharing (Broxton, et al., 2010; Crane and Sornette, 2008): many people discover and share content in their social networks, so that the content spreads beyond their own circle of friends (or followers) to the friends of their friends’ friends (Hemsley and Mason, 2013; Nahon and Hemsley, 2013). In other words, viral content can spread farther from its source than non-viral content, and this viral reach is detectable by examining viral event signatures.

Studying the signatures of viral events may help researchers anticipate when a protest will become a movement. By examining the signatures of information flows sent out by key Occupy accounts from October 2011 to June 2012, this paper provides empirical evidence that information flows with more gradual signature decays are related to the creation of more links in the Occupy Twitter follower network. The paper also shows that key Occupy accounts gained followers quickly at the beginning of the study period, reflected in more gradual signature decays, but that the rate at which they gained followers slowed, and the signatures became steeper, as the movement progressed.

Occupy provides an excellent case for studying the relationship between information flows and the growth of a social network. Agarwal, et al. (2014) describe the Occupy Wall Street movement as a complex connective action network (see also Bennett and Segerberg, 2012; Bennett, et al., 2014). A connective action network supports collective action by employing technologies that link together users who each have their own network. The result is a network of networks in which people can share resources (e.g., information), which supports a group identity and facilitates group responsiveness to external stimuli. In other words, compared with other kinds of protest movements, technologies that support linking and sharing information were critical to Occupy’s growth. Agarwal, et al. (2014) also claim that Twitter acted as a key service for the movement’s growth, for the coordination of individuals and resources, and for linking together different networks (the network of networks). Thus, this work focuses on the information flows within the network of networks on Twitter.

The Occupy protests began on 17 September 2011 with roughly 1,000 protesters gathering on Wall Street in New York City. Protesters occupied nearby Zuccotti Park, maintaining a camp there over the following weeks. By November protesters were encamped in more than 2,000 cities around the world [1], and daily Twitter traffic containing Occupy-related hashtags (e.g., OWS, OccupySeattle, OccupyDenver) ranged between 300,000 and 1,000,000 tweets per day (Agarwal, et al., 2014).

 

++++++++++

Viral events and their signatures

Jurvetson (Jurvetson and Draper, 1997; Jurvetson, 2000), credited with coining the term viral marketing (Kirby and Marsden, 2005; Nahon and Hemsley, 2013), illustrates how the process of virality works in his study of HotMail. He wrote that soon after a single user in a tightly knit group, or cluster, adopted HotMail, others in the cluster would quickly follow suit. Then a single user in another cluster would adopt HotMail, followed by others in that new cluster. The link connecting the clusters is what Granovetter [2] referred to as a weak tie. Simply put, our weak ties are our acquaintances: old school friends, people we worked with at past jobs. By contrast, strong ties are people we interact with frequently or consider close. Work by numerous authors (Bakshy, et al., 2012; Barabási, 2003; Burt, 2004; Granovetter, 1973) suggests that information moves quickly within strong-tie clusters, but needs weak ties to move to new clusters.

Nahon and Hemsley capture these mechanics in the following definition of viral events: “Virality is a social information flow process where many people simultaneously forward a specific information item, over a short period of time, within their social networks, and where the message spreads beyond their own [social] networks to different, often distant networks, resulting in a sharp acceleration in the number of people who are exposed to the message” [3]. Importantly for this work, their definition of virality is scalable. That is, since they see virality as the process by which information spreads in social networks, a tweet does not need to be retweeted thousands or millions of times to be considered viral. Rather, a viral event is one where content is socially shared and reaches new audiences.

The phrase sharp acceleration in their definition refers to how quickly viral events spread and reach their audience. Indeed, a key part of what differentiates viral events from other kinds of information flows, in their account, is that viral events exhibit a general temporal pattern: a burst of sharing activity that culminates in a peak, followed by a gradual decay in sharing activity. The burst, or sharp acceleration, results from exponential growth in the number of users who are exposed to and share the content in social networks. A maximum, or peak, in sharing activity is quickly reached, after which the number of people who are exposed to and share the content decays. That is, the rate of growth in sharing leading up to the peak is greater than the rate of decay in sharing after the peak (see Figure 1).

 

Figure 1: Example viral event signature. Note the initial burst of sharing activity followed by a more gradual decay that eventually shows little activity.

 

Nahon and Hemsley (2013) use the term signature to capture this temporal pattern of viral information flows and suggest that the shapes of signatures (see Figure 2a) may be related to the ratio of “top-down promotional and bottom-up social processes” [4]. In their view, virality is driven by a mix of (bottom-up) individuals sharing content and (top-down) promotional forces. Promotional forces here refers to actors that have the ability to broadcast content to large numbers of users, often across many clusters in social networks, and often across disparate platforms, such as Twitter, Facebook and mainstream media channels. Huffington Post and Fox News are examples of these actors, but those with many followers, such as celebrities or individuals with credibility on a given topic, can also promote content. Nahon and Hemsley (2013) refer to these actors as gatekeepers and theorize that viral content that relies more on promotion than on bottom-up social sharing creates steeper temporal signatures.

 

Figure 2: a) Signature shapes driven by promotional or social forces; b) Example power law formula and shapes.

 

There is support for this view. Crane and Sornette (2008) studied the temporal pattern (signature) of daily views for five million YouTube videos. Using clustering techniques, they found that while roughly 90 percent of the videos received an insignificant number of views or could be temporally modeled with a Poisson distribution, the remainder displayed a burst of viewing activity followed by a power-law relaxation (see Figure 2b). That is, they were able to compare the shapes of signatures by fitting a power law from the peak of daily view activity down through the decay phase of their videos. Videos that exhibited more gradual decays (lower values of alpha, the “shape parameter” of the power law shown in Figure 2b) were associated with endogenous events: videos that originated within the YouTube user community and were spread by users sharing links to them. Crane and Sornette suggested that these endogenous videos were what we would call viral videos. By contrast, sharper decays were associated with exogenous events: videos that originated outside the community and were shared less, but still received a large number of views. An example might be a news clip from a prominent media outlet like CNN. Broxton, et al. (2010) also examined the rate of daily views for YouTube videos, looking specifically at the relationship between the rate of video views and the fraction of those views that resulted from social sharing, that is, cases where viewers came to a video because someone shared a link with them. Similar to Crane and Sornette (2008), they found that videos with a higher percentage of social sharing tended to decay from their peaks more gradually.

To see how a greater ratio of social sharing might result in signatures with more gradual decays, consider two hypothetical information flows, F1 and F2, posted by some actor A (see Figure 3). In F1, eight of A’s followers discover and re-post the content into their own social networks and the flow stops. This represents an information flow that spreads only one degree from its origin. Following Wasserman and Faust’s (1994) nomenclature I shall refer to this topology as a star, shown in Figure 3. In F2, eight of A’s followers discover and re-post the content into their own social networks. However, as we can see in Figure 3, actors J, K and L also discover and re-post the content from actors C, H and K, respectively. The result is that the information flow topology of F2 contains a star but also contains longer sharing chains.

 

Figure 3: Example information flow topology showing a star (left) and a star with a few longer sharing chains (right).

 

If we assume that it takes roughly the same amount of time for actors to discover and share content, referred to as the wait time, then F1 should look more like the ‘Promoted’ signature in Figure 2a and F2 should look more like the ‘Social’ signature. Why? In F1, we would observe the re-posts from actors B through I at about the same time, creating a single peak. In F2 we would have the same size peak, since there are the same number of first-degree re-posts, but this would be followed by re-posts from J and K, and then by a re-post from L. Since we are assuming a roughly constant wait time, and since a temporal signature shows the rate (y-axis) of shares or views over time (x-axis), F1’s signature would show eight shares at time 1 followed by a flat line at zero (Figure 4), whereas F2’s would show 8, 2 and 1 shares at times 1, 2 and 3 respectively.
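The constant-wait-time argument can be checked mechanically. The Python sketch below is illustrative only: the function name `depth_signature` and the edge-list encoding of Figure 3 are assumptions, not part of the study. It assigns each re-post to the hop at which it occurs and counts re-posts per hop, which, under a constant wait time, is the same as counting shares per time step.

```python
from collections import Counter, deque


def depth_signature(edges, source, steps=None):
    """Shares per time step for a sharing tree, assuming every hop takes
    exactly one time step (the constant wait time assumed in the text)."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    # Breadth-first search: a node's depth is its hop distance from the source.
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)

    counts = Counter(d for node, d in depth.items() if node != source)
    if steps is None:
        steps = max(counts, default=0)
    return [counts.get(t, 0) for t in range(1, steps + 1)]


# Figure 3: A's eight followers B..I re-post; in F2, J, K and L also
# re-post from C, H and K respectively.
star = [("A", follower) for follower in "BCDEFGHI"]
f1 = star
f2 = star + [("C", "J"), ("H", "K"), ("K", "L")]

print(depth_signature(f1, "A", steps=3))  # [8, 0, 0]
print(depth_signature(f2, "A", steps=3))  # [8, 2, 1]
```

Both flows share the same first-step peak; only F2 has activity after it, which is the more gradual decay described above.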

 

Figure 4: Example signatures for hypothetical information flows F1 and F2.

 

But is a constant wait time a reasonable assumption? Barabási (2005) found that wait times for responses to e-mail, online game messages, and instant messages could be approximated by a heavy-tailed, or Pareto, distribution. This means that the majority of messages are responded to very quickly and only a few have very long wait times, so the wait times from the initial post to each re-post are better treated as a random variable than as a constant. For this generalized comparative example, however, we can hold wait times constant for our two diffusion events. More generally, over a large number of signatures with many individual actors, we can assume that variation in wait times will show up as noise in models, and that signatures with more gradual decay phases reflect information flow events with more social sharing.
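Relaxing the constant-wait-time assumption can be explored by simulation. The sketch below is illustrative only: it assumes each hop adds an independent wait drawn with Python’s `random.paretovariate` (the shape of 2.0 is an arbitrary choice, not an estimate from Barabási, 2005), so a re-post’s time is the sum of the waits along its path. Averaging many runs for the F1 and F2 trees of Figure 3 gives roughly the same first-bin peak for both, with F2 carrying noticeably more activity after the peak.

```python
import random


def repost_times(edges, source, shape, rng):
    """One simulated time per re-post in a sharing tree: each hop adds an
    independent Pareto-distributed wait (paretovariate returns values >= 1)."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    times, stack = [], [(source, 0.0)]
    while stack:
        node, t = stack.pop()
        for child in children.get(node, []):
            t_child = t + rng.paretovariate(shape)
            times.append(t_child)
            stack.append((child, t_child))
    return times


def mean_signature(edges, source, shape=2.0, runs=2000, bins=6, seed=1):
    """Average number of re-posts per unit time bin over many simulated runs."""
    rng = random.Random(seed)
    totals = [0.0] * bins
    for _ in range(runs):
        for t in repost_times(edges, source, shape, rng):
            b = int(t) - 1  # waits are >= 1, so bin 0 covers [1, 2)
            if b < bins:
                totals[b] += 1
    return [total / runs for total in totals]


star = [("A", follower) for follower in "BCDEFGHI"]
f1 = star
f2 = star + [("C", "J"), ("H", "K"), ("K", "L")]

sig1 = mean_signature(f1, "A")
sig2 = mean_signature(f2, "A")
print([round(v, 2) for v in sig1])
print([round(v, 2) for v in sig2])
```

Individual runs are noisy, but the averaged shapes preserve the difference between the star and the longer sharing chains.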

When novel information reaches new audiences as a result of social sharing, some of these audience members will find it interesting enough that they will opt to follow the source of the information. In support of this, Teng, et al. (2012) found that novel information is related to new follower links in Twitter discussion networks. Thus, we would expect that long chain diffusion events, as indicated by more gradual decays of the event signatures, will be related to greater increases in the number of followers of the individual who sent out the original tweet.

Note that tracking the growth in the number of people who follow key actors can help us understand how collective action networks like Occupy grow over time. On Twitter, users can opt to follow other users; when they do, the tweets of the person they follow are streamed to their Twitter home page. Kwak, et al. (2010) crawled the entire Twitter site over the summer of 2009 and characterized the Twitter follower network. At that time there were nearly 42 million users with 1.47 billion links. The distribution of the number of followers that users have follows a power law, except at the upper end, where a small number of Twitter accounts have more followers than a power law would predict (e.g., the New York Times had more than one million followers even as early as 2009). A power-law (like) distribution means that the vast majority of users have few followers and a small minority have many. In the context of this network, a user with 1,000 followers is rather popular. Users with tens of thousands of followers are unusual and might act as gatekeepers (Nahon, 2005; Shoemaker, et al., 2010; Shoemaker and Vos, 2009) or network hubs (Barabási, 2003; Wasserman and Faust, 1994) that exercise some level of control over the flow of information and, importantly, link together disparate networks. Thus, active user accounts with high numbers of followers can act as the hubs that link together the network of networks that is a hallmark of connective action networks (Bennett and Segerberg, 2012).

Studying changes in the number of people who follow key accounts as a method for looking at the evolution of Occupy’s connective action network also represents a departure from the work of others who have examined the growth of the Occupy movement by measuring the volume of tweets containing Occupy-related hashtags (#OWS, #OccupyOakland, #OccupySeattle) and URLs (Agarwal, et al., 2014), or by linking the volume of tweets with the hashtag #OccupyWallStreet to demonstrations on the ground, showing the movement’s shift from local to national to global (Tremayne, 2014). Indeed, many researchers have linked high volumes of tweets to ground-based activities (Caren and Gaby, 2011; Conover, Ferrara, et al., 2013; Nahon, et al., 2013; Tan, et al., 2013). Others (Conover, Davis, et al., 2013; Hemsley and Eckert, 2014) have examined the growth of the Occupy network as measured by @mention networks, that is, by measuring how many times different actors @mention each other on Twitter. But this, again, is fundamentally a measure of tweet volume.

Interestingly, the Occupy follower network still exists, while the volume of Occupy tweets has dwindled to a trickle [5]. The OccupyWallStreet [6] Twitter account, one of the key Occupy accounts during the height of the movement, has 210,000 followers and OccupySeattle [7] has over 14,000 followers. Both of these accounts still feature tweets almost daily. Other city-based Occupy accounts, however, are quiescent: OccupyKansas [8] has 7,000 followers but hasn’t sent out a tweet since May of 2013. During the height of the movement, these city-based user accounts, with their massive followings, acted as hubs in the Occupy connective action network. Examining the growth of their followers can give us valuable insights into the formation of collective action networks.

Since viral events with more gradual decays are associated with more social sharing, I expect that more gradual decays are related to higher gains in followers by these key actors. The rate of decay of a signature is measured by fitting a power law and estimating its shape parameter, alpha; lower values of alpha reflect more gradual decays. This leads to our first research question, stated as a hypothesis:

H1: There is a negative relationship between alpha and the change in followers during information flows initiated by the Occupy city accounts.

While studying the growth of followers of key accounts will give us insight into the growth of the Occupy collective action network, additional insights might be gained by contrasting the growth phase with the decline phase. Agarwal, et al. (2014) also show that during the decline of the movement, measured by the volume of tweets, the types of links being shared and retweeted within the protestor network changed from news stories and external links to links to Occupy-related Web sites. Interestingly, even though the movement appeared to be in decline, this content might still have spread widely within the Occupy network, where users would have shared it because it resonated with them. Nahon and Hemsley (2013) suggest that one motivation for sharing content is that it resonates with a user: they feel strongly about the content and want others to see it. Conover, Ferrara, et al. (2013) also studied the decline by tracking the initial growth and subsequent decline in the volume of tweets of key, highly followed Twitter users. Over time, these users appeared to lose interest in the movement, as they stopped tweeting about it. These authors all examine the growth and decline of the movement in terms of the volume of tweets, focusing on hashtags, specific users’ tweets, or the flow of URLs as resources.

Based on this discussion we can expect to detect a qualitative difference in the rate with which our Occupy accounts gain followers between the early part of the movement and later in the movement.

H2: Holding all else equal, there is a significant decline in the gain in followers of Occupy city accounts later in the lifecycle of the Occupy movement.

 

++++++++++

Data and analysis

Twitter data related to Occupy is analyzed using inferential statistics and exploratory data analysis to address the research questions.

Data

The data for this analysis is drawn from a large corpus of tweets related to the Occupy movement. A team of researchers collected tweets from Twitter’s streaming application programming interface (API) from 19 October 2011 to 7 June 2012. The researchers curated a list of keywords that included hashtags, Occupy city accounts and other words found to be used by the occupiers. These efforts resulted in a dynamic list of search terms that stabilized at 355 terms early in the collection effort. The entire corpus includes 64,298,061 tweets sent from 12,159,856 users. This study uses a subset of these tweets, as described below.

I operationalize the concept of a viral event by grouping together retweets of a given tweet into a retweet event (RTE). That is, this work assumes that a retweet represents a flow of information from the user who sent the initial tweet to the user(s) who retweeted the tweet. The metadata associated with retweets provides the information needed to construct the RTE signature and to track the change in followers of the actor who posted the initial tweet. The signature uses the timestamp of when the retweet was posted. Note that the initial tweet is embedded in each retweet and provides data about the user, including the number of followers they have at the time of the retweet. While manual retweets, modified tweets (MT) and vias can all be conceptualized as the same information flow, they are excluded from this study because the metadata of these tweets do not contain the embedded initial tweet, and thus, the updated follower information is not present.
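Grouping retweets into RTEs amounts to a group-by on the embedded original tweet. A minimal sketch, assuming simplified retweet records with hypothetical field names (`orig_id`, `orig_user`, `orig_followers`, `ts`) rather than raw Twitter JSON, a hypothetical account name in the sample, and a net change in followers computed as the last embedded follower count minus the first (the text does not spell out this calculation):

```python
from collections import defaultdict
from datetime import datetime


def build_rtes(retweets):
    """Group retweets into retweet events (RTEs) keyed by the original tweet.

    Each record is a simplified, hypothetical dict:
      orig_id        -- ID of the tweet that was retweeted
      orig_user      -- account that posted the original tweet
      orig_followers -- that account's follower count, as embedded in this
                        retweet (i.e. at the time of the retweet)
      ts             -- datetime at which the retweet was posted
    """
    groups = defaultdict(list)
    for rt in retweets:
        groups[rt["orig_id"]].append(rt)

    rtes = {}
    for orig_id, rts in groups.items():
        rts.sort(key=lambda r: r["ts"])
        rtes[orig_id] = {
            "user": rts[0]["orig_user"],
            "size": len(rts),
            # Assumed definition: last observed count minus first observed count.
            "follower_change": rts[-1]["orig_followers"] - rts[0]["orig_followers"],
            "timestamps": [r["ts"] for r in rts],
        }
    return rtes


sample = [
    {"orig_id": "t1", "orig_user": "OccupyExample", "orig_followers": 1000,
     "ts": datetime(2011, 11, 15, 12, 1)},
    {"orig_id": "t1", "orig_user": "OccupyExample", "orig_followers": 1012,
     "ts": datetime(2011, 11, 15, 12, 9)},
    {"orig_id": "t1", "orig_user": "OccupyExample", "orig_followers": 1004,
     "ts": datetime(2011, 11, 15, 12, 3)},
    {"orig_id": "t2", "orig_user": "OccupyExample", "orig_followers": 1020,
     "ts": datetime(2011, 11, 16, 8, 0)},
]
rtes = build_rtes(sample)
print(rtes["t1"]["size"], rtes["t1"]["follower_change"])  # 3 12
```

Sorting by timestamp matters because retweets can arrive out of order; RTE "t2" is an RTE of size 1, the most common size in the corpus.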

When plotted, the signatures of RTEs show the volume of retweets per minute. Figure 5 provides an illustration of a parameterized model of a signature. Note that time is measured in minutes along the x-axis and the rate of retweets is captured on the y-axis. The peak rate of sharing occurs 0 or more minutes after the initial tweet was posted. The rate of decay is measured as the estimated shape parameter, alpha, of a power law ($y = 1/x^{\alpha}$) fit to the curve drawn from the peak onwards. When the rate of retweets falls sharply, alpha is higher than when the rate of retweets decays more gradually.
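The quantities in the signature model can be read off a per-minute histogram of retweet times. A minimal sketch, assuming retweet timestamps are available as `datetime` objects and that the earliest of several equal maxima is taken as the peak (an assumption; ties are not discussed in the text). The helper names are hypothetical.

```python
from datetime import datetime, timedelta


def signature(initial_post, retweet_times):
    """Retweets per minute; index 0 is the first minute after the initial post."""
    minutes = [int((t - initial_post).total_seconds() // 60) for t in retweet_times]
    counts = [0] * (max(minutes) + 1)
    for m in minutes:
        counts[m] += 1
    return counts


def describe(counts):
    """Peak time, peak rate, and the decay segment a power law is fit to."""
    peak_time = max(range(len(counts)), key=counts.__getitem__)  # earliest maximum
    return {
        "peak_time": peak_time,          # minutes from initial tweet to peak
        "peak_rate": counts[peak_time],  # retweets during the peak minute
        "decay": counts[peak_time:],     # curve from the peak onwards
    }


t0 = datetime(2011, 11, 15, 12, 0)
offsets = [30, 72, 90, 108, 126, 240]  # seconds after the initial tweet
sig = signature(t0, [t0 + timedelta(seconds=s) for s in offsets])
print(sig)            # [1, 3, 1, 0, 1]
print(describe(sig))  # {'peak_time': 1, 'peak_rate': 3, 'decay': [3, 1, 0, 1]}
```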

 

Figure 5: Signature model shows the phases of its life cycle (ramp-up, peak, and decay phase) and quantifiable characteristics (peak time, peak rate, and its shape given by alpha).

 

Retweets posted more than 120 minutes after the peak are dropped from the analysis. This 120-minute window was selected because 90 percent of retweets have been found to happen within the first 60 minutes after the initial tweet (Kwak, et al., 2010). When retweets do occur outside the 120-minute window, they tend to be isolated events hours, days or weeks after the initial post. Our Occupy users can gain followers at any time due to factors outside this analysis; for the purpose of this study, we assume that follower gains not related to information flows are randomly distributed in time. Thus, including isolated retweets that occur after long periods of quiescence could inflate the number of followers gained during an RTE. The 120-minute window has the additional benefit of creating a uniform set of signatures for comparison.
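The window rule is a simple filter on retweet times. A sketch with a hypothetical helper name, with retweet times expressed in minutes since the initial tweet; it shows how a single isolated retweet two days later would otherwise stretch an RTE from about two hours to two days, and with it the period over which follower gains are counted.

```python
def trim_to_window(minutes, peak_time, window=120):
    """Keep retweets posted no more than `window` minutes after the peak.

    `minutes` are retweet times in minutes since the initial tweet."""
    cutoff = peak_time + window
    return [m for m in minutes if m <= cutoff]


retweet_minutes = [0, 1, 1, 2, 5, 30, 119, 121, 2880]  # 2880 = two days later
kept = trim_to_window(retweet_minutes, peak_time=1)
print(kept)                             # [0, 1, 1, 2, 5, 30, 119, 121]
print(max(retweet_minutes), max(kept))  # 2880 121
```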

The visual, parameterized model in Figure 5 represents the (roughly) average pattern found in the empirical data used for this study. During the initial phases of this study, exploratory data analysis (EDA), an approach championed by John Tukey (1980, 1977, 1962), revealed fairly consistent patterns in RTE signatures. EDA is an approach to gaining insight about data, a kind of quantitative detective work that emphasizes flexibility of viewpoint and a willingness to find what we expect as well as what we don’t. Tukey and others (Hoaglin, et al., 1983) focused on graphical representations of data, tables, and summary statistics as the tools of EDA. Figure 6 plots the signatures of 1,000 randomly drawn RTEs from the dataset, showing, for each minute after the initial tweet, the mean, median, and first and third quartiles of the retweet rate.
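The per-minute summaries of the kind plotted in Figure 6 can be computed with Python’s standard `statistics` module. A minimal sketch, assuming each signature is a list of retweet counts per minute since the initial tweet and that minutes beyond a signature’s end count as zero (an assumption; the text does not say how signatures of unequal length were aligned). The toy signatures are made up for illustration.

```python
import statistics


def summarize(signatures):
    """Per-minute mean, median and quartiles across many RTE signatures.

    Signatures are lists of retweets per minute since the initial tweet;
    shorter ones are padded with zeros. Needs at least two signatures,
    since statistics.quantiles requires two or more data points."""
    length = max(len(s) for s in signatures)
    rows = []
    for m in range(length):
        values = [s[m] if m < len(s) else 0 for s in signatures]
        q1, _, q3 = statistics.quantiles(values, n=4)
        rows.append({
            "minute": m,
            "mean": statistics.fmean(values),
            "q1": q1,
            "median": statistics.median(values),
            "q3": q3,
        })
    return rows


sigs = [[1, 3, 1], [2, 5], [0, 1, 1, 1], [4, 2, 0]]
for row in summarize(sigs):
    print(row)
```

Reporting the median and quartiles alongside the mean matters here, because a few very large RTEs can pull the per-minute mean well above the typical signature.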

 

Figure 6: The signatures of 1,000 randomly drawn RTEs show a great deal of variance, but also some consistent patterns. The RTE signature model is drawn from this exploratory work.

 

The initial dataset contains 12,349,759 retweets that make up 4,002,284 distinct RTEs. The size of RTEs follows a heavy tailed distribution, such that 2,777,447 RTEs are of size 1; 531,144 RTEs are of size 2; 219,512 of size 3 and so on. RTEs with a decay phase that can’t be fit to a power law are dropped as are RTEs not initiated by an Occupy account, the hubs of Occupy’s connective action network. After this narrowing of data, 429 RTEs made up of a total of 76,419 tweets/retweets, initiated by 86 distinct Occupy user accounts, are used for the analysis.

Model

We test hypotheses 1 and 2 using a multivariate regression model. Such models have long been used in studies of diffusion (Bass, 1969), of the relationship between network centrality measures and information flows (Susarla, et al., 2012), and of the life cycles of viral events (Nahon, et al., 2011). Multivariate regression models are well suited to explaining or predicting the relationship between a dependent variable and independent (explanatory) variables while controlling for other independent variables (Faraway, 2006, 2005; Kahane, 2001; Ott, 1993). Thus, beyond the variables identified in the signature model in the previous section, we can include variables that control for, and thus provide additional information about, the context in which a given RTE occurred, such as whether the RTE occurred early or late in the Occupy movement. Another important strength of a variance model is that although it assumes that relationships between variables are linear, non-linear relationships can still be assessed through variable transformations (Kahane, 2001). The use of transformations also means that variables are not required to be continuous or to follow a normal distribution (Faraway, 2006). These are important considerations because, as the variable descriptions below show, many of the variables do not follow a normal distribution.

Figure 7 provides a graphic illustration of the regression model. The independent variables, detailed below, are broken into three groups: 1) alpha, the shape of the decay phase and the variable tested in H1; 2) RTE Factors are intended as control variables related directly to the RTE; 3) Temporal Context includes variables related to when the RTE occurred and includes the variable tested in H2.

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7^2 + \beta_8 x_8 + \varepsilon$$

 

Figure 7: Graphical version of regression model with variable groups and their specific variables.

 

Changes in followers (dependent variable): Net change in followers, during the RTE, of the Occupy account that initiated the RTE (sent the tweet that was retweeted). Due to the skewness of this variable, the Box-Cox method (Faraway, 2005) was employed to estimate the optimal power transformation (λ = 0.19) for this model. The transformation brings the variable closer to a normal distribution and resolves a problem with non-constant variance in the residuals. Importantly, transforming the dependent variable makes the effect sizes of the independent variables’ coefficients difficult to interpret. As such, the interpretation of findings consists of noting the significant variables and interpreting the direction of each relationship.
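For readers unfamiliar with the Box-Cox method, the sketch below shows the transform and a grid search over its profile log-likelihood. It is a simplification: it profiles lambda for an intercept-only model, whereas the study estimated lambda for the full regression (Faraway, 2005), and it requires strictly positive values, so any zero or negative follower changes would first need a shift (an assumption; the text does not say how such cases were handled). The function names are hypothetical.

```python
import math


def boxcox(y, lam):
    """Box-Cox power transform of strictly positive values."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def boxcox_loglik(y, lam):
    """Profile log-likelihood of lambda for an intercept-only normal model."""
    z = boxcox(y, lam)
    n = len(z)
    mean = sum(z) / n
    rss = sum((v - mean) ** 2 for v in z)
    jacobian = (lam - 1) * sum(math.log(v) for v in y)
    return -n / 2 * math.log(rss / n) + jacobian


def best_lambda(y, grid=None):
    """Grid value (default -2.00 to 2.00, step 0.01) maximizing the likelihood."""
    grid = grid if grid is not None else [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: boxcox_loglik(y, lam))


# Log-normal-looking toy data: the log transform (lambda = 0) should win.
toy = [math.exp(v) for v in (-2, -1, -0.5, 0, 0.5, 1, 2)]
print(best_lambda(toy))
# A fixed estimate such as the study's would be applied as boxcox(values, 0.19).
```

A lambda of 0.19 sits between the log transform (0) and no transform (1), which is consistent with a dependent variable that is strongly, but not log-normally, skewed.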

Alpha: Variable for H1. Lower alpha values (more relaxed decay) are expected to be related to higher gains in followers. As per Clauset, et al. (2009), the estimation of alpha employs a maximum likelihood algorithm that optimizes for the best p-value for a Kolmogorov-Smirnov distribution test (Marsaglia, et al., 2003) between the fitted distribution and a given sample.
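A compact version of the Clauset, et al. (2009) procedure is sketched below. Several assumptions are labeled here rather than taken from the study: the decay curve is represented by the retweet times themselves (for example, minutes after the peak plus one, so the peak minute is x = 1), which follow a density proportional to x^(-alpha) when the per-minute rate does; the continuous maximum likelihood estimator is used instead of the discrete one; x_min is chosen by minimizing the Kolmogorov-Smirnov distance rather than by a bootstrapped p-value; and the 120-minute truncation is ignored. The final lines only check the estimator on synthetic Pareto data with a known exponent.

```python
import math
import random


def fit_alpha(x, min_tail=10):
    """Power-law exponent alpha for samples x (density ~ x**-alpha, x >= x_min).

    For each candidate x_min, alpha is the continuous maximum likelihood
    estimate 1 + n / sum(ln(x_i / x_min)); the x_min whose fit has the
    smallest Kolmogorov-Smirnov distance to the data is kept.
    Returns (alpha, x_min, ks_distance), or None if no candidate qualifies."""
    x = sorted(x)
    best = None
    for xmin in sorted(set(x)):
        tail = [v for v in x if v >= xmin]  # still sorted ascending
        n = len(tail)
        if n < min_tail:
            break  # later candidates only have smaller tails
        log_sum = sum(math.log(v / xmin) for v in tail)
        if log_sum == 0:
            continue  # all values equal x_min; alpha is undefined
        alpha = 1 + n / log_sum
        ks = 0.0
        for i, v in enumerate(tail):
            model_cdf = 1 - (v / xmin) ** (1 - alpha)
            ks = max(ks, abs((i + 1) / n - model_cdf), abs(i / n - model_cdf))
        if best is None or ks < best[2]:
            best = (alpha, xmin, ks)
    return best


# Sanity check on synthetic data: paretovariate(1.5) has density ~ x**-2.5.
rng = random.Random(7)
synthetic = [rng.paretovariate(1.5) for _ in range(1000)]
alpha_hat, xmin_hat, ks_hat = fit_alpha(synthetic, min_tail=200)
print(round(alpha_hat, 2), round(xmin_hat, 2))
```

Lower estimates of alpha correspond to the more gradual decays that H1 associates with larger follower gains.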

peak (max) rate: Highest rate of retweets. Skewed variable is log transformed. A higher peak means more people are simultaneously retweeting, resulting in a greater potential for more retweets in the next minute. A positive relationship is expected: more retweets ought to lead to more followers, all else equal.

peak time: Minutes from initial tweet to peak retweeting. Variable included to control for novelty effects (Teng, et al., 2012). Negative relationship expected: tweets that take longer to reach a peak may be less novel, and thus attract fewer followers.

initial follower count (init followers): Number of followers of the RTE initiating Occupy account at the time of the initial post. More followers create more opportunities for the initial tweet to be discovered and retweeted (Suh, et al., 2010), possibly triggering new follows. Expecting a positive relationship. Variable is log-transformed.

mean followers: Mean number of followers of users who retweet the initial tweet. For this data the number of followers is highly skewed and thus median might seem like the logical choice, but mean has the advantage of capturing cases where a user with a massive number of followers retweets the initial tweet. Expecting a positive relationship (Suh, et al., 2010). Variable is log-transformed.

occupy day: Variable for H2. Number of days from the start of the Zuccotti Park protest on 17 September 2011 to the peak of the RTE. The relationship with the dependent variable is curved, so the variable enters the model as both a linear and a quadratic term (Faraway, 2005). A negative coefficient on the linear term is expected, indicating a decrease in the growth of followers over time. A positive coefficient on the quadratic term indicates a convex curve, in which follower gains fall most steeply early in the movement and the decline levels off later.
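For reference, the shape implied by the two occupy day terms follows from differentiating the fitted model. Writing $d$ for occupy day and $\beta_d$, $\beta_{d^2}$ for its linear and quadratic coefficients (notation introduced here for illustration, with all other predictors held fixed):

```latex
\hat{Y} = \dots + \beta_d\, d + \beta_{d^2}\, d^2 + \dots,
\qquad
\frac{\partial \hat{Y}}{\partial d} = \beta_d + 2\beta_{d^2}\, d,
\qquad
\frac{\partial^2 \hat{Y}}{\partial d^2} = 2\beta_{d^2}
```

With $\beta_d < 0$ and $\beta_{d^2} > 0$, the slope is negative early in the movement and rises toward zero as $d$ grows, so the fitted decline is steepest early and flattens later; the slope reaches zero at $d^{*} = -\beta_d / (2\beta_{d^2})$. Because the Box-Cox transform is monotone increasing, the direction of these effects carries over to the untransformed change in followers.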

Overlap: Controls for cases where an Occupy user initiated a second (or more) RTE during the window of the first RTE. Without this control, overlap could inflate the change in followers for RTEs. Square root transformation, as is typical for count variables (Faraway, 2005).
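The variable definitions above translate into a design matrix in a few lines. A minimal sketch using NumPy’s least-squares solver; the dictionary keys are hypothetical, logs assume strictly positive values, and the column order follows the variable list rather than Figure 7. The data generated at the end are synthetic and noise-free, used only to confirm that known coefficients are recovered; they are not the study’s data or estimates.

```python
import math

import numpy as np


def design_row(rte):
    """Predictors for one RTE, transformed as in the variable definitions."""
    day = rte["occupy_day"]
    return [
        1.0,                              # intercept
        rte["alpha"],                     # decay shape (H1)
        math.log(rte["peak_rate"]),       # log peak rate
        rte["peak_time"],                 # minutes to peak
        math.log(rte["init_followers"]),  # log initial follower count
        math.log(rte["mean_followers"]),  # log mean retweeter followers
        day,                              # occupy day, linear term (H2)
        day ** 2,                         # occupy day, quadratic term (H2)
        math.sqrt(rte["overlap"]),        # square root of overlap count
    ]


def fit_ols(rtes, y):
    """Ordinary least squares coefficients (intercept first)."""
    X = np.array([design_row(r) for r in rtes], dtype=float)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


# Synthetic, noise-free check: the fit should recover the coefficients used.
rng = np.random.default_rng(0)
synthetic = [
    {
        "alpha": rng.uniform(0.5, 3.0),
        "peak_rate": rng.integers(1, 60),
        "peak_time": rng.integers(0, 30),
        "init_followers": rng.integers(500, 50000),
        "mean_followers": rng.uniform(50, 5000),
        "occupy_day": rng.integers(32, 264),
        "overlap": rng.integers(0, 4),
    }
    for _ in range(50)
]
true_beta = np.array([2.0, -0.8, 0.5, 0.01, -0.3, 0.4, -0.02, 0.00004, 0.6])
y = np.array([design_row(r) for r in synthetic]) @ true_beta
print(np.round(fit_ols(synthetic, y), 5))
```

In real use, `y` would be the Box-Cox-transformed change in followers, and standard errors, t-values and confidence intervals would come from a full regression routine rather than from `lstsq` alone.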

 

++++++++++

Regression results

The R² of the regression model was 0.822, indicating that the model explained 82 percent of the variance in the (transformed) change in followers. The F statistic for the model was 242.1 on 8 and 420 degrees of freedom (p < 0.001), indicating that the model is significant, that is, that at least one coefficient is non-zero. Table 1 contains the coefficient estimates, the confidence interval (2.5 percent and 97.5 percent bounds), standard error, t-value, and p-value for each coefficient estimate. Confidence intervals are important checks when the dependent variable is transformed (Faraway, 2005).
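The reported R² and F statistic can be cross-checked with the standard identity F = (R²/k) / ((1 - R²)/(n - k - 1)), with k = 8 predictors and n = 429 RTEs, which gives the 420 residual degrees of freedom. Using the rounded R² of 0.822 gives about 242.4; conversely, an F of 242.1 corresponds to an R² of about 0.8218, so the two reported figures agree up to rounding. The helper name is hypothetical.

```python
def f_from_r2(r2, k, n):
    """Overall regression F statistic from R^2, k predictors and n observations."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


f_stat = f_from_r2(0.822, k=8, n=429)  # 8 and 420 degrees of freedom
print(round(f_stat, 1))  # 242.4
```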

 

Table 1: Regression model results.

 

The hypothesis associated with research question 1 is supported: a negative relationship exists between the change in followers and alpha. This indicates that information flow signatures with more gradual decay phases are related to higher gains in followers, all else equal. Recall that more gradual decays are associated with more social sharing (Broxton, et al., 2010; Crane and Sornette, 2008), implying longer sharing chains that bring these Occupy users’ posts to distant parts of the network, where people may find the content novel and create new links in the network. This finding is also consistent with Granovetter’s (1973) theory of weak ties. Holding all else equal, higher levels of sustained retweeting (more gradual decays) ought to provide more opportunities for the message to cross weak-tie bridges to new clusters in the viral process described by Nahon and Hemsley (2013). Interestingly, this also implies that sharper signatures will tend to be associated with smaller gains in followers and may not flow as far from the source, owing to less social sharing. More generally, the fact that a relationship exists between the signature’s shape and gains in followers provides empirical support for the notion that the flow of information can alter the linking structure of networks through users’ choices to make new following connections.

The hypothesis associated with research question 2 is also supported: a negative relationship exists between the change in followers and the number of days into the Occupy movement. The positive coefficient of the quadratic term indicates that gains in followers not only declined over the course of the movement, but that the decline was steepest early in the period studied and leveled off later. This is consistent with the work of Agarwal, et al. (2014), who show that during the decline of the movement the volume of tweets fell and the links shared and retweeted within the protester network shifted from news stories and other external links to links to Occupy-related Web sites. This shift to internal content may have been less novel and thus less likely to spread along long sharing chains that reached new audiences.
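The reading of the quadratic term follows from the slope of the fitted curve. If the Occupy Day terms enter the model as β₁·day + β₂·day², the marginal change in the (transformed) outcome per additional day is β₁ + 2β₂·day. With β₁ < 0 and β₂ > 0, that slope is most negative at the start of the period and moves toward zero as days pass. The coefficients in this sketch are hypothetical placeholders chosen only to show the shape; they are not the Table 1 estimates.

```python
def day_slope(beta_day: float, beta_day_sq: float, day: float) -> float:
    """Derivative of beta_day * day + beta_day_sq * day**2 with respect to day."""
    return beta_day + 2.0 * beta_day_sq * day


# Hypothetical coefficients: negative linear term, positive quadratic term.
beta_day, beta_day_sq = -0.02, 0.0001
for day in (0, 30, 60, 90):
    # The slope stays negative but shrinks in magnitude as the days increase.
    print(day, round(day_slope(beta_day, beta_day_sq, day), 4))
```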

Note that, with the exception of Peak Time, all of the control variables in the model were significant, though not all in the expected direction. Peak Rate and Mean Followers were both positively related to the change in followers, whereas Initial Followers was negatively related. That is, RTEs with higher peaks, and RTEs whose participants had more followers, correspond with greater increases in followers. Overlap was also significant and positive, indicating that larger increases in followers were detected when RTEs overlapped. Interestingly, Occupy users who already had higher follower counts tended to gain fewer new followers when other variables were held constant.

With respect to the validity of the regression model as an adequate statistical representation of the data, diagnostic plots indicate that the residuals have roughly constant variance and follow a nearly normal distribution. A Durbin-Watson test statistic of 1.85 (p = 0.114) indicated no significant autocorrelation in the residuals (Ott, 1993). All of the variables had a variance inflation factor (VIF) of less than four, indicating that the independent variables were not strongly correlated with one another; thus, multicollinearity was not an issue (Kahane, 2001).
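Both diagnostics are simple functions of quantities from a fitted model. The Durbin-Watson statistic divides the sum of squared differences between successive residuals by the residual sum of squares; values near 2 suggest little first-order autocorrelation. The variance inflation factor for predictor j is 1 / (1 − R²ⱼ), where R²ⱼ comes from regressing that predictor on all the other predictors. The minimal pure-Python sketch below uses made-up residuals for its usage lines; it is not the code used in this study.

```python
def durbin_watson(residuals: list[float]) -> float:
    """Sum of squared successive residual differences over the residual sum of squares."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def vif(r2_j: float) -> float:
    """Variance inflation factor for a predictor whose auxiliary regression has R² = r2_j."""
    return 1.0 / (1.0 - r2_j)


# Made-up residual sequences, for illustration only.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: strong negative autocorrelation
print(durbin_watson([0.5, 0.4, 0.6, 0.5]))    # near 0: strong positive autocorrelation
# A VIF below four corresponds to an auxiliary R² below 0.75.
print(vif(0.75))  # 4.0
```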

 

++++++++++

Discussion: Viral network growth

The work thus far shows that there is a relationship between information flow and the growth in the number of followers of the Occupy users, and that signatures with more gradual decays are associated with higher gains in followers. I have argued that these gradual decays reflect more social sharing, which brings novel information to new clusters of people who then form new follower relationships. This leads to two questions: what is the nature of this social sharing, and how can signatures give us insight into the growth of the Occupy connective action network? Answering these questions is the goal of this section.

The gradual decay of signatures is likely related to many short sharing chains rather than one or a few very long chains. In support of this, research has shown that the length of information flow chains in recommender networks (Leskovec, et al., 2006), the blogosphere (Leskovec, et al., 2007), and Twitter (Bakshy, et al., 2011; Kwak, et al., 2010) follows a power law in which the vast majority of chains include two or fewer links. Huberman and Adamic (2004) offer an explanation for this phenomenon: there is an upper limit to how far messages can flow from the source, because actors tend to be more dissimilar in their content choices the farther apart they are in a network. Thus, there is a kind of resistance to the flow of information in networks that inhibits long chains. Yet the OccupyWallStreet Twitter account has 210,000 followers as of this writing, indicating that over the course of the movement these actors did reach new audiences.
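A short calculation shows why a power law implies that most chains are short. Assuming, purely for illustration, that chain lengths k = 1, 2, 3, … follow a discrete power law P(k) ∝ k^(−a) with a hypothetical exponent a = 3 (not an estimate from the cited studies), the share of chains with two or fewer links is already above 90 percent.

```python
def share_at_most(max_len: int, exponent: float, cutoff: int = 100_000) -> float:
    """Share of chains with length <= max_len under P(k) proportional to k**(-exponent).

    The normalizing sum is truncated at `cutoff`, which is accurate for exponent > 1.
    """
    weights = [k ** (-exponent) for k in range(1, cutoff + 1)]
    return sum(weights[:max_len]) / sum(weights)


print(round(share_at_most(2, exponent=3.0), 3))  # 0.936
```

Smaller exponents put more weight in the tail, so the exact share depends on the assumed exponent; the qualitative point, that short chains dominate, holds for the steep power laws reported in the literature cited above.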

The findings from the regression, coupled with the observation that sharing chains tend to be short, suggest that the Occupy accounts grew their follower audiences in waves. That is, each new set of followers gained as a result of an RTE flowing over short sharing chains would bring the Occupy account within reach of a new set of potential followers for the next information flow. Figure 8 illustrates this. In panel ‘a’, an RTE initiated by an Occupy account (red) is retweeted by followers (blue) to their followers, and so on (dashed lines and nodes). The resistance to long chains means the flow is shared only within a few degrees of separation, or links, from the source. This is enough to bring novel content to new users who may initiate a new follower relationship. In panel ‘b’, the new followers bring additional users within reach of short sharing chains, which can again result in new followers, shown in the final panel, ‘c’. In panel ‘a’, the largest number of links between the Occupy account and any other user, referred to as the path length, is 6. After a few short-chain information flows, panel ‘c’ shows that the longest such path is now only 3.
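The path-length argument of Figure 8 can be reproduced on a toy network with a breadth-first search. The sketch below uses a hypothetical directed chain of seven users, with edges pointing from an account to its followers (the direction in which retweets flow). When user 4, reached by an earlier flow, becomes a direct follower of the source account 0, the largest number of links between the source and any reachable user falls from 6 to 3. The network is invented for illustration and is not drawn from the Occupy data.

```python
from collections import deque


def max_reach_distance(follower_edges: dict[int, list[int]], source: int) -> int:
    """Largest breadth-first hop count from source to any reachable user."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in follower_edges.get(node, []):
            if nxt not in dist:  # first visit is the shortest path in BFS
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


# Hypothetical toy network: account -> list of its followers.
before = {0: [1], 1: [2], 2: [3], 3: [4], 4: [5], 5: [6]}
after = {k: list(v) for k, v in before.items()}
after[0].append(4)  # user 4 becomes a direct follower of the source account 0

print(max_reach_distance(before, 0))  # 6
print(max_reach_distance(after, 0))   # 3
```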

 

Figure 8: The Occupy account’s (red) reach grows with each successive information flow. Panel a) shows an information flow from the Occupy account, through its followers and beyond. In panel b), new followers gained from the last flow can extend the reach for the next one. Panel c) shows the final state after two RTEs that have resulted in new followers for the Occupy account.

 

It is the followers of the Occupy account’s followers, rather than its direct followers (who already follow it), that supply new followers. But the set of accounts following the Occupy accounts was continually in flux: it grew as new tweets were posted, bringing new followers who extended the reach of the Occupy accounts in successive waves.

The short sharing chains of RTEs are consistent with the definition of a viral event provided by Nahon and Hemsley (2013): an “information flow process where many people simultaneously forward a specific information item, over a short period of time, within their social networks, and where the message spreads beyond their own [social] networks to different, often distant networks ...” [9]. The process by which the Occupy accounts gained followers is thus a result of, or a part of, the viral process, at least in the Occupy connective action network. The social sharing of a viral event not only brings novel information to new users but, depending on who retweeted the information, may also re-contextualize it and make it relevant for a new set of users (Nahon and Hemsley, 2013). This re-contextualization effectively overcomes the limit, identified by Huberman and Adamic (2004), on how far messages may flow from the source.

The results of the regression also suggest that this process slowed over time for the Occupy accounts. Because the relationship between gains in followers and the Occupy Day variable was quadratic, the follower growth of these gatekeeper accounts appears to have followed an arc from fast to slow over the course of the Occupy movement. That is, as the Occupy accounts gained followers, they began to approach an upper limit. What we may be seeing in the slowdown, then, is a saturation of the network: the pool of available susceptible individuals likely to find the content interesting enough to follow the Occupy account was being depleted. This calls to mind Rogers’ (1995) theory concerning the rate at which new ideas and inventions diffuse. Rogers’ S-curve, illustrated in Figure 9, has various stages of adoption. The data investigated for this study do not include the first month of the Occupy movement, so we are unable to examine the initial growth rate of the Occupy network gatekeepers. However, the findings roughly fit the curve from the inflection point near the early adopters, through the early majority, to the late adopters.
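Rogers’ S-curve is often approximated by a logistic function. The sketch below uses that swapped-in functional form, which was not fitted in this study, to show why gains shrink after the inflection point. For cumulative adopters N(t) = K / (1 + e^(−r(t − t₀))), the per-period gain is r·N·(1 − N/K), which peaks at the inflection point t₀, where N = K/2, and falls as N approaches the ceiling K. The values of K, r, and t₀ are hypothetical.

```python
import math


def logistic(t: float, K: float, r: float, t0: float) -> float:
    """Cumulative adopters under a logistic S-curve with ceiling K, rate r, inflection t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def logistic_gain(t: float, K: float, r: float, t0: float) -> float:
    """Instantaneous growth rate dN/dt = r * N * (1 - N / K)."""
    n = logistic(t, K, r, t0)
    return r * n * (1.0 - n / K)


# Hypothetical parameters: ceiling of 200,000 followers, inflection at day 30.
K, r, t0 = 200_000.0, 0.1, 30.0
for day in (30, 60, 90, 120):
    # Cumulative followers keep rising while the daily gain shrinks.
    print(day, round(logistic(day, K, r, t0)), round(logistic_gain(day, K, r, t0)))
```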

 

Figure 9: Rogers’ graph showing the rate of diffusion of innovations.

 

By looking at the changing rate of decay of signatures over time for a given user or set of users, we may be able to identify the inflection point at which a connective action network is about to take off, as well as the point at which the network is beginning to saturate and growth is slowing. Studying signatures may thus give us a means of predicting the ultimate reach of a connective action network on Twitter or other social media sites. Indeed, movements and trending topics may themselves have larger signatures worth studying and comparing.
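The paragraph above proposes reading the inflection point from signatures. As a simpler stand-in, the sketch below locates it directly in a cumulative series of follower counts, taking the inflection to be the period with the largest single-period gain: gains rise before it and fall after it. The function name, this definition of the inflection, and the series itself are assumptions invented for illustration.

```python
def inflection_index(cumulative: list[int]) -> int:
    """Index t at which the gain cumulative[t] - cumulative[t - 1] is largest.

    Before this period gains accelerate; after it they decelerate, which
    is a discrete analogue of an S-curve's inflection point.
    """
    gains = [cumulative[t] - cumulative[t - 1] for t in range(1, len(cumulative))]
    return 1 + max(range(len(gains)), key=gains.__getitem__)


# Invented cumulative follower counts (thousands) over nine periods.
followers = [0, 1, 3, 7, 14, 20, 24, 26, 27]
t_star = inflection_index(followers)
print(t_star)  # 4: the gain of 7 between periods 3 and 4 is the largest
```

Real series are noisy, so in practice some smoothing would likely be needed before taking the maximum.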

 

++++++++++

Conclusion

Virality is a social process. People discover content that others share with them and then share the content again with others. In the process, content can reach new people who might find it interesting or compelling enough to create a new link in the network that connects them with the source of the information. For the Twitter users associated with the Occupy connective action network, this process successively extended their audience and reach over many viral events. However, these Occupy users appear to have faced a limit to their growth that reflects a saturation of the overall network. This work suggests that there are phases of growth similar to those in the diffusion of innovations (Rogers, 1995): a period of ramping up, followed by a faster increase and then a slowdown in growth.

While the information flows that constructed the Occupy connective action network were ephemeral, the network itself still exists and is not entirely quiescent. Most of the Occupy accounts have not posted new tweets in a few years. However, a few, like OccupyWallStreet and OccupySeattle, still post tweets on a nearly daily basis. Under the right conditions, these networks could become active again. But if Occupy users effectively saturated the overall Twitter network within which the Occupy connective action network is situated (it is a sub-network of the larger network), then its reactivation may not lead to further growth. Alternatively, if we assume that the decline of Occupy and the slowdown in its growth were entirely related to events on the ground, then the network may not be saturated. In this case, future growth would start from an already massive network with great reach, and could be explosive beyond what has been witnessed so far with other connective action networks.

Nahon and Hemsley (2013) theorized that the shape of signatures may reflect different processes driving viral events. This study suggests that signatures of viral information events can provide information not just about the process but also about the result of information flow. Signatures that decay more gradually are related to larger numbers of new followers and reflect flows through numerous social sharing chains that reach new audiences in distant clusters. By examining the changing rate of decay of signatures over time for a given user or set of users, we may be able to predict the growth trajectory of future connective action networks on Twitter. Additionally, the signature model (Figure 5) includes other parameters whose study might give us further insight into the growth of such networks.

 

About the author

Jeff Hemsley is an Assistant Professor at the School of Information Studies at Syracuse University. His research is about understanding information diffusion and user interaction in social media. He draws on theories from sociology and communication to frame his thinking and research questions, but uses computational methods to collect, wrangle, visualize and analyze large, heterogeneous datasets. He is co-author of the book Going viral (Polity Press, 2013; winner of ASIS&T Best Science Books of 2014 Information Award and selected by Choice as an Outstanding Academic Title for 2014), which explains what virality is, how it works technologically and socially, and draws out the implications of this process for social change.
E-mail: jjhemsle [at] syr [dot] edu

 

Notes

1. http://www.theguardian.com/news/datablog/2011/oct/17/occupy-protests-world-list-map.

2. Granovetter, 1973, p. 1,361.

3. Nahon and Hemsley, 2013, p. 16.

4. Nahon and Hemsley, 2013, p. 25.

5. The Agarwal, et al. (2014) study covers the period from 24 October 2011 to late spring 2012. During that period, the volume of tweets could reach hundreds of thousands per day for the hashtags they examined. As of this writing, a search for the hashtag #OWS returns just 1,200 tweets for a 24-hour period, most of them sent from just a few accounts.

6. https://twitter.com/OccupyWallSt.

7. https://twitter.com/OccupySeattle.

8. https://twitter.com/occupykc.

9. Nahon and Hemsley, 2013, p. 16.

 

References

S.D. Agarwal, W.L. Bennett, C.N. Johnson, and S. Walker, 2014. “A model of crowd enabled organization: Theory and methods for understanding the role of Twitter in the Occupy protests,” International Journal of Communication, volume 8, at http://ijoc.org/index.php/ijoc/article/view/2068, accessed 15 July 2016.

E. Bakshy, I. Rosenn, C. Marlow, and L. Adamic, 2012. “The role of social networks in information diffusion,” WWW ’12: Proceedings of the 21st International Conference on World Wide Web, pp. 519–528.
doi: http://dx.doi.org/10.1145/2187836.2187907, accessed 15 July 2016.

E. Bakshy, J.M. Hofman, W.A. Mason, and D.J. Watts, 2011. “Everyone’s an influencer: Quantifying influence on Twitter,” WSDM ’11: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pp. 65–74.
doi: http://dx.doi.org/10.1145/1935826.1935845, accessed 15 July 2016.

A.-L. Barabási, 2005. “The origin of bursts and heavy tails in human dynamics,” Nature, volume 435, number 7039 (12 May), pp. 207–211.
doi: http://dx.doi.org/10.1038/nature03459, accessed 15 July 2016.

A.-L. Barabási, 2003. Linked: How everything is connected to everything else and what it means for business, science, and everyday life. New York: Plume.

F.M. Bass, 1969. “A new product growth for model consumer durables,” Management Science, volume 15, number 5, pp. 215–227.
doi: http://dx.doi.org/10.1287/mnsc.15.5.215, accessed 15 July 2016.

W.L. Bennett and A. Segerberg, 2012. “The logic of connective action: Digital media and the personalization of contentious politics,” Information, Communication & Society, volume 15, number 5, pp. 739–768.
doi: http://doi.org/10.1080/1369118X.2012.670661, accessed 15 July 2016.

W.L. Bennett, A. Segerberg, and S. Walker, 2014. “Organization in the crowd: Peer production in large-scale networked protests,” Information, Communication & Society, volume 17, number 2, pp. 232–260.
doi: http://doi.org/10.1080/1369118X.2013.870379, accessed 15 July 2016.

T. Broxton, Y. Interian, J. Vaver, and M. Wattenhofer, 2010. “Catching a viral video,” 2010 IEEE International Conference on Data Mining Workshops, pp. 296–304.
doi: http://doi.org/10.1109/ICDMW.2010.160, accessed 15 July 2016.

R.S. Burt, 2004. “Structural holes and good ideas,” American Journal of Sociology, volume 110, number 2, pp. 349–399.
doi: http://doi.org/10.1086/421787, accessed 15 July 2016.

N. Caren and S. Gaby, 2011. “Occupy online: Facebook and the spread of Occupy Wall Street,” Social Science Research Network (24 October), at http://papers.ssrn.com/abstract=1943168, accessed 15 July 2016.
doi: http://dx.doi.org/10.2139/ssrn.1943168, accessed 15 July 2016.

A. Clauset, C.R. Shalizi, and M.E.J. Newman, 2009. “Power-law distributions in empirical data,” SIAM Review, volume 51, number 4, pp. 661–703.
doi: http://dx.doi.org/10.1137/070710111, accessed 15 July 2016.

M.D. Conover, E. Ferrara, F. Menczer, and A. Flammini, 2013. “The digital evolution of Occupy Wall Street,” PLoS ONE, volume 8, number 5, e64679 (29 May).
doi: http://doi.org/10.1371/journal.pone.0064679, accessed 15 July 2016.

M.D. Conover, C. Davis, E. Ferrara, K. McKelvey, F. Menczer, and A. Flammini, 2013. “The geospatial characteristics of a social movement communication network,” PLoS ONE, volume 8, number 3, e55957 (6 March).
doi: http://dx.doi.org/10.1371/journal.pone.0055957, accessed 15 July 2016.

R. Crane and D. Sornette, 2008. “Robust dynamic classes revealed by measuring the response function of a social system,” Proceedings of the National Academy of Sciences, volume 105, number 41 (14 October), pp. 15,649–15,653.
doi: http://dx.doi.org/10.1073/pnas.0803685105, accessed 15 July 2016.

J.J. Faraway, 2006. Extending the linear model with R: Generalized linear, mixed effects and nonparametric regression models. Boca Raton, Fla.: Chapman & Hall/CRC.

J.J. Faraway, 2005. Linear models with R. Boca Raton, Fla.: Chapman & Hall/CRC.

M.S. Granovetter, 1973. “The strength of weak ties,” American Journal of Sociology, volume 78, number 6, pp. 1,360–1,380.

J. Hemsley and J. Eckert, 2014. “Examining the role of ‘place’ in Twitter networks through the lens of contentious politics,” HICSS ’14: Proceedings of the 2014 47th Hawaii International Conference on System Sciences, pp. 1,844–1,853.
doi: http://dx.doi.org/10.1109/HICSS.2014.233, accessed 15 July 2016.

J. Hemsley and R.M. Mason, 2013. “Knowledge and knowledge management in the social media age,” Journal of Organizational Computing and Electronic Commerce, volume 23, numbers 1–2, pp. 138–167.
doi: http://doi.org/10.1080/10919392.2013.748614, accessed 15 July 2016.

D.C. Hoaglin, F. Mosteller, and J.W. Tukey (editors), 1983. Understanding robust and exploratory data analysis. New York: Wiley.

B.A. Huberman and L.A. Adamic, 2004. “Information dynamics in the networked world,” In: E. Ben-Naim, H. Frauenfelder, and Z. Toroczkai (editors). Complex networks. Lecture Notes in Physics, volume 650. Berlin: Springer-Verlag, pp. 371–398.
doi: http://doi.org/10.1007/978-3-540-44485-5_17, accessed 15 July 2016.

S. Jurvetson, 2000. “What exactly is viral marketing?” Red Herring (May), pp. 110–111; version at https://currypuffandtea.files.wordpress.com/2008/03/viral-marketing.pdf, accessed 15 July 2016.

S. Jurvetson and T. Draper, 1997. “Viral marketing: Viral marketing phenomenon explained,” at http://dfj.com/news/article_26.shtml, accessed 1 May 2011.

L.H. Kahane, 2001. Regression basics. Thousand Oaks, Calif.: Sage.

J. Kirby and P. Marsden (editors), 2005. Connected marketing. New York: Routledge.

H. Kwak, C. Lee, H. Park, and S. Moon, 2010. “What is Twitter, a social network or a news media?” WWW ’10: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600.
doi: http://doi.org/10.1145/1772690.1772751, accessed 15 July 2016.

J. Leskovec, A. Singh, and J. Kleinberg, 2006. “Patterns of influence in a recommendation network,” In: W.-K. Ng, M. Kitsuregawa, J. Li, and K. Chang (editors). Advances in knowledge discovery and data mining. Lecture Notes in Computer Science, volume 3918. Berlin: Springer-Verlag, pp. 380–389.
doi: http://doi.org/10.1007/11731139_44, accessed 15 July 2016.

J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, and M. Hurst, 2007. “Cascading behavior in large blog graphs,” arXiv (20 April), at https://arxiv.org/pdf/0704.2803.pdf, accessed 15 July 2016.

G. Marsaglia, W.W. Tsang, and J. Wang, 2003. “Evaluating Kolmogorov’s distribution,” Journal of Statistical Software, volume 8, number 18, at https://www.jstatsoft.org/article/view/v008i18/kolmo.pdf, accessed 15 July 2016.
doi: http://doi.org/10.18637/jss.v008.i18, accessed 15 July 2016.

K. Nahon, 2005. “Network gatekeeping,” In: K.E. Fisher, S. Erdelez, and L. McKechnie (editors). Theories of information behavior. Medford, N.J.: Information Today, pp. 247–253.

K. Nahon and J. Hemsley, 2013. Going viral. Cambridge: Polity Press.

K. Nahon, J. Hemsley, S. Walker, and M. Hussain, 2011. “Fifteen minutes of fame: The power of blogs in the lifecycle of viral political information,” Policy & Internet, volume 3, number 1, pp. 1–28.
doi: http://doi.org/10.2202/1944-2866.1108, accessed 15 July 2016.

K. Nahon, J. Hemsley, R.M. Mason, S. Walker, and J. Eckert, 2013. “Information flows in events of political unrest,” iConference 2013 Proceedings, pp. 480–485.

R.L. Ott, 1993. An introduction to statistical methods and data analysis. Fourth edition. Belmont, Calif.: Duxbury Press.

E.M. Rogers, 1995. Diffusion of innovations. Fourth edition. New York: Free Press.

P.J. Shoemaker and T.P. Vos, 2009. Gatekeeping theory. New York: Routledge.

P.J. Shoemaker, P.R. Johnson, H. Seo, and X. Wang, 2010. “Readers as gatekeepers of online news: Brazil, China, and the United States,” Brazilian Journalism Research, volume 6, number 1, at http://bjr.sbpjor.org.br/bjr/article/view/226, accessed 15 July 2016.

B. Suh, L. Hong, P. Pirolli, and E.H. Chi, 2010. “Want to be retweeted? Large scale analytics on factors impacting retweet in Twitter network,” SOCIALCOM ’10: Proceedings of the 2010 IEEE Second International Conference on Social Computing, pp. 177–184.
doi: http://doi.org/10.1109/SocialCom.2010.33, accessed 15 July 2016.

A. Susarla, J.-H. Oh, and Y. Tan, 2012. “Social networks and the diffusion of user-generated content: Evidence from YouTube,” Information Systems Research, volume 23, number 1, pp. 23–41.
doi: http://doi.org/10.1287/isre.1100.0339, accessed 15 July 2016.

L. Tan, S. Ponnam, P. Gillham, B. Edwards, and E. Johnson, 2013. “Analyzing the impact of social media on social movements: A computational study on Twitter and the Occupy Wall Street movement,” ASONAM ’13: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 1,259–1,266.
doi: http://doi.org/10.1145/2492517.2500262, accessed 15 July 2016.

C.-Y. Teng, L. Gong, A.L. Eecs, C. Brunetti, and L. Adamic, 2012. “Coevolution of network structure and content,” WebSci ’12: Proceedings of the 4th Annual ACM Web Science Conference, pp. 288–297.
doi: http://doi.org/10.1145/2380718.2380756, accessed 15 July 2016.

M. Tremayne, 2014. “Anatomy of protest in the digital era: A network analysis of Twitter and Occupy Wall Street,” Social Movement Studies, volume 13, number 1, pp. 110–126.
doi: http://doi.org/10.1080/14742837.2013.830969, accessed 15 July 2016.

J.W. Tukey, 1980. “We need both exploratory and confirmatory,” American Statistician, volume 34, number 1, pp. 23–25.
doi: http://dx.doi.org/10.2307/2682991, accessed 15 July 2016.

J.W. Tukey, 1977. Exploratory data analysis. Reading, Mass.: Addison-Wesley.

J.W. Tukey, 1962. “The future of data analysis,” Annals of Mathematical Statistics, volume 33, number 1, pp. 1–67.

S. Wasserman and K. Faust, 1994. Social network analysis: Methods and applications. New York: Cambridge University Press.

 


Editorial history

Received 28 March 2016; accepted 15 July 2016.


This paper is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Studying the viral growth of a connective action network using information event signatures
by Jeff Hemsley.
First Monday, Volume 21, Number 8 - 1 August 2016
http://www.firstmonday.dk/ojs/index.php/fm/article/view/6650/5598
doi: http://dx.doi.org/10.5210/fm.v21i8.6650





A Great Cities Initiative of the University of Illinois at Chicago University Library.

© First Monday, 1995-2017. ISSN 1396-0466.