RedRock Version 2 is Now Open Source


You asked and we listened!

RedRock-v2 is now available on GitHub.

RedRock was received with such enthusiasm that we wanted to give you more, so here it is. We are happy to introduce “RedRock Version 2”.

In RRv2 we took what we learned from our first version of RR and made it better. In this example application, we include Akka Actors, Redis, and Gephi. RRv2 allows you to delve deeper into Twitter communities, see how they align, and learn what they think.

Like the first RedRock, the application is split into a back end and a front end. Both are open source today.

The Nuts and Bolts

First, an Akka Actor pulls a 10-minute chunk of Twitter Decahose data from IBM Bluemix and writes it to HDFS in a folder monitored by Apache Spark™ Streaming. Spark preprocesses the tweets: it selects the English tweets and extracts word tokens and tweet sentiment. After preprocessing, Spark writes the results to HDFS and Redis.
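The preprocessing step can be pictured with a minimal sketch in plain Python. This is not the RedRock code, which runs on Spark. The field names `id`, `lang`, and `text`, the tokenizing regular expression, and the `preprocess` function are all assumptions made for illustration, and sentiment extraction is left out.

```python
import re

# Hypothetical token pattern: words, optionally prefixed by # or @.
TOKEN_RE = re.compile(r"[#@]?\w+")

def preprocess(tweets):
    """Keep English tweets and split their text into lowercase tokens.

    The "id", "lang", and "text" fields are assumed for this sketch;
    they are not taken from the RedRock schema.
    """
    results = []
    for tweet in tweets:
        if tweet.get("lang") != "en":
            continue  # drop non-English tweets
        tokens = TOKEN_RE.findall(tweet["text"].lower())
        results.append({"id": tweet["id"], "tokens": tokens})
    return results

sample = [
    {"id": 1, "lang": "en", "text": "Loving the new #Spark release!"},
    {"id": 2, "lang": "pt", "text": "Bom dia"},
]
processed = preprocess(sample)
```

In this toy run only the first tweet survives the language filter, and its hashtag is kept as a single token.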

The front end is an iPad application that interacts with the back end via a REST API.

User Interaction

The objective of the app is to discover communities of similar-minded Twitter users who are discussing a particular topic. The topic of discussion is defined by twenty related terms obtained from a Spark Word2Vec model that is trained on English tweets received over a period of seven days. Once a topic is selected, we keep only the retweets that include any of the twenty terms. From these filtered retweets we generate a network of users called a retweet graph. In this graph the users are the nodes, and a link is created between two nodes if the users retweet each other.
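As a rough illustration of how a retweet graph could be assembled, here is a small sketch in plain Python. The record fields `retweeter`, `original_author`, and `text`, the whitespace tokenization, and the `build_retweet_graph` function are hypothetical; the sketch links a retweeting user to the author of the original tweet, which is one plausible reading of the description above.

```python
from collections import defaultdict

def build_retweet_graph(retweets, topic_terms):
    """Return an undirected adjacency map of users linked by retweets.

    Only retweets whose text contains at least one topic term are used.
    Field names are assumptions for this sketch.
    """
    terms = {t.lower() for t in topic_terms}
    graph = defaultdict(set)
    for rt in retweets:
        tokens = set(rt["text"].lower().split())
        if tokens & terms:  # keep retweets that mention the topic
            a, b = rt["retweeter"], rt["original_author"]
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return dict(graph)

retweets = [
    {"retweeter": "alice", "original_author": "bob", "text": "RT big #election news"},
    {"retweeter": "carol", "original_author": "bob", "text": "RT #election results"},
    {"retweeter": "dave", "original_author": "erin", "text": "RT cute cat photo"},
]
graph = build_retweet_graph(retweets, ["#election", "vote"])
```

Here `dave` and `erin` do not appear in the graph, because their retweet mentions none of the topic terms.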


The user enters a search term, for example something current and polarizing: "#trump". On the back end, Spark Word2Vec gives us terms that are closely related to our search term. In the resulting chart, the distance from the center shows how closely each term is related to the original search term, and the size of each bubble reflects the frequency of that term.
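To make the chart layout concrete, here is a minimal sketch that ranks terms by cosine similarity of their word vectors. The two-dimensional vectors and the counts are made up, and mapping distance to `1 - similarity` and bubble size to the raw count are assumptions for illustration, not the formulas RedRock uses.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def related_terms(query, vectors, counts, top_n):
    """Return the top_n terms closest to the query, nearest first."""
    q = vectors[query]
    bubbles = []
    for term, vec in vectors.items():
        if term == query:
            continue
        sim = cosine_similarity(q, vec)
        # Assumed mapping: closer terms sit nearer the center,
        # more frequent terms get bigger bubbles.
        bubbles.append({"term": term, "distance": 1.0 - sim, "size": counts[term]})
    bubbles.sort(key=lambda b: b["distance"])
    return bubbles[:top_n]

# Toy vectors and frequencies, invented for this example.
vectors = {
    "#topic": [1.0, 0.0],
    "near": [0.9, 0.1],
    "far": [0.0, 1.0],
    "middle": [1.0, 1.0],
}
counts = {"#topic": 50, "near": 30, "far": 5, "middle": 12}
bubbles = related_terms("#topic", vectors, counts, top_n=2)
```

Recentering the chart on a related term, as described in the next step, would amount to calling `related_terms` again with that term as the query.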


At this point, the users can tune their searches by clicking on one of the closely related terms. This will cause the chart to recenter around that term, displaying the most closely related terms to the new search term.

Once the user is satisfied with the related term, they can click on the "Communities" button at the top left to have a look at the communities tweeting about those terms.


The user is presented with a retweet graph whose layout is generated using the ForceAtlas2 algorithm implemented in Gephi. Each dot is a Twitter user, and the colors represent the different communities. These communities are determined using the Parallel Louvain algorithm.
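Louvain-style methods search for a partition of the graph that maximizes modularity, a score comparing the number of links inside communities with what a random graph of the same degrees would give. The sketch below only computes that score for a given partition on a toy graph; it is not the Parallel Louvain implementation used in the app, and the `modularity` function and example data are illustrative.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Two triangles joined by a single edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the toy graph into its two triangles scores higher than lumping every user into one community, which is the kind of improvement a Louvain pass looks for.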

To learn what a community is about, the user clicks on one of the tweets to see more information about that community.


Community Details displays the most commonly used term as well as the overall sentiment being expressed by that community.
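A summary of this kind could be computed along the following lines. The `tokens` and `sentiment` fields, the +1/-1 sentiment scores, the use of a simple mean as the "overall sentiment", and the `community_details` function are all assumptions for this sketch.

```python
from collections import Counter

def community_details(tweets):
    """Return the most frequent token and the mean sentiment of a community.

    Each tweet is assumed to carry a "tokens" list and a numeric
    "sentiment" score (+1 positive, -1 negative).
    """
    term_counts = Counter(tok for t in tweets for tok in t["tokens"])
    top_term, _ = term_counts.most_common(1)[0]
    mean_sentiment = sum(t["sentiment"] for t in tweets) / len(tweets)
    return {"top_term": top_term, "sentiment": mean_sentiment}

community_tweets = [
    {"tokens": ["vote", "now"], "sentiment": 1},
    {"tokens": ["vote", "early"], "sentiment": 1},
    {"tokens": ["polls", "vote"], "sentiment": -1},
]
details = community_details(community_tweets)
```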

As you can see, the app allows users to quickly find communities on Twitter and discover what brings them together. For more information, check out the project on GitHub.

We hope you enjoy this app as much as we enjoyed building it.

