RedRock Version 2 is Now Open Source

You asked and we listened!

RedRock-v2 is now available on GitHub.

RedRock was received with such enthusiasm that we wanted to give you more, so here it is. We are happy to introduce “RedRock Version 2”.

In RRv2 we took what we learned from the first version of RedRock and made it better. This example application adds Akka Actors, Redis, and Gephi. RRv2 lets you delve deeper into Twitter communities, see how they align, and learn what they think.

Like the first RedRock, the application is split into a back end and a front end. Both are open source today.

The Nuts and Bolts

First, an Akka Actor pulls a 10-minute chunk of Twitter Decahose data from IBM Bluemix and writes it to HDFS in a folder monitored by Apache Spark™ Streaming. Spark preprocesses the tweets: it selects the English tweets, then extracts word tokens and tweet sentiment from each one. After preprocessing, Spark writes the results to HDFS and Redis.
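
To make the preprocessing step concrete, here is a minimal sketch in plain Python rather than Spark. It is not the RedRock code: the `lang` and `text` field names, the tokenizing regex, and the tiny sentiment lexicon are all assumptions made for illustration, since this post does not say how RedRock scores sentiment.

```python
import re

# Hypothetical toy lexicon, used only for illustration.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}

# Keeps hashtags and mentions attached to their word.
TOKEN_RE = re.compile(r"[#@]?\w+")


def preprocess(tweets):
    """Keep English tweets and attach word tokens and a sentiment label."""
    results = []
    for tweet in tweets:
        # Assumed field name: "lang" holds the tweet's language code.
        if tweet.get("lang") != "en":
            continue
        tokens = TOKEN_RE.findall(tweet["text"].lower())
        # Score = positive word count minus negative word count.
        score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        if score > 0:
            sentiment = "positive"
        elif score < 0:
            sentiment = "negative"
        else:
            sentiment = "neutral"
        results.append({"tokens": tokens, "sentiment": sentiment})
    return results
```

In a real pipeline the same three steps (language filter, tokenization, sentiment scoring) would run as Spark transformations over each batch of files.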

The front end is an iPad application that interacts with the back end via a REST API.

User Interaction

The objective of the app is to discover communities of similar-minded Twitter users who are discussing a particular topic. The topic of discussion is defined by twenty related terms obtained from a Spark Word2Vec model that is trained on English tweets received over a period of seven days. Once a topic is selected, we keep only the retweets that include any of the twenty terms. From these filtered retweets we generate a network of users called a retweet graph. In this graph, the users are the nodes, and a link is created between two nodes if the users retweet each other.
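
The retweet graph can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the RedRock implementation: the field names `user`, `retweeted_user`, and `text` are hypothetical, terms are matched on whitespace-separated words, and a link is added whenever one user retweets the other, with the number of such retweets kept as the edge weight.

```python
from collections import Counter


def build_retweet_graph(retweets, terms):
    """Return weighted undirected edges between users from matching retweets."""
    wanted = {t.lower() for t in terms}
    edges = Counter()
    for rt in retweets:
        words = set(rt["text"].lower().split())
        # Keep only retweets that mention at least one topic term.
        if not (words & wanted):
            continue
        a, b = rt["user"], rt["retweeted_user"]
        if a == b:
            continue  # ignore self-retweets
        # Sort the pair so (a, b) and (b, a) count as the same link.
        edges[tuple(sorted((a, b)))] += 1
    return edges
```

Each key in the result is a pair of users and each value is how many matching retweets connect them, which is enough input for a layout or community-detection step.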

The user enters a search term, for example something current and polarizing: "#trump". On the back end, Spark Word2Vec gives us the terms that are most closely related to the search term. In the resulting chart, the distance from the center shows how closely each term is related to the original search term, and the size of each bubble reflects the frequency of that term.
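
The related-terms lookup can be illustrated with a small stand-in for the Word2Vec model. In this sketch the word vectors and term counts are plain dictionaries supplied by the caller, relatedness is measured with cosine similarity, and the chart distance is taken to be one minus that similarity. These choices, and the names `related_terms`, `vectors`, and `counts`, are assumptions for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_terms(query, vectors, counts, top_n=20):
    """Rank the other terms by similarity to the query term."""
    q = vectors[query]
    scored = [
        (term, cosine(q, vec), counts.get(term, 0))
        for term, vec in vectors.items()
        if term != query
    ]
    # Most similar terms first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return [
        {"term": term, "distance": 1.0 - sim, "frequency": freq}
        for term, sim, freq in scored[:top_n]
    ]
```

A front end could then place each term at its `distance` from the center and scale the bubble by `frequency`.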

At this point, the user can tune the search by clicking on one of the closely related terms. This recenters the chart around that term and displays the terms most closely related to the new search term.

Once the user is satisfied with the related term, they can click on the "Communities" button at the top left to have a look at the communities tweeting about those terms.

The user is presented with a retweet graph whose layout is generated using the ForceAtlas2 algorithm implemented in Gephi. Each dot is a Twitter user, and the colors represent the different communities. These communities are determined using a parallel Louvain algorithm.

To see what a community is about, the user clicks on one of the tweets to bring up more information about that community.
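
The Louvain method finds communities by greedily improving a score called modularity, which rewards partitions with many links inside communities and few between them. The sketch below only computes that score for a given partition of an unweighted, undirected graph without self-loops. It is not a Louvain implementation and not the parallel version used in RedRock, and the function and argument names are our own.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition; edges are (u, v) pairs, community maps node -> label."""
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)   # edges with both ends in the same community
    degree = defaultdict(int)  # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum over communities of (intra / m) - (degree / 2m)^2
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single link, putting each triangle in its own community scores higher than putting all six users in one community, which is the kind of split a modularity-based method would favor.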

Community Details displays the most commonly used term as well as the overall sentiment being expressed by that community.
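
A summary like this can be sketched from per-tweet tokens and sentiment labels. The example below is hypothetical: it assumes each tweet is a dictionary with `tokens` and `sentiment` keys, and it takes the overall sentiment to be the most frequent label, which may not be how RedRock aggregates it.

```python
from collections import Counter


def community_details(tweets):
    """Return the most common term and the most frequent sentiment label."""
    term_counts = Counter(tok for t in tweets for tok in t["tokens"])
    sentiment_counts = Counter(t["sentiment"] for t in tweets)
    top_term = term_counts.most_common(1)[0][0] if term_counts else None
    overall = sentiment_counts.most_common(1)[0][0] if sentiment_counts else None
    return {"top_term": top_term, "sentiment": overall}
```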

As you can see, the app allows users to quickly find communities on Twitter and discover what brings them together. For more information, check out the project on GitHub.

We hope you enjoy this app as much as we enjoyed building it.

Spark Technology Center
