Project RedRock: Design + Data

*“There’s a huge market opening up for data analytics. Whoever turns the technology into products that are simple, beautiful, and easy for anyone to use, wins.” David Townsend, IBM Designer*

“Anyone can collect big data. Analyzing it and extracting a useful conclusion from it is much harder. RedRock is about opening up the world of big data to people who aren’t programmers and data scientists by providing a simple interface that anyone can use.” Rosstin Murphy, IBM iOS Developer

“Good design is good business.” Thomas J. Watson, CEO Emeritus, IBM.

Part of the beauty of Apache Spark™ is that it lets data scientists and developers work together in a unified platform. If you think of data scientists as hockey players and developers as football players trying to play the same game, Spark lets the hockey players see the football players as if they’re playing hockey, and vice versa. With this lens, data scientists and developers can iterate more quickly. It’s a better, faster game.

Developers, data scientists, and designers in IBM all use IBM Design Thinking—a fast-paced, iterative process that puts the user front and center.

And IBM designers have invested years in creating a consistent, shared vocabulary to translate between the complexity of technology and human needs.

Project RedRock was an experiment. What if we took IBM’s data scientists, developers, and designers, added Spark and IBM Design Thinking, and put them together in the same room for 10 days? Could we come up with a usable, intuitive app that thinks and learns from you?

And could we model a new working process for developing and designing with Spark, a combustible combination of expertise and process that could scale throughout IBM and into the open source community? Could we invent a better, faster game?

A team of 2 data scientists, 4 developers, and 5 designers decided to come together for 10 days in the same room to build an app—and a new working method. Someone said, “game on”. And we set to work.

What is RedRock?

RedRock was the brainchild of data scientist Hao Wang and developer Joel Figueroa. It started as an entry in an IBM Spark hackathon—an internal contest that produced 100 innovations in 10 days.

“The original idea of RedRock was simply to bring big data analytics to more people. A perfect tool would be a combination of the ease of use of Excel and the power of Hadoop and Spark. It allows a user to analyze big data without a single line of code, works with terabytes of data, provides REST APIs for easy integration – and it’s lightning fast.” Hao Wang, IBM Data Scientist

“I saw RedRock on our community pages and thought, it would be great to see what we can do in a couple of weeks. How fast can we pull dev, data science, and design together and create an app with Spark that people are going to love to use?” David Shultz, IBM Designer

What does RedRock do?

RedRock is an alpha app that lets the user act on data-driven insights discovered from Twitter. Powered by IBM Analytics running on Spark, it finds patterns in user tweets to surface influential individuals, related topics of interest, and where in the world the conversation is taking place. In the hands of a marketer, this tool could become an extremely powerful way to connect with a target demographic or find emerging markets they might not have thought to look for. In the hands of someone at the increasingly overwhelming SXSW, this tool would help filter weather, private corporate events that aren’t announced, surprise artists, pop-up studios, even food.

Data Science in RedRock

The data science algorithms used in RedRock are Word2Vec and K-means. Both are part of MLlib, the machine learning library that comes with Spark. The Word2Vec algorithm trains a shallow neural network to assign a numerical vector to each of the words in the Twitter data. Once Word2Vec has produced this feature matrix, K-means is applied to it to cluster the words. These two algorithms are used to build screens in the app. Our next step is to incorporate SystemML into RedRock. SystemML provides machine learning algorithms built at IBM’s Almaden Research Center.
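
To make the clustering step concrete, here is a minimal sketch in plain Python. It is not RedRock’s code and it does not use MLlib: the two-dimensional vectors are invented stand-ins for Word2Vec output (real vectors are learned from text and usually have a hundred or more dimensions), and `kmeans` is a hypothetical helper written for illustration, using Lloyd’s algorithm with a deterministic farthest-point start. It shows only the second half of the pipeline, grouping words whose vectors sit close together.

```python
from math import dist

# Invented toy "word vectors" (assumption: stand-ins for Word2Vec output).
WORD_VECTORS = {
    "music":   (0.90, 0.10),
    "band":    (0.80, 0.20),
    "concert": (0.85, 0.15),
    "taco":    (0.10, 0.90),
    "bbq":     (0.20, 0.80),
    "brunch":  (0.15, 0.85),
}


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(vectors, k, max_iter=20):
    """Cluster the words in `vectors` into k groups (hypothetical helper)."""
    words = list(vectors)
    points = [vectors[w] for w in words]

    # Deterministic farthest-point initialisation: start from the first
    # point, then repeatedly add the point farthest from chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )

    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda i: dist(p, centroids[i])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centroids[i] = mean(members)

    clusters = [[] for _ in range(k)]
    for word, a in zip(words, assignment):
        clusters[a].append(word)
    return clusters


clusters = kmeans(WORD_VECTORS, k=2)
print(clusters)
# [['music', 'band', 'concert'], ['taco', 'bbq', 'brunch']]
```

In a Spark job, MLlib’s own Word2Vec and K-means implementations would take the place of the toy vectors and the helper; the shape of the idea, vectors first and clusters second, stays the same.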

IBM Design Thinking Enters the Picture

“IBM Design Thinking rekindles design by modifying the core principles of design thinking to be applied at scale, where IBM’s strength lies. … it serves as the basis for the radical collaboration that allows domain experts from across IBM’s vast portfolio to contribute equally, without getting bogged down in theory or domain-specific jargon.” Adam Cutler, IBM Designer

Team RedRock put radical collaboration into action. Everyone shared wildly different ideas at the table. They went off into different corners of the room to work on possible solutions, and then came back together to focus on a single idea.

The design team started by taking Hao’s prototype and sketching. Once their ideas looked solid on paper, they brought what we were then calling “Project Clue” to our studio and began to wireframe—sketch out the user flow of the app. The early, sparse wireframes quickly transformed into full, thoughtful user flows.

The visual designers began to iterate quickly through various explorations of color and layout before settling on a teal color palette and a simple layout that directs focus to the data visualizations.

The developers, over on the other side of the table, took the data scientists’ algorithms and the early design wireframes and created a working prototype of the interactions and visualizations.

Everybody checked in at regular intervals to decide whether the current state of the design or the coded prototype was ready to build on, or whether they needed to revisit other parts of the design thinking process and revise. The criteria for deciding whether or not to move on: objective observational research, especially usability testing. Bugs were found, compromises were reached, and RedRock became a reality.

*The New Model: Design + Development*

In the short time we had together, development pushed more than 7 versions of working code to TestFlight. Design and development met daily to review the code and the design and iterate on the fly.

“The iterative process was faster-paced; it makes you learn faster. I have never been involved in a project where everyone is as motivated as in RedRock. It was amazing to see the design and visualizations coming from the algorithms almost immediately. We were able to stop, think, and correct on the fly as a team. I was able to use machine learning algorithms from both MLlib and SystemML with Spark.” Jorge Castanon, IBM Data Scientist


“Having the data scientists in the room and having them explain the Spark development process filled this black hole in the back of our heads.” Raphael Bouchard, IBM Designer

*RedRock isn’t a product.*

It’s a model of how the Spark Technology Center will work: combining new technologies and capabilities with a working method, made possible by the unified platform supplied by Spark and the iterative, user-focused process of IBM Design Thinking, to transform how people use data in their lives and work.

RedRock represents future applications that think and learn from you, designed to be desirable and easy to use.

The next iteration of RedRock will focus on refining and expanding its learning capabilities to allow for prescriptive analytics. We want to create an app that not only lets you take action, but also suggests an action for you to take.

*Where this will go*

IBM is building Spark into the core of IBM’s Analytics and Commerce platforms.

And RedRock’s relevance extends beyond IBM products. We don’t yet know how far this working process and technology will reach. RedRock is a miniature, scalable version of what you can do at the Spark Technology Center: create products no one’s ever thought of before, but once they exist, you can’t imagine life without them.

The timing is right. Analytics are more powerful than they’ve ever been, and they have more raw material than ever to digest. Data flows in from traffic signals, the thermometer on your wall, MRI machines, outer space, and like it or not, your Facebook feed.

Spark gives us the ability to build applications that apply intelligence to data. It’s fast—transformatively fast. Spark lets developers iterate quickly, and it’s easy to learn and easy to import algorithms to.

But there’s an extra step to making this data useful: you have to be able to understand it to use it. The information inside the data has to be given to the people who need it in a form they can use without reading a manual, getting a PhD, or giving up and reverting to the now suddenly old-school data dashboard.

*“Design makes analytics easier, way easier. We can get more people using analytics, faster, by providing simplicity and better design.” David Townsend, IBM Designer*

Simple, intuitive data analytics that are a pleasure to interact with give people the tools they need to do what people do best: solve real-world problems with human intelligence.


