Datapalooza! Real World Data Products

Before I get started, I highly recommend you buy a ticket now. It may sell out before you finish reading this post.

Now let's begin.

Back in 2008, Apache Hadoop wasn’t a household name, and the Business Intelligence group held all of the keys to the data. Statistics was left to research projects that emerged from time to time to provide insight. In 2010, we were suddenly given the free gift of a new, malleable distributed environment for sculpting data into different shapes, into what we now refer to as data products. Data products are the result of applying transformations, applied mathematics, and domain-specific knowledge to data to turn it into a usable product for specific outcomes. A common example is PYMK (People You May Know), the data product that helped LinkedIn grow its install base. It is not, however, an insight that helped a decision maker, nor was it the LinkedIn application itself. This is the stuff dreams are made of. It’s what every business, large and small, wants to uncover, and many did. Silicon Valley startups like Netflix, Facebook, Twitter, and Nest, and even more traditional companies like Peugeot, ConstantContact, the US Open, and NASA, are all disrupting their industries with data products.

At a large big data and analytics conference last week, I was struck by the idea that we are no longer focused on making data products effective. Most of the talks were by vendors selling technology or researchers talking about the deeply technical capabilities of all the emerging technology that is flooding the market. I, for one, am less interested in what the latest technology is and more interested in the outcomes people are achieving with it. For this reason, I am pleased to announce a first-of-its-kind music concert combined with data + design workshops that we at the Spark Technology Center call Datapalooza.


Not another conference where people speak at you. Come build a data product.

From November 10th to 12th, the Spark Technology Center in San Francisco hosts the first-ever Datapalooza: a deep dive with industry leaders from AMPLab, Galvanize, Typesafe, Silicon Valley Data Science, IBM Watson, Spare5, Declara, and numerous others who are leading the way in making data products. Take your data skills to the next level with hands-on experience and one-on-one coaching to make a data product in only three days. We’ll support three main tracks throughout the three-day event.

Data Engineering

Harmonize the Instruments

These courses aim to teach a suite of data engineering skills in the areas of data wrangling, data munging, and data pipelines. Our instructors will cover topics such as Twitter analysis with Apache Spark and Watson, building Word2Vec models, natural language processing, and more.

Data Science

Compose the Music

Build foundational knowledge around data variables, models and scoring methods with a compilation of courses focused around hot topics such as Recommendation Algorithms, Machine Learning Capabilities, Full-Text & Geospatial Search. These courses will show you how these techniques can be used to create beautifully designed data products with examples like RedRock.

Data App Development

Produce the Concert

What makes Datapalooza unique? Our instructors will tie together sessions from our Data Engineering and Data Science courses to help you bridge the gap between analyzing, building, and deploying. These courses focus on application frameworks, product launches, storytelling, and data visualization. Featured data products like CalTrain Rider are at the core of our curriculum.

Oh, and did I mention we’ll have the band Big Data headline our event?

How awesome is that!

At Datapalooza, you’ll combine analytic and innovative skills to attack real-world challenges using natural language processing, machine learning, cognitive computing, stream computing, distributed processing, design thinking, reactive platforms, and many more key skills to make your product a success.

San Francisco is the kickoff event of a world tour that will bring Datapalooza to a city near you.

Join the movement

P.S. If you share this post, I’ll InMail you a discount code to save 20% on the registration fee.
