03 April 2017

Spark 2 and Data Science at Scale

Matt Brandwein @Cloudera

Spark without MapReduce or HDFS

Dataset API (RDD + DataFrames)

Structured Streaming

From PoC to production, all using Spark, with package dependencies

Reporting and Dashboarding
Batch pipeline/scoring (ETL)
Online serving app

scikit-learn doesn't work with Spark. Resolve the conflict between data scientists and IT => familiar environment
Cloudera Data Science Workbench

All installed packages are persistent and ready for sharing. Cloudera architect

