Spark Summit Training

http://files.training.databricks.com/events/summit-2017-10/spark-tuning/Labs.dbc

http://files.training.databricks.com/events/summit-2017-10/deep-learning/Labs.dbc

http://files.training.databricks.com/events/summit-2017-10/data-science/Labs.dbc

DL

@Adam Breindel

Sharing the full list of materials for the three trainings on October 24:

Training 1: [Understand and Apply Deep Learning with Keras, Tensorflow and Apache Spark 2.x] (tinyurl.com/DeepLearning102417)

[Databricks Community Jupyter notebook] (http://files.training.databricks.com/events/summit-2017-10/deep-learning/Labs.dbc)

Training 2: [Data Science with Apache Spark 2.x] (tinyurl.com/DataScience102417)

[Databricks Community Jupyter notebook] (http://files.training.databricks.com/events/summit-2017-10/data-science/Labs.dbc)

Training 3: [Apache Spark Tuning and Best Practices] (tinyurl.com/SparkTuning102417)

[Databricks Community Jupyter notebook] (http://files.training.databricks.com/events/summit-2017-10/spark-tuning/Labs.dbc)

SVMs are amenable to “online” (incremental) learning (http://www.isn.ucsd.edu/papers/nips00_inc.pdf)

Linear => Non-linear (sigmoid) => ReLU => Dropout
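The progression above can be sketched as plain functions. This is a minimal, framework-free illustration and not the training's Keras code; the toy weights `W`, `b`, the input `x`, and all function names are assumptions made up for the example.

```python
import math
import random

def linear(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def sigmoid(v):
    """Classic squashing non-linearity, outputs in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def relu(v):
    """Rectified linear unit: max(0, z) element-wise."""
    return [max(0.0, z) for z in v]

def dropout(v, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`
    and rescale the survivors; a no-op at inference time."""
    if not training or rate == 0.0:
        return list(v)
    keep = 1.0 - rate
    return [z / keep if rng.random() < keep else 0.0 for z in v]

# Hypothetical toy layer: 2 inputs -> 3 hidden units.
W = [[0.5, -1.0], [1.5, 2.0], [-0.5, 0.25]]
b = [0.0, -1.0, 0.5]
x = [1.0, 2.0]

z = linear(x, W, b)                           # linear only
h_sigmoid = sigmoid(z)                        # linear + sigmoid
h_relu = relu(z)                              # linear + ReLU
h_drop = dropout(h_relu, 0.5, random.Random(0))  # ReLU + dropout (training)
```

At inference time `dropout(..., training=False)` returns the activations unchanged, which is why the survivors are rescaled during training.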

https://class171024-deep.slack.com/messages/C7H1JJQSH/

TensorFrames

@Tim Hunter

Numerical computing with Spark

Data-heavy and computation-heavy

=> speed is a good target for optimization

TF + Spark => TensorFrames; processor speed + memory + network access to processor

Q: How do images become a DataFrame (df)? A: As pixel values
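One way to read that answer: each image becomes a single row whose columns hold the dimensions and the flattened pixel values. The sketch below is plain Python rather than Spark, and the record layout (`id`, `height`, `width`, `pixels`) is an assumed schema for illustration, not the one used by TensorFrames.

```python
def image_to_row(image_id, pixels):
    """Flatten a 2-D grayscale image (list of rows) into one flat record."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    if any(len(r) != width for r in pixels):
        raise ValueError("all image rows must have the same width")
    flat = [p for r in pixels for p in r]  # row-major order
    return {"id": image_id, "height": height, "width": width, "pixels": flat}

def row_to_image(row):
    """Inverse of image_to_row: rebuild the 2-D pixel grid."""
    w = row["width"]
    if w == 0:
        return [[] for _ in range(row["height"])]
    return [row["pixels"][i:i + w] for i in range(0, len(row["pixels"]), w)]

# Hypothetical 3x2 grayscale image.
img = [[0, 255], [128, 64], [1, 2]]
row = image_to_row("img_001", img)
```

A list of such records maps directly onto DataFrame rows, with `pixels` as an array column.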

Streaming and deep learning

@Matei Zaharia

Both important, but complex with current tools

Low-level API (MapReduce) => composable high-level API
separate tools -> unified app

Structured Streaming

Same API for both streaming and batch; continuous processing without micro-batches
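The "same API" idea can be illustrated without Spark: define the query logic once, then drive it either over a finite batch or record by record over a stream. This is a conceptual sketch only; `update`, `run_batch`, and `run_streaming` are hypothetical names and not part of the Structured Streaming API.

```python
from collections import Counter

def update(state, record):
    """One step of the query: count events per key."""
    state[record["key"]] += 1
    return state

def run_batch(records):
    """Batch mode: consume everything, return the final result."""
    state = Counter()
    for r in records:
        update(state, r)
    return dict(state)

def run_streaming(source):
    """Streaming mode: same logic, but emit the running result
    after every record instead of waiting for a batch boundary."""
    state = Counter()
    for r in source:
        update(state, r)
        yield dict(state)

# Hypothetical event data.
events = [{"key": "a"}, {"key": "b"}, {"key": "a"}]
batch_result = run_batch(events)
stream_results = list(run_streaming(iter(events)))
```

The last streaming result equals the batch result, which is the property a unified batch/streaming API relies on.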

Deep learning

ML pipeline APIs support for non-YARN and AWS servers

@Sue Ann Hong challenges

Transfer Learning => deep embedding to eliminate labels

Building custom ML pipeline stages for BMW

warranty incidents are “no trouble found”

Dataset with 7000+ features, sparsity, abnormality

Spark pipeline: relational data warehouse => ETL => handle imbalance => preprocessing => feature selection => classifier
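The notes do not say how the imbalance was handled. One common option, shown here purely as an assumption, is inverse-frequency class weighting (n_samples / (n_classes * class_count)), so rare classes count for more in the classifier's loss. The labels `"ok"` and `"fault"` are made up for the example.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}

# Hypothetical 80/20 imbalanced label column.
labels = ["ok"] * 8 + ["fault"] * 2
weights = balanced_class_weights(labels)

# Per-row weights, e.g. to feed a classifier's sample-weight column.
row_weights = [weights[y] for y in labels]
```

With these weights each class contributes the same total weight, and the weights sum to the number of samples.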

DL pipelines

DL at scale; DL Pipelines; end-to-end workflow with DL Pipelines

TL (transfer learning): classification, featurization for similarity-based ML
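Featurization for similarity search can be sketched as follows: a pretrained network turns each image into a feature vector, and similarity is then a vector comparison. The vectors below are hand-written stand-ins for such features, and cosine similarity is an assumed choice of metric; the talk's notes do not name one.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(query, catalog):
    """Return the catalog key whose features are closest to `query`."""
    return max(catalog, key=lambda name: cosine_similarity(query, catalog[name]))

# Hypothetical feature vectors, standing in for a pretrained model's output.
features = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.9, 0.1, 0.0],
}
query = [1.0, 0.2, 0.0]
best = most_similar(query, features)
```

No labels are needed for this step, which is what makes featurization attractive when labeled data is scarce.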

Batch prediction as an MLlib Transformer; Spark SQL UDF => anyone who knows SQL can call the function

NLU

@Alex Thomas at Indeed