Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.

You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library.

  • Get a gentle overview of big data and Spark
  • Learn about DataFrames, SQL, and Datasets (Spark’s core APIs) through worked examples
  • Dive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames
  • Understand how Spark runs on a cluster
  • Debug, monitor, and tune Spark clusters and applications
  • Learn the power of Structured Streaming, Spark’s stream-processing engine
  • Learn how you can apply MLlib to a variety of problems, including classification or recommendation