Description

Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.

Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You’ll learn about recent changes to Hadoop, and explore new case studies on Hadoop’s role in healthcare systems and genomics data processing.

  • Learn fundamental components such as MapReduce, HDFS, and YARN
  • Explore MapReduce in depth, including steps for developing applications with it
  • Set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN
  • Learn two data formats: Avro for data serialization and Parquet for nested data
  • Use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer)
  • Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop
  • Learn the HBase distributed database and the ZooKeeper distributed configuration service