Linux-News Linux news from the Blogosphere

An Overview of Apache’s Hadoop

By News, June 14, 2012
Posted in: Article, Review, Submitted

By Chandra Heitzman

Developed under the Apache Software Foundation, Hadoop is a Java-based open-source platform designed to process massive amounts of data in a distributed computing environment. Hadoop’s key innovations lie in its ability to store and access massive amounts of data across thousands of computers and to present that data coherently.

Though data warehouses can store data on a similar scale, they are costly and do not allow for effective exploration of huge amounts of discordant data. Hadoop addresses this limitation by splitting a data query into pieces and distributing them across the machines in a cluster. By spreading the workload over thousands of loosely networked computers (nodes), Hadoop can potentially examine and present petabytes of heterogeneous data in a meaningful format. The software also scales down: it can operate on a single server or a small network.
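The idea of sending one query to many nodes and merging the partial answers can be sketched in a few lines of Python. This is only an illustration, not Hadoop's API: the function names, the sample log lines, and the use of threads to stand in for separate machines are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def count_matches(partition, predicate):
    """Run the query against one node's local slice of the data."""
    return sum(1 for record in partition if predicate(record))

def distributed_count(partitions, predicate, workers):
    """Send the same query to every partition and add up the partial answers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda part: count_matches(part, predicate), partitions)
        return sum(partials)

# Hypothetical data: three "nodes", each holding a slice of mixed records.
partitions = [
    ["error: disk full", "ok", "ok"],
    ["ok", "error: timeout"],
    ["error: disk full"],
]

def is_error(line):
    return line.startswith("error")

# The same query gives the same answer on many workers or on one,
# which mirrors the point that the approach also runs on a single server.
many_nodes = distributed_count(partitions, is_error, workers=3)
single_node = distributed_count(partitions, is_error, workers=1)
print(many_nodes, single_node)
```

Only the number of workers changes between the two calls; the query and the merge step stay the same.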

Hadoop’s distributed computing abilities derive from two software frameworks: the Hadoop Distributed File System (HDFS) and MapReduce. HDFS facilitates rapid data transfer between computer nodes and allows continued operation even in the event of node failure. MapReduce distributes data processing across these nodes, reducing the workload on each individual computer and allowing computation and analysis beyond the capabilities of a single machine. For example, Facebook uses MapReduce to analyze user behavior and track advertisements, working with about 21 petabytes of information. Other prominent users include IBM and Yahoo, typically for search engines and advertising; Google’s published MapReduce and Google File System designs were the inspiration for Hadoop.
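To make the MapReduce model concrete, here is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to the classic word-count problem. It is plain Python, not the Hadoop Java API; the function names and the input strings are invented for illustration, and a real job would run the map and reduce steps on different nodes.

```python
from collections import defaultdict

def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(splits):
    # In a cluster, each split would be mapped on the node that stores it.
    mapped = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical input: three splits of a larger text.
splits = ["big data big clusters", "data nodes", "big nodes"]
counts = word_count(splits)
print(counts)
```

Because each map call sees only its own split and each reduce call sees only one key, both stages can be spread over many machines without coordination beyond the shuffle.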

Working with Hadoop requires understanding that it is designed to run on a large number of machines that share neither hardware nor memory. When a financial institution wants to analyze data from dozens of servers, Hadoop breaks the data apart and distributes it across those servers. Hadoop also replicates the data, preventing data loss in the event of most failures. In addition, MapReduce increases potential computing speed by dividing a large data analysis among all the servers in a cluster, yet answers the query with a single result set.
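The splitting and replication described above can be illustrated with a toy model. The sketch below is an assumption-laden simplification: the node names and the tiny block size are hypothetical, and the round-robin placement is a stand-in for HDFS's real placement policy, which also takes racks into account. The replication factor of 3 matches the usual HDFS default.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified placement for illustration only; not HDFS's actual policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }

def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )

# Hypothetical four-node cluster and a 10-byte "file" with 4-byte blocks.
nodes = ["node-a", "node-b", "node-c", "node-d"]
blocks = split_into_blocks(b"0123456789", block_size=4)
placement = place_replicas(len(blocks), nodes, replication=3)
survives = readable_after_failure(placement, "node-b")
print(placement, survives)
```

With three copies of every block on different nodes, losing any single node leaves each block readable from at least two others.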

Though Hadoop offers a scalable approach to data storage and analysis, it is not meant as a substitute for a standard database (e.g. a SQL Server 2012 database). Hadoop stores data in files but does not index them for quick lookup. Finding a piece of data requires a MapReduce job, which takes far longer than is acceptable for simple database operations. Hadoop functions best when the dataset is too large for conventional storage and too diverse for easy analysis.
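The cost difference between an unindexed store and an indexed database can be shown with a small comparison. The sketch below uses an in-memory list and a dictionary as stand-ins; the record layout and function names are hypothetical, and the dictionary only approximates what a database index does.

```python
def scan_lookup(records, account_id):
    """No index: inspect records one by one until a match is found."""
    examined = 0
    for record in records:
        examined += 1
        if record["account_id"] == account_id:
            return record, examined
    return None, examined

def build_index(records):
    """Index: map each key straight to its record, as a database index would."""
    return {record["account_id"]: record for record in records}

# Hypothetical table of 1,000 accounts.
records = [{"account_id": i, "balance": i * 10} for i in range(1000)]

found, examined = scan_lookup(records, 999)  # touches all 1,000 records
index = build_index(records)
indexed = index[999]                         # one direct lookup
print(examined, indexed)
```

For a single-record lookup the scan has to read the entire dataset in the worst case, which is why simple database operations are a poor fit for a scan-oriented system.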

The amount of digitized information has grown ninefold in the last five years, with companies spending an estimated four trillion dollars worldwide on data management in 2011. Doug Cutting, creator of Hadoop and chief architect at Cloudera, estimates that 1.8 zettabytes (1.8 trillion gigabytes) were created and replicated in the same year. Ninety percent of this information is unstructured, and Hadoop and applications like it offer one of the few current methods of keeping this data comprehensible.
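The unit conversion in that estimate is easy to check. The short calculation below assumes decimal (SI) units, where one zettabyte is 10^21 bytes and one gigabyte is 10^9 bytes; the variable names are chosen for the example.

```python
ZETTABYTE = 10**21  # bytes, SI definition (assumed)
GIGABYTE = 10**9    # bytes, SI definition (assumed)

# 1.8 ZB written with integers to avoid floating-point rounding.
total_bytes = 18 * ZETTABYTE // 10

gigabytes = total_bytes // GIGABYTE
unstructured_bytes = total_bytes * 90 // 100  # the ninety percent share

print(f"{gigabytes:,} GB in total")
print(f"{unstructured_bytes / ZETTABYTE:.2f} ZB unstructured")
```

The result is 1,800,000,000,000 gigabytes, which is the 1.8 trillion figure quoted, and about 1.62 zettabytes of that would be unstructured.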



Article Source: http://EzineArticles.com/?expert=Chandra_Heitzman

Article Source: http://EzineArticles.com/6928548

Tags: apache software foundation, big data, computer clusters, computer nodes, data query, data warehouses, Facebook, google, hadoop, hdfs, individual computer, massive amounts, open source platform, rapid data transfer, search engines


All site content, except where otherwise noted, is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.