hadoop - Output of reducer sent to HDFS whereas map output is stored on data node local disk?


I'm a bit confused about HDFS storage versus data node local storage. Below are my doubts:

  1. The map output is saved on the data node's local disk, while the reducer output is sent to HDFS. As we all know, data blocks are stored on data nodes, so isn't the data node's local disk the same storage that HDFS uses? Is there any disk space on the data node that is outside of HDFS?

  2. What is the physical storage location of a reducer output file (e.g. part-r-00001)? Will it be stored on the name node's hard disk?

    So I believe that the data node is part of HDFS, and that the data node's local disk is therefore also part of HDFS.

    Regards, Suresh

    You need to understand the difference between the virtual (logical) concept and the actual storage underneath. HDFS (Hadoop Distributed File System) only specifies how data is stored across DataNodes. When you say that a file is placed in HDFS, it means it is logically treated as an HDFS file, but it is physically stored on the DataNodes' local disks.
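    A minimal sketch of that distinction, using the standard HDFS Java API (the path /user/demo/hello.txt is hypothetical): the program writes to a logical HDFS path, and HDFS places the resulting blocks on DataNode local disks.

        import java.nio.charset.StandardCharsets;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class HdfsWriteSketch {
            public static void main(String[] args) throws Exception {
                // Picks up core-site.xml / hdfs-site.xml from the classpath.
                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);

                // The path is purely logical; HDFS decides which DataNodes
                // physically hold the blocks of this file.
                Path out = new Path("/user/demo/hello.txt"); // hypothetical path
                try (FSDataOutputStream stream = fs.create(out, true)) {
                    stream.write("addressed via HDFS, stored on DataNode disks\n"
                            .getBytes(StandardCharsets.UTF_8));
                }
            }
        }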

    Here is how it works in detail:

    • HDFS is a block-structured file system: each file is broken into fixed-size blocks (64 MB by default). These blocks are stored across a cluster of machines made up of one NameNode and many DataNodes. The NameNode maintains the file system metadata (e.g., file names and the mapping of files to blocks) and regulates access to files; it also executes operations such as open/close/rename. To open a file, a client contacts the NameNode and retrieves the list of locations for the blocks that make up the file. These locations identify the DataNodes that hold each block; the client then reads the file data directly from the DataNode servers, possibly in parallel. The NameNode is not directly involved in this bulk data transfer, which keeps its overhead to a minimum (a sketch after this list shows how a client retrieves these block locations).

    • DataNodes are responsible for serving read/write requests and for performing block creation, deletion, and replication. That is why every block in HDFS is physically stored on a DataNode (a second sketch below reads a reducer part file through this same path).
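    To make the read path concrete, here is a small sketch (again with a hypothetical file path) that asks the NameNode for the block locations of a file and prints the DataNodes holding each block — exactly the metadata a client uses before reading the blocks directly from those DataNodes:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.BlockLocation;
        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class BlockLocationsSketch {
            public static void main(String[] args) throws Exception {
                FileSystem fs = FileSystem.get(new Configuration());
                Path file = new Path("/user/demo/big-input.txt"); // hypothetical file

                // The NameNode answers this from its metadata alone;
                // no file data is transferred.
                FileStatus status = fs.getFileStatus(file);
                BlockLocation[] blocks =
                        fs.getFileBlockLocations(status, 0, status.getLen());

                for (BlockLocation block : blocks) {
                    // getHosts() names the DataNodes holding replicas of this block.
                    System.out.printf("offset=%d length=%d hosts=%s%n",
                            block.getOffset(), block.getLength(),
                            String.join(",", block.getHosts()));
                }
            }
        }

    From the command line, hdfs fsck /user/demo/big-input.txt -files -blocks -locations prints the same block-to-DataNode mapping.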
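    And to address question 2 directly: a reducer's part file (e.g. part-r-00000) is an ordinary HDFS file, so it lives on DataNode disks, not on the NameNode. Reading it follows the same protocol; a sketch, assuming a hypothetical output directory /user/demo/job-output:

        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import java.nio.charset.StandardCharsets;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class ReadReducerOutputSketch {
            public static void main(String[] args) throws Exception {
                FileSystem fs = FileSystem.get(new Configuration());
                // Hypothetical job output directory; each reducer writes
                // one part file, e.g. part-r-00000.
                Path part = new Path("/user/demo/job-output/part-r-00000");

                // open() consults the NameNode for block locations; the
                // actual bytes are then streamed from the DataNodes.
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(fs.open(part), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        System.out.println(line);
                    }
                }
            }
        }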
