compression - How to extract data from Hadoop sequence file?


Hadoop sequence files are behaving really strangely: I pack an image into a sequence file and cannot get the same image back. I ran a few simple tests, and even the bytes do not seem to match before and after going through the sequence file.

    import org.apache.commons.io.IOUtils;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.io.SequenceFile.Reader;
    import org.apache.hadoop.io.SequenceFile.Writer;

    Configuration confHadoop = new Configuration();
    FileSystem fs = FileSystem.get(confHadoop);

    String fileName = args[0];
    Path file = new Path(fs.getUri().toString() + "/" + fileName);
    Path seqFile = new Path("/temp.seq");
    SequenceFile.Writer writer = null;
    FSDataInputStream in = null;
    try {
        writer = SequenceFile.createWriter(confHadoop, Writer.file(seqFile),
                Writer.keyClass(Text.class), Writer.valueClass(BytesWritable.class));

        // Read the whole image into memory and store it as one record.
        in = fs.open(file);
        byte[] buffer = IOUtils.toByteArray(in);

        System.out.println("Basic Size ----> " + String.valueOf(buffer.length));
        writer.append(new Text(fileName), new BytesWritable(buffer));
        System.out.println(calculateMd5(buffer));
        writer.close();
    } finally {
        IOUtils.closeQuietly(in);
    }

    SequenceFile.Reader reader = new SequenceFile.Reader(confHadoop, Reader.file(seqFile));

    Text key = new Text();
    BytesWritable val = new BytesWritable();

    while (reader.next(key, val)) {
        System.out.println("Get the size from the sequence file ---> " + String.valueOf(val.getLength()));
        String md5 = calculateMd5(val.getBytes());
        Path readSeq = new Path("/back back.png");
        FSDataOutputStream out = null;
        out = fs.create(readSeq);
        out.write(val.getBytes()); // YES! GOT THE ORIGINAL IMAGE
        out.close();
        System.out.println(md5);
        .............
    }

The output shows that I get the same number of bytes, and after writing the image back to the local disk I am sure I got the original image. So why is the MD5 value not the same?

What have I done wrong here?

    14/04/22 16:21:35 INFO compress.CodecPool: Got brand-new compressor [.deflate]
    Basic Size ----> 485709
    c413e36fd864b27d4c8927956298edbb
    14/04/22 16:21:35 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
    Get the size from the sequence file ---> 485709
    322cce20b732126bcb8876c4fcd925cb

I finally solved this weird problem, and I have to share it. First of all, I will show you the wrong way to get the bytes out of a sequence file.

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path input = new Path(inPath);
    Reader reader = new SequenceFile.Reader(conf, Reader.file(input));
    Text key = new Text();
    BytesWritable val = new BytesWritable();

    while (reader.next(key, val)) {
        String fileName = key.toString();
        byte[] data = val.getBytes(); // Do not think you have got the data!
    }

The reason is that getBytes() does not return an array of exactly the size of your original data. I put the data in using:

    FSDataInputStream in = null;
    in = fs.open(input);
    byte[] buffer = IOUtils.toByteArray(in);

    Writer writer = SequenceFile.createWriter(conf,
            Writer.file(output), Writer.keyClass(Text.class),
            Writer.valueClass(BytesWritable.class));

    writer.append(new Text(inPath), new BytesWritable(buffer));
    writer.close();

I checked the size of the output sequence file: it is the original size plus the header. I am not sure why getBytes() gives me more bytes than the original, but let's see how to get the data correctly.
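A likely explanation, offered as an assumption rather than something the post confirms: a growable container such as BytesWritable keeps a backing array with spare capacity, so getBytes() can return more bytes than getLength() reports, and hashing the whole array also hashes the padding. A PNG viewer would typically ignore trailing bytes, which would explain why the written-back image still looks fine. The sketch below reproduces the effect with a stand-in class using only the JDK. `PaddedBufferDemo`, its 50% growth policy, and `md5Hex` (standing in for the post's `calculateMd5`) are all hypothetical names chosen for illustration, not Hadoop code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Stand-in for a growable byte container; this is NOT Hadoop's BytesWritable. */
final class PaddedBufferDemo {
    private byte[] bytes = new byte[0]; // backing array, may be longer than size
    private int size = 0;               // number of valid bytes

    /** Stores data, growing the backing array with 50% headroom (assumed policy). */
    void set(byte[] data) {
        if (data.length > bytes.length) {
            bytes = new byte[data.length * 3 / 2];
        }
        System.arraycopy(data, 0, bytes, 0, data.length);
        size = data.length;
    }

    byte[] getBytes() { return bytes; }                       // whole backing array
    int getLength() { return size; }                          // valid byte count
    byte[] copyBytes() { return Arrays.copyOf(bytes, size); } // exact-size copy

    /** Hypothetical counterpart of the post's calculateMd5 helper. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] original = "not really a PNG".getBytes(StandardCharsets.UTF_8);
        PaddedBufferDemo val = new PaddedBufferDemo();
        val.set(original);

        System.out.println("getLength()       = " + val.getLength());
        System.out.println("getBytes().length = " + val.getBytes().length);
        System.out.println("md5(original)     = " + md5Hex(original));
        System.out.println("md5(getBytes())   = " + md5Hex(val.getBytes()));
        System.out.println("md5(copyBytes())  = " + md5Hex(val.copyBytes()));
    }
}
```

In this simulation the reported length matches the input while the backing array is longer, so only the hash of the exact-size copy agrees with the hash of the original.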

Option #1: copy only as many bytes as you need.

    byte[] rawdata = val.getBytes();
    int length = val.getLength(); // the exact size of the original data
    byte[] data = Arrays.copyOfRange(rawdata, 0, length);

Option #2:

    byte[] data = val.copyBytes();

This one is sweeter :) In the end I got it right.
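If copying a large image just to hash it or write it out is a concern, a third variant is to pass the valid range explicitly, since both `MessageDigest.update` and `OutputStream.write` accept an offset and a length. The sketch below uses only the JDK and simulates the padded array by hand. `ValidRangeDemo` and `md5Hex` are hypothetical names, and `raw` and `length` stand in for `val.getBytes()` and `val.getLength()`. With Hadoop, the same `write(buf, 0, length)` call should be available on the output stream returned by `fs.create`, because it is an ordinary `OutputStream`.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ValidRangeDemo {
    /** MD5 of buf[0, length) as lowercase hex, without copying the array. */
    static String md5Hex(byte[] buf, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(buf, 0, length); // hash only the valid prefix
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Simulated val.getBytes(): 5 valid bytes followed by 3 bytes of padding.
        byte[] raw = {1, 2, 3, 4, 5, 0, 0, 0};
        int length = 5; // simulated val.getLength()

        // Write only the valid prefix instead of the whole backing array.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(raw, 0, length);

        System.out.println("bytes written      = " + out.size());
        System.out.println("md5 of valid range = " + md5Hex(raw, length));
    }
}
```

The trade-off is that every consumer has to carry the length alongside the array, whereas copyBytes() hands back an array that is safe to use as-is.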
