python - Concurrent SAX processing of large, simple XML files?


I have some large XML files (10 GB to 40 GB) with a very simple structure: a single root node containing a large number of row nodes. I'm trying to parse them with SAX in Python, but the extra processing I have to do for each row means that the 40 GB file takes an entire day to complete. To speed things up, I'd like to use all my cores simultaneously. Unfortunately, the SAX parser can't handle the "malformed" chunks of XML that you get by seeking to an arbitrary line in the file and trying to parse from there. Since a SAX parser can accept a stream, I'm thinking of splitting my XML file into eight different streams, each containing [number of rows]/8 rows and padded with duplicate opening and closing tags. How would I go about doing this? Or is there a better solution that I might be missing? Thanks!
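For illustration, here is a minimal sketch of the splitting idea the question describes. It assumes the repeated elements are named <row>, sit one per line, and that the real root tags occupy their own first and last lines (all assumptions; the original doesn't show the file). For brevity it loads the whole file into memory, which a real 40 GB run would replace with seeking to byte offsets:

```python
import multiprocessing
import xml.sax


class RowCounter(xml.sax.ContentHandler):
    """Placeholder handler: real code would do the per-row work here."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def endElement(self, name):
        if name == "row":  # assumed element name
            self.rows += 1


def parse_chunk(lines):
    # Re-wrap the chunk in a synthetic root so it is well-formed XML
    # that a stock SAX parser will accept.
    wrapped = "<root>" + "".join(lines) + "</root>"
    handler = RowCounter()
    xml.sax.parseString(wrapped.encode("utf-8"), handler)
    return handler.rows


def main(path, workers=8):
    with open(path, encoding="utf-8") as f:
        next(f)  # skip the real opening <root> line
        lines = [ln for ln in f if "</root>" not in ln]  # drop the closing tag
    # Split the rows into one roughly equal chunk per worker.
    size = max(1, (len(lines) + workers - 1) // workers)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with multiprocessing.Pool(workers) as pool:
        counts = pool.map(parse_chunk, chunks)
    print("total rows:", sum(counts))


if __name__ == "__main__":
    main("big.xml")  # hypothetical file name
```

Each worker sees a well-formed document because its chunk is re-wrapped in a synthetic root, which is exactly the "duplicate opening and closing tags" trick the question proposes.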

You can't easily split SAX parsing across multiple threads, and you don't need to: if you run the parse alone, without any other processing, it should finish in 20 minutes or so. Focus on the processing you do to your data in the ContentHandler.
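A minimal sketch of that advice: keep the SAX parse single-threaded and ship only the expensive per-row work to a process pool. The element name <row> and the process_row function are hypothetical stand-ins for whatever the real file and processing look like:

```python
import multiprocessing
import xml.sax


def process_row(text):
    # Hypothetical stand-in for the expensive per-row work.
    return len(text)


class RowHandler(xml.sax.ContentHandler):
    """Single-threaded SAX parse; heavy per-row work goes to a process pool."""

    def __init__(self, pool):
        super().__init__()
        self.pool = pool
        self.buffer = []
        self.in_row = False
        self.pending = []

    def startElement(self, name, attrs):
        if name == "row":  # assumed element name
            self.in_row = True
            self.buffer = []

    def characters(self, content):
        if self.in_row:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "row":
            self.in_row = False
            text = "".join(self.buffer)
            # Hand off asynchronously so the parse never waits on the work.
            self.pending.append(self.pool.apply_async(process_row, (text,)))


if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        handler = RowHandler(pool)
        xml.sax.parse("big.xml", handler)  # hypothetical file name
        results = [r.get() for r in handler.pending]
    print("processed", len(results), "rows")
```

This keeps the parser streaming while the per-row cost, which the answer identifies as the real bottleneck, runs on all cores. For tens of millions of rows, real code would drain the pending results in batches rather than holding them all in a list.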
