html - Get contents as attribute-value pairs using BeautifulSoup or XPath
For the following XHTML snippet, I want to extract attribute-value pairs from the structured HTML using BS4 or XPath. The attribute name is in an h5 tag, and its value follows in either a span tag or a p tag.
For the markup given below, I should get the following output as a dictionary:
Husbandry management: 'Animals: Cows Farmer: Mr. Smith'
Lactation category: 'milk supply'
Services: 'cow's milk, ghee'
Animal color: 'red, brown ...'
    <div id="animalcontainer" class="container last fixed-height">
        <h5>Husbandry management</h5>
        <span>Animals: Cows</span>
        <span>Farmer: Mr. Smith</span>
        <h5>Lactation category</h5>
        <p>Milk supply</p>
        <h5>Services</h5>
        <p>Cow's milk, ghee</p>
        <h5>Animal color</h5>
        <span>Green, red</span>
    </div>

htmlcode.findAll('h5') finds the h5 elements, but I want each h5 element together with the elements that follow it, up to the next h5.

Example solution using lxml.html and XPath:

- select all h5 elements
- and for each h5 element,
- select its following sibling elements: following-sibling::*
- that are not h5 themselves: [not(self::h5)]
- and that have the matching number of preceding h5 siblings: [count(preceding-sibling::h5) = 1], then 2, then 3, ... (a for loop with enumerate() starting from 1)

Sample code that simply prints the text content of the elements (using lxml.html's .text_content() on the elements):

    import lxml.html

    html = """<div id="animalcontainer" class="container last fixed-height">
        <h5>Husbandry management</h5>
        <span>Animals: Cows</span>
        <span>Farmer: Mr. Smith</span>
        <h5>Lactation category</h5>
        <p>Milk supply</p>
        <h5>Services</h5>
        <p>Cow's milk, ghee</p>
        <h5>Animal color</h5>
        <span>Green, red</span>
    </div>"""

    doc = lxml.html.fromstring(html)
    headers = doc.xpath('//div/h5')

    # i is the 1-based position of each h5, used in the count() predicate
    for i, header in enumerate(headers, start=1):
        print("--------------------------------")
        print(header.text_content().strip())
        # siblings after this h5 that are not h5 and have exactly i h5 elements before them
        for following in header.xpath("""following-sibling::*
                                            [not(self::h5)]
                                            [count(preceding-sibling::h5) = %d]""" % i):
            print("\t", following.text_content().strip())

This outputs:

    --------------------------------
    Husbandry management
    	 Animals: Cows
    	 Farmer: Mr. Smith
    --------------------------------
    Lactation category
    	 Milk supply
    --------------------------------
    Services
    	 Cow's milk, ghee
    --------------------------------
    Animal color
    	 Green, red
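Since the question asks for a dictionary rather than printed output, here is a minimal sketch of one way to collect the same groups into a dict. It does not use the count() predicate from the answer: it makes a single pass over the div's children and starts a new group at every h5. The helper name group_by_h5 is made up for illustration, and joining several values with a space is an assumption about the desired format. The import falls back to the standard library's ElementTree when lxml is not installed, which only works because the snippet is well-formed XHTML.

```python
try:
    from lxml import etree  # lxml's ElementTree-compatible API
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback for well-formed XHTML

html = """<div id="animalcontainer" class="container last fixed-height">
    <h5>Husbandry management</h5>
    <span>Animals: Cows</span>
    <span>Farmer: Mr. Smith</span>
    <h5>Lactation category</h5>
    <p>Milk supply</p>
    <h5>Services</h5>
    <p>Cow's milk, ghee</p>
    <h5>Animal color</h5>
    <span>Green, red</span>
</div>"""


def group_by_h5(container):
    """Map each h5 heading's text to the joined text of the siblings after it.

    Hypothetical helper: elements before the first h5 are ignored, and a
    repeated heading text would overwrite the earlier entry.
    """
    grouped = {}
    current = None  # text of the most recent h5, or None before the first one
    for child in container:
        text = "".join(child.itertext()).strip()
        if child.tag == "h5":
            current = text
            grouped[current] = []
        elif current is not None:
            grouped[current].append(text)
    # Assumption: multiple values under one heading are joined with a space
    return {heading: " ".join(values) for heading, values in grouped.items()}


result = group_by_h5(etree.fromstring(html))
for heading, value in result.items():
    print("%s: %r" % (heading, value))
```

The single pass avoids re-evaluating an XPath expression per heading, at the cost of assuming the h5 elements and their values are all direct children of the same container, as they are in the snippet.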
 
 
 
 
 
 
 
 