html - Get attribute/value pairs using BeautifulSoup or XPath

- September 15, 2014

For the following XHTML snippet, I want to extract attribute/value pairs from the structured HTML using BS4 or XPath. The attribute name is in an H5 tag, and its value follows in either a SPAN tag or a P tag.

For the code given below, I should get the following output as a dictionary:

    Husband management: 'Animals: Cows Farmer: Mr. Smith'
    Lactation category: 'Milk supply'
    Services: 'Cow's milk, ghee'
    Animal color: 'Green, red'

    <div id="animalcontainer" class="container last fixed-height">
        <h5>Husband management</h5>
        <span>Animals: Cows</span>
        <span>Farmer: Mr. Smith</span>
        <h5>Lactation category</h5>
        <p>Milk supply</p>
        <h5>Services</h5>
        <p>Cow's milk, ghee</p>
        <h5>Animal color</h5>
        <span>Green, red</span>
    </div>

htmlcode.findAll('h5') finds the h5 elements, but I want each h5 element together with the sibling elements that follow it, up to the next h5.

Example solution using lxml.html and XPath:
- select all h5 elements
- for each h5 element, select its following sibling elements: following-sibling::*
- keep only those that are not themselves h5: [not(self::h5)]
- and that have the same number of h5 siblings before them as the current h5's position: [count(preceding-sibling::h5) = 1], then 2, then 3, and so on (a for loop over enumerate() starting from 1)
Sample code that simply prints the text content of the elements (using .text_content() on the lxml.html elements):
    import lxml.html

    html = """<div id="animalcontainer" class="container last fixed-height">
        <h5>Husband management</h5>
        <span>Animals: Cows</span>
        <span>Farmer: Mr. Smith</span>
        <h5>Lactation category</h5>
        <p>Milk supply</p>
        <h5>Services</h5>
        <p>Cow's milk, ghee</p>
        <h5>Animal color</h5>
        <span>Green, red</span>
    </div>"""

    doc = lxml.html.fromstring(html)
    headers = doc.xpath('//div/h5')

    # i is the 1-based position of each h5 among its siblings
    for i, header in enumerate(headers, start=1):
        print("--------------------------------")
        print(header.text_content().strip())
        # non-h5 siblings that have exactly i h5 elements before them
        for following in header.xpath("""following-sibling::*
                                             [not(self::h5)]
                                             [count(preceding-sibling::h5) = %d]""" % i):
            print("\t" + following.text_content().strip())

This outputs:

    --------------------------------
    Husband management
        Animals: Cows
        Farmer: Mr. Smith
    --------------------------------
    Lactation category
        Milk supply
    --------------------------------
    Services
        Cow's milk, ghee
    --------------------------------
    Animal color
        Green, red
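The question asks for a dictionary rather than printed output. The following is a minimal sketch of one way to build it, not the answer's own method. It swaps the XPath approach for a single pass over the container's children in document order, using only the standard library's xml.etree.ElementTree, and it assumes the snippet is well-formed XHTML. The function name pairs_from_container and the choice to join several values with a space are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

XHTML = """<div id="animalcontainer" class="container last fixed-height">
    <h5>Husband management</h5>
    <span>Animals: Cows</span>
    <span>Farmer: Mr. Smith</span>
    <h5>Lactation category</h5>
    <p>Milk supply</p>
    <h5>Services</h5>
    <p>Cow's milk, ghee</p>
    <h5>Animal color</h5>
    <span>Green, red</span>
</div>"""


def pairs_from_container(container):
    """Map each h5 heading to the joined text of the siblings that follow it.

    Hypothetical helper: walks the direct children once instead of
    evaluating a following-sibling XPath per heading.
    """
    collected = {}
    current = None  # heading whose values are currently being collected
    for child in container:  # direct children, in document order
        text = "".join(child.itertext()).strip()
        if child.tag == "h5":
            current = text
            collected[current] = []
        elif current is not None:
            collected[current].append(text)
    # Assumption: several values under one heading are joined with a space.
    return {name: " ".join(values) for name, values in collected.items()}


pairs = pairs_from_container(ET.fromstring(XHTML))
for name, value in pairs.items():
    print(f"{name}: {value!r}")
```

ElementTree's XPath subset has no following-sibling axis, which is why this sketch iterates over the children directly. For HTML that is not well-formed XML, a tolerant parser such as lxml.html or BeautifulSoup would still be needed.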



















