python - How to count items in a string
I am trying to capture the file ID, grab the abstract from the same file, and then count the sentences in that abstract. Right now I'm getting 0 for every sentence count. When I checked back, I found that grabAbs is not capturing the abstract for me. Can someone look over the code and tell me what the problem is? Thank you.
    import re

    grabFile = re.findall(r'File\s+\:\s+(\w\d{7})', mytext)
    if len(grabFile) == 0:
        matchFile = "N/A"
    else:
        matchFile = grabFile[0]

    newtext = re.sub(r'\n', ' ', mytext)
    newtext = re.sub(r'\s+', '', newtext)
    grabAbs = re.findall(r'Abstract\s+\:(\w.+)', newtext)
    if len(grabAbs) == 0:
        matchAbs = "N/A"
    else:
        matchAbs = grabAbs[0]

    # filesents = {}
    sentcount = 0
    for each in matchAbs.split('\.'):
        if each == 'N/A':
            sentcount = 0
        else:
            sentcount += 1
    print(sentcount, matchFile)

Here is the abstract of one of the files (A 95000006) from the text:
Abstract:
9500006 Wang industry Academic contact a grant opportunity for this award (rounding) research project will develop a new method to reduce aluminum automotive geometric variation in space frames for.
The regex in this line

    grabAbs = re.findall(r'Abstract\s+\:(\w.+)', newtext)

always expects at least one whitespace character between 'Abstract' and the colon. If I were writing an abstract, I would put the colon immediately after the word 'Abstract'. If you change `\s+` to `\s*`, you allow for the case where there is no space before the colon. Try that and see if it solves your problem.

-- EDIT --

After looking at your sample input, the problem is more likely that there is a line break after the colon, and you have not set the multiline flag on your regex. Try this:

    grabAbs = re.findall(r'Abstract\s+\:(\w.+)', newtext, flags=re.M)

-- EDIT --
@jeman pointed out that the line above the one I mentioned removes all whitespace, so the `\s+` should be removed from the pattern. In addition, the line above that one removes all newline characters. Since the newline after the colon is being removed, the `\w` in the capture group cannot match. Perhaps the capture group should just be `(.+)`:

    grabAbs = re.findall(r'Abstract\:(.+)', newtext)
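To make the points above concrete, here is a minimal sketch of one way the extraction and count could be put together. It is not the asker's exact program: the sample record, the file ID `a9500006`, and the helper names `extract_abstract` and `count_sentences` are made up for illustration. It assumes that collapsing whitespace to a single space (rather than deleting it) is acceptable, and it treats every non-empty chunk between periods as a sentence, which is only a rough heuristic. It also splits on `'.'` instead of `'\.'`, because `str.split` takes a literal separator, not a regex.

```python
import re

# Hypothetical sample record, shaped like the excerpt in the question.
mytext = """File        : a9500006
Abstract    :
9500006 Wang First sentence of the abstract. Second sentence. Third one.
"""


def extract_abstract(text):
    """Return the text after 'Abstract :' with whitespace collapsed, or None."""
    # Collapse runs of whitespace (newlines included) to a single space
    # instead of deleting them, so the words stay separated.
    flat = re.sub(r'\s+', ' ', text)
    # \s* tolerates zero or more spaces on either side of the colon.
    match = re.search(r'Abstract\s*:\s*(.+)', flat)
    return match.group(1).strip() if match else None


def count_sentences(abstract):
    """Count non-empty chunks between literal periods (a rough heuristic)."""
    if abstract is None:
        return 0
    # str.split uses a literal separator, so '.' is used rather than '\.'.
    return sum(1 for part in abstract.split('.') if part.strip())


abstract = extract_abstract(mytext)
print(count_sentences(abstract))
```

Returning `None` when nothing matches, instead of the string "N/A", keeps a missing abstract from being counted as a sentence by accident.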