Python Latin Characters and Unicode

July 15, 2015

I have a tree structure in which keywords can contain some Latin characters. I have a function that goes through all the leaves of the tree and adds each keyword to a list under certain conditions.

Here is the code that adds these keywords to the list:

    print "Adding: " + self.keyword
    leaf_list.append(self.keyword)
    print leaf_list

If the keyword is université, then my output is:

    Adding: université
    ['universit\xc3\xa9']

It appears that the print function shows the Latin character correctly, but when I add the keyword to the list, it gets decoded.

How can I change this? I need to be able to print the list with the standard Latin characters, not their decoded versions.
 
You do not have Unicode objects, but byte strings containing UTF-8 encoded text. Printing such byte strings to your terminal may work if your terminal is configured to handle UTF-8 text.
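Whether a piece of text can be displayed depends on the encoding in use. As a rough sketch (written so that it runs on Python 3, whereas the sessions in this post are Python 2), one can test whether text is encodable in a given encoding. `can_encode` is a hypothetical helper name, and `sys.stdout.encoding` only reflects what Python believes the output stream uses, which approximates but does not guarantee the terminal's actual configuration.

```python
import sys


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


# 'université', written with an escape so this source stays ASCII-only.
keyword = u"universit\xe9"

# The encoding Python associates with standard output; it may be missing or
# None (for example when output is redirected), so fall back to ASCII then.
stdout_encoding = getattr(sys.stdout, "encoding", None) or "ascii"

utf8_ok = can_encode(keyword, "utf-8")            # UTF-8 can encode the accent
ascii_ok = can_encode(keyword, "ascii")           # plain ASCII cannot
stdout_ok = can_encode(keyword, stdout_encoding)  # depends on the environment
```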
When a list is converted to a string, its contents are shown as representations, that is, the result of the repr() function. The representation of a string object uses escape codes for any bytes outside the printable ASCII range; newlines are replaced by \n, for example. Your UTF-8 bytes are represented by \xhh escape sequences.
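This behaviour can be reproduced in a few lines. The following is a sketch for Python 3, where a UTF-8 byte string is a `bytes` object and its representation carries a `b` prefix. `leaf_list` and `keyword` mirror the names in the question, while `readable` is a name introduced here for illustration. Joining the decoded items is one possible way to get readable output, not necessarily the approach the answer had in mind.

```python
# UTF-8 encoded byte string, standing in for the keyword from the question.
keyword = u"universit\xe9".encode("utf-8")

leaf_list = []
leaf_list.append(keyword)

# Converting the list to a string applies repr() to every element,
# so the two non-ASCII bytes show up as \xc3\xa9 escapes.
list_text = str(leaf_list)
element_repr = repr(keyword)

# One way to get readable output: decode each item to text and join the
# items yourself instead of relying on the list's own representation.
readable = u", ".join(item.decode("utf-8") for item in leaf_list)
```

On a terminal configured for UTF-8, `print(readable)` would be expected to show the accented character, while `print(list_text)` shows the escaped form.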
If you were using Unicode objects, the representation would still use \xhh escapes, but only for Unicode codepoints in the Latin-1 range (outside ASCII); the rest are shown with \uhhhh and \Uhhhhhhhh escapes, depending on their codepoint. When printing, Python automatically encodes such values to the correct encoding for your terminal:

    >>> u'université'
    u'universit\xe9'
    >>> len(u'université')
    10
    >>> print u'université'
    université

Compare this with a byte string:
    >>> 'université'
    'universit\xc3\xa9'
    >>> len('université')
    11
    >>> 'université'.decode('utf8')
    u'universit\xe9'
    >>> print 'université'
    université

Note that the length reflects that the é codepoint is encoded as two bytes. It was my terminal that presented Python with the \xc3\xa9 bytes when I pasted the é character into the Python session, as it is configured to use UTF-8, and Python detected this and decoded the bytes when I defined the u'..' Unicode object literal.
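The difference between the two lengths, and the escape styles mentioned earlier, can also be checked programmatically. This is a sketch assuming Python 3: there `str` plays the role of Python 2's Unicode object, `bytes` that of the byte string, and the built-in `ascii()` produces escapes similar to those in Python 2's `repr()` of a Unicode object. The variable names are illustrative.

```python
text = u"universit\xe9"      # Unicode text: 10 codepoints
data = text.encode("utf-8")  # UTF-8 bytes: the last codepoint becomes two bytes

text_length = len(text)
data_length = len(data)
last_bytes = data[-2:]             # the two bytes that encode the accented letter
round_trip = data.decode("utf-8")  # decoding recovers the original text

# ascii() escapes non-ASCII codepoints: \xhh up to U+00FF,
# \uhhhh up to U+FFFF, and \Uhhhhhhhh beyond that.
latin1_escape = ascii(u"\xe9")
bmp_escape = ascii(u"\u20ac")
astral_escape = ascii(u"\U0001f600")
```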
I strongly recommend that you read the following articles to understand how Python handles Unicode, and what the difference is between Unicode text and encoded byte strings:

- an article by Joel Spolsky
- an article by Ned Batchelder

 




  


















