Java: convert UTF-8 to Unicode escape strings
August 03, 2010 16:23:55 Last update: October 22, 2010 15:35:20
According to the Java documentation:
The Java compiler and other Java tools can only process files which contain Latin-1 and/or Unicode-encoded (\udddd notation) characters.
This utility converts a UTF-8 encoded file to ASCII, replacing every non-ASCII character with its Unicode escape sequence.
import java.io.*;

/**
 * Reads a file in UTF-8 encoding and writes it to STDOUT in ASCII, with
 * Unicode escape sequences for characters outside of ASCII.
 */
public class UTF8ToAscii {
    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("Usage: java UTF8ToAscii <filename>");
            return;
        }

        BufferedReader r = new BufferedReader(
            new InputStreamReader(
                new FileInputStream(args[0]),
                "UTF-8"
            )
        );

        String line = r.readLine();
        while (line != null) {
            System.out.println(unicodeEscape(line));
            line = r.readLine();
        }
        r.close();
    }

    private static final char[] hexChar = {
        '0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'
    };

    private static String unicodeEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Any bit set above the low 7 bits means the char is outside ASCII.
            if ((c >> 7) > 0) {
                sb.append("\\u");
                sb.append(hexChar[(c >> 12) & 0xF]); // hex digit for the left-most 4 bits
                sb.append(hexChar[(c >> 8) & 0xF]);  // hex digit for the second group of 4 bits from the left
                sb.append(hexChar[(c >> 4) & 0xF]);  // hex digit for the third group
                sb.append(hexChar[c & 0xF]);         // hex digit for the last group, i.e., the right-most 4 bits
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
It is equivalent to:
native2ascii -encoding utf-8
using the standard Java native2ascii utility.
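To see what the escaping produces, here is a minimal sketch applied to a sample string. The class name EscapeDemo and the helper escapeNonAscii are hypothetical names made up for this illustration. It swaps the hexChar lookup table for String.format, which should give the same uppercase four-digit output under the same "anything above 0x7F" rule.

```java
public class EscapeDemo {
    // Hypothetical helper for illustration: every char above 0x7F becomes
    // a backslash, the letter u, and four uppercase hex digits.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                // %04X pads the hex value to four digits with leading zeros
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The sample string is itself written with an escape so that this
        // source file stays pure ASCII; the compiler reads it as "cafe"
        // with an accented final letter.
        System.out.println(escapeNonAscii("caf\u00E9")); // prints caf\u00E9
    }
}
```

Because Java strings are sequences of UTF-16 code units, a character outside the Basic Multilingual Plane is stored as a surrogate pair and therefore comes out as two consecutive escapes, one per surrogate.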