Unicode Encoding in C# Examples
In C#, Unicode encoding is crucial for representing and manipulating text data, particularly in applications that handle multiple languages or need compatibility with different platforms. The System.Text namespace provides several classes to manage Unicode encoding, including UnicodeEncoding for UTF-16 encoding. This article offers practical examples of using Unicode encoding in C#.
Understanding Unicode and UTF-16
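Unicode is a universal character encoding standard that assigns a unique code point to every character across languages and scripts. UTF-16 encodes each code point in the Basic Multilingual Plane (U+0000 to U+FFFF) as a single 16-bit code unit (two bytes) and each supplementary code point (U+10000 and above) as a surrogate pair of two code units (four bytes). A C# string is a sequence of UTF-16 code units, so its Length counts code units rather than characters.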
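The difference between code units and code points can be sketched as follows. This is a minimal illustration, not a library API: the class name Utf16CodeUnitDemo and the helper CountCodePoints are hypothetical names chosen for this example.

```csharp
using System;
using System.Text;

public static class Utf16CodeUnitDemo
{
    // Hypothetical helper: counts code points by treating each
    // valid surrogate pair as a single character.
    public static int CountCodePoints(string text)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            // A surrogate pair occupies two char values but is one code point.
            if (char.IsSurrogatePair(text, i))
            {
                i++;
            }
            count++;
        }
        return count;
    }

    public static void Main()
    {
        string bmp = "世";                    // U+4E16, inside the Basic Multilingual Plane
        string supplementary = "\U0001F600";  // U+1F600, outside the BMP

        Console.WriteLine(Encoding.Unicode.GetByteCount(bmp));           // 2
        Console.WriteLine(Encoding.Unicode.GetByteCount(supplementary)); // 4
        Console.WriteLine(supplementary.Length);                         // 2 (code units)
        Console.WriteLine(CountCodePoints(supplementary));               // 1 (code point)
    }
}
```

Code that indexes or truncates strings by char position should keep this in mind, since cutting between the two halves of a surrogate pair leaves invalid UTF-16.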
UnicodeEncoding Class
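The UnicodeEncoding class handles UTF-16 encoding in .NET and supports both big-endian and little-endian byte order. Its parameterless constructor produces a little-endian encoder. The shared instances Encoding.Unicode (little-endian) and Encoding.BigEndianUnicode (big-endian) cover the common cases without constructing a new object.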
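As a small sketch of how byte order changes the output, the following compares the bytes produced for the same one-character string. The class name ByteOrderDemo and the helper ToHex are illustrative names, not part of .NET.

```csharp
using System;
using System.Text;

public static class ByteOrderDemo
{
    // Hypothetical helper: encodes the text and formats the bytes as hex.
    public static string ToHex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        // 'A' is U+0041. Little-endian stores the low byte first.
        Console.WriteLine(ToHex(new UnicodeEncoding(), "A"));     // 41-00
        Console.WriteLine(ToHex(Encoding.Unicode, "A"));          // 41-00
        Console.WriteLine(ToHex(Encoding.BigEndianUnicode, "A")); // 00-41
    }
}
```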
Encoding and Decoding Examples
Example 1: Encode String to UTF-16 Bytes
Here's an example of converting a string to UTF-16 bytes:
using System;
using System.Text;

public class UnicodeEncodingExample
{
    public static void Main()
    {
        // Initialize the string to encode
        string originalText = "Hello, 世界!";

        // Create an instance of UnicodeEncoding
        UnicodeEncoding unicode = new UnicodeEncoding();

        // Encode the string to a byte array
        byte[] encodedBytes = unicode.GetBytes(originalText);

        // Display the encoded bytes
        Console.WriteLine("Encoded bytes:");
        foreach (byte b in encodedBytes)
        {
            Console.Write($"{b:X2} ");
        }
    }
}
Example 2: Decode UTF-16 Bytes to String
The following example decodes UTF-16 bytes back to a string:
using System;
using System.Text;

public class UnicodeDecodingExample
{
    public static void Main()
    {
        // UTF-16 little-endian bytes for "Hello, 世界!"
        // 世 is U+4E16 (bytes 22, 78) and 界 is U+754C (bytes 76, 117)
        byte[] encodedBytes = { 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 44, 0, 32, 0, 22, 78, 76, 117, 33, 0 };

        // Create an instance of UnicodeEncoding (little-endian by default)
        UnicodeEncoding unicode = new UnicodeEncoding();

        // Decode the byte array back to a string
        string decodedText = unicode.GetString(encodedBytes);

        // Display the decoded string
        Console.WriteLine("Decoded string: " + decodedText);
    }
}
Example 3: Big-Endian Unicode Encoding
When big-endian byte order is required, pass true as the first constructor argument:
using System;
using System.Text;

public class BigEndianUnicodeEncodingExample
{
    public static void Main()
    {
        // First argument: use big-endian byte order.
        // Second argument: make a byte order mark available through GetPreamble();
        // GetBytes itself does not prepend it.
        UnicodeEncoding bigEndianUnicode = new UnicodeEncoding(true, true);

        // Encode a string
        string text = "Bonjour, monde!";
        byte[] encodedBytes = bigEndianUnicode.GetBytes(text);

        // Display the encoded bytes in big-endian order
        Console.WriteLine("Big-endian encoded bytes:");
        foreach (byte b in encodedBytes)
        {
            Console.Write($"{b:X2} ");
        }
    }
}
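Because GetBytes does not emit a byte order mark, a receiver that relies on one needs it written explicitly. The sketch below shows one way to do that with GetPreamble; the class name PreambleDemo and the helper EncodeWithBom are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Hypothetical helper: returns the encoding's preamble (BOM)
    // followed by the encoded text.
    public static byte[] EncodeWithBom(Encoding encoding, string text)
    {
        byte[] preamble = encoding.GetPreamble();
        byte[] body = encoding.GetBytes(text);
        byte[] result = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, result, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, result, preamble.Length, body.Length);
        return result;
    }

    public static void Main()
    {
        var bigEndian = new UnicodeEncoding(true, true);
        byte[] bytes = EncodeWithBom(bigEndian, "Hi");

        // FE-FF is the big-endian BOM; 00-48 and 00-69 are 'H' and 'i'.
        Console.WriteLine(BitConverter.ToString(bytes)); // FE-FF-00-48-00-69
    }
}
```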
Practical Applications
- Internationalization: Ensure that text data can handle multiple languages and scripts.
- Data Exchange: Facilitate text exchange between applications with different encoding standards.
- File I/O: Read and write files in formats that support Unicode characters.
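For the file I/O case, a minimal round trip might look like the following. It assumes the process can write to the system temporary directory; the class name FileRoundTripDemo, the helper RoundTrip, and the file name utf16-demo.txt are illustrative choices.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileRoundTripDemo
{
    // Hypothetical helper: writes the text as UTF-16 little-endian
    // and reads it back with the same encoding.
    public static string RoundTrip(string path, string text)
    {
        File.WriteAllText(path, text, Encoding.Unicode);
        return File.ReadAllText(path, Encoding.Unicode);
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "utf16-demo.txt");
        string original = "Hello, 世界!";

        string result = RoundTrip(path, original);
        Console.WriteLine(result == original); // True

        // Remove the temporary file created for the demonstration.
        File.Delete(path);
    }
}
```

Passing the encoding explicitly on both the write and the read avoids depending on defaults, which differ between APIs and platforms.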
Conclusion
Unicode encoding in C# is essential for working with text in multiple languages and scripts. By understanding how UTF-16 represents characters and by choosing the byte order that the receiving system expects, developers can use UnicodeEncoding to read, write, and exchange such text reliably.