
    Unicode Encoding in C# Examples

    In C#, Unicode encodings determine how text is converted to and from bytes, which matters for applications that handle multiple languages or exchange data across platforms. The System.Text namespace provides several classes for this, including UnicodeEncoding for UTF-16. This article offers practical examples of using UnicodeEncoding in C#.

    Understanding Unicode and UTF-16

    Unicode is a character encoding standard that assigns a unique code point to each character across languages and scripts. UTF-16 encodes each code point in the Basic Multilingual Plane (U+0000 to U+FFFF) as a single 16-bit code unit (two bytes) and each supplementary code point (U+10000 and above) as a surrogate pair of two code units (four bytes).
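
    The two-byte versus four-byte distinction can be checked directly. The following is a minimal sketch; the class name Utf16SizeExample and the helper Utf16ByteCount are illustrative names, not part of .NET:

```csharp
using System;
using System.Text;

public static class Utf16SizeExample
{
    // Returns the number of bytes the string occupies when encoded as UTF-16.
    public static int Utf16ByteCount(string s) => Encoding.Unicode.GetByteCount(s);

    public static void Main()
    {
        string bmp = "世";                    // U+4E16, inside the Basic Multilingual Plane
        string supplementary = "\U0001F600";  // U+1F600, outside the BMP

        Console.WriteLine(Utf16ByteCount(bmp));            // 2
        Console.WriteLine(Utf16ByteCount(supplementary));  // 4

        // string.Length counts 16-bit code units, so a surrogate pair counts as 2.
        Console.WriteLine(supplementary.Length);           // 2
    }
}
```

    Because string.Length counts code units rather than characters, text containing supplementary characters can be longer than it looks on screen.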

    UnicodeEncoding Class

    The UnicodeEncoding class implements UTF-16 in .NET and supports both little-endian and big-endian byte order. The parameterless constructor produces a little-endian encoding.
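
    To make the difference between the two byte orders concrete, here is a small sketch that prints the byte order mark (preamble) and the bytes of a single character for each. The class name ByteOrderMarkExample and the helper PreambleHex are illustrative:

```csharp
using System;
using System.Text;

public static class ByteOrderMarkExample
{
    // Formats the encoding's byte order mark as hyphen-separated hex.
    public static string PreambleHex(UnicodeEncoding encoding) =>
        BitConverter.ToString(encoding.GetPreamble());

    public static void Main()
    {
        var littleEndian = new UnicodeEncoding(bigEndian: false, byteOrderMark: true);
        var bigEndian = new UnicodeEncoding(bigEndian: true, byteOrderMark: true);

        Console.WriteLine(PreambleHex(littleEndian));  // FF-FE
        Console.WriteLine(PreambleHex(bigEndian));     // FE-FF

        // The same character, with its two bytes in opposite order.
        Console.WriteLine(BitConverter.ToString(littleEndian.GetBytes("A")));  // 41-00
        Console.WriteLine(BitConverter.ToString(bigEndian.GetBytes("A")));     // 00-41
    }
}
```

    Note that GetBytes does not prepend the byte order mark; it is returned separately by GetPreamble.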

    Encoding and Decoding Examples

    Example 1: Encode String to UTF-16 Bytes

    Here's an example of converting a string to UTF-16 bytes:

     

    using System;
    using System.Text;
    
    public class UnicodeEncodingExample
    {
        public static void Main()
        {
            // Initialize the string to encode
            string originalText = "Hello, 世界!";
    
            // Create an instance of UnicodeEncoding
            UnicodeEncoding unicode = new UnicodeEncoding();
    
            // Encode the string to a byte array
            byte[] encodedBytes = unicode.GetBytes(originalText);
    
            // Display the encoded bytes
            Console.WriteLine("Encoded bytes:");
            foreach (byte b in encodedBytes)
            {
                Console.Write($"{b:X2} ");
            }
        }
    }
    

    Example 2: Decode UTF-16 Bytes to String

    The following example decodes UTF-16 bytes back to a string:

     

    using System;
    using System.Text;
    
    public class UnicodeDecodingExample
    {
        public static void Main()
        {
            // UTF-16 little-endian bytes for "Hello, 世界!"
            // 世 is U+4E16 (bytes 22, 78) and 界 is U+754C (bytes 76, 117)
            byte[] encodedBytes = { 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 44, 0, 32, 0, 22, 78, 76, 117, 33, 0 };
    
            // Create an instance of UnicodeEncoding
            UnicodeEncoding unicode = new UnicodeEncoding();
    
            // Decode the byte array back to a string
            string decodedText = unicode.GetString(encodedBytes);
    
            // Display the decoded string
            Console.WriteLine("Decoded string: " + decodedText);
        }
    }
    

    Example 3: Big-Endian Unicode Encoding

    When big-endian byte order is required, pass true as the first constructor argument:

     

    using System;
    using System.Text;
    
    public class BigEndianUnicodeEncodingExample
    {
        public static void Main()
        {
            // Create a big-endian UnicodeEncoding (first argument: bigEndian,
            // second argument: byteOrderMark). GetBytes itself does not emit the mark.
            UnicodeEncoding bigEndianUnicode = new UnicodeEncoding(true, true);
    
            // Encode a string
            string text = "Bonjour, monde!";
            byte[] encodedBytes = bigEndianUnicode.GetBytes(text);
    
            // Display the encoded bytes in big-endian order
            Console.WriteLine("Big-endian encoded bytes:");
            foreach (byte b in encodedBytes)
            {
                Console.Write($"{b:X2} ");
            }
        }
    }
    

    Practical Applications

    • Internationalization: Ensure that text data can handle multiple languages and scripts.
    • Data Exchange: Facilitate text exchange between applications with different encoding standards.
    • File I/O: Read and write files in formats that support Unicode characters.
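
    As a sketch of the file I/O case, the following writes a string to a UTF-16 file and reads it back. The class name Utf16FileExample, the helper RoundTrip, and the file name utf16-sample.txt are illustrative assumptions:

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf16FileExample
{
    // Writes the text as UTF-16 (little-endian) and reads it back.
    public static string RoundTrip(string path, string text)
    {
        File.WriteAllText(path, text, Encoding.Unicode);
        return File.ReadAllText(path, Encoding.Unicode);
    }

    public static void Main()
    {
        // Hypothetical file name in the system temp directory.
        string path = Path.Combine(Path.GetTempPath(), "utf16-sample.txt");
        try
        {
            string restored = RoundTrip(path, "Hello, 世界!");
            Console.WriteLine(restored == "Hello, 世界!");  // True
        }
        finally
        {
            // Clean up the temporary file.
            File.Delete(path);
        }
    }
}
```

    Passing the encoding explicitly on both the write and the read avoids relying on platform defaults.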

    Conclusion

    Unicode encoding in C# is essential for working with diverse text data. By understanding and implementing UnicodeEncoding correctly, developers can manage globalized text applications effectively.

    Author Information
    • Author: Ehsan Babaei
