String Class
Inheritance Hierarchy
System.Object
System.String
Assembly
mscorlib (in mscorlib.dll)
Introduction
A string is a sequential collection of Unicode characters that is used to represent text. A String object is a sequential collection of System.Char objects that represent a string. The value of the String object is the content of the sequential collection, and that value is immutable (that is, it is read-only). For more information about the immutability of strings, see the Immutability and the StringBuilder class section later in this topic. The maximum size of a String object in memory is 2 GB, or about 1 billion characters.
Note
To view the .NET Framework source code for this type, see the Reference Source. You can browse through the source code online, download the reference for offline viewing, and step through the sources (including patches and updates) during debugging; see instructions.
Instantiating a String Object
You can instantiate a String object in several different ways:
By assigning a string literal to a variable
This is the most commonly used method for creating a string. The following example uses assignment to create several strings. Note that in C#, because the backslash (\) is an escape character, literal backslashes in a string must be escaped or the entire string must be @-quoted.
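A minimal C# sketch of creating strings by assignment; the variable names and the path are illustrative only. It shows that an escaped regular literal and an @-quoted (verbatim) literal produce the same string:

```csharp
using System;

// Regular literal: each literal backslash is escaped as \\.
string escaped = "C:\\Users\\Public\\Documents";
// Verbatim (@-quoted) literal: backslashes are taken literally.
string verbatim = @"C:\Users\Public\Documents";
// In a regular literal, \t is an escape sequence for a tab character.
string columns = "Name\tValue";

Console.WriteLine(escaped);
Console.WriteLine(escaped == verbatim); // True
Console.WriteLine(columns.Length);      // 10
```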
By calling a String class constructor
The adjoining example instantiates strings by calling several class constructors. Note that some of the constructors include pointers to character arrays or signed byte arrays as parameters. Visual Basic does not support calls to these constructors. For detailed information about String constructors, see the String constructor summary.
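The following sketch calls three String constructors that take managed arguments. The constructors that take char* or sbyte* pointers require unsafe code and are omitted here; the sample values are arbitrary.

```csharp
using System;

char[] letters = { 'w', 'o', 'r', 'd' };

// String(Char[]): copies every element of the array.
string whole = new string(letters);
// String(Char[], Int32, Int32): copies 3 elements starting at index 1.
string part = new string(letters, 1, 3);
// String(Char, Int32): repeats one character 5 times.
string repeated = new string('c', 5);

Console.WriteLine(whole);    // word
Console.WriteLine(part);     // ord
Console.WriteLine(repeated); // ccccc
```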
By concatenating strings
Use the string concatenation operator (+ in C# and & or + in Visual Basic) to create a single string from any combination of String instances and string literals. The following example illustrates the use of the string concatenation operator.
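A short sketch of concatenation in C#; the variable names are illustrative. The + operator and the String.Concat method produce equal results here:

```csharp
using System;

string first = "Hello";
string second = "World";

// The + operator combines String variables and literals into a new string.
string greeting = first + ", " + second + "!";
// String.Concat produces the same result.
string concatenated = string.Concat(first, ", ", second, "!");

Console.WriteLine(greeting);                 // Hello, World!
Console.WriteLine(greeting == concatenated); // True
```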
By retrieving a property or calling a method that returns a string
The adjoining example uses the methods of the String class to extract a substring from a larger string.
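A sketch of retrieving strings from methods, assuming a sample sentence chosen for illustration: IndexOf locates a word, Substring extracts it by start index and length, and TrimEnd removes a trailing period.

```csharp
using System;

string sentence = "This sentence has five words.";

// Locate "five" with an ordinal search, then extract 4 characters from there.
int start = sentence.IndexOf("five", StringComparison.Ordinal);
string word = sentence.Substring(start, 4);
// Extract everything after the last space, then drop the trailing period.
string lastWord = sentence.Substring(sentence.LastIndexOf(' ') + 1).TrimEnd('.');

Console.WriteLine(start);    // 18
Console.WriteLine(word);     // five
Console.WriteLine(lastWord); // words
```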
By calling a formatting method to convert a value or object to its string representation
The following example uses the composite formatting feature to embed the string representation of two objects into a string.
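A sketch of composite formatting with String.Format, using two arbitrary values (a DateTime and a Double). The format provider is fixed to the invariant culture here only so that the output does not vary with the machine's settings.

```csharp
using System;
using System.Globalization;

DateTime date = new DateTime(2011, 3, 1);
double temperature = 68.3;

// {0} and {1} are format items; :d and :F1 are format strings applied to each argument.
string message = string.Format(CultureInfo.InvariantCulture,
    "On {0:d}, the temperature was {1:F1} degrees.", date, temperature);

Console.WriteLine(message); // On 03/01/2011, the temperature was 68.3 degrees.
```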
Char objects and Unicode characters
Each character in a string is defined by a Unicode scalar value, also called a Unicode code point or the ordinal (numeric) value of the Unicode character. Each code point is encoded by using UTF-16 encoding, and the numeric value of each element of the encoding is represented by a Char object.
A single Char object usually represents a single code point; that is, the numeric value of the Char equals the code point. For example, the code point for the character "a" is U+0061. However, a code point might require more than one encoded element (more than one Char object). The Unicode standard defines three types of characters that correspond to multiple Char objects: graphemes, Unicode supplementary code points, and characters in the supplementary planes.
Graphemes
A grapheme is represented by a base character followed by one or more combining characters. For example, the character ä is represented by a Char object whose code point is U+0061 followed by a Char object whose code point is U+0308. This character can also be defined by a single Char object that has a code point of U+00E4. As the adjoining example shows, a culture-sensitive comparison for equality indicates that these two representations are equal, although an ordinary ordinal comparison does not. However, if the two strings are normalized, an ordinal comparison also indicates that they are equal. (For more information on normalizing strings, see the Normalization section.)
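The sketch below compares the two representations of ä. The culture-sensitive result assumes full globalization support (NLS on Windows or ICU elsewhere); if globalization-invariant mode is enabled, culture-sensitive comparisons behave like ordinal ones.

```csharp
using System;
using System.Globalization;
using System.Text;

string combining = "a\u0308";   // 'a' followed by COMBINING DIAERESIS
string precomposed = "\u00E4";  // LATIN SMALL LETTER A WITH DIAERESIS

// Ordinal comparison looks only at Char values, so the strings differ.
bool ordinalEqual = string.Equals(combining, precomposed, StringComparison.Ordinal);
// Culture-sensitive comparison treats the canonically equivalent forms as equal.
bool cultureEqual = string.Compare(combining, precomposed,
    CultureInfo.InvariantCulture, CompareOptions.None) == 0;
// After normalizing both strings to Form C, an ordinal comparison succeeds.
bool normalizedEqual = string.Equals(combining.Normalize(NormalizationForm.FormC),
                                     precomposed.Normalize(NormalizationForm.FormC),
                                     StringComparison.Ordinal);

Console.WriteLine($"Ordinal: {ordinalEqual}, culture-sensitive: {cultureEqual}, " +
                  $"ordinal after normalization: {normalizedEqual}");
```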
Unicode supplementary code points
A Unicode supplementary code point (a surrogate pair) is represented by a Char object whose code point is a high surrogate followed by a Char object whose code point is a low surrogate. The code units of high surrogates range from U+D800 to U+DBFF. The code units of low surrogates range from U+DC00 to U+DFFF. Surrogate pairs are used to represent characters in the 16 Unicode supplementary planes. The following example creates a surrogate character and passes it to the Char.IsSurrogatePair(Char, Char) method to determine whether it is a surrogate pair.
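A minimal sketch, assuming U+1D160 (MUSICAL SYMBOL EIGHTH NOTE) as an arbitrary supplementary-plane character: Char.ConvertFromUtf32 produces its surrogate pair, Char.IsSurrogatePair confirms it, and Char.ConvertToUtf32 recovers the code point.

```csharp
using System;

// U+1D160 lies outside the Basic Multilingual Plane, so it needs two Char objects.
string note = char.ConvertFromUtf32(0x1D160);

Console.WriteLine(note.Length);                                    // 2
Console.WriteLine($"U+{(int)note[0]:X4} U+{(int)note[1]:X4}");     // U+D834 U+DD60
Console.WriteLine(char.IsSurrogatePair(note[0], note[1]));         // True
Console.WriteLine($"U+{char.ConvertToUtf32(note[0], note[1]):X}"); // U+1D160
```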
Strings and embedded null characters
In the .NET Framework, a String object can include embedded null characters, which count as a part of the string's length. However, in some languages such as C and C++, a null character indicates the end of a string; it is not considered a part of the string and is not counted as part of the string's length. This means that the following common assumptions that C and C++ programmers or libraries written in C or C++ might make about strings are not necessarily valid when applied to String objects:
- The value returned by the strlen or wcslen functions does not necessarily equal String.Length.
- The string created by the strcpy_s or wcscpy_s functions is not necessarily identical to the string created by the String.Copy method.
You should ensure that native C and C++ code that instantiates String objects, and code that is passed String objects through platform invoke, do not assume that an embedded null character marks the end of the string.
Embedded null characters in a string are also treated differently when a string is sorted (or compared) and when a string is searched. Null characters are ignored when performing culture-sensitive comparisons between two strings, including comparisons using the invariant culture. They are considered only for ordinal or case-insensitive ordinal comparisons. On the other hand, embedded null characters are always considered when searching a string with methods such as Contains, StartsWith, and IndexOf.
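A minimal sketch of how an embedded null behaves in a String; the sample text is arbitrary. Only ordinal operations are shown, because the outcome of culture-sensitive comparisons depends on the globalization library in use.

```csharp
using System;

string s = "abc\0def";                  // one embedded null character

Console.WriteLine(s.Length);            // 7: the null counts toward Length
Console.WriteLine(s.IndexOf('\0'));     // 3: character searches are ordinal
Console.WriteLine(s.Contains("\0def")); // True: Contains is ordinal
// Ordinal comparison takes the null into account, so these strings differ.
Console.WriteLine(string.Equals(s, "abcdef", StringComparison.Ordinal)); // False
```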
Strings and indexes
An index is the position of a Char object (not a Unicode character) in a String. An index is a zero-based, nonnegative number that starts from the first position in the string, which is index position zero. A number of search methods, such as IndexOf and LastIndexOf, return the index of a character or substring in the string instance.
The Chars property lets you access individual Char objects by their index position in the string. Because the Chars property is the default property (in Visual Basic) or the indexer (in C#), you can access the individual Char objects in a string by using code such as that shown in the adjoining sample.
The adjoining code looks for white space or punctuation characters in a string to determine how many words the string contains.
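A sketch of the word-counting approach described above, using the indexer to read each Char. It counts one word per white-space or punctuation character, so it assumes the sentence contains no consecutive separators.

```csharp
using System;

string sentence = "This string consists of a single short sentence.";
int nWords = 0;

// Trim so that leading or trailing white space is not counted as a separator.
sentence = sentence.Trim();
for (int ctr = 0; ctr < sentence.Length; ctr++)
{
    // sentence[ctr] is the C# indexer (the Chars property), which returns one Char.
    if (char.IsPunctuation(sentence[ctr]) || char.IsWhiteSpace(sentence[ctr]))
        nWords++;
}
Console.WriteLine($"The sentence has {nWords} words."); // The sentence has 8 words.
```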
Because the String class implements the IEnumerable interface, you can also iterate through the Char objects in a string by using a foreach construct (For Each in Visual Basic), as the adjoining example shows.
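A sketch of iterating with foreach; the sample string is arbitrary and deliberately includes a supplementary character, to show that foreach yields Char objects (UTF-16 code units), not Unicode characters:

```csharp
using System;
using System.Collections.Generic;

// 'a', 'ä' (U+00E4), and U+1D160, which needs a surrogate pair.
string s = "a\u00E4\U0001D160";
var units = new List<string>();

// foreach visits each Char in order, without explicit index arithmetic.
foreach (char ch in s)
    units.Add($"U+{(int)ch:X4}");

Console.WriteLine(string.Join(" ", units)); // U+0061 U+00E4 U+D834 U+DD60
```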
Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one Char object. In particular, a string may contain multi-character units of text that are formed by a base character followed by one or more combining characters or by surrogate pairs. To work with Unicode characters instead of Char objects, use the System.Globalization.StringInfo and TextElementEnumerator classes.
The following example illustrates the difference between code that works with Char objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character. The code works with text elements by using the StringInfo.GetTextElementEnumerator method and the TextElementEnumerator class to enumerate all the text elements in a string. You can also retrieve an array that contains the starting index of each text element by calling the StringInfo.ParseCombiningCharacters method.
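The sketch below contrasts Char-based and text-element-based counting for a single word rather than a full sentence, assuming the sample "Mädchen" is written with a base character a followed by COMBINING DIAERESIS (U+0308):

```csharp
using System;
using System.Globalization;

// "Mädchen" with ä written as 'a' + COMBINING DIAERESIS (U+0308).
string word = "Ma\u0308dchen";
var info = new StringInfo(word);

Console.WriteLine(word.Length);               // 8 Char objects
Console.WriteLine(info.LengthInTextElements); // 7 text elements

// Enumerate the text elements; "a\u0308" is returned as a single element.
TextElementEnumerator elements = StringInfo.GetTextElementEnumerator(word);
while (elements.MoveNext())
    Console.Write($"[{elements.GetTextElement()}] ");
Console.WriteLine();

// Starting Char index of each text element.
int[] starts = StringInfo.ParseCombiningCharacters(word);
Console.WriteLine(string.Join(", ", starts)); // 0, 1, 3, 4, 5, 6, 7
```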
For more information about working with units of text rather than individual Char values, see the StringInfo class.
Null strings and empty strings
A string variable that has been declared but has not been assigned a value (for example, an uninitialized field) is null. Attempting to call methods on that string throws a NullReferenceException. A null string is different from an empty string, which is a string whose value is "" or String.Empty. In some cases, passing either a null string or an empty string as an argument in a method call throws an exception. For example, passing a null string to the Int32.Parse method throws an ArgumentNullException, and passing an empty string throws a FormatException. In other cases, a method argument can be either a null string or an empty string. For example, if you are providing an IFormattable implementation for a class, you want to equate both a null string and an empty string with the general ("G") format specifier.
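A minimal sketch of the Int32.Parse behavior described above; ParseOutcome is a hypothetical helper that reports which exception, if any, is thrown.

```csharp
using System;

string nullString = null;
string emptyString = string.Empty;

string nullResult = ParseOutcome(nullString);
string emptyResult = ParseOutcome(emptyString);
Console.WriteLine(nullResult);  // ArgumentNullException
Console.WriteLine(emptyResult); // FormatException

// Hypothetical helper: returns the name of the exception that Int32.Parse throws, if any.
static string ParseOutcome(string s)
{
    try
    {
        int.Parse(s);
        return "no exception";
    }
    catch (Exception e)
    {
        return e.GetType().Name;
    }
}
```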
The String class includes the following two convenience methods that enable you to test whether a string is null or empty:
IsNullOrEmpty
Indicates whether a string is either null or is equal to String.Empty. This method eliminates the need to write an explicit test such as s == null || s.Length == 0.
IsNullOrWhiteSpace
Indicates whether a string is null, equals String.Empty, or consists exclusively of white-space characters. This method eliminates the need to write an explicit test such as String.IsNullOrEmpty(s) || s.Trim().Length == 0.
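The hand-written tests that IsNullOrEmpty and IsNullOrWhiteSpace replace can be sketched as follows; the sample values are arbitrary, and each line prints the library result next to the equivalent expression.

```csharp
using System;

string[] samples = { null, "", "   ", "text" };

foreach (string s in samples)
{
    // Hand-written equivalents of the two convenience methods.
    bool manualNullOrEmpty = s == null || s.Length == 0;
    bool manualNullOrWhiteSpace = s == null || s.Trim().Length == 0;

    string label = s == null ? "null" : $"\"{s}\"";
    Console.WriteLine($"{label,-8} IsNullOrEmpty: {string.IsNullOrEmpty(s)} (manual: {manualNullOrEmpty}), " +
                      $"IsNullOrWhiteSpace: {string.IsNullOrWhiteSpace(s)} (manual: {manualNullOrWhiteSpace})");
}
```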
The adjoining example uses the IsNullOrEmpty method in the IFormattable.ToString implementation of a custom Temperature class. The method supports the "G", "C", "F", and "K" format strings. If an empty format string or a format string whose value is null is passed to the method, its value is changed to the "G" format string.
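The custom Temperature class is not reproduced here. Instead, the following sketch uses a hypothetical FormatTemperature local function that stands in for its IFormattable.ToString(string, IFormatProvider) implementation, assuming the value is stored in degrees Celsius and that a null or empty format string is treated as "G":

```csharp
using System;
using System.Globalization;

IFormatProvider inv = CultureInfo.InvariantCulture;
Console.WriteLine(FormatTemperature(21.5m, null, inv)); // 21.50°C
Console.WriteLine(FormatTemperature(21.5m, "", inv));   // 21.50°C
Console.WriteLine(FormatTemperature(21.5m, "F", inv));  // 70.70°F
Console.WriteLine(FormatTemperature(21.5m, "K", inv));  // 294.65K

// Hypothetical stand-in for Temperature.ToString(string, IFormatProvider).
static string FormatTemperature(decimal celsius, string format, IFormatProvider provider = null)
{
    // Treat a null or empty format string as the general ("G") format.
    if (string.IsNullOrEmpty(format)) format = "G";
    provider ??= CultureInfo.CurrentCulture;

    switch (format.ToUpperInvariant())
    {
        case "G":
        case "C":
            return celsius.ToString("F2", provider) + "°C";
        case "F":
            return (celsius * 9 / 5 + 32).ToString("F2", provider) + "°F";
        case "K":
            return (celsius + 273.15m).ToString("F2", provider) + "K";
        default:
            throw new FormatException($"The '{format}' format string is not supported.");
    }
}
```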
Immutability and the StringBuilder class
A String object is called immutable (read-only), because its value cannot be modified after it has been created. Methods that appear to modify a String object actually return a new String object that contains the modification.
Because strings are immutable, string manipulation routines that perform repeated additions or deletions to what appears to be a single string can exact a significant performance penalty. For example, the adjoining code uses a random number generator to create a string with 1000 characters in the range 0x0001 to 0x052F. Although the code appears to use string concatenation to append a new character to the existing string named str, it actually creates a new String object for each concatenation operation.
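A sketch of the inefficient pattern, assuming a fixed Random seed (42) only so that runs are repeatable. Every += creates a new String object, so the loop allocates about 1000 intermediate strings:

```csharp
using System;

var rnd = new Random(42);   // fixed seed so that runs are repeatable
string str = string.Empty;

for (int ctr = 0; ctr < 1000; ctr++)
{
    // Next's upper bound is exclusive, so values fall in 0x0001..0x052F.
    // Each += creates a new String; the previous one becomes garbage.
    str += (char)rnd.Next(0x0001, 0x0530);
}
Console.WriteLine(str.Length); // 1000
```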
You can use the StringBuilder class instead of the String class for operations that make multiple changes to the value of a string. Unlike instances of the String class, StringBuilder objects are mutable; when you concatenate, append, or delete substrings from a string, the operations are performed on a single string. When you have finished modifying the value of a StringBuilder object, you can call its StringBuilder.ToString method to convert it to a string. The adjoining example replaces the String used in the previous example to concatenate 1000 random characters in the range 0x0001 to 0x052F with a StringBuilder object.
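The same task with a StringBuilder can be sketched as follows, again assuming a fixed seed of 42 for repeatability. All characters are appended to one mutable buffer, and a single String is created at the end:

```csharp
using System;
using System.Text;

var rnd = new Random(42);          // fixed seed so that runs are repeatable
var sb = new StringBuilder(1000);  // initial capacity avoids regrowing the buffer

for (int ctr = 0; ctr < 1000; ctr++)
{
    // Append modifies the same StringBuilder instead of creating a new string.
    sb.Append((char)rnd.Next(0x0001, 0x0530));
}
string str = sb.ToString();        // one String is created at the end
Console.WriteLine(str.Length);     // 1000
```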
Ordinal vs. culture-sensitive operations
Members of the String class perform either ordinal or culture-sensitive (linguistic) operations on a String object. An ordinal operation acts on the numeric value of each Char object. A culture-sensitive operation acts on the value of the String object, and takes culture-specific casing, sorting, formatting, and parsing rules into account. Culture-sensitive operations execute in the context of an explicitly declared culture or the implicit current culture. The two kinds of operations can produce very different results when they are performed on the same string.
The .NET Framework also supports culture-insensitive linguistic string operations by using the invariant culture (CultureInfo.InvariantCulture), which is loosely based on the culture settings of the English language independent of region. Unlike other System.Globalization.CultureInfo settings, the settings of the invariant culture are guaranteed to remain consistent on a single computer, from system to system, and across versions of the .NET Framework. The invariant culture can be seen as a kind of black box that ensures stability of string comparisons and ordering across all cultures.
Operations for casing, parsing and formatting, comparison and sorting, and testing for equality can be either ordinal or culture-sensitive. The following sections discuss each category of operation.
Tip
You should always call a method overload that makes the intent of your method call clear. For example, instead of calling the Compare(String, String) method to perform a culture-sensitive comparison of two strings by using the conventions of the current culture, you should call the Compare(String, String, StringComparison) method with a value of StringComparison.CurrentCulture for the comparisonType argument. For more information, see Best Practices for Using Strings in the .NET Framework.
Security Note
If your application makes a security decision about a symbolic identifier such as a file name or named pipe, or about persisted data such as the text-based data in an XML file, the operation should use an ordinal comparison instead of a culture-sensitive comparison. This is because a culture-sensitive comparison can yield different results depending on the culture in effect, whereas an ordinal comparison depends solely on the binary value of the compared characters.
Important
Most methods that perform string operations include an overload that has a parameter of type StringComparison, which enables you to specify whether the method performs an ordinal or culture-sensitive operation. In general, you should call this overload to make the intent of your method call clear. For best practices and guidance for using ordinal and culture-sensitive operations on strings, see Best Practices for Using Strings in the .NET Framework.
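A brief sketch of overloads that state their comparison rules explicitly through a StringComparison argument; the file name is an arbitrary example.

```csharp
using System;

string fileName = "Report.TXT";

// Ordinal, case-insensitive suffix test (appropriate for file extensions).
bool isText = fileName.EndsWith(".txt", StringComparison.OrdinalIgnoreCase);
// Ordinal, case-sensitive equality test.
bool exactMatch = string.Equals(fileName, "report.txt", StringComparison.Ordinal);
// Ordinal search for a substring.
int dot = fileName.IndexOf(".", StringComparison.Ordinal);

Console.WriteLine($"{isText} {exactMatch} {dot}"); // True False 6
```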
Casing
Casing rules determine how to change the capitalization of a Unicode character; for example, from lowercase to uppercase. Often, a casing operation is performed before a string comparison. For example, a string might be converted to uppercase so that it can be compared with another uppercase string. You can convert the characters in a string to lowercase by calling the ToLower or ToLowerInvariant method, and you can convert them to uppercase by calling the ToUpper or ToUpperInvariant method. In addition, you can use the TextInfo.ToTitleCase method to convert a string to title case.
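A sketch of the casing methods mentioned above. ToTitleCase is an instance method of TextInfo, obtained here from the invariant culture; the sample text is arbitrary.

```csharp
using System;
using System.Globalization;

string text = "the quick brown fox";

string upper = text.ToUpperInvariant();
string lower = "MiXeD CaSe".ToLowerInvariant();
// ToTitleCase is a TextInfo method; TextInfo comes from a CultureInfo.
string title = CultureInfo.InvariantCulture.TextInfo.ToTitleCase(text);

Console.WriteLine(upper);          // THE QUICK BROWN FOX
Console.WriteLine(lower);          // mixed case
Console.WriteLine(title);          // The Quick Brown Fox
// ToUpper() with no argument uses the current culture's casing rules.
Console.WriteLine(text.ToUpper());
```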
Casing operations can be based on the rules of the current culture, a specified culture, or the invariant culture. Because case mappings can vary depending on the culture used, the result of casing operations can vary based on culture. The actual differences in casing are of three types:
Type 1
Differences in the case mapping of LATIN CAPITAL LETTER I (U+0049), LATIN SMALL LETTER I (U+0069), LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130), and LATIN SMALL LETTER DOTLESS I (U+0131). In the tr-TR (Turkish (Turkey)) and az-Latn-AZ (Azerbaijan, Latin) cultures, and in the tr, az, and az-Latn neutral cultures, the lowercase equivalent of LATIN CAPITAL LETTER I is LATIN SMALL LETTER DOTLESS I, and the uppercase equivalent of LATIN SMALL LETTER I is LATIN CAPITAL LETTER I WITH DOT ABOVE. In all other cultures, including the invariant culture, LATIN SMALL LETTER I and LATIN CAPITAL LETTER I are lowercase and uppercase equivalents.
The adjoining example demonstrates how a string comparison designed to prevent file system access can fail if it relies on a culture-sensitive casing comparison. (The casing conventions of the invariant culture should have been used.)
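A sketch of the failure, assuming a hypothetical rule that blocks any URL whose uppercase form starts with "FILE:". Uppercasing "file" with the tr-TR conventions yields "FİLE" (with U+0130), so the check misses it; invariant casing does not. The tr-TR result assumes full globalization data is available.

```csharp
using System;
using System.Globalization;

string url = "file://c:/temp/report.txt";
CultureInfo turkish = CultureInfo.GetCultureInfo("tr-TR");

// Flawed: under tr-TR casing rules, 'i' uppercases to 'İ' (U+0130), not 'I'.
bool blockedByTurkishCasing = url.ToUpper(turkish).StartsWith("FILE:", StringComparison.Ordinal);
// Safer: invariant casing maps 'i' to 'I' regardless of the current culture.
bool blockedByInvariantCasing = url.ToUpperInvariant().StartsWith("FILE:", StringComparison.Ordinal);

Console.WriteLine($"tr-TR casing blocks the URL: {blockedByTurkishCasing}");
Console.WriteLine($"Invariant casing blocks the URL: {blockedByInvariantCasing}"); // True
```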
Type 2
Differences in case mappings between the invariant culture and all other cultures. In these cases, using the casing rules of the invariant culture to change a character to uppercase or lowercase returns the same character. For all other cultures, it returns a different character. Some of the affected characters are listed in the following table.
Character | If Changed To | Becomes |
---|---|---|
MICRON SIGN (U+00B5) | Uppercase | GREEK CAPITAL LETTER MU (U+039C) |
LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130) | Lowercase | LATIN SMALL LETTER I (U+0069) |
LATIN SMALL LETTER DOTLESS I (U+0131) | Uppercase | LATIN CAPITAL LETTER I (U+0049) |
LATIN SMALL LETTER LONG S (U+017F) | Uppercase | LATIN CAPITAL LETTER S (U+0053) |
LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON (U+01C5) | Lowercase | LATIN SMALL LETTER DZ WITH CARON (U+01C6) |
COMBINING GREEK YPOGEGRAMMENI (U+0345) | Uppercase | GREEK CAPITAL LETTER IOTA (U+0399) |
Type 3
Differences in case mappings of two-letter mixed-case pairs in the ASCII character range. In most cultures, a two-letter mixed-case pair is equal to the equivalent two-letter uppercase or lowercase pair. This is not true for the following two-letter pairs in the following cultures, because in each case they are compared to a digraph (a pair of letters that the culture treats as a single letter):
- "lJ" and "nJ" in the hr-HR (Croatian (Croatia)) culture.
- "cH" in the cs-CZ (Czech (Czech Republic)) and sk-SK (Slovak (Slovakia)) cultures.
- "aA" in the da-DK (Danish (Denmark)) culture.
- "cS", "dZ", "dZS", "nY", "sZ", "tY", and "zS" in the hu-HU (Hungarian (Hungary)) culture.
- "cH" and "lL" in the es-ES_tradnl (Spanish (Spain, Traditional Sort)) culture.
- "cH", "gI", "kH", "nG", "nH", "pH", "qU", "tH", and "tR" in the vi-VN (Vietnamese (Vietnam)) culture.
However, it is unusual to encounter a situation in which a culture-sensitive comparison of these pairs creates problems, because these pairs are uncommon in fixed strings or identifiers.
The adjoining example illustrates some of the differences in casing rules between cultures when converting strings to uppercase.
Parsing and formatting
Formatting and parsing are inverse operations. Formatting rules determine how to convert a value, such as a date and time or a number, to its string representation, whereas parsing rules determine how to convert a string representation to a value such as a date and time. Both formatting and parsing rules are dependent on cultural conventions.
The adjoining example illustrates the ambiguity that can arise when interpreting a culture-specific date string. Without knowing the conventions of the culture that was used to produce a date string, it is not possible to know whether 03/01/2011, 3/1/2011, and 01/03/2011 represent January 3, 2011 or March 1, 2011.
Similarly, as the adjoining example shows, a single string can produce different dates depending on the culture whose conventions are used in the parsing operation.
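A sketch of both ambiguities, assuming en-US and en-GB as the two cultures (their short date patterns are month-first and day-first, respectively):

```csharp
using System;
using System.Globalization;

var enUS = CultureInfo.GetCultureInfo("en-US");
var enGB = CultureInfo.GetCultureInfo("en-GB");
var march1 = new DateTime(2011, 3, 1);

// Formatting: the same date produces different short-date strings.
Console.WriteLine(march1.ToString("d", enUS)); // 3/1/2011
Console.WriteLine(march1.ToString("d", enGB)); // 01/03/2011

// Parsing: the same string produces different dates.
string input = "01/03/2011";
DateTime us = DateTime.Parse(input, enUS);     // January 3, 2011
DateTime gb = DateTime.Parse(input, enGB);     // March 1, 2011
Console.WriteLine($"{us:yyyy-MM-dd} vs. {gb:yyyy-MM-dd}");
```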
String comparison and sorting
Conventions for comparing and sorting strings vary from culture to culture. For example, the sort order may be based on phonetics or on the visual representation of characters. In East Asian languages, characters are sorted by the stroke and radical of ideographs. Sorting also depends on the order languages and cultures use for the alphabet. For example, the Danish language has an "Æ" character that it sorts after "Z" in the alphabet. In addition, comparisons can be case-sensitive or case-insensitive, and in some cases casing rules also differ by culture. Ordinal comparison, on the other hand, uses the Unicode code points of individual characters in a string when comparing and sorting strings.
Sort rules determine the alphabetic order of Unicode characters and how two strings compare to each other. For example, the String.Compare(String, String, StringComparison) method compares two strings based on the StringComparison parameter. If the parameter value is StringComparison.CurrentCulture, the method performs a linguistic comparison that uses the conventions of the current culture; if the parameter value is StringComparison.Ordinal, the method performs an ordinal comparison. Consequently, as the following example shows, if the current culture is U.S. English, the first call to the String.Compare(String, String, StringComparison) method (using culture-sensitive comparison) considers "a" less than "A", but the second call to the same method (using ordinal comparison) considers "a" greater than "A".
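A minimal sketch of the two calls, assuming the current culture is set to en-US at the start of the program (CultureInfo.CurrentCulture is settable in .NET Framework 4.6 and later, and in .NET Core). Describe is a hypothetical helper.

```csharp
using System;
using System.Globalization;

CultureInfo.CurrentCulture = CultureInfo.GetCultureInfo("en-US");

int linguistic = string.Compare("a", "A", StringComparison.CurrentCulture);
int ordinal = string.Compare("a", "A", StringComparison.Ordinal);

Console.WriteLine($"Culture-sensitive: {Describe(linguistic)}");
Console.WriteLine($"Ordinal: {Describe(ordinal)}"); // Ordinal: "a" is greater than "A"

// Hypothetical helper: turns a comparison result into a sentence.
static string Describe(int result) =>
    result < 0 ? "\"a\" is less than \"A\"" :
    result > 0 ? "\"a\" is greater than \"A\"" : "\"a\" equals \"A\"";
```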
The .NET Framework supports word, string, and ordinal sort rules:
- A word sort performs a culture-sensitive comparison of strings in which certain nonalphanumeric Unicode characters might have special weights assigned to them. For example, the hyphen (-) might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. For a list of the String methods that compare two strings using word sort rules, see the String operations by category section.
- A string sort also performs a culture-sensitive comparison. It is similar to a word sort, except that there are no special cases, and all nonalphanumeric symbols come before all alphanumeric Unicode characters. Two strings can be compared using string sort rules by calling the CompareInfo.Compare method overloads that have an options parameter that is supplied a value of CompareOptions.StringSort. Note that this is the only method that the .NET Framework provides to compare two strings using string sort rules.
- An ordinal sort compares strings based on the numeric value of each Char object in the string. An ordinal comparison is automatically case-sensitive because the lowercase and uppercase versions of a character have different code points. However, if case is not important, you can specify an ordinal comparison that ignores case. This is equivalent to converting the string to uppercase by using the invariant culture and then performing an ordinal comparison on the result. For a list of the String methods that compare two strings using ordinal sort rules, see the String operations by category section.
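The three sort rules can be exercised on the "co-op"/"coop" pair as sketched below, using the invariant culture's CompareInfo. Only the ordinal results are fixed by code point values; the word-sort and string-sort signs may differ between the NLS and ICU globalization libraries, so the sketch prints them rather than predicting them.

```csharp
using System;
using System.Globalization;

CompareInfo invariant = CultureInfo.InvariantCulture.CompareInfo;
string hyphenated = "co-op";
string plain = "coop";

// Word sort: culture-sensitive; the hyphen may receive a special (small) weight.
int wordSort = invariant.Compare(hyphenated, plain, CompareOptions.None);
// String sort: culture-sensitive, but symbols sort before alphanumeric characters.
int stringSort = invariant.Compare(hyphenated, plain, CompareOptions.StringSort);
// Ordinal sort: '-' (U+002D) is less than 'o' (U+006F).
int ordinal = string.CompareOrdinal(hyphenated, plain);
// Case-insensitive ordinal comparison: "COOP" and "coop" are equal.
int ordinalIgnoreCase = string.Compare("COOP", plain, StringComparison.OrdinalIgnoreCase);

Console.WriteLine($"Word sort: {Math.Sign(wordSort)}, string sort: {Math.Sign(stringSort)}");
Console.WriteLine($"Ordinal: {Math.Sign(ordinal)}, ordinal ignoring case: {ordinalIgnoreCase}"); // Ordinal: -1, ordinal ignoring case: 0
```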
A culture-sensitive comparison is any comparison that explicitly or implicitly uses a CultureInfo object, including the invariant culture that is specified by the CultureInfo.InvariantCulture property. The implicit culture is the current culture, which is specified by the Thread.CurrentCulture and CultureInfo.CurrentCulture properties. There is considerable variation in the sort order of alphabetic characters (that is, characters for which the Char.IsLetter property returns true) across cultures. You can specify a culture-sensitive comparison that uses the conventions of a specific culture by supplying a CultureInfo object to a string comparison method such as Compare(String, String, CultureInfo, CompareOptions). You can specify a culture-sensitive comparison that uses the conventions of the current culture by supplying StringComparison.CurrentCulture, StringComparison.CurrentCultureIgnoreCase, or any member of the CompareOptions enumeration other than CompareOptions.Ordinal or CompareOptions.OrdinalIgnoreCase to an appropriate overload of the Compare method. A culture-sensitive comparison is generally appropriate for sorting whereas an ordinal comparison is not. An ordinal comparison is generally appropriate for determining whether two strings are equal (that is, for determining identity) whereas a culture-sensitive comparison is not.
The adjoining example illustrates the difference between culture-sensitive and ordinal comparison. The example evaluates three strings, "Apple", "Æble", and "AEble", using ordinal comparison and the conventions of the da-DK and en-US cultures (each of which is the default culture at the time the Compare method is called). Because the Danish language treats the character "Æ" as an individual letter and sorts it after "Z" in the alphabet, the string "Æble" is greater than "Apple". However, "Æble" is not considered equivalent to "AEble", so "Æble" is also greater than "AEble". The en-US culture doesn't include the letter "Æ" but treats it as equivalent to "AE", which explains why "Æble" is less than "Apple" but equal to "AEble". Ordinal comparison, on the other hand, considers "Apple" to be less than "Æble", and "Æble" to be greater than "AEble".
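A sketch of the comparison described above. Report and Sign are hypothetical helpers. The culture-sensitive results depend on the globalization data in use, so only the ordinal results are checked.

```csharp
using System;
using System.Globalization;

string[] values = { "Apple", "Æble", "AEble" };

foreach (string name in new[] { "da-DK", "en-US" })
{
    // Each culture in turn becomes the current (default) culture.
    CultureInfo.CurrentCulture = CultureInfo.GetCultureInfo(name);
    Report(name, StringComparison.CurrentCulture);
}
Report("Ordinal", StringComparison.Ordinal);

// Hypothetical helper: prints how "Apple"/"Æble" and "Æble"/"AEble" compare.
void Report(string label, StringComparison comparison) =>
    Console.WriteLine($"{label}: Apple {Sign(values[0], values[1], comparison)} Æble, " +
                      $"Æble {Sign(values[1], values[2], comparison)} AEble");

// Hypothetical helper: converts a Compare result to <, =, or >.
static string Sign(string a, string b, StringComparison comparison)
{
    int r = string.Compare(a, b, comparison);
    return r < 0 ? "<" : r > 0 ? ">" : "=";
}
```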
Use the following general guidelines to choose an appropriate sorting or string comparison method:
- If you want the strings to be ordered based on the user's culture, you should order them based on the conventions of the current culture. If the user's culture changes, the order of sorted strings will also change accordingly. For example, a thesaurus application should always sort words based on the user's culture.
- If you want the strings to be ordered based on the conventions of a specific culture, you should order them by supplying a CultureInfo object that represents that culture to a comparison method. For example, in an application designed to teach students a particular language, you want strings to be ordered based on the conventions of one of the cultures that speak that language.
- If you want the order of strings to remain unchanged across cultures, you should order them based on the conventions of the invariant culture or use an ordinal comparison. For example, you would use an ordinal sort to organize the names of files, processes, mutexes, or named pipes.
- For a comparison that involves a security decision (such as whether a username is valid), you should always perform an ordinal test for equality by calling an overload of the Equals method.
Note
The culture-sensitive sorting and casing rules used in string comparison depend on the version of the .NET Framework. In the .NET Framework 4.5 running on the Windows 8 operating system, sorting, casing, normalization, and Unicode character information conforms to the Unicode 6.0 standard. On other operating systems, it conforms to the Unicode 5.0 standard.
For more information about word, string, and ordinal sort rules, see the System.Globalization.CompareOptions topic. For additional recommendations on when to use each rule, see Best Practices for Using Strings in the .NET Framework.
Ordinarily, you do not call string comparison methods such as Compare directly to determine the sort order of strings. Instead, comparison methods are called by sorting methods such as Array.Sort or List<T>.Sort. The adjoining example performs four different sorting operations (word sort using the current culture, word sort using the invariant culture, ordinal sort, and string sort using the invariant culture) without explicitly calling a string comparison method, although they do specify the type of comparison to use. Note that each type of sort produces a unique ordering of strings in its array.
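A sketch of the four sorting operations, assuming an arbitrary word list. The first three pass StringComparer instances to Array.Sort; for the string sort, the sketch wraps CompareInfo.Compare with CompareOptions.StringSort in a Comparer<string>. SortCopy and Print are hypothetical helpers, and only the ordinal order is independent of the platform's culture data.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

string[] words = { "coop", "co-op", "Co-op", "cooperative" };
CompareInfo invariant = CultureInfo.InvariantCulture.CompareInfo;

// Word sorts (culture-sensitive) with the current and the invariant culture.
string[] wordCurrent = SortCopy(words, StringComparer.CurrentCulture);
string[] wordInvariant = SortCopy(words, StringComparer.InvariantCulture);
// Ordinal sort: compares the numeric values of the Char objects.
string[] ordinal = SortCopy(words, StringComparer.Ordinal);
// String sort with the invariant culture.
string[] stringSort = SortCopy(words,
    Comparer<string>.Create((x, y) => invariant.Compare(x, y, CompareOptions.StringSort)));

Print("Word sort, current culture", wordCurrent);
Print("Word sort, invariant culture", wordInvariant);
Print("Ordinal sort", ordinal); // Co-op, co-op, coop, cooperative
Print("String sort, invariant culture", stringSort);

// Hypothetical helper: sorts a copy so that the original array stays unchanged.
static string[] SortCopy(string[] source, IComparer<string> comparer)
{
    string[] copy = (string[])source.Clone();
    Array.Sort(copy, comparer);
    return copy;
}

// Hypothetical helper: prints a label followed by the sorted items.
static void Print(string label, string[] items) =>
    Console.WriteLine($"{label}: {string.Join(", ", items)}");
```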
Tip
Internally, the .NET Framework uses sort keys to support culturally sensitive string comparison. Each character in a string is given several categories of sort weights, including alphabetic, case, and diacritic. A sort key, represented by the SortKey class, provides a repository of these weights for a particular string. If your app performs a large number of searching or sorting operations on the same set of strings, you can improve its performance by generating and storing sort keys for all the strings that it uses. When a sort or comparison operation is required, you use the sort keys instead of the strings. For more information, see the SortKey class.
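A minimal sketch of the sort-key approach, assuming the invariant culture and an arbitrary list of names: each key is generated once with CompareInfo.GetSortKey, and Array.Sort then orders the names by their keys using SortKey.Compare.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

CompareInfo ci = CultureInfo.InvariantCulture.CompareInfo;
string[] names = { "delta", "Alpha", "charlie", "Bravo" };

// Generate each sort key once; the keys can be reused for many comparisons.
SortKey[] keys = Array.ConvertAll(names, name => ci.GetSortKey(name));

// Sort the keys and reorder the names in parallel, comparing keys with SortKey.Compare.
Array.Sort(keys, names, Comparer<SortKey>.Create(SortKey.Compare));

Console.WriteLine(string.Join(", ", names)); // Alpha, Bravo, charlie, delta
```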
If you don't specify a string comparison convention, sorting methods such as Array.Sort(Array) perform a culture-sensitive, case-sensitive sort on strings. The following example illustrates how changing the current culture affects the order of sorted strings in an array. It creates an array of three strings. First, it sets the System.Threading.Thread.CurrentThread.CurrentCulture property to en-US and calls the Array.Sort(Array) method. The resulting sort order is based on sorting conventions for the English (United States) culture. Next, the example sets the System.Threading.Thread.CurrentThread.CurrentCulture property to da-DK and calls the Array.Sort method again. Notice how the resulting sort order differs from the en-US results because it uses the sorting conventions for Danish (Denmark).
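A sketch of the example described above, assuming the three strings "Apple", "Æble", and "Zebra". The exact orders depend on the culture data of the platform, so the sketch prints them rather than asserting them.

```csharp
using System;
using System.Globalization;
using System.Threading;

string[] values = { "Apple", "Æble", "Zebra" };

// Array.Sort with no comparer uses the current culture's rules for strings.
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
string[] enUSOrder = (string[])values.Clone();
Array.Sort(enUSOrder);

Thread.CurrentThread.CurrentCulture = new CultureInfo("da-DK");
string[] daDKOrder = (string[])values.Clone();
Array.Sort(daDKOrder);

Console.WriteLine("en-US: " + string.Join(", ", enUSOrder));
Console.WriteLine("da-DK: " + string.Join(", ", daDKOrder));
```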
Warning
If your primary purpose in comparing strings is to determine whether they are equal, you should call the String.Equals method. Typically, you should use Equals to perform an ordinal comparison. The String.Compare method is intended primarily to sort strings.
String search methods, such as String.StartsWith and String.IndexOf, also can perform culture-sensitive or ordinal string comparisons. The following example illustrates the differences between ordinal and culture-sensitive comparisons using the IndexOf method. A culture-sensitive search in which the current culture is English (United States) considers the substring "oe" to match the ligature "œ". Because a soft hyphen (U+00AD) is a zero-width character, the search treats the soft hyphen as equivalent to Empty and finds a match at the beginning of the string. An ordinal search, on the other hand, does not find a match in either case.
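A sketch of the IndexOf comparison, assuming "cœur" (with the ligature œ, U+0153) and "animal" as sample strings and en-US as the current culture. The culture-sensitive results depend on the globalization library, so only the ordinal results are annotated.

```csharp
using System;
using System.Globalization;

CultureInfo.CurrentCulture = CultureInfo.GetCultureInfo("en-US");

string withLigature = "c\u0153ur";  // "cœur"
string plainWord = "animal";
string softHyphen = "\u00AD";

int cultureOe = withLigature.IndexOf("oe", StringComparison.CurrentCulture);
int ordinalOe = withLigature.IndexOf("oe", StringComparison.Ordinal);           // -1
int cultureHyphen = plainWord.IndexOf(softHyphen, StringComparison.CurrentCulture);
int ordinalHyphen = plainWord.IndexOf(softHyphen, StringComparison.Ordinal);    // -1

Console.WriteLine($"\"oe\": culture {cultureOe}, ordinal {ordinalOe}");
Console.WriteLine($"soft hyphen: culture {cultureHyphen}, ordinal {ordinalHyphen}");
```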
Searching Strings
String search methods, such as String.StartsWith and String.IndexOf, also can perform culture-sensitive or ordinal string comparisons to determine whether a character or substring is found in a specified string.
The search methods in the String class that search for an individual character, such as the IndexOf method, or one of a set of characters, such as the IndexOfAny method, all perform an ordinal search. To perform a culture-sensitive search for a character, you must call a CompareInfo method such as CompareInfo.IndexOf(String, Char) or CompareInfo.LastIndexOf(String, Char). Note that the results of searching for a character using ordinal and culture-sensitive comparison can be very different. For example, a search for a precomposed Unicode character such as the ligature "Æ" (U+00C6) might match any occurrence of its components in the correct sequence, such as "AE" (U+0041 U+0045), depending on the culture. The following example illustrates the difference between the String.IndexOf(Char) and CompareInfo.IndexOf(String, Char) methods when searching for an individual character. The ligature "æ" (U+00E6) is found in the string "aerial" when using the conventions of the en-US culture, but not when using the conventions of the da-DK culture or when performing an ordinal comparison.
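A sketch comparing String.IndexOf(Char) with CompareInfo.IndexOf(String, Char) for the ligature æ in "aerial". Only the ordinal result is annotated; the culture-sensitive results depend on the culture data available.

```csharp
using System;
using System.Globalization;

string word = "aerial";
char ligature = '\u00E6';  // æ

// String.IndexOf(Char) always performs an ordinal search.
int ordinalIndex = word.IndexOf(ligature);
Console.WriteLine(ordinalIndex); // -1

// CompareInfo.IndexOf(String, Char) applies the rules of a specific culture.
CompareInfo enUS = CultureInfo.GetCultureInfo("en-US").CompareInfo;
CompareInfo daDK = CultureInfo.GetCultureInfo("da-DK").CompareInfo;
Console.WriteLine(enUS.IndexOf(word, ligature));
Console.WriteLine(daDK.IndexOf(word, ligature));
```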
On the other hand, String class methods that search for a string rather than a character perform a culture-sensitive search if search options are not explicitly specified by a parameter of type StringComparison. The sole exception is Contains, which performs an ordinal search.
Testing for equality
Use the String.Compare method to determine the relationship of two strings in the sort order. Typically, this is a culture-sensitive operation. In contrast, call the String.Equals method to test for equality. Because the test for equality usually compares user input with some known string, such as a valid user name, a password, or a file system path, it is typically an ordinal operation.
Warning
It is possible to test for equality by calling the String.Compare method and determining whether the return value is zero. However, this practice is not recommended. To determine whether two strings are equal, you should call one of the overloads of the String.Equals method. The preferred overload to call is either the instance Equals(String, StringComparison) method or the static Equals(String, String, StringComparison) method, because both methods include a System.StringComparison parameter that explicitly specifies the type of comparison.
The adjoining example illustrates the danger of performing a culture-sensitive comparison for equality when an ordinal one should be used instead. In this case, the intent of the code is to prohibit file system access from URLs that begin with "FILE://" or "file://" by performing a case-insensitive comparison of the beginning of a URL with the string "FILE://". However, if a culture-sensitive comparison is performed using the Turkish (Turkey) culture on a URL that begins with "file://", the comparison for equality fails, because the Turkish uppercase equivalent of the lowercase "i" is "İ" instead of "I". As a result, file system access is inadvertently permitted. On the other hand, if an ordinal comparison is performed, the comparison for equality succeeds, and file system access is denied.
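A sketch of the two checks, with a hypothetical IsFileUrl helper and an arbitrary sample URL. It sets tr-TR as the current culture; per the behavior described above, the culture-sensitive, case-insensitive check fails to recognize "file://", while the ordinal check succeeds.

```csharp
using System;
using System.Globalization;

CultureInfo.CurrentCulture = CultureInfo.GetCultureInfo("tr-TR");
string url = "file://c:/temp/report.txt";

bool cultureCheck = IsFileUrl(url, StringComparison.CurrentCultureIgnoreCase);
bool ordinalCheck = IsFileUrl(url, StringComparison.OrdinalIgnoreCase);

Console.WriteLine($"Culture-sensitive check detects a file URL: {cultureCheck}");
Console.WriteLine($"Ordinal check detects a file URL: {ordinalCheck}"); // True

// Hypothetical helper: true if the candidate starts with "FILE://" under the given rules.
static bool IsFileUrl(string candidate, StringComparison comparison) =>
    candidate.StartsWith("FILE://", comparison);
```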
Normalization
Some Unicode characters have multiple representations. For example, any of the following code point sequences can represent the letter "ắ":
- U+1EAF
- U+0103 U+0301
- U+0061 U+0306 U+0301
Multiple representations for a single character complicate searching, sorting, matching, and other string operations.
The Unicode standard defines a process called normalization that returns one binary representation of a Unicode character for any of its equivalent binary representations. Normalization can use several algorithms, called normalization forms, that follow different rules. The .NET Framework supports Unicode normalization forms C, D, KC, and KD. When strings have been normalized to the same normalization form, they can be compared by using ordinal comparison.
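As a minimal sketch, the three representations of "ắ" listed above can be normalized to form C with String.Normalize(NormalizationForm) and then compared ordinally. The array names are illustrative.

```csharp
using System;
using System.Text;

// The three representations of "ắ" listed above.
string[] forms =
{
    "\u1EAF",              // precomposed
    "\u0103\u0301",        // ă + COMBINING ACUTE ACCENT
    "\u0061\u0306\u0301",  // a + COMBINING BREVE + COMBINING ACUTE ACCENT
};

// Before normalization, an ordinal comparison treats the strings as different.
bool rawEqual = String.Equals(forms[0], forms[2], StringComparison.Ordinal);

// After every string is normalized to form C, an ordinal comparison succeeds.
string[] nfc = Array.ConvertAll(forms, s => s.Normalize(NormalizationForm.FormC));
bool allEqualAfterNfc = Array.TrueForAll(nfc, s => String.Equals(s, nfc[0], StringComparison.Ordinal));

Console.WriteLine(rawEqual);            // False
Console.WriteLine(allEqualAfterNfc);    // True
Console.WriteLine(nfc[0] == "\u1EAF");  // True: form C yields the precomposed character
```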
An ordinal comparison is a binary comparison of the numeric values of corresponding Char objects (UTF-16 code units) in each string. The String class includes a number of methods that can perform an ordinal comparison, including the following:
- Any overload of the Compare, Equals, StartsWith, EndsWith, IndexOf, and LastIndexOf methods that includes a StringComparison parameter. The method performs an ordinal comparison if you supply a value of StringComparison.Ordinal or OrdinalIgnoreCase for this parameter.
- The overloads of the CompareOrdinal method.
- Methods that use ordinal comparison by default, such as Contains, Replace, and Split.
- Methods that search for a Char value or for the elements in a Char array in a string instance. Such methods include IndexOf(Char) and Split(Char[]).
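The following sketch exercises each category in the list above on an arbitrary sample path. The values in the comments assume ordinal semantics, which these overloads use regardless of the current culture.

```csharp
using System;

string path = "C:\\Temp\\Report.TXT";   // arbitrary sample value

// 1. Overloads with a StringComparison parameter, given Ordinal or OrdinalIgnoreCase.
bool endsWithTxt = path.EndsWith(".txt", StringComparison.OrdinalIgnoreCase); // True
int tempIndex = path.IndexOf("temp", StringComparison.Ordinal);               // -1: case differs

// 2. CompareOrdinal compares numeric Char values ('T' is U+0054, 't' is U+0074).
int order = String.CompareOrdinal("Temp", "temp");                            // negative

// 3. Methods that are ordinal by default.
bool hasReport = path.Contains("Report");                                     // True
string renamed = path.Replace("TXT", "bak");                                  // C:\Temp\Report.bak

// 4. Methods that search for Char values.
int firstSeparator = path.IndexOf('\\');                                      // 2
string[] parts = path.Split(new[] { '\\', '.' });                             // C:, Temp, Report, TXT

Console.WriteLine($"{endsWithTxt} {tempIndex} {order < 0} {hasReport} {renamed} {firstSeparator} {parts.Length}");
```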
You can determine whether a string is normalized to normalization form C by calling the String.IsNormalized() method, or you can call the String.IsNormalized(NormalizationForm) method to determine whether a string is normalized to a specified normalization form. You can also call the String.Normalize() method to convert a string to normalization form C, or you can call the String.Normalize(NormalizationForm) method to convert a string to a specified normalization form. For step-by-step information about normalizing and comparing strings, see the Normalize() and Normalize(NormalizationForm) methods.
A simple way to see normalization at work is to define the letter "ố" in three different ways in three different strings. An ordinal comparison for equality then shows that each string differs from the other two. After each string is converted to one of the supported normalization forms, a second ordinal comparison of the strings in that form shows that they are equal. This holds for every supported normalization form.
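The procedure just described can be sketched in C# as follows. The names s1, s2, and s3 are illustrative. The sketch also calls IsNormalized to show that the precomposed string is already in forms C and KC but not in forms D and KD.

```csharp
using System;
using System.Text;

// Three ways to write "ố".
string s1 = "\u1ED1";        // precomposed LATIN SMALL LETTER O WITH CIRCUMFLEX AND ACUTE
string s2 = "\u00F4\u0301";  // ô + COMBINING ACUTE ACCENT
string s3 = "o\u0302\u0301"; // o + COMBINING CIRCUMFLEX ACCENT + COMBINING ACUTE ACCENT

// The == operator performs an ordinal comparison, so all three strings differ.
Console.WriteLine($"Raw: {s1 == s2}, {s1 == s3}, {s2 == s3}");   // Raw: False, False, False

NormalizationForm[] normForms =
{
    NormalizationForm.FormC, NormalizationForm.FormD,
    NormalizationForm.FormKC, NormalizationForm.FormKD,
};

foreach (NormalizationForm form in normForms)
{
    string n1 = s1.Normalize(form);
    string n2 = s2.Normalize(form);
    string n3 = s3.Normalize(form);
    bool allEqual = String.Equals(n1, n2, StringComparison.Ordinal) &&
                    String.Equals(n2, n3, StringComparison.Ordinal);
    // IsNormalized reports whether s1 is already in this normalization form.
    Console.WriteLine($"{form}: equal = {allEqual}, s1.IsNormalized = {s1.IsNormalized(form)}");
}
```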
For more information about normalization and normalization forms, see System.Text.NormalizationForm, as well as Unicode Standard Annex #15: Unicode Normalization Forms and the Normalization FAQ on the unicode.org website.
Operations By Task
The String class provides members for comparing strings, testing strings for equality, finding characters or substrings in a string, modifying a string, extracting substrings from a string, combining strings, formatting values, copying a string, and normalizing a string.
Comparing Strings
Testing strings for equality
Finding characters in a string
Tip
If you want to search a string for a particular pattern rather than a specific substring, you should use regular expressions. For more information, see .NET Framework Regular Expressions.
Modifying a string
Important
All string modification methods return a new String object. They do not modify the value of the current instance.
- Insert()
- PadLeft()
- PadRight()
- Remove()
- Replace()
- ToLower()
- ToLowerInvariant()
- ToUpper()
- ToUpperInvariant()
- Trim()
- TrimEnd()
- TrimStart()
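The following short sketch illustrates the point above. Each modification method returns a new string, so the result must be assigned or otherwise used. The sample values are arbitrary.

```csharp
using System;

string original = "  Hello, World  ";

// Each method returns a new String object; 'original' itself is never changed.
string trimmed = original.Trim();                  // "Hello, World"
string upper = trimmed.ToUpperInvariant();         // "HELLO, WORLD"
string replaced = upper.Replace("WORLD", "THERE"); // "HELLO, THERE"
string inserted = trimmed.Insert(5, "!");          // "Hello!, World"
string padded = "42".PadLeft(5, '0');              // "00042"

// Discarding the return value has no effect on the original string.
original.ToUpper();
Console.WriteLine($"[{original}]");                // [  Hello, World  ]
Console.WriteLine($"[{replaced}]");                // [HELLO, THERE]
```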
Extracting substrings from a string
Combining strings
Formatting values
Copying a string
Normalizing a string
Properties
- Chars[Int32]
Summary
Gets the Char object at a specified position in the current String object.
Declaration
Parameters
- index: A position in the current string.
Returns
- The Char object at position index.
Exceptions
- IndexOutOfRangeException: index is greater than or equal to the length of this object or less than zero.
Remarks
The index parameter is zero-based.
This property returns the Char object at the position specified by the index parameter. However, a Unicode character might be represented by more than one Char. Use the System.Globalization.StringInfo class to work with Unicode characters instead of Char objects. For more information, see Char Objects and Unicode Characters.
In C#, the Chars property is an indexer. In Visual Basic, it is the default property of the String class. Each Char object in the string can be accessed through this property by its zero-based position.
Examples
- Length
Summary
Gets the number of characters in the current String object.
Declaration
Returns
- The number of characters in the current string.
Remarks
The Length property returns the number of Char objects in this instance, not the number of Unicode characters. The reason is that a Unicode character might be represented by more than one Char. Use the System.Globalization.StringInfo class to work with each Unicode character instead of each Char.
In some languages, such as C and C++, a null character indicates the end of a string. In the .NET Framework, a null character can be embedded in a string. When a string includes one or more null characters, they are included in the length of the total string. For example, if the substrings "abc" and "def" are separated by a null character, the Length property returns 7, which indicates that it includes the six alphabetic characters as well as the null character.
Examples
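The following minimal sketch illustrates both remarks. The embedded null character is counted by Length. A character outside the Basic Multilingual Plane counts as two Char objects (a surrogate pair) but as one text element. U+1F600 is an arbitrary choice for that character.

```csharp
using System;
using System.Globalization;

// "abc" and "def" separated by an embedded null character.
string characters = "abc\u0000def";
Console.WriteLine(characters.Length);                          // 7

// U+1F600 is stored as a surrogate pair of two Char objects.
string emoji = "\U0001F600";
Console.WriteLine(emoji.Length);                               // 2
Console.WriteLine(new StringInfo(emoji).LengthInTextElements); // 1
```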
Methods
Methods not covered in the Operations By Task section are detailed here.
- GetEnumerator()
- GetHashCode()
- GetType()
- GetTypeCode()
- Intern()
- IsInterned()
- IsNullOrEmpty()
- IsNullOrWhiteSpace()
- Substring()
- ToCharArray()
- ToString()
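Two of the methods listed above, IsNullOrEmpty and IsNullOrWhiteSpace, differ only in how they treat strings that consist entirely of white-space characters. Here is a minimal sketch with arbitrary sample values:

```csharp
using System;

string?[] samples = { null, String.Empty, "   ", "\t\n", "text" };

foreach (string? s in samples)
{
    // Make tabs and newlines visible in the printed label.
    string label = s is null ? "null" : "\"" + s.Replace("\t", "\\t").Replace("\n", "\\n") + "\"";
    Console.WriteLine($"{label,-8} IsNullOrEmpty={String.IsNullOrEmpty(s)}, IsNullOrWhiteSpace={String.IsNullOrWhiteSpace(s)}");
}
```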
Fields
- String.Empty
Summary
Represents the empty string. This field is read-only.
Remarks
The value of this field is the zero-length string, "".
In application code, this field is most commonly used in assignments to initialize a string variable to an empty string. To test whether the value of a string is either null or String.Empty, use the IsNullOrEmpty method.
Declaration
Returns