4/11/2021
ASCII Control Characters
The author is a member of the Johannesburg SQL User Group and holds a master's degree (MCom IT Management) from the University of Johannesburg.

Transforming source data could involve looking up foreign keys, converting values from one data type into another, or simply conducting data clean-ups by removing trailing and leading spaces. One aspect of transforming source data that could get complicated relates to the removal of ASCII special characters such as new line characters and the horizontal tab.
In this article, we take a look at some of the issues you are likely to encounter when cleaning up source data that contains ASCII special characters, and we also look at a user-defined function that could be applied to successfully remove such characters.

ASCII associates each character with a numeric code. For instance, the ASCII numeric code associated with the backslash (\) character is 92. Many software vendors abide by ASCII and thus represent character codes according to the ASCII standard. Likewise, SQL Server, which uses ANSI, an extended version of ASCII, ships with a built-in CHAR function that can be used to convert an ASCII numerical code back to its original character code (or symbol). Script 1 shows an example of how the ASCII numeric code 92 can be converted back into a backslash character, as shown in Figure 1. Table 1 shows a sample of five ASCII printable characters.

Say, for instance, that source data contains an email address for John Doe that has several invalid special characters, as shown in Script 2.

ASCII control characters, on the other hand, are typically not easy to detect (to the human eye) and thus not easily replaceable using the REPLACE T-SQL function. Table 2 shows a sample list of the ASCII control characters. Furthermore, if you go back to Script 4, you will recall that for the 3rd email address, I included the start of heading character at the end of the email address, but looking at the data in Figure 3, that character is not easily visible at the end of the 3rd email address. In fact, it looks as if email addresses 3 and 4 have the same number of characters, which is not true. Only by using advanced text editors such as Notepad++ are we able to visualize the special characters in the data, as shown in Figure 4.

For instance, say we have successfully imported data from the output.txt text file into a SQL Server database table.
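As a rough illustration of why such data is hard to inspect by eye, the sketch below first round-trips code 92 through the built-in CHAR and ASCII functions, then compares two addresses that look identical on screen. It is not one of the article's scripts: the table variable `@Emails`, its columns and the sample addresses are hypothetical stand-ins for the imported data.

```sql
-- Convert a numeric code to its character and back again.
SELECT CHAR(92)   AS BackslashCharacter,  -- the backslash character
       ASCII('\') AS BackslashCode;       -- 92

-- Hypothetical sample data standing in for the imported table.
DECLARE @Emails TABLE (Id INT, Email VARCHAR(100));

INSERT INTO @Emails (Id, Email)
VALUES (1, 'johndoe@a.com'),            -- clean address
       (2, 'johndoe@a.com' + CHAR(1));  -- same address plus a trailing start of heading character

SELECT Id,
       Email,
       LEN(Email)             AS EmailLength,  -- the hidden character still counts towards the length
       ASCII(RIGHT(Email, 1)) AS LastCharCode  -- numeric code of the last character
FROM @Emails;
```

In this sketch the second row should report a length one greater than the first and a last character code of 1, even though both addresses render the same way in a results grid.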
If we were to run the REPLACE T-SQL function against the data as we did in Script 3, we can already see in Figure 5 that the REPLACE function was unsuccessful, as the length of the data in the original column is exactly the same as the length calculated after applying both the REPLACE and TRIM functions.

Thus, instead of providing an exclamation mark as the string to replace, we can hardcode the ASCII numerical code for the exclamation mark, which is 33, and convert that numeric code back to a character using the CHAR function. Applying the same technique with CHAR(1) and CHAR(9), we have successfully managed to remove the invisible special characters.

However, if the email address data contained special characters with ASCII numerical value 8, we wouldn't have removed them, as we had hardcoded our script to specifically look for CHAR(1) and CHAR(9). Therefore, there is a need for a mechanism that allows us to automatically detect ASCII control characters contained in a given string and then automatically replace them. Script 8 provides such a mechanism in the form of a While loop within a user-defined function that iteratively searches through a given string to identify and replace ASCII control characters.

When it comes to removing special characters, removal of ASCII control characters can be tricky and frustrating.

Ndlovu Sifiso is a Johannesburg-based certified professional within a wide range of Microsoft technology competencies such as SQL Server and Visual Studio Application Lifecycle Management.
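The While-loop approach described for Script 8 can be sketched as follows. This is an illustrative reconstruction, not the author's Script 8: the function name `dbo.RemoveControlCharacters`, its parameter, the sample address and the decision to treat codes 0 to 31 and 127 as control characters are all assumptions made for the example.

```sql
-- Hypothetical function; name, parameter and control range are assumptions.
CREATE FUNCTION dbo.RemoveControlCharacters (@Input VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
    IF @Input IS NULL
        RETURN NULL;

    DECLARE @Output VARCHAR(8000) = '';
    DECLARE @Position INT = 1;
    DECLARE @Code INT;

    -- DATALENGTH is used as an upper bound because LEN ignores trailing spaces.
    -- Positions past the last character yield a NULL code and are skipped.
    WHILE @Position <= DATALENGTH(@Input)
    BEGIN
        SET @Code = ASCII(SUBSTRING(@Input, @Position, 1));

        -- Keep the character unless its code falls in the assumed control range.
        IF @Code > 31 AND @Code <> 127
            SET @Output = @Output + SUBSTRING(@Input, @Position, 1);

        SET @Position = @Position + 1;
    END;

    RETURN @Output;
END;
GO  -- batch separator for SSMS or sqlcmd; CREATE FUNCTION must start its own batch

-- Hypothetical address containing a tab (9), a backspace (8) and a start of heading (1).
DECLARE @Email VARCHAR(100) = 'john' + CHAR(9) + 'doe' + CHAR(8) + '@a.com' + CHAR(1);

SELECT REPLACE(REPLACE(@Email, CHAR(1), ''), CHAR(9), '') AS HardcodedReplace,  -- does not target CHAR(8)
       dbo.RemoveControlCharacters(@Email)                AS LoopBased;
```

For this input the loop-based column should come back as johndoe@a.com, while the hardcoded column still carries the backspace character because only CHAR(1) and CHAR(9) were targeted. Building the result character by character keeps the sketch independent of which control codes happen to be present, at the cost of one loop iteration per character.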