Understanding GSM vs Non-GSM Characters
The GSM character set is a standardized collection of characters used in SMS messaging. Let’s explore how to identify and validate these characters in SQL Server.
GSM Character Set Overview
The GSM (GSM 3.38) character set includes:
- Basic Latin alphabet (A-Z, a-z)
- Numbers (0-9)
- Special characters (@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ)
- Punctuation marks (!”#¤%&’()*+,-./:;<=>?¡)
- Extended characters (ÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà)
Implementation Approaches
An Optimized SQL Server Implementation
CREATE FUNCTION dbo.IsGSM(@inputString NVARCHAR(MAX))
RETURNS BIT
AS
BEGIN
DECLARE @gsmCharacterSet NVARCHAR(MAX) =
' @£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ!""#¤%&''()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà'
DECLARE @isNonGSM BIT = 0
DECLARE @i INT = 1
DECLARE @char NVARCHAR(1)
WHILE @i <= LEN(@inputString)
BEGIN
SET @char = SUBSTRING(@inputString, @i, 1)
IF CHARINDEX(@char, @gsmCharacterSet) = 0
BEGIN
SET @isNonGSM = 1
BREAK
END
SET @i = @i + 1
END
RETURN 1^@isNonGSM
END
- Efficient character-by-character validation
- Clear implementation logic
- Easy to maintain and modify
- Handles NULL inputs gracefully
- May be slower for very long strings
- Requires careful character set maintenance
Regular Expression Approach
SQL Server isn’t known to fully support Regular Expressions or this would have been the preferred implementation. Since it’s much in demand, support for regular expressions should get stronger. Please do check if the latest version of SQL Server has this feature.