Skip to content

Understanding GSM vs Non-GSM Characters

Published: at 01:29 AM

Understanding GSM vs Non-GSM Characters

The GSM character set is a standardized collection of characters used in SMS messaging. Let’s explore how to identify and validate these characters in SQL Server.

GSM Character Set Overview

The GSM (GSM 3.38) character set includes:

Implementation Approaches

An Optimized SQL Server Implementation

CREATE FUNCTION dbo.IsGSM(@inputString NVARCHAR(MAX))
RETURNS BIT
AS
BEGIN
    DECLARE @gsmCharacterSet NVARCHAR(MAX) =
    ' @£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ!""#¤%&''()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà'

    DECLARE @isNonGSM BIT = 0
    DECLARE @i INT = 1
    DECLARE @char NVARCHAR(1)

    WHILE @i <= LEN(@inputString)
    BEGIN
        SET @char = SUBSTRING(@inputString, @i, 1)
        IF CHARINDEX(@char, @gsmCharacterSet) = 0
        BEGIN
            SET @isNonGSM = 1
            BREAK
        END
        SET @i = @i + 1
    END

    RETURN 1^@isNonGSM
END

Regular Expression Approach

SQL Server isn’t known to fully support Regular Expressions or this would have been the preferred implementation. Since it’s much in demand, support for regular expressions should get stronger. Please do check if the latest version of SQL Server has this feature.

References

https://en.wikipedia.org/wiki/GSM_03.38


Previous Post
Running SQL Server in Docker
Next Post
Managing Orphaned Upload Files