Understanding non-GSM characters

Angel Cheung

Working with SMS


What are GSM characters?

GSM characters are the characters in the standard character set used to encode SMS messages. This set, known as the GSM 7-bit default alphabet, includes:

  • Letters of the English alphabet (both uppercase and lowercase).
  • Numerical digits (0-9).
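  • Common punctuation marks and symbols, such as ! , ' @ % $ = . ? / ( ).
  • A limited set of accented letters and Greek capital letters, such as é, ñ, ü, Δ and Ω.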

Each of these characters is encoded in only 7 bits. Because a single SMS carries up to 140 bytes (1,120 bits) of message data, it can hold up to 160 GSM characters (1,120 ÷ 7 = 160).
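To illustrate how 160 seven-bit characters fit into 140 bytes, here is a minimal sketch of GSM 7-bit packing, in which each character's 7-bit code (a septet) is written back-to-back into 8-bit bytes, least significant bits first, as described in 3GPP TS 23.038. `pack_septets` is a hypothetical helper name, and it takes septet values directly; mapping characters to their GSM codes is left out.

```python
def pack_septets(septets: list[int]) -> bytes:
    """Pack 7-bit GSM codes into bytes, least significant bits first."""
    out = bytearray()
    buffer = 0  # bits waiting to be written
    nbits = 0   # number of valid bits in the buffer
    for septet in septets:
        buffer |= (septet & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:  # emit every complete byte
            out.append(buffer & 0xFF)
            buffer >>= 8
            nbits -= 8
    if nbits:  # flush leftover bits, zero-padded
        out.append(buffer & 0xFF)
    return bytes(out)

# Lowercase letters share their codes with ASCII in the GSM alphabet,
# so ord() is a valid shortcut for this example only.
print(pack_septets([ord(c) for c in "hellohello"]).hex().upper())  # E8329BFD4697D9EC37
print(len(pack_septets([0x41] * 160)))  # 140: 160 x 7 bits = 1,120 bits = 140 bytes
```

Every 8 characters occupy exactly 7 bytes, which is where the 160-character limit comes from.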

  Your appointment is on 20th June at 3 PM. Please confirm by replying YES or NO.

At 79 characters, this message is within the 160-character limit and uses only GSM characters.

What are non-GSM characters?

Non-GSM characters are any characters that are not part of the GSM 7-bit alphabet, including its extension table. They require a different encoding scheme, typically UCS-2 (16 bits per character), which supports a much larger character set, including:

  • All types of emojis.
  • Characters from non-Latin scripts such as Arabic, Chinese, or Cyrillic.
  • Symbols and letters that are not covered by the GSM character set, such as typographic (curly) quotation marks and apostrophes like “ ” and ’, or accented letters like á.

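Some symbols, such as curly brackets {}, square brackets [] and the Euro symbol €, are not in the basic GSM table but in its extension table, which contains ^ { } \ [ ] ~ | € and a form-feed control character. These can be sent without switching to UCS-2, but each one is transmitted as an escape character followed by the symbol, so it counts as two characters toward the limit. Using them sparingly helps prevent unexpected message segmentation.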
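Here is a minimal sketch of how extension characters affect the length count, assuming the text has already been confirmed to contain only GSM characters. `gsm7_septets` is a hypothetical helper name, and the set lists the characters of the GSM 7-bit extension table.

```python
# GSM 7-bit extension table: each of these is sent as ESC (0x1B) plus a code,
# so it takes two septets. "\f" is the form-feed (page break) character.
GSM7_EXTENSION = set("\f^{}\\[~]|€")

def gsm7_septets(text: str) -> int:
    """Count septets for text assumed to contain only GSM 7-bit characters."""
    return sum(2 if ch in GSM7_EXTENSION else 1 for ch in text)

print(len("Price: 10€ {approx}"))           # 19 characters as typed
print(gsm7_septets("Price: 10€ {approx}"))  # 22 septets: €, { and } count twice
```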

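Because UCS-2 uses 16 bits per character instead of 7, the 140 bytes (1,120 bits) available in a single SMS hold only 70 characters (1,120 ÷ 16 = 70). The encoding applies to the whole message, so a single non-GSM character is enough to reduce the limit from 160 to 70 characters.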

  Hey 😊! Don't forget your appointment on 📅 June 20th at 🕒 3 PM. Reply YES to confirm!

This message includes emojis, which are non-GSM characters, so the whole SMS must be encoded using UCS-2. At more than 80 characters, it exceeds the 70-character limit for a single UCS-2 SMS and would be sent as two segments.
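Python measures string length in code points, but a UCS-2 payload is measured in 16-bit units. Emojis such as 😊 lie outside Unicode's Basic Multilingual Plane, which strict UCS-2 cannot represent; in practice they are typically sent as UTF-16 surrogate pairs, so each one uses two units. The hypothetical helper `utf16_units` below counts units for the example message:

```python
def utf16_units(text: str) -> int:
    """Number of 16-bit code units needed to encode text as UTF-16."""
    # utf-16-be adds no byte-order mark, so every 2 bytes is one code unit
    return len(text.encode("utf-16-be")) // 2

message = "Hey 😊! Don't forget your appointment on 📅 June 20th at 🕒 3 PM. Reply YES to confirm!"
print(len(message))          # 84 code points
print(utf16_units(message))  # 87: each of the three emojis takes two units
```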

Character limits and concatenation

When a message contains non-GSM characters, the maximum length of a single SMS drops from 160 to 70 characters, and the limits for longer messages change too:

  • A single SMS with non-GSM characters can contain up to 70 characters.
  • When more than one segment is necessary, each concatenated segment can hold only 67 characters.

This is because each segment must also carry a header (the User Data Header) that tells the receiving device how to reassemble the parts into a single message. The header takes 6 of the 140 bytes available in each segment, leaving 134 bytes, or 67 UCS-2 characters. For comparison, a concatenated segment that uses only GSM characters holds 153 characters.
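The segment arithmetic can be sketched as a small estimator. This is an illustrative sketch, not the platform's own counter: `limits` and `segment_count` are hypothetical names, and the code assumes a 140-byte payload per SMS and a 6-byte concatenation header per segment, which give 160/153 characters for GSM 7-bit and 70/67 for UCS-2.

```python
import math

PAYLOAD_BYTES = 140  # user data available in one SMS
UDH_BYTES = 6        # assumed concatenation header per segment (8-bit reference)
BITS_PER_CHAR = {"GSM-7": 7, "UCS-2": 16}

def limits(encoding: str) -> tuple[int, int]:
    """(max chars in a single SMS, max chars per concatenated segment)."""
    bits = BITS_PER_CHAR[encoding]
    single = PAYLOAD_BYTES * 8 // bits
    per_segment = (PAYLOAD_BYTES - UDH_BYTES) * 8 // bits
    return single, per_segment

def segment_count(length: int, encoding: str) -> int:
    """Estimate segments for a message of `length` units in `encoding`.

    Units are septets for GSM-7 (extension characters count twice) and
    16-bit units for UCS-2 (most emojis count twice).
    """
    single, per_segment = limits(encoding)
    if length <= single:
        return 1
    return math.ceil(length / per_segment)

print(limits("GSM-7"))             # (160, 153)
print(limits("UCS-2"))             # (70, 67)
print(segment_count(87, "UCS-2"))  # 2
```

The 6-byte header corresponds to the common 8-bit concatenation reference; a sender that uses a 7-byte header with a 16-bit reference gets slightly lower per-segment limits. The sketch also ignores the case where an extension character or emoji would be split across a segment boundary.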

How do you know if your message body contains non-GSM characters?

When you write a message body, the non-GSM character highlighter marks any non-GSM characters it detects, so you can review them before sending.
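The highlighter's implementation is not described in this article. As a hypothetical sketch of the underlying check, `find_non_gsm` below compares each character against the GSM 7-bit basic and extension tables (3GPP TS 23.038) and reports those that fall outside both:

```python
# GSM 7-bit default alphabet, basic table in code order (3GPP TS 23.038).
# The escape code 0x1B is left out because it is not a printable character.
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: allowed without UCS-2, but each costs two septets.
GSM7_EXTENSION = "\f^{}\\[~]|€"
GSM7_CHARS = set(GSM7_BASIC) | set(GSM7_EXTENSION)

def find_non_gsm(text: str) -> list[tuple[int, str]]:
    """Return (code-point index, character) for each character outside the GSM set."""
    return [(i, ch) for i, ch in enumerate(text) if ch not in GSM7_CHARS]

print(find_non_gsm("Hi José, it’s 5€"))  # [(11, '’')]
print(find_non_gsm("Hey 😊!"))           # [(4, '😊')]
```

Here é and € pass, while the typographic apostrophe ’ is flagged; such characters can slip in when text is pasted from a word processor that applies smart quotes.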
