That pesky Microsoft! They just have to be different and mess us developers around, don’t they? Have you ever noticed that Microsoft Word’s symbols look a bit different or act a little odd? Well, it’s because Word swaps plain ASCII punctuation for typographic characters such as curly quotes and long dashes. This can be a pain for Regex and other things. So how do you deal with them…
The reason they are so difficult is that they come from the Windows-1252 character set and are not represented in ASCII. Most of them are not in ISO-8859-1 either, because Windows-1252 puts them in the 0x80 to 0x9F range that ISO-8859-1 reserves for control codes. Plenty of other tools assume plain ASCII or ISO-8859-1, of course. These characters include:
- … ellipsis
- ‘smart’ “quotes”
- en – dash and em — dash
- dagger † and double dagger ‡
There are a few more of course, but these are the ones that come up most often. The table below lists their Unicode escapes; the full Windows-1252 character table covers the rest.
Symbol | Encoding |
single quotes and apostrophe | \u2018\u2019\u201A |
double quotes | \u201C\u201D\u201E |
ellipsis | \u2026 |
dashes | \u2013\u2014 |
circumflex | \u02C6 |
open angle bracket | \u2039 |
close angle bracket | \u203A |
small tilde | \u02DC |
non-breaking space | \u00A0 |
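When a string misbehaves, it helps to see exactly which code points it contains. The following is a minimal sketch rather than part of the pre-built methods: `listNonAscii` is a hypothetical helper name, and all it does is report every character above U+007F in `U+XXXX` form so you can compare it with the table above.

```javascript
// Hypothetical helper: list every non-ASCII character in a string
// together with its position and Unicode code point, e.g. "U+2018".
// It works on UTF-16 code units, which is enough for the characters
// in the table (they all sit below U+FFFF).
function listNonAscii(text) {
    var found = [];
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        if (code > 0x7F) {
            // Pad to four hex digits to match the \uXXXX notation
            var hex = code.toString(16).toUpperCase();
            while (hex.length < 4) {
                hex = "0" + hex;
            }
            found.push({ index: i, char: text.charAt(i), codePoint: "U+" + hex });
        }
    }
    return found;
}
```

For example, `listNonAscii("It\u2019s")` reports a single entry at index 2 with the code point `U+2019`, which the table lists under single quotes and apostrophe.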
Here is a pre-built method for JavaScript and C# to combat these.
JavaScript Clean String
var wordClean = function (text) {
    var cleanStr = text;
    // smart single quotes and apostrophe
    cleanStr = cleanStr.replace(/[\u2018\u2019\u201A]/g, "'");
    // smart double quotes
    cleanStr = cleanStr.replace(/[\u201C\u201D\u201E]/g, '"');
    // ellipsis (replaced with three plain dots)
    cleanStr = cleanStr.replace(/\u2026/g, "...");
    // dashes
    cleanStr = cleanStr.replace(/[\u2013\u2014]/g, "-");
    // circumflex
    cleanStr = cleanStr.replace(/\u02C6/g, "^");
    // open angle bracket
    cleanStr = cleanStr.replace(/\u2039/g, "<");
    // close angle bracket
    cleanStr = cleanStr.replace(/\u203A/g, ">");
    // small tilde (U+02DC is a tilde, not a space)
    cleanStr = cleanStr.replace(/\u02DC/g, "~");
    // non-breaking space
    cleanStr = cleanStr.replace(/\u00A0/g, " ");
    return cleanStr;
};
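The same clean-up can also be done in a single pass with a lookup table instead of one `replace` call per group. This is a sketch of that alternative, not the method above: the names `WORD_CHAR_MAP`, `WORD_CHAR_PATTERN` and `wordCleanSinglePass` are made up for illustration, and the replacements chosen (three dots for an ellipsis, `~` for the small tilde U+02DC) are assumptions you may want to change.

```javascript
// Hypothetical lookup table: Word character -> plain ASCII replacement.
var WORD_CHAR_MAP = {
    "\u2018": "'", "\u2019": "'", "\u201A": "'",
    "\u201C": '"', "\u201D": '"', "\u201E": '"',
    "\u2026": "...",
    "\u2013": "-", "\u2014": "-",
    "\u02C6": "^",
    "\u2039": "<", "\u203A": ">",
    "\u02DC": "~",
    "\u00A0": " "
};

// One character class covering every key in the table.
var WORD_CHAR_PATTERN = /[\u2018\u2019\u201A\u201C\u201D\u201E\u2026\u2013\u2014\u02C6\u2039\u203A\u02DC\u00A0]/g;

function wordCleanSinglePass(text) {
    // The callback receives each matched character and looks up its replacement
    return text.replace(WORD_CHAR_PATTERN, function (ch) {
        return WORD_CHAR_MAP[ch];
    });
}
```

Keeping the replacements in one table makes it easier to add a character later: add the key to the map and the same escape to the pattern.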
C# Clean String
using System.Text.RegularExpressions;

public string wordClean(string text)
{
    var cleanStr = text;
    // smart single quotes and apostrophe
    cleanStr = Regex.Replace(cleanStr, "[\u2018\u2019\u201A]", "'");
    // smart double quotes
    cleanStr = Regex.Replace(cleanStr, "[\u201C\u201D\u201E]", "\"");
    // ellipsis (replaced with three plain dots)
    cleanStr = Regex.Replace(cleanStr, "\u2026", "...");
    // dashes
    cleanStr = Regex.Replace(cleanStr, "[\u2013\u2014]", "-");
    // circumflex
    cleanStr = Regex.Replace(cleanStr, "\u02C6", "^");
    // open angle bracket
    cleanStr = Regex.Replace(cleanStr, "\u2039", "<");
    // close angle bracket
    cleanStr = Regex.Replace(cleanStr, "\u203A", ">");
    // small tilde (U+02DC is a tilde, not a space)
    cleanStr = Regex.Replace(cleanStr, "\u02DC", "~");
    // non-breaking space
    cleanStr = Regex.Replace(cleanStr, "\u00A0", " ");
    return cleanStr;
}
If you are doing some validation using Regex, here is how you can check for these characters.
JavaScript Regex
function containsWordChar(text) {
    // Each entry pairs a pattern with the label to report when it matches.
    // The patterns are unanchored so they match anywhere in the text.
    var checks = [
        [/[\u2018\u2019\u201A]/, "single quotes and apostrophe"],
        [/[\u201C\u201D\u201E]/, "double quotes"],
        [/\u2026/, "ellipsis"],
        [/[\u2013\u2014]/, "dashes"],
        [/\u02C6/, "circumflex"],
        [/\u2039/, "open angle bracket"],
        [/\u203A/, "close angle bracket"],
        [/[\u02DC\u00A0]/, "small tilde and non-breaking space"]
    ];
    var found = [];
    for (var i = 0; i < checks.length; i++) {
        if (checks[i][0].test(text)) {
            found.push(checks[i][1]);
        }
    }
    // Comma-separated list of the groups found; empty string if none
    return found.join(", ");
}
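For validation messages it is often more useful to know where the offending characters are, not just which kinds are present. Here is a rough sketch of that idea; `findWordChars` is a hypothetical name, and the character class is simply the set of escapes from the table earlier in this post.

```javascript
// Hypothetical helper: return the position and value of every
// Word character found in the text.
function findWordChars(text) {
    // Created inside the function so lastIndex always starts at 0
    var pattern = /[\u2018\u2019\u201A\u201C\u201D\u201E\u2026\u2013\u2014\u02C6\u2039\u203A\u02DC\u00A0]/g;
    var hits = [];
    var match;
    while ((match = pattern.exec(text)) !== null) {
        hits.push({ index: match.index, char: match[0] });
    }
    return hits;
}
```

An empty array means the text is clean; otherwise each entry tells you which character to highlight and where.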
C# Regex (MVC)
[RegularExpression("^[^\u2018\u2019\u201A\u201C\u201D\u201E\u2026\u2013\u2014\u02C6\u2039\u203A\u02DC\u00A0]*$", ErrorMessage = "Your content contains some Microsoft Word Windows-1252 characters.")]
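The `RegularExpression` attribute treats a value as valid only when the pattern matches the whole value, and the same whole-value check can be mirrored in client-side JavaScript. The sketch below assumes the goal is to reject any value that contains one of the characters above, so it uses a negated character class; `NO_WORD_CHARS` and `isFreeOfWordChars` are hypothetical names.

```javascript
// Matches only when every character in the value is outside the
// set of Word characters (an empty value also passes).
var NO_WORD_CHARS = /^[^\u2018\u2019\u201A\u201C\u201D\u201E\u2026\u2013\u2014\u02C6\u2039\u203A\u02DC\u00A0]*$/;

// Hypothetical helper: true when the value contains none of the Word characters.
function isFreeOfWordChars(value) {
    return NO_WORD_CHARS.test(value);
}
```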
C# Regex
using System.Collections.Generic;
using System.Text.RegularExpressions;

public string containsWordChar(string text)
{
    // Each entry pairs a pattern with the label to report when it matches.
    // The patterns are unanchored so they match anywhere in the text.
    var checks = new[]
    {
        new[] { @"[\u2018\u2019\u201A]", "single quotes and apostrophe" },
        new[] { @"[\u201C\u201D\u201E]", "double quotes" },
        new[] { @"\u2026", "ellipsis" },
        new[] { @"[\u2013\u2014]", "dashes" },
        new[] { @"\u02C6", "circumflex" },
        new[] { @"\u2039", "open angle bracket" },
        new[] { @"\u203A", "close angle bracket" },
        new[] { @"[\u02DC\u00A0]", "small tilde and non-breaking space" }
    };
    var found = new List<string>();
    foreach (var check in checks)
    {
        if (Regex.IsMatch(text, check[0]))
        {
            found.Add(check[1]);
        }
    }
    // Comma-separated list of the groups found; empty string if none
    return string.Join(", ", found);
}