I know I should use them more, but they just do my head in.
Every now and then, though, I come up against a problem that really should be solved with a regular expression, so I make the effort to get my head around the weird syntax and come up with something that works.
I did that the other day at work, where I wanted to clean the input parameters from a URL when a form is submitted.
Now, I've done this many times before in a number of different ways, but I think my latest attempt is the best I've come up with so far.
Here is the expression I came up with (and a couple of variations):
$myVariable = eregi_replace("[^a-z]", "", $HTTP_GET_VARS['form-variable']);
What this does is strip out anything that is not a letter from form-variable, so the characters that SQL injection and cross-site scripting attacks rely on (quotes, semicolons, angle brackets and so on) never make it through.
Using eregi_replace means it is not case sensitive, so you don't have to worry about capitalization.
The [^a-z] part specifies what to look for. The '^' at the start of the class (the [] bit) means look for everything that is not in the following list. So you just have to put in the characters you want to let through in the list.
The "" part specifies what to do with the characters the expression finds that are not in the list. In this case I've set it to replace them with nothing, effectively deleting them.
And the last bit is where you get the string to be evaluated. In this case, the GET variable.
This has worked really well for me. It strips out all the bad characters and leaves the good ones.
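One caveat for anyone on a current PHP version: the ereg family (including eregi_replace) was deprecated in PHP 5.3 and removed in PHP 7, and $HTTP_GET_VARS was removed in PHP 5.4. A rough equivalent using preg_replace and $_GET might look like the sketch below. The lettersOnly() helper name is made up for illustration and is not from the original expression.

```php
<?php
// Hypothetical helper (name invented for this sketch): keep letters only.
function lettersOnly(string $input): string
{
    // [^a-z] matches anything that is not a letter; the /i flag makes the
    // match case-insensitive, which is what eregi_replace did implicitly.
    // preg_replace returns null on error, so fall back to an empty string.
    return preg_replace('/[^a-z]/i', '', $input) ?? '';
}

// $_GET replaces the old $HTTP_GET_VARS array. A missing parameter, or one
// submitted as an array, is treated as an empty string here (an assumption).
$raw = $_GET['form-variable'] ?? '';
$myVariable = is_string($raw) ? lettersOnly($raw) : '';
```

The pattern itself is unchanged apart from the / delimiters and the i flag that preg functions need.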
Here are a couple more variations of the expression that are really useful.
$page = eregi_replace("[^-a-z.0-9\\/_:?=]", "", $_SERVER['REQUEST_URI']);
I use this one to clean up URLs. As the URLs I'm checking are all part of a CMS I'm building, I can determine exactly what should be in the URL, i.e. hyphens, letters, periods, numbers, slashes, underscores, colons, question marks and equals signs.
Anything else will be stripped out.
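The same whitelist could be written with preg_replace along these lines. This is only a sketch: cleanUri() is a made-up name, and it leaves the backslash out of the allowed set on the assumption that it was only there to escape the slash.

```php
<?php
// Hypothetical helper (name invented for this sketch): strip everything
// except the characters the CMS URLs are expected to contain.
function cleanUri(string $uri): string
{
    // '#' is the pattern delimiter, so '/' needs no escaping. The leading '-'
    // inside the class is a literal hyphen; '.', '?' and '=' are literal too.
    return preg_replace('#[^-a-z0-9./_:?=]#i', '', $uri) ?? '';
}

// REQUEST_URI is not set on the command line, hence the ?? '' fallback.
$page = cleanUri($_SERVER['REQUEST_URI'] ?? '');
```

Note that percent signs and ampersands are not in the list, so URL-encoded characters and multi-parameter query strings get altered by this filter.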
$myVariable = eregi_replace("[^0-9]", "", $HTTP_GET_VARS['form-variable']);
This is useful for making sure that the variable is numeric. Things like ID values, etc.
I still have trouble getting my head around how the expressions work, but at least I'm making small steps and figuring out how to get them working for me.
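A note on the numeric variation: stripping turns an input like 12abc into 12 rather than rejecting it. Where the aim is to check that a value is all digits, an anchored preg_match is one possible approach. The isNumericId() name below is invented for this sketch.

```php
<?php
// Hypothetical helper (name invented for this sketch): true only when the
// whole string is one or more digits.
function isNumericId(string $input): bool
{
    // \A and \z anchor the match to the very start and end of the string,
    // so a trailing newline or stray letter makes the check fail.
    return preg_match('/\A[0-9]+\z/', $input) === 1;
}

// Stripping and checking give different answers for the same input.
$stripped = preg_replace('/[^0-9]/', '', '12abc'); // "12"
$isValid  = isNumericId('12abc');                  // false
```

Which one fits depends on whether a malformed ID should be quietly cleaned up or refused outright.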
If anyone has any good expressions feel free to add them in the comments section. All help is good help. :-)