There are some really great resources out there to find examples of regular expressions and to learn how they work.
Input validation is the practice of limiting the data that is processed by your application to the subset that you know you can handle.
This means going beyond simple data types and diving deeply into understanding the ideal data type, range, format and length for each piece of data.
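To make the four checks concrete, here is a minimal sketch for a single field. The field (an order quantity), its limits of 1 to 100, and the function name `validate_quantity` are assumptions chosen for illustration, not rules from any particular application.

```python
import re

# Hypothetical field: an order quantity submitted as text.
# Assumed business rule (illustrative only): a whole number from 1 to 100.
DIGITS_ONLY = re.compile(r"[0-9]+")


def validate_quantity(raw: str, minimum: int = 1, maximum: int = 100) -> int:
    """Return the quantity as an int, or raise ValueError if any check fails."""
    # Length: reject oversized input before doing any other work.
    if len(raw) > 3:
        raise ValueError("quantity is too long")
    # Format: ASCII digits only, so no signs, spaces, or other characters.
    if DIGITS_ONLY.fullmatch(raw) is None:
        raise ValueError("quantity must contain only digits")
    # Type: convert to the type the rest of the code expects.
    value = int(raw)
    # Range: enforce the limits the application can actually handle.
    if not minimum <= value <= maximum:
        raise ValueError("quantity is out of range")
    return value
```

Each check maps to one of the questions above, so a reviewer can see at a glance what the field is allowed to contain.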
Input validation is your first line of defense when creating a secure application, but it's often done insufficiently, in a place that is easy to bypass, or simply not done at all.
Since this is a common issue I see in our assessments, and something that has such a great impact on security, I'd like to spend a bit of time outlining input validation best practices and give you some concrete examples of how to do it well.
I definitely believe that regular expressions should be taught at the very beginning of any CS course, without dwelling too much on the mathematics of automata.
It would save a lot of space on Stack Overflow, and it would save you a lot of time.
There we go, that matches only the usernames that we want.
What if there is later a business requirement to allow numbers, dashes and dots in usernames? If we continue with this approach, we can clearly see each inclusive decision and easily see which characters will make it through and which will not.
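As a sketch of how that change might look, the snippet below keeps both rules side by side. The starting rule (letters only), the length limits of 3 to 16 characters, and the names `LETTERS_ONLY`, `EXTENDED` and `is_valid_username` are assumptions made for this example.

```python
import re

# Assumed starting rule (illustrative): letters only, 3 to 16 characters.
LETTERS_ONLY = re.compile(r"[A-Za-z]{3,16}")

# After the hypothetical requirement change: also allow digits, dot and dash.
# Inside a character class the dot is literal, and a trailing dash is literal too.
EXTENDED = re.compile(r"[A-Za-z0-9.-]{3,16}")


def is_valid_username(candidate: str, pattern: re.Pattern = EXTENDED) -> bool:
    """Return True only if the whole string is made of whitelisted characters."""
    # fullmatch anchors both ends, so a trailing newline or extra text fails.
    return pattern.fullmatch(candidate) is not None
```

The difference between the two patterns is exactly the set of newly allowed characters, which keeps every inclusive decision visible in one place.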
This can be very challenging; we need to understand every context, every attack and every encoding to be successful.
In addition to context, we must be able to anticipate all future attacks and bad values. If we whitelist a set of characters that we know we can handle, the task of validation is much easier.
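The contrast can be sketched in a few lines. The blacklist entries and the whitelist pattern below are hypothetical and deliberately tiny; the point is only that a blacklist passes anything it did not anticipate, while a whitelist rejects it by default.

```python
import re

# Hypothetical blacklist of known-bad substrings (deliberately incomplete,
# as any real blacklist will be).
BLACKLIST = ("<script", "' or ")

# Hypothetical whitelist: letters, digits and spaces, 1 to 40 characters.
WHITELIST = re.compile(r"[A-Za-z0-9 ]{1,40}")


def passes_blacklist(value: str) -> bool:
    """Accept anything that does not contain a known-bad substring."""
    lowered = value.lower()
    return not any(bad in lowered for bad in BLACKLIST)


def passes_whitelist(value: str) -> bool:
    """Accept only values made entirely of characters we know we can handle."""
    return WHITELIST.fullmatch(value) is not None


# An attack string the blacklist author did not think of.
payload = "<img src=x onerror=alert(1)>"
print(passes_blacklist(payload))  # the blacklist lets it through
print(passes_whitelist(payload))  # the whitelist rejects it
```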
One example of this might be a phone number, which could be stored as a string in memory and a varchar in the database. However, we know much more about the context of that phone number, and we can use that knowledge to limit our attack surface by verifying the validity of the input.
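Here is one way that could look, as a rough sketch. It assumes the application only accepts North American style numbers (ten digits, with area code and exchange starting at 2 to 9). The pattern, the 20-character limit and the name `normalize_phone` are illustrative choices, and a real application would need rules that match the regions it serves.

```python
import re

# Assumption: North American style numbers only, for example
# "(555) 234-5678", "555.234.5678" or "+1 555-234-5678".
# The parentheses check is intentionally loose to keep the sketch short.
NANP = re.compile(
    r"(?:\+?1[ .-]?)?"        # optional country code
    r"\(?([2-9][0-9]{2})\)?"  # area code, first digit 2-9
    r"[ .-]?"
    r"([2-9][0-9]{2})"        # exchange, first digit 2-9
    r"[ .-]?"
    r"([0-9]{4})"             # subscriber number
)


def normalize_phone(raw: str) -> str:
    """Return the ten digits of a valid number, or raise ValueError."""
    # Length: nothing legitimate in this format needs more than 20 characters.
    if len(raw) > 20:
        raise ValueError("phone number is too long")
    match = NANP.fullmatch(raw.strip())
    if match is None:
        raise ValueError("phone number is not in an accepted format")
    # Store one canonical form so later code never sees the raw input.
    return "".join(match.groups())
```

Returning a canonical ten-digit string means the varchar in the database only ever holds digits, whatever punctuation the user typed.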