SQL in a simplified way

Mateo mojica
11 min readNov 11, 2022

--

Photo by Stephen Dawson on Unsplash

As a developer, when I started learning to program, SQL was one of my least favorite things to learn because it wasn’t like a traditional programming language and it was really hard for me to understand how to approach it. But as I started closing the gap and practicing more and more I believe that I’m getting the hang of it, the secret is to focus in the right places so you can see it as a regular language. I hope this article helps you if you are feeling lost and overwhelmed with SQL like I once was. All the examples in this article are done with the Northwind data set for MSSQL server.

Let’s start with some definitions. SQL stands for Structured Query Language and is used to communicate with a relational database. SQL statements are used to perform tasks such as updating data on a database, or retrieving data from a database with the standard SQL commands such as “Select”, “Insert”, “Update”, “Delete”, “Create”, and “Drop”.

Relational databases are managed by engines known as RDBMS (relational database management systems) that use standard SQL to communicate with the database but also implement their own statements. These engines provide a layer of control over the database and provide extended functionality, so you can access data faster and in a more efficient way.

Some of the more common RDBMS are Oracle, Microsoft SQL Server, Access, Postgres, and MySQL, but there are many more and which one is used, as always, depends on the needs of the project, like budget, compatibility, legacy code, robustness, etc.

Photo by Kelvin Ang on Unsplash

SQL provides a structured way to store data, think of it as a giant spreadsheet in Excel, in which you have columns with the data attributes that you want to keep track of and rows with data points that have some or all those attributes. But SQL not only provides a way to store the data but also a way to create relationships between different sets of data. Depending on how you handle the data SQL will let you find data that 2 sets have in common or aggregate data creating a huge table to pick rows from. You can also trigger events when the RDBMS gets a request, for example, to apply preprocessing to the data being inserted.

There is another type of database that is very popular right now which is called NoSQL, meaning they don’t use SQL as the query language for the RDBMS, instead they use key/value pairs to store the data. Depending on the application this kind of database can be very useful and very fast. Let’s see the main differences between these two types of databases:

  • SQL databases use structured query language and have a predefined schema, for applications that know exactly what information they need to store. NoSQL databases have dynamic schemas for unstructured data, used in applications where you are just storing data about something but don’t know exactly how much information you have to store in each key.
  • SQL databases are relational, meaning that the models can be related to each other and the columns of a model contain all the information for that model and its relations, NoSQL databases are non-relational, which means that the data in one key might not be related with the information in other keys.
  • SQL databases are table-based, while NoSQL databases are document, key-value, graph, or wide-column stores.
  • SQL databases are better for multi-row transactions, while NoSQL is better for unstructured data like documents or JSON

Now that we have all the definitions out of the way let’s start digging a little bit more. In SQL there are 3 basic concepts that we have to grasp. The first one is tables, these are the “models” of your applications, in other words, a group of data that is directly related to each other, for example, a Car table would have all the information directly related to a car. The second concept is columns, those are attributes that the data has, in our car example the columns of the car table could be color, size, weight, maker, model, type, etc. And the last concept is rows. The rows are data points of the same model, so for cars, one row can be a blue chevy spark, and another can be a red Audi tt.

When you need to create relationships or connect one model with others you create a column that stores a unique value that is going to correlate with another column in another table. This is called a foreign key. Remember, the secret to these keys is that the value has to be unique so the correlation is one-to-one, that is why generally the foreign key is an ID, but it can be something else.

Photo by israel palacio on Unsplash

Now we are going to get into the meat and bones of SQL, so remember everything that we have talked about so far because you are going to need it. Let’s begin with the most simplified way we can look at a SQL query, and divide it into 2 parts:

  • The Select part: This section sets the columns that are going to be part of the result data.
  • The From part: This tells us where we are taking the data from.

With this approach, you look at the From section first so you know where the data is being pulled from, and then look at the Select section so you can see which columns are being selected. Everything else is just ways to add more data from other tables or to consolidate data into new columns.

In addition to this, a query has to answer a question, like “how many employees have the same first name?”. To keep the scope of the query in check, ask this question and have it always present. Many times the query can be simpler than you thought by just looking at the question that you are trying to answer.

The fundamental clauses used in SQL queries are SELECT and FROM, we are going to take a look at both so we can start our journey with SQL:

SELECT is an instruction that tells SQL which columns are going to be included in the result from the query, so if you are querying one table that has 20 columns you can pick and choose which of those columns you need in your results, that is very useful when combining tables or queries because they have to have the same number of result columns and you can tell it from which table to pick the result. If you need all the columns in a table and you don’t want to write each of them you can use the wildcard *, this is interpreted by SQL as every column on the table.

FROM tells SQL where to look for the data, this can be one table, several tables joined together, or another query. When writing the from clause is better to name the table with a variable name, generally the first letter of the words in the name of the table, so if for example, we have a table called Sample Table 1 we can write the clause FROM “Sample Table 1” st1, this way you can reference any column for that table as st1.column_name, this gives specificity and helps with readability.

This is the most basic query you can make in SQL. You have the select part where you tell the engine which columns you want in your result and you have the from part that tells it where to get the information.

So far we have been able to get all the information from a table, the only thing we can control is which columns we get. Now if we want to get some specific information, aka filtering the data, as is the usual case, we have to use the clause WHERE, with it, we can specify conditions, values, ranges, and sets of values to filter the data by.

Now we are being selective with our result data. Instead of getting all the information from the employees table, now we are just getting the employees that have USA as their country attribute. Let’s refine it even further.

Now, if you want to use more than one condition to filter your data you have to use logical connectors to chain those conditions. Here is where SQL borrows from traditional languages by using AND and OR clauses to connect conditions in the WHERE clause.

Now we are looking into two conditions instead of only one, getting a different set of data than the original full table data.

If you want to filter by a range of values, the clause BETWEEN has to be used, this looks for values between the two values specified. Now, if you want to specify the values that you are looking for you have to use the clause IN and then an array with the values that you want inside parentheses and separated by commas.

These examples show the use of between and in.

In the case, you are looking for a string but don’t know exactly the value that you want, we can use wildcard characters to help us look for anything before, after, or between a value. The most used wildcard character is %, this means that there is an unlimited amount of characters represented by that percentage sign. To use wildcard matchers you have to use the clause LIKE and then a string with the expression that you want to look for.

The first example looks for the customer IDs that start with VIN and don’t care how long or what is after that. The second example just looks for the customer IDs that have somewhere in their value the string IN no matter what comes before or after it.

Up to this point, we have seen how to get your data and how to filter the results, but what if you don’t want the full set of results. Let’s assume that you only want the first 5 results, that is where the LIMIT clause comes in, with it you tell SQL how many results to get. The limit clause is used most of the time in combination with the ORDER BY that allows you to order the results by one specific column or several columns in ascending or descending order. So if you are looking to get the max value of a column you can do the ORDER BY column_name and then LIMIT 1. Is with noting that if you want to order by several columns the first one specified is the one that gets ordered first and then goes in order of specificity.

GROUP BY will help you group your results fields by specific fields on the table and use calculation or aggregate functions to have additional information in the result. When you group data the result will not have duplicate rows.

If you don’t like the name of the column or for your result, the name just doesn’t make any sense, you can create an alias for that column name using the clause AS and a string specifying the new name for the result column.

Now let’s get into the interesting stuff, combining data. Joins let you combine data from different tables. In order to join tables they have to have a column or value in common in both tables to use it to join the tables, this is where the foreign key that we talked about earlier comes in, this key will make joining tables so much easier and straightforward. When using joins it is mandatory to use identifiers for the tables so you can be very specific about which table you are pulling data from. When joining two tables you have to be aware of which columns you are going to use to perform the join because by using the ON keyword you are going to provide the join condition.

We have different types of joins to take into account:

INNER: brings all the records that have corresponding data, meaning that the foreign key value exists in both tables.

OUTER: Merges all the records from both tables, no matter if it has corresponding or valid values in the foreign key

LEFT: This specifies that you want all the records from the table on the left of the clause, in other words, the master table, so it is going to include all of the records from the master table even if there are no matching records in the other table.

RIGHT: This specifies that you want all the records from the table on the right of the clause. In this case, the master table is going to be the right one and all the records from that table are going to get included first. This is used to find if there is any missing information that you are not displaying with the other queries.

UNION is another way to combine results from two queries. The UNION clause alone will give you unique values, meaning that if there is duplicated information in both queries, it will only show one occurrence of that information. UNION ALL will have all the values even if there are duplicates. To use union you have to select the same number of columns in both queries and ideally, those columns will be related or be the same in both queries, because those columns are the ones that are going to be combined in the final result.

Many functions summarize data in SQL, but the most used ones are AVG, SUM, COUNT, MIN, and MAX. These functions give one number as a result so you have to be very careful when using them in queries, also since the result is not a column on the table you have to create an alias for it to have a name in the result table.

Subqueries are a good way of selecting data for another query, meaning you can select data from tables but you can also select data from other queries.

Finally, there are stored procedures. These are functions that can do many things and are stored in the RDBMS itself, so you can do preprocessing or data cleaning or simply calculate a value using SQL just by calling a function. Each RDBMS leverages this functionality differently so if you want to learn how to use them go to the documentation and see how to do it for your specific use case.

This is all I had about queries, thank you for reading all the way through and I hope that it helped clarify many concepts of building and reading SQL queries. If you liked this article and found it useful give it a clap and please check out my other articles on some other topics.

References

--

--

Mateo mojica
Mateo mojica

Written by Mateo mojica

Electronic engineer and software developer

No responses yet