Advertisement

Understanding the ETL vs. ELT Alphabet Soup and When to Use Each

By on

Click to learn more about author Craig Stewart.

We in the technology industry are great at a lot of things, but there’s one thing in particular that we really excel at – creating an “alphabet soup” of different terms that need to be looked up on the internet. Lately, there’s been a great deal of discussion in industry circles about the terms ETL (extract, transform, load) and ELT (extract, load, transform). These terms, the debate, and oftentimes the misunderstandings about which method is better or which should be used when is not new. However, with the renewed emphasis on collecting and using your company’s data in the cloud to streamline business processes and make better decisions, it makes sense that the discussion has been renewed. 

The ongoing pandemic has refocused companies on improving their own operations and empowering their employees to do more, and that all comes back to giving your employees – your team – the best chance to succeed. The smart use of data throughout all aspects of the business, and by all teams across your organization, has proven to be the best antidote to the disruption caused by the pandemic.

What Comes First – the T or the L?

The terms ETL and ELT are very closely related. At the end of the day, there’s no right or wrong answer as to which method your organization should be using. It really comes down to what you’re trying to accomplish and what you have to work with in order to get yourselves there. There are benefits and detriments, advantages and disadvantages, to each method. To understand which method is a better fit, it’s important to understand what it means when one letter comes before the other. Let’s look at the common definitions of each:

  • ETL (Extract, Transform, and Load): With this approach, organizations pull data from one or more sources, but before loading it into their data warehouse or data lake, the next step is to cleanse the data for use. This entails reviewing the data, putting it into the correct categorizations and formats, and ensuring it will line up with your existing target database. The data is then loaded into your target system and ready to be used by your team.
  • ELT (Extract, Load, and Transform): This method takes data from one or multiple remote data sources and loads it into a staging area in the data warehouse without looking at or changing any of the data beforehand. With this approach, once the data is loaded, organizations can then cleanse and transform the data into the specific target formats and categories after the load, to make it usable by specific programs and team members. 

As mentioned, both are similar, and eventually bring your organization to the same end goal – the use of data to improve the way your organization works. That said, the major difference between the two – when to cleanse and transform your incoming data – needs to be looked at a bit more in detail, as your organization’s needs, capabilities, and resources to handle each approach, combined with the type of data you have to work with, will help to dictate which is better for you.

When to Transform Your Data

You should start by looking at the types of data you’ll be working with. In many cases, reviewing the following variables will help make the decision for you. 

  • Data format: If your data is unstructured, it does not neatly fit into a relational structure, which most of the analytics tools will work from. In that case, you will need to use the ETL approach to reshape the data to work with a relational format, which can then be utilized by the end-user data consumers. If your data is already in a relational, or flat form, then it can be loaded directly ELT-style into the target, and then potentially massaged into the target form, along with any sort/aggregate/joins and cleansing operations.
  • Data size: Datasets come in many different shapes and sizes. This variable influences your ETL vs. ELT decision as well. For large datasets, you see ELT used so that the large amount of data can be processed and transformed at the same time. Improvements in the speed and power of processing have also made it possible for large datasets to be handled as one unit. Smaller amounts of data are often connected to the ETL approach. 
  • Cost: Alongside the above, an important consideration is how expensive working with data can be. One of the most expensive things you can do with data is to move the actual bits of which it’s composed. Often, the data has already been landed in the target system, and the ask is to cleanse/format/aggregate the data. If you only have an ETL tool to do that, you must physically move those bits out of the database, then into an external system to do that processing, only to then move the resulting data back into the database or warehouse. This process – moving all those bits – can become awfully expensive to accomplish, not to mention very draining on manpower. Rather, if the data is already there, and you have “limitless processing” on demand in a cloud data warehouse, it may be beneficial to use the massive parallel processing capability that’s available in order to do what’s necessary to the data, without having to move it around. This is a much more efficient process, with far fewer moving parts to coordinate as well.
  • Data source: The source of the data comes into play here as well. What type of application or data source is it coming from? Does that source easily connect to what your organization is using, or will there be a great deal of transformation work to be done to make sure the data is usable? Is it coming from an on-premises store or from the cloud? Another consideration here is that the ELT approach will use set-wise operations on the data, which are inherently very “batchy.” This may be fine – and even well-suited – for larger volumes of data, but it is simply not a good fit if the data is more akin to streaming or messaging. In this case, the ETL style will almost invariably be the better approach.
  • Data destination: Closely tied to the source is the question of what the destination of the data is within your organization. Does the data source easily connect to it? Are they from different companies with limited connective tissue? Will the data coming from one product need to look completely different to be used by the data warehouse solution? ETL is often the preferred method for data with a different source and destination product or style, or that is stored on-premises; while for data that is going from apples to apples, ELT is often used.
  • Intensity: This one’s a little subjective, but the idea is to look at just how much work the data transformation will take to become something that’s useful for your team’s analysis and decision-making needs. Size comes into play here as well. If the transformation process will be less complex, then ELT may be the right choice. If the transformations are more complex, then many organizations choose ETL, so a little is done at a time instead of all at once. That said, with the power of cloud solutions enabling you to expand your capabilities and scale as needed, having more resources available to process larger datasets via ELT may no longer be an issue to worry about.

As you’ve probably surmised, the data transformation takes place in different places depending on the method you choose. In ETL there’s an in-between stage before the data makes it to the warehouse where the transformations are done. 

In ELT, the data warehouse or database management system you’ve loaded the data into does the transformation. ELT only requires raw data from the database to work, but also requires a great deal more power and overhead to store and transform the data. This, in turn, allows for a shorter time between extracting the data and using it, and provides the option for a greater deal of customization. Because of this, business teams can now quickly build their own data pipelines and immediately see insights that can change the business.

Pick Your Technology Tools Wisely

There are a lot of variables to consider when deciding to move data into a database with either the ETL or ELT method. For some, the variables may make the choice more of a non-choice. At different points throughout your organization’s journey, however, you may favor one approach or the other. Many tools on the market can help you either transform your data before loading, or load your data before transforming. There are solutions that can make it easier for you to select either approach depending on the types of data and organizational requirements at the specific moment in time, instead of forcing you to select an ETL or ELT approach forever, because of what data loading or cleansing software you’ve already invested in. The important thing to remember is that your approach may, and should, shift over time – so invest wisely in tools that can adapt as your approach does.

Leave a Reply