Here’s what the official documentation says about the Joiner Transformation:
Use the Joiner transformation to join source data from two related heterogeneous sources residing in different locations or file systems. You can also join data from the same source. The Joiner transformation joins sources with at least one matching column. The Joiner transformation uses a condition that matches one or more pairs of columns between the two sources. The Joiner transformation is an active, connected, and native transformation.
To join more than two sources in a mapping, join the output from the Joiner transformation with another source pipeline. Add Joiner transformations to the mapping until you have joined all the source pipelines.
Note: I have already talked about joining data from same source using source qualifier transformation, check it out before proceed reading, here: https://powerbixpert.com/2022/06/30/informatica-powercenter-source-qualifier-trasformation/
Understanding the Joiner Transformation
We have talked about Source Qualifier transformation, and with it you can also join tables in Powercenter, however, using Source Qualifier you can only relate tables that are from the same source, read more about it here: https://powerbixpert.com/2022/06/30/informatica-powercenter-source-qualifier-trasformation/
The joins created using joiner transformation are similar to the joins in databases. The advantage of joiner transformation is that joins can be created for heterogeneous systems (different databases).
Take a time to understand the following image:
Understanding JOINs is a really important aspect in the daily life of a data analyst, data engineer, data scientist or anyone that works with data. Therefore, understanding how SQL JOIN works is important, and we can make a parallel here. For instance, normal join is a inner join. Master outer join is a Right Outer Join, or right join, Detail outer is a left join, or left outer join. Full outer is the same as Full oter join in SQL as well.
Here’s the SQL JOIN schema:
I have wrote about JOINs in the past, you can check it here (written in pt-br):
Another very good url to check on Joins: https://community.powerbi.com/t5/Data-Stories-Gallery/Visualizing-Merge-Join-Types-in-Power-BI/td-p/219906
The Joiner transformation accepts input from most transformations. However, consider the following limitations on the pipelines you connect to the Joiner transformation:
- You cannot use a Joiner transformation when either input pipeline contains an Update Strategy transformation.
- You cannot use a Joiner transformation if you connect a Sequence Generator transformation directly before the Joiner transformation.
Important to know: The Joiner transformation does not match null values. If ports being used on join condition both contains null values, then Powercenter (Integration Service) does not consider them a match and does not join the two rows. To join rows with null values, replace null input with default values, and then join on the default values.
Hands on – Joiner Transformation
Before starting creating our mappings and joiner transformations, let’s understand our data. We have 3 tables: one in a SQL database, and two flat files.
SQL source, CustomerBase table, 91 rows.
Flat file, CustomerSalary, csv file, 9 rows.
Flat file, CustomerAddress, csv file, 5 rows.
For our normal join we have CustomerBase and CustomerSalary, SQL and Flat file, joined together. The target is a SQL table.
Expected result is 9 rows, because CustomerSalary only have 9 rows and its a normal join (inner). The join condition is on CustomerID = CustomerID1. Be aware of data types when setting up the condition ports.
Here is something worth noticing, the Master table. For this example master is CustomerSalary.
Always try to use the table with lesser rows as master, because Powercenter (integration services) compares each row of the master against detail.
- To improve performance for an unsorted Joiner transformation, use the source with fewer rows as the master source.
- To improve performance for a sorted Joiner transformation, use the source with fewer duplicate key values as the master.
Since CustomerSalary has 9 rows against 91 from CustomerBase, performance wise CustomerSalary should be Master.
Of course, this is a simple example, but you can use more ports, more fields in your condition field to set the join condition. But be aware that, if you use multiple ports in the join condition, the Integration Service compares the ports in the order you specify, and it will be slower.
Here’s the result of this mapping after we executed it’s workflow:
Three table joins
Joiner Transformation can join only two tables at a time. If you want to make it work with 3 tables you will have to do this:
Here I added our final table, CustomerAddress, from our csv flat file. Notice that I have added a new Joiner Transformation, dragged the ports from our first joiner transformation to this new transformation, and then added all ports from our CustomerAddress source.
The condition being used on this second joiner transformation is CustomerID2 = CustomerID, being CustomerID2 from the new added source CustomerAddress, and CustomerID being our ID from the SQL database source.
I have also changed the join behavior for this second joiner transformation. While we have NORMAL join in our first joiner, on this second one I opted for a Master Outer Join.
Here, once again, my master is that table which has the lower number of rows, meaning that CustomerAddress, that has only 5 rows, goes as master here.
A quick recap on Master outer join does:
By doing that I ensure that I will end up with all my 9 rows, same as the first mapping we had, but this time I will also have address data whenever it applies, and since we only have 5 rows worth of address data, we are going to end up with 9 total rows, being 5 with address and 4 will null values for address columns.
Here’s how this mapping return after the workflow is executed:
Sorting data before joining
It is recommended to sort your data before performing a Join.
Unsorted joiner : when Powercenter performs unsorted join it read all master rows before it read details rows, so Powercenter blocks details rows while caching master rows, this is to make sure there is no altering of details while master is being handled. After master is cached, details are released from block.
Sorted joiner: when data is sorted Powercenter blocks data based on our mapping configuration. This blocking logic is only possible if master and detail that form the joiner transformation comes from different sources. If the blocking logic can not be performed, it instead store more rows in cache. Sorted joiner can be much faster if the blocking logic applies.
I created our first version of NORMAL join, but this time using sort transformations prior to our joiner transformation.
Be aware that it is also required to check the SORTED INPUT in your joiner transformation properties tab.