Tuesday 12 January 2021

How to Extract and Load

I am writing simple steps, how we can Extract and Load data from a Database to a Data warehouse.

Firstly, we need to install Visual Studio. I have installed Visual Studio 2019.

After installing visual studio, we need to install Extensions. 

We can see in the above image that I have installed SSIS SSAS and SSRS extensions.

Now we need to create a new project,,,

Name your project and define location and create it.


We will see a window or interface like below.. now this is the place where we create a connection to data source, then we have a package or pipeline (ETL Package)
This control flow is a Pipeline.

Drag the data flow task to the design area.


Now, double click on it and we will get to the next tab Data Flow. Here click on the Source Assistant .. > SQL Server …New > Server Name .. Database Name... Test the connection

Now we will do the same steps for Destination Assistant (to load data in data warehouse)
but this time select the destination database, where we will load our data. see image below

Now we have source and destination on the data flow.. 

Now double click on the source..
Here we will select the table that we want to extract from source destination.. I have selected table rakesh and in the destination i have selected new and you can name the table rakesh, and we can do mapping to the column level to match the column from source to the destination. check below image.
Now, we will run the ETL package to load the data. The job has run successfully, and 5 rows has loaded to the destination database.

We can query the destination database to check or verify the result.



 



What is SSIS

SQL Server Integration Service (SSIS) is a component of the Microsoft SQL Server database software that can be used to execute a wide range of data migration tasks. SSIS is a fast & flexible data warehousing tool used for data extraction, loading and transformation like cleaning, aggregating, merging data, etc.

It makes it easy to move data from one database to another database. SSIS can extract data from a wide variety of sources like SQL Server databases, Excel files, Oracle and DB2 databases, etc.

SSIS also includes graphical tools & wizards for performing workflow functions like sending email messages, FTP operations, data sources, and destinations.

The SSIS Import/Export Wizard lets the user create packages that move data from a single data source to a destination with no transformations. The Wizard can quickly move data from a variety of source types to a variety of destination types, including text files and other SQL Server instances.

Developers tasked with creating or maintaining SSIS packages use a visual development tool based on Microsoft Visual Studio called the SQL Server Business Intelligence Development Studio (BIDS). It allows users to edit SSIS packages using a drag-and-drop user interface. A scripting environment for writing programming code is also available in the tool. A package holds a variety of elements that define a workflow. Upon package execution, the tool provides color-coded real-time monitoring. (Note: In more recent versions MS SQL Server, BIDS has been replaced with "SQL Server Data Tools - Business Intelligence" (SSDT-BI).

What is ETL

ETL stands for Extract, Transform and Load, which is a process used to collect data from various sources, transform the data depending on business rules/needs and load the data into a destination database. The need to use ETL arises from the fact that in modern computing business data resides in multiple locations and in many incompatible formats. For example business data might be stored on the file system in various formats (Word docs, PDF, spreadsheets, plain text, etc), or can be stored as email files, or can be kept in a various database servers like MS SQL Server, Oracle and MySQL for example. Handling all this business information efficiently is a great challenge and ETL plays an important role in solving this problem.

ETL Tools 

Microsoft - SQL Server Integration Services (SSIS) 
Oracle - (Oracle Warehouse Builder)
IBM - (IBM InfoSphere DataStage)

MS SQL Server and its Editions.

Microsoft SQL Server  Microsoft SQL Server is a relational database management system developed by Microsoft. As a database server, it is a ...