Data Loader
Data loading integration refers to the process of extracting data from various sources, transforming it as needed, and loading it into a destination system, such as a data warehouse or data lake. This process is often referred to as ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) depending on when the data transformation occurs.
Integration is divided into 5 simple steps:
1. Prep
2. Select Source
3. Select Target
4. Configuration
5. Confirm
Choose Integration Type:
Full Load: Transfers the entire dataset from the source to the target.
Incremental Load: Transfers only the data that has changed since the last load.
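The difference between the two integration types can be sketched in a few lines of Python. This is only an illustration, not how Lyftrondata implements loading: the in-memory rows, the `updated_at` change-tracking column, and the function names are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical source rows; "updated_at" is an assumed change-tracking column.
SOURCE_ROWS = [
    {"id": 1, "name": "alpha", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "beta", "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "name": "gamma", "updated_at": datetime(2024, 1, 9)},
]


def full_load(source_rows):
    """Return every source row, regardless of when it last changed."""
    return list(source_rows)


def incremental_load(source_rows, last_loaded_at):
    """Return only rows changed after the previous load's watermark."""
    return [row for row in source_rows if row["updated_at"] > last_loaded_at]


def next_watermark(loaded_rows, previous):
    """Advance the watermark to the newest change seen, or keep the old one."""
    return max((row["updated_at"] for row in loaded_rows), default=previous)
```

A full load needs no state between runs, while an incremental load has to remember a watermark (here, the newest `updated_at` value already transferred) so the next run can skip unchanged rows.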
Warehouse:
A warehouse (also known as a virtual warehouse) is the component that processes integrations. It is a cluster of computing resources (e.g., CPU, memory) that users can provision to perform data processing tasks.
Components of Warehouse:
Name: Warehouse name
CPU: Number of CPUs
Memory: Amount of memory
IP address: Public IP address
Whitelist Warehouse IP:
To ensure proper functionality, whitelist the warehouse's public IP address in your environment. This allows the necessary access and prevents connectivity issues.
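What whitelisting means on the receiving side can be sketched with Python's standard `ipaddress` module. The function name and the addresses are placeholders (the addresses come from the ranges reserved for documentation); your firewall or database will have its own way of storing the allowlist.

```python
import ipaddress


def is_allowed(client_ip, allowlist):
    """Return True if client_ip matches any address or CIDR range in allowlist."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(entry, strict=False) for entry in allowlist
    )
```

For example, `is_allowed("203.0.113.10", ["203.0.113.10"])` is true once the warehouse's public IP has been added, and false for any address that is not on the list.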
Select Source: Lyftrondata integrates with over 300 data sources. You simply need to select the source from which to load your data into the target.
Connection Name: A meaningful name for the connection.
Description: A short description of the connection.
Tag: Keywords or labels assigned to a connection to categorize and organize it.
Complete the prerequisites for the source API to obtain its credentials. Some APIs require payment, while others are free to use. The Freshsales API is used here as an example.
Personal Token: Your Freshsales API personal token.
Base URL: Your Freshsales API base URL.
Hostname: Your Freshsales hostname.
Select Target: The target is the destination where data is transferred, transformed, or loaded during data integration. It can be a database, data warehouse, data lake, cloud storage service, or other platform where the processed data is stored or used for further analysis and reporting.
Connection Name: A meaningful name for the connection.
Description: A short description of the connection.
Tag: Keywords or labels assigned to a connection to categorize and organize it.
URL (required): Your Snowflake account URL.
Username (required): Your Snowflake username.
Password (required): Your Snowflake password.
Schema (required): Your Snowflake schema.
Role (required): Your Snowflake role.
Warehouse (required): Your Snowflake warehouse.
Database (required): Your Snowflake database.
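Before saving a target connection, it can help to check that every required Snowflake field has a value. The sketch below is a hypothetical pre-flight check, not part of Lyftrondata or the Snowflake client; the dictionary keys simply mirror the required field labels listed above, and the sample values are placeholders.

```python
# Key names mirror the required Snowflake fields above; they are illustrative only.
REQUIRED_FIELDS = (
    "url", "username", "password", "schema", "role", "warehouse", "database",
)


def missing_fields(settings):
    """Return the required field names that are absent or blank in settings."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(settings.get(name) or "").strip()
    ]


# Example with placeholder values: "role" is blank and "database" is absent.
example_settings = {
    "url": "https://example.invalid",
    "username": "loader",
    "password": "secret",
    "schema": "PUBLIC",
    "role": "",
    "warehouse": "LOAD_WH",
}
```

Calling `missing_fields(example_settings)` reports the two fields that still need values, in the order they appear in the form.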
After setting up the target, the integration configuration process begins, defining data flow through mappings, transformations, and schedules for efficient, accurate processing. Batches manage data transfer size and frequency to optimize performance, while logging tracks each step for troubleshooting and monitoring. Webhooks trigger actions on event-based notifications, enhancing automation in real-time data workflows.
Batch Size: The number of data records processed together in a single operation. Tuning it balances performance against resource use.
Select Memory Size: The amount of memory allocated for the task.
Regex: A sequence of characters that defines a search pattern for matching, replacing, and extracting text.
Die on Error: Stops the process immediately when an error occurs, preventing any further execution.
Process Method: The file format, Parquet or Avro, used during data processing.
Pipeline Parallelism: The number of pipelines that can run in parallel.
Enable Multithreading: Runs multiple threads concurrently, improving performance by using multiple CPU cores effectively.
Pipeline Per Dag Limit: The number of pipelines (tables) you can select in a single integration.
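How batch size, die on error, and pipeline parallelism interact can be sketched as follows. This is a simplified model under stated assumptions, not Lyftrondata's engine: `load_batch` stands in for whatever writes a batch to the target, one pipeline is assumed per table, and a thread pool caps how many pipelines run at once.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(records, batch_size):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def run_pipeline(records, batch_size, load_batch, die_on_error=True):
    """Load records batch by batch and return (loaded_count, errors).

    With die_on_error=True the first failing batch re-raises and stops the
    pipeline; otherwise the error is recorded and the next batch is tried.
    """
    loaded, errors = 0, []
    for batch in batches(records, batch_size):
        try:
            load_batch(batch)
            loaded += len(batch)
        except Exception as exc:
            if die_on_error:
                raise
            errors.append(str(exc))
    return loaded, errors


def run_pipelines(tables, parallelism, batch_size, load_batch, die_on_error=True):
    """Run one pipeline per table, with at most `parallelism` running at once."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = {
            name: pool.submit(run_pipeline, rows, batch_size, load_batch, die_on_error)
            for name, rows in tables.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

A smaller batch size means more, cheaper load calls; a larger one means fewer calls that each need more memory. With die on error disabled, a failing batch is skipped and reported rather than halting the whole pipeline.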
You need to select the target schema in the load configuration.
You can schedule the integration to run at a time you specify.
If you want to receive notifications through email or a Slack channel, you can configure that. You will get a notification for every run, whether it succeeds or fails.
You have the option to select your preferred logging service for tracking and monitoring your data integration processes. Choose between Lyftrondata or CloudWatch to ensure you receive timely and detailed logs of all activities.
You can also set up Web Hook Calls to receive real-time notifications and updates. This allows you to instantly react to events and integrate with other systems seamlessly.
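On the receiving end, a webhook call is an HTTP request whose body describes the event. The sketch below shows one way a receiver might react to it. The payload shape (`integration` and `status` keys, and the `failed` status value) is an assumption made for illustration; check the payload your integration actually sends before relying on any field name.

```python
import json


def handle_webhook(body):
    """Parse a JSON webhook body and return a short description of the reaction."""
    event = json.loads(body)
    # "integration" and "status" are hypothetical field names.
    name = event.get("integration", "unknown")
    if event.get("status") == "failed":
        return f"alert: {name} failed"
    return f"ok: {name} finished with status {event.get('status')}"
```

In a real receiver, this function would sit behind an HTTP endpoint and trigger an alert or a downstream job in place of returning a string.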