Saturday, October 19, 2024

tMap vs tJoin - Talend

tMap is a frequently used component for joins and lookups; it is also used for a variety of operations and transformations, whereas tJoin is used for joins and lookups only.

tMap | tJoin
Accepts more than one input: one is the main flow and the rest are lookups. | Accepts only two inputs: one is the main flow and the other is the lookup.
Can create more than one output. | Has two default outputs: "Main" and "Inner join reject".
Has "Inner Join" and "Left Outer Join" join models. | Offers an inner join through its "Inner join (with reject output)" option; with that option cleared it behaves as a left outer join.
Offers three match models: Unique match, First match, All matches. | Defaults to unique match.
Can store lookup data in a file while processing lookups. | Doesn't offer this feature.
Can filter data using a filter expression. | Doesn't offer this feature.
Lets you write transformations with the expression builder at each column level. | Doesn't offer this feature.
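The join and match models in the comparison can be illustrated outside Talend with a small Python sketch. This is not Talend code: the helper `lookup_join` and its parameters `match_model` and `inner_join` are hypothetical names, and the sketch only mimics how a main flow is matched against a lookup flow. Which row "unique" keeps when a key has duplicates is assumed here to be the last one.

```python
from collections import defaultdict

def lookup_join(main_rows, lookup_rows, key, match_model="unique", inner_join=False):
    """Join main_rows to lookup_rows on `key` (hypothetical helper).

    Returns (output_rows, rejected_rows).
    """
    # Index the lookup flow by join key
    index = defaultdict(list)
    for row in lookup_rows:
        index[row[key]].append(row)

    output, rejects = [], []
    for main in main_rows:
        matches = index.get(main[key], [])
        if not matches:
            if inner_join:
                rejects.append(main)        # inner join: unmatched main rows are rejected
            else:
                output.append(dict(main))   # left outer join: keep the main row as-is
            continue
        if match_model == "first":
            chosen = matches[:1]            # first matching lookup row
        elif match_model == "all":
            chosen = matches                # one output row per matching lookup row
        else:
            chosen = matches[-1:]           # "unique": assumed to keep the last match
        for m in chosen:
            output.append({**m, **main})
    return output, rejects

orders = [{"cust": 1, "order": "A"}, {"cust": 2, "order": "B"}, {"cust": 9, "order": "C"}]
customers = [{"cust": 1, "name": "Ann"}, {"cust": 1, "name": "Anna"}, {"cust": 2, "name": "Bob"}]

out, rej = lookup_join(orders, customers, "cust", match_model="all", inner_join=True)
print(len(out), "joined rows,", len(rej), "rejected")
```

With `inner_join=False` the unmatched order is kept in the output without a `name` column, which is the left outer join behaviour.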

Thursday, September 5, 2024

Azure Data Factory - list of activities

List of Activities in ADF

Azure Data Factory is a cloud-based data integration service that allows you to create, schedule, and manage data pipelines to move and transform data from various sources to various destinations. Here are some of the different types of activities in Azure Data Factory with examples:

1. Copy Activity:- The Copy activity is used to copy data from one data store to another. For example, you can use the Copy activity to copy data from an on-premises SQL Server database to an Azure SQL database. 

2. Execute Pipeline Activity:- The Execute Pipeline activity is used to call another pipeline from within the current pipeline. For example, you can use this activity to execute a pipeline that contains a data transformation activity after the data has been copied.

3. Web Activity:- The Web activity is used to call a REST API endpoint or a web service. For example, you can use this activity to call an API to retrieve data from an external system. 

4. Stored Procedure Activity:- The Stored Procedure activity is used to call a stored procedure in a SQL Server database. For example, you can use this activity to execute a stored procedure that performs a data transformation. 

5. If Condition Activity:- The If Condition activity is used to create a conditional workflow in your pipeline. For example, you can use this activity to check if a file exists in a data store and only continue with the pipeline if the file is found.

6. For Each Activity:- The For Each activity is used to iterate over a collection of items and perform an action on each item. For example, you can use this activity to loop through a list of files and copy each file to a destination.

7. Lookup Activity:- The Lookup activity is used to read a single row or a small result set from a data store so that later activities can use the values. For example, you can use this activity to read a list of table names from a configuration table in a SQL Server database. (To inspect the schema of a table, use the Get Metadata activity instead.)

8. Set Variable Activity:- The Set Variable activity is used to set the value of a variable in a pipeline. For example, you can use this activity to set a variable that holds the current date or time.

9. Wait Activity:- The Wait activity is used to pause the execution of a pipeline for a specified number of seconds. For example, you can use this activity to delay the start of a data transfer operation.

10. Filter Activity:- The Filter activity is used to filter an input array based on a specified condition. For example, you can use this activity to keep only the .csv entries from a list of files returned by the Get Metadata activity before looping over them.

11. Join (data flow transformation):- Join is not a standalone pipeline activity; it is a transformation inside a mapping data flow, used to join data from two streams. For example, you can use it to join data from two tables in a SQL Server database.

12. Union (data flow transformation):- Union is a mapping data flow transformation rather than a pipeline activity; it is used to combine data from two or more sources. For example, you can use it to combine data from two tables in a SQL Server database into a single destination.

13. Lookup Activity:- The Lookup activity is used to read a single row or a small result set from a data store so that later activities can use the values. For example, you can use this activity to read a list of table names from a configuration table in a SQL Server database.

14. Set Variable Activity:- The Set Variable activity is used to set the value of a variable in a pipeline. For example, you can use this activity to set a variable that holds the current date or time.

15. If Condition Activity:- The If Condition activity is used to create a conditional workflow in your pipeline. For example, you can use this activity to check if a file exists in a data store and only continue with the pipeline if the file is found.

16. Until Activity:- The Until activity is used to execute a loop until a specific condition is met. For example, you can use this activity to keep copying data until a specific file is found in a data store.

17. Mapping Data Flow Activity:- The Mapping Data Flow activity is used to visually design and build data transformation logic using a drag-and-drop interface. For example, you can use this activity to transform data from one format to another, or to combine data from multiple sources.

18. Databricks Notebook Activity:- The Databricks Notebook activity is used to run a Databricks notebook in a Databricks workspace. For example, you can use this activity to run a Python or Scala script to transform data.

19. HDInsight Hive Activity:- The HDInsight Hive activity is used to execute Hive queries on an HDInsight cluster. For example, you can use this activity to transform data using HiveQL.

20. HDInsight Pig Activity:- The HDInsight Pig activity is used to execute Pig scripts on an HDInsight cluster. For example, you can use this activity to transform data using Pig Latin.

21. HDInsight MapReduce Activity:- The HDInsight MapReduce activity is used to execute MapReduce jobs on an HDInsight cluster. For example, you can use this activity to perform complex data transformations on large datasets.

22. Custom Activity:- The Custom activity is used to run custom code in a data pipeline. For example, you can use this activity to run a PowerShell script to perform a specific task.

23. Execute SSIS Package Activity:- The Execute SSIS Package activity is used to execute an SSIS package stored in an Azure Storage account or a SQL Server Integration Services (SSIS) catalog. For example, you can use this activity to perform data transformations using an existing SSIS package.

24. Delete Activity:- The Delete activity is used to delete data from a data store. For example, you can use this activity to delete files from an Azure Blob Storage container.

25. Teradata (connector):- Azure Data Factory has no dedicated Teradata Query activity; Teradata is available as a connector, so a query against a Teradata database runs as the source of a Copy or Lookup activity. For example, you can use a Copy activity with a Teradata source to extract data from a Teradata database.

26. Amazon S3 (connector):- Amazon S3 is a connector rather than an activity. It is used as the source of a Copy activity to copy data from Amazon S3 into an Azure Data Factory-supported data store. For example, you can use it to transfer data from an Amazon S3 bucket to an Azure Blob Storage account.

27. Azure Function Activity:- The Azure Function activity is used to execute an Azure Function in a pipeline. For example, you can use this activity to perform custom data transformations using an Azure Function.

28. Webhook Activity:- Azure Data Factory has no activity named Wait Event; the closest built-in activity is the Webhook activity, which calls an endpoint and pauses the pipeline until the callback URL it passed is invoked. For example, you can use this activity to wait for a signal from an external system before proceeding with the pipeline.

29. Amazon Redshift (connector):- Amazon Redshift is a connector rather than an activity; a query against an Amazon Redshift database runs as the source of a Copy or Lookup activity. For example, you can use a Copy activity with a Redshift source to extract data from an Amazon Redshift database.

30. Web Activity:- The Web activity is used to call a REST API or a web endpoint from a pipeline. For example, you can use this activity to call an API to retrieve data or to perform an action.

31. Azure Analysis Services:- Azure Data Factory has no dedicated Azure Analysis Services activity; a command such as a refresh is typically sent by calling the Azure Analysis Services REST API from a Web activity. For example, you can use this approach to refresh a model in an Azure Analysis Services database.

32. SharePoint Online List (connector):- SharePoint Online List is a connector rather than an activity. It is used as the source of a Copy activity to copy data from a SharePoint Online list into an Azure Data Factory-supported data store. For example, you can use it to transfer data from a SharePoint Online list to an Azure SQL Database.

33. Stored Procedure Activity:- The Stored Procedure activity is used to execute a stored procedure in a database. For example, you can use this activity to perform a custom data transformation using a stored procedure.

34. Lookup with a Stored Procedure:- This is not a separate activity; the Lookup activity can use a stored procedure as its source to retrieve data from a database. For example, you can use it to retrieve data from a SQL Server database using a stored procedure.

35. Copy Activity:- The Copy activity is used to copy data between different data stores. For example, you can use this activity to copy data from an on-premises SQL Server database to an Azure Blob Storage container.

36. If Condition Activity:- The If Condition activity is used to evaluate a Boolean expression and perform different actions based on the result. For example, you can use this activity to perform different data transformations based on a condition.

37. For Each Activity:- The For Each activity is used to loop through a set of items and perform an action for each item. For example, you can use this activity to process a set of files stored in an Azure Blob Storage container.

38. Until Activity:- The Until activity is used to repeatedly perform an action until a certain condition is met. For example, you can use this activity to keep polling a system until a certain status is returned.

39. Filter Activity:- The Filter activity is used to filter an input array based on a condition. For example, you can use this activity to filter out items that do not meet certain criteria.

40. Set Variable Activity:- The Set Variable activity is used to set the value of a variable that can be used in later activities. For example, you can use this activity to set a variable to the current date and time.

41. Azure Databricks Notebook Activity:- The Azure Databricks Notebook activity is used to execute a Databricks notebook in a pipeline. For example, you can use this activity to perform advanced data processing and analytics using Databricks.

42. Lookup Activity:- The Lookup activity is used to retrieve data from a data store. For example, you can use this activity to read a small configuration file stored in Azure Blob Storage.

43. Wait Activity:- The Wait activity is used to pause the execution of a pipeline for a specified amount of time. For example, you can use this activity to introduce a delay between two activities in a pipeline.

44. If Condition Branches:- The branches are not a separate activity; they are the lists of activities that an If Condition activity runs when its expression evaluates to true or to false. For example, you can put different data transformations in each branch.

45. Get Metadata Activity:- The Get Metadata activity is used to retrieve metadata about a file or folder stored in a data store. For example, you can use this activity to retrieve the size, type, and last modified date of a file stored in Azure Blob Storage.

46. Union (data flow transformation):- Union is a mapping data flow transformation rather than a pipeline activity; it is used to combine the results of two or more data sources. For example, you can use it to combine the results of two different SQL queries into a single data set.
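Pipelines are stored as JSON, so a few of the control-flow activities above (Set Variable, For Each, Wait) can be sketched as a Python dictionary and serialized. The pipeline, parameter, variable, and activity names below are hypothetical, and the property names (`typeProperties`, `items`, `waitTimeInSeconds`, `variableName`, `dependsOn`) are written from memory of the documented JSON shape, so check them against the current Azure Data Factory reference before relying on this.

```python
import json

def expression(value):
    """Wrap a string as an expression object (assumed JSON shape)."""
    return {"value": value, "type": "Expression"}

pipeline = {
    "name": "DemoPipeline",  # hypothetical name
    "properties": {
        "parameters": {"fileNames": {"type": "Array"}},
        "variables": {"runDate": {"type": "String"}},
        "activities": [
            {
                # Set Variable: store the current UTC time in a pipeline variable
                "name": "SetRunDate",
                "type": "SetVariable",
                "typeProperties": {
                    "variableName": "runDate",
                    "value": expression("@utcnow()"),
                },
            },
            {
                # For Each: iterate over the array passed in as a parameter
                "name": "ForEachFile",
                "type": "ForEach",
                "dependsOn": [
                    {"activity": "SetRunDate", "dependencyConditions": ["Succeeded"]}
                ],
                "typeProperties": {
                    "isSequential": True,
                    "items": expression("@pipeline().parameters.fileNames"),
                    "activities": [
                        {
                            # Wait: pause for a fixed number of seconds per item
                            "name": "PauseBetweenFiles",
                            "type": "Wait",
                            "typeProperties": {"waitTimeInSeconds": 5},
                        }
                    ],
                },
            },
        ],
    },
}

print(json.dumps(pipeline, indent=2))
```

In a real pipeline the inner Wait would normally be replaced by a Copy or other activity that does the per-item work; Wait is used here only to keep the sketch short.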

Wednesday, July 31, 2024

Python in 30 min

 Python is a versatile and powerful programming language that's widely used in various fields, including web development, data analysis, artificial intelligence, and more. Here’s a beginner-friendly tutorial to get you started with Python:

1. Introduction to Python

Python is known for its readability and simplicity. It uses indentation to define code blocks, which makes it visually clear and easy to follow.
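As a small illustration of indentation defining blocks, the sketch below nests an `if` inside a `for` inside a function (the function name `describe` is just an example). Each level of indentation marks which statements belong to which block.

```python
def describe(numbers):
    labels = []
    for n in numbers:            # everything indented under `for` runs once per item
        if n % 2 == 0:           # indented under `if`: runs only for even numbers
            labels.append(f"{n} is even")
        else:
            labels.append(f"{n} is odd")
    return labels                # back at function level: runs after the loop ends

for line in describe([1, 2, 3]):
    print(line)
```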

2. Setting Up Python

  1. Download and Install Python:

    • Go to the official Python website and download the latest version.
    • Follow the installation instructions for your operating system (Windows, macOS, or Linux).
  2. Install an IDE or Text Editor:

    • You can write Python code in various IDEs and text editors. Some popular choices are:
      • IDLE: Comes bundled with Python.
      • PyCharm: A powerful IDE for Python.
      • VS Code: A lightweight but powerful editor with Python support.

3. Basic Syntax and Concepts

Hello World

Let's start with a simple program that prints "Hello, World!" to the console.


print("Hello, World!")

Variables and Data Types

Variables are used to store data. Python supports various data types including integers, floats, strings, and booleans.


x = 5               # Integer
y = 3.14            # Float
name = "Alice"      # String
is_student = True   # Boolean

Basic Operations


# Arithmetic operations
a = 10
b = 5
print(a + b)  # Addition
print(a - b)  # Subtraction
print(a * b)  # Multiplication
print(a / b)  # Division

# String concatenation
first_name = "John"
last_name = "Doe"
full_name = first_name + " " + last_name
print(full_name)

Control Structures

Conditional Statements


age = 18
if age >= 18:
    print("You are an adult.")
else:
    print("You are a minor.")

Loops

  • For Loop

for i in range(5):
    print(i)
  • While Loop

count = 0
while count < 5:
    print(count)
    count += 1

4. Functions

Functions are reusable blocks of code that perform a specific task.


def greet(name):
    return f"Hello, {name}!"

print(greet("Alice"))

5. Lists and Dictionaries

  • Lists are ordered collections of items.

fruits = ["apple", "banana", "cherry"]
print(fruits[0])       # Access first item
fruits.append("date")  # Add item
  • Dictionaries are collections of key-value pairs.

person = {"name": "John", "age": 30}
print(person["name"])  # Access value by key
person["age"] = 31     # Update value

6. File Handling

You can read from and write to files using Python.

Reading a file


with open('example.txt', 'r') as file:
    content = file.read()
    print(content)

Writing to a file


with open('example.txt', 'w') as file:
    file.write("Hello, World!")

7. Modules and Packages

Modules and packages allow you to organize your code into separate files and directories.

Importing a module


import math

print(math.sqrt(16))

Creating your own module

Save the following in a file named my_module.py:


def say_hello(name):
    return f"Hello, {name}!"

You can use it in another file:


import my_module

print(my_module.say_hello("Alice"))

8. Error Handling

Python uses try and except blocks to handle errors gracefully.


try:
    result = 10 / 0
except ZeroDivisionError:
    print("You can't divide by zero!")
finally:
    print("This block always executes.")
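A slightly fuller sketch of the same idea: `except ... as err` gives access to the error message, and an `else` branch runs only when no exception was raised. The function name `safe_divide` is hypothetical.

```python
def safe_divide(a, b):
    try:
        result = a / b
    except ZeroDivisionError as err:
        # `err` holds the exception object, including its message
        print(f"Cannot divide: {err}")
        return None
    else:
        # Runs only if the try block finished without an exception
        return result

print(safe_divide(10, 2))
print(safe_divide(10, 0))
```

Returning `None` on failure is only one possible design; re-raising the error or returning a default value are equally common choices.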

9. Object-Oriented Programming

Python supports object-oriented programming. You can create classes and objects.




class Dog:
    def __init__(self, name):
        self.name = name

    def bark(self):
        return f"{self.name} says Woof!"

my_dog = Dog("Buddy")
print(my_dog.bark())

10. Next Steps

  • Explore Libraries: Python has a rich ecosystem of libraries and frameworks. Explore libraries like NumPy for numerical computing, pandas for data analysis, and Flask/Django for web development.
  • Practice: Work on small projects or problems to reinforce your learning.


Tuesday, July 23, 2024

Databricks - File to Table Data Loading


# Step-by-step script for file-to-table data loading in Databricks

from pyspark.sql import SparkSession

# Get the Spark session (Databricks notebooks already provide one as `spark`)
spark = (
    SparkSession.builder
    .appName("File to Table Data Loading")
    .getOrCreate()
)

# Load data from CSV files into a DataFrame
df = (
    spark.read.format("csv")
    .option("header", "true")
    .load("dbfs:/mnt/data/csv_files/")
)

# Perform data transformations if needed
df = df.withColumn("amount", df["amount"].cast("double"))

# Save the DataFrame to a Delta Lake table
# (use mode "append" instead of "overwrite" for incremental loading)
(
    df.write.format("delta")
    .mode("overwrite")
    .saveAsTable("my_database.my_table")
)

# Optionally, stop the Spark session
# (skip this in a Databricks notebook, where the session is shared)
spark.stop()

Wednesday, May 15, 2024

Talend Cloud Data Connectors

 

List of supported Talend Cloud Data Connectors

List of the environments and systems to which you can connect.

Unless stated otherwise the latest versions are supported.

Supported connectors and their categories

Supported system | Connection type | Unidirectional / Bidirectional
Amazon Aurora | Databases | Bidirectional
Amazon DynamoDB | Databases | Bidirectional
Amazon Redshift | Databases | Bidirectional
Apache Kudu | Databases | Bidirectional
Azure Cosmos DB | Databases | Bidirectional
Azure Synapse | Databases | Bidirectional
Couchbase | Databases | Bidirectional
Delta Lake | Databases | Bidirectional
Derby | Databases | Bidirectional
Google BigQuery | Databases | Bidirectional
Google Bigtable | Databases | Bidirectional
MariaDB | Databases | Bidirectional
Microsoft SQL Server | Databases | Bidirectional
Microsoft SQL Server - JTDS driver (Deprecated) | Databases | Bidirectional
MongoDB | Databases | Bidirectional
MySQL | Databases | Bidirectional
Oracle | Databases | Bidirectional
PostgreSQL | Databases | Bidirectional
SingleStore | Databases | Bidirectional
Snowflake, including pushdown capabilities | Databases | Bidirectional
Amazon S3 | Cloud file systems | Bidirectional
Azure Blob Storage | Cloud file systems | Bidirectional
Azure Data Lake Storage Gen2 | Cloud file systems | Bidirectional
Box | Cloud file systems | Bidirectional
Google Cloud Storage | Cloud file systems | Bidirectional
Dynamics 365 | Business applications | Bidirectional
Marketo | Business applications | Bidirectional
Google Analytics | Business applications | Input only
Google Analytics 4 | Business applications | Input only
NetSuite | Business applications | Bidirectional
Salesforce | Business applications | Bidirectional
Workday | Business applications | Bidirectional
Zendesk | Business applications | Bidirectional
HTTP Client | Web services | Bidirectional
REST (deprecated) | Web services | Bidirectional
FTP | File systems | Bidirectional
HDFS | File systems | Bidirectional
Amazon Kinesis | Messaging | Input only
Apache Pulsar | Messaging | Bidirectional
Azure Event Hubs | Messaging | Bidirectional
Google PubSub | Messaging | Bidirectional
Kafka | Messaging | Bidirectional
RabbitMQ | Messaging | Bidirectional
ElasticSearch v2.4.4 to 6.3.2 | Search and index | Bidirectional
Local connection: This built-in connection allows you to store your local file as a dataset. | Local connection (for local data) | Bidirectional
Data generator: This connection allows you to generate random realistic data according to the conditions you define. | Test connection (for test data) | Input only
Test connection: This built-in connection allows you to enter manually your test data as a dataset. | Test connection (for test data) | Bidirectional
Talend Cloud Data Stewardship campaigns can be retrieved and used as pipeline sources and destinations, allowing you to both read from them and write into them. | Talend Cloud platform | Bidirectional

 
