Talend Learning: Different types of Schemas in Talend

What are the different types of schemas in Talend?

The benefits of using Repository schemas are:

1. They can be reused across multiple jobs, reducing the amount of re-keying.

2. Talend cascades changes made to a Repository schema to all jobs that use the schema, so there is no need to scan jobs manually for Built-In schemas that need to be changed.

3. Impact analysis reports can be generated showing where a Repository schema is used within a project, which makes it easier to assess the impact of a change.
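The first two benefits come down to defining a schema once and referring to it from many places. The following is a simplified model in plain Java, not Talend's implementation. The classes `RepositorySchema`, `Column` and `Job` and the method `impactAnalysis` are hypothetical. In this model a job holds a reference to the shared schema, so an edit is visible in every job. Talend's own change-propagation mechanism is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a Repository schema shared by jobs.
// None of these types are part of Talend's API.
public class RepositorySchemaDemo {

    // One column: a name and a Java type name.
    record Column(String name, String type) {}

    // A schema defined once in the repository.
    static class RepositorySchema {
        final String id;
        final List<Column> columns = new ArrayList<>();

        RepositorySchema(String id) {
            this.id = id;
        }
    }

    // In this model a job holds a reference to the shared schema rather than
    // its own copy, so any edit to the schema is visible to every job using it.
    record Job(String name, RepositorySchema schema) {}

    // Impact analysis: names of the jobs that reference the given schema.
    static List<String> impactAnalysis(RepositorySchema schema, List<Job> jobs) {
        List<String> affected = new ArrayList<>();
        for (Job job : jobs) {
            if (job.schema() == schema) { // identity, not structural equality
                affected.add(job.name());
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        RepositorySchema customers = new RepositorySchema("customers");
        customers.columns.add(new Column("id", "Integer"));
        customers.columns.add(new Column("name", "String"));

        List<Job> jobs = List.of(
                new Job("LoadCustomers", customers),
                new Job("ExportCustomers", customers));

        // One change to the shared schema; both jobs now see three columns.
        customers.columns.add(new Column("email", "String"));
        for (Job job : jobs) {
            System.out.println(job.name() + ": " + job.schema().columns.size() + " columns");
        }
        System.out.println("Impacted jobs: " + impactAnalysis(customers, jobs));
    }
}
```

The design choice worth noting is that the impact report and the cascading both fall out of jobs pointing at one definition. With Built-In schemas, each job would hold its own copy, and neither would be possible without scanning every job.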

Generic schemas

Generic schemas aren't tied to a particular source. They can therefore be used as a shared resource across multiple types of data source, or to define data sources that are generated, such as the output of custom SQL queries.

Schemas captured from a particular type of data source are stored in the metadata repository, in the folder for that data type (for example, CSV file schemas are stored in the folder for delimited files).

There are, however, cases where a schema is shared across multiple source types. For example, a CSV file and an Excel file could both be used to load the same database table directly.

If you import the metadata from one of these sources, it is stored in the folder for that source, which can make it hard to find.

Storing the schema as a Generic schema makes it more obvious that the schema isn't used for just a single source.
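A minimal sketch of the CSV/Excel case follows. It assumes a hypothetical generic schema made of column names only. Both sources bind their raw values to the same column list, so the schema lives in one neutral place rather than under either source. The Excel side is simulated with a list of cell values, because reading a real workbook would need a library such as Apache POI. `GenericSchemaDemo` and its methods are illustrative names, not Talend APIs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: one source-neutral schema used by two source types.
public class GenericSchemaDemo {

    // Generic schema: column names only, not tied to any source type.
    static final List<String> CUSTOMER_SCHEMA = List.of("id", "name", "country");

    // Bind raw values to the schema's column names, checking the row width.
    static Map<String, String> toRow(List<String> schema, List<String> values) {
        if (values.size() != schema.size()) {
            throw new IllegalArgumentException(
                    "expected " + schema.size() + " values, got " + values.size());
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < schema.size(); i++) {
            row.put(schema.get(i), values.get(i));
        }
        return row;
    }

    // CSV source: split one delimited line (no quoting support, for brevity).
    // The -1 limit keeps trailing empty fields.
    static Map<String, String> fromCsvLine(String line) {
        return toRow(CUSTOMER_SCHEMA, Arrays.asList(line.split(",", -1)));
    }

    // Excel source, simulated as the cell values of one row.
    static Map<String, String> fromSpreadsheetRow(List<String> cells) {
        return toRow(CUSTOMER_SCHEMA, cells);
    }

    public static void main(String[] args) {
        System.out.println(fromCsvLine("1,Alice,UK"));
        System.out.println(fromSpreadsheetRow(List.of("2", "Bob", "FR")));
    }
}
```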

Generated data sources

It is often necessary to run a query against a database and return the result set to the Talend job, and the same query is frequently used in many jobs. Storing the schema for the result set as a Generic schema removes the tedious process of re-creating the same schema manually every time the query is used.
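Where the schema of a query result comes from can be sketched with plain JDBC. A result set's `ResultSetMetaData` lists each column's label and SQL type, which is the information a stored result-set schema needs to capture. This is an illustration under assumptions, not how Talend itself retrieves schemas. `QueryResultSchema` and `Column` are hypothetical names, and running `main` needs a JDBC URL and a driver for your database on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: capture a query's result-set schema once, via JDBC metadata.
public class QueryResultSchema {

    // One column of a result-set schema (not a Talend class).
    record Column(String name, String sqlType) {}

    // Build a schema from result-set metadata. JDBC column indexes start at 1.
    static List<Column> schemaOf(ResultSetMetaData md) throws SQLException {
        List<Column> columns = new ArrayList<>();
        for (int i = 1; i <= md.getColumnCount(); i++) {
            // getColumnLabel returns the SQL alias (AS ...) when one is given.
            columns.add(new Column(md.getColumnLabel(i), md.getColumnTypeName(i)));
        }
        return columns;
    }

    // Usage: java QueryResultSchema <jdbcUrl> <sql>
    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            System.err.println("usage: QueryResultSchema <jdbcUrl> <sql>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(args[1])) {
            for (Column c : schemaOf(rs.getMetaData())) {
                System.out.println(c.name() + " : " + c.sqlType());
            }
        }
    }
}
```

Once captured, the list can be saved and reused, rather than retyped, wherever the same query appears.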

Some components, such as tLogCatcher, have predefined schemas that are read-only. These are easy to recognize because the whole schema is grayed out.

You may also find that certain flows, such as reject flows, have fixed columns added to the original schema. This is because Talend adds the errorCode and errorMessage fields to the schema to store the error information. These additional fields are shown in green to distinguish them as Talend fields.
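A minimal sketch of how a reject row relates to the original schema, assuming hypothetical names (`RejectFlowDemo`, `rejectSchema`, `toRejectRow`) and made-up error values. The reject schema is the original column list with `errorCode` and `errorMessage` appended. It is returned as an unmodifiable list, a loose analogue of a fixed, read-only schema. This does not reproduce Talend's generated code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a reject flow's extra columns.
public class RejectFlowDemo {

    // Columns appended to every reject row in this sketch.
    static final List<String> REJECT_COLUMNS = List.of("errorCode", "errorMessage");

    // Reject schema = original columns followed by the error columns.
    // List.copyOf returns an unmodifiable list, so callers cannot alter it.
    static List<String> rejectSchema(List<String> original) {
        List<String> columns = new ArrayList<>(original);
        columns.addAll(REJECT_COLUMNS);
        return List.copyOf(columns);
    }

    // Copy a failed row into a new map and add the error details;
    // the input row itself is left unchanged.
    static Map<String, Object> toRejectRow(Map<String, Object> row, String code, String message) {
        Map<String, Object> reject = new LinkedHashMap<>(row);
        reject.put("errorCode", code);
        reject.put("errorMessage", message);
        return reject;
    }

    public static void main(String[] args) {
        System.out.println(rejectSchema(List.of("id", "amount")));

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 7);
        row.put("amount", "abc");
        System.out.println(toRejectRow(row, "PARSE_ERROR", "amount is not a number"));
    }
}
```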