Configure automatic source loading

Automatic source loading simplifies pipeline configuration by automatically determining the best source loading strategy. Describe the characteristics of your source data, and Data Hub will select the optimal strategy at processing time.

When to use automatic source loading

Automatic source loading is recommended when:

  • You are creating a new pipeline. Auto handles standard data load requirements well.
  • You are unsure which source load type best suits your data. In most cases, Auto picks the best strategy.
  • Your pipeline could benefit from both incremental loading and partitioning. Auto switches between the two depending on the process type.

Note: Automatic source loading is only available for source table pipelines and source SQL pipelines.

Configure automatic source loading

  1. In the design panel, under Incremental process (Source), select Auto from the Source load type dropdown.

  2. Configure the following settings. Fill out as many as possible:

    • Created date column - A date column that represents when each row was created (e.g., created_date, inserted_at). Its value must not change after the row is created.

      Important: The created date column must be a date or datetime type and must be included in the pipeline's column selection. As with the partition column in partitioned processing, its values must be immutable: records that move between partitions may cause processing errors.

    • Modified date column - A date column that represents when each row was last updated (e.g., modified_date, updated_at).

      Note: The modified date column must be a date or datetime type and must be included in the pipeline's column selection.

    • Has deletes - Enable this if rows can be deleted from the source and you need to capture those deletions. If you don't need to capture deleted rows, leave this disabled even if the source system supports deletes.

    • Active data range - Data outside the active range is treated as historic and loaded only once.

      Option               Description
      Inactive             All data is historic. Data is loaded once and skipped on subsequent processes.
      1 month - 10 years   Pick a range of recent time that is considered active.
      All active           All data is considered active.

      Important: If Has deletes is enabled, ensure the active range is large enough to include the oldest rows that can still be deleted.
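
The rules above can be sketched in Python. This is a minimal illustration, not Data Hub's implementation: the class `AutoSourceLoadSettings`, the functions `validate` and `is_active`, and the month-based encoding of the active range are all hypothetical names and assumptions introduced here. How Data Hub actually computes the active range boundary is not stated in this page. The sketch assumes the range is measured back from the current date in calendar months.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

# All names below are hypothetical; they mirror the design-panel settings
# and are not part of any Data Hub API.
DATE_TYPES = {"date", "datetime"}


@dataclass
class AutoSourceLoadSettings:
    created_date_column: Optional[str] = None
    modified_date_column: Optional[str] = None
    has_deletes: bool = False
    # Assumed encoding: 0 = "Inactive", 1..120 = months, None = "All active".
    active_range_months: Optional[int] = None


def validate(settings: AutoSourceLoadSettings,
             selected_columns: Dict[str, str]) -> List[str]:
    """Check the documented column rules; selected_columns maps name -> type."""
    problems = []
    date_columns = (("created", settings.created_date_column),
                    ("modified", settings.modified_date_column))
    for label, column in date_columns:
        if column is None:
            continue  # optional: fill out as many settings as possible
        if column not in selected_columns:
            problems.append(f"{label} date column '{column}' is not in the column selection")
        elif selected_columns[column] not in DATE_TYPES:
            problems.append(f"{label} date column '{column}' must be a date or datetime type")
    months = settings.active_range_months
    if months is not None and not 0 <= months <= 120:
        problems.append("active range must be Inactive, 1 month to 10 years, or All active")
    return problems


def is_active(created: date, today: date, active_range_months: Optional[int]) -> bool:
    """Classify a row as active (True) or historic (False) by its created date."""
    if active_range_months is None:   # "All active"
        return True
    if active_range_months == 0:      # "Inactive"
        return False
    # Assumption: the range is measured back from today in calendar months.
    cutoff = (today.year * 12 + today.month - active_range_months, today.day)
    return (created.year * 12 + created.month, created.day) >= cutoff
```

Under these assumptions, with a 1 month active range and a processing date of 2024-06-15, a row created on 2024-05-20 counts as active, while a row created on 2024-05-10 counts as historic and would be loaded only once.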