Skip to main content

Transform your Data with Steps

A Step is a pipeline setting that configures the transformations to be applied to column(s) in the pipeline. Each step can have a different purpose and correlates to a set of nest views within the warehouse.

Typically, steps can be added to either segment a transformation for a logical grouping, such as calculations, or because the step, like an aggregation, needs to be separated.

Steps are sequential, i.e., only data that fits a previous step will be available for the next step. For example, if you have filtered out all sales representatives from region 1 in a basic step, their data will not be available when you add an aggregate step next.

It is a best practice to consolidate steps where possible.

Add a step

Note

Each pipeline contains an Initial step and may have up to five additional steps added to it. In the unlikely event that you need more than six steps in a pipeline, you can create a new warehouse pipeline that uses the original pipeline as its source – giving you a further six steps.

  1. Open the pipeline you want to add the step to.

  2. Click + to the right of the Initial step.

  3. Select the type of step you want to add from the drop-down.

  4. Click in the New Step text field to rename your step.

  5. See below for configuring steps.

Configure a step

Initial step

The Initial step is automatically created in your pipeline and provides the most common settings for configuring columns in a pipeline.

On the Initial step, you can:

  1. Adjust columns by:

    • Renaming columns.

    • Setting key columns.

    • Assigning data types.

    • Adding settings for reporting.

  2. Add lookup columns for using data from related pipelines.

  3. Add calculated columns.

  4. Delete columns.

  5. Filter rows.

Basic step

The Basic step is the default step type and the simplest of all the available steps.

In a Basic step, you can:

  1. Add calculated columns.

  2. Add lookup columns for using data from related pipelines.

  3. Filter rows.

Aggregate step

An aggregate step lets you summarize the rows in a pipeline using the SQL GROUP BY clause. You can select which grouping operator to apply to each column and use preset templates to implement aggregations that ignore duplicates or summarize duplicates quickly.

Note

While you can manually apply aggregation operators to each column, pre-set templates make it easy to apply the common types of aggregation to a pipeline.

  1. Select Aggregate step type.

  2. Choose a template to apply using the Template drop-down:

    1. Default template - summarizes the table based on the values of its attribute columns.

    2. Ignore Duplicates template - removes rows containing duplicates of the primary key.

    3. Summarize Duplicates template - used to remove duplicate rows while retaining aggregate values for columns from the duplicate rows.

  3. Click Save.

Union step

A union step combines two pipelines based on a set of rules per column. It combines the rows from the previous step in the current pipeline (the input pipeline) with the output rows from another pipeline (the union pipeline).

Note

Choosing to perform the mapping from the union column or the input column will result. For viewing convenience, it may be preferable to perform the mapping from the input column, as that will retain the column's position in the list of columns.

  1. Select Union step type.

  2. Choose a column to union with from the Union with drop-down.

  3. Manually map a column in one pipeline to a corresponding column in another pipeline using the drop-down lists in the Union and In table columns.

    • To map a column in the input pipeline to a column in the union pipeline, use the drop-down list of columns in the Union table column.

    • To map a column in the union pipeline to a column in the input pipeline, click on the drop-down.

  4. Click Save.

Unpivot step

Unpivot steps can be used to rotate input columns into row values. This transforms information resembling a spreadsheet pivot table into a relational form suitable for combining with other pipelines.

Although a single measure is automatically added when an unpivot step is created, additional measures may be necessary, depending on your data source and how you need to pivot your data.

  1. Select Unpivot step type.

  2. Return to the Initial step and expand the Preview area.

  3. Examine Preview area data for the Initial step and verify the pipeline columns have been set to required measures and attributes.

  4. Click Save.

Important

Only measures can be configured using an Unpivot step.

Note

If any measures share the same name in Pivot column key column, Data Hub assumes they share that key.

Add an additional unpivot measure and rename

  1. Hover over the Initial step, and click the small arrow icon.

  2. Click Unpivot to add an unpivot step after the initial step.

  3. Rename the Pivotcolumn entry.

    1. The column list (above Preview area) locates the column that was automatically labeled Pivot column.

    2. Click the column to select it. The column is now highlighted.

    3. In the Properties panel, type a new name for the column.

  4. Click Save.

History step

The history step allows you to track changes in data model information over time. It allows you to snapshot your data at different intervals (points in time) and see what has changed.

Note

For accuracy and consistency, make the history step the last step in your pipeline.

  1. On the pipeline that you want to track, click + to add a new step.

  2. In the type dropdown, select History.

    AddHistoryStep.jpg
  3. In the Type column of the step columns' grid, select the columns you want to track. By default, no columns are tracked (Latest selected by default).

    Note

    If a column is not set to tracked, it will just report the latest value.

    Track_LatestColumns.jpg
  4. Specify the history type used for the pipeline (using the options in the Properties panel.

    HistoryTypeDropdown.jpg
  5. Review the RELATIONSHIPS pipeline tab. When a History step is added to a pipeline, reporting relationships are automatically added to the Date dimension.

    DateRelationship.jpg
  6. Click Save and process.

External Lookup step

The external lookup step provides the ability to lookup external data.

Geocoding

Geocoding provides coordinates for addresses, which enhances the ease of map chart creation significantly.

Prerequisites

In order to use the Geocoding step, you'll need:

  1. An Azure maps account. Use an existing account or configure a new Azure maps account

  2. Save the primary key to use when connecting in Data Hub.

Configure geocoding in Data Hub
  1. Add an EXTERNAL LOOKUP step, and choose Geocoding from the Service dropdown.

    ExternalLookupType.png
  2. Provide the following required field values:

    • Location column: Address column in your source data to be used as the input values for geocoding.

    • Azure maps subscription key: Key on your Azure maps account. (To find the key, follow the steps in the prerequisites section above.

    Coordinets.png
  3. After processing, the latitude and longitude columns should be available to use in reporting.

Sentiment Analysis

Sentiment analysis, is a technique used to analyze text to identify and categorize opinions expressed as positive, negative, or neutral. Sentiment analysis can be used in any business sectors, e.g. marketing and social sciences to gain insights into public opinion and customer satisfaction.

Prerequisites

In order to use the Sentiment Analysis step, you'll need:

  1. An Azure Language service. Use an existing service or configure a new Azure Language service.

  2. Save the Primary key and Endpoint from step 1 to use when connecting in Data Hub.

Configure Sentiment Analysis in Data Hub
  1. Add an EXTERNAL LOOKUP step, and choose Sentiment analysis from the Service dropdown.

    ExternalLookupType.png
  2. Provide the following required field values:

    • Language service URL: The endpoint value available from the Azure language service

    • Azure language service key: The key on the Azure language service. (To find the key and endpoint, follow the steps in the prerequisites section above.

    Sentiments.png
  3. After processing, the negative, neutral and positive columns are available to use in reporting.

Classification Step

  1. Shift your focus onto the rules grid by clicking any line

  2. Click '+' to add a rule.

    FunctionIcons.gif
  3. Add condition columns by clicking the paperclip icon in the rules grid, or '+' in the Classification Configuration panel on the right.

    Rules_Grid.png
  4. Delete condition columns by clicking the trash bin icon.

    Classification_Configuration_Panel.gif
  5. Specify the conditions of the rule.

    Note

    Conditions vary depending on the data type of the column.

    NumberColumnConditions.png

    When using text columns, you can use exact values to filter on, or partially matched values using wildcards. Below example will give all records that start with 'C'

    TextColumnConditions.png
  6. The following grid functions can be used to configure rules.

    • Add, Cut, Copy, Paste, Delete, Refresh

      FunctionIcons.gif
      Classification_Rule_Grid_Functions.gif

    Click refresh in the grid functions to apply the new rule to the counts. This helps identify if the rule is returning the results you need.

  7. Click refresh in the preview to view the results in the data.

Filter a pipeline using a step

Filter pipeline source data

  1. Expand the Filter list.

  2. Click the + Condition button to add a row to the filter list.

  3. Click the Column drop-down list, and choose a column for the filter.

  4. Click the Operator to choose an operator.

  5. If necessary, click the Not column to the left of the Operator text box to negate the condition.

  6. Type a value into the Value combo box.

  7. Click the stacked squares icon to the right of the Value text box to display a context menu that lets you change it to a column selector.

Filter pipeline staging data

Filter pipeline using staging data in a step

  1. Select + to add a new step.

  2. Select BasicStep

  3. Scroll down to Filter

  4. Select + Condition to add a filter

  5. Select Column

  6. Select Operator

  7. Select Value.

Delete a step

Delete a step from a pipeline using the Delete step icon.

Note

You cannot delete the Initial step.

  1. Click on the step to be deleted.

  2. Click x to remove the step.