Build and Manage ML Features for Production-Grade Pipelines

timestamp_col is the name of the timestamp column in the feature data. When features are joined to a table containing the entity keys required for training, this column is used to retrieve point-in-time correct feature values.

A key benefit of Snowflake Feature Store is its use of Dynamic Tables to automate and abstract away the complexity of managing data and feature engineering pipelines and backfills. In many feature store solutions, the user is responsible for writing all the data and feature engineering logic that performs the initial population and the subsequent updates of feature values. These steps then need to be scheduled and managed manually outside of the feature store.

In a Snowflake-managed Feature View, all of this is handled declaratively. You define the logic that computes features across all history, using DataFrames or SQL, and Snowflake handles the incrementalization of that declarative logic. To use a managed Feature View, specify refresh_freq, which sets how often the features are refreshed and therefore how up to date they are relative to their source tables. Snowflake-managed Feature Views can be monitored from the Snowsight UI via the new Feature Store support.

While in most cases you will want to use managed Feature Views, there may be scenarios where you want to use feature pipelines that you maintain yourself and run with external tools. In this case, create the Feature View without refresh_freq. The result is a user-maintained Feature View whose features are computed at retrieval time.
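As a rough sketch of the two options, the function below builds one managed and one user-maintained Feature View from the same feature DataFrame using the snowflake-ml-python package. It assumes an existing FeatureStore object (fs) and a Snowpark DataFrame (feature_df). The entity name CUSTOMER, the join key CUSTOMER_ID, the timestamp column EVENT_TS, the view names, and the "1 day" refresh interval are all hypothetical placeholders.

```python
def register_feature_views(fs, feature_df):
    """Register a managed and a user-maintained Feature View (illustrative names)."""
    # Imported lazily so this sketch can be loaded without a Snowflake environment.
    from snowflake.ml.feature_store import Entity, FeatureView

    # Hypothetical entity: features are keyed by customer.
    customer = Entity(name="CUSTOMER", join_keys=["CUSTOMER_ID"])
    fs.register_entity(customer)

    # Managed: refresh_freq is set, so Snowflake keeps the features
    # refreshed on that schedule.
    managed_fv = FeatureView(
        name="CUSTOMER_FEATURES_MANAGED",
        entities=[customer],
        feature_df=feature_df,
        timestamp_col="EVENT_TS",
        refresh_freq="1 day",
    )

    # User-maintained: refresh_freq is omitted, so features are computed
    # at retrieval time from data that you keep up to date yourself.
    external_fv = FeatureView(
        name="CUSTOMER_FEATURES_EXTERNAL",
        entities=[customer],
        feature_df=feature_df,
        timestamp_col="EVENT_TS",
    )

    managed = fs.register_feature_view(feature_view=managed_fv, version="1")
    external = fs.register_feature_view(feature_view=external_fv, version="1")
    return managed, external
```

The only difference between the two definitions is the refresh_freq argument, which is what switches a Feature View from Snowflake-managed to user-maintained.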

Generating training data

A key purpose of feature stores is to simplify the generation of consistent training data sets. Feature Store provides APIs to generate training data in two formats, depending on your workflow. In either case, Snowflake Feature Store retrieves point-in-time correct values by using the timestamp column and ASOF JOIN to join features from multiple views efficiently and at scale, yielding time-consistent results.
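To make "point-in-time correct" concrete, here is a minimal pure-Python sketch of as-of lookup semantics: for each training row, take the most recent feature value whose timestamp is not later than the row's timestamp. This only illustrates the idea and is not how Snowflake implements ASOF JOIN. The data, the key names, and the use of integer timestamps are made up for the example.

```python
from bisect import bisect_right


def asof_lookup(feature_rows, entity_key, as_of_ts):
    """Return the latest feature value for entity_key with timestamp <= as_of_ts."""
    # feature_rows maps entity_key -> list of (timestamp, value), sorted by timestamp.
    history = feature_rows.get(entity_key, [])
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of_ts)
    # idx == 0 means no feature value existed yet at as_of_ts.
    return history[idx - 1][1] if idx else None


def build_training_rows(spine, feature_rows):
    """Attach a point-in-time correct feature value to every spine row."""
    return [
        {**row, "feature": asof_lookup(feature_rows, row["key"], row["ts"])}
        for row in spine
    ]


# Hypothetical data: timestamps are plain integers for brevity.
features = {"c1": [(1, 10.0), (5, 20.0), (9, 30.0)]}
spine = [
    {"key": "c1", "ts": 4, "label": 0},  # sees the value written at ts=1
    {"key": "c1", "ts": 5, "label": 1},  # sees the value written at ts=5
    {"key": "c2", "ts": 7, "label": 0},  # no feature history for this key
]
training_rows = build_training_rows(spine, features)
```

The row at ts=4 never sees the value written at ts=5, which is the property that prevents future feature values from leaking into training data.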

Snowflake Dataset is a new schema-level object specially designed for machine learning workflows. Snowflake Datasets hold collections of data organized into versions, where each version holds a materialized snapshot of your data with guaranteed immutability, efficient data access, and interoperability with popular deep learning frameworks such as PyTorch and TensorFlow. Datasets can be conveniently created from Feature Store.
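A minimal sketch of creating a Dataset from Feature Store might look like the following. It assumes fs is an existing FeatureStore from the snowflake-ml-python package, spine_df is a Snowpark DataFrame holding entity keys, timestamps, and labels, and feature_views is a list of registered Feature Views. The Dataset name, version, and column names are hypothetical.

```python
def create_training_dataset(fs, spine_df, feature_views):
    """Materialize a versioned Snowflake Dataset from Feature Views (illustrative names)."""
    # fs is assumed to be a snowflake.ml.feature_store.FeatureStore instance.
    return fs.generate_dataset(
        name="CUSTOMER_TRAINING_DATASET",  # hypothetical Dataset name
        version="v1",                      # hypothetical version label
        spine_df=spine_df,
        features=feature_views,
        spine_timestamp_col="EVENT_TS",    # hypothetical timestamp column in the spine
        spine_label_cols=["LABEL"],        # hypothetical label column in the spine
        desc="Point-in-time correct training data",
    )
```

The spine timestamp column is what each Feature View's timestamp_col is matched against, so every training row receives the feature values that were current at that row's timestamp.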
