Ingesting Snowflake Data: Turn a Warehouse Table into a Self-Refreshing Analytical Dataset

How the Snowflake SQL API returns query results as JSON over HTTP, why key-pair JWT authentication is the right way to connect a service identity, and how QuantumLayers parses the ResultSet into a dataset that runs through automated statistical analysis and AI-generated insights


The Warehouse Is Where the Numbers Live

For a large share of data teams, Snowflake is the single source of truth. The fact tables that finance reconciles against, the customer dimension that every department joins to, the event models that the analytics engineers shape with dbt, the revenue marts that feed the executive review: all of it converges in a Snowflake account, governed by roles and warehouses, modeled into clean schemas that the rest of the business consumes. Getting data into Snowflake is a solved problem. Fivetran, Airbyte, Snowpipe, and a dozen ingestion tools compete to land raw data in the warehouse, and dbt transforms it into something usable downstream.

The harder problem is what happens after the modeling is done. A clean, well-governed mart sits in Snowflake, refreshed every few hours, and the people who need to understand it still cannot get to it without help. The analyst writes the query and exports a CSV. The dashboard shows the number but not the reason behind it. The business stakeholder who wants to know whether a metric moved because of seasonality or because something actually broke waits for someone with warehouse access to investigate. The data is modeled, current, and trustworthy, and it is still trapped behind SQL and a BI license.

Snowflake addressed exactly this gap when it shipped the SQL API. The API turns any query against your warehouse into a standard HTTPS request that returns structured JSON. No ODBC driver to install, no connector to maintain, no Python environment with the Snowflake library pinned to a working version. A single POST request carries your SQL statement, and a JSON response carries the rows back. That makes a Snowflake table reachable by anything that can speak HTTP, which is precisely what lets QuantumLayers treat it as a live data source.

This post covers the full path: how the Snowflake SQL API works, how to set up key-pair authentication for a service identity, how to configure the request inside the QuantumLayers REST connector, what the response actually looks like, and what happens to the rows once they land in the platform’s analysis pipeline. By the end, a Snowflake query will be a refreshable dataset inside QuantumLayers that QL-Agent can interrogate in plain English.

Why a Live Query Beats a Scheduled Export

It is worth being precise about why querying Snowflake through its API is structurally different from the more common pattern of exporting results and handing them off as a file. The difference is not cosmetic. It changes who has to do the work and how fresh the numbers stay.

The export pattern goes like this. An analyst opens a worksheet, runs a query, downloads the result, and uploads it somewhere a stakeholder can see it. The moment that file is created it begins to age. Snowflake marts are typically rebuilt on a schedule, often several times a day, so by the next dbt run the exported file no longer matches the warehouse. Refreshing it means a person repeating every manual step, and across a dozen recurring reports a meaningful fraction of an analyst’s week disappears into re-running and re-sending queries that have not conceptually changed.

The API pattern removes the human from that loop. QuantumLayers stores the connection as a recipe: the account endpoint, the warehouse and role, the SQL statement, and the credential. When the dataset needs refreshing, the platform re-runs that recipe against Snowflake, pulls the current rows, and reprocesses them through the full analysis pipeline. The dataset is always the output of the most recent successful query, not a snapshot that someone remembered to regenerate. The same logic applies to any warehouse with an HTTP query interface, which is why we have written companion guides for connecting Databricks and any general REST API source.

There is a second benefit that teams tend to underrate. When you query through the SQL API, Snowflake’s compute does all of the work. The warehouse scans micro-partitions, applies pruning, executes the joins and aggregations, and enforces row and column access policies, then hands back a compact result. QuantumLayers never touches raw storage, never worries about table formats, and never needs to understand how the warehouse is clustered. You write a SELECT against a governed view. Snowflake decides how to run it. The platform receives clean tabular JSON. That separation keeps the pushdown where it belongs and keeps your governance intact, because the query runs as a specific Snowflake role and sees only what that role is allowed to see.

How the Snowflake SQL API Works

The Snowflake SQL API exposes a single resource, /api/v2/statements, with three endpoints: one to submit a statement, one to check its status, and one to cancel it. For pulling an analytical result into QuantumLayers, the submit endpoint is the one that matters. Everything is built on plain HTTP requests, so any tool that can send a POST and read JSON can use it.

The endpoint lives on your account hostname:

The account identifier is the part of your Snowflake URL before snowflakecomputing.com. It usually takes the form orgname-accountname, for example acme-marketing. You can confirm it under Admin in Snowsight, or simply read it from the browser address bar when you are signed in. Use the exact identifier with no trailing slash.

The request body is a JSON object. The only strictly required field is the SQL statement, but in practice you also set the warehouse that should run it, plus the database, schema, and role that establish the execution context. A request for the last 90 days of daily order revenue by product category looks like this:

The timeout field sets the maximum number of seconds Snowflake will spend on the statement before giving up. The warehouse, database, schema, and role values are case sensitive, so match them exactly to how they appear in Snowflake. One behavior worth knowing in advance: if a query runs longer than about 45 seconds, the API returns a 202 status code instead of 200, signaling that the statement is still executing and that results must be fetched later from the status endpoint. For the aggregated reporting queries that feed most dashboards, the warehouse returns well inside that window and you get a 200 with the data inline.

Authenticating with a Key-Pair JWT

Snowflake authenticates SQL API requests with either OAuth or a key-pair JSON Web Token, and for a machine-to-machine connection the key-pair approach is the cleaner choice. You generate an RSA key pair, assign the public key to a Snowflake user, and then sign a short-lived JWT with the private key. That signed token goes in the Authorization header as a Bearer token, alongside a header that tells Snowflake which token type to expect.

The headers on every request look like this:

The reason key-pair authentication fits a platform integration so well is that it is built for an identity that is not a person. The recommended setup is a dedicated Snowflake service user, often created with the TYPE = SERVICE property so it cannot log in through the browser, granted a single read-only role scoped to exactly the schemas QuantumLayers needs to query. The public key is registered to that user, and the private key is what signs each token. Because the JWT is short lived and the private key never leaves your control, there is no long-lived password sitting in a config file waiting to be leaked.

This is also where the principle of least privilege pays off. Create a role such as QUANTUMLAYERS_READER, grant it USAGE on the warehouse and SELECT on the specific views you intend to expose, and assign it to the service user. The connection can then read only what that role allows. If someone later changes the SQL to reach a table the role cannot see, the query simply fails rather than quietly exposing data it should not. The same access controls that protect every other consumer of the warehouse protect this one.

If you want a deeper background on why service identities and scoped credentials matter as more tools start reading directly from the warehouse, the team at Lurika has a clear writeup on the topic in their piece on data contracts and protecting production from breaking changes.

What the Response Looks Like

The response shape determines how QuantumLayers parses the data, so it is worth understanding before configuring anything. A successful query returns a ResultSet object. The two parts that matter are resultSetMetaData, which describes the columns, and data, which holds the rows. A response from the example query above looks like this, abbreviated for clarity:

The resultSetMetaData.rowType array gives the column names and their Snowflake types in order. The data array holds the rows, each one an array of string values aligned to the same column order. This is a column-oriented JSON layout, where column names and row values live in separate arrays, and it is exactly the kind of structure QuantumLayers’ automatic JSON detection is built to recognize. You do not configure any paths by hand. The platform inspects the response, identifies the column-oriented pattern, pairs the rowType names with the data values, and flattens the result into a clean table on its own. Snowflake returns every value as a string regardless of its declared type, which is normal for this API, so QuantumLayers reads the type hints and converts each column to its proper numeric, date, or text form during ingestion. A revenue figure becomes a true number and an order date becomes a real date rather than free text.

There is one structural detail to plan for. Snowflake splits large results into partitions, and only the first arrives inline with the initial response; the partitionInfo array lists the rest, which are fetched with follow-up GET requests to the status endpoint. For aggregated and summarized queries the entire result fits in the first partition, so a single request returns everything. When a query would return a very large number of rows, the better design is to aggregate or sample inside the SQL itself rather than pulling raw detail, which keeps the response in one partition and keeps the dataset focused on what the analysis needs.

Configuring the Connection in QuantumLayers

The connection uses the same REST API template that every QuantumLayers source follows. If you have connected another warehouse or any API before, this will be familiar. If it is your first connection, the REST API connector walkthrough covers the fundamentals. Open the QuantumLayers Connect page, choose REST API, and fill in the fields below.

Endpoint URL

The Statement Execution endpoint on your account host:

Use the account identifier exactly as it appears in your Snowsight URL, with no trailing slash.

HTTP Method

Set to POST. The SQL API receives the statement and execution context as a JSON body, which requires a POST.

Authentication

Provide the signed JWT generated from your service user’s private key as the connection credential. The credential is stored encrypted in QuantumLayers and attached to every request. The service user behind the token must hold USAGE on the warehouse and SELECT on the objects referenced in the statement, with those grants flowing through the dedicated read-only role you assigned.

Snowflake’s SQL API expects one extra header alongside the token: X-Snowflake-Authorization-Token-Type set to KEYPAIR_JWT, which tells Snowflake how to interpret the credential. Support for sending custom request headers like this one directly from the REST connector is on the QuantumLayers roadmap and will make the Snowflake connection a fully self-contained setup. Until that lands, the cleanest approach is to point QuantumLayers at a thin proxy endpoint that holds the service credential, attaches the required header, signs a fresh short-lived JWT on each call, and forwards the request to Snowflake. The proxy keeps the private key in one place and removes any concern about token expiry, since a new token is minted on every sync.

Request Body

The JSON payload with your statement and the warehouse, database, schema, and role that set the execution context. Adding a timeout keeps a slow query from hanging the refresh. This is the one field you will revisit most, because changing the analytical question means changing the SQL here and nothing else.

Response Handling

There is nothing to configure here. QuantumLayers detects the structure of the JSON response automatically, recognizes Snowflake’s column-oriented layout, pairs the column names with the row values, and produces a flat table ready for analysis. Type detection happens at the same time, so dates, numbers, and text all land in their correct form. This automatic handling is the same detection engine that lets the REST connector ingest responses from almost any API without manual mapping.

What Happens After the Data Lands

Connecting the source is the start, not the finish. Once the ResultSet becomes a dataset, it enters the same pipeline that every QuantumLayers source runs through. The platform profiles each column, infers types, and runs a battery of statistical tests before any language model sees the data. This preprocessing step is deliberate: extracting distributions, correlations, and anomalies up front means the AI layer reasons over compact statistical summaries rather than raw rows, which keeps insights both cheaper and more accurate. We wrote about why this ordering matters in why statistical preprocessing matters.

From there the dataset is open to QL-Agent, the conversational layer that lets anyone ask questions of the data in plain language and get back charts, statistical results, and written explanations without touching SQL. A finance lead can ask why revenue dipped in a region and get a decomposition rather than a shrug. The deeper diagnostic capability behind that is something we covered in how data teams answer “why did this metric change”. The Snowflake mart that used to require a query and an analyst now answers questions directly.

Because the connection is a stored recipe rather than a file, refreshing is a single action. The platform re-runs the statement against Snowflake through your endpoint, which supplies a fresh token, pulls the current ResultSet, and reprocesses the dataset end to end. The numbers your stakeholders see always trace back to the most recent successful query against the warehouse, with no export, no upload, and no stale snapshot in between. Your modeled data stays in Snowflake, your governance stays intact, and the understanding finally reaches the people who need it.

A Practical Example: Revenue Decomposition

To make this concrete, picture a subscription business whose analytics engineers maintain a daily revenue mart in Snowflake. The mart has one row per day per product category per region, with columns for new revenue, expansion revenue, churned revenue, and active accounts. It is modeled in dbt, refreshed every few hours, and trusted by finance. The head of revenue operations connects it to QuantumLayers with a query that pulls the last 180 days, scoped to the read-only role described earlier.

The resulting dataset is a few thousand rows, comfortably inside a single partition. She generates AI insights, and the statistical pipeline surfaces several findings: a structural break in expansion revenue starting six weeks ago that nobody had isolated, a significant difference in churn across regions that finance had suspected but never quantified, and a leading relationship where movement in new-account growth predicts revenue changes a month later. None of these required her to write SQL beyond the initial SELECT, and none required an analyst to investigate. The mart that used to answer one question per request now answers them in conversation.

The deeper value shows up when this Snowflake dataset is merged with sources that do not live in the warehouse, such as campaign data from an ad platform or budget targets from a spreadsheet. Joining them on region and month produces cross-source views that no single system can provide on its own. The mechanics of multi-source merging are covered in the data ingestion challenge.

Keeping the Data Fresh

Once the connection is saved there are three ways to keep the dataset current. The first is a manual refresh from the dataset view, which re-runs the stored statement against Snowflake and reprocesses the result on demand. The second is a scheduled report: configure a daily or weekly cadence, and QuantumLayers re-syncs the query, regenerates insights against the latest data, and emails the result to whoever needs it, with no one logging into the warehouse. The third is QL-Agent, where you can simply ask it to refresh the Snowflake dataset and summarize what changed since last week, and it runs the query, compares states, and replies with a plain-language summary and inline charts.

In every case the SQL statement in the connection behaves as a reporting query rather than an extraction query. It should return the analytical dataset a specific question needs, not a raw dump of a fact table. Aggregating and filtering at the source keeps the response in a single partition, keeps refreshes fast, and keeps the dataset focused on what the analysis is actually about.

Conclusion

Snowflake already holds the modeled, governed data. QuantumLayers turns it into analysis. The connection between them is a POST request: an account endpoint, a SQL statement, an execution context, and a key-pair credential. No driver to install, no extraction pipeline to maintain, no stale exports to chase. The SQL API returns results as structured JSON, and the platform’s automatic detection ingests that JSON into a live dataset with statistical profiling, AI-generated insights, interactive charts, and conversational querying through QL-Agent.

The SQL statement defines the analytical scope, runs against the live state of your warehouse, and re-runs on every refresh. Authentication today routes a signed key-pair token through your endpoint, and with native custom-header support arriving the setup becomes fully self-contained inside the connector. If your analytical data lives in Snowflake, this is the shortest path from a governed table to a living, queryable dataset. Set up a read-only service role, register a key pair, and point QuantumLayers at the statements endpoint to connect your first Snowflake source from the Connect page.


This post is part of the QuantumLayers blog series on connecting live data sources for AI-powered analysis. For the companion guide on querying a lakehouse, see Querying Databricks from QuantumLayers. For the general walkthrough of the REST API connector, see Live Data Without the Export Cycle. For what happens to your data once it lands in the platform, see Understanding Your Data: A Comprehensive Guide to Statistical Analysis. Connect your first dataset at www.quantumlayers.com.