Baseline your ingest data

In this stage you get a high-level view of all the telemetry your organization currently generates. The unit focuses on breaking down ingest stats into groups such as account, telemetry type, and application. These figures inform the Optimize your ingest data and Forecast your ingest data stages.

You'll learn how to generate a structured breakdown report for the following dimensions:

  • Organization
  • Sub account
  • Billable Telemetry Type

In addition you'll learn how to create highly granular breakdowns including:

  • Application (APM|Browser|Mobile)
  • K8s Cluster
  • Infrastructure Integration

Desired outcome

Understand exactly which groups within the org are contributing which types of data and how much.

Prerequisites

Process

Install the data governance baseline dashboard
Add ingest target indicators to your dashboard
Generate a tabular 30 day ingest report
Customize your report
Detect ingest anomalies
Install the entity breakdown dashboard (Optional)
Install the cloud integration dashboard (optional)

Install the data governance baseline dashboard

  1. Navigate to the data governance quickstart.
  2. Click Install this quickstart in the upper right portion of your browser window.
  3. Select your top level master account or POA account in the account drop down.
  4. Click Done since there is no agent to install.
  5. When the quickstart is done installing, open the Data Governance Baseline dashboard.

That will bring you to the newly installed dashboard.

Dashboard Overview

The main overview tab shows a variety of charts including some powerful time series views.

Organization wide baseline ingest time series

The second tab provides a baseline report by sub-account and usage metric.

Organization wide baseline tabular view

The remaining tabs provide detailed views of specific telemetry types such as browser data, APM data, logs, and traces. For example, this screenshot shows the browser detail page.

Baseline Browser ingest

Example of an ingest detail focused on a single telemetry type (in this case Browser data)

Detail tabs include:

  • APM: ApmEventsBytes
  • Tracing: TracingBytes
  • Browser: BrowserEventsBytes
  • Mobile: MobileEventsBytes
  • Infra (Host): InfraHostBytes
  • Infra (Process): InfraProcessBytes
  • Infra (Integration): InfraIntegrationBytes
  • Custom Events: CustomEventsBytes
  • Serverless: ServerlessBytes
  • Pixie: PixieBytes

Add ingest target indicators to your dashboard

In the prerequisites section we discussed the concept of a monthly usage target. You may actually have several targets to help keep you on track:

  • An overall organizational target on daily rate or monthly ingest.
  • Targets per data type to ensure the optimal breakdown (for example 1 TB per day for logs and 2 TB per day for metrics).
  • Targets for specific sub-accounts or business units.

In our example, the organization aims to keep its total ingest below 360 TB per month. This is a new target, set after reducing ingest from over 20 TB per day (600 TB per month).

To make the target easier to measure against, we added a threshold line to the chart by adding the static number 360000 (360 TB expressed in GB) to our SELECT statement.

SELECT 360000, rate(sum(GigabytesIngested), 30 day) AS '30 Day Rate' FROM NrConsumption WHERE productLine='DataPlatform' SINCE 30 days ago LIMIT MAX COMPARE WITH 1 month ago TIMESERIES 7 days
Thirty day ingest target line

We can use NRQL to render a line representing our thirty day ingest target.

We can also apply a daily rate target line. Let's just divide 360000 by 30 and we'll use 12000 as our daily rate target. Update the Daily Ingest Rate (Compare With 3 Months Prior) chart:

SELECT 12000, rate(sum(GigabytesIngested), 1 day) AS avgGbIngestTimeseries FROM NrConsumption WHERE productLine='DataPlatform' TIMESERIES AUTO SINCE 9 months ago LIMIT MAX COMPARE WITH 3 months ago
Daily ingest target line

We can use NRQL to render a line representing our daily ingest target.
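The arithmetic behind the two target lines can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and it assumes 1 TB = 1000 GB and a 30 day month, which is what the example figures in this section imply.

```python
# Worked arithmetic for the ingest target lines.
# Assumptions (not from New Relic): 1 TB = 1000 GB and a 30 day month,
# matching the 360000 and 12000 figures used in the example queries.

GB_PER_TB = 1000
DAYS_PER_MONTH = 30


def monthly_target_gb(target_tb):
    """Convert a monthly target in TB to GB, the unit of GigabytesIngested."""
    return target_tb * GB_PER_TB


def daily_target_gb(target_tb, days=DAYS_PER_MONTH):
    """Spread a monthly target evenly across the days of the month."""
    return monthly_target_gb(target_tb) / days


if __name__ == "__main__":
    print(monthly_target_gb(360))  # static value for the 30 day chart
    print(daily_target_gb(360))    # static value for the daily rate chart
```

The two printed values are the static numbers you would place at the start of each SELECT statement for a 360 TB monthly target.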

Generate a tabular 30 day ingest report

  1. Open the previously installed Data governance baseline dashboard.
  2. Click on the Baseline report tab.
  3. Click on ... in the upper right of the "Last 30 Days" table and choose Export as CSV.
  4. Import the CSV into Google Sheets or the spreadsheet of your choice.

Alternatively, if you didn't install the dashboard, you may simply use this query to create a custom chart in Query Builder:

SELECT sum(GigabytesIngested) AS 'gb_ingest_30_day_sum', rate(sum(GigabytesIngested), 1 day) AS 'gb_ingest_daily_rate', derivative(GigabytesIngested, 90 day) AS 'gb_ingest_90_day_derivative' FROM NrConsumption WHERE productLine='DataPlatform' SINCE 30 days ago FACET consumingAccountName, usageMetric LIMIT MAX

Below is an example of a sheet we imported into Google Sheets.

Baseline tabular spreadsheet

A spreadsheet exported from the baseline dashboard tabular page

The screenshot shows the table sorted by 30 day ingest total.

Feel free to adjust your timeline and some of the details as needed. For example, we chose to extract a 90 day derivative to have some sense of change over the past few months. You could easily alter the time period of the derivative to suit your objectives.
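If you prefer to work with the exported CSV in a script rather than a spreadsheet, the following minimal sketch shows one way to total the 30 day ingest by sub-account or by telemetry type. The sample data is invented, and the column headers are an assumption based on the query aliases and facets; check the header row of your own export, since it may be labeled differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of an exported report. The header names mirror the
# query aliases and facets; a real CSV export may label columns differently.
SAMPLE_CSV = """consumingAccountName,usageMetric,gb_ingest_30_day_sum
Account A,ApmEventsBytes,1200.5
Account A,TracingBytes,800.0
Account B,ApmEventsBytes,300.25
Account B,BrowserEventsBytes,99.75
"""


def load_rows(csv_text):
    """Parse the report text into dicts with a numeric 30 day sum."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["gb_ingest_30_day_sum"] = float(row["gb_ingest_30_day_sum"])
        rows.append(row)
    return rows


def total_by(rows, key):
    """Sum 30 day ingest (GB) per distinct value of `key`, largest first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["gb_ingest_30_day_sum"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    print(total_by(rows, "consumingAccountName"))
    print(total_by(rows, "usageMetric"))
```

To use it on a real export, read the file contents into a string and pass that to `load_rows` in place of `SAMPLE_CSV`.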

Customize your report

Add useful columns to your report in order to facilitate other phases of data governance such as optimize and forecast. The following fields will help guide optimization and planning decisions:

  • Notes: Note any growth anomalies and any relevant explanations for them. Indicate any major expected growth if foreseen.
  • Technical Contact: Name of the manager of a given sub-account or someone related to a specific telemetry type.
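If you maintain the report as a CSV file, the extra columns can also be added with a short script. This is a sketch only: the function name is hypothetical, and the column names simply reuse the field names listed above.

```python
import csv
import io

# Column names taken from the fields described above; rename as needed.
EXTRA_COLUMNS = ["Notes", "Technical Contact"]


def add_columns(csv_text, extra_columns=EXTRA_COLUMNS):
    """Return the CSV text with empty extra columns appended to every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + list(extra_columns)
    out = io.StringIO()
    # restval="" fills the new columns with empty cells for existing rows.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    report = "consumingAccountName,usageMetric\nAccount A,ApmEventsBytes\n"
    print(add_columns(report))
```

The new cells start out empty so that the notes and contacts can be filled in by hand during review.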

Detect ingest anomalies

Alert on ingest anomalies

Use this ingest alerts guide to make sure that an increase in data consumption doesn't catch you by surprise. At a minimum, create:

  • A threshold alert to notify if you exceed monthly targets for data ingest beyond seasonal increases
  • A baseline alert to notify you of a sudden sharp increase in ingest data
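The logic behind these two kinds of checks can be illustrated with a small Python sketch. This is not how New Relic alert conditions are implemented or configured. It is a simplified stand-in that assumes you have a list of daily ingest values in GB, and it uses a plain mean-plus-standard-deviation rule in place of a real baseline model. The function names and the `sigmas` parameter are hypothetical.

```python
from statistics import mean, stdev


def exceeds_monthly_target(daily_gb, target_gb):
    """True if the summed daily ingest (GB) is above the monthly target."""
    return sum(daily_gb) > target_gb


def is_sudden_increase(history_gb, latest_gb, sigmas=3.0):
    """True if the latest day is more than `sigmas` sample standard
    deviations above the mean of the prior days (a simplified stand-in
    for a baseline alert, not New Relic's algorithm)."""
    if len(history_gb) < 2:
        return False  # not enough history to estimate variation
    return latest_gb > mean(history_gb) + sigmas * stdev(history_gb)


if __name__ == "__main__":
    history = [11800, 12100, 11900, 12200, 12000]  # invented daily GB values
    print(exceeds_monthly_target([12500] * 30, 360000))
    print(is_sudden_increase(history, 15000))
```

A static threshold catches slow drift past the monthly target, while the deviation check catches a sharp jump that may still be well under the target.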

In addition to using alerts to identify consumption anomalies, you can use Lookout to explore potential ingest anomalies.

Lookout View

Lookout accepts nearly any NRQL query and searches for anomalies over a given period of time. This view is based on the following query:

SELECT rate(sum(GigabytesIngested), 1 day) AS avgGbIngest FROM NrConsumption WHERE productLine='DataPlatform' FACET usageMetric
Lookout view usage metric

We can use Lookout to find anomalies in our ingest by usageMetric.

Change the facet field to consumingAccountName to get this view:

Lookout view consuming account

We can use Lookout to find anomalies in our ingest by consumingAccountName.

Install the entity breakdown dashboard (Optional)

In a previous section you installed the ingest baseline dashboard that uses NrConsumption as its primary source. In addition to that high-level view, you can create other visualizations that use bytecountestimate() to estimate ingest for nearly any event or metric. A detailed overview of bytecountestimate() was given in the prerequisites section.

  1. Go to the same quickstart you used for the baseline dashboard.

  2. Click Install this quickstart in the upper right section of your browser window.

  3. Don't install this instance of the dashboard into a POA account. Instead, install it into any account that contains APM, Browser, or Mobile applications, or K8s clusters. You can install this dashboard into multiple accounts. You can also install it into a top-level parent account and modify the dashboard so you have account-specific charts all in one dashboard.

  4. Click Done since there is no agent to install.

  5. When the quickstart is done installing, open the Data Governance Entity Breakdowns dashboard.

    Entity breakdown dashboard

    The entity breakdown dashboard uses bytecountestimate() to facet ingest by useful attributes such as application or cluster name.

You can refer back to this section to see exactly which event types are used in these breakdowns.

Tip

These queries consume more resources because they don't work from a pre-aggregated data source like NrConsumption. You may need to adjust the time frames by using additional WHERE and LIMIT clauses to make them work better in some of your environments.

Install the cloud integration dashboard (optional)

Cloud Integrations can often be a significant source of data ingest growth. Without good visualizations it can be very difficult to pinpoint where the growth is coming from. This is partly because these integrations are so easy to configure and they are not part of an organization's normal CI/CD pipeline. They may also not be part of a formal configuration management system. Fortunately this powerful set of dashboards can be installed directly from New Relic I/O. Individual dashboards installed by this package include:

  • AWS Integrations
  • Azure Integrations
  • GCP Integrations
  • On-Host Integrations
  • Kubernetes
Infra integrations dashboard

This quickstart contains a highly granular set of dashboards that break down data by nearly every cloud integration and on-host integration, as well as the K8s integration.

Exercise

Answering the following questions will help you develop confidence in your ability to interpret baseline data and make correct inferences. These questions can be answered using the Data Governance Baseline and Data Governance Entity Breakdowns dashboards. Install those dashboards as described and see how many of these questions you can answer.

Questions
  1. What is the typical daily ingest rate for the entire organization (all sub-accounts) in the past week? What was it three months prior?
  2. What are the top three telemetry types (for the organization as a whole) by ingest? List each telemetry type and its most recent 30 day ingest rate.
  3. How many sub-accounts contribute to this organization's ingest?
  4. How many sub-accounts (if any) currently contribute more than 50 TB per month?
  5. What are the top three sub-accounts in terms of ingest for the past 30 days?
  6. What is the GB ingest for the calendar month of this past January for the highest consuming sub-account?
  7. What are the top three sub-accounts in terms of ApmEventsBytes ingest for the past 30 days?
  8. What is the single largest increase in terms of telemetry type ingest for a given sub-account in the last 9 months? What about decreases?
  9. Go to the account that contributes the most ApmEventsBytes and install/open the Data Governance Entity Breakdowns dashboard. List the top three APM applications by ingest for the past 24 hours and their respective 24 hour ingest rates.

Conclusion

The process section took you through the creation of data ingest visualizations and reports. You can now review data ingest with a data-driven, visual approach that you and your peers can collaborate around.

Going forward, decide which visualizations to use for:

Additional resources

Copyright © 2022 New Relic Inc.