
# HTTP

The HTTP scraper allows you to collect data from HTTP endpoints and APIs. It supports various authentication methods and data transformation capabilities.

```yaml title="lastfm-scraper.yaml"
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: lastfm-scraper
spec:
  http:
    - type: 'LastFM::Singer'
      name: '$.name'
      id: '$.url'
      env:
        - name: api_key
          valueFrom:
            secretKeyRef:
              name: lastfm
              key: API_KEY
      url: 'http://ws.audioscrobbler.com/2.0/?method=chart.gettopartists&api_key={{.api_key}}&format=json'
      transform:
        expr: |
          dyn(config).artists.artist.map(item, item).toJSON()
```
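
This example calls the Last.fm `chart.gettopartists` endpoint. The `api_key` query parameter is templated from a Kubernetes secret via `env`, and the CEL transform unwraps the `artists.artist` array so that each artist becomes its own config item, with its `name` and `url` fields mapped to the item's name and ID.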
| Field | Description | Scheme | Required |
|-------|-------------|--------|----------|
| `schedule` | Specify the interval to scrape in cron format. Defaults to every 15 minutes. | Cron | |
| `retention` | Settings for retaining changes, analysis, and scraped items. | Retention | |
| `http` | Specifies the list of HTTP configurations to scrape. | [[]HTTP](#http-configuration) | |
| `logLevel` | Specify the level of logging. | string | |
| `full` | Set to `true` to extract changes & access logs from scraped configurations. Defaults to `false`. | bool | |
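
As an illustration, the top-level fields sit alongside the `http` list like this (a minimal sketch; the name, schedule, type, and endpoint are placeholder values):

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: http-scraper # placeholder name
spec:
  schedule: '@every 30m' # override the 15-minute default
  logLevel: debug
  full: false
  http:
    - type: 'Example::Item' # hypothetical type
      id: '$.id'
      name: '$.name'
      url: 'https://api.example.com/items' # hypothetical endpoint
```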

## HTTP Configuration

| Field | Description | Scheme |
|-------|-------------|--------|
| `url`* | The URL to send the HTTP request to. Must include the scheme (`http://` or `https://`). | string |
| `bearer` | Bearer token for authentication. | EnvVar |
| `body` | Request body for POST/PUT requests. | string |
| `connection` | Reference to a pre-configured HTTP connection. Use this to reuse connection settings across multiple scrapers. | string |
| `digest` | Enable Digest authentication, a more secure alternative to Basic authentication. | boolean |
| `env` | Environment variables to be used in the templating. | []EnvVar |
| `headers` | HTTP headers to include in the request. | map[string]EnvVar |
| `method` | HTTP method to use (GET, POST, etc.). | string |
| `ntlm` | Enable Windows NTLM authentication protocol. Typically used in corporate environments. | boolean |
| `ntlmv2` | Enable NTLMv2 authentication protocol, a more secure version of NTLM. | boolean |
| `oauth.clientID` | OAuth 2.0 client identifier. | EnvVar |
| `oauth.clientSecret` | OAuth 2.0 client secret. | EnvVar |
| `oauth.params` | Additional OAuth 2.0 parameters to include in the token request. | map[string]string |
| `oauth.scopes` | List of OAuth 2.0 scopes. | []string |
| `oauth.tokenURL` | OAuth 2.0 token endpoint URL. | string |
| `password` | Password for Basic or Digest authentication. | EnvVar |
| `tls.ca` | Custom Certificate Authority (CA) certificate for TLS verification. Used for self-signed or internal certificates. | EnvVar |
| `tls.cert` | Client TLS certificate for mutual TLS authentication (mTLS). | EnvVar |
| `tls.handshakeTimeout` | Maximum time to wait for TLS handshake completion, e.g. `30s`, `1m`. | Duration |
| `tls.insecureSkipVerify` | Skip TLS certificate verification. Use with caution; only enable in trusted environments or for testing. | boolean |
| `tls.key` | Private key corresponding to the client TLS certificate for mTLS. | EnvVar |
| `username` | Username for Basic or Digest authentication. | EnvVar |
| `labels` | Labels for each config item. | map[string]string |
| `properties` | Custom templatable properties for the scraped config items. | []ConfigProperty |
| `tags` | Tags for each config item. Max allowed: 5. | []ConfigTag |
| `transform` | Transform configs after they've been scraped. | Transform |
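
As an illustration of the authentication fields, here is a hedged sketch of a scraper that sends Basic auth credentials pulled from a Kubernetes secret and adds a custom header (the secret name, endpoint, and type are assumptions, not part of this documentation):

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: inventory-scraper # hypothetical name
spec:
  http:
    - type: 'Example::Device' # hypothetical type
      id: '$.serial'
      name: '$.hostname'
      url: 'https://inventory.example.com/api/devices' # hypothetical endpoint
      method: GET
      username:
        valueFrom:
          secretKeyRef:
            name: inventory-creds # hypothetical secret
            key: USERNAME
      password:
        valueFrom:
          secretKeyRef:
            name: inventory-creds
            key: PASSWORD
      headers:
        Accept:
          value: application/json
```

Setting `digest: true` alongside the same `username` and `password` fields would switch the request from Basic to Digest authentication.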

## Mapping

Custom scrapers require you to define the `id` and `type` for each scraped item. For example, when you scrape a file containing a JSON array, where each array element represents a config item, you must specify the `id` and `type` for those items. You achieve this with mappings in your custom scraper configuration, as in the sketch after the table below.

| Field | Description | Scheme |
|-------|-------------|--------|
| `id`* | A static value or JSONPath expression to use as the ID for the resource. | string or JSONPath |
| `name`* | A static value or JSONPath expression to use as the name for the resource. | string or JSONPath |
| `type`* | A static value or JSONPath expression to use as the type for the resource. | string or JSONPath |
| `class` | A static value or JSONPath expression to use as the class for the resource. | string or JSONPath |
| `createFields` | A list of JSONPath expressions used to identify the created time of the config. If multiple fields are specified, the first non-empty value is used. | []jsonpath |
| `deleteFields` | A list of JSONPath expressions used to identify the deleted time of the config. If multiple fields are specified, the first non-empty value is used. | []jsonpath |
| `description` | A static value or JSONPath expression to use as the description for the resource. | string or JSONPath |
| `format` | Format of the config item. Defaults to JSON; available options are JSON and properties. See [Formats](#formats). | string |
| `health` | A static value or JSONPath expression to use as the health of the config item. | string or JSONPath |
| `items` | A JSONPath expression used to extract individual items from the resource. Items are extracted first, and then the ID, name, type, and transformations are applied to each item. | JSONPath |
| `status` | A static value or JSONPath expression to use as the status of the config item. | string or JSONPath |
| `timestampFormat` | A Go time format string used to parse timestamps in `createFields` and `deleteFields`. Defaults to RFC3339. | string |
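
For instance, a minimal sketch combining `items` with timestamp mappings might look like this (the name, type, endpoint, and field paths are illustrative assumptions):

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: orders-scraper # hypothetical name
spec:
  http:
    - type: 'Example::Order' # hypothetical type
      url: 'https://api.example.com/orders' # hypothetical endpoint
      items: '$.orders' # each array element becomes its own config item
      id: '$.order_id'
      name: '$.reference'
      createFields:
        - '$.created_at'
      deleteFields:
        - '$.cancelled_at'
      timestampFormat: '2006-01-02T15:04:05Z07:00' # Go layout for RFC3339
```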

## Formats

### JSON

The scraper stores config items as jsonb fields in PostgreSQL.

Resource providers typically return JSON, e.g. `kubectl get -o json` or `aws --output=json`.

When you display the config, the UI automatically converts the JSON data to YAML for improved readability.

### XML / Properties

The scraper stores non-JSON files as JSON using:

```json
{ "format": "xml", "content": "<root>..</root>" }
```

You can still access non-JSON content in scripts using `config.content`.

The UI formats and renders XML appropriately.
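
For example, a transform could branch on the raw XML content (a hedged sketch; the type, endpoint, and expression are illustrative, following the CEL style of the Last.fm example above):

```yaml
http:
  - type: 'Example::Feed' # hypothetical type
    id: 'status-feed' # static ID, since the payload isn't JSON
    name: 'Status Feed'
    url: 'https://status.example.com/feed.xml' # hypothetical XML endpoint
    transform:
      expr: |
        config.content.contains('<outage>')
          ? '{"status": "degraded"}'
          : '{"status": "healthy"}'
```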

## Extracting Changes & Access Logs

Custom scrapers ingest changes & access logs from external systems when you enable the `full` option.

Each scraped config is expected to have these three top-level fields:

- `config`
- `changes`
- `access_logs`

:::info
A config may include additional fields or omit some of these three; only these fields are extracted.
:::

Consider a file that contains the following JSON data:

```json
{
  "reg_no": "A123",
  "config": {
    "meta": "this is the actual config that'll be stored."
  },
  "changes": [
    {
      "action": "drive",
      "summary": "car color changed to blue",
      "unrelated_stuff": 123
    }
  ],
  "access_logs": [
    {
      "config_id": "99024949-9118-4dcb-a3a0-b8f1536bebd0",
      "external_user_id": "a3542241-4750-11f0-8000-e0146ce375e6",
      "created_at": "2025-01-01"
    },
    {
      "config_id": "9d9e51a7-6956-413e-a07e-a6aeb3f4877f",
      "external_user_id": "a5c2e8e3-4750-11f0-8000-f4eaacabd632",
      "created_at": "2025-01-02"
    }
  ]
}
```

A regular scraper saves the entire JSON as a single config. With the `full` option, the scraper instead extracts the config, changes, and access logs separately.

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: file-scraper
spec:
  full: true
  file:
    - type: Car
      id: $.reg_no
      paths:
        - fixtures/data/car_changes.json
```

The resulting config is:

```json
{
  "meta": "this is the actual config that'll be stored."
}
```

and the scraper records the following new config change on that config:

```json
{
  "action": "drive",
  "summary": "car color changed to blue",
  "unrelated_stuff": 123
}
```

and the access logs are saved as:

```json
[
  {
    "config_id": "99024949-9118-4dcb-a3a0-b8f1536bebd0",
    "external_user_id": "a3542241-4750-11f0-8000-e0146ce375e6",
    "created_at": "2025-01-01"
  },
  {
    "config_id": "9d9e51a7-6956-413e-a07e-a6aeb3f4877f",
    "external_user_id": "a5c2e8e3-4750-11f0-8000-f4eaacabd632",
    "created_at": "2025-01-02"
  }
]
```