
# HTTP

The HTTP scraper allows you to collect data from HTTP endpoints and APIs. It supports various authentication methods and data transformation capabilities.

```yaml title="lastfm-scraper.yaml"
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: lastfm-scraper
spec:
  http:
    - type: 'LastFM::Singer'
      name: '$.name'
      id: '$.url'
      env:
        - name: api_key
          valueFrom:
            secretKeyRef:
              name: lastfm
              key: API_KEY
      url: 'http://ws.audioscrobbler.com/2.0/?method=chart.gettopartists&api_key={{.api_key}}&format=json'
      transform:
        expr: |
          dyn(config).artists.artist.map(item, item).toJSON()
```
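
This example calls the Last.fm `chart.gettopartists` endpoint. The `api_key` query parameter is templated from a Kubernetes secret via `env`, and the CEL transform unwraps the `artists.artist` array so that each artist becomes its own config item, with its `name` and `url` fields mapped to the item's name and ID.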
| Field | Description | Scheme | Required |
|-------|-------------|--------|----------|
| `schedule` | Specify the interval to scrape in cron format. Defaults to every 15 minutes. | Cron | |
| `retention` | Settings for retaining changes, analysis, and scraped items. | Retention | |
| `http` | Specifies the list of HTTP configurations to scrape. | [[]HTTP](#http-configuration) | |
| `logLevel` | Specify the level of logging. | string | |
| `full` | Set to `true` to extract changes & access logs from scraped configurations. Defaults to `false`. | bool | |
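
As an illustration, the top-level fields sit alongside the `http` list like this (a minimal sketch; the name, schedule, type, and endpoint are placeholder values):

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: http-scraper # placeholder name
spec:
  schedule: '@every 30m' # override the 15-minute default
  logLevel: debug
  full: false
  http:
    - type: 'Example::Item' # hypothetical type
      id: '$.id'
      name: '$.name'
      url: 'https://api.example.com/items' # hypothetical endpoint
```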

## HTTP Configuration

| Field | Description | Scheme |
|-------|-------------|--------|
| `url`* | The URL to send the HTTP request to. Must include the scheme (`http://` or `https://`). | string |
| `bearer` | Bearer token for authentication. | EnvVar |
| `body` | Request body for POST/PUT requests. | string |
| `connection` | Reference to a pre-configured HTTP connection. Use this to reuse connection settings across multiple scrapers. | string |
| `digest` | Enable Digest authentication, a more secure alternative to Basic authentication. | boolean |
| `env` | Environment variables to be used in the templating. | []EnvVar |
| `headers` | HTTP headers to include in the request. | map[string]EnvVar |
| `method` | HTTP method to use (GET, POST, etc.). | string |
| `ntlm` | Enable Windows NTLM authentication protocol. Typically used in corporate environments. | boolean |
| `ntlmv2` | Enable NTLMv2 authentication protocol, a more secure version of NTLM. | boolean |
| `oauth.clientID` | OAuth 2.0 client identifier. | EnvVar |
| `oauth.clientSecret` | OAuth 2.0 client secret. | EnvVar |
| `oauth.params` | Additional OAuth 2.0 parameters to include in the token request. | map[string]string |
| `oauth.scopes` | List of OAuth 2.0 scopes. | []string |
| `oauth.tokenURL` | OAuth 2.0 token endpoint URL. | string |
| `password` | Password for Basic or Digest authentication. | EnvVar |
| `tls.ca` | Custom Certificate Authority (CA) certificate for TLS verification. Used for self-signed or internal certificates. | EnvVar |
| `tls.cert` | Client TLS certificate for mutual TLS authentication (mTLS). | EnvVar |
| `tls.handshakeTimeout` | Maximum time to wait for TLS handshake completion, e.g. `30s`, `1m`. | Duration |
| `tls.insecureSkipVerify` | Skip TLS certificate verification. Use with caution; only enable in trusted environments or for testing. | boolean |
| `tls.key` | Private key corresponding to the client TLS certificate for mTLS. | EnvVar |
| `username` | Username for Basic or Digest authentication. | EnvVar |
| `labels` | Labels for each config item. | map[string]string |
| `properties` | Custom templatable properties for the scraped config items. | []ConfigProperty |
| `tags` | Tags for each config item. Max allowed: 5. | []ConfigTag |
| `transform` | Transform configs after they've been scraped. | Transform |
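
As an illustration of the authentication fields, here is a hedged sketch of a scraper that sends Basic auth credentials pulled from a Kubernetes secret and adds a custom header (the secret name, endpoint, and type are assumptions, not part of this documentation):

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: inventory-scraper # hypothetical name
spec:
  http:
    - type: 'Example::Device' # hypothetical type
      id: '$.serial'
      name: '$.hostname'
      url: 'https://inventory.example.com/api/devices' # hypothetical endpoint
      method: GET
      username:
        valueFrom:
          secretKeyRef:
            name: inventory-creds # hypothetical secret
            key: USERNAME
      password:
        valueFrom:
          secretKeyRef:
            name: inventory-creds
            key: PASSWORD
      headers:
        Accept:
          value: application/json
```

Setting `digest: true` alongside the same `username` and `password` fields would switch the request from Basic to Digest authentication.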

## Mapping

Custom scrapers require you to define the `id` and `type` for each scraped item. For example, when you scrape a file containing a JSON array, where each array element represents a config item, you must specify the `id` and `type` for those items. You achieve this with mappings in your custom scraper configuration, as in the sketch after the table below.

| Field | Description | Scheme |
|-------|-------------|--------|
| `id`* | A static value or JSONPath expression to use as the ID for the resource. | string or JSONPath |
| `name`* | A static value or JSONPath expression to use as the name for the resource. | string or JSONPath |
| `type`* | A static value or JSONPath expression to use as the type for the resource. | string or JSONPath |
| `class` | A static value or JSONPath expression to use as the class for the resource. | string or JSONPath |
| `createFields` | A list of JSONPath expressions used to identify the created time of the config. If multiple fields are specified, the first non-empty value is used. | []jsonpath |
| `deleteFields` | A list of JSONPath expressions used to identify the deleted time of the config. If multiple fields are specified, the first non-empty value is used. | []jsonpath |
| `description` | A static value or JSONPath expression to use as the description for the resource. | string or JSONPath |
| `format` | Format of the config item. Defaults to JSON; available options are JSON and properties. See [Formats](#formats). | string |
| `health` | A static value or JSONPath expression to use as the health of the config item. | string or JSONPath |
| `items` | A JSONPath expression used to extract individual items from the resource. Items are extracted first, and then the ID, name, type, and transformations are applied to each item. | JSONPath |
| `status` | A static value or JSONPath expression to use as the status of the config item. | string or JSONPath |
| `timestampFormat` | A Go time format string used to parse timestamps in `createFields` and `deleteFields`. Defaults to RFC3339. | string |
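
For instance, a minimal sketch combining `items` with timestamp mappings might look like this (the name, type, endpoint, and field paths are illustrative assumptions):

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: orders-scraper # hypothetical name
spec:
  http:
    - type: 'Example::Order' # hypothetical type
      url: 'https://api.example.com/orders' # hypothetical endpoint
      items: '$.orders' # each array element becomes its own config item
      id: '$.order_id'
      name: '$.reference'
      createFields:
        - '$.created_at'
      deleteFields:
        - '$.cancelled_at'
      timestampFormat: '2006-01-02T15:04:05Z07:00' # Go layout for RFC3339
```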

## Formats

### JSON

The scraper stores config items as jsonb fields in PostgreSQL.

Resource providers typically return JSON, e.g. `kubectl get -o json` or `aws --output=json`.

When you display the config, the UI automatically converts the JSON data to YAML for improved readability.

### XML / Properties

The scraper stores non-JSON files as JSON using:

```json
{ "format": "xml", "content": "<root>..</root>" }
```

You can still access non-JSON content in scripts using `config.content`.

The UI formats and renders XML appropriately.
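
For example, a transform could branch on the raw XML content (a hedged sketch; the type, endpoint, and expression are illustrative, following the CEL style of the Last.fm example above):

```yaml
http:
  - type: 'Example::Feed' # hypothetical type
    id: 'status-feed' # static ID, since the payload isn't JSON
    name: 'Status Feed'
    url: 'https://status.example.com/feed.xml' # hypothetical XML endpoint
    transform:
      expr: |
        config.content.contains('<outage>')
          ? '{"status": "degraded"}'
          : '{"status": "healthy"}'
```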

## Extracting Changes & Access Logs

Custom scrapers ingest changes & access logs from external systems when you enable the `full` option.

Each scraped config is expected to have these three top-level fields:

- `config`
- `changes`
- `access_logs`

:::info
A config may include additional fields or omit some of these three; only these fields are extracted.
:::

Consider a file that contains the following JSON data:

```json
{
  "reg_no": "A123",
  "config": {
    "meta": "this is the actual config that'll be stored."
  },
  "changes": [
    {
      "action": "drive",
      "summary": "car color changed to blue",
      "unrelated_stuff": 123
    }
  ],
  "access_logs": [
    {
      "config_id": "99024949-9118-4dcb-a3a0-b8f1536bebd0",
      "external_user_id": "a3542241-4750-11f0-8000-e0146ce375e6",
      "created_at": "2025-01-01"
    },
    {
      "config_id": "9d9e51a7-6956-413e-a07e-a6aeb3f4877f",
      "external_user_id": "a5c2e8e3-4750-11f0-8000-f4eaacabd632",
      "created_at": "2025-01-02"
    }
  ]
}
```

A regular scraper saves the entire JSON as a single config. With the `full` option, the scraper instead extracts the config, changes, and access logs separately.

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: file-scraper
spec:
  full: true
  file:
    - type: Car
      id: $.reg_no
      paths:
        - fixtures/data/car_changes.json
```

The resulting config is:

```json
{
  "meta": "this is the actual config that'll be stored."
}
```

and the scraper records the following new config change on that config:

```json
{
  "action": "drive",
  "summary": "car color changed to blue",
  "unrelated_stuff": 123
}
```

and the access logs are saved as:

```json
[
  {
    "config_id": "99024949-9118-4dcb-a3a0-b8f1536bebd0",
    "external_user_id": "a3542241-4750-11f0-8000-e0146ce375e6",
    "created_at": "2025-01-01"
  },
  {
    "config_id": "9d9e51a7-6956-413e-a07e-a6aeb3f4877f",
    "external_user_id": "a5c2e8e3-4750-11f0-8000-f4eaacabd632",
    "created_at": "2025-01-02"
  }
]
```