Ingestion

From S3

To ingest data from S3, you need to specify a host to connect to. The other settings are optional:

spec:
  clusterConfig:
    ingestion:
      s3connection:
        host: yourhost.com # (1)
        port: 80 # optional (2)
        credentials: # optional (3)
        ...

1	The S3 host. Required.
2	The port. Optional, defaults to 80.
3	Credentials to use. Since these might be bucket-dependent, they can instead be given in the ingestion job. Specifying the credentials here is explained below.

You can specify a connection or bucket for ingestion, for deep storage, or for both, but Druid supports only a single S3 connection under the hood. If two connections are specified, they must be identical. This is easiest to guarantee with a dedicated S3Connection resource, defined as a separate object rather than inline.
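
As an illustration of the dedicated-object approach, the following is a minimal sketch rather than a definitive manifest. It assumes that the S3Connection resource uses the apiVersion s3.stackable.tech/v1alpha1 and that a connection is referenced by name through a reference key; check both against the CRDs installed in your cluster. The name shared-s3-connection is hypothetical.

# Assumption: apiVersion and field layout of the S3Connection resource
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: shared-s3-connection # hypothetical name
spec:
  host: yourhost.com
  port: 80
---
# Fragment of the Druid cluster definition that points at the object above
spec:
  clusterConfig:
    ingestion:
      s3connection:
        reference: shared-s3-connection # assumed reference syntax

Referencing the same object from the deep storage configuration as well would keep both connections identical by construction.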

Prior to Druid 37.0.0, the S3Connection region field is ignored because those versions use the AWS SDK v1, which ignores the region if the endpoint is set. Because the host is a required field, the endpoint is always set.

Since Druid 37.0.0, TLS is required for S3.

S3 credentials

Regardless of whether a connection is specified inline or as a separate object, the credentials are always specified in the same way. You need a Secret containing the access key ID and secret access key, a SecretClass, and a reference to this SecretClass wherever you want to specify the credentials.

The Secret:

apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  labels:
    secrets.stackable.tech/class: s3-credentials-class # (1)
stringData:
  accessKey: YOUR_VALID_ACCESS_KEY_ID_HERE
  secretKey: YOUR_SECRET_ACCESS_KEY_THAT_BELONGS_TO_THE_KEY_ID_HERE

1	This label connects the Secret to the SecretClass.

The SecretClass:

apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: s3-credentials-class
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}

Referencing it:

...
credentials:
  secretClass: s3-credentials-class
...
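
Put together with the ingestion settings from the beginning of this section, an inline connection with credentials might look like the sketch below. It only combines fields already shown on this page; the exact nesting is assumed to match the first snippet.

spec:
  clusterConfig:
    ingestion:
      s3connection:
        host: yourhost.com
        port: 80
        credentials:
          # Name of the SecretClass defined above; the Secret is found
          # through its secrets.stackable.tech/class label
          secretClass: s3-credentials-class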

Adding external files, e.g. for ingestion

Since Druid actively runs ingestion tasks, the processes may need access to extra files.

These could be, for example, client certificates used to connect to a Kafka cluster or a keytab to obtain a Kerberos ticket.

To make these files available, the operator lets you specify extra volumes that are added to all Pods deployed for this cluster.

spec:
  clusterConfig:
    extraVolumes:
      - name: google-service-account
        secret:
          secretName: google-service-account

All Volumes specified in this section are made available under /stackable/userdata/{volumename}.
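
To make the resulting path concrete, here is a hedged sketch of a Secret that could back the google-service-account volume above. The key name key.json and its value are hypothetical placeholders. Kubernetes exposes each key of a mounted Secret as one file, so this key would be expected at /stackable/userdata/google-service-account/key.json.

apiVersion: v1
kind: Secret
metadata:
  name: google-service-account # must match secretName in extraVolumes
stringData:
  # Hypothetical key; each key becomes one file in the mounted volume
  key.json: YOUR_SERVICE_ACCOUNT_JSON_HERE

An ingestion spec or runtime property that needs the file can then point at that path.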