The configuration in the directory for Fluentd / td-agent is used
to watch changes to Docker log files. The kubelet creates symlinks that
capture the pod name, namespace, container name & Docker container ID
to the docker logs for pods in /var/log/containers directory on the host.
If running this fluentd configuration in a Docker container, the /var/log
directory should be mounted in the container.

These logs are then submitted to Elasticsearch which assumes the
installation of the fluent-plugin-elasticsearch & the
fluent-plugin-kubernetes_metadata_filter plugins.
See https://github.com/uken/fluent-plugin-elasticsearch &
https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter for
more information about the plugins.

Maintainer: OpenShift Integration Services <dev@lists.openshift.redhat.com>

Example
=======
A line in the Docker log file might look like this JSON:

{"log":"2014/09/25 21:15:03 Got request with path wombat\n",
 "stream":"stderr",
  "time":"2014-09-25T21:15:03.499185026Z"}

The time_format specification below makes sure we properly
parse the time format produced by Docker. This will be
submitted to Elasticsearch and should appear like:
$ curl 'http://elasticsearch-logging.default:9200/_search?pretty'
...
{
     "_index" : "logstash-2014.09.25",
     "_type" : "fluentd",
     "_id" : "VBrbor2QTuGpsQyTCdfzqA",
     "_score" : 1.0,
     "_source":{"log":"2014/09/25 22:45:50 Got request with path wombat\n",
                "stream":"stderr","tag":"docker.container.all",
                "@timestamp":"2014-09-25T22:45:50+00:00"}
   },
...

The Kubernetes fluentd plugin is used to write the Kubernetes metadata to the log
record & add labels to the log record if properly configured. This enables users
to filter & search logs on any metadata.
For example a Docker container's logs might be in the directory:

 /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b

and in the file:

 997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log

where 997599971ee6... is the Docker ID of the running container.
The Kubernetes kubelet makes a symbolic link to this file on the host machine
in the /var/log/containers directory which includes the pod name and the Kubernetes
container name:

   synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
   ->
   /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log

The /var/log directory on the host is mapped to the /var/log directory in the container
running this instance of Fluentd and we end up collecting the file:

  /var/log/containers/synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log

This results in the tag:

 var.log.containers.synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log

The Kubernetes fluentd plugin is used to extract the namespace, pod name & container name
which are added to the log message as a kubernetes field object & the Docker container ID
is also added under the docker field object.
The final tag is:

  kubernetes.var.log.containers.synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log

And the final log record look like:

{
  "log":"2014/09/25 21:15:03 Got request with path wombat\n",
  "stream":"stderr",
  "time":"2014-09-25T21:15:03.499185026Z",
  "kubernetes": {
    "namespace": "default",
    "pod_name": "synthetic-logger-0.25lps-pod",
    container_name: "synth-lgr"
  },
  "docker": {
    "container_id": "997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b"
  }
}

This makes it easier for users to search for logs by pod name or by
the name of the Kubernetes container regardless of how many times the
Kubernetes pod has been restarted (resulting in a several Docker container IDs).

