Streaming logs
If you have any questions about the real-time log streaming service, please contact us at support@transparentedge.eu.
To activate the real-time log streaming service, simply access the dashboard at https://dashboard.transparentcdn.com/ and go to the Logs section. In the Streaming tab, you will find everything you need.
Once the service is activated, you can download a zip file that contains the necessary digital certificates to authenticate your consumers, as well as a set of preconfigured templates with your data for consuming logs using Filebeat, Logstash, and Python.
Furthermore, you can easily add the IP addresses where you will install the consumer(s), so that the necessary firewall rules are automatically adjusted on our brokers.
The templates are preconfigured with all the necessary data, but let's go through the contents of the zip file and certain important requirements and parameters to consider. One very important parameter is the Consumer Group.
Connection Parameters:
The address of our brokers:
kafka1.edgetcdn.io
kafka2.edgetcdn.io
kafka3.edgetcdn.io
The port to use, which will be 9093
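As a quick orientation, the brokers and port above combine into the comma-separated "bootstrap servers" string that Kafka clients expect. A minimal sketch in Python:

```python
# Combine the broker hostnames and port into the "bootstrap servers"
# string used by Kafka clients (comma-separated host:port pairs).
BROKERS = ["kafka1.edgetcdn.io", "kafka2.edgetcdn.io", "kafka3.edgetcdn.io"]
PORT = 9093

bootstrap_servers = ",".join(f"{host}:{PORT}" for host in BROKERS)
print(bootstrap_servers)
```

This same string appears in the Filebeat, Logstash, and Python templates described below.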
Contents of the zip:
Client's public certificate:
c<ID>.crt.pem
Client's private key:
c<ID>.key.pem
Keystore in PKCS12 format:
c<ID>.keystore.p12
The password used to encrypt the keystore and truststore:
password.txt
Public certificate of our CA:
transparentcdnCA.pem
Truststore with our CA (required in some consumers):
truststore.p12
Filebeat template:
filebeat.yml
Logstash template:
kafka-logstash.conf
Simple Python consumer:
consumer.py
Other data:
The Topic to subscribe to will be c<ID>.
The Consumer Groups that join your consumers must share a common prefix. For example, if your ID is 83, you will subscribe to the topic c83, and you can join your consumers to any "Consumer Group" starting with c83_, such as c83_group1, c83_test, c83_pre... You can find more information about consumer groups in the "What are Consumer Groups?" section below.
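Assuming the example ID 83 from above, the naming convention can be checked mechanically; a small illustrative sketch:

```python
# Illustrative only: topic and consumer group naming for client ID 83.
CLIENT_ID = 83
topic = f"c{CLIENT_ID}"          # topic to subscribe to: "c83"
group_prefix = f"{topic}_"       # required consumer group prefix: "c83_"

# Any group name starting with the prefix is valid.
for group in ["c83_group1", "c83_test", "c83_pre"]:
    print(group, group.startswith(group_prefix))
```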
We will need:
The IP address(es) from where your consumers will connect. You can add them from the dashboard (please allow a margin of 5 minutes for them to be active on our firewall).
Consuming the logs
Currently, there are numerous destinations for your logs. You may be interested in ingesting them into Elasticsearch for data analytics, or perhaps uploading them to a third-party service like Datadog or Amazon S3. The options are nearly endless and will greatly depend on your business needs.
That's why, staying true to our philosophy of keeping things as simple as possible, we are going to suggest that you use two widely used tools in the community: Filebeat and/or Logstash to consume your logs from our Streaming Log system.
Filebeat vs Logstash
It is very common, especially for those unfamiliar with these technologies, to be unsure when to use Logstash, when to use Filebeat, or when to use them together (which is also possible). Here we will try to explain it in a somewhat simplified manner so that you can make an informed decision.
Logstash is a Java-based program that is part of the ELK stack (Elasticsearch, Logstash, Kibana), developed and maintained by Elastic.
Filebeat, on the other hand, is written in Go by the same company and emerged in response to the community's growing need for a lightweight tool to ship logs, since Logstash, being written in Java, consumes more resources.
Filebeat, as mentioned, is a very lightweight tool that ships logs from one place to another, similar to Logstash (though Logstash is not as lightweight). Logstash, however, is much more versatile and powerful: it can consume logs from a greater number of sources (inputs) and send them to a greater number of destinations (outputs).
Here are the links to the Logstash and Filebeat Input and Output documentation:
Therefore, using Logstash or Filebeat to retrieve logs from our Log Streaming system will depend on your specific needs, especially the final destination of the logs, the log rate per second, and whether you require any log transformations.
In summary, our recommendation is to use Filebeat whenever possible since it is lighter and easier to configure. If you need an output that is not supported by Filebeat or require log transformations, then you can use Logstash.
Remember that there is always a third option, also valid and covered in this documentation: writing your own consumer in your favorite programming language.
Consuming logs using Filebeat
Let's now go through a simple deployment of Filebeat on a Debian server as a first step, where we will store the logs in a text file.
The official documentation can be found at: https://www.elastic.co/guide/en/beats/filebeat/current/index.html
We will use the following example data, but remember that the zip file you downloaded after activating the service already contains a template called filebeat.yml
with all the necessary information.
Certificates
c83.crt.pem and c83.key.pem
Password:
password
Topic:
c83
Consumer group:
c83_filebeat
First, we download and install the Filebeat package on our server:
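One way to do this on Debian is via Elastic's APT repository, following Elastic's published instructions (the 8.x branch here is an assumption; pick the version you need):

```shell
# Add Elastic's signing key and APT repository (8.x branch assumed)
curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | \
  sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | \
  sudo tee /etc/apt/sources.list.d/elastic-8.x.list

# Install Filebeat
sudo apt-get update && sudo apt-get install -y filebeat
```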
We enable the Kafka module:
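This is done with Filebeat's own module command:

```shell
# Enable the Kafka module
sudo filebeat modules enable kafka
```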
We edit the Filebeat configuration /etc/filebeat/filebeat.yml
and paste the data we have in the filebeat.yml
template. You will need to edit the following parameters if you copy the certificates to different locations or modify the path where the files will be dumped:
ssl.certificate:
Location of c<ID>.crt.pem
ssl.key:
Location of c<ID>.key.pem
ssl.certificate_authorities:
Location of transparentcdnCA.pem
path:
Final destination path where Filebeat will write the consumed logs.
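For orientation, the relevant pieces of the filebeat.yml template can look roughly like this. This is only a sketch using the example data above; the certificate locations and output path are assumptions, and your downloaded template contains the real values:

```yaml
filebeat.inputs:
  - type: kafka
    hosts:
      - kafka1.edgetcdn.io:9093
      - kafka2.edgetcdn.io:9093
      - kafka3.edgetcdn.io:9093
    topics: ["c83"]
    group_id: "c83_filebeat"
    ssl.certificate: "/etc/filebeat/certs/c83.crt.pem"
    ssl.key: "/etc/filebeat/certs/c83.key.pem"
    ssl.certificate_authorities: ["/etc/filebeat/certs/transparentcdnCA.pem"]

output.file:
  path: "/var/log/transparentcdn"
  filename: "tcdn.log"
```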
On the server where you configure Filebeat, copy the public and private key of the certificate, as well as the Transparent Edge Services CA, to the paths you have defined in the configuration.
You will also need to create the folder defined in the path
if it doesn't exist.
Once everything is configured, you just need to start the Filebeat service, usually via systemd with the command systemctl start filebeat. If everything goes well, you will see logs being consumed in the path you defined in path:.
Consuming logs using Logstash
Now let's see how to consume our logs using Logstash.
Note: We will use the Keystore and Truststore instead of the private-public key pair, so you will need to copy them to the server where you run Logstash. Additionally, the default Systemd service uses the Logstash user, so it should have read permissions for these files. For the example, we will leave them in /etc/logstash/certs.
We will install Logstash from the official package repository. Execute the following commands to add the repository and install Logstash. Alternatively, you can follow the official guide at https://www.elastic.co/guide/en/logstash/current/installing-logstash.html
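As with Filebeat, one way on Debian is Elastic's APT repository (the 8.x branch is an assumption; it is the same repository used for Filebeat above):

```shell
# Add Elastic's signing key and APT repository (8.x branch assumed)
curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | \
  sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | \
  sudo tee /etc/apt/sources.list.d/elastic-8.x.list

# Install Logstash
sudo apt-get update && sudo apt-get install -y logstash
```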
Now we will configure a pipeline that will consume logs from our Kafka servers and dump them into a JSON file, categorizing each log field.
Remember that Logstash offers multiple inputs/outputs to different systems and allows you to customize and mutate logs. For more information, please refer to the following resources: (Input plugins, Output plugins).
To do this, create a new file in /etc/logstash/conf.d/kafka-logstash.conf
with the content you received in the kafka-logstash.conf
template in the zip file you downloaded from the panel. You will need to edit the following parameters if you copy the certificates to different locations or modify the file dumping path:
ssl_keystore_location:
Location of c<ID>.keystore.p12
ssl_truststore_location:
Location of truststore.p12
path => :
File dumping path
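For orientation, the kafka-logstash.conf template can look roughly like this. This is only a sketch using the example data above; the certificate locations, group name, and output path are assumptions, and your downloaded template contains the real values:

```
input {
  kafka {
    bootstrap_servers => "kafka1.edgetcdn.io:9093,kafka2.edgetcdn.io:9093,kafka3.edgetcdn.io:9093"
    topics => ["c83"]
    group_id => "c83_logstash"
    security_protocol => "SSL"
    ssl_keystore_location => "/etc/logstash/certs/c83.keystore.p12"
    ssl_keystore_password => "password"
    ssl_truststore_location => "/etc/logstash/certs/truststore.p12"
    ssl_truststore_password => "password"
  }
}

output {
  file {
    path => "/var/log/transparentcdn/tcdn.json"
    codec => json_lines
  }
}
```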
Since the service will be running with the user "logstash", we need to ensure that we have the correct permissions for the certificates, configuration, and the final directory where Logstash will write:
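A sketch of what that can look like (the paths are assumptions; adjust them to wherever you placed the certificates and the output directory):

```shell
# Let the "logstash" user read the certificates (assumed under /etc/logstash/certs)
sudo chown -R root:logstash /etc/logstash/certs
sudo chmod 640 /etc/logstash/certs/*

# Create the output directory (assumed path) and let logstash write to it
sudo mkdir -p /var/log/transparentcdn
sudo chown logstash:logstash /var/log/transparentcdn
```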
Now all that's left is to start Logstash. We can do it through the command line to verify that everything is working correctly:
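For example, running the pipeline in the foreground as the logstash user (using the Debian package's binary path):

```shell
# Run Logstash in the foreground with our pipeline to verify it works
sudo -u logstash /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/kafka-logstash.conf
```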
Remember to verify that all the necessary files are accessible by the "logstash" user, including the certificates and the destination file + folder.
After a few seconds, you will start receiving the logs, in JSON format, in the file or output that you have specified.
If everything went well, you can cancel the previous command and let the service run with:
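With systemd, that is:

```shell
# Start Logstash as a service and enable it at boot
sudo systemctl enable --now logstash
```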
Consuming logs through a custom Python script
You can write a consumer in many different programming languages. Let's see an example in Python using the confluent-kafka-python client library (https://github.com/confluentinc/confluent-kafka-python).
Another popular option is https://github.com/dpkp/kafka-python.
Here's the example (once again, we'll use Debian as the operating system):
We install the necessary packages:
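On Debian, one way is to install pip and then the client library from PyPI (package names assumed):

```shell
# Install pip and the confluent-kafka client library
sudo apt-get update && sudo apt-get install -y python3-pip
pip3 install confluent-kafka
```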
We create the Python file with the content you received in the consumer.py
template. Below are some example data. The important part is the configuration section. You will need to edit the following parameters if you copy the certificates to different locations:
ssl.ca.location:
Location of transparentcdnCA.pem
ssl.keystore.location:
Location of c<ID>.keystore.p12
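As a rough sketch of what a consumer along the lines of the consumer.py template can look like, using the example data above (the certificate paths and group name are illustrative assumptions; your downloaded template contains the real values):

```python
import sys

# Connection settings following the parameters described above.
# Certificate paths and the group name are illustrative assumptions.
conf = {
    "bootstrap.servers": "kafka1.edgetcdn.io:9093,kafka2.edgetcdn.io:9093,kafka3.edgetcdn.io:9093",
    "group.id": "c83_python",
    "auto.offset.reset": "earliest",
    "security.protocol": "ssl",
    "ssl.ca.location": "/etc/tcdn/transparentcdnCA.pem",
    "ssl.keystore.location": "/etc/tcdn/c83.keystore.p12",
    "ssl.keystore.password": "password",
}

TOPIC = "c83"


def main():
    # Imported here so the configuration above can be inspected
    # without having confluent-kafka installed.
    from confluent_kafka import Consumer

    consumer = Consumer(conf)
    consumer.subscribe([TOPIC])
    try:
        while True:
            msg = consumer.poll(timeout=1.0)
            if msg is None:
                continue
            if msg.error():
                print(f"Consumer error: {msg.error()}", file=sys.stderr)
                continue
            print(msg.value().decode("utf-8", errors="replace"))
    except KeyboardInterrupt:
        pass
    finally:
        consumer.close()  # commit offsets and leave the group cleanly
```

Calling main() starts the consumer, which blocks and prints each log line as it arrives; stop it with Ctrl-C.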
If everything is correct, upon starting the consumer, you will receive the following message and begin consuming from the topic:
What are Consumer Groups?
Traditionally, message brokers operated in two ways:
Queue: Messages are published once and consumed once.
Pub/Sub: Messages are published once and consumed multiple times.
Kafka can operate in both ways thanks to Consumer Groups:
If we want to act as a queue, we put all consumers in the same consumer group.
If we want to act as a pub/sub, each consumer goes in a different group.
Examples
Currently, we create topics with 2 partitions by default (which can be increased upon request). Let's see some examples with a topic with 2 partitions:
We start 3 consumers, all in the same consumer group. One of them consumes from partition 0, another from partition 1, and the last one remains idle. This achieves parallel processing and high availability. (High availability can also be achieved with 2 consumers: if one fails, the other will consume from both partitions). Messages will be distributed between consumer 1 and consumer 2.
We start 2 consumers, each in a different consumer group: Both will receive ALL messages from the topic and will be completely isolated. This is useful if we want to perform different processing on the received messages for each consumer. We can add more consumers to this scheme, and the result will be the same: each of them will receive all messages from all partitions.
Since we work with logs, and unless multiple different post-processes are required, it is most interesting to have the consumers in the same consumer group. It is highly likely that only one consumer is sufficient given the performance offered by Kafka. Additional consumers can be started if one of them cannot consume in real-time or if we want parallel processing + high availability.
Consuming WAF Logs
If you have the WAF service enabled, you can also consume real-time audit logs. Unlike the delivery service, these logs are in JSON format.
You can use the Python consumer mentioned earlier, with the only difference being the topic to which you subscribe. In this case, the topic will have the following format: c<ID>_waf.
For example, if your company has the <ID> (client identifier) 83, you should subscribe to the topic c83_waf.
Audit Log Format
The format of the WAF service is a standard JSON object.
This JSON contains all the relevant request data: the HTTP code, response headers, request headers, URL, method, client IP..., basically all the information. Additionally, it contains a field with all the details related to the attack detected.
Fields are referenced here using dot notation, separating each level with a dot, since that is the notation used by jq, a command-line JSON processor. For example, the field .transaction.messages.message in jq corresponds to ["transaction"]["messages"]["message"] in Python.
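To make the mapping concrete, here is the lookup applied to a hypothetical, heavily trimmed WAF log line (illustrative only; real logs contain many more fields):

```python
import json

# A hypothetical, heavily trimmed WAF log line (illustrative only).
raw = '{"transaction": {"messages": {"message": "SQL Injection Attack Detected"}}}'

event = json.loads(raw)

# jq: .transaction.messages.message
print(event["transaction"]["messages"]["message"])
```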
A single request can trigger one or multiple WAF rules, which is why the field .transaction.messages
is an array of different messages.
To facilitate the handling of logs, we send them from the CDN by separating each attack. Therefore, the field .transaction.messages
ceases to be an array and becomes a single JSON object that contains the information of a single attack.
If a malicious user makes a POST request to /wp-admin
and our CDN detects 2 attacks in the same request, you will receive 2 logs, each containing the information of one attack.
The "messages" field can be accessed with jq, but equally in Python or other languages. For each request, we examine the "message" field of the detected attack.
One of the important fields contained within messages is ruleId, a numerical value that allows us to set exceptions. Here is an example of the content of the messages field:
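Since the real content varies per rule, the following is only a hypothetical sketch of what a messages object can look like; the field names follow the ModSecurity-style audit format, and the values are invented:

```json
{
  "message": "SQL Injection Attack Detected via libinjection",
  "details": {
    "ruleId": "942100",
    "severity": "2",
    "match": "detected SQLi using libinjection",
    "data": "Matched Data: ' OR 1=1 found within ARGS:user",
    "tags": ["attack-sqli", "OWASP_CRS"]
  }
}
```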