In this post, we would like to explain a few common syslog-ng error and warning messages, what they mean, and how to solve them.
Destination queue full
Destination queue full, dropping messages; queue_len='10000', log_fifo_size='10000', count='4', persist_name='afsocket_dd_qfile(stream,serverdown:514)'
This is a very common and very important warning message, because it indicates message loss. The root cause of this warning is a missing flow-control flag in the log path.
Consider this very simple syslog-ng configuration:
source s_network {
    network(port(1212));
};
destination d_network {
    network("serverdown", port(514));
};
log {
    source(s_network);
    destination(d_network);
    #flags(flow-control);  # uncomment this line to avoid this warning message
};
Because the server (serverdown) is not available and the log path is not flow-controlled, syslog-ng keeps reading the source. Once the destination queue is full, the destination drops the newer messages and generates this warning message.
In the warning message, you can see the configured size of the queue (log_fifo_size='10000'), the number of messages in the queue (queue_len='10000', so it is full), and the number of dropped messages (count='4'). The destination is identified by its address:port pair (serverdown:514) in persist_name.
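If you need to check many of these warnings, the fields can be extracted with a short script. Here is a minimal Python sketch, assuming the exact message format shown above. The function names (parse_queue_full_warning, queue_is_full, destination_of) are our own, and destination_of simply takes the text after the last comma in persist_name, which works for this example but is not a documented format.

```python
import re

# Matches key='value' pairs as they appear in the warning message above.
_KV_RE = re.compile(r"(\w+)='([^']*)'")


def parse_queue_full_warning(line: str) -> dict:
    """Extract the fields of a 'Destination queue full' warning into a dict."""
    fields = dict(_KV_RE.findall(line))
    # Numeric fields become ints; persist_name stays a string.
    for key in ("queue_len", "log_fifo_size", "count"):
        if key in fields:
            fields[key] = int(fields[key])
    return fields


def queue_is_full(fields: dict) -> bool:
    """The queue is full when its length has reached the configured log_fifo_size."""
    return fields["queue_len"] >= fields["log_fifo_size"]


def destination_of(fields: dict) -> str:
    # Heuristic for this example only:
    # 'afsocket_dd_qfile(stream,serverdown:514)' -> 'serverdown:514'
    return fields["persist_name"].rsplit(",", 1)[-1].rstrip(")")


if __name__ == "__main__":
    msg = ("Destination queue full, dropping messages; queue_len='10000', "
           "log_fifo_size='10000', count='4', "
           "persist_name='afsocket_dd_qfile(stream,serverdown:514)'")
    f = parse_queue_full_warning(msg)
    print(destination_of(f), f["count"], queue_is_full(f))  # prints: serverdown:514 4 True
```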
Always use flow-control if you want to avoid message loss.
One more important thing: if you don’t use flow-control, syslog-ng can drop messages even if the server is alive. If the remote server accepts logs more slowly than the sending syslog-ng receives them, the sender fills up the destination queue and then drops the newer messages.
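A back-of-the-envelope calculation shows how quickly this can happen. The following Python sketch uses a deliberately simplified model (constant incoming and outgoing rates, a queue that starts empty, no flow-control); the rates in the example are made-up numbers, not measurements.

```python
from typing import Optional


def seconds_until_full(queue_size: int, in_rate: float, out_rate: float) -> Optional[float]:
    """Time until an initially empty destination queue fills up.

    in_rate:  messages per second read from the source
    out_rate: messages per second the remote server accepts
    Returns None if the queue never fills (the server keeps up).
    """
    net_growth = in_rate - out_rate  # messages added to the queue per second
    if net_growth <= 0:
        return None
    return queue_size / net_growth


def dropped_messages(queue_size: int, in_rate: float, out_rate: float, duration: float) -> float:
    """Messages dropped during a burst of `duration` seconds at constant rates.

    Everything the queue cannot absorb beyond its size is dropped.
    """
    backlog = (in_rate - out_rate) * duration
    return max(0.0, backlog - queue_size)
```

For example, with a queue of 10000 messages, 1000 incoming and 800 outgoing messages per second, seconds_until_full(10000, 1000, 800) gives 50.0 seconds, and a one-hour burst at those rates (dropped_messages(10000, 1000, 800, 3600)) would drop about 710,000 messages, even though the server never went down.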
Sometimes this error happens only at specific times of day, for example only between 7 and 8 AM or between 4 and 5 PM (when your users log in or log off, generating a lot of messages within a short interval).
So always enable this flag and only disable it if you have a good reason to do that.
SSL errors
Alert unknown ca
SSL error while writing stream; tls_error='SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca'
This is also a common error message. It means that the other (remote) side could not verify the certificate sent by syslog-ng.
To fix it, first check the logs on the remote side. Basically, this message means that the syslog-ng on the receiving side was not able to find the CA certificate that signed the certificate sent by syslog-ng, so make sure that the receiving side trusts that CA.
PEM routines:PEM_read_bio:no start line
This is a rare error message when using TLS. The message comes from OpenSSL (because syslog-ng uses OpenSSL for TLS). You will receive the same error message if you check the certificate using this openssl command:
pzolee@thor-x1:~/cert_no_start_line/certs$ openssl x509 -in cert.pem -text
unable to load certificate
140178126276248:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:701:Expecting: TRUSTED CERTIFICATE
The problem is that the certificate contains some character(s) that OpenSSL cannot understand. This mostly happens when the certificate comes from Windows and you want to use it on a Linux-based computer: the file may start with a UTF-8 byte order mark (BOM), and Windows uses a different end-of-line (EOL) sequence (\r\n) than Linux (\n).
If you open the certificate with an editor (like mcedit), you can see the problem (the extra carriage returns are displayed as ^M characters).
How to solve it:
- Save the certificate as UTF-8 (without BOM) on Windows. You can do that, for example, with Notepad++: select the Encoding menu, change the value from “UTF-8-BOM” to “UTF-8”, and save the file. (Note: older versions of Windows Notepad cannot save the file as plain UTF-8; even if you select UTF-8, they save it as UTF-8 with BOM, which is exactly what we want to avoid.)
- Run dos2unix cert.pem on Linux. This converts the line endings of the file to Linux style. (Alternatively, replace the EOL characters in the file manually.)
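If dos2unix is not available, the same cleanup can be sketched in a few lines of Python. This is our own illustration, not part of syslog-ng or OpenSSL: it strips a leading UTF-8 BOM, converts CRLF line endings to LF, and then does a rough check for a line starting with "-----BEGIN ", which approximates what OpenSSL looks for. The script overwrites the file in place, so keep a copy of the original.

```python
import sys
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def normalize_pem(data: bytes) -> bytes:
    """Return PEM bytes without a leading UTF-8 BOM and with Linux (LF) line endings."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    # Windows CRLF -> Linux LF, like dos2unix does.
    return data.replace(b"\r\n", b"\n")


def has_pem_start_line(data: bytes) -> bool:
    """Rough check: is there a line that begins with '-----BEGIN '?"""
    return any(line.startswith(b"-----BEGIN ") for line in data.split(b"\n"))


def main(path: str) -> None:
    p = Path(path)
    fixed = normalize_pem(p.read_bytes())
    if not has_pem_start_line(fixed):
        sys.exit(f"{path}: no '-----BEGIN' line found even after cleanup")
    p.write_bytes(fixed)  # overwrites the original file


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage: python3 fix_pem.py cert.pem (the script name is arbitrary), then re-run the openssl x509 command above to confirm the certificate loads.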
TID is already used
TID is already used; proto='0x202c6c0', TID='61b6456d2f02052780d0d8930cbd043857c2463fcb6014b748b1450595a682', client='10.140.35.9'
Syslog connection closed;
This happens when you are using the RLTP protocol (Reliable Log Transfer Protocol, only available in syslog-ng Premium Edition). When a client that uses RLTP connects to the server the first time, it generates a persistent ID and sends it to the server during the handshake process. This is the TID.
The server allows only one connection with the same TID.
Now, if the client loses the connection to the server silently (for example, the UTP cable is pulled from the host or other network issues occur), the server is unable to detect this kind of connection loss, because it never receives a TCP RST (reset) packet.
If the client tries to reconnect within a short time interval, it will send the same TID and the server will “think” it already has a live connection with this TID, and thus drop the new connection due to the duplicated TID.
Solution: this error resolves itself automatically, because the RLTP server closes the connection if no new messages arrive from the client within the timeout. Once the RLTP server’s timeout has expired, the client can reconnect to the server (when the time_reopen() interval of the client has elapsed).
If this error message appears regularly, your network may be unstable, and the client sometimes loses the connection to the server in an abnormal way.
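To illustrate the sequence of events, here is a toy Python model of the server-side behavior described above. It is not the syslog-ng Premium Edition implementation: the TidRegistry class, its idle_timeout parameter, and the injectable clock are hypothetical, and only show why a quick reconnect after a silent disconnect is rejected while the same TID is accepted again once the idle timeout has passed.

```python
import time
from typing import Callable, Dict


class TidRegistry:
    """Toy model: a server that allows only one live connection per TID."""

    def __init__(self, idle_timeout: float, clock: Callable[[], float] = time.monotonic):
        self.idle_timeout = idle_timeout      # hypothetical server-side timeout, in seconds
        self.clock = clock
        self._last_seen: Dict[str, float] = {}  # TID -> time of the last message

    def _expire_idle(self) -> None:
        """Close connections that have been silent for longer than idle_timeout."""
        now = self.clock()
        for tid in [t for t, seen in self._last_seen.items() if now - seen > self.idle_timeout]:
            del self._last_seen[tid]

    def connect(self, tid: str) -> bool:
        """True if the handshake is accepted, False for 'TID is already used'."""
        self._expire_idle()
        if tid in self._last_seen:
            return False
        self._last_seen[tid] = self.clock()
        return True

    def message(self, tid: str) -> None:
        """Traffic on a live connection resets its idle timer."""
        self._last_seen[tid] = self.clock()

    def disconnect(self, tid: str) -> None:
        """A clean close (FIN or RST received) releases the TID immediately."""
        self._last_seen.pop(tid, None)
```

Injecting the clock keeps the model deterministic: with idle_timeout=60, a client that connects at t=0 and then loses its cable is rejected if it reconnects at t=10, but accepted again at t=71, after 71 seconds of silence.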