Logging - What, Why and How?

Logging vs Monitoring vs Analytics

Logging is a critical tool for debugging and troubleshooting applications. It tells you what happened, and gives you the raw data to track down the issue. By capturing relevant information at various stages of the application’s execution, logging enables developers to pinpoint the root cause of errors and resolve them efficiently.

Monitoring is an essential practice for ensuring the reliability and performance of applications. It tells you how your application is behaving by continuously gathering and analyzing data about its behavior and health. With monitoring, you can detect issues early, prevent downtime, and optimize resource utilization.

Logging is also a useful tool for tracking application behavior and collecting statistics, but it’s important to note that it only covers a limited range of functionality. E.g. Logs (formerly LogTail) has functionality to provide alerts from specific log conditions.

To achieve comprehensive monitoring and analytics, it’s recommended to use additional services such as Datadog, New Relic, or Google Analytics.

What to Log?

Collecting a sufficient amount of logs is essential for identifying when errors occur and determining what was happening prior to the occurrence of an error. In addition to identifying errors and their causes, logs can also aid in investigating the root cause of an issue. This is why it’s important to include info logging in the logging process.

For thorough investigations into application failures, it’s important to focus on the “5 W”s:

Who, What, When, Where and Why

Who was using the system when it failed?
Where in the code did the application fail?
What was the system doing when it failed?
When did the failure occur?
Why did the application fail?

What Kind of Errors to Log?

Logging can be used to alert when issues arise. Logging frameworks can be configured to automatically send alerts through external services when certain errors occur.

Therefore it is important to log only cases as errors where we need to act immediately. This is not always easy to specify. So iteration is also good with logging. If we do not do anything about this case, lower error to info. It is better to log too much in the beginning, than too little.

Do not log normal application behavior as errors, e.g. business logic rule failures, input validation failures or ‘item not found’ cases. If everything is working correctly, these are not cases where engineers need to act on immediately.

In some cases it is hard to define what is an error and what is not. For example, if we are fetching data from external service, we can have following cases:

1. Data is fetched from external service -> Error is returned
    - Log depending on response status
        - Data not found -> No logging
          - Or log, if we expect that data should be there
        - Data not valid -> No logging
          - Or log, if we expect that data should be valid
        - Bad request -> Logging
            - Depends on additional info, what went wrong?
            - Maybe something was missing from request data?
2. Data is fetched from external service -> Data is returned -> Exception in our code
    - Log, as this is something that needs to be fixed immediately

Where to Log Errors?

Errors should be logged only once and when it happens:

- Exception is raised
    - Exception is caught -> Handle & log -> Return error result
    - Exception is not caught -> Global catch -> Log → Framework will return 500

Sometimes this is not as easy as it seems. For example, if we are fetching data from external service and 3rd party libary raises and exception and that is handled and re-thrown as our own exception. Own exception is then handled in our business logic.

3rd party API connection -> Exception handled and re-thrown as our own exception
Feature -> Exception handled and return a failed result
API -> Return a correct error response

3rd party API can log info on error level for debugging purposes. It doesn’t have full context of the error, so there is no knowledge if it is critical or not.

Feature knows the context and can make better decision of the severity of the error. E.g. sometimes feature can retry the request and only if it fails again, it should be logged as error.

API shouldn’t anymore log errors. It should return a correct error response. There can be also a backup handling for unhandled errors. This is often implemented as a common middleware.

How to Log?

Log in machine parseable format
Make the logs human readable as well

Separating the log message from variable information by passing variables as parameters to a logging function is a recommended best practice. This approach, known as structured logging, records additional data as separate fields, making logs easier to search, filter, and analyze.

Embedding variables directly in log messages makes it difficult for monitoring and error tracking services to parse meaningful data. Dynamic information, like user IDs, complicates event data generation and error grouping. Structured logging resolves this by isolating variable data, allowing services to process log data more reliably.

From a security perspective, it’s easier to filter out sensitive information when it’s recorded as additional data fields rather than embedded within the error message itself.

// Not good
log.error('Order ${order.id} creation failed for user ${user.id}: ' + ex.message);

// Good
log.error('Order creation failed for user', { orderId: order.id, userId: user.id, ex });

Additional data should be structured as an object, allowing services to efficiently serialize it into distinct data fields.

log.error('This is the message and it should always be same', { all data, exceptions and variables in additional data });

Additional data passed straight as metadata:

log.error('Order creation failed', order);

// Additional data in e.g. Sentry:
{
  id: 'aaaa',
  description: 'asdasfdasdf'
}

Additional data wrapped in object:

log.error('Order creation failed', { order });

// Additional data in e.g. Sentry:
{
  order: {
    id: 'aaaa',
    description: 'asdasfdasdf'
  }
}

One additional benefit of using structured logging is that it is often easier for developers to structure error messages and data consistently when messages do not contain variables and additional data is passed as an object.

log.error('Payment (id: ${payment.id}) failed - User: ${user.id}', { payment, user });
// More consistent
log.error('Payment failed', { paymentId: payment.id, userId: user.id });

Exceptions should not contain variables in their descriptions. It is advisable to use custom exceptions where additional information can be passed.

throw new Error('Order ${order.id} creation failed for user ${user.id}');
throw new OrderCreationError('Order creation failed for user', { order, userId: user.id});

Hierarchy of log levels

Different levels of logging vary based on the language and logging framework used. However, the following levels are commonly used:

DEBUG: Detailed information, typically of interest only when diagnosing problems or during development.
INFO: General information confirming that things are working as expected.
WARNING: An indication that something unexpected happened. The software is still working as expected.
ERROR: Due to a more serious problem, the software has not been able to perform some function.
CRITICAL / FATAL: A serious error, indicating that the program itself may be unable to continue running.

Use debug logs only in local environments. Debug logging can be turned on through a flag or environment variable and removed from logs after a short period.

Use warnings for unexpected events that don’t break functionality (e.g., invalid user input that still allows the software to continue, or a missing user record in another system). Manual action may be required later but isn’t urgent.

Use errors for situations where the software cannot complete a function (e.g., an external service returned an error). These often require immediate attention and can be configured to send alerts.

Personally Identifiable Information (PII)

Do not log any personally identifiable information (PII). While some services, such as Sentry, attempt to clean up PII from logged data, not all services do this automatically.

When additional data is passed as an object, it is more straightforward to incorporate filtering logic for removing PII information.

PII (Personally Identifiable Information) data is any information that can be used to identify a particular individual. This can include a person’s name, address, phone number, email address, social security number, driver’s license number, passport number, financial account numbers, and any other information that can be used to uniquely identify a person. PII data is sensitive and should be protected to prevent identity theft, fraud, or other types of misuse.

Logging in Python

Python has slightly different recommendations for logging compared to some other languages, as message interpolation is commonly recommended in Python logging instead of structured logging.

As variables as passed as arguments, services can pick those from the logged data.

log.error("Some error (%s) happened: %s", error_code, error_message)

Sentry displays data in following format:

Message:
Some error (1234) happened: Total failure

0 1234
1 Total failure

However, with string interpolation, variable names are lost, which can limit context. Using extra-parameter or structured logging preserves variable names, providing more detailed and accessible information.

log.error("Some error happened", {"error_code": error_code, "error_message": error_message})

Message:
Some error happened

error_code: 1234
error_message: Total failure

How arguments should be passed and are handled depends on the library used and decided conventions. E.g. structlog supports passing keyword arguments as well.

log.error("Some error happened", error_code=error_code, error_message=error_message)

For new projects, adopting structured logging (using third-party libraries or the extra-parameter) may be beneficial. However, it’s important to keep in mind that many Python libraries rely on interpolation-based logging, so mixing approaches could lead to inconsistent log formats.

Adding Exception Info to Log

In Javascript error includes the stack, but in Python it does not. In Python it needs to be added manually.

In Python we often log exceptions with other errors at the error level. Add exc_info=True if the error is logged in an exception handler. It includes information about an exception that occurred in the log record.

logger.exception() is equivalent to calling logger.error() with exc_info=True. It is advisable to use it instead of error if possible.

logger.error("Error message", {"data": data, "error": str(e)}, exc_info=True)

NOTE: traceback.print_exc() can be used to print the exception to the console.

Formatting

When message doesn’t include any arguments, then arguments must be defined in the format.

format = "%(asctime)s - %(name)s -  %(levelname)s - %(message)s - %(args)s"

Python Log Message Interpolation

Message should not contain variables, so avoid interpolation.

logger.info("test", {"new_field": 1})
# "message":"test"

logger.info("test %s", {"new_field": 1})
# "message":"test {'new_field': 1}"

logger.info("test", "new_field")
# Exception (args should be an object)

logger.info("test %s", "new_field")
# "message":"test new_field"
# In this case args doesn't need to be an object as it is interpolated to message

Python, Extra-parameter and Structured Logging

In some examples, additional structured data is passed as an extra-parameter and using it can be beneficial in some cases. The use of an extra-parameter is beneficial in scenarios where specific information must be included in all application logs, and these details depend on the execution context (e.g., session_id, request_user_id).

It is important to note that certain property names, such as message and asctime, are reserved in the extra-parameter and should not be used.

Example of using extra in Sentry and in DataDog.

Structured logging can be implemented with Python standard library, but it might be worth considering using a library like structlog.

JSON serialization errors

If you use JSON logging, you may run into issues when serializing certain objects: TypeError: Object of type xxx is not JSON serializable.

It is a good practice not to pass class instances directly to the logger. Instead, select only the data that is needed. However, if you do need to log complex objects, you can use a custom JSON encoder or make sure that the object is serializable.

If you must pass an exception as an object, you can use exc_info.

try:
    complex_data = NonSerializableComplexData()
    result = 1/0
except Exception as e:
    logger.error(
        "Handling of complex data failed",
        value=complex_data.value,
        error=str(e),
        exc_info=True
    )

exc_info=True captues the current exception, and exc_info=exception_isntance captures the passed exception instance.

Conclusion

Incorporating logging best practices is crucial for maintaining reliable and easily debuggable applications. Careful consideration of what to log, how to log it, and the use of appropriate logging tools will provide insights into an application’s behavior and help identify issues effectively.

Structured logging, where log messages and variable data are separated, enables monitoring and error-tracking services to parse information more accurately and enhances the security of log data by simplifying the filtering of sensitive information.

Pros:
- Easier to log important info
- Easier to parse and analyze logs
- Easier for developers to structure error messages and data consistently
Cons:
- More complex to implement
- Risk of logging too much additional data

Written on March 25, 2022