To cribl or not to cribl.... (Part 2)

Marc Luescher
Nov 4, 2021

Working on a global project requires onboarding many new data sources while complying with legal data residency requirements. While this is still being worked out for our company, we will quickly start producing a large volume of log data.

Management has asked us to work smarter with less. One of our biggest cost drivers in security is keeping the correct log files and sources available for the appropriate retention period.

In the past, the decision was made to simply ingest all of that valuable data into our SIEM. Now we are confronted with ever-increasing demand for SIEM volume, and our forecast predicts that our ingest volume alone will double or triple again over the next two years.
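To put that forecast in annual terms, here is a short worked calculation. It assumes steady compound growth, which is a simplification since onboarding tends to arrive in steps. A growth factor k over two years implies an annual growth rate g of:

```latex
\begin{aligned}
(1+g)^2 &= k \;\Rightarrow\; g = \sqrt{k} - 1 \\
k = 2 &:\; g = \sqrt{2} - 1 \approx 0.41 \quad (\text{about } 41\% \text{ per year}) \\
k = 3 &:\; g = \sqrt{3} - 1 \approx 0.73 \quad (\text{about } 73\% \text{ per year})
\end{aligned}
```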

Since we are using a managed cloud service, we are also charged for additional disk space beyond the contracted searchable retention period, which in our case is 90 days.

This left us with a few open tasks and questions:

a) Can we limit or control the data ingest volume into our SIEM, and how could we best achieve this?

b) Is all the data inside our SIEM really worth having there?

c) Does a long retention period really mean that all data must be searchable at once?

d) How much retention do we really need, and what are the legal obligations for our industry?

Let me give you some answers based on my current understanding.

a) As long as we are still acquiring new businesses and onboarding new data sources, platforms and systems, we cannot keep our SIEM volume under control.

b) We have identified some data sources, such as the Windows event logs, that contain many unnecessary and unwanted records which provide no value to the security team. Worse, because we have so many events per day, searches take much longer than necessary and disk volume grows without any real added value. Syslog data sources are another area of interest. To be syslog RFC compliant and to give target systems more specific information, every single syslog event carries a lot of metadata, or overhead. During our POC we found that, without losing any required information, we can shrink our syslog data by about 35%, our Windows event stream by about 30%, and some Java-based log files by as much as 80%.
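The filtering and trimming described in b) can be sketched in a few lines of Python. This is not our Cribl LogStream pipeline, just an illustration of the idea under stated assumptions: the drop list of Windows event codes is hypothetical, Windows events are assumed to arrive as dicts with an `EventCode` field, and which syslog header fields count as "required" is a judgment call (here PRI, timestamp, host, app name and message are kept; version, PROCID, MSGID and structured data are dropped).

```python
import re
from typing import Iterable, Iterator

# Hypothetical drop list: event codes assumed to carry no security value.
# Derive the real list from your own detection use cases.
DROP_EVENT_CODES = {"5156", "5158", "4662"}

# Simplified RFC 5424 header:
# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA [MSG]
# Escaped "]" inside structured-data values is NOT handled in this sketch.
RFC5424 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d) (?P<ts>\S+) (?P<host>\S+) "
    r"(?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<sd>-|(?:\[[^\]]*\])+)(?: (?P<msg>.*))?$"
)


def filter_windows_events(events: Iterable[dict]) -> Iterator[dict]:
    """Yield only events whose EventCode is not on the drop list."""
    for event in events:
        if str(event.get("EventCode")) not in DROP_EVENT_CODES:
            yield event


def trim_syslog(line: str) -> str:
    """Re-emit an RFC 5424 line with only PRI, timestamp, host, app and message.

    Lines that do not match the pattern are passed through unchanged,
    so nothing is silently lost.
    """
    m = RFC5424.match(line)
    if m is None:
        return line
    return f"<{m['pri']}>{m['ts']} {m['host']} {m['app']}: {m['msg'] or ''}"


def reduction_pct(before: Iterable[str], after: Iterable[str]) -> float:
    """Percentage of bytes saved, measured on UTF-8 encoded lines."""
    b = sum(len(s.encode("utf-8")) for s in before)
    a = sum(len(s.encode("utf-8")) for s in after)
    return 100.0 * (b - a) / b if b else 0.0


if __name__ == "__main__":
    line = ('<34>1 2021-11-04T10:00:00.000Z web01 sshd 1234 ID47 '
            '[exampleSDID@32473 iut="3"] Accepted password for alice')
    short = trim_syslog(line)
    print(short)
    print(f"saved {reduction_pct([line], [short]):.1f}%")
```

Measuring savings in bytes rather than event counts matches how SIEM ingest is usually licensed, which is why the helper works on encoded lengths.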

c) For me, retention means that the original log files must be available for the defined retention period. They must be available for analysis when required, but that does not mean that all data within the defined retention period must be searchable at once. We have now settled on 90 days of searchable event data inside our SIEM, with summary data available for 1 year. All original log files are now stored on cheap AWS S3 storage. We built automation so we can either re-import the required log files or time periods on demand, or use AWS search services such as AWS Athena for an ad hoc search.
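The tiered policy in c) can be expressed as a small routing rule. The sketch below is only an illustration: the 90-day and 1-year windows come from the text, but the tier names, the `source/year=YYYY/month=MM/day=DD/` S3 key layout and the helper names are hypothetical. A Hive-style layout like this is one common choice because, if the Athena table is partitioned on those keys, an ad hoc query or a re-import can target just the days it needs.

```python
from datetime import date, timedelta
from typing import Iterator, Set

SEARCHABLE_DAYS = 90   # searchable event data in the SIEM (from the post)
SUMMARY_DAYS = 365     # summary data kept for 1 year (from the post)


def locations_for(event_day: date, today: date) -> Set[str]:
    """Return where data for a given day is expected to live (hypothetical tier names)."""
    age = (today - event_day).days
    if age < 0:
        raise ValueError("event_day lies in the future")
    where = {"s3_raw"}              # original log files are always kept in S3
    if age < SEARCHABLE_DAYS:
        where.add("siem")           # full events still searchable
    if age < SUMMARY_DAYS:
        where.add("siem_summary")   # summary data still available
    return where


def replay_prefixes(source: str, start: date, end: date) -> Iterator[str]:
    """Yield one S3 key prefix per day in [start, end], inclusive.

    The Hive-style layout is an assumption, not a fixed convention.
    """
    if end < start:
        raise ValueError("end is before start")
    day = start
    while day <= end:
        yield f"{source}/year={day:%Y}/month={day:%m}/day={day:%d}/"
        day += timedelta(days=1)


if __name__ == "__main__":
    today = date(2021, 11, 4)
    print(sorted(locations_for(date(2021, 3, 1), today)))
    for prefix in replay_prefixes("wineventlog", date(2021, 10, 31), date(2021, 11, 1)):
        print(prefix)
```

A re-import job would list objects under each yielded prefix and forward them to the SIEM. That step depends on the tooling in use and is left out here.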

d) To best understand the retention requirements, I reached out to our privacy and legal team for proper guidance. But as so often in life, it depends...

Cribl LogStream has helped us overcome some of the above issues, and we will redefine our log and observability pipeline to become even more efficient.

Happy cribling and splunking.
