Monitoring a local file using OSSEC Integration


In addition to event logs or syslogs, many local files are created by various applications. These files reside on the hosts/servers where the applications run and contain a wealth of information, valuable for both security and operational intelligence. Application logs are useful for debugging, capturing run-time errors/exceptions, or even spotting business opportunities in production environments. It is imperative to monitor local files to gain actionable insights, real-time alerting, correlations and forensic debugging.

KHIKA integrates closely with OSSEC to monitor application logs in real time. This section explains how to use OSSEC to monitor application logs. We begin with the broad-level steps and then dive deep into each one, explaining the methodology and the intricate details associated with it. At a broad level, you perform the following steps:

Install an OSSEC Agent on the end node.

The OSSEC Agent provides the simplest way to monitor a local file on any computer in real time. You must install the OSSEC agent on the machine where the file is being created. Please refer to the appropriate sections on Linux or Windows for installing the OSSEC Agent.

Configure the OSSEC Agent to monitor the local file

Once the OSSEC agent is installed, you need to locate the file you want to monitor. In this example, we will monitor Apache access logs created in the directory /path/of/apache/access/log; the name of the file is access.log.

  1. Login to the OSSEC Agent node and open the file ossec.conf. On Windows, this file is located in the C:\Program Files (x86)\ossec-agent directory. On Linux, you will find it in the /var/ossec/ossec-agent/etc directory (if you have installed the agent in the default location).
  2. Locate the <localfile> section. Note that ossec.conf uses XML formatting, so be careful not to disturb the other tags. Navigate just above the "Rootcheck - Policy monitor config" tag and insert the following section:
 <localfile>
   <location>/path/of/apache/access/log/access.log</location>
   <log_format>syslog</log_format>
 </localfile>

Note that we have added <log_format>syslog</log_format> as the Apache access logs are single-line messages and OSSEC will treat them the same as syslog messages.
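For reference, a typical single-line Apache access log entry (in the standard Common Log Format; the values below are purely illustrative) looks like this:

  127.0.0.1 - - [10/May/2019:13:55:36 +0530] "GET /index.html HTTP/1.1" 200 2326

Since each event occupies exactly one line, the syslog log format is the right choice.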

Save the file and restart the OSSEC Agent (on Linux, typically with /var/ossec/bin/ossec-control restart; on Windows, restart the OSSEC service). Apache access logs will then start being received on the OSSEC Server side (i.e., on the KHIKA Data Aggregator node).

Login to the KHIKA Data Aggregator and navigate to the directory /opt/ossec/logs/archives/2019/May/ (needless to say, substitute the current year and month). Here you will find multiple directories with the names of workspaces prepended to them. Enter the appropriate directory and check the current file (e.g., with tail -f). You should see live logs coming from our newly added Apache server.

Parse the file using extensible KHIKA Adapter for OSSEC

As the OSSEC Server on your KHIKA Data Aggregator starts receiving live logs, you will observe that all the logs are being gathered in the same file. If you have already configured the OSSEC Adapter via the KHIKA App for Linux or the KHIKA App for Windows, you will need to tell this Adapter to parse the Apache access logs using the appropriate parser logic. This is where the extensible KHIKA Adapter for OSSEC comes in handy.

The KHIKA Adapter for OSSEC is modular and can be extended by adding an appropriate Python function to it. To understand how to add a new Python function to parse a new type of log, let us first understand the structure of this adapter.

  1. Login to the KHIKA Data Aggregator and go to the directory /opt/KHIKA/Apps/Adapters/OssecArchiveLog
  2. You will find a file filename_parser_mapping.csv. Open this file and check the header and the first couple of lines:
  Parser_ID,File_Name,Parser_Name,Invoke_Parser,Headers,"Agent Source Name"
  1,"(\/var\/log\/audit\/audit.log)","unix_audit","CallProcessUnixAuditLogLine",,
  2,"(\/project\/job\/remove_NetScreen_)(.*)(log)","juniper_netscreen","CallProcessJuniperNetscreenLogLine",,
  3,"(\/project\/job\/remove_fortigate.log)","varnish_apache_access_logs","CallProcessVarnishApacheAccessLogLine",,

Parser_ID = The sequence number of the parser. This is nothing but a unique number.
File_Name = A regular expression matching the path of the file on the agent. Note the escape sequences in the name of the file.
Parser_Name = The name of the parser (the Python function to be invoked when a line from File_Name is encountered in the OSSEC archive log).
Invoke_Parser = The string identifier of Parser_Name.

We will talk more about these fields with examples as we move ahead and explore the code.

  1. As we want to monitor "/path/of/apache/access/log/access.log", we will add a new line at the end of filename_parser_mapping.csv and increment the sequence number. Let's say we add the new line as the 4th record; our file filename_parser_mapping.csv should then look like this:
  Parser_ID,File_Name,Parser_Name,Invoke_Parser,Headers,"Agent Source Name"
  1,"(\/var\/log\/audit\/audit.log)","unix_audit","CallProcessUnixAuditLogLine",,
  2,"(\/project\/job\/remove_NetScreen_)(.*)(log)","juniper_netscreen","CallProcessJuniperNetscreenLogLine",,
  3,"(\/project\/job\/remove_fortigate.log)","varnish_apache_access_logs","CallProcessVarnishApacheAccessLogLine",,
  4,"(\/path\/of\/apache\/access\/log\/access.log)","varnish_apache_access_logs","CallProcessVarnishApacheAccessLogLine",,

We use varnish_apache_access_logs as the Parser_Name and CallProcessVarnishApacheAccessLogLine as the string identifier for it. This is the existing Apache access log parser shipped with KHIKA, and we will use it for demonstration purposes.

Basically, by adding the above line to the filename_parser_mapping.csv file, we instructed the OSSEC parser to invoke the function varnish_apache_access_logs() whenever a log message from /path/of/apache/access/log/access.log is encountered.
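Conceptually, the adapter matches the source file path of each archive-log entry against the File_Name regular expressions to select a parser. Below is a minimal standalone sketch of that lookup; the helper names load_mappings() and find_parser() are illustrative, not the adapter's actual internals:

  import csv
  import re

  # Illustrative sketch: read the mapping file and return the Invoke_Parser
  # identifier whose File_Name regex matches a given source-file path
  def load_mappings(csv_path):
      with open(csv_path) as f:
          return list(csv.DictReader(f))

  def find_parser(mappings, file_path):
      for row in mappings:
          if re.search(row["File_Name"], file_path):
              return row["Invoke_Parser"]
      return None

  mappings = load_mappings("filename_parser_mapping.csv")
  print(find_parser(mappings, "/path/of/apache/access/log/access.log"))
  # prints: CallProcessVarnishApacheAccessLogLine (our new 4th record)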

Now let's walk through the actual Python code and understand how this is achieved.

Open TLHookScript_ArchivesMasterParser.py and search for CallProcessVarnishApacheAccessLogLine

                elif (re_file_parser_dict[parser_id][2] == "CallProcessVarnishApacheAccessLogLine"):
                    # Initialize the Apache normalizer only once, the first
                    # time an Apache access log line is seen
                    if g_normalized_apache == 0:
                        if platform.system() == 'Windows':
                            NORMALIZER_DIR = site.getsitepackages()[0] + '/share/logsparser/normalizers'
                        else:
                            NORMALIZER_DIR = os.path.dirname(os.path.realpath(__file__)) + '/../../../3rdpartyUnix/TLPython/share/logsparser/normalizers'
                        apache_normalizer = LN(NORMALIZER_DIR)
                        apache_normalizer.set_active_normalizers({"apache-0.99" : True})
                        apache_normalizer.activate_normalizers()

                    # Dispatch the line to the Apache parser via the
                    # invoke_parser function pointer
                    parsed_output = invoke_parser(g_process_line,g_file,line_count,g_hostname,logger, apache_normalizer,GetTimeStampWithFormat)

invoke_parser() is a function pointer that actually calls CallProcessVarnishApacheAccessLogLine() with arguments such as g_process_line, g_file, line_count, g_hostname, logger, apache_normalizer and GetTimeStampWithFormat.

So, the real call is CallProcessVarnishApacheAccessLogLine(). You can pass any arguments to it, but the first few are highly recommended (though not compulsory, as you are free to implement it the way you want):

  1. g_process_line is the line to be processed
  2. g_file is the name of the file (on the agent)
  3. line_count represents the line count
  4. g_hostname is basically the name of the aggregator
  5. logger is our logger object

Other arguments can be specific to your logic. In this case we are passing the apache_normalizer object created just above, along with GetTimeStampWithFormat.

Essentially, if you are extending the OSSEC Parser, you can copy-paste the above block of code and replace "CallProcessVarnishApacheAccessLogLine" with your own function, as sketched below.
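A minimal sketch of such an extension block (CallProcessMyAppLogLine is a hypothetical placeholder for your own function name from filename_parser_mapping.csv):

                elif (re_file_parser_dict[parser_id][2] == "CallProcessMyAppLogLine"):
                    # Do any one-time initialization your parser needs here
                    # (e.g., compiling regexes or loading lookup tables),
                    # then dispatch the line through the function pointer
                    parsed_output = invoke_parser(g_process_line, g_file, line_count, g_hostname, logger)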

Now let us inspect the code of the CallProcessVarnishApacheAccessLogLine() function. Open the file TLHookLibrary_parsers.py and go to "def CallProcessVarnishApacheAccessLogLine".

  def CallProcessVarnishApacheAccessLogLine(line,file,line_count,g_hostname,logger, normalizer,dict_device_timezone):
      from CommonLibrary import TLHookScript_apache_varnish_access as TLVARNISH
      return TLVARNISH.ProcessLineOfFile(line,file,line_count,g_hostname,logger,normalizer,dict_device_timezone)

It is a small piece of code that imports the module TLHookScript_apache_varnish_access (as TLVARNISH) from our CommonLibrary package and, on the next line, calls its ProcessLineOfFile() function with all the appropriate arguments.

To examine the specific implementation of ProcessLineOfFile(), we can open TLHookScript_apache_varnish_access.py from the /opt/KHIKA/Apps/Adapters/OssecArchiveLog/CommonLibrary directory and locate ProcessLineOfFile():

def ProcessLineOfFile(line,file,line_count,g_hostname,logger1,normalizer,GetTimeStampWithFormat):
       # Wrap the raw line and run it through the logsparser normalizer,
       # which adds the parsed key-value pairs to the dictionary in place
       l = {'body' : line }
       normalizer.lognormalize(l)
       line_count += 1
       return ApacheNormalizer(l,file,g_hostname,logger1,GetTimeStampWithFormat)

It shows that we process the Apache logs using an open-source normalizer (Python) class. You can add your own logic here. As you can see, ApacheNormalizer() is the function that does the actual job of parsing. It finally returns the output in KHIKA Data format. We encourage you to examine the code below:

def ApacheNormalizer(l,file,g_hostname,logger1,GetTimeStampWithFormat):
       try:
               #I want to make this part dynamic.
               #User should be allowed to configure which field should be used for epoch_date,
               #which field should be used for forming the message and which fields should be ignored.
               #By default, message is 'none'
               #and we ignore 'raw', 'uuid', 'date' and 'body'
               TIMESTAMP_FORMAT = '%Y %b %d %H:%M:%S'
               date_str = l['date'].strftime(TIMESTAMP_FORMAT)
               epoch = GetTimeStampWithFormat(date_str, TIMESTAMP_FORMAT)
               meta_data = "tl_tag \"apache\""+ " " +"tl_src_host \""+g_hostname+"\" "
               msg = 'none'
               for key, value in l.iteritems():
                       if (key == 'raw' or key == 'uuid' or key == 'date' or key == 'body') : continue
                       if (key == 'request'): msg = value
                       meta_data += str(key) +" \""+str(value) + "\" "
               if ('request_header_referer_contents' in l and l['request_header_referer_contents'].find('?') != -1):
                       uri_query = "uri_query \""
                       query_str = l['request_header_referer_contents'].split('?')[1]
                       meta_data += uri_query+query_str+"\""
               meta_data = meta_data.strip()
               return str(epoch) + " : " + meta_data + " event_str \"" + msg + "\""
       except Exception,e:
               logger1.info("Error in Varnish ApacheNormalizer: %s", e)
               return None

The important thing to note here is how we build meta_data from the message: the meta_data string is nothing but the key-value pairs extracted from the raw message.
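For instance, a parsed Apache access log line comes out in KHIKA Data format roughly like this (the epoch and field values are purely illustrative, and the exact keys depend on what the normalizer extracts):

  1557491736 : tl_tag "apache" tl_src_host "aggregator01" source_ip "127.0.0.1" status "200" len "2326" event_str "GET /index.html HTTP/1.1"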

Finally, we call print_parsed_output() once the parser returns the parsed output in KHIKA Data format. We recommend using the print_parsed_output() library function while extending the OSSEC Adapter.


Here is a quick summary of the steps worth noting when you want to extend the OSSEC Parser to parse your own data (say, application logs). A minimal end-to-end skeleton follows the list.

  1. Go to directory /opt/KHIKA/Apps/Adapters/OssecArchiveLog on your aggregator
  2. Open filename_parser_mapping.csv and add a line for your new data source at the bottom (do not change any existing lines)
  3. Open TLHookScript_ArchivesMasterParser.py and search for CallProcessVarnishApacheAccessLogLine (or a similar function) for reference; this helps you insert your code at the appropriate location
  4. Change CallProcessVarnishApacheAccessLogLine to the function name you added in the filename_parser_mapping.csv file
  5. Do your initializations here and then call invoke_parser() with the appropriate arguments
  6. Code CallProcess_YOUR_FUNCTION_NAME() and import a module that implements the ProcessLineOfFile() function
  7. Call the ProcessLineOfFile() function after the import statement (make sure to reference the correct library that you import)
  8. Now go and implement ProcessLineOfFile() for your data
  9. The actual code of your function ProcessLineOfFile() must reside in the CommonLibrary directory (/opt/KHIKA/Apps/Adapters/OssecArchiveLog/CommonLibrary).
  10. Create a Python file with a name exactly the same as the module you imported in the above step, in the directory /opt/KHIKA/Apps/Adapters/OssecArchiveLog/CommonLibrary
  11. Implement ProcessLineOfFile() function in this file. All your programming logic goes here.
  12. Test your Adapter by running it on the command line before configuring it in production. Make sure to check the logs for errors, and ensure it is parsing all the lines coming into the OSSEC Archive logs
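Putting steps 6 through 11 together, here is a minimal hypothetical skeleton. The names CallProcessMyAppLogLine and TLHookScript_myapp, as well as the log format being parsed, are placeholders for your own data source; the output string mirrors the KHIKA Data format used by the Apache example above:

  # In TLHookLibrary_parsers.py: the thin wrapper (steps 6 and 7)
  def CallProcessMyAppLogLine(line,file,line_count,g_hostname,logger):
      from CommonLibrary import TLHookScript_myapp as TLMYAPP
      return TLMYAPP.ProcessLineOfFile(line,file,line_count,g_hostname,logger)

  # In CommonLibrary/TLHookScript_myapp.py: the actual logic (steps 9 to 11)
  import time

  def ProcessLineOfFile(line,file,line_count,g_hostname,logger):
      try:
          # Example: parse a line like
          #   2019-05-10 13:55:36 INFO user=joe action=login
          # into a timestamp, a severity and a message
          parts = line.split(' ', 3)
          epoch = int(time.mktime(time.strptime(parts[0]+' '+parts[1], '%Y-%m-%d %H:%M:%S')))
          meta_data = "tl_tag \"myapp\" tl_src_host \""+g_hostname+"\" severity \""+parts[2]+"\""
          # Return the line in KHIKA Data format:
          #   <epoch> : <key-value pairs> event_str "<message>"
          return str(epoch) + " : " + meta_data + " event_str \"" + parts[3].strip() + "\""
      except Exception as e:
          logger.info("Error parsing myapp log line %d: %s", line_count, e)
          return None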

Set enrichment rules (if any)

Set the index in Elasticsearch

Define alerts, dashboards and correlations