Writing advanced adapters
Revision as of 09:45, 31 May 2019
After understanding the Khika Data Format and going through the initial exercise of Writing your own KHIKA Data Adapters, it is time to create a production-level KHIKA Adapter. A few points to note before we begin writing our own Adapter:
- Adapters are scripts that execute on KHIKA Data Aggregator
- Adapters can be written in any programming language (our favorite is python 2.7)
- Adapters are scheduled processes and KHIKA Data Aggregator is responsible for scheduling them to run at a periodic interval (typically 1 minute to 5 minutes)
- The Adapter scripts
- read the raw log messages one by one (from sources such as files, queues, APIs, databases, etc.),
- parse the log messages,
- convert them into Khika Data Format,
- write the output to stdout
- KHIKA Data Aggregator pipes the output of the Adapter script and sends it to KHIKA over an SSL connection
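The read-parse-emit loop described above can be sketched as follows. This is a minimal illustration, not the real PaloAlto adapter: the function names and the key=value output layout are placeholders, and the actual field set is defined by the Khika Data Format article.

```python
import sys
import time

def parse_line(raw_line):
    # Hypothetical parser: a real adapter would extract the fields
    # (timestamp, source, severity, ...) required by the Khika Data Format.
    parts = raw_line.rstrip("\n").split(" ", 1)
    if len(parts) != 2:
        return None  # skip lines that do not match the expected layout
    host, message = parts
    return 'timestamp=%d host=%s msg="%s"' % (int(time.time()), host, message)

def run_adapter(stream):
    # Read raw log messages one by one, parse them, and write the
    # converted records to stdout; the KHIKA Data Aggregator pipes
    # stdout onward to KHIKA over SSL.
    for raw_line in stream:
        record = parse_line(raw_line)
        if record is not None:
            sys.stdout.write(record + "\n")
```

Any line the parser cannot understand is silently skipped here; a production adapter would typically log such lines instead.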
With these concepts in mind, let us proceed with an example of a production-ready KHIKA Data Adapter. Log in to your KHIKA Data Aggregator node (the default username/password is khika/khika123). We will study a syslog-based adapter that processes the messages received from a PaloAlto Firewall. Open the file TLHook_Adaptor_PaloAlto.py from the directory /opt/KHIKA/Apps/Adapters/PaloAltoFW.
Check the first few lines of this file, where we import some important python modules:
 1 #!/bin/env python
 2 import os, sys
 3 import socket
 4 import csv,StringIO
 5 import logging #The logging libraries
 6 import re #python inbuilt regular expressions library
 7 import time
 8 from time import strptime #Useful time format conversion functions
 9 from datetime import datetime
10 import random
11 import calendar
12 from ipaddress import IPv4Network,IPv4Address
13 import pdb
Note that we have imported logging (line 5) and some useful time libraries (lines 8 and 9). We have also imported the 're' library for python regular expressions. We will be using it in the code.
Now, let's move to the bottom of this file and locate the "__main__" block. This is where execution of the script begins.
295 if __name__ == "__main__":
296     global isOccurrences
297     isOccurrences = False
298     dict_report_stats = {}
299     g_smallest_day = 0
300     g_highest_day = 0
301     install_dir = os.path.dirname(os.path.realpath(__file__))
302     sys.path.insert(0, install_dir)
303     sys.path.insert(0, install_dir + os.sep + ".." + os.sep + "TLCOMMON" + os.sep)
304     g_hostname = socket.gethostname()
305     file_name_format = sys.argv[1] if len(sys.argv) == 2 else os.getenv("TL_WORKSPACE")+'_'+os.getenv("TL_ADAPTER")+'_'+os.getenv("TL_AGGREGATOR")
306     from TLHook_common import *
307     logfile_path = install_dir+ '/' + 'log_'+file_name_format+'.log'
308     if not is_safe_path(logfile_path):
309         exit()
310     logger = InitLogger(logfile_path,logging.INFO)
311     logger.info("A new execution of script %s begins", __file__)
312     int_time(logger)
313     tz_file_path = install_dir+'/'+'timezone_'+file_name_format+'.csv'
314     history_path = install_dir+'/history_'+file_name_format+'.csv'
315     config_path = install_dir+'/config_'+file_name_format+'.csv'
316
317     if not is_safe_path(tz_file_path) or not is_safe_path(history_path) or not is_safe_path(config_path) :
318         logger.error("Path is invalid: history_path : %s ,timezone_file_path : %s , config_path : %s ",history_path,tz_file_path ,config_path )
319         exit()
320     GetHostToTimeZoneDict(tz_file_path)
321     ReadHistoryFile(history_path)
322     ProcessUsingConfigFile(config_path, ProcessLineOfFile)
323     PrintDashboardStatistics(dict_report_stats,logger,g_hostname,g_smallest_day,g_highest_day)
324     WriteHistoryFile(history_path)
325
After doing some initializations (such as setting PATH, the log file, timezone, etc.), we import TLHook_common on line 306. This is a common library that provides functions for timezone handling, logging, offset maintenance, etc. We set the
- history_file for maintaining the offset, timestamp, etc. after each execution (line 314)
- config_file for reading the configuration from (it basically tells us what files to read from which directory) (line 315)
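The per-adapter file naming on lines 305-315 can be sketched as below. The helper function is ours, not part of the script; it only illustrates how each adapter instance derives its own log, timezone, history and config file names from the workspace/adapter/aggregator identifiers.

```python
import os

def build_file_paths(install_dir, workspace, adapter, aggregator):
    # Mirrors lines 305-315 of TLHook_Adaptor_PaloAlto.py: each adapter
    # instance gets its own log, timezone, history and config file,
    # named after the workspace/adapter/aggregator it belongs to.
    file_name_format = "%s_%s_%s" % (workspace, adapter, aggregator)
    return {
        "log": os.path.join(install_dir, "log_%s.log" % file_name_format),
        "timezone": os.path.join(install_dir, "timezone_%s.csv" % file_name_format),
        "history": os.path.join(install_dir, "history_%s.csv" % file_name_format),
        "config": os.path.join(install_dir, "config_%s.csv" % file_name_format),
    }
```

Keeping the paths unique per instance is what allows several adapters to run from the same directory without clobbering each other's offsets.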
After this, the real work starts. On line 321 we call the ReadHistoryFile() function with history_path as the argument. We read the offsets maintained in the history_file to understand where we should start reading the files from during this execution. Note that the script executes at a periodic interval, and every execution updates the offsets of the files (using the WriteHistoryFile() function) at the end of its run. Some global data structures are set in ReadHistoryFile() to help us seek() to the offset when we call ProcessUsingConfigFile() on line 322.
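The offset bookkeeping works roughly as follows. ReadHistoryFile() and WriteHistoryFile() live in TLHook_common, so this is only a sketch of the idea: a csv of per-file offsets is loaded, each log file is seek()-ed past what earlier runs already processed, and the new offsets are persisted for the next scheduled execution. The function names and the "filename,offset" row layout here are assumptions for illustration.

```python
import csv
import os

def read_history(history_path):
    # Load "filename,offset" rows saved by the previous run.
    offsets = {}
    if os.path.exists(history_path):
        with open(history_path) as f:
            for row in csv.reader(f):
                if len(row) == 2:
                    offsets[row[0]] = int(row[1])
    return offsets

def process_new_lines(log_path, offsets, handle_line):
    # seek() past everything processed in earlier runs, handle only
    # the new lines, then remember where this run stopped.
    with open(log_path) as f:
        f.seek(offsets.get(log_path, 0))
        for line in f:
            handle_line(line)
        offsets[log_path] = f.tell()

def write_history(history_path, offsets):
    # Persist the offsets for the next scheduled execution.
    with open(history_path, "w") as f:
        writer = csv.writer(f)
        for name, offset in offsets.items():
            writer.writerow([name, offset])
```

Because the offsets survive between runs, a crash at worst reprocesses the lines written since the last successful WriteHistoryFile(), rather than the whole file.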
Function ProcessUsingConfigFile() takes two arguments, config_path and ProcessLineOfFile.
- config_path is the file we set during our initialization. The config file is a csv file. Below is a sample config file:
/opt/remotesyslog/172.28.1.16,2.*.log$,None
/opt/remotesyslog/172.28.1.17,2.*.log$,None
- The first field is the directory from which we want to read the logs
- The second field is a regular expression for filenames. Files matching the regular expression will be processed. KHIKA Data Aggregator receives PaloAlto Firewall logs over the syslog protocol and stores them in the /opt/remotesyslog directory. It dynamically creates a directory named with the IP address of the syslog source device (the PaloAlto firewall, in this case). Under that directory, files are created on a per-day basis in YYYY-MM-DD.log format (e.g. 2019-05-31.log)
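Selecting files from one config row can be sketched like this. The helper names are ours; the row format is the one shown above (directory, filename regex, and a third field that this sketch simply passes through), and the directory listing is simulated rather than read from disk.

```python
import re

def parse_config_line(line):
    # Split one "directory,regex,extra" row from the config csv.
    directory, pattern, extra = line.strip().split(",")
    return directory, re.compile(pattern), extra

def select_files(filenames, pattern):
    # Keep only the files whose names match the configured regex.
    return [name for name in filenames if pattern.search(name)]
```

With the sample pattern 2.*.log$, daily files such as 2019-05-31.log match, while anything else in the directory is ignored.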