Write Your Own Adapter

 
Revision as of 10:15, 29 May 2019

You may have custom applications logging messages in their own format, databases storing messages in their own table structures, plain files, or third-party applications that generate data and expose it through APIs, which you may wish to import continuously into Khika. To do so, you need to be able to write your own Adapter scripts so that you can pump your data into Khika and start analyzing it. Khika does not impose any restriction on the source of the data as long as it conforms to the standard Khika Data Format (please refer to the section on Khika Data Format).

There are three steps to consider while developing a custom Adapter script:

1) Read the data from the source: This may involve reading a simple text file or reading the data from a third-party application using its APIs. In all cases, ensure that you have appropriate read access to the source of the data. (NOTE: The user account executing the Adapter script must have read access on the source data.)

2) Convert the data into Khika data format: It is important to know the format of the source data you are reading, as you have to extract a timestamp (date and time) from each message, along with metadata/value pairs where possible. We recommend adding meaningful metadata tags in this step so that the data is easier to work with in Khika at a later stage.

3) Write the Khika-formatted data line by line to stdout and exit after all the available data is written.
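The three steps above can be sketched as a minimal Adapter in Python. This is an illustrative sketch, not a complete Adapter: the sample line, its timestamp format and the metadata tags (host, event_str) are assumptions modeled on the example scripts shown later in this section.

```python
import sys
import time
import socket

def to_khika_line(epoch, message):
    # Khika format as used by the example Adapters in this section:
    # "<epoch>: <tag> <value> ... event_str <raw message>"
    return "%d:  host %s event_str %s" % (epoch, socket.gethostname(), message)

def convert(line):
    # Step 2: extract the timestamp. This sketch assumes each line starts
    # with "YYYY-MM-DD HH:MM:SS " followed by the message text.
    stamp, message = line[:19], line[20:].rstrip("\n")
    epoch = int(time.mktime(time.strptime(stamp, "%Y-%m-%d %H:%M:%S")))
    return to_khika_line(epoch, message)

# Step 1 would normally read lines from your real data source; here we use
# a hard-coded sample line for illustration.
sample = "2019-05-29 10:15:42 application started"

# Step 3: write the Khika-formatted line to stdout.
sys.stdout.write(convert(sample) + "\n")
```

A real Adapter would replace the hard-coded sample with a loop over the new lines of its data source, as the examples below demonstrate.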


Khika executes the Adapter script after every 'Polling Interval' (or at a scheduled time). It is important that the Adapter script/program reads only the incremental data at each execution and does not re-read all the data every time. Consider an Adapter which is a shell script that reads lines from a text file: the script must know how many lines it read during the last execution and read only the next available lines, if any, during subsequent executions.

We explain this with an example below. Open the demo.sh script installed in the $KHIKA_ROOT/Adapters directory:


     1 #!/bin/bash
     2 if [ -e /home/KHIKA/Adapters/out.txt ]
     3 then
     4         line_already_read=`cat /home/KHIKA/Adapters/out.txt`
     5         no_of_lines=`wc -l /home/KHIKA/Adapters/demo.txt|awk '{print $1}'`
     6         lines_to_read=$(($no_of_lines - $line_already_read))
     7         echo `date` " : " $lines_to_read>>/home/KHIKA/Adapters/log.txt
     8         tail -n $lines_to_read /home/KHIKA/Adapters/demo.txt|awk '{printf("%d ", $1);for(i=2;i<=NF;++i){printf("%s ", $i);} printf("\n");}'
     9 else
    10         lines_to_read=`wc -l /home/KHIKA/Adapters/demo.txt|awk '{print $1}'`
    11         echo `date` " : " $lines_to_read>> /home/KHIKA/Adapters/log.txt
    12         head -n $lines_to_read /home/KHIKA/Adapters/demo.txt|awk '{printf("%d ", $1);for(i=2;i<=NF;++i){printf("%s ", $i);} printf("\n");}'
    13 fi
    14 wc -l /home/KHIKA/Adapters/demo.txt| awk '{print $1}' > /home/KHIKA/Adapters/out.txt


The script checks for the out.txt file in a certain directory. This file is used to keep a record of the total number of lines read so far. If you are executing the script for the first time, out.txt won't exist, so the else part on line 9 executes. Here we use the wc -l command (line 10) to find the number of lines in demo.txt, our source file, and store it in the variable lines_to_read. We log an info message on line 11. On line 12 we read lines_to_read lines from the top of the source file using the head command. Line 13 is the Unix shell syntax that closes the if block. On line 14 we count the lines in the source file using wc -l and store the result in out.txt, which we consult on every execution. The script ends here. During the next execution, we find the out.txt file (line 2) and read how many lines we read last time (line 4), storing it in the variable line_already_read. On line 5 we count the total lines in the source file demo.txt using wc -l and store the result in the variable no_of_lines. On line 6 we take the difference between the number of lines in the file right now (no_of_lines) and the lines read up to the previous execution (line_already_read); if the file was appended to during the polling interval, the difference is a positive number (lines_to_read). On line 7 we log an informative message, and on line 8 we read exactly lines_to_read lines from the end of the file using the tail command.
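The bookkeeping idea in demo.sh can be exercised in isolation with a small self-contained variant. The paths and file names here are illustrative (temporary files, not the ones the real Adapter uses), and the Khika-format conversion is omitted to focus on the incremental-read logic:

```shell
#!/bin/bash
# Self-contained demonstration of the "read only new lines" pattern.
SRC=/tmp/adapter_demo_src.txt
STATE=/tmp/adapter_demo_state.txt

printf 'one\ntwo\nthree\n' > "$SRC"
rm -f "$STATE"

read_new_lines() {
    if [ -e "$STATE" ]; then
        already=$(cat "$STATE")          # lines read in previous runs
        total=$(wc -l < "$SRC")          # lines in the file right now
        tail -n $((total - already)) "$SRC"
    else
        cat "$SRC"                       # first run: read everything
    fi
    wc -l < "$SRC" > "$STATE"            # remember how far we got
}

read_new_lines          # first run: prints all three lines
printf 'four\n' >> "$SRC"
read_new_lines          # second run: prints only the new line, "four"
```

Each invocation stands in for one execution of the Adapter at the polling interval; only the lines appended since the previous invocation are emitted.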

You may have observed that this script reads a simple text file and writes the messages to stdout (using the head and tail commands of Unix). It stores the number of lines read (in a file) so that the next execution can consult it and read only the appended portion of the file. It does not perform any epoch time conversion because the source file (demo.txt) already has the timestamp in epoch format. This is an unlikely case, as most applications log the timestamp in a human-readable format, which has to be converted to epoch time to conform to the Khika format. In the next example, we read the /var/log/messages file (in the syslog format) using a Python script and explain how to convert a human-readable timestamp into epoch time. In the $KHIKA_ROOT/Adapters directory, open the file read_syslog.py:


    10 import time
    11 import socket
    12
    13 input = "/var/log/messages"
    14 #Change this path to Khika's Adapter directory or some safe location
    15 meta_data = "/tmp/metadata.txt"
    16
    17 try:
    18         lines_read = int(open(meta_data, 'r').read())
    19 except:
    20         lines_read = -1
    21
    22 if (lines_read == -1):
    23         skip_lines = lines_read = 0
    24 else:
    25         skip_lines = lines_read
    26
    27 count = 0
    28 with open(input, "r") as f:
    29         for line in f:
    30                 if ( count>= skip_lines):
    31                         split_line = line.split()
    32                         month = split_line[0]
    33                         date = split_line[1]
    34                         year = str(time.gmtime().tm_year)
    35                         timestamp = split_line[2]
    36                         hours = timestamp.split(':')[0]
    37                         minutes = timestamp.split(':')[1]
    38                         seconds = timestamp.split(':')[2]
    39                         MyStr = year + " " + month + " " + date + " " + hours + " " + minutes + " " + seconds
    40                         s = time.strptime(MyStr, "%Y %b %d %H %M %S")
    41                         epoch_time = str(int(time.mktime(s)))
    42                         #Write the output to stdout in Khika format
    43                         print epoch_time+":  host "+socket.gethostname()+"  file:/var/log/messages event_str " ," ".join(split_line[3:])
    44                         lines_read += 1
    45                 else:
    46                         count += 1
    47                         continue
    48
    49 open(meta_data, 'w').write(str(lines_read))


The core logic of read_syslog.py is more or less the same as that of demo.sh explained earlier. It reads the syslog-format file (/var/log/messages), stores the number of lines read in /tmp/metadata.txt, which it consults at each execution, skips the already-read lines and reads only the incremental data. It parses each line to extract the date field, converts the line into the Khika data format and writes the output to stdout. The important step is the parsing of the data, i.e. lines 31 to 43. Before actually writing this simple parser, we had a look at the data by opening the /var/log/messages file to understand its format. We observed the following:

* Each line has a timestamp at the beginning.
* The format of the timestamp is consistent across the file.
* The date field, which is at the beginning of each line, has a peculiar format: "MM DD Hours:Minutes:Seconds".
* Importantly, the year is missing from the timestamp.

This much information is good enough to write the parser. This is how we wrote it:

* We split each line into a list of words, using whitespace as the separator (this works for most log files).
* We know that the first word in the line is the month, the second is the date and the third is the timestamp, which we further split using ':' as the separator. This gives us the month, date, hours, minutes and seconds; all the fields but the year, which we populate with the value of the current year. (Note: Python indexes start at 0.)
* On lines 39, 40 and 41 we use simple Python time library functions to convert this date into epoch time. Please refer to the documentation of the strptime() and mktime() library functions, which we have used here, at http://docs.python.org/2/library/time.html. If you use another scripting/programming language such as Perl, Java, Ruby, C or C++, you should also have access to standard time functions, as all popular languages provide a rich time library.
* The rest of the message remains the same.
* Finally, on line 43, we print the message in the Khika Data format on stdout.
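As a quick standalone illustration of the conversion described above, the following snippet turns a syslog-style timestamp (which lacks a year) into epoch time. The sample log line is invented for illustration; the field positions match the /var/log/messages format parsed by read_syslog.py.

```python
import time

# A syslog-style line invented for illustration.
line = "May 29 10:15:42 myhost sshd[1234]: Accepted password for root"

fields = line.split()
month, day, clock = fields[0], fields[1], fields[2]

# Syslog timestamps omit the year, so borrow the current year.
year = str(time.gmtime().tm_year)

# Reassemble the pieces and parse them with strptime().
parsed = time.strptime("%s %s %s %s" % (year, month, day, clock),
                       "%Y %b %d %H:%M:%S")
epoch = int(time.mktime(parsed))  # seconds since the epoch, local timezone

print("%d:  event_str %s" % (epoch, " ".join(fields[3:])))
```

Note that mktime() interprets the parsed time in the machine's local timezone, which matches how syslog writes its timestamps on a typical installation.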

Note: Needless to mention, the account executing the 'Adapter Script' must have read permission on the /var/log/messages file. If you need any help writing Adapter scripts, please write to us at info@khika.com.