Difference between revisions of "Define your own enrichment"

From khika

Revision as of 06:28, 12 June 2019

Introduction

Enrichment, as the word suggests, adds context to streaming data. At its simplest, you can enrich data at run time by referring to a CSV file. Some examples:

  • A CSV file can contain inventory information (such as computer name, location, owner, service tag, etc.) with the computer name as the primary key. You can use this information to enrich Windows AD logs, adding context to login information as login logs are captured in AD.
  • If you have a CSV database of IP addresses with bad reputation, where the IP address is the primary key and country, city, reputation, etc. are the other columns, you can correlate it with streaming firewall/proxy/WAF logs to enrich any communication with bad IPs, because those logs contain source and/or destination IP addresses that can be used for the lookup.

There could be several more examples of using static CSV-based enrichment. You can change these CSV files dynamically (periodically or on events), and KHIKA will consume the changes immediately, in real time.
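The idea behind CSV-based enrichment can be sketched in a few lines of Python. This is only an illustration of a primary-key lookup, not KHIKA's implementation: the inventory rows, the column names, and the helper names `load_lookup` and `enrich` are all hypothetical.

```python
import csv
import io

# Hypothetical inventory lookup; the rows and column names are illustrative only.
INVENTORY_CSV = """computer_name,location,owner,service_tag
WS-0142,Pune,asmith,5XK2J93
WS-0207,Mumbai,rpatel,7QH4M21
"""

def load_lookup(csv_text, key_column):
    """Index the CSV rows by the primary-key column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row for row in reader}

def enrich(event, lookup, event_field, key_column):
    """Return a copy of the event with the matching row's columns added."""
    enriched = dict(event)
    row = lookup.get(event.get(event_field))
    if row is not None:
        for column, value in row.items():
            if column != key_column:
                enriched[column] = value
    return enriched

lookup = load_lookup(INVENTORY_CSV, "computer_name")
# Hypothetical login event; "workstation" is an assumed field name.
login_event = {"event": "login", "user": "asmith", "workstation": "WS-0142"}
enriched_event = enrich(login_event, lookup, "workstation", "computer_name")
print(enriched_event)
```

An event whose key has no row in the lookup is returned unchanged, so enrichment never drops data.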

A more advanced capability is KHIKA's ability to build the CSV database from a streaming data source and use it to enrich another data source. With it, you can correlate, or stitch together, logs from different data sources at run time, provided they have a field in common. Some examples:

  • We can build an IP-to-username database at run time from AD logs, with the IP address as the primary key. This database can then be referred to in Linux login logs to enrich them with the AD user, because Linux logs contain the IP address of the login workstation but not the AD username. (Linux usernames are different from AD usernames.)
  • We can extract the session ID and IP address from web logs, with the session ID as the primary key, and use them to add the client IP address to application logs, which contain the session ID but not the client's IP address.

Let us walk through some examples, starting with simple enrichment using static CSV files.

Example of CSV Based Enrichment

Please refer to the section Data Enrichment in KHIKA to understand how CSV-based enrichment works in KHIKA.

Example of Building and Referring Dynamic Enrichment

You can build any number of primary-key based tables from any streaming KHIKA data source by selecting any field as the primary key. This key can have values associated with it from any fields present in the records. This database can then be referred to by any streaming KHIKA data source: the key is matched against a selected field, and when a match is found, values from the database are used to enrich the message in real time. This is KHIKA's ability to correlate logs across data sources in real time.

In the example below, we'll build a database using Windows AD login logs. The primary key will be the IP address of the workstation from which the login happens, and the value will be the username (WindowsUser). This way, we build the workstation-IP-to-WindowsUser database. This information will then be referred to in the Linux logs: we take the IP address from the Linux login message and match it against the database to fetch the Windows user, which is added to the Linux login message. This gives us the AD user actually logging in to Linux, which is a useful log correlation. Let's see how to build this in KHIKA.
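Before going through the GUI steps, the build-and-refer flow can be pictured with a small Python sketch. It is a conceptual illustration under stated assumptions, not KHIKA's internals: the function names, the `src_ip` field of the Linux event, and the "latest login wins" rule are hypothetical, while `tl_win_ip` and `tl_win_user` mirror the CSV header used in step 5.

```python
# Lookup table keyed by workstation IP (the primary key).
ip_to_user = {}

def build_from_windows_login(event):
    """Build step: record which AD user logged in from which workstation IP."""
    ip = event.get("tl_win_ip")
    user = event.get("tl_win_user")
    if ip and user:
        # Assumption: the most recent login from an IP overwrites older ones.
        ip_to_user[ip] = user

def refer_in_linux_login(event):
    """Refer step: add the AD user to a Linux login event when the IP matches."""
    enriched = dict(event)
    user = ip_to_user.get(event.get("src_ip"))  # "src_ip" is an assumed field name
    if user is not None:
        enriched["tl_win_user"] = user
    return enriched

# Hypothetical events for illustration.
build_from_windows_login({"tl_win_ip": "10.0.0.5", "tl_win_user": "jdoe"})
linux_event = {"src_ip": "10.0.0.5", "linux_user": "root", "action": "ssh_login"}
print(refer_in_linux_login(linux_event))
```

The two functions share only the lookup table, which is why the two data sources need nothing in common except the key field.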

1 Log in to the KHIKA GUI as a customer (you must be an admin of the customer)

2 Click "Configure" from the side menu and click "Enrichment Rules"

<<image>>

3 Click "Manage Lookup Database". We will first create the schema of the database

<<image>>

4 Click "Add Lookup"

<<image>>

5 Now, on your local computer, create a simple CSV file containing only a header row (no values, as values will be added dynamically from the streaming data). Add two columns separated by a comma, as shown:

     tl_win_ip,tl_win_user

Save the file with the name "IP_to_User_Mapping_Lookup.csv" and close it.

6 On the KHIKA Web GUI, upload the file named "IP_to_User_Mapping_Lookup.csv"

<<image>>

7 The uploaded file should be visible in the "Enrichment Lookup Database" screen. You can also search for it by typing the name IP_to_User_Mapping_Lookup in the search box.
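If you prefer to script the creation of the header-only file from step 5 rather than typing it in an editor, a minimal Python sketch could look like this. The file name and column names come from step 5; the helper name `write_header_only_csv` is hypothetical, and the Unix-style line ending is an assumption rather than a documented KHIKA requirement.

```python
import csv

# File name and column names are taken from step 5.
LOOKUP_FILE = "IP_to_User_Mapping_Lookup.csv"
COLUMNS = ["tl_win_ip", "tl_win_user"]

def write_header_only_csv(path, columns):
    """Write a CSV file that contains just the header row and no data rows."""
    with open(path, "w", newline="") as handle:
        # Assumption: a plain "\n" line ending is acceptable to the uploader.
        csv.writer(handle, lineterminator="\n").writerow(columns)

write_header_only_csv(LOOKUP_FILE, COLUMNS)
```

The resulting file holds the single line `tl_win_ip,tl_win_user` and can be uploaded as described in step 6.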