**********************************************************
Overview
**********************************************************
This tool extracts flow metadata from a pcap file or a
network interface, and logs it in a CSV file.

**********************************************************
Installation
**********************************************************
make:         build the application
make install: install the application in the src/bin directory

Notes:

* the build is dynamic by default.
  Set STATIC=1 for a static build.
  Set DEBUG=1 to include debug information.

* the configuration file must be copied to the directory
  where the tool executable file is installed.

**********************************************************
License
**********************************************************
This tool requires the ixEngine SDK to be licensed with
a valid "METADATA"-level license.

**********************************************************
Configuration file and output
**********************************************************
By default, this tool generates the following CSV log
file in the current directory:
'output.csv'

The tool reads configuration from the following file,
expected to be in the same folder as the tool executable:
'pcap_logger.cfg'

**********************************************************
Usage
**********************************************************
To display the full usage, run the tool without arguments: "./pcap_logger".

If needed, set the LD_LIBRARY_PATH variable as follows, from this directory:
$ export LD_LIBRARY_PATH=../../../lib

Command-line interface:
$ ./pcap_logger [options] [pcap(s)|interface]

**********************************************************
Examples
**********************************************************
(1) run pcap_logger on pcap files
$ ./pcap_logger pcap1.pcap pcap2.pcap pcap3.pcap [...]

(2) run pcap_logger on a live network interface (as root)
# ./pcap_logger --live eth1

**********************************************************
CSV Format
**********************************************************
The CSV columns are, in order:

flow_id,
l4_proto_id,
top_proto_id,
flow_duration,
volume_of_pkts_cts,
volume_of_pkts_stc,
first_pkt_number,
first_pkt_timestamp,
pkt_count_cts,
pkt_count_stc,
<attributes>

Notes:

'flow_duration' and 'first_pkt_timestamp' fields are given in SECONDS.

'top_proto_id' is the LAST protocol ID discovered by the ixEngine
  classification path on this flow.

<attributes> is a ','-separated list of ixEngine attributes found in the
  flow. When several values of the same attribute are raised in a flow
  (e.g. several common names), these values are separated by ';'
  within the same CSV column. The <attributes> column list is defined in
  the '[attributes]' section of the configuration file.
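As an illustration of these ';'-joined attribute columns (which are also written between double quotes), the C sketch below builds one column from several values. The name `build_attribute_column` is hypothetical, not part of pcap_logger or the ixEngine API, and the sketch assumes the values were already escaped according to the data formatting rules described later in this file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: joins n already-escaped attribute values with ';'
 * and encloses the result in double quotes, e.g. {"a.com", "b.com"}
 * gives "a.com;b.com" (quotes included). Returns 0 on success, or -1 if
 * out (out_len bytes) is too small. */
int build_attribute_column(char *out, size_t out_len,
                           const char *const *values, size_t n)
{
    size_t pos = 0;

    if (out_len < 3)            /* room for the two quotes and the NUL */
        return -1;
    out[pos++] = '"';
    for (size_t i = 0; i < n; i++) {
        size_t vlen = strlen(values[i]);
        size_t sep = (i > 0) ? 1 : 0;

        /* separator + value + closing quote + NUL must still fit */
        if (pos + sep + vlen + 2 > out_len)
            return -1;
        if (sep)
            out[pos++] = ';';
        memcpy(out + pos, values[i], vlen);
        pos += vlen;
    }
    out[pos++] = '"';
    out[pos] = '\0';
    return 0;
}
```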

The '<attributes>' column values are enclosed in "double quotes".
Other columns are numeric values (unsigned integers), except when the
"--print-names" command line option is used.

A typical <attributes> column list looks like:

ssl:common_name,
ssl:server_name,
quic:server_name,
http:server,
http:user_agent,
[...]
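Because attribute columns are quoted and their values may themselves contain commas, a consumer of this CSV should not split a row on every ','. The C sketch below (a hypothetical `split_csv_line`, not part of the tool) splits a row in place and keeps commas inside quoted columns. It assumes, based on the formatting rules in this file, that a '"' never occurs inside a value because it is written as \x22.

```c
#include <stddef.h>

/* Hypothetical helper: splits one CSV row in place into at most
 * max_fields fields and returns how many were found. A field starting
 * with '"' runs up to the next '"', so commas inside quoted attribute
 * columns are kept; the enclosing quotes are removed. */
size_t split_csv_line(char *line, char **fields, size_t max_fields)
{
    size_t count = 0;
    char *p = line;

    while (count < max_fields) {
        if (*p == '"') {                    /* quoted attribute column */
            char *start = ++p;
            while (*p != '\0' && *p != '"')
                p++;
            fields[count++] = start;
            if (*p == '"')
                *p++ = '\0';                /* drop the closing quote */
        } else {                            /* unquoted numeric column */
            fields[count++] = p;
            while (*p != '\0' && *p != ',' && *p != '\r' && *p != '\n')
                p++;
        }
        if (*p != ',') {                    /* end of the row */
            *p = '\0';                      /* also strips a trailing newline */
            break;
        }
        *p++ = '\0';                        /* terminate field, skip the ',' */
    }
    return count;
}
```

For example, the row `1,6,"a,b;c"` yields the three fields `1`, `6` and `a,b;c`.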

Special <attributes>:

 'ip:resolv_name' is a domain name resolved by a DNS query found in the
  capture, whose resolved IP address matches this flow's destination IP
  address.

 'udp:stream' holds the first 50 UDP payload bytes of a flow whose L7
  protocol is 'unknown'. The first value of this column holds the
  Client-to-Server bytes, the second value the Server-to-Client bytes.

 'tcp:stream' holds the first 50 TCP payload bytes of a flow whose L7
  protocol is 'unknown'; it is only written when the TCP handshake was
  seen. The first value of this column holds the Client-to-Server bytes,
  the second value the Server-to-Client bytes.
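The per-direction storage of these 50-byte streams could be sketched as below. `struct stream_capture` and `stream_capture_add` are hypothetical names, and the sketch assumes the 50 bytes are accumulated across successive packets of each direction, which this file does not state.

```c
#include <stddef.h>
#include <string.h>

#define STREAM_MAX_BYTES 50   /* "the first 50 payload bytes" */

/* Hypothetical per-flow storage: index 0 holds Client-to-Server bytes,
 * index 1 holds Server-to-Client bytes. Zero it before first use. */
struct stream_capture {
    unsigned char data[2][STREAM_MAX_BYTES];
    size_t len[2];
};

/* Appends payload bytes of one direction (dir must be 0 or 1) until
 * STREAM_MAX_BYTES are stored; any further bytes are ignored.
 * ASSUMPTION: bytes are accumulated across successive packets. */
void stream_capture_add(struct stream_capture *sc, int dir,
                        const unsigned char *payload, size_t n)
{
    size_t room = STREAM_MAX_BYTES - sc->len[dir];
    size_t take = (n < room) ? n : room;

    memcpy(sc->data[dir] + sc->len[dir], payload, take);
    sc->len[dir] += take;
}
```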

Attribute column data formatting:

Payload bytes are output in a valid C hex-escaped character array format:
- printable ASCII characters are kept:    0x61 (a) --> a
- non printable characters are formatted: 0x0D     --> \x0d
- CSV string separator gets formatted:    0x22 (") --> \x22
- '\0' byte gets the shortened format:    0x00     --> \0
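The four formatting rules above can be expressed as a small C encoder. `escape_payload` is a hypothetical name; the rules do not say how a literal backslash (0x5C) is written, so this sketch keeps it unchanged as a printable character.

```c
#include <stddef.h>

/* Hypothetical encoder for the rules above. out must hold at least
 * 4 * n + 1 bytes (worst case: every byte becomes "\xhh"). */
void escape_payload(const unsigned char *in, size_t n, char *out)
{
    static const char hex[] = "0123456789abcdef";
    size_t pos = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];

        if (c == 0x00) {                           /* 0x00 --> \0 */
            out[pos++] = '\\';
            out[pos++] = '0';
        } else if (c == '"' || c < 0x20 || c > 0x7e) {
            /* CSV string separator or non-printable --> \xhh (lowercase) */
            out[pos++] = '\\';
            out[pos++] = 'x';
            out[pos++] = hex[c >> 4];
            out[pos++] = hex[c & 0x0f];
        } else {                                   /* printable ASCII kept */
            out[pos++] = (char)c;
        }
    }
    out[pos] = '\0';
}
```

With the input bytes 'a', 0x0D, 0x22, 0x00, the output is `a\x0d\x22\0`.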

**********************************************************
LOG files archiving and rotation
**********************************************************
The '--rotate' command line option activates the LOG file splitting and
archiving mechanism built into this application.

The user defines a threshold using a time-based or size-based unit.

Units are: - duration units:    m: minute(s)
                                h: hour(s)
                                d: day(s)
                                w: week(s)
           - file size units:   B: byte(s)
                               kB: kilobyte(s)
                               MB: megabyte(s)
                               GB: gigabyte(s)
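A '--rotate' argument such as "100MB" pairs a number with one of the units above. The C sketch below parses it into seconds or bytes. `parse_rotate_threshold` is hypothetical, and this file does not say whether kB/MB/GB are powers of 1024 or of 1000, so the 1024 multiplier is an assumption.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum rotate_kind { ROTATE_TIME, ROTATE_SIZE };

struct rotate_threshold {
    enum rotate_kind kind;
    uint64_t value;             /* seconds (TIME) or bytes (SIZE) */
};

/* Hypothetical parser for "<number><unit>", e.g. "100MB" or "12h".
 * Returns 0 on success, -1 on a missing number or an unknown unit.
 * ASSUMPTION: kB/MB/GB are powers of 1024; overflow is not checked. */
int parse_rotate_threshold(const char *arg, struct rotate_threshold *t)
{
    static const struct {
        const char *unit;
        enum rotate_kind kind;
        uint64_t mult;
    } units[] = {
        { "m",  ROTATE_TIME, 60 },
        { "h",  ROTATE_TIME, 3600 },
        { "d",  ROTATE_TIME, 86400 },
        { "w",  ROTATE_TIME, 604800 },
        { "B",  ROTATE_SIZE, 1 },
        { "kB", ROTATE_SIZE, 1024ULL },
        { "MB", ROTATE_SIZE, 1024ULL * 1024 },
        { "GB", ROTATE_SIZE, 1024ULL * 1024 * 1024 },
    };
    char *end;
    unsigned long long n;

    if (*arg < '0' || *arg > '9')
        return -1;              /* must start with a digit */
    n = strtoull(arg, &end, 10);
    for (size_t i = 0; i < sizeof units / sizeof units[0]; i++) {
        if (strcmp(end, units[i].unit) == 0) {
            t->kind = units[i].kind;
            t->value = (uint64_t)n * units[i].mult;
            return 0;
        }
    }
    return -1;                  /* unknown or missing unit */
}
```

Exact string comparison keeps "m" (minutes) distinct from "MB" (megabytes).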

Each time the configured threshold is reached, the current LOG <file>
(default: './output.csv') is closed and moved to <file>-<YYYYMMDD>-<NNN>.
Then a new LOG <file> is opened, and the logging continues in it.

<YYYYMMDD> is the current date.
<NNN> is the LOG file index: it is incremented at each rotation and reset
to '000' when the date changes.
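The naming and index rules can be sketched in C as follows, using the "<file>-<YYYYMMDD>-<NNN>" form shown in the 'ls' example. `archive_name`, `archive_next_index` and `struct archive_state` are hypothetical helpers, not the tool's actual code.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: writes "<file>-<YYYYMMDD>-<NNN>" into out, e.g.
 * "mylog-20161129-003". Returns the snprintf() result, or -1 if the date
 * cannot be formatted. */
int archive_name(char *out, size_t len, const char *file,
                 const struct tm *day, unsigned idx)
{
    char date[9];               /* YYYYMMDD + NUL */

    if (strftime(date, sizeof date, "%Y%m%d", day) == 0)
        return -1;
    return snprintf(out, len, "%s-%s-%03u", file, date, idx);
}

/* Hypothetical <NNN> bookkeeping: initialize year and yday to -1. */
struct archive_state {
    int year, yday;             /* date of the last archived file */
    unsigned next_idx;
};

/* Returns the index for the next archive: incremented at each rotation
 * and reset to 0 when the date changes. */
unsigned archive_next_index(struct archive_state *s, const struct tm *now)
{
    if (now->tm_year != s->year || now->tm_yday != s->yday) {
        s->year = now->tm_year;
        s->yday = now->tm_yday;
        s->next_idx = 0;
    }
    return s->next_idx++;
}
```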

Example:
$ ./pcap_logger --csv mylog --rotate 100MB [...]&
$ ls -1 mylog*
mylog
mylog-20161129-000
mylog-20161129-001
mylog-20161129-002
mylog-20161129-003
mylog-20161129-004
mylog-20161129-005
mylog-20161130-000
mylog-20161130-001
mylog-20161130-002
mylog-20161130-003
[...]

**********************************************************
DNS Caching
**********************************************************
By default, the tool performs an OFFLINE DNS lookup of all flows,
using DNS Response packets found inside the network capture.

When matched, a DNS-resolved flow gets the following attribute
added to its attribute list: 'ip:resolv_name'.

The user can disable DNS caching by removing or commenting out the following
metadata entry in the '[attributes]' section of the configuration file:
'ip:resolv_name'
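One way to implement such an offline DNS cache is a table keyed by the IP addresses found in DNS Response answers and queried with each flow's destination IP address. The C sketch below is hypothetical (IPv4 only, fixed-size direct-mapped table where a colliding answer replaces the previous one); the tool's actual data structure is not described in this file.

```c
#include <stdint.h>
#include <stdio.h>

#define DNS_CACHE_SLOTS 256     /* hypothetical capacity */

/* Hypothetical offline DNS cache entry: IPv4 address (host byte order)
 * taken from a DNS Response answer, mapped to the queried domain name. */
struct dns_entry {
    int used;
    uint32_t ip;
    char name[256];
};

struct dns_cache {
    struct dns_entry slots[DNS_CACHE_SLOTS];
};

static size_t dns_slot(uint32_t ip)
{
    return (size_t)((ip * 2654435761u) % DNS_CACHE_SLOTS);
}

/* Records one answer; an entry already using the same slot is replaced. */
void dns_cache_add(struct dns_cache *c, uint32_t ip, const char *name)
{
    struct dns_entry *e = &c->slots[dns_slot(ip)];

    e->used = 1;
    e->ip = ip;
    snprintf(e->name, sizeof e->name, "%s", name);
}

/* Returns the domain name resolved to a flow's destination IP, or NULL. */
const char *dns_cache_lookup(const struct dns_cache *c, uint32_t dst_ip)
{
    const struct dns_entry *e = &c->slots[dns_slot(dst_ip)];

    return (e->used && e->ip == dst_ip) ? e->name : NULL;
}
```

A real cache would probably handle IPv6 answers and collisions more carefully; the direct-mapped table only keeps the sketch short.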

**********************************************************
The configuration file
**********************************************************
This file contains several sections:

(1) The '[attributes]' section
**********************************************************
This section contains an ixEngine attribute WHITELIST used for
metadata extraction and logging.

Each attribute defined in this section generates a CSV column.

Note: this configuration file is optional.
If not provided, the application will not write any protocol metadata
to the CSV output, but only flow classification status and metrics.

Line format is:
<protocol>:<attribute>
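A minimal C sketch of reading one '[attributes]' line, assuming the caller has already skipped section headers, blank lines and comments (the comment syntax is not specified in this file). `parse_attribute_line` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits an '[attributes]' line of the form
 * "<protocol>:<attribute>" in place. Returns 0 on success, or -1 if the
 * line does not contain exactly one ':' with text on both sides. */
int parse_attribute_line(char *line, char **proto, char **attr)
{
    char *colon;

    line[strcspn(line, "\r\n")] = '\0';     /* strip the end of line */
    colon = strchr(line, ':');
    if (colon == NULL || colon == line || colon[1] == '\0'
        || strchr(colon + 1, ':') != NULL)
        return -1;
    *colon = '\0';
    *proto = line;
    *attr = colon + 1;
    return 0;
}
```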

(2) The '[exclude]' section
**********************************************************
This section lists attribute values that should not be logged,
because they are generic and/or related to mainstream client/server
applications.

The attribute values listed in this section can contain the '*' wildcard,
which turns them into PREFIX and/or SUFFIX PATTERNs.

Attribute value PATTERNs are case sensitive.

The valid '*' positions in PATTERNs are:
- no '*' at start or end: EXACT matching is required,
- '*' at the start of the pattern: SUFFIX matching,
- '*' at the end of the pattern: PREFIX matching,
- '*' both at start and end: ANY POSITION matching.

Line format is:
<protocol>:<attribute>:<pattern>

Examples:
1. Adding the "http:user_agent:Mozilla*" line makes pcap_logger discard
any HTTP User-Agent ixEngine metadata starting with "Mozilla", e.g. "Mozilla/5.0".
2. Adding the "ip:resolv_name:*akamai*" line makes pcap_logger discard
any DNS-resolved name ('ip:resolv_name') containing the "akamai" string.
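The four '*' positions listed above can be implemented as below. `exclude_pattern_match` is a hypothetical helper that only compares one value against the <pattern> part of an '[exclude]' line; splitting the line into <protocol>, <attribute> and <pattern> is left out.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: tests one attribute value against the <pattern>
 * of an '[exclude]' line. Matching is case sensitive; only a leading
 * and/or trailing '*' is treated as a wildcard. */
bool exclude_pattern_match(const char *pattern, const char *value)
{
    size_t plen = strlen(pattern);
    size_t vlen = strlen(value);
    bool star_start = plen > 0 && pattern[0] == '*';
    bool star_end = plen > (star_start ? 1u : 0u) && pattern[plen - 1] == '*';
    const char *core = pattern + (star_start ? 1 : 0);
    size_t clen = plen - (star_start ? 1 : 0) - (star_end ? 1 : 0);

    if (clen > vlen)
        return false;
    if (star_start && star_end) {               /* ANY POSITION */
        for (size_t i = 0; i + clen <= vlen; i++)
            if (memcmp(value + i, core, clen) == 0)
                return true;
        return false;
    }
    if (star_start)                             /* SUFFIX */
        return memcmp(value + vlen - clen, core, clen) == 0;
    if (star_end)                               /* PREFIX */
        return memcmp(value, core, clen) == 0;
    return clen == vlen && memcmp(value, core, clen) == 0;   /* EXACT */
}
```

With the two example lines above, "Mozilla*" matches "Mozilla/5.0", and "*akamai*" matches "a248.e.akamai.net" but not "Akamai.net", since matching is case sensitive.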

