LoGS Blog
2008-02-29
0.2.0_release_party
I'm in the process of releasing LoGS 0.2.0 (thats like LoGS 4.7.5 under any sane versioning scheme ;P )
The proposed 0.2.0 branch has been merged back into the HEAD code.
0.2.0 contains a lot of SBCL improvements; in fact, SBCL is now my preferred platform for LoGS (despite some performance issues).
There have been a lot of "growing up" sorts of features added to LoGS for the 0.2.0 release. LoGS has an init script now and various things that support being able to run LoGS as a service on a production type machine!
2007-11-02
mo betta rdl action
Lets try that again...
We could write the rule like this and simplify our lives:
(defvar *ssh-scanner-rule*
(rule
matching "Invalid user (.*) from (.*)"
binding variables (user ip)
doing
#'handle-invalid-user-message))
I'm using the RDL to replace the find-invalid-user-message function - that saves me a lot of code!
SSH Scanning Jerks
I don't know no invalid users!
So, I'm working on putting together a new cluster at the CHPC. OK, well, actually what I'm up to is ripping apart an old cluster and putting it back together.

Anyway, there's a fun dance that DHCP and PXE do that is very easy to get wrong. I like to watch /var/log/messages while I'm installing machines to make sure that, yes, they picked up the right server, have the MAC adress I'm expecting and the like. Anyway, there have been something like 3 occasions in as many days where watching /var/log/messages is going fine, then, all of a sudden, I get a flood of login attempts via SSH from like china or god knows where... Anyway, its not somewhere that my users are. Of course, there is a .0001% chance they may be on sabbatical or something, so we can't just block the rest of the unknown universe at our border firewall.

So, after the third flood of invalid login attempts (NO, I DO NOT HAVE A USER NAMED 'webmin' - That should save you bad guys a little effort
), I got to thinking - Time for some LoGS action!Here, I'm going to try to document the creation of a small ruleset to catch these baddies. For this ruleset, I'll be using the 0.2.0 proposed branch of LoGS because thats still what I'm working on/with

My first take on this:
I think that if I see someone give me a whole boatload of this crap, we can do something about it:
Oct 28 17:03:11 l096 sshd[16651]: Invalid user michael from 62.123.192.212
Oct 28 17:03:35 l130 sshd[17500]: Invalid user rr from 62.123.192.212
Oct 28 17:03:24 l099 sshd[17188]: Invalid user michael from 62.123.192.212
Oct 28 17:03:23 l120 sshd[12610]: Invalid user alexandre from 62.123.192.212
Oct 28 17:03:48 l116 sshd[16923]: Invalid user alexandre from 62.123.192.212
Oct 28 17:03:25 l112 sshd[16801]: Invalid user alexandre from 62.123.192.212
Oct 28 17:03:28 l121 sshd[11007]: Invalid user transfer from 62.123.192.212
Oct 28 17:03:27 l125 sshd[16701]: Invalid user transfer from 62.123.192.212
Oct 28 17:03:21 l123 sshd[13014]: Invalid user transfer from 62.123.192.212
Oct 28 17:04:58 ll04 sshd[2866]: Invalid user alexandre from 62.123.192.212
Oct 28 17:03:27 l081 sshd[13585]: Invalid user mike from 62.123.192.212
Oct 28 17:03:29 l083 sshd[16865]: Invalid user mike from 62.123.192.212
(IPs not anonymized to implicate the guilty)
I'm noticing here that the same IP is trying multiple Invalid users. Most likely, my users wouldn't do this. It is possible to mistype your login on your ssh command line, but you probably would realize fairly quickly that you had done so. When I tried to log in as an invalid user, in this case 'jimistesting', I was given a password prompt as usual - for some reason all 3 passwords I entered were rejected by ssh... hmmm.... Here's my log message:
Oct 30 07:53:34 ll04 sshd[16472]: Invalid user jimistesting from 127.0.0.1
I'm thinking that if I keep track of which IPs from which I have see these 'Invalid user' login attempts and which invalid user they log in as, I should be able to distinguish someone mis-remembering her username and someone trying to break in. I'm thinking if you manage to try 3 different usernames, all invalid on my system, from the same IP address you can go into my block list on the firewall :)
Note: I'm actually disreguarding the host that the bad guy is trying to log into. 3 strikes and your out against any of my hosts listening on the SSH port.
First, we need a function to identify these "Invalid user" messages:
(defun find-invalid-user-message (message &optional env)
(declare (OPTIMIZE SPEED (DEBUG 0) (SAFETY 0)))
(multiple-value-bind (matches sub-matches)
(cl-ppcre::scan-to-strings
"Invalid user (.*) from (.*)"
(message message))
(when matches
(values t
`((user ,(aref sub-matches 0))
(ip ,(aref sub-matches 1)))))))
This function uses cl-ppcre to match a regular expression. It pulls out the username and the IP address from the log message and
returns them into the LoGS 'environment' associating them with 'user and 'ip respectively.
Now, we need to handle the messages we've found:
;; a table to hold hosts we've already reported as bad
(defvar *reported-bad-hash* (make-hash-table :test #'equal))
;; 3 minute window length
(defvar *ssh-scanner-window-length* (* *LoGS-internal-time-units-per-second* 3 60))
(defun handle-invalid-user-message (message &optional env &rest xxx)
(let ((window-name (format () "invalid user from ~A"
(get-logs-env-var 'ip env))))
(unless (gethash window-name *reported-bad-hash*)
(let ((window
(ensure-window
:window-length *ssh-scanner-window-length*
:max-lines 2
:name window-name
:actions (list #'report-ssh-scanner))))
(unless
(find-in-store (get-logs-env-var 'user env) window)
(add-item window (get-logs-env-var 'user env)))))))
Here we add the username to a WINDOW object named for the IP address that these messages came from, but only if that username isn't still in the window. This window will run its actions (we'll get to that in a sec...) when it gets more than 2 usernames in the window. this window will throw away usernames older than 3 minutes (*ssh-scanner-window-length*).
Finally, we need something for the window to do when it 'blows up' and runs its action(s):
(defun report-ssh-scanner (window &optional env)
(setf
(gethash (name window) *reported-bad-hash*) t)
(format t "window: ~A blew up~%"
(name window))
(map-store
(lambda (entry &optional level)
(format t "user: ~A~%" entry))
window))
Here we report the ssh scanner, if we have not already reported this IP. Then we print out a message with the window name, then list the usernames that are in the window.
Obviously, this is a really trivial response to this. Maybe we could have this shell out and run some iptables fu or something along those lines to automate a response to these scanners.
Finally, we tie this all together into a rule. Here I'm using "RDL" syntax:
Running it, I get output something like this:
(setf *ssh-scanner-rule*
(rule
matching
#'find-invalid-user-message
doing
#'handle-invalid-user-message))
(enqueue *root-ruleset* *ssh-scanner-rule*)
window: invalid user from 202.146.92.147 blew up
user: test
user: tester
user: testing
window: invalid user from 62.123.192.212 blew up
user: a
user: b
user: c
window: invalid user from 201.88.73.66 blew up
user: admin
user: stud
user: trash
Performance note:
On my laptop, I'm processing approximately 48,000 messages/second with just this rule in the ruleset.
2007-09-12
Bogus blog entry
2007-08-17
Hierarchical LoGS implementation
I started on implementation of a hierarchical LoGS setup. The basic idea is a N-tiered setup (two tiers for now :) where each, say, cluster has a local instance of LoGS running analyzing the interesting stuff for that machine. This local instance is responsible for reporting on/responding to issues that are local to that cluster. If it comes across anything that is globally interesting (maybe some port scanning activity on one or more nodes of the cluster) it is reported back to another instance of LoGS that is running somewhere to correlate the globally interesting events.
This is *VERY* work in progress right now.
A related side-note: One thing I've clearly learned thus-far is that LoGS needs to handle CTRL-C in some reasonable way. I've been using an instance of LoGS spawned over an ssh connection to a remote host as a Data Source for the instance of LoGS running on my desktop. When I ctrl-c on the desktop instance of LoGS, not only does it dump me into the debugger, but when I quit out of the debugger and exit Lisp, it invariably leaves the instance of LoGS running over the ssh connection running on the remote machine. Yuck!
I'm also realizing that the RDL stuff /really/ needs to be documented in the LoGS manual!!! OMG! I'd mostly forgotten how to use the RDL :)
I wonder if my little LoGS app could somehow be changed to have a network data source and distribute SBCL-based Linux binaries to others on campus that would read from their local log file(s) and would write to this network port to correlate these worms campus-wide.
In other words, I'd like to have something where I don't have to be granted access to a department's log server in order to get access to their ssh/whatever worm data.
I'm sure I could co-opt the CS log server to do the initial testing. George is nice. They have good sys-admins at CS. :)
Back to my current implementation....
Right now I'm looking for ssh invalid/failed logins in a global sense. The ruleset on the cluster (this is nano, for now) looks for cluster-specific issues, such as Torque problems which it reports to me. It also looks for these login problems and keeps track of how many times its seen a given username and ip address that have had problems logging in. It then reports this back to the instance of LoGS running on my desktop (right now, the desktop instance is spawning the cluster instance and the cluster instance is simply printing messages about the failed logins to stdout where they are read by the desktop instance. So that there is /something/ to correlate with, the desktop instance also watches its /var/log/messages file to look for failed login attempts there. In theory... it should be simple to add other clusters to this setup and correlate across all of them.
2007-02-19
2007-01-21
Proposed Change to the LoGS API
Changes proposed in the LoGS API for version 0.2.0.
Here is an example of how that worked (previous to 0.2.0):
(make-instanceAccording to The CL-Cookbook , each thread in a Lisp program will have its own bindings for dynamic variables. This may be a problem if LoGS wants to become multi-threaded (something I view as being highly probable). We probably won't be able to to do what we want under a multi-threaded model using dyamic variable bindings like this. Also, it seems to me that when referring to a variable, you likely want the most recent one from the stack of variable bindings in the LoGS "environment" (those variable bindings produced by match functions, etc. This means that either we must wait to evaluate all variables until the very last possible instant at run-time. Therefore, compilation buys us very little (I think). To remedy these problems, I'm proposing a separation between Lisp variables and those in the LoGS environment. Each of the match, delete, and action functions should now accept the parameter ENVIRONMENT. This parameter will contain a list of lists representing the stack of variable bindings. Each sub-list will have a CAR of the variable name and a CADR of the binding of that variable. I further propose adding an accessor function, GET-LoGS-ENV-VAR which will take the name of a variable and will return its value and a value indicating if the key was found (simular to CL's GETHASH). The previous rule would then be written as:
'rule
:match
(lambda (message)
(multiple-value-bind (matches sub-matches)
(cl-ppcre::scan-to-strings
"nfs server (\\S+) not responding"
(message message))
(when matches
(values
t
`((NFS-SERVER ,(aref sub-matches 0)))))))
:actions
(list
(lambda (message)
(declare (special NFS-SERVER) (ignore message))
;; NFS-SERVER is a dynamically bound variable
(format t "NFS problem on server: ~A~%" NFS-SERVER))))
(make-instance
'rule
:match
(lambda (message environment)
(declare (ignore environment))
(multiple-value-bind (matches sub-matches)
(cl-ppcre::scan-to-strings
"nfs server (\\S+) not responding"
(message message))
(when matches
(values
t
`((NFS-SERVER ,(aref sub-matches 0)))))))
:actions
(list
(lambda (message environment)
(declare (ignore message))
;; NFS-SERVER is no longer a dynamically bound variable
(format t "NFS problem on server: ~A~%"
(GET-LoGS-ENV-VAR 'NFS-SERVER))))
2006-10-19
Time issues with "offline mode" (of any tool)
Discussion of some thoughts John and I have come up with reguarding offline functioning of LoGS or SEC or ...
- Single-host time continuity problems
- Multi-host time continuity problems
- Too low resolution for time
- Too high resolution for time + synchronization problems
Single Host Time Continutity Problems
A single host may appear to travel forwards, backwards, or more quickly/slowly in time in a logfile if the system clock was altered while the log was generated.
Multi-host Time Continuity problems
Multiple hosts without perfect time synchronization may cause time to appear to go forwards/backwards often when analyzing logfiles from two or more hosts.
Too low resolution problems
When using standard syslogd, the resolution on the timestamps is one second. This can lead to ordering problems when working with multiple files.
Too high resolution problems
Because time synchronization cannot be perfect, when working with really high resolution timestamps in files, it may be possible to order events incorrectly based solely on their timestamps.
2006-09-28
Supporting Times
Offline Mode/Parsing Timestamps
Supporting working with time, especially for "offline mode" seems important.
One should be able to (painlessly) tell LoGS to be in "offline mode". Offline mode seems to imply at least:
- Not using the internal-time
- Parse the timestamps on the log messages; use them to set *now*
- *LoGS-INTERNAL-TIME-UNITS-PER-SECOND* should be set to the resolution of 1
(In "online mode", this is the value of INTERNAL-TIME-UNITS-PER-SECOND or 100 under CMUCL).
In order to support this, we introduced the --parse-timestamp, --timestamp-start, and --timestamp-end flags.
--parse-timestamp causes LoGS to not use the internal-time and, instead, to set *now* by parsing the timestamp on the line. I'm not exactly sure what will happen when you parse a syslog file from, say leap day, when this year doesn't have a leap day! LoGS will probably die a flaming death and take you with it!
--timestamp-start causes LoGS to look at this position in the line for the start of the timestamp. This sets the internal variable *timestamp-start*. The default value of *timestamp-start* is 0.
--timestamp-end causes LoGS to use this position in the line for the end of the timestamp. This sets the internal variable *timestamp-end*. The default value of *timestamp-end* is 15.
Time formats:
Right now, LoGS understands only the standard syslog time format. This format is:
Mon Dt HH:MM:SS
eg:
Jun 9 12:34:56
2006-09-21
Working with Time
LoGS needs to be able to talk about times easily, at times.
LoGS, of course, already has facilities for doing things at certain times. The problem is that these times have to be specified in *LoGS-INTERNAL-TIME-UNITS-PER-SECOND* since the start of the Lisp process; isn't that convienant?!
What we would like to be able to say is something like: "write me a report on all the invalid ssh connections AT MIDNIGHT" or "... AT 12:01 PM"
While CMUCL has facilities for parsing dates that seem rather robust, these facilities are not portable. I found that Gene Michael Stover has created a library (I'm not quite sure of its name exactly!) that can be found here: http://cybertiggyr.com/gene/pdl/
I've started to do a little work to make this library support the date formats that I would like to see (such as a simple "12:34:56" to mean 12:34:56 AM today and "12:34:56 tomorrow" to mean, obviously, 12:34:56 AM tomorrow). These are the sorts of times I want to be able to use in my LoGS rulesets.
I've submitted a patch to Mr. Stover to support a couple of these formats. I hope that he will accept them so that I don't have to maintain a bunch of time parsing stuff inside of LoGS.
LoGS 0.1.2 released 9/22/2006
LoGS 0.1.2 released
