Tuesday, May 22, 2007

Oh God!

Oh god... am I like that? (don't answer!)

You are javascript. People often think you are somebody else.  You tend to be annoying to most people, but it's not your fault.  You jsut get used.
Which Programming Language are You?

Saturday, May 12, 2007

Food for thought

Peter Lin is doing quite a few posts lately sharing his thoughts about temporal logic implementation in the last few days. Thank you Peter!

In this post, he makes a point in the AI x Business Rules scenarios and I agree with him. Although, the most fun is the AI side, the Business Rules side is the one that gets more practice and real use out there. Can we attend to both? :)

Anyway, he also talks about the retraction of facts from the network. The approach for doing it is not clear for me either, but I agree with him that the polling is complex and probably a waste of resources. I was thinking if it is possible to have a kind of registry, where facts registry themselves for removal at a specific time and this way we avoid the polling? Kind of what happens in simulation systems where the events are recorded in a time line in a way that the clock does not need to simulate every single instant in the time line... the clock can simply go from one event to other, since between events, there is no state change.

But that brings us to the most difficult design decision for me in the beginning of this research: how to model the clock?

Maybe Peter can blog about it? :)
Basically my thoughts and questions are:

* one can make the engine work with the either the real system clock or a pseudo clock implementation.

* using the real clock seems wrong to me, as the engine has no control over it. So if the engine is under a heavy load or must fire a lot of activations as a result of a given state, it may lose time windows and get inconsistent.

* using event's timestamps as the clock driver for temporal rules seems also no good for me because again, the engine has no control over arriving events (and it's clock). An example of problem with such approach is: what happens if events arrive out of order or from multiple, de-synchronized streams?

* so, the only option I feel it is feasible is to implement a pseudo clock over which the engine has full control. The user would still be able to constrain patterns of event's timestamps, since they are natural attributes of the facts, but the temporal behavior would be defined in relation to this pseudo clock. Question is: how does this pseudo clock gets "updated"? will the user explicitly call a "tick" method for each clock tick? will the engine somehow run the clock based on an external resource? I came to no conclusion so far and appreciate directions in case anyone already did something similar in the past.

Thinking out loud... ;)

Edson

Tuesday, May 8, 2007

Red Hat Brazilian Office


Well, catching up with news, 3 weeks ago Red Hat moved into the new office here in Brazil. A really nice office with good space and a good "atmosphere" (well, talking about people mood, not exactly the air in the biggest South American city).

The image at the left is the office reception. No doubts about what company it belongs to, right? :)



Here some of the technical people at the coffee room. From left to right, Carlos, Marcio, Maurício, Alberto e Filipe.


For those curious about São Paulo, a view from the window in the coffee room.


Here you see me and Fernando. Yes, I'm bald. Yes, it was cold. :)


Well, another great day at work. A quick post just to share the photos.

Sunday, May 6, 2007

Event Stream Processing, Complex Event Processing and Rules Engines

For those wondering why I finally decided to create my own blog, it all started in my last class of Non-Classic Logics at USP. The teacher was explaining about one of the temporal logics that were formalized, known as Logic US (that stands for Until/Since).

At that moment, something clicked and I remembered Mark commenting about supporting CEP/ESP as an extension to the Rete engine we have implemented in JBoss Rules. It seemed a perfect match: supporting temporal logic would give us all we need to also reason over stream of events. I wish it was that simple.

Yesterday I spent the day looking for information about possible research that might have been done and what is the state of the art in the field. I confess, besides having a reasonable knowledge of Rete, I have no experience in ESP/CSP.

As you can imagine, there is a huge amount of information about it in the internet and I couldn't find/read it all, but so far, the following is what I was able to gather (trying a top-down approach).

Event Driven Architecture

First thing I tried to understand is the terminology, the use cases and state of the art in the ESP/CEP field, and I ended up reading this excellent post by Brenda Michelson, where she provides an overview of the terminology and how they relate to each other. I will not repeat here all what she says, but I found specially interesting the Event Driven Architecture she describes there. From it, I was able to define the boundaries of my current research.
What I'm interested is in how we can leverage the features of a rules engine, more specifically a Rete implementation, to handle CEP and ESP. In other words, from the reference architecture she mentions, I'm interested in the Event Processing Engine component.

ESP and CEP

ESP and CEP have quite a lot of information published. From David Luckham's excellent site, to university research projects, to products developed to do it, like Esper and StreamCruncher. A very prolific field.

Rules Engines

Besides all that information, I found just a few mentions about leveraging Rete to do it. James Taylor has three posts about it ([1], [2], [3]) and they are all 1+ years old.

I found a recent post (April, 2007) in a blog by Ashwin Jayaprakash where he addresses the issue. In his words:

The tenets of Event Processing

A facility to consume the state(s) of a System as discrete Events, a language or User Interface to specify the conditions/properties to look for in those Events from one System or across Systems (Co-relation) and provide actions to be performed when such situations arise.

It is no accident that Event Processing looks like a cross between Database Query processing and Rule Engines. There is no denying that the technology evolved from existing technologies. However the distinguishing characteristics of Event Processing technology are - the constructs it provides to specify Time based boundaries, fixed size moving Windows, Aggregation and Partitioning/Grouping Events - on a stream of incoming Events and a facility to continuously scan these Events for telltale signs, all this expressed as a concise Query.

Database Triggers which in turn execute SQL/Stored Procedures every time an Event/Row is inserted can provide rudimentary Event Processing functionality. Rule Engines can be used to discern information from Facts being asserted. But none of these technologies alone provide a concise and yet very expressive Domain Specific Language along with a platform, specifically built to constantly monitor/process live streams of Events.


I specially agree with Ashwin's statement that a Rules Engine as we know them, alone, can not provide a concise and yet expressive DSL to handle ESP and CEP. Although, I prefer to think (I guess in a similar fashion to what James Taylor says) that we can leverage a Rete Engine to address such requirements. We all know that Rete Engines excel at pattern matching tasks, so basically what is missing is the capability to handle streams of events where just some of them are really "interesting" for the processing (a "select"-kind feature) and even those selected events get old quickly and must be discarded (sliding window and temporal logic support) in a way to scale the solution. The expressiveness of the language I guess is a consequence of the addition of the mentioned features.
This may be an over simplification of the subject, but that is how I see it so far.

Current Approaches

The few current approaches I was able to read about since yesterday are extensions to the relational engines to handle temporal constraints and extensions to the SQL language to allow for querying such engines (I didn't read documentation about proprietary solutions like Coral8 and StreamBase yet).

  • StreamCruncher for instance implements and engine that is backed by a relational database and leverages it to handle ESP/CEP.
  • Esper on the other hand, seems to implement its own relational engine, but the language is also an extension of SQL.
While both approaches seems good to me, it itches me even more to understand how a Rules Engine approach would do in comparison with the existing ones.

Floyd Marinescu posted last year an article to InfoQ talking about the Esper release where he says that "The rules engines proved cumbersome and slow for this use case". He also quotes Alexandre Vasseur:

[Esper] is not to be confused with classical rules engines...The internals of Esper are made up of fairly complex algorithms primarily relying on state machines and delta networks in which only changes to data are communicated across object boundaries when required.

On the proprietary software side, I found a post about TIBCO, stating it uses "a modern compiled variant of the Rete algorithm for class-leading performance". It only mentions CEP though and I'm not sure it is used as a synonym for ESP or not.

In this comment, Daniel speaks about the CEP capabilities of ILog JRules, but not about ESP.

Given all the mentioned above, I guess that in order to believe, I need to see it myself.

Event Stream Processing Requirements

From my readings, I was able to draft a list of requirements, that in my opinion, a Rete engine would need to comply to be able to handle ESP and CEP tasks. They would be:

  • Support for events as a special type of facts: while facts are long living pieces of information, events have special temporal requirements and the engine must be able to work with both at the same time, inside the same working memory.
  • Support asynchronous multi-thread feeding: events may arrive at any time and from multiple sources (or channels), so the engine must support asynchronous, multi-thread feeding of events.
  • Support event selection: only a subset of the events going through the event stream are interesting for a given set of rules. The engine must be able to identify (select) such events and ignore the others.
  • Support events garbage collection: events grow old, quickly or slow, but they do grow old. The engine must be able to identify the events that are no longer needed and dispose them as a way of scaling. It is simply impossible to hold events indefinitely. Ideally, the identification and disposal would be automatically, using the results of rules compile time analytics.
  • Support temporal relative constraints: one way of defining the window of interest in a stream of events is do define patterns of events that start and close each window. Such windows would be defined through the use of temporal constraints with operators like "since" and "until". One example of a rule using such constraints would be: "Since the stock price went under US$ X, watch for any buyers of more than 1% of total stocks, until the day is over". The above rule will start to watch for any buyers when the stock price drops under a threshold, but once it happens, will continue watching them until the end of the day, even if the stock price goes over the threshold again.
  • Support temporal absolute constraints: the rules engine must implement a pseudo-clock, that may or may not (depending on the use case) be synchronized with the real clock and allow for constraints to be defined over the sliding time window relative to the pseudo-clock. Events outside the time window are old and should be discarded.
  • Support reasoning over absence of events: the engine must support the definition of rules that will fire over the absence of events in the expected time-frame (relative or absolute). Example: "If the temperature goes over the threshold and no contention measure is taken within 10 seconds, then sound the alarm".
  • Support to common mathematical accumulations of the events in the streams, like average, max, min, etc.

The whole event matching and correlation requirements are already provided by the rules engine. The main question though is:

Is it possible to implement all the above functional requirements as an extension to a Rete engine and still comply to the non-functional requirements, specially performance?

That is what I will try to understand over the next few weeks. Stay tunned and please, I do appreciate feedback, suggestions and critics.

I Joined the Statistics

Yes, I joined the statistics. I started my own blog. After all, there is a saying: "If you can't beat them, join them!". :)

I believe a quick first post is appropriate for introductions, so lets get over it: my name is Edson Tirelli, 28 years old, born and living in Brazil, graduated in Computer Science by the Federal University of Santa Catarina, current working for Red Hat as a Senior Software Engineer. More specifically, I'm one of the core developers for JBoss Rules and I eventually write some posts for the JBoss Rules blog.

I created this blog to share some day to day thoughts, activities and news. So, take the content of it very lightly an enjoy. I will continue to post more serious stuff in the JBoss Rules blog. More info on next posts.