
HIDS Agentless in AlienVault USM

Agentless monitoring uses SSH authentication to access the host you want to monitor. For Cisco devices (PIX, routers, etc.), you need to provide an additional parameter for the enable password. The same applies if you want to add support for “su”: the su password must be supplied as the additional parameter.

1. Log into AlienVault USM.
2. Navigate to Environment -> Detection -> HIDS -> Agentless
3. Click 'New' and add a new agentless host



You will notice 'Agentless is not running' shown in red text.

The agentless daemon should be running after the device is added.

4. Go to the HIDS control center and enable the agentless process if it has not started.

 


5. If the web interface does not work, check the console and the log.

Here is the agentless entry in the log:

grep agentless /var/ossec/logs/ossec.log


2016/06/29 15:08:01 ossec-agentlessd: INFO: Not configured. Exiting.

 

Let's work from the terminal.

 

Getting started with agentless

6. Enable agentless monitoring:

# /var/ossec/bin/ossec-control enable agentless

 

7. List the agentless hosts in the system. It should show the host we just added:

/var/ossec/agentless/register_host.sh list
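
Hosts can also be registered from the terminal with the same script. A minimal sketch (the user, IP, and passwords are illustrative; the second password is the additional parameter mentioned earlier, e.g. the Cisco enable or su password):

/var/ossec/agentless/register_host.sh add root@192.168.100.xxx mypassword
/var/ossec/agentless/register_host.sh add pix@192.168.100.xxx pixpass enablepass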


8. Update the configuration file by adding an <agentless> block before </ossec_config>:

vi /var/ossec/etc/ossec.conf

   <agentless>
      <type>ssh_pixconfig_diff</type>
      <frequency>36000</frequency>
      <host>host@192.168.100.xxx</host>
      <state>periodic_diff</state>
    </agentless>


9. Check the OSSEC status:

/var/ossec/bin/ossec-control status


10. Restart OSSEC with the command below and check the status again.

/var/ossec/bin/ossec-control restart



Here it is active



How the access log works with OSSIM

The access log is forwarded to the sensor / data source and then mapped to an event ID according to the rules in OSSIM.

Data sources can be found under 'Configuration -> Threat Intelligence -> Data Source'. Search for the source as below and pick 'AlienVault HIDS-accesslog', which reads the access log.


 

Browse the data source from the UI.

 

Events are mapped to OSSEC events here:

# /var/ossec/rules/web_rules.xml

Rule IDs 31100–31199 cover the web access log rules.

OSSEC Decoder

Each application has its own log record format.
eg:

web.madhuka.lk 123.231.120.128 - - [27/Dec/2015:03:44:16 +0530] "POST /lksearch.php HTTP/1.1" 200 35765 "http://madhuka.lk/""Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"


Here we add a new OSSEC decoder called “custom-apache-access-log”:
# /var/ossec/etc/decoder.xml


<decoder name="custom-apache-access-log">
  <program_name>custom-apache-access-log</program_name>
</decoder>

Then test it

# /var/ossec/bin/ossec-logtest


It says

**Phase 2: Completed decoding.
       No decoder matched.

No match was found because we have not yet written a match for our new custom log decoder. Let's write a prematch for it:


<decoder name="custom-apache-access-log">
    <prematch>^web.madhuka.lk </prematch>
</decoder>

Run it again and it will hit our custom decoder, as below.


 

Adding new child decoder

<decoder name="custom1-apache-access-log">
  <parent>custom-apache-access-log</parent>
  <prematch offset="after_parent"> "POST \S+ \S+"</prematch>
  <regex offset="after_parent">^(\S+) - - [(\S+) (\S+)] "POST (\S+) (\S+)" (\d+) (\d+) "(\S+)""(\S+)"$</regex>
  <order>srcip, extra_data, extra_data, url, srcuser, status, extra_data, extra_data, extra_data</order>
</decoder>

 

Testing with

web.madhuka.lk 123.231.120.128 - - [27/Dec/2015:03:44:16 +0530] "POST /lksearch.php HTTP/1.1" 200 35765 "http://madhuka.lk/""Mozilla/5.0"


Grep quotes in Linux

Count the lines where a word matches:

$ grep -c 'word' /path/to/file

Pass the -n option to precede each line of output with its line number in the file:
$ grep -n 'root' /etc/passwd

Ignore case:
$ grep -i 'word' /path/to/file

Search recursively under a directory:

$ grep -r 'word' /path/to/dir

Search for two different words:

$ egrep -w 'word1|word2' /path/to/file

Grep invert match

$ grep -v 'word' /path/to/file

You can force grep to display output in color:
$ grep --color 'word' /path/to/file

You can limit the number of matching lines:
$ grep -m 10 'word' /path/to/file

You can match a regular expression in files (syntax: grep "REGEX" filename); an extended-regex example follows the quantifier list below.
$ grep 'word1.*word2' /path/to/file

? The preceding item is optional and matched at most once.
* The preceding item will be matched zero or more times.
+ The preceding item will be matched one or more times.
{n} The preceding item is matched exactly n times.
{n,} The preceding item is matched n or more times.
{,m} The preceding item is matched at most m times.
{n,m} The preceding item is matched at least n times, but not more than m times.
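
Note that basic grep treats ?, +, and the {n} forms as literal characters unless they are backslash-escaped; with grep -E (extended regular expressions) they can be used directly. A small sketch (the pattern is illustrative):

$ grep -E 'wo{1,2}rd' /path/to/file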

Display N lines around match

Grep can display N lines after match (Syntax: grep -A <N> "string" filename)
$ grep -A 2 'word' /path/to/file

The following example prints the matched line, along with the two lines after it, ignoring case.
$ grep -A 2 -i 'word' /path/to/file

-B prints the specified N lines before the match, and -C prints N lines of context both before and after it:
$ grep -C 2 'word' /path/to/file

Uncomplicated Firewall

The Linux kernel in Ubuntu provides a packet filtering system called netfilter, and the traditional interface for manipulating netfilter is the iptables suite of commands. The Uncomplicated Firewall (ufw) is a frontend for iptables and is particularly well suited for host-based firewalls.

Allow a port from any source:
$ sudo ufw allow 122/tcp

List application profiles and profile info:
$ sudo ufw app list
$ sudo ufw app info Squid

Check the UFW status:
$ sudo ufw status verbose

Allow a port from a specific IP:
$ sudo ufw allow from 192.168.3.231 to any port 443
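
Denying and removing rules follow the same pattern; a brief sketch (the port and rule number are illustrative):

$ sudo ufw deny 23/tcp
$ sudo ufw status numbered
$ sudo ufw delete 2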

DiskPart in Windows (the Fdisk replacement in Windows 8)

Unfortunately, Windows no longer ships Fdisk, but there is another good command-line tool for the job: DiskPart is useful for formatting unallocated space on a USB pen drive.

1. Enter ‘diskpart’ in cmd

DiskPart will then start.

2. List the storage devices in the PC:

list disk

3. Select the disk to fix (in my case it is disk 1):

select disk 1

4. Clean all volumes and partitions on the disk:

clean

5. Create a primary partition

create partition primary
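
To make the new partition usable you would normally format it and assign a drive letter afterwards; a brief sketch (the FAT32 file system is an assumption, use NTFS if you prefer):

format fs=fat32 quick
assign
exit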


Sending a Brute-Force Attack

A brute-force attack consists of an attacker trying many passwords or passphrases with the hope of eventually guessing correctly. The attacker systematically checks all possible passwords and passphrases until the correct one is found. Alternatively, the attacker can attempt to guess the key which is typically created from the password using a key derivation function. This is known as an exhaustive key search.

Install the prerequisites:
apt-get install python-ipy python-nmap python-paramiko


Clone the osueta repo:
git clone https://github.com/c0r3dump3d/osueta.git

Create a username file
Create a txt file with sample usernames in the osueta directory (vi username.txt).

Generate the brute-force attack
Now send a sample brute-force attack from 'osueta':
./osueta.py -H 192.168.1.4 -p 22 -d 15 -v dos no -L username.txt

 


OSSEC Rule Testing

Introductions

In OSSEC, rules are classified in multiple levels, from the lowest (00) to the maximum level 16. Some levels are not used right now; the commonly used levels are explained below.
00 – Ignored
01 – None
05 – User-generated error
06 – Low-relevance attack
08 – First time seen
12 – High-importance event
15 – Severe attack (no chance of false positives)

Rule groups are used to assign groups to specific rules. They are used for active response and for correlation.

Checking Rules

You can find the OSSEC rules in ‘/var/ossec/rules’. The XML files in this directory contain the rules.

In a rule XML file, the parent element of the XML declares the group name, e.g. <group name="web,accesslog,">. Inside it you can define the rules as below.

As an example, I want to catch all 400 error codes:

<rule id="31101" level="5">
  <if_sid>31100</if_sid>
  <id>^4</id>
  <description>Web server 400 error code.</description>
</rule>

Then you need to skip resource files that end with .jpg, .css, and .js:

<rule id="31102" level="0">
  <if_sid>31101</if_sid>
  <url>.jpg$|.css$|.js$</url>
  <compiled_rule>is_simple_http_request</compiled_rule>
  <description>Ignored extensions on 400 error codes.</description>
</rule>

‘is_simple_http_request’ [1] is a function already built into OSSEC. If you are building OSSEC from source, you can customize these functions or add new ones to improve your rules.

 

Testing the Rules

Initial Test Case

To test the above rules you can add a custom log record as below.

Here we need the current time from the terminal in the following format:
23/Aug/2016:10:09:28 +0530

  • Set up a sample Apache log entry as below:

now=$(date +"%d/%b/%Y:%T %z")
echo "192.168.100.78 - - [$now] \"GET /ossim/services HTTP/1.1\"  200 2295 \”-\” \"Mozilla Firefox/47.0\""

Add the log record to the log file:

httpStatus=400

logRecord="192.168.100.78 - - [$(date +"%d/%b/%Y:%T %z")] \"GET /ossim/services HTTP/1.1\"  $httpStatus 2295 \”-\” \"Mozilla Firefox/47.0\""

  • Then append it to the log file to test our use case with the custom rules:

echo '{string}'>> file.txt

Our apache log is in /var/log/apache2/access.log

logRecord="192.168.100.78 - - [$(date +"%d/%b/%Y:%T %z")] \"GET /ossim/services HTTP/1.1\"  $httpStatus 2295 \"Mozilla Firefox/47.0\""

echo "$logRecord" >> access.log

(Double quotes are needed here so that $logRecord is expanded.)

Or you can try this for testing

echo "192.168.100.78 - - [$(date +"%d/%b/%Y:%T %z")] \"GET /ossim/foo/ HTTP/1.1\" $httpStatus 3360 \"-\" \"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\"">> access.log

 

Second use case

Customize rule id="31104", which is in ‘/var/ossec/rules/web_rules.xml’, as below for testing:

<rule id="31104" level="6">
  <if_sid>31100</if_sid>
  <url>foo</url>
  <description>Common web attack.</description>
  <group>attack</group>
</rule>

Then restart the OSSEC server that is bundled with OSSIM / AlienVault.

Then send a rule-triggering log record as below:

echo "192.168.100.78 - frank [$(date +"%d/%b/%Y:%T %z")] \"GET /ossim/go/foo HTTP/1.1\" $httpStatus 3360 \"-\" \"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\"">> ../../log/apache2/access.log

Here is the triggered event in the OSSIM UI.


[1] https://github.com/Madhuka/ossec-hids/blob/master/src/analysisd/compiled_rules/generic_samples.c#L106


Creating New Rule set for OSSEC Server

Here I am using a well-known decoder in OSSEC; if you need a new OSSEC decoder, you can write one as well [1]. Add a new file to the rules directory in OSSEC.

Creating new OSSEC rule set

$ vi /var/ossec/rules/custom_access_rules.xml

Here I am interested in modeling web user behavior, so I only need the 200 HTTP status code, and I mark that rule with level 05 as it is important to this use case. Make sure the rule ID is unique. I am using the ‘accesslog’ decoder since I am reading the web access log here. Here is the content of my new OSSEC rule XML file.

<!-- Custom access rules -->

<group name="web,accesslog,">
  <rule id="70000" level="0">
    <category>web-log</category>
    <description>Access log messages grouped.</description>
  </rule>

  <rule id="70001" level="5">
    <if_sid>70000</if_sid>
    <id>^2</id>
    <description>Web server 200 respond code.</description>
  </rule>

</group>

 

Restart OSSEC:

./bin/ossec-control restart

 

Test the rules that we just created with the OSSEC log test:

./bin/ossec-logtest

Add a 200 HTTP status log record.

Here is my sample log record:

123.231.120.128 - - [27/Dec/2015:03:44:16 +0530] "GET /lksearch.php HTTP/1.1" 200 35765 "http://madhuka.lk/""Mozilla/5.0"


Here it triggers the event for the rules we created for the web access log.


Connecting to OSSEC rule from OSSIM

Prerequisite

Test the new OSSEC rule with ‘ossec-logtest’.

Here is the custom rule. It mainly looks for URLs containing words such as ‘payment’:

<rule id="31181" level="6">
   <if_sid>31100</if_sid>
   <url>payment|paid|pay|pays|bar</url>
   <description>Customer payment attempt.</description>
   <group>attack,</group>
</rule>

1. Update the OSSIM plugins

The OSSIM plugin needs to be updated to map the OSSEC rule to the OSSIM agent plugin:

/etc/ossim/agent/plugins/ossec-single-line.cfg

<rule id>=<data source id>

eg: 31181=7058

2. Check that the rule is visible to OSSIM

The rule ID will show under ‘Environment -> Detection -> HIDS -> Edit Rules’.


3. Adding OSSIM Event

Add a new event type as below by navigating to ‘Configuration -> Threat Intelligence -> Data Source’.


Reconfigure the OSSIM server:

ossim-reconfig -c -v -d

Test OSSEC rule mapping to OSSIM

Send the request below:

httpStatus=400
alienvault:/var/ossec/rules# echo "192.168.100.251 - testuser [$(date +"%d/%b/%Y:%T %z")] \"GET /myapp/pays HTTP/1.1\" $httpStatus 3360 \"-\" \"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\"">> ../../log/apache2/access.log

Here the alert comes:


 

Add the rule ID mapping for this rule as well:




alienvault:/etc/ossim/agent/plugins# httpStatus=404


alienvault:/etc/ossim/agent/plugins# echo "192.168.100.251 - testuser [$(date +"%d/%b/%Y:%T %z")] \"GET /ossim/go/payment HTTP/1.1\" $httpStatus 3360 \"-\" \"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\"">> ../../../../../var/log/apache2/access.log

ossim-reconfig -c -v -d

Adding More user data field for Event

We need extra user data fields on our security event. We need to know:

  • the time the event occurred
  • the host server IP

We can achieve this by editing the relevant event in ‘/etc/ossim/agent/plugins/ossec-single-line.cfg’. We are interested in the Web group, ID 0030. We add the lines below for our needs:

userdata3={normalize_date($date)}
userdata4={resolv($hostname)}

After editing, it will look as below:


[0030 - Web - group - 31xxx]
event_type=event
#precheck="web"
regexp="^AV\s-\sAlert\s-\s\"(?P<date>\d+)\"\s-->\sRID:\s\"(?P<rule_id>31\d\d\d)\";\sRL:\s\"(?P<rule_level>\d+)\";\sRG:\s\"(?P<rule_group>web[^\"]*)\";\sRC:\s\"(?P<rule_comment>[^\"]+)\";\sUSER:\s\"(?P<username>\S+)\";\sSRCIP:\s\"(?P<srcip>[^\"]*)\";\sHOSTNAME:\s\"(?P<agent_name>\([^\)]*\)\s+)?(?:\S+@)?(?P<hostname>(?(agent_name)(?:\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})|(?:\S+)))(?:->\S+)?;\sLOCATION:\s\"(?P<location>[^\"]*)\";\sEVENT:\s\"\[INIT\](?P<request>.*)\[END\]\";"
date={normalize_date($date)}
plugin_id={translate($rule_id)}
plugin_sid={$rule_id}
device={resolv($hostname)}
src_ip={resolv($srcip)}
dst_ip={resolv($hostname)}
username={$username}
userdata1={$rule_comment}
userdata2={$request}
userdata3={normalize_date($date)}
userdata4={resolv($hostname)}

Then trigger this with a 404 web request. Here we see those custom user data fields as below.


These user data fields can improve your OSSIM directives and rules.

Triggering action or email over the event occurrence in OSSIM

This article explains how to trigger an action when an event occurs in OSSIM.
There is an agent in the system with IP 192.168.80.22. An email should be sent to the server admins whenever this agent disconnects from and reconnects to the SIEM server.
Below are the sample events.
  • Here are the event ID and data source ID of interest when the agent starts communicating with the SIEM server.
    • Event Name: AlienVault HIDS- HIDS agent started.
    • Event Type Id: 503
    • Data Source Id: 7007
    • Raw log: AV - Alert - "1475748734" --> RID: "503"; RL: "3"; RG: "ossec,"; RC: "Ossec agent started."; USER: "None"; SRCIP: "None"; HOSTNAME: "(my-agent) 192.168.80.22->ossec"; LOCATION: "(my-agent) 192.168.80.22->ossec"; EVENT: "[INIT]ossec: Agent started: 'my-agent->192.168.80.22'.[END]";
  • Here are the event ID and data source ID of interest when the agent disconnects from the server.
    • Event Name: AlienVault HIDS- HIDS agent disconnected.
    • Data Source Id: 7007
    • Event Type Id: 504
    • Raw log: AV - Alert - "1475662394" --> RID: "504"; RL: "3"; RG: "ossec,"; RC: "Ossec agent disconnected."; USER: "None"; SRCIP: "None"; HOSTNAME: "alienvault"; LOCATION: "ossec-monitord"; EVENT: "[INIT]ossec: Agent disconnected: 'my-agent-192.168.80.22'.[END]";
Let’s see it in action.
Adding an action
  1. Navigate to actions
    1. Configuration --> Threat Intelligence --> Actions
  2. Click 'New' to create a new action in OSSIM
  3. Fill in the form and select 'send an email' for the 'Type'

  4. After completing the form, save it
Create a policy
  1. Navigate to policies
    1. Configuration --> Threat Intelligence --> Policy
  2. Add a new policy to the ‘Default policy group’
  3. Fill in the form
    1. Make sure you fill in the following correctly:
    2. Policy Rule Name
    3. Source (your agent name or IP)
    4. Destination (it can be Any)
    5. Action (pick the action we just added)
  4. Click 'Update Policy', then reload the policy by clicking the 'Reload Policies' button at the policy group level
  5. Order the policies correctly
  6. Add the correct event IDs for the group


Save it and reload it again.

Test it..



Syscheck in OSSEC

If you’re familiar with SIEM tools or OSSEC, then you know syscheck. Syscheck is the integrity-checking daemon within OSSEC. Its purpose is simple: identify and report on changes to system files. Once the baseline is set, syscheck performs change detection by comparing all the checksums on each scan. If a checksum is not a one-for-one match, it reports it as a change. If new files are added, it identifies them as new and reports them. Syscheck options are available in the server, local, and agent installations.

In /var/ossec/etc/ossec.conf we can find the syscheck config. The frequency option is in seconds and defaults to 22 hours (79,200 seconds). Add the option below to be alerted when new files are created:

<alert_new_files>yes</alert_new_files>

Syscheck in OSSEC can also leverage the inotify system calls as its detection engine for real-time monitoring.
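
As a rough sketch, a <syscheck> block in ossec.conf could look like the following (the monitored directories are illustrative, adjust them to your environment):

<syscheck>
  <frequency>79200</frequency>
  <alert_new_files>yes</alert_new_files>
  <directories check_all="yes" realtime="yes">/etc,/usr/bin,/usr/sbin</directories>
  <ignore>/etc/mtab</ignore>
</syscheck>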

You can ignore files in a directory either with a rule of level 0 or with the 'ignore' tag:

<rule id="100000" level="0">

or

<ignore>foo/test/</ignore>

 

Option attributes

  • realtime
  • check_all
  • check_sum
  • frequency
  • scan_day
  • auto_ignore
  • prefilter_cmd
    - This option can potentially impact performance negatively


By default, after a file has changed three times, further changes are automatically ignored. Handy, but it could be improved. When I’m deploying security tools and controls, my goal is to reduce the “noise” as much as possible. A side effect of file integrity monitoring is the number of false-positive alerts generated.

 

[1] https://blog.rootshell.be/2013/05/13/improving-file-integrity-monitoring-with-ossec/

Cleaning OSSIM Alarms

Working on an AlienVault IDS system or OSSIM, you can come across a huge number of alarms created during system migrations.

Use the ossim-db command:
ossim-db


Use the alienvault database:
> USE alienvault

Check for the alarm tables:

>SHOW TABLES LIKE 'alarm%';

Get the table description:

>DESCRIBE  alarm;


Get the number of records in the 'alarm' table:

>SELECT COUNT(*) FROM alarm;


List 20 timestamps in the alarm table that were created today:

>SELECT timestamp FROM alarm WHERE DATE(timestamp) = CURDATE() limit 20;

Set the status of today's alarms to closed:

>update alarm set status = 'closed' WHERE DATE(timestamp) = CURDATE();


WSO2 ESB with JavaScript Object Notation

There are a few things that make my work with WSO2 ESB enjoyable, and its support for JavaScript Object Notation (JSON) payloads in messages is one of them. It is not a very new feature; it has been around for a while.

It supports

  • JSON message building
  • Converting a payload between XML and JSON
  • Accessing content on JSON payloads
  • Logging JSON payloads
  • Constructing and transforming JSON payloads
  • Troubleshooting, debugging, and logging

Here I will explain some basic features that are worth knowing and make your tasks easier. They all involve accessing content in JSON payloads:

  • Listing
  • Sorting
  • Searching
  • Picking
  • Comparing

<?xml version="1.0" encoding="UTF-8"?>
<proxy xmlns="http://ws.apache.org/ns/synapse"
       name="testadmin2"
       startOnLoad="true"
       statistics="disable"
       trace="disable"
       transports="http,https">
   <target>
      <inSequence>
         <payloadFactory media-type="json">
            <format>
            { "store": {
    "book": [
      { "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      { "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      { "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      { "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  }
}
            </format>
            <args>
               <arg evaluator="xml" expression="//store/name"/>
               <arg evaluator="xml" expression="//store/price"/>
               <arg evaluator="json" expression="store.toppings.topping"/>
            </args>
         </payloadFactory>
         <property name="messageType" scope="axis2" value="application/json"/>
         <log>
            <property expression="json-eval($.store.book[*].author)" name="JSON-book"/>


            <property expression="json-eval($.store)" name="JSON-Paydsfadx"/>


            <property expression="json-eval($..author)" name="All-author"/>


            <property expression="json-eval($.store..price)" name="prices"/>


            <property expression="json-eval($..book[2])" name="Second-Book"/>


            <property expression="json-eval($..book[:2])" name="First-Two-Books"/>


            <property expression="json-eval($..book[?(@.isbn)])" name="ISBN-Sort"/>


            <property expression="json-eval($..book[?(@.author=='Evelyn Waugh')].author)"
                      name="Search-Author"/>

          <property expression="json-eval($..book[?(@.price>12)])"
                      name="Book-price"/>


         </log>
         <send>
            <endpoint>
               <http method="POST" uri-template="http://127.0.0.1:8020/2"/>
            </endpoint>
         </send>
      </inSequence>
      <outSequence>
         <send/>
      </outSequence>
   </target>
   <description/>
</proxy>
     



Handling simple denormalized data from Talend

Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information. Today some systems store data in a denormalized form, and data integration tools are able to handle it. In this blog post, Talend is used to showcase handling a simple denormalized data file.

For example, a system stores state data with the following schema: [field1];[[field2.1],[field2.2]], which maps to [StateID];[[StateName],[PostCode]]. Here is the sample file ‘states.csv’:

StateID;StateName,PostCode
1;Alabama,009234
2;Alaska,009235
3;Arizona,009236
4;Arkansas,009237
5;California,009244
6;Colorado,009245
7;Connecticut,009214
8;Delaware,009278
9;Florida,0092897
10;Georgia,009247

Start Development in Talend Studio

  1. Drop the following components from the Palette onto the design workspace: tFileInputFullRow, tExtractDelimitedFields, and tLogRow.
  2. Connect them using the Row Main links.



Configuring the components

1. Double-click the tExtractDelimitedFields component to open its Basic settings view. Add the file path and Skip the header line.


Update the schema as below


2. Double-click the tFileInputFullRow component to open its Basic settings view. Edit the schema


3.  Double-click the tLogRow component to open its Basic settings view. Edit the schema



Running

1. Save it and press ‘F6’


Handling BigDecimal in Talend

This post is a very basic one. Since Talend is all about data integration, finding a BigDecimal [1] in such data sets is very common.

BigDecimal vs. Double

A BigDecimal is an exact way of representing numbers, whereas a double has a fixed binary precision. Working with doubles of very different magnitudes (say d1=1.0e17 and d2=0.001) can result in the 0.001 being dropped altogether when summing, because the difference in magnitude exceeds the precision of a double. With BigDecimal this would not happen.

The disadvantage of BigDecimal is that it's slower, and it's a bit more difficult to program algorithms that way (due to + - * and / not being overloaded).

If you are dealing with money, or precision is a must, use BigDecimal. Otherwise Doubles tend to be good enough.
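
A tiny sketch of the difference, runnable in a tJava component (the values are illustrative; add import java.math.BigDecimal in the Advanced settings tab):

double d1 = 1.0e17;
double d2 = 0.001;
System.out.println((d1 + d2) == d1);              // true: the 0.001 is silently lost

BigDecimal b1 = new BigDecimal("100000000000000000");
BigDecimal b2 = new BigDecimal("0.001");
System.out.println(b1.add(b2).toPlainString());   // 100000000000000000.001, exact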

Big Decimal Sample

First we go with a BigDecimal value such as ‘1.8772265500517E19’, which means 1.8772265500517 × 10^19. We need to output it without scientific notation. You can use the ‘tJava’ component in Talend and simple Java to achieve this:

BigDecimal bigDecimal = new BigDecimal("1.8772265500517E19");
System.out.println(bigDecimal.toPlainString());

If you also need a specific number of decimal places, you can use the line below:

System.out.printf("%1$.2f", bigDecimal);

Here 2 is the number of decimal places you want; change it as needed.


Here is the output


Double Sample

There are a few ways to achieve this, such as Talend routines or the tJava component; here we use tJava. Add the lines below to the ‘Basic settings’ tab:

double sampleDouble =  1.8772265500528E9;
System.out.println(sampleDouble);
NumberFormat formatter = new DecimalFormat("###.####");
String sampleDoubleString = formatter.format(sampleDouble); 
System.out.println(sampleDoubleString);

Then add the imports below in the ‘Advanced settings’ tab:

import java.text.NumberFormat;
import java.text.DecimalFormat;


Here is the output of the job:


Make sure you use BigDecimal and double the correct way in the correct places.

[1] https://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html

Enterprise Data integration Directions

Enterprise data integration is a broad term used in the integration landscape for connecting multiple enterprise applications and hardware systems within an organization. All of this enterprise data integration aims to remove complexity by simplifying data management as a whole.

Unified Data Management Architecture

A Unified Data Management (UDM) architecture offers the reliability and performance of a data warehouse, the real-time and low-latency characteristics of a streaming system, and the scale and cost-efficiency of a data lake. More importantly, UDM uses a single storage backend that provides the benefits of multiple storage systems, which avoids moving data across systems and hence avoids data duplication and data consistency issues. Overall, there is less complexity to deal with.

Common In-Memory Data Interfaces
This is a newer data integration pattern. It depends on a shared high-performance distributed storage layer or a common data format sitting between compute and storage; Alluxio and Apache Arrow are examples of each, respectively. Apache Arrow has support for 13 major big data frameworks, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm.


Machine Learning with Data Integration
Machine learning and artificial intelligence (AI) tools are the basis of smart data integration assistants. These assistants can recommend the next best action, or suggest datasets, transforms, and rules to a data engineer working on a data integration project.


Event-Driven Data Flow Architecture
More and more organizations are moving to event-driven architecture with the view that it can make existing systems real-time and fast. To achieve this, organizations use distributed messaging systems such as Apache Kafka and other message brokers. On top of these, they implement concepts such as events, topics, event producers, and event consumers. A key aspect of event-driven data flow architecture is its support for microservices architecture and, more specifically, the database-per-service pattern.





[1] https://tdwi.org/articles/2010/05/06/introduction-to-unified-data-management.aspx

Lifecycle of a Book in WSO2 Greg

Lifecycle Management (LCM) plays a major role in SOA governance. WSO2 Governance Registry lifecycle management supports access control at multiple levels of a lifecycle state.
1. Permissions
1.1 Check items with permissions configuration
<permissions>
     <permission roles=""/>
</permissions>

1.2 State Transitions by transitionPermission configuration
<data name="transitionPermission">
     <permission forEvent="" roles=""/>
</data>

2. Validations
2.1 Check items by the validations configuration
<validations>
     <validation forEvent="" class="">
         <parameter name="" value=""/>
     </validation>
</validations>

2.2 State Transitions by transitionValidations
<data name="transitionValidation">
     <validation forEvent="" class="">
         <parameter name="" value=""/>
     </validation>
</data>

3. Resource Permissions at each environment
<permission roles=""/>
4. State transition approvals with voting procedure
<data name="transitionApproval">
     <approval forEvent="Promote" roles="" votes="2"/>
</data>


Use case
Just think about how a book goes from writing to market. A book has its own life cycle [2], which mainly contains Acquisitions, Editorial, Production, and Marketing.
  • The Acquisitions state has items such as Proposal, Submit manuscript, Peer review, Approved by editorial board, and Launched into the editorial department
  • The Editorial state contains Copyediting, Author review, Typesetting and design, Page proofs ready, Proofreading, and Final author review
  • Book is printed and Shipped are events found in the Production state
  • The Marketing state consists of moving the book into the warehouse, picking the publication date, announcing the book, moving the book to the book stores, and ongoing promotion
Life cycle of a book chart

It is not only the life cycle: a book also has its own attributes (schema) [1]. You can define a new asset type in WSO2 GREG, and an asset can have a custom lifecycle. Let's add a Book asset type to WSO2 GREG with an RXT.

Here is the Book RXT file [3].


Here is the Book lifecycle file [4].


Add both files to WSO2 GREG as shown in the screenshots below.

Adding Life Cycle to Greg 5.4.0
Adding custom Artifact type

Done!!

Now log in to the Publisher and you can add books as below.

Once you have added all the data, you can see your book as below.

You can start lifecycle management by clicking the Lifecycle button.
You can see the book's lifecycle move as below.

Once it is published, you can see it in your book store as below.

Now you are good to go with your own artifacts and lifecycles.

Java 8 Stream API and the New Optional Class

This post gives some basics of the Java Stream API, which was added in Java 8. It works very well in conjunction with lambda expressions. A pipeline of stream operations can manipulate data by performing operations like search, filter, count, and sort. A stream pipeline consists of a source such as an array, a collection, a generator function, or an I/O channel, and it may have zero or more intermediate operations that transform the stream.

Stream operations are divided into intermediate and terminal operations.

Intermediate operations return a new stream. They are always lazy; executing an intermediate operation does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Intermediate operations are not executed until a terminal operation is invoked, because there is a possibility they can be processed together when the terminal operation executes.
eg: map, filter, flatmap, limit, sorted, distinct, peek


Terminal operations produce a non-stream result, such as a primitive value, a collection, or no value at all. Terminal operations are typically preceded by intermediate operations, which return another stream and so allow operations to be chained in the form of a query.
eg: findAny, allMatch, count, max, min, etc.

A sample can be found here.
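
A minimal sketch of a pipeline with intermediate and terminal operations (the word list is illustrative):

import java.util.Arrays;
import java.util.List;

List<String> words = Arrays.asList("ossec", "ossim", "esb", "optional");
long count = words.stream()                // source
        .filter(w -> w.startsWith("o"))    // intermediate, lazy
        .map(String::toUpperCase)          // intermediate, lazy
        .count();                          // terminal, triggers the pipeline
System.out.println(count);                 // 3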

Intermediate operations are further divided into stateless and stateful operations.

Stateless operations, such as filter and map, retain no state from previously seen elements when processing a new element; each element can be processed independently of operations on other elements.

Stateful operations, such as distinct and sorted, may incorporate state from previously seen elements when processing new elements. Stateful operations may need to process the entire input before producing a result. For example, one cannot produce any results from sorting a stream until one has seen all elements of the stream.

A sample can be found here. You have to be careful with this: using the operations in the wrong order may lead to memory issues. For example, if we sorted() the stream first, it loads everything into memory, as shown in the sample (test_sorted_notShortCircuiting).
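
A small sketch of the difference (the numbers are illustrative). With sorted(), every element is pulled from the source before limit(2) produces anything; with limit() alone the pipeline short-circuits after two elements:

import java.util.stream.Stream;

Stream.of(9, 5, 3, 7, 1)
      .peek(n -> System.out.println("pulled: " + n))  // prints all five elements
      .sorted()                                       // stateful: buffers the whole stream
      .limit(2)
      .forEach(n -> System.out.println("result: " + n));

Stream.of(9, 5, 3, 7, 1)
      .peek(n -> System.out.println("pulled: " + n))  // prints only two elements
      .limit(2)                                       // short-circuits the source
      .forEach(n -> System.out.println("result: " + n));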


(limit_car_maker_test)


You can see that all the elements are loaded in ‘test_sorted_notShortCircuiting’ even though it has a limit of 2.

Streams in serial or in parallel

You can execute streams in serial or in parallel. When a stream executes in parallel, the Java runtime partitions the stream into multiple sub-streams.

For example, Collection has the methods Collection.stream() and Collection.parallelStream(), which produce sequential and parallel streams respectively.

When you do that, the stream is split into multiple chunks, each chunk is processed independently, and the results are combined at the end. In a sample implementation of a ‘sum of longs’ method, you can take advantage of parallelization and utilize all available CPU cores.
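
A brief sketch of such a parallel sum (the range is arbitrary):

import java.util.stream.LongStream;

long sum = LongStream.rangeClosed(1, 10_000_000L)
                     .parallel()   // partitioned into sub-streams across CPU cores
                     .sum();       // partial results are combined at the end
System.out.println(sum);           // 50000005000000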


Optional

In the sample you will find the new class Optional, which was introduced with Java 8. It is used to represent a value that is present or absent. The main advantage of this new construct is that it removes the need for many explicit null checks, helps avoid runtime NullPointerExceptions, and supports us in developing clean and neat Java APIs and applications.


When a value is present, the Optional class just wraps it. Conversely, the absence of a value is modeled with an “empty” Optional returned by the method Optional.empty. It is a static factory method that returns a special singleton instance of the Optional class. Dereferencing a null will invariably cause a NullPointerException, whereas Optional.empty() is a valid, workable object.
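
A short sketch of typical Optional usage (the environment variable is illustrative):

import java.util.Optional;

Optional<String> home = Optional.ofNullable(System.getenv("OSSEC_HOME")); // value may be absent
System.out.println(home.orElse("/var/ossec"));                // fallback instead of a null check
home.map(String::toUpperCase)
    .ifPresent(v -> System.out.println("configured: " + v));  // runs only when a value is present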
