Sunday, September 13, 2009

How to get AWStats to show Intranet location stats

I was asked by Teva's HR department to do something about the HR intranet "Portal". The requirement was to know how the HR intranet is used, based on the three W's: What, When, and Where. Teva has multiple offices in Israel, and HR went through a major reorganization and restructuring of its services and methodology in recent years.

It is good they came to me first, as some of my colleagues would have turned it into a multi-million-dollar data warehouse analysis project :)

Having good experience with AWStats, I decided to use this tool to analyze the HR intranet logs.

AWStats provides good enough stats about the What and the When, but I hit a brick wall regarding the Where. There are several plugins that enable geo-IP analysis, but according to the documentation, all of them are useless for intranet-only log files.

The plugins from MaxMind use a proprietary format, and do not include my 10.* network anyway.

This left me with the Geo::IPfree database and plugin as the only viable option for adding branch awareness to the intranet stats.

Geo::IPfree can be found on CPAN; the first step is to download it.
We also need a couple of tools that may or may not be included in the package: txt2ipct.pl and ipct2txt.pl are required to modify the IPfree database.
First, extract the IP database from ipscountry.dat into an editable text file:
perl ipct2txt.pl ./ipscountry.dat ./ips-ascii.txt
Let's suppose that you have 3 branches: London, Rome and Tel-Aviv. Your WAN network segments are respectively 10.1.*, 10.2.*, 10.3.*, and must be mapped into the IPfree database.

Open the ips-ascii.txt file with your favorite text editor and find your LAN/WAN IP range.
You have a line that looks like this
ZZ: 10.0.0.0 10.255.255.255
We will map your locations into codes Z1, Z2 and Z3, because there are no such ISO country codes.

So, replace your ZZ line with these five:
ZZ: 10.0.0.0 10.0.255.255
Z1: 10.1.0.0 10.1.255.255
Z2: 10.2.0.0 10.2.255.255
Z3: 10.3.0.0 10.3.255.255
ZZ: 10.4.0.0 10.255.255.255
Note that the last line covers the remainder of the 10.* range, so no addresses are left unmapped.
Just to be safe, rename your current ipscountry.dat file, then execute
perl txt2ipct.pl ./ips-ascii.txt ./ipscountry.dat
Now, create a new Geo folder under the plugins folder of the AWStats installation, and copy the IPfree.pm and ipscountry.dat files into the new folder.

We also have to modify the lib/domains.pm file to recognize the new Z1, Z2 and Z3 domains. Just add them to the end of the list, keeping the new domain names in lower case. The last line of domains.pm will now look like this
'zm','Zambia','zr','Zaire','zw','Zimbabwe', 'z1', 'London', 'z2', 'Rome', 'z3', 'Tel Aviv'
The last thing to do is to enable the geoipfree plugin in the AWStats configuration file.
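Enabling the plugin is a one-line change: in the relevant awstats configuration file, uncomment or add the LoadPlugin directive (this follows the stock AWStats model configuration):

```
LoadPlugin="geoipfree"
```

AWStats will then resolve the Z1, Z2 and Z3 ranges through the modified ipscountry.dat and report them under the branch names we added to domains.pm.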

Have fun!!

P.S.
Maybe I will initiate the multi-million dollar web data warehouse project after all ...

Wednesday, September 2, 2009

Restoring my mobile phone backup to a new phone

My Sony Ericsson K610 mobile phone dropped dead.
The service guy removed my SIM and memory card from the fresh corpse, and looked me in the eye: "Do you have anything stored on the phone, sir? There are no phone numbers stored on the SIM"
I remembered storing everything on the phone instead of on the SIM.
I also remembered using Sony's PC Suite to backup the phone contents a couple of days ago.
"Don't worry, got backup" - I told the service guy.

Problem
The Sony Ericsson PC Suite recognized the new phone easily. I selected Tools/Backup&Recovery and started to shiver.
No backup was listed, and my sync to Outlook was 4 months old.

Research
First, I determined that I must find the old backup. I created a new backup for my new phone, called it "backup2" and learned that
the phone backups are stored with extension .dbk inside "My Documents", in folder
"Sony Ericsson\Sony Ericsson PC Suite\Phone backup".
As if by magic, my old backup file also turned up in this folder.
I realized that the file contents must be tagged with the phone information.

Opening the backup file with a text editor didn't help. It looked almost completely binary.
However, there was a slight resemblance to something I saw before...

I created a copy of the dbk file, and renamed it with a zip suffix. That did the trick.
Solution
The key is in the phoneID.txt file.

Version=3.0
DeviceManufacturer=Sony Ericsson
DeviceModel=K610
HeartBeat=HB1-06
IMEI=3546xxxxxxxxxx
PhoneName=My K610

I replaced the file with the one found in the dummy backup of the new phone

Version=3.0
DeviceManufacturer=Sony Ericsson
DeviceModel=W595
HeartBeat=HB1-07
IMEI=3529yyyyyyyyyyyy
PhoneName=My W595

I changed the file extension back to .dbk. Sony Ericsson PC Suite now showed the old backup, and I was able to restore all of my important information to the new phone.
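The whole trick can be sketched in a few shell commands. Everything here is illustrative: the file names are made up, and a two-line phoneID.txt stands in for the real backup contents (I use Python's zipfile command-line interface only to avoid assuming a particular zip utility is installed):

```shell
set -e
cd "$(mktemp -d)"

# Simulate the old phone's backup: a .dbk file is just a zip archive
# containing phoneID.txt alongside the backed-up data.
printf 'Version=3.0\nDeviceModel=K610\n' > phoneID.txt
python3 -m zipfile -c oldphone.dbk phoneID.txt

# 1. Copy the backup and give the copy a .zip suffix so archive tools open it.
cp oldphone.dbk oldphone.zip

# 2. Extract the archive, overwrite phoneID.txt with the version taken
#    from the new phone's dummy backup, and repack.
python3 -m zipfile -e oldphone.zip contents
printf 'Version=3.0\nDeviceModel=W595\n' > contents/phoneID.txt
(cd contents && python3 -m zipfile -c ../patched.zip phoneID.txt)

# 3. Rename back to .dbk so PC Suite will list the backup again.
mv patched.zip patched.dbk

# Verify: the archive now identifies itself as the new phone.
python3 -m zipfile -e patched.dbk check
cat check/phoneID.txt
```

With a real backup you would of course repack every extracted file, not just phoneID.txt.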

Monday, January 26, 2009

Backing up to the cloud

My previous post was about using the Jets3t synchronize tool on an iSeries server.
While Synchronize works quite well, it is not flexible enough for what I had in mind. To integrate Amazon S3 with a real-life backup and DRP policy, I want a set of atomic functions that let me simply transfer particular data objects between S3 and the server.
Using the Jets3t rich set of S3 objects and the samples provided by the author, I found it quite easy to write additional proof of concept tools.

I now have CreateBucket, UploadSingleFile, and DownLoadSingleFile programs. These are all bare-bones tools; use them at your own risk.

All tools use the iUtils helper tool that provides for central management of parameters and credentials. In fact, iUtils uses exactly the same configuration file Synchronize does.

I timed the Upload and Download tools with a 500MB savefile, and on my network it took less than 30 minutes to go either way.

The usage is quite simple:
  • CreateBucket requires a single parameter: the new bucket name.
  • UploadSingleFile requires two parameters: the target bucket name and the file name to upload.
  • DownLoadSingleFile requires two parameters: the source bucket name and the file name to download.
Get the extended toolset files here. As I said before, these are bare-bones tools and there is a lot to improve: there is practically no graceful error handling, no compression, and no encryption.

Using Amazon S3 in an iSeries backup/recovery scenario
S3 can be used as an offsite backup for some of your data, maybe even for all of your data. A library can easily be saved to a save file, and the save file uploaded to S3 storage until needed, at which time it can be downloaded and restored to your iSeries.

For example, look at the following set of commands that saves changed objects to a save file, zips it, and sends it to S3. The extra zipping is required because the data compression built into the iSeries save commands is not very efficient, and because I have not yet implemented compression as an integral part of the upload tool (although the functionality exists in the Jets3t package).

/* create save file */
crtsavf s3wk200901

/* backup objects changed since December 31 to save file */
savchgobj obj(*all) lib(mylib) dev(*savf) savf(s3wk200901) +
refdate('31/12/2008') clear(*all) dtacpr(*high)

/* copy save file to IFS */
cpytostmf frommbr('/qsys.lib/mylib.lib/s3wk200901.file') +
tostmf('/backups/s3wk200901.savf') stmfopt(*replace)

/* further compress save file */
qsh cmd('jar cMf /backups/s3wk200901.zip /backups/s3wk200901.savf')

/* upload compressed save file to S3 */
qsh cmd('cpytos3.sh mybackup /backups/s3wk200901.zip')
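For completeness, here is a sketch of the reverse flow at recovery time. The cpyfroms3.sh wrapper name is hypothetical - substitute whatever script you wrap around the DownLoadSingleFile tool:

```
/* download compressed save file from S3 (cpyfroms3.sh is a
   hypothetical wrapper around the DownLoadSingleFile tool) */
qsh cmd('cpyfroms3.sh mybackup /backups/s3wk200901.zip')

/* extract the save file from the zip archive */
qsh cmd('cd / && jar xf /backups/s3wk200901.zip')

/* copy the stream file back into a save file */
crtsavf s3wk200901
cpyfrmstmf fromstmf('/backups/s3wk200901.savf') +
tombr('/qsys.lib/mylib.lib/s3wk200901.file') mbropt(*replace)

/* restore the saved objects */
rstobj obj(*all) savlib(mylib) dev(*savf) savf(s3wk200901)
```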

Sunday, January 25, 2009

Synchronize iSeries files with Amazon S3 virtual storage

Amazon Web Services, or AWS, is one of the hot new promises to revolutionize IT.
I subscribed to AWS almost two years ago, but only started to really use it last October.

AWS provides huge benefits to IT, developers and web site owners.
For IT, beyond the ability to dynamically allocate servers and pay only for actual server utilization, AWS provides reliable, secure remote storage.

I went out to see whether Amazon S3 lives up to my expectations and can play a significant part in a disaster recovery plan. Some smart folks already did some math a while ago, and the economic drivers for using S3 are now higher than ever.

It turns out that there are quite a few S3 tools. For S3 management I ended up using S3 Organizer, a free Firefox extension, and the Cockpit tool, which is actually part of a larger package, Jets3t.

There is a major difference between personal tools and tools appropriate to use in a datacenter. The biggest challenge for me was to find those tools that support multiple platforms, and that once configured, work in unattended mode with no hassle.

I settled on Jets3t - a set of libraries and tools written in Java.
On a Windows server, installation is straightforward: just unzip the distribution archive, set a few environment variables, and you are ready to start. Of course, Java is a prerequisite.

Jets3t consists of a set of libraries that wrap the S3 API in Java objects, plus four working tools:
  • Cockpit, a GUI application/applet for viewing and managing the contents of a S3 account.
  • Synchronize, a console application for synchronizing directories on a computer with an Amazon S3 account.
  • CockpitLite, a GUI application/applet for viewing and managing the contents of an S3 account, where the S3 account is not owned by the application's user directly but is made available via the Gatekeeper servlet.
  • Uploader, a wizard-based GUI application/applet that S3 account holders (Service Providers) may provide to clients to allow them to upload files to S3 without requiring access to the Service Provider's S3 credentials.
The most promising tool, as far as I am concerned, is Synchronize.
Working in a manner similar to rsync, this tool can keep an S3 bucket and a local folder in sync with each other, while providing both compression and encryption in the cloud.

After making it work on my PC, I set out to make it work on an iSeries server.

Setting up on iSeries
The first thing to do is to install Jets3t on your server, in any IFS directory you like.
I placed it in the /QIBM/UserData/aws/jets3t folder.

Jets3t relies on some environment variables to be defined, specifically JETS3T_HOME and JAVA_HOME.
Make sure both are defined before executing the synchronize.sh script.
I defined both for the job by running these iSeries commands (use LEVEL(*SYS) instead if you want them defined system-wide):

addenvvar envvar(JETS3T_HOME) value('/QIBM/UserData/aws/jets3t') level(*JOB) replace(*YES)
addenvvar envvar(JAVA_HOME) value('/QIBM/ProdData/Java400/jdk15') level(*JOB) replace(*YES)

Additionally, the Unix shell scripts that come with the package assume that the shell lives at /bin/sh.
This is probably not true on your iSeries server, so either modify the shebang lines or run this command to solve the problem:

ADDLNK OBJ('/usr/bin') NEWLNK('/bin') LNKTYPE(*SYMBOLIC)

Synchronize supports file compression and encryption, as well as various modes of operation that make syncing a breeze. For example, to sync the /home/shalom/personal folder with an S3 bucket called shalom-personal, you would execute

synchronize.sh UP shalom-personal /home/shalom/personal/

You do not have to create the bucket prior to using Synchronize; it will create the bucket if it does not exist. (Tip: use the --noaction switch to verify your actions before committing to any changes.)

To learn how to actually use synchronize, read the manual.

To make it easier on the iSeries folks, here is a CL program and CMD that wrap the synchronize tool in a familiar interface.

In the next post I will explain how to extend Jets3t to a real on-demand storage device with some Java programming.

Thursday, January 22, 2009

Processing XML on iSeries, Part 2

Read the first part of this article here; it covers XML and database integration using Saxon.

Last October I looked at DataDirect's XQuery tools. These guys are serious: their web site contains one of the best XQuery references on the web today.

If you are an iSeries professional, you may ask yourself what the heck XQuery is and why you should care.

XQuery is a bit more than yet another XML processing language.

It attempts to deal with XML files in a manner similar to relational databases, and this is good. Many developers find it cumbersome to program an event-driven process to handle what are basically recurring events, like records in an XML message. It seems easier to write a couple of FOR loops (XQuery) than to trap template occurrences (XSLT); for some tasks, a plain procedural program flow rules. I had to develop some pretty complex XML transformation processes lately, and I found XQuery both more productive and more readable than XSLT. It is also easier to understand for non-XML experts.

DataDirect supports the standard XQuery model exceptionally well. Their XML Studio shines, and they have also extended the standard with database support, so it is now really easy to define interactions between XML and databases.

As with almost everything written in Java, I wanted to see whether it was feasible to run it on the iSeries, especially as I immediately realized the benefits of native execution: finally a tool enabling XML integration with DB2 tables that is NOT WRITTEN IN RPG, is maintainable by non-iSeries buffs, and can be inserted into native iSeries business processes.

The bulk of my experience is already detailed here and I will not repeat it in this blog. I can just say that it works.

However, not everything is bright and shiny with DataDirect XQuery.

First, there is performance: on my 520 it sucks. A program that took a couple of seconds on a PC ran for more than 30 seconds. There are simply too many dependencies and jars being loaded to execute a single XQuery program. I believe this could be addressed by not spawning a distinct XQuery instance for each XML file, but I haven't gotten around to doing anything about it.

Then, there is the price... I will be fair to DataDirect and not disclose the price they asked for an execution license on a System i model 520, but it was a bit on the high side. Had they asked for a price similar to their x86 license plan, we would almost certainly have bought it...

Sunday, January 4, 2009

Processing XML on iSeries, Part 1

I've always felt a bit sorry for the poor folks who insist on using RPG to write new code on the System i platform. I mean no offense, but IMHO clinging to outdated technology and claiming it superior is not a smart move. RPG is simply a very bad fit for many modern information processing tasks. Like processing XML.

Creating XML files is easy. You can think of an XML file as a complex report, and as such it can be created by a plain RPG II program with some O-spec wizardry.

On the other hand, reading an XML file is complex. There's white space to ignore, completion checks to do, schemas to validate, DTDs to test against, namespaces, private tags, encodings to parse, and more.

On the gripping hand, we can limit our problem to getting the data out of XML and into data entities that are more familiar to System i developers, like database tables.

Fortunately, there are tools that let you map the relationship between data in XML and data in a table. Some are custom-made for the System i and geared toward RPG developers. I am deliberately going to ignore them in this post: given the choice, I prefer ubiquitous technology that is easy for hordes of programmers to understand. Other solutions require deployment of ESB / ETL / EAI or similar infrastructures. While I am a big fan of ESB, sometimes you need a local solution rather than depending on external processes and processing, not to mention the costs of integration, licenses, and servers.

That leaves us little choice but to focus on Java based tools.

The first one is Saxon, which can be found at http://www.saxonica.com .
Saxon is among the leading XML transformation tools; in fact, it is embedded in a very large number of commercial XML processing tools and libraries.

Saxon has an extension, called Saxon-SQL, that lets you embed SQL data modification statements in an XSLT transformation stylesheet.
You need Java 1.4 or later, and the Saxon distribution. The free Saxon-B is good enough for our needs, but I recommend getting Saxon-SA; at £300.00 per server it is a real bargain.
The JT400 package is also required in order to use JDBC connections to the System i database.

I unzipped the entire distribution into a new directory: /QIBM/UserData/saxon

Now let's peek at the XML and database integration task.
Let's suppose that we are to insert data from XML into the Work Order application table.

The XML looks like this:
<?xml version="1.0"?>
<WO_OUT>
<ROW>
<WO>U10002</WO>
<ITEM>80012470</ITEM>
<FACILITY>PL</FACILITY>
<START_DATE>20081126</START_DATE>
<DUE_DATE>20081129</DUE_DATE>
<QUANTITY>654321</QUANTITY>
<TV_TRANSACTION_ID>85246786</TV_TRANSACTION_ID>
<INTERFACE_TRANSFER_DATE>20081015</INTERFACE_TRANSFER_DATE>
<INTERFACE_TRANSFER_TIME>110903</INTERFACE_TRANSFER_TIME>
</ROW>
</WO_OUT>

The mapping from XML elements to the data table columns is:

WO --> TCCMPN
ITEM --> TCPROD
FACILITY --> TCFAC
START_DATE --> TCRDTE
DUE_DATE --> TCDDTE
QUANTITY --> TCQORD
TV_TRANSACTION_ID --> TCTRID
INTERFACE_TRANSFER_DATE --> TCDATE
INTERFACE_TRANSFER_TIME --> TCTIME

The following XSLT maps the XML values into the corresponding database columns. It is based on the sample that comes with Saxon.

Notice the following special interest items:
  • The stylesheet references the DB2 JDBC driver. If you provide the correct library at runtime, this code can be executed on the System i server as well as on an external server. We will use the jt400Native.jar library for native processing.
  • The system name, user, password, and library are resolved at runtime through external parameters. The user and password are provided as part of the connection string.
  • normalize-space is used to strip extra white space such as spaces, tabs, and carriage returns.
  • The INTERFACE_TRANSFER_DATE and INTERFACE_TRANSFER_TIME values are optional and may be absent, so TCDATE and TCTIME must be given numeric defaults.
  • I recommend setting all database fields to allow null values, in case more data is missing.
  • Do not use 1-character fields; all character fields must have a length of at least 2.
  • The database table should be journalled - there is an implicit commit.


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:sql="java:/net.sf.saxon.sql.SQLElementFactory"
xmlns:saxon="http://saxon.sf.net/"
extension-element-prefixes="saxon sql">

<!-- insert your database details here, or supply them in parameters -->
<xsl:param name="driver" select="'com.ibm.as400.access.AS400JDBCDriver'"/>
<xsl:param name="LIBRARY"/>
<xsl:param name="SYSNAME"/>
<xsl:param name="SYSUSER"/>
<xsl:param name="SYSPASSWORD"/>

<xsl:param name="database"
select="concat('jdbc:as400://', $SYSNAME,
';user=' , $SYSUSER, ';password=',
$SYSPASSWORD, ';naming=system;libraries=', $LIBRARY)"/>
<xsl:param name="user"/>
<xsl:param name="password" />

<!-- This stylesheet writes the work order rows to a SQL database -->

<xsl:variable name="count" select="0" saxon:assignable="yes"/>

<xsl:output method="xml" indent="yes"/>

<xsl:template match="WO_OUT">
<xsl:if test="not(element-available('sql:connect'))">
<xsl:message>sql:connect is not available</xsl:message>
</xsl:if>

<xsl:message>Connecting to
<xsl:value-of select="$database"/>...</xsl:message>

<xsl:variable name="connection"
as="java:java.sql.Connection" xmlns:java="http://saxon.sf.net/java-type">
<sql:connect driver="{$driver}" database="{$database}"
user="{$user}" password="{$password}">
<xsl:fallback>
<xsl:message terminate="yes">SQL extensions are not installed</xsl:message>
</xsl:fallback>
</sql:connect>
</xsl:variable>

<xsl:message>Connected...</xsl:message>


<xsl:for-each select="ROW">
<sql:insert connection="$connection" table="KFXT">
<sql:column name="TCCMPN" select="normalize-space(WO)"/>
<sql:column name="TCPROD" select="normalize-space(ITEM)"/>
<sql:column name="TCFAC" select="normalize-space(FACILITY)"/>
<sql:column name="TCRDTE" select="normalize-space(START_DATE)"/>
<sql:column name="TCDDTE" select="normalize-space(DUE_DATE)"/>
<sql:column name="TCQORD" select="normalize-space(QUANTITY)"/>
<sql:column name="TCTRID" select="normalize-space(TV_TRANSACTION_ID)"/>

<sql:column name="TCDATE"
select="if (string-length(normalize-space(INTERFACE_TRANSFER_DATE)) != 0)
then normalize-space(INTERFACE_TRANSFER_DATE) else '0.0'"/>
<sql:column name="TCTIME"
select="if (string-length(normalize-space(INTERFACE_TRANSFER_TIME)) != 0)
then normalize-space(INTERFACE_TRANSFER_TIME) else '0.0'"/>

</sql:insert>
<saxon:assign name="count" select="$count+1"/>
</xsl:for-each>

<xsl:message>Inserted <xsl:value-of select="$count"/> records.</xsl:message>
<sql:close connection="$connection"/>

</xsl:template>

</xsl:stylesheet>


To execute this, you will have to run the QSH java command:
  • Set the classpath to include saxon9.jar, saxon9-sql.jar and jt400Native.jar.
  • We will use localhost and the current user credentials. If you connect from a remote server, you will need the actual user and password.
Here it is embedded in a shell script:

#!/usr/bin/qsh
CP=/QIBM/UserData/Saxon/saxon9.jar
CP=$CP:/QIBM/UserData/Saxon/saxon9-sql.jar
CP=$CP:/QIBM/ProdData/OS400/jt400/lib/jt400Native.jar
java -Djava.version=1.5 -cp $CP net.sf.saxon.Transform \
input.xml insert-table.xslt LIBRARY=MYLIB SYSNAME=localhost \
SYSUSER=*current SYSPASSWORD=*current



And here it is in a CL program:


PGM
DCL &CP *CHAR 1024
dcl &XQ *char 1024
dcl &PATH2LIBS *char 1024 '/QIBM/UserData/Saxon'
dcl &PATH2JDBC *char 1024 '/QIBM/ProdData/OS400/jt400/lib'

CD 'myDirectory'

chgvar &XQ ('net.sf.saxon.Transform')
CHGVAR VAR(&CP ) VALUE( +
&PATH2LIBS *tcat '/saxon9.jar' *tcat ':' *tcat +
&PATH2LIBS *tcat '/saxon9-sql.jar' *tcat ':' *tcat +
&PATH2JDBC *tcat '/jt400Native.jar' *tcat ':' *tcat +
'.' +
)

JAVA CLASS(&XQ) +
CLASSPATH(&CP) +
parm( +
'input.xml' +
'insert-table.xslt' +
'LIBRARY=MYLIB' +
'SYSNAME=localhost' +
'SYSUSER=*current' +
'SYSPASSWORD=*current' +
) +
PROP((java.version 1.5)) +
OUTPUT(* )

RETURN
ENDPGM