Text Analytics – Slaying the Unstructured Data Dragon

The origin of the word “dragon” has been traced to a Greek word meaning “sharp-sighted one”.  The dragon is purported to have unusually acute vision, and in legends is known as a guardian of temples, paradises and hidden treasures.

Today’s dragon in the world of data is the massive amount of unstructured text that originates from an extensive array of sources: web pages, email, news, blogs, social media sites, surveys and every kind of document imaginable.  Unstructured data, like a dragon, is a big, scary, fire breathing beast – overwhelming to face, and seemingly impossible to vanquish.  Yet like a dragon, it is the guardian of an enticing treasure trove of information.

It is said that 80% of enterprise-relevant information originates in unstructured form.  Your customers, your competitors, and the public at large engage in an endless stream of conversation that contains nuggets of valuable information.   Timely and pertinent data present opportunities for your business and help you manage risk.  But how do you mine that vast ocean of unstructured data for the gems of particular value to you?

Text analytics is the growing area of data processing that gives you a weapon in your battle with unstructured data.  It involves multiple steps to make sense of the chatter and help you acquire business insight.  Here is a depiction of the process from Forrester Research:


Collecting and preparing the data are the first steps: the text is cleansed, split into tokens, and tagged with parts of speech.  Analytics then attempts to pull meaning from the sequences of words and phrases.  During this stage, concepts, categories, and opinions are derived.  Analytics draws on a repository of enriched data, which it queries and runs statistical functions against.  Finally, reporting and delivery offer up the results for people to digest further, for insight, decision making and action planning.
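To make those stages a little more concrete, here is a minimal Python sketch of a toy pipeline: prepare (cleanse and tokenize), analyze (pull out candidate concepts and a crude opinion score), and report (aggregate across documents).  The sample documents, the word lists, and the function names are all invented for illustration; real text analytics relies on far richer language processing and knowledge bases than a pair of hand-written lexicons.

```python
import re
from collections import Counter

# Hypothetical sample documents standing in for collected text.
DOCS = [
    "The new phone is great, I love the camera!",
    "Terrible battery life. The phone died in two hours.",
    "Great camera, but the battery is poor.",
]

# Tiny illustrative lexicons (assumptions, not a real knowledge base).
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"terrible", "poor", "died"}
OPINION_WORDS = POSITIVE | NEGATIVE
STOPWORDS = {"the", "is", "i", "in", "but", "a", "new", "two"}


def tokenize(text):
    """Prepare: lowercase the text, drop punctuation, split into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def analyze(text):
    """Analyze: return candidate concepts and a simple opinion score."""
    tokens = tokenize(text)
    concepts = [
        t for t in tokens
        if t not in STOPWORDS and t not in OPINION_WORDS
    ]
    # Score = positive word count minus negative word count.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return concepts, score


def report(docs):
    """Report: aggregate concept counts and per-document scores."""
    concept_counts = Counter()
    scores = []
    for doc in docs:
        concepts, score = analyze(doc)
        concept_counts.update(concepts)
        scores.append(score)
    return concept_counts, scores


concept_counts, scores = report(DOCS)
print(concept_counts.most_common(3))  # most frequently mentioned concepts
print(scores)                         # one opinion score per document
```

Even this toy version shows the shape of the process: raw text goes in, and what comes out is structured data (counts and scores) that can be queried, charted, and acted upon.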

The human brain is extremely adept at understanding language – it easily grasps sentence structure, word meaning and context.  It extracts concepts, draws relationships with the external world, and detects intent and emotion.  Text analytics as exercised by machines is not nearly as sophisticated as what our human brains can do, but computers are superior at processing large volumes of data quickly.  With strong algorithms, an extensive knowledge base, and some human involvement to drive and refine the search, they can be very effective at locating and analyzing the unstructured data that matters to you.

Sybase IQ 15 incorporates text analytics capabilities with its handling of large objects, specialized indexing for locating and scoring terms and phrases, and an integration layer for plugging in language processing libraries.  Sybase IQ is an analytics platform that offers you serious artillery in your battle against the unstructured data dragon.
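To give a feel for what "locating and scoring terms" means in general, here is a small Python sketch of an inverted index with term-frequency scoring.  This is a generic textbook technique and not a description of how Sybase IQ implements its text indexes; the documents and the function names are assumptions made up for the example.

```python
import re
from collections import Counter, defaultdict


def terms(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to a Counter of {doc_id: occurrences of the term}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in terms(text):
            index[term][doc_id] += 1
    return index


def search(index, query):
    """Score each document by how often the query terms occur in it."""
    scores = Counter()
    for term in terms(query):
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    return scores.most_common()  # highest-scoring documents first


# Hypothetical documents keyed by id.
docs = {
    1: "Shipping was slow and the support was slow to respond",
    2: "Fast shipping, friendly support",
    3: "The product works as described",
}

index = build_index(docs)
print(search(index, "slow shipping"))
```

The index is built once, so each query only touches the documents that actually contain its terms, which is what makes this kind of lookup practical over large volumes of text.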

