A search engine that uses linguistic analysis to cut to the chase
By Bob Weinstein, Globe Correspondent
Kathleen Dahlgren is on a mission: It's not to save the world, but to
"InQuizitize" it.
She's talking about InQuizit, the intelligent software she and her partner
Ed Stabler designed to find exact information, sparing users the
mountain of useless results most search engine software dishes up.
InQuizit is the principal product of Santa Monica, Calif.-based InQuizit Technologies, the company Dahlgren heads.
Dahlgren is a computer scientist with a Ph.D. in linguistics, and Stabler is
a professor of computational linguistics at the University of California at
Los Angeles. If you calculate the number of hours it took to perfect the
software, Dahlgren estimates she and her partner, plus a small army of
linguists and programmers, have invested 85 person-years. The partners started
developing the software 14 years ago when they worked as senior scientists at
IBM. So far, they've gone through more than $10 million in funding from
investors, government, and nonprofit organizations.
Developing the software was a tedious and exhausting process because
the partners had to mimic every aspect of human linguistic reasoning. But, it
was worth the effort because the software has enough data -- including
vocabulary, grammar and world knowledge -- to interpret Time magazine.
InQuizitize is the word Dahlgren coined to describe the process her
software uses to analyze and index information so you can ask it questions in
plain English and get precise answers back.
Radical? You bet, because the program is the closest thing to artificial
intelligence, or AI, on the market. InQuizit is not the only AI software
available, but it's the only one that has introduced semantics in the
processing of information, according to Dave Waltz, president of the American
Association for Artificial Intelligence and an adjunct professor of computer
science at Brandeis University.
Says Victoria Fromkin, professor of linguistics at UCLA and former
president of the Linguistic Society of America, "InQuizit is unusual because
it actually analyzes language according to linguistic principles. From a
linguistic standpoint, it is the most sophisticated program of its kind. The
other programs try engineering tricks, but they don't draw on what we know
about the structure of human language."
How does it work? Dahlgren isn't about to over-explain the software
and give away proprietary secrets -- not that we'd understand what she's
saying. But, she will shower you with easy-to-understand examples of how
InQuizit spits back precise information in record time.
Ask a typical search engine what the stock market's high and low
were for a specific day and you're asking for trouble. Be prepared for
everything relating to the word "market," which could mean virtually any
kind of market, and "stock," which could be livestock or "stock" meaning
inventory or shares of stock. "InQuizit interprets `high' with high value and
`stock' means certificate of ownership as opposed to a kind of cow," she
says. "And, a `market' means the place where it sells those things, not a
`market' where you go to buy food or other products."
No search engine can reason this way, according to Dahlgren. "The
software out there uses 25-year-old technology that only matches words or
patterns," she says. "It does not use linguistics, meaning, or context.
Because language is ambiguous and words have multiple meanings, most
pattern-matching retrieval is highly inaccurate, and usually results in vast
amounts of irrelevant data."
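For readers who want to see the problem in miniature, here is a toy sketch of plain word matching in Python. The three documents, the query, and the function name keyword_search are invented for illustration; this is not InQuizit's code or any real search engine's, only a rough picture of the kind of matching Dahlgren describes.

```python
# Toy illustration of word matching. The documents and the query are
# made up for this example; no real search engine is being reproduced.
DOCS = {
    "finance": "the stock market hit a high of 9000 and a low of 8900 today",
    "cattle": "ranchers sold stock at the livestock market for a high price",
    "grocery": "the corner market keeps canned soup in stock at a low price",
}


def keyword_search(query, docs):
    """Return the ids of documents that share at least one word with the query."""
    terms = set(query.lower().split())
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if terms & set(text.lower().split())
    )


hits = keyword_search("stock market high low", DOCS)
print(hits)  # all three documents match, though only "finance" is on topic
```

Because the matcher sees only strings, the cattle auction and the corner grocery come back alongside the financial report.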
The problem is most software cannot figure out what words mean, says
Dahlgren. Take the question, "How much do monitors (computer monitors)
cost?"
"The best standard software can do is number crunching," says Dahlgren.
"It says, `Here is a word and here is another word and they belong in this
context cluster when they are near each other.'"
There is another family of standard software that throws back more
information than you ever wanted, asserts Dahlgren. "They either gather up
words that are similar in context or they use a technique called `statistical
enhancement' which throws away all the grammatical words, which actually tell
you what words mean in context," she says. "In the `monitor' example, you
wind up getting everything related to monitor, which could mean `advise' as a
noun or `observe' as a verb, not to mention synonyms which include
`counselor,' `director' or `informant,' to name a few."
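The effect of piling on synonyms can be sketched the same way. In the Python below, the synonym table, the four documents, and their relevance labels are all hypothetical, chosen only to echo the `monitor' example; precision here simply means the share of returned documents that are actually about computer monitors.

```python
# Hypothetical synonym table and documents, invented for illustration.
# Each document carries a hand-assigned flag: is it about computer monitors?
SYNONYMS = {
    "monitor": {"monitor", "observe", "counselor", "director", "informant"},
}

DOCS = {
    "display": ("a 17-inch computer monitor costs about 400 dollars", True),
    "camp": ("the camp counselor will observe the swimmers", False),
    "film": ("the director hired a new crew", False),
    "police": ("an informant called the station", False),
}


def expanded_search(query, docs, synonyms):
    """Match documents against the query words plus any listed synonyms."""
    terms = set()
    for word in query.lower().split():
        terms |= synonyms.get(word, {word})
    return sorted(
        doc_id
        for doc_id, (text, _relevant) in docs.items()
        if terms & set(text.lower().split())
    )


def precision(hits, docs):
    """Fraction of returned documents that are flagged as relevant."""
    if not hits:
        return 0.0
    return sum(1 for doc_id in hits if docs[doc_id][1]) / len(hits)


plain = expanded_search("monitor", DOCS, {})
expanded = expanded_search("monitor", DOCS, SYNONYMS)
print(plain, precision(plain, DOCS))
print(expanded, precision(expanded, DOCS))
```

In this made-up collection, the unexpanded query returns only the one relevant document, while the expanded query returns all four and precision falls to one in four.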
Like it or not, search engine veterans must deal with information
overload or "thesaurus enhancing," as Dahlgren calls it, which yields one
percent precision.
InQuizit is being used by corporate and government search engines to
locate data, documentation and inventory. Next month, InQuizit will be
hosting a sprawling Web site for Bible.com, an opportunity to put InQuizit
through its paces. Ask it a question such as "Whose skin was cantankerous?"
and it will return, "Job had boils." Or, ask it a tricky one like, "What
was Job's job?" and InQuizit will tell you Job was the servant of the Lord.
Most search engines would go nuts with the last question, according to
Dahlgren, because they couldn't see the difference between Job the person and
`job,' something you do for a living. Or ask it, "What did Jesus cross?" and
it will retrieve the answer, "The river." "Other search engines will give
you a million references to the word `cross'," Dahlgren adds.
Waltz calls InQuizit a pleasant surprise. "They [Dahlgren and staff]
have given the computer enough knowledge to understand natural language," he
says. "Other systems that are trying to use statistical means to understand
language are not going to be as successful."
Waltz explains that InQuizit is a significant improvement over what is
currently being used because it is able to understand words more deeply than
current systems that view documents as word salads.
InQuizit may be a step above current search engine products, but
Dahlgren says it's far from perfect. You can't feed it complex chess moves and
expect it to return perfect plays, for example. InQuizit can answer questions
and retrieve information in context, but it cannot reason. "If you were to
ask it, `How many people died in World War II?', it wouldn't retrieve all the
people who died -- the Jews, Slavs, Poles, Catholics, etc. -- and give you a
total," says Dahlgren. "But, it will provide a comprehensive list of every
group that lost people."
Reasoning enters the picture when the software can figure out what words
mean in context. But, we're getting closer every day, according to Rob
McGovern, chief executive officer of Careerbuilder.com of Reston, Va., a
network of career sites on the Web. "The big issue for PC users is
navigation," he says. "This generation of technology assumes a vast working
knowledge of computers. In order for computers to go from 40 percent of
households to 100 percent, computers will have to be intuitive and be able to
retrieve virtually any kind of information. You'll be able to say, `Find me a
restaurant close to the movie theater' and it will know that you mean close to
your home."
Attesting to the value of Dahlgren's efforts, McGovern says the "holy
grail of computing is natural language navigation. It won't be long before
computers can deliver enough power to users to process all the information
needed to make sophisticated distinctions," he says.
Dahlgren insists InQuizit will be playing a vital role in the next
generation of AI software. Plans are in the works to take the company public
to raise enough money to invest in more computing equipment to design its own
Internet search engine to be used on the World Wide Web.
That's when things heat up. If all goes according to plan, Dahlgren is
confident InQuizit could put some of the major search engines out of business
-- that is, until a competitor leaps into the market with software that can
achieve more dazzling linguistic feats.