Wanted: A measure of semantic correlation

“The quick, brown fox jumps over the lazy dog.” and “There is a dog and a fox.  The fox, which is brown, jumps over the dog, which is lazy.  The fox is quick.” should give a value of 1.

“The quick, brown fox jumps over the lazy dog.” and “There is a dog and a fox.  The fox, which is brown, jumps over the dog, which is lazy.  The fox is fast.” should give a value of 0.999.
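For comparison, here is what a naive measure does with those two pairs. This minimal Python sketch compares the *sets* of content words in each text and scores them with Jaccard overlap. The stopword list is a hypothetical, hand-picked one that is just large enough for these sentences.

```python
import re

# Hypothetical, hand-picked stopword list: just enough for the fox/dog examples.
STOPWORDS = {"a", "an", "and", "the", "is", "there", "which", "over"}


def content_words(text):
    """Lower-case the text, split it into words, and drop stopwords."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def semantic_overlap(a, b):
    """Jaccard overlap of content-word sets: |A & B| / |A | B|."""
    wa, wb = content_words(a), content_words(b)
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


original = "The quick, brown fox jumps over the lazy dog."
reworded = ("There is a dog and a fox.  The fox, which is brown, jumps over "
            "the dog, which is lazy.  The fox is quick.")
synonym = reworded.replace("quick.", "fast.")

print(semantic_overlap(original, reworded))  # 1.0
print(semantic_overlap(original, synonym))   # 5/7, about 0.714
```

It gets the first pair right (1.0) but scores the second pair only 5/7, about 0.714, because nothing tells it that "quick" and "fast" mean nearly the same thing.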

Yes, I know it’s complicated, but it’s not impossible.  Google clearly does something similar when grouping stories together for news.google.com.

Then I want every news story automatically compared to the corporate press releases.  I want my webpage to show me the press release on one side and the news article on the other.  I want the news article shaded in two colours: one for sections that may be reworded but are ultimately just taken from the press release, and one for sections that represent actual work done by the reporter.
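The shading could be roughed out at the sentence level. This Python sketch is an assumption, not how Google or anyone else actually does it. It marks an article sentence as "press release" when its content words overlap some press-release sentence by at least a hypothetical Jaccard threshold of 0.5, and as "reporting" otherwise. The stopword list and threshold are illustrative only. A real version would need the semantic measure wished for above, because plain word overlap misses rewording that uses synonyms.

```python
import re

# Hypothetical stopword list and threshold, chosen only for illustration.
STOPWORDS = {"a", "an", "and", "the", "is", "was", "there", "which", "over",
             "of", "to", "in", "on", "it", "that"}
THRESHOLD = 0.5


def content_words(text):
    """Lower-cased words with stopwords removed."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def sentences(text):
    """Naive sentence split on whitespace following ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def jaccard(a, b):
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def label_sentences(article, press_release, threshold=THRESHOLD):
    """Return (sentence, label, best_score) for each article sentence.

    label is "press release" if the sentence's best overlap with any
    press-release sentence reaches the threshold, otherwise "reporting".
    """
    release_sets = [content_words(s) for s in sentences(press_release)]
    labelled = []
    for sent in sentences(article):
        words = content_words(sent)
        best = max((jaccard(words, r) for r in release_sets), default=0.0)
        label = "press release" if best >= threshold else "reporting"
        labelled.append((sent, label, best))
    return labelled


release = "Acme Corp announced record profits today. The company expects growth next year."
story = "Acme Corp announced record profits on Tuesday. Analysts doubt the numbers."
for sent, label, score in label_sentences(story, release):
    print(f"{label:13} {score:.2f}  {sent}")
```

With these made-up texts, the first article sentence shares five of seven content words with the release and is shaded as press release. The second shares none and is shaded as reporting.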