“The quick, brown fox jumps over the lazy dog.” and “There is a dog and a fox. The fox, which is brown, jumps over the dog, which is lazy. The fox is quick.” should give a value of 1.
“The quick, brown fox jumps over the lazy dog.” and “There is a dog and a fox. The fox, which is brown, jumps over the dog, which is lazy. The fox is fast.” should give a value of 0.999.
Yes, I know that it’s complicated, but it’s not impossible. Google clearly does something similar when grouping stories together for news.google.com.
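To give a feel for the starting point, here is a minimal sketch of a much cruder measure: cosine similarity over word counts. It is not what Google does (I don't know what they do), and the `STOPWORDS` list and the function names are made up for illustration. It only notices shared words, so it has no idea that "quick" and "fast" mean nearly the same thing.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "is", "there", "which", "over"}

def content_words(text):
    """Count the lowercase words in text, ignoring stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def similarity(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = content_words(a), content_words(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # one of the texts has no content words
    return dot / (norm_a * norm_b)

original = "The quick, brown fox jumps over the lazy dog."
reworded = ("There is a dog and a fox. The fox, which is brown, "
            "jumps over the dog, which is lazy. The fox is quick.")
synonym = reworded.replace("quick", "fast")

print(similarity(original, reworded))
print(similarity(original, synonym))
```

On the fox sentences this ranks the "quick" rewording above the "fast" one, but neither score comes anywhere near the 1 and 0.999 I'm asking for. Closing that gap means understanding synonyms and sentence structure, which is the complicated part.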
Then I want to have all news stories automatically compared to corporate press releases. I want my webpage to show me the press release on one side and the news article on the other. I want the news article to be shaded with two different colours: one for sections that are possibly reworded but ultimately just taken from the press release, and one for sections that represent actual work done by the reporter.
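The shading could be roughed out like this, assuming a simple word-overlap score stands in for the real similarity measure. Every name here (`label_article`, the `threshold` of 0.5, the two labels) is an assumption for the sake of the sketch, and the example texts are invented. Each sentence of the article is compared to every sentence of the press release and labelled by its best match.

```python
import re

def sentences(text):
    """Split text into sentences on ., ! or ? followed by whitespace (crude)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def word_set(sentence):
    """The set of lowercase words in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))

def overlap(a, b):
    """Jaccard overlap of two sentences: shared words / all words."""
    wa, wb = word_set(a), word_set(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

def label_article(article, press_release, threshold=0.5):
    """Label each article sentence 'from_release' or 'original'.

    The threshold is an arbitrary placeholder, not a tuned value.
    """
    release_sentences = sentences(press_release)
    labelled = []
    for s in sentences(article):
        best = max((overlap(s, r) for r in release_sentences), default=0.0)
        label = "from_release" if best >= threshold else "original"
        labelled.append((s, label))
    return labelled

# Invented example texts, for illustration only.
press_release = ("Acme Corp today announced record profits for the third quarter. "
                 "The company will open a new factory in Ohio.")
article = ("Acme Corp announced record profits for the third quarter today. "
           "Local residents told us they worry about pollution from the plant.")

for sentence, label in label_article(article, press_release):
    print(label, "|", sentence)
```

The webpage would then map the two labels to the two background colours. A reworded sentence that swaps in synonyms would slip past this word-overlap test, which is exactly why the better similarity measure is needed first.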