They have now focused the system on comparing a text with the current version of the Wikipedia in 10 different languages: English, German, Spanish, Catalan, Basque, French, Italian, Dutch, Portuguese, and Swedish. That would make it a useful tool for teachers looking for Wikipedia copies in student papers, as students will tend to take the current version of an article and not a historical one.
I decided to test the system using the 2013 test cases that were based on the Wikipedia, as well as a translation from a Wikipedia, an original work, and a plagiarism with no Wikipedia sources. Both .pdf and .doc files were used. The results:
- 21-Tibet (3 sources including WP:DE): WP source found, and another Wikipedia article with an identical phrase
- 33-Eyjafjallajoekull (1 source, WP:EN): WP source found
- 36-Champagne (translation from WP:FR): nothing found
- 43-Brüder-Grimm (2 sources, one WP:DE): WP source found
- 45-Strelitzia (highly disguised plagiarism from WP:EN): WP source found
- 46-Thermoskanne (25% automatically disguised plagiarism from WP:EN): WP source found
- 47-Tessellation (4 sources, none WP): no sources found
- 50-Union-Jack (original paper with correct references to WP:DE): It notes a similarity to the Union Jack lemma, but does not flag any reused text.
- 51-London-Blitz (4 sources, one from WP:EN): WP source found
- 52-Boxer-Rebellion (disguised plagiarism from WP:DE): WP source found
- 57-Fallingwater (1 source, WP:DE): 2 possible WP sources named, one is correct
- 58-Phillip-K-Dick (disguised plagiarism from WP:DE): WP source found
- 60-Rolltreppe (3 sources, 1 WP:EN properly referenced): It notes a similarity to the Escalator lemma, but does not flag any reused text.
- 63-Hebrew-Plag (1 source, WP:HE): nothing found; this language is not offered as one of the possible ones
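The results above can be tallied with a few lines of Python. This is only a rough sketch: the outcome labels (`wp_found`, `missed_translation`, and so on) are my own reading of each line in the list, not categories reported by the system, and the grouping is an assumption made for illustration.

```python
from collections import Counter

# Outcome per test case, as read from the list above.
# The labels are hypothetical categories, not output of the tool.
results = {
    "21-Tibet": "wp_found",
    "33-Eyjafjallajoekull": "wp_found",
    "36-Champagne": "missed_translation",
    "43-Brüder-Grimm": "wp_found",
    "45-Strelitzia": "wp_found",
    "46-Thermoskanne": "wp_found",
    "47-Tessellation": "no_wp_source_nothing_reported",
    "50-Union-Jack": "similarity_noted_no_text_flagged",
    "51-London-Blitz": "wp_found",
    "52-Boxer-Rebellion": "wp_found",
    "57-Fallingwater": "wp_found",  # two candidates named, one correct
    "58-Phillip-K-Dick": "wp_found",
    "60-Rolltreppe": "similarity_noted_no_text_flagged",
    "63-Hebrew-Plag": "missed_unsupported_language",
}

# Count how many test cases fall into each outcome.
tally = Counter(results.values())

for outcome, count in sorted(tally.items()):
    print(f"{outcome}: {count} of {len(results)}")
```

Under this reading, the Wikipedia source was found in every case where text was taken from a Wikipedia in one of the offered languages without translation; the two misses are the translation and the Hebrew case.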
So the system does appear to be quite useful for a small subset of plagiarism detection problems, namely identifying text that has been taken from a current Wikipedia. If it is necessary to look for text in older versions of Wikipedia articles, the tool WikiBlame can be quite useful for identifying and dating text taken from the Wikipedia many years prior.