Tuesday, June 30, 2009

Skulldiggery

Spiegel Online reports on the recent judgement brought against the anthropologist Professor Dr. Dr. Reiner Rudolf Robert Protsch von Zieten (FamousPlagiarists profile SCMD-2005-RPVZ). He has been convicted of stealing 278 chimpanzee skulls from the Johann-Wolfgang-Goethe University in Frankfurt am Main, Germany, faking numerous certificates and documents, and selling the skulls for his own gain. He was given a suspended 18-month prison sentence.

The 70-year-old anthropologist, who has been a professor since 1973, does not even seem to have completed high school, according to the FAZ, much less completed a doctorate, although he regularly used two in his title. He was awarded his professorship on the basis of a Ph.D. certificate, supposedly from an American university, UCLA. His second doctorate, from 1996, was awarded by the University of Vienna. He was fined for using this title before the examination had occurred.

On his birth certificate there is only "Protsch" listed, the name of his father. He cannot prove why he calls himself "von Zieten"; the name change appears to have been around 1991, according to the Wikipedia entry on him.

He had often been the target of discussions in the past. People questioned his abilities at dating bones. He pretended to have developed methodologies that were actually due to other researchers. He presented bones as being from different places than where they were actually found. He plagiarized others' work, liberally. There were many accusations of plagiarism, but none stuck. The FAZ had complained in 2005 that nothing was happening, but even the complaint didn't help much, although he was finally forced to retire.

He had even "donated" items to a museum, pocketing a donation receipt. The items, however, belonged to the university, and were doctored by him to make them more interesting for the museum.

It was when he was finally caught with his hand in the proverbial cookie jar that he was brought before a judge. He stole a famous skull collection from the university and sold it for $70,000, which went into his own pocket, according to Spiegel. He also had student researchers scratch property notices off university bones so he could call them his own, and he had university stamps removed from books in which he then affixed his own.

(Spiegel even reports with relish that, during the court case, his car was parked in a handicapped space with a false handicapped certificate on the windshield. As the judgment was read, his car was towed. There was nothing true about this man, they write.)

The German Wikipedia has a long and detailed list of the different accusations. The Skeptic's Dictionary has a very detailed entry on him (in English). His former university web pages are available in the Internet Archive; however, the University of Frankfurt/Main has no statement on its web presence about him.

So we have justice at last - he will not be enjoying a government pension. But the case does, indeed, demonstrate how easy it is to get away with academic dishonesty in Germany.

Sunday, June 21, 2009

Plagiarism Detection Competition

The SEPLN'09 Workshop PAN, "Uncovering Plagiarism, Authorship and Social Software Misuse", ran an international "Plagiarism" Detection Competition this year and has recently published the results. I've put the word plagiarism in quotes, as my definition of plagiarism encompasses much more than just character sequence matching. Copies and near copies can perhaps be detected by software, but whether something is plagiarism is a judgment that only a teacher can make, as there may be legitimate reasons for copies (they are part of properly quoted material), and a structural plagiarism can exist where no exact copy can be found.

They have developed a massive corpus of English-language artefacts including various sizes of documents, various amounts and types of copying and have also included automatic translation from Spanish and German into English. They give the following statistics about their corpus:
  • Corpus size: 20 611 suspicious documents, 20 612 source documents.
  • Document lengths: small (up to paper size), medium, large (up to book size).
  • Plagiarism contamination per document: 0%-100% (higher fractions with lower probabilities).
  • Plagiarized passage length: short (few sentences), medium, long (many pages).
  • Plagiarism types: monolingual (obfuscation degrees none, low, and high), and multilingual (automatic translation).
They have a development corpus that annotates the copied portions, so that researchers can train their systems. The competition corpus is, of course, without such annotations.

They calculate precision, recall, and granularity for each of the contestants on a character sequence level. Precision is the share of the detections that were correct. Recall is the share of the plagiarism actually present that was identified. Granularity shows how often a particular copy is flagged - this should be close to one, that is, any given copy is found only once.
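To make these three measures concrete, here is a minimal sketch in Python. It is my own simplified reading of the definitions above, not the organizers' exact formulas: passages are assumed to be (start, end) character offsets within one document, and the function names and the example numbers are made up for illustration.

```python
def covered_chars(spans):
    """Return the set of character offsets covered by (start, end) spans (end exclusive)."""
    covered = set()
    for start, end in spans:
        covered.update(range(start, end))
    return covered


def evaluate(true_spans, detected_spans):
    """Character-level precision and recall, plus a simple granularity measure."""
    truth = covered_chars(true_spans)
    found = covered_chars(detected_spans)
    correct = truth & found

    # Precision: how much of what was flagged really is copied text.
    precision = len(correct) / len(found) if found else 0.0
    # Recall: how much of the copied text was flagged at all.
    recall = len(correct) / len(truth) if truth else 0.0

    # Granularity: average number of detections per copied passage,
    # counting only the passages that were detected at least once.
    hits_per_passage = []
    for t_start, t_end in true_spans:
        hits = sum(1 for d_start, d_end in detected_spans
                   if d_start < t_end and t_start < d_end)
        if hits:
            hits_per_passage.append(hits)
    granularity = (sum(hits_per_passage) / len(hits_per_passage)
                   if hits_per_passage else 1.0)
    return precision, recall, granularity


# Hypothetical document: two copied passages, three detections.
true_spans = [(100, 200), (500, 600)]
detected_spans = [(100, 150), (150, 180), (700, 720)]
precision, recall, granularity = evaluate(true_spans, detected_spans)
print(precision, recall, granularity)
```

In this made-up example, 80 of the 100 flagged characters are really copied, only 80 of the 200 copied characters are found, and the one passage that is found is reported in two pieces, so the granularity is 2 instead of the ideal 1.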

They split the competition into external copy identification (but for a given, finite corpus, not against the open Internet) in which a matching with a given set of papers is to be found, and an intrinsic plagiarism identification, in which a stylistic analysis without use of any external documents is to identify the plagiarisms.

The results are, as I expected, wildly different between external and intrinsic. I find the recall values important - how many of the possible copies were found - although the precision is also important, so that not too many false positives are registered.

The recall for the 10 systems doing the external identification ranges wildly between 1% and 69% of possible copies found. This corresponds with my results from 2008 with a small corpus of hand-made plagiarisms and hand-detection, in which we found a recall of between 20% and 75% (the ones finding nothing were disqualified in our test). The median recall of the competition is 49%, the average 45%, which validates my informal assertion that flipping a coin to decide if a paper is plagiarized is about as effective as running software over a digital version of the paper (of course, flipping a coin gives no indication as to what part is indeed plagiarized). The precision had a median of 63% and an average of 60%.

The intrinsic identification was quite different. Although the recall was good (median 51% and average 56%, with one of the four systems reaching 94%), the precision gave a median of 15% with an average of 16%. The best system only had 23% correct answers - that means that over 3 in 4 passages identified as plagiarism using stylistic analysis were, in fact, incorrectly flagged. This has interesting ramifications for stylistic analysis.
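The "over 3 in 4" figure is just one minus the precision. A small sketch of that arithmetic, using the precision values quoted above (the function name is my own):

```python
def wrongly_flagged_share(precision):
    """Share of all flagged material that is NOT plagiarism, given the precision."""
    return 1.0 - precision


# Precision values for the intrinsic systems as quoted in this post.
intrinsic_precision = {"best": 0.23, "median": 0.15, "average": 0.16}

for label, p in intrinsic_precision.items():
    print(f"{label}: {wrongly_flagged_share(p):.0%} of the flags are wrong")
```

Even for the best system, the share of wrong flags (77%) is above the 75% that "3 in 4" would be.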

The overall score (I am not sure exactly what this is) has a median of 32% and an average of 29% over all of the systems for recall, and a median of only 39% (average 28%) for precision.

I can identify only two of the authors as having written software that I have tested. The group from Zhytomyr State University, Ukraine, are the authors of Plagiarism Detector. This system was removed from our ranking for installing a trojan on systems using it, although their results gave them second place in my test (overall fourth place in this test). I also tested WCopyFind, but this is a system that is for detecting collusion. Its recall was overall about 32%, but with less than 1% precision it generates a *lot* of false positives!

I applaud the competition organizers for this very valuable competition, and I especially applaud them for making their results and the corpus available online. I'll download the corpus when I get my new laptop, as I currently only seem to have 7 GB free :)

Saturday, June 20, 2009

Kaplan University

A short film from Kaplan University was shown this morning as an introduction to the use of new media in the classroom. It is a well-made film with a black professor apologizing to the class for not being media-hip - and then being transported to all sorts of devices.

Apart from this depicting learning as a one-way street, i.e. the consumption of video anytime, anywhere for credit, I have long suspected that this was just another diploma mill, but have never had the time to research the topic. Let's see:
  • The Wikipedia notes that this is the "doing business name of the Iowa College Acquisition Corporation, a company that owns and operates independent, private, for-profit, colleges".
  • Rip-off Report: no textbook to read, just exercises to hand in that always come back with a grade of "A".
  • Another Diploma Mill
  • They have an online (!) nursing program "accredited" by the "Commission on Accreditation of Allied Health Education Programs". I can't find much on them, except advertising sites for online education that say that this is legit. But I don't see them on official lists (except the Wikipedia, and I don't trust it for something like this). Please drop a comment with a reference if this is in line. How can you do nursing by distance?
  • Complaintsboard: A number of stories here.
Slick films, but I think I need more convincing that this is a legitimate university.

Fabrication of Data

An open access research article on data fabrication:

Citation: Fanelli D (2009) How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data. PLoS ONE 4(5): e5738. doi:10.1371/journal.pone.0005738

Editor: Tom Tregenza, University of Exeter, United Kingdom

Friday, June 19, 2009

Michael Leddy posted this note about a bit of a stir that is on at Jacksonville State University in Alabama, USA. The president of the university, William Meehan, was granted his doctorate from the University of Alabama. It has been determined that there is an enormous amount of, shall we say, correspondence, between that dissertation and the dissertation of Carl Boening, who was granted a doctorate a few years earlier from the same institution. Meehan's thesis does admit that it is duplicating a research method with a different population. But just duplicating the method does not warrant word-for-word copying from the text.

Interestingly, there are three professors who were on both dissertation committees. Apparently, Meehan's committee approved the duplication of the method. It does seem strange that they did not pick up on over a third of the thesis being a copy.

The case was published in USA Today in April 2009. They report that the investigation was started because Meehan had seized a large plant sample collection from another professor, whose lawyer is accusing Meehan of having a history of stealing academic work.

This same president was also caught with some plagiarized material in his regular column in the local newspaper in 2007. But he wasn't to blame, that was his ghostwriter doing the plagiarizing:
The Committee has found no discernable evidence that President Meehan knew or had reason to know that articles written and released over his name and office contained plagiarized material.
At least that cost him a job at a fancy Georgia Football College, Valdosta. The University of Alabama reviewed the similar dissertations, USA Today reports, and is not taking action because one professor from the committee is at another institution and one has passed away. Case closed. Ignore. Water over the mill.

The Tuscaloosa News reports, however, that the dissertation adviser was not involved in any sort of University of Alabama "review".

It does somehow smell of something getting swept under the carpet.

Simultaneous Translation

Sorry I haven't been blogging much lately; I've been giving many talks on plagiarism. Today's talk on plagiarism was for a real crowd. About 200 had been my largest crowd to date, but today I had about 260 listeners! There was no protesting that I don't need a microphone - without it, I would not have been able to be heard. Luckily, I had a suit jacket on with pockets, so I had a place to park the microphone unit.

And I had, for the first time, simultaneous translation of my talk - into sign language! I was speaking to all of the new teachers in Hamburg, who have to attend mandatory training (which probably explains their faces, which were arranged in an I-don't-really-want-to-be-here scowl). Hamburg has a school for the deaf, and they have a new teacher who is deaf herself. Since the University of Hamburg has a program in sign language, there were two eager signers who spelled each other every 15 minutes, so that the deaf teacher could follow what I was saying.

I am used to owning the stage - I park my laptop, whip out my remote control, and pace about as necessary, gesturing a lot as I go. But there was always a woman next to me, and I couldn't see the eighth of the crowd that was blocked out by the signer.

I eventually retreated to the lectern, but it wasn't really that good. One of the signers then realized the situation, and took a step backwards when she took her position center stage. Now I could see everyone.

I do think that it would have been better for the deaf teacher and the signers to position themselves off to one side of the room, instead of taking center stage. But it seemed to work, and I spoke with the signers afterwards about how they signed the word plagiarism.

They held up their right hand, fingers stretched and thumb at the side: this is writing. With the left hand they "pulled" a copy off the page, so it is writing-copying. Okay, I asked, was it not perhaps writing-stealing? They signed the question to the deaf teacher, and she replied very insistently that writing-stealing was a much better sign. It was similar to the copying motion, but was very clearly a "taking" motion.

So now I know the word plagiarism in one more language!

Thursday, June 11, 2009

Parents organize alternative graduation for cheaters

Bizarre. An Ohio high school discovered that a student had hacked into the school computer system and stolen the final exams. Half of the student body either cheated or knew of the cheating and said nothing. So the administration cancelled the graduation ceremonies. For such a widespread scandal, I would say that was a good reaction. They sent the diplomas home to the parents.

The parents, however, set up an alternative graduation ceremony. With all the trappings. Teaching their kids that getting caught cheating is just a little annoyance, a bump in the road. Nothing to get worked up about.

The comments make me sick. There seem to be a lot of supporters for the cheaters. I suppose this class of '09 will be using the paper mills to get their bachelor's degrees and getting their Master's from one of those sham colleges.