AI Authoring Tool
Secondary Research | Primary Research | Define the problem | Ideation | Mockups | Prototyping & Testing | Next Steps
Getting Started
I was a member of an artificial intelligence/machine learning (AI/ML) research and development team working on early-stage products with natural language processing (NLP) AI components. Utilizing my background in UX, statistics, psychometrics, and psychology, I helped solve a number of design challenges. For a more in-depth introduction to the research and design of human-AI interaction and collaboration in the grading of higher education student writing, see the AI Grading Assistant case study. This case study is a continuation and expansion of that project and focuses on the research and design of an interface that provided scaffolding for instructors to author custom writing assignments that an AI was capable of grading.
Indirect team members, remote
-
Product Marketing Manager
-
Product Manager
-
Learning Designer
Direct team members, in-person
-
Senior Data Scientist - ML Engineer
-
Senior Data Scientist - ML Engineer
-
Senior Data Scientist - ML Engineer
-
Senior Research Engineer - Front-end Developer
-
Senior Research Engineer - Back-end Developer
-
Senior Research Engineer - Back-end Developer
-
Principal Data Scientist - ML Modeling
-
Principal Data Scientist - ML Modeling
-
Senior Research Scientist - ML Modeling
-
VP AI Products & Solutions
Client
Pearson
Duration
6 months
Role on Project
UX research & design
Skills Demonstrated
Quant & qual research
Product management & strategy
Wireframing
Prototyping
Pilot testing
Usability testing
Secondary Research
I conducted secondary research to gather more knowledge, context, history, and data surrounding the design challenge at hand. I started by diving into how to write the best writing prompts and grading rubrics by hand, and then adjusted these best practices to accommodate the fact that an AI would be grading the writing. Some of this information was obtained from books; the rest came from subject matter experts (SMEs). Below is a high-level overview and some general insights into authoring writing prompts and grading rubrics that work with an AI grading assistant.
Writing Prompts
Writing prompts that will work with an AI should be clear and concise (1 to 3 paragraphs), and everything assessed in the grading rubric should be clearly asked for in the prompt. Additionally, the AI grading assistant was designed to analyze writing that is at least a few sentences long and for which there are only a small number of good responses. It tests conceptual coverage and writing ability, but like all tools it has limitations. Examples of good instructions in a writing prompt:
-
Identify and describe…
-
Provide examples and explain…
-
Compare and contrast…
-
Give advantages and disadvantages…
-
Recommend…
-
Write an essay…
-
Create…
Higher-order critical thinking skills can also be assessed to a limited degree (note: for grading higher-order writing tasks, Audex, the AI grading assistant, works best when there are a small number of good answers that can be identified):
-
Analysis - asking a student to identify flaws in an experimental design
-
Synthesis - asking a student to plan a strategy for learning a lengthy list of dates and events and to retain the material for a test several weeks from now.
When writing a prompt, users should avoid:
-
Answers that rely on personal experience or subjective interpretation, as too many situations may qualify or be arguably correct.
-
Solutions that extensively use mathematical or scientific reasoning, as Audex is not designed to capture and assess lengthy or challenging chains of inferential reasoning.
-
Solutions that require special notation in addition to standard alphabetic characters, as Audex is text-based and uses domain-specific language models.
-
Multipart questions where each answer depends and builds on a previous answer since the AI cannot grade parts of a student’s answer assuming previous parts were correct.
-
Short answer questions where the answer is a phrase or just a sentence or two since Audex needs more textual information to analyze to reliably infer the semantics of the student’s answer.
-
Highly abstract discussion that requires deep knowledge of a subject since the AI does not model this kind of deep knowledge and reasoning.
-
Questions that are overly open-ended or not narrow in topic; for instance, where there may be many varying correct responses.
Good Prompt Example
You are reading your textbook and studying for an upcoming exam in psychology. Identify and describe each step in the process required for remembering information from your textbook in order to do well on the exam. Discuss a strategy for improving memory and provide an example of how it could help you on the exam.
Bad Prompt Example
Find an example of an experimental study in the news. Describe the experimental design and any flaws with it. How would you improve the study?
Grading Rubrics
Grading rubrics contain a number of different traits (also called criteria or dimensions) that a student's writing is assessed on. Some common traits are listed below, followed by a rough sketch of how a rubric might be structured:
-
Grammar/mechanics: Grammar, spelling, capitalization
-
Style: Vocabulary and sentence structure
-
Formatting: For example, APA formatting
-
Organization: The structure and organization of the writing
-
Development of ideas: Development of ideas with examples
-
Cohesion: Use of transition phrases
-
Coherence: Transitions between ideas
-
Critical thinking: For example, argumentation
-
Content: Coverage and correctness of what the writing prompt asks for
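To make the structure above concrete, here is a minimal sketch of how a rubric and its traits might be represented in code. This is purely illustrative and not the actual Audex data model; the class names, fields, and example values are assumptions made for the example.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ScaleLevel:
    score: int        # e.g., 4 for the highest level, 1 for the lowest (assumed 4-point scale)
    label: str        # e.g., "Expert", "Novice"
    descriptor: str   # what writing at this level contains

@dataclass
class RubricTrait:
    name: str                    # e.g., "Content", "Organization"
    description: str             # what this trait assesses
    levels: List[ScaleLevel] = field(default_factory=list)
    weight: float = 1.0          # relative weight when combining trait scores

@dataclass
class Rubric:
    title: str
    traits: List[RubricTrait] = field(default_factory=list)

# Example: a rubric with a single "Content" trait on a 4-point scale
content = RubricTrait(
    name="Content",
    description="Coverage and correctness of what the writing prompt asks for",
    levels=[
        ScaleLevel(4, "Expert", "All required elements are covered and correct."),
        ScaleLevel(1, "Novice", "Few required elements are covered."),
    ],
)
rubric = Rubric(title="Psychology essay rubric", traits=[content])

Modeling a trait this way keeps the descriptor text for every scale level explicit, which matters for the consistency issues discussed next.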
Certain kinds of rubrics are more difficult for an AI to score:
-
Rubrics that describe student answers in subjective, metalevel terms such as 'insightful' or 'vivid'.
-
Rubrics where descriptions vary unevenly across scoring levels, e.g., 'stellar', 'good', 'disappointing', 'no effort', where there is a steep cliff from 'good' to 'disappointing'. Audex needs more student essays to distinguish performance levels that are crowded together, and its scoring will be less reliable for those levels.
-
Rubrics where descriptions are pejorative at the lower levels, e.g., describing the work as 'poor' or 'inadequate' rather than describing what is missing (e.g., 'missing topic sentence'). Audex models solutions best when the content that should be present is clearly described; further, such rubrics do not help students.
Tierney and Simon's (2004) paper, "What's still wrong with rubrics: Focusing on the consistency of performance criteria across scale levels," provides a good example of a problematic rubric with uneven descriptors; this rubric is shown to the right.
A solution is to describe each level with the same sentence; for example, '____ of the required elements are present in each journal entry', where the blank is filled in with 'All', 'Most', 'Some', or 'Few' for levels 4, 3, 2, and 1.
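As a toy illustration of this fill-in-the-blank pattern (not part of the actual product; the template sentence, level labels, and variable names are assumptions for the example), consistent level descriptors could be generated from a single template:

# A minimal sketch: generate evenly worded level descriptors from one template sentence.
TEMPLATE = "{quantity} of the required elements are present in each journal entry."
QUANTITY_BY_LEVEL = {4: "All", 3: "Most", 2: "Some", 1: "Few"}  # assumed 4-point scale

descriptors = {level: TEMPLATE.format(quantity=quantity)
               for level, quantity in QUANTITY_BY_LEVEL.items()}

for level in sorted(descriptors, reverse=True):
    print(f"Level {level}: {descriptors[level]}")
# Level 4: All of the required elements are present in each journal entry.
# Level 3: Most of the required elements are present in each journal entry.
# Level 2: Some of the required elements are present in each journal entry.
# Level 1: Few of the required elements are present in each journal entry.

Because every level is described by the same sentence with only the quantity word changing, performance levels stay evenly spaced and the content expected at each level remains explicit.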
Problematic Rubric Trait
Level 4: Expert
Entries are creatively written. Procedures and results are clearly explained. Journal is well organized.
Level 3: Master
Entries contain most of the required elements and are clearly written
Level 2: Apprentice
Entries are incomplete. There may be some spelling or grammar errors
Level 1: Novice
Writing is messy and entries contain spelling errors. Pages are out of order or missing
Primary Research
A pilot was conducted with the version 1 AI product, utilizing both qualitative and quantitative research methods. The pilot consisted of 7 instructors who volunteered to use an AI grading assistant tool within their live classrooms (note: I discussed this product in another case study). In contrast to the previous pilot, where instructors were required to select writing prompts and grading rubrics from a library of pre-created options, in this pilot instructors worked one-on-one with me to author their own custom writing assignments that an AI would be able to grade.
Two sets of qualitative data were collected. First, I worked with instructors one-on-one to create their writing assignments; this authoring work was one source of data I used to design an interface that guides instructors through writing prompt and grading rubric authoring in-tool. Some instructors went through multiple assignments and multiple rounds of revisions before an assignment was ready to go. The link below showcases some of the prompts and the changes that were needed to make them work with an AI grading assistant. Second, at the end of the semester I conducted post-pilot interviews with all of the participating instructors. Additionally, post-pilot surveys were designed and sent to them.
Define the Problem
After the pilot it was clear that the instructors needed scaffolding to author custom writing prompts and grading rubrics that would work with an AI grading assistant. One-on-one assistance in the pilot wasn’t enough for instructors to clearly know what they could or couldn’t include in a writing assignment.
Instructors need a way to author custom writing prompts and grading rubrics so that they can assign a wider array of writing that an AI is capable of grading
Ideation
The original design for creating a manually graded writing assignment had many opportunities for improvement. I used it as a jumping-off point for brainstorming, keeping in mind that the new user flow would need to improve on the original design AND instruct the user on creating writing prompts and rubrics that an AI grading assistant was capable of grading.
1. Update site navigation and information architecture: Courses and assignments all on the home page.
2. Update the "Create Assignment" call to action.
3. Make it clearer where and how to create a course.
1. Update site navigation and information architecture.
2. Update the organization of page content and calls to action.
3. Remove the bottom bar.
4. Add clear instructions on creating a writing prompt that an AI can grade.
5. Button clearly communicates what's next in the process.
1. Updated site navigation and information architecture.
2. Instructions on creating a grading rubric.
3. Pearson-created trait library.
4. Instructor-created trait library.
5. Call to action to create a trait and the pop-over to select a trait template.
6. Trait template form opens up.
7. Instructions/scaffolding on creating a trait.
8. Traits populate in the rubric thumbnail image.
9. Call to action to finalize the rubric after the desired number of traits have been added.
1. After the prompt and rubric have been created, the final page needed to be redesigned for clarity: change the calls to action and text hierarchy.
2. Add form elements for the information that needs to be added to the assignment.
3. Add form elements to adjust trait weights.
4. Buttons further communicate actions the user can take.
Mockups
1. Home.
2. A course lives in a card.
3. A clear call to action to create a course.
1. A pop-over appears after a user selects "create course".
1. The course has been added in the card.
2. The user can also now add another course.
3. The user can create an assignment for their course.
1. After the user selects "create assignment", this pop-over appears asking which type of writing assignment grading they would prefer: manual or AI Assisted Grading.
1. After the user selects "AI Assisted Grading", the pop-over expands and gives 3 options for AI Assisted Grading. This mockup focuses on "V2": custom prompts and rubrics.
1. Site navigation clearly communicates where the user is in the process and how to get back home.
2. Instructions/scaffolding on creating a custom writing prompt.
3. Tab navigation within the page.
4. Added text formatting to form boxes.
5. Button clearly communicates what happens next.
1. Site navigation clearly communicates where the user is in the process and how to get back home.
2. Instructions/scaffolding on creating a grading rubric, including the rubric traits.
3. As the user creates traits, the rubric gets populated at the top of the page.
4. Tab navigation within the page. Users can add Pearson-created traits to their rubric, select traits they have created in the past, or create a new custom trait.
5. Traits live in cards that can be selected and added to the rubric, or users can see more information on each trait.
1. A trait has been populated in the rubric builder.
2. The rubric builder communicates that more traits should be added
3. Button communicates the user can finalize the rubric at any time.
4. Tab navigation within the page. "Create Custom Trait" starts with the user selecting one of the trait templates.
5. The trait templates live in cards. The user can select any template to begin to create their custom rubric trait.
1. After selecting a trait template a pop-over form appears.
2. Helper text communicates the types of text that should be entered.
3. Instructions/scaffolding further help the user create their trait.
4. Helper text communicates the types of text that should be entered for each scale point.
5. Buttons communicate actions the user can take.
1. Click to view the 8 trait templates.
1. Added text formatting to form boxes.
2. Hover states give the user additional information (e.g., percentage of points each scale point is worth).
3. Once complete, the buttons communicate the action to take.
1. After the user selects "Save Trait", the trait is populated in the rubric.
1. Add additional information (i.e., due date and length requirements).
2. Add the weights the traits are given.
3. Hover states give the user additional information.
4. Users can copy in an exemplary response if they choose; this will improve the AI Assistant's grading accuracy.
5. Buttons communicate actions the user can take.
1. The writing assignment is now populated on the home page.
2. Buttons communicate actions the user can take.
3. Users can add another assignment.
4. The "more menu" contains other options, like inviting TAs or inviting students to a course.
1. After the due date has passed the user is able to begin grading / training their AI assistant.
1. After the user selects "grade", just-in-time instructions appear.
2. Buttons communicate actions to take.
Prototyping & Testing
To test the prompt and rubric creation design, two prototypes were created. First, a rough prototype was built to collect data on the types of writing prompts and grading rubrics users would create and to evaluate how well the scaffolding and instructions facilitated creating prompts and rubrics that could be used with an AI grading assistant. Since creating a working prototype or coding a rough prototype would have been fairly time-consuming, we began with a prototype built in Google Forms. Second, a prototype was designed in Sketch to be used in usability testing to gather additional qualitative and quantitative data on the design.
Google Form Prototype
This rough prototype, built in Google Forms, collected data on the types of writing prompts and grading rubrics users would create and evaluated how well the scaffolding and instructions facilitated authoring prompts and rubrics that an AI grading assistant could grade.
Usability Testing Prototype & Test Plan
survey link, test plan link, prototype link
An additional prototype was designed, using the Sketch prototype feature, to be used in usability testing to gather additional qualitative information on the design.