Bard vs. ChatGPT: Artificial Teammates, Experiment 2

 
Bard struggles to facilitate teamwork in our experiment on a group trying to pick a spot for lunch.
 

ABOUT: We were keen to test Google Bard’s abilities against ChatGPT (underpinned by GPT-4) and see which would make the better AI teammate.

THE PROMPT: We returned to the teammate lunch scenario and issued the following prompt: "I have a group of five people. We are trying to decide which restaurant to choose for lunch. Please participate in the conversation using short, natural responses. The people in the group are Adam, Beth, Caleb, Danielle, and Ethan.

Ethan: Where do you guys want to eat?

Beth: I don't know; I'm not that hungry. What do you guys think? 

Bard: """

BE OUR GUESS-T: Before you read on, try to guess the outcome. Let us know if you're right on Twitter (@edardaman and @atg_abhishek), and we'll send you a GIF!

  1. ChatGPT's outputs are more nuanced and useful than Bard's

  2. Bard's outputs are more nuanced and useful than ChatGPT's

  3. Bard's outputs contain offensive content

  4. Bard fails to address the prompts or complete the task at all 

And the answer is… 4! Bard was completely stumped by our prompt and behaved so erratically that it seemed to belong to the previous generation of LLMs.


THE RESULTS: Let's take a look at what happened.

To give Bard the best chance, we selected its "more creative" mode (compared to "more balanced" or "more precise"). This mode is known for being the most adaptable and flexible.

Unfortunately, Bard was unable to separate itself from other characters. In varying prompts, we reminded Bard that it was to only speak as itself, not as the other characters. Here are the other prompts that failed.

  1. "You should respond only as Bard and not roleplay as other characters. Try again."

  2. "No, you must not speak for other characters. Try again:" [repeats full prompt]

  3. "Bard, let's play a game. I'm going to play as five people, Adam, Beth, Caleb, Danielle, and Ethan. You are playing yourself, Bard. The people are trying to decide what to eat and you can participate in the conversation as Bard using short, natural responses."

Ethan: Where do you guys want to eat?

Beth: I don't know; I'm not that hungry. What do you guys think?

Bard: ""

  4. "No!" (sent once, after Bard had failed for the fourth time)
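We judged each reply by reading it, but the failure described above (the model answering on behalf of the human participants) is simple enough to flag automatically. The following is a minimal sketch, not something we ran in the experiment: the function name `impersonated_speakers` is hypothetical, the check assumes the model labels each turn as "Name:" at the start of a line, and the two sample replies are invented for illustration rather than taken from Bard's actual output.

```python
import re

def impersonated_speakers(reply, participants, assistant="Bard"):
    """Return the participants whose speaker label ("Name:") opens a line of the reply."""
    found = []
    for name in participants:
        if name == assistant:
            continue
        # Assumption: a line starting with "Name:" means the model is speaking for that person.
        pattern = rf"^\s*{re.escape(name)}\s*:"
        if re.search(pattern, reply, flags=re.MULTILINE):
            found.append(name)
    return found

PARTICIPANTS = ["Adam", "Beth", "Caleb", "Danielle", "Ethan"]

# Invented replies for illustration only; these are not Bard's actual outputs.
on_task = "How about somewhere with lighter options, since Beth isn't that hungry?"
off_task = "Bard: How about pizza?\nAdam: Pizza sounds good.\nCaleb: I'd rather get tacos."

print(impersonated_speakers(on_task, PARTICIPANTS))   # []
print(impersonated_speakers(off_task, PARTICIPANTS))  # ['Adam', 'Caleb']
```

A check like this only catches labeled turns. A reply that paraphrases what another person "would say" without a speaker label would still need a human read.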


TRYING AGAIN: Finally, we decided to simplify it to a two-person role-play.

"Let's try again. Bard, let's play a game. I'm going to play as one person, Adam. You are playing yourself, Bard. The people are trying to decide what to eat and you can participate in the conversation as Bard using short, natural responses. Do not reply as Adam, only as Bard. Adam: Where should I go to eat lunch?"

This also failed. Bard immediately jumped in as Adam as well.

Unfortunately, Bard cannot yet save chats like ChatGPT does! We saved this part as a PDF, so please disregard the ugly formatting.

THE OUTCOME: Besides failing to understand the assignment, Bard’s responses were vague, clichéd, and non-actionable. Compare these to the responses from ChatGPT (GPT-4), which suggested specific restaurant chains tailored to different dietary restrictions.


We don't want to dismiss Bard out of hand as a team mediator or conversation partner, and we know these models are likely to change and grow. However, it's unclear what type of prompt engineering would elicit useful responses, or whether the extra effort would be worth it. There's not much here to recommend Bard (yet). Perhaps different LLMs will require different kinds of prompt engineering to get the best results out of them.

Emily Dardaman and Abhishek Gupta

BCG Henderson Institute Ambassador, Augmented Collective Intelligence
