duxup 17 hours ago

I'm going to throw out my own ignorant theory.

AIs that I find useful are still just LLMs, and LLMs' power comes from having a massive amount of text to work with, stringing together word math to come up with something OK. That's a lot of data that comes together to get things ... kinda right... sometimes.

I don't think there's that data set for "use an app" yet.

We've seen from "AI plays games" efforts that there have been some pretty spectacular failures. It seems like "use app" is a different problem.

  • cheevly 13 hours ago

    LLMs have literally won Pokemon. I'm pretty sure that using an app is 10x simpler.

    • Vilian 13 hours ago

      It's a lot simpler to run Pokemon than to test an app; the game plays by itself sometimes.

drakonka 16 hours ago

They are; we're working on agents for web application testing over at qa.tech.

afrederico 16 hours ago

They should totally be able to. If there's "vibe coding" there should be "vibe testing." We're working on just such a product (https://actory.ai); right now it only does websites but just imagine when we turn it on mobile/apps, etc. How cool would that be?

HeyLaughingBoy 14 hours ago

Anecdotally, I know someone who tried to have ChatGPT generate unit tests and it was an abject failure.

  • cheevly 13 hours ago

    I know someone that generated unit tests successfully.

    • whoknowsidont 13 hours ago

      And I know exactly which one of these is an enterprise B2B app/platform.

  • haiku2077 10 hours ago

    I generate tests with Claude almost every day.

  • owebmaster 9 hours ago

    I have generated unit tests successfully. How did the someone you know fail?

  • gametorch 9 hours ago

    I generated tons of valuable code with a bunch of GitHub stars, paying users, hundreds of signups, millions of impressions. Just chipping in my anecdote.

aristofun 16 hours ago

Because for meaningful tests of an app (assuming B2C or B2B for end users) you are supposed to be, or imitate, a human being.

Current AI is not even designed to do that. It is just a very sophisticated auto-complete.

It is sophisticated enough to fool some VCs that you can chop your round peg into a square hole. But there is no ground to expect a scalable solution.

  • gametorch 9 hours ago

    Eh, I disagree. Lots of valuable open source code purely written by AI has already been shipped.

    • aristofun 2 hours ago

      Give me 1 decent example of code "purely" written by AI

v5v3 17 hours ago

Are LLM testers doing anything that traditional scripts with for loops can't?

  • postalrat 14 hours ago

    LLM testers have for loops, so they can do everything traditional scripts with for loops can, plus more.