By Christian Prokopp on 2022-12-04
OpenAI's ChatGPT has made the news recently as a next-generation conversational agent. It has a surprising breadth, which made me wonder: could OpenAI's models generate specific technology content good enough to post, and what would that imply for the future?
First, I tried the research preview of ChatGPT.
Me: What is AWS Athena?
ChatGPT: Amazon Athena is a serverless query service that allows you to analyze data in Amazon S3 using SQL. Athena is easy to use, and you can get started with it quickly. It allows you to query data stored in Amazon S3 using SQL, and can be used to analyze a wide variety of data such as logs, clickstream data, and IoT data. With Athena, there is no need to set up, manage, or scale a data warehouse, which makes it a cost-effective and convenient option for querying data stored in S3.
Nice. Immediately, ChatGPT could be useful for querying documentation or even the internet (hey Google) in a better way. But how about something more complex, like writing technical content? If you read Medium or other sources, you know that there is a wide range of human-written content, from the novice and incomprehensible to simple tutorials and analysis to deep, insightful technical domain expertise.
Last week, I wrote about AWS Athena, and in the past, teams I worked with discussed the pros and cons of Athena versus Redshift. They are different tools for different purposes but have some overlap. It is a very specific technical discussion. Surely, a general ML text model will be unable to generate anything meaningful about the topic.
I used the beta website for OpenAI's GPT-3 (text-davinci-003) and prompted it to write a blog post about the subject.
Prompt: A blog post on AWS Athena and why it is better than AWS Redshift.
OpenAI content: As businesses increasingly move their data and tracking systems to the cloud, there is an increasing need for powerful and efficient ways of querying this stored data. This is where AWS Athena comes in.
AWS Athena is a query service built on top of Amazon’s cloud infrastructure that allows users to easily query and analyze data stored in the cloud in a variety of different formats. With Athena, users can run complex queries against their data without having to manage a separate cluster or rely on managing their own hardware.
One of the primary advantages that sets Athena apart from other cloud query services is its ability to quickly query data stored in the cloud with low latency. Athena queries are able to execute in a matter of seconds, which makes it much faster than other services like AWS Redshift, which can take minutes or even hours to execute. This means that users can quickly access the data they need without having to wait for long periods of time.
Another benefit is that Athena provides users with a wide range of data sources that can be queried, including Amazon S3, Amazon RDS, and other cloud storage services. This makes it much easier to query data from multiple sources in one place.
Finally, Athena is also much cheaper to use than AWS Redshift. With Athena, users only pay for the queries they run, allowing them to save significant amounts of money. By contrast, with Redshift, users have to pay for the cluster as well as the queries, which can add up to significant costs.
In conclusion, AWS Athena provides businesses with a powerful and efficient way of querying data stored in the cloud in a variety of different formats. With its ability to execute query quickly with low latency, support for a range of data sources, and lower costs than AWS Redshift, Athena is a great choice for businesses looking to efficiently query and analyze their cloud data.
Impressive. Some details are either inaccurate or depend on circumstance, but this was mostly correct and had some valuable insight. It is as good (or bad) as some posts I have read thrown together by the odd human, but it is also far from what an eloquent domain expert could write. Importantly, it does not come across as artificial or nonsensical.
As I wrote, it is an open argument, so let's see what happens if we favour Redshift in the prompt as the better solution.
Prompt: A blog post about AWS Redshift and why it is better than AWS Athena.
OpenAI content: AWS Redshift is a cloud-based data warehouse service that is part of Amazon's larger suite of cloud-based products. It provides businesses with a scalable, secure, and cost-effective way to store and analyze their data. It offers features such as columnar storage, automatic compression, advanced query optimization, and much more.
Compared to AWS Athena, Redshift offers a number of advantages that make it a better choice for businesses looking to store and analyze their data.
First, Redshift is much faster than Athena. With Redshift, it is easy to quickly analyze large amounts of data. It offers advanced query optimization, allowing queries to be processed faster and more efficiently. Additionally, Redshift offers accelerated databases, which can increase query performance even further.
Second, Redshift is more secure than Athena. It offers improved security measures such as encryption at rest and in transit, role-based access control, and data auditing. With these features, businesses can rest assured that their data is safe and protected.
Third, Redshift offers much more scalability than Athena. Redshift's elasticity and scalability allows businesses to easily scale up or down their data analysis needs as the business grows. This makes it easier to manage and control costs associated with storing and analyzing data.
Finally, Redshift is much more cost-effective than Athena. It offers businesses flexible pricing options, making it easy to select the best plan for their particular needs.
We can see an inconsistency between the two posts, e.g. each claims its service is the faster and cheaper one. But with enough know-how and the right scenarios, you could argue either way. Is OpenAI that smart, or is it simply repeating common unsubstantiated claims it learned from mediocre articles? I doubt it is that clever. Still, the point is that it is good enough that you can argue about it and not immediately dismiss it as false or artificial.
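The two prompts differ only in which service is favoured, so the experiment is easy to repeat programmatically. The following is a minimal sketch, not what I actually ran: I used the website, and the helper name `build_request`, the template wording (modelled on the second prompt) and the `max_tokens` value are assumptions for illustration. It only builds the JSON bodies in the shape OpenAI's text completion endpoint accepted for this model and sends nothing; the model name is taken from the post and may no longer be available.

```python
import json

# Template modelled on the second prompt in the post; the wording is an assumption.
PROMPT_TEMPLATE = "A blog post about AWS {favoured} and why it is better than AWS {other}."


def build_request(favoured: str, other: str, max_tokens: int = 512) -> dict:
    """Return a JSON-serialisable body for a text completion request.

    Hypothetical helper: it only builds the payload and makes no network call.
    """
    return {
        "model": "text-davinci-003",  # model named in the post
        "prompt": PROMPT_TEMPLATE.format(favoured=favoured, other=other),
        "max_tokens": max_tokens,  # assumed value, not from the post
    }


# One request per side of the argument.
requests_to_send = [
    build_request("Athena", "Redshift"),
    build_request("Redshift", "Athena"),
]

for body in requests_to_send:
    print(json.dumps(body, indent=2))
```

Swapping the two arguments is all it takes to flip the conclusion the model is asked to defend, which is exactly why the two generated posts contradict each other.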
Sticking with OpenAI, I created the image of the robot with the typewriter at the beginning using DALL-E. Here is the kicker: the two blog posts and the image cost only 20 cents and minutes of machine time to generate. How much money and time would it cost to hire a mediocre technical writer and artist to create equivalent content?
Highly qualified humans can still create much more insightful and correct content. However, things are about to change on the lower end of content creation, which is about the art of good enough, cost and scale. And where that lower end begins will move rapidly as the quality of machine learning models improves.
As a pessimist, I imagine a new wave of low-cost spam content, indistinguishable from mediocre human content, that regurgitates existing content in new variations.
As an optimist, I imagine new content generation that synthesises unique insight from multiple data sources or makes information more accessible, for example by translating documentation into compelling articles and posts.
Whatever the future holds, visual and text content generation will change dramatically.
Christian Prokopp, PhD, is an experienced data and AI advisor and founder who has worked with Cloud Computing, Data and AI for decades, from hands-on engineering in startups to senior executive positions in global corporations. You can contact him at christian@bolddata.biz for inquiries.