Neural Networks

Herein lie some of my thoughts and resources about neural networks. Because I work for a company that builds models for computer vision, I have a bit of a professional bias towards image models, but I have tried to represent my knowledge and opinions about a broader range of subjects here.

What do you think about generative "AI"?

tl;dr - mostly dancing bearware, some novel uses in responsibility laundering

Resources

Image models

Text models

For code

For everything else

  • Washington Post coverage of the data contained in the 'C4' dataset and how it influences the training of popular large models. The piece also lets readers check whether arbitrary URLs are part of the dataset. (Note: C4 is not the only source of training text for the models under discussion, and the article doesn't highlight that well, but it should still be reasonably representative.)
  • How well does ChatGPT speak Japanese? - an April 2023 evaluation of GPT-3.5 and GPT-4 performance on Japanese-language assessments. It also includes an interesting comparison of the number of tokens required to represent the "Lord's Prayer" in multiple languages; I found those results particularly surprising. A sketch of how to reproduce that kind of token count follows below.
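
Not code from the linked write-up, but a minimal sketch of how that kind of token-count comparison can be reproduced, assuming the tiktoken library and its cl100k_base encoding (the one used by GPT-3.5/GPT-4); the sample phrases below are stand-ins, not the article's exact text:

# Count how many tokens the same short phrase needs in two languages.
# Requires: pip install tiktoken
import tiktoken

samples = {
    "English": "Give us this day our daily bread.",
    "Japanese": "私たちの日ごとの糧を今日もお与えください。",
}

enc = tiktoken.get_encoding("cl100k_base")
for language, text in samples.items():
    n_tokens = len(enc.encode(text))
    print(f"{language}: {n_tokens} tokens for {len(text)} characters")

The gap between the two counts reflects what the post describes: the tokenizer's vocabulary is trained on data weighted heavily toward English, so the same content in Japanese typically costs noticeably more tokens.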

Misc.

  • I gave a talk on the fundamentals of neural networks to Boston Python in March 2023
  • 3blue1brown has an excellent series of lessons about the fundamentals of neural networks. Particularly interesting to me is the lesson on backpropagation, for its visualization of how the network's weights are adjusted; a minimal worked example of that update step follows below.
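
The lesson itself is visual; as a companion (not code from 3blue1brown), here is a minimal sketch of the update step backpropagation computes, shown for a single linear neuron with squared-error loss. All numbers are illustrative.

# One gradient-descent update for a single linear neuron,
# y_hat = w * x + b, with squared-error loss L = (y_hat - y)^2.
# Backpropagation is this same chain-rule bookkeeping applied layer by layer.
w, b = 0.5, 0.0          # illustrative starting parameters
x, y = 2.0, 3.0          # one training example
learning_rate = 0.1

y_hat = w * x + b        # forward pass
loss = (y_hat - y) ** 2

# Backward pass: chain rule gives dL/dw and dL/db.
dloss_dyhat = 2 * (y_hat - y)
dloss_dw = dloss_dyhat * x
dloss_db = dloss_dyhat

# Update step: move each parameter against its gradient.
w -= learning_rate * dloss_dw
b -= learning_rate * dloss_db

print(f"loss={loss:.3f}, updated w={w:.3f}, b={b:.3f}")

A real network repeats this bookkeeping across many weights and layers; frameworks such as PyTorch automate exactly this differentiation.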

Writings by others

Academic works

  • "Extracting Training Data from Large Language Models" (https://arxiv.org/abs/2012.07805) - "We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data...we find that larger models are more vulnerable than smaller models."
  • "Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4" (https://arxiv.org/abs/2305.00118) - "We find that these models have memorized books, both in the public domain and in copyright, and the capacity for memorization is tied to a book’s overall popularity on the web. This differential in memorization leads to differential in performance for downstream tasks, with better performance on popular books than on those not seen on the web"
  • "Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions" (https://arxiv.org/abs/2308.02312) - "Our user study results show that users prefer ChatGPT answers 34.82% of the time. However, 77.27% of these preferences are incorrect answers"

Non-academic works

Lawsuits

I'm trying to keep an eye on the legal status of generative models and their implications for intellectual property in the US. The cases below are of particular interest to me.

ANDERSEN v. STABILITY AI LTD.

GETTY IMAGES (US), INC. v. STABILITY AI, INC.

DOE 1 v. GITHUB, INC.

SILVERMAN v. OPENAI, INC.

MATA v. AVIANCA, INC. (closed)

Note: this case is not itself about machine learning, but it is included in this list because it is a notable example of gross misuse of a language model by plaintiff's counsel to submit falsified documents to the court. This led to sanctions against plaintiff's counsel and dismissal of the case.