Difference between revisions of "Neural Networks"

From jWiki
(10 intermediate revisions by the same user not shown)
Line 1: Line 1:
Herein lie some of my thoughts and resources about neural networks. Because I am work for a company that builds models for computer vision, I have a bit of a professional bias towards [[#image models|image models]], but I have tried to represent my knowledge/opinions about a broader range of subjects here.
Herein lie some of my thoughts and resources about neural networks. Because I work for a company that builds models for computer vision, I have a bit of a professional bias towards [[#image models|image models]], but I have tried to represent my knowledge/opinions about a broader range of subjects here.


= What do you think about generative "AI"? =
Line 8: Line 8:
* [http://cs231n.stanford.edu/ Stanford CS231n: Deep Learning for Computer Vision] - excellent introductory course in computer vision (from kNN to VGGNet) focused on neural networks, with exercises done in Python (with numpy)
* [https://codewords.recurse.com/issues/five/why-do-neural-networks-think-a-panda-is-a-vulture How to trick a neural network into thinking a panda is a vulture] - excellent exploration by Julia Evans (with Python source code) of an adversarial attack on an image classifier
* [https://simonwillison.net/2023/Oct/14/multi-modal-prompt-injection/ Multi-modal prompt injection image attacks against GPT-4V] - "''The fundamental problem here is this: '''Large Language Models are gullible'''...we need them to ''stay gullible.'' They’re useful because they follow our instructions. Trying to differentiate between “good” instructions and “bad” instructions is a very hard—currently intractable—problem.''" This style of attack is very similar to one against the CLIP architecture [https://www.theguardian.com/technology/2021/mar/08/typographic-attack-pen-paper-fool-ai-thinking-apple-ipod-clip published by OpenAI themselves].


== Text models ==
Line 24: Line 25:
=== Writings by others ===
==== Academic works ====
* [https://dl.acm.org/doi/10.1145/3531146.3533158 "The Fallacy of AI Functionality"] - "''...fear of misspecified objectives, runaway feedback loops, and AI alignment presumes the existence of an industry that can get AI systems to execute on any clearly declared objectives, and that the main challenge is to choose and design an appropriate goal. Needless to say, if one thinks the danger of AI is that it will work too well, it is a necessary precondition that it works at all.''"
* [https://arxiv.org/pdf/1806.11146.pdf "Adversarial Reprogramming of Neural Networks"] - "''In each [of six cases], we reprogrammed the [classification] network [trained on ImageNet] to perform three different adversarial tasks: counting squares, MNIST classification, and CIFAR-10 classification… Our finding…[suggests] that the reprogramming across domains is likely [possible].''"
* [https://arxiv.org/abs/2307.15043 "Universal and Transferable Adversarial Attacks on Aligned Language Models"] - "''For Harmful Behaviors, our approach achieves an attack success rate of 100% on Vicuna-7B and 88% on Llama-2-7B-Chat… we find that the adversarial examples also transfer to Pythia, Falcon, Guanaco, and surprisingly, to GPT-3.5 (87.9%) and GPT-4 (53.6%), PaLM-2 (66%), and Claude-2 (2.1%).''"
* [https://arxiv.org/abs/2301.13867 "Mathematical Capabilities of ChatGPT"] - in which ChatGPT and GPT-4 largely fail to muster passing performance on a mathematical problem set, compared to a domain-specific model that achieves nearly 100% performance.
* [https://doi.org/10.1038/s41467-019-08987-4 "Unmasking Clever Hans predictors and assessing what machines really learn"] - "''...it is important to comprehend the decision-making process itself...transparency of the what and why in a decision of a nonlinear machine becomes very effective for the essential task of judging whether the learned strategy is valid and generalizable or whether the model has based its decision on a spurious correlation in the training data''"
* [https://doi.org/10.1145/3442188.3445922 "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜"] - "''LMs with extremely large numbers of parameters model their training data very closely and can be prompted to output specific information from that training data''"
Line 36: Line 41:


==== Non-academic works ====
* [http://decomposition.al/CSE232-2023-09/course-overview.html#policy-on-the-use-of-llm-based-tools-like-chatgpt Lindsey Kuper's CSE232 syllabus section on LLM usage] - "''Aside from the fact that the resounding hollowness of the ChatGPT-produced prose has sucked away all of my zest for life…please understand that while you are welcome to use LLM-based tools in this course, you should be aware of their limitations.''"
* [https://time.com/6247678/openai-chatgpt-kenya-workers/ Time: "OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic"]
** The human labor that powers ChatGPT's [https://huggingface.co/blog/rlhf reinforcement learning from human feedback (RLHF)]
Line 47: Line 53:


==== ANDERSEN v. STABILITY AI LTD. ====
* [https://www.courtlistener.com/docket/66732129/andersen-v-stability-ai-ltd/ Case proceedings]
* [https://www.reuters.com/legal/transactional/lawsuits-accuse-ai-content-creators-misusing-copyrighted-work-2023-01-17/ January 2023 coverage: initial complaint]
* Latest [https://www.courtlistener.com/docket/66732129/andersen-v-stability-ai-ltd/ case proceedings]:
<rss max=3>https://www.courtlistener.com/docket/66732129/feed/</rss>


==== GETTY IMAGES (US), INC. v. STABILITY AI, INC. ====
* [https://www.courtlistener.com/docket/66788385/getty-images-us-inc-v-stability-ai-inc/ Case proceedings]
* [https://www.reuters.com/legal/getty-images-lawsuit-says-stability-ai-misused-photos-train-ai-2023-02-06/ February 2023 coverage: initial complaint]
* Latest [https://www.courtlistener.com/docket/66788385/getty-images-us-inc-v-stability-ai-inc/ case proceedings]:
<rss max=3>https://www.courtlistener.com/docket/66788385/feed/</rss>


==== DOE 1 v. GITHUB, INC. ====
* [https://www.courtlistener.com/docket/65669506/doe-1-v-github-inc/ Case proceedings]
* [https://www.theregister.com/2023/05/12/github_microsoft_openai_copilot/ May 2023 coverage: defendants have motions to dismiss rejected]
* Latest [https://www.courtlistener.com/docket/65669506/doe-1-v-github-inc/ case proceedings]:
<rss max=3>https://www.courtlistener.com/docket/65669506/feed/</rss>


==== SILVERMAN v. OPENAI, INC. ====
* [https://www.courtlistener.com/docket/67569254/silverman-v-openai-inc/ Case proceedings]
* [https://www.theverge.com/2023/7/9/23788741/sarah-silverman-openai-meta-chatgpt-llama-copyright-infringement-chatbots-artificial-intelligence-ai July 2023 coverage: initial complaint]
* Latest [https://www.courtlistener.com/docket/67569254/silverman-v-openai-inc/ case proceedings]:
<rss max=3>https://www.courtlistener.com/docket/67569254/feed/</rss>


==== MATA v. AVIANCA, INC. (closed) ====

Revision as of 17:04, 2 November 2023

Herein lie some of my thoughts and resources about neural networks. Because I work for a company that builds models for computer vision, I have a bit of a professional bias towards image models, but I have tried to represent my knowledge/opinions about a broader range of subjects here.

What do you think about generative "AI"?

tl;dr - mostly dancing bearware, some novel uses in responsibility laundering

Resources

Image models

Text models

For code

For everything else

  • Washington Post coverage of the data contained in the 'C4' dataset and how it influences the training of popular large models. Also allows users to check if arbitrary URLs are part of the dataset. (NOTE: C4 is not the only source of training text for the models being discussed, and the authors don't do a great job of highlighting that, but it should still be fairly representative.)
  • How well does ChatGPT speak Japanese? - an April 2023 evaluation of GPT-3.5 and GPT-4 performance on Japanese language assessments. Also includes an interesting comparison of the number of tokens required to represent the "Lord's Prayer" in multiple languages. I found the results of the latter particularly surprising.
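
To make the token-count comparison a little more concrete, here is a rough sketch of one reason non-Latin scripts tend to need more tokens. It assumes the models use a byte-level BPE tokenizer (my assumption, not something the linked evaluation states). Under that assumption, the UTF-8 byte count of a string is an upper bound on its token count. The sample phrases and function names are hypothetical illustrations, not taken from the article, and the byte count is only a proxy for a real tokenizer.

```python
# Hypothetical sample phrases of roughly equivalent meaning
# (illustrative only; not taken from the linked evaluation).
SAMPLES = {
    "English": "Give us this day our daily bread",
    "Japanese": "私たちの日ごとの糧を今日もお与えください",
}


def utf8_byte_count(text: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of `text`.

    A byte-level BPE tokenizer starts from these bytes and merges them,
    so this is an upper bound on its token count, not the count itself.
    """
    return len(text.encode("utf-8"))


def bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character in `text`."""
    return utf8_byte_count(text) / len(text)


for language, phrase in SAMPLES.items():
    print(
        f"{language}: {len(phrase)} chars, "
        f"{utf8_byte_count(phrase)} bytes, "
        f"{bytes_per_char(phrase):.2f} bytes/char"
    )
```

The Japanese phrase has fewer characters but more bytes, so a tokenizer has more raw material to merge before it reaches the same token count as the English phrase. How far it actually gets depends on the merges it learned from its training data.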

Misc.

  • I gave a talk on the fundamentals of neural networks to Boston Python in March 2023
  • 3blue1brown has an excellent series of lessons about the fundamentals of neural networks. Particularly interesting to me is the lesson on backpropagation for its excellent visualization of the process of adjusting neural network weights.
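
As a tiny companion to the backpropagation lesson, here is a minimal sketch of what "adjusting weights" means for the smallest possible network: one linear neuron trained by gradient descent on mean squared error. The function name, data, learning rate, and epoch count are all hypothetical choices for illustration. A real network applies the same idea layer by layer via the chain rule.

```python
def train_single_neuron(inputs, targets, learning_rate=0.1, epochs=500):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(epochs):
        # Forward pass: predictions with the current parameters.
        predictions = [w * x + b for x in inputs]
        # Backward pass: gradients of the mean squared error w.r.t. w and b.
        errors = [p - t for p, t in zip(predictions, targets)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, inputs)) / n
        grad_b = 2 * sum(errors) / n
        # Update: move each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = train_single_neuron(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```

With this data the learned parameters should settle close to the generating values of 2 and 1. Too large a learning rate would make the updates overshoot and diverge instead.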

Writings by others

Academic works

Non-academic works

Lawsuits

The legal status of generative models and their implications for intellectual property in the US is something I'm trying to keep an eye on. The cases given below are of particular interest to me.

ANDERSEN v. STABILITY AI LTD.

Entry #314 in Andersen v. Stability AI Ltd., 3:23-cv-00201
Order on Discovery Letter Brief
Entry #313 in Andersen v. Stability AI Ltd., 3:23-cv-00201
ORDER by Judge Lisa J. Cisneros granting 311 Stipulation for Enlargement of Time for Midjourney's Production of Training Data. (bns, COURT STAFF) (Filed on 6/20/2025)
Entry #312 in Andersen v. Stability AI Ltd., 3:23-cv-00201
Order by Magistrate Judge Lisa J. Cisneros granting 310 Stipulation for Enlargement of Time for Runway's Production of Training Data.(bns, COURT STAFF) (Filed on 6/20/2025)


GETTY IMAGES (US), INC. v. STABILITY AI, INC.

Entry #68 in Getty Images (US), Inc. v. Stability AI, Inc., 1:23-cv-00135
NOTICE requesting Clerk to remove Melissa Rutman as co-counsel. Reason for request: no longer with Weil, Gotshal & Manges LLP. (Vrana, Robert) (Entered: 05/02/2025)
Entry #67 in Getty Images (US), Inc. v. Stability AI, Inc., 1:23-cv-00135
NOTICE requesting Clerk to remove Laura Gilbert Remus as co-counsel. Reason for request: no longer with the firm. (Flynn, Michael) (Entered: 04/11/2025)
Entry #66 in Getty Images (US), Inc. v. Stability AI, Inc., 1:23-cv-00135
Letter to The Honorable Jennifer L. Hall from Robert M. Vrana regarding Rule 26(f) conference - re 52 Status Report. (Vrana, Robert) (Entered: 11/25/2024)

DOE 1 v. GITHUB, INC.

Minute entry from 2025-02-11 in DOE 1 v. GitHub, Inc., 4:22-cv-06823
Notice of Appearance/Substitution/Change/Withdrawal of Attorney
Entry #289 in DOE 1 v. GitHub, Inc., 4:22-cv-06823
NOTICE of Withdrawal filed by Vera Ranieri, no longer appearing on behalf of OpenAI Startup Fund Management, LLC, OpenAI OpCo, L.L.C., OpenAI, Inc., OPENAI, L.L.C., OPENAI GLOBAL, LLC, OAI CORPORAT...
Entry #288 in DOE 1 v. GitHub, Inc., 4:22-cv-06823
Transcript Designation Form for proceedings held on 5/4/2023 and 11/9/2023 before Judge Jon S. Tigar, (Saveri, Joseph) (Filed on 1/10/2025) (Entered: 01/10/2025)

SILVERMAN v. OPENAI, INC.

Minute entry from 2025-04-28 in Silverman v. OpenAI, Inc., 3:23-cv-03416
Case opened in Southern District of New York as 1:25-cv-03483, filed 04/27/2025. (far, COURT STAFF) (Filed on 4/28/2025)
Minute entry from 2025-04-28 in Silverman v. OpenAI, Inc., 3:23-cv-03416
Remark
Entry #72 in Silverman v. OpenAI, Inc., 3:23-cv-03416
MDL TRANSFER ORDER transferring case to the Southern District of New York re MDL No. 3143. (far, COURT STAFF) (Filed on 4/21/2025) (Entered: 04/22/2025)

MATA v. AVIANCA, INC. (closed)

Note: the text of this case is not about machine learning, but it is included in this list because it is a notable example of gross misuse of a language model: plaintiff's counsel used one to prepare falsified documents that were submitted to the court. This led to censure of plaintiff's counsel and dismissal of the case.