2022 GSoC Towhee Proposal

Welcome to the Towhee community! Thanks for your interest in working with us through Google Summer of Code. If you’re interested in working on Towhee this summer, please give us a star on Github and check out our project ideas below.

Project ideas

Towhee hosted API

Description: Create a server that hosts Towhee and all of its pipelines behind a RESTful API. This API should implement all of the login/logout as well as key and token management. A user using the Towhee API can also upload a piece of data and have the server automatically select an appropriate embedding pipeline. The first phase of this project involves building a single-instance version of the API. This API should have all key features of a well-developed API, including asynchronicity via async/await, user authentication, and session management (via API keys or tokens). The second phase of this project involves horizontally scaling the Towhee API via Kubernetes or another container orchestration engine.

Skills needed: Proficiency in Python and a familiarity with API development using Python (FastAPI, Flask, or Django).

Difficulty level: hard Time: 350 hours

References:

https://spapas.github.io/2021/08/25/django-token-rest-auth/
https://www.cloudsavvyit.com/3987/how-do-you-build-an-api-server
https://www.django-rest-framework.org

Mentors: Krishna Katyal (krishna.katyal@zilliz.com), Filip Haltmayer (filip.haltmayer@zilliz.com), Frank Liu (frank.liu@zilliz.com)

Extend the Towhee training subframework

Description: Deep neural network models can be accessed in the Towhee framework via an abstraction called an Operator. Models can fine-tuned by providing Towhee with an existing model-based operator as well as a labeled training dataset (such as ImageNet or CIFAR) and a set of training hyperparameters. This method of training is good for many standard datasets, but falls short in other cases such as semi-supervised training methods. This project involves extending the Towhee training framework to include GAN, autoencoder, and RNN training. Each training type should be given its own Trainer class within the Towhee framework and tested extensively on different computer vision and natural language processing datasets.

Skills needed: Proficiency in Python and a general understanding of GANs, autoencoders and RNNs.

Difficulty level: hard Time: 350 hours

References:

Jia, Y. et al. Caffe: Convolutional Architecture for Feast Feature Embedding. https://arxiv.org/abs/1408.5093
Goodfellow, I. et al. Generative Adversarial Networks. https://arxiv.org/abs/1406.2661
Bank, D. et al. Autoencoders. Autoencoders
Sherstinsky, A. et al. Fundamentals of RNN and LSTM networks. https://arxiv.org/abs/1808.03314

Mentors: Kyle He (junchen.he@zilliz.com), Junjie Jiang (junjie.jiang@zilliz.com)

Get in touch

• Find us on GitHub. • Join our Slack Channel. • Join our Office Hours every Thursday from 4-5pm PST. • Follow us on Twitter. • Visit our Towhee hub. • Join our mailing list