Commit 0c02927d by tongtao.ling

Initial commit

parent 53a872fa
.vscode/
__pycache__/
outputs/
lightning_logs/
mengzi-t5-base/
\ No newline at end of file
MIT License
Copyright (c) 2021 Shivanand Roy
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
<img align="center" src="data/st5.png" alt="simpleT5">
<p align="center">
<b>Quickly train T5/mT5/byT5/CodeT5 models in just 3 lines of code
</b>
</p>
<p align="center">
<a href="https://badge.fury.io/py/simplet5"><img src="https://badge.fury.io/py/simplet5.svg" alt="PyPI version" height="18"></a>
<a href="https://badge.fury.io/py/simplet5">
<img alt="Stars" src="https://img.shields.io/github/stars/Shivanandroy/simpleT5?color=blue">
</a>
<a href="https://pepy.tech/project/simplet5">
<img alt="Stats" src="https://static.pepy.tech/personalized-badge/simplet5?period=total&units=international_system&left_color=black&right_color=brightgreen&left_text=Downloads">
</a>
<a href="https://opensource.org/licenses/MIT">
<img alt="License" src="https://img.shields.io/badge/License-MIT-yellow.svg">
</a>
</p>
**simpleT5** is built on top of PyTorch Lightning⚡️ and Transformers🤗 and lets you quickly train your T5 models.
> T5 models can be used for several NLP tasks such as summarization, QA, QG, translation, text generation, and more.
Here's a link to the [Medium article](https://snrspeaks.medium.com/simplet5-train-t5-models-in-just-3-lines-of-code-by-shivanand-roy-2021-354df5ae46ba) along with an [example Colab notebook](https://colab.research.google.com/drive/1JZ8v9L0w0Ai3WbibTeuvYlytn0uHMP6O?usp=sharing).
## Install
```shell
# It's advisable to create a new Python environment before installing simpleT5
pip install --upgrade simplet5
```
## Usage
**simpleT5** for a summarization task [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1JZ8v9L0w0Ai3WbibTeuvYlytn0uHMP6O?usp=sharing)
```python
# import
from simplet5 import SimpleT5

# instantiate
model = SimpleT5()

# load (supports t5, mt5, byT5 and CodeT5 models)
model.from_pretrained("t5", "t5-base")

# train
model.train(
    train_df=train_df,  # pandas dataframe with 2 columns: source_text & target_text
    eval_df=eval_df,    # pandas dataframe with 2 columns: source_text & target_text
    source_max_token_len=512,
    target_max_token_len=128,
    batch_size=8,
    max_epochs=5,
    use_gpu=True,
    outputdir="outputs",
    early_stopping_patience_epochs=0,
    precision=32,
)

# load trained T5 model
model.load_model("t5", "path/to/trained/model/directory", use_gpu=False)

# predict
model.predict("input text for prediction")
```
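The `train_df` and `eval_df` arguments above are plain pandas dataframes with `source_text` and `target_text` columns, as noted in the comments. A minimal sketch of preparing them for a summarization run (the CSV file and its original column names are hypothetical):

```python
import pandas as pd

# Hypothetical raw data with "article" and "summary" columns.
df = pd.read_csv("data.csv")

# simpleT5 expects exactly two columns: source_text and target_text.
df = df.rename(columns={"article": "source_text", "summary": "target_text"})

# T5 is a text-to-text model, so a task prefix on the input is conventional.
df["source_text"] = "summarize: " + df["source_text"]

# Simple 90/10 split into training and evaluation dataframes.
train_df = df.sample(frac=0.9, random_state=42)
eval_df = df.drop(train_df.index)
```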
## Articles
- [Geek Culture: simpleT5 — Train T5 Models in Just 3 Lines of Code](https://medium.com/geekculture/simplet5-train-t5-models-in-just-3-lines-of-code-by-shivanand-roy-2021-354df5ae46ba)
- [Abstractive Summarization with SimpleT5⚡️](https://snrspeaks.medium.com/abstractive-summarization-with-simplet5-%EF%B8%8F-344a78f73265)
- [Training T5 model in just 3 lines of Code with ONNX Inference](https://medium.com/mlearning-ai/training-t5-model-in-just-3-lines-of-code-with-onnx-inference-ff5b6678c757)
- [Kaggle: simpleT5⚡️ - Generating one line summary of papers](https://www.kaggle.com/mathurinache/simplet5-generating-one-line-summary-of-papers)
- [YouTube: Abstractive Summarization Demo with SimpleT5](https://www.youtube.com/watch?v=jgKj-7v2UYU)
## Acknowledgements
- [Transformers by HuggingFace 🤗](https://huggingface.co/transformers/)
- [PyTorch Lightning ⚡️](https://www.pytorchlightning.ai/)
- [fastT5](https://github.com/Ki6an/fastT5)
import datetime

import torch
import uvicorn
from fastapi import FastAPI, Request
from simplet5 import SimpleT5
from transformers import T5Tokenizer, T5ForConditionalGeneration

PORT = 5555
app = FastAPI()

# Build a SimpleT5 wrapper around the fine-tuned checkpoint: the tokenizer comes
# from the local mengzi-t5-base directory, the weights from the simpleT5 training
# output, and inference runs on CPU.
model = SimpleT5()
model.tokenizer = T5Tokenizer.from_pretrained("./mengzi-t5-base")
model.model = T5ForConditionalGeneration.from_pretrained(
    "./outputs/simplet5-epoch-4-train-loss-0.1129-val-loss-0.2851"
)
model.device = torch.device("cpu")


@app.post("/disflu")
async def disfluency_detection(request: Request):
    # Read the JSON body and pull out the input text.
    payload = await request.json()
    text = payload.get("text")
    print(text)

    # Run disfluency detection and return the prediction with a timestamp.
    result = model.predict(text)
    now_time = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return {
        "result": result,
        "status": 200,
        "time": now_time,
    }


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=PORT, workers=1)
\ No newline at end of file
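For a quick sanity check of the `/disflu` endpoint above, a minimal client sketch using `requests` (assuming the server is running locally on port 5555):

```python
import requests

# Post one disfluent utterance to the local endpoint.
resp = requests.post(
    "http://127.0.0.1:5555/disflu",
    json={"text": "现在呢孩子大了已经读初中了,我呢自己呢这个也思反思了。"},
)
resp.raise_for_status()

# The handler returns a JSON object with "result", "status" and "time" keys.
print(resp.json()["result"])
```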
import setuptools
from os import path

here = path.abspath(path.dirname(__file__))
with open(path.join(here, "README.md"), encoding="utf-8") as f:
    long_description = f.read()

setuptools.setup(
    name="simplet5",
    version="0.1.4",
    license="apache-2.0",
    author="Shivanand Roy",
    author_email="shivanandroy.official@gmail.com",
    description="simpleT5 is built on top of PyTorch-lightning ⚡️ and Transformers 🤗 that lets you quickly train your T5 models.",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/Shivanandroy/simpleT5",
    project_urls={
        "Repo": "https://github.com/Shivanandroy/simpleT5",
        "Bug Tracker": "https://github.com/Shivanandroy/simpleT5/issues",
    },
    keywords=[
        "T5",
        "simpleT5",
        "transformers",
        "NLP",
        "finetune",
        "fine-tuning",
        "pytorch",
        "summarization",
        "translation",
        "training",
        "classification",
        "Q&A",
        "inference",
        "fast inference",
    ],
    packages=setuptools.find_packages(),
    python_requires=">=3.5",
    install_requires=[
        "numpy",
        "pandas",
        "sentencepiece",
        "torch>=1.7.0,!=1.8.0",  # excludes torch v1.8.0
        "transformers==4.16.2",
        "pytorch-lightning==1.5.10",
    ],
    classifiers=[
        "Intended Audience :: Developers",
        "Topic :: Scientific/Engineering :: Artificial Intelligence",
        "License :: OSI Approved :: Apache Software License",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.5",
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
    ],
)
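Since the commit vendors the simpleT5 package source next to this setup.py, the API server and prediction scripts can presumably be run against a local editable install instead of the PyPI release; a minimal sketch, assuming the command is run from the repository root:

```shell
# Install the vendored simpleT5 package (and its pinned dependencies) in editable mode.
pip install -e .
```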
from .simplet5 import SimpleT5
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration
from simplet5 import SimpleT5

# Load the fine-tuned checkpoint directly into a SimpleT5 wrapper and run on CPU.
model = SimpleT5()
model_path = "./outputs/simplet5-epoch-4-train-loss-0.1129-val-loss-0.2851"
model.tokenizer = T5Tokenizer.from_pretrained(model_path)
model.model = T5ForConditionalGeneration.from_pretrained(model_path)
model.device = torch.device("cpu")
# model.load_model("t5","./outputs/simplet5-epoch-4-train-loss-0.1129-val-loss-0.2851", use_gpu=False)

# Disfluent Chinese test utterances (approximate English glosses):
# test1: "What are you getting ready to do? I, I'll let one more person in -- aren't you alone here?"
# test2: "Wouldn't doing it like this be provoking? Like this."
test1 = "你这是要准备干嘛呀?我,我再进一个人啊你这不就一个人吗?"
test2 = "你这样不是会刺激吗?你这样。"

# "Now the child has grown up and is already in junior high, and I myself have ref-, reflected on this."
result = model.predict("现在呢孩子大了已经读初中了,我呢自己呢这个也思反思了。")
print(result)

result1 = model.predict(test1)
print(result1)

result2 = model.predict(test2)
print(result2)
\ No newline at end of file
curl -X POST "http://127.0.0.1:5555/disflu" \
-H 'Content-Type: application/json' \
-d '{"text": "现在呢孩子大了已经读初中了,我呢自己呢这个也思反思了。"}'
\ No newline at end of file