+-------------------+     +-------------------+     +-------------------+
|   Input Source    | --> |   Index/Storage   | --> |   Search Engine   |
| (DB, files, API)  |     | (Elasticsearch,   |     | (query builder    |
|                   |     |  SQLite, …)       |     |  + ranking)       |
+-------------------+     +-------------------+     +-------------------+
                                                              |
                                                              v
                                                    +-------------------+
                                                    | Result Formatter  |
                                                    +-------------------+
                                                              |
                                                              v
                                                    +-------------------+
                                                    |  API / UI Layer   |
                                                    +-------------------+
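As a rough illustration of how the stages in the diagram hand data to one another, the sketch below wires them together in memory. Everything in it is an assumption made for illustration: the function names (`build_index`, `search`, `format_results`), the plain-dict inverted index standing in for Elasticsearch or SQLite, the "most keywords first" ranking, and the sample records are all hypothetical.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"\w+")

def build_index(records):
    """Index/Storage stage: map each lowercased token to the ids of records containing it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for token in TOKEN_RE.findall(text.lower()):
            index[token].add(record_id)
    return index

def search(index, keywords):
    """Search stage: collect, per record id, the keywords found in that record."""
    hits = defaultdict(set)
    for keyword in keywords:
        for record_id in index.get(keyword.lower(), ()):
            hits[record_id].add(keyword.lower())
    return hits

def format_results(hits):
    """Result Formatter stage: JSON-friendly dicts, records with the most keywords first."""
    ranked = sorted(hits.items(), key=lambda item: (-len(item[1]), item[0]))
    return [{"id": rid, "matched_keywords": sorted(kws)} for rid, kws in ranked]

# Hypothetical sample data standing in for the Input Source stage.
records = {
    1: "Account verified yesterday.",
    2: "Nothing of interest here.",
    3: "miu was verified today.",
}
results = format_results(search(build_index(records), ["miu", "verified"]))
print(results)
```

A real deployment would replace the dict with the storage backend of choice; the point here is only the order in which the stages run.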
import re

# KEYWORDS is the list of terms to detect (loaded from configuration).
# Build a single case-insensitive alternation regex:
pattern = re.compile(r'\b(' + '|'.join(map(re.escape, KEYWORDS)) + r')\b', re.I)

def find_matches_regex(text):
    """Return the set of matched keywords, lowercased."""
    return {m.group(0).lower() for m in pattern.finditer(text)}

# Example:
print(find_matches_regex("adn622 and miu are verified."))  # {'adn622', 'miu', 'verified'} (set order may vary)
Pros: One pass over the text, fast in CPython/JavaScript regex engines.
Cons: Still linear per record; regex engine may have limits on very long alternations (but 9 terms is trivial).
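If the keyword list ever grows large enough for the alternation to become a concern, one alternative is to tokenize the text once and intersect the tokens with a keyword set. The sketch below is one possible shape for that; the name `find_matches` and the `\w+` tokenization rule are assumptions, and unlike the regex it only handles single-word keywords.

```python
import re

TOKEN_RE = re.compile(r"\w+")

def find_matches(text, keywords):
    """Return the set of keywords that occur as whole tokens in text (case-insensitive)."""
    keyword_set = {k.lower() for k in keywords}
    tokens = set(TOKEN_RE.findall(text.lower()))
    return keyword_set & tokens

print(find_matches("adn622 and miu are verified.", ["adn622", "miu", "verified", "indo18"]))
```

Set intersection keeps the lookup cost independent of how many keywords are configured, at the price of losing multi-word phrase matching.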
Below is a minimal Flask‑style endpoint (Python) that returns JSON results.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/search', methods=['GET'])
def search():
    # Expected query param: ?q=some+text
    q = request.args.get('q', '')
    matches = find_matches_regex(q)  # or find_matches(q, KEYWORDS)
    return jsonify({
        "query": q,
        "matched_keywords": sorted(matches),  # sets are not JSON-serializable
        "has_match": bool(matches),
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
Response example:
{
  "query": "adn622 and miu are verified",
  "matched_keywords": ["adn622", "miu", "verified"],
  "has_match": true
}
You can easily extend this to:
Sample pytest snippet for the regex approach:
def test_regex_matches():
    txt = "The user adn622 posted a verified video about miu."
    assert find_matches_regex(txt) == {"adn622", "verified", "miu"}
Store the list in a configuration file (YAML/JSON) or a database table so you can add/remove terms without code changes.
# keywords.yaml
keywords:
- adn622
- kecanduan
- genjotan
- anaku
- sendiri
- miu
- shiramine
- indo18
- verified
Load it at start‑up:
import yaml

with open('keywords.yaml') as f:
    KEYWORDS = yaml.safe_load(f)['keywords']
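Since JSON is mentioned as an alternative to YAML, here is a minimal standard-library sketch of the same start-up step. The helper names `load_keywords` and `compile_pattern`, the `{"keywords": [...]}` file layout, and the normalization rules (strip, lowercase, de-duplicate) are assumptions, not part of the original design.

```python
import json
import os
import re
import tempfile

def load_keywords(path):
    """Read {"keywords": [...]} from a JSON file; strip, lowercase, and de-duplicate the terms."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)["keywords"]
    cleaned = []
    for term in raw:
        term = str(term).strip().lower()
        if term and term not in cleaned:
            cleaned.append(term)
    return cleaned

def compile_pattern(keywords):
    """Build a case-insensitive whole-word alternation from the keyword list."""
    if not keywords:
        # An empty alternation would match the empty string at word boundaries.
        raise ValueError("keyword list is empty")
    return re.compile(r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b", re.I)

# Demo with a throwaway file (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "keywords.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"keywords": ["Verified", " miu ", "verified", ""]}, f)
    KEYWORDS = load_keywords(path)

pattern = compile_pattern(KEYWORDS)
print(KEYWORDS)
```

Normalizing at load time means a stray space or duplicate entry in the config file cannot silently change what the pattern matches.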
This design implements a “Keyword‑Lookup” feature that scans a data source (database rows, log files, scraped pages, etc.) for the exact set of terms you listed:
adn622
kecanduan
genjotan
anaku
sendiri
miu
shiramine
indo18
verified
The goal is to detect any record that contains one or more of these tokens, flag it, and (optionally) return the matched context.
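One way to return the matched context is to slice a fixed window of characters around each hit. The following is a sketch under assumptions: the function name `find_matches_with_context`, the `width` parameter, and the choice of a character window (rather than a sentence or line) are all hypothetical.

```python
import re

def find_matches_with_context(text, keywords, width=20):
    """Return (keyword, snippet) pairs, one per occurrence, in order of appearance."""
    if not keywords:
        return []
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b", re.I)
    results = []
    for m in pattern.finditer(text):
        # Clamp the window to the bounds of the text.
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        results.append((m.group(0).lower(), text[start:end]))
    return results

hits = find_matches_with_context("The account was verified on Monday.", ["verified"], width=5)
print(hits)  # [('verified', ' was verified on M')]
```

A record can then be flagged whenever the returned list is non-empty, with the snippets stored alongside the flag for review.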