Gitee/searx/locales.py

462 lines
15 KiB
Python
Raw Normal View History

# SPDX-License-Identifier: AGPL-3.0-or-later
"""
SearXNGs locale data
=====================
The variables :py:obj:`RTL_LOCALES` and :py:obj:`LOCALE_NAMES` are loaded from
:origin:`searx/data/locales.json` / see :py:obj:`locales_initialize` and
:ref:`update_locales.py`.
.. hint::
Whenever the value of :py:obj:`ADDITIONAL_TRANSLATIONS` or
:py:obj:`LOCALE_BEST_MATCH` is modified, the
:origin:`searx/data/locales.json` needs to be rebuild::
./manage data.locales
SearXNG's locale codes
======================
.. automodule:: searx.sxng_locales
:members:
SearXNGs locale implementations
================================
"""
from __future__ import annotations
from pathlib import Path
import babel
from babel.support import Translations
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
import babel.languages
import babel.core
import flask_babel
from flask.ctx import has_request_context
[refactor] typification of SearXNG (initial) / result items (part 1) Typification of SearXNG ======================= This patch introduces the typing of the results. The why and how is described in the documentation, please generate the documentation .. $ make docs.clean docs.live and read the following articles in the "Developer documentation": - result types --> http://0.0.0.0:8000/dev/result_types/index.html The result types are available from the `searx.result_types` module. The following have been implemented so far: - base result type: `searx.result_type.Result` --> http://0.0.0.0:8000/dev/result_types/base_result.html - answer results --> http://0.0.0.0:8000/dev/result_types/answer.html including the type for translations (inspired by #3925). For all other types (which still need to be set up in subsequent PRs), template documentation has been created for the transition period. Doc of the fields used in Templates =================================== The template documentation is the basis for the typing and is the first complete documentation of the results (needed for engine development). It is the "working paper" (the plan) with which further typifications can be implemented in subsequent PRs. - https://github.com/searxng/searxng/issues/357 Answer Templates ================ With the new (sub) types for `Answer`, the templates for the answers have also been revised, `Translation` are now displayed with collapsible entries (inspired by #3925). !en-de dog Plugins & Answerer ================== The implementation for `Plugin` and `Answer` has been revised, see documentation: - Plugin: http://0.0.0.0:8000/dev/plugins/index.html - Answerer: http://0.0.0.0:8000/dev/answerers/index.html With `AnswerStorage` and `AnswerStorage` to manage those items (in follow up PRs, `ArticleStorage`, `InfoStorage` and .. will be implemented) Autocomplete ============ The autocompletion had a bug where the results from `Answer` had not been shown in the past. To test activate autocompletion and try search terms for which we have answerers - statistics: type `min 1 2 3` .. in the completion list you should find an entry like `[de] min(1, 2, 3) = 1` - random: type `random uuid` .. in the completion list, the first item is a random UUID Extended Types ============== SearXNG extends e.g. the request and response types of flask and httpx, a module has been set up for type extensions: - Extended Types --> http://0.0.0.0:8000/dev/extended_types.html Unit-Tests ========== The unit tests have been completely revised. In the previous implementation, the runtime (the global variables such as `searx.settings`) was not initialized before each test, so the runtime environment with which a test ran was always determined by the tests that ran before it. This was also the reason why we sometimes had to observe non-deterministic errors in the tests in the past: - https://github.com/searxng/searxng/issues/2988 is one example for the Runtime issues, with non-deterministic behavior .. - https://github.com/searxng/searxng/pull/3650 - https://github.com/searxng/searxng/pull/3654 - https://github.com/searxng/searxng/pull/3642#issuecomment-2226884469 - https://github.com/searxng/searxng/pull/3746#issuecomment-2300965005 Why msgspec.Struct ================== We have already discussed typing based on e.g. `TypeDict` or `dataclass` in the past: - https://github.com/searxng/searxng/pull/1562/files - https://gist.github.com/dalf/972eb05e7a9bee161487132a7de244d2 - https://github.com/searxng/searxng/pull/1412/files - https://github.com/searxng/searxng/pull/1356 In my opinion, TypeDict is unsuitable because the objects are still dictionaries and not instances of classes / the `dataclass` are classes but ... The `msgspec.Struct` combine the advantages of typing, runtime behaviour and also offer the option of (fast) serializing (incl. type check) the objects. Currently not possible but conceivable with `msgspec`: Outsourcing the engines into separate processes, what possibilities this opens up in the future is left to the imagination! Internally, we have already defined that it is desirable to decouple the development of the engines from the development of the SearXNG core / The serialization of the `Result` objects is a prerequisite for this. HINT: The threads listed above were the template for this PR, even though the implementation here is based on msgspec. They should also be an inspiration for the following PRs of typification, as the models and implementations can provide a good direction. Why just one commit? ==================== I tried to create several (thematically separated) commits, but gave up at some point ... there are too many things to tackle at once / The comprehensibility of the commits would not be improved by a thematic separation. On the contrary, we would have to make multiple changes at the same places and the goal of a change would be vaguely recognizable in the fog of the commits. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-12-15 09:59:50 +01:00
from searx import (
data,
logger,
searx_dir,
)
[refactor] typification of SearXNG (initial) / result items (part 1) Typification of SearXNG ======================= This patch introduces the typing of the results. The why and how is described in the documentation, please generate the documentation .. $ make docs.clean docs.live and read the following articles in the "Developer documentation": - result types --> http://0.0.0.0:8000/dev/result_types/index.html The result types are available from the `searx.result_types` module. The following have been implemented so far: - base result type: `searx.result_type.Result` --> http://0.0.0.0:8000/dev/result_types/base_result.html - answer results --> http://0.0.0.0:8000/dev/result_types/answer.html including the type for translations (inspired by #3925). For all other types (which still need to be set up in subsequent PRs), template documentation has been created for the transition period. Doc of the fields used in Templates =================================== The template documentation is the basis for the typing and is the first complete documentation of the results (needed for engine development). It is the "working paper" (the plan) with which further typifications can be implemented in subsequent PRs. - https://github.com/searxng/searxng/issues/357 Answer Templates ================ With the new (sub) types for `Answer`, the templates for the answers have also been revised, `Translation` are now displayed with collapsible entries (inspired by #3925). !en-de dog Plugins & Answerer ================== The implementation for `Plugin` and `Answer` has been revised, see documentation: - Plugin: http://0.0.0.0:8000/dev/plugins/index.html - Answerer: http://0.0.0.0:8000/dev/answerers/index.html With `AnswerStorage` and `AnswerStorage` to manage those items (in follow up PRs, `ArticleStorage`, `InfoStorage` and .. will be implemented) Autocomplete ============ The autocompletion had a bug where the results from `Answer` had not been shown in the past. To test activate autocompletion and try search terms for which we have answerers - statistics: type `min 1 2 3` .. in the completion list you should find an entry like `[de] min(1, 2, 3) = 1` - random: type `random uuid` .. in the completion list, the first item is a random UUID Extended Types ============== SearXNG extends e.g. the request and response types of flask and httpx, a module has been set up for type extensions: - Extended Types --> http://0.0.0.0:8000/dev/extended_types.html Unit-Tests ========== The unit tests have been completely revised. In the previous implementation, the runtime (the global variables such as `searx.settings`) was not initialized before each test, so the runtime environment with which a test ran was always determined by the tests that ran before it. This was also the reason why we sometimes had to observe non-deterministic errors in the tests in the past: - https://github.com/searxng/searxng/issues/2988 is one example for the Runtime issues, with non-deterministic behavior .. - https://github.com/searxng/searxng/pull/3650 - https://github.com/searxng/searxng/pull/3654 - https://github.com/searxng/searxng/pull/3642#issuecomment-2226884469 - https://github.com/searxng/searxng/pull/3746#issuecomment-2300965005 Why msgspec.Struct ================== We have already discussed typing based on e.g. `TypeDict` or `dataclass` in the past: - https://github.com/searxng/searxng/pull/1562/files - https://gist.github.com/dalf/972eb05e7a9bee161487132a7de244d2 - https://github.com/searxng/searxng/pull/1412/files - https://github.com/searxng/searxng/pull/1356 In my opinion, TypeDict is unsuitable because the objects are still dictionaries and not instances of classes / the `dataclass` are classes but ... The `msgspec.Struct` combine the advantages of typing, runtime behaviour and also offer the option of (fast) serializing (incl. type check) the objects. Currently not possible but conceivable with `msgspec`: Outsourcing the engines into separate processes, what possibilities this opens up in the future is left to the imagination! Internally, we have already defined that it is desirable to decouple the development of the engines from the development of the SearXNG core / The serialization of the `Result` objects is a prerequisite for this. HINT: The threads listed above were the template for this PR, even though the implementation here is based on msgspec. They should also be an inspiration for the following PRs of typification, as the models and implementations can provide a good direction. Why just one commit? ==================== I tried to create several (thematically separated) commits, but gave up at some point ... there are too many things to tackle at once / The comprehensibility of the commits would not be improved by a thematic separation. On the contrary, we would have to make multiple changes at the same places and the goal of a change would be vaguely recognizable in the fog of the commits. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-12-15 09:59:50 +01:00
from searx.extended_types import sxng_request
logger = logger.getChild('locales')
# safe before monkey patching flask_babel.get_translations
_flask_babel_get_translations = flask_babel.get_translations
LOCALE_NAMES = {}
"""Mapping of locales and their description. Locales e.g. 'fr' or 'pt-BR' (see
:py:obj:`locales_initialize`).
:meta hide-value:
"""
RTL_LOCALES: set[str] = set()
"""List of *Right-To-Left* locales e.g. 'he' or 'fa-IR' (see
:py:obj:`locales_initialize`)."""
ADDITIONAL_TRANSLATIONS = {
2022-11-04 16:49:43 +00:00
"dv": "ދިވެހި (Dhivehi)",
"oc": "Occitan",
2022-05-06 09:40:45 +00:00
"szl": "Ślōnski (Silesian)",
2022-07-08 10:00:20 +02:00
"pap": "Papiamento",
}
"""Additional languages SearXNG has translations for but not supported by
python-babel (see :py:obj:`locales_initialize`)."""
LOCALE_BEST_MATCH = {
2022-11-04 16:49:43 +00:00
"dv": "si",
"oc": 'fr-FR',
"szl": "pl",
"nl-BE": "nl",
"zh-HK": "zh-Hant-TW",
2022-07-08 10:00:20 +02:00
"pap": "pt-BR",
}
"""Map a locale we do not have a translations for to a locale we have a
translation for. By example: use Taiwan version of the translation for Hong
Kong."""
def localeselector():
locale = 'en'
if has_request_context():
[refactor] typification of SearXNG (initial) / result items (part 1) Typification of SearXNG ======================= This patch introduces the typing of the results. The why and how is described in the documentation, please generate the documentation .. $ make docs.clean docs.live and read the following articles in the "Developer documentation": - result types --> http://0.0.0.0:8000/dev/result_types/index.html The result types are available from the `searx.result_types` module. The following have been implemented so far: - base result type: `searx.result_type.Result` --> http://0.0.0.0:8000/dev/result_types/base_result.html - answer results --> http://0.0.0.0:8000/dev/result_types/answer.html including the type for translations (inspired by #3925). For all other types (which still need to be set up in subsequent PRs), template documentation has been created for the transition period. Doc of the fields used in Templates =================================== The template documentation is the basis for the typing and is the first complete documentation of the results (needed for engine development). It is the "working paper" (the plan) with which further typifications can be implemented in subsequent PRs. - https://github.com/searxng/searxng/issues/357 Answer Templates ================ With the new (sub) types for `Answer`, the templates for the answers have also been revised, `Translation` are now displayed with collapsible entries (inspired by #3925). !en-de dog Plugins & Answerer ================== The implementation for `Plugin` and `Answer` has been revised, see documentation: - Plugin: http://0.0.0.0:8000/dev/plugins/index.html - Answerer: http://0.0.0.0:8000/dev/answerers/index.html With `AnswerStorage` and `AnswerStorage` to manage those items (in follow up PRs, `ArticleStorage`, `InfoStorage` and .. will be implemented) Autocomplete ============ The autocompletion had a bug where the results from `Answer` had not been shown in the past. To test activate autocompletion and try search terms for which we have answerers - statistics: type `min 1 2 3` .. in the completion list you should find an entry like `[de] min(1, 2, 3) = 1` - random: type `random uuid` .. in the completion list, the first item is a random UUID Extended Types ============== SearXNG extends e.g. the request and response types of flask and httpx, a module has been set up for type extensions: - Extended Types --> http://0.0.0.0:8000/dev/extended_types.html Unit-Tests ========== The unit tests have been completely revised. In the previous implementation, the runtime (the global variables such as `searx.settings`) was not initialized before each test, so the runtime environment with which a test ran was always determined by the tests that ran before it. This was also the reason why we sometimes had to observe non-deterministic errors in the tests in the past: - https://github.com/searxng/searxng/issues/2988 is one example for the Runtime issues, with non-deterministic behavior .. - https://github.com/searxng/searxng/pull/3650 - https://github.com/searxng/searxng/pull/3654 - https://github.com/searxng/searxng/pull/3642#issuecomment-2226884469 - https://github.com/searxng/searxng/pull/3746#issuecomment-2300965005 Why msgspec.Struct ================== We have already discussed typing based on e.g. `TypeDict` or `dataclass` in the past: - https://github.com/searxng/searxng/pull/1562/files - https://gist.github.com/dalf/972eb05e7a9bee161487132a7de244d2 - https://github.com/searxng/searxng/pull/1412/files - https://github.com/searxng/searxng/pull/1356 In my opinion, TypeDict is unsuitable because the objects are still dictionaries and not instances of classes / the `dataclass` are classes but ... The `msgspec.Struct` combine the advantages of typing, runtime behaviour and also offer the option of (fast) serializing (incl. type check) the objects. Currently not possible but conceivable with `msgspec`: Outsourcing the engines into separate processes, what possibilities this opens up in the future is left to the imagination! Internally, we have already defined that it is desirable to decouple the development of the engines from the development of the SearXNG core / The serialization of the `Result` objects is a prerequisite for this. HINT: The threads listed above were the template for this PR, even though the implementation here is based on msgspec. They should also be an inspiration for the following PRs of typification, as the models and implementations can provide a good direction. Why just one commit? ==================== I tried to create several (thematically separated) commits, but gave up at some point ... there are too many things to tackle at once / The comprehensibility of the commits would not be improved by a thematic separation. On the contrary, we would have to make multiple changes at the same places and the goal of a change would be vaguely recognizable in the fog of the commits. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-12-15 09:59:50 +01:00
value = sxng_request.preferences.get_value('locale')
if value:
locale = value
# first, set the language that is not supported by babel
if locale in ADDITIONAL_TRANSLATIONS:
[refactor] typification of SearXNG (initial) / result items (part 1) Typification of SearXNG ======================= This patch introduces the typing of the results. The why and how is described in the documentation, please generate the documentation .. $ make docs.clean docs.live and read the following articles in the "Developer documentation": - result types --> http://0.0.0.0:8000/dev/result_types/index.html The result types are available from the `searx.result_types` module. The following have been implemented so far: - base result type: `searx.result_type.Result` --> http://0.0.0.0:8000/dev/result_types/base_result.html - answer results --> http://0.0.0.0:8000/dev/result_types/answer.html including the type for translations (inspired by #3925). For all other types (which still need to be set up in subsequent PRs), template documentation has been created for the transition period. Doc of the fields used in Templates =================================== The template documentation is the basis for the typing and is the first complete documentation of the results (needed for engine development). It is the "working paper" (the plan) with which further typifications can be implemented in subsequent PRs. - https://github.com/searxng/searxng/issues/357 Answer Templates ================ With the new (sub) types for `Answer`, the templates for the answers have also been revised, `Translation` are now displayed with collapsible entries (inspired by #3925). !en-de dog Plugins & Answerer ================== The implementation for `Plugin` and `Answer` has been revised, see documentation: - Plugin: http://0.0.0.0:8000/dev/plugins/index.html - Answerer: http://0.0.0.0:8000/dev/answerers/index.html With `AnswerStorage` and `AnswerStorage` to manage those items (in follow up PRs, `ArticleStorage`, `InfoStorage` and .. will be implemented) Autocomplete ============ The autocompletion had a bug where the results from `Answer` had not been shown in the past. To test activate autocompletion and try search terms for which we have answerers - statistics: type `min 1 2 3` .. in the completion list you should find an entry like `[de] min(1, 2, 3) = 1` - random: type `random uuid` .. in the completion list, the first item is a random UUID Extended Types ============== SearXNG extends e.g. the request and response types of flask and httpx, a module has been set up for type extensions: - Extended Types --> http://0.0.0.0:8000/dev/extended_types.html Unit-Tests ========== The unit tests have been completely revised. In the previous implementation, the runtime (the global variables such as `searx.settings`) was not initialized before each test, so the runtime environment with which a test ran was always determined by the tests that ran before it. This was also the reason why we sometimes had to observe non-deterministic errors in the tests in the past: - https://github.com/searxng/searxng/issues/2988 is one example for the Runtime issues, with non-deterministic behavior .. - https://github.com/searxng/searxng/pull/3650 - https://github.com/searxng/searxng/pull/3654 - https://github.com/searxng/searxng/pull/3642#issuecomment-2226884469 - https://github.com/searxng/searxng/pull/3746#issuecomment-2300965005 Why msgspec.Struct ================== We have already discussed typing based on e.g. `TypeDict` or `dataclass` in the past: - https://github.com/searxng/searxng/pull/1562/files - https://gist.github.com/dalf/972eb05e7a9bee161487132a7de244d2 - https://github.com/searxng/searxng/pull/1412/files - https://github.com/searxng/searxng/pull/1356 In my opinion, TypeDict is unsuitable because the objects are still dictionaries and not instances of classes / the `dataclass` are classes but ... The `msgspec.Struct` combine the advantages of typing, runtime behaviour and also offer the option of (fast) serializing (incl. type check) the objects. Currently not possible but conceivable with `msgspec`: Outsourcing the engines into separate processes, what possibilities this opens up in the future is left to the imagination! Internally, we have already defined that it is desirable to decouple the development of the engines from the development of the SearXNG core / The serialization of the `Result` objects is a prerequisite for this. HINT: The threads listed above were the template for this PR, even though the implementation here is based on msgspec. They should also be an inspiration for the following PRs of typification, as the models and implementations can provide a good direction. Why just one commit? ==================== I tried to create several (thematically separated) commits, but gave up at some point ... there are too many things to tackle at once / The comprehensibility of the commits would not be improved by a thematic separation. On the contrary, we would have to make multiple changes at the same places and the goal of a change would be vaguely recognizable in the fog of the commits. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-12-15 09:59:50 +01:00
sxng_request.form['use-translation'] = locale
# second, map locale to a value python-babel supports
locale = LOCALE_BEST_MATCH.get(locale, locale)
if locale == '':
# if there is an error loading the preferences
# the locale is going to be ''
locale = 'en'
# babel uses underscore instead of hyphen.
locale = locale.replace('-', '_')
return locale
def get_translations():
"""Monkey patch of :py:obj:`flask_babel.get_translations`"""
if has_request_context():
[refactor] typification of SearXNG (initial) / result items (part 1) Typification of SearXNG ======================= This patch introduces the typing of the results. The why and how is described in the documentation, please generate the documentation .. $ make docs.clean docs.live and read the following articles in the "Developer documentation": - result types --> http://0.0.0.0:8000/dev/result_types/index.html The result types are available from the `searx.result_types` module. The following have been implemented so far: - base result type: `searx.result_type.Result` --> http://0.0.0.0:8000/dev/result_types/base_result.html - answer results --> http://0.0.0.0:8000/dev/result_types/answer.html including the type for translations (inspired by #3925). For all other types (which still need to be set up in subsequent PRs), template documentation has been created for the transition period. Doc of the fields used in Templates =================================== The template documentation is the basis for the typing and is the first complete documentation of the results (needed for engine development). It is the "working paper" (the plan) with which further typifications can be implemented in subsequent PRs. - https://github.com/searxng/searxng/issues/357 Answer Templates ================ With the new (sub) types for `Answer`, the templates for the answers have also been revised, `Translation` are now displayed with collapsible entries (inspired by #3925). !en-de dog Plugins & Answerer ================== The implementation for `Plugin` and `Answer` has been revised, see documentation: - Plugin: http://0.0.0.0:8000/dev/plugins/index.html - Answerer: http://0.0.0.0:8000/dev/answerers/index.html With `AnswerStorage` and `AnswerStorage` to manage those items (in follow up PRs, `ArticleStorage`, `InfoStorage` and .. will be implemented) Autocomplete ============ The autocompletion had a bug where the results from `Answer` had not been shown in the past. To test activate autocompletion and try search terms for which we have answerers - statistics: type `min 1 2 3` .. in the completion list you should find an entry like `[de] min(1, 2, 3) = 1` - random: type `random uuid` .. in the completion list, the first item is a random UUID Extended Types ============== SearXNG extends e.g. the request and response types of flask and httpx, a module has been set up for type extensions: - Extended Types --> http://0.0.0.0:8000/dev/extended_types.html Unit-Tests ========== The unit tests have been completely revised. In the previous implementation, the runtime (the global variables such as `searx.settings`) was not initialized before each test, so the runtime environment with which a test ran was always determined by the tests that ran before it. This was also the reason why we sometimes had to observe non-deterministic errors in the tests in the past: - https://github.com/searxng/searxng/issues/2988 is one example for the Runtime issues, with non-deterministic behavior .. - https://github.com/searxng/searxng/pull/3650 - https://github.com/searxng/searxng/pull/3654 - https://github.com/searxng/searxng/pull/3642#issuecomment-2226884469 - https://github.com/searxng/searxng/pull/3746#issuecomment-2300965005 Why msgspec.Struct ================== We have already discussed typing based on e.g. `TypeDict` or `dataclass` in the past: - https://github.com/searxng/searxng/pull/1562/files - https://gist.github.com/dalf/972eb05e7a9bee161487132a7de244d2 - https://github.com/searxng/searxng/pull/1412/files - https://github.com/searxng/searxng/pull/1356 In my opinion, TypeDict is unsuitable because the objects are still dictionaries and not instances of classes / the `dataclass` are classes but ... The `msgspec.Struct` combine the advantages of typing, runtime behaviour and also offer the option of (fast) serializing (incl. type check) the objects. Currently not possible but conceivable with `msgspec`: Outsourcing the engines into separate processes, what possibilities this opens up in the future is left to the imagination! Internally, we have already defined that it is desirable to decouple the development of the engines from the development of the SearXNG core / The serialization of the `Result` objects is a prerequisite for this. HINT: The threads listed above were the template for this PR, even though the implementation here is based on msgspec. They should also be an inspiration for the following PRs of typification, as the models and implementations can provide a good direction. Why just one commit? ==================== I tried to create several (thematically separated) commits, but gave up at some point ... there are too many things to tackle at once / The comprehensibility of the commits would not be improved by a thematic separation. On the contrary, we would have to make multiple changes at the same places and the goal of a change would be vaguely recognizable in the fog of the commits. Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-12-15 09:59:50 +01:00
use_translation = sxng_request.form.get('use-translation')
if use_translation in ADDITIONAL_TRANSLATIONS:
babel_ext = flask_babel.current_app.extensions['babel']
return Translations.load(babel_ext.translation_directories[0], use_translation)
return _flask_babel_get_translations()
_TR_LOCALES: list[str] = []
def get_translation_locales() -> list[str]:
"""Returns the list of translation locales (*underscore*). The list is
generated from the translation folders in :origin:`searx/translations`"""
global _TR_LOCALES # pylint:disable=global-statement
if _TR_LOCALES:
return _TR_LOCALES
tr_locales = []
for folder in (Path(searx_dir) / 'translations').iterdir():
if not folder.is_dir():
continue
if not (folder / 'LC_MESSAGES').is_dir():
continue
tr_locales.append(folder.name)
_TR_LOCALES = sorted(tr_locales)
return _TR_LOCALES
def locales_initialize():
"""Initialize locales environment of the SearXNG session.
- monkey patch :py:obj:`flask_babel.get_translations` by :py:obj:`get_translations`
- init global names :py:obj:`LOCALE_NAMES`, :py:obj:`RTL_LOCALES`
"""
flask_babel.get_translations = get_translations
LOCALE_NAMES.update(data.LOCALES["LOCALE_NAMES"])
RTL_LOCALES.update(data.LOCALES["RTL_LOCALES"])
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
def region_tag(locale: babel.Locale) -> str:
"""Returns SearXNG's region tag from the locale (e.g. zh-TW , en-US)."""
if not locale.territory:
raise ValueError('babel.Locale %s: missed a territory' % locale)
return locale.language + '-' + locale.territory
def language_tag(locale: babel.Locale) -> str:
"""Returns SearXNG's language tag from the locale and if exits, the tag
includes the script name (e.g. en, zh_Hant).
"""
sxng_lang = locale.language
if locale.script:
sxng_lang += '_' + locale.script
return sxng_lang
def get_locale(locale_tag: str) -> babel.Locale | None:
"""Returns a :py:obj:`babel.Locale` object parsed from argument
``locale_tag``"""
try:
locale = babel.Locale.parse(locale_tag, sep='-')
return locale
except babel.core.UnknownLocaleError:
return None
2023-09-15 00:53:03 -07:00
def get_official_locales(
territory: str, languages=None, regional: bool = False, de_facto: bool = True
) -> set[babel.Locale]:
"""Returns a list of :py:obj:`babel.Locale` with languages from
:py:obj:`babel.languages.get_official_languages`.
:param territory: The territory (country or region) code.
:param languages: A list of language codes the languages from
:py:obj:`babel.languages.get_official_languages` should be in
(intersection). If this argument is ``None``, all official languages in
this territory are used.
:param regional: If the regional flag is set, then languages which are
regionally official are also returned.
:param de_facto: If the de_facto flag is set to `False`, then languages
which are de facto official are not returned.
"""
ret_val = set()
o_languages = babel.languages.get_official_languages(territory, regional=regional, de_facto=de_facto)
if languages:
languages = [l.lower() for l in languages]
o_languages = set(l for l in o_languages if l.lower() in languages)
for lang in o_languages:
try:
locale = babel.Locale.parse(lang + '_' + territory)
ret_val.add(locale)
except babel.UnknownLocaleError:
continue
return ret_val
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
def get_engine_locale(searxng_locale, engine_locales, default=None):
"""Return engine's language (aka locale) string that best fits to argument
``searxng_locale``.
Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to
corresponding *engine locales*::
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
<engine>: {
# SearXNG string : engine-string
'ca-ES' : 'ca_ES',
'fr-BE' : 'fr_BE',
'fr-CA' : 'fr_CA',
'fr-CH' : 'fr_CH',
'fr' : 'fr_FR',
...
'pl-PL' : 'pl_PL',
'pt-PT' : 'pt_PT'
..
'zh' : 'zh'
'zh_Hans' : 'zh'
'zh_Hant' : 'zh_TW'
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
}
.. hint::
The *SearXNG locale* string has to be known by babel!
If there is no direct 1:1 mapping, this functions tries to narrow down
engine's language (locale). If no value can be determined by these
approximation attempts the ``default`` value is returned.
Assumptions:
A. When user select a language the results should be optimized according to
the selected language.
B. When user select a language and a territory the results should be
2023-09-15 00:53:03 -07:00
optimized with first priority on territory and second on language.
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
First approximation rule (*by territory*):
2023-09-15 00:53:03 -07:00
When the user selects a locale with territory (and a language), the
territory has priority over the language. If any of the official languages
in the territory is supported by the engine (``engine_locales``) it will
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
be used.
Second approximation rule (*by language*):
If "First approximation rule" brings no result or the user selects only a
2023-09-15 00:53:03 -07:00
language without a territory. Check in which territories the language
has an official status and if one of these territories is supported by the
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
engine.
"""
# pylint: disable=too-many-branches, too-many-return-statements
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
engine_locale = engine_locales.get(searxng_locale)
if engine_locale is not None:
# There was a 1:1 mapping (e.g. a region "fr-BE --> fr_BE" or a language
# "zh --> zh"), no need to narrow language-script nor territory.
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
return engine_locale
try:
locale = babel.Locale.parse(searxng_locale, sep='-')
except babel.core.UnknownLocaleError:
try:
locale = babel.Locale.parse(searxng_locale.split('-')[0])
except babel.core.UnknownLocaleError:
return default
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
searxng_lang = language_tag(locale)
engine_locale = engine_locales.get(searxng_lang)
if engine_locale is not None:
# There was a 1:1 mapping (e.g. "zh-HK --> zh_Hant" or "zh-CN --> zh_Hans")
return engine_locale
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
# SearXNG's selected locale is not supported by the engine ..
if locale.territory:
2023-09-15 00:53:03 -07:00
# Try to narrow by *official* languages in the territory (??-XX).
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
for official_language in babel.languages.get_official_languages(locale.territory, de_facto=True):
searxng_locale = official_language + '-' + locale.territory
engine_locale = engine_locales.get(searxng_locale)
if engine_locale is not None:
return engine_locale
2023-09-15 00:53:03 -07:00
# Engine does not support one of the official languages in the territory or
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
# there is only a language selected without a territory.
# Now lets have a look if the searxng_lang (the language selected by the
2023-09-15 00:53:03 -07:00
# user) is a official language in other territories. If so, check if
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
# engine does support the searxng_lang in this other territory.
if locale.language:
terr_lang_dict = {}
for territory, langs in babel.core.get_global("territory_languages").items():
if not langs.get(searxng_lang, {}).get('official_status'):
continue
terr_lang_dict[territory] = langs.get(searxng_lang)
# first: check fr-FR, de-DE .. is supported by the engine
# exception: 'en' --> 'en-US'
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
territory = locale.language.upper()
if territory == 'EN':
territory = 'US'
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
if terr_lang_dict.get(territory):
searxng_locale = locale.language + '-' + territory
engine_locale = engine_locales.get(searxng_locale)
if engine_locale is not None:
return engine_locale
# second: sort by population_percent and take first match
2023-09-15 00:53:03 -07:00
# drawback of "population percent": if there is a territory with a
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
# small number of people (e.g 100) but the majority speaks the
2023-09-15 00:53:03 -07:00
# language, then the percentage might be 100% (--> 100 people) but in
# a different territory with more people (e.g. 10.000) where only 10%
[mod] add locale.get_engine_locale to get predictable results The match_language function sometimes returns incorrect results which is why a new function get_engine_locale is required. A bugfix of the match_language is not easily possible, because there is almost no documentation for it and already the call parameters are undefined. E.g. the function processes values like the ones from yahoo:: "yahoo": [ "ar", ... "zh_chs", "zh_cht" ] The get_engine_locale has been documented in detail, there is a clear description of the assumptions as well as the requirements and approximation rules (read doc-string for more details):: Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to corresponding *engine locales*: <engine>: { # SearXNG string : engine-string 'ca-ES' : 'ca_ES', 'fr-BE' : 'fr_BE', 'fr-CA' : 'fr_CA', 'fr-CH' : 'fr_CH', 'fr' : 'fr_FR', ... 'pl-PL' : 'pl_PL', 'pt-PT' : 'pt_PT' } .. hint:: The *SearXNG locale* string has to be known by babel! In the following you will find a comparison: >>> import babel.languages >>> from searx.utils import match_language >>> from searx.locales import get_engine_locale Assume we have an engine that supports the follwoing locales: >>> lang_list = { ... "zh-CN": "zh_CN", ... "zh-HK": "zh_HK", ... "nl-BE": "nl_BE", ... "fr-CA": "fr_CA", ... } Assumption: A. When a user selects a language the results should be optimized according to the selected language. B. When user selects a language and a territory the results should be optimized with first priority on territory and second on language. ---- Example: (Assumption A.) A user selects region 'zh-TW' which should end in zh_HK hint: CN is 'Hans' and HK ('Hant') fits better to TW ('Hant') >>> get_engine_locale('zh-TW', lang_list) 'zh_HK' >>> lang_list[match_language('zh-TW', lang_list)] 'zh_CN' ---- Example: (Assumption A.) A user selects only the language 'zh' which should end in CN >>> get_engine_locale('zh', lang_list) 'zh_CN' >>> lang_list[match_language('zh', lang_list)] 'zh_CN' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE hint: priority should be on the territory the user selected. If the user prefers 'fr' he will select 'fr' without a region tag. >>> get_engine_locale('fr-BE', lang_list, default='unknown') 'nl_BE' >>> match_language('fr-BE', lang_list, fallback='unknown') 'fr-CA' ---- Example: (Assumption A.) A user selects only the language 'fr' which should end in fr_CA >>> get_engine_locale('fr', lang_list) 'fr_CA' >>> lang_list[match_language('fr', lang_list)] 'fr_CA' ---- The difference in priority on the territory is best shown with a engine that supports the following locales: >>> lang_list = { ... "fr-FR": "fr_FR", ... "fr-CA": "fr_CA", ... "en-GB": "en_GB", ... "nl-BE": "nl_BE", ... } ---- Example: (Assumption A.) A user selects only a language >>> get_engine_locale('en', lang_list) 'en_GB' >>> match_language('en', lang_list) 'en-GB' hint: the engine supports fr_FR and fr_CA since no territory is given, fr_FR takes priority .. >>> get_engine_locale('fr', lang_list) 'fr_FR' >>> lang_list[match_language('fr', lang_list)] 'fr_FR' ---- Example: (Assumption B.) A user selects region 'fr-BE' which should end in nl-BE >>> get_engine_locale('fr-BE', lang_list) 'nl_BE' >>> lang_list[match_language('fr-BE', lang_list)] 'fr_FR' ---- If the user selects a language and there are two locales like the following: >>> lang_list = { ... "fr-BE": "fr_BE", ... "fr-CH": "fr_CH", ... } >>> >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_BE' Looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable): >>> lang_list = { ... "fr-CH": "fr_CH", ... "fr-BE": "fr_BE", ... } >>> get_engine_locale('fr', lang_list) 'fr_BE' >>> lang_list[match_language('fr', lang_list)] 'fr_CH' >>> The get_engine_locale selects the locale by looking at the "population percent" and this percentage has an higher amount in BE (68.%) compared to CH (21%) Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-08-12 17:46:20 +02:00
# speak the language the total amount of speaker is higher (--> 200
# people).
#
# By example: The population of Saint-Martin is 33.000, of which 100%
# speak French, but this is less than the 30% of the approximately 2.5
# million Belgian citizens
#
# - 'fr-MF', 'population_percent': 100.0, 'official_status': 'official'
# - 'fr-BE', 'population_percent': 38.0, 'official_status': 'official'
terr_lang_list = []
for k, v in terr_lang_dict.items():
terr_lang_list.append((k, v))
for territory, _lang in sorted(terr_lang_list, key=lambda item: item[1]['population_percent'], reverse=True):
searxng_locale = locale.language + '-' + territory
engine_locale = engine_locales.get(searxng_locale)
if engine_locale is not None:
return engine_locale
# No luck: narrow by "language from territory" and "territory from language"
# does not fit to a locale supported by the engine.
if engine_locale is None:
engine_locale = default
return default
def match_locale(searxng_locale: str, locale_tag_list: list[str], fallback: str | None = None) -> str | None:
"""Return tag from ``locale_tag_list`` that best fits to ``searxng_locale``.
:param str searxng_locale: SearXNG's internal representation of locale (de,
de-DE, fr-BE, zh, zh-CN, zh-TW ..).
:param list locale_tag_list: The list of locale tags to select from
:param str fallback: fallback locale tag (if unset --> ``None``)
The rules to find a match are implemented in :py:obj:`get_engine_locale`,
the ``engine_locales`` is build up by :py:obj:`build_engine_locales`.
.. hint::
The *SearXNG locale* string and the members of ``locale_tag_list`` has to
be known by babel! The :py:obj:`ADDITIONAL_TRANSLATIONS` are used in the
UI and are not known by babel --> will be ignored.
"""
# searxng_locale = 'es'
# locale_tag_list = ['es-AR', 'es-ES', 'es-MX']
if not searxng_locale:
return fallback
locale = get_locale(searxng_locale)
if locale is None:
return fallback
# normalize to a SearXNG locale that can be passed to get_engine_locale
searxng_locale = language_tag(locale)
if locale.territory:
searxng_locale = region_tag(locale)
# clean up locale_tag_list
tag_list = []
for tag in locale_tag_list:
if tag in ('all', 'auto') or tag in ADDITIONAL_TRANSLATIONS:
continue
tag_list.append(tag)
# emulate fetch_traits
engine_locales = build_engine_locales(tag_list)
return get_engine_locale(searxng_locale, engine_locales, default=fallback)
def build_engine_locales(tag_list: list[str]):
"""From a list of locale tags a dictionary is build that can be passed by
argument ``engine_locales`` to :py:obj:`get_engine_locale`. This function
is mainly used by :py:obj:`match_locale` and is similar to what the
``fetch_traits(..)`` function of engines do.
If there are territory codes in the ``tag_list`` that have a *script code*
additional keys are added to the returned dictionary.
.. code:: python
>>> import locales
>>> engine_locales = locales.build_engine_locales(['en', 'en-US', 'zh', 'zh-CN', 'zh-TW'])
>>> engine_locales
{
'en': 'en', 'en-US': 'en-US',
'zh': 'zh', 'zh-CN': 'zh-CN', 'zh_Hans': 'zh-CN',
'zh-TW': 'zh-TW', 'zh_Hant': 'zh-TW'
}
>>> get_engine_locale('zh-Hans', engine_locales)
'zh-CN'
This function is a good example to understand the language/region model
of SearXNG:
SearXNG only distinguishes between **search languages** and **search
regions**, by adding the *script-tags*, languages with *script-tags* can
be assigned to the **regions** that SearXNG supports.
"""
engine_locales = {}
for tag in tag_list:
locale = get_locale(tag)
if locale is None:
logger.warning("build_engine_locales: skip locale tag %s / unknown by babel", tag)
continue
if locale.territory:
engine_locales[region_tag(locale)] = tag
if locale.script:
engine_locales[language_tag(locale)] = tag
else:
engine_locales[language_tag(locale)] = tag
return engine_locales