Week 49 of 2024
Training
- AWS
  - Started the “AWS Cloud Technical Essentials” course.
  - Created an account at https://repost.aws. re:Post is a Q&A forum about AWS.
Research
- Automatic program grading
  - A problem I ran into recently concerns redefining modules and functions from Python's standard library.
  - Normally, we could use test doubles (mocks) to override the behavior of a function or of a method of a class. However, this is not technically possible (at least with the current tools) for the built-in modules and functions of Python's standard library.
  - More specifically, I needed to override the behavior of the ''time'' function from the ''time'' module.
  - One additional caveat: since this is for an introductory course, I did not want to create an object or force an abstraction beyond what students who are learning the basics of programming need.
  - In theory, it would be enough to create a ''time.py'' file containing the function in question. That would work if the automatic grading environment I use, CodeRunner for Moodle, allowed it.
  - So I decided to get creative and look into how module loading works, to see whether there was something I could work with at that level. This is where the problems begin.
  - The reference implementation of Python, CPython, is openly available at https://github.com/python/cpython. Another positive aspect is that, although the default implementation of ''import'' is in C (Python/import.c, at https://github.com/python/cpython), module loading can also be carried out by code written in Python. The ''importlib'' module has everything that is needed: https://github.com/python/cpython/blob/main/Lib/importlib/. It is not a pretty or easy-to-read module, but it will serve as inspiration for a solution.
  - A developer created a module, ''module-found'', that uses ''importlib'' to do something quite neat: it uses a large language model (LLM) application, more precisely an OpenAI model, to automatically generate Python code, from a description passed as a parameter, for a function of a module that does not exist in Python. In other words, instead of raising an exception saying that the module or function was not found, it automatically creates the required module and function. In my case I do not need to generate code automatically, since I know exactly what I need. The mechanism is the same, though (with the difference that I want to override an existing module or function).
  - As a side note, another Python module along these lines is ''pipimport'', which automatically installs missing modules at run time.
  - Python's package import machinery is fairly simple to understand, in a way. There is one class for finding modules and another class for loading them.
  - To find modules, the classes BuiltinImporter, FrozenImporter and PathFinder are used. The first two load Python's built-in and frozen modules, and the third loads modules available on the file system (it uses the ''sys.path'' variable, widely used to configure virtual environments).
  - The classes to be used are listed in the ''sys.meta_path'' variable. Basically, when looking up a module name, Python walks through the objects in this variable, in order, and uses them to find what it needs.
  - With that understood, it should be enough to create a new Finder class and put it at the start of the path.
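The lookup just described can be illustrated with a minimal sketch. It is not the code used in the course: ''GreetingFinder'', ''GreetingLoader'' and the module name ''hello_stub'' are hypothetical names chosen for illustration. The snippet lists the finders currently in ''sys.meta_path'' and then registers a finder that answers for a module that does not exist anywhere on disk.

```python
import importlib.abc
import importlib.util
import sys


class GreetingLoader(importlib.abc.Loader):
    """Hypothetical loader that fills the new module with one function."""

    def create_module(self, spec):
        # Returning None asks the import system for a default module object.
        return None

    def exec_module(self, module):
        module.greet = lambda: "hello from a generated module"


class GreetingFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that only answers for the name 'hello_stub'."""

    def find_spec(self, fullname, path, target=None):
        if fullname == "hello_stub":
            return importlib.util.spec_from_loader(fullname, GreetingLoader())
        return None  # let the next finder in sys.meta_path try


# Names of the finders currently registered, in lookup order.
finder_names = [getattr(f, "__name__", type(f).__name__) for f in sys.meta_path]
print(finder_names)

# Put the custom finder first so it is consulted before the default ones.
sys.meta_path.insert(0, GreetingFinder())
import hello_stub

print(hello_stub.greet())
```

This works for a name that no other finder knows and that has not been imported yet; names that are already loaded behave differently.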
  - Well, not everything is rosy. For some very strange reason, some of Python's built-in modules cannot be overridden through this mechanism. The ''time'' module, specifically, cannot. The import code in the ''importlib'' module goes to great lengths to prevent built-in modules from being overridden. In the ''_find_and_load()'' function of ''_bootstrap.py'', before the search for the module even starts, it checks whether the module is already present in ''sys.modules'' (a dictionary mapping module names to the modules themselves). For example, today, on my machine, I have the following modules in this variable (and look at ''time'' there near the end!): _abc, abc, abrt_exception_handler3, _ast, ast, atexit, builtins, _codecs, codecs, _collections, collections, _collections_abc, collections.abc, contextlib, copyreg, _datetime, datetime, dis, _distutils_hack, encodings, encodings.aliases, encodings.utf_8, encodings.utf_8_sig, enum, errno, _frozen_importlib, _frozen_importlib_external, _functools, functools, __future__, genericpath, google, _imp, importlib, importlib._abc, importlib._bootstrap, importlib._bootstrap_external, importlib.machinery, importlib.util, inspect, _io, io, itertools, keyword, linecache, logging, __main__, marshal, _opcode, opcode, _operator, operator, os, os.path, paste, platform, posix, posixpath, re, readline, re._casefix, re._compiler, re._constants, re._parser, reprlib, rlcompleter, _signal, site, _sitebuiltins, _sre, _stat, stat, _string, string, sys, syslog, systemd, systemd.id128, systemd._journal, systemd.journal, systemd._reader, textwrap, _thread, threading, time, token, _tokenize, tokenize, traceback, types, _uuid, uuid, _warnings, warnings, _weakref, weakref, _weakrefset, zipimport.
  - My first approach was to change the import code so that it no longer blocked overriding built-in elements. It worked! The code below overrides ''time.time()''. It consists of two classes, ''MissingNameFinder'' and ''MissingLoader'', responsible for looking up the name and, when it is not found, using ''MissingLoader'' to create, at run time, the module (class ''LazyModule''), constant or function (class ''LazyFunction''). The rest is code reused from
''importlib'':
<code language="python">
from typing import Any, Callable, Dict, List, Optional
import importlib
import sys
import importlib.abc
import importlib.util
import _imp
import _warnings
from importlib import _bootstrap_external
# Private helpers that the copied code relies on. Assumption: they are
# available in importlib._bootstrap on the CPython version in use.
from importlib._bootstrap import (
    _ERR_MSG_PREFIX, _calc___package__, _call_with_frames_removed,
    _handle_fromlist, _load_backward_compatible, _new_module, _object_name)


def _verbose_message(message, *args, verbosity=1):
    """Print the message to stderr if -v/PYTHONVERBOSE is turned on."""
    if sys.flags.verbose >= verbosity:
        if not message.startswith(('#', 'import ')):
            message = '# ' + message
        print(message.format(*args), file=sys.stderr)


def _find_spec(name, path, target=None):
    """Find a module's spec."""
    meta_path = sys.meta_path
    if meta_path is None:
        # PyImport_Cleanup() is running or has been called.
        raise ImportError("sys.meta_path is None, Python is likely "
                          "shutting down")

    if not meta_path:
        _warnings.warn('sys.meta_path is empty', ImportWarning)

    # We check sys.modules here for the reload case.  While a passed-in
    # target will usually indicate a reload there is no guarantee, whereas
    # sys.modules provides one.
    is_reload = name in sys.modules
    for finder in meta_path:
        try:
            find_spec = finder.find_spec
        except AttributeError:
            continue
        else:
            spec = find_spec(name, path, target)
        if spec is not None:
            # The parent import may have already imported this module.
            if not is_reload and name in sys.modules:
                module = sys.modules[name]
                try:
                    __spec__ = module.__spec__
                except AttributeError:
                    # We use the found spec since that is the one that
                    # we would have used if the parent module hadn't
                    # beaten us to the punch.
                    return spec
                else:
                    if __spec__ is None:
                        return spec
                    else:
                        return __spec__
            else:
                return spec
    else:
        return None


def _sanity_check(name, package, level):
    """Verify arguments are "sane"."""
    if not isinstance(name, str):
        raise TypeError(f'module name must be str, not {type(name)}')
    if level < 0:
        raise ValueError('level must be >= 0')
    if level > 0:
        if not isinstance(package, str):
            raise TypeError('__package__ not set to a string')
        elif not package:
            raise ImportError('attempted relative import with no known parent '
                              'package')
    if not name and level == 0:
        raise ValueError('Empty module name')


def _resolve_name(name, package, level):
    """Resolve a relative module name to an absolute one."""
    bits = package.rsplit('.', level - 1)
    if len(bits) < level:
        raise ImportError('attempted relative import beyond top-level package')
    base = bits[0]
    return f'{base}.{name}' if name else base


def _init_module_attrs(spec, module, *, override=False):
    # The passed-in module may be not support attribute assignment,
    # in which case we simply don't set the attributes.
    # __name__
    if (override or getattr(module, '__name__', None) is None):
        try:
            module.__name__ = spec.name
        except AttributeError:
            pass
    # __loader__
    if override or getattr(module, '__loader__', None) is None:
        loader = spec.loader
        if loader is None:
            # A backward compatibility hack.
            if spec.submodule_search_locations is not None:
                if _bootstrap_external is None:
                    raise NotImplementedError
                NamespaceLoader = _bootstrap_external.NamespaceLoader

                loader = NamespaceLoader.__new__(NamespaceLoader)
                loader._path = spec.submodule_search_locations
                spec.loader = loader
                # While the docs say that module.__file__ is not set for
                # built-in modules, and the code below will avoid setting it if
                # spec.has_location is false, this is incorrect for namespace
                # packages.  Namespace packages have no location, but their
                # __spec__.origin is None, and thus their module.__file__
                # should also be None for consistency.  While a bit of a hack,
                # this is the best place to ensure this consistency.
                #
                # See https://docs.python.org/3/library/importlib.html#importlib.abc.Loader.load_module
                # and bpo-32305
                module.__file__ = None
        try:
            module.__loader__ = loader
        except AttributeError:
            pass
    # __package__
    if override or getattr(module, '__package__', None) is None:
        try:
            module.__package__ = spec.parent
        except AttributeError:
            pass
    # __spec__
    try:
        module.__spec__ = spec
    except AttributeError:
        pass
    # __path__
    if override or getattr(module, '__path__', None) is None:
        if spec.submodule_search_locations is not None:
            # XXX We should extend __path__ if it's already a list.
            try:
                module.__path__ = spec.submodule_search_locations
            except AttributeError:
                pass
    # __file__/__cached__
    if spec.has_location:
        if override or getattr(module, '__file__', None) is None:
            try:
                module.__file__ = spec.origin
            except AttributeError:
                pass

        if override or getattr(module, '__cached__', None) is None:
            if spec.cached is not None:
                try:
                    module.__cached__ = spec.cached
                except AttributeError:
                    pass
    return module


def module_from_spec(spec):
    """Create a module based on the provided spec."""
    # Typically loaders will not implement create_module().
    module = None
    if hasattr(spec.loader, 'create_module'):
        # If create_module() returns `None` then it means default
        # module creation should be used.
        module = spec.loader.create_module(spec)
    elif hasattr(spec.loader, 'exec_module'):
        raise ImportError('loaders that define exec_module() '
                          'must also define create_module()')
    if module is None:
        module = _new_module(spec.name)
    _init_module_attrs(spec, module)
    return module


def _load_unlocked(spec):
    # A helper for direct use by the import system.
    if spec.loader is not None:
        # Not a namespace package.
        if not hasattr(spec.loader, 'exec_module'):
            msg = (f"{_object_name(spec.loader)}.exec_module() not found; "
                   "falling back to load_module()")
            _warnings.warn(msg, ImportWarning)
            return _load_backward_compatible(spec)

    module = module_from_spec(spec)

    # This must be done before putting the module in sys.modules
    # (otherwise an optimization shortcut in import.c becomes
    # wrong).
    spec._initializing = True
    try:
        sys.modules[spec.name] = module
        try:
            if spec.loader is None:
                if spec.submodule_search_locations is None:
                    raise ImportError('missing loader', name=spec.name)
                # A namespace package so do nothing.
            else:
                spec.loader.exec_module(module)
        except:
            try:
                del sys.modules[spec.name]
            except KeyError:
                pass
            raise
        # Move the module to the end of sys.modules.
        # We don't ensure that the import-related module attributes get
        # set in the sys.modules replacement case.  Such modules are on
        # their own.
        module = sys.modules.pop(spec.name)
        sys.modules[spec.name] = module
        _verbose_message('import {!r} # {!r}', spec.name, spec.loader)
    finally:
        spec._initializing = False

    return module


def _find_and_load_unlocked(name, import_):
    path = None
    parent = name.rpartition('.')[0]
    parent_spec = None
    if parent:
        if parent not in sys.modules:
            _call_with_frames_removed(import_, parent)
        # Crazy side-effects!
        if name in sys.modules:
            return sys.modules[name]
        parent_module = sys.modules[parent]
        try:
            path = parent_module.__path__
        except AttributeError:
            msg = f'{_ERR_MSG_PREFIX}{name!r}; {parent!r} is not a package'
            raise ModuleNotFoundError(msg, name=name) from None
        parent_spec = parent_module.__spec__
        child = name.rpartition('.')[2]
    spec = _find_spec(name, path)
    if spec is None:
        raise ModuleNotFoundError(f'{_ERR_MSG_PREFIX}{name!r}', name=name)
    else:
        if parent_spec:
            # Temporarily add child we are currently importing to parent's
            # _uninitialized_submodules for circular import tracking.
            parent_spec._uninitialized_submodules.append(child)
        try:
            module = _load_unlocked(spec)
        finally:
            if parent_spec:
                parent_spec._uninitialized_submodules.pop()
    if parent:
        # Set the module as an attribute on its parent.
        parent_module = sys.modules[parent]
        try:
            setattr(parent_module, child, module)
        except AttributeError:
            msg = f"Cannot set an attribute on {parent!r} for child module {child!r}"
            _warnings.warn(msg, ImportWarning)
    return module


def _find_and_load(name, import_):
    """Find and load the module."""

    # Optimization: we avoid unneeded module locking if the module
    # already exists in sys.modules and is fully initialized.
    # module = sys.modules.get(name, _NEEDS_LOADING)
    # The sys.modules check is disabled on purpose: always search again,
    # so that modules that are already loaded can be overridden.
    return _find_and_load_unlocked(name, import_)

    # Unreachable: left over from the original implementation.
    if module is None:
        message = f'import of {name} halted; None in sys.modules'
        raise ModuleNotFoundError(message, name=name)

    return module


def _gcd_import(name, package=None, level=0):
    """Import and return the module based on its name, the package the call is
    being made from, and the level adjustment.

    This function represents the greatest common denominator of functionality
    between import_module and __import__. This includes setting __package__ if
    the loader did not.

    """
    _sanity_check(name, package, level)
    if level > 0:
        name = _resolve_name(name, package, level)
    return _find_and_load(name, _gcd_import)


def __import__(name, globals=None, locals=None, fromlist=(), level=0):
    """Import a module.

    The 'globals' argument is used to infer where the import is occurring from
    to handle relative imports. The 'locals' argument is ignored. The
    'fromlist' argument specifies what should exist as attributes on the module
    being imported (e.g. ``from module import <fromlist>``).  The 'level'
    argument represents the package location to import from in a relative
    import (e.g. ``from ..pkg import mod`` would have a 'level' of 2).

    """
    if level == 0:
        module = _gcd_import(name)
    else:
        globals_ = globals if globals is not None else {}
        package = _calc___package__(globals_)
        module = _gcd_import(name, package, level)
    if not fromlist:
        # Return up to the first dot in 'name'. This is complicated by the fact
        # that 'name' may be relative.
        if level == 0:
            return _gcd_import(name.partition('.')[0])
        elif not name:
            return module
        else:
            # Figure out where to slice the module's name up to the first dot
            # in 'name'.
            cut_off = len(name) - len(name.partition('.')[0])
            # Slice end needs to be positive to alleviate need to special-case
            # when ``'.' not in name``.
            return sys.modules[module.__name__[:len(module.__name__)-cut_off]]
    elif hasattr(module, '__path__'):
        return _handle_fromlist(module, fromlist, _gcd_import)
    else:
        return module


class ModuleSpec:
    """The specification for a module, used for loading.

    A module's spec is the source for information about the module.  For
    data associated with the module, including source, use the spec's
    loader.

    `name` is the absolute name of the module.  `loader` is the loader
    to use when loading the module.  `parent` is the name of the
    package the module is in.  The parent is derived from the name.

    `is_package` determines if the module is considered a package or
    not.  On modules this is reflected by the `__path__` attribute.

    `origin` is the specific location used by the loader from which to
    load the module, if that information is available.  When filename is
    set, origin will match.

    `has_location` indicates that a spec's "origin" reflects a location.
    When this is True, `__file__` attribute of the module is set.

    `cached` is the location of the cached bytecode file, if any.  It
    corresponds to the `__cached__` attribute.

    `submodule_search_locations` is the sequence of path entries to
    search when importing submodules.  If set, is_package should be
    True--and False otherwise.

    Packages are simply modules that (may) have submodules.  If a spec
    has a non-None value in `submodule_search_locations`, the import
    system will consider modules loaded from the spec as packages.

    Only finders (see importlib.abc.MetaPathFinder and
    importlib.abc.PathEntryFinder) should modify ModuleSpec instances.

    """

    def __init__(self, name, loader, *, origin=None, loader_state=None,
                 is_package=None):
        self.name = name
        self.loader = loader
        self.origin = origin
        self.loader_state = loader_state
        self.submodule_search_locations = [] if is_package else None
        self._uninitialized_submodules = []

        # file-location attributes
        self._set_fileattr = False
        self._cached = None

    def __repr__(self):
        args = [f'name={self.name!r}', f'loader={self.loader!r}']
        if self.origin is not None:
            args.append(f'origin={self.origin!r}')
        if self.submodule_search_locations is not None:
            args.append(f'submodule_search_locations={self.submodule_search_locations}')
        return f'{self.__class__.__name__}({", ".join(args)})'

    def __eq__(self, other):
        smsl = self.submodule_search_locations
        try:
            return (self.name == other.name and
                    self.loader == other.loader and
                    self.origin == other.origin and
                    smsl == other.submodule_search_locations and
                    self.cached == other.cached and
                    self.has_location == other.has_location)
        except AttributeError:
            return NotImplemented

    @property
    def cached(self):
        if self._cached is None:
            if self.origin is not None and self._set_fileattr:
                if _bootstrap_external is None:
                    raise NotImplementedError
                self._cached = _bootstrap_external._get_cached(self.origin)
        return self._cached

    @cached.setter
    def cached(self, cached):
        self._cached = cached

    @property
    def parent(self):
        """The name of the module's parent."""
        if self.submodule_search_locations is None:
            return self.name.rpartition('.')[0]
        else:
            return self.name

    @property
    def has_location(self):
        return self._set_fileattr

    @has_location.setter
    def has_location(self, value):
        self._set_fileattr = bool(value)


def spec_from_loader(name, loader, *, origin=None, is_package=None):
    """Return a module spec based on various loader methods."""
    if origin is None:
        origin = getattr(loader, '_ORIGIN', None)

    if not origin and hasattr(loader, 'get_filename'):
        if _bootstrap_external is None:
            raise NotImplementedError
        spec_from_file_location = _bootstrap_external.spec_from_file_location

        if is_package is None:
            return spec_from_file_location(name, loader=loader)
        search = [] if is_package else None
        return spec_from_file_location(name, loader=loader,
                                       submodule_search_locations=search)

    if is_package is None:
        if hasattr(loader, 'is_package'):
            try:
                is_package = loader.is_package(name)
            except ImportError:
                is_package = None  # aka, undefined
        else:
            # the default
            is_package = False

    return ModuleSpec(name, loader, origin=origin, is_package=is_package)


def _spec_from_module(module, loader=None, origin=None):
    # This function is meant for use in _setup().
    try:
        spec = module.__spec__
    except AttributeError:
        pass
    else:
        if spec is not None:
            return spec

    name = module.__name__
    if loader is None:
        try:
            loader = module.__loader__
        except AttributeError:
            # loader will stay None.
            pass
    try:
        location = module.__file__
    except AttributeError:
        location = None
    if origin is None:
        if loader is not None:
            origin = getattr(loader, '_ORIGIN', None)
        if not origin and location is not None:
            origin = location
    try:
        cached = module.__cached__
    except AttributeError:
        cached = None
    try:
        submodule_search_locations = list(module.__path__)
    except AttributeError:
        submodule_search_locations = None

    spec = ModuleSpec(name, loader, origin=origin)
    spec._set_fileattr = False if location is None else (origin == location)
    spec.cached = cached
    spec.submodule_search_locations = submodule_search_locations
    return spec


class MissingNameFinder(importlib.abc.MetaPathFinder):
    def __init__(self) -> None:
        self.forbidden_modules = []
        self.builtin_modules = sys.builtin_module_names
        self.allowed_builtin_modules_override = ["time"]
        self.loader = MissingLoader()

    def _spec_from_loader(self, name, loader, *, origin=None, is_package=None):
        """Return a module spec based on various loader methods."""
        if origin is None:
            origin = getattr(loader, '_ORIGIN', None)

        if not origin and hasattr(loader, 'get_filename'):
            if _bootstrap_external is None:
                raise NotImplementedError
            spec_from_file_location = _bootstrap_external.spec_from_file_location

            if is_package is None:
                return spec_from_file_location(name, loader=loader)
            search = [] if is_package else None
            return spec_from_file_location(name, loader=loader,
                                           submodule_search_locations=search)

        if is_package is None:
            if hasattr(loader, 'is_package'):
                try:
                    is_package = loader.is_package(name)
                except ImportError:
                    is_package = None  # aka, undefined
            else:
                # the default
                is_package = False

        return ModuleSpec(name, loader, origin=origin, is_package=is_package)

    def find_spec(self, fullname, path, target=None):
        # Only the overridden names are handled; everything else is left to
        # the finders that come later in sys.meta_path.
        if fullname == "time" or fullname == "time30":
            return self._spec_from_loader(fullname, self.loader)
        else:
            return None

        # Unreachable: more general rules kept from an earlier version.
        if path:
            return None
        if '.' in fullname or fullname in self.forbidden_modules:
            return None
        if fullname in self.builtin_modules:
            if fullname in self.allowed_builtin_modules_override:
                return importlib.util.spec_from_loader(fullname, self.loader)
            else:
                return None
        return self._spec_from_loader(fullname, self.loader)


class MissingLoader():
    def __init__(self) -> None:
        pass

    def create_module(self, spec):
        return LazyModule(spec.name)

    def exec_module(self, _module):
        pass


class LazyModule():
    def __init__(self, name: str) -> None:
        self.name = name
        self._values: Dict[str, Any] = {}
        self._existing_lines: List[str] = []

    def __getattr__(self, name: str) -> Any:
        # Called only for attributes that do not exist yet: all-uppercase
        # names become constants, anything else becomes a lazy function.
        existing = self._values.get(name, None)
        if existing:
            return existing
        if name.upper() == name:
            value = 1731593707.2131279
            value = self._generate_constant(name, value)
        else:
            value = """def time(): return 1731593707.2131279"""
            value = self._generate_function(name, value)
        return value

    def _generate_constant(self, constant_name: str, constant_value: Any) -> Any:
        self._values[constant_name] = constant_value
        self.add_code("%s = %r" % (constant_name, constant_value))
        return constant_value

    def _generate_function(self, function_name: str, function_code: str) -> Callable:
        return LazyFunction(function_name, function_code, self)

    @property
    def code(self):
        return '\n'.join(self._existing_lines)

    def add_code(self, code: str):
        # Execute the source in the module's namespace and keep a copy of it.
        exec(code, self._values)
        self._existing_lines.extend(code.splitlines())
        self._existing_lines.append('')

    def get_value(self, name: str) -> Any:
        return self._values[name]


class LazyFunction:
    def __init__(self, name: str, code: str, parent_module: LazyModule) -> None:
        self.name = name
        self.code = code
        self._parent_module = parent_module
        self._generated_function: Optional[Callable] = None

    def __call__(self, *args, **kwargs) -> Any:
        if self._generated_function:
            return self._generated_function(*args, **kwargs)

        # First call: define the function in the parent module, then reuse it.
        self._parent_module.add_code(self.code)
        self._generated_function = self._parent_module.get_value(self.name)
        return self._generated_function(*args, **kwargs)

    def __iter__(self):
        return self

    def __next__(self):
        raise StopIteration


# name_finder = MissingNameFinder()
# sys.meta_path.insert(0, name_finder)
# time = __import__("time")
</code>
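The ''sys.modules'' check mentioned earlier can be observed directly. The sketch below is only an illustration, and ''RecordingFinder'' is a hypothetical helper that never finds anything and just records which names it was asked about. While ''time'' is cached in ''sys.modules'', the finders are not consulted at all. Once the entry is removed, they are.

```python
import importlib.abc
import sys
import time  # make sure 'time' is loaded and cached in sys.modules


class RecordingFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that records lookups and never finds anything."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path, target=None):
        self.seen.append(fullname)
        return None  # defer to the regular finders


original = sys.modules["time"]
recorder = RecordingFinder()
sys.meta_path.insert(0, recorder)
try:
    import time  # served from sys.modules; no finder is asked
    cached_lookup = list(recorder.seen)

    del sys.modules["time"]  # drop the cache entry
    import time  # now the finders in sys.meta_path are asked again
    uncached_lookup = list(recorder.seen)
finally:
    # Undo the experiment so the rest of the program sees the usual module.
    sys.meta_path.remove(recorder)
    sys.modules["time"] = original

print(cached_lookup)
print(uncached_lookup)
```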
  - My second approach came from realizing that I could simply delete the ''time'' module from the ''sys.modules'' variable. With that, the code becomes much leaner:
<code language="python">
from typing import Any, Callable, Dict, List, Optional
import importlib
import sys
import importlib.abc
import importlib.util


class MissingNameFinder(importlib.abc.MetaPathFinder):
    def __init__(self) -> None:
        self.forbidden_modules = []
        self.builtin_modules = sys.builtin_module_names
        self.allowed_builtin_modules_override = ["time"]
        self.loader = MissingLoader()

    def find_spec(self, fullname, path, target=None):
        # Only the overridden names are handled; everything else is left to
        # the finders that come later in sys.meta_path.
        if fullname == "time" or fullname == "time30":
            return importlib.util.spec_from_loader(fullname, self.loader)
        else:
            return None

        # Unreachable: more general rules kept from an earlier version.
        if path:
            return None
        if '.' in fullname or fullname in self.forbidden_modules:
            return None
        if fullname in self.builtin_modules:
            if fullname in self.allowed_builtin_modules_override:
                return importlib.util.spec_from_loader(fullname, self.loader)
            else:
                return None
        return importlib.util.spec_from_loader(fullname, self.loader)


class MissingLoader():
    def __init__(self) -> None:
        pass

    def create_module(self, spec):
        return LazyModule(spec.name)

    def exec_module(self, _module):
        pass


class LazyModule():
    def __init__(self, name: str) -> None:
        self.name = name
        self._values: Dict[str, Any] = {}
        self._existing_lines: List[str] = []

    def __getattr__(self, name: str) -> Any:
        # Called only for attributes that do not exist yet: all-uppercase
        # names become constants, anything else becomes a lazy function.
        existing = self._values.get(name, None)
        if existing:
            return existing
        if name.upper() == name:
            value = 1731593707.2131279
            value = self._generate_constant(name, value)
        else:
            value = """def time(): return 1731593707.2131279"""
            value = self._generate_function(name, value)
        return value

    def _generate_constant(self, constant_name: str, constant_value: Any) -> Any:
        self._values[constant_name] = constant_value
        self.add_code("%s = %r" % (constant_name, constant_value))
        return constant_value

    def _generate_function(self, function_name: str, function_code: str) -> Callable:
        return LazyFunction(function_name, function_code, self)

    @property
    def code(self):
        return '\n'.join(self._existing_lines)

    def add_code(self, code: str):
        # Execute the source in the module's namespace and keep a copy of it.
        exec(code, self._values)
        self._existing_lines.extend(code.splitlines())
        self._existing_lines.append('')

    def get_value(self, name: str) -> Any:
        return self._values[name]


class LazyFunction:
    def __init__(self, name: str, code: str, parent_module: LazyModule) -> None:
        self.name = name
        self.code = code
        self._parent_module = parent_module
        self._generated_function: Optional[Callable] = None

    def __call__(self, *args, **kwargs) -> Any:
        if self._generated_function:
            return self._generated_function(*args, **kwargs)
        # First call: define the function in the parent module, then reuse it.
        self._parent_module.add_code(self.code)
        self._generated_function = self._parent_module.get_value(self.name)
        return self._generated_function(*args, **kwargs)

    def __iter__(self):
        return self

    def __next__(self):
        raise StopIteration


name_finder = MissingNameFinder()
sys.meta_path.insert(0, name_finder)
# Remove the cached module so that the next import consults sys.meta_path.
del sys.modules["time"]
time = __import__("time")
print(1731593707.2131279)
print(time.time())
</code>
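One thing the second approach leaves open is how to undo the override afterwards. The sketch below is an assumption on my part, not something tested in CodeRunner: it swaps in a simpler static stub instead of ''LazyModule'', and the names ''FixedTimeFinder'', ''FixedTimeLoader'' and ''fixed_time'' are hypothetical. It wraps the same two steps (remove ''time'' from ''sys.modules'', put a finder at the front of ''sys.meta_path'') in a context manager that restores both on exit.

```python
import contextlib
import importlib.abc
import importlib.util
import sys

FIXED_TIMESTAMP = 1731593707.2131279  # same constant used in the code above


class FixedTimeLoader(importlib.abc.Loader):
    """Hypothetical loader: a plain module with a constant time()."""

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        module.time = lambda: FIXED_TIMESTAMP


class FixedTimeFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that only answers for the name 'time'."""

    def find_spec(self, fullname, path, target=None):
        if fullname == "time":
            return importlib.util.spec_from_loader(fullname, FixedTimeLoader())
        return None


@contextlib.contextmanager
def fixed_time():
    """Temporarily make ``import time`` return the stub module."""
    finder = FixedTimeFinder()
    original = sys.modules.pop("time", None)  # drop the cached module
    sys.meta_path.insert(0, finder)
    try:
        yield
    finally:
        # Restore the import system to its previous state.
        sys.meta_path.remove(finder)
        sys.modules.pop("time", None)
        if original is not None:
            sys.modules["time"] = original


with fixed_time():
    import time as stubbed_time
    inside = stubbed_time.time()

import time as real_time
outside = real_time.time()

print(inside)
print(outside != inside)
```

Modules that imported ''time'' before the block is entered keep their reference to the real module, so only imports executed inside the ''with'' block see the stub.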