Welcome to Odyssey’s documentation!
GithubPython.py
The module for running Google BigQuery queries on GitHub data.
-
class
odyssey.core.bigquery.GithubPython.
GithubPython
(package='', exclude_forks='auto', limit=None)¶ Provides functionality to build SQL query, connect with BigQuery, etc.
-
__init__
(package='', exclude_forks='auto', limit=None)¶ Initialize the GithubPython object.
Parameters: - package (string) – Name of the Python package you want Odyssey to analyze.
- exclude_forks (string, list or tuple, optional (default="auto")) – Substrings used to exclude forks: the SQL query excludes every file whose path or repo_name contains any of them. If exclude_forks is "auto", it is set to a list containing the package name.
- limit (int or None) – Limit the analysis to a certain number of results. Usually set to cap billing or to improve performance.
Returns: returns an initialized GithubPython object.
Return type: object
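The exclude_forks rule above can be illustrated with a small stand-alone sketch. It does not use Odyssey itself: resolve_exclude_forks and is_excluded are hypothetical helper names, and the handling of None and of a single string is an assumption about how the value might be normalized.

```python
def resolve_exclude_forks(package, exclude_forks="auto"):
    """Normalize exclude_forks into a list of substrings (hypothetical helper)."""
    if exclude_forks is None:
        return []
    if isinstance(exclude_forks, str):
        # "auto" falls back to the package name, as the parameter doc describes.
        return [package] if exclude_forks == "auto" else [exclude_forks]
    return list(exclude_forks)


def is_excluded(repo_name, path, exclude_list):
    """True when the repo name or the file path contains any excluded substring."""
    return any(token in repo_name or token in path for token in exclude_list)


tokens = resolve_exclude_forks("sklearn")
print(tokens)
print(is_excluded("someone/sklearn-fork", "setup.py", tokens))
print(is_excluded("someone/project", "src/train.py", tokens))
```

In the real class the same condition would be expressed inside the SQL query rather than in Python; the sketch only shows which rows the rule is meant to drop.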
-
__weakref__
¶ list of weak references to the object (if defined)
-
get_all
(_filter=None)¶ Get all data (id, code, repo_name and path) subject to filter.
Parameters: _filter (Filter object or None, optional (default=None)) – Filter the result as defined in the filter object. Returns: Returns a list of BigQueryGithubEntry objects. Return type: list
-
get_context
(class_name)¶ Get context for class usage.
Parameters: class_name (string) – The class whose usage context to examine. Returns: Returns a list of tuples of (context_string, path, repo_name, count). Return type: list
-
get_count
(_filter=None)¶ Get count of files subject to filter.
Parameters: _filter (Filter object or None, optional (default=None)) – Filter the result as defined in the filter object. Returns: Returns the count as an integer. Return type: int
-
get_import_source
(val)¶ Returns a list of BigQueryGithubEntry that imported val.
Parameters: val (string) – The class, submodule or function whose importing source files should be returned. Returns: Returns a list of BigQueryGithubEntry Return type: list
-
get_instantiation
(class_name)¶ Get instantiation information for class usage.
Parameters: class_name (string) – The class whose instantiations to examine. Returns: Returns a nested dict: dict(key=arg, value=dict(key=value_that_arg_sets_to, value=count)) Return type: dict
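To make the nested dict shape concrete, here is a short sketch of how such a result could be read. The sample values are invented for illustration only, and most_common_value and total_uses are hypothetical helpers, not part of Odyssey.

```python
# Illustrative data in the documented shape:
# {argument name: {value the argument was set to: count}}
instantiation = {
    "n_estimators": {"100": 12, "10": 30, "500": 3},
    "random_state": {"0": 20, "42": 25},
}


def most_common_value(usage, arg):
    """Return the (value, count) pair seen most often for one argument."""
    return max(usage[arg].items(), key=lambda item: item[1])


def total_uses(usage, arg):
    """Return how many instantiations set this argument explicitly."""
    return sum(usage[arg].values())


print(most_common_value(instantiation, "n_estimators"))
print(total_uses(instantiation, "n_estimators"))
```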
-
get_least_imported_class
(n=None, use_count_less_than=None, use_count_more_than=None, _filter=None)¶ Get the n least imported classes within a certain use count range, subject to filter.
Parameters: - n (int or None, optional (default=None)) – the number of least imported classes to return. If set to None, all results will be returned.
- use_count_less_than (int or None, optional (default=None)) – only include classes whose use count is less than this amount. If None, there is no restriction.
- use_count_more_than (int or None, optional (default=None)) – only include classes whose use count is more than this amount. If None, there is no restriction.
- _filter (Filter object or None (default=None)) – Filter the result as defined in the filter object.
Returns: Returns a list of tuple (name, count)
Return type: list
-
get_least_imported_function
(n=None, use_count_less_than=None, use_count_more_than=None, _filter=None)¶ Get the n least imported functions within a certain use count range, subject to filter.
Parameters: - n (int or None, optional (default=None)) – the number of least imported functions to return. If set to None, all results will be returned.
- use_count_less_than (int or None, optional (default=None)) – only include functions whose use count is less than this amount. If None, there is no restriction.
- use_count_more_than (int or None, optional (default=None)) – only include functions whose use count is more than this amount. If None, there is no restriction.
- _filter (Filter object or None (default=None)) – Filter the result as defined in the filter object.
Returns: Returns a list of tuple (name, count)
Return type: list
-
get_least_imported_submodule
(n=None, use_count_less_than=None, use_count_more_than=None, _filter=None)¶ Get the n least imported submodules within a certain use count range, subject to filter.
Parameters: - n (int or None, optional (default=None)) – the number of least imported submodules to return. If set to None, all results will be returned.
- use_count_less_than (int or None, optional (default=None)) – only include submodules whose use count is less than this amount. If None, there is no restriction.
- use_count_more_than (int or None, optional (default=None)) – only include submodules whose use count is more than this amount. If None, there is no restriction.
- _filter (Filter object or None (default=None)) – Filter the result as defined in the filter object.
Returns: Returns a list of tuple (name, count)
Return type: list
-
get_most_imported_class
(n=None, use_count_less_than=None, use_count_more_than=None, _filter=None)¶ Get the n most imported classes within a certain use count range, subject to filter.
Parameters: - n (int or None, optional (default=None)) – the number of most imported classes to return. If set to None, all results will be returned.
- use_count_less_than (int or None, optional (default=None)) – only include classes whose use count is less than this amount. If None, there is no restriction.
- use_count_more_than (int or None, optional (default=None)) – only include classes whose use count is more than this amount. If None, there is no restriction.
- _filter (Filter object or None (default=None)) – Filter the result as defined in the filter object.
Returns: Returns a list of tuple (name, count)
Return type: list
-
get_most_imported_function
(n=None, use_count_less_than=None, use_count_more_than=None, _filter=None)¶ Get the n most imported functions within a certain use count range, subject to filter.
Parameters: - n (int or None, optional (default=None)) – the number of most imported functions to return. If set to None, all results will be returned.
- use_count_less_than (int or None, optional (default=None)) – only include functions whose use count is less than this amount. If None, there is no restriction.
- use_count_more_than (int or None, optional (default=None)) – only include functions whose use count is more than this amount. If None, there is no restriction.
- _filter (Filter object or None (default=None)) – Filter the result as defined in the filter object.
Returns: Returns a list of tuple (name, count)
Return type: list
-
get_most_imported_submodule
(n=None, use_count_less_than=None, use_count_more_than=None, _filter=None)¶ Get the n most imported submodules within a certain use count range, subject to filter.
Parameters: - n (int or None, optional (default=None)) – the number of most imported submodules to return. If set to None, all results will be returned.
- use_count_less_than (int or None, optional (default=None)) – only include submodules whose use count is less than this amount. If None, there is no restriction.
- use_count_more_than (int or None, optional (default=None)) – only include submodules whose use count is more than this amount. If None, there is no restriction.
- _filter (Filter object or None (default=None)) – Filter the result as defined in the filter object.
Returns: Returns a list of tuple (name, count)
Return type: list
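The get_least_imported_* and get_most_imported_* methods share the same selection rule. A minimal sketch of that rule, applied to a plain dict of counts, might look like the following. select_by_use_count is a hypothetical name, the sample counts are invented, and treating both bounds as strict inequalities is an assumption based on the wording "less than" and "more than".

```python
def select_by_use_count(counts, n=None, use_count_less_than=None,
                        use_count_more_than=None, most=True):
    """Return (name, count) tuples inside the count range, sorted by count."""
    items = [
        (name, count)
        for name, count in counts.items()
        if (use_count_less_than is None or count < use_count_less_than)
        and (use_count_more_than is None or count > use_count_more_than)
    ]
    # most=True sorts descending (most imported first), most=False ascending.
    items.sort(key=lambda item: item[1], reverse=most)
    return items if n is None else items[:n]


# Invented sample counts, for illustration only.
counts = {"LogisticRegression": 50, "SVC": 30, "KMeans": 5, "Birch": 1}
print(select_by_use_count(counts, n=2))
print(select_by_use_count(counts, n=2, most=False))
print(select_by_use_count(counts, use_count_more_than=1, use_count_less_than=50))
```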
-
get_top_import_repo
(n=None, _filter=None)¶ Get top imported repo. See RepoImportCounter for details.
Parameters: n (int or None, optional (default=None)) – the number of most imported repo names to return. If set to None, all results will be returned. Returns: Returns a list of repo names. Return type: list
-
run
(query, project='stellar-arcadia-173703')¶ Run a SQL query with Google BigQuery. Large results are allowed, and the timeout is set to 99999999.
Parameters: - query (string) – SQL query to be executed.
- project (string, optional (default="stellar-arcadia-173703")) – Project to run the query on (used for billing, logging, etc.)
Returns: Returns the result as a Python list.
Return type: list
-
set_class_list
(L)¶ Set the class list that ImportAnalyzer uses to classify imports.
Parameters: L (list of string) – classes that ImportAnalyzer should identify.
-
set_exclude_forks
(exclude_forks)¶ Reset exclude_forks.
Parameters: exclude_forks (list or None) – See doc of __init__ for definition of exclude_forks attribute.
-
set_function_list
(L)¶ Set the function list that ImportAnalyzer uses to classify imports.
Parameters: L (list of string) – functions that ImportAnalyzer should identify.
-
set_limit
(limit)¶ Reset limit.
Parameters: limit (int) – See doc of __init__ for definition of limit attribute.
-
set_package
(package)¶ Reset package name.
Parameters: package (string) – See doc of __init__ for definition of package attribute.
-
set_submodule_list
(L)¶ Set the submodule list that ImportAnalyzer uses to classify imports.
Parameters: L (list of string) – submodules that ImportAnalyzer should identify.
-
BigQueryGithubEntry.py
The module that defines the BigQueryGithubEntry class.
-
class
odyssey.core.bigquery.BigQueryGithubEntry.
BigQueryGithubEntry
(_id, code, repo_name, path)¶ A struct that contains relevant information about an entry in BigQuery Github table.
-
__init__
(_id, code, repo_name, path)¶ Initialize the BigQueryGithubEntry object.
Parameters: - _id (string) – a hashed value identifying a file entry, as provided by the Google BigQuery GitHub table.
- code (string) – code string.
- repo_name (string) – name of the repo. (e.g.: scikit-learn/scikit-learn)
- path (string) – path of the file. (e.g.: doc/HOWTO_DOCUMENT.rst)
Returns: returns an initialized BigQueryGithubEntry object.
Return type: object
-
__str__
()¶ Return the code string encoded in UTF-8. Intended for printing.
-
__weakref__
¶ list of weak references to the object (if defined)
-
get_url
()¶ Returns a GitHub URL linking to the file. The link may be invalid if the file has since been removed.
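As a rough illustration of the entry struct and its URL, the sketch below rebuilds a link from repo_name and path. Entry is a stand-in class, not the real BigQueryGithubEntry, and both the URL layout and the branch argument (defaulting to "master") are assumptions, since the fields listed above do not include a branch.

```python
class Entry:
    """Stand-in for a BigQuery GitHub table row (hypothetical, for illustration)."""

    def __init__(self, _id, code, repo_name, path):
        self._id = _id
        self.code = code
        self.repo_name = repo_name
        self.path = path

    def get_url(self, branch="master"):
        # The branch is assumed; a moved or deleted file yields a dead link.
        return "https://github.com/{}/blob/{}/{}".format(
            self.repo_name, branch, self.path
        )


entry = Entry("abc123", "print('hello')", "scikit-learn/scikit-learn",
              "doc/HOWTO_DOCUMENT.rst")
print(entry.get_url())
```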
-
filter.py
The module that defines the filters.
-
class
odyssey.core.bigquery.filter.
And
(f1, f2)¶ And filter takes in two filters and requires both to be true.
-
__init__
(f1, f2)¶ Initialize the And filter.
Parameters: f1, f2 (Filter) – the two filters to combine; both must be true. Returns: returns an initialized And filter.
Return type: object
-
__str__
()¶ String representation of the filter. This is also the string that appears in the SQL query.
-
-
class
odyssey.core.bigquery.filter.
Contains
(s)¶ Require code content to contain a specific string.
-
__init__
(s)¶ Initialize the Contains filter.
Parameters: s (string) – String that the code content must contain. Returns: returns an initialized Contains filter. Return type: object
-
__str__
()¶ String representation of the filter. This is also the string that appears in the SQL query.
-
-
class
odyssey.core.bigquery.filter.
Filter
¶ Base class for other filters to inherit.
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
class
odyssey.core.bigquery.filter.
Or
(f1, f2)¶ Or filter takes in two filters and requires at least one of them to be true.
-
__init__
(f1, f2)¶ Initialize the Or filter.
Parameters: f1, f2 (Filter) – the two filters to combine; at least one must be true. Returns: returns an initialized Or filter.
Return type: object
-
__str__
()¶ String representation of the filter. This is also the string that appears in the SQL query.
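The filter classes compose into a tree whose string form becomes part of the WHERE clause. The sketch below mirrors that design with stand-in classes. The exact SQL each real filter emits is not documented here, so the STRPOS(content, ...) condition and the content column name are assumptions, and the sketch does no quote escaping.

```python
class Filter:
    """Base class; subclasses render themselves as a SQL condition."""


class Contains(Filter):
    def __init__(self, s):
        self.s = s

    def __str__(self):
        # Assumed rendering; no escaping of quotes inside s is attempted.
        return "STRPOS(content, '{}') > 0".format(self.s)


class And(Filter):
    def __init__(self, f1, f2):
        self.f1, self.f2 = f1, f2

    def __str__(self):
        return "({} AND {})".format(self.f1, self.f2)


class Or(Filter):
    def __init__(self, f1, f2):
        self.f1, self.f2 = f1, f2

    def __str__(self):
        return "({} OR {})".format(self.f1, self.f2)


# Files that import sklearn and call either fit( or predict(.
f = And(Contains("import sklearn"), Or(Contains("fit("), Contains("predict(")))
print(str(f))
```

Because each filter only knows how to print itself, nesting And and Or to any depth needs no extra code, and the parentheses keep operator precedence explicit.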
-
ImportAnalyzer.py
The module that defines ImportAnalyzer.
-
class
odyssey.core.analyzer.ImportAnalyzer.
ImportAnalyzer
(package, accepted_list)¶ ImportAnalyzer analyzes how classes, submodules and functions are imported.
-
__init__
(package, accepted_list)¶ Initialize the ImportAnalyzer.
Parameters: - package (string) – Python package to be counted.
- accepted_list (list of string) – A list of tokens to extract and count.
Returns: returns an initialized ImportAnalyzer object.
Return type: object
-
__weakref__
¶ list of weak references to the object (if defined)
-
get_by_filter
(f)¶ Get imported values, filtered by f.
Parameters: f (function) – used as filter(f, get_most_common()) Returns: return a list of tuples containing (value, count) Return type: list
-
get_common
(n=None, _reverse=True)¶ Get common imported values.
Parameters: - n (int or None, optional (default=None)) – the top n most/least imported values to be returned. If set to None, all results will be returned.
- _reverse (bool, optional (default=True)) – if _reverse, returns value in descending order.
Returns: return a list of tuples containing (value, count)
Return type: list
-
get_least_common
(n=None)¶ Get least common n imported values.
Parameters: n (int or None, optional (default=None)) – the top n least imported values to be returned. If set to None, all results will be returned. Returns: return a list of tuples containing (value, count) Return type: list
-
get_most_common
(n=None)¶ Get most common n imported values.
Parameters: n (int or None, optional (default=None)) – the top n most imported values to be returned. If set to None, all results will be returned. Returns: return a list of tuples containing (value, count) Return type: list
-
get_source
(s)¶ Get the source entries for a specific value
Parameters: s (string) – Value to get source for. Should be in accepted_list. Returns: return a list of BigQueryGithubEntry. Return type: list
-
parse
(entry)¶ Parse a BigQueryGithubEntry for import analysis.
Parameters: entry (BigQueryGithubEntry) – A BigQueryGithubEntry to be parsed
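One way to picture the import analysis is a counter over "from package... import name" statements, restricted to an accepted list. The following sketch uses Python's ast module and collections.Counter. It is not the actual ImportAnalyzer implementation, and count_imports is a hypothetical function name.

```python
import ast
from collections import Counter


def count_imports(code, package, accepted_list):
    """Count accepted names imported from `package` or its submodules."""
    accepted = set(accepted_list)
    counts = Counter()
    for node in ast.walk(ast.parse(code)):
        # node.module is None for relative imports such as "from . import x".
        if isinstance(node, ast.ImportFrom) and node.module:
            if node.module == package or node.module.startswith(package + "."):
                for alias in node.names:
                    if alias.name in accepted:
                        counts[alias.name] += 1
    return counts


sources = [
    "from sklearn.svm import SVC\nfrom numpy import array\n",
    "from sklearn.svm import SVC\nfrom sklearn.cluster import KMeans, Birch\n",
]
total = Counter()
for code in sources:
    total.update(count_imports(code, "sklearn", ["SVC", "KMeans"]))
print(total.most_common())
```

Note that ast.parse raises SyntaxError on code the running interpreter cannot parse (for example Python 2 sources under Python 3), so a real analyzer would need to handle or skip such files.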
-
InstantiationAnalyzer.py
The module that defines InstantiationAnalyzer.
-
class
odyssey.core.analyzer.InstantiationAnalyzer.
InstantiationAnalyzer
(class_name)¶ InstantiationAnalyzer parses the code to get the instantiation of classes.
-
__init__
(class_name)¶ Initialize the InstantiationAnalyzer.
Parameters: class_name (string) – class to be analyzed for instantiation Returns: returns an initialized InstantiationAnalyzer object. Return type: object
-
__weakref__
¶ list of weak references to the object (if defined)
-
parse
(code)¶ Parse code and analyze for instantiation.
Parameters: code (string) – code string to be parsed.
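A minimal sketch of instantiation parsing, assuming an ast-based approach (the real InstantiationAnalyzer may work differently): walk the call nodes, keep those whose callee is named class_name, and tally each keyword argument's value. count_instantiation_args is a hypothetical name, and ast.unparse requires Python 3.9 or later.

```python
import ast
from collections import defaultdict


def count_instantiation_args(code, class_name):
    """Return {arg: {value_source: count}} for keyword args of class_name(...)."""
    usage = defaultdict(lambda: defaultdict(int))
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both SVC(...) and module.SVC(...); other callees give None.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != class_name:
            continue
        for kw in node.keywords:
            if kw.arg is not None:  # kw.arg is None for **kwargs
                usage[kw.arg][ast.unparse(kw.value)] += 1
    return {arg: dict(values) for arg, values in usage.items()}


code = (
    "a = SVC(C=1.0, kernel='rbf')\n"
    "b = svm.SVC(C=1.0)\n"
    "c = KMeans(n_clusters=3)\n"
)
print(count_instantiation_args(code, "SVC"))
```

Positional arguments are ignored in this sketch, since mapping them to parameter names would require the class signature.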
-
RepoImportCounter.py
The module that defines RepoImportCounter.
-
class
odyssey.core.analyzer.RepoImportCounter.
RepoImportCounter
(package)¶ RepoImportCounter counts how many times other repos import the analyzed package.
-
__init__
(package)¶ Initialize the RepoImportCounter.
Parameters: package (string) – Python package to be counted Returns: returns an initialized RepoImportCounter object. Return type: object
-
__weakref__
¶ list of weak references to the object (if defined)
-
get_most_common
(n=None)¶ Get most common n repos.
Parameters: n (int or None, optional (default=None)) – the number of top repos importing the package to return. If set to None, all results will be returned. Returns: list of tuples containing repo name and count. Return type: list
-
parse
(entry)¶ Parse a BigQueryGithubEntry for repo import count.
Parameters: entry (BigQueryGithubEntry) – A BigQueryGithubEntry to be parsed
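The repo-level count can be sketched as a Counter keyed by repo name. The version below is only an approximation under stated assumptions: it takes (repo_name, code) pairs instead of BigQueryGithubEntry objects, detects imports with a line-based regular expression, and count_repo_imports is a hypothetical name.

```python
import re
from collections import Counter


def count_repo_imports(entries, package):
    """Count import statements of `package` per repo from (repo_name, code) pairs."""
    pattern = re.compile(
        r"^[ \t]*(?:from|import)[ \t]+{}(?=[.\s]|$)".format(re.escape(package)),
        re.MULTILINE,
    )
    counts = Counter()
    for repo_name, code in entries:
        hits = len(pattern.findall(code))
        if hits:  # avoid creating zero-count keys
            counts[repo_name] += hits
    return counts


entries = [
    ("a/b", "import sklearn\nfrom sklearn.svm import SVC\n"),
    ("c/d", "import numpy\nimport sklearnx\n"),
    ("e/f", "from sklearn import svm\n"),
]
repo_counts = count_repo_imports(entries, "sklearn")
print(repo_counts.most_common(1))
```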
-
query_builder.py
This module contains helper functions for building SQL queries. For example, when a WHERE clause has multiple constraints, they can be joined using connect_with_and / connect_with_or, defined in this file.
-
odyssey.utils.query_builder.
connect
(connector, *args)¶ Connect the strings in args with connector. Make sure there’s no additional connector.
-
odyssey.utils.query_builder.
connect_with_and
(*args)¶ Connect the strings in args with AND. Make sure there’s no additional connector.
-
odyssey.utils.query_builder.
connect_with_or
(*args)¶ Connect the strings in args with OR. Make sure there’s no additional connector.
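A minimal sketch of how these helpers could behave is shown below. It reads "no additional connector" as "empty or blank fragments are skipped, so no dangling AND / OR appears"; that reading is an assumption, and the bodies are illustrative rather than the module's actual code.

```python
def connect(connector, *args):
    """Join non-empty fragments with the connector, e.g. 'a AND b'."""
    parts = [arg.strip() for arg in args if arg and arg.strip()]
    return " {} ".format(connector).join(parts)


def connect_with_and(*args):
    return connect("AND", *args)


def connect_with_or(*args):
    return connect("OR", *args)


where = connect_with_and("size > 0", "", "path LIKE '%.py'")
print(where)
print(connect_with_or("a = 1"))
```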
sklearn_meta_data.py
This module contains metadata for sklearn, used by ImportAnalyzer.
-
odyssey.utils.sklearn_meta_data.
get_all_functions
()¶ A list of all functions. Generated from classes.rst in the scikit-learn package.
-
odyssey.utils.sklearn_meta_data.
get_all_models
()¶ A list of all models. Generated by a walk using all_estimators.
-
odyssey.utils.sklearn_meta_data.
get_all_submodules
()¶ A list of all submodules. Generated from classes.rst in the scikit-learn package.