5158
Comment:
|
7268
|
Deletions are marked like this. | Additions are marked like this. |
Line 12: | Line 12: |
* sage.categories.examples.factorial_languages: ??? (need an example) | * sage.categories.examples.factorial_languages: ??? (need an example, Thierry is on) |
Line 16: | Line 16: |
For tilings, there is not yet examples. | For shifts and tilings, there is (up to now) almost nothing: * sage.dynamics.symbolic.full_shift: the full shift |
Line 25: | Line 26: |
* .shifts: the category of shift A^G where G is almost anything and A is a set called alphabet | * .shifts: the category of shift A^G (where G is a semigroup for which the operation x -> gx is injective for any g in G and A is a set called alphabet) |
Line 28: | Line 29: |
* backward compatibility with the previous version | |
Line 43: | Line 45: |
What do we keep? What categories do we create? | What do we keep? What categories do we create? Do we provide a default _element_constructor_ in the category (if yes, it is highly incompatible with facade)? |
Line 79: | Line 81: |
== Naming convention / duplication == === Parikh vector, evaluation, abelianization === The name '''abelianization''' is the most generic one. '''Parikh vector''' is standard in word combinatorics. '''Evaluation''' is mainly used in combinatorics. Notice that evaluation is formally a composition, in other words the alphabet should be finite and ordered. === Pattern matching === Algorithms for pattern matching may be optimized in subclasses. Hence, we should pay attention for function of low and high level. 1) Each algorithm has some precomputations. The actual implementation uses Boyer-Moore algorithm and provides methods * last_position_dict * prefix_function_table * good_suffix_table But the results are cached which may be very expansive in memory! 2) low level routines: * x.first_pos_in(y,p): research of the first occurence of x in y from position p. The precomputations for x can be made in a generic way whereas the research depends mainly on the datastructure of y. I suggest to rename it into y.first_pos(x). That way, it should be easy to implement low level routines. * x.factor_occurences_iterator(y) which is renamed into x.occurence_iterator(y,start,end): return a generator for the position of the occurences of y in x. The generic implementation uses Boyer-Moore algorithm. 3) high level routines: * x.return_word_iterator(y): cut x into return word of y. * x.return_word_coding(y): cut x into return words of y and return it as a word x0.x1.x2. ... where each xi is a word which contains exactly one occurence of y as a prefix and these are the only occurences of y in x. The word x can be reconstructed as the concatenation of the xi. |
|
Line 84: | Line 112: |
* think about naming convention. For example, to get the subset of words of length n of a language L, do you prefer L.subset(n=4) or L.subset(length=4) | * think about naming convention. For example, to get the subset of words of length n of a language L, do you prefer L.subset(n=4) or L.subset(length=4): task taken by VincentDelecroix * make a dedicated class for palindromic closures |
Languages and tilings
This page gathers ideas for refactorization of sage.combinat.words and implementation of tilings.
You can subscribe to the associated mailing-list to discuss about this.
How do I implement my language ? my tiling ?
There are different places to look at for examples:
sage.categories.examples.languages: two examples of languages. PalindromicLanguages (the language of palindromes) and UniformMonoid (the submonoid of the free monoid on {a,b} that contains as many a as b).
- sage.categories.examples.factorial_languages: ??? (need an example, Thierry is on)
- sage.monoids.free_monoid: implementation of the free monoid.
- sage.combinat.languages.*: where most implementation of languages should go.
For shifts and tilings, there is (up to now) almost nothing:
- sage.dynamics.symbolic.full_shift: the full shift
Structure
The refactorization of the current code should go in the patch #12224 which is almost done. Up to now the code is a bit dissaminated everywhere in Sage:
- sage.categories: Most of the generic code is contained there.
- .languages: A language is a subset of A^* where A is a set called alphabet. It is naturally graded by N and the grading is called the length.
- .factorial_languages: category of factorial languages (= language stable under taking factors)
.shifts: the category of shift A^G (where G is a semigroup for which the operation x -> gx is injective for any g in G and A is a set called alphabet)
- sage.combinat.words
- data structure for finite and infinite words
- backward compatibility with the previous version
- sage.monoids
- .free_monoid: the free monoid (replaces part of sage.combinat.words.words.Words)
.free_monoid_morphism: (replaces sage.combinat.words.morphism.WordMorphism)
- sage.dynamics.symbolic
- .full_shift: an implementation of the full shift (replaces part of sage.combinat.words.words.Words)
- sage.combinat.languages
- implementation of different languages (balanced, language of a finite word, ...)
- specific data structure (suffix tree/trie, rauzy graph, return tree, ...)
What is bad/nice with categories:
- inheritance of generic code
- a bit confusing for the user who want to find the implementation of a method
- confusing for the person who writes the code and ask "where should I put this ?"
- ...
What do we keep? What categories do we create? Do we provide a default _element_constructor_ in the category (if yes, it is highly incompatible with facade)?
Behavior of algorithms with infinite input data
What to do for equality of infinite words ?
What should do
sage: w1 == w2
Two possibilities:
- test the first XXX letters for finding a difference. If find one then returns False otherwise raise an error, "seems to be equal use .is_equal(force=True) to launch the infinite test".
- test all letters and never return True
Other suggestions ?
Subprojects
Finite languages and factor set
Most of it was implemented by Franco (suffix tree and suffix trie). We would like to enhance it and make a specific data structure (called Rauzy castle) for FiniteFactorialLanguages. See #12225.
Substitutive and adic languages
There are many algorithms for languages described by a sequence of substitutions (called a directive word). The particular case of morphic and purely morphic languages correspond respectively to periodic and purely_periodic directive words.
Enumeration of factors, desubstitution (#12227)
Factor complexity for purely morphic languages (#12231)
- Equality for purely morphic languages (following J. Honkala, CANT, chapter 10)
Eventually periodic languages / words
They will be useful to define eventually periodic directive words for adic languages. See #12228.
Naming convention / duplication
Parikh vector, evaluation, abelianization
The name abelianization is the most generic one. Parikh vector is standard in word combinatorics. Evaluation is mainly used in combinatorics.
Notice that evaluation is formally a composition, in other words the alphabet should be finite and ordered.
Pattern matching
Algorithms for pattern matching may be optimized in subclasses. Hence, we should pay attention for function of low and high level.
1) Each algorithm has some precomputations. The actual implementation uses Boyer-Moore algorithm and provides methods
- last_position_dict
- prefix_function_table
- good_suffix_table But the results are cached which may be very expansive in memory!
2) low level routines:
- x.first_pos_in(y,p): research of the first occurence of x in y from position p. The precomputations for x can be made in a generic way whereas the research depends mainly on the datastructure of y. I suggest to rename it into y.first_pos(x). That way, it should be easy to implement low level routines.
- x.factor_occurences_iterator(y) which is renamed into x.occurence_iterator(y,start,end): return a generator for the position of the occurences of y in x. The generic implementation uses Boyer-Moore algorithm.
3) high level routines:
- x.return_word_iterator(y): cut x into return word of y.
- x.return_word_coding(y): cut x into return words of y and return it as a word x0.x1.x2. ... where each xi is a word which contains exactly one occurence of y as a prefix and these are the only occurences of y in x. The word x can be reconstructed as the concatenation of the xi.
TODO list
for #12224:
update factorial langages, with doc and tests: task taken by ThierryMonteil
implement a simple example of factorial languages in sage.categories.example.factorial_languages.py : task taken by ThierryMonteil
think about naming convention. For example, to get the subset of words of length n of a language L, do you prefer L.subset(n=4) or L.subset(length=4): task taken by VincentDelecroix
- make a dedicated class for palindromic closures
be sure that the methods in sage.categories.languages.ElementMethods are as minimal as possible
- words path (currently in sage.combinat.words.paths) which have to be modified to fit with the new implementation (question: should we do this right now ?)
- backward compatibility with the previous implementation (in particular with respect to pickling)
- make difference between finite/infinite/enumerated/ordered alphabet (in particular when the parents are initialized with a specific category)
for other tickets:
- specific data structure rauzy graphs and return tree (Thierry)
- 1-dim subshift of finite type / sofic
- n-dim finite words and n-dimensional shifts
- n-dim subshifts of finite type
- n-dim substitutive subshift
- cellular automata
... add your whishes