Syntax Blocks and Folding

Text parsing proceeds in two stages:

First stage: token searching, using <RegexRule>, <RegexBlock> and <Keyword[s]>. The syntax tokens produced by these rules are collected for the second stage.
Second stage: the parser uses the tokens produced at stage 1 to generate folds from high-level syntax constructs, using <SyntaxBlock> elements.

 

Element: <SyntaxBlock>

 

Attribute: priority, type: Integer
Sets the priority of this rule when a token sequence can be matched by several <SyntaxBlock> rules. The priority attribute has the same meaning as the priority attribute of the <RegexRule> and <RegexBlock> elements.
Attribute: start, type: Regular expression in special token syntax.
Regular expressions used in <SyntaxBlock> are written in a special syntax in which each atom of the regexp is the name of a token produced at stage 1. As with the <RegexBlock> rule, you can use a <Start>regex</Start> sub-element instead of the start attribute.
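
As a minimal sketch of the two forms, assume a hypothetical Pascal-like scheme in which begin and end are keywords. These token contents are illustrative only and do not come from the examples in this topic. Both blocks are intended to describe the same rule:

```xml
<!-- Attribute form: start and end given as attributes -->
<SyntaxBlock capture="true" start="kw:begin" end="kw:end" />

<!-- Sub-element form: the same expressions given as <Start> and <End> -->
<SyntaxBlock capture="true">
    <Start> kw:begin </Start>
    <End> kw:end </End>
</SyntaxBlock>
```

The sub-element form is usually easier to read when the expression spans several lines, as in the JavaScript examples in this topic.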

 

Example 1:

 

[ keyword:while keyword:for ] .+? keyword:do

 

This matches the start of a Lua “while” or “for” construct. The expression works like an ordinary regular expression with one difference: instead of single characters, it uses token names, optionally followed by the token content. Character-class constructs such as “\s, \S, \W, \w, \d, \D, \0xFF, \U{Unicode_cat}” and character-related modifiers such as (?ims) cannot be used here, because there are no characters, only integer codes for tokens. Case insensitivity is therefore meaningless, and the whole token sequence is always interpreted as a single line.

 

The token names keyword, identifier and symbol have the shortcuts kw, id and sym, respectively.

 

Example 2:

 

[ kw:while kw:for ] .+? kw:do

 

Example 3:

 

Any five keywords, followed by the start of a Lua while/for construct.

 

kw{5} [ kw:while kw:for ] .+? kw:do

 

Example 4:

 

A JavaScript function: the “function” keyword, any identifier (detected by the <KeywordRegex> rule), a “(” symbol, anything except the “;”, “{” and “}” symbols, a “)” symbol, and a “{” symbol.

 

kw:function  id
sym:(
           [^ sym:;    sym:}   sym:{   ]*
     sym:) sym:{

 

Attribute: end, type: Regular expression in special token syntax.
The syntax is the same as for the start attribute, with one difference: you can use the variables $0..$9 to reference groups matched by the start expression, as in the end attribute of <RegexBlock>.
Attribute: capture, type: Boolean (“true/false” or “0/1”)
Specifies whether this <SyntaxBlock> produces a fold in TLMDEditView or is only matched and skipped. See the JavaScript function example for more. You can also use <SkipSyntaxToken> elements in a scheme to specify tokens that are not used in syntax parsing.

 

Example 1 (syntax blocks):

 

<Scheme name='Comment' defaultToken='comment' />

<!-- Sample JavaScript scheme -->
<Scheme name='JavaScriptMain' defaultToken='default'
        keywordsIgnoreCase='false'>

    <!-- Regexp for keywords and identifiers -->
    <KeywordRegex>\b[a-zA-Z_][\w_]*\b</KeywordRegex>

    <!-- Keyword list (short list, for this example).
         do and switch are listed because the second
         SyntaxBlock below refers to them. -->
    <Keywords>
        for  in  if else return  while  do  switch
        function new this var with  arguments
        throw  try catch finally
    </Keywords>

    <Regex innerScheme='Comment' regex='//.*$' />
    <Regex token0='symbol'
           regex='[   \}   \{    \]    \[  \( \) &gt; &lt; ]' />
    <Regex token0='symbol' regex='[-:?\~=+!^;,]' />

    <SkipSyntaxToken token='comment' />

    <SyntaxBlock capture="true">
        <Start> kw:function  id
                sym:(
                        [^ sym:;    sym:}   sym:{   ]*
                sym:)
                sym:{
        </Start>
        <End> sym:\}  </End>
    </SyntaxBlock>

    <!-- We can use common syntax for many language constructs -->
    <SyntaxBlock capture="true" priority='10'>
        <Start>
            [ kw:while  kw:do kw:if  kw:else  kw:try
              kw:catch  kw:finally  kw:switch ]

            [^ sym:;  sym:}  ]*?    sym:\{
        </Start>

        <End> sym:}  </End>
    </SyntaxBlock>

    <!-- We don't want folds for code in plain { .. } blocks.
         We still match and skip them to keep braces balanced,
         because other constructs end with } too. -->
    <SyntaxBlock capture="false" priority='0'  >
        <Start> sym:{  </Start>
        <End> sym:}  </End>
    </SyntaxBlock>
</Scheme>

 

Example 2: VB syntax (using references to the start of the block)

 

<SyntaxBlock capture="true">
    <Start>
        [ kw:sub kw:class kw:if
          kw:function kw:property
          kw:select kw:with ]
    </Start>
    <End> kw:end  $0 </End>
</SyntaxBlock>

 

This folds every construct of the form Sub FuncName ... End Sub, Class ClassName ... End Class, and so on.
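
To make the role of $0 concrete, here is a sketch of what the rule amounts to for the Sub case alone. It assumes that $0 stands for the whole token sequence matched by <Start>, which is how the description of the end attribute reads; treat it as an illustration, not as a statement of the parser's exact behaviour:

```xml
<!-- Hand-expanded variant for a single keyword: $0 replaced by kw:sub -->
<SyntaxBlock capture="true">
    <Start> kw:sub </Start>
    <End> kw:end  kw:sub </End>
</SyntaxBlock>
```

Writing one such block per keyword would work as well, but the $0 reference lets a single rule cover all of the listed constructs.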

 

Element: <SkipSyntaxToken>

 

This element is a sub-element of <Scheme>; it works as a helper for the <SyntaxBlock> element.

 

Attribute: token, type: string, case-sensitive, token reference.
Specifies a token that is not used in high-level syntax parsing.

 

Example:

 

<SkipSyntaxToken token='comment' />

 

All comments are skipped at the syntax parsing stage, so you can write

 

kw:function id
sym:(
           [^ sym:;    sym:}   sym:{   ]*
     sym:) sym:{

 

instead of

 

kw:function comment* id  comment*
sym:(  comment*
           [^ sym:;    sym:}   sym:{   ]*
     sym:) comment* sym:{

 

for a JavaScript function.

 

You can specify multiple <SkipSyntaxToken> elements in a scheme.
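
For illustration, a scheme that skips two kinds of tokens might look like the sketch below. The docComment token name is hypothetical and assumes the scheme contains a rule that produces it; only the comment token appears in the examples in this topic. The token rules and <SyntaxBlock> elements are omitted for brevity.

```xml
<Scheme name='JavaScriptMain' defaultToken='default'>
    <!-- Both token kinds are ignored when <SyntaxBlock> patterns are matched -->
    <SkipSyntaxToken token='comment' />
    <SkipSyntaxToken token='docComment' />  <!-- hypothetical token name -->
</Scheme>
```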